Other sop/nondbi

Table Of Contents

Previous topic

DCS tables grouped/ordered by schema

Next topic

Scraping source databases into offline_db

This Page

Daya Bay Links

Content Skeleton

Non DBI access to DBI and other tables

Standard access to the content of offline_db (eg for analysis) should be made using DBI, DybDbi or via services that use these. However some usage of the content is better achieved without DBI.

This is not contrary to the rules Rules for Code that writes to the Database as although all writing to offline_db must use DBI, reading from offline_db can use whatever approach works best for the application.

Warning

Non-DBI access to DBI tables is for READING ONLY

Examples:

  1. monitoring historical variations, for example of DataQuality paramters or monitored temperatures
  2. presenting tables (eg ODM)

Reading from DBI is designed around getting the results for a particular context (at a particular time). When the usage does not fit into this pattern alternative access approaches should be considered.

Summary of Non DBI approaches

Python ORMs (Django, SQLAlchemy)

Object relational mappers (ORMs) provide flexible and simple access to Database content, providing row entries as python objects. It is also possible to map to joins between tables with SQLAlchemy.

Note however a limitation of Django, it does not support composite primary keys. As DBI uses composite primary keys (SEQNO,ROW_COUNTER) for payload tables, these cannot be mapped to Django ORM objects in the general case. However if ROW_COUNTER only ever takes one value the mapping can be kludged to work.

SQLAlchemy does not have this limitation. The dybgaudi:Database/NonDbi package provides some infrastructure that facilitates access to DBI tables with SQLAlchemy. For example:

from NonDbi import session_
session = session_("tmp_offline_db")
YReactor = session.dbikls_("Reactor")   ## class mapped to join of payload and validity tables
n = session.query(YReactor).count()
a = session.query(YReactor).filter(YReactor.SEQNO==1).one()   ## both payload and validity attributes
print vars(a)

For details examples see NonDbi

Warning

NB when connecting to multiple DB the above direct session_ approach encounters issue dybsvn:ticket:1254. The workaround is to use NonDbi.MetaDB, usage examples are provided in the API docs NonDbi.MetaDB (which are derived from the source).

ROOT TSQL

Low level access requiring raw SQL, lots of flexibility but is re-inventing the wheel.

High Performance Approaches

When dealing with many thousands/millions of entries the above approaches are slow.

An experimental fork (from Simon) of MySQL-python that provides NumPy arrays from MySQL queries.

This rather simple patch to MySQL-python succeeds to integrate the primary python tools for MySQL access and large array manipulation.

SQLAlchemy access to DBI tables with NonDbi

How can I access the TIMESTART for a particular run ?

In [1]: from NonDbi import session_

In [2]: session_??         ## read docstring + code

In [3]: session = session_("offline_db")

In [4]: YDaqRunInfo = session.dbikls_("DaqRunInfo")

In [5]: session.query(YDaqRunInfo).count()
Out[5]: 11402L

In [6]: YDaqRunInfo.<TAB>
YDaqRunInfo.AGGREGATENO          YDaqRunInfo.TIMESTART            YDaqRunInfo.__dict__             YDaqRunInfo.__hash__             YDaqRunInfo.__ne__               YDaqRunInfo.__subclasses__       YDaqRunInfo.partitionName
YDaqRunInfo.INSERTDATE           YDaqRunInfo.VERSIONDATE          YDaqRunInfo.__dictoffset__       YDaqRunInfo.__init__             YDaqRunInfo.__new__              YDaqRunInfo.__subclasshook__     YDaqRunInfo.runNo
YDaqRunInfo.ROW_COUNTER          YDaqRunInfo.__abstractmethods__  YDaqRunInfo.__doc__              YDaqRunInfo.__instancecheck__    YDaqRunInfo.__reduce__           YDaqRunInfo.__weakref__          YDaqRunInfo.runType
YDaqRunInfo.SEQNO                YDaqRunInfo.__base__             YDaqRunInfo.__eq__               YDaqRunInfo.__itemsize__         YDaqRunInfo.__reduce_ex__        YDaqRunInfo.__weakrefoffset__    YDaqRunInfo.schemaVersion
YDaqRunInfo.SIMMASK              YDaqRunInfo.__bases__            YDaqRunInfo.__flags__            YDaqRunInfo.__le__               YDaqRunInfo.__repr__             YDaqRunInfo._sa_class_manager    YDaqRunInfo.triggerType
YDaqRunInfo.SITEMASK             YDaqRunInfo.__basicsize__        YDaqRunInfo.__format__           YDaqRunInfo.__lt__               YDaqRunInfo.__setattr__          YDaqRunInfo.baseVersion
YDaqRunInfo.SUBSITE              YDaqRunInfo.__call__             YDaqRunInfo.__ge__               YDaqRunInfo.__module__           YDaqRunInfo.__sizeof__           YDaqRunInfo.dataVersion
YDaqRunInfo.TASK                 YDaqRunInfo.__class__            YDaqRunInfo.__getattribute__     YDaqRunInfo.__mro__              YDaqRunInfo.__str__              YDaqRunInfo.detectorMask
YDaqRunInfo.TIMEEND              YDaqRunInfo.__delattr__          YDaqRunInfo.__gt__               YDaqRunInfo.__name__             YDaqRunInfo.__subclasscheck__    YDaqRunInfo.mro

In [6]: q = session.query(YDaqRunInfo)

In [7]: q
Out[7]: <sqlalchemy.orm.query.Query object at 0x920058c>

In [8]: q.count()
Out[8]: 11408L

In [9]: q[0]
Out[9]: <NonDbi.YDaqRunInfo object at 0x9214f8c>

In [11]: p vars(q[-1])
         ...

In [17]: q.filter_by(runNo=12400).one()
Out[17]: <NonDbi.YDaqRunInfo object at 0x91fd4ac>

In [18]: vars(q.filter_by(runNo=12400).one())
Out[18]:
{u'AGGREGATENO': -1L,
 u'INSERTDATE': datetime.datetime(2011, 8, 16, 0, 0, 53),
 u'ROW_COUNTER': 1L,
 'SEQNO': 11185L,
 u'SIMMASK': 1,
 u'SITEMASK': 127,
 u'SUBSITE': 0,
 u'TASK': 0,
 u'TIMEEND': datetime.datetime(2011, 8, 15, 23, 57, 19),
 u'TIMESTART': datetime.datetime(2011, 8, 15, 6, 55, 55),
 u'VERSIONDATE': datetime.datetime(2011, 8, 15, 6, 55, 55),
 '_sa_instance_state': <sqlalchemy.orm.state.InstanceState object at 0x91fd4cc>,
 u'baseVersion': 1L,
 u'dataVersion': 813L,
 u'detectorMask': 230L,
 u'partitionName': 'part_eh1',
 u'runNo': 12400L,
 u'runType': 'Physics',
 u'schemaVersion': 17L,
 u'triggerType': 0L}

In [19]: o = q.filter_by(runNo=12400).one()

In [21]: o.TIMESTART
Out[21]: datetime.datetime(2011, 8, 15, 6, 55, 55)

Note that this SQLAlchmey access to DBI tables is entirely general. For the common task of run lookups DybDbi.IRunLookup has dedicated functionality to allow this.

In [23]: import os

In [24]: os.environ['DBCONF'] = 'offline_db'

In [25]: from DybDbi import IRunLookup

In [26]: irl = IRunLookup( 12400, 12681 )
DbiRpt<GDaqRunInfo>::MakeResultPtr extended query ctor, sqlcontext: 1=1 datasql:runNo in (12400, 12681)
Using DBConf.Export to prime environment with : from DybPython import DBConf ; DBConf.Export('offline_db') ;
dbconf:export_to_env from $SITEROOT/../.my.cnf:~/.my.cnf section offline_db
Successfully opened connection to: mysql://dybdb2.ihep.ac.cn/offline_db
This client, and MySQL server (MySQL 5.0.45-community) does support prepared statements.
DbiCascader Status:-
Status   URL

Closed         0 mysql://dybdb2.ihep.ac.cn/offline_db

In table DaqRunInfo row 0 column 4 (TRIGGERTYPE) value "0" of type Long may be truncated before storing in Int
Caching new results: ResultKey: Table:DaqRunInfo row:GDaqRunInfo.  2 vrecs (seqno min..max;versiondate min..max): 11185..11408;2011-08-15 06:55:55..2011-08-19 02:30:53
DbiTimer:DaqRunInfo: Query done. 2rows,    0.1Kb Cpu   0.5 , elapse   2.0


In [33]: irl[12400].vrec.contextrange
Out[33]:
|site 0x007f|sim 0x007f
        2011-08-15 06:55:55.000000000Z
        2011-08-15 23:57:19.000000000Z