DB Table Writing

Warning

Always check which Database you are connected to

Before doing any DB operations, avoid accidents and confusion by checking which database you are connected to: use status in the mysql shell, gDbi.Status() in ipython, or check the DBCONF settings used in scripts and the corresponding configuration in ~/.my.cnf. See Configuring DB Access for details.

Using DybDbi to write to tmp_offline_db

DybDbi provides simple access to DBI from python.

Configure Target DB

Warning

do not use easily overridden config such as os.environ.setdefault

In [27]: import os

In [28]: os.environ['DBCONF'] = "tmp_offline_db"
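Following the warning above, a script can guard against writing to the wrong database by asserting the expected DBCONF before any DB operation. A minimal sketch; the check_dbconf helper is hypothetical, not part of DybDbi:

```python
import os

# Set the target DB explicitly; a plain assignment cannot be
# silently overridden the way os.environ.setdefault can.
os.environ['DBCONF'] = "tmp_offline_db"

def check_dbconf(expected="tmp_offline_db"):
    """Hypothetical guard: abort before any write if DBCONF is not as expected."""
    dbconf = os.environ.get('DBCONF')
    assert dbconf == expected, \
        "refusing to write: DBCONF is %r, expected %r" % (dbconf, expected)
    return dbconf

print(check_dbconf())   # tmp_offline_db
```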

CSV handling

DybDbi.CSV provides CSV reading/validation facilities, invalid .csv files throw exceptions

In [1]: from DybDbi import CSV

In [2]: src = CSV( "$DBWRITERROOT/share/DYB_MC_AD1.txt" )

In [3]: src.read()

In [4]: print len(src)

In [5]: print src[0]   ## first source csv row, note the srcline

In [6]: print src[-1]   ## last source csv row, note the srcline

In [7]: print src.fieldnames   ## fields
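The behavior of DybDbi.CSV resembles the stdlib csv.DictReader: rows come back as dicts keyed by the header fieldnames, and every value is a string. A rough stand-in sketch (not the DybDbi implementation, and the sample content is invented):

```python
import csv
import io

# Invented sample content standing in for a .csv/.txt source file.
content = "pmtId,describ,status\n1,SABAD2-ring01-column01,1\n2,SABAD2-ring01-column02,1\n"

reader = csv.DictReader(io.StringIO(content))
rows = list(reader)

print(len(rows))                # 2
print(reader.fieldnames)        # ['pmtId', 'describ', 'status']
print(repr(rows[0]['status']))  # '1'  -- every value is a string, hence type conversion later
```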

Map CSV fieldnames to DBI attributes

DybDbi.Mapper provides CSV fieldname to DBI attribute name mappings and type conversions (the CSV reader returns every value as a string).

Using the same fieldnames in the CSV as the DBI attribute names allows automatic mapping; otherwise manual mappings must be set.

Generic Advantage

Each genDbi/DybDbi generated class knows the full specification of itself and of the corresponding database table (see DB Table Creation), so given the mapping from CSV fieldname to DBI attribute the appropriate type conversions are applied automatically.

An incomplete mapping throws exceptions:

In [12]: from DybDbi import Mapper, GCalibPmtSpec

In [13]: mpr = Mapper( GCalibPmtSpec, src.fieldnames )

After interactively adding the manual mappings, the mapper is created successfully:

In [16]: mpr = Mapper(  GCalibPmtSpec, src.fieldnames , afterPulse='AfterPulseProb', sigmaSpe='SigmaSpeHigh', prePulse='PrePulseProb', description='Describ' )

In [17]: print mpr

All elements read from a .csv are strings. Note the fieldname and type conversion after the mpr instance operates on one src dict item:

In [11]: print src[0]

In [12]: print mpr(src[0])

Apply the mpr instance over all items in the src:

In [13]: dst = map(mpr, src )

In [14]: len(dst)

In [16]: print dst[0]
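What a Mapper instance does can be sketched with the stdlib: rename CSV fieldnames to DBI attribute names and convert the string values to the attribute types. This is a hypothetical illustration; the real Mapper derives both the mapping and the converters from the class .spec:

```python
# Hypothetical fieldname->attribute mapping and per-attribute types,
# standing in for what Mapper derives from GCalibPmtSpec's .spec.
field2attr = {'pmtId': 'PmtId', 'describ': 'Describ', 'afterPulse': 'AfterPulseProb'}
attr2type = {'PmtId': int, 'Describ': str, 'AfterPulseProb': float}

def mapper(row):
    """Rename fields and convert string values to the attribute types."""
    out = {}
    for field, val in row.items():
        attr = field2attr[field]
        out[attr] = attr2type[attr](val)
    return out

src_row = {'pmtId': '536870921', 'describ': 'SABAD2-ring01-column01', 'afterPulse': '0.0125'}
dst_row = mapper(src_row)
print(dst_row['PmtId'], dst_row['AfterPulseProb'])   # 536870921 0.0125
```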

Create DbiWriter<T> and set ContextRange

In [18]: from DybDbi import Site, SimFlag, TimeStamp, ContextRange

In [19]: wrt = GCalibPmtSpec.Wrt()

In [20]: cr = ContextRange( Site.kAll,  SimFlag.kData|SimFlag.kMC , TimeStamp.GetBOT() ,TimeStamp.GetEOT())

In [21]: wrt.ctx( contextrange=cr, dbno=0, versiondate=TimeStamp(0,0), subsite=0 , task=7, logcomment="DybDbi demo write" )

Notes:

  1. dbno=0 selects the slot in the DB cascade to write to
  2. logcomment="msg" is currently ignored, as DBI is not operating in an Authorising DB manner with a GLOBALSEQNO table; dybsvn:ticket:803 seeks to assess the implications of migrating to Authorising DB usage
  3. versiondate=TimeStamp(0,0) switches on overlay date validity handling

Todo

enforce usage of overlay date in pre-commit hook

Convert CSV rows and write to DB

In [23]: for r in map(mpr,src):    ## __call__ method of mpr invoked on all src items
   ....:     instance = GCalibPmtSpec.Create( **r )
   ....:     wrt.Write( instance )

Crucial last step that writes the DBI row instances from memory to the DB:

In [25]: assert wrt.Close()   ## DB is accessed here
DbiWrt<GCalibPmtSpec>::Close

(this step is skipped on building these docs)

Command line and filename Parsing

Using simple python techniques for commandline and filename parsing avoids the anti-pattern of duplicating a writing script and making small changes.

See the examples:

A simple regular expression is used to match the name of a .csv file, for example :

In [1]: import re

In [2]: ptt = "^(?P<site>All|DayaBay|Far|LingAo|Mid|SAB)_(?P<subsite>AD1|AD2|AD3|AD4|All|IWS|OWS|RPC|Unknown)_(?P<simflag>MC|Data)\.csv"

In [3]: ptn = re.compile( ptt )

In [4]: match = ptn.match( "SAB_AD2_Data.csv" )

In [5]: print match.groupdict()
{'subsite': 'AD2', 'simflag': 'Data', 'site': 'SAB'}

The script then converts these to enum values using the enum FromString functions.

Such an approach has several advantages:

  1. standardized file names
  2. reduced number of parameters/options on commandline
  3. eliminates pointlessly duplicated code
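A filename that does not follow the convention simply fails to match, which a script should treat as an error rather than guessing at the context. A short runnable illustration using the same pattern:

```python
import re

# Same pattern as in the example above.
ptt = r"^(?P<site>All|DayaBay|Far|LingAo|Mid|SAB)_(?P<subsite>AD1|AD2|AD3|AD4|All|IWS|OWS|RPC|Unknown)_(?P<simflag>MC|Data)\.csv"
ptn = re.compile(ptt)

for name in ("SAB_AD2_Data.csv", "SAB_AD2.csv"):
    match = ptn.match(name)
    if match:
        print(name, "->", match.groupdict())
    else:
        # Refuse to proceed rather than invent context values.
        print(name, "-> does not follow the naming convention")
```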

Hands On Exercise 4 : Write $DBWRITERROOT/share/DYB_MC_AD1.txt into CalibPmtSpec

Warning

This exercise requires write permissions on a tmp_username_offline_db database and a recent NuWa installation

DIY steps:

  1. Use interactive ipython to perform the steps of the previous section
  2. Remember to read the API help as you go along eg: CSV? Mapper?
  3. Use mysql client to examine your additions to the copied DB

Note

Very little added code is required to complete this (hint: manual field name mappings); extra points for using a realistic contextrange

Hint: to help with the field mapping, genDbi classes know their .spec, so ask the class with eg SpecMap():

  In [12]: cls.Spec<TAB>
  cls.SpecKeys  cls.SpecList  cls.SpecMap

  In [12]: cls.SpecMap()
  Out[12]: <ROOT.TMap object ("TMap") at 0xb068dc0>

  In [13]: cls.SpecMap().asdod()
  Out[13]:
  {'AfterPulseProb': {'code2db': '',
                      'codetype': 'double',
                      'dbtype': 'float',
                      'description': 'Probability of afterpulsing',
                      'legacy': 'PMTAFTERPULSE',
                      'memb': 'm_afterPulseProb',
                      'name': 'AfterPulseProb'},
   'DarkRate': {'code2db': '',
                'codetype': 'double',
                'dbtype': 'float',
                'description': 'Dark Rate',
                'legacy': 'PMTDARKRATE',
                'memb': 'm_darkRate',
                'name': 'DarkRate'},
...

  In [14]: cls.SpecKeys().aslist()
  Out[14]:
  ['PmtId',
   'Describ',
   'Status',
   'SpeHigh',
   'SigmaSpeHigh',
   'SpeLow',
   'TimeOffset',
   'TimeSpread',
   'Efficiency',
   'PrePulseProb',
   'AfterPulseProb',
   'DarkRate']

Assigning Applicability of Constants

The arguments to the writer establish the range of applicability of the constants to be written.

from DybDbi import GCalibPmtSpec as cls
from DybDbi import Site, SimFlag, TimeStamp, ContextRange
wrt = cls.Wrt()
cr = ContextRange( Site.kAll,  SimFlag.kData|SimFlag.kMC , TimeStamp.GetBOT() ,TimeStamp.GetEOT())
wrt.ctx( contextrange=cr, dbno=0, versiondate=TimeStamp(0,0), subsite=0 , task=0, logcomment="DybDbi demo write" )

The crucial line:

wrt.ctx( contextrange=cr, dbno=0, versiondate=TimeStamp(0,0), subsite=0 , task=7, logcomment="DybDbi demo write" )

is python shorthand (via DbiCtx.__call__() and setter properties) for setting the attributes of the C++ class DybDbi.DbiCtx defined in dybgaudi:Database/DybDbi/DybDbi/DbiCtx.h. The choice of attributes determines which underlying DbiWriter<GTableName> ctor is invoked.

DbiCtx attribute   notes
contextrange       object described below
dbno               slot in the cascade to write to, usually should be 0
versiondate        always use TimeStamp(0,0) to signify overlay versioning
subsite            Dbi::SubSite enum integer
task               Dbi::Task enum integer
logcomment         description of update, see dybsvn:ticket:803

The Dbi:: enums are defined in databaseinterface:Dbi.h

Todo

try changing implementation of enums to make them usable from python

Context Range

Example of instantiation from python:

from DybDbi import Site, SimFlag, TimeStamp, ContextRange
cr = ContextRange( Site.kAll,  SimFlag.kData|SimFlag.kMC , TimeStamp.GetBOT() ,TimeStamp.GetEOT())

Warning

All times stored in the offline database must be in UTC; this includes validity range times

For the details on these classes see the API docs DybDbi.ContextRange, DybDbi.TimeStamp

argument   notes
siteMask   an OR of site enum integers, conventions:Site.h
simMask    an OR of simflag enum integers, conventions:SimFlag.h
tstart     start of validity, possibly corresponding to start of run time
tend       end of validity, very often TimeStamp::GetEOT() signifying a far future time

Choosing TIMEEND

Recommendations :

  1. when a definite end time is known use that
  2. use TimeStamp.GetEOT() when the end time is not known
  3. if constants need decommissioning this can be done with payload-less writes (in consultation with DB managers)

Do not adopt a policy of blindly using EOT; use the contextrange that best expresses the nature of that set of constants. Note that decommissioning allows particular context ranges to yield no constants. This is preferable to inappropriate constants, as it is trivial to handle in services.

Things not to do:

  1. use random far future times, instead standardize on TimeStamp.GetEOT()
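The recommendations above can be encoded in a tiny helper so that every writing script makes the same choice. This is a hypothetical sketch using datetime in place of TimeStamp; EOT here is an illustrative far-future sentinel, not the actual TimeStamp.GetEOT() value:

```python
from datetime import datetime

# Illustrative far-future sentinel standing in for TimeStamp.GetEOT().
EOT = datetime(2038, 1, 19, 3, 14, 7)

def choose_timeend(definite_end=None):
    """Use the definite end time when known, otherwise the standard EOT sentinel."""
    return definite_end if definite_end is not None else EOT

print(choose_timeend())                      # far-future EOT sentinel
print(choose_timeend(datetime(2011, 6, 1)))  # the known end time
```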

Determine run start time from a run number

First approach, which brings the full table into memory:

runNo = 5000
from DybDbi import GDaqRunInfo
rpt = GDaqRunInfo.Rpt()
rpt.ctx( sqlcontext="1=1" , task=-1 , subsite=-1 )  ## wideopen validity query
row = rpt.FirstRowWithIntValueForKey( "RunNo" , runNo )
vrec = rpt.GetValidityRec( row )
print vrec.seqno, vrec.contextrange.timestart, vrec.contextrange.timeend

Second approach, which brings only a single row into memory:

runNo = 5000
from DybDbi import GDaqRunInfo
rpt = GDaqRunInfo.Rpt()
rpt.ctx( sqlcontext="1=1", datasql="runNo = %s" % runNo , task=-1, subsite=-1 )
assert len(rpt) == 1 , "should only be a single entry for the runNo %s " % runNo
row = rpt[0]
vrec = rpt.GetValidityRec( row )
print vrec.seqno, vrec.contextrange.timestart, vrec.contextrange.timeend

A discussion of the relative merits of these approaches is in dybgaudi:Database/DybDbi/tests/test_find_vrec.py

Both techniques require the DaqRunInfo table to be accessible. You can achieve this without copying the table to your DB (which would be painful to maintain) by using a DBI cascade. Your script could define a default cascade with:

os.environ.setdefault('DBCONF','tmp_offline_db:offline_db')

Using the above form of setting DBCONF defines the default cascade yet allows commandline environment overrides. More details on DBCONF can be found at N ways to set an envvar
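The difference between os.environ.setdefault and a plain assignment can be seen directly: setdefault leaves an existing override in place and only supplies the default when nothing is set. A small runnable demonstration:

```python
import os

# Simulate a value already supplied in the environment
# (e.g. a commandline override) before the script runs.
os.environ['DBCONF'] = "offline_db"

# setdefault respects the pre-existing value ...
os.environ.setdefault('DBCONF', 'tmp_offline_db:offline_db')
print(os.environ['DBCONF'])   # offline_db

# ... and only supplies the default cascade when nothing is set.
del os.environ['DBCONF']
os.environ.setdefault('DBCONF', 'tmp_offline_db:offline_db')
print(os.environ['DBCONF'])   # tmp_offline_db:offline_db
```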

Overlay Versioning Demonstration

The tests in dybgaudi:Database/DybDbi/tests/test_demo_overlay.py demonstrate overlay versioning in action:

test_write_ugly
write a mixture of good and bad constants via GDemo class into table Demo
test_read_ugly
verify read back what was written
test_write_good
an overriding context to correct some of the bad constants
test_read_good
verify can read back the overriding constants
test_read_allgood
verify can read back all good constants

By virtue of using overlay versioning, as enabled with versiondate in the write context:

versiondate=TimeStamp(0,0)

Synthetic VERSIONDATE values are used which coincide with TIMESTART, unless data is already present, in which case one minute offsets are applied in order to override prior writes. In the validity table, there is a one minute VERSIONDATE offset for SEQNO = 11:

mysql> select * from DemoVld ;
+-------+---------------------+---------------------+----------+---------+---------+------+-------------+---------------------+---------------------+
| SEQNO | TIMESTART           | TIMEEND             | SITEMASK | SIMMASK | SUBSITE | TASK | AGGREGATENO | VERSIONDATE         | INSERTDATE          |
+-------+---------------------+---------------------+----------+---------+---------+------+-------------+---------------------+---------------------+
|     1 | 2010-01-01 00:00:00 | 2010-01-11 00:00:00 |      127 |       1 |       0 |    0 |          -1 | 2010-01-01 00:00:00 | 2011-05-03 08:18:13 |
|     2 | 2010-01-11 00:00:00 | 2010-01-21 00:00:00 |      127 |       1 |       0 |    0 |          -1 | 2010-01-11 00:00:00 | 2011-05-03 08:18:13 |
|     3 | 2010-01-21 00:00:00 | 2010-01-31 00:00:00 |      127 |       1 |       0 |    0 |          -1 | 2010-01-21 00:00:00 | 2011-05-03 08:18:13 |
|     4 | 2010-01-31 00:00:00 | 2010-02-10 00:00:00 |      127 |       1 |       0 |    0 |          -1 | 2010-01-31 00:00:00 | 2011-05-03 08:18:13 |
|     5 | 2010-02-10 00:00:00 | 2010-02-20 00:00:00 |      127 |       1 |       0 |    0 |          -1 | 2010-02-10 00:00:00 | 2011-05-03 08:18:13 |
| **  6 | 2010-02-20 00:00:00 | 2010-03-02 00:00:00 |      127 |       1 |       0 |    0 |          -1 | 2010-02-20 00:00:00 | 2011-05-03 08:18:13 |
|     7 | 2010-03-02 00:00:00 | 2010-03-12 00:00:00 |      127 |       1 |       0 |    0 |          -1 | 2010-03-02 00:00:00 | 2011-05-03 08:18:13 |
|     8 | 2010-03-12 00:00:00 | 2010-03-22 00:00:00 |      127 |       1 |       0 |    0 |          -1 | 2010-03-12 00:00:00 | 2011-05-03 08:18:13 |
|     9 | 2010-03-22 00:00:00 | 2010-04-01 00:00:00 |      127 |       1 |       0 |    0 |          -1 | 2010-03-22 00:00:00 | 2011-05-03 08:18:13 |
|    10 | 2010-04-01 00:00:00 | 2010-04-11 00:00:00 |      127 |       1 |       0 |    0 |          -1 | 2010-04-01 00:00:00 | 2011-05-03 08:18:13 |
| ** 11 | 2010-02-20 00:00:00 | 2010-03-02 00:00:00 |      127 |       1 |       0 |    0 |          -1 |*2010-02-20 00:01:00*| 2011-05-03 08:18:13 |
+-------+---------------------+---------------------+----------+---------+---------+------+-------------+---------------------+---------------------+
11 rows in set (0.00 sec)
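The one-minute offsetting visible for SEQNO 11 can be sketched as: start from TIMESTART and step forward one minute past any VERSIONDATE already taken. This is an illustration of the idea, not the DBI implementation:

```python
from datetime import datetime, timedelta

def synthetic_versiondate(timestart, existing):
    """Start at TIMESTART; bump by one minute while the candidate is taken."""
    vdate = timestart
    while vdate in existing:
        vdate += timedelta(minutes=1)
    return vdate

# SEQNO 6 already occupies the TIMESTART slot, so the overriding
# write (SEQNO 11) lands one minute later.
existing = {datetime(2010, 2, 20, 0, 0)}
vdate = synthetic_versiondate(datetime(2010, 2, 20, 0, 0), existing)
print(vdate)   # 2010-02-20 00:01:00
```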

Payload table for the bad write and its override:

mysql> select * from Demo where seqno in (6,11) ;
+-------+-------------+------+------+
| SEQNO | ROW_COUNTER | Gain | Id   |
+-------+-------------+------+------+
|     6 |           1 | 5000 |    5 |
|     6 |           2 | 5000 |    5 |
|     6 |           3 | 5000 |    5 |
|    11 |           1 |  500 |    5 |
|    11 |           2 |  500 |    5 |
|    11 |           3 |  500 |    5 |
+-------+-------------+------+------+
6 rows in set (0.00 sec)

When not using overlay versioning, by setting versiondate=TimeStamp() or any time other than TimeStamp(0,0), the consequences are:

  1. payload table is the same

  2. test_read_good and test_read_allgood fail

  3. validity table has VERSIONDATE (in this case aligned with INSERTDATE)

    mysql> select * from DemoVld ;
    +-------+---------------------+---------------------+----------+---------+---------+------+-------------+---------------------+---------------------+
    | SEQNO | TIMESTART           | TIMEEND             | SITEMASK | SIMMASK | SUBSITE | TASK | AGGREGATENO | VERSIONDATE         | INSERTDATE          |
    +-------+---------------------+---------------------+----------+---------+---------+------+-------------+---------------------+---------------------+
    |     1 | 2010-01-01 00:00:00 | 2010-01-11 00:00:00 |      127 |       1 |       0 |    0 |          -1 | 2011-05-03 08:38:18 | 2011-05-03 08:38:18 |
    |     2 | 2010-01-11 00:00:00 | 2010-01-21 00:00:00 |      127 |       1 |       0 |    0 |          -1 | 2011-05-03 08:38:18 | 2011-05-03 08:38:18 |
    |     3 | 2010-01-21 00:00:00 | 2010-01-31 00:00:00 |      127 |       1 |       0 |    0 |          -1 | 2011-05-03 08:38:18 | 2011-05-03 08:38:18 |
    |     4 | 2010-01-31 00:00:00 | 2010-02-10 00:00:00 |      127 |       1 |       0 |    0 |          -1 | 2011-05-03 08:38:18 | 2011-05-03 08:38:18 |
    |     5 | 2010-02-10 00:00:00 | 2010-02-20 00:00:00 |      127 |       1 |       0 |    0 |          -1 | 2011-05-03 08:38:18 | 2011-05-03 08:38:18 |
    |     6 | 2010-02-20 00:00:00 | 2010-03-02 00:00:00 |      127 |       1 |       0 |    0 |          -1 | 2011-05-03 08:38:18 | 2011-05-03 08:38:18 |
    |     7 | 2010-03-02 00:00:00 | 2010-03-12 00:00:00 |      127 |       1 |       0 |    0 |          -1 | 2011-05-03 08:38:18 | 2011-05-03 08:38:18 |
    |     8 | 2010-03-12 00:00:00 | 2010-03-22 00:00:00 |      127 |       1 |       0 |    0 |          -1 | 2011-05-03 08:38:18 | 2011-05-03 08:38:18 |
    |     9 | 2010-03-22 00:00:00 | 2010-04-01 00:00:00 |      127 |       1 |       0 |    0 |          -1 | 2011-05-03 08:38:18 | 2011-05-03 08:38:18 |
    |    10 | 2010-04-01 00:00:00 | 2010-04-11 00:00:00 |      127 |       1 |       0 |    0 |          -1 | 2011-05-03 08:38:18 | 2011-05-03 08:38:18 |
    |    11 | 2010-02-20 00:00:00 | 2010-03-02 00:00:00 |      127 |       1 |       0 |    0 |          -1 | 2011-05-03 08:38:18 | 2011-05-03 08:38:18 |
    +-------+---------------------+---------------------+----------+---------+---------+------+-------------+---------------------+---------------------+
    11 rows in set (0.00 sec)
    

Overlay versioning is the default if no versiondate is set in the write context.

Many more examples of DB writing with DybDbi

Many examples of writing to the DB using DybDbi are in dybgaudi:Database/DybDbiTest/tests/test_07.py. The full range of DBI functionality is exercised from DybDbi by the tests in dybgaudi:Database/DybDbiTest/tests/.

Using DBWriter to write to tmp_offline_db

The dybgaudi:Database/DBWriter package is implemented mostly in C++ and is currently rather inflexible. dybsvn:ticket:???