Other sop/dbi

Table Of Contents

Previous topic

DB Definitions

Next topic

Rules for Code that writes to the Database

This Page

Daya Bay Links

Content Skeleton

DBI Very Briefly

Validity Tables

DBI validity tables are the heart of how DBI operates:

mysql> describe TableNameVld ;
+-------------+------------+------+-----+---------+----------------+
| Field       | Type       | Null | Key | Default | Extra          |
+-------------+------------+------+-----+---------+----------------+
| SEQNO       | int(11)    | NO   | PRI | NULL    | auto_increment |
| TIMESTART   | datetime   | NO   |     | NULL    |                |
| TIMEEND     | datetime   | NO   |     | NULL    |                |
| SITEMASK    | tinyint(4) | YES  |     | NULL    |                |
| SIMMASK     | tinyint(4) | YES  |     | NULL    |                |
| SUBSITE     | int(11)    | YES  |     | NULL    |                |
| TASK        | int(11)    | YES  |     | NULL    |                |
| AGGREGATENO | int(11)    | YES  |     | NULL    |                |
| VERSIONDATE | datetime   | NO   |     | NULL    |                |
| INSERTDATE  | datetime   | NO   |     | NULL    |                |
+-------------+------------+------+-----+---------+----------------+
10 rows in set (0.00 sec)

Validity Table Timestamps

Each validity entry includes 4 timestamps:

TIMESTART
start of context range
TIMEEND
end of context range, often end-of-time
VERSIONDATE
used by overlay versioning to distinguish otherwise equal validities, overlay versioning usage if signalled by using versiondate=TimeStamp(0,0) in writer contexts. allowing easy overriding ... just rewrite with same contextrange to override
INSERTDATE
the actual insert time, used by rollback to select a snapshot of DB at a chosen time (or times ... this can be a per-table time) This means : NEVER CHEATED ... should always be actual UTC now of the offline_db update.

How these times fit in

Stating the obvious, in order to clarify the large numbers of timestamps floating around:

The timestamps embedded into real datafiles and simulation files, form the contexts used to make DBI queries so database validity TIMESTART/TIMEEND must be appropriate for those embedded timestamps.

Choosing Validity Ranges

The choice of validity range should be made as appropriate to the parameters.

In the case of MC production runs which have pre-defined non-overlapping and monotonically increasing time ranges, it is straighforward to choose TIMESTART. Where you suspect validity may extend beyond a single production using TimeStamp.GetEOT() for TIMEEND is the appropriate choice. Subsequent writes can of course override these entries.

Rollback and Production

The DBI ROLLBACK feature is very important for controlled production usage of DBI. A global timestamp or per-table timestamps are defined that all DBI queries incorporate, allowing the DB tables seen by all production jobs(or reruns thereof) to be the same no matter what DB updates are done in the meantime.

Reprocessing an existing dataset following DB updates with improved parameters would entail definition of a new set of rollback dates to benefit from the improved parameters.

Note that these rollback dates pertain only to the INSERTDATE used. This is orthogonal to the TIMESTART/TIMEEND which pertains to the timestamps which are embedded into the files.

Note this presupposes DBI is used appropriately:

  1. no deletions
  2. no changes to existing entries
  3. only additions are permissible

Deletions/changes are only allowed at the initial setup stage.

Using Rollback to Debug/Workaround problem DB entries

To verify that a DB update is causing issues or to workaround such problems it is possible to utilise DBI rollback to return to a prior state of all or some of the tables in the DB. This works by applying INSERTDATE < rollbackdate cuts .

For example setting the rollback date for all tables:

DBCONF_ROLLBACK="* = 2011-10-01 08:08:08" nuwa.py ...etc...

Single tables:

DBCONF_ROLLBACK=”CalibPmtSpec = 2011-10-01 08:08:08” nuwa.py ....etc...

Multiple tables via comma delimited mappings:

DBCONF_ROLLBACK="CalibPmtSpec = 2011-10-01 08:08:08,EnergyRecon = 2011-05-01 08:08:08, "  nuwa.py ....etc...

Wildcarded sets of tables:

DBCONF_ROLLBACK="Cal* = 2011-10-01 08:08:08"  nuwa.py ....etc...

Combine global setting with table specific ones using comma delimited string:

DBCONF_ROLLBACK="* = 2011-10-01 08:08:08,Cal* = 2011-10-01 08:08:08"

The above envvar setting approach is bash specific, if you must use inferior shells you will probably need to ranslate into “setenv DBCONF_ROLLBACK ... ; nuwa.py ...”

What is TASK for ?

TASK is usually left at its default value of zero, greater than zero values are used for testing out non-default algorithms.

Features of Clean Validity Tables

  1. SEQNO starting from 1 and with no gaps, with maximum corresponding to the LASTUSEDSEQNO
  2. Far future times all using TimeStamp.GetEOT() namely 2038-01-19 03:14:07
  3. Overlay versioning in use, see below

An example of a clean table with SEQNO = 1:213:

mysql> select * from CableMapVld ;
+-------+---------------------+---------------------+----------+---------+---------+------+-------------+---------------------+---------------------+
| SEQNO | TIMESTART           | TIMEEND             | SITEMASK | SIMMASK | SUBSITE | TASK | AGGREGATENO | VERSIONDATE         | INSERTDATE          |
+-------+---------------------+---------------------+----------+---------+---------+------+-------------+---------------------+---------------------+
|    *1*| 2009-03-16 11:27:43 | 2038-01-19 03:14:07 |        1 |       2 |       2 |    0 |          -1 | 2009-03-16 11:27:43 | 2011-06-14 06:58:50 |
|     2 | 2009-03-16 11:27:43 | 2038-01-19 03:14:07 |        2 |       2 |       6 |    0 |          -1 | 2009-03-16 11:27:43 | 2011-06-14 06:58:50 |
|     3 | 2009-03-16 11:27:43 | 2038-01-19 03:14:07 |        4 |       2 |       6 |    0 |          -1 | 2009-03-16 11:27:43 | 2011-06-14 06:58:50 |
|     4 | 2009-03-16 11:27:43 | 2038-01-19 03:14:07 |        4 |       2 |       5 |    0 |          -1 | 2009-03-16 11:27:43 | 2011-06-14 06:58:50 |
|     5 | 2009-03-16 11:27:43 | 2038-01-19 03:14:07 |        4 |       2 |       4 |    0 |          -1 | 2009-03-16 11:27:43 | 2011-06-14 06:58:50 |
...
|   208 | 2011-05-23 08:22:19 | 2038-01-19 03:14:07 |        2 |       2 |       7 |    0 |          -1 | 2011-05-23 08:22:19 | 2011-06-14 06:58:50 |
|   209 | 2011-05-23 08:22:19 | 2038-01-19 03:14:07 |        4 |       2 |       7 |    0 |          -1 | 2011-05-23 08:22:19 | 2011-06-14 06:58:50 |
|   210 | 2011-05-23 08:22:19 | 2038-01-19 03:14:07 |        1 |       2 |       7 |    0 |          -1 | 2011-05-23 08:22:19 | 2011-06-14 06:58:50 |
|   211 | 2011-05-23 13:09:43 | 2038-01-19 03:14:07 |        2 |       2 |       7 |    0 |          -1 | 2011-05-23 08:23:19 | 2011-06-14 06:58:50 |
|   212 | 2011-05-23 13:09:43 | 2038-01-19 03:14:07 |        4 |       2 |       7 |    0 |          -1 | 2011-05-23 08:23:19 | 2011-06-14 06:58:50 |
|  *213*| 2011-05-23 13:09:43 | 2038-01-19 03:14:07 |        1 |       2 |       7 |    0 |          -1 | 2011-05-23 08:23:19 | 2011-06-14 06:58:50 |
+-------+---------------------+---------------------+----------+---------+---------+------+-------------+---------------------+---------------------+
*213* rows in set (0.34 sec)

LOCALSEQNO table contains the last used SEQNO for each table, 213 for CableMap:

mysql> select * from LOCALSEQNO ;
+--------------+---------------+
| TABLENAME    | LASTUSEDSEQNO |
+--------------+---------------+
| *            |             0 |
| CalibFeeSpec |           113 |
| CalibPmtSpec |            29 |
| FeeCableMap  |             3 |
| CableMap     |           213 |   <<<<<<<<  213 <<<<<<
| HardwareID   |           172 |
+--------------+---------------+
6 rows in set (0.14 sec)

Overlay Versioning

Overlay versioning is visible by the 1 min differences in VERSIONDATE between overlapping validities. These VERSIONDATE are filled in automatically by DBI when signalled to do so by the special context argument versiondate=TimeStamp(0,0) . As DBI validity queries are done in descending VERSIONDATE order with the SQL: ordered by VERSIONDATE desc this allows updates to prior entries to be simply achieved by re-writing with the same contextrange and with overlay versioning enabled.

Query to find overlapping validities, that are distinguished by VERSIONDATE:

mysql> select * from CableMapVld where sitemask=1 and subsite=1 ;
+-------+---------------------+---------------------+----------+---------+---------+------+-------------+---------------------+---------------------+
| SEQNO | TIMESTART           | TIMEEND             | SITEMASK | SIMMASK | SUBSITE | TASK | AGGREGATENO | VERSIONDATE         | INSERTDATE          |
+-------+---------------------+---------------------+----------+---------+---------+------+-------------+---------------------+---------------------+
|    14 | 2009-03-16 11:27:43 | 2038-01-19 03:14:07 |        1 |       2 |       1 |    0 |          -1 | 2009-03-16 11:27:43 | 2011-06-14 06:58:50 |
|    22 | 2009-06-03 21:36:27 | 2038-01-19 03:14:07 |        1 |       2 |       1 |    0 |          -1 | 2009-03-16 11:28:43 | 2011-06-14 06:58:50 |
|    35 | 2010-12-07 19:14:20 | 2038-01-19 03:14:07 |        1 |       2 |       1 |    0 |          -1 | 2009-03-16 11:29:43 | 2011-06-14 06:58:50 |
|    57 | 2011-02-08 15:49:51 | 2038-01-19 03:14:07 |        1 |       2 |       1 |    0 |          -1 | 2009-03-16 11:30:43 | 2011-06-14 06:58:50 |
|    71 | 2011-02-22 12:38:11 | 2038-01-19 03:14:07 |        1 |       2 |       1 |    0 |          -1 | 2009-03-16 11:31:43 | 2011-06-14 06:58:50 |
|    85 | 2011-02-22 17:08:51 | 2038-01-19 03:14:07 |        1 |       2 |       1 |    0 |          -1 | 2009-03-16 11:32:43 | 2011-06-14 06:58:50 |
|    99 | 2011-02-22 18:07:45 | 2038-01-19 03:14:07 |        1 |       2 |       1 |    0 |          -1 | 2009-03-16 11:33:43 | 2011-06-14 06:58:50 |
|   113 | 2011-02-23 10:49:36 | 2038-01-19 03:14:07 |        1 |       2 |       1 |    0 |          -1 | 2009-03-16 11:34:43 | 2011-06-14 06:58:50 |
|   127 | 2011-03-25 19:31:49 | 2038-01-19 03:14:07 |        1 |       2 |       1 |    0 |          -1 | 2009-03-16 11:35:43 | 2011-06-14 06:58:50 |
|   143 | 2011-04-01 17:29:23 | 2038-01-19 03:14:07 |        1 |       2 |       1 |    0 |          -1 | 2009-03-16 11:36:43 | 2011-06-14 06:58:50 |
|   159 | 2011-04-18 03:42:40 | 2038-01-19 03:14:07 |        1 |       2 |       1 |    0 |          -1 | 2009-03-16 11:37:43 | 2011-06-14 06:58:50 |
|   175 | 2011-04-19 23:56:10 | 2038-01-19 03:14:07 |        1 |       2 |       1 |    0 |          -1 | 2009-03-16 11:38:43 | 2011-06-14 06:58:50 |
|   191 | 2011-05-03 02:35:09 | 2038-01-19 03:14:07 |        1 |       2 |       1 |    0 |          -1 | 2009-03-16 11:39:43 | 2011-06-14 06:58:50 |
|   207 | 2011-05-05 17:42:22 | 2038-01-19 03:14:07 |        1 |       2 |       1 |    0 |          -1 | 2009-03-16 11:40:43 | 2011-06-14 06:58:50 |
+-------+---------------------+---------------------+----------+---------+---------+------+-------------+---------------------+---------------------+
14 rows in set (0.09 sec)

DBI Q and A

Doesnt TIMEEND of EOT overshadow valid entries when we correct an earlier entry ?

This is the most frequently stated fallacy about DBI. See the above section Overlay Versioning. Essentially DBI always orders validities(Vld entries) by VERSIONDATE, NOT by INSERTDATE. This means that by virtue of overlay versioning (VERSIONDATE is derived from TIMESTART with minute offsets) you can go back and override a former commit (still using EOT) and not override your recent entries for subsequent times.

How do we make sure not to end up with SEQNO gaps ?

  1. use DBI/DybDbi to prepare updates
  2. avoid raw SQL fixes or doing nasty things like editing your ascii catalogs
  3. be careful regards re-running updates more that once, you can always start with a fresh tmp_offline_db if you do this by mistake
  4. check LOCALSEQNO table after updates, it should contain the LASTUSEDSEQNO for your updated tables

If my update has a given SEQNO in my tmp_offline_db, will it have the same in the offline_db ?

Yes, but it is unwise to do anything based on hardcoded SEQNO

Your table in tmp_offline_db is rdumpcat into dybaux then rloadcat into offline_db in a way that keeps the content exactly the same and SEQNO is preserved. The only thing that is changed is the INSERTDATE, which is fastforwarded to the UTC now date of the actual insert.

What are fastforward commits ? Why are they needed ?

Fastforward commits are changes to the INSERTDATE validities that are made by the script (dbaux.py) that DB managers use to propagate a dybaux catalog commits into offline_db. After updates are propagated these working copy changes are committed to dybaux.

This fastforwarding of INSERTDATE to the time of the actual offline_db insert in required to avoid windows of ambiguity between the time the insert is done into tmp_offline_db and the time that gets propagated into offline_db.

Or is the offline_db smart so that it automatically gives it the next number in the sequence ?

DBI supplies the next SEQNO in your tmp_offline_db, the steps from there to offline_db simply copy it.

How can SEQNO be missed?

Either directly by deletions or from failure modes, eg:

  1. a re-run that doubles up your SEQNO in LOCALSEQNO, followed by cleanup of Payload and Vld but not LOCALSEQNO entry could result in missing many SEQNO

Automatic and manual validations should pick up such issues.