Other sop/overlayversioningbug

Table Of Contents

Previous topic

DBI Internals

Next topic

DBI from C++

This Page

Daya Bay Links

Content Skeleton

DBI Overlay Versioning Bug

See also dybsvn:ticket:948

Background

Root Cause of Issue

When assigning versiondates DBI considers prior validity corresponding only to the TIMESTART of the new entry. In anything but the simplest of overlay histories this leads to a VERSIONDATE that collide with those from subsequent TIMESTART, this does not cause an issue at the initial TIMESTART but it does at later ones.

Validity queries at latter times see multiple validities tied in VERSIONDATE. Which wins is kinda undefined.

Earlier validity can leak forwards in time.

Ordering of validity queries

The ordering of DBI validity queries crucially determines which of overlayed validities wins. Problems with the ordering such as caused by duplicated VERSIONDATE lead to insidous DBI behavior of silently returning wrong values in some regions of (INSERTDATE,TIME).

Validity Query Ordering  
a VERSIONDATE desc historical DBI default : OK without VERSIONDATE duplications
b VERSIONDATE desc, SEQNO desc makes the higher SEQNO of degenerate sets win (intended fix)
c VERSIONDATE desc, SEQNO asc makes the lower SEQNO of degenerate sets win (canary to see problems)

MySQL has implicit SEQNO asc as SEQNO is the PK. But observed diffences between VERSIONDATE desc and VERSIONDATE desc, SEQNO asc indicates this does not follow thru to multi-column orderings.

Comparing VLUT created with different validity query ordering allows ambiguities to be located by varying the way that VERSIONDATE degeneracy is broken. Interpreting differences:

Difference Interpretation
a-b smoking-gun for affliction
b-c/a-c skating on thin-ice

Validity Look Up Tables

DBI Validity Look Up Tables (VLUTs) express all possible DBI validity results(SEQNO) determined by performing DBI queries at all TIMESTARTs with rollbacks to all INSERTDATES. They are presented as SEQNO values within tables with INSERTDATE vertically and TIMESTART horizontally.

Comparisons between such VLUTs enable problem periods to be identified, such comparisons lists the SEQNO values in the cell when differences are found.

DBI Scanning

Scanning scripts to create VLUTs for all contexts in all tables DybDbi.vld.vlut (dybgaudi:Database/DybDbi/python/DybDbi/vld/vlut.py) The results are accessible beneath http://belle7.nuu.edu.tw/dbiscan/ DBI scanning is an expensive operation that should not be done to production DB, instead copy tables from offline_db into local tmp_offline_db.

Within each context variations to default DBI validity ordering are made, and the resulting VLUTs compared for 2 DBCONF:

Current Intended code/SOP changes

  1. extra ordering to break validity degeneracy from VERSIONDATE collisions is mandatory, how to do that fairly clear:

    1. SEQNO desc (best approach as follows in spirit of VERSIONDATE, and makes future degeneracy impossible)
    2. SEQNO asc (lower SEQNO wins in VERSIONDATE collisions, good canary )
    3. TIMESTART asc (future possibilities of degeneracy are not eliminated when using TIMESTART, must use SEQNO for that)
  2. adopt timestart floored VERSIONDATE

    1. reduces occurence of degeneracy, and makes VERSIONDATEs more understandable
  3. enforce no VERSIONDATE collisions

    1. DBI will refuse to write entries with collisions in the written context
    2. table experts will have to manually set VERSIONDATEs to achieve the desired overlaying, this cannot be automated as DBI cannot read the mind of the expert as to the intended overlaying
    3. there is still possibility of collisions when reading from wider contexts than written, thus must still pin down the extra ordering

Planned rollout of DBI modifications

As multiple people need to check tables as migration is done it is not practical to change all tables at once.

Extra Ordering Fix

The extra ordering fix is a fundamental change to DBI:

  1. it touches almost all DBI operations, including reading and writing
  2. it changes the results of validity queries and thus DBI results in many rollback/time regions
  3. almost by definition changes are restricted to afflicted tables CableMap,HardwareID,CalibPmtSpec
  4. implemented within DBI, in the DbiDBProxy.cxx ctor
  5. DybDbi spec key CanFixOrdering = [kTRUE|kFALSE] allows per-table testing/rollout

Note

to minimise behaviour transitions the change in standard table .spec it is preferable to be done in concert with a DB rebuild

The ordering can be overridden (for test purposes only) with:

kls.GetTableProxy().GetDBProxy().SetExtraOrdering("SEQNO desc")

Timestart Flooring and Collision Avoidance

These changes effect writing only and can be controlled table-by-table either:

  1. via the spec writer default wctx strings:

    1. RequireUniqueVersionDate.kTRUE
    2. TimeStartFlooredVersionDate.kTRUE
  2. dynamically on configuring the writer, wrt.ctx( requireuniqueversiondate=True )

DybDbi propagates these ctx settings in void  DbiWrt<T>::MakeWriter():

m_wrt->SetTimeStartFlooredVersionDate(m_ctx.GetTimeStartFlooredVersionDate()) ;
m_wrt->SetRequireUniqueVersionDate(m_ctx.GetRequireUniqueVersionDate()) ;
m_wrt->SetUniqueVersionDateSiteMask(m_ctx.GetUniqueVersionDateSiteMask()) ;
m_wrt->SetUniqueVersionDateSimMask( m_ctx.GetUniqueVersionDateSimMask()) ;

Intended Migration of Existing DB entries

Rebuild Approach

Where possible it is much preferable to correct DBI and rerun over source loadfiles rather than attempting to fix things up and risk inconsistencies/confusion/doubts.

Tables that have been recreated from source reloading, listed with dybaux revision numbers:

CableMap                       31    [4898, 4913, 4914, 4915, 4916, 4917, 4918, 4919, 4920, 4921, 4922, 4923, 4924, 4925, 4926, '...', 4928, 4929, 4930, 4931, 4932, 4937, 4938, 4939, 4940, 4964, 4965, 4966, 4967, 5031, 5061]
CalibPmtSpec                   71    [4942, 4943, 4944, 4945, 4946, 4947, 4948, 4949, 4950, 4951, 4952, 4953, 4954, 4955, 4956, '...', 5011, 5012, 5013, 5023, 5024, 5025, 5026, 5027, 5029, 5036, 5037, 5053, 5054, 5055, 5056]
HardwareID                     22    [4898, 4913, 4914, 4917, 4919, 4920, 4921, 4922, 4923, 4924, 4925, 4926, 4927, 4928, 4929, 4930, 4931, 4932, 4937, 4940, 4967, 5032]

Other tables that need taming:

CalibPmtHighGain               4     [5019, 5042, 5048, 5063]     ## rollinggain entries may be problematic to reproduced
CalibPmtPedBias                1     [5034]

CoordinateAd                   1     [4974]
CoordinateReactor              1     [4974]
Reactor                        1     [5065]

Table Rebuilding and Insertdates

Implications of slated approach for table rebuilding Exceptional Operating Procedures for Major Changes

  1. changes INSERTDATE to the times of the re-insertions
  2. simplest approach would even clump everything under one INSERTDATE MUST AVOID THIS

What to do with insertdates ?

  1. staying honest with INSERTDATE, is important part of DBI contract do not want to kludge this

  2. some tables CableMap/HardwareID hold little information in INSERTDATE

  3. avoid everything going in under a single INSERTDATE

    1. when employing SOP loading with dbaux.py this requires separate dybaux commits (eg for each loaded file)
    2. a long sequence of OVERRIDE commits ? actually only the start scratching commit needs to be OVERRIDE
  4. other tables with significant information in INSERTDATE ?

    1. compromise and forgo rebuilding these : just code changes and future write protections

Table Summary

Table Recreation? Notes
CableMap/HardwareID OK Many duplicated loads and little information in INSERTDATEs (as they recreate prior static history) MOST IN NEED OF REBUILD
CalibPmtSpec OK(?) Sept 30th load not verified 2 ctx with issues, but fairly localized
CalibPmtHighGain not-OK rollinggain entries not easily recreatable  
CableMap/HardwareID

Done:

  1. developed blind duplication script/driver file, which succeeds to precisely recreate tables (warts and all)

  2. developed simple all in one (de-duped) recreation script/driver file that uses the G*Fix classes in dybgaudi/Database/TableTests/TestCableMap/python/TestCableMap to load de-duped driver in fixed mode

    1. see no collisions, ie no need to set manual VERSIONDATEs db.py sssta checks confirm this
  3. payload digest comparisons (at last INSERTDATE) between de-duped tmp_offline_db and current tmp_copy_db

    1. no digest differences for CableMap and HardwareID, thus no difference between payloads returned (at last INSERTDATE)

Todo:

  1. chopped dybaux-rebuild commit script (one commit per input loadfile)
    1. chopping is essential to avoid all under one INSERTDATE
    2. test into an NTU repository

Testing auxcommited updating:

catdir=~/ntudybaux/catalog/tmp_offline_db
svn st $catdir

## just drops and recreates empty tables from dir with exports
cd t
../share/load_static.py -r 0:0 -l INFO --DROP ../recreate/driver_fix.txt

## rdumpcat empties into catalog, as clobbering must use OVERRIDE
db.py tmp_offline_db rdumpcat $catdir  --OVERRIDE

svn st $catdir   ## check only expected chaneged to CableMap/HardwareID/LOCALSEQNO

## as find PhysAd diffs due to local PhysAd testing not in catalog, recreate tmp_offline_db and try again
db.py offline_db dump ~/offline_db.sql
db.py tmp_offline_db load ~/offline_db.sql

## 1st check that current offline_db and the catalog are in sync
db.py tmp_offline_db rdumpcat $catdir
##    ==> find not in sync due to prior dump with local PhysAd tests

## checkout fresh catalog wc
rm -rf $catdir/*        ## leaves .svn to allowing "svn up"
svn up $catdir/..

## OR from scratch approach
( rm -rf $catdir ; cd $(dirname $catdir)  ; svn co http://dayabay.phys.ntu.edu.tw/svn/dybaux/catalog/tmp_offline_db )

## huh, svn checkout is hanging half way through

maybe corruption in recovered dybaux repo, as lsof is pointing to getting stuck on a single rev:

[root@cms02 log]# lsof | grep dybaux
httpd     22803  nobody   16r      REG        3,2   3056650    2667632 /var/scm/svn/dybaux/db/revs/4970

from Trac http://dayabay.phys.ntu.edu.tw/tracs/dybaux/changeset/4970 that is a full catalog recreation revision checkout eventually fails:

A    tmp_offline_db/SimPmtSpec/SimPmtSpecVld.csv
svn: REPORT of '/svn/dybaux/!svn/vcc/default': Could not read response body: Connection reset by peer (http://dayabay.phys.ntu.edu.tw)

repeating, reveals that /var/scm/svn/dybaux/db/revs/4970 again gets stuck in craw * http://subversion.apache.org/faq.html#stuck-bdb-repos * http://svnbook.red-bean.com/en/1.6/svn.reposadmin.maint.html#svn.reposadmin.maint.tk

Note crucial points:

  1. prevent other access to repo while using svnadmin, by sv stop apache
  2. use the appropriate apache user, to avoid subsequenct permission issues

Verification goes thru every revision, taking ~second for each:

sudo -u nobody svnadmin verify /var/scm/svn/dybaux

Tarball appears normal:

tar ztvf  /data/var/scm/backup/dayabay/svn/dybaux/last/dybaux-5069.tar.gz
CalibPmtSpec

The 2 problem CalibPmtSpec contexts

  1. http://belle7.nuu.edu.tw/dbiscan/CalibPmtSpec/aggno-1_simflag1_site32_subsite1_task0/tmp_offline_db/vlut_cf_orderingSEQNOdesc/

    1. Broad ambiguity (many TIMESTARTs) between (29L, 39L), but only for 2 INSERTDATEs:

      2011-06-27 13:34:26
      2011-06-27 13:35:00
      
  2. http://belle7.nuu.edu.tw/dbiscan/CalibPmtSpec/aggno-1_simflag1_site1_subsite1_task0/tmp_offline_db/vlut_cf_orderingSEQNOdesc/

  1. Narrow ambiguity for TIMESTARTs:

    2011-07-07 03:17:21
    2011-07-08 06:23:13
    2011-07-09 05:58:01
    

    for 3 INSERTDATES:

    (85L, 89L)       2011-07-26 07:22:30
    (86L, 90L)       2011-07-26 07:22:56
    (87L, 91L)       2011-07-26 07:23:23
    
CalibPmtHighGain

Live(without rollback) ambiguity for CalibPmtHighGain, with 4 smoking guns from the INSERTDATE 2011-09-30 01:12:27 mixing SEQNO (66L, 390L)

For TIMESTARTs:

2011-07-31 18:59:05
2011-07-31 19:58:03
2011-07-31 19:59:05
2011-07-31 20:58:03

Test Rebuilding

Test rebuilds using non-standard switched on .spec to determine:

  1. how many collisions occur ?
  2. can they be reduced by re-ordering loads ? eg TIMESTART ordering (backdating is know to increase degeneracy liklihood)
  3. how to supply manual VERSIONDATEs ?

Fixup Validity Approach

Appling a correction to all VERSIONDATEs to become TIMESTART floored reducing degeneracy etc.. is easy to do. It is also possible to devise clever schemes that attempt to replicate what DBI would have done with a changed policy.

While easy to go behind DBIs back and diddle with the VERSIONDATEs, it is not easy to know that the resulting changes in DBI results are OK. There are too many changes for it be feasible to confirm that the changes match the requirements of the table experts.

The extent and location of changes are visible from scans and summaries thereof.

Current Transfixion Approach

Done by DybDbi.vld.versiondate (dybgaudi:Database/DybDbi/python/DybDbi/vld/versiondate.py )

  1. copies all DBI tables from tmp_offline_db into fix_offline_db with VERSIONDATE changed to timestart floored scheme.

    1. uses kls.GetTableProxy().QueryOverlayVersionDate DBI call (with timestart floored option) to arrive at the VERSIONDATE
    2. this DBI call is done in the fix_ DB with a SEQNO asc growing validity table

VLUT Comparisons

Summary tables created from the full DBI scan by DybDbi.vld.vsmry (dybgaudi:Database/DybDbi/python/DybDbi/vld/vsmry.py)

Summary Examination

The dynamically derived version of this context summary is at * http://belle7.nuu.edu.tw/dbiscan/Summary/ctxsmry/

An intermediate presentation listing contexts with differences, and including TIME and INSERTDATE ranges afflicted by differences is at * http://belle7.nuu.edu.tw/dbiscan/Summary/difctx/

These summaries reference the full VLUT tables, such as * http://belle7.nuu.edu.tw/dbiscan/CableMap/aggno-1_simflag2_site2_subsite2_task0/tmp_offline_db/vlutorderingSEQNOasc_cf_orderingSEQNOdesc/

tmp_offline_db (copy of offline_db)

Cells show number of ctxs with differences over total number of ctxs.

tn vlut_cf_orderingSEQNOasc.rst vlutorderingSEQNOasc_cf_orderingSEQNOdesc.rst vlut_cf_orderingSEQNOdesc.rst
CableMap 16/35 19/35 19/35
CalibFeeSpec 0/1 0/1 0/1
CalibPmtHighGain 0/6 0/6 0/6
CalibPmtPedBias 0/1 0/1 0/1
CalibPmtSpec 2/9 2/9 2/9
CoordinateAd 0/1 0/1 0/1
CoordinateReactor 0/1 0/1 0/1
Demo 1/1 1/1 1/1
FeeCableMap 0/3 0/3 0/3
HardwareID 14/33 19/33 19/33
Reactor 0/6 0/6 0/6
SimPmtSpec 0/1 0/1 0/1
alltn 33/98 41/98 41/98
  1. Impact of changing from default to controlled 2ndary ordering is apparent
  2. issue is restricted to tables with significant overlaying : CableMap, CalibPmtSpec, HardwareID

fix_offline_db (with VERSIONDATE TIMESTART flooring)

tn vlut_cf_orderingSEQNOasc.rst vlutorderingSEQNOasc_cf_orderingSEQNOdesc.rst vlut_cf_orderingSEQNOdesc.rst
CableMap 0/35 0/35 0/35
CalibFeeSpec 1/1 1/1 1/1
CalibPmtHighGain 0/6 0/6 0/6
CalibPmtPedBias 0/1 0/1 0/1
CalibPmtSpec 0/9 0/9 0/9
CoordinateAd 0/1 0/1 0/1
CoordinateReactor 0/1 0/1 0/1
Demo 0/1 0/1 0/1
FeeCableMap 0/3 0/3 0/3
HardwareID 0/33 0/33 0/33
Reactor 0/6 0/6 0/6
SimPmtSpec 0/1 0/1 0/1
alltn 1/98 1/98 1/98
  1. very little ordering dependency as almost all degeneracy has been eliminated

  2. a single pathalogical context, with a single pair of SEQNO causing issue.

Two timestarts with only 40s between em:

mysql> select * from tmp_offline_db.CalibFeeSpecVld where SEQNO in (97,99)  ;
+-------+---------------------+---------------------+----------+---------+---------+------+-------------+---------------------+---------------------+
| SEQNO | TIMESTART           | TIMEEND             | SITEMASK | SIMMASK | SUBSITE | TASK | AGGREGATENO | VERSIONDATE         | INSERTDATE          |
+-------+---------------------+---------------------+----------+---------+---------+------+-------------+---------------------+---------------------+
|    97 | 2010-01-07 06:45:28 | 2038-01-19 03:14:07 |       32 |       1 |       1 |    0 |          -1 | 2010-01-07 14:51:28 | 2010-04-08 06:02:50 |
|    99 | 2010-01-07 06:44:12 | 2038-01-19 03:14:07 |       32 |       1 |       1 |    0 |          -1 | 2010-01-07 14:56:12 | 2010-04-08 09:29:48 |
+-------+---------------------+---------------------+----------+---------+---------+------+-------------+---------------------+---------------------+
2 rows in set (0.01 sec)

mysql> select * from fix_offline_db.CalibFeeSpecVld where SEQNO in (97,99)  ;
+-------+---------------------+---------------------+----------+---------+---------+------+-------------+---------------------+---------------------+
| SEQNO | TIMESTART           | TIMEEND             | SITEMASK | SIMMASK | SUBSITE | TASK | AGGREGATENO | VERSIONDATE         | INSERTDATE          |
+-------+---------------------+---------------------+----------+---------+---------+------+-------------+---------------------+---------------------+
|    97 | 2010-01-07 06:45:28 | 2038-01-19 03:14:07 |       32 |       1 |       1 |    0 |          -1 | 2010-01-07 07:02:16 | 2010-04-08 06:02:50 |
|    99 | 2010-01-07 06:44:12 | 2038-01-19 03:14:07 |       32 |       1 |       1 |    0 |          -1 | 2010-01-07 07:02:16 | 2010-04-08 09:29:48 |
+-------+---------------------+---------------------+----------+---------+---------+------+-------------+---------------------+---------------------+
2 rows in set (0.00 sec)
  1. check this corresponds to transfixion error

Check this corresponds to the error during transfixion of CalibFeeSpec, TODO: avoid this:

INFO:__main__:transfix_tab CalibFeeSpec
WARNING:__main__:transfixion of 89 sees collidingSeqno 83
WARNING:__main__:transfixion of 90 sees collidingSeqno 84
WARNING:__main__:transfixion of 91 sees collidingSeqno 77
WARNING:__main__:transfixion of 92 sees collidingSeqno 85
WARNING:__main__:transfixion of 94 sees collidingSeqno 78
WARNING:__main__:transfixion of 95 sees collidingSeqno 86
WARNING:__main__:transfixion of 96 sees collidingSeqno 87
WARNING:__main__:transfixion of 98 sees collidingSeqno 88
WARNING:__main__:transfixion of 99 sees collidingSeqno 97
WARNING:__main__:transfixion of 100 sees collidingSeqno 79
WARNING:__main__:transfixion of 101 sees collidingSeqno 80

tmp_offline_db_cf_fix_offline_db

It is straightforward to devise a fix, that has this better order change behavior (by removing degeneracy) ... but this is changing the results of DBI validity queries in some regions of (INSERTDATE, TIME).

Counting ctx with difference/totals when comparing tmp_offline_db with fix_offline_db , the corresponding orderings are used. Although that is fairly mute for fix_offline_db as it has very little extra ordering dependency, it is very relevant for tmp_offline_db

tn vlut.rst vlutorderingSEQNOasc.rst vlutorderingSEQNOdesc.rst
CableMap 16/35 1/35 19/35
CalibFeeSpec 1/1 1/1 1/1
CalibPmtHighGain 0/6 0/6 0/6
CalibPmtPedBias 0/1 0/1 0/1
CalibPmtSpec 3/9 1/9 3/9
CoordinateAd 0/1 0/1 0/1
CoordinateReactor 0/1 0/1 0/1
Demo 1/1 1/1 1/1
FeeCableMap 0/3 0/3 0/3
HardwareID 14/33 0/33 19/33
Reactor 0/6 0/6 0/6
SimPmtSpec 0/1 0/1 0/1
alltn 35/98 4/98 43/98
  1. vlutorderingSEQNOasc favors lower SEQNO in degenerate collisions (in tmp_)

    1. this almost matches between tmp_ and fix_ WHY?
    2. because in fix_ degeneracies are almost eliminated, the effect is that lower SEQNO results are peeking out that formerly were improperly overlayed
  2. vlutorderingSEQNOdesc favors higher SEQNO in degenerate collisions #. I initially expected vlutorderingSEQNOdesc.rst would be most matched... #. But on further consideration, this is due to the breaking apart of degeneracy done by the fix

    1. timestart flooring used to create fix_ almost eliminates degenerates (so in cases where
    2. SEQNOasc plucks lower SEQNO from degenerate tmp_offline_db ... which corresponds to the undegenerated fix_offline_db
    1. all academic, the important one is vlut.rst as this comparing current DBI with intended future (modulo fix ordering , but that should not matter)
    2. tmp_offline_db/vlutorderingSEQNOdesc.rst is kinda current offline_db with degenerates fixed in place
  3. Three/Four red herrings ?

    1. http://belle7.nuu.edu.tw/dbiscan/CableMap/aggno-1_simflag2_site32_subsite1_task0/tmp_offline_db_cf_fix_offline_db/vlutorderingSEQNOasc/ ndif:23

    2. http://belle7.nuu.edu.tw/dbiscan/CalibFeeSpec/aggno-1_simflag1_site32_subsite1_task0/tmp_offline_db_cf_fix_offline_db/vlutorderingSEQNOasc/ ndif:94

    3. http://belle7.nuu.edu.tw/dbiscan/CalibPmtSpec/aggno-1_simflag1_site32_subsite2_task0/tmp_offline_db_cf_fix_offline_db/vlutorderingSEQNOasc/ ndif:305 !!!

      1. overlapping with very close TIMESTARTs is suspected to be implicated
  4. better to do the cross comparisons

    1. tmp_offline_db/vlut cf fix_offline_db/SEQNOdesc
    2. tmp_offline_db/vlut cf fix_offline_db/SEQNOasc
  5. BUT these are expected to match the first column however....

    1. need to present 35/98 ctxs with differences palatably ... ( need mismatch fractions within each )

Observations

  1. in tmp_ cf fix_ comparisons tis notable that fix_ usually comes up with lower SEQNO : check generality of this

Sampling VLUT extracts

tmp_offline_db_cf_fix_offline_db vlut orderingSEQNOdesc insertdates:19 timestarts:18 ndif:19

insertdate 2009-03-16 11:27:43 2009-06-03 21:36:27 2010-12-07 19:14:20 2011-02-08 15:49:51 2011-02-22 12:38:11 2011-02-22 17:08:51 2011-02-22 18:07:45 2011-02-23 10:49:36 2011-03-25 19:31:49 2011-04-01 17:29:23 2011-04-18 03:42:40 2011-04-19 23:56:10 2011-05-03 02:35:09 2011-05-05 17:42:22 2011-05-23 08:22:19 2011-05-23 13:09:43 2011-06-01 00:00:00
2011-06-24 05:02:54 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29
2011-06-24 05:03:04 29 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51
2011-06-24 05:03:39 29 51 59 73 73 73 73 73 73 123 139 155 171 187 187 187 187
2011-06-24 05:04:21 29 51 209 (209L, 73L) (209L, 73L) (209L, 73L) (209L, 73L) (209L, 73L) (209L, 73L) 123 139 155 171 187 187 187 187
2011-06-24 05:04:44 29 51 209 223 223 223 223 223 223 (223L, 123L) 139 155 171 187 187 187 187
2011-06-24 05:05:08 29 51 209 223 223 223 223 223 223 (223L, 123L) 139 155 171 187 187 187 187

The first pair:

mysql> select * from tmp_offline_db.HardwareIDVld where SEQNO in (209,73) ;
+-------+---------------------+---------------------+----------+---------+---------+------+-------------+---------------------+---------------------+
| SEQNO | TIMESTART           | TIMEEND             | SITEMASK | SIMMASK | SUBSITE | TASK | AGGREGATENO | VERSIONDATE         | INSERTDATE          |
+-------+---------------------+---------------------+----------+---------+---------+------+-------------+---------------------+---------------------+
|    73 | 2011-02-08 15:49:51 | 2038-01-19 03:14:07 |        1 |       2 |       2 |    0 |          -1 | 2009-03-16 11:33:43 | 2011-06-24 05:03:39 |
|   209 | 2010-12-07 19:14:20 | 2038-01-19 03:14:07 |        1 |       2 |       2 |    0 |          -1 | 2009-03-16 11:33:43 | 2011-06-24 05:04:21 |
+-------+---------------------+---------------------+----------+---------+---------+------+-------------+---------------------+---------------------+
2 rows in set (0.00 sec)

Observe:

  1. a later insert is doing a backdated(earlier TIMESTART) override, this is prone to degeneracy issues
  2. looks like the fix might be scrubbing an intended backdated override in this case ?
  3. Earlier validity can leak forwards in time, but that is sometimes the desired.

Alternative to Recreat Rather Than Fix Approach

CableMap/HardwareID

CableMap + HardwareID can be recreated (after quite a bit of detective work dealing with duplications and code changes impacting results: takebogus ) with:

../share/recreate_from_scratch.sh
  1. http://dayabay.ihep.ac.cn/tracs/dybsvn/ticket/880
  2. http://dayabay.ihep.ac.cn/tracs/dybsvn/browser/dybgaudi/trunk/Database/TableTests/TestCableMap/share/dlfcrs.sh?rev=12754
  3. http://dayabay.ihep.ac.cn/tracs/dybsvn/browser/dybgaudi/trunk/Database/TableTests/TestCableMap/doc/notes.rst?rev=12373
  4. http://dayabay.ihep.ac.cn/tracs/dybsvn/log/dybgaudi/trunk/DataModel/DataSvc/share/feeCableMap.txt log of feeCableMap.txt

Pumping dybaux history with auxlog.py

#. Most INSERTDATE groupings correspond to single TIMESTARTs, except h3#14, c2#2, c5#16 Notes:

  1. the hN and cN correlate these auxlog commits with the INSERTDATE groupings below, indicating consistent SEQNO

    1. multi-timestart
    2. INSERTDATE correspondence only when both CableMap and HardwareID updated together
    3. need to make association between load files and commits
    4. INSERTDATE groupings works well for tmp_copy_db (a copy of offline_db ) due to artificial ff alignment, not so clear for local recreation in tmp_offline_db
    5. Most INSERTDATE groupings correspond to single TIMESTARTs, except h3:14, c2:2, c5:16

Re-creation Discrepancy in FEC loading : resolved with takebogus option

are in lock step until hit the fec:

CableMap   : left TIMESTART 2011-05-23 08:22:19 [18][6 ][r12183:fecCableMap_fake_old.txt:viktor:6] having 3 SEQNO [245, 246, 247]
HardwareID : left TIMESTART 2011-05-23 08:22:19 [18][6 ][r12183:fecCableMap_fake_old.txt:viktor:6] having 3 SEQNO [203, 204, 205]
  1. confirmed that RPC bogosity code changes are changing selection of entries ?

Relevant Tickets

  1. dybsvn:ticket:892 fixing feeCableMap.txt using Database/TableTests/TestCableMap/share/fix_static_feeCableMap.py. The fix is in dybsvn:r12820
  2. dybsvn:ticket:928 tracing warning
  3. dybsvn:ticket:937 bogus reporting due to not yet committed to DB
  4. dybsvn:ticket:940 splitting a dybaux commit

Whole table groupby INSERTDATE for overview

Grouping by INSERTDATE is informative for tmp_copy_db but not for the locally recreated, due to the fastforward clumping under a single INSERTDATE done by SOP.

Groupings:

mysql> select min(SEQNO),max(SEQNO),min(TIMESTART),max(TIMESTART),count(distinct(TIMESTART)) as dTS,INSERTDATE from tmp_copy_db.HardwareIDVld group by INSERTDATE ;
+------------+------------+---------------------+---------------------+-----+---------------------+
| min(SEQNO) | max(SEQNO) | min(TIMESTART)      | max(TIMESTART)      | dTS | INSERTDATE          |
+------------+------------+---------------------+---------------------+-----+---------------------+
|          1 |         42 | 2009-03-16 11:27:43 | 2009-03-16 11:27:43 |   1 | 2011-06-24 05:02:54 |
|         43 |         58 | 2009-06-03 21:36:27 | 2009-06-03 21:36:27 |   1 | 2011-06-24 05:03:04 |
|         59 |        208 | 2010-12-07 19:14:20 | 2011-05-23 13:09:43 |  14 | 2011-06-24 05:03:39 |
|        209 |        222 | 2010-12-07 19:14:20 | 2010-12-07 19:14:20 |   1 | 2011-06-24 05:04:21 |
|        223 |        236 | 2011-02-08 15:49:51 | 2011-02-08 15:49:51 |   1 | 2011-06-24 05:04:44 |
|        237 |        248 | 2011-02-22 12:38:11 | 2011-02-22 12:38:11 |   1 | 2011-06-24 05:05:08 |
|        249 |        254 | 2011-02-22 17:08:51 | 2011-02-22 17:08:51 |   1 | 2011-06-24 05:05:34 |
|        255 |        260 | 2011-02-22 18:07:45 | 2011-02-22 18:07:45 |   1 | 2011-06-24 05:05:58 |
|        261 |        266 | 2011-02-23 10:49:36 | 2011-02-23 10:49:36 |   1 | 2011-06-24 05:06:24 |
|        267 |        272 | 2011-03-25 19:31:49 | 2011-03-25 19:31:49 |   1 | 2011-06-24 05:06:52 |
|        273 |        288 | 2011-04-01 17:29:23 | 2011-04-01 17:29:23 |   1 | 2011-06-24 05:07:18 |
|        289 |        304 | 2011-04-18 03:42:40 | 2011-04-18 03:42:40 |   1 | 2011-06-24 05:07:47 |
|        305 |        320 | 2011-04-19 23:56:10 | 2011-04-19 23:56:10 |   1 | 2011-06-24 05:08:15 |
|        321 |        336 | 2011-05-03 02:35:09 | 2011-05-03 02:35:09 |   1 | 2011-06-24 05:08:46 |
|        337 |        352 | 2011-05-05 17:42:22 | 2011-05-05 17:42:22 |   1 | 2011-06-24 05:09:17 |
|        353 |        355 | 2011-05-23 08:22:19 | 2011-05-23 08:22:19 |   1 | 2011-06-24 05:09:47 |
|        356 |        358 | 2011-05-23 13:09:43 | 2011-05-23 13:09:43 |   1 | 2011-06-24 05:10:23 |
|        359 |        372 | 2010-12-07 19:14:20 | 2010-12-07 19:14:20 |   1 | 2011-06-28 02:26:02 |
|        373 |        386 | 2011-06-01 00:00:00 | 2011-06-01 00:00:00 |   1 | 2011-08-09 06:35:49 |
+------------+------------+---------------------+---------------------+-----+---------------------+
19 rows in set (0.00 sec)

mysql> select min(SEQNO),max(SEQNO),min(TIMESTART),max(TIMESTART),count(distinct(TIMESTART)) as dTS,INSERTDATE from tmp_copy_db.CableMapVld group by INSERTDATE ;
+------------+------------+---------------------+---------------------+-----+---------------------+
| min(SEQNO) | max(SEQNO) | min(TIMESTART)      | max(TIMESTART)      | dTS | INSERTDATE          |
+------------+------------+---------------------+---------------------+-----+---------------------+
|          1 |         42 | 2009-03-16 11:27:43 | 2009-03-16 11:27:43 |   1 | 2011-06-24 05:02:54 |
|         43 |         59 | 2009-06-03 21:36:27 | 2009-12-27 23:52:51 |   2 | 2011-06-24 05:03:04 |
|         60 |         60 | 2009-12-27 23:52:51 | 2009-12-27 23:52:51 |   1 | 2011-06-24 05:03:14 |
|         61 |         61 | 2010-03-02 11:34:36 | 2010-03-02 11:34:36 |   1 | 2011-06-24 05:03:24 |
|         62 |        252 | 2010-06-11 23:28:25 | 2011-05-23 13:09:43 |  16 | 2011-06-24 05:03:39 |
|        253 |        254 | 2010-09-08 17:12:31 | 2010-09-08 17:12:31 |   1 | 2011-06-24 05:03:59 |
|        255 |        270 | 2010-12-07 19:14:20 | 2010-12-07 19:14:20 |   1 | 2011-06-24 05:04:21 |
|        271 |        284 | 2011-02-08 15:49:51 | 2011-02-08 15:49:51 |   1 | 2011-06-24 05:04:44 |
|        285 |        298 | 2011-02-22 12:38:11 | 2011-02-22 12:38:11 |   1 | 2011-06-24 05:05:08 |
|        299 |        312 | 2011-02-22 17:08:51 | 2011-02-22 17:08:51 |   1 | 2011-06-24 05:05:34 |
|        313 |        326 | 2011-02-22 18:07:45 | 2011-02-22 18:07:45 |   1 | 2011-06-24 05:05:58 |
|        327 |        340 | 2011-02-23 10:49:36 | 2011-02-23 10:49:36 |   1 | 2011-06-24 05:06:24 |
|        341 |        354 | 2011-03-25 19:31:49 | 2011-03-25 19:31:49 |   1 | 2011-06-24 05:06:52 |
|        355 |        370 | 2011-04-01 17:29:23 | 2011-04-01 17:29:23 |   1 | 2011-06-24 05:07:18 |
|        371 |        386 | 2011-04-18 03:42:40 | 2011-04-18 03:42:40 |   1 | 2011-06-24 05:07:47 |
|        387 |        402 | 2011-04-19 23:56:10 | 2011-04-19 23:56:10 |   1 | 2011-06-24 05:08:15 |
|        403 |        418 | 2011-05-03 02:35:09 | 2011-05-03 02:35:09 |   1 | 2011-06-24 05:08:46 |
|        419 |        434 | 2011-05-05 17:42:22 | 2011-05-05 17:42:22 |   1 | 2011-06-24 05:09:17 |
|        435 |        437 | 2011-05-23 08:22:19 | 2011-05-23 08:22:19 |   1 | 2011-06-24 05:09:47 |
|        438 |        440 | 2011-05-23 13:09:43 | 2011-05-23 13:09:43 |   1 | 2011-06-24 05:10:23 |
|        441 |        441 | 2010-03-02 11:34:36 | 2010-03-02 11:34:36 |   1 | 2011-06-28 02:24:13 |
|        442 |        442 | 2010-06-11 23:28:25 | 2010-06-11 23:28:25 |   1 | 2011-06-28 02:24:50 |
|        443 |        444 | 2010-09-08 17:12:31 | 2010-09-08 17:12:31 |   1 | 2011-06-28 02:25:25 |
|        445 |        460 | 2010-12-07 19:14:20 | 2010-12-07 19:14:20 |   1 | 2011-06-28 02:26:02 |
|        461 |        474 | 2011-06-01 00:00:00 | 2011-06-01 00:00:00 |   1 | 2011-08-09 06:35:22 |
|        475 |        475 | 2011-06-22 03:02:52 | 2011-06-22 03:02:52 |   1 | 2011-09-01 02:22:58 |
+------------+------------+---------------------+---------------------+-----+---------------------+
26 rows in set (0.00 sec)

DBI Validity Ordering Change

dybsvn:r14814 changes DBI reading and writing, now validity ordering uses VERSIONDATE desc, SEQNO desc rather than VERSIONDATE desc

This means:

  1. higher SEQNO breaks ties in VERSIONDATE collisions, making overlay versioning do what it meant to do
  2. the fix changes many SEQNO returned by DBI queries, in small pockets of the INSERTDATE/TIMESTART plane
  3. some actual payloads returned are changed, in very small regions of INSERTDATE/TIMESTART

The regions of INSERTDATE/TIMESTART impacted are reported below

Payload Digest Rather than SEQNO comparison

Created with:

dybdbi
cd python/DybDbi/vld
~/rst/bin/python vlut.py --table CableMap --ctx ALL
~/rst/bin/python vlut.py --table HardwareID --ctx ALL

## machinery  misbehaviour regards index overwriting, and self recursive index,  but re-running with '''--ctx ALL''' seems to iron out

CableMap

Listing ctx and VLUT regions of payload change between legacy and extra ordering fixed SEQNO desc

  1. http://belle7.nuu.edu.tw/dbiscan/CableMap/aggno-1_simflag2_site1_subsite7_task0/tmp_offline_db/vlut_cf_orderingSEQNOdesc/

    1. INSERTDATE 2011-06-24 05:09:47 3 ambi-cells (250L, 437L) from TIMES: 2011-05-23 13:09:43 2011-06-01 00:00:00 2011-06-22 03:02:52
  2. http://belle7.nuu.edu.tw/dbiscan/CableMap/aggno-1_simflag2_site2_subsite7_task0/tmp_offline_db/vlut_cf_orderingSEQNOdesc/

    1. INSERTDATE 2011-06-24 05:09:47 3 ambi-cells (248L, 435L) from TIMES: 2011-05-23 13:09:43 2011-06-01 00:00:00 2011-06-22 03:02:52
  3. http://belle7.nuu.edu.tw/dbiscan/CableMap/aggno-1_simflag2_site4_subsite7_task0/tmp_offline_db/vlut_cf_orderingSEQNOdesc/

    1. INSERTDATE 2011-06-24 05:09:47 3 ambi-cells (249L, 436L) TIMES 2011-05-23 13:09:43 2011-06-01 00:00:00 2011-06-22 03:02:52
  4. http://belle7.nuu.edu.tw/dbiscan/CableMap/aggno-1_simflag2_site2_subsite5_task0/tmp_offline_db/vlut_cf_orderingSEQNOdesc/

    1. INSERTDATE 2011-06-24 05:04:21 1 ambi-cell (92L,268L) 2011-02-08 15:49:51
  5. http://belle7.nuu.edu.tw/dbiscan/CableMap/aggno-1_simflag2_site1_subsite5_task0/tmp_offline_db/vlut_cf_orderingSEQNOdesc/

    1. INSERTDATE 2011-06-24 05:04:21 1 ambi-cell 87L, 263L) 2011-02-08 15:49:51

HardwareID

  1. http://belle7.nuu.edu.tw/dbiscan/HardwareID/aggno-1_simflag2_site1_subsite6_task0/tmp_offline_db/vlut_cf_orderingSEQNOdesc/

    1. I 2011-06-24 05:04:21 T 2011-02-08 15:49:51 (80L, 217L)
  2. http://belle7.nuu.edu.tw/dbiscan/HardwareID/aggno-1_simflag2_site1_subsite7_task0/tmp_offline_db/vlut_cf_orderingSEQNOdesc/

    1. I 2011-06-24 05:09:47 T 2011-05-23 13:09:43 2011-06-01 00:00:00 (208L, 355L)
  3. http://belle7.nuu.edu.tw/dbiscan/HardwareID/aggno-1_simflag2_site1_subsite5_task0/tmp_offline_db/vlut_cf_orderingSEQNOdesc/

    1. I 2011-06-24 05:04:21 T 2011-02-08 15:49:51 (79L, 216L)
  4. http://belle7.nuu.edu.tw/dbiscan/HardwareID/aggno-1_simflag2_site2_subsite5_task0/tmp_offline_db/vlut_cf_orderingSEQNOdesc/

  1. I 2011-06-24 05:04:21 T 2011-02-08 15:49:51 (84L, 221L)
  1. http://belle7.nuu.edu.tw/dbiscan/HardwareID/aggno-1_simflag2_site2_subsite6_task0/tmp_offline_db/vlut_cf_orderingSEQNOdesc/
  1. I 2011-06-24 05:04:21 T 2011-02-08 15:49:51 (74L, 210L)
  1. http://belle7.nuu.edu.tw/dbiscan/HardwareID/aggno-1_simflag2_site2_subsite7_task0/tmp_offline_db/vlut_cf_orderingSEQNOdesc/
  1. I 2011-06-24 05:09:47 T 2011-05-23 13:09:43 2011-06-01 00:00:00 (206L, 353L)
  1. http://belle7.nuu.edu.tw/dbiscan/HardwareID/aggno-1_simflag2_site4_subsite5_task0/tmp_offline_db/vlut_cf_orderingSEQNOdesc/
  1. I 2011-06-24 05:04:21 T 2011-02-08 15:49:51 (76L, 212L)
  1. http://belle7.nuu.edu.tw/dbiscan/HardwareID/aggno-1_simflag2_site4_subsite6_task0/tmp_offline_db/vlut_cf_orderingSEQNOdesc/
  1. I 2011-06-24 05:04:21 T 2011-02-08 15:49:51 (75L, 211L)
  1. http://belle7.nuu.edu.tw/dbiscan/HardwareID/aggno-1_simflag2_site4_subsite7_task0/tmp_offline_db/vlut_cf_orderingSEQNOdesc/
  1. I 2011-06-24 05:09:47 T 2011-05-23 13:09:43 2011-06-01 00:00:00 (207L, 354L)