DB Table Updating Workflow

Objectives

The mission-critical nature of calibration parameters requires DB updating procedures to be:

  1. highly controlled
  2. carefully recorded
  3. easily reproducible

DB updating procedures should also be straightforward and quick.

Suggestions for amendments to the workflow steps presented should be made in dybsvn:ticket:607.

Workflow Outline

  1. Calibration expert obtains values intended to be inserted into offline_db
  2. Calibration expert makes a temporary copy, tmp_offline_db, of the central DB
  3. Calibration expert inserts the values into their tmp_offline_db copy
  4. Calibration expert validates the new values inserted into their tmp_offline_db
  5. Calibration expert contacts the DB Managers (Liang/Simon B) and requests that the update be propagated from tmp_offline_db into offline_db, see Demonstrate tests and Request Propagation
  6. Following successful validation the DB Managers propagate the update into the master offline_db
  7. The master offline_db is propagated to slaves
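
In command form, the expert's portion of the workflow amounts to the following sketch (each step is detailed in later sections; the table names and commit message are placeholders):

db.py -t CableMap,HardwareID offline_db dump offline_db.sql        ## copy the tables to be updated
db.py tmp_offline_db load offline_db.sql
## ... insert new entries into tmp_offline_db using DBI, then validate ...
db.py tmp_offline_db rdumpcat ~/dybaux/catalog/tmp_offline_db      ## export as an ascii catalog
svn ci ~/dybaux/catalog/tmp_offline_db -m "... dybsvn:source:... annotation link ..."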

Rationale for this workflow

Why such caution? Why not just write directly into offline_db?

  1. Avoid inconsistent/conflicting updates
  2. Avoid inconsistencies as a result of mysql slave propagation (it may be necessary to briefly halt propagation while updates are made)

Planning Update Size and Frequency

Points to bear in mind when planning update size and frequency:

  1. not so big as to cause handling problems; aim to not exceed ~few MB of csv change (guesswork yet to be informed by experience)
  2. not so small that repeated updates are needed, to avoid manual labor and delays
  3. each dybaux commit is loaded into offline_db with a single INSERTDATE, which means that DBI rollback cannot distinguish between entries within a single commit

Workflow steps in detail

Section Names or Database names

This documentation refers to databases by their configuration file section names such as tmp_offline_db rather than by the actual database name (eg tmp_username_offline_db), as this parallels the approach taken by the tools: db.py and DBI.

See Configuring DB Access for details of configuration file ~/.my.cnf creation and troubleshooting.
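
For orientation, a tmp_offline_db section of ~/.my.cnf has this general shape (all values are illustrative; consult Configuring DB Access for the authoritative form):

[tmp_offline_db]
host      = dybdb1.ihep.ac.cn
database  = tmp_username_offline_db
user      = username
password  = secret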

Copy offline_db to tmp_offline_db

The db.py script (a sibling of nuwa.py) is used to perform the copy, by loading and dumping tables.

digraph copy {
pad=0.2 ;

offdb [ style="filled", label="offline_db"] ;
dump  [ shape="note", label="/tmp/offline_db.sql" ];
tmpdb [ label="tmp_offline_db" ];

offdb -> dump [ label="dump" ];
dump -> tmpdb [ label="load" ] ;

{ rank=same; offdb; tmpdb }
}

Make the copy in 2 steps

  1. Dump the selection of offline_db tables to be updated into a mysqldump file, using the -t/--tselect option with a comma-delimited list of payload table names:
db.py -t CalibPmtSpec        offline_db dump offline_db.sql
db.py -t CableMap,HardwareID offline_db dump offline_db.sql
  2. Load the mysqldump file into the temporary database copy:
db.py tmp_offline_db load offline_db.sql

Note that no special privileges are needed in offline_db, but DROP and CREATE database privileges are needed in tmp_offline_db. The tmp_offline_db also does not need to be local. Example of tmp_offline_db content after the load, containing just the tables to be updated (and a partial LOCALSEQNO table):

mysql> show tables ;
+--------------------------+
| Tables_in_tmp_offline_db |
+--------------------------+
| CableMap                 |
| CableMapVld              |
| HardwareID               |
| HardwareIDVld            |
| LOCALSEQNO               |
+--------------------------+
5 rows in set (0.00 sec)

mysql> select * from LOCALSEQNO ;
+------------+---------------+
| TABLENAME  | LASTUSEDSEQNO |
+------------+---------------+
| *          |             0 |
| HardwareID |           386 |
| CableMap   |           475 |
+------------+---------------+
3 rows in set (0.00 sec)

For read-only access to other tables such as DaqRunInfo, use DBI cascades configured with a colon-delimited DBCONF, as sketched below.
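
Placing tmp_offline_db in front of offline_db in the cascade makes reads fall through to the central DB for tables absent from the copy; an illustrative invocation, following the forms used later in this page:

DBCONF=tmp_offline_db:offline_db nuwa.py ...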

Perform updates and validation on tmp_offline_db

Warning

do not attempt to use raw SQL or hand-edited .csv files

DB writing must use DBI, e.g.:

  1. the service approach: dybgaudi:Database/DBWriter
  2. a python script using DybDbi, see DB Table Writing and the sketch following this list
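
A minimal sketch of the DybDbi approach, assuming a generated table class GCalibPmtSpec and the CSV helper described in DB Table Writing (the writer context arguments shown are illustrative; consult that section for the authoritative form):

import os
os.environ['DBCONF'] = 'tmp_offline_db'     ## write to the temporary copy, never directly to offline_db

from DybDbi import GCalibPmtSpec, CSV, TimeStamp, Site, SimFlag, ContextRange

wrt = GCalibPmtSpec.Wrt()                   ## obtain a DBI writer for the table
cr = ContextRange( Site.kAll, SimFlag.kData|SimFlag.kMC, TimeStamp(2011,1,1,0,0,0), TimeStamp.GetEOT() )
wrt.ctx( contextrange=cr, versiondate=TimeStamp(0,0), subsite=0, task=0 )   ## TimeStamp(0,0) signals overlay dates

src = CSV("table1.csv")
src.read()
for row in src:
    wrt.Write( GCalibPmtSpec.Create(**row) )   ## one payload instance per csv row
assert wrt.Close()                             ## commits the write; False indicates failure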

digraph update {
pad=0.2 ;

offdb [ style="filled", label="offline_db"] ;
tmpdb  [ label="tmp_offline_db" ];
csva [ label="table1.csv" , shape="note" ];
csvb [ label="table2.csv" , shape="note" ];

csva -> tmpdb [ label="DybDbi.CSV" ] ;
csvb -> tmpdb [ label="DBWriter" ] ;

{ rank=same; offdb; tmpdb }
}

Configure writing scripts with:

import os
os.environ['DBCONF'] = 'tmp_offline_db'

Or invoke services:

DBCONF=tmp_offline_db nuwa.py ...

Early Validations

Warning

dbupdatecheck currently contains only dbivalidate; other packages with tests to be run before and after updates need to be added

The standard set of validation tests can be run by Table managers prior to checkin to SVN with:

DBCONF=tmp_offline_db:offline_db ./dybinst trunk tests dbupdatecheck

dbupdatecheck is an alias for a list of packages defined in installation:dybinst/scripts/dybinst-common.sh

After Table managers commit the candidate update to dybaux, anyone (with permissions in an available DB) can validate it using:

cd ; svn co http://dayabay.ihep.ac.cn/svn/dybaux/catalog/tmp_offline_db
svn up ~/tmp_offline_db
DBCONF=tmp_offline_db_ascii:offline_db ./dybinst trunk tests dbupdatecheck

Configuration details are in Configuring access to ascii catalog.

This allows any nosetest that configures its DB connection via os.environ.setdefault to be run against the candidate update.
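
The setdefault idiom lets an externally supplied DBCONF, such as the ascii-catalog cascade above, take precedence over the in-script default; a minimal sketch (the test body is illustrative):

import os
os.environ.setdefault('DBCONF', 'tmp_offline_db')   ## respected only when DBCONF is not already set

def test_dbconf_configured():
    assert os.environ.get('DBCONF')                 ## real tests would query DBI tables here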

DB Validation includes ideas on update-targeted tests.

Communicate updates via SVN repository

Using an SVN repository to funnel updates has advantages:

  1. familiar system for storing the history of updates
  2. Easy communication of changes
  3. Trac web interface with timeline, presenting the history

digraph share {
pad=0.2 ;

offdb [ style="filled", label="offline_db"] ;
tmpdb [ label="tmp_offline_db" ];
auxrepo [ label="dybaux SVN", shape="box3d" ];
auxwc [ label="DBI catalog" , shape="folder" ];

tmpdb -> auxwc [ label="rdumpcat" ];
auxrepo -> auxwc [ label="checkout" ];
auxwc -> auxrepo [ label="checkin" ] ;

{ rank=same; offdb; tmpdb } ;
{ rank=source; auxrepo };
{ rank=max; auxwc };
}

Checkout the tmp_offline_db DBI ascii catalog from SVN repository:

mkdir -p ~/dybaux/catalog ; cd ~/dybaux/catalog
svn co http://dayabay.ihep.ac.cn/svn/dybaux/catalog/tmp_offline_db

Use rdumpcat to export the updated database as a DBI catalog on top of the SVN checkout, allowing the nature of the update to be checked with svn diff etc.:

db.py tmp_offline_db rdumpcat ~/dybaux/catalog/tmp_offline_db
svn status ~/dybaux/catalog/tmp_offline_db
svn diff ~/dybaux/catalog/tmp_offline_db

Try to commit the update to SVN:

svn ci ~/dybaux/catalog/tmp_offline_db \
 -m "New tables for CableSvc see dybsvn:source:dybgaudi/trunk/DybSvc/DybMetaDataSvc/src/DybCableSvc.txt@12352 "

For the rationale behind the link see Annotating Updates; note that:

  1. the link path must start dybsvn:source:dybgaudi/trunk
  2. when multiple links are included only the first is checked
  3. use the Trac search box to check links: omit the dybsvn: prefix when searching from the dybsvn instance, include it when searching from dybaux

Decoupled Updating Workflow

Features:

  1. db.py adopts the -d/--decoupled option as default from dybsvn:r14289
  2. tmp_offline_db contains only the tables being updated + a partial LOCALSEQNO metadata table.
  3. partial LOCALSEQNO table is merged with the shared real one when doing rdumpcat into the dybaux catalog
  4. (in principle) removes the updating bottleneck by allowing parallel updating, assuming no cross-table dependencies

Tables are selected at the initial dump; the subsequent load and rdumpcat operate on those selected tables:

db.py -t CableMap,HardwareID offline_db     dump     ~/offline_db.sql
db.py                        tmp_offline_db load     ~/offline_db.sql                      ## clobbers any pre-existing content
db.py                        tmp_offline_db rdumpcat ~/dybaux/catalog/tmp_offline_db

Note that the rdumpcat must be into a pre-existing catalog such as ~/dybaux/catalog/tmp_offline_db, as decoupled tables are not viable on their own. Features of the decoupled rdumpcat:

  1. LOCALSEQNO entries are merged into the pre-existing LOCALSEQNO.csv
  2. payload and validity table .csv changes must add to existing ones in the catalog
  3. no catalog .cat file is written, as permissible updates cannot change the catalog file

Before doing the rdumpcat it is good practice to check LOCALSEQNO in tmp_offline_db and in the catalog, and be aware of the changes:

cat ~/dybaux/catalog/tmp_offline_db/LOCALSEQNO/LOCALSEQNO.csv
TABLENAME char(64),LASTUSEDSEQNO int(11),PRIMARY KEY (TABLENAME)
"*",0
"CableMap",474
"CalibFeeSpec",113
"CalibPmtHighGain",6
"CalibPmtPedBias",1
"CalibPmtSpec",96
"CoordinateAd",1
"CoordinateReactor",1
"FeeCableMap",3
"HardwareID",386
"Reactor",372

Serialized Updating

Note

you need to login to dybaux with your dybsvn identity in order to see commits

In order to coordinate the serialized updating, check the dybaux timeline http://dayabay.ihep.ac.cn/tracs/dybaux/timeline before making commits there. If you see a recent catalog commit that is not followed by a fastforward OVERRIDE commit from one of the DB managers, then an update is queued up ahead of you in the final validation stage.

That means:

  1. you will need to refresh (dump+load) your tmp_offline_db and rerun your updater script after the update ahead of you is completed (you will see the fastforward OVERRIDE commit on the timeline)
  2. hold off making your dybaux catalog commit until the first step is done
  3. continue testing in your tmp_offline_db, such that once you have a valid starting point your update is able to proceed smoothly and quickly and does not cause delays in the final validations stage

How the SVN ascii catalog is primed

Direct approach, using rdumpcat from offline_db into ascii catalog:

svn co http://dayabay.ihep.ac.cn/svn/dybaux/catalog       ## just empty tmp_offline_db created by zhanl
db.py offline_db rdumpcat ~/catalog/tmp_offline_db        ## dump non-scraped default subset of tables

Check the machinery and transfers (and prepare a local DB to work with as a side effect) by going via a local tmp_offline_db:

db.py offline_db dump ~/tmp_offline_db.sql
db.py tmp_offline_db load ~/tmp_offline_db.sql
db.py tmp_offline_db rdumpcat ~/tmp_offline_db_via_local

Compare the direct and via_local catalogs:

diff -r --brief ~/catalog/tmp_offline_db ~/tmp_offline_db_via_local | grep -v .svn
Only in /home/blyth/catalog/tmp_offline_db: tmp_offline_db.cat
Only in /home/blyth/tmp_offline_db_via_local: tmp_offline_db_via_local.cat

Add to repository, and commit with override:

svn add tmp_offline_db/*
svn status
svn ci -m "initial commit of ascii catalog prepared with {{{db.py offline_db rdumpcat ~/catalog/tmp_offline_db}}} see oum:sop/dbops OVERRIDE "

Annotating Updates

When making updates it is required that brief documentation is provided in a text file housed at an appropriate location within dybsvn.

Expected features for the annotation of an update:

  1. brief summary of nature/motivation, a few lines only (refer to more detailed descriptions)
  2. include date of update
  3. refer to related docdb documents
  4. refer to related database tables
  5. refer to dybsvn packages updated, name revision numbers where appropriate
  6. refer to related tickets
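
An annotation following these guidelines might look like the following template (every specific is a placeholder, not a real reference):

Summary : new CableMap/HardwareID entries following connector swap
Date    : YYYY-MM-DD
DocDB   : docdb:NNNN
Tables  : CableMap, HardwareID
Code    : dybsvn package producing the values, at rNNNNN
Tickets : dybsvn:ticket:NNN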

In order to associate the annotation with the dybaux commit of the candidate DB update, it is required that the commit message provides a revisioned Trac Link that points at the updated document containing the annotation.

In the above example, the revisioned Trac link points to a real example of an annotation document.

Pre-commit enforced validation : DBI Gatekeeper

The dybaux repository is configured to perform validations prior to allowing a commit to proceed. When a commit is denied, the validation error is returned. Validations are implemented in the python module DybPython.dbsvn; currently:

  1. Only expected tables are touched (LOCALSEQNO + DBI pair)
  2. Only row additions are allowed, no deletions or alterations
  3. Commit message includes valid revisioned dybsvn Trac Link, precisely identifying code/documentation for the update

Intended additions:

  1. verify use of versiondate=TimeStamp(0,0) signaling overlay dates

Pre-commit validations must be quick and self-contained, as tests cannot be run on the SVN server.

Test Locally

Validations can be run locally using the DybPython.dbsvn script, as sketched below.
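
A possible invocation (the option set shown is an assumption; consult the script's --help for the authoritative interface):

dbsvn.py --help                              ## list the available options
dbsvn.py ~/dybaux/catalog/tmp_offline_db     ## validate the working copy diff before attempting the commit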

Demonstrate tests and Request Propagation

Send email to the Database Managers and the offline mailing list mailto:theta13-offline@lists.lbl.gov requesting that your dybaux revision be propagated. The email needs to contain:

  1. dybaux revision to be propagated
  2. proof of testing in the form of nosetest output from running tests against your tmp_ DB and context information

Proof and context can conveniently be provided by copying and pasting the output from:

pwd ; date ; svnversion . ; nosetests -v

Detailed guidelines on update testing techniques and responsibilities are in DB Testing.

Database Managers Propagate updates from dybaux into offline_db

After the validations described in Early Validations have been re-run and found successful, DB managers can perform updates, first on their tmp_offline_db and then on the central offline_db.

Prepare a fresh tmp_offline_db:

db.py offline_db dump ~/offline_db.sql
db.py tmp_offline_db load ~/offline_db.sql

Get up to date with dybaux:

mkdir ~/dybaux ;  cd ~/dybaux ; svn co http://dayabay.ihep.ac.cn/svn/dybaux/catalog
svn up ~/dybaux/catalog

Use rcmpcat to see changed tables and added SEQNO in the dybaux ascii catalog relative to the DB tmp_offline_db:

db.py tmp_offline_db rcmpcat ~/dybaux/catalog/tmp_offline_db

Proceed to rloadcat into tmp_offline_db:

db.py tmp_offline_db rloadcat  ~/dybaux/catalog/tmp_offline_db
db.py tmp_offline_db rcmpcat ~/dybaux/catalog/tmp_offline_db     ## should report no updates

Repeating rloadcat should detect no updates and do nothing. Note that the catalog working copy is changed by the rloadcat due to INSERTDATE fastforwarding (dybsvn:ticket:844), check with:

svn status ~/dybaux/catalog/tmp_offline_db    ## will show Vld table changes

Following the definitive rloadcat into offline_db, the changed *Vld.csv files should be committed into dybaux with a commit message including OVERRIDE (only admin users configured in the pre-commit hook can do this).

Propagation of multiple commits with dbaux.py

When multiple commits need to be propagated, the script dbaux.py should be used; it takes commit number ranges as arguments and internally invokes the db.py script described above.

Usage examples:

dbaux.py --help        ## for details on all options
dbaux.py -c -r --dbconf tmp_offline_db rloadcat 5036:5037 --logpath dbaux-rloadcat-5036-5037.log

In order to make complete fastforward commits after using dbaux.py with -r/--reset, it is first necessary to do an rdumpcat to gather all the fastforward changes, for example with:

db.py offline_db rdumpcat ~/dybaux/catalog/tmp_offline_db
svn diff ~/dybaux/catalog/tmp_offline_db      ## INSERTDATE changes for all SEQNO added should be observed
svn ci -m "fastforward updates following offline_db rloadcat of r5036:r5037 OVERRIDE  " ~/dybaux/catalog/tmp_offline_db

Handling non-propagated dybaux commits

Commits to the dybaux catalog are sometimes not propagated to offline_db, eg due to finding a problem with validity ranges. In this case it is necessary to bring the dybaux catalog back into correspondence with offline_db by returning to the state before the bad commit, using an OVERRIDE commit that backs out the change. As an OVERRIDE is needed, this must be done by a DB manager. In simple cases where the bad commit is the last one made:

cd ~/dybaux/catalog/tmp_offline_db
svn status
svn up -r <goodrev>
svnversion .
svn status    # check are at the intended clean revision
svn ci -m "return to r<goodrev> removing r<badrev> and r<otherbadrev> OVERRIDE"
svn up
svn status

In more involved cases a piecewise approach to returning to the desired state can be used, by doing updates restricted to particular tables, as sketched below.
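
A sketch of such a table-restricted repair, reusing the -t selection and the re-priming technique described elsewhere on this page (the table name is illustrative):

db.py -t CableMap offline_db dump ~/offline_db_cablemap.sql       ## dump only the table to be repaired, from the authoritative DB
db.py tmp_offline_db load ~/offline_db_cablemap.sql
db.py tmp_offline_db rdumpcat ~/dybaux/catalog/tmp_offline_db     ## overwrite the catalog csv for that table
svn diff ~/dybaux/catalog/tmp_offline_db                          ## verify only the intended changes
svn ci -m "restore CableMap to correspondence with offline_db OVERRIDE" ~/dybaux/catalog/tmp_offline_db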

Note that it is also possible to re-prime dybaux from offline_db by doing an rdumpcat into the working copy and committing the changes. Indeed this technique is used as part of the Post-propagation cross check where normally no changes are expected.

Post-propagation cross check

Get up to date with dybaux and rdumpcat from offline_db on top of it:

svn up ~/dybaux/catalog/tmp_offline_db
db.py offline_db rdumpcat ~/dybaux/catalog/tmp_offline_db
svn status  ~/dybaux/catalog/tmp_offline_db

The svn status is expected to return no output, indicating no differences. If differences are observed only in the INSERTDATE columns of *Vld.csv tables, then the Database managers omitted to commit the fastforwarded catalog.

Quick Validations

After getting into the environment and directory of dybgaudi:Database/DbiValidate, run a collection of tests that traverse all DBI tables, performing many queries:

DBCONF=offline_db nosetests -v

[blyth@cms01 DbiValidate]$ DBCONF=offline_db nosetests -v
test_dbi_tables.test_counts ... ok
test_dbi_tables.test_vld_description('CableMapVld', 'assert_fields') ... ok
...
DbiTimer:CableMap: Query done. 2592rows,   62.2Kb Cpu   0.1 , elapse   1.7
Caching new results: ResultKey: Table:CableMap row:GCableMap.  1 vrec (seqno;versiondate): 213;2011-05-23 08:23:19
DbiTimer:CableMap: Query done. 1728rows,   41.5Kb Cpu   0.1 , elapse   1.7
ok
----------------------------------------------------------------------
Ran 159 tests in 685.069s        ## MUCH FASTER WHEN LOCAL TO DB
OK

Exceptional Operating Procedures for Major Changes

Major changes need to be discussed with Database Managers. As such changes will not pass the SOP validations, a modified exceptional operating procedure is used:

  1. Table experts develop zero-argument scripts (which can internally invoke more flexible scripts and capture the arguments used) using their tmp_offline_db
  2. Table experts communicate the update to be made via the dybsvn revision and path of their scripts
  3. DB experts use the script to create tables in their tmp_offline_db and perform an override commit of the new tables into dybaux
  4. Table experts check that the tables in dybaux match those from their tmp_offline_db (eg via rdumpcat onto the working copy)
  5. DB experts proceed to load into offline_db once table experts have confirmed the change

The steps are mostly the same, but who does what is modified.

Hands On Exercise 3 : Copy Offline DB

Warning

This exercise requires write permissions into a tmp_username_offline_db database

DIY steps:

  1. Configure a tmp_offline_db section of your config file
  2. Use db.py to dump and load into your tmp_username_offline_db database, which corresponds to section tmp_offline_db, as sketched below
  3. Use techniques from exercises 1 and 2 to compare row counts in offline_db and tmp_offline_db
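
A minimal sketch of step 2, reusing the dump and load commands shown earlier:

db.py offline_db dump ~/offline_db.sql          ## dump from the central DB
db.py tmp_offline_db load ~/offline_db.sql      ## load into tmp_username_offline_db via section tmp_offline_db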

Nosetests of workflow steps

Note

these are tests of the workflow machinery; for other generic tests and more table-specific validations see DB Validation

Nosetests covering most of the steps of the workflow are available in dybgaudi:DybPython/tests, in particular tests/test_dbops.py and tests/test_dbsvn.py.

To run these tests, get into the directory and environment of dybgaudi:DybPython, then:

  1. Examine what the tests are going to do
  2. Review the configuration section names used in the tests (typically tmp_offline_db and offline_db). Find these by looking for any DBCONF=sectname and the first arguments to the db.py script
  3. Review the corresponding sections of your configuration ~/.my.cnf, ensuring that you are talking to the intended DB with identities holding appropriate permissions
  4. You may need to add/rename some sections of your configuration file if they are not present
  5. Invoke the tests from the package directory, not the tests directory, with the commands below
nosetests -v -s tests/test_dbops.py
nosetests -v -s tests/test_dbsvn.py
nosetests --help    ## for explanations of the options

Development History of Workflow

The general approach was first expounded in doc:5646, but has subsequently been improved substantially following feedback from Jiajie, Craig and Brett. The changes avoid some painful aspects of the initial suggestion.

  1. Remove the local restriction on the mysql server, enabling your NuWa installation and mysqld server to be on separate machines
  2. Eliminate the need for DB Managers to keep the dybaux DBI catalog up to date, as Table managers now start by copying the actual offline_db