DB

DybPython.db

$Id: db.py 22557 2014-02-20 07:08:30Z blyth $

DB operations performed via MySQLdb:

./db.py [options] <dbconf> <cmd>  

Each invocation of this script talks to a single database only. A successful connection to "sectname" requires the named section of the config file (default ~/.my.cnf) to provide the keys below, eg:

[offline_db]
host = dybdb1.ihep.ac.cn
user = dayabay
password = youknowit
database = offline_db

[tmp_username_offline_db]
...
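For orientation, the sketch below (illustrative only, not the actual DybPython.dbconf code) shows how such a section maps onto a MySQLdb connection; the section and key names follow the example above.

# sketch only: read a ~/.my.cnf section and open a MySQLdb connection
import os
from ConfigParser import ConfigParser   # Python 2, as used with NuWa
import MySQLdb

def connect(sect="offline_db", path="~/.my.cnf"):
    """Open a MySQLdb connection using the keys of the named section."""
    cfg = ConfigParser()
    cfg.read(os.path.expanduser(path))
    return MySQLdb.connect(
        host=cfg.get(sect, "host"),
        user=cfg.get(sect, "user"),
        passwd=cfg.get(sect, "password"),
        db=cfg.get(sect, "database"),
    )

conn = connect("offline_db")
print conn.get_server_info()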

For a wider view of how db.py is used see DB Table Updating Workflow

TODO

  1. dry run option to report commands that would have been used without doing them
  2. better logging and streamlined output

Required Arguments

dbconf
the name of the section in ~/.my.cnf that specifies the host/database/user/password to use in making connection to the mysql server
cmd
perform command on the database specified in the prior argument. NB some commands can only be performed locally, that is on the same node that the MySQL server is running on.

command summary

Command     Action                                   Note
----------  ---------------------------------------  --------------------------------------------------------------------
dump        performs mysqldump, works remotely       special LOCALSEQNO handling
load        loads mysqldump, works remotely          very slow when done remotely, insert statement for every row
rdumpcat    dumps ascii catalog, works remotely      duplicates dumpcat output using low level _mysql; uses LOCALSEQNO merging
rloadcat    loads ascii catalog, works remotely      mysqlimport implementation
rcmpcat     compare ascii catalog with DB            readonly command
ls          lists tables in various sets
cli         emit mysql client connection cmdline     does not actually connect

former commands

Command     Action                                   Note
----------  ---------------------------------------  --------------------------------------------------------------------
dumpcat     dumps ascii catalog, LOCAL ONLY          SELECT ... INTO OUTFILE
loadcat     loads ascii catalog, LOCAL ONLY          LOAD DATA LOCAL INFILE ... INTO TABLE

The former loadcat and dumpcat commands can be mimicked with the --local option of rdumpcat and rloadcat. These are for expert usage only, against self-administered database servers.

using db.py in standalone manner (ie without NuWa)

This script is usable with any recent python which has the mysql-python (1.2.2 or 1.2.3) package installed.

Check your python and mysql-python with:

which python
python -V
python -c "import MySQLdb as _ ; print _.__version__ "

Check out DybPython/python/DybPython in order to access db.py, dbcmd.py and dbconf.py, for example with:

cd  
svn co http://dayabay.ihep.ac.cn/svn/dybsvn/dybgaudi/trunk/DybPython/python/DybPython
chmod u+x DybPython/db.py

Use as normal:

~/DybPython/db.py --help
~/DybPython/db.py offline_db count 

checkout offline_db catalog from dybaux

Example: check out OR update the catalog:

mkdir ~/dybaux 
cd ~/dybaux 
svn co http://dayabay.ihep.ac.cn/svn/dybaux/catalog    

OR

cd ~/dybaux/catalog
svn up 

rdumpcat tmp_offline_db into dybaux working copy:

db.py tmp_offline_db rdumpcat ~/dybaux/catalog/tmp_offline_db

Test usage of serialized ascii DB

Get into the environment and directory of pkg dybgaudi:Database/DybDbi. Modify the config to use the ascii DB; for an example see dybgaudi:Database/DybDbi/tests/test_calibpmtspec.py
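For example, a test can be pointed at a serialized ascii catalog by selecting a cascade before any DBI access is made; the section name tmp_offline_db_ascii below is an assumption, configured as described in the rdumpcat documentation further down this page.

# sketch only: choose an ascii-catalog cascade before any DBI access,
# assuming a [tmp_offline_db_ascii] section exists in ~/.my.cnf
import os
os.environ['DBCONF'] = 'tmp_offline_db_ascii:offline_db'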

rloadcat testing, DB time machine

Warning

forced_rloadcat is for testing only; it skips checks and ploughs ahead with the load. Also the --DROP option drops and recreates tables.

Fabricate a former state of the DB using forced_rloadcat and an earlier revision from dybaux, with:

## get to a clean revision of catalog (blowing away prior avoids conflicts when doing that)
rm -rf ~/dybaux/catalog/tmp_offline_db  ; svn up -r 4963 ~/dybaux/catalog/tmp_offline_db

## forcefully propagate that state into the tmp_offline_db 
./db.py tmp_offline_db forced_rloadcat ~/dybaux/catalog/tmp_offline_db --DROP

## compare DB and catalog .. no updates should be found
./db.py tmp_offline_db rcmpcat ~/dybaux/catalog/tmp_offline_db 

## wind up the revision 
rm -rf ~/dybaux/catalog/tmp_offline_db  ; svn up -r 4964 ~/dybaux/catalog/tmp_offline_db

## compare DB and catalog again  ... updates expected, check timeline diffs
./db.py tmp_offline_db rcmpcat ~/dybaux/catalog/tmp_offline_db 

## test rloadcat operation and check diffs afterwards
./db.py tmp_offline_db rloadcat ~/dybaux/catalog/tmp_offline_db 
./db.py tmp_offline_db rcmpcat ~/dybaux/catalog/tmp_offline_db 

DybPython.db.DB

class DybPython.db.DB(sect=None, opts={}, **kwa)[source]

Bases: object

Initialize config dict corresponding to section of config file

Parameters: sect – section in config file
allseqno

Provides a table-name-keyed dict containing lists of all SEQNO in each Vld table. The tables included correspond to the read DBI tables (namely those in LOCALSEQNO)

check_(*args, **kwa)[source]

Checks the connection to the DB by issuing a SELECT of info functions such as DATABASE() and CURRENT_USER()

check_allseqno()[source]
check_seqno()[source]

Compares the LASTUSEDSEQNO entries read into self._seqno with the max(SEQNO) results of selects on the DB payload and validity tables.
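As a rough illustration (a sketch only, not the actual implementation), the comparison amounts to something like:

# illustrative sketch of the check_seqno style of consistency check
def check_seqno_sketch(conn, seqno):
    """
    conn  : MySQLdb connection to the database being checked
    seqno : dict {tablename: LASTUSEDSEQNO} as read from LOCALSEQNO
    """
    cur = conn.cursor()
    for tn, lastused in seqno.items():
        for t in (tn, tn + 'Vld'):                  # payload and validity tables
            cur.execute("SELECT MAX(SEQNO) FROM %s" % t)
            maxseqno = cur.fetchone()[0]
            assert maxseqno == lastused, "SEQNO mismatch for %s" % t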

cli_(*args, **kwa)[source]

Emit to stdout the shell commandline for connecting to a mysql DB via the client, without actually doing so. The available section names depend on the content of ~/.my.cnf

Usage:

eval $(db.py tmp_offline_db cli)

Bash function examples to define in ~/.bash_profile using this command:

    idb(){ local cnf=$1 ; shift ; eval $(db.py $cnf cli) $* ; }   
    offline_db(){             idb $FUNCNAME $* ; }        
    tmp_offline_db(){         idb $FUNCNAME $* ; }
    tmp_etw_offline_db(){     idb $FUNCNAME $* ; }
    tmp_jpochoa_offline_db(){ idb $FUNCNAME $* ; }
    ihep_dcs(){               idb $FUNCNAME $* ; }

Invoke the shortcut with the -A fast-start extra argument for the client:

    ihep_dcs -A   

Note that a lower level, almost equivalent command to this sub-command for standalone usage without db.py is provided by my.py, which can probably run with the older system python alone. Install it into your PATH with:

svn export http://dayabay.ihep.ac.cn/svn/dybsvn/dybgaudi/trunk/DybPython/scripts/my.py
count_(*args, **kwa)[source]

List table counts of all tables in database, usage example:

db.py offline_db count

offline_db is the ~/.my.cnf section name specifying the host/database/user/password
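In spirit the command does something like the below sketch (illustrative only, not the actual implementation):

# sketch: per-table row counts over all tables reported by SHOW TABLES
def count_sketch(conn):
    cur = conn.cursor()
    cur.execute("SHOW TABLES")
    for (tn,) in cur.fetchall():
        cur.execute("SELECT COUNT(*) FROM %s" % tn)
        print tn, cur.fetchone()[0]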

desc(tab)[source]

Header line with the table definition in .csv files; shifts the pk definition to the end

describe(tab)[source]
classmethod docs()[source]

Collect the docstrings on command methods, identified by the naming convention of ending with _ (and not starting with _)
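The convention can be illustrated with a small sketch (assumed, not the actual classmethod):

# sketch of collecting docstrings by the trailing underscore convention
def docs_sketch(cls):
    out = []
    for name in dir(cls):
        if name.endswith('_') and not name.startswith('_'):
            out.append("%s\n%s" % (name, getattr(cls, name).__doc__ or ""))
    return "\n".join(out)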

dump_(*args, **kwa)[source]

Dumps tables from any accessible database into a mysqldump file. Usage:

db.py                        offline_db dump /tmp/offline_db.sql  ## without -t a default list of tables is dumped
db.py -t CableMap,HardwareID offline_db dump /tmp/offline_db.sql
tail -25  /tmp/offline_db.sql                                     ## checking tail, look for the LASTUSEDSEQNO entries

Use the -t/--tselect option with a comma-delimited list of payload tables to select. Corresponding validity tables and the LOCALSEQNO table are included automatically.

The now default -d/--decoupled option means that the LOCALSEQNO table is dumped separately and only contains entries corresponding to the selected tables. The decoupled dump can be loaded into tmp_offline_db without any special options, as the table selection is reflected within the dump:

db.py tmp_offline_db load  /tmp/offline_db.sql

Partial dumping is implemented using:

mysqldump ... --where="TABLENAME IN ('*','CableMap','HardwareID')" LOCALSEQNO 
fabseqno

Summarizes db.allseqno by fabricating a dict keyed by table name containing the number of Vld SEQNO (from the length of the values in db.allseqno)

This dict can be compared with db.seqno, which is obtained from the LASTUSEDSEQNO entries in the LOCALSEQNO table. Assuming kosher DBI handling of tables this fabricated dict db.fabseqno should match db.seqno, meaning that SEQNO start from 1 and have no gaps.

In [1]: from DybPython import DB

In [2]: db = DB("tmp_fake_offline_db")

In [3]: db.seqno   ## queries the LOCALSEQNO table in DB
Out[3]: 
{'CableMap': 213,
 'CalibFeeSpec': 113,
 'CalibPmtSpec': 29,
 'FeeCableMap': 3,
 'HardwareID': 172}

In [4]: db.fabseqno    ## a summarization of db.allseqno
Out[4]: 
{'CableMap': 213,
 'CalibFeeSpec': 111,
 'CalibPmtSpec': 8,
 'FeeCableMap': 3,
 'HardwareID': 172}

In [5]: db.miscreants   ## assertions avoided by miscreant status
Out[5]: ('CalibPmtSpec', 'CalibFeeSpec')
forced_rloadcat_(*args, **kwa)[source]

Forcible loading of a catalog ... FOR TESTING ONLY

get_allseqno()[source]

Provides a table-name-keyed dict containing lists of all SEQNO in each Vld table. The tables included correspond to the read DBI tables (namely those in LOCALSEQNO)

get_fabseqno()[source]

Summarizes db.allseqno by fabricating a dict keyed by table name containing the number of Vld SEQNO (from the length of the values in db.allseqno)

This dict can be compared with db.seqno, which is obtained from the LASTUSEDSEQNO entries in the LOCALSEQNO table. Assuming kosher DBI handling of tables this fabricated dict db.fabseqno should match db.seqno, meaning that SEQNO start from 1 and have no gaps.

In [1]: from DybPython import DB

In [2]: db = DB("tmp_fake_offline_db")

In [3]: db.seqno   ## queries the LOCALSEQNO table in DB
Out[3]: 
{'CableMap': 213,
 'CalibFeeSpec': 113,
 'CalibPmtSpec': 29,
 'FeeCableMap': 3,
 'HardwareID': 172}

In [4]: db.fabseqno    ## a summarization of db.allseqno
Out[4]: 
{'CableMap': 213,
 'CalibFeeSpec': 111,
 'CalibPmtSpec': 8,
 'FeeCableMap': 3,
 'HardwareID': 172}

In [5]: db.miscreants   ## assertions avoided by miscreant status
Out[5]: ('CalibPmtSpec', 'CalibFeeSpec')
get_seqno()[source]

SEQNO accessor, reading and checking is done on first access to self.seqno with

db = DB()
print db.seqno   ## checks DB 
print db.seqno   ## uses cached
del db._seqno    ## force a re-read and check
print db.seqno 
has_table(tn)[source]
Parameters: tn – table name
Returns: exists – whether the table exists in the DB
load_(*args, **kwa)[source]

Loads tables from a mysqldump file into a target db. The target db is configured by the parameters in, for example, the tmp_offline_db section of the config file. For safety the name of the configured target database must begin with tmp_

Note

CAUTION IF THE TARGET DATABASE EXISTS ALREADY IT WILL BE DROPPED AND RECREATED BY THIS COMMAND

Usage example:

db.py tmp_offline_db load /tmp/offline_db.sql
loadcsv(cat, tn)[source]
Parameters:
  • cat – AsciiCat instance
  • tn – string payload table name or LOCALSEQNO
ls_(*args, **kwa)[source]

Usage:

./db.py tmp_offline_db ls

Annotation '-' indicates tables not in the table selection; typically only the below types of tables should appear with the '-' annotation.

  1. non-SOP tables such as scraped tables
  2. temporary working tables not intended for offline_db

If a table appears with annotation '-' that is not one of the above cases then either db.py tselect needs to be updated to accommodate a new table (ask Liang to do this) OR you need to update your version of db.py. The first few lines of db.py --help list the revision in use.

See dybsvn:ticket:1269 for issue with adding new table McsPos that this command would have helped to diagnose rapidly.

mysql(*args, **kwa)[source]
noop_(*args, **kwa)[source]

Do-nothing command, allowing one to just instantiate the DB object and provide it for interactive prodding, eg:

~/v/db/bin/ipython -- ~/DybPython/db.py tmp_offline_db noop   

In [1]: db("show tables")     ## high level 

In [2]: db.llconn.query("select * from CalibPmtSpecVld")    ## lowlevel _mysql    
In [3]: r = db.conn.store_result()

This also demonstrates standalone db.py usage, assuming svn checkout:

svn co http://dayabay.ihep.ac.cn/svn/dybsvn/dybgaudi/trunk/DybPython/python/DybPython
optables

List of tables that commands such as rdumpcat perform operations on; the outcome depends on:

  1. table selection from the -t/--tselect option
  2. decoupled option setting
  3. DBCONF section name, where name offline_db is regarded as special

The default value of the table selection option constitutes the current standard set of DBI tables that should be reflected in the dybaux catalog.

When following the SOP in the now default "decoupled" mode the offline_db rdumpcat needs to abide by the table selection in force, whereas when dumping from tmp_offline_db onto a dybaux checkout all of the subset needs to be dumped, rather than the default table selection.

This special casing avoids the need for the -t selection when rdumpcating tmp_offline_db

outfile(tab)[source]

Path of raw outfile as dumped by SELECT ... INTO OUTFILE

paytables

list of selected DBI payload tables

predump()[source]

Checks performed before: dump, dumpcat, rdumpcat

rcmpcat_(*args, **kwa)[source]

Just dumps a comparison between the target DB and an ascii catalog, allowing the actions that an rloadcat would perform to be previewed.

Compares DBI vitals such as LASTUSEDSEQNO between a DBI database and a DBI ascii catalog, usage:

./db.py tmp_offline_db rcmpcat ~/dybaux/catalog/tmp_offline_db
rdumpcat_(*args, **kwa)[source]

Dumps DBI tables and merges LOCALSEQNO from tmp_offline_db into a pre-existing ascii catalog. Usage:

db.py -d   tmp_offline_db rdumpcat ~/dybaux/catalog/tmp_offline_db       ## -d/--decoupled is now the default
db.py      tmp_offline_db rdumpcat ~/dybaux/catalog/tmp_offline_db      

svn status ~/dybaux/catalog/tmp_offline_db                               ## see whats changed 

Features of the default -d/--decoupled option:

  1. requires dumping into a pre-existing catalog
  2. subset of tables present in the DB are dumped
  3. partial LOCALSEQNO.csv is merged into the pre-existing catalog LOCALSEQNO.csv
  4. performs safe writes, if the merge fails detritus files with names ending .csv._safe and .csv._merged will be left in the working copy

With the alternate -D/--nodecoupled option you must ensure that the table selection is appropriate to the content of the DB:

db.py -D -t CableMap,HardwareID   offline_db rdumpcat ~/offline_db

To obtain the dybaux SVN catalog:

mkdir ~/dybaux 
cd ~/dybaux ; 
svn co http://dayabay.ihep.ac.cn/svn/dybaux/catalog

The ascii catalog is structured

~/dybaux/catalog/tmp_offline_db
           tmp_offline_db.cat
           CalibFeeSpec/
               CalibFeeSpec.csv
               CalibFeeSpecVld.csv
           CalibPmtSpec/
               CalibPmtSpec.csv
               CalibPmtSpecVld.csv
           ...
           LOCALSEQNO/
               LOCALSEQNO.csv

The .csv files comprise a single header line with the table definition, with the remainder containing the row data.

ADVANCED USAGE OF ASCII CATALOGS IN CASCADES

The resulting catalog can be used in a DBI cascade by setting DBCONF to:

tmp_offline_db_ascii:offline_db

Assuming a section:

[tmp_offline_db_ascii]
host = localhost
user = whatever
password = whatever
db = tmp_offline_db#/path/to/catname/catname.cat

NB from dybsvn:r9869 /path/to/catname/catname.cat can also be a remote URL such as

http://dayabay:youknowit\@dayabay.ihep.ac.cn/svn/dybaux/trunk/db/cat/zhe/trial/trial.cat
http://dayabay:youknowit\@dayabay.ihep.ac.cn/svn/dybaux/!svn/bc/8000/trunk/db/cat/zhe/trial/trial.cat

When stuffing basic authentication credentials into the URL it is necessary to backslash escape the "@" to avoid confusing DBI (TUrl). Note the use of "!svn/bc/NNNN", which requests apache mod_dav_svn to provide a specific revision of the catalog rather than the default latest.

ADVANTAGES OF CATALOG FORMAT OVER MYSQLDUMP SERIALIZATIONS

  • effectively native DBI format that can be used in ascii cascades allowing previewing of future database after updates are made
  • very simple/easily parsable .csv that can be read by multiple tools
  • very simple diffs (DBI updates should be contiguous additional lines), unlike mysqldump, this means efficient storage in SVN
  • no variants/options that change the format (unlike mysqldump)
  • no changes between versions of mysql
  • much faster to load than mysqldumps

IMPLEMENTATION NOTES

  1. mysql does not support remote SELECT ... INTO OUTFILE even with OUTFILE=/dev/stdout
  2. mysqldump -Tpath/to/dumpdir has the same limitation

To workaround these limitations a csvdirect approach is taken where low level mysql-python is used to perform a select * on selected tables and the strings obtained are written directly to the csv files of the catalog. Low-level mysql-python is used to avoid pointless conversion of strings from the underlying mysql C-api into python types and then back into strings.
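A much simplified sketch of the csvdirect idea (illustrative only, not the actual rdumpcat code; the connection keys are assumed to come from the usual config section):

# simplified csvdirect sketch: low level _mysql returns row values as
# strings, which are written straight into the csv file of the catalog
import csv
import _mysql

def dump_table_csv(tab, path, **dbkeys):
    conn = _mysql.connect(**dbkeys)                 # host/user/passwd/db keys
    conn.query("SELECT * FROM %s" % tab)
    result = conn.store_result()
    rows = result.fetch_row(maxrows=0, how=0)       # all rows, as tuples of strings
    with open(path, "wb") as fp:
        csv.writer(fp).writerows(rows)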

read_desc(tabfile)[source]

Read first line of csv file containing the description

read_seqno(tab='LOCALSEQNO')[source]

Read LASTUSEDSEQNO entries from table LOCALSEQNO

rloadcat_(*args, **kwa)[source]

Loads an ascii catalog into a possibly remote database. This is used by DB managers in the final step of the update SOP to propagate dybaux updates into offline_db.

Usage:

./db.py tmp_offline_db rloadcat ~/dybaux/catalog/tmp_offline_db

Steps taken by rloadcat:

  1. compares tables and SEQNO present in the ascii catalog with those in the DB and reports differences found. The comparison looks both at the LOCALSEQNO tables that DBI uses to hold the LASTUSEDSEQNO for each table and also directly at all SEQNO present in the validity tables. The rcmpcat command does only these comparisons.
  2. if updates are found the user is asked for consent to continue with updating
  3. for the rows (SEQNO) that are added by the update the catalog validity tables INSERTDATE timestamps are fastforwarded inplace to the current UTC time
  4. catalog tables are imported into the DB with the mysqlimport tool. For payload and validity tables the mysqlimport option --ignore is used, meaning that only new rows (as determined by their primary keys) are imported; other rows are ignored. For the LOCALSEQNO table the option --replace is used in order to replace the (TABLENAME,LASTUSEDSEQNO) entry. A sketch of this import step is shown below the return value description.
Returns: dictionary keyed by payload table names with values containing lists of SEQNO values
Return type: dict
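A rough sketch of the import in step 4 (illustrative only, not the actual rloadcat code; the flag choices follow the description above, and --ignore-lines=1 reflects the single header line of the catalog csv files):

# sketch: build a mysqlimport commandline for one table csv
def mysqlimport_cmd(dbconf, csvpath, localseqno=False):
    """
    dbconf  : dict with host/user/password/database keys (as in ~/.my.cnf)
    csvpath : path to TABLENAME.csv, mysqlimport infers the table from the filename
    """
    mode = "--replace" if localseqno else "--ignore"
    return ("mysqlimport --local %s --ignore-lines=1 --fields-terminated-by=, "
            "--host=%s --user=%s --password=%s %s %s") % (
            mode, dbconf['host'], dbconf['user'],
            dbconf['password'], dbconf['database'], csvpath)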

You might be tempted to use rloadcat as a faster alternative to load, however this is not advised due to the extra things that rloadcat does, such as update comparisons, fastforwarding and potentially merging in (when the decoupled option is used).

In comparison the load command blasts whatever came before it; a similarly forceful load of a catalog can be done using forced_rloadcat with the --DROP option:

./db.py --DROP tmp_offline_db forced_rloadcat ~/dybaux/catalog/tmp_offline_db    

After which you can check operation via an rdumpcat back onto the working copy, before doing any updates:

./db.py tmp_offline_db rdumpcat ~/dybaux/catalog/tmp_offline_db 
svn st ~/dybaux/catalog/tmp_offline_db    ## should show no changes 

Reading the full catalog into memory is expensive.

  1. can I omit the payload tables from the read ?
seqno

SEQNO accessor, reading and checking is done on first access to self.seqno with

db = DB()
print db.seqno   ## checks DB 
print db.seqno   ## uses cached
del db._seqno    ## force a re-read and check
print db.seqno 
showpaytables

list names of all DBI payload tables in DB as reported by SHOW TABLES LIKE '%Vld' with the 'Vld' chopped off

NB the result is cached so it will become stale after deletions or creations unless the nocache=True option is used
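The underlying query is of this kind (a sketch, ignoring the caching):

# sketch: payload table names derived by chopping 'Vld' off the Vld tables
def showpaytables_sketch(conn):
    cur = conn.cursor()
    cur.execute("SHOW TABLES LIKE '%Vld'")
    return [row[0][:-len('Vld')] for row in cur.fetchall()]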

showtables

list names of all tables in DB as reported by SHOW TABLES. NB the result is cached so it will become stale after deletions or creations unless the nocache=True option is used

tab(name)[source]
Parameters: name – DBI payload table name
tabfile(tab, catfold)[source]

path of table obtained from

tables

list of selected table names to operate on plus the mandatory LOCALSEQNO. Poorly named; should be table_selection

tmpdir

Create new temporary directory for each instance, writable by ugo

tmpfold

Path to temporary folder, named after the DBCONF section. The base directory can be controlled by tmpbase (-b) option

vdupe(tab)[source]

Currently over-reports, as it needs to be balkanized by context

vdupe_(*args, **kwa)[source]

Report the first Vlds which feature duplicated VERSIONDATEs:

mysql> SELECT SEQNO,VERSIONDATE,COUNT(VERSIONDATE) AS dupe  FROM DemoVld GROUP BY VERSIONDATE HAVING ( COUNT(VERSIONDATE) > 1 ) ;
+-------+---------------------+------+
| SEQNO | VERSIONDATE         | dupe |
+-------+---------------------+------+
|    71 | 2011-08-04 05:55:47 |    2 | 
|    72 | 2011-08-04 05:56:47 |    3 | 
+-------+---------------------+------+
2 rows in set (0.00 sec)

mysql> select * from DemoVld ;
+-------+---------------------+---------------------+----------+---------+---------+------+-------------+---------------------+---------------------+
| SEQNO | TIMESTART           | TIMEEND             | SITEMASK | SIMMASK | SUBSITE | TASK | AGGREGATENO | VERSIONDATE         | INSERTDATE          |
+-------+---------------------+---------------------+----------+---------+---------+------+-------------+---------------------+---------------------+
|    70 | 2011-08-04 05:54:47 | 2038-01-19 03:14:07 |      127 |       1 |       0 |    0 |          -1 | 2011-08-04 05:54:47 | 2011-08-11 10:12:32 | 
|    71 | 2011-08-04 06:15:46 | 2038-01-19 03:14:07 |      127 |       1 |       0 |    0 |          -1 | 2011-08-04 05:55:47 | 2011-08-11 10:12:32 | 
|    72 | 2011-08-04 07:02:51 | 2038-01-19 03:14:07 |      127 |       1 |       0 |    0 |          -1 | 2011-08-04 05:56:47 | 2011-08-11 10:12:32 | 
|    73 | 2011-08-04 05:54:47 | 2038-01-19 03:14:07 |      127 |       1 |       0 |    0 |          -1 | 2011-08-04 05:55:47 | 2011-08-11 10:12:32 | 
|    74 | 2011-08-04 06:15:46 | 2038-01-19 03:14:07 |      127 |       1 |       0 |    0 |          -1 | 2011-08-04 05:56:47 | 2011-08-11 10:12:32 | 
|    75 | 2011-08-04 05:54:47 | 2038-01-19 03:14:07 |      127 |       1 |       0 |    0 |          -1 | 2011-08-04 05:56:47 | 2011-08-11 10:12:32 | 
|    76 | 2011-08-04 06:15:46 | 2038-01-19 03:14:07 |      127 |       1 |       0 |    0 |          -1 | 2011-08-04 05:57:47 | 2011-08-11 10:12:32 | 
+-------+---------------------+---------------------+----------+---------+---------+------+-------------+---------------------+---------------------+
7 rows in set (0.00 sec)
vsssta(tab)[source]

Look at VERSIONDATE/TIMESTART/... within SSSTA groups

wipe_cache()[source]

Wipe the cache, forcing DB access to retrieve the info afresh. This is needed when one wishes to check status after a DB load from the same process that performed the load.