
Database

Database Interface

This chapter is organized into the following sections.

Concepts
is an introduction to the basic concepts behind the DatabaseInterface. You can skip this section if you are in a hurry, but reading it will help you understand the package.
Installing and Running
provides a few tips on building and running programs that use the DatabaseInterface.
Accessing Existing Tables
tells you how you write code to retrieve data from existing tables.
Creating New Tables
describes how new tables are added to the database and how the corresponding classes that serve the data are designed.
Filling Tables
explains how new data is added to existing tables in the database.
MySQL Crib
gives the bare minimum necessary to use MySQL to manage a database. The DatabaseInterface runs directly on top of ROOT, under which MySQL and flat ASCII files are used to implement a hierarchical database.

Concepts

Types of Data

Besides the data from the detector itself, off-line software requires additional types of data. Some possible examples:

Detector Description
i.e. data that describes the construction of the detector and how it responds to the passage of particles through it. The geometry, the cabling map and calibration constants are all examples of this type of data.
Reactor Data
i.e. reactor power, fuel makeup, or extrapolated neutrino spectra
Physics Data
i.e. cross-section tables, optical constants, etc.

It is the purpose of the DatabaseInterface to provide simple and efficient access to such data and to provide a framework in which new types of data can be added with minimal effort.

Simple, Compound and Aggregated

Within the database, data is organised into tables. When the user requests data from a table, the DatabaseInterface collects rows of data from the appropriate table. From the perspective of the interface, there are 3 types of organisation:-

Simple
A single row is retrieved. Algorithm Configuration data is always simple; even if multiple configurations are possible, only one can be selected at a time. Detector Description, on the other hand, is almost never Simple.
Compound
Multiple rows are retrieved. Each row represents a single sub-system and the request retrieves data for a complete set of sub-systems. For example a request for PMT positions will produce a set of rows, one for each PMT.
Aggregated
A special form of Compound, depending on the way new data is added to the database:-

  • If data for the entire detector is written as a single logical block, then it is Compound. A table that describes the mapping of PMTs to electronics channels might be Compound: a complete description is written as a single unit.

  • If it is written in smaller chunks (called aggregates) then it is Aggregated. For example, it might be possible to calibrate individual electronics cards on site, independently of the rest of the detector. When calibrated, you will want to update only a subset of the calibrations in the database. One of the jobs of the interface is to reassemble these aggregates so that the user only ever sees a complete set.

    There are two types of aggregation:-

    Complete

    In this type the number of aggregates present at any time is constant, with the possible exception of detector construction periods during which the number increases with time. This is the normal form and is used to describe a set of sub-systems that are permanently present e.g. the set of steel planes.

    Sparse

    In this type the number of aggregates present at any time is variable, there could even be none. This form is used to describe abnormal conditions such as alarms.

Tables of Data

The DatabaseInterface provides a simple, uniform concept regardless of the data being accessed. Each request for data produces a pointer giving read access to a results table, which is effectively a slice of the underlying database table. Each row of the results table is an object, the type of which is table-specific. These table row objects give access to the data from one row but can hide the way the database table is organised. So changes to the physical layout of a database table should only affect its table row object, not the end users of the data. Note that a single request only ever accesses a single table; the interface does not support retrieval of data from multiple database tables simultaneously.

If the request for data fails for some reason, then the resulting table will be empty, otherwise it will have a single row for Simple organisation and more than one row for Compound and Aggregated. The user can ask how many rows the table has and can directly access any of them. The physical ordering of the rows in the table reflects the way the data was originally written, so for Aggregated data, the ordering is not optimised for retrieval. To deal with this, each table row object can declare a natural index, independent of its physical position, and this natural index can be used to retrieve data.

A Cascade of Databases

The DatabaseInterface can access data from more than one database. During initialisation it is given a list of database URLs. The list order reflects priority; the interface first looks for data in the first database in the list, but if that fails, tries the others in turn until all have been tried or data is found. This scheme allows a user to override parts of the official database by creating a mini-database with their own data and then placing it in the list ahead of the official database. The concept of a set of overlaying databases is called a cascade.

Context Sensitive

In principle, any of the data retrieved by the interface could depend on the current event being processed. Clearly Detector Descriptions, such as calibration constants, will change with time and the interface has to retrieve the right ones for the current event. For this reason, all requests for data through the interface must supply information about:-

  • The type of data: real or Monte Carlo.
  • The site of the detector: Daya Bay, Ling Ao, Mid, Far, or Aberdeen
  • The date and time of the event.

Collectively this information is called the Context and is represented by the Context class of the Context package. Note that, in common with event dates and times:

Note

All Database date and times are in UTC.

In the database all data is tagged by a Context Range which identifies the types of data and detector and the ranges of date times for which it is valid. This is represented by the ContextRange class of the Context package. Some data is universal; the same database data can be used for any event. Others may be very specific to a single type of data and detector and a limited date time range.

Note that the Context Range of the data defines the context for which the data will be accessed, NOT where the data was generated. For example, reactor data will be associated with all detector sites, not assigned to a reactor site.

Physically, the way to associate the Context Range metadata with the actual data is to have a pair of tables:-

Context Range Table
This table consists of rows of ContextRange objects, each with a unique sequence number which is used as a key into the Main Data Table.
Main Data Table
Each row has a sequence number corresponding to an entry in the Context Range Table.

The interface first finds a match in the Context Range Table for the current context and then retrieves all rows in the Main Data Table that match its sequence number. The reasons for this two step approach are:-

  • To simplify the task of Context Management.
  • To avoid repeated data. For Compound and Aggregated data, many rows can share a single Context Range. So this range only appears once and only a simple sequence number has to be repeated in the main table.

Extended Context

The primary function of the DatabaseInterface is to provide the best information for a specific context, but it can also retrieve information for much more general queries. The query is still broken into two parts: the “context”, which is matched to the Context Range Table, and then the data from the main table is taken for the selected sequence number(s). However the user can supply a context such as “All ranges that start between this time and that time”, hence the term “Extended Context”. Further, during the retrieval of data from the main table additional restrictions can be imposed. The result of an Extended Context query is a collection of rows that will not normally represent the state of the detector at a single moment in time and it is up to the user to interpret the results meaningfully. However, it does allow the user the power of raw SQL queries.

SimFlag Association

As explained in the preceding section, the interface finds the database data that best matches the context of the data. There are occasions when this matching needs to be changed; for example, there can be times when Monte Carlo data needs to be treated exactly as if it were event data, and this includes the way it retrieves data from the database. To support this the user can specify, for any type of data, an associated list of data types. If this is done then, instead of using the current type, each of the alternative types is tried in turn until a match is found. This matching takes precedence over the cascade, i.e. all associated types are tried on the first database in the cascade before moving on to the second and subsequent cascade members. This ensures that higher members, which might even refer back to the ORACLE database at FNAL, are only tried as a last resort.

Authorising Databases and Global Sequence Numbers

As explained in the previous section, sequence numbers in the Context Range Table are unique. However this can present a problem if the same type of data is being entered into several different databases. For example calibration constants will be created in the Near, Far and Calibration detectors. Eventually the tables will be merged but it is essential that there is no conflict in the sequence numbers. To solve this problem, certain databases are special: they are able to produce globally unique sequence numbers. They can do this because each is allocated a unique block of 10,000,000 sequence numbers (which is enough to allow a new entry to be made every minute for 20 years!). These blocks are recorded in a special table, GLOBALSEQNO, that holds the last used sequence number for each table. The block 1..9,999,999 is used for local sequence numbers, i.e. ones that are only guaranteed unique within the current database table.

By default permanent data written to an authorising database will be written with global sequence numbers. For temporary data, or if writing to a non-authorising database, local sequence numbers are used and in this case a LOCALSEQNO table is generated automatically if required.

Important:-

Note

Merging database tables that have local sequence numbers will require a special procedure to avoid conflicts.

Note

GLOBALSEQNO and LOCALSEQNO tables must never be propagated between databases.

Validity Management

For constants that change with time (if that is not a contradiction in terms!) it makes sense to have overlapping Context Ranges. For example, suppose we know that a certain sort of calibration constants drifts with time and that, once determined, is only satisfactory for the next week’s worth of data. A sensible procedure would be to limit its validity to a week when writing to the database but to determine new constants every few days to ensure that the constants are always “fresh” and that there is no danger that there will be a gap. However, this means that the interface has to perform two types of Validity Management:-

Ambiguity Resolution
When faced with two or more sets of data the interface has to pick the best. It does this simply by picking the one with the latest creation date time.
Context Range Trimming
Having found the best set, the interface wants to know how long it will remain the best. Any set whose creation date is later will be better according to the above rule, so the retrieved data has its range trimmed so as not to overlap it. This reduced Context Range is called the Effective Context Range. This only happens in memory; the database itself is not modified, but it does mean that the interface does not need to check the database again for this set of data until the Effective Context Range has expired. This trimming also applies between databases in a cascade, with sets in higher priority databases trimming those in lower ones.

Overlay Version Dates

As explained above, creation dates play a crucial role in resolving which set of data to use; later creation dates take priority over earlier ones. This scheme assumes that constants from earlier runs are created before constants from later runs, but this isn’t always true. When improving e.g. calibration constants, it’s quite normal to recalibrate recent runs before going back and fixing earlier ones, and then simply using the date when the constants were created would mean that constants from earlier runs would take priority over any later runs they overlapped. To allow constants to be created in any order the interface provides a system for deducing the best creation date for any constants as follows:-

  • A query is made using, as the context, the start of the validity for the new constants.

  • If the query finds no data, the creation date of the new constants is set to its validity start date.

  • If the query finds data, the creation date of the new data is set to be 1 minute greater than the creation date of the found data, i.e. just late enough to replace it.

The scheme means that creation dates always follow the dates of the runs that they correspond to rather than the dates when the constants were created. When using the scheme it's probably better to consider the “dates” to be version numbers.

Rollback

The database changes almost constantly to reflect the state of the detector, particularly with regard to the calibration constants. However this can mean that running the same job twice can produce different results if database updates have occurred between the two runs. For certain tasks, e.g. validation, it's necessary to decouple jobs from recent updates and this requires database rollback, i.e. restoring the database to a previous state. Rollback works by exploiting the fact that data is not, in general, ever deleted from the database. Instead new data is added and, by the rules of Ambiguity Resolution (see the previous section), supersedes the old data. All data is tagged by the date it was inserted into the local database, so rollback is implemented by imposing an upper limit on the insertion date, effectively masking out all updates made after this limit.

Lightweight Pointers to Heavyweight Data

One of the interface’s responsibilities is to minimise I/O. Some requests, particularly for Detector Configuration, can pull in large amounts of data, but users cannot simply load such data once at the start of the job and then use it repeatedly; it may not be valid for all the data they process. Also, multiple users may want access to the same data and it would be wasteful for each to have their own copy.

To deal with both of the above, the interface reuses the concept of a handle, or proxy, that appears in other packages such as Candidate. The system works as follows:-

  1. When the user wants to access a particular table they construct a table-specific pointer object. This object is very small and is suitable to be stack based and passed by value, thus reducing the risk of a memory leak.
  2. During construction of the pointer, a request for data is passed down through the interface and the results table, which could be large, is created on the heap. The interface places the table in its cache and the user’s pointer is attached to the table, but the table is owned by the interface, not the user.
  3. Each request for data is first sent to the cache and if already present then the table is reused.
  4. Each table knows how many user pointers are connected to it. As each pointer is discarded by its owner, it disconnects itself from the table it points to.
  5. Once a table has no pointers left it is a candidate for being dropped by its cache. However this is not done at once as, between events, there are likely to be no user pointers, so just because a table is not currently being pointed to, it doesn’t mean that it won’t be needed again.

Natural Table Index

For Detector Description data, tables can be large and the user will require direct access to every row. However, the way the table is arranged in memory reflects the way the data was originally written to the database. For Simple and Compound data the table designer can control this organisation as complete sets are written as a single unit. For Aggregated data, the layout reflects the way aggregates are written. This allows the interface to replace individual aggregates as their validity expires. However this means that the physical layout may not be convenient for access. To deal with this, table row objects, which all inherit from DbiTableRow, are obliged to return a Natural Table Index if the physical ordering is not a natural one for access. Tables can then be accessed by this index.

Task

Task will provide a way to further select the type of data retrieved. For example:-

  • There might be a nominal set of geometry offsets, or a jittered geometry to test for systematic effects.
  • Detector Configuration data could have two tasks, one for raw calibration and another for refined calibration.

The aim is that Task will allow a particular database table to be sub-divided according to the mode of use. Currently Task is a data type defined in Dbi i.e. Dbi::Task and is implemented as an integer. The default value is zero.

Sub-Site

Sub-Site can be used like the Task to disambiguate things at a single site. For example, this can be used to distinguish between antineutrino detector modules, between electronics crates, etc.

Currently SubSite is a data type defined in Dbi i.e. Dbi::SubSite and is implemented as an integer. The default value is zero.

Level 2 (disk) Cache

Loading a large table from the database is a lot of work:-

  1. The query has to be applied and the raw data loaded.
  2. The row objects have to be individually allocated on the heap.
  3. Each data word of each row object has to be individually converted through several layers of the support database software from the raw data.

Since the detector configuration changes only slowly with time, exactly the same process outlined above is repeated many times, in many jobs that process the data, so the obvious solution is to cache the results to disk in some way that can be reloaded rapidly when required. The technique essentially involves making an image copy of the table on disk. It can only be applied to some tables, but these include the Calibration tables, which represent the largest database I/O load, and for these tables loading times can be reduced by an order of magnitude.

Running

Setting up the Environment

The interface needs a list of Database URLs, a user name and a password. This was previously done using the envvars ENV_TSQL_URL, ENV_TSQL_USER and ENV_TSQL_PSWD, which directly contained this configuration information. As this approach resulted in the configuration information being duplicated many times, a new DBCONF approach has now been adopted.

The DBCONF approach is based on the standard mysql configuration file ~/.my.cnf, which has the form:

[testdb]
host = dybdb1.ihep.ac.cn
user = dayabay
password = youknowit
database = testdb

[dyb_cascade]
host = dybdb1.ihep.ac.cn
user = dayabay
password = youknowit
database =
db1 = offline_db
db2 = dyb_temp

Typical configurations can be communicated via the setting of a single environment variable DBCONF that points to a named section in the configuration file. Other envvars can also be used to change the default behaviour allowing more complex configurations such as cascades of multiple databases to be configured.

envvar        default                                      notes
DBCONF                                                     name of section in config file
DBCONF_URL    mysql://%(host)s/%(database)s
DBCONF_USER   %(user)s
DBCONF_PSWD   %(password)s
DBCONF_HOST   %(host)s
DBCONF_DB     %(database)s
DBCONF_PATH   /etc/my.cnf:$SITEROOT/../.my.cnf:~/.my.cnf   list of config file paths

The defaults are python patterns that are filled in using the context variables obtained from the config file section identified by DBCONF.

The meanings are as follows.

DBCONF_PATH

Colon delimited list of paths (which can include envvars such as $SITEROOT and the home directory tilde symbol). Non-existing paths are silently ignored and sections from the later config files override sections from prior files. Using the default paths shown in the table allows the system administrator to manage config in /etc/my.cnf which is overridden by the dybinst administrator managed $SITEROOT/../.my.cnf.

Users only need to create their own config file in ~/.my.cnf if they need to override the standard configuration.

DBCONF_URL

This is a semi-colon separated list of URLs. Each URL takes the form:-

protocol://host[:port]/[database][?options]
where:
   protocol - DBMS type , e.g. mysql etc.
   host - host name or IP address of database server
   port - port number
   database - name of database
   options - string key=value's separated by ';' or '&'
 Example:
 "mysql://myhost:3306/test?Trace=Yes;TraceFile=qq.log"
DBCONF_USER
Pattern that yields the database user name. It only needs to be set if you require different names for different databases in the cascade, in which case it can be a semi-colon separated list in the same order as DBCONF_URL. If the list is shorter than that list, then the first entry is used for the missing entries.
DBCONF_PSWD
Pattern that yields the database password. As with DBCONF_USER it can be a semi-colon separated list with the first entry providing the default if the list is shorter than DBCONF_URL. It only needs to be set if you require different passwords for the different databases in a cascade. Security risks are avoided by never using actual passwords in this envvar but rather using a pattern such as %(pass1)s;%(pass2)s that will be filled in using the parameters from the config file section identified by DBCONF. Setting it to null will mean that it will be prompted for when the interface initializes.

These variables should be set for the standard read-only configuration. They can be trivially overridden for specific jobs by resetting the environment variables in the python script:

Note that using setdefault allows the config to be overridden without editing the file:

import os
os.environ.setdefault('DBCONF','dyb_offline')
print 'Using Database Config %s ' % os.environ['DBCONF']

For framework jobs where write-access to the database is required, or other special configuration is desired, a less flexible approach is preferred, with a comment pointing out that some special configuration in ~/.my.cnf is required. Be careful not to disclose real passwords; passwords do not belong in repositories.

"""
    NB requires section of ~/.my.cnf

    [dyb_offline]
    host = dybdb1.ihep.ac.cn
    user = dayabay
    password = youknowit
    db1 = dyb_offline
    db2 = dyb_other

"""
import os
os.environ['DBCONF'] = 'dyb_offline'
os.environ['DBCONF_URL'] = 'mysql://%(host)s/%(db1)s;mysql://%(host)s/%(db2)s'
print 'Using Database Config %s ' % os.environ['DBCONF']

Configuring

The DatabaseInterface can be configured through a Gaudi Service before starting your job.

Once the application manager has been set up you can configure the DatabaseInterface via the DbiSvc:

from gaudimodule import *
theApp = AppMgr()
theApp.Dlls += ['Conventions']
theApp.Dlls += ['Context']
theApp.Dlls += ['DatabaseInterface']
theApp.createSvc('DbiSvc')

dbisvc = theApp.service('DbiSvc')
dbisvc.<property>=<newvalue>
dbisvc.<property>=<newvalue>
...

Rollback

To impose a global rollback date of, say, September 27th 2002:-

theApp.service('DbiSvc').RollbackDates ='* = 2002-09-27 00:00:00'

This will ensure that the interface ignores data inserted after this date for all future queries. The hours, minutes and seconds can be omitted and default to 00:00:00.

Rollback can be more selective, specifying either a single table or a group of tables with a common prefix. For example:-

theApp.service('DbiSvc').RollbackDates ='*          =  2002-09-01';
theApp.service('DbiSvc').RollbackDates ='Cal*       =  2002-08-01'
theApp.service('DbiSvc').RollbackDates ='CalPmtGain =  2002-07-01'

Now the table CalPmtGain is frozen at July 2002, other Cal tables at August and all other tables at September. The ordering of the commands is not important; the interface always picks the most specific one to apply to each table.

Rollback only applies to future queries; it does not invalidate any existing query results in the cache, which are still available to satisfy future requests. So impose rollback conditions at the start of the program to ensure they apply consistently.

MakeConnectionsPermanent

By default the DatabaseInterface closes connections to the database between queries to minimise the use of resources - see section Holding Open Connections. If the job is doing a lot of database I/O, for example when creating calibration constants, then this may degrade performance, in which case all connections can be made permanent by:-

theApp.service('DbiSvc').MakeConnectionsPermanent='true'

Ordering Context Query Results

By default, when the DatabaseInterface retrieves the data for a Context Query, it does not impose an order on the data beyond requiring that it be in sequence number order. When an ordering is not imposed, the database server is under no obligation to return data in a particular order. This means that the same job run twice against the same database could end up with result sets that contain the same data but in a different order. Normally this doesn't matter, as the ordering of rows is not significant. However, results from two such jobs may not be identical, as floating point calculations can change at machine-level precision if their ordering is changed. There are situations where it is required that the results be identical. For example:-

  • When bug hunting.
  • When checking compatibility between two databases that should be identical.

and for such occasions it is possible to completely specify the ordering of rows within a sequence number by forcing sub-ordering by ROW_COUNTER, a column that should be present in all Main Data tables:-

theApp.service('DbiSvc').OrderContextQuery='true'

Level 2 Cache

Enabling the Level 2 Cache allows the query results of certain large tables to be written to disk, from which they can be reloaded by subsequent jobs, saving as much as an order of magnitude in load time. Data in the cache will not prevent changes in the database from taking effect, for the DatabaseInterface does an initial (lightweight) query of the database to confirm that the data in the cache is not stale. To enable the cache, the user specifies a directory to which they have read/write access. For example, to make the current working directory the cache:-

theApp.service('DbiSvc').Level2Cache='./'

Cache files all have the extension .dbi_cache. Not all tables are suitable for Level 2 caching; the DatabaseInterface will only cache the ones that are.

Cache files can be shared between users at a site to maximise the benefit. In this case the local Database Manager must set up a directory to which the group has read/write access. Management is trivial: should the cache become too large, it can simply be erased and then the next few jobs that run will re-populate it with the currently hot queries.

Note that cache performance is achieved by doing raw binary I/O, so the cache files are platform-specific; if running on a heterogeneous cluster the Database Manager should designate a platform-specific directory. To simplify this, the name of the directory used by the cache can include environment variables e.g.:-

theApp.service('DbiSvc').Level2Cache='$DBI_L2CACHE'

Output Level

The verbosity of the error log from the DatabaseInterface can be controlled by:

theApp.service('DbiSvc').OutputLevel = 3

The output levels are standard Gaudi levels.

Accessing Existing Tables

Introduction

To access database data, the user specifies the database table to be accessed and supplies a “context” for the query. The context describes the type and date time of the current event and is stored in a Context package Context object.

FIXME Need a description here of how to get a Context from a Data Model object.

It should be something like:

Context   GetContext() const

methods to get their context. The DatabaseInterface uses the context to extract all the rows from the database table that are valid for this event. It forms the result into a table in memory and returns an object that acts like a pointer to it.

You are NOT responsible for deleting the table; the Database Interface will do that when the table is no longer needed.

You have random access to any row of the results table. Each row is an object of a type which is specific to that table. The key to understanding how to get data from a database table is to study the class that represents a row of its results table.

Accessing Detector Descriptions

Making the Query

As explained above, the key to getting data is to locate the class that represents one row in a database table. To understand how this all works, look at one of the sample tables included in the DbiTest package, imaginatively called DbiDemoData1, DbiDemoData2 and DbiDemoData3. For purposes of illustration we will pick the first of these. Its header can be found in:-

DbiTest/DbiDemoData1.h

To make a query you create a DbiResultPtr object. Its header can be found in:-

DatabaseInterface/DatabaseInterface/DbiResultPtr.h

This is a class that is templated on the table row class, so in this case the instantiated class is:-

DbiResultPtr<DbiDemoData1>

and to instantiate an object of this class you just need a Context object. Suppose vc is such an object, then this creates the pointer:-

DbiResultPtr<DbiDemoData1> myResPtr(vc);

This statement creates a DbiResultPtr for the DbiDemoData1 class. First it searches through the database for all DbiDemoData1 objects that are valid for vc, then it assembles them into a table and finally passes back a pointer to it. Not bad for one statement! The constructor can take additional arguments:-

DbiResultPtr(Context vc,Dbi::SubSite subsite=0,Dbi::Task task=0);

Dbi::SubSite is an optional parameter that sub-divides a table to select a specific component at a given detector Site, e.g. an antineutrino detector.

Dbi::Task offers a way to sub-divide a table according to the mode of operation. For example, Detector Configuration data could have two modes, one for raw calibration and another for refined calibration.

If the concept of a sub-site or task is not relevant for a particular database table, then the parameter should be left at its default value of 0. Otherwise the data should be allocated a unique positive number and then selection will only pick rows with the required value.
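
For example, a minimal sketch of a query restricted to a particular sub-site and task; the numerical values here are purely illustrative:-

Dbi::SubSite subSite = 2;   // e.g. a particular antineutrino detector (illustrative)
Dbi::Task    task    = 1;   // e.g. refined rather than raw calibration (illustrative)
DbiResultPtr<DbiDemoData1> myResPtr(vc, subSite, task);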

The constructor can take further arguments which can normally be left at their default values: a Dbi::AbortTest (see section Error Handling) and a Bool_t findFullTimeWindow (see section Truncated Validity Ranges).

Accessing the Results Table

Having got a pointer to the table, the first thing you will want to know is how many rows it has. Do this using the method:-

UInt_t GetNumRows() const;

If the query failed then the number of rows returned will be zero. This could either be the result of some catastrophic failure, for example the database could not be opened, or simply that no appropriate data exists for the current event. If you want to know which of these it is you can use the method:-

const DbiValidityRec* GetValidityRec() const;

If this returns a null pointer, then the failure was a major one, see Error Logging. If not then the DbiValidityRec tells you about the validity of the gap. Its method:-

const ContextRange& GetContextRange() const;

returns a Context package ContextRange object that can yield the start and end times of the gap. Due to the way the DatabaseInterface forms the query, this may be an underestimate, but never an overestimate.
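
Putting these together, here is a sketch of distinguishing a major failure from a simple gap in the data; the ContextRange accessor names in the comment are assumptions:-

DbiResultPtr<DbiDemoData1> myResPtr(vc);
if ( myResPtr.GetNumRows() == 0 ) {
  const DbiValidityRec* vrec = myResPtr.GetValidityRec();
  if ( ! vrec ) {
    // Major failure, e.g. the database could not be opened; see Error Logging.
  }
  else {
    // No data for this context; the range describes the gap
    // (e.g. via assumed accessors such as GetTimeStart()/GetTimeEnd()).
    const ContextRange& gap = vrec->GetContextRange();
  }
}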

If the table has rows then the GetContextRange() will give you an object that tells you the range of the data. Again, the range may be an underestimate. To get to the data itself, use the method:-

const T* GetRow(UInt_t i) const;

where T = DbiDemoData1 in this case. This gives you a const pointer to the i-th row, where i is in the range 0 <= i < GetNumRows().

FIXME Need complete example here including DataModel object.

Putting this all together, suppose you have a CandDigitListHandle object cdlh and you want to loop over all DbiDemoData1 objects that are valid for it; the code is:-

#include "DbiTest/DbiDemoData1.h"
#include "DatabaseInterface/DbiResultPtr.h"

...

DbiResultPtr<DbiDemoData1> myResPtr(cdlh.GetContext());

for ( UInt_t irow = 0; irow < myResPtr.GetNumRows(); ++irow ) {
  const DbiDemoData1* ddd1 = myResPtr.GetRow(irow);

// Process row.

}

GetRow is guaranteed to return a non-zero pointer if the row number is within range, otherwise it returns zero. The ordering of rows reflects the way the data was written to the database. For some types of data this layout is not well suited for access. For example, for pulser data, all the strip ends illuminated by an LED will appear together in the table. To deal with this, table row objects are obliged to return a Natural Table Index if the physical ordering is not a natural one for access. You get rows from a table according to their index using the method:-

const T* GetRowByIndex(UInt_t index) const;

You should always check the return value to ensure that it is non-zero when using this method, unless you are absolutely certain that the entry must be present.
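
For example, a short sketch of retrieval by natural index, continuing the myResPtr example above; the index value 42 is purely illustrative:-

// Look up the row whose natural index is 42 (a hypothetical packed identifier).
const DbiDemoData1* row = myResPtr.GetRowByIndex(42);
if ( row ) {
  Float_t pedestal = row->GetPedestal();
  // ... use the row ...
}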

Getting Data from a Row

Having got to the table row you want, the last job is to get its data. It's up to the table row objects themselves to determine how they will present the database table row they represent. In our example, DbiDemoData1 is particularly dumb. Its internal state is:-

Int_t    fSubSystem;
Float_t  fPedestal;
Float_t  fGain1;
Float_t  fGain2;

which it is content to expose fully:-

  Int_t GetSubSystem() const { return fSubSystem; }
Float_t GetPedestal() const { return fPedestal; }
Float_t GetGain1() const { return  fGain1; }
Float_t GetGain2() const { return fGain2; }

It's worth pointing out though that it is the job of the table row object to hide the physical layout of the database table and so shield its clients from changes to the underlying database. It's just another example of data encapsulation.

Making Further Queries

Even though a DbiResultPtr is lightweight it is also reusable; you can make a fresh query using the NewQuery method:-

UInt_t NewQuery(Context vc, Dbi::Task task=0);

which returns the number of rows found in the new query. For example:-

DbiResultPtr<DbiDemoData1> myResPtr(vc);
...
Context newVc;
...
myResPtr.NewQuery(newVc);
...

Having made a query you can also step forwards or backwards to the adjacent validity range using the method:-

UInt_t NextQuery(Bool_t forwards = kTRUE);

Supply a false value to step backwards. This method can be used to “scan” through a database table, for example to study how calibration constants change as a function of time. To use this efficiently you need to request accurate validity ranges for your initial query (this is the default; see section Truncated Validity Ranges). For aggregated data, the result for a neighbouring range will almost certainly contain some rows in common with the current one, unless all component aggregates have context ranges that end on the boundary you are crossing. See the next section for a way to detect changes to data using the DbiResult::GetID() method.
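
A sketch of such a scan follows; it assumes, by analogy with NewQuery, that NextQuery returns the number of rows found in the adjacent range:-

// Scan forwards through successive validity ranges, e.g. to follow
// calibration constants as a function of time.
DbiResultPtr<DbiDemoData1> myResPtr(vc);
while ( myResPtr.GetNumRows() > 0 ) {
  // ... examine the rows for the current validity range ...
  if ( myResPtr.NextQuery() == 0 ) break;   // no data in the adjacent range
}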

Simple Optimisation

The first, and most important, level of optimisation is done within the DatabaseInterface itself. Each time it retrieves data from the database it places the data in an internal cache. This is then checked during subsequent queries and reused as appropriate. So the first request for a large table of calibration constants may require a lot of I/O. However the constants may remain valid for an entire job, in which case there is no further I/O for this table.

Although satisfying repeat requests for the same data is quick it still requires the location of the appropriate cache and then a search through it looking for a result that it is suitable for the current event. There are situations when even this overhead can be a burden: when processing many rows in a single event. Take for example the procedure of applying calibration. Here every digitization needs to be calibrated using its corresponding row in the database. The naive way to do this would be to loop over the digits, instantiating a DbiResultPtr for each, extracting the appropriate row and applying the calibration. However it would be far more efficient to create a little calibration object something like this:-

class MyCalibrator {
public:
  MyCalibrator(const Context vc): fResPtr(vc) {}
  Float_t Calibrate(DataObject& thing) const {
     /*  Use fResPtr to calibrate thing and return the result  */
  }

private:
  DbiResultPtr<DbiDemoData1> fResPtr;
};

MyCalibrator is a lightweight object holding only a pointer to a results table. It is created with a Context object which it uses to prime its pointer. After that it can be passed DataObject objects for which it returns calibrated results using its Calibrate method. Now the loop over all digitizations can use this object without any calls to the DatabaseInterface at all. Being lightweight MyCalibrator is fine as a stack object, staying in scope just long enough to do its job.
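
As a usage sketch; the container of digitizations is framework-specific, so a plain std::vector of DataObject is assumed here purely for illustration:-

#include <vector>

// Calibrate every digit in an event with a single, shared query.
std::vector<Float_t> CalibrateAll(const Context& vc,
                                  std::vector<DataObject>& digits) {
  MyCalibrator calibrator(vc);            // one query primes the result pointer
  std::vector<Float_t> results;
  for ( size_t i = 0; i < digits.size(); ++i ) {
    results.push_back(calibrator.Calibrate(digits[i]));
  }
  return results;
}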

Another optimisation strategy involves caching results derived from a query. In this case it is important to identify changes in the query results so that the cached data can be refreshed. To aid this, each DbiResult is given a key which uniquely identifies it. This key can be obtained and stored as follows:-

DbiResultKey MyResultKey(*myResPtr.GetKey());

This should be stored by value (the DbiResultKey pointed to by GetKey will be deleted when the results expire) as part of the cache and checked each time a change is possible:-

if ( ! MyResultKey.IsEqualTo(myResPtr.GetKey()) ) {

  // recreate the cache data ...

  MyResultKey = *myResPtr.GetKey();
}

Caution: This tests to see that the current DbiResult has exactly the same data as that used when the cache was filled, but not that it is physically the same object. If there have been intervening queries the original object may have been deleted, but this should not matter unless the cache holds pointers back to the DbiResult. In this case the result ID should be used. Initialise with:-

Int_t MyResultID(myResPtr.GetResultID());

and then check as follows:-

if ( MyResultID != (myResPtr.GetResultID()) ) {

  // recreate the cache data ...

   MyResultID = myResPtr.GetResultID();
}

Extended Context Queries

Making the Query

The constructor of a DbiResultPtr for an Extended Context Query is:-

DbiResultPtr(const string& tableName,
             const DbiSqlContext& context,
             const Dbi::SubSite& subsite = Dbi::kAnySubSite,
             const Dbi::Task& task = Dbi::kAnyTask,
             const string& data = "",
             const string& fillOpts = "");

Dealing with each of these arguments in turn:-

const string& tableName
The name of the table that is to be accessed. This allows any type of DbiTableRow to be loaded from any type of table, but see section Filling Tables on filling if you are going to play tricks!
const DbiSqlContext& context

This argument provides the extended context through the utility class DbiSqlContext. Consider the following code:-

// Construct the extended context: FarDet data that starts on Sept 1 2003.
// (note: the end time stamp is exclusive)
TimeStamp tsStart(2003,9,1,0,0,0);
TimeStamp   tsEnd(2003,9,2,0,0,0);
DbiSqlContext context(DbiSqlContext::kStarts,tsStart,
                      tsEnd,Site::kFar,SimFlag::kData);

You supply the type of context (in this case DbiSqlContext::kStarts), the date range and the detector type and sim flag. Other types of context are kEnds and kThroughout. See

DatabaseInterface/DbiSqlContext.h

for the complete list.

You are not limited to the contexts that DbiSqlContext provides. If you know the SQL string you want to apply then you can create a DbiSqlContext with the WHERE clause you require e.g.:-

DbiSqlContext myContext("SITEMASK & 4")

which would access every row that is suitable for the CalDet detector.

const Dbi::Task& task

The task is as for other queries but with the default value of:-

Dbi::kAnyTask

which results in the task being omitted from the context query and also allows for more general queries: anything that is valid after the WHERE is permitted. For example:-

DbiSqlContext myContext("versiondate > '2004-01-01 00:00:00' "
                         " order by versiondate limit 1");

The SQL must have a where condition, but if you don’t need one, create a dummy that is always true e.g.:-

DbiSqlContext myContext("1 = 1 order by timeend desc limit 1 ")
const string& data

This is an SQL fragment that, if not empty (the default value), is used to extend the WHERE clause that is applied when querying the main table. For example consider:-

DbiSqlContext context(DbiSqlContext::kStarts,tsStart,tsEnd,
                      Site::kFar,SimFlag::kData);
DbiResultPtr<DbuSubRunSummary>
                      runs("DBUSUBRUNSUMMARY",context,
                            Dbi::kAnyTask,"RUNTYPENAME = 'NormalData'");

This query reads the DBUSUBRUNSUMMARY table and, besides imposing the context query, also demands that the data rows satisfy a constraint on RUNTYPENAME.

const string& fillOpts
This is a string that can be retrieved from DbiResultSet when filling each row, so it could be used to program the way an object fills itself, e.g. by only filling certain columns. The DatabaseInterface plays no part here; it merely provides this way to communicate between the query maker and the author of the class that is being filled.

Accessing the Results Table

Accessing the results of an Extended Context query is essentially the same as for a standard query but with the following caveats:-

  • If the method:-

    const DbiValidityRec* GetValidityRec(const DbiTableRow* row=0) const;
    

    is used with the default argument then the “global validity” of the set, i.e. the overlap of all the rows, is returned. Given the nature of Extended Queries there may be no overlap at all. In general it is far better to call this method and pass a pointer to a specific row, for in this case you will get the validity of that particular row (see the sketch at the end of this list).

  • The method:-

    const T* GetRowByIndex(UInt_t index) const;
    

    will not be able to access all the data in the table if two or more rows have the same Natural Index. This is prohibited in a standard query but extended ones break all the rules and have to pay a price!
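
As mentioned in the first point above, here is a sketch of obtaining per-row validity from an Extended Context result; myResPtr is assumed to be a DbiResultPtr built with an Extended Context constructor:-

// Ask each row for its own validity rather than relying on the
// (possibly empty) overlap of the whole result set.
for ( UInt_t irow = 0; irow < myResPtr.GetNumRows(); ++irow ) {
  const DbiDemoData1*   row  = myResPtr.GetRow(irow);
  const DbiValidityRec* vrec = myResPtr.GetValidityRec(row);
  if ( vrec ) {
    const ContextRange& range = vrec->GetContextRange();
    // ... examine this row's validity range ...
  }
}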

Error Handling

Response to Errors

All DbiResultPtr constructors, except the default constructor, have an optional argument:-

Dbi::AbortTest abortTest = Dbi::kTableMissing

Left at its default value, any query that attempts to access a non-existent table will abort the job. The other values that can be supplied are:-

kDisabled
Never abort. This value is used for the default constructor.
kDataMissing
Abort if the query returns no data. Use this option with care and only if further processing is impossible.

Currently aborting means just that; there is no graceful shut down and saving of existing results. You have been warned!
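
For example, a sketch of a query that never aborts; here kDisabled is assumed to be defined, like kTableMissing, in the Dbi namespace, and the sub-site and task are left at their defaults:-

// Tolerate a missing table (or missing data) and check for it explicitly.
DbiResultPtr<DbiDemoData1> myResPtr(vc, 0, 0, Dbi::kDisabled);
if ( myResPtr.GetNumRows() == 0 ) {
  // handle the absence of data here rather than aborting the job
}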

Error Logging

Errors from the database are recorded in a DbiExceptionLog. There is a global version that records all errors. The contents can be printed as follows:-

#include "DatabaseInterface/DbiExceptionLog.h"
...
LOGINFO(mylog) << "Contents of the Global Exception Log: \n"
           <<  DbiExceptionLog::GetGELog();

Query results are held in a DbiResult and each of these also holds a DbiExceptionLog of the errors (if any) recorded when the query was made. If myResPtr is a DbiResultPtr, then to check and print associated errors:-

const DbiExceptionLog& el(myResPtr.GetResult()->GetExceptionLog());
if ( el.Size() == 0 ) LOGINFO(mylog) << "No errors found" << endl;
else                  LOGINFO(mylog) << "Following errors found" << el << endl;

Creating New Tables

Choosing Table Names

The general rule is that a table name should match the DbiTableRow subclass object that it is used to fill. For example the table CalPmtGain corresponds to the class CalPmtGain. The rules are

These restrictions also apply to column names. Moreover, column names should be all capital letters.

Creating Detector Descriptions

A Simple Example

Creating new Detector Descriptions involves the creation of a database table and the corresponding table row Class. The main features can be illustrated using the example we have already studied: DbiDemoData1. Recall that its state data is:-

  Int_t  fSubSystem;
Float_t  fPedestal;
Float_t  fGain1;
Float_t  fGain2;

Its database table, which bears the same name, is defined, in MySQL, as:-

CREATE TABLE DBIDEMODATA1(
         SEQNO INTEGER not null,
         ROW_COUNTER INTEGER not null,
         SUBSYSTEM  INT,
         PEDESTAL   FLOAT,
         GAIN1      FLOAT,
         GAIN2      FLOAT,
         primary key(SEQNO,ROW_COUNTER));

As you can see there is a simple 1:1 correspondence between them, except that the database table has two additional leading entries:-

SEQNO INTEGER not null,
ROW_COUNTER INTEGER not null,

and a trailing entry:-

primary key(SEQNO,ROW_COUNTER));

ROW_COUNTER is a column whose value is generated by the interface; it isn’t part of the table row class. Its sole purpose is to ensure that every row in the table is unique, an important design constraint for any database. This is achieved by ensuring that, for a given SEQNO, each row has a different value of ROW_COUNTER. This allows the combination of these two values to form a primary (unique) key, which is declared in the trailing entry.

All database tables supported by the DatabaseInterface have an auxiliary Context Range Table that defines validity ranges for them. Each validity range is given a unique sequence number that acts as a key and corresponds to SeqNo. In our case, and indeed in every case apart from the table name, the definition is:-

create table  DbiDemoData1Vld(
         SEQNO integer not null primary key,
         TIMESTART datetime not null,
         TIMEEND datetime not null,
         SITEMASK tinyint(4),
         SIMMASK tinyint(4),
         TASK integer,
         AGGREGATENO integer,
         VERSIONDATE datetime not null,
         INSERTDATE datetime not null,
         key TIMESTART (TIMESTART),
         key TIMEEND (TIMEEND));

When the DatabaseInterface looks for data that is acceptable for a given validity it:-

  1. Matches the validity to an entry in the appropriate Context Range Table and gets its SeqNo.
  2. Uses SeqNo as a key into the main table to get all the rows that match that key.

So, as a designer, you need to be aware that the sequence number and the row counter must be the first two columns in the database table, but they are not reflected in the table row class.

Filling a table row object from the database is done using the class’s Fill method. For our example:-

void DbiDemoData1::Fill(DbiResultSet& rs,
                        const DbiValidityRec* vrec) {

  rs >> fSubSystem >> fPedestal >> fGain1 >> fGain2;

}

the table row object is passed a DbiResultSet which acts rather like an input stream. The sequence number has already been stripped off; the class just has to fill its own data members. The DatabaseInterface does type checking (see the next section) but does not fail if there is a conflict; it just produces a warning message and puts default data into the variable to be filled.

The second argument is a DbiValidityRec which can, if required, be interrogated to find out the validity of the row. For example:-

const ContextRange& range = vrec->GetContextRange();

vrec may be zero, but only when filling DbiValidityRec objects themselves. On all other occasions vrec should be set.

Creating a Database Table

The previous section gave a simple MySQL example of how a database table is defined. There is a bit more about MySQL in section MySQL Crib. The table name normally must match the name of the table row class that it corresponds to. There is a strict mapping between database column types and table row data members, although in a few cases one column type can be used to load more than one type of table row member. The table Recommended table row and database column type mappings gives the recommended mapping between table row and MySQL column type.

Recommended table row and database column type mappings

Table Row Type   MySQL Type       Comments
Bool_t           CHAR
Char_t           CHAR
Char_t*          CHAR(n)          n < 4
Char_t*          TEXT             n > 3
string           TEXT
Short_t          TINYINT          8 bit capacity
Short_t          SMALLINT         16 bit capacity
Int_t            TINYINT          8 bit capacity
Int_t            SMALLINT         16 bit capacity
Int_t            INT or INTEGER   32 bit capacity
Float_t          FLOAT
Double_t         DOUBLE
TimeStamp        DATETIME

Notes

  1. To save table space, select CHAR(n) for character strings with 3 or fewer characters and select the smallest capacity for integers.
  2. The long (64 bit) integer forms are not supported as on (some?) Intel processors they are only 4 bytes long.
  3. Although MySQL supports unsigned values, we banned them when attempting to get a previous interface to work with ORACLE, so unsigned database column types should be avoided. It is allowed to have unsigned in the table row when a signed value is not appropriate, and the interface will correctly handle I/O to the signed value in the database even if the most significant bit is set, i.e. the signed value in the database is negative. It is unfortunate that the signed value in the database will look odd in such cases.

Designing a Table Row Class

Here is a list of the requirements for a table row class.

Must inherit from DbiTableRow
All table row objects must publicly inherit from the abstract class DbiTableRow. DbiTableRow does provide some default methods even though it is abstract.
Must provide a public default constructor

e.g.:-

DbiDemoData1::DbiDemoData1() { }

The DatabaseInterface needs to keep an object of every type of table row class.

Must implement CreateTableRow method

e.g.:-

virtual DbiTableRow* CreateTableRow() const {
                            return new DbiDemoData1; }

The DatabaseInterface uses this method to populate results tables.

May overload the GetIndex method

As explained in section Accessing the Results Table the ordering of rows in a table is determined by the way data is written to the database. Where that does not form a natural way to access it, table row objects can declare their own index using:-

UInt_t GetIndex(UInt_t defIndex) const

DbiDemoData2 provides a rather artificial example:-

UInt_t GetIndex(UInt_t defIndex) const { return fSubSystem/10; }

and is just meant to demonstrate how a unique index could be extracted from some packed identification word.

The following is required of an index:-

  • The number must be unique within the set.

  • It must fit within 4 bytes.

    GetIndex returns an unsigned integer as the sign bit has no special significance, but it's O.K. to derive the index from a signed value, for example:-

    Int_t PlexStripEndId::GetEncoded() const
    

    would be a suitable index for tables indexed by strip end.

Must implement Fill method

This is the way table row objects get filled from a DbiResultSet that acts like an input stream. We have seen a simple example in DbiDemoData1:-

void DbiDemoData1::Fill(DbiResultSet& rs,
                        const DbiValidityRec* vrec) {

  rs >> fSubSystem >> fPedestal >> fGain1 >> fGain2;

}

However, filling can be more sophisticated. DbiResultSet provides the following services:-

      string DbiResultSet::CurColName() const;
      UInt_t DbiResultSet::CurColNum() const;
      UInt_t DbiResultSet::NumCols() const;
DbiFieldType DbiResultSet::CurColFieldType() const;

The first 3 give you the name of the current column, its number (numbering starts at one), and the total number of columns in the row. DbiFieldType can give you information about the type, concept and size of the data in this column. In particular you can see if two are compatible i.e. of the same type:-

Bool_t DbiFieldType::IsCompatible(DbiFieldType& other) const;

and if they are of the same capacity i.e. size:-

Bool_t DbiFieldType::IsSmaller(DbiFieldType& other) const;

You can create DbiFieldType objects e.g:-

DbiFieldType myFldType(Dbi::kInt)

see enum Dbi::DataTypes for a full list, to compare with the one obtained from the current row.
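
For instance, a minimal sketch of checking a column's type before streaming it, written as it might appear inside a Fill method:-

// Only stream the column if it really holds integer data; otherwise skip it.
DbiFieldType colType = rs.CurColFieldType();
DbiFieldType intType(Dbi::kInt);
if ( intType.IsCompatible(colType) ) rs >> fSubSystem;
else                                 rs.IncrementCurCol();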

In this way filling can be controlled by the names, numbers and types of the columns. The Fill method of DbiDemoData1 contains both “dumb” (take the data as it comes) and “smart” (look at the column name) code. Here is the latter:-

Int_t numCol = rs.NumCols();

//  The first column (SeqNo) has already been processed.
for (Int_t curCol = 2; curCol <= numCol; ++curCol) {
  string colName = rs.CurColName();
  if (      colName == "SubSystem" ) rs >> fSubSystem;
  else if ( colName == "Pedestal" )  rs >> fPedestal;
  else if ( colName == "Gain1" )     rs >> fGain1;
  else if ( colName == "Gain2" )     rs >> fGain2;
  else {
    LOGDEBUG1(dbi) << "Ignoring column " << curCol
                     << "(" << colName << ")"
                     << "; not part of DbiDemoData1" << endl;
    rs.IncrementCurCol();
  }
}

*Being "smart" comes at a price; if your table has many rows valid at a time,
defensive programming like this can cost performance!*

In such cases, and if the table only exists in a few variants, it's better to determine the variant and then branch to code that hardwires that form.

Other services that DbiResultSet offers are:-

UInt_t DbiResultSet::CurRowNum() const;
Bool_t DbiResultSet::IsExhausted() const;
string DbiResultSet::TableName();

These tell you the current row number, whether there is no data left and the name of the table.

Also note that it is not a rule that database columns and class data members have to be in a 1:1 correspondence. So long as the table row can satisfy its clients (see below) it can store information derived from the database table rather than the data itself.
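
As a purely hypothetical sketch (the class and the fMeanGain member are invented for illustration), a Fill method might keep only a derived quantity:-

void MyGainTable::Fill(DbiResultSet& rs,
                       const DbiValidityRec* /* vrec */) {

  Float_t gain1, gain2;
  rs >> fSubSystem >> fPedestal >> gain1 >> gain2;

  // Keep only the mean gain; clients never see the two raw columns.
  fMeanGain = 0.5 * ( gain1 + gain2 );

}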

Must implement the Store method

Similar to the Fill method, a row must know how to store itself in the database. Again, this is usually simple; you simply stream out the row elements to the stream provided:

void DbiDemoData1::Store( DbiOutRowStream& ors,
                           const DbiValidityRec* /* vrec */) const {

  ors << fSubSystem << fPedestal << fGain1 << fGain2;

}
Must implement the GetDatabaseLayout method

This method is used by a user wanting to do first-time creation of the database tables from within the code. Doing this simplifies the table creation process slightly: simply list the columns that this class requires.

std::string  DbiDemoData1::GetDatabaseLayout()
{
  std::string table_format =
    "SUBSYSTEM  int,      "
    "PEDESTAL    float,   "
    "GAIN1       float,   "
    "GAIN2       float    ";
  return table_format;
}
May overload the CanL2Cache method

As explained in section Concepts the Level 2 cache allows table loading to be speeded up by caching the query results as disk files. Only certain tables support this option which by default is disabled. To enable it the table row object overrides this method as follows:-

Bool_t CanL2Cache() const { return kTRUE; }

Only table row classes whose data members are built-in data types (ints, floats and chars) should do this. Table rows having objects or dynamic data, e.g. strings or pointers, must not claim to support L2 caching. Note that the table row doesn't need code to save/restore to the cache; this is handled by the DbiTableProxy.

Must Provide Services to its Clients
There would not be much point in its existence otherwise, would there? However it's not necessarily the case that all it does is to provide direct access to all the data that came from the table. This subject is explored in the next section.

The Dictionary files

FIXME Need to include instructions for properly doing dict.h and dict.xml files describing table rows, DbiResultPtr and DbiWriter, if I ever figure out how.

Data Encapsulation

A table row object is the gateway between a database table and the end users who want to use the data it contains. Like any good OO design, the aim should be to hide implementation and only expose the abstraction. There is nothing wrong in effectively giving a 1:1 mapping between the columns of the database table and the getters in the table row object if that is appropriate. For example, a table that gives the position of each PMT in a detector is going to have an X, Y and Z both in the database and in the getters. However at the other extreme there is calibration. It is going to be well into detector operation before the best form of calibration has been found, but it would be bad design to constantly change the table row getters. It is far better to keep the data in the database table very generic, for example:-

SeqNo      int,
SubSystem  int,
CalibForm  int,
parm0      float,
parm1      float,
parm2      float,
...

The significance of parm0,... depends on CalibForm. The table row object could then provide a calibration service:-

Float_t Calibrate(Float_t rawValue) const;

rather than expose parm0,... directly. Calibrate() would have code that tests the value of CalibForm and then uses the appropriate formula involving parm0,... Of course some validation code will want to look at the quality of the calibration by looking at the calibration constants themselves, but this too could be abstracted into a set of values that hide the details of the form of the calibration.
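
As an illustration only, a minimal sketch of such a Calibrate() method is shown below for a hypothetical table row class; the members fCalibForm and fParm0..fParm2 and the formulae are purely for example:-

Float_t MyCalibRow::Calibrate(Float_t rawValue) const {
  //  Pick the formula according to the form recorded in the table.
  switch ( fCalibForm ) {
    case 0:  return fParm0 + fParm1*rawValue;                             // linear
    case 1:  return fParm0 + fParm1*rawValue + fParm2*rawValue*rawValue;  // quadratic
    default: return rawValue;   // unknown form: return the raw value unchanged
  }
}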

However, it is strongly advised to make the raw table values available to the user.

Filling Tables

Overview

DatabaseInterface can be used to write back into any table from which it can read. To do this you need the services of a DbiWriter which is a templated class like DbiResultPtr. For example, to write DbiDemoData1 rows you need an object of the class:-

DbiWriter<DbiDemoData1>

DbiWriter only fills tables, it does not create them

Always create new tables with mysql before attempting to fill them

If you want to create the tables within the same job as the one that fills it then you can do so as follows:-

// Create a single instance of the database row, and use
// it to prime the database.  This needs only be done once.
// It will do nothing if the tables already exist.
MyRowClass dummy; // Inherits from DbiTableRow.
int db = 0;       // DB number. If 0, this data is put into the first
                  // database in the cascade;
                  // i.e. the first database in the ENV_TSQL_URL
dummy.CreateDatabaseTables(db);

In outline the filling procedure is as follows:-

  1. Decide the validity range of the data to be written and store it in a ContextRange object.
  2. Instantiate a DbiWriter object using this ContextRange object together with an aggregate number and task. Aggregate numbers are discussed below.
  3. Pass filled DbiTableRow sub-class objects (e.g. DbiDemoData1) to the DbiWriter. It in turn will send these objects their Store message that performs the inverse of the Fill message. DbiWriter caches the data but performs no database I/O at this stage.
  4. Finally send the DbiWriter its Close message which triggers the output of the data to the database.

The fact that I/O does not occur until all data has been collected has a couple of consequences:-

  • It minimises the chances of writing bad data. If you discover a problem with the data while DbiWriter is assembling it you use DbiWriter‘s Abort method to cancel the I/O. Likewise if DbiWriter detects an error it will not perform output when Close is invoked. Destroying a DbiWriter before using Close also aborts the output.

  • Although DbiWriter starts life as very lightweight, it grows as the table rows are cached.

    Be very sure that you delete the DbiWriter once you have finished with it or you will have a serious memory leak!

    To cut down the risk of a memory leak, you cannot copy construct or assign to DbiWriter objects.

Aggregate Numbers

As explained in Concepts (see section Concepts) some types of data are written for the entire detector as a single logical block. For example the way PMT pixels map to electronics channels might be written this way. On the other hand if it is written in smaller, sub-detector, chunks then it is Aggregated. For example light injection constants come from pulser data and it is quite possible that a calibration run will only pulse some LEDs and so only part of a full detector set of constants gets written to the database for the run. Each chunk is called an aggregate and given an aggregate number which defines the sub-section of the detector it represents. For pulser data, the aggregate number will probably be the logical (positional) LED number. A single DbiWriter can only write a single aggregate at a time, for every aggregate can in principle have a different validity range. For unaggregated data the aggregate number is -1; for aggregated data numbers start at 0,1,2...

The way that the DatabaseInterface assembles all valid data for a given context is as follows:-

  • First it finds all aggregate records that are currently valid.
  • For each aggregate number it finds the best (most recently created) record and loads all data associated with it.

This has two consequences:-

  • For a given table, the regime whereby the data is organised into aggregates should remain constant throughout all records in the table. If absolutely necessary the regime can be changed, but no records must have validities that span the boundary between one regime and another. Were that to be the case the same entry could appear in two valid records with different aggregate numbers and end up appearing in the table multiple times. The system checks that this does not happen by asking each row to confirm its aggregate number on input.
  • For any given context it is not necessary for all detector elements to be present; just the ones that are really in the detector at that time. For example, the Far detector will grow steadily over more than a year and this will be reflected in some database tables with the number of valid aggregates similarly growing with time. The aggregates present can appear in any order in the database tables; the interface will assemble them into the proper order as it loads them.

It is perfectly possible that a calibration procedure might produce database data for multiple aggregates at a single pass. If you are faced with this situation and want to write all aggregates in parallel, then simply have a vector of DbiWriter‘s indexed by aggregate number and pass rows to the appropriate one, as sketched below. See DbiValidate::Test_6() for an example of this type of parallel processing.
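
A minimal sketch of this pattern is shown below; it assumes three aggregates and that range, subsite and task have already been set up as in the Simple Example that follows. Since DbiWriter objects cannot be copied, pointers are held:-

std::vector< DbiWriter<DbiDemoData1>* > writers;
for (Int_t aggNo = 0; aggNo < 3; ++aggNo) {
  writers.push_back(new DbiWriter<DbiDemoData1>(range,aggNo,subsite,task));
}

//  Route each filled row to the writer for its aggregate
//  (row is a filled DbiDemoData1 object).
*writers[row.GetAggregateNo()] << row;

//  Once all rows have been passed, close and delete every writer.
for (UInt_t i = 0; i < writers.size(); ++i) {
  writers[i]->Close();
  delete writers[i];
}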

Simple Example

We will use the class DbiDemoData1 to illustrate each of the above steps.

  1. Set up ContextRange object. — Typically the ContextRange will be based on the Context for the event data that was used to generate the database data that is to be stored. For our example we will assume that DbiDemoData1 represents calibration data derived from event data. It will be valid for 1 week from the date of the current event and be suitable for the same type of data.

    Context now;  // Event context e.g. CandHandle::GetContext()
    TimeStamp start = now.GetTimeStamp();
    // Add 7 days (in secs) to get end date.
    time_t vcSec = start.GetSec() + 7*24*60*60;
    TimeStamp    end(vcSec,0);
    // Construct the ContextRange.
    ContextRange      range(now.GetDetector(),
                        now.GetSimFlag(),
                        start,
                        end,
                        "Demo");
    
  2. Instantiate a DbiWriter. — Create a DbiDemoData1 writer for unaggregated data task 0.

    Int_t aggNo = -1;
    Dbi::SubSite subsite = 0;
    Dbi::Task task = 0;
    // Decide a creation date (default value is now)
    TimeStamp create;
    DbiWriter<DbiDemoData1> writer(range,aggNo,subsite,task,create);
    
  3. Pass filled DbiDemoData1 objects.

    // Create some silly data.
    DbiDemoData1 row0(0,10.,20.,30.);
    DbiDemoData1 row1(0,11.,21.,31.);
    DbiDemoData1 row2(0,12.,22.,32.);
    
    // Store the silly data.
    writer << row0;
    writer << row1;
    writer << row2;
    

    The DbiWriter will call DbiDemoData1‘s Store method.

    Again notice that the SeqNo, which is part of the table row, but not part of the class data, is silently handled by the system.

  4. Send the DbiWriter its Close message.

    writer.Close();
    

Using DbiWriter

  • The DbiWriter‘s constructor is:-

    DbiWriter(const ContextRange& vr,
              Int_t aggNo,
              Dbi::SubSite subsite= 0,
              Dbi::Task task = 0,
              TimeStamp versiondate = TimeStamp(0,0),
              UInt_t dbNo = 0,
              const std::string& LogComment = "",
              const std::string& tableName = ""
              );
    
  • The first argument determines the validity range of the data to be written,

    i.e. what set of Contexts it is suitable for. You can control the date range as well as the type(s) of data and detector.

  • The second argument is the aggregate number. For unaggregated data it is -1,

    for aggregated data it is a number in the range 0..n-1 where n is the number of aggregates.

  • The third argument is the SubSite of the data. It has a default of 0.

  • The fourth argument is the Task of the data. It has a default of 0.

  • The fifth argument supplies the data’s version date. The default is a

    special date and time which signifies that DbiWriter is to use Overlay Version Dates (see the Concepts section on Overlay Version Dates). Alternatively, at any time before writing data, use the method:-

    void SetOverlayVersionDate();
    

    to ensure that DbiWriter uses Overlay Version Dates.

  • The sixth argument defines which entry in the database cascade the data is

    destined for. By default it is entry 0 i.e. the highest priority one.

    Caution: Supplying the entry number assumes that at execution time the cascade is defined in a way that is consistent with the code that is using the DbiWriter. As an alternative, you can supply the database name (e.g. offline) if you know it and are certain it will appear in the cascade.

  • The seventh argument supplies a comment for the update. Alternatively, at any

    time before writing data, use the method:-

    void  SetLogComment(const std::string& LogComment)
    

    Update comments are ignored unless writing to a Master database (i.e. one used as a source database e.g. the database at FNAL), and in this case a non-blank comment is mandatory unless the table is exempt. Currently only DBI, DCS and PULSER tables are exempt.

    If the first character of the string is the ‘@’ character then the rest of the string will be treated as the name of a file that contains the comment. If using DbiWriter to write multiple records to the same table as part of a single update then only create a single DbiWriter and use the Open method to initialise for the second and subsequent records. That way a single database log entry will be written to cover all updates.
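
    For example (the file path is purely illustrative):-

    writer.SetLogComment("@/home/dyb/work/update_comment.txt");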

  • The last argument supplies the name of the table to be written to. Leaving it

    blank will mean that the default table will be used i.e. the one whose name matches, apart from case, the name of the object being stored. Only use this feature if the same object can be used to fill more than one table.

  • Having instantiated a DbiWriter, filled table row objects must be passed using the operator:-

    DbiWriter<T>& operator<<(const T& row);
    

    for example:-

    writer << row0;
    writer << row1;
    writer << row2;
    

    DbiWriter calls the table row’s Store method, see the next section. It also performs some basic sanity checks:-

  • The row’s aggregate number matches its own.

  • The type of the data written is compatible with database table.

    If either check fails then an error message is output, the data is marked as bad, and the subsequent Close method will not produce any output.

  • Once all rows for the current aggregate have been passed to DbiWriter the data can be output using:-

    Bool_t Close();
    

    which returns true if the data is successfully output.

    Alternatively, you can write out the data as a DBMauto update file by passing the name of the file to the Close command:-

    Close("my_dbmauto_update_file.dbm");
    
  • On output a new sequence number is chosen automatically. By default, if writing permanent data to an authorising database or if writing to a file, a global sequence number will be allocated. In all other cases a local sequence number will be used. For database I/O, as opposed to file I/O, you can change this behaviour with

    void SetRequireGlobalSeqno(Int_t requireGlobal)
    
    Where requireGlobal
      > 0  Must be global
      = 0  Must be global if writing permanent data to an authorising database
      < 0  Must be local
    
  • At any time before issuing the Close command you can cancel the I/O by either:-

  • Destroying the DbiWriter.

  • Using the method:-

    void Abort();
    
  • If you want to, you can reuse a DbiWriter by using:-

    Bool_t Open(const ContextRange& vr,
                Int_t aggNo,
                Dbi::Task task = 0,
                TimeStamp versionDate = TimeStamp(),
                UInt_t dbNo = 0);
    

    The arguments have the same meaning as for the constructor. An alternative form of the Open statement allows the database name to be supplied instead of its number. If the DbiWriter is already assembling data then the Close method is called internally to complete the I/O. The method returns true if successful. As explained above, the Open method must be used if writing multiple records to the same table as part of a single update, so that a single database log entry will be written to cover all updates. A sketch of this pattern follows.
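
    A minimal sketch of this pattern is shown below; range1, range2, rowA and rowB are assumed to have been prepared already and are purely illustrative:-

    DbiWriter<DbiDemoData1> writer(range1,aggNo);
    writer << rowA;
    writer.Close();               // first record written

    writer.Open(range2,aggNo);    // re-initialise for the second record
    writer << rowB;
    writer.Close();               // second record written; a single log
                                  // entry covers both updates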

Table Row Responsibilities

All DbiTableRow sub-class objects must support the input interface accessed through DbiResultPtr. The responsibilities that this implies are itemised in section Designing a Table Row Class. The output interface is optional; the responsibilities listed here apply only if you want to write data to the database using this interface.

Must override GetAggregateNo method if aggregated
DbiTableRow supplies a default that returns -1. The GetAggregateNo method is used to check that table row objects passed to a particular DbiWriter have the right aggregate number.
Must override Store Method

The Store method is the inverse to Fill although it is passed a DbiOutRowStream reference:-

void Store(DbiOutRowStream& ors, const DbiValidityRec* vrec) const;

rather than a DbiResultSet reference. Both these classes inherit from DbiRowStream so the same set of methods:-

      string DbiResultSet::CurColName() const;
      UInt_t DbiResultSet::CurColNum() const;
      UInt_t DbiResultSet::NumCols() const;
DbiFieldType DbiResultSet::CurColFieldType() const;
      UInt_t DbiResultSet::CurRowNum() const;
      string DbiResultSet::TableName();

are available. So, as with the Fill method, there is scope for Store to be “smart”. The quotes are there because it often does not pay to be too clever! Also like the Fill method it is passed a DbiValidityRec pointer (which is only zero when filling DbiValidityRec objects) so that the validity of the row can be accessed if required.

Creating and Writing Temporary Tables

It is possible to create and write temporary tables during execution. Temporary tables have the following properties:-

  • For the remainder of the job they look like any other database table, but they are deleted when the job ends.
  • They completely obscure all data from any permanent table with the same name in the same database. Contrast this with the cascade, which only obscures data with the same validity.
  • They are local to the process that creates them. Even the same user running another job using the same executable will not see these tables.

Temporary tables are a good way to try out new types of table, or different types of data for an existing table, without modifying the database. Writing data is as normal, by means of a DbiWriter; however before you write data you must locate a database in the cascade that will accept temporary tables and pass it a description of the table. This is done using the DbiCascader method CreateTemporaryTable. You can access the cascader by first locating the singleton DbiTableProxyRegistry which is in overall charge of the DatabaseInterface. The following code fragment shows how you can define a new table for DbiDemoData1:-

#include "DatabaseInterface/DbiCascader.h"
#include "DatabaseInterface/DbiTableProxyRegistry.h"

...

//  Ask the singleton DbiTableProxyRegistry for the DbiCascader.
  const DbiCascader& cascader
               = DbiTableProxyRegistry::Instance().GetCascader();

//  Define the table.
  string tableDescr = "(SEQNO INT,   SUBSYSTEM  INT, PEDESTAL FLOAT,"
                      " GAIN1 FLOAT, GAIN2 FLOAT )";
//  Ask the cascader to find a database that will accept it.
  Int_t dbNoTemp = cascader.CreateTemporaryTable("DbiDemoData1",
                                                 tableDescr);

  if ( dbNoTemp < 0 ) {
     cout << "No database will accept temporary tables." << endl;
  }

You pass CreateTemporaryTable the name of the table and its description. The description is a parenthesised comma-separated list. It follows the syntax of the MySQL CREATE TABLE command, see section MySQL Crib.

In principle not every database in the cascade will accept temporary tables, so the cascader starts with the highest priority one and works down until it finds one, returning its number in the cascade. It returns -1 if it fails. For this to work properly the first entry in the cascade must accept temporary tables so that it will be taken in preference to the true database. It is recommended that the first entry be the temp database, for everyone has write-access to that and write-access is needed to create even temporary tables. So a suitable cascade might be:-

setenv ENV_TSQL_URL "mysql://pplx2.physics.ox.ac.uk/temp;\
                     mysql://pplx2.physics.ox.ac.uk/offline"

Having found a database and defined the new or replacement table, you can now create a DbiWriter and start writing data as described in section Filling Tables. You have to make sure that the DbiWriter will output to the correct database, which you can either do by specifying it using the 5th arg of its constructor:-

DbiWriter(const ContextRange& vr,
          Int_t aggNo,
          Dbi::Task task = 0,
          TimeStamp versionDate = TimeStamp(),
          UInt_t dbNo = 0);

or alternatively you can set it after construction:-

DbiWriter<DbiDemoData1> writer(range,aggNo);
writer.SetDbNo(dbNoTemp);

As soon as the table has been defined it will, as explained above, completely replace any permanent table in the same database with the same name. However, if there is already data in the cache for the permanent table then it may satisfy further requests for data. To prevent this from happening you can clear the cache as described in the next section.

Do NOT write permanent data to any temporary database for it could end up being used by anyone who includes the database for temporary tables. Database managers may delete any permanent tables in temporary databases without warning in order to prevent such problems.

Clearing the Cache

Normally you would not want to clear the cache; after all it is there to improve performance. However if you have just created a temporary table as described above, and it replaces an existing table, then clearing the cache is necessary to ensure that future requests for data are not satisfied from the now out-of-date cache. Another reason why you may want to clear the cache is to study database I/O performance.

Although this section is entitled Clearing the Cache, you cannot actually do that as the data in the cache may already be in use and must not be erased until its clients have gone away. Instead the data is marked as stale, which is to say that it will be ignored for all future requests. Further, you don’t clear the entire cache, just the cache associated with the table that you want to refresh. Each table is managed by a DbiTableProxy that owns a DbiCache. Both DbiWriter and DbiResultPtr have a TableProxy method to access the associated DbiTableProxy. The following code fragment shows how to set up a writer and mark its associated cache as stale:-

DbiWriter<DbiDemoData1> writer(range,aggNo);
writer.SetDbNo(dbNoTemp);
writer.TableProxy().GetCache()->SetStale();

ASCII Flat Files and Catalogues

Overview

ASCII flat files and catalogues provide a convenient way to temporarily augment a database with additional tables under your control. A flat file is a file that contains, in human readable form, the definition of a table and its data. It can be made an entry in a cascade and, by placing it before other entries, allows you to effectively modify the database just for the duration of a single job. As has already been explained, for each Main Data Table there is also an auxiliary Context Range Table, so you need 2 entries in the cascade for each table you want to introduce. The problem with this scheme is that, if introducing a number of tables, the cascade could get rather large. To avoid this catalogues are used. A catalogue is actually nothing more than a special ASCII flat file, but each row of its data is a URL for another ASCII flat file that becomes part of the same cascade entry. In this way a single cascade entry can consist of an arbitrary number of files.

Flat Files

An ASCII flat file defines a single database table.

Format

The format is sometimes referred to as Comma Separated Value (CSV). Each line in the file corresponds to a row in the table. As you might suspect, values are separated by commas, although you can add additional white space (tabs and spaces) to improve readability (but heed the caution in section Example). The first row is special: it contains the column names and types. The types must be valid MySQL types, see table Recommended table row and database column type mappings for some examples. If the special row is omitted or is invalid then the column names are set to C1, C2, ... etc. and all types are set to string (TEXT). Here is a simple example of a CSV file:-

SeqNo int, Pedestal float, SubSystem int, Gain1 float, Gain2 float
1,    1.0,    0,    10.,    100.
1,    1.1,    1,    11.,    110.
1,    1.2,    2,    12.,    120.
1,    1.3,    3,    13.,    130.

It is conventional to use the file extension .csv, but it is not compulsory.

If any value is a string or a date, it must be delimited by double quotes.
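
For example (the column names and values are purely illustrative):-

SeqNo int, Updated datetime, Comment text
1, "2001-12-31 04:05:06", "First calibration pass"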

URL

The database URL is based on the standard one extended by adding the suffix

#absolute-path-to-file

For example:-

mysql://coop.phy.bnl.gov/temp#/path/to/MyTable.csv

The table name is derived from the file name after stripping off the extension. In this example, the table name will be MyTable.

Catalogues

These are special types of ASCII Flat File. Their data are URLs to other flat files. You cannot nest them, i.e. one catalogue cannot contain a URL that is itself a catalogue.

Format

The first line of the file just contains the column name “name”. The remaining lines are URLs of the flat files. Here is a simple example:-

name
file:/home/dyb/work/MyData.csv
file:/home/dyb/work/MyDataVld.csv
file:$MY_ENV/MyDataToo.csv
file:$MY_ENV/MyDataTooVld.csv

This catalogue defines two tables MyData and MyDataToo, each with its associated auxiliary validity range table. Note that file names must be absolute but can begin with an environmental variable.

URL

The URL is identical to any other flat file with one additional constraint: the extension must be .cat or .db. For example:

mysql://coop.phy.bnl.gov/dyb_offline#/home/dyb/work/MyCatalogue.db

Example

The stand-alone testing of the Database Interface includes an example of an ASCII Catalogue. The URL of the cascade entry is:-

mysql://coop.phy.bnl.gov/dyb_test#$DATABASEINTERFACE_ROOT/DbiTest/scripts/DemoASCIICatalogue.db

If you look at the file:-

$DATABASEINTERFACE_ROOT/DbiTest/scripts/DemoASCIICatalogue.db

you will see it contains 4 lines, defining the tables DEMOASCIIDATA (a Detector Descriptions table) and DEMOASCIICONFIG (an Algorithm Configurations table):-

file:$DBITESTROOT/scripts/DEMOASCIIDATA.csv
file:$DBITESTROOT/scripts/DEMOASCIIDATAVld.csv
file:$DBITESTROOT/scripts/DEMOASCIICONFIG.csv
file:$DBITESTROOT/scripts/DEMOASCIICONFIGVld.csv

In both cases, the auxiliary validity range table defines a single validity range, although there is no reason why it could not have defined any number. For the DEMOASCIIDATA, there are 5 rows, a header row followed by 4 rows of data:-

SEQNO INT, UNWANTED INT, PEDESTAL FLOAT, SUBSYSTEM INT, GAIN1 FLOAT, GAIN2 FLOAT
1,99,1.0,0,10.,100.
1,99,1.1,1,11.,110.
1,99,1.2,2,12.,120.
1,99,1.3,3,13.,130.

For the DEMOASCIICONFIG table, there are only two rows:-

SEQNO INT, CONFIGSTRING TEXT
1,"mybool=1 mydouble=1.23456789012345678e+200 mystring='This is a string' myint=12345"

Caution: don’t have any white space between the comma and the leading double quote of the configuration string.

MySQL Crib

This provides the absolute bare minimum to install, manage and use a MySQL database in the context of the DatabaseInterface.

Introduction

A good book on MySQL is:-

MySQL by Paul DuBois, Michael Widenius. New Riders Publishing; ISBN: 0-7357-0921-1

Running mysql

mysql is a utility, used both by system administrators and users to interact with MySQL database. The command syntax is:-

mysql [-h host_name] [-u user_name] [-pyour_pass]

if you are running on the server machine, with your Unix login name and no password, then:-

mysql

is sufficient. To exit type:-

\q

Note: most mysql commands are terminated with a semi-colon. If nothing happens when you type a command, the chances are that mysql is still waiting for it, so type it and press return again.

System Administration

This has to be done as root. MySQL allows the system administrator to control access to databases on a user-by-user basis. Here are some example commands:-

create database dyb_offline;
grant all on    dyb_offline.*       to  smart@coop.phy.bnl.gov;
grant all on    dyb_offline.*       to  smart@"%";
grant select on dyb_offline.Boring  to  dumb@coop.phy.bnl.gov;
\q
  • The first line creates a new database called dyb_offline. With MySQL you can have multiple databases.
  • The next two lines grant user smart, either logged in locally to the server or remotely from anywhere on the network, all privileges to all tables in that database.
  • The next line grants user dumb, who has to be logged in locally, select (i.e. read) access to the table Boring in the same database.

Selecting a Database

Before you can use mysql to create, fill or examine a database table you have to tell it what database to use. For example:-

use dyb_offline

‘use’ is one of the few commands that does not have a trailing semi-colon.

Creating Tables

The following commands create, or recreate, a table and display a description of it:-

drop table if exists DbiDemoData1;
create table DbiDemoData1(
    SeqNo      int,
    SubSystem  int,
    Pedestal   float,
    Gain1      float,
    Gain2      float
 );
describe DbiDemoData1;

See table Recommended table row and database column type mappings for a list of MySQL types that the DatabaseInterface currently supports.

Filling Tables

The following commands add data from the file DemoData1.dat to an existing table:-

load data local infile 'DemoData1.dat' into table DbiDemoData1;

Each line of the file corresponds to a row in the table. Columns should be separated with tabs. Table Example data formats shows typical formats of the various data types.

Example data formats.

MySQL Type        Example Value
CHAR              a
TINYINT           -128
SMALLINT          -32768
INT or INTEGER    -2147483647
FLOAT             -1.234567e-20
DOUBLE            1.23456789012345e+200
TEXT              'This is a string'
DATETIME          '2001-12-31 04:05:06'
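
For example, a DemoData1.dat file for the DbiDemoData1 table created above might contain (values purely illustrative; columns separated by tabs):-

1	0	1.0	10.	100.
1	1	1.1	11.	110.
1	2	1.2	12.	120.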

Making Queries

Here is a sample query:-

select * from DbiDemoData2Validity where
    TimeStart <= '2001-01-11 12:00:00'
and TimeEnd    > '2000-12-22 12:00:00'
and SiteMask & 4
order by TimeStart desc
;

Performance

Holding Open Connections

Connections to the database are either permanent, i.e. open all the time, or temporary, i.e. they are closed as soon as an I/O operation is complete. A connection is made permanent if:-

  • Connecting to an ASCII flat file database, as re-opening such a database would involve re-loading all the data.
  • Temporary data is written to the database, for such data would be lost if the connection were closed.

In all other cases the connection is temporary so as to minimise resources (and in the case of ORACLE, resources that have to be paid for!). For normal operations this adds little overhead as typically there are several major database reads at the start of a production job after which little or no further database I/O occurs. However if you require the connection to remain open throughout the job then you can force any entry in the cascade to be permanent. The following code sets entry 0 in the cascade to have a permanent connection:-

#include "DatabaseInterface/DbiCascader.h"
#include "DatabaseInterface/DbiTableProxyRegistry.h"


//  Ask the singleton DbiTableProxyRegistry for the DbiCascader.
  const DbiCascader& cascader
               = DbiTableProxyRegistry::Instance().GetCascader();
// Request that entry 0 is permanently open.
  cascader.SetPermanent(0);

Note that this won’t open the connection but will prevent it from closing after its next use.

If you want all connections to remain open this can be set through the configuration parameter MakeConnectionsPermanent. See section MakeConnectionsPermanent.

Truncated Validity Ranges

Standard context specific queries are first trimmed to a time window to limit the number of Vld records that have to be analysed. Having established the best data, a further 4 calls to query the Vld table are made to determine the full validity. For data with long validities, these extra calls are worthwhile as they can significantly increase the lifetime of the results. However there are two cases where they should not be used:-

  • For data that changes at high frequency (minutes or hours rather than days) it may waste time doing the extra searches, although the results would be valid.
  • For sparse aggregation - see Simple, Compound and Aggregated. The algorithm opens up the window on the basis of the aggregates present at the supplied context so won’t take account of aggregates not present and might over-estimate the time window.

The following DbiResultPtr methods support this request:-

DbiResultPtr(const Context& vc,
             Dbi::Task task = Dbi::kDefaultTask,
             Dbi::AbortTest abortTest = Dbi::kTableMissing,
             Bool_t findFullTimeWindow = true);

DbiResultPtr(const string& tableName,
             const Context& vc = Dbi::fgDefaultContext,
             Dbi::Task task = Dbi::kDefaultTask,
             Dbi::AbortTest abortTest = Dbi::kTableMissing,
             Bool_t findFullTimeWindow = true);
UInt_t NewQuery(Context vc,
             Dbi::Task task=0,
             Bool_t findFullTimeWindow = true);

It is selected by passing in the value false for findFullTimeWindow.
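
For example, a context query that skips the extra Vld searches might look like this (vc is assumed to be the current Context):-

DbiResultPtr<DbiDemoData1> myResPtr(vc,
                                    Dbi::kDefaultTask,
                                    Dbi::kTableMissing,
                                    false);  // findFullTimeWindow = false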

Timing

DbiTimerManager is a static object that provides performance printout when enabled. By default it is enabled but can be disabled by:-

DbiTimerManager::gTimerManager.Enable(false);
