Other analysis/datafiles

Table Of Contents

Previous topic

Introduction

Next topic

NuWa Basics

This Page

Daya Bay Links

Content Skeleton

Daya Bay Data Files

Daya Bay uses ROOT files for data analysis. Basic analysis can be done with these files using only the ROOT program (http://root.cern.ch). For more complex analysis, see the Section NuWa Basics on using NuWa. If you do not have ROOT installed on your computer, you can access it on the computer clusters as part of the NuWa software (Sec. Loading the NuWa software).

Opening data files

Daya Bay data files can be opened using the ROOT program,

shell> root
root[0] TFile f("recon.NoTag.0002049.Physics.DayaBay.SFO-1._0001.root");
root[1] TBrowser b;
root[1] b.BrowseObject(&f);

The ROOT browser window will display the contents of the file, as shown in Fig. fig:tesbrowser. Event data is found under the path /Event, as summarized in Table Standard paths for Event Data. A section on each data type is included in this document. Simulated data files may include additional data paths containing “truth” information. A complete list of data paths are given in Sec. Data File Contents.

../../_images/tesBrowser.png

fig:tesbrowser

Data File Contents
Standard paths for Event Data
  Real and Simulated Data  
/Event/Readout Raw data produced by the experiment Sec. Raw DAQ Data
/Event/CalibReadout Calibrated times and charges of PMT and RPC hits Sec. Calibrated Data
/Event/Rec Reconstructed vertex and track data Sec. Reconstructed Data
  Simulated Data Only  
/Event/Gen True initial position and momenta of simulated particles  
/Event/Sim Simulated track, interactions, and PMT/RPC hits (Geant)  
/Event/Elec Simulated signals in the electronics system  
/Event/Trig Simulated signals in the trigger system  
/Event/SimReadout Simulated raw data  

A set of standard data ROOT files will be maintained on the clusters. The file prefix is used to identify the contents of the file, as shown in Table Standard NuWa Event Data files. The location of these files on each cluster are listed in Section Standard Data Files.

Standard NuWa Event Data files
File Prefix Readout CalibReadout Rec Coinc Spall Simulation Truth (Gen,``Sim``)
daq. yes         optional
calib. optional yes       optional
recon. some events some events yes     optional
coinc. some events some events some events yes   optional
spall. some events some events some events   yes optional

Each data paths in the ROOT file contains ROOT trees. You can directly access a ROOT tree,

root[0] TFile f("recon.NoTag.0005773.Physics.SAB-AD2.SFO-1._0001.root");
root[1] TTree* AdSimple = (TTree*)f.Get("/Event/Rec/AdSimple");

The next section gives examples of working with these ROOT Trees. See the ROOT User’s Guide for more details on working with Trees, http://root.cern.ch/download/doc/12Trees.pdf.

Histogramming data

Data can be histogrammed by selecting items in the TBrowser, or by using the Draw() function of the tree. For example, Figure fig:reconbrowser shows the data contained in a reconstructed event.

../../_images/reconBrowser.png

fig:reconbrowser

Example Reconstructed Data

The Draw() function allows the addition of selection cuts. For example, we can draw the reconstructed energy for all events where the reconstruction was successful by selecting events with energyStatus==1 and energy < 15 MeV,

root[2] AdSimple->Draw("energy","energyStatus==1 && energy<15");

Two- and three-dimensional histograms can be drawn by separating the variables with a colon. The third colz argument will use a color scale for a two- dimensional histogram. Fig. fig:reconhists shows the resulting histograms.

root[3] AdSimple->Draw("z:sqrt(x*x+y*y)","positionStatus==1","colz");
../../_images/energyHist.png

fig:reconhists

../../_images/positionHist.png

fig:reconhists

Example Histograms

A weighting can be added to each entry in histogram by multiplying your selection by the weighting factor (i.e. weight*(selection). This can be used to draw the calibrated PMT charge distribution in AD2 (Fig. figs:calibhists.) The charge distribution for a specfic event can be selected using the event number.

root[1] TTree* CalibReadoutHeader = (TTree*)f.Get("/Event/CalibReadout/CalibReadoutHeader");
root[2] CalibReadoutHeader->Draw("ring:column",
                                 "chargeAD*(detector==2)","colz")
root[3] CalibReadoutHeader->Draw("ring:column",
                                 "chargeAD*(detector==2 && eventNumber==12345)","colz")
../../_images/calibChargeAdHist.png

fig:calibhists

../../_images/calibChargeAdOneEventHist.png

fig:calibhists

The calibrated PMT charge (in photoelectrons) for all events and for an individual event.

The trigger time is divided into two parts; a count of seconds from January 1970 (i.e. unixtime), and a precise count of nanoseconds from the last second. To draw the absolute trigger time, you must add these two counts. Figure fig:chargevstimehist shows a histogram of the calibrated PMT hit charges versus trigger time [1]. The ROOT Sum$() function will histogram the sum of a quantity for each event; it can be used to histogram the sum of charge over all AD PMTs.

root[2] CalibReadoutHeader->Draw("chargeAD:triggerTimeSec+triggerTimeNanoSec*1e-9",
        "(detector==2 && ring==4 && column==15 && chargeAD>-3 && chargeAD<7)",
        "colz");
root[3] CalibReadoutHeader->Draw("Sum$(chargeAD):triggerTimeSec+triggerTimeNanoSec*1e-9",
                                  "detector==2 && Sum$(chargeAD)<1500","colz");
../../_images/chargeVsTimeHist.png

fig:chargevstimehist

../../_images/sumChargeVsTimeHist.png

fig:chargevstimehist

The calibrated charge (in photoelectrons) for one PMT and for the sum of all PMTs versus trigger time.

Histogramming Raw DAQ data

To properly histogram raw DAQ data from /Event/Readout, you will need to use part of the Daya Bay software in addition to ROOT. You must load the NuWa software, as described in Sec. Loading the NuWa software. Running load.C will allow you to call functions in your Draw() command. For example, you can call the function to draw the raw fine-range ADC and TDC distributions for PMT electronics board 6, connector 5 (Fig. fig:rawhists.) The selection on context.mDetId==2 selects the detector AD2; Sec. Conventions and Context lists the allowed detector and site IDs. If you have a raw .data file produced by the DAQ, see section Conversion from .data to wrap it in a ROOT tree so that you can directly histogram the raw data.

root[0] .x $ROOTIOTESTROOT/share/load.C
root[1] TFile f("daq.NoTag.0005773.Physics.SAB-AD2.SFO-1._0001.root");
root[2] TTree* ReadoutHeader = (TTree*)f.Get("/Event/Readout/ReadoutHeader");
root[3] ReadoutHeader->Draw("daqPmtCrate().adcs(6,5,1).value()","context.mDetId==2");
root[4] ReadoutHeader->Draw("daqPmtCrate().tdcs(6,5,1).value()","context.mDetId==2");
../../_images/rawAdcsHist.png

fig:rawhists

../../_images/rawTdcsHist.png

fig:rawhists

Histograms of Raw fine-range ADC and TDC values from PMT FEE board 6, connector 5.

Some ROOT Tree Tricks

A ROOT TChain can be used to combine the trees of the same path from multiple files into one large tree. For example, if a data run produced two files, you can combine the trees from these files:

root[0] TChain AdSimple("/Event/Rec/AdSimple");
root[1] AdSimple.Add("recon.NoTag.0005773.Physics.SAB-AD2.SFO-1._0001.root");
root[2] AdSimple.Add("recon.NoTag.0005773.Physics.SAB-AD2.SFO-1._0002.root");
root[3] AdSimple.Draw("energy","energyStatus==1 && detector==2");

To combine all the variables from trees at different data paths into a single tree, you can use the TTree::AddFriend() function. This can be used to histogram or select using variables from both trees. This should only be done for trees that are synchronized. The raw, calibrated, and reconstructed data are generally synchronized, as long as the data has not been filtered. The simulated truth trees at /Event/Gen and /Event/Sim are generally not synchronized with the data trees since one simulated event may produce an arbitary number of triggered readouts.

root[1] TTree* CalibReadoutHeader = (TTree*)f.Get("/Event/CalibReadout/CalibReadoutHeader");
root[2] TTree* AdSimple = (TTree*)f.Get("/Event/Rec/AdSimple");
root[3] AdSimple->AddFriend( CalibReadoutHeader );
root[4] AdSimple->Draw("energy:nHitsAD","detector==2","colz");

See the ROOT User’s Guide for more details on working with Trees, http://root.cern.ch/download/doc/12Trees.pdf.

Analysis Examples (or A Treatise on Cat-skinning)

What is the best / simplest / fastest way for me to examine event data and generate my histograms?

If this is your question, then please read this section. As discussed in the preceding sections, you can directly use ROOT to inspect NuWa event data files. Within ROOT, there are a few different methods to process event data. Alternatively, you can use the full power NuWa to process data. To demonstrate these different methods, a set of example scripts will be discussed in this section. Each example script generates the exact same histogram of number of hit PMTs versus reconstructed energy in the AD, but uses a different methods. Each ROOT script shows how to “chain” trees from multiple files, and how to “friend” data trees from the same file. All example scripts can be found in the dybgaudi:Tutorial/Quickstart software package.

  • dybTreeDraw.C: ROOT script using TTree::Draw()
  • dybTreeGetLeaf.C: ROOT script using TTree::GetLeaf()
  • dybTreeSetBranch.C: ROOT script using TTree::SetBranchAddress()
  • dybNuWaHist.py: NuWa algorithm using the complete data classes

The example dybTreeDraw.C is the simplest approach; it is recommended that you try this method first when generating your histograms. If you plan to include your algorithm as part of standard data production, you will eventually need to use a NuWa algorithm such as dybNuWaHist.py. The other two methods are only recommended for special circumstances. A detailed description of the advantages and disadvantages of each approach are provided in the following sections.

dybTreeDraw.C

This is the easiest approach and usually requires the least programming. Please consider using this approach first if possible.

Advantages:

  • Simple to run
  • Requires the least programming
  • Easy for others to understand and reproduce
  • Allows chaining and friending of data files

Disadvantages:

  • Slower when you need to make many histograms
  • Some cuts or variables cannot be expressed in a draw command
  • No access to geometry, database, other external data
  • Cannot be integrated with production analysis job

To run this example, use the following approach:

root [0] .L dybTreeDraw.C+
root [1] dybTreeDraw("recon*.root")

The key lines from the script are:

// Fill histograms
// AD#1
reconT.Draw("calibStats.nHit:energy>>nhitVsEnergyAD1H",
         "context.mDetId==1 && energyStatus==1");
// AD#2
reconT.Draw("calibStats.nHit:energy>>nhitVsEnergyAD2H",
         "context.mDetId==2 && energyStatus==1");

dybGetLeaf.C

There are some cases where the variables and cuts cannot be expressed in a simple TTree::Draw() command. Is this case, using TTree::GetLeaf() is an alternative. This is also a better alternative for those familiar with TSelector or TTree::MakeClass, since it allows chaining and friending of data files.

Advantages:

  • Fairly simple to run
  • Requires some minimal programming
  • Allows chaining and friending of data files

Disadvantages:

  • No access to geometry, database, other external data
  • Cannot be integrated with production analysis job

To run this example, use the following approach:

root [0] .L dybTreeGetLeaf.C+
root [1] dybTreeGetLeaf("recon*.root")

The key lines from the script are:

// Process each event
int maxEntries=reconT.GetEntries();
for(int entry=0;entry<maxEntries;entry++){

  // Get next event
  reconT.GetEntry(entry);

  // Get event data
  int detector = (int) reconT.GetLeaf("context.mDetId")->GetValue();
  int energyStatus = (int) reconT.GetLeaf("energyStatus")->GetValue();
  double energy = reconT.GetLeaf("energy")->GetValue();
  int nHit = (int)reconT.GetLeaf("calibStats.nHit")->GetValue();

  // Fill histograms
  if(energyStatus==1){ // Reconstruction was successful
    if(detector==1){
   // AD#1
   nhitVsEnergyAD1H->Fill(energy,nHit);
    }else if(detector==2){
   // AD#2
   nhitVsEnergyAD2H->Fill(energy,nHit);
    }
  }
}

dybTreeSetBranch.C

Use this approach only if you really need the fastest speed for generating your histograms, and cuts cannot be expressed in a simple TTree::Draw() command. The example script relies on TTree::SetBranchAddress() to explicitly manage the event data location in memory. By avoiding reading data unnecessary data from the file, it also demonstrates how to achieve the highest speed.

Advantages:

  • Fastest method to histogram data
  • Allows chaining and friending of data

Disadvantages:

  • Requires some careful programming
  • No access to geometry, database, other external data
  • Cannot be integrated with production analysis job

To run this example, use the following approach:

root [0] .L dybTreeSetBranch.C+
root [1] dybTreeSetBranch("recon*.root")

The key lines from the script are:

// Enable only necessary data branches
reconT.SetBranchStatus("*",0); // Disable all
calibStatsT.SetBranchStatus("*",0); // Disable all

// Must reenable execNumber since the tree indexing requires it
reconT.SetBranchStatus("execNumber",kTRUE);
reconT.SetBranchStatus("calibStats.execNumber",kTRUE);

int detector = 0;
reconT.SetBranchStatus("context.mDetId",kTRUE);
reconT.SetBranchAddress("context.mDetId",&detector);

int energyStatus = 0;
reconT.SetBranchStatus("energyStatus",kTRUE);
reconT.SetBranchAddress("energyStatus",&energyStatus);

float energy = -1;
reconT.SetBranchStatus("energy",kTRUE);
reconT.SetBranchAddress("energy",&energy);

int nHit = -1;
reconT.SetBranchStatus("calibStats.nHit",kTRUE);
reconT.SetBranchAddress("calibStats.nHit",&nHit);

// Process each event
int maxEntries=reconT.GetEntries();
for(int entry=0;entry<maxEntries;entry++){

  // Get next event
  reconT.GetEntry(entry);

  // Fill histograms
  if(energyStatus==1){ // Reconstruction was successful
    if(detector==1){
   // AD#1
   nhitVsEnergyAD1H->Fill(energy,nHit);
    }else if(detector==2){
   // AD#2
   nhitVsEnergyAD2H->Fill(energy,nHit);
    }
  }
}

dybNuWaHist.py

This example uses a full NuWa algorithm to generate the histogram. Use this approach when you need complete access to the event data object, class methods, geometry information, database, and any other external data. You must also use this approach if you want your algorithm to be included in the standard production analysis job. It is the most powerful approach to analysis of the data, but it is also the slowest. Although it is the slowest method, it may still be fast enough for your specific needs.

Advantages:

  • Full data classes and methods are available
  • Full access to geometry, database, other external data
  • Can be integrated with production analysis job

Disadvantages:

  • Slowest method to histogram data
  • Requires some careful programming
  • Requires a NuWa software installation

To run this example, use the following approach:

shell> nuwa.py -n -1 -m"Quickstart.dybNuWaHist" recon*.root

The key lines from the script are:

def execute(self):
    """Process each event"""
    evt = self.evtSvc()

    # Access the reconstructed data
    reconHdr = evt["/Event/Rec/AdSimple"]
    if reconHdr == None:
        self.error("Failed to get current recon header")
        return FAILURE
    # Access the calibrated data statistics
    calibStatsHdr = evt["/Event/Data/CalibStats"]
    if reconHdr == None:
        self.error("Failed to get current calib stats header")
        return FAILURE

    # Check for antineutrino detector
    detector = reconHdr.context().GetDetId()
    if detector == DetectorId.kAD1 or detector == DetectorId.kAD2:

        # Found an AD.  Get reconstructed trigger
        recTrigger = reconHdr.recTrigger()
        if not recTrigger:
            # No Reconstructed information
            self.warning("No reconstructed data for AD event!?")
            return FAILURE

        # Get reconstructed values
        energyStatus = recTrigger.energyStatus()
        energy = recTrigger.energy()
        nHit = calibStatsHdr.getInt("nHit")

        # Fill the histograms
        if energyStatus == ReconStatus.kGood:
            if detector == DetectorId.kAD1:
                self.nhitVsEnergyAD1H.Fill( energy/units.MeV, nHit )
            elif detector == DetectorId.kAD2:
                self.nhitVsEnergyAD2H.Fill( energy/units.MeV, nHit )

    return SUCCESS

The next section provides more information on data analysis using NuWa (Sec. NuWa Basics).

Advanced Examples

The following section presents advanced examples of working with Daya Bay data files. All example scripts can be found in the dybgaudi:Tutorial/Quickstart software package.

Combining ‘Unfriendly’ Trees

The examples in the previous section show how to histogram data by ‘friending’ trees. Trees can only be ‘friended’ if there is a natural relationship between the trees. The Coincidence and Spallation trees collect data from multiple triggers into one entry. As a consequence, you cannot ‘friend’ these trees with the trees which contain data with one trigger per entry (e.g. CalibStats, AdSimple, etc.). For example, you may want to histogram data in the Coincidence tree, but you want to apply a cut on a variable that is only present in CalibStats.

It is possible to combine data from these ‘unfriendly’ trees. The approach is to manually look up the data for the corresponding entries between the ‘unfriendly’ trees. By building on the example dybTreeGetLeaf.C, the advanced example dybTreeGetLeafUnfriendly.C generates a histogram with data from both the Coincidence and CalibStats data. The first step in this process is to create an index to allow a unique look-up of an entry from the CalibStats tree:

// Disable pre-existing index in the calib stats trees
//  (Another reason ROOT is frustrating; we must manually do this)
calibStatsT.GetEntries();
Long64_t* firstEntry = calibStatsT.GetTreeOffset();
for(int treeIdx=0; treeIdx<calibStatsT.GetNtrees(); treeIdx++){
  calibStatsT.LoadTree(firstEntry[treeIdx]);
  calibStatsT.GetTree()->SetTreeIndex(0);
}

// Build a new look-up index for the 'unfriendly' tree
//  (Trigger number and detector id uniquely identify an entry)
calibStatsT.BuildIndex("triggerNumber","context.mDetId");

Once this index is available, we can manually load a specific CalibStats entry with the call:

// Look up corresponding entry in calib stats
int status = calibStatsT.GetEntryWithIndex(triggerNumber, detector);

Now that we are prepared, we can step through each entry in the Coincidence tree. For each Coincidence multiplet we can look up all of the corresponding entries from the CalibStats tree. Here is the main loop over Coincidence entries from the example script, demonstrating how to fill a histogram with data from these unfriendly trees:

// Process each coincidence set
int maxEntries=adCoincT.GetEntries();
for(int entry=0;entry<maxEntries;entry++){

  // Get next coincidence set
  adCoincT.GetEntry(entry);

  // Get multiplet data
  int multiplicity = (int) adCoincT.GetLeaf("multiplicity")->GetValue();
  int detector = (int) adCoincT.GetLeaf("context.mDetId")->GetValue();
  std::vector<int>& triggerNumberV = getLeafVectorI("triggerNumber",&adCoincT);
  std::vector<int>& energyStatusV = getLeafVectorI("energyStatus",&adCoincT);
  std::vector<float>& energyV = getLeafVectorF("e",&adCoincT);

  // Loop over AD events in multiplet
  for(int multIdx=0; multIdx<multiplicity; multIdx++){

    // Get data for each AD trigger in the multiplet
    int triggerNumber = triggerNumberV[multIdx];
    int energyStatus = energyStatusV[multIdx];
    float energy = energyV[multIdx];

    // Look up corresponding entry in calib stats
    int status = calibStatsT.GetEntryWithIndex(triggerNumber, detector);
    if(status<=0){
   std::cout << "Failed to find calib stats for trigger number "
             << triggerNumber << " and detector ID " << detector
             << std::endl;
   continue;
    }
    // Get data from matching calib stats entry
    double nominalCharge = calibStatsT.GetLeaf("NominalCharge")->GetValue();

    // Fill histograms
    if(energyStatus==1 && energy>0){ // Reconstruction was successful
   if(detector==1){
     // AD#1
     chargeVsEnergyAD1H->Fill(energy,nominalCharge/energy);
   }else if(detector==2){
     // AD#2
     chargeVsEnergyAD2H->Fill(energy,nominalCharge/energy);
   }
    }

  }  // End loop over AD triggers in the multiplet
} // End loop over AD coincidence multiplets

Using TTree::Draw() with ‘Unfriendly’ Trees

The previous example script allowed us to correlate and histogram data between the ‘unfriendly’ Coincidence and CalibStats trees. This example required that we manually loop on the individual entries in the Coincidence tree, and fill the histograms entry-by-entry. An alternate approach is to reformat the data from the ‘unfriendly’ CalibStats tree into a ‘friendly’ format. Once in this ‘friendly’ format, we can return to simple calls to TTree::Draw() to place cuts and histogram data. This approach is more technical to setup, but can be useful if you want to continue to use TCuts, or if you want to repeatedly histogram the data to explore the variations of cuts.

As discussed, this approach relies on reformatting the data from an ‘unfriendly’ tree into a ‘friendly’ format. The example script dybTreeDrawUnfriendly.C generates the same histograms as the previous example dybTreeGetLeafUnfriendly.C, but uses this alternate approach. The following lines shows this in practice:

// Create 'friendly' version of data from CalibStats
std::string mainEntriesName = "multiplicity";
std::vector<string> calibVarNames; //variable names to copy from CalibStats
calibVarNames.push_back("MaxQ");
calibVarNames.push_back("NominalCharge");
std::string indexMajorName = "triggerNumber";
std::string indexMinorName = "context.mDetId";
TTree* calibStatsFriendlyT = makeFriendTree(&adCoincT,
                                         &calibStatsT,
                                         mainEntriesName,
                                         calibVarNames,
                                         indexMajorName,
                                         indexMinorName);
if(!calibStatsFriendlyT){
  std::cout << "Failed to create friendly tree" << std::endl;
  return;
}
// Add new friendly tree to coincidence tree
adCoincT.AddFriend(calibStatsFriendlyT,"calibStats");

Once this ‘friendly’ tree has been generated, we can use TTree::Draw() with the CalibStats variables:

// Fill histograms
// AD#1
adCoincT.Draw("calibStats.NominalCharge/e:e>>chargeVsEnergyAD1H",
           "context.mDetId==1 && energyStatus==1 && e>0","colz");
// AD#2
adCoincT.Draw("calibStats.NominalCharge/e:e>>chargeVsEnergyAD2H",
           "context.mDetId==2 && energyStatus==1 && e>0","colz");

The reformatted CalibStats data is available in the newly created tree calibStatsFriendlyT, which is dynamically created and kept in memory. Once you close your ROOT session, this tree will be deleted. If you wish to keep this ‘friendly’ tree around for later reuse, then you should write it to a file:

TFile outputFile("friendlyCalibStats.root","RECREATE");
calibStatsFriendlyT.SetDirectory(&outputFile);
calibStatsFriendlyT.Write();

The generation of this reformatted ‘friendly’ tree relies on the fairly complex helper function makeFriendTree:

TTree* makeFriendTree(TChain* mainT,
                   TChain* unfriendlyT,
                   const string& mainEntriesName,
                   const std::vector<string>& friendVarNames,
                   const string& indexMajorName,
                   const string& indexMinorName)

One entry in the tree mainT corresponds to multiple entries in the unfriendlyT tree; these are the Coincidence and CalibStats trees respectively in our example. mainEntriesName is the name of the branch in mainT that tells us the count of unfriendlyT entries that correspond to the current mainT entry. This is the variable multiplicity in our example, which tells us how many AD triggers are in the current coincidence multiplet. The variables names given in friendVarNames are reformatted from single numbers (i.e. float friendVar) in the unfriendlyT tree to arrays (i.e. float friendVar[multiplicity]) in the new ‘friendly’ tree returned by the function. For our example, these are the CalibStat variables MaxQ and NominalCharge. The indexMajorName and indexMinorName variables are present in both trees, and are used to correlate one entry in the mainT with multiple entries in the unfriendlyT tree. These are the variables triggerNumber and context.mDetId. Note that one or both of these index variables must be an array in the mainT tree to properly describe the ‘unfriendly’ one-to-many relationship between entries in mainT and unfriendlyT.

This helper function may require some slight modification for your specific case. It assumes that the branches have the following types:

  • mainEntriesName: integer in mainT
  • friendVarNames: float in unfriendlyT
  • indexMajorName: vector<int> in mainT and int in unfriendlyT
  • indexMinorName: int in both mainT and unfriendlyT

This helper function could be extended to dynamically check these variable types (eg. float, int, vector<float>, vector<int>, etc), and then respond accordingly. This is left as an exercise for the analyzer.

Footnotes

[1]The trigger time can be converted to a readable Beijing local time format using the lines described in Sec. Time Axes in ROOT