| Classes | Job Modules | Data Objects | Services | Algorithms | Tools | Packages | Directories | Tracs |


Public Member Functions | Public Attributes | Properties | Private Member Functions | Static Private Attributes
Scraper::base::target::Target Class Reference

List of all members.

Public Member Functions

def __init__
def __repr__
def instance
def writer
def require_manual
def seed
def lastvld

Public Attributes

 dbconf
 kls

Properties

 database = property(lambda self:self.dbconf.get('database'))
 dbi_loglevel = property(lambda self:self.get('dbi_loglevel',"INFO"))
 seed_target_tables = property(lambda self:self.get('seed_target_tables',False))
 DROP_TARGET_TABLES = property(lambda self:self.get('DROP_TARGET_TABLES',False))
 ALLOW_DROP_CREATE_TABLE = property(lambda self:self.get('ALLOW_DROP_CREATE_TABLE',False))

Private Member Functions

def _kls
def _wrt
def _lastvld

Static Private Attributes

tuple _wrt = classmethod( _wrt )

Detailed Description

Encapsulate DybDbi dealings here to avoid cluttering ``Scraper``

Relevant config parameters:

:param timefloor: None, or a datetime or string such as '2010-09-18 22:57:32', used to limit the expense of the validity query

Definition at line 7 of file target.py.
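The config-key access pattern used by this class (plain dict entries exposed as read-only properties with `dict.get` defaults, as listed under Properties above) can be sketched standalone; `TargetConfig` here is an illustrative stand-in, not the real class:

```python
# Minimal standalone sketch of the Target pattern: a dict subclass whose
# config keys become read-only properties with dict.get defaults.
class TargetConfig(dict):
    dbi_loglevel = property(lambda self: self.get('dbi_loglevel', "INFO"))
    seed_target_tables = property(lambda self: self.get('seed_target_tables', False))
    timefloor = property(lambda self: self.get('timefloor', None))

cfg = TargetConfig(timefloor='2010-09-18 22:57:32')
assert cfg.dbi_loglevel == "INFO"               # default, key absent
assert cfg.timefloor == '2010-09-18 22:57:32'   # from the dict entry
```

Because the properties delegate to `self.get`, any key can be supplied at construction or fall back to its documented default.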


Constructor & Destructor Documentation

def Scraper::base::target::Target::__init__ (   self,
  *args,
  **kwa 
)

Definition at line 16 of file target.py.

00017                                      :
00018         dict.__init__(self, *args, **kwa )
00019         from DybPython import DBConf
00020         self.dbconf = DBConf( os.environ['DBCONF'] )
00021         self.kls = self._kls()  


Member Function Documentation

def Scraper::base::target::Target::__repr__ (   self)

Definition at line 30 of file target.py.

00031                       :
00032         return "Target %s:%s:%s " % ( self['target'], self.database, self['kls'] )

def Scraper::base::target::Target::_kls (   self) [private]
:rtype DybDbi class:  returns DybDbi/genDbi class specified by ``kls`` key within this dict

Also does DybDbi housekeeping:

#. sets the loglevel according to the ``dbi_loglevel`` key
#. creates target tables if both the ``seed_target_tables`` and ``ALLOW_DROP_CREATE_TABLE`` keys are True
#. sets the convenience kls attribute `kls.attnames`, providing the list of attribute names for dict consistency checking 

.. warning:: As this is called by the ctor, options impacting it must be set at dict instantiation

Definition at line 33 of file target.py.

00034                   :
00035         """
00036         :rtype DybDbi class:  returns DybDbi/genDbi class specified by ``kls`` key within this dict
00037         
00038         Also does DybDbi housekeeping:
00039 
00040         #. sets loglevel according to ``dbi_loglevel`` key
00041         #. creates target tables if both ``seed_target_tables`` key is True and ``ALLOW_DROP_CREATE_TABLE`` is True
00042         #. sets convenience kls attribute `kls.attnames` providing the list of attribute names : for dict consistency checking 
00043 
00044         .. warning:: As this is called by the ctor, options impacting it must be set at dict instantiation
00045 
00046         """
00047         from DybDbi import gDbi, Level          
00048         import DybDbi
00049 
00050         kls = getattr( DybDbi, self['kls'] )
00051         assert len(gDbi.cascader) == 1
00052         dbname = gDbi.cascader[0].dbname
00053         kls.attnames = kls.SpecKeys().aslist()
00054 
00055         gDbi.outputlevel = Level.From( self.dbi_loglevel )
00056 
00057         allow_dropcreate = self.ALLOW_DROP_CREATE_TABLE
00058         stt = self.seed_target_tables
00059         dtt = self.DROP_TARGET_TABLES
00060         sve = 'SUPERVISOR_ENABLED' in os.environ
00061         log.warn("target._kls stt %s dtt %s sve %s " % (stt,dtt,sve) )
00062 
00063         if sve and (stt or dtt):
00064             msg = "seeding/dropping target tables is not appropriate for longterm usage under supervisord : do manually with scr.py  "
00065             log.fatal(msg)
00066             raise Exception(msg)  
00067      
00068         if stt: 
00069             log.warn("seed_target_tables proceeding against : %r " % dict( self.dbconf, password="***") )
00070         elif dtt and allow_dropcreate: 
00071             if dbname == 'offline_db' or self.database == 'offline_db':
00072                 log.warn("seed_target_tables proceeding against offline_db for target %r " % dict( self.dbconf, password="***") )
00073             kls().CreateDatabaseTables( 0, kls.__name__[1:] )     
00074         else:
00075             log.warn("seed_target_tables/DROP_TARGET_TABLES is not allowed with target %r without ALLOW_DROP_CREATE_TABLE option " % dict( self.dbconf, password="***") )
00076         return kls
00077 
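The class lookup in `_kls` resolves the ``kls`` dict key to an attribute of the DybDbi module, and the table name passed to ``CreateDatabaseTables`` drops the leading ``G`` from the class name. A sketch with a stand-in namespace (`DybDbiStub` and the class `GDcsAdTemp` are hypothetical here):

```python
import types

class GDcsAdTemp:            # hypothetical DybDbi/genDbi class stand-in
    pass

DybDbiStub = types.SimpleNamespace(GDcsAdTemp=GDcsAdTemp)

def resolve_target_kls(module, cfg):
    """Sketch of the lookup in _kls: cfg['kls'] names the target class,
    fetched from the (stand-in) DybDbi module by attribute."""
    return getattr(module, cfg['kls'])

kls = resolve_target_kls(DybDbiStub, {'kls': 'GDcsAdTemp'})
table = kls.__name__[1:]     # table name, derived as in the CreateDatabaseTables call
assert table == 'DcsAdTemp'
```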

def Scraper::base::target::Target::instance (   self,
  **kwa 
)
May fail with TypeError if kwa cannot be coerced, e.g. from aggregate queries 
returning None when there are zero samples.

Attribute names not expected by the target kls are skipped.
This is the case for the system attributes `_date_time_min` `_date_time_max` 

Definition at line 78 of file target.py.

00079                                :
00080         """
00081         Might fail with TypeError if kwa cannot be coerced, eg from aggregate queries 
00082         returning None when zero samples
00083 
00084         If the attribute names are not expected for the target kls they are skipped.
00085         This will be the case for the system attributes `_date_time_min` `_date_time_max` 
00086 
00087         """
00088         d = dict((k,v) for k,v in kwa.items() if k in self.kls.attnames)
00089         if len(kwa) != len(d):
00090             log.debug("target.instance skipped some attributes %r in instance preparation " % list(set(kwa).difference(set(d))) )  
00091         return self.kls.Create(**d)
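The skipping behaviour above can be isolated as a small helper; `filter_attributes` is a hypothetical name for the dict comprehension used in `instance`:

```python
def filter_attributes(kwa, attnames):
    """Sketch of the filtering in Target.instance: keep only keys the target
    kls declares in attnames, silently skipping the rest (e.g. the system
    attributes _date_time_min / _date_time_max)."""
    d = dict((k, v) for k, v in kwa.items() if k in attnames)
    skipped = set(kwa).difference(d)
    return d, skipped

d, skipped = filter_attributes(
    {'Temp1': 1.5, '_date_time_min': 0, '_date_time_max': 1},
    ['Temp1', 'Temp2'],
)
assert d == {'Temp1': 1.5}
assert skipped == {'_date_time_min', '_date_time_max'}
```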

def Scraper::base::target::Target::writer (   self,
  sv,
  localstart = None,
  localend = None 
)
Prepare a DybDbi writer for the target class, with contextrange/subsite appropriate for the source instance.

Non-default localstart and localend are typically only used for
aggregate_group_by querying, where the instance datetimes such as `sv[0].date_time`
do not correspond to the contextrange of the aggregate dict. 

:param sv:  source vector instance that contains instances of an SA mapped class
:param localstart: default of `None` corresponds to `sv[0].date_time`
:param localend: default of `None` corresponds to `sv[-1].date_time`

Definition at line 92 of file target.py.

00093                                                           :
00094         """
00095         Prepare DybDbi writer for target class, with contextrange/subsite appropriate for the source instance
00096 
00097         Use of non-default localstart and localend type is typically only used for
00098         aggregate_group_by querying where the instance datetimes such as `sv[0].date_time`
00099         do not correspond to the contextrange of the aggregate dict. 
00100 
00101         :param sv:  source vector instance that contains instances of an SA mapped class
00102         :param localstart: default of `None` corresponds to `sv[0].date_time`
00103         :param localend: default of `None` corresponds to `sv[-1].date_time`
00104         """
00105 
00106         start = localstart if localstart else sv[0].date_time
00107         end   = localend if localend else sv[-1].date_time
00108         return Target._wrt( sv[0].__class__ , self.kls , start, end ) 
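The default-range logic amounts to two fallback expressions; a sketch with a stand-in row type (`Row` is illustrative, any object with a `date_time` attribute works):

```python
from collections import namedtuple

def resolve_range(sv, localstart=None, localend=None):
    """Sketch of the defaults in Target.writer: absent localstart/localend
    fall back to the first and last source instance datetimes."""
    start = localstart if localstart else sv[0].date_time
    end = localend if localend else sv[-1].date_time
    return start, end

Row = namedtuple('Row', 'date_time')
sv = [Row(10), Row(20), Row(30)]
assert resolve_range(sv) == (10, 30)
assert resolve_range(sv, localstart=5, localend=35) == (5, 35)
```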

def Scraper::base::target::Target::_wrt (   cls,
  skls,
  tkls,
  localstart,
  localend 
) [private]
Classmethod to prepare the DybDbi writer for the target class,
with contextrange/subsite appropriate to the source class

:param skls: SQLAlchemy source class which determines the appropriate contextrange to use for target class writer
:param tkls: DybDbi target class for which the writer is to be prepared 
:param localstart: local datetime which is converted to UTC TimeStamp for timestart of CR
:param localend: local datetime which is converted to UTC TimeStamp for timeend of CR

Assumptions made:

#. source tablename encodes site, subsite (IF NOT THEN MUST GET FROM INSTANCE ?) 
#. source date_time are naive localtimes 

Definition at line 109 of file target.py.

00110                                                        :
00111         """
00112         Classmethod to prepare the DybDbi writer for the target class,
00113         with contextrange/subsite appropriate to the source class
00114 
00115         :param skls: SQLAlchemy source class which determines the appropriate contextrange to use for target class writer
00116         :param tkls: DybDbi target class for which the writer is to be prepared 
00117         :param localstart: local datetime which is converted to UTC TimeStamp for timestart of CR
00118         :param localend: local datetime which is converted to UTC TimeStamp for timeend of CR
00119 
00120         Assumptions made:
00121 
00122         #. source tablename encodes site, subsite (IF NOT THEN MUST GET FROM INSTANCE ?) 
00123         #. source date_time are naive localtimes 
00124 
00125         """
00126         wrt = tkls.Wrt()
00127         xtn = skls.xtn   
00128 
00129         from DybDbi import ContextRange, TimeStamp, Site, SimFlag, DetectorId
00130         log.debug( "_wrt localstart %s localend %s " % ( localstart.ctime(), localend.ctime() )) 
00131         timestart = TimeStamp.UTCfromLocalDatetime( localstart ) 
00132         timeend   = TimeStamp.UTCfromLocalDatetime( localend ) 
00133         cr = ContextRange( xtn.sitemask,  SimFlag.kData , timestart, timeend )
00134         wrt.ctx( contextrange=cr, dbno=0, versiondate=TimeStamp(0,0), subsite=xtn.subsite, task=0 ) 
00135         return wrt

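The naive-localtime assumption means the timestamps must be shifted to UTC before building the context range. Conceptually (this sketch uses only the stdlib, not the DybDbi `TimeStamp` API):

```python
from datetime import datetime, timezone

def utc_from_local(naive_local):
    """Conceptual analogue of TimeStamp.UTCfromLocalDatetime: interpret a
    naive datetime as local time and re-express it in UTC (in Python 3,
    astimezone on a naive datetime assumes the system local timezone)."""
    return naive_local.astimezone(timezone.utc)

ts = utc_from_local(datetime(2010, 9, 18, 22, 57, 32))
assert ts.tzinfo == timezone.utc
```

The UTC offset applied depends on the system timezone, which is why no fixed expected value is asserted here.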
def Scraper::base::target::Target::require_manual (   self,
  msg 
)
Require manual operation (i.e. running scr.py from the commandline), 
preventing usage of rare operations/options under supervisor control 

Definition at line 138 of file target.py.

00139                                   :
00140         """
00141         Require manual operation (ie running scr.py from commandline) 
00142         preventing usage of rare operations/options under supervisor control 
00143         """
00144         sve = 'SUPERVISOR_ENABLED' in os.environ
00145         if sve:
00146             log.fatal(msg)
00147             raise Exception(msg)  
00148         else:
00149             log.info("manual running detected")            
00150 
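The guard reduces to an environment check; a standalone sketch (the `environ` parameter is added here for testability, the original reads `os.environ` directly and logs before raising):

```python
import os

def require_manual(msg, environ=os.environ):
    """Sketch of Target.require_manual: refuse rare operations when running
    under supervisord, detected via the SUPERVISOR_ENABLED variable."""
    if 'SUPERVISOR_ENABLED' in environ:
        raise Exception(msg)   # log.fatal(msg) precedes this in the original
    return True                # manual running detected

assert require_manual("do it manually", environ={}) is True
```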

def Scraper::base::target::Target::seed (   self,
  srcs,
  scraper,
  dummy = False 
)
This is invoked at scraper instanciation when the conditions are met:

#. ``seed_target_tables`` is configured True
  
Seed entries are written to the target table. The seed validity range is configured with
the options: `seed_timestart` `seed_timeend` and formerly the payload entry 
was specified by the `def seed()` method implemented in the scraper class.

Attempting to perform seeding under supervisor raises an exception, 
enforcing this restriction.

When testing seeding start from scratch with eg::

     mysql> drop table DcsAdTemp, DcsAdTempVld ;    
     mysql> update LOCALSEQNO set LASTUSEDSEQNO=0 where TABLENAME='DcsAdTemp' ;  

Changes from Oct 2012:

#. allow use against an existing table 
#. table creation functionality is removed
#. move to payloadless seeds (removing the need for dummy payload instances) 

Motivated by the need to add new sources that contribute 
to an existing target which has already collected data
eg for adding ADs to the DcsAdWpHv scraper.

Definition at line 151 of file target.py.

00152                                                 :
00153         """
00154         This is invoked at scraper instanciation when the conditions are met:
00155 
00156         #. ``seed_target_tables`` is configured True
00157   
00158         Seed entries are written to the target table. The seed validity range is configured with
00159         the options: `seed_timestart` `seed_timeend` and formerly the payload entry 
00160         was specified by the `def seed()` method implemented in the scraper class.
00161 
00162         Attempts to perform seeding under supervisor raises an exception, 
00163         to enforce this restriction.
00164 
00165         When testing seeding start from scratch with eg::
00166 
00167              mysql> drop table DcsAdTemp, DcsAdTempVld ;    
00168              mysql> update LOCALSEQNO set LASTUSEDSEQNO=0 where TABLENAME='DcsAdTemp' ;  
00169 
00170         Changes from Oct 2012:
00171 
00172         #. allow use against an existing table 
00173         #. table creation functionality is removed
00174         #. move to payloadless seeds (removing need for dummy payload instances) 
00175 
00176         Motivated by the need to add new sources that contribute 
00177         to an existing target which has already collected data
00178         eg for adding ADs to the DcsAdWpHv scraper.
00179 
00180         """
00181         self.require_manual("seed is not appropriate for longterm usage under supervisord : do seeding manually ")
00182         if not self['seed_target_tables']:
00183             log.warn("improper call to seed " )
00184             return
00185 
00186         localstart, localend = self['seed_timestart'], self['seed_timeend'] 
00187         kls = self.kls
00188         for src in srcs:
00189             tlv = self.lastvld(src.xtn)
00190             if tlv: 
00191                 log.info("tlv exists already : src %r tlv %r " % ( src,tlv.seqno))
00192             else:
00193                 log.warn("writing seed for src : %r " % (src) )
00194                 wrt = self._wrt( src, kls , localstart, localend  )
00195                 if dummy: 
00196                     inst = kls.Create( **scraper.seed(src) )
00197                     wrt.Write( inst )
00198                 seqno = wrt.Close()
00199                 assert seqno, "failed to seed_update for source %r " % src 
00200                 log.warn("wrote seed for src : %r with seqno %s  " % (src,seqno) )
00201 
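The per-source decision (skip sources already seeded, write for the rest) can be sketched independently of DybDbi; `select_seed_sources` and its callable argument are illustrative names:

```python
def select_seed_sources(srcs, lastvld):
    """Sketch of the loop in Target.seed: sources whose last validity record
    already exists (lastvld returns truthy) are skipped; the rest need a
    seed entry written."""
    return [src for src in srcs if not lastvld(src)]

already_seeded = {'AD1'}
todo = select_seed_sources(['AD1', 'AD2', 'AD3'], lambda s: s in already_seeded)
assert todo == ['AD2', 'AD3']
```

This is what makes seeding safe against a target that has already collected data: existing contexts are left untouched.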

def Scraper::base::target::Target::lastvld (   self,
  source 
)
Last validity record in the target database for the context corresponding to the `source` class.
Query expense is restricted by the `timefloor`.  
If `timefloor` is None, a sequence of progressively 
more expensive queries is performed to get the target's last validity.

:param source:  source context instance, either an **xtn** or a MockSource instance with subsite and sitemask attributes
:param timefloor: time after which to look for validity entries in the target database, or None

Note this is called only at scraper initialization, in order for the scraper to 
find its time cursor.

Definition at line 202 of file target.py.

00203                                :
00204          """
00205          Last validity record in target database for context corresponding to `source` class.
00206          Query expense is restricted by the `timefloor`.  
00207          If `timefloor` is None a sequence of progressively 
00208          more expensive queries are performed to get the target last validity.
00209 
00210          :param source:  source context instance either an **xtn** or MockSource instance with subsite and sitemask attributes
00211          :param timefloor: time after which to look for validity entries in target database or None
00212 
00213          Note this is called only at scraper initialization, in order for the scraper to 
00214          find its time cursor.
00215 
00216          """
00217          timefloor = self.get('timefloor', None)
00218          if timefloor:
00219              log.debug("using timefloor %r " % timefloor )
00220              return self._lastvld( source, timefloor )
00221          else:
00222              log.warn("no timefloor ... inventing ")
00223              now = datetime.now()
00224              for days in (14,100,):
00225                  ts = now + timedelta(days=-days)  
00226                  tlv = self._lastvld( source , ts )
00227                  if tlv:
00228                      return tlv
00229              return self._lastvld(source, None ) 
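The "inventing" fallback tries progressively wider windows before an unbounded query; a sketch of just the timefloor sequence (the function name is assumed, not from the source):

```python
from datetime import datetime, timedelta

def invent_timefloors(now=None, windows=(14, 100)):
    """Sketch of the fallback in Target.lastvld: with no configured
    timefloor, 14-day and 100-day floors are tried in turn, then None
    (an unbounded, most expensive query)."""
    now = now or datetime.now()
    return [now - timedelta(days=d) for d in windows] + [None]

floors = invent_timefloors(datetime(2014, 5, 16))
assert floors == [datetime(2014, 5, 2), datetime(2014, 2, 5), None]
```

The caller stops at the first floor whose query returns a record, so the cheap 14-day query usually suffices for a scraper that has been running recently.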

def Scraper::base::target::Target::_lastvld (   self,
  source,
  timefloor 
) [private]
Obtain last target validity record for context corresponding to source class

:param source: SA mapped class

Definition at line 230 of file target.py.

00231                                           :
00232         """
00233         Obtain last target validity record for context corresponding to source class
00234 
00235         :param source: SA mapped class
00236 
00237         """ 
00238         from DybDbi import SimFlag
00239         subsite = source.subsite
00240         task = self.get('task', 0 )
00241         sqlcontext = "SiteMask & %s and SimMask & %s " % ( source.sitemask , SimFlag.kData )
00242         if timefloor:
00243             sqlcontext += " and TIMESTART > '%s' " % timefloor
00244         lastvld = self.kls.GetTableProxy().LastValidityRec( sqlcontext, subsite, task )
00245         return lastvld  
00246 
00247 
00248 
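The context SQL built by `_lastvld` is plain string formatting over bitmasks; a sketch (integer masks stand in for `source.sitemask` and `SimFlag.kData`):

```python
def build_sqlcontext(sitemask, simmask, timefloor=None):
    """Sketch of the sqlcontext assembled in _lastvld: bitmask tests on
    SiteMask/SimMask, optionally bounded below by TIMESTART > timefloor."""
    sqlcontext = "SiteMask & %s and SimMask & %s " % (sitemask, simmask)
    if timefloor:
        sqlcontext += " and TIMESTART > '%s' " % timefloor
    return sqlcontext

assert build_sqlcontext(1, 1) == "SiteMask & 1 and SimMask & 1 "
assert "TIMESTART > '2010-09-18 22:57:32'" in build_sqlcontext(2, 1, '2010-09-18 22:57:32')
```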


Member Data Documentation

tuple Scraper::base::target::Target::_wrt = classmethod( _wrt ) [static, private]

Definition at line 135 of file target.py.

Scraper::base::target::Target::dbconf

Definition at line 16 of file target.py.

Scraper::base::target::Target::kls

Definition at line 16 of file target.py.


Property Documentation

Scraper::base::target::Target::database = property(lambda self:self.dbconf.get('database')) [static]

Definition at line 22 of file target.py.

Scraper::base::target::Target::dbi_loglevel = property(lambda self:self.get('dbi_loglevel',"INFO")) [static]

Definition at line 25 of file target.py.

Scraper::base::target::Target::seed_target_tables = property(lambda self:self.get('seed_target_tables',False)) [static]

Definition at line 26 of file target.py.

Scraper::base::target::Target::DROP_TARGET_TABLES = property(lambda self:self.get('DROP_TARGET_TABLES',False)) [static]

Definition at line 27 of file target.py.

Scraper::base::target::Target::ALLOW_DROP_CREATE_TABLE = property(lambda self:self.get('ALLOW_DROP_CREATE_TABLE',False)) [static]

Definition at line 28 of file target.py.


The documentation for this class was generated from the following file:

target.py

Generated on Fri May 16 2014 09:50:03 for Scraper by doxygen 1.7.4