| Classes | Job Modules | Data Objects | Services | Algorithms | Tools | Packages | Directories | Tracs |

Scraper::base::scraper::Scraper Class Reference
Inheritance diagram for Scraper::base::scraper::Scraper: (graph omitted)
Collaboration diagram for Scraper::base::scraper::Scraper: (graph omitted)


Public Member Functions

def changed
def propagate
def tunesleep
def __call__

Public Attributes

 tunesleepmod
 sleep
 maxiter

Detailed Description

Base class holding common scrape features, such as the scrape logic,
which assumes:

#. source instances correspond to fixed-time measurement *snapshots*
#. target instances represent source measurements over time ranges
#. two source instances are required to form one target instance; the target validity range is derived from the datetimes of the two source instances

Initialisation is handled in the `Propagator` superclass.

Definition at line 9 of file scraper.py.
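As a sketch of these assumptions, two hypothetical snapshot instances can be paired into one target dict whose validity range spans their datetimes (all names below are illustrative, not taken from scraper.py):

```python
from datetime import datetime

class Snapshot:
    """Hypothetical fixed-time source measurement."""
    def __init__(self, date_time, temperature):
        self.date_time = date_time
        self.temperature = temperature

def target_from_pair(s0, s1):
    """Pair two snapshots into one write-ready target dict;
    the validity range is derived from the two source datetimes."""
    return dict(timestart=s0.date_time,
                timeend=s1.date_time,
                temperature=s1.temperature)

td = target_from_pair(Snapshot(datetime(2014, 5, 16, 9, 0), 41.5),
                      Snapshot(datetime(2014, 5, 16, 9, 10), 41.7))
```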


Member Function Documentation

def Scraper::base::scraper::Scraper::changed (   self,
  sv 
)
Override in subclasses to return whether a significant
change in the source instances is observed.  Together with
age checks, this is used to decide whether the propagate method is called.

:param sv:  source vector containing two source instances to interrogate for changes

Definition at line 20 of file scraper.py.

00021                           :
00022         """
00023         Override in subclasses to return whether a significant
00024         change in the source instances is observed.  Together with
00025         age checks, this is used to decide whether the propagate method is called.
00026 
00027         :param sv:  source vector containing two source instances to interrogate for changes
00028         """
00029         return True
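An override might, for example, compare a field of the two source instances against a threshold; the class, attribute, and threshold below are hypothetical, shown only to illustrate the contract:

```python
from types import SimpleNamespace

class TemperatureScraper:
    """Stand-in for a Scraper subclass; the threshold value is illustrative."""
    threshold = 0.1

    def changed(self, sv):
        # sv holds two source instances; report a significant change
        # when the field of interest moved by more than the threshold
        return abs(sv[-1].temperature - sv[-2].temperature) > self.threshold

sv = [SimpleNamespace(temperature=41.5), SimpleNamespace(temperature=41.7)]
decision = TemperatureScraper().changed(sv)
```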

def Scraper::base::scraper::Scraper::propagate (   self,
  sv 
)
Override this method in subclasses to yield one or more
write-ready target dicts derived from the `sv[-1]` source instance or the `sv[-1].aggd` aggregate dict.

:param sv:  source vector containing two source instances to propagate to one target write  

Definition at line 30 of file scraper.py.

00031                             :
00032         """
00033         Override this method in subclasses to yield one or more
00034         write-ready target dicts derived from the `sv[-1]` source instance or the `sv[-1].aggd` aggregate dict.
00035 
00036         :param sv:  source vector containing two source instances to propagate to one target write  
00037         """
00038         yield {}
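A matching override could, for instance, yield one averaged target dict per source pair; the class and field names are hypothetical:

```python
from datetime import datetime
from types import SimpleNamespace

class AveragingScraper:
    """Stand-in for a Scraper subclass; the target fields are illustrative."""
    def propagate(self, sv):
        # one write-ready target dict derived from the last two source instances
        s0, s1 = sv[-2], sv[-1]
        yield dict(timestart=s0.date_time,
                   timeend=s1.date_time,
                   temperature=(s0.temperature + s1.temperature) / 2.0)

sv = [SimpleNamespace(date_time=datetime(2014, 5, 16, 9, 0), temperature=41.5),
      SimpleNamespace(date_time=datetime(2014, 5, 16, 9, 10), temperature=41.7)]
targets = list(AveragingScraper().propagate(sv))
```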
  
def Scraper::base::scraper::Scraper::tunesleep (   self,
  i 
)
Every `self.tunesleepmod` iterations, check the lag behind sources and adjust the sleep time
accordingly, allowing the beat to be turned up in order to catch up.

The tune heuristic uses an effective heartbeat, which is the time between
entries of interest to the scrapee, i.e. the time between source updates scaled by offset+1.

It only makes sense to tune after a write, as it is only then that **tcursor** gets moved ahead.
When close to current, the sleep time can correspond to the timecursor **interval**;
when behind, sleep less to allow swift catchup.

POSSIBLE ISSUES

#. if ebeatlag never gets to 0, the sleep time will sink to the minimum

   #. the minimum was formerly 0.1, adjusted to max(0.5,ebeatsec/10.) out of concern for excessive querying
   #. adjusting to ebeatsec would be too conservative: it would prevent catchup


Definition at line 39 of file scraper.py.

00040                            :
00041         """
00042         Every `self.tunesleepmod` iterations, check the lag behind sources and adjust the sleep time
00043         accordingly, allowing the beat to be turned up in order to catch up.
00044         
00045         The tune heuristic uses an effective heartbeat, which is the time between
00046         entries of interest to the scrapee, i.e. the time between source updates scaled by offset+1.
00047 
00048         It only makes sense to tune after a write, as it is only then that **tcursor** gets moved ahead.
00049         When close to current, the sleep time can correspond to the timecursor **interval**;
00050         when behind, sleep less to allow swift catchup.
00051 
00052         POSSIBLE ISSUES
00053                      
00054         #. if ebeatlag never gets to 0, the sleep time will sink to the minimum 
00055             
00056            #. the minimum was formerly 0.1, adjusted to max(0.5,ebeatsec/10.) out of concern for excessive querying
00057            #. adjusting to ebeatsec would be too conservative: it would prevent catchup
00058 
00059 
00060         """
00061         if self.tunesleepmod == -1:return
00062         if i % self.tunesleepmod != 0: return   
00063 
00064         maxlag = self.maxlag()                                        ## None when not behind
00065        
00066         osleepsec = float(self.sleep.seconds)
00067 
00068         offset    = self.offset
00069         ebeatsec  = float(self.heartbeat.seconds)*float(offset+1)    ## effective beat scaled by offset+1
00070         lagsec    = float(maxlag.seconds) if maxlag else 0
00071         ebeatlag  = lagsec/ebeatsec                                  ## lag in eheartbeats
00072 
00073         assert ebeatlag >= 0 , ebeatlag   
00074         if ebeatlag > 10:
00075             sleepsec = max(0.5, ebeatsec/10.)                        ## long way behind ... accelerate to top speed
00076         else: 
00077             tunefac = float(1.)/float(1.+ebeatlag)                   ## getting close ... decelerate to match speed to effective heartbeat  
00078             sleepsec = ebeatsec*tunefac
00079 
00080         self.sleep = timedelta(seconds=sleepsec)
00081 
00082         log.info( " %s   [ lagmin %6.2f / lagsec %-5s ]  [  ebeatsec %-5s  offset %-3s ]   [ ebeatlag %s ] : tune sleepsec %5.2f => %5.2f   ie %s  " % ( i, lagsec/60,lagsec,ebeatsec,offset, ebeatlag, osleepsec, sleepsec, self.sleep  ) )
00083 
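The arithmetic of the listing above can be isolated as a pure function (a sketch only, not the method itself; the `maxlag` query and the timedelta conversion are left out):

```python
def tuned_sleepsec(lagsec, heartbeatsec, offset):
    """Return a new sleep time in seconds given the lag behind sources."""
    ebeatsec = float(heartbeatsec) * float(offset + 1)  # effective beat scaled by offset+1
    ebeatlag = lagsec / ebeatsec                        # lag measured in effective beats
    if ebeatlag > 10:
        return max(0.5, ebeatsec / 10.)                 # long way behind: accelerate to top speed
    return ebeatsec / (1. + ebeatlag)                   # getting close: decelerate towards the beat
```

With no lag the sleep settles at the effective beat; far behind, it drops to a tenth of the beat, floored at 0.5 seconds.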

def Scraper::base::scraper::Scraper::__call__ (   self)
Spin the infinite scrape loop, sleeping at each pass.
Within each pass, loop over source vectors and check for propagate readiness.

The ready-to-propagate check defers to the subclass to decide if there
is significant enough change to proceed to propagate the source instances from the source
vector into target instances.

Note the critical importance of setting ``sv.tcursor``: this triggers the
source vector to collapse. Also note that the progression of the ``tcursor`` is not
regular despite the constant interval stepping, as it is dictated by source changes,
or by maxage settings if ``def changed()`` never returns ``True``.

Assumptions:

#. sources update on the same schedule, allowing a common sleep to be used for all 

Definition at line 84 of file scraper.py.

00085                       :
00086         """
00087         Spin the infinite scrape loop, sleeping at each pass.
00088         Within each pass, loop over source vectors and check for propagate readiness.
00089 
00090         The ready-to-propagate check defers to the subclass to decide if there
00091         is significant enough change to proceed to propagate the source instances from the source
00092         vector into target instances.
00093 
00094         Note the critical importance of setting ``sv.tcursor``: this triggers the
00095         source vector to collapse. Also note that the progression of the ``tcursor`` is not
00096         regular despite the constant interval stepping, as it is dictated by source changes,
00097         or by maxage settings if ``def changed()`` never returns ``True``.
00098 
00099         Assumptions:
00100 
00101         #. sources update on the same schedule, allowing a common sleep to be used for all 
00102 
00103         """
00104         if self.seed_target_tables:
00105             log.warn("seed then scrape is no longer permitted, set `seed_target_tables = False` and run again for normal scraping")
00106             return
00107 
00108         i,w = 0,0
00109         while i<self.maxiter or self.maxiter==0:
00110             i += 1
00111             log.debug("i %s " % i )
00112             for sv in self:
00113                 proceed = sv()  
00114                 if not proceed:
00115                     continue 
00116 
00117                 wrt = self.target.writer( sv ) 
00118                 #log.debug("wrt %r " % wrt )
00119 
00120                 for td in self.propagate( sv ):           ## iterate over "yield" supplier of target dicts   
00121                     log.debug("td %r " % td )
00122                     tdi = self.target.instance(**td)      ## may fail with type errors
00123                     log.debug("tdi %r " % tdi )
00124                     wrt.Write(tdi)                        ## RuntimeError basic_string::_S_construct NULL not valid (C++ exception) here
00125 
00126                 if wrt.Close():                           ## hits target DB here
00127                     log.debug("write %s succeeded %s " % ( w, sv) )
00128                     timeend = wrt.ctx.contextrange.timeend     ## UTC TimeStamp
00129                     sv.tcursor = timeend.UTCtoNaiveLocalDatetime + self.interval
00130                     self.tunesleep(w) 
00131                     w += 1
00132                 else:
00133                     log.fatal("writing failed %s " %  sv )
00134                     assert 0
00135             self.handle_signal()
00136             time.sleep(self.sleep.seconds) ## uncomfortable with common sleep here ... 
00137         pass
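The outer loop structure, with `maxiter == 0` meaning run forever, can be sketched in isolation (the `step` callable stands in for the per-pass body):

```python
def spin(maxiter, step):
    """Loop pattern used by __call__: iterate for maxiter passes,
    or indefinitely when maxiter == 0; step(i) returning False stops early."""
    i = 0
    while i < maxiter or maxiter == 0:
        i += 1
        if not step(i):
            break
    return i
```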


Member Data Documentation

tunesleepmod
    Definition at line 58 of file scraper.py.

sleep
    None when not behind.
    Definition at line 63 of file scraper.py.

maxiter
    Definition at line 101 of file scraper.py.


The documentation for this class was generated from the following file:

Generated on Fri May 16 2014 09:50:03 for Scraper by doxygen 1.7.4