In order to start collecting long-term data to prove the [[lab:ucsspm]], reliable reference data was needed. A good industry-produced pyranometer is too expensive at this time, and hacking a cheap one just reintroduces the problem of needing reference data for calibration. So I searched the net for data sources and found [[http://www.meteo.physik.uni-muenchen.de/dokuwiki/doku.php?id=wetter:stadt:messung|this site]] of the LMU.
  
Unfortunately the data isn't accessible through an API or at least some JSON export of the raw data, which meant I needed to devise a robot that would periodically scrape the data from that web page, extract all needed values and feed them into the UCSSPM to calculate with real data for reference. Once it has done all that, it has to push all usable raw data and the results of the UCSSPM prediction into an influxdb shard running on the stargazer, so that the data can be stored, queried and (re)viewed live on the following VFCC dashboards (a sketch of the pipeline follows the list):
  
  * [[https://apollo.open-resource.org/flight-control/vfcc/#/dashboard/db/aquarius-external-environment|External Environment Data]]
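
Condensed, the pipeline looks roughly like the sketch below. Everything concrete in it (the XPath, the series name, the influxdb endpoint and credentials) is an assumption for illustration, not the code actually used:

<sxh python>
# Illustrative sketch of the scrape robot's pipeline: fetch the LMU page,
# extract one value and ship it to influxdb (0.8-era JSON API).
import json
import requests
from lxml import html

URL    = "http://www.meteo.physik.uni-muenchen.de/dokuwiki/doku.php?id=wetter:stadt:messung"
INFLUX = "http://stargazer:8086/db/aquarius/series"   # assumed host, db and port

page = requests.get(URL, timeout=10)                  # 1. scrape the page
tree = html.fromstring(page.content)                  # 2. parse the HTML
# the XPath is made up for this example, the real cell has to be located first
temp = float(tree.xpath('//td[@class="wert"]/text()')[0])

# 3. the real robot also feeds the values through the UCSSPM at this point

# 4. push the raw value into influxdb, one series with one point
payload = [{"name": "env.temperature", "columns": ["value"], "points": [[temp]]}]
requests.post(INFLUX, data=json.dumps(payload), timeout=10)
</sxh>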
===== The bash solution =====

Every now and then the upstream data changed, introducing some incomprehensible whitespace changes as a consequence, and sometimes it just delivered 999.9 dummy values. A pain to maintain. And since most relevant values came as floats, there was no other option than to pipe everything through bc for floating-point math and comparisons (e.g. ''[ $(echo "$TEMP < 999" | bc) -eq 1 ]''), since bash can't handle floats natively.
  
And finally, the data structure and shipping method to influxdb is more than questionable; it would never scale. Each metric produces another new HTTP request, creating a lot of wasteful overhead. But at the time of writing I simply didn't know enough to make it better.
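
For contrast, the 0.8-style JSON API accepts many series in one POST, so a single request per scrape cycle would have done. A sketch, with all names and values assumed:

<sxh python>
import json
import requests

INFLUX = "http://stargazer:8086/db/aquarius/series"   # assumed endpoint

# Instead of one HTTP round-trip per metric, batch every metric of one
# scrape cycle into a single POST: the payload is simply a list of series.
metrics = {"env.temperature": 23.4, "env.humidity": 52.0, "env.pressure": 1013.2}
payload = [{"name": name, "columns": ["value"], "points": [[value]]}
           for name, value in metrics.items()]
requests.post(INFLUX, data=json.dumps(payload), timeout=10)
</sxh>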
  
===== The python solution =====
<sxh python>
                        return WORK
                else:
                        return fallback
        except:
</sxh>
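
Only the tail end of the extraction helper made it into the excerpt above; filled out with hypothetical names, the try/else/except fallback pattern it implements would look something like this:

<sxh python>
def extract(tree, xpath, fallback=0.0):
    # Pull one float out of the parsed page; fall back to a safe default
    # whenever the cell is missing, malformed or a 999.9 dummy value.
    try:
        WORK = float(tree.xpath(xpath)[0])
        if WORK < 999.0:            # upstream marks missing data as 999.9
            return WORK
        else:
            return fallback
    except (IndexError, ValueError):
        return fallback
</sxh>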
  
And that's that. Success. The only thing left to do, in order to close the circle again, was to share this knowledge, so that the next person looking for ways to scrape data from web pages with Python can copy these examples, adapt them to their own use case, fail, learn and come up with new ideas as well. Hopefully in even less time. And it also made it pretty obvious that the [[lab:ucsspm|UCSSPM]] code has to be refactored again, so that it can be included as a python lib in order to get rid of the system call and all the input/output piping :)
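
To make that last point concrete: today the robot has to shell out and parse text, while a library import would hand the result over directly. Both calls below are hypothetical; the UCSSPM doesn't expose such an API (yet):

<sxh python>
import subprocess

# Now: system call plus input/output piping (the invocation is illustrative,
# the real script is fed the scraped values)
prediction = float(subprocess.check_output(["python", "ucsspm.py"]).decode().strip())

# After the refactor: import the UCSSPM as a python lib instead
# (hypothetical API, nothing like this exists yet):
# from ucsspm import predict
# prediction = predict(temperature=temp, humidity=hum, pressure=press)
</sxh>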

You can see the results of this robot's actions in the **[[https://apollo.open-resource.org/flight-control/vfcc/|Virtual Flight Control Center (VFCC)]]**.
  
And of course it goes without saying that this also serves to show pretty well how important learning computer languages will become. We cannot create an army of slaves to do our bidding (for that is what all these machines/computers/systems like smartphones, IoT devices and automatons really are) if we don't know how to command them. Our current technological state is only possible because we already delegate an essential part of our workload to machines.