mission:log:2014:11:24:using-python-lxml-request-as-simple-scrape-robot-for-metrics-from-webpages [2015/01/09 09:42]
chrono [The python solution]
mission:log:2014:11:24:using-python-lxml-request-as-simple-scrape-robot-for-metrics-from-webpages [2016/08/09 19:13] (current)
chrono Updated VFCC links
===== In the beginning there was the copy =====
  
Even if it appears unique and original to us, there was always some other inspiration or model to copy from. Most of what we do is based on ideas and concepts laid out by other people before us, and their ideas evolved in the same manner. It's basically all about perception. I could present the final Python robot to you and say: "This is my awesome original work". And you might believe it, since it's slick, streamlined and very efficient. But that is just the current result. You wouldn't (and in most cases won't) see how crappy it began and how it evolved into its current form. But that is exactly what we're going to look at today.
  
===== The Problem =====
Unfortunately, the data isn't accessible through an API, or at least some JSON export of the raw data. This meant I needed to devise a robot that would periodically scrape the data from that web page, extract all needed values and feed them into the UCSSPM, so that it calculates with real data for reference. Once it has done all that, it has to push all usable raw data and the results of the UCSSPM prediction into an InfluxDB shard running on the stargazer, so that the data can be stored, queried and (re)viewed live on the following VFCC dashboards:
  
  * [[https://apollo.open-resource.org/flight-control/vfcc/dashboard/db/aquarius-external-environment|External Environment Data]]
  * [[https://apollo.open-resource.org/flight-control/vfcc/dashboard/db/aquarius-solar-power|Aquarius Solar Power]]
  * [[https://apollo.open-resource.org/flight-control/vfcc/dashboard/db/odyssey-solar-power|Odyssey Solar Power]]
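
The scrape-and-extract step described above can be sketched roughly like this. This is a minimal sketch, not the robot from this article: it swaps in the standard library's ''html.parser'' and ''urllib.request'' in place of lxml/requests so it runs without extra dependencies, and it assumes a hypothetical page layout where each metric sits in a two-column table row (label, value). The names ''TableCellParser'', ''fetch'' and ''extract_metrics'' are made up for illustration.

```python
import urllib.request
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped per <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being parsed
        self._cell = None  # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join text split by nested markup like <b>..</b>, trim whitespace.
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def fetch(url, timeout=10):
    """Download a page and decode it with the charset the server announces."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def extract_metrics(html):
    """Map two-column rows (label, value) to a {label: raw_text} dict.
    The two-column table layout is a hypothetical assumption."""
    parser = TableCellParser()
    parser.feed(html)
    parser.close()
    return {row[0]: row[1] for row in parser.rows if len(row) == 2}
```

A periodic run would then be ''extract_metrics(fetch(url))'' from cron or a loop; the values are still raw strings at this point and need validating before they go anywhere near the UCSSPM or the database.
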
===== The bash solution =====
  
Occasionally the upstream data changed and introduced some incomprehensible white space changes as a consequence, and sometimes it just delivered 999.9 values. Pain to maintain. And since most relevant values came as floats, there was no other solution than to use bc for floating point math & comparisons, since bash arithmetic only handles integers.
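
In Python both of those pain points shrink considerably. The following is a minimal sketch, assuming values arrive as raw cell text with arbitrary surrounding whitespace and units, and assuming 999.9 is the only sentinel worth rejecting; ''parse_reading'' and ''SENTINELS'' are hypothetical names. It tolerates white space drift by searching for the first number instead of relying on fixed positions, and it turns sentinels into ''None''.

```python
import math
import re

# Assumed sentinel: upstream sometimes reports 999.9 instead of a real reading.
SENTINELS = (999.9,)

# First signed decimal number in the cell, wherever the whitespace ends up.
_NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?")


def parse_reading(raw):
    """Return the first number in raw as a float, or None if absent or a sentinel."""
    match = _NUMBER.search(raw)
    if match is None:
        return None
    value = float(match.group())
    # Compare floats with a tolerance rather than ==.
    if any(math.isclose(value, s) for s in SENTINELS):
        return None
    return value
```

Once a value is a native float, the comparisons that needed bc become ordinary expressions, e.g. ''v is not None and v > 0.5''.
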
  
And finally, the data structure and shipping method to InfluxDB is more than questionable; it would never scale. Each metric triggers a separate HTTP request, creating a lot of wasteful overhead. But at the time of writing I simply didn't know enough to make it better.
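
One way to avoid a request per metric is to batch all points into a single write. This is a minimal sketch under loudly stated assumptions: it targets an InfluxDB 0.9+/1.x server that accepts the line protocol on its ''/write'' endpoint (the 2014 setup may well have used the older JSON API instead), it does no escaping of spaces or commas in names, and the measurement, tag and field names, the database name and ''localhost:8086'' are all hypothetical.

```python
import urllib.parse
import urllib.request


def to_line(measurement, fields, tags=None, timestamp=None):
    """Format one point in InfluxDB line protocol:
    measurement[,tag=value...] field=value[,field=value...] [timestamp]
    No escaping is done; names are assumed to contain no spaces or commas."""
    # Drop readings rejected upstream (None = missing or sentinel).
    fields = {k: v for k, v in fields.items() if v is not None}
    if not fields:
        return None
    head = measurement + "".join(f",{k}={v}" for k, v in sorted((tags or {}).items()))
    body = ",".join(f"{k}={float(v)}" for k, v in sorted(fields.items()))
    line = f"{head} {body}"
    if timestamp is not None:
        line += f" {int(timestamp)}"  # seconds, matching precision=s below
    return line


def build_write_request(base_url, db, lines):
    """Bundle all points into ONE POST instead of one HTTP request per metric."""
    query = urllib.parse.urlencode({"db": db, "precision": "s"})
    payload = "\n".join(line for line in lines if line).encode("utf-8")
    return urllib.request.Request(f"{base_url}/write?{query}", data=payload, method="POST")
```

Sending the batch is then a single ''urllib.request.urlopen(req, timeout=10)'' per scrape cycle, however many metrics it carries.
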
  
===== The python solution =====