This shows you the differences between two versions of the page.
mission:log:2014:11:24:using-python-lxml-request-as-simple-scrape-robot-for-metrics-from-webpages [2014/11/25 18:36] – [The python solution] chrono | mission:log:2014:11:24:using-python-lxml-request-as-simple-scrape-robot-for-metrics-from-webpages [2015/07/08 08:07] – [In the beginning there was the copy] chrono | ||
---|---|---|---|
Line 22: | Line 22: | ||
===== In the beginning there was the copy ===== | ===== In the beginning there was the copy ===== | ||
- | Even if it appears unique and original to us, there always was some other inspiration/ | + | Even if it appears unique and original to us, there always was some other inspiration/ |
===== The Problem ===== | ===== The Problem ===== | ||
Line 97: | Line 97: | ||
Infrequently upstream data changed and introduced some incomprehensible white space changes as a consequence and sometimes just delivered 999.9 values. Pain to maintain. And since most relevant values came as floats there was no other solution than to use bc for floating point math & comparisons, | Infrequently upstream data changed and introduced some incomprehensible white space changes as a consequence and sometimes just delivered 999.9 values. Pain to maintain. And since most relevant values came as floats there was no other solution than to use bc for floating point math & comparisons, | ||
- | And finally, the data structure and shipping method to influxdb is more than questionable, | + | And finally, the data structure and shipping method to influxdb is more than questionable, |
===== The python solution ===== | ===== The python solution ===== | ||
Line 285: | Line 285: | ||
</ | </ | ||
- | And that's that. Success. The only thing left to do, in order to close the circle again, was to share this knowledge, so that the next person looking for ways to scrape data from web pages with python can copy these examples, adapt them according to the new use case and fail and learn and come up with new ideas as well. Hopefully in even less time. And it also made it pretty obvious that the UCSSPM code has to be refactored again, so that it can be included as a python lib in order to get rid of the system call and all the input/ | + | And that's that. Success. The only thing left to do, in order to close the circle again, was to share this knowledge, so that the next person looking for ways to scrape data from web pages with python can copy these examples, adapt them according to the new use case and fail and learn and come up with new ideas as well. Hopefully in even less time. And it also made it pretty obvious that the [[lab: |
+ | |||
+ | You can see the results of this robot' | ||
And of course it goes without saying that this also serves to show pretty well how important learning computer languages will become. We cannot create an army of slaves to do our bidding (for that is what all these machines/ | And of course it goes without saying that this also serves to show pretty well how important learning computer languages will become. We cannot create an army of slaves to do our bidding (for that is what all these machines/ | ||
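
The upstream quirks mentioned under "The Problem" (stray white space, occasional 999.9 values, float comparisons that needed bc in the shell version) are easy to absorb in Python. The following is only a minimal sketch, not the robot's actual code: the names `parse_metric`, `exceeds` and `SENTINEL` are hypothetical, and treating 999.9 as a "no data" placeholder is an assumption based on the description above.

```python
import math

# Assumption: 999.9 is the placeholder upstream delivers instead of a real reading.
SENTINEL = 999.9


def parse_metric(raw):
    """Return scraped text as a float, or None if the value is unusable."""
    try:
        # strip() absorbs surrounding white space changes in the upstream markup
        value = float(raw.strip())
    except (AttributeError, ValueError):
        # raw was None (element missing) or not a number at all
        return None
    if not math.isfinite(value) or math.isclose(value, SENTINEL):
        return None
    return value


def exceeds(raw, threshold):
    """Native float comparison, replacing the bc call of the shell version."""
    value = parse_metric(raw)
    return value is not None and value > threshold
```

Returning `None` for unusable input lets the caller skip a data point rather than ship a bogus 999.9 to the database.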