Read This If You Are Using a Script to Pull Data From This Site
I love it when people write tools to pull data from this site, and we try to accommodate automated tools like this with our API. But sometimes scripts go bad, and we keep having cases where scripts pull the same data several times a second. I would love to let the owner of the script know, but often this is hard.
To prevent some of these issues, I am going to enforce a new rule going forward: Your User-Agent has to include a contact for the script. I prefer a simple e-mail address. A URL will do if that is easier for you. The data will exclusively be used to contact you in case of a problem.
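As a rough sketch of what this could look like in a Python script: the script name, the contact address, and the URL below are placeholders I made up for illustration, not values this site defines. The only point is that the User-Agent string carries a way to reach you.

```python
import urllib.request

# Placeholder values (assumptions): replace with your own script name and contact.
SCRIPT_NAME = "example-fetcher/1.0"
CONTACT = "you@example.com"


def build_request(url, contact=CONTACT):
    """Return a Request whose User-Agent names the script and a contact."""
    user_agent = f"{SCRIPT_NAME} (contact: {contact})"
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    # Hypothetical URL, used only to show the header being set.
    req = build_request("https://example.org/api/data")
    # urllib stores header names capitalized, hence "User-agent" here.
    print(req.get_header("User-agent"))
```

A URL works just as well as an e-mail address in the `contact` argument if that is easier for you.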
To enforce this, generic user agents will be blocked (like "Python-urllib/2.7", "Wget/1.12 (linux-gnu)", "curl/7.38.0"). I will start doing so with older pages that should no longer be used by automated scripts anyway (as they are not designed for automation like our API), and initially only block specific User Agents.
If you hit the page with a blocked User-Agent, a "403" (Forbidden) error [1] will be returned, along with a simple text message pointing to this post.
[1] https://tools.ietf.org/html/rfc7231#section-6.5.3
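On the script side, it probably makes sense to treat a 403 as a signal to stop and fix the User-Agent rather than to retry. The following is a minimal sketch under that assumption; `BlockedError`, `fetch`, the `opener` parameter, and the User-Agent value are hypothetical names chosen for this example.

```python
import urllib.error
import urllib.request

# Placeholder User-Agent (assumption): include your own contact here.
USER_AGENT = "example-fetcher/1.0 (contact: you@example.com)"


class BlockedError(Exception):
    """Raised when the server answers 403, so callers stop instead of retrying."""


def fetch(url, opener=urllib.request.urlopen):
    """Fetch url once; turn a 403 into BlockedError carrying the server's text."""
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    try:
        with opener(req, timeout=30) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 403:
            # The response body is expected to explain why the request was refused.
            message = err.read().decode("utf-8", errors="replace")
            raise BlockedError(message) from err
        raise  # other HTTP errors are left for the caller to handle
```

The `opener` argument exists only so the function can be exercised without touching the network; in normal use the default `urllib.request.urlopen` is what gets called.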
---
Johannes B. Ullrich, Ph.D., Dean of Research, SANS Technology Institute