How to best start the new year? How about a new tool: what-is-new.py.
It's something I have to do often, and I'm sure you do too: you make lists at regular intervals (for example every week), and you want to know what is new, i.e. what you haven't seen before. This is what my tool what-is-new.py helps you with: you give it text files, and it reports every line it hasn't seen before (it keeps a database).
For example, I use this tool to review the User Agent Strings of the HTTP(S) requests to my web servers. Every week I produce a list of User Agent Strings found in my web server logs, and feed this to what-is-new: this gives me a list of User Agent Strings not seen before.
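To make the idea concrete, here is a minimal sketch of "report every line not seen before". It is not the actual code of what-is-new.py: the function name report_new_lines and the plain-text file used as the database are assumptions made for illustration, and the real tool's database format and options may differ.

```python
from pathlib import Path


def report_new_lines(lines, db_path):
    """Return the lines not yet recorded in db_path, then record them.

    Hypothetical helper: the database here is just a text file with
    one previously seen line per row.
    """
    db = Path(db_path)
    # Load previously seen lines (empty set on the first run).
    seen = set(db.read_text(encoding="utf-8").split("\n")) if db.exists() else set()
    new = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and anything already seen (including
        # duplicates within the current input).
        if line and line not in seen:
            seen.add(line)
            new.append(line)
    # Append only the new lines so the database grows incrementally.
    if new:
        with db.open("a", encoding="utf-8") as f:
            f.write("\n".join(new) + "\n")
    return new
```

Calling this once a week with that week's list would return only the entries that never appeared in an earlier run.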
One detail: User Agent Strings contain version numbers, and that makes for a long list of "new" User Agent Strings every week. I solve this problem by using a custom, canonical representation of the User Agent String: I only keep the letters.
For example, User Agent String "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Safari/534.30 CyanogenMod/10.2/grouper" becomes "Mozilla X Linux x AppleWebKit KHTML like Gecko Version Safari CyanogenMod grouper".
By using this representation, I get about 50 new User Agent Strings every week.
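The letters-only representation can be sketched in a few lines of Python. This is an illustration and not necessarily how the tool does it: the function name canonicalize is hypothetical, and treating every run of non-letters as a single space is an assumption inferred from the example above.

```python
import re


def canonicalize(user_agent):
    """Keep only runs of ASCII letters, joined by single spaces."""
    # Digits, punctuation and whitespace all act as separators, so
    # version numbers such as "5.0" or "534.30" disappear entirely.
    return " ".join(re.findall(r"[A-Za-z]+", user_agent))


ua = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/534.30 "
      "(KHTML, like Gecko) Version/4.0 Safari/534.30 CyanogenMod/10.2/grouper")
print(canonicalize(ua))
# prints: Mozilla X Linux x AppleWebKit KHTML like Gecko Version Safari CyanogenMod grouper
```

Two User Agent Strings that differ only in version numbers map to the same canonical string, so only one of them is ever reported as new.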
Here are some interesting ones found in recent months:
Apparently, someone visited my site from a Cray supercomputer :-)
"Mozilla/0.3 (Cray UNICOS) Lynx/18.104.22.168"
Some visitors cherish their privacy explicitly:
"Mozilla/5.0 (have a guess) recent but undisclosed"
And finally, since cryptocurrencies have become so popular:
This is from a web site that checks if web sites use your browser to mine cryptocurrencies:
Best wishes from the Internet Storm Center!
Jan 1st 2018