Scripting Web Categorization
When you are dealing with a huge amount of data, it can be very useful to enrich it with additional context. Examples:
- Geolocation of IP addresses
- Getting the DShield score of an IP address
- Looking up domain names in lists of malicious domains
- ...
When you are processing many URLs during a security incident investigation, or while extracting IOCs from a malware sample or logs, it can also be very useful to categorize them. Categorization tags a URL with a label such as the classic "Adult Content", "Government", "Forums", etc. Many commercial solutions offer this feature, and it can be very powerful: you can, for example, configure your firewall to deny access to non-business categories. But because the feature is integrated into closed solutions, it's not easy to reuse this information in your own scripts. For years, Blue Coat has had a product called "K9" that helps protect kids surfing the web. It's free: you just need to get a license key and install the tool or... use the online API! I had to categorize a bunch of URLs, so I decided to take some time to write a few lines of Python to automate this task.
My script webcat.py fetches the defined categories at regular intervals (every two hours) and performs a lookup for each URL passed as an argument:
$ ./webcat.py isc.sans.org
isc.sans.org,Education
Multiple URLs can be passed on the same command line, or the script can be fed via STDIN if you use "-" as the parameter:
$ ./webcat.py isc.sans.org blog.rootshell.be
isc.sans.edu,Education
blog.rootshell.be,Technology/Internet
$ cat suspicious-urls.tmp | ./webcat.py -
getmooresuccess.com,Business/Economy
weddingme.net,Business/Economy
riverbird.usa.cc,Malicious Outbound Data/Botnets
1ntershipping.co,Malicious Outbound Data/Botnets
secureemail.bz,Malicious Sources/Malnets
vsreviewsa.com,Malicious Sources/Malnets
felceconserve.com,Malicious Outbound Data/Botnets
flashsync.cf,Uncategorized
cy-m0ld.com,Malicious Outbound Data/Botnets
berettitdint.ru,Malicious Outbound Data/Botnets
vehanmace.ru,Malicious Outbound Data/Botnets
redderbest.gq,Uncategorized
googlemails.ga,Uncategorized
msportf1.com,Sports/Recreation
www.vai-t.com,Malicious Sources/Malnets
duotthenaning.ru,Malicious Sources/Malnets
duotthenaning.ru,Malicious Sources/Malnets
littrecdintoft.ru,Malicious Sources/Malnets
vsreviewsa.com,Malicious Sources/Malnets
doncglobal.com,Malicious Outbound Data/Botnets
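The argument handling described above can be sketched roughly as follows. This is a minimal illustration and not the code of webcat.py itself: the names collect_targets and categorize_all are hypothetical, and lookup stands in for whatever function queries the K9 API for a single host.

```python
import sys


def collect_targets(args, stdin=None):
    """Return the list of URLs to check.

    A lone "-" means "read one URL per line from STDIN"; otherwise the
    command-line arguments themselves are the URLs.
    """
    stdin = sys.stdin if stdin is None else stdin
    if list(args) == ["-"]:
        # Skip blank lines and strip trailing newlines.
        return [line.strip() for line in stdin if line.strip()]
    return list(args)


def categorize_all(targets, lookup):
    """Yield "url,category" lines, the output format shown above.

    lookup is a hypothetical callable: it takes one URL and returns its
    category label as a string.
    """
    for target in targets:
        yield f"{target},{lookup(target)}"
```

Keeping the lookup as a parameter means the same loop works whether the URLs come from the command line or from a pipe, and it can be exercised without network access.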
The API returns a hexadecimal code corresponding to the web category. That's why the script fetches the category list at regular intervals and stores it in a local file:
$ ./webcat.py -h
usage: webcat.py [-h] [-f CACHEFILE] [-F] [URL [URL ...]]

Categorize URL using BlueCoat K9

positional arguments:
  URL                   the URL(s) to check. Format: fqdn[:port]

optional arguments:
  -h, --help            show this help message and exit
  -f CACHEFILE, --file CACHEFILE
                        Categories local cache file (default: /var/tmp/categories.txt)
  -F, --force           force a fetch of categories
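Here is a minimal sketch of how the options above and the two-hour category cache could fit together. It is not taken from webcat.py: the function names, the "hexcode,label" cache file format, the "Unknown" fallback label, and the fetch callable (a stand-in for the real download of the category list) are all assumptions made for illustration.

```python
import argparse
import os
import time

CACHE_TTL = 2 * 60 * 60  # two hours, the refresh interval mentioned above


def build_parser():
    """Build a parser exposing the same options as the help text."""
    parser = argparse.ArgumentParser(description="Categorize URL using BlueCoat K9")
    parser.add_argument("urls", metavar="URL", nargs="*",
                        help="the URL(s) to check. Format: fqdn[:port]")
    parser.add_argument("-f", "--file", dest="cachefile",
                        default="/var/tmp/categories.txt",
                        help="Categories local cache file")
    parser.add_argument("-F", "--force", action="store_true",
                        help="force a fetch of categories")
    return parser


def cache_is_stale(path, ttl=CACHE_TTL, now=None):
    """Return True if the cache file is missing or older than ttl seconds."""
    now = time.time() if now is None else now
    try:
        return now - os.path.getmtime(path) > ttl
    except OSError:
        return True


def load_categories(path):
    """Read "hexcode,label" lines (assumed format) into {int_code: label}."""
    categories = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            code, sep, label = line.strip().partition(",")
            if sep:
                categories[int(code, 16)] = label
    return categories


def get_categories(path, fetch, force=False):
    """Refresh the cache through fetch() when needed, then load it.

    fetch is a hypothetical callable returning (hex_code, label) pairs.
    """
    if force or cache_is_stale(path):
        with open(path, "w", encoding="utf-8") as handle:
            for code, label in fetch():
                handle.write(f"{code},{label}\n")
    return load_categories(path)


def category_name(categories, hex_code):
    """Translate a hexadecimal code from the API into a label."""
    return categories.get(int(hex_code, 16), "Unknown")
```

Using the file's modification time as the freshness check avoids storing a separate timestamp, and the force flag simply bypasses that check, which matches what -F is described as doing.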
Before using the script, you have to register to get your K9 license and add it to the script (line 30).
Note: I'm not aware of any rate limit in place when querying the API. During my investigations, I was never blocked.
Xavier Mertens
ISC Handler - Freelance Security Consultant