Normalizing IPv6 Addresses
One of the annoyances with IPv6 addresses is that they may be abbreviated. Leading "0"s in a group may be omitted, and one run of consecutive all-zero groups (":0000:") may be replaced with "::". The key annoyance is the word "may". Some logs (for example iptables) do not abbreviate; others, such as nginx or Apache, do, which makes correlating logs more difficult.
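To make the abbreviation rules concrete, here is a minimal sketch using Python's standard `ipaddress` module (Python is my choice for illustration, not something the original workflow uses). The sample spellings are hypothetical and use the 2001:db8::/32 documentation prefix.

```python
import ipaddress

# Three spellings of the same address (2001:db8::/32 is the documentation prefix).
spellings = [
    "2001:db8::1",
    "2001:0db8:0:0:0:0:0:1",
    "2001:0DB8:0000:0000:0000:0000:0000:0001",
]

addresses = [ipaddress.IPv6Address(s) for s in spellings]

# As strings they differ; as parsed addresses they compare equal.
all_equal = all(a == addresses[0] for a in addresses)

# .exploded restores every omitted zero; .compressed is the shortest form.
exploded = addresses[0].exploded
compressed = addresses[0].compressed

print(all_equal, exploded, compressed)
```

A plain string search for one spelling would miss the other two, which is the correlation problem described above.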
Lately, I started using a little Perl script to "normalize" the IPv6 addresses in my logs. The script inserts all the missing "0"s, making it easier to find a specific IP address. The script I am using:
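#!/usr/bin/perl
use strict;
use warnings;

# Expand every IPv6 address found on each input line to its full 8x4 form.
while ( my $line = <> ) {
    # Candidate: a run of hex digits and colons that is not part of a longer
    # run and is not followed by a dotted IPv4 suffix (e.g. ::ffff:192.0.2.1).
    $line =~ s/(?<![0-9A-Fa-f:])([0-9A-Fa-f]*:[0-9A-Fa-f:]+)(?![0-9A-Fa-f:]|\.\d)/fillv6($1)/ge;
    print $line;
}

# Return the fully expanded, lowercase form of $in, or $in unchanged if it
# does not look like an IPv6 address (timestamps such as 10:00:00 pass through).
sub fillv6 {
    my $in = shift;
    my @parts;
    my ( $head, $tail ) = split( /::/, $in, 2 );
    if ( defined $tail ) {
        # "::" stands for one or more all-zero groups; work out how many.
        my @h = split( /:/, $head );
        my @t = split( /:/, $tail );
        my $missing = 8 - @h - @t;
        return $in if $missing < 1;
        @parts = ( @h, ('0') x $missing, @t );
    }
    else {
        @parts = split( /:/, $in, -1 );
        return $in if @parts != 8;
    }
    # Reject anything with an empty or over-long group (e.g. a second "::").
    return $in if grep { !/^[0-9A-Fa-f]{1,4}$/ } @parts;
    return join( ':', map { sprintf( "%04s", lc $_ ) } @parts );
}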
What I could use is a more diverse set of IPv6 logs to see if it covers all possible cases. The script is currently in a "works for me" state, so let me know if it works for you too.
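One way to check coverage without waiting for more logs is to compare the script's output against an independent implementation. The following is a rough sketch of such a reference in Python, assuming Python 3 and its standard `ipaddress` module; the `CANDIDATE` pattern, the `normalize_line` helper, and the sample log lines are all hypothetical and not taken from any real log.

```python
import ipaddress
import re

# Candidate: a run of hex digits and colons that is not part of a longer run
# and is not followed by a dotted IPv4 suffix (those are left untouched).
CANDIDATE = re.compile(
    r"(?<![0-9A-Fa-f:])[0-9A-Fa-f]*:[0-9A-Fa-f:]+(?![0-9A-Fa-f:]|\.\d)"
)


def expand(match):
    """Return the exploded form of a candidate, or the text unchanged."""
    text = match.group(0)
    try:
        return ipaddress.IPv6Address(text).exploded
    except ValueError:
        # Timestamps, MAC addresses, ":port" suffixes and the like.
        return text


def normalize_line(line):
    """Expand every valid IPv6 address found in one log line."""
    return CANDIDATE.sub(expand, line)


if __name__ == "__main__":
    samples = [
        "SRC=2001:db8::1 DPT=22",
        "[::1]:8080 at 10:00:00",
        "fe80::1 and FE80:0:0:0:0:0:0:1",
    ]
    for sample in samples:
        print(normalize_line(sample))
```

Feeding the same lines to both programs and diffing the results should surface edge cases such as a leading or trailing "::", uppercase digits, and addresses in brackets with a port.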
------
Johannes B. Ullrich, Ph.D.
SANS Technology Institute
Keywords: ipv6
Comments
/* Given an IP address, determine whether the IP is an IPv6 address and
normalize its compression if that's the case. Since IPs are stored as
strings, two address strings might technically be the same IPs but would not
be counted as equivalent by MySQL because the strings do not match. This
function should help cut down on potential duplicate addresses getting into
the database, as well as finding matches during queries. The alphabetic
characters are also normalized to lowercase for the same reason. If the
input string is not a valid IPv6 address, the string is passed back out
unaltered, making this method safe to call on any input. */
function compress_ipv6($ip) {
    if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV6)) {
        /* Round-trip the address through its packed binary form.
           inet_pton() accepts any valid spelling, and inet_ntop() writes
           it back out in compressed form. We also force the result to
           lowercase, so we won't have mismatches with mixed case letters.
           Stripping zeros with regular expressions can leave more than
           one "::" when an address contains several zero runs. */
        $ip = strtolower(inet_ntop(inet_pton($ip)));
    }
    return $ip;
}
I currently maintain a block list that includes both IPv4 and IPv6 addresses, all of which were caught attempting to spam or hack my various sites. You can export the current IPv6 address list if you want for test data. There aren't log entries per se, but you could use the "Simple Text List" export option which prints out just a raw list of addresses, one per line, that should still work with your posted script. Just make sure to select IPv6 from the drop-down under the explanatory text.
https://www.gpf-comics.com/dnsbl/export.php
Anonymous
Mar 21st 2014
1 decade ago
# Expand the IPv6 address passed as the first argument to its full form.
# Count the groups that are present (empty fields around "::" are skipped).
len=$(echo "$1" | tr ':' '\n' | grep -c .)
# Build the run of zero groups that "::" stands for.
fill=":"
while [ "$len" -lt 8 ]
do
    fill="${fill}0000:"
    len=$((len + 1))
done
# Substitute the run, split into groups, pad each to four digits, and rejoin.
echo "$1" | sed "s/::/$fill/" | tr ':' '\n' | grep . | awk '{ while (length($0) < 4) $0 = "0" $0; printf "%s%s", (NR > 1 ? ":" : ""), $0 } END { print "" }'
Anonymous
Mar 21st 2014
1 decade ago
I prefer to have logging systems dump to a database that stores the addresses in their binary/integer equivalent, so formatting is not an issue when comparing. Of course, not all software has that option, and even software that does sometimes lacks a backup cache for when the database is unreachable.
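As an illustration of the binary/integer idea, here is a small Python sketch using the standard `ipaddress` module. The `to_key` helper is hypothetical, and the choice of a 16-byte key (for example for a fixed-width binary column) is my assumption rather than something the comment specifies.

```python
import ipaddress


def to_key(text):
    """Return the 16-byte binary form of an IPv6 address string."""
    return ipaddress.IPv6Address(text).packed


# Two spellings of the same address produce the same key.
a = to_key("2001:db8::1")
b = to_key("2001:0DB8:0000:0000:0000:0000:0000:0001")

# The same value as a 128-bit integer, for stores that prefer numbers.
as_int = int(ipaddress.IPv6Address("2001:db8::1"))

# The textual form can be regenerated from the key when needed.
restored = ipaddress.IPv6Address(a).exploded

print(a == b, hex(as_int), restored)
```

Comparing keys instead of strings sidesteps abbreviation and letter case entirely; the text form is only rebuilt for display.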
Anonymous
Mar 21st 2014
1 decade ago