Normalizing IPv6 Addresses

Published: 2014-03-20
Last Updated: 2014-03-20 22:40:50 UTC
by Johannes Ullrich (Version: 1)
3 comment(s)

One of the annoyances with IPv6 addresses is that they may be abbreviated. Leading "0"s in a group may be omitted, and one run of consecutive all-zero groups (":0000:") may be replaced with "::". The key annoyance is the word "may". Some logs (iptables, for example) do not abbreviate; others, such as nginx or Apache, do, which makes correlating logs more difficult.
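To make the "may" concrete, here is a minimal sketch in Python (used only for illustration; it is not part of the Perl script in this post). It assumes the standard `ipaddress` module and an example address from the 2001:db8::/32 documentation prefix, and shows three spellings that differ as text but parse to the same address.

```python
import ipaddress

# Three ways of writing the same address (documentation prefix, example only).
spellings = [
    "2001:0db8:0000:0000:0000:0000:0000:0001",  # nothing abbreviated
    "2001:db8:0:0:0:0:0:1",                     # leading zeros dropped
    "2001:db8::1",                              # zero groups replaced by "::"
]

# As strings they all differ, so a plain text search misses two of them.
distinct_strings = len(set(spellings))

# Parsed as addresses they are identical.
distinct_addresses = len({ipaddress.IPv6Address(s) for s in spellings})

print(distinct_strings, distinct_addresses)
```

A text search such as grep only sees the strings, which is why the logs need to agree on one spelling before they can be correlated.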

Lately, I started using a little Perl script to "normalize" the IPv6 addresses in my logs. The script inserts all the missing "0"s, making it easier to find a specific IP address. The script I am using:

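#!/usr/bin/perl
use strict;
use warnings;

# Read log lines from STDIN or from files named on the command line.
while ( my $line = <> ) {
    # Hand every run of hex digits and colons to fillv6(). Anything that is
    # not a complete IPv6 address (timestamps, MACs) comes back unchanged.
    $line =~ s/(?<![\w.:])([0-9A-Fa-f:]{2,})(?![\w.:])/fillv6($1)/ge;
    print $line;
}

# Expand "::" and pad every group to four lowercase hex digits.
sub fillv6 {
    my $in = shift;
    return $in if $in =~ /:::/ or $in =~ /::.*::/;    # malformed
    my @parts;
    if ( my ( $head, $tail ) = $in =~ /^(.*)::(.*)$/ ) {
        my @head    = split /:/, $head;
        my @tail    = split /:/, $tail;
        my $missing = 8 - @head - @tail;    # groups hidden behind "::"
        return $in if $missing < 1;
        @parts = ( @head, ('0') x $missing, @tail );
    }
    else {
        @parts = split /:/, $in, -1;
    }
    return $in if @parts != 8 or grep { !/^[0-9A-Fa-f]{1,4}$/ } @parts;
    return join ':', map { sprintf '%04s', lc $_ } @parts;
}

What I could use are some more diverse IPv6 logs to see if it covers all possible cases. The script is right now in a "works for me" state, so let me know if it works for you too.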
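As a starting point for "all possible cases", the following Python sketch lists a few abbreviation edge cases and prints the fully expanded form a normalizer should produce for each. It uses the standard `ipaddress` module as the reference. The sample addresses and the `expand` helper are made up for illustration and do not come from real logs.

```python
import ipaddress

# Hypothetical sample inputs covering the places where "::" can appear.
cases = [
    "::",                    # all groups zero
    "::1",                   # "::" at the start
    "2001:db8::",            # "::" at the end
    "2001:db8::1",           # "::" in the middle
    "2001:db8:0:1::1",       # a zero group written out next to a "::"
    "2001:DB8:0:0:1:0:0:1",  # upper case, no "::" at all
]

def expand(text):
    """Return the address with all eight groups padded to four digits."""
    return ipaddress.IPv6Address(text).exploded

for text in cases:
    print(f"{text:24} {expand(text)}")
```

Feeding the same inputs through another normalizer and comparing the results against this output is a cheap way to spot cases it mishandles.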

Johannes B. Ullrich, Ph.D.
SANS Technology Institute

Keywords: ipv6
3 comment(s)


For what it's worth, I tend to do the opposite: I "compress" IPv6 addresses to make them all match. Most of my logs are already "compressed", and it saves at least a little bit of space in my database. Below is a snippet of PHP code I use to "normalize" the addresses, stripping out the extra zeros and ensuring everything is in lower case. For those unfamiliar with PHP, filter_var() is a built-in function in later versions of PHP (5.2 and up) that validates certain common types of input. Essentially, if the input string looks like an IPv6 address, it is run through the compression algorithm; if the string does not look like an IPv6 address, it is returned unchanged.

/* Given an IP address, determine whether it is an IPv6 address and, if so,
normalize its compression. Since IPs are stored as strings, two address
strings might technically be the same IP but would not be counted as
equivalent by MySQL because the strings do not match. This function should
help cut down on potential duplicate addresses getting into the database,
as well as finding matches during queries. The alphabetic characters are
also normalized to lowercase for the same reason. If the input string is
not a valid IPv6 address, the string is passed back out unaltered, making
this function safe to call on any input. */
function compress_ipv6($ip) {
    if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV6)) {
        /* Stripping zeros with regular expressions alone breaks on
        addresses with separated zero groups: 2001:db8:0:1:0:0:0:1 would
        come out with two "::" runs, which is invalid. Packing the address
        to binary and unpacking it again lets PHP strip the leading zeros,
        collapse a run of zero groups to "::" and emit lowercase letters. */
        $packed = inet_pton($ip);
        if ($packed !== false) {
            $ip = inet_ntop($packed);
        }
    }
    return $ip;
}
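
For comparison, the same "compress if it is IPv6, otherwise pass through" idea can be sketched in Python with the standard `ipaddress` module. This is an illustrative equivalent, not the commenter's code; the function name `compress_ipv6` is borrowed from the PHP snippet only to make the parallel obvious.

```python
import ipaddress

def compress_ipv6(text):
    """Return the compressed lowercase form, or the input if it is not IPv6."""
    try:
        return ipaddress.IPv6Address(text).compressed
    except ValueError:
        # Not a valid IPv6 address (IPv4, hostnames, garbage): leave it alone.
        return text

print(compress_ipv6("2001:0DB8:0000:0001:0000:0000:0000:0001"))
print(compress_ipv6("192.0.2.1"))
```

Because the address is parsed rather than edited as text, only one run of zero groups is ever replaced by "::".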

I currently maintain a block list that includes both IPv4 and IPv6 addresses, all of which were caught attempting to spam or hack my various sites. You can export the current IPv6 address list if you want test data. These aren't log entries per se, but the "Simple Text List" export option prints just a raw list of addresses, one per line, which should still work with your posted script. Just make sure to select IPv6 from the drop-down under the explanatory text.

After several hours, this is my bash equivalent.

#!/bin/bash
# Expand the IPv6 address given as the first argument to its full form.
addr=$1

# Count the groups that are present; "::" stands for the missing ones.
len=$(echo "$addr" | tr ':' '\n' | grep -c .)

# Build the zero groups that "::" replaces: ":0000:" for one missing
# group, ":0000:0000:" for two, and so on.
fill=':'
for ((i = len; i < 8; i++)); do
    fill+='0000:'
done

# Substitute the fill, split into groups, left-pad each group to four
# digits, and join the groups with colons again.
echo "$addr" | sed "s/::/$fill/" | tr ':' '\n' | grep . |
    awk '{ while (length($0) < 4) $0 = "0" $0; print }' | paste -sd: -

How would you handle the ::ffff:0:0:0/96 range? e.g. ::ffff:0: I've noticed some IPv4 software logging like that in the past, even though they aren't listening on any IPv6 addresses.
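
One possible answer, sketched in Python under an assumption: that the log entries in question are IPv4-mapped addresses of the form ::ffff:a.b.c.d (the ::ffff:0:0/96 range). If the software really logs a different prefix, this does not apply. The helper name `unmap` is hypothetical. The idea is to pull out the embedded IPv4 address so that it matches the plain IPv4 entries in other logs.

```python
import ipaddress

def unmap(text):
    """Return the embedded IPv4 address as text, or the input unchanged."""
    try:
        addr = ipaddress.IPv6Address(text)
    except ValueError:
        return text
    mapped = addr.ipv4_mapped  # None unless the address is in ::ffff:0:0/96
    return str(mapped) if mapped is not None else text

print(unmap("::ffff:192.0.2.1"))  # dotted-quad spelling
print(unmap("::ffff:c000:201"))   # same address written in hex groups
print(unmap("2001:db8::1"))       # ordinary IPv6 address, left as it is
```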

I prefer to have logging systems dump to a database that stores the addresses in their binary/integer equivalent; that way formatting is not an issue when comparing. Of course, not all software has that option, and even software that does sometimes has no backup cache for when the database is unreachable.
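
A minimal Python sketch of that idea, assuming the standard `ipaddress` module: every spelling of an address converts to the same 16 bytes (or the same 128-bit integer), so the stored value no longer depends on how the log wrote it. The helper names are hypothetical, and the choice of database column type is left open since the comment does not name one.

```python
import ipaddress

def to_binary(text):
    """Return the 16-byte big-endian form of an IPv6 address."""
    return ipaddress.IPv6Address(text).packed

def to_integer(text):
    """Return the address as a 128-bit integer."""
    return int(ipaddress.IPv6Address(text))

a = to_binary("2001:db8::1")
b = to_binary("2001:0DB8:0000:0000:0000:0000:0000:0001")
print(a == b, len(a), to_integer("::1"))
```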
