Capturing Honeypot Data Beyond the Logs
By default, DShield Honeypots [1] collect firewall, web and cowrie (telnet/ssh) [2] data and log it on the local filesystem. A subset of this data is reported to the SANS Internet Storm Center (ISC), where it can be used by anyone [3]. A common question from new users is whether there is any benefit to collecting PCAP data from the honeypots if the active services are already being logged. One example I often give is HTTP POST data. This data is not currently captured in the web honeypot logs, but it can be seen in the PCAP data.
Figure 1: Log data from web honeypot for POST request. 
Figure 2: PCAP data with POST information not found in previous web honeypot log file. 
This is just one example from the active honeypot services that collect and store log data. What about services that are not open and waiting for connections? I used a Python script to extract any data in my PCAP collections that was sent to the honeypot over UDP and was in a "Raw" layer. I included the following information in my SQLite database:
- Honeypot - Location of my honeypot, which may be "AWS", "GCP", etc
- Source File - PCAP file the data came from, allows me to also understand the timeframe of the capture
- Source IP
- Destination Port
- Raw Data - Raw Data from UDP packet
- Service Name - label of the UDP port from ISC API data [4]; this was enriched programmatically afterward, focusing on the most commonly seen ports
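#sample of script extracting data from a list of files
import logging
from scapy.all import IP, PcapReader, Raw

#result lists, one entry per UDP packet that has a Raw layer
honeypot_names, filenames, src_ips, dst_ports, raw_data = [], [], [], [], []

#files maps each honeypot name to a list of PCAP file paths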
for honeypot, files in files.items():
    for each_file in files:
        logging.info(f"Starting processing file: '{each_file}'")
        for pkt in PcapReader(each_file):
            if pkt.haslayer(IP):
                if pkt[IP].proto == 17:
                    try:
                        logging.debug(f"UDP Layer Found from IP {pkt[IP].src} for port {str(pkt[IP].dport)}")
                    except Exception as e:
                        logging.error(f"{e}")
                        logging.error(f"Issues accessing destination port for data from IP {pkt[IP].src}")
                        logging.error(f"UDP Layer Found from IP {pkt[IP].src} for unknown destination port")
                    if pkt.haslayer(Raw):
                        logging.debug(f"Raw Layer found from IP {pkt[IP].src}")
                        try:
                            dst_ports.append(pkt[IP].dport)
                        except Exception as e:
                            logging.error(f"{e}")
                            logging.error(f"Issues accessing destination port for data from IP {pkt[IP].src}")
                            logging.error(f"Filling in blank destination port for data from IP {pkt[IP].src}")
                            dst_ports.append("")                            
                        honeypot_names.append(honeypot)
                        filenames.append(each_file)
                        try:
                            src_ips.append(pkt[IP].src)
                        except Exception as e:
                            logging.error(f"{e}")
                            logging.error(f"Issues accessing source IP for data")          
                            src_ips.append("")  
                        try:                
                            raw_data.append(pkt[Raw].load)
                        except Exception as e:
                            logging.error(f"{e}")
                            logging.error(f"Issues accessing raw data from IP {pkt[IP].src} for port {str(pkt[IP].dport)}")
                            raw_data.append("")  
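The lists built by the loop above can then be written to SQLite. The following is a minimal sketch rather than the original script: the table name `udp_raw`, the column names, and the helper `store_rows` are assumptions chosen to mirror the fields listed earlier, and the sample values are made up.

```python
import sqlite3

def store_rows(conn, honeypot_names, filenames, src_ips, dst_ports, raw_data):
    """Write the parallel result lists into one table (names are assumptions)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS udp_raw ("
        "honeypot TEXT, source_file TEXT, source_ip TEXT, "
        "destination_port TEXT, raw_data BLOB, service_name TEXT)"
    )
    # zip() pairs the i-th entry of every list, so the lists must stay aligned
    conn.executemany(
        "INSERT INTO udp_raw "
        "(honeypot, source_file, source_ip, destination_port, raw_data) "
        "VALUES (?, ?, ?, ?, ?)",
        zip(honeypot_names, filenames, src_ips, dst_ports, raw_data),
    )
    conn.commit()

# Made-up sample values standing in for the lists built by the loop
conn = sqlite3.connect(":memory:")
store_rows(
    conn,
    ["AWS", "AWS", "GCP"],
    ["a.pcap", "a.pcap", "b.pcap"],
    ["198.51.100.7", "198.51.100.7", "198.51.100.9"],
    [3306, "", 3306],  # "" marks a packet with no readable port
    [b"<?xml", b"/device_service", b"<?xml"],
)
# Count hits per destination port, most common first
port_counts = conn.execute(
    "SELECT destination_port, COUNT(*) AS hits FROM udp_raw "
    "GROUP BY destination_port ORDER BY hits DESC"
).fetchall()
```

The port column is declared as TEXT here so that numeric ports and the blank placeholder can share one column; SQLite stores the integer ports as text in that case.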
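#function to gather port data from ISC API
#https://isc.sans.edu/api/port/80
import logging
import re
import time
import xml.etree.ElementTree as ET
from functools import lru_cache

import requests

@lru_cache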
def isc_portinfo(port, email):
    url = f"https://isc.sans.edu/api/port/{port}"
    headers = {
        'User-Agent': f'Request from {email}',
    }   
    response = requests.get(url, headers=headers)    
    while response.status_code != 200:
        #default delay between retries, overridden when the API tells us how long to wait
        delay = 5
        if response.status_code == 429:
            logging.error(f"Request limit reached: {response.text}")
            try:
                delay = int(re.findall(r'.*Try again after (.*) seconds', response.text)[0])
                logging.error(f"Delaying for an additional {delay} seconds")
            except (IndexError, ValueError):
                logging.error(f"Could not parse a delay from the response, using default of {delay} seconds")
        time.sleep(delay)
        response = requests.get(url, headers=headers)
    if response.status_code == 200:
        xml =  response.text
        logging.debug(f"XML Data: {xml}")
        root = ET.fromstring(xml)
        portdata = {}
        portdata[port] = {}
        try:
            portdata[port]["number"] = root.findall("number")[0].text
            for idx2, portinfo in enumerate(root.findall("data")):
                try:
                    portdata[port]["data_date"] = portinfo.findall("date")[0].text
                except:
                    logging.error(f"No value for 'date' found in 'data' for port '{port}'")
                
                try:
                    portdata[port]["data_records"] = portinfo.findall("records")[0].text
                except:
                    logging.error(f"No value for 'records' found in 'data' for port '{port}'")
                try:    
                    portdata[port]["data_targets"] = portinfo.findall("targets")[0].text
                except:
                    logging.error(f"No value for 'targets' found in 'data' for port '{port}'")
                try:
                    portdata[port]["data_sources"] = portinfo.findall("sources")[0].text
                except:
                    logging.error(f"No value for 'sources' found in 'data' for port '{port}'")
                try:
                    portdata[port]["data_tcp"] = portinfo.findall("tcp")[0].text
                except:
                    logging.error(f"No value for 'tcp' found in 'data' for port '{port}'")
                try:
                    portdata[port]["data_udp"] = portinfo.findall("udp")[0].text
                except:
                    logging.error(f"No value for 'udp' found in 'data' for port '{port}'")
                try:
                    portdata[port]["data_datein"] = portinfo.findall("datein")[0].text
                except:
                    logging.error(f"No value for 'datein' found in 'data' for port '{port}'")
                try:
                    portdata[port]["data_portin"] = portinfo.findall("portin")[0].text
                except:
                    logging.error(f"No value for 'portin' found in 'data' for port '{port}'")
            for idx2, portinfo in enumerate(root.findall("services")):
                for idx3, portinfo2 in enumerate(portinfo.findall("udp")):
                    try:
                        portdata[port]["services_udp_service"] = portinfo2.findall("service")[0].text
                    except:
                        logging.error(f"No value for 'service' found in 'services\\udp' for port '{port}'")
                    try:
                        portdata[port]["services_udp_name"] = portinfo2.findall("name")[0].text
                    except:
                        logging.error(f"No value for 'name' found in 'services\\udp' for port '{port}'")     
                for idx3, portinfo2 in enumerate(portinfo.findall("tcp")):
                    try:
                        portdata[port]["services_tcp_service"] = portinfo2.findall("service")[0].text
                    except:
                        logging.error(f"No value for 'service' found in 'services\\tcp' for port '{port}'")      
                    try:       
                        portdata[port]["services_tcp_name"] = portinfo2.findall("name")[0].text
                    except:
                        logging.error(f"No value for 'name' found in 'services\\tcp' for port '{port}'")          
        except Exception as e:
            logging.error(f"{e}")
        return portdata
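
For the Service Name field, only the UDP service label is needed from that response. A minimal sketch, assuming the layout implied by the parsing code above (a `services` element containing `udp` and `tcp` children with `service` and `name` fields). The helper name `udp_service_label`, the root element name, and the sample XML are made up for illustration and are not real API output.

```python
import xml.etree.ElementTree as ET

# Hypothetical response shaped like the fields the function above reads
SAMPLE_XML = """<port>
  <number>3306</number>
  <services>
    <udp><service>mysql</service><name>MySQL</name></udp>
    <tcp><service>mysql</service><name>MySQL</name></tcp>
  </services>
</port>"""

def udp_service_label(xml_text):
    """Return the UDP service label, or "" when the response has none."""
    root = ET.fromstring(xml_text)
    # findtext() returns None when any element on the path is missing
    label = root.findtext("services/udp/service")
    return label.strip() if label else ""

label = udp_service_label(SAMPLE_XML)
```

Returning an empty string for unknown ports keeps the enrichment step from failing on ports that have no registered UDP service.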
    
First, let's take a look at what this "raw data" is from an example PCAP. I looked for any sources that only had one result so that I could easily correlate the extracted data to the original PCAP.
Figure 3: SQLite extract of UDP data from an IP address with only one result. 
Figure 4: Data displayed in Wireshark from original PCAP. 
Within Wireshark, the "Protocol" is just listed as UDP, rather than something more specific, like "Half-Life Game Server" [5]. This may not always be the case, but we're already seeing some data sent to the honeypot that isn't available in the honeypot logs.
Common UDP Port Destinations
I figured we'd see some attempted communications on some ports more than others. My first search showed something unexpected.
Figure 5: Data showing Dropbox LanSync Discovery as the most common port, which was unexpected. 
It turns out that my home honeypot had some additional broadcast traffic being allowed. I went ahead and filtered out sources on a local private network. Filtering that out showed large hits for port 3306.
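Filtering out local sources can be sketched with Python's standard `ipaddress` module. This is one possible approach, not necessarily the filter used here; the helper name `is_local_source` and the sample rows are hypothetical.

```python
import ipaddress

def is_local_source(ip_text):
    """True for private, link-local, or loopback source addresses."""
    try:
        ip = ipaddress.ip_address(ip_text)
    except ValueError:
        # Blank or malformed values are kept so they can be reviewed by hand
        return False
    return ip.is_private or ip.is_link_local or ip.is_loopback

# Made-up (source IP, destination port) rows
rows = [("192.168.0.15", 17500), ("8.8.8.8", 3306), ("", 53)]
external = [row for row in rows if not is_local_source(row[0])]
```

Note that `is_private` covers more than the RFC 1918 ranges (it also flags reserved documentation ranges, for example), so a stricter or looser check may be preferable depending on the network.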
Figure 6: MySQL port showing as the most common port attempted on the honeypot with raw UDP data. 
Most of these items were Simple Object Access Protocol (SOAP) envelopes [6], the most common one seen below.
<?xml version='1.0' encoding='UTF-8'?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://www.w3.org/2003/05/soap-envelope">
    <SOAP-ENV:Body xmlns:SOAP-ENV="http://www.w3.org/2003/05/soap-envelope">
        <SOAP-ENV:Fault xmlns:SOAP-ENV="http://www.w3.org/2003/05/soap-envelope">
            <faultcode>SOAP-ENV:Client</faultcode>
            <faultstring>Validation constraint violation: tag name or namespace mismatch in element <:></faultstring>
        </SOAP-ENV:Fault>
    </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
Figure 7: A variety of XML data submitted to UDP 3306. 
Even from just one port, a lot of data can be seen in the PCAPs. The second most common destination port value was completely blank. Much of that data also appears to be XML and SOAP related, but truncated.
" xmlns:ns16="http://www.onvif.org/ver10/events/wsdl/EventBinding" xmlns:tev="http://www.onvif.org/ver10/events/wsdl" xmlns:ns17="http://www.onvif.org/ver10/events/wsdl/SubscriptionManagerBinding" xmlns:ns18="http://www.onvif.org/ver10/events/wsdl/NotificationProducerBinding" xmlns:ns19="http://www.onvif.org/ver10/events/wsdl/NotificationConsumerBinding" xmlns:ns20="http://www.onvif.org/ver10/events/wsdl/PullPointBinding" xmlns:ns21="http://www.onvif.org/ver10/events/wsdl/CreatePullPointBinding" xmlns:ns22="http://www.onvif.org/ver10/events/wsdl/PausableSubscriptionManagerBinding" xmlns:wsnt="http://docs.oasis-open.org/wsn/b-2" xmlns:ns3="http://www.onvif.org/ver10/analyticsdevice/wsdl" xmlns:ns4="http://www.onvif.org/ver10/deviceIO/wsdl" xmlns:ns5="http://www.onvif.org/ver10/display/wsdl" xmlns:ns8="http://www.onvif.org/ver10/receiver/wsdl" xmlns:ns9="http://www.onvif.org/ver10/recording/wsdl" xmlns:tds="http://www.onvif.org/ver10/device/wsdl" xmlns:timg="http://www.onvif.org/ver20/imaging/wsdl" xmlns:tptz="http://www.onvif.org/ver20/ptz/wsdl" xmlns:trt="http://www.onvif.org/ver10/media/wsdl" xmlns:trt2="http://www.onvif.org/ver20/media/wsdl" xmlns:ter="http://www.onvif.org/ver10/error" xmlns:tns1="http://www.onvif.org/ver10/topics" xmlns:tnsn="http://www.eventextension.com/2011/event/topics"><SOAP-ENV:Body><SOAP-ENV:Fault><faultcode>SOAP-ENV:Client</faultcode><faultstring>Validation constraint violation: tag name or namespace mismatch in element 
Figure 8: Raw data from PCAPs with a blank destination port. 
I once again looked for a file where a single source IP address could be matched to the resulting data.
Figure 9: Data within IP fragment. 
/device_service</d:XAddrs><d:MetadataVersion>1</d:MetadataVersion></d:ProbeMatch></d:ProbeMatches></SOAP-ENV:Body></SOAP-ENV:Envelope>
The data was within a fragmented IP packet, but it also had an additional payload in UDP. Looking in the extracted data, both pieces of data were found, one without a destination port listed and the other listed as port 3306.
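A blank destination port is consistent with how IP fragmentation works: only the first fragment (offset 0) carries the UDP header, so later fragments have no port fields to read. A minimal sketch using only the standard library to read the fragment fields from an IPv4 header; the function names `fragment_info` and `build_header` and the hand-built header bytes are illustrative assumptions, not taken from the capture.

```python
import struct

def fragment_info(ip_header):
    """Return (more_fragments, offset_bytes, has_udp_header) for an IPv4 header."""
    # Bytes 6-7 hold 3 flag bits followed by a 13-bit fragment offset
    (flags_frag,) = struct.unpack("!H", ip_header[6:8])
    more_fragments = bool(flags_frag & 0x2000)
    offset_bytes = (flags_frag & 0x1FFF) * 8  # offset is counted in 8-byte units
    # Only a fragment at offset 0 starts with the transport (UDP) header
    return more_fragments, offset_bytes, offset_bytes == 0

def build_header(flags_frag):
    """Build a 20-byte IPv4 header for testing; checksum is left as 0."""
    # version/IHL, TOS, total length, ID, flags+offset, TTL, proto 17 (UDP),
    # checksum, source address, destination address
    return struct.pack(
        "!BBHHHBBH4s4s", 0x45, 0, 1500, 1, flags_frag, 64, 17, 0,
        bytes([198, 51, 100, 7]), bytes([192, 0, 2, 1]),
    )

first = fragment_info(build_header(0x2000))  # MF set, offset 0
later = fragment_info(build_header(185))     # MF clear, offset 185 * 8 bytes
```

Both fragments still report protocol 17 in the IP header, which is why a check on the protocol number alone lets the later fragment through with no port attached.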
Figure 10: Data from the same source address, but with a destination port listed. 
<?xml version="1.0" encoding="UTF-8"?><SOAP-ENV:Envelope xmlns:SOAP-ENV="http://www.w3.org/2003/05/soap-envelope" xmlns:SOAP-ENC="http://www.w3.org/2003/05/soap-encoding" xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/08/addressing" xmlns:d="http://schemas.xmlsoap.org/ws/2005/04/discovery" xmlns:dn="http://www.onvif.org/ver10/network/wsdl"><SOAP-ENV:Header><wsa:MessageID>uuid:00000000-0000-0000-0000-000ffc521bc5</wsa:MessageID><wsa:RelatesTo></wsa:RelatesTo><wsa:To SOAP-ENV:mustUnderstand="true">http://schemas.xmlsoap.org/ws/2004/08/addressing/role/anonymous</wsa:To><wsa:Action SOAP-ENV:mustUnderstand="true">http://schemas.xmlsoap.org/ws/2005/04/discovery/ProbeMatches</wsa:Action><d:AppSequence SOAP-ENV:mustUnderstand="true" MessageNumber="2" InstanceId="1289460835"></d:AppSequence></SOAP-ENV:Header><SOAP-ENV:Body><d:ProbeMatches><d:ProbeMatch><wsa:EndpointReference><wsa:Address>urn:uuid:00000000-0000-0000-0000-000ffc521bc5</wsa:Address></wsa:EndpointReference><d:Types>dn:NetworkVideoTransmitter</d:Types><d:Scopes>onvif://www.onvif.org/hardware/2MPIPCamera onvif://www.onvif.org/name/2MPIPCamera onvif://www.onvif.org/type/video_analytic onvif://www.onvif.org/type/audio_encoder onvif://www.onvif.org/location/country/taiwan onvif://www.onvif.org/Profile/Streaming onvif://www.onvif.org/type/video_encoder </d:Scopes><d:XAddrs>http://192.168.0.200/onvif
This is just scratching the surface. Those who are running a honeypot but aren't collecting any kind of packet captures may want to consider it. There's much more information waiting for analysis.
[1] https://isc.sans.edu/honeypot.html
[2] https://github.com/cowrie/cowrie
[3] https://isc.sans.edu/data/threatfeed.html
[4] https://isc.sans.edu/api/#port
[5] https://isc.sans.edu/data/port/27015
[6] https://www.w3schools.com/xml/xml_soap.asp
--
Jesse La Grew
Handler
 
              