March 18, 2012

slicehosts: Extract host-based traffic out of pcap dumps

Filed under: Blog — krkhan @ 2:56 pm

During the course of my work on botnet security, we have had to deal with mammoth traffic traces captured at a local ISP. While analyzing the traffic, we needed to extract the traffic of certain hosts out of large pcap files. The obvious solution would be to run tshark once per host, filtering the traffic for that particular IP and writing it to a separate pcap file. However, with the number of hosts approaching the thousands and the pcap traces approaching terabytes in size, tshark didn’t really fit the bill: every additional host would mean another full pass over the entire trace.

Initially I thought of writing a splitter in Python, but my colleague’s aversion to using Python on large network traces, coupled with the poorly maintained state of Python’s libpcap bindings, led me to use C and libpcap directly. The new C-based slicer is available at our GitHub repository. It does need glib to compile, though, since I needed a hash table implementation for maintaining the list of hosts to be sliced. The Makefile in the repository should take care of compiling with the appropriate flags.

As for performance, the slicing speed is limited only by libpcap’s own read/write throughput, since most of the remaining per-packet work is done in constant time. Slicing 1019 hosts out of a 180 GB pcap file took only 71 minutes (about 1.2 hours) on a 2.5 GHz CPU, an average of more than 40 MB/s. In other words, it’s lightning fast.
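To make the constant-time claim concrete, here is a minimal C sketch of the per-packet address extraction. It assumes Ethernet frames without VLAN tags and handles IPv4 only; the function name `packet_ipv4_addrs` is hypothetical and not taken from the slicehosts source. Reading both addresses costs a few fixed-offset loads regardless of trace size or host count, which leaves one hash-table lookup per address as the only other work.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from slicehosts): pull the IPv4 source and
 * destination addresses out of a raw Ethernet frame, in host byte order.
 * Assumes an Ethernet link layer with no VLAN tag. Returns 0 on success,
 * -1 if the frame is too short or does not carry IPv4. The work is a
 * handful of fixed-offset reads, so it is O(1) per packet. */
int packet_ipv4_addrs(const uint8_t *frame, size_t caplen,
                      uint32_t *src, uint32_t *dst)
{
    enum { ETH_HDR_LEN = 14, IPV4_MIN_LEN = 20 };

    assert(frame != NULL && src != NULL && dst != NULL);
    if (caplen < ETH_HDR_LEN + IPV4_MIN_LEN)
        return -1;

    /* EtherType sits at bytes 12-13; 0x0800 means IPv4. */
    if (frame[12] != 0x08 || frame[13] != 0x00)
        return -1;

    const uint8_t *ip = frame + ETH_HDR_LEN;
    if ((ip[0] >> 4) != 4)              /* version nibble must be 4 */
        return -1;

    /* Source address: IPv4 header bytes 12-15; destination: bytes 16-19. */
    *src = ((uint32_t)ip[12] << 24) | ((uint32_t)ip[13] << 16) |
           ((uint32_t)ip[14] << 8)  |  (uint32_t)ip[15];
    *dst = ((uint32_t)ip[16] << 24) | ((uint32_t)ip[17] << 16) |
           ((uint32_t)ip[18] << 8)  |  (uint32_t)ip[19];
    return 0;
}
```

In a libpcap program, the natural call site would be the `pcap_loop()` callback, passing the packet bytes and `h->caplen` from its `struct pcap_pkthdr`. Checking the link type with `pcap_datalink()` (expecting `DLT_EN10MB`) before relying on the 14-byte offset would be prudent.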

Right now the slicer does its job well enough. If someone wants to package it, I’d prefer to remove the glib dependency in favor of glibc’s own hash table implementation (hsearch() in search.h). In any case, I hope it proves helpful to other people playing with large pcap files.
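For reference, a minimal sketch of what the search.h route might look like, assuming hosts are given as dotted-quad strings. `host_set_init`, `host_set_add` and `host_set_contains` are hypothetical names, not part of slicehosts; `hcreate()`, `hsearch()` and the `ENTRY`/`ACTION` types are the standard POSIX interface that glibc provides.

```c
#define _XOPEN_SOURCE 700   /* hsearch() and strdup() are XSI/POSIX */
#include <assert.h>
#include <search.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of a host set built on POSIX hcreate()/hsearch(), as a possible
 * replacement for a glib hash table. Function names are hypothetical.
 * Keys are dotted-quad strings such as "192.168.1.10". */

/* Create the single, process-wide table. Returns 0 on success. */
int host_set_init(size_t max_hosts)
{
    /* hcreate() treats its argument as an estimate of the maximum entry
     * count; leave headroom so ENTER does not fail near capacity. */
    return hcreate(max_hosts * 2 + 1) ? 0 : -1;
}

/* Add one host. Returns 0 on success (including duplicates), -1 on error. */
int host_set_add(const char *ip)
{
    assert(ip != NULL);
    ENTRY probe = { .key = (char *)ip, .data = NULL };
    if (hsearch(probe, FIND) != NULL)
        return 0;                       /* already present */

    /* hsearch() stores the key pointer rather than a copy, so the key must
     * outlive the table; this sketch duplicates it and never frees it. */
    char *key = strdup(ip);
    if (key == NULL)
        return -1;
    ENTRY item = { .key = key, .data = NULL };
    if (hsearch(item, ENTER) == NULL) {
        free(key);
        return -1;                      /* table full */
    }
    return 0;
}

/* Return 1 if the IPv4 address (host byte order) is in the set, else 0. */
int host_set_contains(uint32_t addr)
{
    char buf[16];                       /* "255.255.255.255" plus NUL */
    snprintf(buf, sizeof buf, "%u.%u.%u.%u",
             (unsigned)(addr >> 24), (unsigned)((addr >> 16) & 0xff),
             (unsigned)((addr >> 8) & 0xff), (unsigned)(addr & 0xff));
    ENTRY probe = { .key = buf, .data = NULL };
    return hsearch(probe, FIND) != NULL;
}
```

One design wrinkle: `hcreate()` manages a single process-wide table, which is enough here since only one host list is needed; glibc’s `hcreate_r()`/`hsearch_r()` are the GNU extensions for keeping several tables. The `data` field of `ENTRY`, left `NULL` above, is where a per-host `pcap_dumper_t *` could be stored so that one lookup both filters a packet and finds its output file.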


  1. Love this code :-))

    By the way, I couldn’t build it the usual way with make, so I compiled it by hand like this:
    gcc slicehosts.c `pkg-config gtk+-2.0 --cflags --libs` -lpcap

    Maybe I should do a post on how doing this simple task in Python gave me a headache, despite using two different modules (Pypcap and Dpkt).

    Comment by Sheharbano — April 9, 2012 @ 9:30 pm

  2. Kami bhai, I might start blogging again, and you told me to let you know if I wanted a new domain. Don’t like the name I have right now either. Haha. If you can do something, email or comment on my blog.

    Comment by Abdullah Tariq — May 19, 2012 @ 2:30 am

  3. Thanks for the script, kamran bhai.

    Comment by zaafar ahmed — December 1, 2012 @ 12:30 pm
