The Anonymization Process

The anonymization process is quite similar to what computers and other network nodes do when they send and receive packets and frames. TraceWrangler has a parsing engine that dissects each frame from Ethernet upward using "parsers" (Ethernet is currently the only layer 2 protocol for which a parser exists). Each protocol or layer is processed by a specific parser that understands that protocol or layer and can extract information from it. It is important to know that parsers may not understand all aspects of a protocol; for example, exotic TCP options are not parsed, so the information in those options is not accessible. So basically, TraceWrangler takes a frame apart layer by layer, stopping only when there is nothing more to parse or when it doesn't have a parser for the next protocol. Right now, DHCPv4 is the only protocol above layer 4 that it can handle, as you can tell from the list of supported protocols. Any data that remains after the last layer TraceWrangler could parse is kept as payload in a byte buffer.
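
The layer-by-layer idea can be sketched in a few lines of Python. This is only an illustration, not TraceWrangler's actual code: the function names, the PARSERS table and the returned dictionaries are all made up for the example, and only Ethernet, IPv4 and UDP are covered. Each parser returns its fields, a key that selects the next parser, and the remaining bytes; dissection stops as soon as no parser is registered for the next key, and whatever is left is kept as payload.

```python
import struct

def parse_ethernet(data):
    # 14-byte header: destination MAC, source MAC, EtherType
    if len(data) < 14:
        return None
    dst, src, ethertype = struct.unpack("!6s6sH", data[:14])
    fields = {
        "dst": ":".join(f"{b:02x}" for b in dst),
        "src": ":".join(f"{b:02x}" for b in src),
        "ethertype": ethertype,
    }
    return fields, ("ethertype", ethertype), data[14:]

def parse_ipv4(data):
    # Minimum header is 20 bytes; the IHL field gives the real header length
    if len(data) < 20:
        return None
    header_len = (data[0] & 0x0F) * 4
    if data[0] >> 4 != 4 or header_len < 20 or len(data) < header_len:
        return None
    fields = {
        "src": ".".join(map(str, data[12:16])),
        "dst": ".".join(map(str, data[16:20])),
        "protocol": data[9],
    }
    return fields, ("ipproto", data[9]), data[header_len:]

def parse_udp(data):
    # 8-byte header: source port, destination port, length, checksum
    if len(data) < 8:
        return None
    sport, dport, _length, _checksum = struct.unpack("!HHHH", data[:8])
    return {"sport": sport, "dport": dport}, ("udpport", dport), data[8:]

# Hypothetical lookup table: "which parser handles the next layer?"
PARSERS = {
    ("link", "ethernet"): ("Ethernet", parse_ethernet),
    ("ethertype", 0x0800): ("IPv4", parse_ipv4),
    ("ipproto", 17): ("UDP", parse_udp),
}

def dissect(frame):
    """Return (parsed layers, unparsed payload bytes)."""
    layers = []
    key = ("link", "ethernet")
    rest = frame
    while key in PARSERS:
        name, parser = PARSERS[key]
        result = parser(rest)
        if result is None:  # header truncated or malformed: stop here
            break
        fields, key, rest = result
        layers.append((name, fields))
    return layers, rest
```

Calling dissect() on an Ethernet/IPv4/UDP frame yields three layers plus the UDP payload as leftover bytes; a frame with an EtherType that has no entry in the table yields only the Ethernet layer, with everything after it kept as payload.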

Other sanitization tools

The biggest difference between TraceWrangler and other trace file sanitization tools is that everything it does is focused on keeping the result as true to the original as possible, so that you can still perform network analysis (and network forensics, hopefully) on it. For example, it will not replace an IP address like that of the Google DNS ("8.8.8.8") by blindly "rolling the dice", which could end up producing something like "127.0.0.1", a loopback address that is never seen on a real network and that would make any kind of analysis of the packets impossible. It will also try to keep the frame sizes as close to the original as possible, and it will not mess around with the timings. At all. If it does, it's a bug.
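
To illustrate what an analysis-friendly replacement could look like, here is a small Python sketch. It is an assumption for illustration only and does not describe how TraceWrangler actually picks replacement addresses: the class name AddressMapper and the default target network 10.0.0.0/8 are invented. The point is simply that the same original address always maps to the same replacement, and that replacements come from a range you would plausibly see on a network, so conversations stay traceable.

```python
import ipaddress

class AddressMapper:
    """Hypothetical consistent IPv4 replacement (not TraceWrangler's algorithm)."""

    def __init__(self, network="10.0.0.0/8"):
        # Hand out usable host addresses of the target network in order
        self._hosts = iter(ipaddress.ip_network(network).hosts())
        self._mapping = {}

    def replace(self, original):
        addr = ipaddress.ip_address(original)
        if addr not in self._mapping:
            # First time this address is seen: assign the next free host
            self._mapping[addr] = next(self._hosts)
        return str(self._mapping[addr])
```

With this sketch, replacing "8.8.8.8" twice gives the same result both times, while a different original address gets a different replacement.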

Parsers

Parsers are the modules that dissect the frame content. I could have called them dissectors (like Wireshark does), but since they are a completely different code base I decided to call them parsers to avoid confusion. Wireshark is the king of dissectors, and I'll never get even close to what those can do :-)

After parsing the layers, TraceWrangler will reassemble the frame from top to bottom, in the reverse order of parsing. This is necessary because some layers depend on information from higher layers. For example, TCP needs to calculate its checksum over the TCP payload as well as its own header. Also, this is the only way that layers can be left out (like VLAN tags), and the only way to perform Defensive Transformation. That means that nothing that TraceWrangler did not understand while parsing a layer will be written to the new frame (unless you configured it to). For example, an exotic TCP option will not make it into the sanitized frame because TraceWrangler can't be sure whether it contains sensitive data.
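
The dependency on higher layers can be illustrated with the TCP checksum, which for IPv4 covers a pseudo-header (source and destination address, protocol number, TCP length), the TCP header and the payload. The Python sketch below follows the standard Internet checksum calculation; it is not taken from TraceWrangler, and the function names and argument layout are chosen just for this example. Note that the (possibly sanitized) IP addresses and the final payload must already be known before the checksum can be written into the TCP header.

```python
import struct

def internet_checksum(data):
    """16-bit one's complement sum as used by IPv4, TCP and UDP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def tcp_checksum(src_ip, dst_ip, tcp_header, payload):
    """src_ip and dst_ip are 4-byte packed IPv4 addresses."""
    # The checksum field (header bytes 16-17) counts as zero while computing
    segment = tcp_header[:16] + b"\x00\x00" + tcp_header[18:] + payload
    # Pseudo-header: addresses, zero byte, protocol 6 (TCP), TCP length
    pseudo = src_ip + dst_ip + struct.pack("!BBH", 0, 6, len(segment))
    return internet_checksum(pseudo + segment)
```

Changing a single payload byte, or either IP address, changes the result, which is why the TCP layer can only be finished once everything above it has been assembled.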

Assemblers

The modules TraceWrangler uses to create the new, sanitized layers are called "assemblers". To be able to sanitize a protocol or a layer, both the corresponding parser and the corresponding assembler must exist. Right now there are some protocols for which I have parsers but no assemblers yet, e.g. DNS. So until I find the time to code the corresponding assembler, DNS cannot be sanitized.
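
The "parser plus assembler" rule can be written down as a tiny Python check. The two sets below are purely illustrative and are not TraceWrangler's real list of supported protocols; in particular, the assumption that DHCPv4 has an assembler is mine. Only the DNS case (parser present, assembler missing) comes from the text above.

```python
# Illustrative sets only, NOT the real list of supported protocols
PARSERS = {"Ethernet", "DHCPv4", "DNS"}
ASSEMBLERS = {"Ethernet", "DHCPv4"}

def can_sanitize(protocol):
    # A protocol can be sanitized only if it can be both parsed and rebuilt
    return protocol in PARSERS and protocol in ASSEMBLERS

def missing_assemblers():
    # Protocols that can be read but not yet written back
    return sorted(PARSERS - ASSEMBLERS)
```

Here can_sanitize("DNS") is False because only the parser half exists, and missing_assemblers() lists exactly the protocols that are in that state.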