dClass vs grep
Is dClass faster than grep?
Yes and no: it depends on the grep parameters and optimizations used.
While dClass is not optimized for parsing files, it can still outperform grep.
Consider the following example:
dClass (2670 patterns with attributes)
$ time ./dclass_client -l ../dtrees/openddr.dtree /tmp/uas.txt > /dev/null
real 0m0.086s
user 0m0.056s
sys 0m0.028s
grep (3 patterns without attributes)
$ time grep -iE "iphone|ipad|blackberry" /tmp/uas.txt > /dev/null
real 0m0.112s
user 0m0.100s
sys 0m0.008s
(Note that this example was run on a free-tier EC2 instance running Ubuntu 12.04.2 LTS.)
So what is going on in this example? Both binaries are searching /tmp/uas.txt. This file contains approximately 8000 user agent strings, each around 100 bytes long. In this example, dClass is matching this file against the DeviceMap DDR, which contains 2670 device patterns. For each match, dClass returns the associated pattern attributes. Grep is matching the same file against only 3 patterns, and for each match it only prints the line. So dClass is matching against roughly 890 times as many patterns (2667 more, to be exact), close to three orders of magnitude, with a custom response for each pattern.
While dClass is heavily optimized as a pattern matching library, it is not optimized for command line usage. dClass loads and indexes its dtree pattern configuration from plain text on startup. This takes 22 ms, about 25% of its total runtime. dClass also uses plain file IO (stdio fopen() and fgets()) to read and parse the input file.
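To put the startup cost in perspective, the arithmetic can be worked through with the figures quoted above. This is only a back-of-the-envelope sketch: the line count is the approximate 8000 from the text, and the time left after startup also covers file IO and process overhead, not just pattern matching.

```python
# Figures taken from the timing example above.
total_ms = 86.0      # "real" time reported for dclass_client
startup_ms = 22.0    # time spent loading and indexing the dtree
lines = 8000         # approximate number of user agent strings in the file

# Share of the run spent on startup (about 25.6%).
startup_share = startup_ms / total_ms

# Average time per input line once startup is excluded (8.0 microseconds).
# This is an upper bound on matching cost: it still includes fgets() and exit.
per_line_us = (total_ms - startup_ms) * 1000 / lines

print(f"startup share: {startup_share:.1%}")
print(f"per-line cost: {per_line_us:.1f} us")
```

A long-running process that loads the dtree once would pay the 22 ms only at startup, which is why the command line numbers understate the library's matching speed.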
Does dClass look at every byte of input? No. In this example, dClass only looks at approx 25% of the input data. This is a feature of dClass. Because dClass has all of its patterns in an index, it walks the pattern index and the input data in lockstep. This allows it to immediately identify and skip dead patterns. This is also helped by the fact that the DeviceMap DDR is a concise pattern set (its patterns are not verbose).
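The lockstep idea can be illustrated with a small trie. This is a minimal sketch, not dClass's actual index or algorithm: `build_trie` and `match_tokens` are hypothetical names, the three patterns are borrowed from the grep example, and the tokenizer here is a regular expression that itself still scans every character. The counter only tracks how many characters are compared against the pattern index.

```python
import re


def build_trie(patterns):
    """Build a nested-dict trie; the key None marks the end of a pattern."""
    root = {}
    for pattern in patterns:
        node = root
        for ch in pattern.lower():
            node = node.setdefault(ch, {})
        node[None] = pattern
    return root


def match_tokens(trie, line):
    """Return (matched patterns, characters compared against the trie)."""
    matches = []
    compared = 0
    # Split the line into lowercase alphanumeric tokens.
    for token in re.findall(r"[a-z0-9]+", line.lower()):
        node = trie
        for ch in token:
            compared += 1
            node = node.get(ch)
            if node is None:
                break  # no pattern continues with this character: skip the token
            if None in node:
                matches.append(node[None])  # a complete pattern was reached
                break
    return matches, compared


trie = build_trie(["iphone", "ipad", "blackberry"])
line = "Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26"
matches, compared = match_tokens(trie, line)
print(matches, compared, len(line))
```

Most tokens are abandoned after a single comparison because no pattern starts with their first character, so the index is consulted for only a fraction of the line. A pattern set with short, distinctive prefixes makes this early exit more frequent.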