Performance improvements #111

PavelCibulka · 2015-04-09T08:58:55Z

I've looked into why user agent parsing is slow. Here are some suggestions:

The robot lookup could be done with just a HashMap<String, Robot> instead of a linear scan. Note that there is no regexp here, only an exact string comparison.
AbstractUserAgentStringParser.examineAsBrowser()
for (final Robot robot : data.getRobots()) {
    if (robot.getUserAgentString().equals(builder.getUserAgentString())) {
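
A minimal sketch of the map-based lookup, assuming Java 8+. `RobotEntry` and `RobotIndex` are hypothetical stand-ins, not the library's own classes; only `getUserAgentString()` mirrors the accessor used in the loop above. The index is built once when the data is loaded, so each lookup is a single hash probe instead of a scan over all robots.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the library's Robot type.
final class RobotEntry {
    private final String userAgentString;
    private final String name;

    RobotEntry(String userAgentString, String name) {
        this.userAgentString = userAgentString;
        this.name = name;
    }

    String getUserAgentString() {
        return userAgentString;
    }

    String getName() {
        return name;
    }
}

// Hypothetical index keyed by the exact user agent string.
final class RobotIndex {
    private final Map<String, RobotEntry> byUserAgent = new HashMap<>();

    RobotIndex(List<RobotEntry> robots) {
        for (RobotEntry robot : robots) {
            // Keep the first entry for a given string, which is what
            // the linear scan would have returned.
            byUserAgent.putIfAbsent(robot.getUserAgentString(), robot);
        }
    }

    // Returns the matching robot, or null if the string is not a known robot.
    RobotEntry find(String userAgentString) {
        return byUserAgent.get(userAgentString);
    }
}
```

The index would have to be rebuilt whenever the underlying data is reloaded.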

Lazy OS detection. The OS is not always needed.
Lazy device detection. Same here: the device is not always needed.
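
One way to defer both detections is a memoizing supplier, sketched below under the assumption of Java 8+. `Lazy` and `LazyAgent` are hypothetical names, and the two detectors are passed in as plain suppliers rather than calling any real parser code.

```java
import java.util.function.Supplier;

// Hypothetical helper: computes the value on first access, then caches it.
final class Lazy<T> implements Supplier<T> {
    private Supplier<T> loader;
    private T value;
    private boolean computed;

    Lazy(Supplier<T> loader) {
        this.loader = loader;
    }

    @Override
    public synchronized T get() {
        if (!computed) {
            value = loader.get();
            computed = true;
            loader = null; // the detector is no longer needed
        }
        return value;
    }
}

// Hypothetical result object: OS and device are detected only when asked for.
final class LazyAgent {
    private final Lazy<String> operatingSystem;
    private final Lazy<String> device;

    LazyAgent(Supplier<String> osDetector, Supplier<String> deviceDetector) {
        this.operatingSystem = new Lazy<>(osDetector);
        this.device = new Lazy<>(deviceDetector);
    }

    String getOperatingSystem() {
        return operatingSystem.get();
    }

    String getDevice() {
        return device.get();
    }
}
```

A caller that only reads the browser name never pays for the OS or device regexes; a caller that reads the OS twice pays for it once.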

The whole regular expression loop. This is probably good for development and maintenance, but not so great for performance. Here is an idea:
We can define an enum of cheap tests, give each browser an EnumSet of them, and check that EnumSet before testing the browser's regex. Example:
EnumTest1: the user agent starts with the string "Mozilla"
If this returns false, don't test any regexp that starts with /^Mozilla

EnumTest2: the user agent starts with the string "M"
If this returns true, don't test any regexp starting with /^ but not starting with /^M

There are 631 <browser_reg> entries; 150 start with /^Mozilla and 246 start with /^ but not with /^M. These two checks can be implemented without any change to uasdata.
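
The two checks above could be sketched roughly as follows. All names here (`Precheck`, `GuardedPattern`, `PrefilteredMatcher`) are hypothetical, and the set of prechecks for each pattern is supplied by hand; deriving it safely from the regex text is left out. Since the uasdata expressions carry the `i` flag, the sketch makes the prefix tests case-insensitive as well, which is an assumption added here and not part of the proposal.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Cheap tests that are evaluated once per user agent string.
enum Precheck {
    STARTS_WITH_MOZILLA {
        @Override
        boolean test(String userAgent) {
            return userAgent.regionMatches(true, 0, "Mozilla", 0, 7);
        }
    },
    DOES_NOT_START_WITH_M {
        @Override
        boolean test(String userAgent) {
            return !userAgent.regionMatches(true, 0, "M", 0, 1);
        }
    };

    abstract boolean test(String userAgent);

    // Returns the set of prechecks that hold for this user agent.
    static Set<Precheck> evaluate(String userAgent) {
        EnumSet<Precheck> passed = EnumSet.noneOf(Precheck.class);
        for (Precheck check : values()) {
            if (check.test(userAgent)) {
                passed.add(check);
            }
        }
        return passed;
    }
}

// A regex plus the prechecks that must hold before it is worth running.
final class GuardedPattern {
    final Pattern pattern;
    final Set<Precheck> required;

    GuardedPattern(String regex, Set<Precheck> required) {
        // CASE_INSENSITIVE and DOTALL correspond to the /si flags.
        this.pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        this.required = required;
    }
}

final class PrefilteredMatcher {
    // Patterns whose prechecks all hold; only these need a regex test.
    static List<GuardedPattern> candidates(String userAgent, List<GuardedPattern> all) {
        Set<Precheck> passed = Precheck.evaluate(userAgent);
        List<GuardedPattern> result = new ArrayList<>();
        for (GuardedPattern guarded : all) {
            if (passed.containsAll(guarded.required)) {
                result.add(guarded);
            }
        }
        return result;
    }

    // First pattern, in list order, that passes its prechecks and matches.
    static GuardedPattern firstMatch(String userAgent, List<GuardedPattern> all) {
        for (GuardedPattern guarded : candidates(userAgent, all)) {
            if (guarded.pattern.matcher(userAgent).find()) {
                return guarded;
            }
        }
        return null;
    }
}
```

A pattern anchored on /^Mozilla would carry `STARTS_WITH_MOZILLA`, a pattern anchored on any other literal first character would carry `DOES_NOT_START_WITH_M`, and unanchored patterns would carry an empty set so they are always tested.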

There could also be a list of words that the UA string has to contain. Split the UA string into a HashSet of words and check these rules before running the regexp. This would be fast. Example:
/mozilla.*AppleWebKit.*NetFrontLifeBrowser\/([0-9.]+)/si
requiredWords: mozilla, AppleWebKit, NetFrontLifeBrowser
test: if ( hashSet.containsAll( requiredWords ) )
This would probably need a new field for required words in uasdata.
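
A rough sketch of the required-words check. `RequiredWordsFilter` is a hypothetical name, and the tokenization rule (lowercase, split on anything that is not a letter or digit) is an assumption, not something uasdata defines. The filter is only safe if every required word always appears as a whole token in matching UA strings; a word that the regex could match inside a longer token must not be listed.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

final class RequiredWordsFilter {
    // Assumed token separator: any run of characters other than a-z and 0-9.
    private static final Pattern SEPARATORS = Pattern.compile("[^a-z0-9]+");

    // Splits the user agent string into a set of lowercase words, once per string.
    static Set<String> tokenize(String userAgent) {
        String lower = userAgent.toLowerCase(Locale.ROOT);
        Set<String> words = new HashSet<>(Arrays.asList(SEPARATORS.split(lower)));
        words.remove(""); // split() can yield an empty leading token
        return words;
    }

    // Returns false if the regex cannot match because a required word is missing.
    static boolean mayMatch(Set<String> words, List<String> requiredWords) {
        for (String required : requiredWords) {
            if (!words.contains(required.toLowerCase(Locale.ROOT))) {
                return false;
            }
        }
        return true;
    }
}
```

The word set is computed once per UA string and reused for every browser entry, so each entry costs a few hash lookups before its regex is considered.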

Regards, Pavel

arouel · 2015-04-11T08:43:14Z

@PavelCibulka sounds good. Would you do a Pull Request that prototypes your proposed changes?

arouel added the enhancement label Apr 11, 2015
