This repository has been archived by the owner on Jan 28, 2023. It is now read-only.
I've examined why user agent parsing is slow. Here are some tips:
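Robot detection could be done with just a HashMap<String, Robot> keyed by the user agent string. Note that no regexp is involved here. Currently AbstractUserAgentStringParser.examineAsBrowser() walks the whole robot list instead: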
```java
for (final Robot robot : data.getRobots()) {
    if (robot.getUserAgentString().equals(builder.getUserAgentString())) {
        // ...
    }
}
```
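A minimal sketch of the map-based lookup, assuming robots are matched only by exact string equality, as in the loop above. SimpleRobot and RobotIndex are hypothetical stand-ins, not uadetector classes. The index is built once when the data is loaded, so each lookup costs one hash probe instead of a pass over every robot.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the library's Robot class; only what the lookup needs.
final class SimpleRobot {
    private final String userAgentString;
    private final String name;

    SimpleRobot(String userAgentString, String name) {
        this.userAgentString = userAgentString;
        this.name = name;
    }

    String getUserAgentString() {
        return userAgentString;
    }

    String getName() {
        return name;
    }
}

// Hypothetical index built once after loading the data, replacing the linear scan.
final class RobotIndex {
    private final Map<String, SimpleRobot> byUserAgent = new HashMap<>();

    RobotIndex(List<SimpleRobot> robots) {
        for (SimpleRobot robot : robots) {
            // Keep the first robot for a duplicated string, as a scan that stops at the first hit would.
            byUserAgent.putIfAbsent(robot.getUserAgentString(), robot);
        }
    }

    /** Exact-match lookup: O(1) on average instead of O(number of robots). Returns null if absent. */
    SimpleRobot find(String userAgent) {
        return byUserAgent.get(userAgent);
    }
}
```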
Lazy OS detection: the operating system is not always needed, so it could be detected only when it is first requested.
Lazy device detection: same here, the device is not always needed either.
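One way to make OS and device detection lazy is to memoize a Supplier, sketched below. Lazy and LazyParseResult are hypothetical names; the suppliers stand in for whatever OS and device detection the parser runs today. This memoization is not thread-safe, so a result object shared across threads would need synchronization.

```java
import java.util.function.Supplier;

// Runs the wrapped computation on the first get() and caches the result (not thread-safe).
final class Lazy<T> implements Supplier<T> {
    private Supplier<? extends T> computation;
    private T value;

    Lazy(Supplier<? extends T> computation) {
        this.computation = computation;
    }

    @Override
    public T get() {
        if (computation != null) {
            value = computation.get();
            computation = null; // drop the supplier so it never runs twice
        }
        return value;
    }
}

// Hypothetical result type: the browser is detected eagerly, OS and device only on demand.
final class LazyParseResult {
    private final String browser;
    private final Lazy<String> operatingSystem;
    private final Lazy<String> device;

    LazyParseResult(String browser, Supplier<String> detectOs, Supplier<String> detectDevice) {
        this.browser = browser;
        this.operatingSystem = new Lazy<>(detectOs);
        this.device = new Lazy<>(detectDevice);
    }

    String getBrowser() {
        return browser;
    }

    String getOperatingSystem() {
        return operatingSystem.get(); // OS detection runs here, the first time only
    }

    String getDevice() {
        return device.get(); // device detection runs here, the first time only
    }
}
```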
The whole regular expression loop: testing every pattern in turn is probably good for development and maintenance, but not so great for performance. Here is an idea:
We could define an enum of cheap string tests, give each browser pattern an EnumSet of the tests it depends on, and check that set before testing the regex. Example:
EnumTest1: the user agent starts with the string "Mozilla".
If this returns false, don't test any regexp that starts with /^Mozilla.
EnumTest2: the user agent starts with the string "M".
If this returns true, don't test any regexp that starts with /^ but not with /^M.
There are 631 <browser_reg> entries; 150 of them start with /^Mozilla, and 246 start with /^ but not with /^M. These two checks can be implemented without any change to uasdata.
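A sketch of the EnumSet pre-filter, assuming patterns are compiled with Java's CASE_INSENSITIVE and DOTALL flags to mirror /si, so the prefix tests also have to ignore case. PreTest, BrowserPattern and PrefilteredMatcher are hypothetical names. The required tests are derived from each pattern's source text, so uasdata stays unchanged; the derivation is deliberately conservative and gives up on any pattern containing | or an optional first literal (e.g. ^o?mozilla), because those could match without the anchored prefix.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Cheap tests, evaluated once per user agent before any regexp runs.
enum PreTest {
    STARTS_WITH_MOZILLA,   // EnumTest1
    DOES_NOT_START_WITH_M  // EnumTest2 returned false
}

// Hypothetical wrapper for one <browser_reg> entry.
final class BrowserPattern {
    final String name;
    final Pattern regex;
    final Set<PreTest> required; // all must hold for the regexp to have a chance

    BrowserPattern(String name, String source) {
        this.name = name;
        this.regex = Pattern.compile(source, Pattern.CASE_INSENSITIVE | Pattern.DOTALL); // mirrors /si
        this.required = derivePreTests(source);
    }

    // Derived from the pattern text alone, so uasdata needs no new field.
    static Set<PreTest> derivePreTests(String source) {
        EnumSet<PreTest> tests = EnumSet.noneOf(PreTest.class);
        if (source.indexOf('|') >= 0) {
            return tests; // an alternation could bypass the ^ anchor: stay conservative
        }
        if (source.regionMatches(true, 0, "^mozilla", 0, 8) && !isQuantifier(source, 8)) {
            tests.add(PreTest.STARTS_WITH_MOZILLA);
        } else if (source.length() > 1 && source.charAt(0) == '^'
                && Character.isLetterOrDigit(source.charAt(1))
                && Character.toLowerCase(source.charAt(1)) != 'm'
                && !isQuantifier(source, 2)) {
            tests.add(PreTest.DOES_NOT_START_WITH_M);
        }
        return tests;
    }

    // True if a quantifier at index i would make the preceding literal optional.
    private static boolean isQuantifier(String s, int i) {
        return i < s.length() && (s.charAt(i) == '?' || s.charAt(i) == '*' || s.charAt(i) == '{');
    }
}

final class PrefilteredMatcher {
    static int regexRuns = 0; // instrumentation only: counts regexps actually evaluated

    static Set<PreTest> evaluate(String ua) {
        EnumSet<PreTest> holds = EnumSet.noneOf(PreTest.class);
        if (ua.regionMatches(true, 0, "mozilla", 0, 7)) {
            holds.add(PreTest.STARTS_WITH_MOZILLA);
        }
        if (!ua.regionMatches(true, 0, "m", 0, 1)) {
            holds.add(PreTest.DOES_NOT_START_WITH_M);
        }
        return holds;
    }

    /** Name of the first matching pattern, or null; regexps whose pre-tests fail are never run. */
    static String firstMatch(List<BrowserPattern> patterns, String ua) {
        Set<PreTest> holds = evaluate(ua);
        for (BrowserPattern p : patterns) {
            if (!holds.containsAll(p.required)) {
                continue; // this regexp cannot match: skip it
            }
            regexRuns++;
            if (p.regex.matcher(ua).find()) {
                return p.name;
            }
        }
        return null;
    }
}
```

By the counts above, the second test alone would let the parser skip up to 246 of the 631 regexps for any user agent starting with "Mozilla".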
There could also be a list of words that the UA string has to contain. Split the UA string into a hash-based set of words (e.g. a HashSet) and check these rules before the regexp; this would be fast. Example:
/mozilla.*AppleWebKit.*NetFrontLifeBrowser\/([0-9.]+)/si
requiredWords: mozilla, AppleWebKit, NetFrontLifeBrowser
test: if (uaWords.containsAll(requiredWords))
This would probably need a new field for the required words in uasdata.
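A sketch of the required-words filter, with the words passed in directly because the uasdata field does not exist yet. WordFilteredPattern is a hypothetical name. The UA is lower-cased to mirror the i flag and split on anything that is not an ASCII letter or digit; the required words then have to be whole tokens under that same rule, otherwise the filter could reject a UA that the regexp itself would have matched.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical pattern entry carrying the proposed "required words" field.
final class WordFilteredPattern {
    private final Pattern regex;
    private final Set<String> requiredWords;

    WordFilteredPattern(String source, String... requiredWords) {
        this.regex = Pattern.compile(source, Pattern.CASE_INSENSITIVE | Pattern.DOTALL); // mirrors /si
        this.requiredWords = Arrays.stream(requiredWords)
                .map(w -> w.toLowerCase(Locale.ROOT))
                .collect(Collectors.toSet());
    }

    // Lower-cases the UA (to mirror the i flag) and splits it on every non-alphanumeric character.
    static Set<String> words(String userAgent) {
        return Arrays.stream(userAgent.toLowerCase(Locale.ROOT).split("[^a-z0-9]+"))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.toSet());
    }

    /** Evaluates the regexp only when every required word is a token of the UA. */
    boolean matches(Set<String> uaWords, String userAgent) {
        if (!uaWords.containsAll(requiredWords)) {
            return false; // cheap rejection: the regexp is never run
        }
        return regex.matcher(userAgent).find();
    }
}
```

The word set would be computed once per user agent and shared across all patterns, so the cost of splitting is paid only once per parse.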
Regards, Pavel