sbserver differs from online/browser lookup? #30

serpiente · 2016-08-31T15:28:42Z

I have noticed that sbserver returns an empty response for some URLs, while the Chrome browser and the online lookup tool ( https://www.google.com/transparencyreport/safebrowsing/diagnostic/ ) do return a correct danger response. I have checked, and the server is updating its list. Does anyone know what is happening?

A sample url for which this happens.

http://www.precision-mouldings.com/.ls/.https:/.www.paypal.co.uk/uk.web.apps.mpp.home.sign.in.country.a.GB.locale.a.en.GB-6546refhs8ehgf8-890b7fefut9546954543ds867hgf9-1egey3ds4820435t546ggc-u4ydstgu5438gjksssGB/plmgeo.php

dsnet · 2016-08-31T17:40:54Z

Thanks for the bug report. We'll look into it shortly.

gliwka · 2016-11-23T17:16:30Z

Anything new on this?

Heavenwalker · 2017-01-26T20:42:07Z

Bumping this... Anything new?

asieira · 2017-05-02T17:35:05Z

I am having this exact same issue with the following URL:

https://www.google.com/transparencyreport/safebrowsing/diagnostic/#url=https://resolve-paypal.com-resolve-costumer.net/*id/webapps/a37f8/websrc

asieira · 2017-05-02T19:23:54Z

Just wanted to confirm that sblookup also reports this URL as safe:

| => echo "https://resolve-paypal.com-resolve-costumer.net/*id/webapps/a37f8/websrc%E2%80%9D" | sblookup -apikey '<redacted>'
safebrowsing: 2017/05/02 16:18:26 database.go:106: no database file specified
safebrowsing: 2017/05/02 16:18:30 database.go:336: database is now healthy
safebrowsing: 2017/05/02 16:18:30 safebrowser.go:504: Next update in 30m11s
Safe URL: https://resolve-paypal.com-resolve-costumer.net/*id/webapps/a37f8/websrc%E2%80%9D

Plus, this is the output of the test as indicated in the README file:

| => go test github.com/google/safebrowsing -v -run TestSafeBrowser -apikey '<redacted>'
=== RUN   TestSafeBrowser
--- PASS: TestSafeBrowser (0.78s)
PASS
ok  	github.com/google/safebrowsing	0.933s

Finally, I can confirm that there is no problem with my API key, since I can successfully query this URL using https://github.com/afilipovich/gglsbl on the same machine:

| => python
Python 2.7.13 (default, Dec 18 2016, 07:03:39)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from gglsbl import SafeBrowsingList
>>> sbl = SafeBrowsingList('<redacted>')
>>> sbl.update_hash_prefix_cache()
>>> sbl.lookup_url('https://resolve-paypal.com-resolve-costumer.net/*id/webapps/a37f8/websrc')
[SOCIAL_ENGINEERING/OSX/URL]

asieira · 2017-05-03T14:13:15Z

With further testing I noticed that when I specified a database file for sbserver and sblookup, the created file is only 6 megabytes. In comparison, the gglsbl Python module creates a local sqlite database that is over 1.4 gigs in size.

So maybe what's happening here is that the go client is silently failing to download and/or save the hash database locally.

asieira · 2017-05-15T19:57:48Z

Just wanted to share that the lack of feedback on this issue has led me to file this repository under "abandonware".

I am using https://github.com/afilipovich/gglsbl instead. It works great, is fast and the author is very very responsive to reported issues. Would recommend that @serpiente, @gliwka and @Heavenwalker take a look at this alternative too if they haven't found another already.

gliwka · 2017-05-15T21:35:05Z

@asieira Thanks for the hint! Unfortunately I need the REST API, although it should be possible to combine gglsbl with Flask to get there.

@dsnet @colonelxc Any progress on this? Sbserver isn't working correctly at this point, and the worst part is that it's failing silently! This could leave applications depending on it, and their users, vulnerable!

gliwka · 2017-05-15T21:49:12Z

/cc: @alexwoz

asieira · 2017-05-15T21:49:34Z

I have actually built a Flask + gunicorn dockerized REST server on top of gglsbl and was planning on open sourcing it. Would that help?

gliwka · 2017-05-15T21:52:43Z

@asieira Sure, that would be amazing :-)

dsnet · 2017-05-15T22:06:13Z

I do not work in this team anymore, but I can assure you that this project is not abandonware.

alexwoz · 2017-05-18T18:53:45Z

Hi everyone,

Thank you for all of your contributions to this repo and your patience while we investigated -- based on your reports/comments we've been able to clarify the issue.

As part of our API, some clients receive a different list of threats due to data sharing restrictions. This is why you may see discrepancies between the Go client and Safe Browsing-enabled browsers like Chrome. Upon investigating the bugs filed in this repo, we realized that there was a different problem afoot - a bug on the server-side - that will be patched in the coming weeks.

Thanks,
Alex

hbakhtiyor · 2017-05-21T01:44:19Z

@asieira any updates?

asieira · 2017-06-05T05:16:36Z

Finally published the repo I had talked about before, you can find it at https://github.com/mlsecproject/gglsbl-rest if you want to try it out. Any comments and suggestions are most welcome.

gliwka · 2017-09-11T12:45:42Z

@alexwoz @colonelxc
Any update on this issue? It's been a year since this issue was created.

alexwoz · 2017-10-30T03:42:06Z

@gliwka This issue should be resolved. Please update this bug if you continue to experience any inconsistencies.

wjgilmore · 2017-11-01T20:42:06Z

I'm running into the same issues described by other users who commented earlier in this issue thread. Notably, if I use https://transparencyreport.google.com/safe-browsing/search to search for a known malware URL such as 999fitness.com I'm correctly told "Some pages on this site are unsafe".

Yet when I use Postman/cURL/sblookup to classify 999fitness.com I receive an "empty" 200 response, indicating there is nothing wrong with the URL.

When I use the Google API Explorer (https://developers.google.com/apis-explorer/?hl=en_US#p/safebrowsing/v4/safebrowsing.threatMatches.find) to classify the same URL, it just "spins" endlessly. As of right now the explorer has been running for 23 minutes without actually returning a response.

Reviewing the Google Cloud Platform API monitor, I'm told everything is just fine, and every one of my queries returned a 200.

I was going to post a question on the Google Safe Browsing API forum (https://groups.google.com/forum/#!forum/google-safe-browsing-api) but ironically it is full of spam.

Not complaining; just trying to figure out what exactly is going on with this service.

Jason

colonelxc · 2017-11-01T21:32:00Z

@wjgilmore

I see the same problem as you with the API Explorer. I have created an internal bug with the applicable team.
Regarding the transparency report, as compared to the safebrowsing lookup, there are some slight differences in utility and function. It is best explained with an example.

| URL | API lookup | Transparency Report |
| --- | --- | --- |
| foo.com | Safe | Some pages unsafe |
| foo.com/bad/ | Malware | This page unsafe/Malware |
| foo.com/bad/baz/ | Malware | This page unsafe/Malware |
| foo.com/good/ | Safe | Safe |

Essentially the API is focused on answering the question, "Do we think it is safe to go to this site right now?". For foo.com, it is. The malware was on a different (more specific) path (or subdomain). This often happens when a site has been hacked. The attacker will add their own content and redirect users from other sites to the specific path/subdomain. This sometimes has no impact on the rightful content of the site, and so we try to minimize the scope of what is blocked to only the paths that will actually try to infect you.

The transparency report does API-style checks, but it also checks if there are more specific paths/subdomains that are known to be bad. So for the second and third URLs, it is responding the same as the API does. For the first URL, it knows that there are more specific paths that are known to be bad. So it says some pages are unsafe, even though foo.com is fine to visit on its own.
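
To make the table above concrete, here is a rough sketch of the host-suffix/path-prefix expansion that a Safe Browsing v4 client performs before checking its local list. It is a simplification under stated assumptions: canonicalization, IP-address hosts, and hashing are skipped (a real client compares SHA-256 hash prefixes of these expressions, not strings), and the names `lookup_expressions`, `is_listed`, and `BLOCKED` are hypothetical, not part of sbserver or gglsbl.

```python
from urllib.parse import urlsplit


def lookup_expressions(url):
    """Return simplified host-suffix/path-prefix expressions for a URL.

    Assumption: canonicalization, IP-address hosts and hashing are skipped.
    """
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.hostname or ""
    path = parts.path or "/"

    # Hosts: the exact host plus up to four suffixes taken from the last five
    # labels, stopping before the bare top-level domain.
    labels = host.split(".")
    hosts = [host]
    for i in range(max(1, len(labels) - 5), len(labels) - 1):
        hosts.append(".".join(labels[i:]))

    # Paths: exact path with query, exact path, then "/" and up to three
    # leading directories, each with a trailing slash.
    paths = [path + "?" + parts.query] if parts.query else []
    paths.append(path)
    segments = [s for s in path.split("/") if s]
    if not path.endswith("/"):
        segments = segments[:-1]  # last component is a file, not a directory
    prefixes = ["/"]
    for segment in segments[:3]:
        prefixes.append(prefixes[-1] + segment + "/")
    for prefix in prefixes:
        if prefix not in paths:
            paths.append(prefix)

    return [h + p for h in hosts for p in paths]


def is_listed(url, blocked_expressions):
    """True if any expression derived from the URL is on the (toy) list."""
    return any(e in blocked_expressions for e in lookup_expressions(url))


# Hypothetical list containing only the hacked path from the table.
BLOCKED = {"foo.com/bad/"}
for url in ["foo.com", "foo.com/bad/", "foo.com/bad/baz/", "foo.com/good/"]:
    print(url, "->", "Malware" if is_listed(url, BLOCKED) else "Safe")
```

The expansion only walks from the given URL up toward the root, never down into more specific paths, which is why a lookup for foo.com alone does not see the entry for foo.com/bad/.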

Does that help?

alexwoz · 2017-11-01T21:37:58Z

Hi @wjgilmore,

Thanks for your message, and apologies for the confusion. I can see why the Transparency Report wording and Safe Browsing API responses appear to contradict one another. The Transparency Report communicates the extent to which the provided site is bad; in this case, the site is only "partially" bad ("Some pages on this site..."). The Safe Browsing API, however, will only return a verdict when the provided URL is definitively bad; i.e. we have determined that all URLs (including the root domain) are unsafe for a user to access.

Hopefully that makes sense!

Alex

wjgilmore · 2017-11-02T13:01:30Z

Hi @colonelxc and @alexwoz Thank you both for these detailed explanations. To summarize:

The Transparency Report is useful for determining whether a URL (and its associated siblings/children/parents/grandparents) is "safe".
The Safe Browsing API is useful for determining whether a specific URL is safe.

Is my understanding correct? Our project attempts to determine whether any URLs found in an incoming text message are potentially dangerous links (phishing, malware, etc.). We were under the impression the Safe Browsing API would offer an ideal solution. However, it is certainly possible that a URL found in a text message would be "safe" yet ultimately lead the unsuspecting user to a dangerous endpoint. So it sounds like we're going to have to look for an alternative solution.

Thanks again, I really appreciate your time.

Jason

alexwoz · 2017-11-02T19:11:19Z

Hey @wjgilmore,

As @colonelxc mentioned, the Safe Browsing API answers the question of whether the provided URL is safe for a user to access at this time. Your use case sounds very well-suited for this check. The Safe Browsing lists are intended to contain URL expressions from various points of the navigation, including those that users receive links to (e.g. through an SMS). If the initial URL redirects a user to an unsafe endpoint, then there's a good chance that the initial URL and those of subsequent navigations are all on a Safe Browsing list.

Hopefully that addresses some of your concerns.

Alex

summera · 2018-04-05T02:53:34Z

@alexwoz @colonelxc I'm finding differences between the Safe Browsing API (what's returned from running the sbserver) and what's on https://transparencyreport.google.com as well.

The transparency report is saying that the url is unsafe but sbserver is returning an empty response.

summera · 2018-04-05T03:05:34Z

Found another:

Is it possible that results from the API are more up to date than https://transparencyreport.google.com or are they using the same api?

afilipovich · 2018-04-05T16:11:40Z

Thanks @summera

Yeah, I have seen such discrepancies in the past, but I cannot tell which source is more up to date, as I am not affiliated with Google.
Transparency report states "This info was last updated on Apr 1, 2018."

summera · 2018-04-05T17:23:06Z

@afilipovich Thanks for the response! Very weird. So have you or anyone else been able to determine how accurate this is in a real-world production environment? It seems to me, based on what's been reported in this issue and the Google group, and on my own simple tests, that there are a lot of false negatives being returned from the API. Since phishing and malware URLs are constantly changing, it's challenging to determine whether this is really going to catch much and how accurate it will be.

alexwoz · 2018-04-05T17:26:16Z

Due to data sharing restrictions, the set of URLs accessible via the Safe Browsing API, Transparency Report, and web browser integrations may differ. It is our goal to ensure these discrepancies are as rare as possible, but it's not guaranteed.

asieira · 2018-04-05T17:26:58Z

I think any detection technology will have false negatives, no solution can claim to catch everything. So that is something we should already expect.

In particular, it seems to me the Google Safe Browsing API must be removing malicious entries from their database, either through an aging process or by detecting when they are no longer active. In any case, I will take a solution that does that to minimize false positives over a very noisy one every time.

afilipovich · 2018-04-05T17:47:03Z

You can try to compare results from gglsbl with Google Safe Browsing Lookup API.
https://developers.google.com/safe-browsing/v4/lookup-api

It does not use local cache so it has performance limitations, but it excludes possible issues with gglsbl client code.
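
For anyone who wants to run that comparison by hand, below is a minimal sketch of the request body and response handling for the v4 Lookup API (`threatMatches:find`). The helper names, the `example-client` ID, and the chosen threat types are illustrative assumptions; the network call itself is left out, so the payload would still need to be POSTed as JSON to `LOOKUP_ENDPOINT` with your own `key` query parameter.

```python
import json

LOOKUP_ENDPOINT = "https://safebrowsing.googleapis.com/v4/threatMatches:find"


def build_lookup_request(urls, client_id="example-client", client_version="0.1"):
    """Build the JSON body for a threatMatches:find request.

    client_id and client_version are placeholder values (assumption).
    """
    return {
        "client": {"clientId": client_id, "clientVersion": client_version},
        "threatInfo": {
            "threatTypes": ["MALWARE", "SOCIAL_ENGINEERING", "UNWANTED_SOFTWARE"],
            "platformTypes": ["ANY_PLATFORM"],
            "threatEntryTypes": ["URL"],
            "threatEntries": [{"url": u} for u in urls],
        },
    }


def parse_lookup_response(body):
    """Return (url, threat_type) pairs; an empty JSON object means no match."""
    data = json.loads(body) if body.strip() else {}
    return [
        (m.get("threat", {}).get("url"), m.get("threatType"))
        for m in data.get("matches", [])
    ]


payload = build_lookup_request(["http://testsafebrowsing.appspot.com/"])
print(json.dumps(payload, indent=2))
```

An empty result from `parse_lookup_response` corresponds to the "empty" 200 response discussed earlier in this thread: it means none of the submitted URLs matched the lists available to that API key, not that the request failed.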

pravee9 · 2018-06-04T07:22:52Z

Which database is specified at line 110 of the database.go file?

imfht · 2018-06-28T08:56:32Z

Same issue at http://58.194.172.18/Thesis/, any update?

digitalsurgeon · 2019-02-11T15:24:05Z

@alexwoz Can Google please update the GSB developers page and mention this fact there?

As part of our API, some clients receive a different list of threats due to data sharing restrictions. This is why you may see discrepancies between the Go client and Safe Browsing-enabled browsers like Chrome.

spendyala · 2019-08-12T19:16:09Z

Still confused! Is sbserver in sync with gglsbl? I'm having a hard time trusting either of the two.

Did anyone get the response of full hashes after submitting hash prefixes?

colonelxc · 2019-08-14T18:59:26Z

gglsbl and this client both get the same lists. There was a bug in the past that caused them to get different lists. It is still true that browser clients (Chrome, Safari, Firefox, etc.) can receive slightly different threat lists. As alexwoz pointed out, this is due to data sharing restrictions we have with a subset of our data. The Safe Browsing team works hard to improve our detection capabilities to get good coverage for all clients.

If you're looking for URLs to test, try some of the top ones in http://testsafebrowsing.appspot.com/. If you have issues with a different client implementation, you can start a new thread, or post on https://groups.google.com/forum/#!forum/google-safe-browsing-api

Larsundso · 2023-08-16T14:11:56Z

Bump
https://transparencyreport.google.com/safe-browsing/search?url=http:%2F%2Fkeycom.pro%2F

lkc0626 · 2023-11-26T04:34:30Z

Has this issue been fixed?
I still see discrepancies between the online status-checking sites, actual website warning messages when trying to access malicious URLs and API results.
I checked about 100 phishing URLs. Of those, only 20 were considered unsafe by the online checking sites, and of those 20, only 10 were considered a threat.
Some of them are blocked by Safe Browsing when I try to access the URL, yet the online checking site shows no available data for that URL.
Is it possible to have discrepancies between the three states? Which one should I trust and use as a baseline?

dsnet assigned colonelxc Jan 26, 2017

yinziyang mentioned this issue Jun 2, 2017

safebrowsing api query result is not consistent with chrome browser? #60

Closed

summera mentioned this issue Apr 5, 2018

ECS Configuration Settings mlsecproject/gglsbl-rest#17

Closed

sezuan mentioned this issue Oct 29, 2018

Some URLs thare are known to firefox/chrome, but are not detected by the sblookup #86

Closed

alexliu0809 mentioned this issue Nov 11, 2019

Why is the API marking Chinese cdns as unsafe while online/browser lookup aren't ? #103

Closed
