sbserver differs from online/browser lookup? #30

Open
serpiente opened this issue Aug 31, 2016 · 36 comments

@serpiente

I have noticed that sbserver returns an empty response for some URLs, while the Chrome browser and the online lookup tool ( https://www.google.com/transparencyreport/safebrowsing/diagnostic/ ) do return a correct danger response. I have checked, and the server is updating its list. Does anyone know what is happening?

A sample URL for which this happens:

http://www.precision-mouldings.com/.ls/.https:/.www.paypal.co.uk/uk.web.apps.mpp.home.sign.in.country.a.GB.locale.a.en.GB-6546refhs8ehgf8-890b7fefut9546954543ds867hgf9-1egey3ds4820435t546ggc-u4ydstgu5438gjksssGB/plmgeo.php

@dsnet
Contributor

dsnet commented Aug 31, 2016

Thanks for the bug report. We'll look into it shortly.

@gliwka

gliwka commented Nov 23, 2016

Anything new on this?

@Heavenwalker

Bumping this... Anything new?

@asieira

asieira commented May 2, 2017

Just wanted to confirm that sblookup also reports this URL as safe:

| => echo "https://resolve-paypal.com-resolve-costumer.net/*id/webapps/a37f8/websrc%E2%80%9D" | sblookup -apikey '<redacted>'
safebrowsing: 2017/05/02 16:18:26 database.go:106: no database file specified
safebrowsing: 2017/05/02 16:18:30 database.go:336: database is now healthy
safebrowsing: 2017/05/02 16:18:30 safebrowser.go:504: Next update in 30m11s
Safe URL: https://resolve-paypal.com-resolve-costumer.net/*id/webapps/a37f8/websrc%E2%80%9D

Plus, this is the output of the test as indicated in the README file:

| => go test github.com/google/safebrowsing -v -run TestSafeBrowser -apikey '<redacted>'
=== RUN   TestSafeBrowser
--- PASS: TestSafeBrowser (0.78s)
PASS
ok  	github.com/google/safebrowsing	0.933s

Finally, I can confirm that there is no problem with my API key, since I can successfully query this URL using https://github.com/afilipovich/gglsbl on the same machine:

| => python
Python 2.7.13 (default, Dec 18 2016, 07:03:39)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from gglsbl import SafeBrowsingList
>>> sbl = SafeBrowsingList('<redacted>')
>>> sbl.update_hash_prefix_cache()
>>> sbl.lookup_url('https://resolve-paypal.com-resolve-costumer.net/*id/webapps/a37f8/websrc')
[SOCIAL_ENGINEERING/OSX/URL]

@asieira

asieira commented May 3, 2017

With further testing I noticed that when I specified a database file for sbserver and sblookup, the created file was only 6 megabytes. In comparison, the gglsbl Python module creates a local SQLite database that is over 1.4 gigabytes in size.

So maybe what's happening here is that the Go client is silently failing to download and/or save the hash database locally.

@asieira

asieira commented May 15, 2017

Just wanted to share that the lack of feedback on this issue has led me to file this repository under "abandonware".

I am using https://github.com/afilipovich/gglsbl instead. It works great, is fast and the author is very very responsive to reported issues. Would recommend that @serpiente, @gliwka and @Heavenwalker take a look at this alternative too if they haven't found another already.

@gliwka

gliwka commented May 15, 2017

@asieira Thanks for the hint! Unfortunately I need the REST API, although it should be possible to combine gglsbl with Flask to get there.

@dsnet @colonelxc Any progress on this? sbserver isn't working correctly at this point, and the worst part is that it's failing silently! This could leave applications that depend on it, and their users, vulnerable!

@gliwka

gliwka commented May 15, 2017

/cc: @alexwoz

@asieira

asieira commented May 15, 2017

I have actually built a Flask + gunicorn dockerized REST server on top of gglsbl and was planning on open sourcing it. Would that help?

@gliwka

gliwka commented May 15, 2017

@asieira Sure, that would be amazing :-)

@dsnet
Contributor

dsnet commented May 15, 2017

I do not work in this team anymore, but I can assure you that this project is not abandonware.

@alexwoz
Collaborator

alexwoz commented May 18, 2017

Hi everyone,

Thank you for all of your contributions to this repo and your patience while we investigated -- based on your reports/comments we've been able to clarify the issue.

As part of our API, some clients receive a different list of threats due to data sharing restrictions. This is why you may see discrepancies between the Go client and Safe Browsing-enabled browsers like Chrome. Upon investigating the bugs filed in this repo, we realized that there was a different problem afoot - a bug on the server-side - that will be patched in the coming weeks.

Thanks,
Alex

@hbakhtiyor

@asieira any updates?

@asieira

asieira commented Jun 5, 2017

I finally published the repo I had mentioned before. You can find it at https://github.com/mlsecproject/gglsbl-rest if you want to try it out. Any comments and suggestions are most welcome.

@gliwka

gliwka commented Sep 11, 2017

@alexwoz @colonelxc
Any update on this issue? It's been a year since this issue was created.

@alexwoz
Collaborator

alexwoz commented Oct 30, 2017

@gliwka This issue should be resolved. Please update this bug if you continue to experience any inconsistencies.

@wjgilmore

I'm running into the same issues described by other users who commented earlier in this issue thread. Notably, if I use https://transparencyreport.google.com/safe-browsing/search to search for a known malware URL such as 999fitness.com I'm correctly told "Some pages on this site are unsafe".

Yet when I use Postman/cURL/sblookup to classify 999fitness.com I receive an "empty" 200 response, indicating there is nothing wrong with the URL.

When I use the Google API Explorer (https://developers.google.com/apis-explorer/?hl=en_US#p/safebrowsing/v4/safebrowsing.threatMatches.find) to classify the same URL, it just "spins" endlessly. As of right now the explorer has been running for 23 minutes without actually returning a response.

Reviewing the Google Cloud Platform API monitor, I'm told everything is just fine, and every one of my queries returned a 200.

I was going to post a question on the Google Safe Browsing API forum (https://groups.google.com/forum/#!forum/google-safe-browsing-api) but ironically it is full of spam.

Not complaining; just trying to figure out what exactly is going on with this service.

Jason

@colonelxc
Contributor

@wjgilmore

  1. I see the same problem as you with the API Explorer. I have created an internal bug with the applicable team.

  2. Regarding the transparency report, as compared to the safebrowsing lookup, there are some slight differences in utility and function. It is best explained with an example.

| URL | API lookup | Transparency Report |
|---|---|---|
| foo.com | Safe | Some pages unsafe |
| foo.com/bad/ | Malware | This page unsafe/Malware |
| foo.com/bad/baz/ | Malware | This page unsafe/Malware |
| foo.com/good/ | Safe | Safe |

Essentially the API is focused on answering the question, "Do we think it is safe to go to this site right now?". For foo.com, it is. The malware was on a different (more specific) path (or subdomain). This often happens when a site has been hacked. The attacker will add their own content and redirect users from other sites to the specific path/subdomain. This sometimes has no impact on the rightful content of the site, and so we try to minimize the scope of what is blocked to only the paths that will actually try to infect you.

The transparency report does API-style checks, but it also checks if there are more specific paths/subdomains that are known to be bad. So for the second and third URLs, it is responding the same as the API does. For the first URL, it knows that there are more specific paths that are known to be bad. So it says some pages are unsafe, even though foo.com is fine to visit on its own.

Does that help?
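
The scoping described above can be sketched in a few lines of Python. This is only a simplified illustration of the host-suffix/path-prefix expression scheme documented for Safe Browsing v4, not the code the API or the Go client actually runs: it skips URL canonicalization, IP-address hosts, and hashing, and the function names and the one-entry blocklist (`foo.com/bad/`, taken from the example table) are hypothetical.

```python
from urllib.parse import urlsplit


def host_suffixes(host):
    """Exact host plus up to four suffixes taken from its last five components."""
    parts = host.split(".")
    hosts = [host]
    # Stop before the last component so the bare top-level domain is never used.
    for i in range(max(1, len(parts) - 5), len(parts) - 1):
        hosts.append(".".join(parts[i:]))
    return hosts


def path_prefixes(path, query=""):
    """Exact path (with and without query) plus up to four directory prefixes."""
    path = path or "/"
    paths = [path + "?" + query] if query else []
    paths.append(path)
    prefix = "/"
    paths.append(prefix)
    # Only directory components: the final segment is a file name or empty.
    for segment in path.split("/")[1:-1][:3]:
        prefix += segment + "/"
        paths.append(prefix)
    return list(dict.fromkeys(paths))  # drop duplicates, keep order


def expressions(url):
    """All host-suffix/path-prefix combinations that get checked for a URL."""
    if "://" not in url:
        url = "http://" + url  # assumption: scheme-less input means http
    parts = urlsplit(url)
    return [h + p
            for h in host_suffixes(parts.hostname)
            for p in path_prefixes(parts.path, parts.query)]


def api_style_verdict(url, blocklist):
    """'Malware' if any expression of the URL is listed, else 'Safe'."""
    return "Malware" if any(e in blocklist for e in expressions(url)) else "Safe"


# Hypothetical list with a single entry for the hacked path.
blocklist = {"foo.com/bad/"}
for url in ["foo.com", "foo.com/bad/", "foo.com/bad/baz/", "foo.com/good/"]:
    print(url, api_style_verdict(url, blocklist))
```

Because a lookup only walks up from the URL it is given (towards shorter hosts and paths), `foo.com/bad/baz/` matches the `foo.com/bad/` entry while `foo.com` does not, which reproduces the "API lookup" column of the table.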

@alexwoz
Collaborator

alexwoz commented Nov 1, 2017

Hi @wjgilmore,

Thanks for your message, and apologies for the confusion. I can see why the Transparency Report wording and Safe Browsing API responses appear to contradict one another. The Transparency Report communicates the extent to which the provided site is bad; in this case, the site is only "partially" bad ("Some pages on this site..."). The Safe Browsing API, however, will only return a verdict when the provided URL is definitively bad; i.e. we have determined that all URLs (including the root domain) are unsafe for a user to access.

Hopefully that makes sense!

Alex

@wjgilmore

Hi @colonelxc and @alexwoz Thank you both for these detailed explanations. To summarize:

  • The Transparency Report is useful for determining whether a URL (and its associated siblings/children/parents/grandparents) is "safe".
  • The Safe Browsing API is useful for determining whether a specific URL is safe.

Is my understanding correct? Our project attempts to determine whether any URLs found in an incoming text message contain potentially dangerous links (phishing, malware, etc). We were under the impression the Safe Browsing API would offer an ideal solution. However it is certainly possible the URL found in a text message would be "safe" yet ultimately lead the unsuspecting user to a subsequently dangerous endpoint. So it sounds like we're going to have to look for an alternative solution.

Thanks again, I really appreciate your time.

Jason

@alexwoz
Collaborator

alexwoz commented Nov 2, 2017

Hey @wjgilmore,

As @colonelxc mentioned, the Safe Browsing API answers the question of whether the provided URL is safe for a user to access at this time. Your use case sounds very well-suited for this check. The Safe Browsing lists are intended to contain URL expressions from various points of the navigation, including those that users receive links to (e.g. through an SMS). If the initial URL redirects a user to an unsafe endpoint, then there's a good chance that the initial URL and those of subsequent navigations are all on a Safe Browsing list.

Hopefully that addresses some of your concerns.

Alex

@summera

summera commented Apr 5, 2018

@alexwoz @colonelxc I'm finding differences between the Safe Browsing API (what's returned from running the sbserver) and what's on https://transparencyreport.google.com as well.

The transparency report is saying that the url is unsafe but sbserver is returning an empty response.

[Screenshot: screen shot 2018-04-04 at 8 52 07 pm]

@summera

summera commented Apr 5, 2018

Found another:
[Screenshot: screen shot 2018-04-04 at 9 03 29 pm]

Is it possible that results from the API are more up to date than https://transparencyreport.google.com, or are they using the same API?

@afilipovich

Thanks @summera

Yeah, I saw such discrepancies in the past, but I cannot tell which source is more up to date as I am not affiliated with Google.
Transparency report states "This info was last updated on Apr 1, 2018."

@summera

summera commented Apr 5, 2018

@afilipovich Thanks for the response! Very weird. So have you or anyone else been able to determine how accurate this is in a real world production environment? It seems to me, based on what's been reported in this issue and the google group and with my own simple tests, that there are a lot of false negatives being returned from the API. Since phishing and malware urls are constantly changing it's challenging to determine whether this is really going to catch much and how accurate it will be.

@alexwoz
Collaborator

alexwoz commented Apr 5, 2018

Due to data sharing restrictions, the set of URLs accessible via the Safe Browsing API, Transparency Report, and web browser integrations may differ. It is our goal to ensure these discrepancies are as rare as possible, but it's not guaranteed.

@asieira

asieira commented Apr 5, 2018

I think any detection technology will have false negatives; no solution can claim to catch everything. So that is something we should already expect.

In particular, it seems to me the Google Safe Browsing API must be removing malicious entries from its database, either through an aging process or by detecting when they are no longer active. In any case, I will take a solution that does that to minimize false positives over a very noisy one every time.

@afilipovich

You can try to compare results from gglsbl with Google Safe Browsing Lookup API.
https://developers.google.com/safe-browsing/v4/lookup-api

It does not use local cache so it has performance limitations, but it excludes possible issues with gglsbl client code.
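
For anyone who wants to try that comparison, here is a minimal sketch of a v4 Lookup API call using only the Python standard library. The endpoint and JSON field names follow the Lookup API documentation linked above; the `clientId`/`clientVersion` values and the helper names are placeholders made up for this example. Passing a different `endpoint` (for instance a local sbserver, whose README describes a `/v4/threatMatches:find` proxy path) is an untested assumption.

```python
import json
import urllib.request
from urllib.parse import urlencode

LOOKUP_ENDPOINT = "https://safebrowsing.googleapis.com/v4/threatMatches:find"


def build_lookup_body(urls, client_id="example-client", client_version="0.1"):
    """Build the JSON body for threatMatches:find (client values are placeholders)."""
    return {
        "client": {"clientId": client_id, "clientVersion": client_version},
        "threatInfo": {
            "threatTypes": [
                "MALWARE",
                "SOCIAL_ENGINEERING",
                "UNWANTED_SOFTWARE",
                "POTENTIALLY_HARMFUL_APPLICATION",
            ],
            "platformTypes": ["ANY_PLATFORM"],
            "threatEntryTypes": ["URL"],
            "threatEntries": [{"url": u} for u in urls],
        },
    }


def flagged_urls(response):
    """Map each flagged URL to its threat types; an empty {} response yields {}."""
    result = {}
    for match in response.get("matches", []):
        result.setdefault(match["threat"]["url"], []).append(match["threatType"])
    return result


def lookup(api_key, urls, endpoint=LOOKUP_ENDPOINT):
    """POST the URLs to the endpoint and return the flagged ones."""
    request = urllib.request.Request(
        endpoint + "?" + urlencode({"key": api_key}),
        data=json.dumps(build_lookup_body(urls)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return flagged_urls(json.load(reply))
```

Note that an empty `{}` body with HTTP 200 is the normal "no match" answer, so `flagged_urls` returning an empty dict means "not on the lists this client can see" rather than an error.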

@pravee9

pravee9 commented Jun 4, 2018

Which database is specified in database.go at line 110?

@imfht

imfht commented Jun 28, 2018

same issue at http://58.194.172.18/Thesis/, any update?

@digitalsurgeon

@alexwoz Can Google please update the GSB developers page to mention this fact?

"As part of our API, some clients receive a different list of threats due to data sharing restrictions. This is why you may see discrepancies between the Go client and Safe Browsing-enabled browsers like Chrome."

@spendyala

Still confused! Is sbserver in sync with gglsbl? I'm having a hard time trusting either of the two.

Did anyone get the response of full hashes after submitting hash prefixes?
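
For context on the prefix/full-hash question, here is a rough sketch of the two-step check that the v4 Update API documentation describes: the local database holds short SHA-256 prefixes of URL expressions, and a prefix hit has to be confirmed against full hashes fetched from `fullHashes:find`. The function names, the in-memory sets, and the sample expression are hypothetical; this is not the sbserver or gglsbl implementation.

```python
import hashlib


def full_hash(expression):
    """SHA-256 digest of a URL expression such as 'foo.com/bad/'."""
    return hashlib.sha256(expression.encode("utf-8")).digest()


def hash_prefix(expression, length=4):
    """Leading bytes of the full hash (4 is the shortest prefix length in v4)."""
    return full_hash(expression)[:length]


def local_check(expression, prefix_db, full_hashes=None):
    """Classify an expression against a local prefix set.

    full_hashes is None until the full hashes for the prefix have been
    fetched from the server; after that it is the set of returned hashes.
    """
    if hash_prefix(expression) not in prefix_db:
        return "safe"  # no prefix hit: no server round trip needed
    if full_hashes is None:
        return "needs-full-hash-lookup"  # prefix hit: must be confirmed
    return "unsafe" if full_hash(expression) in full_hashes else "safe"


# Hypothetical local database containing one listed expression.
prefix_db = {hash_prefix("foo.com/bad/")}
print(local_check("foo.com/bad/", prefix_db))
print(local_check("foo.com/bad/", prefix_db, {full_hash("foo.com/bad/")}))
```

A prefix hit alone is not a verdict, since unrelated expressions can share a prefix; only the full-hash comparison decides.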

@colonelxc
Contributor

gglsbl and this client both get the same lists. There was a bug in the past that caused them to get different lists. It is still true that browser clients (Chrome, Safari, Firefox, etc.) can receive slightly different threat lists. As alexwoz pointed out, this is due to data sharing restrictions we have with a subset of our data. The Safe Browsing team works hard to improve our detection capabilities to get good coverage for all clients.

If you're looking for URLs to test, try some of the top ones in http://testsafebrowsing.appspot.com/. If you have issues with a different client implementation, you can start a new thread, or post on https://groups.google.com/forum/#!forum/google-safe-browsing-api

@Larsundso

Bump
https://transparencyreport.google.com/safe-browsing/search?url=http:%2F%2Fkeycom.pro%2F

@lkc0626

lkc0626 commented Nov 26, 2023

Has this issue been fixed?
I still see discrepancies between the online status-checking sites, the actual website warning messages when trying to access malicious URLs, and the API results.
I checked about 100 phishing URLs. Of those, only 20 were considered unsafe by online checking sites, and of those 20, only 10 were considered a threat.
Some of them are blocked by Safe Browsing when trying to access the URL, yet the online checking site shows no available data for that URL.
Is it possible to have discrepancies between the three? Which one should I trust and use as a baseline?
