Google Play Compliance: Precise location supposedly sent off device #3466

Closed
westnordost opened this issue Oct 28, 2021 · 6 comments

Comments

@westnordost
Member

Google Play introduced a new data safety policy. For app developers, this is basically a form they need to fill out and declare what user data exactly is shared with a third party (sent off the device).

This information will then be shown on the app page on Google Play.

The data types as documented here need to be declared.

I filled out the form and declared the following:

  • Email address (but only processed ephemerally) - i.e. for OSM account registration
  • Personal identifiers - i.e. OSM user name and oauth token

After filling out the form, I got this message:

[Screenshot of the message from Google Play]

TLDR: Google Play claims that StreetComplete sends the user's precise location off the device somewhere, which was apparently determined automatically by some AI. Does anyone have any idea what could cause this AI to think that?

@westnordost added the "help wanted" label (help by contributors is appreciated; might be a good first contribution for first-timers) Oct 28, 2021
@westnordost
Member Author

westnordost commented Oct 28, 2021

So let's see, what does the app connect to at all?:

Not user-related

  • GET banned_versions.txt from my server
  • download map data from OSM
  • download map tiles from Jawg
  • download oneway-data from my server
  • photos uploaded to my server. These may include a precise location when taken, but any metadata is scrubbed before uploading

User-related:

  • register, login user on OSM
  • upload map changes (w/ user OAuth token)
  • download user statistics from my server
  • upload notes (w/ user OAuth token)

Now, one can argue that the precise/coarse location of the user can be determined by

  • his current contribution history (especially if he has auto-sync on)
  • the map tiles downloaded automatically around his location

But I kind of doubt that the Google Play policy bot would detect this because the former does not send the user's location but only the location of the things he answers quests for and the latter are just tile numbers. This doesn't need to be his location, he could be just scrolling the map.

So, any idea where the app might elsewhere send off the user's location?

@smichel17
Member

I kind of doubt that the Google Play policy bot would detect this because the former does not send the user's location but only the location of the things he answers quests for and the latter are just tile numbers

I think you are over-estimating how smart the policy bot is. After all, how might it work? I can only think of:

  1. A. Code analysis: check if a LocationManager callback ever triggers a network request
    B. Or some heuristic to identify suspicious code
  2. Black box: Run the app in an emulator. Give it some location data. Watch for network requests that include the location.
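To make option 2 concrete, here is a minimal Python sketch of such a black-box check. Everything in it is hypothetical: the function names, the captured request strings and the choice to match coordinates truncated to three decimals are assumptions for illustration, not a description of what Google actually runs.

```python
def truncated(value, decimals=3):
    """Format a coordinate and cut it off (no rounding) after `decimals` places."""
    text = f"{value:.6f}"
    return text[: text.index(".") + 1 + decimals]


def requests_containing_location(lat, lon, requests, decimals=3):
    """Return the captured requests whose text contains both mock coordinates.

    `requests` is a list of strings (URL plus body) recorded while the app
    ran with the mock location (lat, lon). Hypothetical capture format.
    """
    needles = (truncated(lat, decimals), truncated(lon, decimals))
    return [r for r in requests if all(n in r for n in needles)]


# Hypothetical traffic captured while the emulator reported this position.
captured = [
    "GET /tiles/16/34567/21000.png",
    "POST /track?lat=53.551912&lon=9.993701",
]
flagged = requests_containing_location(53.5519, 9.9937, captured)
```

A literal scan like this flags only the second request. The tile request passes, because tile numbers do not contain the coordinates as text.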

This doesn't need to be his location, he could be just scrolling the map.

Keep in mind that they're probably looking to find bad actors who are looking to skirt the Play Store policies, and that it's fairly easy to obfuscate what data you're exfiltrating. So, I wouldn't be surprised at all if the tool casts an overly broad net (like 1A— "ever triggers", without checking what data it contains).

I'd guess that the culprit is QuestAutoSyncer. It receives a sufficiently precise location update, and then triggers some network requests. Note that it only says "data sent off device", nothing about who the data is supposedly sent to.

@mnalis
Member

mnalis commented Oct 29, 2021

I have no idea how Google detects it (hey, their AI might be smart enough to read and interpret privacy policy lingo 😱), but I would say that the location does need to be declared, as it strongly correlates with the location of the user (current or previous location, both of which are privacy-sensitive information). Sure, in some rare cases the map/quest location might be completely uncorrelated with the user's location, but:

  • this happens only rarely, and (for quests) only if the user actively lied to the app when asked "if they verified data in person", so for the vast majority of users it would reveal where they have been (or even are currently)
  • SC does set source=survey in changesets which indicates that user was actually physically present at that location
  • SC privacy policy does state "Any changes you make (and their date and location) are attributed to your OSM user account and publicly visible on the openstreetmap.org website. Because StreetComplete should only be used for on-site survey, this reveals where and when you have used the app"

Also, the fact that the location is transmitted indirectly (via integer OSM element ids which have lat/lon linked to them, instead of directly as floating point numbers) is likely irrelevant (or even worse, seen as an attempt at obfuscated data exfiltration).

Thus, I believe that the fact that the user's location is being sent off the device needs to be declared in that Data safety form too.

@matkoniecz
Member

matkoniecz commented Oct 29, 2021

Overall I think that it is reasonable to describe SC as revealing specific user location.

Does anyone have any idea what could cause this AI to think that?

Downloading map tiles at a specific location? Maybe static analysis is also run on the Tangram library code included in the app?

Google's pattern matching could also detect the GPS track recording, even though the track is not sent anywhere.

Precise location | User or device physical location within an area less than three square kilometres, such as location provided by Android’s ACCESS_FINE_LOCATION permission.

I would say that by requesting specific map tiles the app is sending data that can be used to reconstruct the user's location, and Jawg would likely be able to reconstruct my location to a high accuracy.
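As a rough illustration of how much a tile request narrows things down, here is a Python sketch using the standard slippy-map (Web Mercator) tile formula. The function names are made up for this example, the area is a local approximation that treats the tile as a square, and which zoom levels the app actually requests is not stated in this thread.

```python
import math

EQUATOR_M = 40_075_016.686  # WGS84 equatorial circumference in metres


def tile_for(lat_deg, lon_deg, zoom):
    """Return the (x, y) slippy-map tile containing the given position."""
    n = 2 ** zoom
    lat = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat)) / math.pi) / 2.0 * n)
    # Clamp so that lon = 180 or extreme latitudes stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_area_km2(lat_deg, zoom):
    """Approximate ground area of one tile at this latitude, in km^2."""
    edge_m = EQUATOR_M * math.cos(math.radians(lat_deg)) / 2 ** zoom
    return (edge_m / 1000.0) ** 2


# Compare against the policy's "less than three square kilometres".
is_precise_at_16 = tile_area_km2(50.0, 16) < 3.0
is_precise_at_12 = tile_area_km2(50.0, 12) < 3.0
```

A zoom-16 tile is about 611 m across at the equator and narrower at higher latitudes, so a single tile at that zoom already covers far less than three square kilometres. At zoom 12 and 50° latitude a tile is still well above that threshold.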

Also, as @smichel17 notes, the quest download also fetches data mostly at the user's location, and maybe on the OSM API side one would be able to distinguish automatic from user-triggered queries.

Pattern matching (buzzworded into AI) is likely unable to detect this, but by making edits the user is also sending a very precise location. That was my point in #3208

One does not need to be a genius analyst to reconstruct where I walked yesterday (looking at the specific elements edited would make it even worse and more accurate)

screen02

@HolgerJeromin
Contributor

I think you are over-estimating how smart the policy bot is.

Perhaps this is only:

  • The emulator starts the app.
  • It simulates a location and watches the generated traffic (initial map view).
  • It simulates a small location change and watches the generated traffic (the map view needs to download new tiles at least sometimes).

This is interpreted as: small location updates are sent to someone.
This would flag every online map application, but we have no evidence that it does not.
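The guess above can be sketched in a few lines of Python. This is purely a hypothetical model of such a check: the function names and the example URLs are invented, and nothing here is known about how Google's tool actually works.

```python
def new_requests_after_move(before, after):
    """Return requests seen after the simulated move that were not seen before."""
    seen = set(before)
    return [url for url in after if url not in seen]


def looks_location_dependent(before, after):
    """Naive verdict: any new traffic after a location change counts as a hit."""
    return bool(new_requests_after_move(before, after))


# Hypothetical captures: the map view fetches a neighbouring tile after the move.
traffic_before = ["https://tiles.example/16/34567/21000.png"]
traffic_after = [
    "https://tiles.example/16/34567/21000.png",
    "https://tiles.example/16/34568/21000.png",
]
verdict = looks_location_dependent(traffic_before, traffic_after)
```

With a rule this coarse, any app that loads map tiles around the current position would be flagged, whether or not a coordinate ever leaves the device.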

@westnordost
Member Author

Right, I think this is how it may work @HolgerJeromin

After all, a bad actor could just encrypt the precise location to circumvent any detection.

So I declared that the precise location is transmitted, since de facto it is, even if not on a technical level, and resubmitted the form.
