Add support for scanning APK files #3517

Draft. Wants to merge 2 commits into base: main.

joeleonjr
Contributor

Description:

APK (Android Package Kit) files are used by Android to install and distribute applications. These files are essentially zip archives with a specific directory structure for Android apps. We currently scan them as normal zip archives; however, most of the locations within an APK where secrets would live aren't properly decompiled or decoded during our regular zip file scanning. This PR adds special support to decompile APKs and then search them for secrets.

The most robust approach for searching APKs for secrets is to use a decompiler like jadx and then run TruffleHog against the output. The downsides are twofold: (1) this would require TruffleHog users to install jadx, and (2) decompiling takes a while (up to several minutes) and uses a lot of memory. Instead of going this route, we rely on two Go libraries (dextk and apkparser) that balance functionality and performance to get us 80% of the way there without any external dependencies.

Through this PR, TruffleHog users can now scan for secrets in APK files in the following locations: AndroidManifest.xml, strings.xml (kind of), *.xml (some), *.json (most?) and .dex files. Here's what is specifically scanned:

XML

Android's XML files need to be decoded in order to properly scan them because the places where secrets might live are often stored as reference IDs instead of plain text strings. This PR runs an Android XML decoder that uses the resources.arsc file as context to automatically resolve most resource reference IDs into their corresponding values.
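To illustrate why a plain zip scan misses these files, here is a minimal sketch that tells compiled (binary) Android XML apart from plain-text XML by looking at the chunk header. The chunk type constants come from Android's ResourceTypes.h; the function names are hypothetical and are not part of this PR or of apkparser.

```go
package main

import "encoding/binary"

// Chunk type constants from Android's ResourceTypes.h.
const (
	resTableType uint16 = 0x0002 // resources.arsc
	resXMLType   uint16 = 0x0003 // compiled (binary) XML
)

// chunkType returns the type field of the 8-byte chunk header
// (uint16 type, uint16 headerSize, uint32 size; all little-endian).
// Hypothetical helper for illustration only.
func chunkType(data []byte) (uint16, bool) {
	if len(data) < 8 {
		return 0, false
	}
	return binary.LittleEndian.Uint16(data[0:2]), true
}

// isBinaryXML reports whether data starts with a binary XML chunk header,
// which means it needs decoding before its strings are readable.
func isBinaryXML(data []byte) bool {
	t, ok := chunkType(data)
	return ok && t == resXMLType
}

// isResourceTable reports whether data starts with a resource table header.
func isResourceTable(data []byte) bool {
	t, ok := chunkType(data)
	return ok && t == resTableType
}
```

A file that passes isBinaryXML would be handed to the decoder; anything else can be scanned as ordinary text.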

AndroidManifest.xml

This approach includes the important AndroidManifest.xml file.

Strings.xml

One of the XML files most likely to contain a secret is strings.xml. This file proved a challenge because it is transformed during the APK compilation process, so running unzip file.apk does not produce a readable strings.xml file. A tool like jadx would easily decompile it, but since we're not using jadx, we had to find a different way to get at that data.

We found that the resources.arsc file houses the key/value pairs from strings.xml that might contain secrets, in the resource ID range 0x7f000000-0x7fffffff. So we iterate through all resources of type string in that range and search for secrets there. This seems to work for most scenarios, but admittedly it needs more testing.
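The range check above can be sketched as follows. Android resource IDs are laid out as 0xPPTTEEEE (package, type, entry), and 0x7f is the package ID normally assigned to the app's own resources. The type and function names here are hypothetical and only illustrate the arithmetic; they are not the code in this PR.

```go
package main

// resourceID is a hypothetical wrapper around an Android resource ID,
// laid out as 0xPPTTEEEE: package ID, type ID, entry ID.
type resourceID uint32

// packageID returns the top byte (0x7f for the app's own resources).
func (id resourceID) packageID() uint8 { return uint8(id >> 24) }

// typeID returns the second byte, which selects the resource type.
func (id resourceID) typeID() uint8 { return uint8(id >> 16) }

// entryID returns the low 16 bits, the index within that type.
func (id resourceID) entryID() uint16 { return uint16(id) }

// inAppRange reports whether id falls in 0x7f000000-0x7fffffff,
// the range iterated when looking for strings.xml values.
func inAppRange(id resourceID) bool {
	return id >= 0x7f000000 && id <= 0x7fffffff
}
```

For example, 0x7f0e0012 is in range with package 0x7f, type 0x0e, and entry 0x0012, while a framework ID such as 0x01040000 is not.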

Dex

A DEX file contains compiled code: the bytecode produced from the app's Java or Kotlin source. APK files generally include at least one DEX file, usually named classes.dex, but if the app is large or modular, there might be multiple DEX files, such as classes2.dex and classes3.dex.
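As a rough sketch of how DEX entries could be recognized, the snippet below matches the naming pattern described above and checks the DEX magic bytes ("dex\n", three ASCII version digits, then a zero byte). Treating the name pattern as classes followed by optional digits is an assumption drawn from the examples in this description, and the function names are hypothetical.

```go
package main

import (
	"bytes"
	"regexp"
)

// dexName matches classes.dex, classes2.dex, classes3.dex, and so on.
// The exact pattern is an assumption based on the usual naming.
var dexName = regexp.MustCompile(`^classes\d*\.dex$`)

// isDexName reports whether a zip entry name looks like a DEX file.
func isDexName(name string) bool {
	return dexName.MatchString(name)
}

// hasDexMagic reports whether data begins with the DEX magic:
// "dex\n", three ASCII digits for the version, and a terminating 0x00.
func hasDexMagic(data []byte) bool {
	if len(data) < 8 || !bytes.HasPrefix(data, []byte("dex\n")) || data[7] != 0 {
		return false
	}
	for _, c := range data[4:7] {
		if c < '0' || c > '9' {
			return false
		}
	}
	return true
}
```

Checking the magic as well as the name guards against an entry that is merely named like a DEX file.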

We run a golang-based Dex decompiler that helps us identify two instruction types from within the bytecode: const-string and iput-object. These instructions often signal hardcoded string values or objects in the original Java or Kotlin source code.
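For readers unfamiliar with Dalvik bytecode, here is a small sketch of how these two instructions can be decoded from a method's 16-bit code units. The opcode values and instruction layouts come from the Dalvik bytecode documentation; the function names are hypothetical, and this is not how dextk exposes instructions. The sketch also handles const-string/jumbo, which this description does not mention.

```go
package main

// Opcode values from the Dalvik bytecode documentation.
const (
	opConstString      = 0x1a // const-string vAA, string@BBBB (2 code units)
	opConstStringJumbo = 0x1b // const-string/jumbo vAA, string@BBBBBBBB (3 code units)
	opIputObject       = 0x5b // iput-object vA, vB, field@CCCC (2 code units)
)

// constStringIndex decodes a const-string or const-string/jumbo instruction
// at insns[pc] and returns the index into the DEX string table.
// Hypothetical helper for illustration only.
func constStringIndex(insns []uint16, pc int) (uint32, bool) {
	if pc < 0 || pc >= len(insns) {
		return 0, false
	}
	switch insns[pc] & 0xff { // the opcode is the low byte of the first unit
	case opConstString:
		if pc+1 >= len(insns) {
			return 0, false
		}
		return uint32(insns[pc+1]), true
	case opConstStringJumbo:
		if pc+2 >= len(insns) {
			return 0, false
		}
		// The 32-bit index is stored low half first.
		return uint32(insns[pc+1]) | uint32(insns[pc+2])<<16, true
	}
	return 0, false
}

// iputObjectField decodes an iput-object instruction at insns[pc]
// and returns the index into the DEX field table.
func iputObjectField(insns []uint16, pc int) (uint16, bool) {
	if pc < 0 || pc+1 >= len(insns) || insns[pc]&0xff != opIputObject {
		return 0, false
	}
	return insns[pc+1], true
}
```

The returned indices still have to be resolved against the DEX string and field tables to recover the actual names and values.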

JSON

This is straightforward. JSON files might contain something useful, so we scan them like normal. Nothing fancy going on here.
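Putting the sections above together, the routing of APK entries to the right handling could look roughly like the sketch below, which uses the standard archive/zip package. The kind names and the classify function are hypothetical and only show the idea; the actual handler in this PR may be organized differently.

```go
package main

import (
	"archive/zip"
	"bytes"
	"path"
	"strings"
)

// entryKind is a hypothetical label for how an APK entry would be handled.
type entryKind string

const (
	kindManifest  entryKind = "manifest"  // AndroidManifest.xml
	kindResources entryKind = "resources" // resources.arsc
	kindXML       entryKind = "xml"       // other XML files
	kindJSON      entryKind = "json"      // scanned as plain text
	kindDex       entryKind = "dex"       // compiled bytecode
	kindOther     entryKind = "other"     // not covered by this description
)

// classify maps a zip entry name to the kind of handling it would need.
func classify(name string) entryKind {
	switch name {
	case "AndroidManifest.xml":
		return kindManifest
	case "resources.arsc":
		return kindResources
	}
	switch strings.ToLower(path.Ext(name)) {
	case ".xml":
		return kindXML
	case ".json":
		return kindJSON
	case ".dex":
		return kindDex
	}
	return kindOther
}

// classifyAPK groups the entry names of an in-memory APK by kind.
func classifyAPK(data []byte) (map[entryKind][]string, error) {
	r, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return nil, err
	}
	out := make(map[entryKind][]string)
	for _, f := range r.File {
		k := classify(f.Name)
		out[k] = append(out[k], f.Name)
	}
	return out, nil
}
```

Entries classified as other are not covered by this description.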

Checklist:

  • Tests passing (make test-community)?
  • Lint passing (make lint; this requires golangci-lint)?

@CLAassistant

CLAassistant commented Oct 28, 2024

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ joeleonjr
❌ Joe Leon


Joe Leon seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
