Add support for scanning APK files #3517
Co-authored-by: Richard Gomez <[email protected]>
Some other common XML files that can probably be safely excluded:
https://github.com/smlbiobot/cr/blob/master/apk/2.0.1/com.supercell.clashroyale-2.0.1.decoded/unknown/third_party/java_src/error_prone/project/annotations/Google_internal.gwt.xml
// lightweight version that will search for secrets in the most common files that contain them.
// And run in a fraction of the time (ex: 15 seconds vs. 5 minutes)

// ToDo: Scan nested APKs (aka XAPK files). ATM the archive.go file will skip over them.
For .xapk files, here is how MobSF, a popular security scanning tool, handles them: it unzips the archive, reads the manifest.json file, extracts the APK with the base id, and scans only that APK.
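A minimal sketch of that MobSF-style flow in Go, assuming the .xapk is already in memory. The manifest layout (a split_apks array whose main entry has id "base") is an assumption about the XAPK format rather than something stated in this thread, and extractBaseAPK, readEntry, and buildZip are hypothetical names, not TruffleHog or MobSF code.

```go
package main

import (
	"archive/zip"
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
)

// xapkManifest models only the manifest.json fields this sketch needs.
// ASSUMPTION: split APKs are listed under "split_apks" and the main APK has id "base".
type xapkManifest struct {
	PackageName string `json:"package_name"`
	SplitAPKs   []struct {
		File string `json:"file"`
		ID   string `json:"id"`
	} `json:"split_apks"`
}

// extractBaseAPK unzips an in-memory .xapk, reads manifest.json, and returns
// the file name and bytes of the APK whose id is "base".
func extractBaseAPK(xapk []byte) (string, []byte, error) {
	zr, err := zip.NewReader(bytes.NewReader(xapk), int64(len(xapk)))
	if err != nil {
		return "", nil, fmt.Errorf("open xapk: %w", err)
	}
	// Index entries by name so manifest.json and the base APK can be looked up.
	files := make(map[string]*zip.File, len(zr.File))
	for _, f := range zr.File {
		files[f.Name] = f
	}
	mf, ok := files["manifest.json"]
	if !ok {
		return "", nil, errors.New("manifest.json not found")
	}
	raw, err := readEntry(mf)
	if err != nil {
		return "", nil, err
	}
	var m xapkManifest
	if err := json.Unmarshal(raw, &m); err != nil {
		return "", nil, fmt.Errorf("parse manifest.json: %w", err)
	}
	for _, s := range m.SplitAPKs {
		if s.ID != "base" {
			continue // config splits (ABI, language, density) are skipped
		}
		f, ok := files[s.File]
		if !ok {
			return "", nil, fmt.Errorf("base apk %q listed but missing", s.File)
		}
		data, err := readEntry(f)
		return s.File, data, err
	}
	return "", nil, errors.New("no base apk listed in manifest.json")
}

// readEntry reads one zip entry fully into memory.
func readEntry(f *zip.File) ([]byte, error) {
	rc, err := f.Open()
	if err != nil {
		return nil, err
	}
	defer rc.Close()
	return io.ReadAll(rc)
}

// buildZip packs name/content pairs into an in-memory zip (demo helper).
func buildZip(entries map[string]string) []byte {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for name, body := range entries {
		w, _ := zw.Create(name)
		w.Write([]byte(body))
	}
	zw.Close()
	return buf.Bytes()
}

func main() {
	xapk := buildZip(map[string]string{
		"manifest.json":   `{"package_name":"com.example","split_apks":[{"file":"com.example.apk","id":"base"},{"file":"config.en.apk","id":"config.en"}]}`,
		"com.example.apk": "base-apk-bytes",
		"config.en.apk":   "split-apk-bytes",
	})
	name, data, err := extractBaseAPK(xapk)
	fmt.Println(name, string(data), err) // com.example.apk base-apk-bytes <nil>
}
```

Scanning only the base APK keeps the work bounded; the trade-off is that anything embedded only in a config split would be missed.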
.apkm is another common format (at least for ApkMirror).
@bugbaba I appreciate the idea re: .xapk files. IMO the cleanest way to resolve the lack of .xapk scanning is to address it in the archive.go file. Basically, unzip .xapk like any other zip, and then call back out to the HandleFile function in handlers.go, so that any unique file that requires a special handler can be dealt with. It may not be that exact approach, but something along those lines.
I'll put some effort into that in a different PR.
pkg/handlers/apk.go
}

// processDexFile decodes the dex file and returns the relevant instructions
func processDexFile(ctx logContext.Context, rdr io.ReadCloser) (io.Reader, error) {
It would be nice to have this outside of the APK-specific case, e.g. if we find a .dex file directly, outside of an APK.
Great idea. Might be something for a separate PR, just so that this can get out the door. But I like where you're going with this. I think we should probably have several additional file handlers to handle specific types like dex, pyc, etc.
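One way a standalone .dex could be routed to its own handler is by checking the file signature instead of the extension. Per the DEX format, a file begins with the bytes "dex\n", a three-digit ASCII version, and a NUL byte. isDex and dexVersion are hypothetical helper names, not existing TruffleHog functions.

```go
package main

import (
	"bytes"
	"fmt"
)

// isDex reports whether data starts with a DEX header magic:
// the bytes "dex\n", a three-digit ASCII version (e.g. "035", "039"), then 0x00.
// Checking content rather than the extension lets a standalone classes.dex
// (or a renamed one) be routed to a dex handler outside the APK path.
func isDex(data []byte) bool {
	if len(data) < 8 || !bytes.HasPrefix(data, []byte("dex\n")) {
		return false
	}
	for _, c := range data[4:7] {
		if c < '0' || c > '9' {
			return false
		}
	}
	return data[7] == 0x00
}

// dexVersion returns the ASCII version string from a DEX header, e.g. "035".
func dexVersion(data []byte) (string, bool) {
	if !isDex(data) {
		return "", false
	}
	return string(data[4:7]), true
}

func main() {
	v, ok := dexVersion([]byte("dex\n035\x00"))
	fmt.Println(v, ok)                       // 035 true
	fmt.Println(isDex([]byte("PK\x03\x04"))) // false (zip/APK signature)
}
```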
After I ran this against an Android app that I know has an intentionally hard-coded API key for LokaliseToken, which gets detected if we decompile the … The API key was in the …
But it's not getting detected because, like the majority of TruffleHog detectors, it relies on keywords, and after the dex file is processed we only have the API key with no other text around it, so it fails.
@bugbaba is there any way you could share that APK file?
@joeleonjr Couldn't find you in the Discord server. Please ping me.
@bugbaba Please give this new implementation a try. Basically, we followed our process for Postman scanning and are now providing relevant keywords close to all …
Note to reviewers: Please look at how we handle an error caused by calling …
It is now able to detect the key for that specific APK file.
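A rough sketch of the keyword-injection idea, assuming the decompiled dex text and the extracted const-string values are already available as strings. injectKeywords and its parameters are hypothetical; this sketch places the keywords in front of each value on the same line, which is one way to satisfy a proximity-based keyword check.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// injectKeywords is a HYPOTHETICAL sketch: find which known detector keywords
// appear anywhere in the decompiled dex text, then prefix every const-string
// value with those keywords, so keyword-gated detectors see a keyword on the
// same line as each candidate secret.
func injectKeywords(decompiled string, constStrings, known []string) string {
	lower := strings.ToLower(decompiled)
	var found []string
	for _, kw := range known {
		if strings.Contains(lower, strings.ToLower(kw)) {
			found = append(found, kw)
		}
	}
	if len(found) == 0 {
		return "" // no supported keyword in this file, so emit nothing
	}
	sort.Strings(found) // deterministic ordering
	prefix := strings.Join(found, " ")

	var b strings.Builder
	for _, v := range constStrings {
		b.WriteString(prefix)
		b.WriteByte(' ')
		b.WriteString(v)
		b.WriteByte('\n')
	}
	return b.String()
}

func main() {
	decompiled := "const-string v0, \"LokaliseToken\"\nconst-string v1, \"abc123\""
	fmt.Print(injectKeywords(decompiled, []string{"LokaliseToken", "abc123"}, []string{"lokalise", "github"}))
	// lokalise LokaliseToken
	// lokalise abc123
}
```

Each value line carries every keyword found in the file, so the output grows with (values × keywords); that is the cost of not knowing which keyword belongs to which value.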
Wanted to provide my initial comments before I dive into the actual apk handler logic.
We should add a feature flag for this called … We can turn it on by default in OSS, and when it's imported into Enterprise, it will be off by default unless we override it with a feature flag. Joe, let me know if you want to sync on how this works.
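A minimal sketch of such a flag, assuming a package-level atomic.Bool that the OSS entry point sets at startup and that Enterprise leaves unset unless overridden. The flag name EnableAPKHandler and the selectHandler routing are hypothetical, since the thread does not give the real name.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// EnableAPKHandler is a HYPOTHETICAL flag; the thread does not give the real
// name. atomic.Bool lets concurrent handlers read it while startup code sets it.
var EnableAPKHandler atomic.Bool

// selectHandler routes .apk files to the APK-specific handler only when the
// flag is on; otherwise they fall back to generic archive handling.
func selectHandler(ext string) string {
	switch {
	case ext == ".apk" && EnableAPKHandler.Load():
		return "apk"
	case ext == ".apk" || ext == ".zip":
		return "archive"
	default:
		return "default"
	}
}

func main() {
	// OSS entry point: turn the handler on by default.
	EnableAPKHandler.Store(true)
	fmt.Println(selectHandler(".apk")) // apk

	// Enterprise import: leave it off unless explicitly overridden.
	EnableAPKHandler.Store(false)
	fmt.Println(selectHandler(".apk")) // archive
}
```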
I’ve taken an initial pass at the review and covered a good portion of the implementation. Let me know if it’s easier to go over any of this synchronously. I’ll finish the remainder of the review tomorrow morning.
Whoops, forgot to hit submit review last night for a couple more comments. My mistake.
Great job, @joeleonjr! 🥳 This looks like it was quite the project to get working. I'm excited to see it in action, and to handle the inevitable user question about verifying findings we found in their .apk file. 🤣
LGTM now, but I'll defer to @ahrav for final approval.
This looks great. Thanks again for getting this implemented. 🙇
So TruffleHog can be used to scan APKs?
Yes, that's correct. Feel free to check out our blog for additional details.
TruffleHog should improve its regexes for finding API keys and other secrets. I tested it on an APK that contains API keys, but it failed to find them.
Description:
APK (Android Package Kit) files are used by Android to install and distribute applications. These files are essentially zip archives with a specific directory structure for Android apps. We currently scan them as normal zip archives; however, most of the locations within an APK where secrets would live aren't properly decompiled/decoded during our regular zip file scanning. This PR adds special support to decompile APKs and then search them for secrets.
The most robust approach for searching APKs for secrets is to use a decompiler like jadx and then run TruffleHog against the output. The downsides are twofold: (1) this would require TruffleHog users to install jadx, and (2) decompiling takes a while (up to several minutes) and a lot of memory. Instead of going this route, we rely on two golang libraries (dextk and apkparser) that balance functionality and performance to get us 80% of the way there without any external dependencies.
Through this PR, TruffleHog users can now scan for secrets in APK files. Here's what is specifically scanned:
XML
Android's XML files need to be decoded in order to properly scan them because the places where secrets might live are often stored as reference IDs instead of plain-text strings. This PR runs an Android XML decoder that uses the resources.arsc file as context to automatically resolve most resource reference IDs into their corresponding values.
AndroidManifest.xml
This approach includes the important AndroidManifest.xml file.
Strings.xml
One of the XML files most likely to contain a secret is called strings.xml. This file proved a challenge because, during the APK compilation process, it is transformed in such a way that when we run unzip file.apk, we can't just see the strings.xml file. A tool like jadx would easily decompile it, but since we're not using it, we had to find a different way to get at that data.
We found that the resources.arsc file houses the key/value pairs from the strings.xml file that might contain secrets, in the resource ID range 0x7f000000-0x7fffffff. So we iterate through all resources of type string in that range and search for secrets there. This seems to work for most scenarios, but admittedly we need more testing.
Dex
A DEX file contains compiled code (it's where the Java or Kotlin source code is transformed into bytecode). APK files generally include at least one DEX file, usually named classes.dex, but if the app is large or modular, there might be multiple DEX files, like classes2.dex and classes3.dex.
We run a golang-based Dex decompiler that helps us identify multiple relevant instruction types within the bytecode. The one most likely to contain a secret is const-string. The rest provide context for our potential secret values.
The challenge is that the keyword we need to clue in our scanning engine is often located too far from the secret, given that our decompilation method is lightweight and imperfect. As a result, we run keyword scanning against the decompiled code. If a keyword that we support is found, we then append that keyword to every value (read: const-string instruction value) and toss it in for scanning. This ensures we don't lack appropriate coverage and is similar in implementation to our work on Postman.
Note: Since we can't get all of the scanner keywords via the engine pkg (import issues) like we did for Postman, we create a separate file named apk_keywords.go. In an ideal world, defaults.go is ripped out of engine and moved into pkg/detectors, so that we don't need to have the same data listed in two places.
Everything else
All other files are just read in like normal and passed to a chunk for scanning. We likely won't see many secrets from these files, but it's worth a review. Examples of these types of files are: .json, .properties, etc.
Checklist:
- make test-community
- make lint (this requires golangci-lint)