Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New fuzz drivers for a series of integrated C projects #9523

Closed
wants to merge 24 commits into from
Closed

New fuzz drivers for a series of integrated C projects #9523

wants to merge 24 commits into from

Conversation

the-dawn-chorus
Copy link

Hi, we have recently created 320 fuzz drivers for 31 oss-fuzz C projects and would like to submit them for practical cluster fuzzing. These drivers target untested APIs within the projects, selected based on their names (an API is considered if it looks like parsing the input data). The drivers have passed our local fuzz testing for a duration of one minute, and we plan to further improve them in both quality and quantity.


Here lists the statistics of the fuzz drivers:

Finished projects (23/31):

  • project bluez : 7 fuzz drivers
  • project coturn : 37 fuzz drivers
  • project gdbm : 2 fuzz drivers
  • project gdk-pixbuf : 11 fuzz drivers
  • project gpac : 43 fuzz drivers
  • project inchi : 2 fuzz drivers
  • project kamailio : 25 fuzz drivers
  • project libdwarf : 5 fuzz drivers
  • project libjpeg-turbo : 6 fuzz drivers
  • project libpg_query : 3 fuzz drivers
  • project libssh : 1 fuzz drivers
  • project libucl : 22 fuzz drivers
  • project libwebsockets : 10 fuzz drivers
  • project lua : 2 fuzz drivers
  • project oniguruma : 2 fuzz drivers
  • project opusfile : 1 fuzz drivers
  • project ostree : 7 fuzz drivers
  • project spdk : 8 fuzz drivers
  • project selinux : 1 fuzz drivers
  • project pjsip : 17 fuzz drivers
  • project postfix : 2 fuzz drivers
  • project proftpd : 7 fuzz drivers
  • project utf8proc : 5 fuzz drivers

Ongoing projects (8/31) which require the fix of the build.sh in project

  • project bind9 : 32 fuzz drivers
  • project civetweb : 13 fuzz drivers
  • project fribidi : 1 fuzz drivers
  • project krb5 : 4 fuzz drivers
  • project pidgin : 3 fuzz drivers
  • project miniz : 28 fuzz drivers
  • project igraph : 6 fuzz drivers
  • project libbpf : 9 fuzz drivers (not the build issue, failed to execute git clone git://sourceware.org/git/elfutils.git, too slow)

@google-cla
Copy link

google-cla bot commented Jan 29, 2023

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@the-dawn-chorus
Copy link
Author

CLA signed

@the-dawn-chorus the-dawn-chorus changed the title Improving fuzz drivers of a series of supported C projects New fuzz drivers for a series of integrated C projects Jan 29, 2023
@evverx
Copy link
Contributor

evverx commented Jan 29, 2023

These drivers target untested APIs within the projects, selected based on their names (an API is considered if it looks like parsing the input data)

I wonder it those drivers were generated automatically?

project selinux : 1 fuzz drivers
project libbpf : 9 fuzz drivers

Those projects keep their fuzz targets upstream. Could you please send your fuzz drivers to https://github.com/libbpf/libbpf/ and https://lore.kernel.org/selinux/ accordingly to get them reviewed properly? cc @cgzones since the selinux project is involved.

The drivers have passed our local fuzz testing for a duration of one minute, and we plan to further improve them in both quality and quantity.

That's great but given that all the developers receive notifications from Monorail and also sometimes CVEs are automatically assigned based on OSS-Fuzz bug reports it would be great if the bar was a bit higher from the start.

@the-dawn-chorus
Copy link
Author

the-dawn-chorus commented Jan 29, 2023

Thanks for the feedback, we will refine the submission according to the suggestions.

I wonder it those drivers were generated automatically?

Yes, we are trying to automate the whole thing and it is still working in progress.

@evverx
Copy link
Contributor

evverx commented Jan 29, 2023

we are trying to automate the whole thing and it is still working in progress.

Looking at the new selinux fuzz targets I think it's far from being ready to be honest. sepol_policydb_from_image just calls policydb_read and policydb_read is already covered: https://storage.googleapis.com/oss-fuzz-coverage/selinux/reports/20230128/linux/src/selinux/libsepol/src/policydb.c.html#L4225. Looks like whatever generates those fuzz targets isn't aware of call trees and can't infer whether stuff is already covered or not. Anyway it would be great if before sending those auto-generated fuzz targets upstream they were at least verified manually.

@the-dawn-chorus
Copy link
Author

the-dawn-chorus commented Jan 29, 2023

Thanks for the notice, we just noticed that we forget to remove the existed_xxx_fuzzer.c, which indicates these APIs are already covered by existing drivers (fixed in the new commit f215cb2). These submitted fuzz drivers are manually selected and have done some checks (pass compile, cover new API, manually check the code, and fuzz one minute) before submission. Any more suggestion on the manual check part is appreciated!

Currently, these drivers are targeting the untested APIs, and covering more diverse usage is our next target.

@evverx
Copy link
Contributor

evverx commented Jan 29, 2023

Any more suggestion on the manual check part is appreciated!

I think https://fuzz-introspector.readthedocs.io/en/latest/ can come in handy here. It can show what's already covered and whether anything has been improved or not.

these drivers are targeting the untested APIs

It's true that sepol_policydb_from_image technically isn't covered but it just calls policydb_read and policydb_read is already covered by the selinux project. I'm not sure it makes much sense to start fuzzing sepol_policydb_from_image because it would just waste the resources allocated to the selinux project.

@the-dawn-chorus
Copy link
Author

Thanks, will try the fuzz-introspector and refine the commits.

@evverx
Copy link
Contributor

evverx commented Jan 29, 2023

FWIW FI kind of started supporting diffs recently: https://fuzz-introspector.readthedocs.io/en/latest/user-guides/comparing-introspector-reports.html and ossf/fuzz-introspector#734 so in theory it could be used to evaluate auto-generated fuzz targets automatically. Then again personally I think that before automating anything it would probably make sense to improve projects one by one manually (or semi-automatically) by looking at how they are integrated and try contacting projects in the process (to make sure they are fine with adding external contributors to CC for example).

@DavidKorczynski
Copy link
Collaborator

Exciting with the auto generation!

I don't think the PR is ready for various reasons.

Technically, it's not ready due to:

  1. The fuzzers seem to be relatively shallow (I assume you have some cov information, perhaps add this?) and a bit too spammy -- I went over a few of the kamailio fuzzers as I'm familiar with the project and found this was the case. I think the point made by @evverx is good about Fuzz Introspector -- I think there is a lot that can be used from that project to assist here and would be happy to help. Please reach out to me if you're interested in that (my email). Some filtering we can make easily is e.g. if the accumulated complexity of the fuzzer is beneath a certain threshold we should discard.
  2. The build.sh follow a bit non-standard OSS-Fuzz manners (e.g. use clang instead of $CC) which we should avoid.

Logistically it's not ready due to:

  1. My assumption is this is for academic research. I think we should avoid adding tons of fuzzers for such a purpose. The reasons being high likelihood of noise over signal and added maintenance burden.
  2. Adding emails to auto_cc is for maintainers or places where maintainers have given consent. I assume this has not happened in this case? I don't think the best is to tag all maintainers here to ask for permission, but rather to approach on project-by-project basis to make the process easy for the maintainers (CCing them to a large pull request will likely confuse maintainers).

I would also prefer if it's auto-generated that the algorithms were public. This makes it more easy to "review" as what we really want to review in that case is the auto-generation approach. Can you reveal this? I know UTopia has a fuzzer on OSS-Fuzz (libwebsockets trophy) https://github.com/Samsung/UTopia/blob/main/Trophy.md and was wonering if the fuzzers are from this project?

@the-dawn-chorus
Copy link
Author

Thanks for the explanation and discussion today, we've learnt a lot from it.

  • We'll close the PR since it is not ready for acceptance.
  • The drivers are not from Utopia. We're trying some new approaches for the driver generation.
  • Yes, we are doing some academic researches and will make it public once it is ready.
  • We are trying the fuzz-introspector. This is a great project for providing insights of what should be fuzzed next.
  • For the suggestions on code convention, CC and potential spam issue, we'll carefully handle it before submitting PR next time.

I think there is a lot that can be used from that project to assist here and would be happy to help. Please reach out to me if you're interested in that.

Agreed and thanks! We'll try it first!

@evverx
Copy link
Contributor

evverx commented Jan 31, 2023

I think there is a lot that can be used from that project

@DavidKorczynski I think it's an understatement because FI already has enough data to generate fuzz targets (and there was an experimental analysis plugin doing exactly that as far as I can remember). If it could extract global coverage it would be even better. I'm not sure how to infer whether code is reachable through world-writable sockets by using only FI yet but it could be a separate research project :-)

Anyway I think if that algorithm turns out to be good enough in practice I think it would great if it could be integrated into FI as a plugin or something like that. This way those fuzz targets could be copy-pasted easily and turned into something suitable for integrating into actual projects.

@DavidKorczynski
Copy link
Collaborator

DavidKorczynski commented Feb 7, 2023

@evverx -- I totally agree. I think I didn't want to be too pushy given my involvement with FI as in a sense I'm quite biased. However, FI has a ton of potential in this domain. It has a of mechanics that can be extended for the purpose of auto-generation. I'm actually working on this at the moment and will let you know when I make it public!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants