Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fuzzing Coverage Expansion #407

Merged
merged 17 commits into from
Sep 17, 2023
Merged

Fuzzing Coverage Expansion #407

merged 17 commits into from
Sep 17, 2023

Conversation

nmlsg
Copy link
Contributor

@nmlsg nmlsg commented Sep 1, 2023

This pull requests expands fuzzing coverage for oss-fuzz with new fuzzers and additional corpuses.

@dillof
Copy link
Member

dillof commented Sep 1, 2023

Thanks for improving our fuzz setup, we very much appreciate it.

I haven't looked at it in detail yet, but could you please put the fuzzer specific files in their own directory? I would like to keep the regress directory for our test suite only. I would prefer a new top level directory, but if it's easier to integrate with the fuzz infrastructure, a subdirectory under regress is also acceptable.

@nmlsg
Copy link
Contributor Author

nmlsg commented Sep 1, 2023

Hi @dillof
It will be a problem because in such way I will need to make a commit to oss-fuzz directory in order to re-write a Dockerfile

But I don't have any access to oss-fuzz repo.
You could accept a PR, and then redesign the folder structure ( If you have an access to oss-fuzz repo ).

@nmlsg
Copy link
Contributor Author

nmlsg commented Sep 6, 2023

Hi @dillof
I see that PR is still open, do you need something specific from my side?
Let's discuss it !

@0-wiz-0
Copy link
Member

0-wiz-0 commented Sep 14, 2023

This pull request needs major cleanups.

  • Please put the new fuzzer code in regress, that's fine, but not top-level.
  • don't move files around in the fuzzer shell script
  • you seem to add a lot of corpus data, but I only see it used by being zipped up in one file in the fuzzer shell script. where are they used?
  • if the corpus data is really used, it should either be just added as (one to four) zip files directly to the repository, or if there's a reason the data needs to be there unpacked as well, it should be in a subdirectory of regress e.g regress/corpus, and the fuzzer shell script adapted as necessary
    Please add some description what the new fuzzers want to test, as comments in the source code.
    Thanks!

@nmlsg
Copy link
Contributor Author

nmlsg commented Sep 14, 2023

Hi @0-wiz-0,

@dillof asked me to create a new subdirectory under 'regress.' As a result, I have updated the shell script and also created a new subdirectory. Just to confirm, is it okay to move all the fuzzers directly to the 'regress' directory in this repository?

I'm using corpuses as initial seeds for the fuzzers. According to the oss-fuzz documentation, the recommended approach is to zip the corpuses with the specific name 'name_seed_corpus.zip.' In the initial setup of the repository, you had created a zip file with all the corpuses in shell script, however I can archive them once and upload the prepared archive if you would like me to.

@0-wiz-0
Copy link
Member

0-wiz-0 commented Sep 15, 2023

@nmlsg Please take a step back and consider the existing project, not just the fuzzer part.
libzip has a regression suit that can be run manually or in an automated way, which is done regularly by the CIs.
The tests and their data exist in the regress directory.
So the existing zip archives are crafted to be used by specific tests to cover particular functionality.
These files must be left available for the tests.

The previous fuzzer setup just took advantage of the existing test data and used that.

You added test data that is not zip archives (like pdf, php, js, xml), I'm not quite sure why (because libzip is not handling any of these formats), and put it unpacked in the regress directory. We saw that that as noise from the point of view of regression testing and asked you to put it in a subdirectory, or pack it up.

Your latest change zipped them up, but also removed the test data we use for regression testing.

Please, for future changes, make sure that the test suite runs through successfully.

Here's an updated list of requests so this can be merged:

  • restore the test data so the test suite can run again
  • document the new tests
  • remove the c source code you added to the top-level
    Thank you.

@nmlsg
Copy link
Contributor Author

nmlsg commented Sep 15, 2023

@0-wiz-0 I have cleaned up the 'regress' directory, removed 'C' source code from the top-level directory, restored the test data, and thoroughly documented the targets for the new fuzzers. Please take a moment to review these changes. Thank you.

@0-wiz-0
Copy link
Member

0-wiz-0 commented Sep 15, 2023

Thank you, that's better.

  • I still see 4 four new toplevel *.cc files
  • bigstored.zh was removed, please restore it
  • some of the corpus zip files contain other corpus zip files, which looks like it was an accident - can you please check?

Thanks.

@nmlsg
Copy link
Contributor Author

nmlsg commented Sep 15, 2023

@0-wiz-0 Thanks for your time ! I have pushed the changes, check it please !

  • The corpus files, for example, 'zip_write_encrypt_aes256_file_fuzzer_seed_corpus.zip,' consist of multiple files that serve as the initial seeds for the fuzzer. Similarly, 'zip_read_encrypted_file_fuzzer_seed_corpus.zip' contains various zip archives, each containing encrypted files.

@0-wiz-0
Copy link
Member

0-wiz-0 commented Sep 15, 2023

Thank you for the changes.
As to the code:

  • I don't understand the changes in zip_read_fuzzer - are they intended? Can you please explain why you create an archive and immediately remove it?
  • The file name says AES 256, but the code is for AES 128
  • I think when we're fuzzing the encryption, it would be good to also use a random password each time, right? (that's for both AES and pkware fuzzers)
  • For zip_read_encrypted_file_fuzzer: I think this will fail a lot - if it fuzzes the input data, it will not be valid AES encrypted data and then decryption will fail. But perhaps that's the point?
  • The corpus files for the other fuzzers are included in the corpus file for the encryption fuzzer. Is that really useful?

@nmlsg
Copy link
Contributor Author

nmlsg commented Sep 15, 2023

  • To cover the 'zip_open' function, I have utilized I/O operations to create an archive and subsequently remove it, enabling the loading of data via the 'zip_source_buffer_create' function.
  • Acknowledge my mistake and will rectify it by changing it to 'ZIP_EM_AES_256'. I conducted experiments involving various encryption algorithms and key lengths ;)
  • It seems like a good idea; I can implement random password generation into the process.
  • I have pre-generated valid archives with encrypted files inside. Although mutations from the oss-fuzz side may lead to potential failures, we have implemented handling mechanisms to prevent false-positive crashes.
  • It's essential to experiment with different data and file types to observe their behavior, so yes, we have to add all of theese file "types"

@0-wiz-0
Copy link
Member

0-wiz-0 commented Sep 15, 2023

Thank you for the improvements.

  • The corpus data for the AES tests is AES128 encrypted. Shouldn't this match the code and be AES256? (How do you test your commits?)
  • Is the corpus data available under a free license? Where did you get the files? (js, xml, pdf, ...)
  • The AES corpus data zip file contains the complete corpus zip files of two other corpora. That looks wrong to me - it looks like an oversight, but if it's not, it at least wastes space. If we really want to include the corpus files inside this zip archive, we could add them to the zips in the fuzzer shell script.
  • Same problem for the pkware corpus file.
  • I still don't understand the changes to zip_read_fuzzer. The newly added code does something completely different than the existing code there. If you think it's useful, it should be a separate fuzzer cc file.

@nmlsg
Copy link
Contributor Author

nmlsg commented Sep 15, 2023

I appreciate your comments, and I'm thankful for your time and efforts.

  • Regarding the '_zip_get_encryption_implementation' function, it consistently returns 'zip_source_winzip_aes_decode' or 'zip_source_winzip_aes_encode' for all key lengths related to AES encryption. I've conducted local tests and also have a deployed instance of oss-fuzz on my virtual machine with the repository URL modified to 'nmlsg/libzip.'
  • I generated some of the corpuses manually, while others were obtained from this repository repo_link.
  • I acknowledge the issue you mentioned, and I'm addressing it in a new commit. It appears that something went wrong with my use of 'xargs,' and I've made the necessary corrections.
  • Regarding the method of opening and handling archives by writing data to disk from a fuzzer, this approach is designed to cover the 'zip_open' function (zip_read_fuzzer target) . It seems reasonable to leave it as is, but if you believe it's better to create another target for this, I'm open to doing so. This method effectively covers a substantial portion of the codebase.

@0-wiz-0
Copy link
Member

0-wiz-0 commented Sep 16, 2023

Thank you for the information. Wow, you did set up your own fuzzing instance!

  • About the test data - I looked at the repository and it doesn't have any license information as far as I can see - so we can't use it. I think for the purposes of libzip fuzzing, two cases are probably enough:
  1. use completely random input data
  2. use ASCII random input data (so only characters from bytes 32-127)
    I think that should cover enough for our purposes.

As for zip_read_fuzzer, I think I understand what you mean now - the new code tests if the file data that is provided can be read using zip_open() if written to a file.
However, the existing (previous) code is just run as is, and duplicates part of this.
I think it would be nicer to leave zip_read_fuzzer.cc as is, and add a zip_read_file_fuzzer, that's basically the same (including reading all the data) but does it after writing the file data to disk, like your new code does. I.e. at the beginning, write the file data to disk, then use the loop from zip_read_fuzzer that reads all entries, and finally zip_close the file and remove it.

Thanks!

@0-wiz-0
Copy link
Member

0-wiz-0 commented Sep 16, 2023

Oh, and please take a look at the CIFuzz failure in the last run for the pull request. Are these alrady libzip fuzzing issues or problems in the setup?

@nmlsg
Copy link
Contributor Author

nmlsg commented Sep 17, 2023

Hi @0-wiz-0
Ihave recently made several modifications, including the addition of a new fuzzer called 'zip_read_file_fuzzer,' and have also addressed minor issues in existing fuzzers.

Based on my evaluation, the changes appear to be functioning as expected both locally and on my deployed machine, and there are no indications of issues with the ossfuzz setup in this pull request.

I am considering the generation of new corpora that exclusively contain ASCII characters to replace the current ones. Would you like me to do it?

@nmlsg
Copy link
Contributor Author

nmlsg commented Sep 17, 2023

@0-wiz-0 I have implemented all requisite modifications. Kindly review them at your earliest convenience.

@0-wiz-0 0-wiz-0 merged commit 28c5db8 into nih-at:main Sep 17, 2023
1 check failed
@0-wiz-0
Copy link
Member

0-wiz-0 commented Sep 17, 2023

There were some minor things I cleaned up myself.
I've now merged your pull request, thank you for your work on this!
Looking forward to the new ossfuzz reports :)

@nmlsg
Copy link
Contributor Author

nmlsg commented Sep 17, 2023

@0-wiz-0 Thanks for your work, I really appreciate that !

@nmlsg
Copy link
Contributor Author

nmlsg commented Sep 21, 2023

Hello @0-wiz-0,

I've observed that there might be an issue with the recent commit you made, which seems to have caused a breakdown in the OSS-Fuzz setup. Could you please consider rolling back this commit? Here is the latest log for reference: Link to Log. *Step #3 - "compile-libfuzzer-coverage-x86_64": [1m/src/libzip/regress/fuzzers/zip_write_encrypt_aes256_file_fuzzer.cc:70:1: [0m�[0;1;31merror: �[0m�[1mdeclaration of 'random_string' has a different language linkage�[0m

I've also taken the initiative to test the setup locally with actual repository , and it has failed. Interestingly, when I tested my fork of the repository link, it worked perfectly fine, I assume that your last commit affected the whole setup.

@0-wiz-0
Copy link
Member

0-wiz-0 commented Sep 21, 2023

I've fixed the random_string issue - the problem was that the prototype needed to be inside the extern "C" block.
Thanks for pointing out the error.

I have a local change that builds the fuzzers during a libzip build to find issues such as this one in the future, but this now needs a C++20 compiler, so I can't commit it as-is (because libzip is a C library). It would be nicer if the fuzzers were written in C instead of C++.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants