Add fuzzer based on cargo-fuzz #211

Closed
wants to merge 5 commits into from

Conversation

Dandandan
Contributor

Currently it returns (after fuzzing):

Error: Fuzz target exited with signal: 11

That looks like a problem in the nightly compiler (LLVM?)

@coveralls

coveralls commented Jun 27, 2020

Pull Request Test Coverage Report for Build 150594647

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 161 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-0.03%) to 91.781%

Files with Coverage Reduction    New Missed Lines    %
src/ast/mod.rs                   37                  80.76%
src/parser.rs                    124                 89.07%

Totals
Change from base Build 148650756: -0.03%
Covered Lines: 4221
Relevant Lines: 4599

💛 - Coveralls

@nickolay
Contributor

I'm not sure I understand the value of fuzzing sqlparser in general, and of fuzzing without first devising a way to generate SQL-looking input, in particular. Or did you mean to mark this PR as a WIP? Could you elaborate?

@Dandandan
Contributor Author

Dandandan commented Jun 28, 2020

The fuzzer (quickly) generates bytes.

A few things help it arrive at interesting inputs anyway:

  • It quickly generates bytes, starting from smaller inputs
  • It tries to discover new code paths, and tries to find smaller examples for existing paths
  • It uses a genetic algorithm to generate new inputs
  • It keeps a corpus of inputs covering the paths that have been tried
It can also run in parallel (-- -jobs=16) to make it even faster.
Some more details here https://llvm.org/docs/LibFuzzer.html and here https://rust-fuzz.github.io/book/cargo-fuzz.html

The example comes from here: https://rust-fuzz.github.io/book/cargo-fuzz/tutorial.html

Could also use this to make it a bit smarter: https://rust-fuzz.github.io/book/cargo-fuzz/structure-aware-fuzzing.html . For example, we could generate arbitrary strings instead of bytes.
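
One possible way to get SQL-looking input out of raw fuzzer bytes, sketched under assumptions: map each byte onto a small token vocabulary. This is not the approach from the linked page, which derives structured values from the bytes; `TOKENS` and `bytes_to_sql` are hypothetical names, and the vocabulary is only a placeholder.

```rust
/// Hypothetical vocabulary; a real one would cover far more of the grammar.
const TOKENS: &[&str] = &[
    "SELECT", "FROM", "WHERE", "a", "b", "t", "*", ",",
    "(", ")", "=", "1", "'x'", "AND", "OR", "NOT",
];

/// Turn arbitrary bytes into a space-separated token string.
/// Every byte selects one token, so any input yields tokenizable text.
fn bytes_to_sql(data: &[u8]) -> String {
    data.iter()
        .map(|b| TOKENS[*b as usize % TOKENS.len()])
        .collect::<Vec<_>>()
        .join(" ")
}
```

For example, the bytes `[0, 6, 1, 5]` become `SELECT * FROM t`. The trade-off is that the fuzzer no longer exercises the tokenizer with malformed bytes, so this would complement byte-level fuzzing rather than replace it.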

Some random things in the corpus (after running for a few seconds); it looks like it covers many things in the tokenizer:

((*
/,,$,,,,,,,,,,,,3,,M*
'''
>>!=!=!X
(INDEX<=A(<=A(NX<=((((((<=}666F\

The thing is that currently, because of the error above, it doesn't generate any useful reports.

Although it may also be an actual problem like MaterializeInc/materialize#3429

@Dandandan
Contributor Author

To elaborate on the general goal: fuzzing sqlparser is mainly meant to find inputs that make the parser crash. Crashes could come from partial functions like unwrap, out-of-bounds accesses, stack overflows, etc.

@Dandandan
Contributor Author

It looks like the size of the inputs is indeed the problem (I think it is related to a recursion problem).
After adding a maximum length, it just keeps running and already covers >1600 lines of the program (including many parser functions). I'll keep it running for a while to see whether it can get the parser to crash!
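
A rough sketch of why a length cap helps with a recursion problem, assuming a recursive-descent parser. The toy grammar, `parse_expr`, `fuzz_one` and both constants are hypothetical and not taken from this PR; libFuzzer's `-max_len` flag is another way to cap input size.

```rust
/// Hypothetical limits, chosen only for this sketch.
const MAX_INPUT_LEN: usize = 256;
const MAX_DEPTH: usize = 50;

/// Toy recursive-descent parser for: expr := '(' expr ')' | 'x'.
/// Returns the position just past the parsed expression.
/// Each '(' costs one stack frame, so unbounded nesting can overflow the stack.
fn parse_expr(input: &[u8], pos: usize, depth: usize) -> Result<usize, String> {
    if depth > MAX_DEPTH {
        return Err("recursion limit exceeded".to_string());
    }
    match input.get(pos).copied() {
        Some(b'(') => {
            let after = parse_expr(input, pos + 1, depth + 1)?;
            match input.get(after).copied() {
                Some(b')') => Ok(after + 1),
                _ => Err(format!("expected ')' at {}", after)),
            }
        }
        Some(b'x') => Ok(pos + 1),
        _ => Err(format!("unexpected input at {}", pos)),
    }
}

/// Body that a fuzz target could call: skip oversized inputs, parse the rest.
/// Returns None when the input was skipped.
fn fuzz_one(data: &[u8]) -> Option<Result<usize, String>> {
    if data.len() > MAX_INPUT_LEN {
        return None;
    }
    Some(parse_expr(data, 0, 0))
}
```

The length cap bounds nesting depth only indirectly, since an input of N bytes cannot nest deeper than N levels. An explicit depth limit inside the parser turns deep nesting into an ordinary error, which also protects users who never run the fuzzer.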

@Dandandan Dandandan changed the base branch from master to main June 28, 2020 18:37
@alamb
Contributor

alamb commented Aug 20, 2021

@Dandandan shall we close this based on the contribution from @PsiACE in #312 ?

@Dandandan
Contributor Author

@Dandandan shall we close this based on the contribution from @PsiACE in #312 ?

Yeah sounds good. According to some articles, different fuzzers can detect different bugs, but for now it seems we should pick just one option.
https://theultramarine19.github.io/data/736.pdf

@Dandandan Dandandan closed this Aug 21, 2021
@alamb
Contributor

alamb commented Aug 21, 2021

I also filed apache/datafusion#913 in DataFusion for using a "domain specific" fuzzer (i.e. one that generates valid SQL), which may be applicable to sqlparser-rs as well

4 participants