Add Submodules Fuzz Target #1919
Fuzz Introspector heuristics suggest the Submodule API code represents "optimal analysis targets" that should yield a meaningful increase in code coverage. The changes here introduce a first pass at a fuzz harness that covers the primary APIs/methods related to Submodules. Of particular interest to me is the `Submodule.config_writer()` coverage. Please note, however, that there is likely plenty of room for improvement in this harness in terms of both code coverage and performance; the latter will benefit significantly from a well-curated seed corpus of `.gitmodules`-like inputs. Without a good seed corpus, the `ParsingError` raised for fuzzer-generated inputs hinders test efficacy significantly.
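To illustrate the seed corpus point, here is a rough sketch that uses the standard library's `configparser` as a stand-in. It is not the harness in this PR and not GitPython's own config parser; the `SEED` content and the `parses_as_gitmodules` helper are hypothetical names made up for the example. It only shows that arbitrary bytes tend to be rejected at the parsing stage, while a `.gitmodules`-like input gets past it.

```python
import configparser

# Hypothetical seed input shaped like a minimal .gitmodules file.
SEED = b'[submodule "lib"]\npath = lib\nurl = https://example.com/lib.git\n'


def parses_as_gitmodules(data: bytes) -> bool:
    """Return True if `data` parses and contains at least one submodule section.

    Assumption: stdlib configparser is used here only as an approximation of
    the config parsing that a Submodule harness would exercise.
    """
    parser = configparser.ConfigParser(interpolation=None)
    try:
        parser.read_string(data.decode("utf-8", errors="replace"))
    except configparser.Error:
        # Covers ParsingError and its subclasses, such as
        # MissingSectionHeaderError, plus duplicate section/option errors.
        return False
    return any(name.startswith("submodule ") for name in parser.sections())


if __name__ == "__main__":
    print(parses_as_gitmodules(SEED))
    print(parses_as_gitmodules(b"\x00\xff garbage"))
```

An input that fails this kind of check never reaches the code behind the parser, which is why starting the fuzzer from well-formed seeds should spend fewer iterations on early rejections.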
Thanks a lot for taking such great care!
Part of me thinks that the submodule implementation is so riddled with inaccuracies and incorrectness that fuzzing it seems like a waste. The fuzzer can only try to find unexpected exceptions, and maybe that's a small win, but at what cost?
Part of that feeling also stems from the incredible sluggishness of Python in general, so any fuzzing feels wasteful. But that's beside the point I suppose, apologies for the ramblings.
Please don't apologize, and definitely do not hesitate to reject or push back on any of my PRs! (especially considering that my last few PRs came out of the blue without prior discussion about whether they're even wanted -- sorry about that 😅)
I think your points are perfectly reasonable. Here is how I've been thinking of the value in fuzzing GitPython:
I think everything you said is very much on-point regarding any of the fuzzing work in this repo. Moreover, I really appreciate hearing your thoughts, so thanks! In case it isn't clear, I won't be offended if you feel the juice isn't worth the squeeze and would rather I hold off on any non-maintenance type fuzzing work. Frankly, if you decided you'd rather have it all removed ASAP, I'd help remove it. I've learned a lot about Git, Python, fuzzing, and more working on these, so I wouldn't consider it a wasted effort even if the changes never made it to PR. So thanks, @Byron, for the support along the way! 🙂 And now, it's my turn to apologize for the ramblings 😅
That's perfectly alright - no need to make it more complicated, just do what you think is right, you are driving this.
I am also clearly biased and think that everybody should use
Spreading fuzzing as a technique through GitPython is a great thought and I am fully behind that - if nothing else comes out of it, more Python projects might adopt it which could be a net-win. And even if not, people learn how to use a fuzzer which will help in any programming environment eventually.
I'd never do that, and don't feel that way at all. But I do admit that I'd love to see you eventually move to Eventually. No pressure :D. PS: There I'd definitely have more opinions on what to fuzz as well, which might make it more interesting for you.
I have a draft PR up with a seed corpus here: gitpython-developers/qa-assets#5