-
Notifications
You must be signed in to change notification settings - Fork 201
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch data_source dependency to GitPython #409
Conversation
e63af64
to
892dd7a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you++ 👍
I added a few nitpickings for your consideration.
Also you may want to add the pinned deps of gitpython to the requirements file?
if previous_commit is None: | ||
previous_commit = commit | ||
continue | ||
if cutoff_commit is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could if cutoff_commit
be enough here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I usually restrict to None comparaison when possible (explicit is better than implicit) and I can see that most of this file is also using this principle
vulnerabilities/data_source.py
Outdated
if d.new_file: | ||
added_files.add(abspath) | ||
elif d.a_blob and d.b_blob and d.a_path != d.b_path: | ||
added_files.add(abspath) # consider moved files as added |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you do not mind, I like to avoid end of line comments when possible
added_files.add(abspath) # consider moved files as added | |
# consider moved files as added | |
added_files.add(abspath) |
vulnerabilities/data_source.py
Outdated
abspath = os.path.join(self.config.working_directory, d.b_path) | ||
if d.new_file: | ||
added_files.add(abspath) | ||
elif d.a_blob and d.b_blob and d.a_path != d.b_path: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it help if the part d.a_blob and d.b_blob
here and below was refactored in one test followed by two sub-tests: one on path and one blob?
repo.index.commit("Added a new file") | ||
|
||
yield str(git_dir) | ||
shutil.rmtree(git_dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about putting the shutil.rmtree(
in a try/finally
block?
3b02bc6
to
0c5b28e
Compare
@pombredanne I think I have answered to your remarks. i fixed the remaining tests, and nix build. |
While neither pygit2 or GitPython have a formidable API and documentation GitPython has the advantage of being pure python and support same proxy feature as git. Signed-off-by: Pierre Tardy <[email protected]>
vulnerabilities/data_source.py
Outdated
def _is_binary(d: DiffIndex): | ||
if not d.b_blob: | ||
return False | ||
try: | ||
d.b_blob.data_stream.read().decode() | ||
except UnicodeDecodeError: | ||
return True | ||
return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this function trying to do ? Is it used to check whether the diff blob consists of binary ? This would return true if the diff is about something like a tar ball ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, it checks by trying to decode it with default encoding settings.
The previous code had some logic of checking if the data was binary to not count it as a diff.
The is_binary semantic was provided by libgit2. The logic is quite complicated:
https://github.com/libgit2/libgit2/blob/508361401fbb5d87118045eaeae3356a729131aa/src/buf_text.c#L268
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sbs2001 as you suggested in the Gitter chat I guess we could use binaryornot which does a decent job there and is used extensively in scancode-toolkit:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the PR to use binaryornot. I am not sure why the nix CI is failing :-/
Signed-off-by: Pierre Tardy <[email protected]>
# simplistic. This does not cover file renames, copies & | ||
# deletions. | ||
if d.status == pygit2.GIT_DELTA_ADDED: | ||
for d in cutoff_commit.diff(self._repo.head.commit): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This errors out with older versions of git (mine was 2.25.1
). See gitpython-developers/GitPython#1016.
We could just add a note about this in the README I guess ?
@@ -74,7 +74,7 @@ | |||
name = "vulnerablecode-${version}"; | |||
src = vulnerablecode-src; | |||
dontConfigure = true; # do not use ./configure | |||
propagatedBuildInputs = [ pythonEnv postgresql ]; | |||
propagatedBuildInputs = [ pythonEnv postgresql gitMinimal]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rolfschr is this fine on your side ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look good to me. @tardyp I gues you could remove the ssl cert file fix for pygit2 in the nix test if you remove pygit2 anyways.
diff --git a/etc/nix/flake.nix b/etc/nix/flake.nix
index 1a65045..84ed672 100644
--- a/etc/nix/flake.nix
+++ b/etc/nix/flake.nix
@@ -119,10 +119,6 @@
buildInputs = [ wget vulnerablecode ];
- # Used by pygit2.
- # See https://github.com/NixOS/nixpkgs/pull/72544#issuecomment-582674047.
- SSL_CERT_FILE = "${pkgs.cacert}/etc/ssl/certs/ca-bundle.crt";
-
unpackPhase = "true";
buildPhase = ''
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merging this, I'll take care of nix tests and docs in another PR.
Thanks @tardyp :)
Sorry I was in vacation.. thanks for merging. |
Sure thing. Thanks for the contrib! |
While neither pygit2 or GitPython have a formidable API and documentation
GitPython has the advantage of being pure python and support
same proxy feature as git.
I removed some of the tests, which to me are much over mocked, and do not really test much.
I also changed the diff algorithm so that it only diff the whole thing instead of considering diff commit by commits.
Fix: #403