Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch data_source dependency to GitPython #409

Merged
merged 3 commits into from
Apr 22, 2021

Conversation

tardyp
Copy link
Contributor

@tardyp tardyp commented Mar 29, 2021

While neither pygit2 or GitPython have a formidable API and documentation

GitPython has the advantage of being pure python and support
same proxy feature as git.

I removed some of the tests, which to me are much over mocked, and do not really test much.

I also changed the diff algorithm so that it only diff the whole thing instead of considering diff commit by commits.

Fix: #403

@tardyp tardyp force-pushed the gitpython branch 2 times, most recently from e63af64 to 892dd7a Compare March 29, 2021 11:39
Copy link
Member

@pombredanne pombredanne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you++ 👍
I added a few nitpickings for your consideration.
Also you may want to add the pinned deps of gitpython to the requirements file?

if previous_commit is None:
previous_commit = commit
continue
if cutoff_commit is None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could if cutoff_commit be enough here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually restrict to None comparaison when possible (explicit is better than implicit) and I can see that most of this file is also using this principle

if d.new_file:
added_files.add(abspath)
elif d.a_blob and d.b_blob and d.a_path != d.b_path:
added_files.add(abspath) # consider moved files as added
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you do not mind, I like to avoid end of line comments when possible

Suggested change
added_files.add(abspath) # consider moved files as added
# consider moved files as added
added_files.add(abspath)

abspath = os.path.join(self.config.working_directory, d.b_path)
if d.new_file:
added_files.add(abspath)
elif d.a_blob and d.b_blob and d.a_path != d.b_path:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it help if the part d.a_blob and d.b_blob here and below was refactored in one test followed by two sub-tests: one on path and one blob?

vulnerabilities/tests/test_data_source.py Show resolved Hide resolved
repo.index.commit("Added a new file")

yield str(git_dir)
shutil.rmtree(git_dir)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about putting the shutil.rmtree( in a try/finally block?

@tardyp tardyp force-pushed the gitpython branch 2 times, most recently from 3b02bc6 to 0c5b28e Compare March 29, 2021 18:38
@tardyp
Copy link
Contributor Author

tardyp commented Mar 29, 2021

@pombredanne I think I have answered to your remarks.

i fixed the remaining tests, and nix build.

While neither pygit2 or GitPython have a formidable API and documentation

GitPython has the advantage of being pure python and support
same proxy feature as git.

Signed-off-by: Pierre Tardy <[email protected]>
Comment on lines 334 to 341
def _is_binary(d: DiffIndex):
if not d.b_blob:
return False
try:
d.b_blob.data_stream.read().decode()
except UnicodeDecodeError:
return True
return False
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this function trying to do ? Is it used to check whether the diff blob consists of binary ? This would return true if the diff is about something like a tar ball ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, it checks by trying to decode it with default encoding settings.

The previous code had some logic of checking if the data was binary to not count it as a diff.

The is_binary semantic was provided by libgit2. The logic is quite complicated:
https://github.com/libgit2/libgit2/blob/508361401fbb5d87118045eaeae3356a729131aa/src/buf_text.c#L268

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the PR to use binaryornot. I am not sure why the nix CI is failing :-/

# simplistic. This does not cover file renames, copies &
# deletions.
if d.status == pygit2.GIT_DELTA_ADDED:
for d in cutoff_commit.diff(self._repo.head.commit):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This errors out with older versions of git (mine was 2.25.1). See gitpython-developers/GitPython#1016.

We could just add a note about this in the README I guess ?

@sbs2001
Copy link
Collaborator

sbs2001 commented Apr 16, 2021

@tardyp this looks good to me aside from the issue with git version . IMHO the nix test failure has nothing to do with your changes. Those are probably caused to due to some sync issues. @rolfschr ping ! .

Thanks for this, and sorry for being late.

@sbs2001 sbs2001 mentioned this pull request Apr 19, 2021
@@ -74,7 +74,7 @@
name = "vulnerablecode-${version}";
src = vulnerablecode-src;
dontConfigure = true; # do not use ./configure
propagatedBuildInputs = [ pythonEnv postgresql ];
propagatedBuildInputs = [ pythonEnv postgresql gitMinimal];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rolfschr is this fine on your side ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look good to me. @tardyp I gues you could remove the ssl cert file fix for pygit2 in the nix test if you remove pygit2 anyways.

diff --git a/etc/nix/flake.nix b/etc/nix/flake.nix
index 1a65045..84ed672 100644
--- a/etc/nix/flake.nix
+++ b/etc/nix/flake.nix
@@ -119,10 +119,6 @@
 
             buildInputs = [ wget vulnerablecode ];
 
-            # Used by pygit2.
-            # See https://github.com/NixOS/nixpkgs/pull/72544#issuecomment-582674047.
-            SSL_CERT_FILE = "${pkgs.cacert}/etc/ssl/certs/ca-bundle.crt";
-
             unpackPhase = "true";
 
             buildPhase = ''

@sbs2001 sbs2001 self-requested a review April 22, 2021 11:41
Copy link
Collaborator

@sbs2001 sbs2001 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merging this, I'll take care of nix tests and docs in another PR.
Thanks @tardyp :)

@sbs2001 sbs2001 merged commit 13eaa01 into aboutcode-org:main Apr 22, 2021
@tardyp
Copy link
Contributor Author

tardyp commented Apr 22, 2021

Sorry I was in vacation.. thanks for merging.

@tardyp tardyp deleted the gitpython branch April 22, 2021 12:36
@pombredanne
Copy link
Member

Sorry I was in vacation.. thanks for merging.

Sure thing. Thanks for the contrib!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

run vulnerablecode behind proxy
4 participants