-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add API support for downloading git based urls #5
Conversation
@pombredanne please have a look on it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks and sorry for the time it took! to review this!
fetchcode/giturl.py
Outdated
if dest: | ||
dest_dir = os.path.join(dest, branch_name) | ||
else: | ||
dest_dir = os.path.join(os.environ.get('CHARM_DIR'), "fetched", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whats is CHARM_DIR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you copied this code from https://github.com/juju/charm-helpers/blob/master/charmhelpers/fetch/giturl.py or similar we absolutely need to track its origin and license AND keep all original notice.
We never borrow or copy code without tracking it and documenting where it comes from.
Now, why not reusing charm-helpers as a library directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think instead of using their module I should code mine with some little tweaks and how can I track its origin, please help me in that
fetchcode/giturl.py
Outdated
return True | ||
|
||
def clone(source, dest, branch='master', depth=None): | ||
if not canHandle(source): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be a small docstring would help? after all you are not only cloning but also pulling
I got your point @pombredanne will do the changes in 1-2 days |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you and sorry for taking so long to review this!
See my comments inline. You also really want to start adding some tests too.
@pombredanne changes are done, please check them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my feedback.
Additionally, you need many more test URLs to make sure that your code will work in all cases. We need tests for Null URLs, ''
URLs, git://
structured URLs, svn://
structured URLs, ftp://
structured URLs and many others.
fetchcode/giturl.py
Outdated
Returns destination directory | ||
""" | ||
url_parts = urlparse(source) | ||
repo_name = url_parts.path.strip('/').split('/')[-1].split('.')[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very ugly and I do not know what it does. You should use urllib.parse to parse URLs instead of homemade string manipulation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am already using urllib.parse, but it only gives path, Then I have to parse that path into something meaningful. I agree it looks ugly, that's why I am now using multiple lines and explained what steps I am using. I hope that will work :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, maybe you could use https://docs.python.org/3/library/pathlib.html to handle the various pieces of the path?
The point here being, parsing of paths has been done before by others. I would much rather use a tested library instead of splitting path strings, when possible.
Hi @MaJuRG, can you give me some sample URLs that I have to handle for test cases using git |
@TG1999 Here is a list of possible git urls combos: https://stackoverflow.com/questions/31801271/what-are-the-supported-git-url-formats You can craft samples from these skeletons. |
Hey @MaJuRG , Thanks for your guidance I think now this PR is good to go
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some formatting comments.
A quick note: Our convention is to indent with spaces, not tabs. Our default is 4 spaces per line of indentation.
On a side note, have you ran the entire test suite for fetchcode? Since we have no CI at the moment, I will have to check later and see if everything has passed tests.
tests/test_giturl.py
Outdated
""" | ||
Testing https based URLs | ||
""" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this line break
tests/test_giturl.py
Outdated
""" | ||
Testing git based URLs | ||
""" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this line break
tests/test_giturl.py
Outdated
""" | ||
Testing git+ssh based URLs | ||
""" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this line break
tests/test_giturl.py
Outdated
""" | ||
Testing git+https based URLs | ||
""" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this line break
tests/test_giturl.py
Outdated
""" | ||
Testing ssh based URLs | ||
""" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this line break
fetchcode/giturl.py
Outdated
""" | ||
url_parts = urlparse(source) | ||
if url_parts.scheme in ('https', 'git', 'git+ssh', 'ssh', 'git+https'): | ||
return True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is out of line with the rest of the file.
You need to indent in increments of 4 spaces per line of indentation. This line should be indented 8 spaces in this case.
fetchcode/giturl.py
Outdated
if os.path.exists(dest): | ||
print('Can not clone since repository already exists') | ||
else: | ||
os.mkdir(dest) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is out of line with the rest of the file.
You need to indent in increments of 4 spaces per line of indentation. This line should be indented 8 spaces in this case.
fetchcode/giturl.py
Outdated
os.mkdir(dest) | ||
cmd = ['git', 'clone', source, dest, '--branch', branch] | ||
if depth: | ||
cmd.extend(['--depth', depth]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is out of line with the rest of the file.
You need to indent in increments of 4 spaces per line of indentation. This line should be indented 8 spaces in this case.
fetchcode/giturl.py
Outdated
repo_name = Path(url_parts.path).stem | ||
|
||
if dest: | ||
dest_dir = os.path.join(dest, repo_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is out of line with the rest of the file.
You need to indent in increments of 4 spaces per line of indentation. This line should be indented 8 spaces in this case.
fetchcode/giturl.py
Outdated
if dest: | ||
dest_dir = os.path.join(dest, repo_name) | ||
else: | ||
dest_dir = os.path.join('./', repo_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is out of line with the rest of the file.
You need to indent in increments of 4 spaces per line of indentation. This line should be indented 8 spaces in this case.
Yes I have ran the test suite for it :), I will edit the indentation as per your comments. |
@MaJuRG Done 👍 |
@MaJuRG @pombredanne please look into this PR :D |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TG1999 Thank you for the updates. See the few nitpickings form my review before I can merge!
import os | ||
|
||
from urllib.parse import urlparse | ||
from pathlib import Path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you sort your imports here and everywhere?
os, pahlib, urllib
print('Can not clone since repository already exists') | ||
else: | ||
os.mkdir(dest) | ||
cmd = ['git', 'clone', source, dest, '--branch', branch] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is OK for now, but we would want to have something much more robust based on pip code in the future.
fetchcode/giturl.py
Outdated
|
||
clone(source, dest_dir, branch, depth) | ||
|
||
return dest_dir |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a trailing LF.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, what is LF :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the lack of clarity. I meant a line feed, e.g. a line return.
The convention is to always have one at the end of text files.
tests/test_giturl.py
Outdated
@mock.patch('os.system') | ||
def test_git_https(mock_os): | ||
""" | ||
Testing ssh based URLs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test_git_https makes me believe that this is a test for git and https
But the docstring says "Testing ssh based URLs"
It would be simpler and cleaner to:
- use verbose and descriptive test function names
- do not use docstring for tests
So your function could be instead:
test_fetch_with_ssh_git_url_returns_a_response(mock_os)
tests/test_giturl.py
Outdated
""" | ||
url = 'ssh://[email protected]:EastCloud/node-websockets.git' | ||
with mock.patch('os.mkdir') as mocked_file: | ||
response = giturl.fetch(source=url,branch='master',depth=None,dest='/home/tg') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make sure that your spacing is correct.
This should be:
(source=url, branch='master', depth=None, dest='/home/tg')
@pombredanne @MaJuRG good to go ? |
tests/test_giturl.py
Outdated
url = 'https://github.com/TG1999/fetchcode.git' | ||
with mock.patch('os.mkdir') as mocked_file: | ||
response = giturl.fetch(source=url, branch='master', depth=None, dest='/home/tg') | ||
assert response is not None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add to your tests? I would like to see what this response
object contains for each of these url schemes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add what @MaJuRG I can not understand 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, you only test if response
is not None. This is not very descriptive as to what fetching a git URL gets you. What are the attributes to this response
object? What are the values for these various attributes for each of these test urls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay I am getting you, doing the changes :)
@MaJuRG I have added tests for schemes and domain of URL please check them :) and if any case is not handled please tell me I will try to resolve those cases too 😄 |
tests/test_giturl.py
Outdated
with mock.patch('os.mkdir') as mocked_file: | ||
response = giturl.fetch(source=url, branch='master', depth=None, dest='/home/tg') | ||
assert response.url_scheme == 'ssh' | ||
assert response.domain == '[email protected]:EastCloud' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the domain be github.com
as well for this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I am getting you, can you suggest me some way in which I can parse the domain for ssh based URLs currently I am using urllib.parse which is giving me this kind of domain name, what approach should I use for it or should I handle ssh case differently and do some string manipulation on my own
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TG1999 Well, the first step would be to look at the urllib docs to see exactly how things like this address get parsed. If there is nothing there, then we probably want to handle this as a special case. I havent looked in depth at the docs myself, so I do not know the exact way to approach this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @MaJuRG , I have handled it separately now all test cases are working fine too, good to go?
@pombredanne only your approval is left, please approve it 😅 |
Signed-off-by: TG1999 <[email protected]>
Partially solves issue #1
Signed-off-by: unknown <tushar.goel.dav.gmail.com>