Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Osf remote #4860

Closed
wants to merge 34 commits into from
Closed

[WIP] Osf remote #4860

wants to merge 34 commits into from

Conversation

aerubanov
Copy link
Contributor

close #4176

I have implemented the OSFTree class and I open this pull request to get feedback on these changes. Now I will write the tests and make changes to the documentation.

Note on method _download in dvc/tree/osf.py. The currently used osfclient library does not have a public attribute of the File class to get the file size. Although the API itself has this feature. I opend the pull request with the necessary changes osfclient/osfclient#185. So it should be possible to add tqdm progress bar for _downlod method after next osfclient release.

@efiop efiop changed the title Osf remote [WIP] Osf remote Nov 8, 2020
@efiop
Copy link
Contributor

efiop commented Nov 8, 2020

Hi @aerubanov ! Great stuff! 🎉 Please check fix styling as noted in the failing lint job. Be sure to install our pre-commit hooks in your dev environment, so it checks everything for you when you do any commit.

Also, added [WIP] prefix to this PR title, so people know that this is not ready for review.

@efiop
Copy link
Contributor

efiop commented Nov 9, 2020

Btw, regarding

Note on method _download in dvc/tree/osf.py. The currently used osfclient library does not have a public attribute of the File class to get the file size. Although the API itself has this feature. I opend the pull request with the necessary changes osfclient/osfclient#185. So it should be possible to add tqdm progress bar for _downlod method after next osfclient release.

could we for now just use a hack for it? Or is it not possible to easily access that hidden attribute from outside?

@aerubanov
Copy link
Contributor Author

Ok, I'll try.

@efiop
Copy link
Contributor

efiop commented Nov 17, 2020

@aerubanov Please rebase on top of master. Also, in the PR header there was a reminder about docs.

Btw, what about tests for this? Are there any emulators?

@aerubanov
Copy link
Contributor Author

aerubanov commented Nov 17, 2020

@efiop Opps, I have not completed the merger process. Thank you for noticing.

About the documentation. I was distracted by my studies, but in the next couple of days I will open a pull request at dvc.org.

As for testing. OSF provides a separate url with API for testing applications (see testing section https://developer.osf.io/#tag/General-Usage). It looks like we can replace the url in osfclient and thus test our code. But I have my doubts about this being a good idea.For example, won't such tests break when running in CI/CD pipline?
But without it, we can only test the part of the code that does not include interaction with the OSF server. And there is not much such code. What do you think?

@efiop
Copy link
Contributor

efiop commented Nov 18, 2020

@aerubanov Looks like something went wrong with your merge. The changes now include other people's changes, which makes it really hard to review. Could you try:

git remote add upstream https://github.com/iterative/dvc
git fetch upstream
git rebase upstream/master

and when everything looks good:

git push -f

?

@efiop
Copy link
Contributor

efiop commented Nov 18, 2020

As for testing. OSF provides a separate url with API for testing applications (see testing section https://developer.osf.io/#tag/General-Usage). It looks like we can replace the url in osfclient and thus test our code. But I have my doubts about this being a good idea.For example, won't such tests break when running in CI/CD pipline?
But without it, we can only test the part of the code that does not include interaction with the OSF server. And there is not much such code. What do you think?

So we would have to create account for testing purposes but use the test endpoint instead of the production one? If so, that might be an option, but that will definitely slow down the testing. Maybe there are some better options (emulator, mocker or smth like that). Maybe take a look at the osf library you use, it might have some good recipes.

@efiop efiop added awaiting response we are waiting for your reply, please respond! :) feature is a feature p3-nice-to-have It should be done this or next sprint labels Nov 18, 2020
@aerubanov
Copy link
Contributor Author

@efiop, Thank you for your advice! I found in the code of osfclient mock-objects, which I suppose allow to write tests without using API queries https://github.com/osfclient/osfclient/blob/master/osfclient/tests/test_uploading.py. I will try to implement it. I think it's better suited to our task than using test endpoint.

@aerubanov
Copy link
Contributor Author

Open PR for docs iterative/dvc.org#1958. Starting to work on the tests.

@aerubanov
Copy link
Contributor Author

I wrote tests for osf_tree. @efiop, please look at my code and specify what needs to be improved.

dvc/config.py Outdated Show resolved Hide resolved
dvc/tree/osf.py Outdated Show resolved Hide resolved
setup.py Outdated
@@ -152,6 +155,7 @@ def run(self):
"filelock",
"black==19.10b0",
"wsgidav",
"osfclient==0.0.4",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this will require the user to pip install osfclient themselves? Or can it be an optional dependency when you pip install dvc[osf]?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

p.s. asking per iterative/dvc.org/pull/1958

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorgeorpinel, this is only for the tests. There's osf above already.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. My question is in general about the package though, not this specific line. Thanks

setup.py Outdated
@@ -100,12 +99,16 @@ def run(self):
hdfs = ["pyarrow>=2.0.0; python_version < '3.9'"]
webhdfs = ["hdfs==2.5.8"]
webdav = ["webdavclient3>=3.14.5"]
osf = ["osfclient==0.0.4"]
Copy link
Member

@skshetry skshetry Nov 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorgeorpinel, this is what makes dvc[osf] work. So, it's optional. The below one is for the tests.

@aerubanov
Copy link
Contributor Author

I changed osf_username to username and changed the message in OSFAuthError.

@efiop
Copy link
Contributor

efiop commented Nov 24, 2020

I changed osf_username to username and changed the message in OSFAuthError.

@aerubanov Let's do user, because other remotes have it already (e.g. ssh)

.dvcignore Outdated Show resolved Hide resolved
dvc/tree/osf.py Outdated Show resolved Hide resolved
dvc/tree/osf.py Outdated Show resolved Hide resolved
dvc/tree/osf.py Outdated
) # need for private projects
if self.password is None:
self.password = config.get("password")
logger.debug(OSFTree)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be too verbose? should we make a message that includes what we need?

dvc/tree/osf.py Outdated
try:
project = osf.project(self.project_guid)
except UnauthorizedException:
raise OSFAuthError
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we chain exceptions?

dvc/tree/osf.py Outdated Show resolved Hide resolved
dvc/tree/osf.py Outdated Show resolved Hide resolved
# gssapi should not be included in all_remotes, because it doesn't have wheels
# for linux and mac, so it will fail to compile if user doesn't have all the
# requirements, including kerberos itself. Once all the wheels are available,
# we can start shipping it by default.
ssh_gssapi = ["paramiko[invoke,gssapi]>=2.7.0"]
all_remotes = gs + s3 + azure + ssh + oss + gdrive + hdfs + webhdfs + webdav
all_remotes = (
gs + s3 + azure + ssh + oss + gdrive + hdfs + webhdfs + webdav + osf
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if at some point we'll have to do multiple bundles? DVC can become too heavy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But you can use pip install dvc[osf] to use dvc with the selected remote storage. I suppose that all_remotes is mainly needed to install the development and testing environment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can. The problem that we have brew, deb, win, dmg, and a few other ways we deliver this. And those include all the dependencies.

@efiop
Copy link
Contributor

efiop commented Mar 8, 2021

Hi @aerubanov ! Quick update: we are migrating to fsspec and that will be our plugin mechanism for the future. We won't be able to provide proper support for osf ourselves, so the best way to go about it would be for you to implement osffs using fsspec and maintain that implementation (try registering it in fsspec, other people might find it useful too). On our end, we will allow using that as a plugin as seamlessly as possible. Thank you so much for contributing! 🙏

@efiop efiop closed this Mar 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature is a feature p3-nice-to-have It should be done this or next sprint
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support Open Science Foundation?
5 participants