Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[apiserver]: fix jgscm root folder creation, dependencies #1

Merged
merged 7 commits into from
Apr 8, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 6 additions & 3 deletions jgscm/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -460,7 +460,8 @@ def run_post_save_hook(self, model, os_path):
model=model,
contents_manager=self)
except Exception:
self.log.error("Post-save hook failed on %s", os_path, exc_info=True)
self.log.error("Post-save hook failed on %s",
os_path, exc_info=True)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary format change.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a auto-format line-length change, 79 character length, formatted by autopep8. Should I force it to save without formatting?
Screen Shot 2019-04-08 at 1 30 37 PM

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that's fine.


@default("checkpoints_class")
def _checkpoints_class_default(self):
Expand Down Expand Up @@ -522,13 +523,15 @@ def _get_bucket(self, name, throw=False):
cache[name] = bucket
return bucket

@staticmethod
def _parse_path(path):
def _parse_path(self, path):
"""
Splits the path into bucket name and path inside the bucket.
:param path: string to split.
:return: tuple(bucket name, bucket path).
"""
if self.default_path and not path.startswith(f"{self.default_path}/"):
path = f"{self.default_path}/{path}"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I think I'm starting to see: path is not a URL but it may or may not include the bucket.

Hmm, this doesn't do what the old code does. If default_path doesn't have a leading slash, you probably need to add it in the call to startswith. If path has a leading slash, then this code doesn't behave the same as the previous one. You probably want something like os.path.join(f'/{default_path}', path).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default_path shouldn't have a leading slash https://github.com/src-d/jgscm#usage

  • c.GoogleStorageContentManager.default_path = 'path/without/starting/slash'

The _parse_path needs to return the bucket_name (no slash), and the rest of the path, with the slash separating bucket_name and the rest of the path stripped. The call right below my change is:

 bucket, _, blobname = path.partition("/")
 return bucket, blobname

which splits on the first slash, and strips that first slash (because the middle argument is dropped).

The behavior we want:

  • _parse_path("your_bucket_name_with_slash/") we should get ("your_bucket_name", "")
  • _parse_path("some-folder") we should get ("your_bucket_name", "some_folder")
    • This would previously fail, you would get ("some_folder", ""), which would result in the failed get operation.
  • _parse_path("some-folder/") we should get ("your_bucket_name", "some_folder/")
  • _parse_path("your_bucket_name/some-folder") we should get ("your_bucket_name", "some_folder")
  • _parse_path("your_bucket_name/some-folder/some_file") we should get ("your_bucket_name", "some-folder/some_file")

I've attached some examples of the call stack:

Screenshot 2019-04-08 13 54 11

Screenshot 2019-04-08 13 54 32

Screenshot 2019-04-08 13 54 52

Screenshot 2019-04-08 13 55 14

Copy link
Author

@akotlar akotlar Apr 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only behavior that this changes is that when you create a bucket in the root dir, and you have default_path (bucket name) set, it will use that bucket, rather than treat the directory being created as a new bucket. That's why I had treated this as a breaking change.

And I realized it actually fixes another issue. In the old jgscm, when you created 2 python files in the root dir, the 2nd would overwrite the first. This fixes that as well. See the below before_change.log call stack, notice it overwrites Untitled.ipynb.

before_change.log
after_change.log

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enumerating the cases makes it clear this design has a real ambiguity when there is a folder with the same name as a bucket, but let's deal with that after this goes in.

Your code still breaks on the case there is a folder that starts with (but is not equal to) the bucket name, my recommendation of startswith is bad.

when you create a bucket in the root dir

You mean folder?

That's why I had treated this as a breaking change.

This really seems like a bug (creating things outside the default path), which is why I'd call it a patch release.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enumerating the cases makes it clear this design has a real ambiguity when there is a folder with the same name as a bucket, but let's deal with that after this goes in.

Yes, I agree, I don't like the lack of separation between folder and bucket taken in jgscm. I think the more stable solution is to always prefix the filename with the bucket name, unless in the root folder, in which case the behavior will be different depending on the flag passed (and that maybe shouldn't be default_path; someone may want to be able to create buckets even when a default bucket is specified).

Your code still breaks on the case there is a folder that starts with (but is not equal to) the bucket name, my recommendation of startswith is bad.

I will attempt to address this now.

You mean folder?
Yes

Copy link
Author

@akotlar akotlar Apr 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it's addressed by checking for

path.startswith(f"{self.default_path}/")

This problem only affects renaming and moving, since creation uses a default name (untitled-folder). Of the two, this fixes renaming, but moving still acts strangely with folder names that are the same as bucket names. However, jgscm moving logic is in general funny, because it completely ignores the semantic differences between folders and files (since buckets don't really distinguish these, and there's a lot of / stripping happening, in the jupyter class it extends).

Here is an example of this working renaming.mp4.zip

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose that we defer the moving fix for another PR, so that Konrad/Liam can test other functions. However, can work further on this if you think worth it. Whatever you prefer :)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd open an issue for the file/folder-as-bucket-name problem and make it low priority. Getting IO (hadoop stuff) and user isolation going seem much more important.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


bucket, _, blobname = path.partition("/")
return bucket, blobname

Expand Down
8 changes: 5 additions & 3 deletions setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,16 +4,18 @@
setup(
name="jgscm",
description="Jupyter Google Cloud Storage ContentsManager",
version="0.1.9",
version="0.1.10-hail",
license="MIT",
author="Vadim Markovtsev",
author_email="[email protected]",
url="https://github.com/src-d/jgscm",
download_url="https://github.com/src-d/jgscm",
packages=["jgscm"],
keywords=["jupyter", "ipython", "gcloud", "gcs"],
install_requires=["google-cloud>=0.32.0", "notebook>=4.2", "nbformat>=4.1",
"tornado>=4", "traitlets>=4.2"],
install_requires=["google-api-python-client>=1.7",
"google-cloud-storage>=1.14",
"notebook>=5.7", "nbformat>=4.4",
"tornado>=6.0", "traitlets>=4.3"],
package_data={"": ["requirements.txt", "LICENSE", "README.md"]},
classifiers=[
"Development Status :: 3 - Alpha",
Expand Down