Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Register SSHFileSystem with fsspec. #34

Merged
merged 9 commits into from
Jul 14, 2023
Merged

Register SSHFileSystem with fsspec. #34

merged 9 commits into from
Jul 14, 2023

Conversation

NotSpecial
Copy link
Contributor

@NotSpecial NotSpecial commented Jul 12, 2023

Related to #25.

Currently, it is not possible to register the SSHFileSystem with fsspec.

However, it seems to me that the only thing that is missing is appropriate parsing of the protocol string.
I compared the current SSHFileSystem with the AbstractFileSystem base class and discovered that only two methods are missing: _strip_protocol, which removes all protocol info from a url, and _get_kwargs_from_urls, which translates a URL to the appropriate arguments to instantiate the class.

I leveraged the fact that sshfs is already able to parse ssh URLs, and copied the required functions from the existing STFP filesystem.

Finally, I updated setup.cfg such that the SSHFileSystem is automatically registered for the ssh protocol when installing this library.

Testing locally with my setup, this works like a charm. However, I would like to contribute towards testing this more thoroughly. I would appreciate any pointers on how to achieve this most efficiently. Thanks!

@NotSpecial
Copy link
Contributor Author

In the current tests, it seems that testing rm fails. I do not understand if and how this might be related to the proposed changes here. Any advice?

@efiop
Copy link
Member

efiop commented Jul 12, 2023

Added daily and workflow_dispatch to our tests jobs, let's see if that succeeds.

@efiop
Copy link
Member

efiop commented Jul 12, 2023

Looks like your change is triggering the error https://github.com/fsspec/sshfs/actions/runs/5535143314

@NotSpecial
Copy link
Contributor Author

Good to know! I'll investigate and see if I can figure out where things are going wrong.

@NotSpecial
Copy link
Contributor Author

NotSpecial commented Jul 13, 2023

I have found the issue. The default _strip_protocol from the FileSystem base class would remove trailing slashes, while my updated version would not.

The failing line in the test was:

fs.touch(remote_dir + "/dir/c/a/")

You cannot touch a directory. Before, this path was changed to remote_dir/dir/c/a, and touch would succeed.
With my initial change, it would remain remote_dir/dir/c/a/ and touch would fail.

I've updated the code to remove trailing slashes, and the test succeeds again (at least locally).

@NotSpecial
Copy link
Contributor Author

I have also added an additional test for the registration with fsspec. It tests writing to a file using a ssh:// url.

@NotSpecial
Copy link
Contributor Author

Finally, I have updated the README to describe the new functionality. Happy for any feedback!

@NotSpecial
Copy link
Contributor Author

NotSpecial commented Jul 13, 2023

Nice to see that the tests passed, but pre-commit failed because of wrongly sorted imports. I ran isort and it should be fixed now.

@NotSpecial
Copy link
Contributor Author

Hi @efiop, as soon as you have some time, could you trigger the tests again, please?

I probably don't have the permissions, as I can't see a way to trigger them myself.

@efiop
Copy link
Member

efiop commented Jul 14, 2023

Oops, sorry for that. Let me see if I can disable that safeguard for you...

@efiop
Copy link
Member

efiop commented Jul 14, 2023

@NotSpecial I think the actions permissions are fixed now, you won't be constrained by it anymore.

Comment on lines 79 to 85
# Additional options like client_keys need to be passed to open.
url = f"ssh://{user}@{ssh_server.host}:{ssh_server.port}/{path}"
with fsspec.open(url, "w", client_keys=[USERS[user]]) as file:
file.write("Anything")

with fs.open(path, "r") as file:
assert file.read() == "Anything"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we be sure that this is sshfs and not sftpfs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I'll investigate if there is any way to assert that the correct file system is created.

Copy link
Contributor Author

@NotSpecial NotSpecial Jul 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found a way. One can access the underlying filesystem used by fsspec via file.buffer.fs. I added asserts for both the class and parameters.

@codecov-commenter
Copy link

codecov-commenter commented Jul 14, 2023

Codecov Report

❗ No coverage uploaded for pull request base (main@a62fd30). Click here to learn what that means.
Patch has no changes to coverable lines.

❗ Current head 41671dd differs from pull request most recent head 138e2d1. Consider uploading reports for the commit 138e2d1 to get more accurate results

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #34   +/-   ##
=======================================
  Coverage        ?   93.99%           
=======================================
  Files           ?       13           
  Lines           ?      699           
  Branches        ?        0           
=======================================
  Hits            ?      657           
  Misses          ?       42           
  Partials        ?        0           

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

Comment on lines +29 to +32

[options.entry_points]
fsspec.specs =
ssh = sshfs.spec:SSHFileSystem
Copy link
Member

@efiop efiop Jul 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means that sshfs replaces sftpfs if installed, right? Just making sure I understand.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly.

Comment on lines 75 to 95
def test_fsspec_registration(fs, ssh_server, remote_dir, user="user"):
"""Check that we can use sshfs with fsspec.open and an ssh url."""
path = remote_dir + "/a.txt"

# Additional options like client_keys need to be passed to open.
url = f"ssh://{user}@{ssh_server.host}:{ssh_server.port}/{path}"
with fsspec.open(url, "w", client_keys=[USERS[user]]) as file:
# Check the underlying file system.
file_fs = file.buffer.fs
assert isinstance(file_fs, SSHFileSystem)
assert file_fs.storage_options == {
"host": ssh_server.host,
"port": ssh_server.port,
"username": user,
"client_keys": [USERS[user]],
}
# Write something.
file.write("Anything")

with fs.open(path, "r") as file:
assert file.read() == "Anything"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose you could just do something like:

fs = fsspec.filesystem("ssh")
assert isinstance(fs, SSHFileSystem)

The rest of open/read/write testing is redundant and is out of scope for testing if sshfs is plugged into fsspec correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's worth testing that the URL is parsed correctly, i.e. user/port/hostname.

But I see that the write and read are redundant, I'll remove them.

I'll also add your suggestion as a separate test, i.e. to have:

  • one test for registration (just testing filesystem("ssh"))
  • one for URL parsing

The best way I have found for URL parsing is to use fsspec.open. Maybe there is an easier way? (Similar to fsspec.filesystem? If you know one, please let me know, because I couldn't find it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense 👍 Thank you!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I realized we cannot just do fsspec.filesystem("ssh"), as SSHFileSystem tries to connect immediately and fails if it can't. So we need to provide host etc. to that call.

Still, I think the tests are clearer this way.

Not sure if the open route is the best way to test URL parsing, but as mentioned, the best one I found so far.

Copy link
Member

@efiop efiop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@efiop efiop merged commit dca7d9b into fsspec:main Jul 14, 2023
@NotSpecial
Copy link
Contributor Author

Happy to help!

@efiop
Copy link
Member

efiop commented Jul 14, 2023

For the record 2023.7.0 is out with this change.

@uunal
Copy link
Contributor

uunal commented Aug 15, 2023

Hi, when I tried registration via the test example: test_fsspec_registration assertion fails. Still sftp/ssh implementation on fsspec points to paramiko.

Shouldn't the fsspec.filesystem('sftp',connection_args) points to 'sshfs.spec.SSHFilesystem' like in this PR ?

@NotSpecial
Copy link
Contributor Author

NotSpecial commented Aug 15, 2023

Are you using the latest version, i.e. does pip freeze | grep sshfs return 2023.7.0 (or later, in case you read this after a future release)?

Alternatively, if you have cloned the repository locally, have you installed it using pip install .?

The registration happens during installation, maybe something went wrong there?

@uunal
Copy link
Contributor

uunal commented Aug 15, 2023

For first option yes returns correct version. Did not try cloning repo though.

Tried installation couple of times, just not register.

@NotSpecial
Copy link
Contributor Author

What does the following code return for you?

from fsspec.registry import known_implementations

known_implementations['ssh']['class']

I'm getting sshfs.spec.SSHFileSystem, what do you see?

@NotSpecial
Copy link
Contributor Author

Re-reading your messages, are you tring to use an ssh URL or an sftp URL?

Currently sshfs is only registered for ssh://, not for sftp:// (which still uses paramiko).

@uunal
Copy link
Contributor

uunal commented Aug 15, 2023

Re-reading your messages, are you tring to use an ssh URL or an sftp URL?

Currently sshfs is only registered for ssh://, not for sftp:// (which still uses paramiko).

Ahh ok then. Yes I'm trying sftp. I've misunderstood PR.

Thank you.

@NotSpecial
Copy link
Contributor Author

NotSpecial commented Aug 15, 2023

@efiop I guess we could also register sshfs for sftp:// urls, couldn't we? If I understand correctly, nothing would change internally. I'm happy to add this (and a quick test that it works); but please let me know if you have any concerns.

@uunal
Copy link
Contributor

uunal commented Sep 5, 2023

Hi @NotSpecial , are you going to create PR for sftp spec registration? If you are occupied, I can do it too.
Thanks

@NotSpecial
Copy link
Contributor Author

I didn't get the change to look at this. Thanks for taking care of it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants