-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Register SSHFileSystem with fsspec. #34
Conversation
In the current tests, it seems that testing |
Added daily and workflow_dispatch to our tests jobs, let's see if that succeeds. |
Looks like your change is triggering the error https://github.com/fsspec/sshfs/actions/runs/5535143314 |
Good to know! I'll investigate and see if I can figure out where things are going wrong. |
I have found the issue. The default The failing line in the test was: fs.touch(remote_dir + "/dir/c/a/") You cannot touch a directory. Before, this path was changed to I've updated the code to remove trailing slashes, and the test succeeds again (at least locally). |
I have also added an additional test for the registration with |
Finally, I have updated the README to describe the new functionality. Happy for any feedback! |
Nice to see that the tests passed, but pre-commit failed because of wrongly sorted imports. I ran isort and it should be fixed now. |
Hi @efiop, as soon as you have some time, could you trigger the tests again, please? I probably don't have the permissions, as I can't see a way to trigger them myself. |
Oops, sorry for that. Let me see if I can disable that safeguard for you... |
@NotSpecial I think the actions permissions are fixed now, you won't be constrained by it anymore. |
tests/test_sshfs.py
Outdated
# Additional options like client_keys need to be passed to open. | ||
url = f"ssh://{user}@{ssh_server.host}:{ssh_server.port}/{path}" | ||
with fsspec.open(url, "w", client_keys=[USERS[user]]) as file: | ||
file.write("Anything") | ||
|
||
with fs.open(path, "r") as file: | ||
assert file.read() == "Anything" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we be sure that this is sshfs and not sftpfs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I'll investigate if there is any way to assert that the correct file system is created.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found a way. One can access the underlying filesystem used by fsspec via file.buffer.fs
. I added asserts for both the class and parameters.
Codecov Report
❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more. Additional details and impacted files@@ Coverage Diff @@
## main #34 +/- ##
=======================================
Coverage ? 93.99%
=======================================
Files ? 13
Lines ? 699
Branches ? 0
=======================================
Hits ? 657
Misses ? 42
Partials ? 0 ☔ View full report in Codecov by Sentry. |
|
||
[options.entry_points] | ||
fsspec.specs = | ||
ssh = sshfs.spec:SSHFileSystem |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means that sshfs replaces sftpfs if installed, right? Just making sure I understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, exactly.
tests/test_sshfs.py
Outdated
def test_fsspec_registration(fs, ssh_server, remote_dir, user="user"): | ||
"""Check that we can use sshfs with fsspec.open and an ssh url.""" | ||
path = remote_dir + "/a.txt" | ||
|
||
# Additional options like client_keys need to be passed to open. | ||
url = f"ssh://{user}@{ssh_server.host}:{ssh_server.port}/{path}" | ||
with fsspec.open(url, "w", client_keys=[USERS[user]]) as file: | ||
# Check the underlying file system. | ||
file_fs = file.buffer.fs | ||
assert isinstance(file_fs, SSHFileSystem) | ||
assert file_fs.storage_options == { | ||
"host": ssh_server.host, | ||
"port": ssh_server.port, | ||
"username": user, | ||
"client_keys": [USERS[user]], | ||
} | ||
# Write something. | ||
file.write("Anything") | ||
|
||
with fs.open(path, "r") as file: | ||
assert file.read() == "Anything" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose you could just do something like:
fs = fsspec.filesystem("ssh")
assert isinstance(fs, SSHFileSystem)
The rest of open/read/write testing is redundant and is out of scope for testing if sshfs is plugged into fsspec correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's worth testing that the URL is parsed correctly, i.e. user/port/hostname.
But I see that the write and read are redundant, I'll remove them.
I'll also add your suggestion as a separate test, i.e. to have:
- one test for registration (just testing
filesystem("ssh")
) - one for URL parsing
The best way I have found for URL parsing is to use fsspec.open
. Maybe there is an easier way? (Similar to fsspec.filesystem
? If you know one, please let me know, because I couldn't find it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense 👍 Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I realized we cannot just do fsspec.filesystem("ssh")
, as SSHFileSystem tries to connect immediately and fails if it can't. So we need to provide host etc. to that call.
Still, I think the tests are clearer this way.
Not sure if the open
route is the best way to test URL parsing, but as mentioned, the best one I found so far.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
Happy to help! |
For the record 2023.7.0 is out with this change. |
Hi, when I tried registration via the test example: test_fsspec_registration assertion fails. Still sftp/ssh implementation on fsspec points to paramiko. Shouldn't the fsspec.filesystem('sftp',connection_args) points to 'sshfs.spec.SSHFilesystem' like in this PR ? |
Are you using the latest version, i.e. does Alternatively, if you have cloned the repository locally, have you installed it using The registration happens during installation, maybe something went wrong there? |
For first option yes returns correct version. Did not try cloning repo though. Tried installation couple of times, just not register. |
What does the following code return for you?
I'm getting |
Re-reading your messages, are you tring to use an Currently sshfs is only registered for |
Ahh ok then. Yes I'm trying sftp. I've misunderstood PR. Thank you. |
@efiop I guess we could also register sshfs for |
Hi @NotSpecial , are you going to create PR for sftp spec registration? If you are occupied, I can do it too. |
I didn't get the change to look at this. Thanks for taking care of it! |
Related to #25.
Currently, it is not possible to register the SSHFileSystem with fsspec.
However, it seems to me that the only thing that is missing is appropriate parsing of the protocol string.
I compared the current SSHFileSystem with the AbstractFileSystem base class and discovered that only two methods are missing:
_strip_protocol
, which removes all protocol info from a url, and_get_kwargs_from_urls
, which translates a URL to the appropriate arguments to instantiate the class.I leveraged the fact that sshfs is already able to parse ssh URLs, and copied the required functions from the existing STFP filesystem.
Finally, I updated
setup.cfg
such that the SSHFileSystem is automatically registered for thessh
protocol when installing this library.Testing locally with my setup, this works like a charm. However, I would like to contribute towards testing this more thoroughly. I would appreciate any pointers on how to achieve this most efficiently. Thanks!