Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support stateful transactions by allowing to pass keyword arguments #1517

Open
nicholasjng opened this issue Jan 26, 2024 · 6 comments
Open

Comments

@nicholasjng
Copy link
Contributor

nicholasjng commented Jan 26, 2024

A file system's corresponding transaction class has been a class attribute since #1424.

However, the fsspec.transaction.Transaction interface is currently stateless. For someone trying to roll a transaction class that takes some state in the constructor, this results in incompatibilities that can break some of fsspec's most important features, e.g. like so:

from fsspec import filesystem
from fsspec.utils import get_protocol

class MyFS:
    protocol = "myfs"
    ...

    def transaction(foo: str = "hello") -> MyTransaction:
        return MyTransaction(foo)

def transact(uri, **kwargs):
    protocol = get_protocol(uri)
    fs = filesystem(protocol, **kwargs)
    with fs.transaction as tx:  # <- bang! if protocol == "myfs", tx will be MyFS's `transaction` class API.
        ...

Question is, would you support moving from the current "transactions as properties" approach to a more functional approach allowing for state passing to the respective transaction class?

@martindurant
Copy link
Member

Thanks for thinking about this. I don't immediately know how, but I feel what you want should be doable without breaking the current API, perhaps with some context or other FS instance-level state

@nicholasjng
Copy link
Contributor Author

Thanks for the tip. We ended up defining a __call__() operator on our custom transaction class (which we abuse as sort of a second __init__() method), so that with fs.transaction(...) as tx actually resolves to Transaction.__call__().__enter__(), but fs.transaction is still a property.

This is obviously a bit hacky, but without opening up e.g. a context argument slot in Transaction.__init__(), I don't see how you could go about making this cleaner. (You could of course always set the required transaction state after entering the context, but that's extra work and does not feel natural to me.)

@martindurant
Copy link
Member

I have thought of a very nice use case for this. When writing zarr stores, there is a mixture of ".z" metadata small text metadata files and larger binary files being written. So, it would be nice to specilise the transaction to only capture paths which meet some pattern. Even better would be to modify/wrap the ls/dircache behaviour so that while the transaction is live, these files can be found, and without asking the remote store.

@nicholasjng
Copy link
Contributor Author

Sounds great. Our use case is essentially a write to a git-ish branch, which needs metadata like repository, branch name, and some flags. It's a mixed bag between positionals and keyword arguments.

If you want to take a look, our implementation can be found here. We're also interested in listing files in transaction, even though we might not be able to use them, since we do not use the AbstractBufferedFile anymore and our file-like objects seem to be written eagerly on close().

Of course, I'm happy to support on any part, or specifically the ls() part if you like.

@martindurant
Copy link
Member

we might not be able to use them

Actually, transactions could use a general workup. At the moment, they only refer to files written using open(), which means that you might get transactional writing with pipe_file if it depends on open() (e.g., memory), but probably not. Instead, we should queue pipe and put calls and send them async if possible on completion.

Your transaction does some or all of this already, since you have a definite temporary place in the remote to write to.

@martindurant
Copy link
Member

You may be interested in #1531 , a different take on write transactions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants