This repository has been archived by the owner on Nov 5, 2019. It is now read-only.

Add multiprocess file storage. #504

Merged: 9 commits, Jul 12, 2016

Conversation

@theacodes (Contributor) commented Apr 22, 2016

Towards #470

This is not ready for merge; it is an initial sketch of what this would look like. Please note:

  1. Instead of outright replacing multistore_file, I am introducing a new module to cover the same functionality named multiprocess_file_storage. We can do a release to deprecate multistore_file in favor of multiprocess_file_storage and then another release to remove it.
  2. The interface has changed. Instead of a bunch of get_storage_{} functions, there is now just MultiprocessFileStorage(filename, key). The key argument can be used to emulate how the previous helper methods worked, e.g. key = '{}-{}-{}'.format(client_id, user_agent, scope). (See the usage sketch after this list.)
  3. Tests are not done yet; I want to sanity-check this first.
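
As a rough illustration of item 2, here is a minimal usage sketch. The module path (oauth2client.contrib.multiprocess_file_storage) and the placeholder values are assumptions for the example, not something this PR pins down.

from oauth2client.contrib import multiprocess_file_storage

client_id = 'my-client-id'
user_agent = 'my-user-agent'
scope = 'https://www.googleapis.com/auth/devstorage.read_only'

# The key plays the role of the old per-(client id, user agent, scope)
# helpers: credentials are stored and looked up under it.
key = '{}-{}-{}'.format(client_id, user_agent, scope)
storage = multiprocess_file_storage.MultiprocessFileStorage(
    'credentials_file', key)

credentials = storage.get()  # None if nothing is cached under this key
if credentials is not None:
    credentials.set_store(storage)  # future refreshes are written back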

@thobrla, while writing this, I came across some interesting behavioral cases:

  1. The storage only holds the lock for as long as it takes to read or write the credentials file during put or get, but if at any point it fails to acquire the lock it switches to read-only mode and never tries locking again.
  2. The storage only reads the credentials file once, but writes to it constantly. This means that if multiple processes access the store and write, they are likely overwriting each other's changes despite the locking.

It seems that multistore_file never lived up to its promises, and neither will this class. As I mentioned before, a SQLite-based storage would be a true solution, but it would require the Cloud SDK and gsutil to take on a third-party binary dependency, which may be untenable for you. What are your thoughts?

_backends_lock = threading.Lock()


@util.positional(1)


@theacodes (Contributor Author)

After chatting with @thobrla in person, I have a better grasp of how this really needs to work. I'll work on updating this a bit over the weekend to solve some of the behavior problems; feedback is still welcome.

@thobrla (Contributor) commented Apr 22, 2016

Discussed a bit with @jonparrott offline.

I don't think taking a binary dependency is feasible, at least not in the near future. It might become possible if we consolidate to a single install mechanism that can install the binary dependency, but I think that's a long way off.

I don't think we need a lot of logic in multistore_file[_storage]. All we really want is an effective and portable cache to limit the number of HTTP requests we need to make to the OAuth endpoint. The logic, loosely, is:

If I need credentials and don't have valid ones cached in memory:
    Try to read from the cache (file)
    If I get valid credentials, I'm done
    Else (if anything went wrong, be it stale credentials or trouble reading the file):
        Try to write-lock the file
        Refresh the credentials over HTTP
        If I have the lock, put the credentials in the file

If we make an extra HTTP request here or there due to lock contention or timeout, that's not a big deal. The scenario we want to avoid is sending hundreds of HTTP requests in a short span because all of our processes and threads are trying to refresh credentials at once.
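
To make that concrete, here is one hypothetical Python rendering of the policy above, assuming a fasteners.InterProcessLock for the write lock and possibly-stale oauth2client-style credentials already held in memory; this is a sketch, not the PR's code.

import fasteners
import httplib2

def refresh_cached_credentials(storage, lock_path, credentials):
    # Serve valid credentials already held in memory.
    if credentials is not None and not credentials.invalid:
        return credentials

    # Otherwise, try to read fresher credentials from the file cache.
    cached = storage.get()
    if cached is not None and not cached.invalid:
        return cached

    # Stale or unreadable cache: refresh over HTTP, and only write the result
    # back if the write lock can be acquired. Losing the race just costs an
    # extra HTTP request somewhere, which is acceptable.
    lock = fasteners.InterProcessLock(lock_path)
    have_lock = lock.acquire(blocking=False)
    try:
        credentials.refresh(httplib2.Http())
        if have_lock:
            storage.put(credentials)
    finally:
        if have_lock:
            lock.release()
    return credentials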

@theacodes (Contributor Author)

@nathanielmanistaatgoogle I am tentatively putting this forward for initial review. For the first round, please focus on high-level thoughts; we can get into details once you're comfortable with the API surface.

@thobrla this has been adjusted per our conversation and now works quite differently from multistore_file. Please note, especially, the docstring:

This module provides the MultiprocessFileStorage class that:

  • Is tied to a single credential via a user-specified key. This key can be used
    to distinguish between multiple users, client ids, and/or scopes.
  • Can be safely accessed and refreshed across threads and processes.

Process & thread safety guarantees the following behavior:

  • If one process refreshes a credential, subsequent refreshes from other
    processes will re-fetch the credentials from the file instead of performing
    an HTTP request.
  • If two processes attempt to refresh concurrently, only one process will be
    able to acquire the lock and refresh, with the deadlock caveat below.
  • The interprocess lock will not deadlock; instead, if a process cannot
    acquire the interprocess lock within INTERPROCESS_LOCK_DEADLINE it will
    allow refreshing the credential but will not write the updated credential to
    disk. This logic happens during every lock cycle: if the credentials are
    refreshed again, it will retry locking and writing as normal.
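
For illustration, a consuming process might wire this up as follows so the behavior above applies (standard oauth2client Storage/Credentials calls; the module path is an assumption, and this is not code from the PR):

import httplib2
from oauth2client.contrib import multiprocess_file_storage

storage = multiprocess_file_storage.MultiprocessFileStorage(
    'credentials_file', 'some-key')

credentials = storage.get()
if credentials is not None:
    credentials.set_store(storage)
    # A refresh hits the OAuth endpoint and, via the attached store, is written
    # back to the credentials file; other processes then re-fetch it from the
    # file rather than making their own HTTP requests. If the interprocess lock
    # cannot be acquired within INTERPROCESS_LOCK_DEADLINE, the refresh still
    # succeeds in memory but is not persisted this cycle.
    credentials.refresh(httplib2.Http())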

@thobrla (Contributor) commented May 21, 2016

Approach seems reasonable to me at a high level; I'll save low-level comments until after @nathanielmanistaatgoogle's review.

I'd like to test this out with gsutil, but probably will not have an opportunity to do so for a couple of weeks, as I'm working on adding compatibility across oauth2client 1.5.2 and 2.x+.

# See the License for the specific language governing permissions and
# limitations under the License.

"""File-based storage that supports multiple credentials and cross-process


@thobrla (Contributor) commented Jul 11, 2016

FYI, testing this out is currently on the gsutil backlog; I want to take it for a spin, but it's nontrivial and there are many work items in front of it.

"""
credential = self._backend.locked_get(self._key)

if credential:


@nathanielmanistaatgoogle (Contributor)

I'm wary of a catch-all exception handler for all the usual reasons to be wary of catch-all exception handlers: I don't think we should be writing code that catches WholeyFishTheWholeWorldIsOnFireException. Do we feel like we have enough experience with fasteners to understand what exceptions it actually raises, despite its being specified not to raise any?

Exceptions coming out of the underlying fasteners object can indicate one of two things: either a programming mistake, in which case we fix our code rather than put in a handler, or an operational problem, in which case we put in a handler and file a defect report upstream that their code is raising exceptions that it is not specified to raise. I worry that putting in a catch-all exception handler would be an attempt to preemptively handle some third category of raised exception that doesn't actually exist.

@theacodes (Contributor Author)

We can likely get away with catching what multistore_file catches today: IOErrors and OSErrors.

@thobrla (Contributor) commented Jul 12, 2016

The problem is that the intended use of credential storage is to act as a cache. Getting an exception of any kind when accessing storage effectively amounts to a cache miss. Should callers be required to handle that? Do we want to find out in our application much later on that fasteners could throw some exception that we didn't anticipate in rare cases? This will likely crash the application; if we don't know here which exceptions to handle, then how could the application know?

In this case, I'd much prefer to log the exception that led to the cache miss and fall back to refreshing the credentials. If that also fails, then we can surface an exception to the caller.

@theacodes (Contributor Author)

At the locking level:

It seems like fasteners broadly catches IOError, which should put it at parity with what we do. locked_file also seems to catch OSError in some cases, but I'm unsure how useful that is. The locked_file implementation conflates the lock file with the storage file, which is something we've entirely avoided with this new implementation.

At the storage level:

We catch a subset of IOErrors and OSErrors here. The only situation that makes sense for us to think about is EACCES. However, that is a pretty severe error, and I think we should let it bubble up.
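
As a sketch of what "catch a subset, let EACCES bubble up" could look like (illustrative only, not the PR's actual code):

import errno
import logging

logger = logging.getLogger(__name__)

def read_cache_file(filename):
    try:
        with open(filename, 'rb') as f:
            return f.read()
    except (IOError, OSError) as exc:
        if exc.errno == errno.EACCES:
            raise  # severe enough to surface to the caller
        logger.warning('Treating unreadable credential cache as empty: %s', exc)
        return None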

Thoughts @nathanielmanistaatgoogle @thobrla?

We can always release this, give @thobrla time to test it thoroughly in gsutil, and add more handling later.

@nathanielmanistaatgoogle (Contributor)

I dispute that "[g]etting an exception of any kind when accessing storage effectively amounts to a cache miss" - an exception could mean anything, from interpreter shutdown to local filesystem corruption to a programming defect in the called system to a programming defect in our own system to global thermonuclear conflict. An exception that comes from fasteners, and is not on the documented list of this-is-what-fasteners-raises-and-the-circumstances-in-which-fasteners-raises-it, is necessarily of process-wide interest. If our code were application code fully in command of everything that happens in the process, I'd be open to a catch-all or catch-most. But we're just a library used by the application. We should only catch the exceptions that the other library (fasteners) that we are using is documented to raise and we should only behave when catching those exceptions in ways commensurate with our use of the other library. Declining to bother the application we serve about what happened may seem like a safe assumption to make, but I don't think it's an assumption we should be making at all, regardless of its safety.

Now that I look at acquire and release, they're not documented to raise anything. Do we have any operational experience that they raise anything? If not, let's just move forward without any catching until we gain some experience that the library we're using behaves in some way other than as advertised?

if os.path.exists(filename):
    return False

# Equivalent to "touch".


@theacodes (Contributor Author)

Do we have any operational experience that they raise anything?

Their source (and tests) indicate that they catch all relevant exceptions.

If not, let's just move forward without any catching until we gain some experience that the library we're using behaves in some way other than as advertised?

The only contrary experience we have, in locked_file, is the occasional OSError such as access denied or out of disk space; I'm not sure how we should treat those.

I'm +1 on getting this in and seeing what happens in the real world and I'm happy to work with you @thobrla to get this integrated into gsutil and tested.

@nathanielmanistaatgoogle (Contributor)

Looks nearly good to me.

@thobrla (Contributor) commented Jul 12, 2016

The point about the types of exceptions thrown is salient. However, what's the exception model for a consuming application in this case? The problem with the "wait and see" approach is that every real world scenario we see translates to a reliability failure from the user's perspective; a failure that generally could have been avoided via refreshing the credential. This is exactly what we saw with locked_file.

I guess the real question is: does the application have control over attempting to refresh the credential as a fallback mechanism? Do they need to wrap every HTTP request with catch-all logic if they want this behavior?

@theacodes (Contributor Author)

The problem with the "wait and see" approach is that every real world scenario we see translates to a reliability failure from the user's perspective

The only class of errors not handled in this new code seems to be OSErrors related to access denied and no disk space. I can handle those if you want, but they seem serious enough to warrant raising.

does the application have control over attempting to refresh the credential as a fallback mechanism?

Not directly, but you can get that behavior with your own storage class that wraps this one (see the sketch below).
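
A rough sketch of such a wrapper, assuming only the standard oauth2client.client.Storage interface (names and the exception set are illustrative, not part of this PR):

import logging

from oauth2client import client

logger = logging.getLogger(__name__)


class FallbackStorage(client.Storage):
    """Wraps another Storage and logs its I/O errors as cache misses."""

    def __init__(self, wrapped):
        super(FallbackStorage, self).__init__()
        self._wrapped = wrapped

    def locked_get(self):
        try:
            return self._wrapped.get()
        except (IOError, OSError) as exc:
            logger.warning('Credential cache read failed: %s', exc)
            return None  # treated as a cache miss; the caller refreshes over HTTP

    def locked_put(self, credentials):
        try:
            self._wrapped.put(credentials)
        except (IOError, OSError) as exc:
            logger.warning('Credential cache write failed: %s', exc)

    def locked_delete(self):
        try:
            self._wrapped.delete()
        except (IOError, OSError) as exc:
            logger.warning('Credential cache delete failed: %s', exc)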

@nathanielmanistaatgoogle (Contributor)

Ready (to me, at least) to merge; make sure to follow the rules with commits that get formed on merge.

@theacodes (Contributor Author)

@nathanielmanistaatgoogle will do so when I merge it.

Do you think it's worthwhile for you to add that into CONTRIBUTING? Or is it unnecessary since you, Danny, and I are in full control of the commit message for merges?

@thobrla (Contributor) commented Jul 12, 2016

I'm fine with wrapping it, so generally looks good.

@nathanielmanistaatgoogle (Contributor)

We should add that link to CONTRIBUTING. It's been good enough advice over the year or so that I've been citing it.

@theacodes (Contributor Author)

@thobrla I'm happy to help you write whatever glue code is needed, and upstream whatever seems necessary. :)

@theacodes (Contributor Author)

@nathanielmanistaatgoogle will merge after Travis is happy. There will be a follow-up PR to make multistore_file warn about deprecation, cool?

@nathanielmanistaatgoogle (Contributor)

Totes cool.

@theacodes (Contributor Author)

totes

@theacodes (Contributor Author)

@thobrla tentative plan:

oauth2client 3.0.0 will warn about usage of multistore_file; you can silence the warning if needed (sketch below).
oauth2client 4.0.0 will remove multistore_file and the maintainers will rejoice.
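
If silencing does become necessary, something along these lines would do it (the module path and warning category are assumptions about how the deprecation will be implemented):

import warnings

warnings.filterwarnings(
    'ignore',
    category=DeprecationWarning,
    module='oauth2client.contrib.multistore_file')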

@thobrla (Contributor) commented Jul 12, 2016

Sounds good. Thanks for putting this together, I'm excited to try it out (when I can find time).
