Allow the same data to be shared by multiple Upload scalars #92

Merged
25 commits merged into master from dedup-files on Aug 15, 2018

Conversation

mike-marcacci
Collaborator

This will provide support for file deduplication using alpha releases of fs-capacitor which I'm building alongside this.

The fs-capacitor API needed to change to accomplish this, so I went ahead and added tests and better docs (borrowing a lot of boilerplate from this project, since this will probably be its primary use case).

Owner

@jaydenseric left a comment


That's great news 🎉

Does this mean the Deduped files test should no longer be todo?

Also, a changelog entry should be added under ## Next.

@mike-marcacci
Collaborator Author

That's correct! I just hadn't finished the implementation; the first commit was simply to ensure existing functionality with the new API for fs-capacitor, but then I got tired and went to bed :).

This should now support file de-dup. If you wouldn't mind just giving a quick scan of fs-capacitor (master branch) to make sure everything there looks sane, I'll publish it as 1.0.0 and switch the dependency here from beta.

@mike-marcacci
Collaborator Author

mike-marcacci commented Jul 23, 2018

OK, so the challenge here is timing our cleanup...

What we need to do is:

  1. call stream.destroy(error) on streams which have never had a consumer
  2. call capacitor.destroy() without an error, so that resources will be cleaned up once all read streams detach

We want to do this once we know that the server is done, which we can assume is true once the response is sent.
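
A minimal sketch of that ordering, assuming a release helper hooked to the response's finish event (all names here are illustrative, not the actual processRequest internals):

const scheduleCleanup = (response, uploads) => {
  const release = () => {
    for (const upload of uploads) {
      // 1. Streams that never had a consumer are destroyed with an error.
      if (upload.stream && !upload.hasConsumer)
        upload.stream.destroy(new Error('Request disconnected during upload.'))

      // 2. The capacitor is destroyed without an error, so its resources are
      //    cleaned up once all attached read streams detach.
      if (upload.capacitor) upload.capacitor.destroy()
    }
  }

  // Assume the server is done once the response has been sent.
  response.once('finish', release)
}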

Initially, I attempted to do this when the request closed, but node 8 doesn't fire this event in the same situations as node 10. I had to do all sorts of trial and error to get #81 to work the same across the different versions, and I suspect I'll have to do so here as well. I'm going to have to shelve this for another day.

@mike-marcacci
Collaborator Author

mike-marcacci commented Aug 2, 2018

OK, so this should now pass in all versions of node >=8. However, I did resort to some unattractive behavior in order to get this to work without making breaking changes to processRequest. In particular, I emit a release event on the request, which is non-standard and a kind of dirty thing to do.

Instead, I would prefer either:

  • requiring response as an argument
  • returning Promise<{ release: () => void, operation: * }>

@jaydenseric do you have any thoughts about making these changes?
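
To make those two options concrete, the call shapes might look roughly like this (hypothetical signatures only, not a proposal for the final API):

// Option 1: processRequest receives the response and watches it itself.
const handleWithResponse = async (request, response, options) => {
  const operation = await processRequest(request, response, options)
  return operation
}

// Option 2: processRequest resolves with the operation plus an explicit
// release() that the caller invokes once the response has been sent.
const handleWithRelease = async (request, response, options) => {
  const { operation, release } = await processRequest(request, options)
  // ...execute the operation and send the response here...
  release()
  return operation
}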

@mike-marcacci
Collaborator Author

In fact, I'm pretty sure we should make one of these changes to the API; right now there's a "hidden" breaking change, in that the API will continue to work, but could leak files (until the process exits)... which is the worst kind of breaking change.

@mike-marcacci
Collaborator Author

OK, so the real challenge I'm facing here is in timing (surprise, surprise). In particular, the existing behavior says that in the following scenario:

  1. Upload A completes
  2. the request disconnects while Upload B is streaming
  3. the request disconnects before Upload C has started

It should be possible to call .createReadStream() on Upload A and stream all its contents in a resolver. However, because I start cleanup once the HTTP request/response is closed, this won't be the case if the client disconnected before the resolver is run.

Is this an OK change to make? I believe I can keep the previous behavior, but the API change would be different: the middleware would have to call release when all resolvers are finished, instead of processRequest trying to infer this from the HTTP response.
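
Under that alternative, the middleware would look roughly like the following, where executeGraphQL and send are placeholders rather than real exports:

// Because release() only runs after execution, a resolver can still call
// .createReadStream() on Upload A even if the client disconnected mid-request.
const middleware = async (request, response) => {
  const { operation, release } = await processRequest(request)
  try {
    const result = await executeGraphQL(operation)
    send(response, result)
  } finally {
    // All resolvers have finished (or thrown) by this point, so cleanup is safe.
    release()
  }
}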

@jaydenseric
Owner

jaydenseric commented Aug 3, 2018

Is it possible to have it so that cleanup will not happen before the HTTP request/response is closed (.createReadStream() works), but after it is closed (.createReadStream() disabled) cleanup does not start immediately; it waits for all child streams to end first?

@mike-marcacci
Collaborator Author

Hey @jaydenseric that is the current behavior in this PR :)

@mike-marcacci
Collaborator Author

I have another hour here and I get to figure out why it's hanging on Travis... since the tests pass every time in a while loop on my Mac...

@jaydenseric
Owner

You can run Travis locally in a Docker image; I found it helpful when I had some things work locally but not in Travis.

You can temporarily undo all the async config for tests too, so they at least log progress up until the point of failure instead of silently hanging.

@mike-marcacci
Collaborator Author

OK! So I've got this working.

I kept getting throttled at this coffee shop while downloading the Travis Docker images (great tip btw!), but I already had some Linux Node images on my machine, which helped me narrow down the timing issue.

I've simplified the error handling so that all "fatal" errors use the exit(error) method as a hub instead of relying on error events to be emitted in a particular order.

I also consolidated PromiseDisconnectUploadError and StreamDisconnectUploadError into a single DisconnectUploadError so we don't have to create new error objects each time they're sent to a promise/stream. I know you have some thoughts about further simplifying/standardizing errors, so this might be a short-lived change.
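
Roughly the shape of that change, as a simplified sketch (the names mirror this discussion, but the body is illustrative only):

// A single exit(error) hub: every fatal path funnels through here, so the
// outcome no longer depends on the order in which 'error' events fire.
const createExit = ({ uploads, reject }) => {
  let exited = false
  return error => {
    if (exited) return
    exited = true
    for (const upload of uploads) {
      if (upload.reject) upload.reject(error)
      if (upload.stream) upload.stream.destroy(error)
    }
    reject(error)
  }
}

// A single shared DisconnectUploadError instance can then be handed to every
// affected promise and stream, e.g.:
//   request.once('close', () => exit(disconnectError))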

In the abort test, I do use a setTimeout to ensure the request has been interrupted before running the second test, which isn't very elegant... but much simpler than adding hooks to the test request.

This ended up being a much smaller change with our new strategy of calling .createReadStream() within the resolvers, so hopefully the diff makes sense. I'm about to head out on a short backpacking trip this afternoon, so I won't be available over the next several days, but I'm down to chat again next week if you have any questions – I felt that was really productive the other day!


if (map)
  for (const upload of map.values())
    if (upload.file) upload.file.capacitor.destroy()
Owner


I thought capacitor was only going to be used in tests?

@jaydenseric
Owner

Holy moley the tests have become complicated. There has to be a simpler, more consistent way to approach some of them 🤔

@mike-marcacci
Collaborator Author

mike-marcacci commented Aug 9, 2018

Hey! Sorry for the delay; was away from sufficient connectivity for longer than I had anticipated.

In my first implementation (before I realized I needed the .capacitor object for the tests) I stored them in a Map internally for cleanup, but once I added the property I removed the map so I'd only have it in one place.

I completely agree about the tests and definitely like your changes. We can probably just check the capacitor.closed flag instead of making sure the file no longer exists, since file cleanup is really in the domain of fs-capacitor and is pretty well tested now... but the explicitness is also kind of nice.

@mike-marcacci changed the title from "[WIP] Allow the same data to be shared by multiple Upload scalars" to "Allow the same data to be shared by multiple Upload scalars" on Aug 13, 2018
@jaydenseric merged commit 62daa35 into master on Aug 15, 2018
@jaydenseric deleted the dedup-files branch on Aug 15, 2018 at 05:29