ia upload --checksum doesn't always skip existing files #289

Open
JustAnotherArchivist opened this issue Jan 22, 2019 · 5 comments

@JustAnotherArchivist
Contributor

Situation: I'm uploading a large dataset to IA (cf. #288). As of right now, 130 of the 159 total files are uploaded. The local files should not have been modified since I started the upload.

Running ia upload $identifier * --checksum --no-derive (regarding the last flag, see #288) starts uploading the first file again, which is already on IA, instead of resuming at the 131st file. On a previous resume, when 62 files had already been uploaded, the existing files were skipped correctly.

Obviously, I can work around this by explicitly listing the missing files, but --checksum clearly isn't doing what it's supposed to do...

@jjjake
Owner

jjjake commented Jan 22, 2019

The checksum option only works when there are no queued or running tasks. I'm guessing you had queued or running tasks when you used it?

If so, it looks like this needs to be documented better (I thought it was).
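
For readers following along, here is a minimal sketch of the kind of comparison `--checksum` implies: hash the local file and skip the upload when the item already lists a file with the same name and MD5. This is an illustration only, not the actual `internetarchive` code. `local_md5`, `should_skip`, and the `remote_md5s` mapping are hypothetical names, and the idea that files from queued or running tasks are missing from the item's listing is an assumption inferred from the comment above.

```python
import hashlib


def local_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a local file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def should_skip(path, remote_name, remote_md5s):
    """Hypothetical helper: True if the item already lists this exact file.

    remote_md5s maps remote file names to MD5 hex digests as last reported
    for the item. Assumption: a file whose task is still queued or running
    is absent from this mapping, so it would be uploaded again.
    """
    return remote_md5s.get(remote_name) == local_md5(path)
```

Under that assumption, a stale listing makes `should_skip` return `False` for files that are in fact already uploaded, which would match the behavior reported in this issue.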

@JustAnotherArchivist
Contributor Author

Oh, I see. Yes, that is indeed the case. (The derive that was queued due to #288.)

Maybe it could be checked whether there are queued tasks? Although I guess that would require a fix for #167 first.

@JustAnotherArchivist
Contributor Author

@jjjake What would you think about either returning an error or waiting for existing tasks to finish (with a warning message of course so the user knows what's going on) when the --checksum option is used?
Spreadsheet uploads would complicate this somewhat though; checking and erroring/waiting every time the item identifier column changes seems most reasonable to me there.
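
The per-identifier check suggested for spreadsheet uploads could look roughly like the sketch below: rows are grouped into consecutive runs that share an identifier, and the pending-task check runs once per run. This is not code from `internetarchive`. `identifier_runs`, `plan_uploads`, and the caller-supplied `has_pending_tasks` callable are hypothetical, and the `identifier` and `file` column names are assumed.

```python
import csv
import io
from itertools import groupby


def identifier_runs(rows):
    """Yield (identifier, rows) for each consecutive run of one identifier."""
    for identifier, run in groupby(rows, key=lambda row: row["identifier"]):
        yield identifier, list(run)


def plan_uploads(rows, has_pending_tasks):
    """Hypothetical planner: check each item once, when the identifier changes.

    has_pending_tasks is a caller-supplied callable (an assumption, not a
    library function) returning True if the item has queued or running tasks.
    Returns a list of (identifier, blocked, files) tuples.
    """
    plan = []
    for identifier, run in identifier_runs(rows):
        blocked = has_pending_tasks(identifier)
        plan.append((identifier, blocked, [row["file"] for row in run]))
    return plan


# Example input with made-up item names; the check is stubbed out.
sheet = "identifier,file\nitem-a,1.txt\nitem-a,2.txt\nitem-b,3.txt\n"
example_plan = plan_uploads(
    csv.DictReader(io.StringIO(sheet)),
    has_pending_tasks=lambda identifier: identifier == "item-b",
)
```

Grouping consecutive rows rather than all rows per identifier mirrors the "every time the item identifier column changes" wording, so an identifier that reappears later in the sheet would be checked again.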

@jjjake
Owner

jjjake commented Feb 10, 2022

I don't like the idea of waiting for existing tasks to finish because that could be hours to even days, depending on the task.

I think erroring out might make the most sense, but I don't love this either. I think most people use --checksum to avoid having to re-upload the same file again (i.e. to save some time). I feel like erroring out or stalling, rather than just re-uploading, would be confusing and annoying for most users.

I'd prefer to keep the behavior we currently have and add a warning message. Alternatively, perhaps there should be another option that would support what you're talking about?

Just my 2 cents though, open to feedback! :)

@JustAnotherArchivist
Contributor Author

Yeah, you're right, it wouldn't be the best UX. I'd be fine with a warning and an option to make it an error. The rest can be done with a small wrapper script when desired.
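
The outcome discussed here (warn by default, optionally treat it as an error, and leave waiting to a wrapper) might be sketched as follows. None of these names exist in `internetarchive`: `PendingTasksError`, `check_before_checksum`, `wait_until_idle`, and the `strict` parameter are hypothetical, and `has_pending_tasks` stands in for whatever task lookup the caller provides.

```python
import time
import warnings


class PendingTasksError(RuntimeError):
    """Hypothetical error: checksum skipping cannot be trusted right now."""


def check_before_checksum(identifier, has_pending_tasks, strict=False):
    """Warn, or raise if strict, when the item has queued or running tasks.

    Returns True when checksum-based skipping can be relied on.
    """
    if not has_pending_tasks(identifier):
        return True
    message = (
        f"{identifier}: tasks are queued or running; "
        "existing files may be re-uploaded"
    )
    if strict:
        raise PendingTasksError(message)
    warnings.warn(message)
    return False


def wait_until_idle(identifier, has_pending_tasks, poll_seconds=60, sleep=time.sleep):
    """Wrapper-script style loop: block until no tasks are pending.

    Returns the number of times it had to sleep.
    """
    polls = 0
    while has_pending_tasks(identifier):
        sleep(poll_seconds)
        polls += 1
    return polls
```

Keeping the wait loop outside the upload path matches the concern raised earlier in the thread that tasks can take hours or days, so only callers who opt in would block.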
