Better error messages when using ia upload --metadata incorrectly #267
Comments
See issue #176
Are you sure that this is the same issue? Although the error message is the same, I wasn't using the bulk upload method. Also, as mentioned, I've seen that same error occasionally due to general connectivity issues (fairly frequently in the last few days actually), but simply rerunning
I just fell into this trap again. Definitely not the same as #176 and the countless transient appearances of that error. Please reopen. I'm investigating the detailed conditions currently.
I managed to get it down to a minimal reproducible example. I strongly suspect that it's due to whitespace before the first colon in the
ia upload test_connection_aborts_jaa_20210601 foo --metadata='collection:test_collection' --metadata='a test:foo'
If
The
If
There is no hard limit where it switches from one error to the other. For me, it changes at about 95 kB, but a test with a 95250-byte
I'll happily send a PR to catch this client-side before even attempting the upload. Might as well make it more robust than just this though. @jjjake, could you clarify what the requirements for metadata keys are exactly please? I couldn't find it in the docs. Would make a good addition there as well... :-)
Metadata keys must be valid XML tags, as noted here. IA-S3 metadata headers should be lowercase and are a subset of what fits in the http headers header name space, with one exception:
IMO, the issue here should be resolved by IA-S3 returning a valid S3 XML response that would then be parsed and surfaced to the user exactly like other S3 errors are. Currently it returns HTML, and that's why there is no error message returned. I think any and all validation should be centralized and happen at the IA-S3 level.
For whitespace in particular though, I'm wondering if we should just strip trailing/leading whitespace and replace any other whitespace with
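To make the "valid XML tag, lowercase" requirement concrete, here is a minimal Python sketch of a key check. The regular expression is an assumption: a conservative, ASCII-only approximation of an XML tag name restricted to lowercase, not the authoritative IA-S3 rule. The function name is_plausible_metadata_key is hypothetical and is not part of the ia client.

```python
import re

# Assumption: conservative ASCII-only approximation of a lowercase XML tag
# name (letter or underscore first, then letters, digits, '_', '.', '-').
# The real IA-S3 rules may be stricter or looser than this.
_KEY_RE = re.compile(r"[a-z_][a-z0-9_.-]*")


def is_plausible_metadata_key(key: str) -> bool:
    """Return True if key looks like a lowercase XML tag name."""
    return _KEY_RE.fullmatch(key) is not None


if __name__ == "__main__":
    # Keys taken from the examples in this thread.
    for key in ("collection", "description", "a test", "My description"):
        print(f"{key!r}: {is_plausible_metadata_key(key)}")
```

Under this approximation, keys containing whitespace or uppercase letters, such as the ones from the failing commands above, are rejected before any request is made.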
Ah, duh, not sure how I missed that. Thanks. That explains the behaviour I guess. When the request is small enough to be received in full by the server before parsing, it notices the malformed HTTP headers and responds with a 400, but if the payload is too large, it just bails. Or perhaps it does send a 400 response even then, but the client never notices that because it sees the connection error on trying to send the next chunk of request data.
I certainly agree with that, though I've found it to often take much longer to change such things on the IA side than the client one. And I have a feeling that this might not be exactly straightforward to implement without adding 100 Continue support or similar; as hinted at above, the response might not be seen by the client at all otherwise. But if it's easy enough to do, absolutely!
As a user, I'd much rather get told that my metadata field name is invalid and how to fix it than it getting mangled transparently. The former should lead to the user learning how to properly name custom metadata fields. The latter results in rather painful attempts at fixing the metadata after making a mistake like the one in the original issue.
@JustAnotherArchivist Thanks for the reply. I agree with everything you've said, it makes sense. I'll try to push getting S3 to return proper errors in these cases. It sounds like something we might be able to do soon, fingers crossed.
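A rough sketch of what "tell the user instead of mangling" could look like on the client side, in Python. This is not how ia actually parses --metadata; parse_metadata_arg, MetadataKeyError, and the key pattern are all hypothetical, and the pattern is only an assumed approximation of a lowercase XML tag name.

```python
import re
from typing import Tuple

# Assumption: approximate pattern for an acceptable key; not the IA-S3 rule.
_KEY_RE = re.compile(r"[a-z_][a-z0-9_.-]*")


class MetadataKeyError(ValueError):
    """Hypothetical error type for a rejected --metadata argument."""


def parse_metadata_arg(arg: str) -> Tuple[str, str]:
    """Split 'key:value' at the first colon and reject implausible keys."""
    key, sep, value = arg.partition(":")
    if not sep:
        raise MetadataKeyError(
            f"{arg!r} is not of the form 'key:value'"
        )
    if not _KEY_RE.fullmatch(key):
        raise MetadataKeyError(
            f"invalid metadata key {key!r} in {arg!r}: expected "
            "'key:value' with a lowercase key without whitespace, "
            "for example 'description:...'"
        )
    return key, value


if __name__ == "__main__":
    print(parse_metadata_arg("description:My description: something"))
    try:
        parse_metadata_arg("My description: something.")
    except MetadataKeyError as exc:
        print(f"error: {exc}")
```

Only the first colon separates key from value, so colons inside the value are preserved, while the mistaken form from the original report fails with a message naming the offending key instead of being rewritten silently.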
Earlier, I used a wrong syntax with ia upload's --metadata option; specifically, I forgot to include the description field specifier and directly used --metadata='My description: something.' (instead of --metadata='description:My description: something'). That should obviously produce an error, and it did, but the error message was very unhelpful and led me to think the IA S3 API was having issues instead (as I had connection issues the past few days which manifested in the exact same error):
Note that my --metadata value did contain a colon, and so I suspect that ia thought it's valid. Here's the exact value I specified (for this item; in Bash, hence the weird '"'"'):
I assume that this was detected as an attempt to inject malicious HTML into the system or something like that. Perhaps there could be a bit more validation on the client side as to what metadata keys are acceptable?
By the way, using that value with ia metadata identifier --modify produces error (400): Invalid Character Error.