flux-job: flux_kvs_lookup_get: Value too large for defined data type #6256
So I'm guessing this user produced a bunch of stdout into the KVS, and once it was > 2G, EOVERFLOW occurs b/c the data returned would be larger than an int can represent. Reproduced via
I'm not really sure how to fix / improve this w/o major re-engineering (i.e. read KVS values w/ offsets / seeking, support > INT_MAX data ... may require RFC changes; don't know off the top of my head where we define things as ints only). For the time being ... better error message?
To my surprise we don't define things as
Well it is a bit silly to be failing for that reason, although fetching a > 2G message in a KVS get may fail for other reasons. It's certainly going to be a self-DoS for a while.
Just brainstorming here, we could support some type of
A few brainstorms this morning:
I wonder if it would be easier to implement a limit on storing a large value in the first place? I hate to suggest modifying RFC 11, but we could maybe consider adding an optional size. If we have a running total size for a KVS value, then we have the means to reject an append that would cause it to exceed some maximum.
That seems like a good idea for a full on correct solution in the KVS. But if we're going down the path of just limiting appends, for a quicker solution for this case, perhaps we can abuse the OUTPUT_LIMIT support in the shell to limit stdout/stderr? I see this in the shell's code
If we just change that "0" to "2G" (or perhaps less if we want to be conservative), I think the stdout/stderr will probably just be capped for single user instances.
Great point!
OK, let's go with that quick and dirty solution in the shell. The KVS thing you mention above is the longer term generalized solution, b/c users might be willing to abuse it if they write their own data to the KVS.
OK, I'll open an issue on the long term one. It requires more thought/discussion/spelunking I think. |
We've had #5148 open since the original problem |
Problem: The KVS has a size limit of INT_MAX when returning KVS values. This limit can be exceeded by a job's standard output because it is continually appended and the total size is not yet tracked by the KVS. When reading the output later, such as via `flux job attach`, this can lead to EOVERFLOW errors. Solution: For a single user instance, default to a maximum standard output of 1G instead of "unlimited". 1G should provide a practical maximum for most users and encourage them to send standard output to a file if they want to save excess standard output. If desired, the value can still be overridden via the "output.limit" setting. Fixes flux-framework#6256
Problem: The KVS has a size limit of INT_MAX when returning KVS values. This limit can be exceeded by a job's standard output because it is continually appended and the total size is not yet tracked by the KVS. When reading the output later, such as via `flux job attach`, this can lead to EOVERFLOW errors. Solution: For a single user instance, default to a maximum standard output of 1G instead of "unlimited". 1G should provide a practical maximum for most users and encourage them to send standard output to a file if they want to save excess standard output. Do not allow configuration larger than this. As a consequence, the configuration of "unlimited" is no longer allowed. Fixes flux-framework#6256
A user is seeing this error from `flux run` in a subinstance with a job that may generate a lot of output. Not sure we've seen this one before.