-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow processors to log and spew objects larger than 64K #55
Conversation
try: | ||
line = await out.readline() | ||
except ValueError: | ||
logging.error('Received a too long log line (>64KB), truncated') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the log line being truncated and re-outputted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it's being discarded altogether and the error log is shown instead.
I think that log lines should be used for human readable text - limiting log lines at 64K is quite reasonable, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure - it's fine, but the error message suggests it'll be truncated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got you. Changing to 'discarded'.
@OriHoch would you mind taking a look too? |
looks good from what I understand this doesn't resolve my use where we really needed more then 64kb of output |
Why do you think so? |
I'll test with my use-case: Beit-Hatfutsot/mojp-dbs-pipelines#4 |
This pull request handles 2 issues:
The main problem is that the manager uses the asyncio framework which imposes hard limits on line sizes (as it's an async framework). Processors themselves don't have that problem as they work synchronously.
The 1st issue is handled by detecting such cases and showing a friendly message to the user (the original log line is discarded).
The 2nd issue is resolved by adding a 'sink' processor to every pipeline which makes sure that no large objects are passed to the manager.
(Also added a few missing tests which were clouded by a bad gitignore rule)