feat: Add flag to disable writing failure files in batch daemon #3832
Hello all 👋
This is just a quick little patch that I think might help GAE users. Let me explain.
The problem
The moment I added google/cloud to my Laravel + GAE Standard project, I noticed I was getting critical log entries and failed requests at regular intervals in production. The errors read like this:
```
Exceeded soft memory limit of 512 MB with 577 MB after servicing 89 requests total.
```
I hunted for memory leaks in my code, but could not reproduce any. Then one day I remembered that GAE Standard has what is basically a tmpfs (RAM disk) at /tmp. I thought, "Maybe some files are being written in there." Well... for once I was right! I made a little function that lists every file in /tmp along with its size (since you can't otherwise inspect the file system of GAE Standard instances), and to my surprise there were files from the batch daemon steadily filling up the RAM on my instances.
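For anyone who wants to run the same check, a probe along these lines is enough (a rough sketch, not my exact helper):

```php
<?php
// Rough sketch of the /tmp probe (not my exact helper). On GAE Standard you
// can expose this behind a debug route, since there is no shell access.
function listTmpFiles(): array
{
    $files = [];
    foreach (glob('/tmp/*') ?: [] as $path) {
        if (is_file($path)) {
            $files[basename($path)] = filesize($path); // size in bytes
        }
    }
    arsort($files); // biggest offenders first
    return $files;
}

var_dump(listTmpFiles());
```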
After an instance was started, these files would steadily grow until they occupied about 75% of my instance's memory. And then, poof, the instance would run out of memory and die. This explains the random failures and 500s I was getting.
A possible fix
In this PR, I am basically doing the same thing I had to fork this package for: I add an env var that lets me skip the `fwrite` operation that writes these failed-items files, so they never get written at all. Tests still pass.

I know what you might be thinking: "Ray, there's already an env var that lets you change the directory these failed-items files are created in." You are right! But on GAE Standard, there is only one writable directory. If I tell BatchDaemon to write the files elsewhere, it throws exceptions. So this PR lets you stop these files from being written entirely.
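For illustration, the change amounts to a guard along these lines (the env var name and the surrounding method here are hypothetical stand-ins, not necessarily what's in the diff):

```php
<?php
// Illustrative sketch only: the env var name and this method body are
// hypothetical stand-ins for the actual change in the PR.
private function handleFailure($idNum, array $items)
{
    if (filter_var(
        getenv('GOOGLE_CLOUD_BATCH_DAEMON_DISABLE_FAILURE_FILES'),
        FILTER_VALIDATE_BOOLEAN
    )) {
        return; // flag set: skip writing the failed-items file entirely
    }
    $fp = fopen($this->failureFile, 'a');
    fwrite($fp, serialize([$idNum => $items]) . PHP_EOL);
    fclose($fp);
}
```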
Further discussion
The silly part of this whole story is that I still don't quite know what the batch runner is doing in my app. We don't use it directly; the only reason it is installed is that another package we use has google/cloud as a Composer dependency. I actually tried downloading one of the failed-items files, and it was unreadable, binary-looking data. Maybe what I was fixing here is a symptom, and perhaps a configuration change in my app would keep these failed-items files from being written in the first place. But the important part for me is figuring out what data they hold. When I say these files filled up quickly, I don't mean a few kilobytes per hour: I would hit refresh and watch three of them grow by 2-3 MB each, so an instance would only last 15-30 minutes before running out of RAM.
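If the daemon is writing serialized PHP (my guess, given the `fwrite` this PR touches), a quick probe might decode the payloads; the file name below is a made-up placeholder:

```php
<?php
// Guesswork probe: assumes one serialized PHP payload per line, which is
// my reading of how the failure files get written. File name is a placeholder.
foreach (file('/tmp/batch-daemon-failure-example') as $line) {
    $payload = @unserialize(trim($line)); // false if the guess is wrong
    var_dump($payload === false ? 'not serialized PHP' : $payload);
}
```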
I am very open to suggestions and to being wrong about my assumptions here. Just let me know what you think. #2336 looks very similar: related, just in a different trait.
Thanks and Happy Valentine's Day!