Managing very large batches #7734

Closed
laurentS opened this issue May 9, 2023 · 3 comments

Comments

@laurentS
Contributor

laurentS commented May 9, 2023

At madada, we are currently facing some interesting issues with the scale of some batches.
We have started a "giant metabatch", where we are sending the same request to all the bodies in our database. This is for a document that is unambiguously required by law, and our goal is to quantify how badly the law is applied; we're not just spamming them for fun. This represents over 50k bodies. To make this somewhat manageable, we are building "smallish" batches of ~1k bodies at a time (by running SQL queries directly in the db 😱)

While this might seem exceptional, we have a number of very valid use cases where people have asked to create batches of ~500-2k requests, sometimes many more (all cities/towns in France is >36k bodies). We feel that the current system has some limitations for handling such large batches, and I'd be keen to discuss how we can improve this.

What we have seen so far:

  • emails are sometimes marked as spam, particularly if we target a group of public bodies which use a central email system (they see 1-2k emails come in from the same address within a few minutes)
  • the web UI becomes a bit clunky to use once the batch has been sent
    • each page has to load a lot of data, making it slow
    • the batch view is not ordered by state, so we need to scroll around to find replies and handle them
  • to avoid getting flagged as a spam box (because we'd send 500x our usual email output as a spike), we manually space out batch sending, doing 1-2 per day, but this ends up taking many days. This will get better as our traffic grows.
  • we get a deluge of automated replies that we have to go through one by one to figure out if it's a spam filter, an incorrect address in our db that needs updating, etc. We are building some scripts to tag requests based on heuristics to help with this (think email filters), and then minimise the number of clicks required by our admins for each case.
  • From a legal perspective, batch requests in France are handled slightly differently during appeal compared to individual requests. When we break down a huge batch into a set of smaller ones, we sort of go against this logic.
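To make the "space out sending" idea concrete, here is a minimal sketch of per-domain throttling, so that a shared mail system never sees more than a fixed number of messages from us per time window. This is not Alaveteli code: `schedule_sends`, `per_domain_limit` and `window` are hypothetical names, and the limits are placeholders that would need tuning against real spam filters.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def schedule_sends(recipients, start, per_domain_limit=50, window=timedelta(hours=1)):
    """Return (send_time, address) pairs, releasing at most
    per_domain_limit messages per email domain at the start of each window.

    Hypothetical helper for illustration; not part of Alaveteli.
    """
    seen_per_domain = defaultdict(int)
    schedule = []
    for address in recipients:
        # Group by the part after the last "@", case-insensitively.
        domain = address.rsplit("@", 1)[-1].lower()
        # The n-th message to a domain goes into window n // per_domain_limit.
        slot = seen_per_domain[domain] // per_domain_limit
        schedule.append((start + slot * window, address))
        seen_per_domain[domain] += 1
    # Earliest sends first; addresses on many small domains all go out at `start`.
    schedule.sort(key=lambda item: item[0])
    return schedule
```

With something like this, bodies on their own domains are contacted immediately, and only the ones behind a central email system are delayed. It does not cap the total outgoing volume, so a global daily limit would still be needed to avoid the traffic spike described above.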

I'm curious to know if others have faced similar issues, and what you've done to address them. We are going to work on this in a way that works for us, but if we can help others at the same time, it would be even better :)

@laurentS
Contributor Author

Related: OOM issues when sending large batches: #8184

@garethrees
Member

> we get a deluge of automated replies… incorrect address in our db that needs updating…

We've done some thinking about auto-replies in #2045.

Also thought a bit about automated bodies database maintenance in #4837 and #7174.
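For illustration, heuristic tagging of automated replies (the "think email filters" idea from the original post) could look roughly like the sketch below. It is an assumption-laden example, not what madada or Alaveteli actually run: the tag names, the `RULES` patterns and `tag_reply` are all hypothetical, and the English patterns would need French equivalents in practice. The only standard piece is the `Auto-Submitted` header from RFC 3834, which well-behaved auto-responders set.

```python
import re
from email import message_from_string

# Hypothetical rules, checked in order; the first match wins.
RULES = [
    ("bounce", re.compile(r"undeliverable|delivery has failed|user unknown|address not found", re.I)),
    ("spam_filter", re.compile(r"spam|quarantine|blocked", re.I)),
    ("out_of_office", re.compile(r"out of office|automatic reply|absence", re.I)),
]


def tag_reply(raw_message):
    """Return a coarse tag for an incoming reply, given its raw text."""
    msg = message_from_string(raw_message)
    subject = msg.get("Subject", "")
    # Only look at the body of simple (non-multipart) messages in this sketch.
    body = "" if msg.is_multipart() else msg.get_payload()
    text = subject + "\n" + body
    for tag, pattern in RULES:
        if pattern.search(text):
            return tag
    # RFC 3834: any value other than "no" marks an automatic message.
    if msg.get("Auto-Submitted", "no").strip().lower() != "no":
        return "auto_reply"
    return "needs_review"
```

Anything tagged `bounce` points at an address to fix in the bodies database, while `needs_review` is what still requires a human to read.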

> the web UI becomes a bit clunky… the batch view is not ordered by state

Yeah, I think this definitely needs looking at, even for smaller batches.

> From a legal perspective, batch requests in France are handled slightly differently during appeal compared to individual requests. When we break down a huge batch into a set of smaller ones, we sort of go against this logic.

I'm not sure I agree that splitting a "batch" into several batch records in Alaveteli means it would no longer be considered a single "batch" in a legal sense. What specific issues does this seem to cause on your end?

@garethrees
Member

This is desirable, but unlikely to be worked on in the next 12 months so closing for now.

@garethrees closed this as not planned on Nov 22, 2024