Managing very large batches #7734

laurentS · 2023-05-09T08:23:18Z

At madada, we are currently facing some interesting issues with the scale of some batches.
We have started a "giant metabatch", sending the same request to every body in our database, which is over 50k bodies. The request is for a document that is unambiguously required by law, and our goal is to quantify how badly the law is applied; we're not just spamming them for fun. To make this somewhat manageable, we are building "smallish" batches of ~1k bodies at a time (by running SQL queries directly in the db 😱).
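
As a rough illustration of the chunking step, here is a minimal Python sketch. It assumes the body ids have already been fetched into a list; `chunked` and `body_ids` are hypothetical names used for illustration, not part of Alaveteli or of the SQL we actually run.

```python
from typing import Iterator, Sequence


def chunked(ids: Sequence[int], size: int = 1000) -> Iterator[list[int]]:
    """Yield consecutive slices of `ids`, each at most `size` long."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


# Hypothetical stand-in for the ids returned by a query on the bodies table.
body_ids = list(range(1, 50_001))

# Each inner list would become one "smallish" batch of the metabatch.
batches = list(chunked(body_ids))
```

Slicing by position keeps every body in exactly one batch, which matters when the pieces are later reasoned about as a single metabatch.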

While this might seem exceptional, we have a number of very valid use cases where people have asked to create batches of ~500-2k requests, sometimes many more (all cities/towns in France would be >36k bodies). We feel that the current system has some limitations when handling such large batches, and I'd be keen to discuss how we can improve this.

What we have seen so far:

- emails are sometimes marked as spam, particularly if we target a group of public bodies which use a central email system (they see 1-2k emails come in from the same address within a few minutes)
- the web UI becomes a bit clunky to use once the batch has been sent
  - each page has to load a lot of data, making it slow
  - the batch view is not ordered by state, so we need to scroll around to find replies and handle them
- to avoid getting flagged as a spam sender (because we'd send 500x our usual email output as a spike), we manually space out batch sending, doing 1-2 per day, but this ends up taking many days. This will get better as our traffic grows.
- we get a deluge of automated replies that we have to go through one by one to figure out whether it's a spam filter, an incorrect address in our db that needs updating, etc. We are building some scripts to tag requests based on heuristics to help with this (think email filters), and then minimise the number of clicks required by our admins for each case.
- From a legal perspective, batch requests in France are handled slightly differently during appeal compared to individual requests. When we break down a huge batch into a set of smaller ones, we sort of go against this logic.
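
To make two of the points above more concrete, here is a hedged Python sketch of heuristic tagging for automated replies and of spacing batches out over several days. Everything in it is an assumption made for illustration: the tag names, the regular expressions in `RULES`, and the helpers `tag_auto_reply` and `send_dates` are hypothetical, and real auto-replies vary a lot by mail system and language.

```python
import re
from datetime import date, timedelta

# Hypothetical rules, checked in order; the first match wins.
RULES = [
    ("bounce", re.compile(
        r"undeliverable|delivery (status notification|has failed)|user unknown",
        re.I)),
    ("spam_filter", re.compile(
        r"spam|quarantine|blocked|rejected by policy", re.I)),
    ("out_of_office", re.compile(
        r"out of office|absence|automatic reply|réponse automatique", re.I)),
]


def tag_auto_reply(subject: str, body: str) -> str:
    """Return the first matching tag, or "needs_review" if no rule matches."""
    text = f"{subject}\n{body}"
    for tag, pattern in RULES:
        if pattern.search(text):
            return tag
    return "needs_review"


def send_dates(n_batches: int, start: date, per_day: int = 2) -> list[date]:
    """Assign each batch a send date, with at most `per_day` batches per day."""
    if per_day <= 0:
        raise ValueError("per_day must be positive")
    return [start + timedelta(days=i // per_day) for i in range(n_batches)]
```

Anything that falls through to `needs_review` still needs a human, so rules like these only reduce the number of clicks rather than remove them. At two batches per day, 50 batches of ~1k bodies spread over 25 days.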

I'm curious to know if others have faced similar issues, and what you've done to address them. We are going to work on this in a way that works for us, but if we can help others at the same time, it would be even better :)

laurentS · 2024-03-28T17:06:10Z

Related: OOM issues when sending large batches: #8184

garethrees · 2024-04-02T11:52:44Z

> we get a deluge of automated replies… incorrect address in our db that needs updating…

We've done some thinking about auto-replies in #2045.

Also thought a bit about automated bodies database maintenance in #4837 and #7174.

> the web UI becomes a bit clunky… the batch view is not ordered by state

Yeah, I think this definitely needs looking at, even for smaller batches.

> From a legal perspective, batch requests in France are handled slightly differently during appeal compared to individual requests. When we break down a huge batch into a set of smaller ones, we sort of go against this logic.

I'm not sure I agree that splitting a "batch" into several batch records in Alaveteli means it wouldn't be considered a single "batch" in a legal sense. What specific issues does this seem to cause on your end?

garethrees · 2024-11-22T12:13:33Z

This is desirable, but unlikely to be worked on in the next 12 months so closing for now.

WilliamWDTK added f:batch-creation f:batch-management x:france labels May 9, 2023

garethrees added the professional label Apr 2, 2024

garethrees closed this as not planned Nov 22, 2024
