Uploading large chunks of data is a pain: there's no good way to queue the data to be uploaded, and due to the complexity of the .json validation before ES insertion, 300 records take ~5 min to upload.
There are at least a few limits to queuing large amounts of data:
- The front-end has a limit on how much data it can hold in memory for uploading.
- The backend can only accept somewhere around 1 MB per request before it complains; as a result, the front-end currently parses the file into ~1 MB chunks to send to the backend.
- On the prod server, if there are too many simultaneous requests, the multiprocessing queue can get mixed up and the same record can be inserted multiple times into the index.
Ideally, we could queue a bunch of records and let the upload run overnight. That may mean moving away from the front-end interface, but we'd still have the problem of the multiprocessing queue inserting duplicates.
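As a rough illustration of the offline-queue idea, here's a sketch of a batch uploader that serializes records into ~1 MB chunks and posts them one at a time; the endpoint URL and request shape are placeholders, not the real API:

```python
# Sketch of an offline batch uploader. The endpoint URL and payload shape are
# hypothetical; the ~1 MB limit is the backend request limit mentioned above.
import json
import time
import requests

MAX_BYTES = 1_000_000  # rough backend request limit

def chunk_records(records, max_bytes=MAX_BYTES):
    """Yield lists of records whose serialized size stays under max_bytes."""
    batch, size = [], 2  # 2 bytes for the surrounding "[]"
    for rec in records:
        rec_size = len(json.dumps(rec).encode("utf-8")) + 1  # +1 for the comma
        if batch and size + rec_size > max_bytes:
            yield batch
            batch, size = [], 2
        batch.append(rec)
        size += rec_size
    if batch:
        yield batch

def upload(records, url="https://example.org/api/upload"):  # placeholder URL
    for batch in chunk_records(records):
        resp = requests.post(url, json=batch, timeout=300)
        resp.raise_for_status()
        # keep requests strictly sequential so the multiprocessing queue on the
        # prod server isn't hit with simultaneous inserts
        time.sleep(1)
```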
@juliamullen: Thinking about this further, it'd be good to decouple and expose the JSON validation process so that validation can be checked before trying the POST (either from the command line and/or the GUI on the website). Ideally, validation would run first, and only then would the POST request be made.
That would also make it easier to take advantage of the Elasticsearch Python library's capabilities.
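A minimal sketch of that decoupled flow, assuming a JSON Schema file and using the standard `jsonschema` and `elasticsearch` Python packages; the schema path and index name are placeholders:

```python
# Validate records locally first, then insert only the ones that pass.
# schema_path and index are placeholders for whatever the project actually uses.
import json
from jsonschema import Draft7Validator
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

def validate_records(records, schema_path="schema.json"):
    """Return (valid, errors) without making any network calls."""
    with open(schema_path) as f:
        validator = Draft7Validator(json.load(f))
    valid, errors = [], []
    for i, rec in enumerate(records):
        rec_errors = [e.message for e in validator.iter_errors(rec)]
        if rec_errors:
            errors.append((i, rec_errors))
        else:
            valid.append(rec)
    return valid, errors

def insert_records(records, index="records-index"):  # placeholder index name
    valid, errors = validate_records(records)
    if errors:
        raise ValueError(f"{len(errors)} records failed validation: {errors[:5]}")
    es = Elasticsearch()
    # Supplying an explicit _id keeps a retried record from being indexed twice,
    # which would also help with the duplicate-insert problem noted above.
    actions = ({"_index": index, "_id": rec.get("_id"), "_source": rec} for rec in valid)
    bulk(es, actions)
```

With validation split out like this, the same `validate_records` step could back a command-line pre-check or a GUI "dry run" before anything is POSTed.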