Uploading large chunks of data is a pain: there's no good way to queue the data to be uploaded, and due to the complexity of the .json validation before ES insertion, 300 records take ~5 min to upload.
There are at least a few limits to queuing large amounts of data:
- The front-end has a limit on how much data it can hold in memory for uploading.
- The backend can only accept somewhere around 1 MB per request before it complains; as a result, the front-end currently parses the file into ~1 MB chunks to send to the backend.
- On the prod server, if there are too many simultaneous requests, the multiprocessing queue can get mixed up and the same record can be inserted multiple times into the index.
Ideally, we could queue a bunch of records and let the upload run overnight. That may mean moving away from the front-end interface, but we'd still have the problem of the multiprocessing queue inserting duplicates.
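As a rough illustration of the offline-queue idea, here's a sketch of a batch uploader that serializes records into ~1 MB chunks and posts them one at a time; the endpoint URL and request shape are placeholders, not the real API:

```python
# Sketch of an offline batch uploader. The endpoint URL and payload shape are
# hypothetical; the ~1 MB limit is the backend request limit mentioned above.
import json
import time
import requests

MAX_BYTES = 1_000_000  # rough backend request limit

def chunk_records(records, max_bytes=MAX_BYTES):
    """Yield lists of records whose serialized size stays under max_bytes."""
    batch, size = [], 2  # 2 bytes for the surrounding "[]"
    for rec in records:
        rec_size = len(json.dumps(rec).encode("utf-8")) + 1  # +1 for the comma
        if batch and size + rec_size > max_bytes:
            yield batch
            batch, size = [], 2
        batch.append(rec)
        size += rec_size
    if batch:
        yield batch

def upload(records, url="https://example.org/api/upload"):  # placeholder URL
    for batch in chunk_records(records):
        resp = requests.post(url, json=batch, timeout=300)
        resp.raise_for_status()
        # keep requests strictly sequential so the multiprocessing queue on the
        # prod server isn't hit with simultaneous inserts
        time.sleep(1)
```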
@juliamullen: Thinking about this further, it'd be good to decouple and expose the JSON validation process so that validation can be checked before trying the POST (either from the command line and/or the GUI on the website). Ideally, validation would run first, and only then would the POST request be made.
That would also make it easier to take advantage of the Elasticsearch Python library's capabilities.
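A minimal sketch of that decoupled flow, assuming a JSON Schema file and using the standard `jsonschema` and `elasticsearch` Python packages; the schema path and index name are placeholders:

```python
# Validate records locally first, then insert only the ones that pass.
# schema_path and index are placeholders for whatever the project actually uses.
import json
from jsonschema import Draft7Validator
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

def validate_records(records, schema_path="schema.json"):
    """Return (valid, errors) without making any network calls."""
    with open(schema_path) as f:
        validator = Draft7Validator(json.load(f))
    valid, errors = [], []
    for i, rec in enumerate(records):
        rec_errors = [e.message for e in validator.iter_errors(rec)]
        if rec_errors:
            errors.append((i, rec_errors))
        else:
            valid.append(rec)
    return valid, errors

def insert_records(records, index="records-index"):  # placeholder index name
    valid, errors = validate_records(records)
    if errors:
        raise ValueError(f"{len(errors)} records failed validation: {errors[:5]}")
    es = Elasticsearch()
    # Supplying an explicit _id keeps a retried record from being indexed twice,
    # which would also help with the duplicate-insert problem noted above.
    actions = ({"_index": index, "_id": rec.get("_id"), "_source": rec} for rec in valid)
    bulk(es, actions)
```

With validation split out like this, the same `validate_records` step could back a command-line pre-check or a GUI "dry run" before anything is POSTed.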