It takes 2 hours to load the table into a pandas dataframe (issue). This comment says that if we export the BigQuery table to CSV and then read the CSV from GCS, it only takes about 1 minute. This sounds promising.
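For reference, a minimal sketch of that faster path, assuming the `google-cloud-bigquery` client and placeholder project/bucket/table names (none of these identifiers come from this repo):

```python
from google.cloud import bigquery
import pandas as pd

client = bigquery.Client()

# Export the table to CSV on GCS instead of pulling rows through the BigQuery API.
# Table ID and bucket are placeholders; tables over 1 GB need a wildcard URI instead.
extract_job = client.extract_table(
    "my-project.my_dataset.my_table",
    "gs://my-bucket/my_table.csv",
)
extract_job.result()  # wait for the export to finish

# Reading the exported file back is much faster than the row-by-row API path.
# (pd.read_csv with a gs:// URL requires the gcsfs package.)
df = pd.read_csv("gs://my-bucket/my_table.csv")
```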
Also note, we don't need to use pandas. I couldn't find a direct way to go from BigQuery to Elasticsearch, so I used pandas, but we don't need to.
Agreed, I think exporting the table to a file would definitely be a better way to do it (though I think we have to use JSON rather than CSV in order to support repeated fields):
> You cannot export nested and repeated data in CSV format. Nested and repeated data is supported for Avro and JSON exports.
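A hedged sketch of the JSON export, again with placeholder names (`NEWLINE_DELIMITED_JSON` is BigQuery's destination format for JSON exports):

```python
from google.cloud import bigquery

client = bigquery.Client()

# Export as newline-delimited JSON so nested/repeated fields survive.
job_config = bigquery.ExtractJobConfig(
    destination_format=bigquery.DestinationFormat.NEWLINE_DELIMITED_JSON
)
extract_job = client.extract_table(
    "my-project.my_dataset.my_table",   # placeholder table ID
    "gs://my-bucket/my_table-*.json",   # wildcard in case the table exceeds 1 GB
    job_config=job_config,
)
extract_job.result()
```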
We could potentially just transform this JSON output into the format we want in the index (I think just fully qualifying the fields with dataset.table.column) and avoid pandas altogether, which would be ideal.
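Something along these lines, assuming we've pulled the exported NDJSON files down from GCS (the helper names and flat top-level schema are assumptions, not anything already in the code):

```python
import json

def qualify_keys(record, dataset, table):
    """Prefix every top-level field with dataset.table so the index
    field names are fully qualified (e.g. my_dataset.my_table.my_column)."""
    return {f"{dataset}.{table}.{key}": value for key, value in record.items()}

def generate_actions(ndjson_path, index_name, dataset, table):
    """Yield Elasticsearch bulk actions straight from the exported NDJSON,
    one document per line, with no pandas in between."""
    with open(ndjson_path) as f:
        for line in f:
            doc = qualify_keys(json.loads(line), dataset, table)
            yield {"_index": index_name, "_source": doc}
```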
The bigger problem I'm encountering is the Elasticsearch requests timing out. I'm currently experimenting with lowering the `max_chunk_bytes` parameter to see if that helps.
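Roughly what I'm trying, using the bulk helpers from the `elasticsearch` Python client and the hypothetical `generate_actions` sketch above (host, index name, and the specific chunk values are just examples):

```python
from elasticsearch import Elasticsearch
from elasticsearch.helpers import streaming_bulk

# request_timeout is the 8.x client kwarg; older clients call it `timeout`.
es = Elasticsearch("http://localhost:9200", request_timeout=120)

# Smaller chunks should let each bulk request finish well within the timeout.
for ok, result in streaming_bulk(
    es,
    generate_actions("my_table-000000000000.json", "my_index", "my_dataset", "my_table"),
    chunk_size=500,                    # max docs per bulk request
    max_chunk_bytes=10 * 1024 * 1024,  # lowered from the 100 MB default
    raise_on_error=False,
):
    if not ok:
        print("failed:", result)
```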