It takes 2 hours to load the table into a pandas dataframe (issue). This comment says that if we export the BigQuery table to CSV and then read the CSV from GCS, it only takes about 1 minute. This sounds promising.
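For reference, a minimal sketch of that faster path, assuming the `google-cloud-bigquery` client and placeholder project/bucket/table names (none of these identifiers come from this repo):

```python
from google.cloud import bigquery
import pandas as pd

client = bigquery.Client()

# Export the table to CSV on GCS instead of pulling rows through the BigQuery API.
# Table ID and bucket are placeholders; tables over 1 GB need a wildcard URI instead.
extract_job = client.extract_table(
    "my-project.my_dataset.my_table",
    "gs://my-bucket/my_table.csv",
)
extract_job.result()  # wait for the export to finish

# Reading the exported file back is much faster than the row-by-row API path.
# (pd.read_csv with a gs:// URL requires the gcsfs package.)
df = pd.read_csv("gs://my-bucket/my_table.csv")
```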
Also note, we don't need to use pandas. I couldn't find a direct way to go from BigQuery to Elasticsearch, so I used pandas, but we don't need to.
Agreed, I think exporting the table to a file would definitely be a better way to do it (though I think we have to use JSON rather than CSV in order to support repeated fields):
> You cannot export nested and repeated data in CSV format. Nested and repeated data is supported for Avro and JSON exports.
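A hedged sketch of the JSON export, again with placeholder names (`NEWLINE_DELIMITED_JSON` is BigQuery's destination format for JSON exports):

```python
from google.cloud import bigquery

client = bigquery.Client()

# Export as newline-delimited JSON so nested/repeated fields survive.
job_config = bigquery.ExtractJobConfig(
    destination_format=bigquery.DestinationFormat.NEWLINE_DELIMITED_JSON
)
extract_job = client.extract_table(
    "my-project.my_dataset.my_table",   # placeholder table ID
    "gs://my-bucket/my_table-*.json",   # wildcard in case the table exceeds 1 GB
    job_config=job_config,
)
extract_job.result()
```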
We could potentially just transform this JSON output into the format we want in the index (I think just fully qualifying the fields with dataset.table.column) and avoid pandas altogether, which would be ideal.
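Something along these lines, assuming we've pulled the exported NDJSON files down from GCS (the helper names and flat top-level schema are assumptions, not anything already in the code):

```python
import json

def qualify_keys(record, dataset, table):
    """Prefix every top-level field with dataset.table so the index
    field names are fully qualified (e.g. my_dataset.my_table.my_column)."""
    return {f"{dataset}.{table}.{key}": value for key, value in record.items()}

def generate_actions(ndjson_path, index_name, dataset, table):
    """Yield Elasticsearch bulk actions straight from the exported NDJSON,
    one document per line, with no pandas in between."""
    with open(ndjson_path) as f:
        for line in f:
            doc = qualify_keys(json.loads(line), dataset, table)
            yield {"_index": index_name, "_source": doc}
```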
The bigger problem I'm encountering is the Elasticsearch requests timing out. I'm currently experimenting with lowering the `max_chunk_bytes` parameter to see if that helps.
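Roughly what I'm trying, using the bulk helpers from the `elasticsearch` Python client and the hypothetical `generate_actions` sketch above (host, index name, and the specific chunk values are just examples):

```python
from elasticsearch import Elasticsearch
from elasticsearch.helpers import streaming_bulk

# request_timeout is the 8.x client kwarg; older clients call it `timeout`.
es = Elasticsearch("http://localhost:9200", request_timeout=120)

# Smaller chunks should let each bulk request finish well within the timeout.
for ok, result in streaming_bulk(
    es,
    generate_actions("my_table-000000000000.json", "my_index", "my_dataset", "my_table"),
    chunk_size=500,                    # max docs per bulk request
    max_chunk_bytes=10 * 1024 * 1024,  # lowered from the 100 MB default
    raise_on_error=False,
):
    if not ok:
        print("failed:", result)
```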