Indexer performance for very large tables (> 2Gb) #90

Closed
bfcrampton opened this issue Oct 29, 2018 · 3 comments

@bfcrampton

No description provided.

@bfcrampton bfcrampton self-assigned this Oct 29, 2018
@melissachang
Contributor

It takes 2 hours to load the table into a pandas DataFrame (see the linked issue). The linked comment says that if we export the BigQuery table to CSV and then read the CSV from GCS, it would only take 1 minute. This sounds promising.
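
A rough sketch of the export step, assuming the `google-cloud-bigquery` client library is installed and credentials are configured. The function names, bucket layout, and file naming here are hypothetical. The wildcard in the URI is there because BigQuery can only export up to 1 GB to a single file, so a table over 2 GB has to be sharded across several files.

```python
def build_export_uri(bucket: str, dataset: str, table: str, extension: str = "csv") -> str:
    """Build a wildcard GCS URI so BigQuery can shard a large export.

    The path layout (bucket/dataset/table-*.ext) is an assumption for
    illustration, not something the project prescribes.
    """
    return f"gs://{bucket}/{dataset}/{table}-*.{extension}"


def export_table(project: str, dataset: str, table: str, bucket: str,
                 destination_format: str = "CSV") -> str:
    """Export a BigQuery table to GCS and return the destination URI."""
    # Imported lazily so the URI helper above works without the library.
    from google.cloud import bigquery

    client = bigquery.Client(project=project)
    job_config = bigquery.ExtractJobConfig()
    # Valid values include "CSV", "NEWLINE_DELIMITED_JSON" and "AVRO".
    job_config.destination_format = destination_format

    extension = "csv" if destination_format == "CSV" else "json"
    uri = build_export_uri(bucket, dataset, table, extension)

    table_ref = bigquery.DatasetReference(project, dataset).table(table)
    extract_job = client.extract_table(table_ref, uri, job_config=job_config)
    extract_job.result()  # Block until the export job finishes.
    return uri
```

The exported shards can then be read from GCS one file at a time instead of pulling the whole table through pandas.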

Also note that we don't need to use pandas. I only used it because I couldn't find a direct way to go from BigQuery to Elasticsearch.

@bfcrampton
Author

Agreed, I think exporting the table to a file would definitely be a better way to do it, though I think we have to use JSON rather than CSV in order to support repeated fields:

"You cannot export nested and repeated data in CSV format. Nested and repeated data is supported for Avro and JSON exports."

We could potentially just transform this JSON output into the format we want in the index (I think just fully qualifying the fields with dataset.table.column) and avoid pandas altogether, which would be ideal.
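
Something like the following might be enough for that transform. It is only a sketch: it assumes the export is newline-delimited JSON with one row per line, and that only top-level column names get the `dataset.table.` prefix while nested and repeated values are passed through unchanged. The function names are made up for illustration.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def qualify_row(row: Dict[str, Any], dataset: str, table: str) -> Dict[str, Any]:
    """Prefix each top-level column name with 'dataset.table.'."""
    prefix = f"{dataset}.{table}."
    return {prefix + column: value for column, value in row.items()}


def read_exported_rows(lines: Iterable[str], dataset: str, table: str) -> Iterator[Dict[str, Any]]:
    """Parse newline-delimited JSON lines lazily, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        yield qualify_row(json.loads(line), dataset, table)
```

Because this is a generator, rows can be streamed from an open file straight into the indexing step without holding the whole table in memory.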

The bigger problem I'm encountering is the Elasticsearch requests timing out. I'm currently experimenting with lowering the max_chunk_bytes parameter to see if that helps.
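
For context, `max_chunk_bytes` is a parameter of the bulk helpers in the Python Elasticsearch client that caps the size of each bulk request. The snippet below is not the library's code, just a simplified illustration of what a byte cap does to batching: a smaller cap means more, smaller requests, each of which should be quicker for the cluster to process. The function name and size accounting are assumptions.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def chunk_by_bytes(docs: Iterable[Dict[str, Any]], max_chunk_bytes: int) -> Iterator[List[Dict[str, Any]]]:
    """Group documents into chunks whose serialized size stays under a cap.

    A document larger than the cap still gets a chunk of its own.
    """
    chunk: List[Dict[str, Any]] = []
    size = 0
    for doc in docs:
        # +1 accounts for the newline that separates lines in a bulk body.
        doc_size = len(json.dumps(doc).encode("utf-8")) + 1
        if chunk and size + doc_size > max_chunk_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(doc)
        size += doc_size
    if chunk:
        yield chunk
```

With five 20-byte documents and a 45-byte cap, this yields chunks of 2, 2, and 1 documents.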

@bfcrampton
Author

Kubernetes configs updated in #98
