Schema Viewer Drawer #3291
Note: schema updates now work through a periodic Celery task that runs the queries to get column names, types, etc. The results are stored in the new schema tables. Whenever the schema is fetched from the UI, it directly queries the data in these tables. Since the schema is set to refresh only every 30 min (https://github.com/getredash/redash/blob/master/redash/settings/__init__.py#L48), this is likely why the We can either increase the frequency of schema updates (the quicker option, but not as good) or have a one-off schema refresh that is done on init so that the schema is available. I'll look into the latter.
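The flow described in this note (a refresh job writes metadata rows, and reads only touch those rows) can be sketched roughly as follows. This is a minimal illustration using the standard-library `sqlite3` module, not the PR's actual models: the table layout, the column names, and the `refresh_schema`, `get_schema` and `create_data_source` helpers are all assumptions made for the example.

```python
import sqlite3

# Hypothetical table layout; the PR's real models may differ.
DDL = """
CREATE TABLE table_metadata (
    id INTEGER PRIMARY KEY,
    data_source_id INTEGER NOT NULL,
    name TEXT NOT NULL
);
CREATE TABLE column_metadata (
    id INTEGER PRIMARY KEY,
    table_id INTEGER NOT NULL REFERENCES table_metadata(id),
    name TEXT NOT NULL,
    type TEXT
);
"""


def refresh_schema(conn, data_source_id, fetched):
    """Replace stored metadata with what the data source just reported.

    `fetched` maps table name -> list of (column name, column type).
    In the PR this runs in a periodic Celery task; here it is a plain function.
    """
    with conn:  # commit on success, roll back on error
        conn.execute(
            "DELETE FROM column_metadata WHERE table_id IN "
            "(SELECT id FROM table_metadata WHERE data_source_id = ?)",
            (data_source_id,),
        )
        conn.execute(
            "DELETE FROM table_metadata WHERE data_source_id = ?",
            (data_source_id,),
        )
        for table, columns in fetched.items():
            cur = conn.execute(
                "INSERT INTO table_metadata (data_source_id, name) VALUES (?, ?)",
                (data_source_id, table),
            )
            conn.executemany(
                "INSERT INTO column_metadata (table_id, name, type) VALUES (?, ?, ?)",
                [(cur.lastrowid, name, col_type) for name, col_type in columns],
            )


def get_schema(conn, data_source_id):
    """Read the schema straight from the metadata tables (no live query)."""
    rows = conn.execute(
        "SELECT t.name, c.name, c.type FROM table_metadata t "
        "JOIN column_metadata c ON c.table_id = t.id "
        "WHERE t.data_source_id = ? ORDER BY t.name, c.id",
        (data_source_id,),
    ).fetchall()
    schema = {}
    for table, column, col_type in rows:
        schema.setdefault(table, []).append({"name": column, "type": col_type})
    return [{"name": table, "columns": cols} for table, cols in schema.items()]


def create_data_source(conn, data_source_id, fetch):
    """One-off refresh at creation time, so the schema is available at once."""
    refresh_schema(conn, data_source_id, fetch())
```

With this shape, a freshly created data source has no schema rows until the first refresh runs, which is why the one-off refresh on init is the more robust of the two options.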
a47575c to ab13344
cy.login();
cy.request({
I created db-seed.js for this purpose 🤔, so this kind of dependency could be created by running npm run cypress db-seed prior to all tests, and this would be avoided among them:
// create_query_spec.js - a few upper lines that were not shown
const pg = {
  name: 'test',
  options: {
    dbname: 'postgres',
    host: 'postgres',
    password: 'postgres',
    user: 'postgres',
  },
  type: 'pg',
};
LMK what you think haha
PS: if you are just testing, ignore this 😅
@gabrieldutra I was, in fact, just testing. Though I could use some help: I cannot reproduce locally the percy issue that shows up here. In fact, when I run the create_query_spec.js test on master locally, the DOM snapshots seem to be missing the schema data (screenshot included below). On the other hand, the snapshot for this PR seems to show the schema locally (screenshot also included below).
Any idea what might be going on here or how I can reproduce this?
I have faced some issues when making changes in the frontend and running Cypress in development mode. The frontend code does not seem to be shared with the Docker container. I'll look into this further, but a quick fix to make it respond properly is to start the Cypress server just like you did, then run npm run start for the webpack development server and open Cypress with CYPRESS_baseUrl=http://localhost:8080 npm run cypress open
Also, I don't know if it's related, but I noticed the Chinook data source is not showing schema info in the preview.
I'll try to reproduce this locally and give you some help with Percy anyway
Edit: the Chinook issue is probably related to the missing schema queue in one of the files (docker-compose.production.yml perhaps)
> I have faced some issues when I was doing changes in frontend and handling Cypress in development mode. The frontend code seems not to be shared within the docker container.
Does the Cypress Docker Compose configuration use VOLUMEs?
Cypress uses docker-compose.cypress.yml in CI and the development docker-compose.yml otherwise.
Edit: I forgot to mention the volumes haha: the first one doesn't use them and the second one does.
However, it uses http://localhost:5000, which I guess doesn't use webpack to watch files, so the frontend in this case only updates after a rebuild. The two options I see to make it friendlier to the developer would be either adding an npm run start to a frontend container in docker-compose.yml or adding this outside Docker in the Cypress scripts.
I think we should add to the instructions to run npm run build before running Cypress tests. Running npm in the container is not possible, because the container will not have Node (currently it does, but that is temporary).
docker-compose.yml
@@ -29,7 +29,7 @@ services:
       REDASH_LOG_LEVEL: "INFO"
       REDASH_REDIS_URL: "redis://redis:6379/0"
       REDASH_DATABASE_URL: "postgresql://postgres@postgres/postgres"
-      QUEUES: "queries,scheduled_queries,celery"
+      QUEUES: "queries,scheduled_queries,celery,schemas"
Awesome! Updating docker-compose.cypress.yml did the trick! I didn't realize Cypress had its own yml file. Thank you for your help @gabrieldutra!
You're welcome @emtwo! 🙂
99cfad1 to 7a5d95d
e6be093 to 3c4e8c8
I've rebased the PR again and the original percy issue is fixed. Note that currently percy is failing for an expected reason - there are 2 new tables added - |
Don't forget to add the |
3c4e8c8 to 95d3ff6
I've added the schemas queue in a couple of other spots as you suggested. However, I was hesitant at first to add it since the
I will do a review of all the Docker Compose files and add I do realize now that everyone who is using the AMIs we build uses a Docker Compose setup without this queue, which means that: 1) this queue is growing in size, but nothing is processing it; 2) they don't get schema refreshes. 🤦♂️
Change of plans: #3325.
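A quick way to spot worker configurations that never consume the schemas queue is to compare each QUEUES value against the queue name. The sketch below is only an illustration: the `workers_missing_queue` helper and the `before`/`after` labels are made up for the example, while the two QUEUES strings are the ones from the docker-compose.yml diff in this thread.

```python
def workers_missing_queue(worker_queues, required="schemas"):
    """Return the names of worker configs whose QUEUES value omits `required`."""
    missing = []
    for name, queues in worker_queues.items():
        # QUEUES is a comma-separated list; tolerate stray whitespace.
        listening = {q.strip() for q in queues.split(",") if q.strip()}
        if required not in listening:
            missing.append(name)
    return sorted(missing)


# QUEUES values from the diff in this thread; the labels are illustrative.
configs = {
    "before": "queries,scheduled_queries,celery",
    "after": "queries,scheduled_queries,celery,schemas",
}

print(workers_missing_queue(configs))  # ['before']
```

Any configuration reported here would enqueue schema refresh tasks that no worker picks up, which matches the two symptoms listed above.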
@@ -198,21 +198,25 @@ def delete(self):
         return res
 
     def get_schema(self, refresh=False):
-        key = "data_source:schema:{}".format(self.id)
Is removing this redis caching of schema information intentional? Is there a performance impact?
Thanks for pointing this out @washort!
It was intentional because from what I recall back in the Berlin work-week, I think @arikfr was saying he felt that using redis to store schema was a bit of a hack and he would prefer it stored in a table. Of course, we could be storing the data in tables and have additional caching for performance, but I felt this added complexity of maintaining both a cache and tables for the same data was perhaps not worth the performance gain.
I did a quick test on my machine: over 5 runs of the old vs. the new get_schema() function, the redis one averages 7.2ms per call and this one (from this PR) averages 44ms per call. It's a big relative difference, but 44ms isn't so bad. Though of course this could be worse in different scenarios, e.g. a slower network/machine or more data. I suppose I will defer this decision to @arikfr
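For anyone who wants to repeat this kind of comparison, a small timing harness along these lines would do. It is a sketch, not the script used for the numbers above: `average_ms`, `slowdown`, and the two stand-in lookup functions are hypothetical, and the stand-ins should be swapped for the real old and new `get_schema()` calls.

```python
import time


def average_ms(fn, runs=5):
    """Average wall-clock time of fn() over `runs` calls, in milliseconds."""
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) * 1000.0 / runs


def slowdown(old_ms, new_ms):
    """How many times slower the new implementation is than the old one."""
    return new_ms / old_ms


# Stand-ins for the two get_schema() variants; replace with the real calls.
def cached_lookup():
    return {"users": ["id", "email"]}


def table_lookup():
    return sorted({"users": ["id", "email"]}.items())


print("cached: %.3f ms" % average_ms(cached_lookup))
print("tables: %.3f ms" % average_ms(table_lookup))
# Using the figures reported in the comment above (7.2ms vs. 44ms):
print("reported slowdown: %.1fx" % slowdown(7.2, 44))  # reported slowdown: 6.1x
```

Five runs is a small sample, so a larger `runs` value and a realistic schema size would give a more trustworthy picture of the gap.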
Great, just curious.
redash/models/__init__.py
@@ -67,6 +67,56 @@ def get(self, query_id):
 scheduled_queries_executions = ScheduledQueriesExecutions()
 
 
+@python_2_unicode_compatible
+class TableMetadata(db.Model):
you'll need a migration to create these tables
Ah! I had missed this, thank you!
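As a rough picture of what such a migration has to do, here is an upgrade/downgrade pair written against the standard-library `sqlite3` module. It is only a sketch under stated assumptions: Redash migrations are not written this way, and the column names, the unique constraint, and the cascade rule are guesses based on the commit notes ("Tables should have a unique name per datasource", "Cascade delete"), not the PR's actual schema.

```python
import sqlite3


def upgrade(conn):
    """Create the two metadata tables (hypothetical columns)."""
    conn.executescript("""
        CREATE TABLE table_metadata (
            id INTEGER PRIMARY KEY,
            data_source_id INTEGER NOT NULL,
            name TEXT NOT NULL,
            UNIQUE (data_source_id, name)
        );
        CREATE TABLE column_metadata (
            id INTEGER PRIMARY KEY,
            table_id INTEGER NOT NULL
                REFERENCES table_metadata(id) ON DELETE CASCADE,
            name TEXT NOT NULL,
            type TEXT
        );
    """)


def downgrade(conn):
    """Drop the tables in reverse dependency order."""
    conn.executescript("""
        DROP TABLE column_metadata;
        DROP TABLE table_metadata;
    """)


def table_names(conn):
    """List user tables, to check what a migration step left behind."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]
```

Running `upgrade` followed by `downgrade` on a scratch database and checking `table_names` after each step is a cheap way to confirm the pair is symmetric before wiring it into the real migration tooling.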
95d3ff6 to 2ac1607
2ac1607 to 0c1813c
redash/query_runner/presto.py
-for row in results['rows']:
+for i, row in enumerate(results['rows']):
Similar blocks of code found in 2 locations. Consider refactoring.
for row in results['rows']:
table_samples = {}

for i, row in enumerate(results['rows']):
Similar blocks of code found in 2 locations. Consider refactoring.
redash/tasks/queries.py
persisted_table = models.db.session.query(
    TableMetadata).filter(
    TableMetadata.table_name==table_name).filter(
    TableMetadata.data_source_id==ds.id).first()
Missing whitespace around operator
0c1813c to f072e64
* Process extra column metadata for a few sql-based data sources.
* Add Table and Column metadata tables.
* Periodically update table and column schema tables in a celery task.
* Fetching schema returns data from table and column metadata tables.
* Add tests for backend changes.
* Front-end shows extra table metadata and uses new schema response.
* Delete datasource schema data when deleting a data source.
* Process and store data source schema when a data source is first created or after a migration.
* Tables should have a unique name per datasource.
* Addressing review comments.
* Update migration file for mixins.
* Appease PEP8
* Upgrade migration file for rebase.
* Cascade delete.
* Adding org_id
* Remove redundant column and table prefixes.
* Non-existing tables and columns should be filtered out on the server side not client side.
* Fetching table samples should be optional and should happen in a separate task per table.
* Allow users to force a schema refresh.
* Use updated_at to help prune old schema metadata periodically.
* Using settings.SCHEMAS_REFRESH_QUEUE
* fix for getredash#2426 test
* more stable test_interactive_new
* Closes #927, #928: Schema refresh improvements.
* Closes #934, #935: Remove type from schema browser and don't show empty example column in schema drawer (#936)
* Speed up schema fetch requests with fewer postgres queries.
* Add column metadata to Athena glue processing.
* Fix bug assuming 'metadata' exists for every table.
* Closes #939: Persisted, existing table metadata should be updated.
* Sample processing should be rate-limited.
* Add cli command for refreshing data samples.
* Schema refreshes should not overwrite column 'example' field.
* refresh_samples() should filter tables_to_sample on the datasource's id being sampled
* Correctly wrap long text in schema drawer.
Co-authored-by: Alison <[email protected]>
Schema Improvements Part 2: Add data source config options.
Adding BigQuery schema drawer with data types and samples.
Add empty migration to replace the removed schedule_until migration
Add merge migration.
Fix spacing issue with data scanned value in query execution metadata.
Increase schema refresh timeout.
Remove old migrations.
Co-authored-by: Alison <[email protected]>
Co-authored-by: Jannis Leidel <[email protected]>
This is a fresh PR with the code from #2990 rebased and linted.
It is ready for review now. This PR is the first of a series of PRs for schema enhancements. I will link the subsequent PRs here as they become available.
[1] Schema viewer drawer #3291 (this one)
[2] Schema admin configuration #3292
[3] Schema query samples #3293
[4] Data source descriptions #3401