Feature idea: make suggestions for schema improvements, such as deleting empty columns #44

Open
simonw opened this issue Nov 5, 2023 · 3 comments
Labels
enhancement New feature or request

Comments

@simonw
Owner

simonw commented Nov 5, 2023

I've been working with a CSV which turns out to have a bunch of columns with no data. I manually scrolled down the list and clicked "delete" next to each column with no examples shown, but a feature that suggests that and then clicks all the buttons for me could be neat.

Related: could also suggest deleting all columns which only have a single distinct value in them.
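A rough sketch of how both checks could be computed, using only the standard-library `sqlite3` module. The function name `suggest_column_deletions` is hypothetical and not part of datasette-edit-schema; it also assumes that "no data" means every value is NULL or an empty string, and that NULLs and empty strings are ignored when counting distinct values.

```python
import sqlite3


def suggest_column_deletions(conn, table):
    """Return (empty_columns, single_value_columns) for a table.

    Hypothetical helper for illustration only.
    """
    # Row index 1 of PRAGMA table_info is the column name
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    empty, single = [], []
    for col in columns:
        # Assumption: NULL and '' both count as "no data"
        sql = (
            f'SELECT COUNT(DISTINCT "{col}") FROM "{table}" '
            f'WHERE "{col}" IS NOT NULL AND "{col}" != ?'
        )
        distinct = conn.execute(sql, ("",)).fetchone()[0]
        if distinct == 0:
            empty.append(col)
        elif distinct == 1:
            single.append(col)
    return empty, single
```

This runs one query per column, which is fine for a small CSV import but could get slow on very wide or very large tables.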

@simonw simonw added the enhancement New feature or request label Nov 5, 2023
@simonw
Owner Author

simonw commented Nov 5, 2023

Rough UI mockup:

[Screenshot: rough UI mockup]

@simonw simonw changed the title Feature idea: delete all empty columns Feature idea: delete all empty columns and other suggestions Nov 5, 2023
@simonw
Owner Author

simonw commented Nov 5, 2023

More ideas for suggestions:

  • Rename columns with spaces and hyphens in them to their cleaned-up underscore alternative
  • Apply a primary key if there is only one column that looks good as a primary key option
  • Foreign key suggestions to other tables could go up here too
  • Suggest type conversions: if a column is entirely digits, suggest converting it to an integer
  • Maybe even suggest a better name for a table, could use LLM trickery here via a new plugin hook?
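Two of the ideas above (cleaned-up column names and digit-only columns) could be sketched roughly like this. Both function names are hypothetical, and the rules are assumptions: runs of whitespace or hyphens collapse to a single underscore, and "entirely digits" means every non-NULL, non-empty value contains only the characters 0-9.

```python
import re
import sqlite3


def suggest_clean_name(name):
    """Return an underscore version of the name, or None if no change is needed."""
    cleaned = re.sub(r"[\s-]+", "_", name.strip())
    return cleaned if cleaned != name else None


def suggest_integer_columns(conn, table):
    """Return columns not declared INTEGER whose values are all digits."""
    suggestions = []
    for _, col, col_type, *_ in conn.execute(f'PRAGMA table_info("{table}")'):
        if col_type.upper() == "INTEGER":
            continue
        # GLOB '*[^0-9]*' matches any value containing a non-digit character
        sql = (
            f'SELECT COUNT(*), COALESCE(SUM(CAST("{col}" AS TEXT) GLOB ?), 0) '
            f'FROM "{table}" WHERE "{col}" IS NOT NULL AND "{col}" != ?'
        )
        total, non_digit = conn.execute(sql, ("*[^0-9]*", "")).fetchone()
        if total > 0 and non_digit == 0:
            suggestions.append(col)
    return suggestions
```

One caveat with the integer check: values such as ZIP codes with leading zeros would pass it but lose information on conversion, so this should stay a suggestion rather than an automatic change.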

If suggestions were driven by a plugin hook there could be fancy ones like spotting location (lat, lon) columns and suggesting splitting those into latitude and longitude columns, but that gets a LOT harder as now we are doing data conversions in addition to just editing the schema with .transform().
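To illustrate why the location case is a data conversion rather than a schema edit, here is a minimal sketch of the per-value parsing it would need. The `split_location` helper and the accepted format (two decimal numbers separated by a comma, optionally wrapped in parentheses) are assumptions made for this example only.

```python
import re

# Assumed format: "(lat, lon)" or "lat, lon" with plain decimal numbers
_LOCATION_RE = re.compile(
    r"^\s*\(?\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)?\s*$"
)


def split_location(value):
    """Return (latitude, longitude) as floats, or None if value does not parse."""
    if not isinstance(value, str):
        return None
    match = _LOCATION_RE.match(value)
    if not match:
        return None
    lat, lon = float(match.group(1)), float(match.group(2))
    # Reject pairs outside the valid coordinate ranges
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None
    return lat, lon
```

Every row has to be read, parsed and written back into two new columns, which is the part that goes beyond what a schema-only transformation covers.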

Maybe this feature should live outside the datasette-edit-schema plugin? That way it could include features that modify data directly. It could also suggest things like "why not set up FTS against this column with lots of text in it?".

@simonw
Owner Author

simonw commented Nov 5, 2023

There's also something interesting about generating these suggestions as an offline process (or a separate asyncio task in the same process not connected to the current request) - that way Datasette could do expensive things like scan millions of rows for potential columns that could be converted to a date.
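The "separate asyncio task" variant might look something like the sketch below. Everything here is hypothetical: `suggestions_cache`, `expensive_scan`, `refresh_suggestions` and `schedule_refresh` are names invented for the example, and the scan is a stand-in rather than a real date detector.

```python
import asyncio

# Hypothetical in-memory store of suggestions, keyed by table name
suggestions_cache = {}


def expensive_scan(table):
    """Stand-in for a slow, blocking scan over millions of rows."""
    return [f"{table}: example suggestion"]


async def refresh_suggestions(table):
    # Run the blocking scan in a worker thread so the event loop stays free
    suggestions_cache[table] = await asyncio.to_thread(expensive_scan, table)


def schedule_refresh(table):
    # Must be called from inside a running event loop.
    # The caller should keep a reference to the returned task so it is
    # not garbage collected before it finishes.
    return asyncio.create_task(refresh_suggestions(table))
```

A request handler could then read whatever is currently in the cache without waiting for the scan to finish. `asyncio.to_thread` needs Python 3.9 or later.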

@simonw simonw changed the title Feature idea: delete all empty columns and other suggestions Feature idea: make suggestions for schema improvements, such as deleting empty columns Nov 5, 2023