Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement cast for Table and Column #6711

Merged
merged 24 commits into from
May 19, 2023
Merged

Conversation

radeusgd
Copy link
Member

@radeusgd radeusgd commented May 16, 2023

Pull Request Description

Closes #6112

Important Notes

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • The documentation has been updated, if necessary.
  • Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
  • All code follows the
    Scala,
    Java,
    and
    Rust
    style guides. In case you are using a language not listed above, follow the Rust style guide.
  • All code has been tested:
    • Unit tests have been written where possible.
    • If GUI codebase was changed, the GUI was tested when built using ./run ide build.

@radeusgd radeusgd self-assigned this May 16, 2023
@radeusgd radeusgd force-pushed the wip/radeusgd/6112-table-cast branch 2 times, most recently from d3bc5d9 to 9b6b641 Compare May 17, 2023 11:54
@radeusgd radeusgd marked this pull request as ready for review May 17, 2023 11:56
@radeusgd radeusgd force-pushed the wip/radeusgd/6112-table-cast branch from dff87c3 to 80c22ef Compare May 17, 2023 16:54
Copy link
Member

@jdunkerley jdunkerley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First few comments will review once done morning meetings.

SQL_Type_Reference.new self.connection self.context new_expression
new_column = dialect.make_cast self.as_internal target_sql_type infer_from_database
Column.Value new_column.name self.connection new_column.sql_type_reference new_column.expression self.context
cast self value_type=Value_Type.Char on_problems=Problem_Behavior.Report_Warning =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets remove the value_type default and make the user choose.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, but why?

We wanted to have as many defaults on nodes as possible, right? So that they preview nicely in the CB when 'pre-selected'.

While for parse we should remove the default in Database, because there is no good choice; I think for cast Char is a pretty good choice - converting to Text should work in baiscally every case and is a good example of what cast does.

I can ofc change it if you think this is the right approach, but IMO this particular default is OK, so I'd like to confirm that you really want this and would appreciate understanding why.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like in cast it is a real choice of the user so I felt like it should be a users explicit choice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. But if we go by this heuristic, shouldn't we also remove defaults from stuff like select_columns, remove_columns and many more? There it is even more of an explicit user's choice which columns they want to remove - defaulting to the first one is just an arbitrary default we have to show the functionality working in the preview.

(I don't want to be stubborn but I want us to be consistent and really understand the reasoning for why we want to include defaults in X but not in Y.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree - and ok lets go with Text as the default I guess as its harmless.

I'm not sure we should have defaults on everything and this feels a more aggressive default.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we should have defaults on everything

Yeah, I think I agree.

and this feels a more aggressive default.

But I cannot understand this part - hence my opposition.

It's not super important, and I guess a bit subjective so it may be hard to explain - but could you try to narrow down what makes this a 'more aggresive' default?

I think my dissonance comes from the fact that IMO select_columns and remove_columns defaults are even 'more aggresive' than this one. Or maybe you agree that these are 'aggressive defaults' too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I know it's not the most important discussion, but just trying to understand your perspective so I can understand the decisions better)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like it's changing data in a way - in select it's basically a no-op (we just pick a column from the list).
On a billion-row table, cast is a chunky complex operation and hence I guess felt like worth getting user to choose what they want to do. Unlike parse, we can't guess.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I didn't think of the operation cost! That is a very good point indeed - thanks for explaining it.

So I can indeed remove the default, we indeed don't want to wait for a very long operation to complete just to switch the type...

@radeusgd radeusgd force-pushed the wip/radeusgd/6112-table-cast branch from 80c22ef to 110071b Compare May 18, 2023 09:55
@radeusgd radeusgd requested a review from jdunkerley May 18, 2023 09:56
Copy link
Member

@jdunkerley jdunkerley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good but have put some questions in.

@radeusgd radeusgd force-pushed the wip/radeusgd/6112-table-cast branch from 110071b to 2a12295 Compare May 18, 2023 17:34
@radeusgd radeusgd added the CI: Ready to merge This PR is eligible for automatic merge label May 18, 2023
@radeusgd radeusgd force-pushed the wip/radeusgd/6112-table-cast branch from 2a12295 to 32f9772 Compare May 18, 2023 17:47
case TimeOfDayType() -> new TimeOfDayBuilder(size);
case FloatType(Bits bits) ->
switch (bits) {
case AnyObjectType x -> new ObjectBuilder(size);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, this will prevent the incremental compilation problems as matching on type doesn't need to recognize record.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was sad to see the neat matching go, but better build stability is more important.

Copy link
Member

@Akirathan Akirathan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, so cool that we will finally get rid of the incremental compilation issue with records

@mergify mergify bot merged commit 447786a into develop May 19, 2023
@mergify mergify bot deleted the wip/radeusgd/6112-table-cast branch May 19, 2023 10:00
mergify bot pushed a commit that referenced this pull request May 19, 2023
Implements the simplest `parse` scenarios for the Database backend.

Before #6711 these could have been done by `cast`, but in #6711 the APIs were unified to only allow casting to the same set of types in both in-memory and Database. Converting Text to other types is supposed to be done by `parse` and not `cast`, so the ability to use `cast` for rudimentary parsing is removed in the Database backend to make it consistent with in-memory. But now it is lacking any, even simplest, Text->Int/Text->Date support. To alleviate that, the simple scenarios for `parse` are implemented (no support for format customization yet, will boil down to a cast under the hood).
Procrat added a commit that referenced this pull request May 22, 2023
…t-rename

* develop:
  Widgets, Vector as Column, Viz Fixes and Rename Columns (#6768)
  Implement simple variants of `parse` for the Database backend (#6731)
  Enable `require-jsdoc` lint and add two lints related to React (#6403)
  Decimal/Integer .round and .int #6654 (#6743)
  Set suggestion reexports when serializing the library (#6778)
  Fix file uploading node expression. (#6689)
  Using WarningsLibrary to query for warnings (#6751)
  Implement `cast` for Table and Column (#6711)
  Display Initializing project... message when initializing project (#6661)
  Only send suggestions updates when type changes (#6755)
  sbt runEngineDistribution --debug to ease debuggging (#6745)
  Display "modified at" column on the cloud dashboard (#6687)
  Meta.meta Integer . methods (#6740)
  Show spinner while loading directory (#6714)
  Add cloud dashboard to changelog (#6688)
  Fix list editor panics during insertion (#6540)
  Update bug-report.yml
  Remove project create form (#6710)
  Change full-screen visualisation shortcut to shift-space (#6663)
Procrat added a commit that referenced this pull request May 22, 2023
* develop: (30 commits)
  Widgets, Vector as Column, Viz Fixes and Rename Columns (#6768)
  Implement simple variants of `parse` for the Database backend (#6731)
  Enable `require-jsdoc` lint and add two lints related to React (#6403)
  Decimal/Integer .round and .int #6654 (#6743)
  Set suggestion reexports when serializing the library (#6778)
  Fix file uploading node expression. (#6689)
  Using WarningsLibrary to query for warnings (#6751)
  Implement `cast` for Table and Column (#6711)
  Display Initializing project... message when initializing project (#6661)
  Only send suggestions updates when type changes (#6755)
  sbt runEngineDistribution --debug to ease debuggging (#6745)
  Display "modified at" column on the cloud dashboard (#6687)
  Meta.meta Integer . methods (#6740)
  Show spinner while loading directory (#6714)
  Add cloud dashboard to changelog (#6688)
  Fix list editor panics during insertion (#6540)
  Update bug-report.yml
  Remove project create form (#6710)
  Change full-screen visualisation shortcut to shift-space (#6663)
  Revert "Show spinner when opening/creating a project (#6321)" (#6712)
  ...
mergify bot pushed a commit that referenced this pull request May 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: Ready to merge This PR is eligible for automatic merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement Table/Column.cast
5 participants