Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce INFORMATION_SCHEMA.ROUTINES table #13255

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

goldmedal
Copy link
Contributor

@goldmedal goldmedal commented Nov 5, 2024

Which issue does this PR close?

Part of #12144 .

Rationale for this change

As per the discussed in #12144 (comment), to support listing available functions, we should implement corresponding tables in the information schema.

What changes are included in this PR?

Refer to Postgres, MySQL, and BigQuery, I implemented the following columns for the ROUTINES table:

  • specific_catalog: Same as the session default catalog
  • specific_schema: Same as the session default schema
  • specific_name: function name
  • routine_catalog: Same as the session default catalog
  • routine_schema: Same as the session default schema
  • routine_name: function name
  • routine_type: always FUNCTION (this field could be PROCEDURE in other databases)
  • is_deterministic: if the volatility of the function signature is Immutable
  • data_type: the return type of the function
  • function_type: DataFusion custom field. It could be scalar, aggregate or window
  • description: DataFusion custom field. The description of the function document.

Are these changes tested?

yes

Are there any user-facing changes?

A new table of information_schema

@github-actions github-actions bot added logical-expr Logical plan and expressions core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) substrait execution Related to the execution crate labels Nov 5, 2024
@github-actions github-actions bot removed the substrait label Nov 5, 2024
@goldmedal
Copy link
Contributor Author

There is another useful column I wanted to add, is_null_call. Its description in the Postgres document is

If the function automatically returns null if any of its arguments are null, then YES, else NO. Null for a procedure.

However, I think we don't have enough information in the function signature to do it 🤔 .

query TTTTTTTBTTT rowsort
select * from information_schema.routines where routine_name = 'date_trunc' OR routine_name = 'string_agg' OR routine_name = 'rank';
----
datafusion public date_trunc datafusion public date_trunc FUNCTION true Timestamp(Microsecond, None) SCALAR Truncates a timestamp value to a specified precision.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type name is an ArrowType name, e.g., Timestamp (Microsecond, None). I prefer to return a SQL type. I think it would be improved if we finished the logical type epic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree -- makes sense 👍

Comment on lines +255 to +257
// TODO: Implement for other types
TypeSignature::Uniform(_, _)
| TypeSignature::Coercible(_)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a follow-up issue for inferring the types from other TypeSignatures.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should file a ticket to track adding the feature (I bet some other contributors would enjoy helping out now that you have setup the framework)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds great. I filed #13271 for it.

@goldmedal goldmedal marked this pull request as ready for review November 5, 2024 03:43
Comment on lines +304 to +305
// only handle the function which implemented [`ScalarUDFImpl::return_type`] method
let return_type = udf.return_type(&arg_types).ok().map(|t| t.to_string());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some functions only implement return_type_from_exprs, e.g., arrow_cast. It's hard to get the return type here. Return null for this case.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a great start to me -- thank you @goldmedal

I think it would be really nice to file tickets for the follow on work you have identified so we don't forget (and so that others can help out)

Comment on lines +255 to +257
// TODO: Implement for other types
TypeSignature::Uniform(_, _)
| TypeSignature::Coercible(_)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should file a ticket to track adding the feature (I bet some other contributors would enjoy helping out now that you have setup the framework)

query TTTTTTTBTTT rowsort
select * from information_schema.routines where routine_name = 'date_trunc' OR routine_name = 'string_agg' OR routine_name = 'rank';
----
datafusion public date_trunc datafusion public date_trunc FUNCTION true Timestamp(Microsecond, None) SCALAR Truncates a timestamp value to a specified precision.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree -- makes sense 👍

@alamb
Copy link
Contributor

alamb commented Nov 5, 2024

fyi @findepi

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate execution Related to the execution crate logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants