Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

List available functions (SHOW FUNCTIONS) #12144

Open
findepi opened this issue Aug 24, 2024 · 13 comments · May be fixed by #12266
Open

List available functions (SHOW FUNCTIONS) #12144

findepi opened this issue Aug 24, 2024 · 13 comments · May be fixed by #12266
Labels
enhancement New feature or request

Comments

@findepi
Copy link
Member

findepi commented Aug 24, 2024

Is your feature request related to a problem or challenge?

I as a user would want to see a list of available functions.

Describe the solution you'd like

SHOW FUNCTIONS [ LIKE <pattern> ];

Describe alternatives you've considered

No response

Additional context

No response

@findepi findepi added the enhancement New feature or request label Aug 24, 2024
@findepi findepi linked a pull request Aug 30, 2024 that will close this issue
@jayzhan211
Copy link
Contributor

jayzhan211 commented Sep 1, 2024

I think we should support table function similar to https://duckdb.org/docs/sql/functions/overview.html

We could not only get the function name but also function_type, return_type, parameters ... and so on

select * from datafusion_functions()

@findepi
Copy link
Member Author

findepi commented Sep 1, 2024

We could not only get the function name but also function_type, return_type, parameters ... and so on

absolutely, more info than just function name is a good thing to have

I think we should support table function similar to https://duckdb.org/docs/sql/functions/overview.html

The SQL standard way of presenting function information seems to be via information_schema.routines view. This is similar to how tables' information is queryable. But then we still have SHOW TABLES on top of information_schema.tables for easier access.

We can certainly present this information as a table function (instead of, or additionally to).
What would be the main advantage of doing so?

@jayzhan211
Copy link
Contributor

jayzhan211 commented Sep 2, 2024

We can certainly present this information as a table function (instead of, or additionally to).
What would be the main advantage of doing so?

We could easily apply any query on the table function like

SELECT DISTINCT ON(function_name)
    function_name,
    function_type,
    return_type,
    parameters,
    parameter_types,
    description
FROM duckdb_functions()
WHERE function_type = 'scalar'
  AND function_name LIKE 'b%'
ORDER BY function_name;

and we can get a subset of information with select

Unlike the syntax here, it is completely different so we need additional syntax support. It is also not easy to select the subset of information

https://docs.snowflake.com/en/sql-reference/sql/show-functions

SHOW FUNCTIONS [ LIKE '<pattern>' ]
               [ IN
                    {
                      ACCOUNT                       |

                      CLASS <class_name>            |

                      DATABASE                      |
                      DATABASE <database_name>      |

                      SCHEMA                        |
                      SCHEMA <schema_name>          |
                      <schema_name>
                    }
               ]

@findepi
Copy link
Member Author

findepi commented Sep 3, 2024

We could easily apply any query on the table function like

This is a good point.
Would we get similar benefits if we implemented information_schema.routines view (which SQL spec happens to define)?

In any case, writing a full query to information_schema.* or to a table function, is a few key strokes. That's probably why we have SHOW TABLES statement even though the information is also available via information_schema.tabes and is more queryable there (one can select desired columns or apply filters).

Summing up we have discussed these implementation options

  1. information_schema.routines (as defined by the SQL spec)
  2. SHOW FUNCTIONS (similar to SHOW TABLES and similar (but not identical) to some query engines like Snowflake, Trino/Presto or MySQL)
  3. datafusion_functions() table function (similar to DuckDB's duckdb_functions())

Which one or which ones do we want to implement?
How do arrive at this decision to make sure it's not questioned later by some project maintainer(s) that I failed to involve in the discussion?

@jayzhan211
Copy link
Contributor

jayzhan211 commented Sep 3, 2024

I like the idea of datafusion_functions() as a table. I also prefer existing system (Postgres, DuckDB is my preference) than spec.

How do arrive at this decision to make sure it's not questioned later by some project maintainer(s) that I failed to involve in the discussion?

We could wait for a few days for more opionion. Don't worry about being questioned later on. Datafusion is a continuously evolving and breaking change friendly project, if there comes out a better idea we could review it again

@alamb
Copy link
Contributor

alamb commented Sep 6, 2024

It is my opinion we should follow the same pattern as implemented for SHOW TABLES as it seems to work well and allows for easier integration into third-party tools (like BI tools) that may query such tables for information.

Specifically, I think that means:

  1. implement the information_schema schema (potentially with some additional custom columns for DataFusion like help_text), etc
  2. Implement SHOW FUNCTIONS as a query on top of the information schema
  3. (maybe this is wishful thinking) automatically create function documentation by scraping the code (as we do for configuration values)

@alamb
Copy link
Contributor

alamb commented Sep 6, 2024

We could add a datafusion_functions table function in datafusion-cli or other downstream implementation perhaps

@alamb
Copy link
Contributor

alamb commented Sep 11, 2024

(maybe this is wishful thinking) automatically create function documentation by scraping the code (as we do for configuration values)

I filed #12432 to track this idea

@goldmedal
Copy link
Contributor

Is there any update on this issue? In my downstream project, Wren AI, we would register various custom functions (UDF, UDAF, etc.) dynamically. This feature would be useful in our scenario for getting the available functions in the current session.

Currently, I can only access functions through SessionState::scalar_functions() or similar methods. However, obtaining parameter type or return type information is challenging.

As @alamb suggested, I think we can begin by implementing information_schema.

@alamb
Copy link
Contributor

alamb commented Oct 28, 2024

We are pretty close to completing the project to have the documentation programatically available

@goldmedal
Copy link
Contributor

We are pretty close to completing the project to have the documentation programatically available

I checked the documents. It's very good for human reading but I think it doesn't have the type information (parameters and return_type). They're important for the third-party tool to integrate with DataFusion-based applications.

Following the information_schema way, I think we need routines and parameters (They're the Postgres tables but I think we don't need to implement all the column of theirs).

@findepi are you still working on this?
I would like to push this feature. If you don't have bandwidth, I could help to implement the information_schema parts.

@findepi
Copy link
Member Author

findepi commented Oct 31, 2024

I paused this for now. @goldmedal go for it.

@goldmedal
Copy link
Contributor

goldmedal commented Nov 5, 2024

I created #13255 for the routines table. Just organize the related works here:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants