Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

External functions #567

Merged
merged 4 commits into from
Dec 31, 2024
Merged

External functions #567

merged 4 commits into from
Dec 31, 2024

Conversation

penberg
Copy link
Collaborator

@penberg penberg commented Dec 28, 2024

This pull request adds support for external functions, which are functions provided by extensions. The main difference to how things are today is that extensions can register functions to a symbol table at runtime instead of specifying an enum variant.

@penberg penberg force-pushed the generic-functions branch 6 times, most recently from ed0d996 to 90cd98f Compare December 29, 2024 08:55
@penberg penberg requested a review from jussisaurio December 29, 2024 08:55
@penberg penberg marked this pull request as ready for review December 29, 2024 08:55
@jussisaurio
Copy link
Collaborator

jussisaurio commented Dec 29, 2024

Reviewing this, I noticed a funny bug we have ATM - nonexistent functions in result columns just get ignored completely (#573)

limbo> select this_definitely_exists('lol');
NULL

compare:

limbo> select * from users where id = this_definitely_exists('lol');
Parse error: unknown function this_definitely_exists

Bug is due to this line: https://github.com/tursodatabase/limbo/blob/main/core/translate/planner.rs#L365


Anyway, this got me thinking whether we should try to validate and/or bind functions already in planner.rs even before going into bytecode emission. We are already doing stuff like resolve_aggregates() and bind_column_references() there, so bind_function_references() would fit there too.

Advantages:

  1. we wouldn't have to thread the symbol table into the bytecode emission phase at all (there is quite a large amount of threading through function invocations right now just to use it in one place in expr.rs - oispa monadeja).
  2. when we translate expressions in bytecode emission, we guarantee that all references to functions have already been bound to a function implementation or an error returned (similarly to how we know at bytecode emission time that all column references have been validated)

Disadvantages:

We would probably have to add another new custom AST node like ast::Expr::KnownFunction that has a reference to a function enum member (or GenericFunc) instead of a string. That, or add a new optional field to ast::Expr::FunctionCall that is a reference to a known function. Either way, at function "binding time" we would mutate the original AST node to the "strongly typed" one

@BobKerns
Copy link

I'd like to raise an issue of terminology wrt 'Generic Function'.

I realize there aren't enough terms to go around, but the term has been used for functions that operate differently depending on what objects they're called on (polymorphism). This usage originated in the early 1980s with Lisp, in Howard Cannon's Flavors object system, predecessor to the Common Lisp Object System.

More important than the history is that it is in wide usage today, in languages from C++ to Python, Typescript, and R. There's even a Wikipedia page on it.

Also relevant is that, to me at least, "generic" doesn't convey anything meaningful here. "External function" might be more suggestive. (At least, on a quick scan, I didn't see anything suggesting any sort of polymorphic dispatch).

I'm not going to insist that the term be used the same way in all contexts for all time! I just want to flag the potential for confusion so you can mitigate it, either with different terminology or taking care in how you describe it.

Or: Make them actually generic functions, with externally-loadable implementations for different types, so a plugin that defines, say, a GIS coordinate and a plugin that defines a geometric coordinate can both define a distance(p1, p2) function, giving great circle or cartesian distances. As a bonus, this side-steps namespace collision issues. Implementation and semantics can be straightforward if you limit dispatch to the first argument, as was the case with the original Flavors version. Downside is you cannot validate the call signature until the type of the object is known, which may be later than your current validation of the AST.

Copy link
Collaborator

@pereman2 pereman2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I agree with @BobKerns that it makes sense to rename to something like ExternalFunction. Nevertheless, looks great

@jussisaurio
Copy link
Collaborator

Nevertheless, looks great

There's a potential problem w/ waiting this late to resolve the function in case it's an aggregate: #576 (comment)

Ofc we can for now just do double the work and resolve the function using the symbol table in the planner as well

@penberg penberg changed the title Generic functions External functions Dec 31, 2024
@penberg penberg force-pushed the generic-functions branch 2 times, most recently from d34a166 to e633003 Compare December 31, 2024 11:50
The database object is a way to represent state that's shared across
multiple connections. We don't want to release that object until all
connections are closed.
We don't need them anywhere and they make it hard to introduce
GenericFunction.
@penberg penberg merged commit b1ca2b0 into main Dec 31, 2024
36 checks passed
@penberg penberg deleted the generic-functions branch December 31, 2024 12:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants