Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

User-defined metafunctions based on dynamic library loading #1261

Closed
wants to merge 9 commits into from

Conversation

DyXel
Copy link
Contributor

@DyXel DyXel commented Aug 25, 2024

This is a follow-up of #797, #907, #909 and all other related issues/discussions regarding metafunctions I might have missed. It is not necessary to have all of the previous context to follow this PR, although it would help a lot.

Special thanks to @JohelEGP for the original idea and implementation (see references above), @MaxSagebaum for #809 which is included in this PR and @edo9300 for helping with the DLL loading implementation.

Introduction

A metafunction is a special kind of function which is invoked on a declaration's reflection, and participates in defining its meaning (more in the documentation). This PR implements a mechanism that allows users to define and apply these kind of functions, as opposed to being only writable within cppfront's code.

The basic idea behind this design is that the definition of a metafunction can be compiled separately to a dynamic library (.so on Unix-like, .dll on Windows), targeting the reflection API, which can then be loaded and used by cppfront when processing further code. The major advantage of doing it this way is that metafunctions are regular code, and thus are not limited to the usual C++ compile-time evaluation restrictions, this makes it possible to use third-party libraries or execute any arbitrary code (e.g.: Open and write files, call a remote service, etc.).

Of course there are caveats as well, one in particular is that you can't define a metafunction and use it in the same compilation step (e.g.: In the same file), as the code would have needed to be compiled by the C++ compiler before cppfront can load it.

Basic Usage

Here is an example with GCC as base C++ compiler, build systems can automate the tedious parts:

metafunctions.cpp2:

greeter: @meta (inout t: cpp2::meta::type_declaration) = {
  t.add_member($R"(say_hi: () = std::cout << "Hello, world!\nFrom (t.name())$\n";)");
}

main.cpp2

my_class: @::greeter type = { }
main: ()                  = my_class().say_hi();

Steps

  1. Compile cppfront with -rdynamic: g++ cppfront.cpp -std=c++20 -rdynamic -o cppfront

  2. Transpile metafunctions.cpp2: ./cppfront metafunctions.cpp2

  3. Compile metafunctions.cpp to a dynamic library named libmetafunctions.so:

g++ -std=c++20 -I../include/ -shared -fPIC -o libmetafunctions.so metafunctions.cpp
  1. Transpile main.cpp2: CPPFRONT_METAFUNCTION_LIBRARIES=./libmetafunctions.so ./cppfront main.cpp2

  2. Compile main.cpp: g++ -std=c++20 -I../include/ main.cpp

  3. Run the resulting binary (./a.out), which outputs:

Hello, world!
From my_class

Implementation Details

Features

The main ingredients are a set of orthogonal features that combine to provide the full application:

Reflection API to generate non-member declarations

append_declaration_to_translation_unit is added, which allows a metafunction to generate code outside of the declaration that it is being applied to. Useful for things like factory functions and more.

Support applying metafunctions to functions and the @meta metafunction

The ability to apply a metafunction to a function declaration is enabled, and a @meta metafunction is added to mark would-be metafunctions in order to perform automatic registration.

Reflection API to obtain a declaration's fully qualified name

fully_qualified_name() is added, which allows obtaining the unique and final identifier of any declaration node, which can be used to refer to a specific entity from any place in code.

Public reflection API header and the internal foreign interface metafunction

The declaration of the reflection API is entirely moved to a public place that can be used by everybody including the compiler itself, it can be thought of as a "synopsis", but it is compilable code.

An internal compiler metafunction is added @_internal_foreign_interface_pseudovalue (obnoxious name on purpose), which automatically wraps the aforementioned declarations of the public header into something that allows for the full definition of the API to live somewhere else outside the library (i.e.: Inside cppfront).

The compiler's internal reflection code is refactored to provide the implementation of the interfaces created by the processing of the public header, allowing complete decoupling while keeping value semantics for the API, which also means not having to modify existing metafunctions.

Dynamic library loading and look-up mechanism

A standalone header (source/dll.h) is added that permits loading dynamic libraries on demand. Additional code that ties the loading of the dynamic libraries with the registration of metafunctions, as well as looking up names when attempting to apply a metafunction is added.

Putting All Together

When @meta is applied, it does a few things:

  • Add the support header cpp2reflect_api.h.
  • Generates an anonymous entity which receives the fully qualified name of the function as a string and its address.

The generated entity is of type register_metafunction, which is declared within cpp2reflect_api.h, and its constructor will be called when the library is loaded. The implementation of the overloads for this constructor are provided within cppfront.

Before loading a library, cppfront sets itself up so when the implementations of register_metafunction are called, they dispatch to a internal object which does bookkeeping and ensures the metafunctions can be found and called when needed.

Once cppfront tries to apply a metafunction, it will create the internal context (compiler_services_base) and a reflection API object, by filling in the interfaces with its internal implementation (provided by the renamed _impl classes in reflect.h2). In essence, the object is made up of "vtables", and the function pointers are all provided by cppfront at runtime. Prepped with this, it can proceed to do name look-up as before, it first attempts to use the internal metafunctions (the ones with nice name), followed by looking up in the loaded libraries.

@DyXel
Copy link
Contributor Author

DyXel commented Aug 25, 2024

To Do

  • Implement full name look-up for metafunctions (including interpreting using directives?)
  • Optimize "foreign interface" generated code
  • Allow using and factoring existing metafunctions (TBD if to move existing metafunctions to a separate header or if somehow load them via DLLs)
  • Test Windows
  • Do more user-friendly error reporting
  • Grep remaining TODO(DyXel)
  • Fix found bugs. Add tests?
  • Documentation?

@DyXel
Copy link
Contributor Author

DyXel commented Aug 25, 2024

There's still plenty to do, but I figured I would open the PR now to start getting reviews, as the main ideas are implemented and the basic case works. Expect lots of rebases as I go (even tho they are extra annoying due to self-hosting), since I like having a clean history.

I apologize for taking so long for this, after all, I made #907 (comment) ages ago by now. Going forward I should be able to contribute code more regularly, not just by participating in discussions.

@DyXel
Copy link
Contributor Author

DyXel commented Aug 25, 2024

For this

including interpreting using directives?

I am not too sure how to proceed. I will probably not handle that for this PR.

@DyXel
Copy link
Contributor Author

DyXel commented Aug 25, 2024

Regarding

TBD if to move existing metafunctions to a separate header or if somehow load them via DLLs

Both directions have their pros and cons.

First, for "move existing metafunctions to a separate header":

  • pro: Move lots of code from out of reflect.h2, which is already super big.
  • pro: Makes it impossible for metafunctions provided in-the-box to "cheat" by using internal compiler stuff.
  • pro: Adds separability of the ugly internal metafunctions (now just 1) and the rest.
  • pro: Is trivial to do. All that needs to be done is move the code to either at the bottom of include/cpp2reflect_api.h2, or include/cpp2reflect_library.h2 (my pick).
  • con: This duplicates the metafunction definitions, cppfront will have them by default, and there will be an extra copy for each dynamic library, increases compile time.

For "somehow load them via DLLs":

  • pro: No definition duplication, faster compile time.
  • pro: Less delta with current reflect.h2.
  • con: Harder to implement, we'd need at least an additional mechanism to export DLL symbols, as the libraries would only have their declarations, not the implementations.

If its up to me, I would go with the former, seems like a more natural choice; The "standard library" is just code, and would only ever be included if you are authoring metafunctions, and I don't even think that the additional compile time would be that much.

@MaxSagebaum
Copy link
Contributor

Nice to see the update. I would have a closer look, but I am currently on vecation. Maybe I will have time at the beginning of September.

Copy link
Contributor

@MaxSagebaum MaxSagebaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked over it and most seems good to me.

Just two general remark that I also hinted at in some comments:

  • I do not understand why we have to support in and inout calling. Wouldn't be inout calling enough. Is there a reason for this?
  • Some places are quite hardcoded for type and function metafunctions. If we want to support metafunctions on other objects like namespaces, variable declarations, using declarations, etc. Would it not be better to generalize this now. I suggested an enum and tuple approach.
  • On that note. What if somebody wants to have a meta function for everything. He would need to implement all different interface function. How about a wildcard interface function?

@hsutter What is you opinion about these three points?

//
// The verbose name is on purpose, lets try to give it a better name if it
// can be generalized into something useful for all developers.
foreign_interface_pseudovalue: (inout t : meta::type_declaration) =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to have this new metafunction? A regular @interface should do the same. Cppfront can then extend from this interface and override everything.

What did I miss?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have explained this better in the PR summary, but this is not just for an interface, we are doing more than that.

The current API uses value semantics, not pointer/reference semantics. For example, when you call declaration.as_type(), you are not getting a reference to a declaration, you are getting a value representation of the type declaration (type_declaration, not type_declaration& or type_declaration*). This presents an issue since in order to have a full value instance like this, you must know the entire definition, which in turn would mean spilling all the implementation guts of the compiler to the users of the reflection API.

Instead, by using this specific metafunction, we avoid the issue entirely by making the definitions "wrappers" that point to the actual implementation, which lives within Cppfront (or anything else that might implement the reflection API). You can't do that with just a regular @interface, since you'd need to cast to a pointer, and modify every metafunction to conform.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I understand the problem. So the options are:

  1. Have this metafunction that basically is a value wrapped interface. (Maybe rename it to something like this.)
  2. Change the call syntax of metafunctions to either use pointers or inout style.

My own choice would be 2 with inout style. But I can see also the advantages of option 1. Since the compiler services are already provided by pointer maybe we can move everything to pointer semantics.

I just thought about the reason why Herb might have chosen value semantics here. The objects are constructed on the fly. Using pointers or references would not work since the objects are only temporary and would be destroyed. This could only be solved with smartpointers, which we do not want to have for such an interface. @hsutter is this correct?

If so, your solution should be generalized.

//
// meta - mark f as a metafunction and generate registration code for it
//
meta: (inout f: meta::function_declaration) =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need both in passing style and inout passing style. inout would also cover all cases. Since both exist it can not be because of security or speed reasons.

I would suggest to drop thee in passing style.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's currently a metafunction like print that uses in parameter style since it doesn't modify the declaration. whether or not we want to support those is up to Herb really, but since in is the default parameter I feel like I can't leave it out for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I know. I am just asking the question: Was the in style because of some special thoughts or because it was just the minimal required definition for this function.

// * function_declaration or type_declaration (implicitly deduced to be the above), AND,
// * either in or inout paramater passing style, AND,
// * to be publicly accessible, AND,
// * to not be on header file, AND,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They could be in a header file. If this header is just included in one translation unit it would be valid code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, tbh I am not sure if we should allow it or not, in any case we can detect multiple registration of the same function, which is a clear ODR violation, but it can be entirely avoided if we just prevent users from writing stuff in headers in the first place.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Maybe have extra file extension for defining metafunctions.

Registering a function twice should not be a problem if the function pointer is the same. Having the same name with a different function pointer should be a problem.

else {
error( "unrecognized metafunction name: " + name );
error( "unrecognized metafunction name: '" + name + "' for type declaration" );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the detection logic should be changed. We should have two lists:

  • Internal metafunctions
  • External metafunctions

First look up the internal ones, afterwards the external ones. If both are not found create an error that lists all.

This would reduce the number of places which need to change when a new metafunction is implemented.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if we should dump every single metafunction that could be registered, for big programs the list could grow quite big. This does look first on internals and then externals second though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently we print out a list of all metafunctions. Printing out all external metafunctions would be a logical consequence. It also helps the user to debug. If he does not find the metafunction in the list it is not loaded. He might had a typo and can see how the metafunction is actually called.

  • Printing out everything could be enabled via a compiler flag
  • Or a flag can force cppfront to print out all registered metafunctions.

// TODO(DyXel): Do full/better name look-up.
auto search = functions.find(name);
if(search == functions.end()) return false;
auto& entry = search->second;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leave an empty line here. The early out is hardly recognizeable. Maybe also add a comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing it out, it could be great to have #1253 so I can stop worrying about stylistic choices. In any case, this code is throwaway since I need to implement full registration and look-up.

Comment on lines +78 to +88
auto visitor = [&](auto&& v) -> bool {
if constexpr(requires { v(decl); }) {
v(decl);
return true;
}
return false;
};
if constexpr(std::is_same_v<meta::type_declaration, CPP2_TYPEOF(decl)>)
return std::visit(visitor, entry.type);
else
return std::visit(visitor, entry.func);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty involved. Why can you not call entry.type directly.

Would it be better to have two functions try_apply_type and try_apply_function. The lookup could be refactored into its own method and then the correct method could directly be called.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like I mentioned, this code is throwaway but I personally don't feel like its involved. This code is very tame compared to the lambda overload trick I had before 😛 (visit is just a mess to use, what can I say).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. ;)

friend meta::register_metafunction;

void register_one(std::string name, auto* func) {
// TODO(DyXel): Check if already registered
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep in mind that a metafunction can be written for functions and types. A check should check only of the function pointer is null, not the existence of the entry in the map.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thanks for pointing it out, I am gonna delete most of this when I get full name look-up.

void register_one(std::string name, auto* func) {
// TODO(DyXel): Check if already registered
auto& entry = functions[name];
if constexpr(requires { entry.type = func; })
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this requires { entry.type = func; } hard to read.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it? this code just means "if we can assign to the type variant, assign to it, otherwise assign to the func variant. Again, the alternative would be a complicated lambda overload trick mess to assign the value to the correct variant. Same observation as above applies, this is all throwaway code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With respect to the lambda it is probably simpler to read. I might just need to get used to it.

Comment on lines +35 to +46
struct function_entry
{
std::variant<
std::monostate,
meta::mf_sign_type_in*,
meta::mf_sign_type_inout*
> type = {};
std::variant<
std::monostate,
meta::mf_sign_func_in*,
meta::mf_sign_func_inout*
> func = {};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach is very type safe, but requires all the other methods to distinguish between function and and type metafunctions. Would it be possible to have an enum for type, function, etc. We could then define a tuple that could be accessed via the enum value.

There could even be a consexpr lookup from the argument type of try_apply to the index in the tuple.

This would remove a lot of code duplication and would make it simpler to add new metafunction for different kinds, e.g. using decl, variable decl, member function decl, member variable decl, namespace dcl, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we need a better approach what would avoid a lot of code duplication for what is essentially "overload handling". I'll keep it in mind when designing the full name look-up solution.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Comment on lines +137 to +141
[&](
std::string_view caller,
std::string const& name,
meta::type_declaration& decl
) -> bool { return libraries.try_apply(caller, name, decl); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not directly provide the libraries object?

Maybe because reflect_libraries would then need to be defined in cpp2reflect_api.h?

If so, would it make sense to move apply_metafunction out of cpp2reflect_api into reflect_libraries and just call a function once in cpp2reflect_api which registers all the internal metafunctions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if I follow... The main reason we do this wrapping and pass a lambda instead of passing the object, is to entirely decouple the existence of reflect_libraries with the rest of the implementation, it has nothing to do with exposing things in reflect_api or anything like that. Same thing is done for printing errors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I can understand the decoupling argument.

@DyXel
Copy link
Contributor Author

DyXel commented Sep 6, 2024

Let me more broadly answer your questions Max, we still also need feedback from Herb of course:

I do not understand why we have to support in and inout calling. Wouldn't be inout calling enough. Is there a reason for this?

Technically, we don't need it. When you call a metafunction you always have a mutable reference of the declaration available, and in fact, you wouldn't be able to have in and inout overloads, since it would be ambiguous. However, like I mentioned in one of the comments, in is the default argument parameter, and there might be metafunctions that you author which just read the declaration and generate information from rather than mutating it (like the aforementioned @print), so to me it makes sense to support in, I call this more a "comfort and clarity feature".

At the compiler code level, we can't cast a in metafunction to inout, that would lead to UB afaik, so registering and invocation must happen with the original function signature.

Some places are quite hardcoded for type and function metafunctions. If we want to support metafunctions on other objects like namespaces, variable declarations, using declarations, etc. Would it not be better to generalize this now. I suggested an enum and tuple approach.

I wouldn't be against a composition-over-inheritance approach, but this does mean big changes for the API, and subsequently, to the existing metafunctions. If we decide to go ahead with big API changes, I would also suggest completely decoupling compiler_services from the rest of the declaration types, possibly having 2 arguments for the metafunction rather than one.

On that note. What if somebody wants to have a meta function for everything. He would need to implement all different interface function. How about a wildcard interface function?

I am not sure a "metafunction for everything" is the right approach, we don't want to directly compete with the upcoming reflection support... To be discussed further I guess.

Regarding "wildcard interface function", the declaration type from which all other declarations inherit provides a base that the users can extend.

@MaxSagebaum
Copy link
Contributor

I wouldn't be against a composition-over-inheritance approach, but this does mean big changes for the API, and subsequently, to the existing metafunctions. If we decide to go ahead with big API changes, I would also suggest completely decoupling compiler_services from the rest of the declaration types, possibly having 2 arguments for the metafunction rather than one.

Why would this be a big API change? We would only change the lookup logic. The interface of the metafunctions would not be changed.

Regarding "wildcard interface function", the declaration type from which all other declarations inherit provides a base that the users can extend.

This would then require pointer or reference semantics, I think.

@hsutter
Copy link
Owner

hsutter commented Sep 24, 2024

Thanks! As mentioned in #1287, I'm very interested in this kind of thing, but won't have the bandwidth to look at it for a while. So for now I'll close this, but do feel free to keep trying out the idea in a clone of the repo!

Thanks again for understanding.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants