Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: automatically create macro-definition for EXPORT_MACRO FOO in generated files #19422

Open
h-vetinari opened this issue Nov 27, 2024 · 3 comments

Comments

@h-vetinari
Copy link
Contributor

h-vetinari commented Nov 27, 2024

I recently retried to convert a project (https://github.com/google/sentencepiece/) using protobuf to build shared libraries by default. In doing so, my first approach was to change

-  protobuf_generate_cpp(SPM_PROTO_SRCS       SPM_PROTO_HDRS       sentencepiece.proto)
-  protobuf_generate_cpp(SPM_MODEL_PROTO_SRCS SPM_MODEL_PROTO_HDRS sentencepiece_model.proto)
+  protobuf_generate_cpp(SPM_PROTO_SRCS       SPM_PROTO_HDRS       EXPORT_MACRO SPM_DLL sentencepiece.proto)
+  protobuf_generate_cpp(SPM_MODEL_PROTO_SRCS SPM_MODEL_PROTO_HDRS EXPORT_MACRO SPM_DLL sentencepiece_model.proto)

This correctly adds markers where necessary in the generated .ph.h files, with the crucial exception of actually defining the macro itself, leading to error of the kind:

%SRC_DIR%\build\src\sentencepiece.pb.h(45): error C2079: 'SentencePieceText_SentencePiece' uses undefined struct 'SPM_DLL'

At first I tried to ensure that a header common.h is present everywhere that sentencepiece.pb.h is included, but that doesn't work for sentencepiece.pb.cc for example. More importantly, sentencepiece installs sentencepiece.pb.h, which is a header that other projects rely on (I can provide examples), so I cannot just depend on putting the definition in another header, AFAICT it has to be in the generated file itself.

I ended up cobbling together a handspun protoc plugin that injects the required definition into generated files (not least based on this and having stumbled over @@protoc_insertion_point(includes) while searching in the repo), but this feels like overkill for such a simple problem (aside from requiring yet another language - python - because I was too lazy to write this in C++ directly).

the plugin
import sys
from google.protobuf.compiler import plugin_pb2 as plugin

payload = """
// SPM_DLL
// inspired by https://github.com/abseil/abseil-cpp/blob/20220623.1/absl/base/config.h#L730-L747
//
// When building sentencepiece as a DLL, this macro expands to `__declspec(dllexport)`
// so we can annotate symbols appropriately as being exported. When used in
// headers consuming a DLL, this macro expands to `__declspec(dllimport)` so
// that consumers know the symbol is defined inside the DLL. In all other cases,
// the macro expands to nothing.
// Note: SPM_DLL_EXPORTS is set when building shared libsentencepiece
//       SPM_DLL_IMPORTS is set as part of the interface for consumers of the DLL
#if defined(SPM_DLL_EXPORTS)
#  define SPM_DLL __declspec(dllexport)
#elif defined(SPM_DLL_IMPORTS)
#  define SPM_DLL __declspec(dllimport)
#else
#  define SPM_DLL
#endif

"""

def main():
    # Read CodeGeneratorRequest from stdin
    data = sys.stdin.buffer.read()
    request = plugin.CodeGeneratorRequest()
    request.ParseFromString(data)

    # Prepare CodeGeneratorResponse
    response = plugin.CodeGeneratorResponse()

    for proto_file in request.proto_file:
        # Determine the name of the generated .pb.h file
        pb_h_file = proto_file.name.replace(".proto", ".pb.h")

        # Target the generated .pb.h with the plugin
        file_response = response.file.add()
        file_response.name = pb_h_file
        # Insert the required macro, see
        # https://protobuf.dev/reference/cpp/cpp-generated/#plugins
        file_response.insertion_point = "includes"
        file_response.content = payload

    # Write CodeGeneratorResponse to stdout
    sys.stdout.buffer.write(response.SerializeToString())

if __name__ == "__main__":
    main()

I don't think the _IMPORTS part is handled automagically in CMake, but is opt-in for library consumers, or, if it should always be set, one can do:

target_compile_definitions(sentencepiece INTERFACE SPM_DLL_IMPORTS)

which is what the comment in payload refers to.


Not only did the CMake code balloon from 2 lines to

updated CMake invocation to generate sources, including plugin
  # run protoc, using plugin that adds SPM_DLL to generated code
  add_custom_command(
    OUTPUT
        ${CMAKE_CURRENT_SOURCE_DIR}/sentencepiece.pb.h
        ${CMAKE_CURRENT_SOURCE_DIR}/sentencepiece.pb.cc
        ${CMAKE_CURRENT_SOURCE_DIR}/sentencepiece_model.pb.h
        ${CMAKE_CURRENT_SOURCE_DIR}/sentencepiece_model.pb.cc
    COMMAND ${Protobuf_PROTOC_EXECUTABLE}
    ARGS --plugin=protoc-gen-custom=${CMAKE_CURRENT_SOURCE_DIR}/protoc_plugin_wrapper.bat
         --cpp_out=dllexport_decl=SPM_DLL:${CMAKE_CURRENT_SOURCE_DIR}
         --custom_out=${CMAKE_CURRENT_SOURCE_DIR}  # ensure the plugin is actually called
         --proto_path=${CMAKE_CURRENT_SOURCE_DIR}
         ${CMAKE_CURRENT_SOURCE_DIR}/sentencepiece.proto
         ${CMAKE_CURRENT_SOURCE_DIR}/sentencepiece_model.proto
    COMMENT "Running protoc with custom plugin"
  )
  set(SPM_PROTO_HDRS ${CMAKE_CURRENT_SOURCE_DIR}/sentencepiece.pb.h)
  set(SPM_PROTO_SRCS ${CMAKE_CURRENT_SOURCE_DIR}/sentencepiece.pb.cc)
  set(SPM_MODEL_PROTO_HDRS ${CMAKE_CURRENT_SOURCE_DIR}/sentencepiece_model.pb.h)
  set(SPM_MODEL_PROTO_SRCS ${CMAKE_CURRENT_SOURCE_DIR}/sentencepiece_model.pb.cc)

but I also had to create a wrapper

@echo off
python "%~dp0/protoc_plugin_common.py" %*

due to not getting the python invocation to work with quoting/pathing rules when added directly to --plugin=protoc-gen-custom=....

I have no doubt that there are improvements or shortcuts I missed in my cobbled-together solution (OTOH, I found no good examples how to do this either...), but still I think that the overall effort necessary to do this would be a substantial barrier to contribute anything to an upstream project, if they now would have to take on maintenance of a custom plugin just to support a relatively niche use case (shared libraries on windows are rare, exactly because there's so many paper cuts).

However, I don't think it fundamentally has to be that difficult. IMO, when using EXPORT_MACRO FOO (or equivalent options like --cpp_out=dllexport_decl=FOO), protoc itself could generate the following

#if defined(FOO_EXPORTS)
#  define FOO __declspec(dllexport)
#elif defined(FOO_IMPORTS)
#  define FOO __declspec(dllimport)
#else
#  define FOO
#endif

and insert it somewhere around @@protoc_insertion_point(includes).

The suffixes FOO{_EXPORTS,_IMPORTS} are somewhat arbitrary (as long as they're known and documented) - the ones I've suggested here are because CMake will automatically generate them for a shared library. This is unfortunately a highly underdocumented feature... the only mention I found in the docs was in an unrelated page:

A preprocessor macro, <target_name>_EXPORTS is defined when a shared library compilation is detected.

Still, this convention is something that's in relatively widespread use, and would make this work out of the box in several cases.

What language does this apply to?
The CMake interface

Describe the problem you are trying to solve.
Avoid having to write a plugin for a basic task like ensuring the macro injected by EXPORT_MACRO FOO is defined.

Describe the solution you'd like
Automatic insertion of a definition of FOO into the generated sources based on presence of symbols FOO{_EXPORTS,_IMPORTS}.

Describe alternatives you've considered

  • Write a custom plugin
  • Something else? I haven't found another way.

Additional context

I'll run into the same problem in other places, e.g. #13726, grpc/grpc#36026, conda-forge/google-cloud-cpp-feedstock#108

@h-vetinari h-vetinari added the untriaged auto added to all issues by default when created. label Nov 27, 2024
@h-vetinari
Copy link
Contributor Author

CC @haberman @dawidcha

@acozzette
Copy link
Member

I haven't thought about this in a while but I thought the idea was that you would define it using a compiler flag, e.g. -DSPM_DLL=__declspec(dllexport) or -DSPM_DLL=__declspec(dllimport). Would that work for your use case?

@acozzette acozzette removed the untriaged auto added to all issues by default when created. label Nov 27, 2024
@h-vetinari
Copy link
Contributor Author

I thought the idea was that you would define it using a compiler flag, e.g. -DSPM_DLL=__declspec(dllexport) or -DSPM_DLL=__declspec(dllimport). Would that work for your use case?

Hm. I've never seen it handled like that, but I presume this could be made to work with something like

set_target_properties(sentencepiece PROPERTIES DEFINE_SYMBOL SPM_DLL=__declspec(dllexport))
target_compile_definitions(sentencepiece INTERFACE SPM_DLL=__declspec(dllimport))

Perhaps I'm just used to projects making use of the mentioned CMake magic FOO_EXPORTS to handle the switch between those two cases with much less manual intervention (abseil is an example that I've often used as a reference).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants
@acozzette @h-vetinari and others