Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MLIR/Frontend] C++ compiler driver improvements, ability to compile textual IR #216

Merged
merged 198 commits into from
Sep 20, 2023

Conversation

pengmai
Copy link
Contributor

@pengmai pengmai commented Jul 20, 2023

Context: The previous version of the C++ compiler driver is a work-in-progress implementation that went down to the LLVM IR module. It also is missing important features around debugging that exist in the current subprocess driver.

Description of the Change:

  • The C++ driver compiles an MLIR module down to an object binary file.
  • General refactoring, separating out the extensions from the driver into separate pybind modules.
  • Addition of the ability to use @qjit on a string containing textual IR (MLIR at any level and LLVM IR) and get it to run from Python.
  • Enzyme module is updated to be compiled statically.

Benefits:
Improved compilation time

Progress

  • Implement the C++ part of the driver functionality
  • Implement pipeline configuration Python API
  • Implement pipeline output printing Python API
  • Update the tests

[sc-41430]
[sc-41704]

pengmai and others added 15 commits July 17, 2023 10:00
* Add MHLO as C++ dependency to quantum-opt

* Configure parsing from C++

* Implement lowering to LLVM IR with custom metadata handling for Enzyme

* Fix bugs with memory ownership

* Clean up C calling convention

* Canonicalize ._mlir attribute in frontend

* python formatting

* Update tests with canonical IR

* Update canonicalized lit tests

* Add MHLO as a dependency to quantum dialect build in CI

* Formatting + typo in workflow

* CI: Attempt to re-checkout MHLO during dialect build

* CI: Attempt to use cached MHLO

* CI: Add _mlir_libs to mocked modules for doc build, fix logic issue with MHLO source caching

* CI: mock specific CAPI library

* CI: typo in docs configuration

* Switch mlir_canonicalize to generic pass runner, reorganize driver files

* Clean up, rename LLVMTarget to avoid confusion with core LLVMTarget

* Fix error message for mlir_run_pipeline

* Update mlir/CMakeLists.txt

Co-authored-by: Ali Asadi <[email protected]>

* Update copyright year

Co-authored-by: Ali Asadi <[email protected]>

* Update mlir/lib/Catalyst/Driver/Pipelines.cpp

Co-authored-by: Ali Asadi <[email protected]>

* Update copyright year

Co-authored-by: Ali Asadi <[email protected]>

* Update copyright year

Co-authored-by: Ali Asadi <[email protected]>

* Add #pragma once

Co-authored-by: Ali Asadi <[email protected]>

* Add #pragma once

Co-authored-by: Ali Asadi <[email protected]>

* Move MHLO passes to top-level CMake variable, documentation

---------

Co-authored-by: Ali Asadi <[email protected]>
@pengmai pengmai changed the title Jmp/c++ compile from ir [MLIR/Frontend] C++ compiler driver improvements, ability to compile textual IR Jul 20, 2023
@sergei-mironov
Copy link
Contributor

sergei-mironov commented Jul 25, 2023

@pengmai @erick-xanadu
I would like to discuss some additional design decisions we might need to make regarding this driver. I think that the API proposed in this PR might actually limit some features we currently have. Although these features may be not very important, I think that I need to ask some questions before starting the integration. First, let me define the scope of the problem. I suggest to think that this PR changes the original "composable" API which I imagine as

$MLIR_{mhlo} \to_{pl_0} MLIR_1 \to_{pl_1} MLIR_2 \to_{pl_2} ... \to_{pl_{N-1}} MLIR_N \to (LLVM,Path_{obj})$

into a "programmable black box" model

$(MLIR_{mhlo}, Spec) \to_{compileasm} (LLVM,Path_{obj},Effects)$

Where: arrow ($\to_x$) refers to the evaluation of a pipeline $x$ consisting of some passes, $MLIR$, $LLVM$ are corresponding IRs in a string form; $Effects$ are side-effects like printing to streams or dumping files to disk that become important to note in the latter design. Finally, $Spec$ is a placeholder for configuration object which we might want pass to the compile_asm function to increase its configurability.

The questions are:

  1. What level of details do we want in a pipeline configuration in $Spec$? In the previous version of the design, users were able to combine MLIR passes into pipelines by themselves. Do we need to keep CompilerDriver fully-compatible with this? If so, I can imagine that $Spec$ may be dict-of-lists parameter similar to this definition which is currently hardcoded. Alternatively, we may just allow passing names of some pre-defined pipelines.
  2. What level of details do we need in the $Effects$? The print_stage functionality that we had previously allowed users to print the pipeline outputs in any combination. Lit tests typically want to print the output of only one pipeline. Are we allowed to limit the API scope down to this use-case?
  3. What is the idea behind the mlir_run_pipeline function? It accepts inputs and returns outputs in string form, which means that we may lose the speed benefits if we use it multiple times. At the same time, it does not follow compile options and is not designed to e.g. infer IR attributes, so we can't use it in the sequence with compile_asm. Should we remove it in favor of a more generic version of compile_asm?

@erick-xanadu
Copy link
Contributor

I suggest to think that this PR changes the original "composable" API which I imagine as

$MLIR_{mhlo} \to_{pl_0} MLIR_1 \to_{pl_1} MLIR_2 \to_{pl_2} ... \to_{pl_{N-1}} MLIR_N \to (LLVM,Path_{obj})$

into a "programmable black box" model

$(MLIR_{mhlo}, Spec) \to_{compileasm} (LLVM,Path_{obj},Effects)$

I think both models are equivalent. The first one also had effects (printing to a file and preserved in the file system). And the Spec was just the default pipeline. We only gave the users the ability to define their own pipeline, which I think should also be possible in the compiler driver, but I haven't investigated.

If so, I can imagine that $Spec$ may be dict-of-lists parameter similar to this definition which is currently hardcoded.

Can you elaborate on this?

I think for the scope of this PR, we can limit the ability for the user to define their own pass pipelines if it is getting in the way while we think of which passes are useful to us. In GCC, there's no option for the user to add passes (beyond enabling passes that are disabled by default) without recompiling the compiler. Similarly, if the user wants to change the order of transformations, they would need to recompile the compiler. I think this wouldn't be too bad but I agree that it would take away some of the dynamism we are accustomed to.

What level of details do we need in the $Effects$? The print_stage functionality that we had previously allowed users to print the pipeline outputs in any combination.

What do you mean by "in any combination"?

I think the Compiler Driver already prints all the IR with the shouldPrintAfterPass. Or do you mean that you would like to see the options in between?

@sergei-mironov
Copy link
Contributor

sergei-mironov commented Jul 25, 2023

MLIRmhlo→pl0MLIR1→pl1MLIR2→pl2...→plN−1MLIRN→(LLVM,Pathobj)
(MLIRmhlo,Spec)→compileasm(LLVM,Pathobj,Effects)

The first one also had effects (printing to a file and preserved in the file system).

Yes, this is true. I didn't mention effects in the current design because I am assuming that users can control them via the Python API quite precisely (so we are not responsible that much).

I think both models are equivalent.

No, I don't think so, unless Spec is as powerful as Python that we are using now. But I do think that we might not need to have all this power actually. So I would like to know what do we expect from the pipeline configurations.

If so, I can imagine that Spec may be dict-of-lists parameter similar to this definition which is currently hardcoded.

Can you elaborate on this?

The idea I have in mind - is to allow users to call compile_asm like this

filename, llvm_ir, *inferred_data = compile_asm(ir, workspace_name, module_name, ...
   pipelines = {
     'mhloToCorePasses' : ["func.func(chlo-legalize-to-hlo)", ..., "convert-to-signless" ],
     'quantumCompilationPasses' : ["this-pass", "that-pass", ...],
     ...
     })

I think for the scope of this PR, we can limit the ability for the user to define their own pass pipelines if it is getting in the way while we think of which passes are useful to us. In GCC, there's no option for the user to add passes (beyond enabling passes that are disabled by default) without recompiling the compiler. Similarly, if the user wants to change the order of transformations, they would need to recompile the compiler. I think this wouldn't be too bad but I agree that it would take away some of the dynamism we are accustomed to.

What level of details do we need in the Effects? The print_stage functionality that we had previously allowed users to print the pipeline outputs in any combination.

What do you mean by "in any combination"?

I can imagine users calling print_stage like we do in tests, but they may do the printing in any order. I am wondering do we need to keep this feature or (as one option) we can just add a parameter asking the driver to output the result of a single stage X instead of the final result.

Yes, another option we have - is just to hardcode the pipelines into C++ , but we would still need to specify their names to be able to refer to them in print_stage configuration for example.

I think the Compiler Driver already prints all the IR with the shouldPrintAfterPass. Or do you mean that you would like to see the options in between?

Hmm, probably yes, if I understand you correctly. In the tests we run the pipeline but only want the result of a single intermediate pipeline.

@erick-xanadu
Copy link
Contributor

erick-xanadu commented Jul 25, 2023

No, I don't think so, unless Spec is as powerful as Python that we are using now. But I do think that we might not need to have all this power actually. So I would like to know what do we expect from the pipeline configurations.

Yes the user could have any function whatsoever, but the intended use case is mostly for a way to specify command line arguments to mlir-opt tools and similar.

The idea I have in mind - is to allow users to call compile_asm like this

Yeah, something like that would be great! EDIT: I also wouldn't be opposed to essentially having 'mhloToCorePasses' : -mhlo-lowering and -mhlo-lowering to be expanded in the C++ side to the current definition.

I can imagine users calling print_stage like we do in tests, but they may do the printing in any order. I am wondering do we need to keep this feature or (as one option) we can just ask the driver to output the result of a single stage X instead of the final result.

I don't understand here the notion of order fully, but I think it the main point is that the user did not specify that there would be an output. I think both options (having the human readable IR for these stages available vs printing it only on demand) have their use cases. Accessing the .mlir field in the QJIT object is a nice convenience. Avoiding printing it out is more efficient. I think we should preserve the behaviour for .qir and .mlir though.

In the tests we run the pipeline but only want the result of a single intermediate pipeline.

We can dump it to a file and not print it. In the tests all human readable IRs are dump to a file (and as you pointed out elsewhere read into a dictionary) but they are only printed when the user requests them to be printed to stdout. I think we can keep that behaviour for .qir and .mlir

@dime10 dime10 mentioned this pull request Sep 19, 2023
Copy link
Contributor

@dime10 dime10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥳

@erick-xanadu erick-xanadu merged commit 64be9d2 into main Sep 20, 2023
19 of 20 checks passed
@erick-xanadu erick-xanadu deleted the jmp/c++-compile-from-ir branch September 20, 2023 12:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants