Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Requirements #3

Closed
5 tasks done
mewmew opened this issue Oct 2, 2014 · 18 comments
Closed
5 tasks done

Requirements #3

mewmew opened this issue Oct 2, 2014 · 18 comments
Assignees
Labels
Milestone

Comments

@mewmew
Copy link
Member

mewmew commented Oct 2, 2014

This issue summarizes the requirements of the LLVM packages, as specified by its intended use cases.

The requirements of llgo as stated by @axw (in this issue) are as follows:

As for llgo's requirements:

  • in terms of using the LLVM API for generating code, it's mostly write-only via the builder API. Bitcode and IR reading is not important (at the moment?), but writing is; one or the other is required, but preferably both.
  • llgo uses the DIBuilder API for generating debug metadata (DWARF, et al.). This could be built outside of the core (it's just a matter of creating metadata nodes in a particular format), just be aware that it's pretty finicky and easy to break.
  • llgo needs to be able to look up target data (arch word size, alignment, etc.) from triples

For the decompilation pipeline the llvm packages should be able to:

  • represent the LLVM IR using Control Flow Graphs, where each node represents a BasicBlock.
  • insert new nodes and rearrange existing ones.
@mewmew mewmew added this to the design milestone Oct 2, 2014
@quarnster
Copy link
Contributor

Do you intend to be API compatible(-ish) with (a subset of) go-llvm? I'm not sure a go/ast-ish representation would add value over the basic LLVMValueRef, LLVMBasicBlockRef etc from the C api or their equivalent in go-llvm, but I'm open to being proved wrong ;)

@mewmew
Copy link
Member Author

mewmew commented Oct 13, 2014

Do you intend to be API compatible(-ish) with (a subset of) go-llvm? I'm not sure a go/ast-ish representation would add value over the basic LLVMValueRef, LLVMBasicBlockRef etc from the C api or their equivalent in go-llvm, but I'm open to being proved wrong ;)

I'd love to give you a clear answer, but I still have to do more preliminary research to evaluate the benefits and drawbacks between different representations. I'll definitely let you know when I have something to show :) At that point, I'd be very interested in constructive criticism :)

@mewmew mewmew self-assigned this Jan 8, 2015
@mewmew
Copy link
Member Author

mewmew commented Jan 16, 2015

Hi Fredrik,

During the months to follow I plan on implementing lexing and parsing of LLVM IR assembly. A separate package will create a Control-Flow Graph of the in-memory representation of LLVM IR, which will be in SSA-form, similar to that of the C++ library.

Currently I'm implementing the lexing and expect to have a working implementation in the next week or so. This work is taking place in the lexer branch. The set of tokens are based of LLVM rev 224917, and my intention is to include full support for all tokens of LLVM version 3.6 which is scheduled for release by the end of February.

I'd be very happy if you would take a look at what is in place and provide any feedback or constructive criticism :) My intention is to create a couple of self-contained packages which focuses on performing a single task. More high level packages (like the builder API) can later on be build upon these core packages. The thing I'm struggling with is to get the Value interface just right. As Go doesn't have any union or sum types there are a few different approaches one may take.

So, please let me know what you think. I know this is a work in progress but getting the API for the ir package right before implementing the parsing feels important.

Looking forward to hearing your thoughts!

Cheers /Robin

@quarnster
Copy link
Contributor

Regarding the ir package, IMO it'd be better to define a few abstract public interfaces and hide as much as possible of the concrete implementations. Have you had a chance to study the C (not C++) API? The C-API manages to do this elegantly IMO, where pretty much everything is a, or can be converted to a LLVMValueRef.

There's an awful lot of exported types in the ir package, which also export their struct "contents". Consider the side effects of a user randomly changing say Op2 or swapping positions of two Instruction's in a BasicBlock's []Insts slice, or changing the Parent field of the BasicBlock to some other random function.

With all types being concrete and with all their contents being public, it gives the impression that it's ok to change these values as you please. Being able to track where a Value is actually used is important, especially when implementing various transformation passes which will want to replace all uses of a specific value with something else.

As interfaces can only contain methods and not data, it ensures that you are free to implement that interface how you want and that users don't manipulate struct fields explicitly but instead use the methods provided by the interface. It would also allow validation for when a user wants to set Op2 to something, that something is of a compatible type for that instruction. I.e. for branch instructions it can enforce the argument is a BasicBlock and for floating point addition that the value is a floating point value.

It also gives users API-stability and yourself the option of throwing the concrete implementation away and rewrite from scratch without it impacting users negatively as they didn't see the concrete implementation anyway.

Also, consider using type embedding for the concrete implementations as a lot of the instructions are pretty much identical and it'd be pointless to have to implement the same code for each and everyone of them. (See basicInstr as an example below).

Off the top of my head just to draw up a rough outline (certainly not perfect, but similar to what I recall from using the LLVM C-API previously):

type (
    MDKind int
    Opcode int

    Type interface {}
    Use interface {
        Next() Use
        User() Value
        UsedValue() Value
    }
    BasicBlock interface{
        Value
        // Already has for example Next/Previous/Parent via embedding Value..
        MoveAfter(other BasicBlock) error
        MoveBefore(other BasicBlock) error
    }
    Branch interface {
        Value
        IsConditional() bool
        Condition() Value
        SetCondition(Value) error
    }

    Value interface {
        Opcode() Opcode
        Type() Type

        Operands() int
        Operand(n int) Value
        SetOperand(n int, v Value) error

        EraseFromParent() error
        FirstUse() Use
        ReplaceAllUsesWith(Value) error

        Metadata(MDKind) Value
        SetMetadata(MDKind, Value) error

        Parent() Value
        Previous() Value
        Next() Value
    }

    basicInstr struct {
        typ Type
        operands []Value 
    }
    addInstr struct {
        basicInstr
    }
    mulInstr struct {
        basicInstr
    }
)
const (
    DebugLocation MDKind = iota
    ProfileGuidedOptimizationBranchCount
    // etc
    )
const (
    ADDi Opcode = iota
    ADDf
    MULi
    MULLi
    // etc
)

I gave the methods that change something an error-return, which is a good idea for future proofing the API even if no errors are returned at first. The functions that return something don't need to return an error as they can just return nil when getting the value isn't possible.

Consider changing the asm/lexer.Parse function's signature to Parse(io.Reader) ([]token.Token, error). io.Reader just for flexibility, and the added error return just for anyone who might want to use the package in the future and who might want to present the error to users in some custom way.

The asm/token package looks ok to me, changing the struct values or the order of the tokens here doesn't have as severe consequences/side-effects as it does when messing with actual IR values.

So that would be my initial feedback, which you of course are free to completely ignore if you want :)

What do you think?

@quarnster
Copy link
Contributor

Also, I should have mentioned, consider API-compatibility of (or a subset of) the official Go bindings: http://godoc.org/llvm.org/llvm/bindings/go/llvm

@mewmew
Copy link
Member Author

mewmew commented Jan 21, 2015

Hi Fredrik!

Thanks a lot for your detailed reply. I'll dissect all the information in the weeks to come and refine the ir library to something that feels right. My current feeling is that there will exist a low-level IR library with concrete types, maybe hidden as an internal package. One or more high-level packages will build abstractions on this low-level package, mainly because the abstractions that feels good to one user may feel bad to another.

There's an awful lot of exported types in the ir package, which also export their struct "contents". Consider the side effects of a user randomly changing say Op2 or swapping positions of two Instruction's in a BasicBlock's []Insts slice, or changing the Parent field of the BasicBlock to some other random function.

Indeed, the current ir package is too low-level for any user (except package developers). Basically the low-level package will give you enough power to shoot yourself in the foot, while the high-level package will prevent the majority of the common mistakes by design (e.g. move/remove a BasicBlock, but forget to update references to it, etc as you already mentioned). I greatly appreciate your draft API for the high-level package. It will give me something to think about and take a lot of inspiration from :) I definitely feel the need to keep studying the C API in closer detail. I've used it and the C++ API to some extent, but not enough yet to have a feeling of its overall design. I agree that the extensive use of the LLVMValueRef is very powerful. Both values and instructions may act as values, allowing you to simply chain them together.

I gave the methods that change something an error-return, which is a good idea for future proofing the API even if no errors are returned at first.

I'll try to keep this in mind while designing the API.

Still have a lot of thinking to do :) Now I'm on my way to Brussels and FOSDEM! Hopefully this trip should allow for some time to reflect on these matters.

Thanks! And I hope to discuss the design with you as it progresses over the month to come.

Cheers /Robin

@quarnster
Copy link
Contributor

With go 1.5's support for creating c-shared libraries, I really like the idea of having a go generate tool which generates the standard C LLVM api (or the implemented subset anyways). That way this code could be used as a drop in replacement for anything that currently uses the LLVM C api, presuming all the functions used are implemented.

Just figured I'd mention this as it's been on my mind due to the discussion on the llvm-dev list about potentially splitting the LLVM C api into a separate project.

@mewmew
Copy link
Member Author

mewmew commented Jul 19, 2015

Hi Fredrik,

With go 1.5's support for creating c-shared libraries, I really like the idea of having a go generate tool which generates the standard C LLVM api (or the implemented subset anyways). That way this code could be used as a drop in replacement for anything that currently uses the LLVM C api, presuming all the functions used are implemented.

Just figured I'd mention this as it's been on my mind due to the discussion on the llvm-dev list about potentially splitting the LLVM C api into a separate project.

I really like this idea :) Thanks for suggesting it! It should definitely be possible to implement, once a stable C API has been defined. Fascinating times :)

@mewmew
Copy link
Member Author

mewmew commented Jul 19, 2015

I've created a dedicated issue to track the development of the go generate tool. Once again, thanks for the idea.

A small progress update on the LLVM parser library. It is getting closer to feature completeness. The few things missing are:

  • Better parsing of functions. Support for parsing types and constants have been implemented. The parsing of function declarations and function definitions works, but could use some love. For now, I've basically duplicated the code which parses function types (see https://github.com/llir/llvm/blob/master/asm/parser/type.go#L303) to handle function headers (see commit at ec21daf). The main difference is that the function declarations and function definitions may include parameter names, while the function types doesn't. If you can find a clean way to refactor this which avoids the duplicated code, I'd be thrilled to know.
  • Parsing of the individual instructions. This should be trivial to implement, as it is mostly copy paste work.

And the IR library is current missing the following:

  • Precise definitions of each instruction in the IR, including comprehensive documentation. This is indeed time consuming, but implementing support for the most common instructions (i.e. binary operators, terminator instructions) should be relatively easy.

As you know, the lexer is done. The parser is getting closer to completion, and the IR library is being implemented in parallel with the parser. Once these libraries have been implemented, other high-level packages may be defined for interacting with the IR; such as the builder API. And of cause, once these packages have been implemented, it should be possible to develop the go generate tool which automatically generates the c-shared library for the C LLVM API.

The exposed API of the entire project could use a thorough review once the parser reaches feature completeness. At this point, the focus will switch to defining a stable API which other packages may rely on. I'm looking forward to the API review and overhaul :) It will be fun to discuss how a clean and minimal API may look like.

@quarnster
Copy link
Contributor

Thanks for the update. Let me know when you feel ready to start discussing public API etc.

@mewmew
Copy link
Member Author

mewmew commented Jul 20, 2015

Let me know when you feel ready to start discussing public API etc.

Definitely!

I'm traveling to Asia on Friday and will be back in Sweden by September, at which point I'm expecting to pick up development again. I wish you a wonderful summer! :)

@quarnster
Copy link
Contributor

Right back at you and have a pleasant trip!

@mewmew mewmew modified the milestones: design, v0.2 Nov 24, 2015
@mewmew
Copy link
Member Author

mewmew commented Jun 19, 2016

Thanks for the update. Let me know when you feel ready to start discussing public API etc.

Hej @quarnster!

The ir package of the llvm repository is starting to reach feature completion. There are still instructions to implement, but the general API is already usable, and in use by a number of projects, such as the µC to LLVM IR compiler, the LLVM IR parser generator, and the LLVM IR Control Flow Graph generator.

At this stage, the API is now ready for a fresh set of eyes. There are definitely things we wish to fix to make it cleaner, and more consistent. As you have experience with the Go bindings for LLVM, it would be very interesting to hear your opinion on how the API looks and what we may wish to change. The aim is definitely to reach a clean an intuitive API before the 1.0 release, so feel free to suggest any changes big or small. Making big changes is going to be a lot easier now, before we start getting more users of the library. Note, all of the aforementioned projects currently using the project have been developed by us, in part to stress test the library from different angles, so even substantial changes to the API are welcome at this stage, such as refining how the value.Value interface works.

So, when you get some time, I'd be very happy if you reviewed the API and told us what you think of it.

Have a lovely summer!

Cheerful regards /u

@mewmew
Copy link
Member Author

mewmew commented Dec 25, 2016

After experimenting with the API of llir/llvm for some time, through projects which read, write and manipulate LLVM IR, a set of drawbacks have been identified with the current API. To mitigate these drawbacks a couple of experimental development branches have investigated various ways of refining the API, to better allow for future use cases of the API and focus on extensibility; namely,

  • use
    • use tracking of values
  • irmerge
    • merge the ir, value, constant and types packages into a single ir package
  • exportmembers
    • export the members of data types

The main issue that was identified with the current API, was that it prevented a lot of use cases, simply by not providing direct access to the members of data types (i.e. not exporting struct fields), but rather providing only methods to get and set these members. The key idea behind providing getters and setters, rather than exporting members is that it may prevent incorrect uses of the API, as the getters and setters can perform sanity checks. However, this issue is alleviated to some extent by the future implementation of a sem package, which provides a static semantic checker for LLVM IR.

It may be possible to keep the members unexported and design an API for accessing and manipulating these members, which could be general enough to support future use cases. Part of this work is tracked by #19 (Use tracking of values).

To give an example of a use case which is prevented by the current API, imagine being given access to the work horse behind go fix but for LLVM IR, namely an IR walker analogous to the go fix Go AST walker which traverses the IR and invokes a visit method for each node of the LLVM IR module, but also with a pointer to each node to enable alteration. A similar AST walker has been implemented for the AST representation of LLVM IR assembly in astutil#Walk. The intention is to implement a similar walker for the data representation of LLVM IR, and a prerequisite for this would be to either export the member types, or merge the IR data types into a single package, and implement the walker in the mono IR package, since the walker is required to be able to take the address of a member to pass both interfaces and pointers to structs and slices as pointers to interfaces, pointers to pointers to structs, and pointers to slices.

The result is a very powerful IR walker, one that would be extremely fun to play with and would give similar capabilities as the utilized by go fix (and go-post).

An experimental branch merged the IR packages (i.e. ir, types, value and constant) into a single package (ir), to evaluate whether the result was preferable. While this approach made it possible to implement the aforementioned walker, it had several shortcomings. Firstly, the API listing of the ir package became difficult to maintain, and several additional prefixes had to be added to structure data types, to keep them sorted somehow in GoDoc. A more serious shortcoming however, is that the approach did not take into account any future extensions and use cases not yet imagined by the authors of the LLVM IR library. With the intention of making the llir/llvm library general enough to also be useful for users with substantially different use cases than the main authors of the llir/llvm project, another approach has been evaluated which is in line with the design of go/ast.

With this in mind, an experimental branch was created to evaluate how the API would look like if all IR data type members where exported, and to get a feel for future use cases this may enable. The key idea of this branch is to export all members, get rid of getter and setter methods, and only provide methods which describe the shared functionality as captured by interface definitions such as ir.Instruction and value.Value. This approach enables the walker library to be implemented, facilitates the parsing of values using forward references whos types are not yet known, and makes future use cases of the library possible that have not yet been envisioned by the llir/llvm authors.

So far, this is the best approach we've been able to identify which provides a balance between extensibility and sanity checking. We will be eager to evalute the new API during the next couple of months to see how it holds up. Anyone interested in playing with LLVM IR is invited to join at this stage, to evaluate the API and provide feedback on how it feels to use and what may be improvied.

Cheers!
/u

mewmew added a commit that referenced this issue Dec 25, 2016
mewmew added a commit that referenced this issue Jan 2, 2017
@mewmew
Copy link
Member Author

mewmew commented Jun 14, 2017

Original requirements of llgo posted by @axw; #3 (comment)

As for llgo's requirements:

  • in terms of using the LLVM API for generating code, it's mostly write-only via the builder API. Bitcode and IR reading is not important (at the moment?), but writing is; one or the other is required, but preferably both.

Similar to the builder API, the llvm/ir package may now be used for writing LLVM IR; see example at https://godoc.org/github.com/llir/llvm/ir#example-package

As an example, the toy uC compiler uses llir/llvm to compile a subset of C into LLVM IR.

  • llgo uses the DIBuilder API for generating debug metadata (DWARF, et al.). This could be built outside of the core (it's just a matter of creating metadata nodes in a particular format), just be aware that it's pretty finicky and easy to break.

Metadata nodes are now supported by llvm/ir. A third-party dwarf package could be build on top of this.

As an example, decomp uses llir/llvm to decompile LLVM IR into Go, and makes use of metadata nodes for tracking the addresses of functions in machine code.

  • llgo needs to be able to look up target data (arch word size, alignment, etc.) from triples

Both read and write support for target specifiers (i.e. target triple and target data layout) has been added to llir/llvm (see ir.Module.DataLayout and ir.Module.TargetTriple).

As far as I can tell, the llir/llvm project is now ready to be used by the llgo compiler, if the developers' of llgo so wish.

If there are additional requirements of llgo that have not yet been satisfied, I'd be happy to learn.

@mewmew
Copy link
Member Author

mewmew commented Jun 14, 2017

A few requirements of decomp, from #3 (comment)

For the decompilation pipeline the llvm packages should be able to:

  • represent the LLVM IR using Control Flow Graphs, where each node represents a BasicBlock.

The approach taken by the decompiler is to translate the control flow of LLVM IR basic blocks into Control Flow Graphs stored as Graphviz DOT files. This approach elliviates any requirements on the API regarding control flow, as those may be handled by projects such as gonum/graph.

  • insert new nodes and rearrange existing ones.

As documented in #3 (comment), the members of structures have been exported to facilitate a range of use cases that the original author of llir/llvm may not yet have thought about. As such, this requirement has been satisfied and dedicated utility packages with high-level APIs may be written to facilitate such node manipulations.

The entire decompiler pipeline has been updated to make use of llir/llvm, and its current requirements are satisfied. For future releases of decomp and llir/llvm, a focus on data flow analysis will introduce APIs for dealing with use-def chains, among others. This is the focus of the future v0.3 release of llir/llvm, as described in the roadmap section of the readme.

@mewmew
Copy link
Member Author

mewmew commented Jun 14, 2017

@quarnster wrote in #3 (comment)

Do you intend to be API compatible(-ish) with (a subset of) go-llvm? I'm not sure a go/ast-ish representation would add value over the basic LLVMValueRef, LLVMBasicBlockRef etc from the C api or their equivalent in go-llvm, but I'm open to being proved wrong ;)

I'd love to give you a clear answer, but I still have to do more preliminary research to evaluate the benefits and drawbacks between different representations. I'll definitely let you know when I have something to show :) At that point, I'd be very interested in constructive criticism :)

And comment from 2015, #3 (comment)

Also, I should have mentioned, consider API-compatibility of (or a subset of) the official Go bindings: http://godoc.org/llvm.org/llvm/bindings/go/llvm

Hi Fredrik,

I've now a bit better understanding after experimenting with various representations in llir/llvm for almost 2.5 years o.O (e.g. some notes and reflections in #3 (comment)), so I'm finally in a better position to provide you with some feedback on the question you posted back in 2014.

The aim of the llir/llvm project is to provide a clean and idiomatic Go API for interacting with LLVM IR, and will therefore aim to start afresh to avoid some of the C++ inherent API designs of go-llvm (e.g. use of iterators rather than slices, use Value interface with a lot of dummy implemented methods that may assert on invokation, etc).

At the same time, the llir/llvm packages may been seen as a low-level interaction library for LLVM IR, and nothing prevents a library to be built on top, which implements the go-llvm API, at least not as far as I can tell.

You've already given some thoughts in this direction, as you mentioned the idea of generating a c-shared Go library which implements the LLVM C API (ref #3 (comment)). As you already know, I love this idea, and have created a specific issue to track its future implementation #12

The llir/llvm project is almost ready for its v0.2 release, which will make it go-getable just like any other package :)

In preparation for this release, I would be very curious if you have any thoughts on the current API or approach taken by the project. Is there anything you think looks a bit strange API or design wise, or that we may refine someone?

If so, now would be an opportune time to give some feedback!

Cheerful regards /u & i

@mewmew mewmew modified the milestones: v0.3, v0.2 Jun 24, 2017
@mewmew
Copy link
Member Author

mewmew commented Nov 12, 2018

This issue has been mostly superseded by #29, which discusses the overall roadmap of the project, and related requirements of various projects to the implementation and design goals of llir/llvm. Most, if not all of the requirements discussed in this issue have now been implemented in llir/llvm. If there is something specific that is missing, feel free to point it out, either here, or in #29.

Closing this issue for now.

Cheers
/u

@mewmew mewmew closed this as completed Nov 12, 2018
mewmew added a commit that referenced this issue Nov 30, 2018
Updates #3.


Former-commit-id: fad16d1
mewmew added a commit that referenced this issue Nov 30, 2018
Updates #3.


Former-commit-id: f1f385d
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants