-
Notifications
You must be signed in to change notification settings - Fork 78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Requirements #3
Comments
Do you intend to be API compatible(-ish) with (a subset of) go-llvm? I'm not sure a |
I'd love to give you a clear answer, but I still have to do more preliminary research to evaluate the benefits and drawbacks between different representations. I'll definitely let you know when I have something to show :) At that point, I'd be very interested in constructive criticism :) |
Hi Fredrik, During the months to follow I plan on implementing lexing and parsing of LLVM IR assembly. A separate package will create a Control-Flow Graph of the in-memory representation of LLVM IR, which will be in SSA-form, similar to that of the C++ library. Currently I'm implementing the lexing and expect to have a working implementation in the next week or so. This work is taking place in the lexer branch. The set of tokens are based of LLVM rev 224917, and my intention is to include full support for all tokens of LLVM version 3.6 which is scheduled for release by the end of February. I'd be very happy if you would take a look at what is in place and provide any feedback or constructive criticism :) My intention is to create a couple of self-contained packages which focuses on performing a single task. More high level packages (like the builder API) can later on be build upon these core packages. The thing I'm struggling with is to get the So, please let me know what you think. I know this is a work in progress but getting the API for the ir package right before implementing the parsing feels important. Looking forward to hearing your thoughts! Cheers /Robin |
Regarding the There's an awful lot of exported types in the ir package, which also export their struct "contents". Consider the side effects of a user randomly changing say With all types being concrete and with all their contents being public, it gives the impression that it's ok to change these values as you please. Being able to track where a Value is actually used is important, especially when implementing various transformation passes which will want to replace all uses of a specific value with something else. As interfaces can only contain methods and not data, it ensures that you are free to implement that interface how you want and that users don't manipulate struct fields explicitly but instead use the methods provided by the interface. It would also allow validation for when a user wants to set It also gives users API-stability and yourself the option of throwing the concrete implementation away and rewrite from scratch without it impacting users negatively as they didn't see the concrete implementation anyway. Also, consider using type embedding for the concrete implementations as a lot of the instructions are pretty much identical and it'd be pointless to have to implement the same code for each and everyone of them. (See Off the top of my head just to draw up a rough outline (certainly not perfect, but similar to what I recall from using the LLVM C-API previously): type (
MDKind int
Opcode int
Type interface {}
Use interface {
Next() Use
User() Value
UsedValue() Value
}
BasicBlock interface{
Value
// Already has for example Next/Previous/Parent via embedding Value..
MoveAfter(other BasicBlock) error
MoveBefore(other BasicBlock) error
}
Branch interface {
Value
IsConditional() bool
Condition() Value
SetCondition(Value) error
}
Value interface {
Opcode() Opcode
Type() Type
Operands() int
Operand(n int) Value
SetOperand(n int, v Value) error
EraseFromParent() error
FirstUse() Use
ReplaceAllUsesWith(Value) error
Metadata(MDKind) Value
SetMetadata(MDKind, Value) error
Parent() Value
Previous() Value
Next() Value
}
basicInstr struct {
typ Type
operands []Value
}
addInstr struct {
basicInstr
}
mulInstr struct {
basicInstr
}
)
const (
DebugLocation MDKind = iota
ProfileGuidedOptimizationBranchCount
// etc
)
const (
ADDi Opcode = iota
ADDf
MULi
MULLi
// etc
) I gave the methods that change something an Consider changing the The So that would be my initial feedback, which you of course are free to completely ignore if you want :) What do you think? |
Also, I should have mentioned, consider API-compatibility of (or a subset of) the official Go bindings: http://godoc.org/llvm.org/llvm/bindings/go/llvm |
Hi Fredrik! Thanks a lot for your detailed reply. I'll dissect all the information in the weeks to come and refine the
Indeed, the current ir package is too low-level for any user (except package developers). Basically the low-level package will give you enough power to shoot yourself in the foot, while the high-level package will prevent the majority of the common mistakes by design (e.g. move/remove a BasicBlock, but forget to update references to it, etc as you already mentioned). I greatly appreciate your draft API for the high-level package. It will give me something to think about and take a lot of inspiration from :) I definitely feel the need to keep studying the C API in closer detail. I've used it and the C++ API to some extent, but not enough yet to have a feeling of its overall design. I agree that the extensive use of the LLVMValueRef is very powerful. Both values and instructions may act as values, allowing you to simply chain them together.
I'll try to keep this in mind while designing the API. Still have a lot of thinking to do :) Now I'm on my way to Brussels and FOSDEM! Hopefully this trip should allow for some time to reflect on these matters. Thanks! And I hope to discuss the design with you as it progresses over the month to come. Cheers /Robin |
With go 1.5's support for creating c-shared libraries, I really like the idea of having a go generate tool which generates the standard C LLVM api (or the implemented subset anyways). That way this code could be used as a drop in replacement for anything that currently uses the LLVM C api, presuming all the functions used are implemented. Just figured I'd mention this as it's been on my mind due to the discussion on the llvm-dev list about potentially splitting the LLVM C api into a separate project. |
Hi Fredrik,
I really like this idea :) Thanks for suggesting it! It should definitely be possible to implement, once a stable C API has been defined. Fascinating times :) |
I've created a dedicated issue to track the development of the A small progress update on the LLVM parser library. It is getting closer to feature completeness. The few things missing are:
And the IR library is current missing the following:
As you know, the lexer is done. The parser is getting closer to completion, and the IR library is being implemented in parallel with the parser. Once these libraries have been implemented, other high-level packages may be defined for interacting with the IR; such as the builder API. And of cause, once these packages have been implemented, it should be possible to develop the The exposed API of the entire project could use a thorough review once the parser reaches feature completeness. At this point, the focus will switch to defining a stable API which other packages may rely on. I'm looking forward to the API review and overhaul :) It will be fun to discuss how a clean and minimal API may look like. |
Thanks for the update. Let me know when you feel ready to start discussing public API etc. |
Definitely! I'm traveling to Asia on Friday and will be back in Sweden by September, at which point I'm expecting to pick up development again. I wish you a wonderful summer! :) |
Right back at you and have a pleasant trip! |
Hej @quarnster! The ir package of the llvm repository is starting to reach feature completion. There are still instructions to implement, but the general API is already usable, and in use by a number of projects, such as the µC to LLVM IR compiler, the LLVM IR parser generator, and the LLVM IR Control Flow Graph generator. At this stage, the API is now ready for a fresh set of eyes. There are definitely things we wish to fix to make it cleaner, and more consistent. As you have experience with the Go bindings for LLVM, it would be very interesting to hear your opinion on how the API looks and what we may wish to change. The aim is definitely to reach a clean an intuitive API before the 1.0 release, so feel free to suggest any changes big or small. Making big changes is going to be a lot easier now, before we start getting more users of the library. Note, all of the aforementioned projects currently using the project have been developed by us, in part to stress test the library from different angles, so even substantial changes to the API are welcome at this stage, such as refining how the So, when you get some time, I'd be very happy if you reviewed the API and told us what you think of it. Have a lovely summer! Cheerful regards /u |
After experimenting with the API of llir/llvm for some time, through projects which read, write and manipulate LLVM IR, a set of drawbacks have been identified with the current API. To mitigate these drawbacks a couple of experimental development branches have investigated various ways of refining the API, to better allow for future use cases of the API and focus on extensibility; namely,
The main issue that was identified with the current API, was that it prevented a lot of use cases, simply by not providing direct access to the members of data types (i.e. not exporting struct fields), but rather providing only methods to get and set these members. The key idea behind providing getters and setters, rather than exporting members is that it may prevent incorrect uses of the API, as the getters and setters can perform sanity checks. However, this issue is alleviated to some extent by the future implementation of a It may be possible to keep the members unexported and design an API for accessing and manipulating these members, which could be general enough to support future use cases. Part of this work is tracked by #19 (Use tracking of values). To give an example of a use case which is prevented by the current API, imagine being given access to the work horse behind The result is a very powerful IR walker, one that would be extremely fun to play with and would give similar capabilities as the utilized by An experimental branch merged the IR packages (i.e. With this in mind, an experimental branch was created to evaluate how the API would look like if all IR data type members where exported, and to get a feel for future use cases this may enable. The key idea of this branch is to export all members, get rid of getter and setter methods, and only provide methods which describe the shared functionality as captured by interface definitions such as ir.Instruction and value.Value. This approach enables the walker library to be implemented, facilitates the parsing of values using forward references whos types are not yet known, and makes future use cases of the library possible that have not yet been envisioned by the So far, this is the best approach we've been able to identify which provides a balance between extensibility and sanity checking. We will be eager to evalute the new API during the next couple of months to see how it holds up. Anyone interested in playing with LLVM IR is invited to join at this stage, to evaluate the API and provide feedback on how it feels to use and what may be improvied. Cheers! |
Original requirements of llgo posted by @axw; #3 (comment)
Similar to the builder API, the As an example, the toy uC compiler uses llir/llvm to compile a subset of C into LLVM IR.
Metadata nodes are now supported by llvm/ir. A third-party dwarf package could be build on top of this. As an example, decomp uses llir/llvm to decompile LLVM IR into Go, and makes use of metadata nodes for tracking the addresses of functions in machine code.
Both read and write support for target specifiers (i.e. target triple and target data layout) has been added to llir/llvm (see ir.Module.DataLayout and ir.Module.TargetTriple). As far as I can tell, the llir/llvm project is now ready to be used by the llgo compiler, if the developers' of llgo so wish. If there are additional requirements of llgo that have not yet been satisfied, I'd be happy to learn. |
A few requirements of decomp, from #3 (comment)
The approach taken by the decompiler is to translate the control flow of LLVM IR basic blocks into Control Flow Graphs stored as Graphviz DOT files. This approach elliviates any requirements on the API regarding control flow, as those may be handled by projects such as gonum/graph.
As documented in #3 (comment), the members of structures have been exported to facilitate a range of use cases that the original author of llir/llvm may not yet have thought about. As such, this requirement has been satisfied and dedicated utility packages with high-level APIs may be written to facilitate such node manipulations. The entire decompiler pipeline has been updated to make use of llir/llvm, and its current requirements are satisfied. For future releases of decomp and llir/llvm, a focus on data flow analysis will introduce APIs for dealing with use-def chains, among others. This is the focus of the future v0.3 release of llir/llvm, as described in the roadmap section of the readme. |
@quarnster wrote in #3 (comment)
And comment from 2015, #3 (comment)
Hi Fredrik, I've now a bit better understanding after experimenting with various representations in llir/llvm for almost 2.5 years o.O (e.g. some notes and reflections in #3 (comment)), so I'm finally in a better position to provide you with some feedback on the question you posted back in 2014. The aim of the llir/llvm project is to provide a clean and idiomatic Go API for interacting with LLVM IR, and will therefore aim to start afresh to avoid some of the C++ inherent API designs of go-llvm (e.g. use of iterators rather than slices, use At the same time, the llir/llvm packages may been seen as a low-level interaction library for LLVM IR, and nothing prevents a library to be built on top, which implements the go-llvm API, at least not as far as I can tell. You've already given some thoughts in this direction, as you mentioned the idea of generating a c-shared Go library which implements the LLVM C API (ref #3 (comment)). As you already know, I love this idea, and have created a specific issue to track its future implementation #12 The llir/llvm project is almost ready for its v0.2 release, which will make it go-getable just like any other package :) In preparation for this release, I would be very curious if you have any thoughts on the current API or approach taken by the project. Is there anything you think looks a bit strange API or design wise, or that we may refine someone? If so, now would be an opportune time to give some feedback! Cheerful regards /u & i |
This issue has been mostly superseded by #29, which discusses the overall roadmap of the project, and related requirements of various projects to the implementation and design goals of llir/llvm. Most, if not all of the requirements discussed in this issue have now been implemented in llir/llvm. If there is something specific that is missing, feel free to point it out, either here, or in #29. Closing this issue for now. Cheers |
This issue summarizes the requirements of the LLVM packages, as specified by its intended use cases.
The requirements of llgo as stated by @axw (in this issue) are as follows:
For the decompilation pipeline the
llvm
packages should be able to:The text was updated successfully, but these errors were encountered: