Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gherkin: Delegate to Gherkin-Go prebuilt binaries #425

Closed
15 of 45 tasks
aslakhellesoy opened this issue Jul 3, 2018 · 48 comments
Closed
15 of 45 tasks

Gherkin: Delegate to Gherkin-Go prebuilt binaries #425

aslakhellesoy opened this issue Jul 3, 2018 · 48 comments

Comments

@aslakhellesoy
Copy link
Contributor

aslakhellesoy commented Jul 3, 2018

The Gherkin parser is currently implemented in too many languages. It's too much to maintain. Each language implementation should spawn a child process (gherkin-go), read its STDOUT and deserialise the output to language-native objects using Protocol Buffers.

If you want to have a go at implementing this for one of the languages below (or a language that isn't listed here), look at #424, which implements this for Java.

  • Build messages library
    • Java
    • Ruby
    • JavaScript
    • .NET
    • OCaml
    • Python
    • Perl
    • ObjC/Swift
  • Implement "magic file" which extracts binaries from package and/or downloads them from S3
    • Java
    • Ruby
    • JavaScript
    • .NET
    • OCaml
    • Python
    • Perl
    • ObjC/Swift
  • Read and deserialise messages from gherkin-go
    • Java
    • Ruby
    • JavaScript
    • .NET
    • OCaml
    • Python
    • Perl
    • ObjC/Swift
  • Delete the Gherkin parser (code, update Makefile, .rsync)
    • Java
    • Ruby
    • JavaScript
    • .NET
    • OCaml
    • Python
    • Perl
    • ObjC/Swift
  • Pass all the tests :-) (make)
    • Java
    • Ruby
    • JavaScript
    • .NET
    • OCaml
    • Python
    • Perl
    • ObjC/Swift
@aslakhellesoy
Copy link
Contributor Author

aslakhellesoy commented Jul 3, 2018

FWIW there are only 348 SLOC left in gherkin-java after I deleted 10K SLOC.

What @cucumber/committers want to help out with the remaining implementations? Nothing is more satisfying than deleting code :-)

Also note that gherkin now supports a new Rule keyword (#416), so if you want that to be available in the next version of cucumber-X - this has to be done.

@d-led
Copy link

d-led commented Jul 3, 2018

the bullet points are editable. Is this by design?

what about platforms where there are no gherkin-go libraries? E.g. as there's no viable Go compiler, is it out of scope?

@aslakhellesoy
Copy link
Contributor Author

Go's toolchain can cross-compile binaries for most platforms, from any OS that has a Go compiler. We're currently doing that as part of the CI process. This is what we're currently building:

gherkin-darwin-386
gherkin-darwin-amd64
gherkin-freebsd-386
gherkin-freebsd-amd64
gherkin-freebsd-arm
gherkin-linux-386
gherkin-linux-amd64
gherkin-linux-arm
gherkin-linux-mips
gherkin-linux-mips64
gherkin-linux-mips64le
gherkin-linux-mipsle
gherkin-linux-s390x
gherkin-netbsd-386
gherkin-netbsd-amd64
gherkin-netbsd-arm
gherkin-openbsd-386
gherkin-openbsd-amd64
gherkin-windows-386.exe
gherkin-windows-amd64.exe

The CI process also uploads them to S3, where they are publicly accessible. Try this:

curl https://s3.eu-west-2.amazonaws.com/io.cucumber/gherkin-go/master/gherkin-darwin-amd64 -o gherkin-darwin-amd64
chmod +x gherkin-darwin-amd64
./gherkin-darwin-amd64 --json features/*.feature

If there are additional platforms we need to support I'm sure we can make it work for most of them.

@aslakhellesoy
Copy link
Contributor Author

the bullet points are editable. Is this by design?

Yes, but only for committers. See https://blog.github.com/2013-01-09-task-lists-in-gfm-issues-pulls-comments/

@muggenhor
Copy link

Go's toolchain can cross-compile binaries for most platforms, from any OS that has a Go compiler. We're currently doing that as part of the CI process. This is what we're currently building:

This is not relevant for me at work yet, but we're targeting aarch64 (aka arm64) and I'm expecting to see more of that in the future, not less. So adding that as a target CPU seems desirable.

@aslakhellesoy
Copy link
Contributor Author

@muggenhor PR for that would be much appreciated

@gasparnagy
Copy link
Member

As I have mentioned earlier, having a process-execution-based parser is great for maintainability, but there are other concerns (performance, security, working offline) that make still room for native parsers. As far as I understand now the decision has been made that this repo will contain the go wrappers. Is there any advice on how/where the native parsers should be maintained (if the community wishes to do that of course)? Shall we host them under the "cucumber" organization (e.g. cucumber/gherkin-dotnet-native) or shall we move it to other organizations/users. I would keep it within cucumber for more consistency. With @SabotageAndi, we are committed to keep maintaining the .NET native parser, and personally, I would like to see a native javascript parser too (I yet have to see how I can contribute to that).

@aslakhellesoy
Copy link
Contributor Author

Regarding performance - I haven't done any throrough benchmarking, but here is a small one:

# Gherkin-Java 5.1.0
time ./bin/gherkin testdata/good/*.feature > /dev/null

real	0m0.299s
user	0m0.367s
sys	0m0.068s

# Gherkin-Java 6.0.0-SNAPSHOT (master)
time ./bin/gherkin testdata/good/*.feature > /dev/null

real	0m0.313s
user	0m0.388s
sys	0m0.065s

For Java there is no noticeable degradation in performance. The go binary is very fast:

# Gherkin-Go (master)
time ./bin/gherkin testdata/good/*.feature > /dev/null

real	0m0.021s
user	0m0.007s
sys	0m0.006s

Regarding working offline: The jar currently bundles binaries for windows/linux/osx + amd64/x86_32. It only tries to download binaries from S3 for other platforms. We should probably bundle all the binaries so we never have to go online. It would increase the size of the jar a little, but disk is cheap.

@gasparnagy I hope this adresses your concerns about performance and the ability to work offline.

Let's focus on security issues next and see where that brings us. What securtity concerns do you have?

@l3pp4rd
Copy link
Member

l3pp4rd commented Jul 3, 2018

have you considered using upx for go binary compression? UPX may decrease the executable startup performance by a tiny fraction, but may reduce the binary size by more than a half, I was successfully using it for more than a year, there was one moment when it was producing a failing binary, but that was an issue on go end.

@aslakhellesoy
Copy link
Contributor Author

Thanks for the upx pointer l3pp4rd - sounds like a good idea. Do you want to tweak the build to use it?

@l3pp4rd
Copy link
Member

l3pp4rd commented Jul 3, 2018

in a day or two I will be able to, if it can wait. I have a baby now, who changed my free time dramatically :) that is why I'm not that frequent contributor. but whenever I can make a small thing, I will be happy to.

@aslakhellesoy
Copy link
Contributor Author

Congratulations on the baby @l3pp4rd!!!!! I don't think upx is urgent at all, take your time :-)

@mxygem
Copy link
Member

mxygem commented Jul 4, 2018

Pass all the tests :-)

Are we including the Windows tests for Ruby, too ;)

@aslakhellesoy
Copy link
Contributor Author

I’m referring to the shared approval tests for gherkin - comparing output with files in ‘testdata’.

@mxygem
Copy link
Member

mxygem commented Jul 4, 2018

Ah, yeah. Makes sense!

@mxygem
Copy link
Member

mxygem commented Jul 4, 2018

−8,020 in my PR for the Ruby stuff so far. :)

@aslakhellesoy
Copy link
Contributor Author

aslakhellesoy commented Nov 9, 2018

@SabotageAndi the security issues are going to be a concern. See this thread: https://groups.google.com/d/msg/cukes/ZYDs13qoVTY/OtvCvl_fBQAJ

I think we're at an impasse here.

  • We need to reduce the amount of duplicated development effort
  • Bundling cross-compiled executables will be blocked in many IT departments

I'd like to explore another avenue. Installing the executable separately. This is how WebDriver works. The various native language implementations are just thin shims that talk to a daemon running locally (chromedriver, geckodriver etc). This has to be installed separately.

Another option is to write the shared code in a language that can be compiled to many targets, such as Haxe. I don't think it's a viable option though. Few people know Haxe, so maintenance would be hard.

What do you think @charlierudolph @brasmusson @SabotageAndi @gasparnagy @mattwynne?

@mpkorstanje
Copy link
Contributor

Installing the executable separately. This is how WebDriver works. The various native language implementations are just thin shims that talk to a daemon running locally (chromedriver, geckodriver etc). This has to be installed separately.

This makes it hard to keep projects selfcontained. It'd require some maven/gradle plugin to fetch the executable as part of the build. E.g: https://github.com/Ardesco/selenium-standalone-server-plugin

This solution would be less then perfect though. Most build systems in the java ecosystem can only download artifacts through a proxy/cache like Nexus.

@aslakhellesoy
Copy link
Contributor Author

Do you have any suggestions @mpkorstanje?

@mpkorstanje
Copy link
Contributor

I think there are exactly four options: transpile, interpret, generate and delegate. And we're evaluating all these options against the same set of constraints: should be usable and should minimize effort.

So far we've found generation and delegation to be lacking. Only interpretation and transpilation have not yet been tested.

You've added an ease of maintenance requirement. But I don't Haxe is going to any harder then it already is. We're already dealing with a parser generators for a relatively nice language. If there were more people then we wouldn't be trying to reduce the duplication of effort in the first place. So depending on how well it transpiles Haxe would be an option.

As for interpretation: I couldn't find a JavaScript interpreter for most languages.

@aslakhellesoy
Copy link
Contributor Author

By delegation I assume you mean delegating to a binary executable, communicating over streams (what's in gherkin 6).

There are some problems with our current implementation:

  • Downloading (some malware/virus scanners don't allow downloading of binary executables)
  • Execution (some OSes are configured to only allow execution of "approved")

If we can address these problems, I don't think delegation is going to be a significant problem.

I propose we stop bundling the executables inside the platform-specific packages. We'll refactor the native code so it can use either of the following strategies for talking to the executable:

  • Over a socket (assuming it's already running)
  • Over STDIN/STDOUT (it would be launched just like now)

Users would have to download the binary executable in addition to installing the native package. We would need to build version negotiation into the protocol - the executable and the native package may drift out of sync.

This is the model WebDriver uses. It's a bit clunky, but it works. I can't think of a better strategy right now.

@aslakhellesoy
Copy link
Contributor Author

Just adding for the record that transpilation is not an option. I don’t want to build this new infrastructure and exclude all the platforms that are unsupported by the transpiler.

@mpkorstanje
Copy link
Contributor

I can't think of a better strategy right now.

Me neither. But I don't think it's a winning one.

@mattwynne
Copy link
Member

Would it be possible to use the shared code as libraries rather than executables? Could the Go code be compiled into something that each platform could use as a library instead?

@aslakhellesoy
Copy link
Contributor Author

It’s possible to compile Go code to shared libraries.

We’d still need to bundle them, so bundle sizes would be similar. Not sure if it would shut up malware detectors.

It might be worth a try!

@mpkorstanje
Copy link
Contributor

mpkorstanje commented Nov 10, 2018

@cyocum
Copy link
Contributor

cyocum commented Nov 10, 2018

I have been very busy lately but I have picked up on this as I have seen that languages that have not moved to the new structure will be removed in January. Honestly, I have little time right now to get Cucumber.ml into this structure. Mostly, and I hate to admit this, but I do not understand how to get the protobuf messages information from the monorepo to my repo. I have tried twice now and I am, at this point, just avoiding this because I do not have the time to figure it out on my own. I will need someone who has time to hand-hold me through the process because it seems out of my ability. If someone has the time and patience to help me, I would appreciate it but right now I do not have the time as I have another project that has my time right now. I am hoping to get it out of the way but I will need a long paring session to understand what exactly I need to do to get information that my project needs from the monorepo into mine to avoid deletion.

@brasmusson
Copy link
Contributor

I think we should allow for a combination of approaches. The standardized test suite (feature files and corresponding golden json files) has been very successful, enabling us to growing 8 different parser implementations fulfilling the same gherkin language specification. Whatever we do we must not loose this ability. Side note: I think json golden files are better for qualifying a parser implementation than protocol buffer golden files, since it requires less of the parser implementation to show compliance to the gherkin language specification.

For the sake of argument, lets assume that we come up with an approach that works for Java, Javascript and Ruby (that is gherkin parsers to be used in Cucumber-JVM, Cucumber-JS and Cucumber-Ruby), but that approach is not suitable for the .NET environment. Then the gherkin parsers for Java, Javascript and Ruby can be developed and released together (using the mono-repo), and the gherkin for .NET can be developed and released separately still qualified for compliance to the gherkin language specification, and still hosted in the cucumber github project. The hosting in the cucucmber github project should probably require the used of the berp parser generated so that the is some consistency of implementation between the gherkin parsers hoisted here, even though not all are developed and released together.

This is similar to that not all Cucumber-JVM modules are developed and released together anymore. The non-Java language modules are released separately. Similar to a issue discussed here, this change was done to reduce the maintenance burden for the Cucumber-JVM core maintainers.

@SabotageAndi
Copy link
Contributor

@aslakhellesoy

I'd like to explore another avenue. Installing the executable separately. This is how WebDriver works. The various native language implementations are just thin shims that talk to a daemon running locally (chromedriver, geckodriver etc). This has to be installed separately.

On .NET, Selenium and all of the webdrivers are distributed by packages. See: https://www.nuget.org/packages?q=selenium
And this is good that way. No idea how I could explain my users, that they have to install an additional tool, when you want to use some library.

Additionally, you can then only have one version installed on your system. There will be breaking changes in the IPC.
And how to you install it on third party hosted build agents (as the AzureDevOps/VSTS hosted pipelines are)?

So having a separate binary to install is also not an option in .NET.

As @mpkorstanje wrote in #425 (comment) I think it is essential, that projects are self-contained. It is really annoying for me, when I have to install a long list of additional tools/libraries to get a project to compile.

@brasmusson Don't we have this golden files here: https://github.com/cucumber/cucumber/tree/master/gherkin/testdata

About shared libraries:

AFAIK they should work on .NET Standard. I have to do some experiments. But native stuff in a .NET program makes always problems. And the size increase of the package is still there. I am still not sold, that this would be good for .NET.

@brasmusson
Copy link
Contributor

@SabotageAndi Yes, we have the golden files, my point is just that we shall make sure not to loose them, even if we move away form that all gherkin parsers are stored in the cucumber/cucumber (mono-repo) project, and are released together.

@laeubi
Copy link

laeubi commented Nov 30, 2018

there are only 348 SLOC left in gherkin-java after I deleted 10K SLOC

But in contrast, the jar blows up from 342 kb to 45 MB(!), I understand that it is hard to support many parsers, but is it really the way to go to package such large binaries instead? Won'T it be better to write the parser in some kind of Parser-Language and the generate code for each language from that?
Also spawning new processes might degrade performance compared to a "native" parser.
This really hurts currently the cucumber-eclipse where I try to integrate the new parser, where parsing is used on several places to support the user, now parsind would include to spawn a new go-process for each parsing request.

@mpkorstanje
Copy link
Contributor

This really hurts currently the cucumber-eclipse where I try to integrate the new parser, where parsing is used on several places to support the user, now parsind would include to spawn a new go-process for each parsing request.

Unrelated but I don't think it's worth switching to Gherkin 6 just yet for the plugin. Cucumber JVM also can't use it.

@srvance
Copy link

srvance commented Dec 23, 2018

I'm looking to implement Gherkin compatibility in a new test tool and need to be able to turn feature specs to pickles in a browser. Delegating to an external Go process clearly won't work and I don't want the overhead of having to call a back end API for the functionality.

  1. The gherkin README still documents library use, which no longer seems to exist.
  2. What version has the older library code?

@stale
Copy link

stale bot commented Feb 21, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs.

@stale stale bot added the ⌛ stale Will soon be closed by stalebot unless there is activity label Feb 21, 2019
@mxygem mxygem added 🧷 pinned Tells Stalebot not to close this issue and removed ⌛ stale Will soon be closed by stalebot unless there is activity labels Feb 21, 2019
@mxygem
Copy link
Member

mxygem commented Feb 21, 2019

Removed stale and added Slow Burner to keep things open for now. Consider this particular issue on hold at least until Cukenfest where there'll hopefully be some more discussions about how to address things.

Sorry folks, but thank you for the patience!

@Ruschnik
Copy link

McAfee reported an “Artemis!ACC6568E95DA” regarding artifact .m2\repository\io\cucumber\gherkin\6.0.14\gherkin-6.0.14.jar\gherkin-go-windows-386.exe.
Because of that we downloaded gherkin-6.0.14.jar from maven central and tested it with McAfee and also here McAfee reported the same threat.

Is there anything wrong with the file regarding viruses or is it a false positive?

@mpkorstanje
Copy link
Contributor

The Gherkin parser is currently implemented in too many languages. It's too much to maintain. Each language implementation should spawn a child process (gherkin-go), read its STDOUT and deserialise the output to language-native objects using Protocol Buffers.

Informally we have decided against continuing this approach and the latest versions of Gherkin no longer include the go executables.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment