Markdown support (alternative to Gherkin) #1209

Merged
merged 77 commits into from
May 14, 2021

Contributor

@aslakhellesoy aslakhellesoy commented Oct 4, 2020

Summary

This PR adds support for Markdown as an alternative to Gherkin

Details

The Gherkin lexer/tokenizer has been modified to recognise Markdown. This has been done by adding a new dialect named md.

The idea is that the rest of the toolchain remains mostly unchanged:

  • The parser is the same
  • Cucumber is the same (except it needs to modify the glob logic to load both **/*.feature and **/*.md)
  • Formatters are mostly the same
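The glob change mentioned in the list above can be sketched roughly as follows. This is only an illustration in Python, not the code from this PR: the names `SPEC_SUFFIXES`, `is_spec_path` and `find_spec_files` are hypothetical, and only the two extensions come from the description.

```python
from pathlib import Path

# Extensions taken from the PR description; everything else here is an
# assumption made for illustration.
SPEC_SUFFIXES = {".feature", ".md"}


def is_spec_path(path):
    """Return True if the path looks like a Gherkin or Markdown spec file."""
    return Path(path).suffix in SPEC_SUFFIXES


def find_spec_files(root):
    """Recursively collect *.feature and *.md files under root, sorted by path."""
    return sorted(
        p for p in Path(root).rglob("*") if p.is_file() and is_spec_path(p)
    )
```

Keeping the suffix check separate from the directory walk makes it easy to apply the same rule to paths given explicitly on the command line.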

The biggest change will be in the HTML formatter - or more specifically in @cucumber/react. It needs a whole new way to render documents:

  • Use a Markdown library to render the source instead of our own custom React components to render a GherkinDocument AST.
  • Decorate the rendered Markdown DOM with results, attachments etc from other messages

Motivation and Context

Adding prose, diagrams and other rich markup to Gherkin documents is cumbersome at best.

Although the Gherkin grammar doesn't make it explicit, you can put anything in the description section of a Feature, Scenario, etc., and some formatters (such as @cucumber/react / html formatter) will process this as Markdown.

In other words, it's possible to put small snippets of Markdown inside a Gherkin document.

This isn't how people work with Markdown. If you want to use Markdown, it's much more natural if the entire document is Markdown. This gives you more flexibility to write a readable document.

We use existing Markdown constructs to recognise scenarios:

  • ## is a Scenario
  • ### is Examples
  • * (list item) is a step (Given, When, Then)
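As a rough illustration of the three rules above, a line-by-line classifier could look like the sketch below. It is written in Python for brevity and is an assumption about the general shape, not the token matcher added in this PR; `classify_line` and the returned labels are hypothetical names.

```python
def classify_line(line):
    """Map one Markdown line to a Gherkin-like token type.

    Assumed rules, taken from the list above:
      "## "  starts a Scenario
      "### " starts Examples
      "* "   (a list item) is a step
    Anything else is treated as plain prose.
    """
    stripped = line.strip()
    if stripped.startswith("### "):
        return "Examples"
    if stripped.startswith("## "):
        return "Scenario"
    if stripped.startswith("* "):
        return "Step"
    return "Other"
```

For example, `classify_line("## Withdraw cash")` returns `"Scenario"`, while a level 1 heading or a paragraph falls through to `"Other"`.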

Types of changes

  • Bug fix (non-breaking change which fixes an issue).
  • New feature (non-breaking change which adds functionality).
  • Breaking change (fix or feature that would cause existing functionality to not work as expected).

Checklist:

  • The change has been ported to Java.
  • The change has been ported to Ruby.
  • The change has been ported to JavaScript.
  • The change has been ported to Go.
  • The change has been ported to .NET.
  • I've added tests for my code.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have updated the CHANGELOG accordingly.

@aslakhellesoy
Contributor Author

aslakhellesoy commented Oct 4, 2020

Here is an example of IntelliJ IDEA running Cucumber which executes a Markdown document! - cucumber/cucumber-jvm#2140

image

@mpkorstanje
Contributor

mpkorstanje commented Oct 4, 2020

What happens if I put more information in markdown document than just scenarios? Like for example if I were to have scenarios at a different heading than h2? Or if I were to have documentation at a scenario heading. Or if I were to have a folder of .md files some with scenarios, some with just documentation.

An example of all these at once:

Hello world
===========

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
consequat.

## The world is round

Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu
fugiat nulla pariatur.

### Scenario: Something about math
 * step one
 * step two
 * step three

### Scenario: Something about gravity
 * step one
 * step two
 * step three

## The world is wet

Excepteur sint occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim id est laborum.

### Scenario: Something about chemistry
 * step one
 * step two
 * step three

### Scenario: Something about weather
 * step one
 * step two
 * step three

## Recommended accommodations

Vivamus eget magna eros. Mauris feugiat elit a lectus vulputate eleifend
a sed nulla. 

### If you are adventurous

Nunc sed auctor sem. Quisque vitae ligula commodo quam
vehicula mollis ut at magna. 

### Only in 2020

Sed semper feugiat turpis in vulputate. Praesent varius
leo a enim sollicitudin lobortis. Nam a gravida ex. Nulla
interdum orci purus, vitae pellentesque lorem ultricies ac. 

## Tourist attractions

 Pellentesque efficitur turpis lorem, in tempor metus varius vitae. Proin
lectus dolor, luctus eget pulvinar vitae, commodo nec sapien. Donec
mattis quis mi sit amet auctor.

With that in mind I would strongly recommend a separate parser for Gherkin in Markdown. It will make evolving and experimenting with the parser easier. It also removes some of the edge cases that are leaking out of the parser abstraction and into other parts right now (e.g. the json file not being usable to generate annotations).

@aslakhellesoy
Contributor Author

What happens if I put more information in markdown document than just scenarios?

These lines would just be ignored by the scanner. Probably marked as Empty.

Like for example if I were to have scenarios at a different heading than h2?

I've added GHERKIN_MARKDOWN.md which describes the proposed syntax in more detail. Scenarios would only be recognised with ## or possibly ### if the ## above is interpreted as a rule.

Or if I were to have a folder of .md files some with scenarios, some with just documentation.

I'm not sure. If the user doesn't specify what files to include, we could just parse them all. The "documentation" ones might end up having a lot of "undefined" scenarios. If that becomes a hassle we could come up with a more restrictive syntax to reduce the likelihood of this happening.

With that in mind I would strongly recommend a separate parser for Gherkin in Markdown.

Maybe. I think it's too early to make this decision. It's an investment I would like to defer until we have more data. If we can make this work without writing a new parser I would prefer that.

@mpkorstanje
Contributor

mpkorstanje commented Oct 5, 2020

There were two ideas that underpinned my questions. I think they've both gone unaddressed.

  1. The proposed markdown syntax is very inflexible. Why should I define my scenarios always at h2? I'm writing a document that contains features, not a feature file that happens to look like markdown. There seems to be no point in using markdown otherwise.

  2. The implementation is a feature file with different keywords that happen to intersect with markdown but isn't actually markdown. The Gherkin markdown parser should accept all valid markdown documents but only provide pickles if there are indeed scenarios contained within. Currently the parser will reject anything that isn't structured like a feature file.

@aslakhellesoy
Contributor Author

Why should I define my scenarios always at h2

I think it's important that users understand what's regarded as a scenario. The parsing rules must be easy to retain and understand for humans. If we make this flexible, as you seem to suggest, I think it will be harder for people to retain and understand the parsing rules.

Gauge is a tool inspired by Cucumber that has supported Markdown from the start. It's not as popular as Cucumber, but it seems to have a healthy user base. I take this as a sign that the Markdown syntax they have settled for works for end users, and that it would be safe for us to adopt a similar (or perhaps identical) syntax. Gauge defines a scenario as an h2.

In order to support the alternative syntax for h2 (text underlined with -----) we'd need a more advanced parser. I'd like to try with a 3rd party commonmark parser when/if we decide to add support for this. But as I said above, I think we can defer this until we have more feedback.
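To make the difficulty concrete: an underlined h2 can only be recognised by looking at the following line, which a matcher that sees one line at a time cannot do. The Python sketch below uses one line of lookahead; it is a deliberately simplified illustration (it ignores multi-line paragraphs, list items and other CommonMark edge cases), and `find_h2_titles` is a hypothetical name.

```python
import re

# "## Title" style heading (exactly two hashes followed by whitespace).
ATX_H2 = re.compile(r"^ {0,3}##\s+(.*?)\s*$")
# A line made only of dashes, which turns the previous line into an h2.
SETEXT_H2_UNDERLINE = re.compile(r"^ {0,3}-+\s*$")


def find_h2_titles(markdown):
    """Return the titles of h2 headings written in either style."""
    lines = markdown.splitlines()
    titles = []
    for i, line in enumerate(lines):
        atx = ATX_H2.match(line)
        if atx:
            titles.append(atx.group(1))
            continue
        # Lookahead: the heading is only known once the next line is seen.
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        if (
            line.strip()
            and not SETEXT_H2_UNDERLINE.match(line)
            and SETEXT_H2_UNDERLINE.match(next_line)
        ):
            titles.append(line.strip())
    return titles
```

A full CommonMark parser handles the cases this sketch skips, which is the argument for using a third-party library if this syntax is ever supported.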

Currently the parser will reject anything that isn't structured like a feature file.

Yes, I think we need to improve the Markdown parser to be more tolerant of documents that don't use the Gherkin structure.

- `* {Keyword}` - Given, When, Then, And and But
- `|` - Tables (DataTable and ExamplesTable)
- `\`\`\`` - DocString
- `>` - prefix for @tags

The quotes may not be very appropriate for tags; they may be useful for descriptions of features or scenarios.

Please consider allowing tags in HTML comments, especially for features, e.g.

<!-- @tag @wip -->
# Feature

<!-- @foo -->
## Scenario

Contributor Author

Using > for descriptions of scenarios and features wouldn't be necessary. You'd just use normal paragraphs for that:

## This is a scenario
This is the description
over a few lines...

* a step
* another step

Are there other reasons we want to consider something else than > to prefix tags?

If we put tags inside HTML comments, they won't be rendered once the Markdown is converted to HTML, and I think most users would expect them to be rendered.

Sorry, maybe I was not very clear because of my bad English. I did not mean > as a marker for the feature description. I mean that > is a quote in Markdown, and I imagine I would want to add some quotes to the document that describes a feature.

If we put tags inside HTML comments, they won't be rendered once the Markdown is converted to HTML, and I think most users would expect them to be rendered.

Yeah, this is what I also came to after thinking more about my proposal.

In general I think > is fine (if the parser looks for tags that start with @ inside the quotes).

The only thing that is unclear to me: is the parser going to look for tags before # Feature or after? I think putting quotes before a level 1 header is not how we usually write documents.

Contributor Author

Yeah, I think it would look better if the tags came after the heading. However, that changes the grammar a bit and might complicate things.

This could work:

# @foo @bar
# Feature: Hello
According to research:
> Water boils at 200C

## @zap
## Scenario: Hello

It admittedly looks a bit weird in Markdown and the default rendering, but we could style it so that # @tag headers are rendered more like small tags above the real header.
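A minimal sketch of how such tag-only headers could be told apart from real headers, assuming a tag is any whitespace-separated word starting with @. This is Python for illustration only; `TAG_HEADING` and `heading_tags` are hypothetical names, not part of the Gherkin code base.

```python
import re

# A heading whose entire text consists of one or more @tags.
TAG_HEADING = re.compile(r"^#+\s+((?:@\S+\s*)+)$")


def heading_tags(line):
    """Return the tags on a tag-only heading line, or [] for any other line."""
    match = TAG_HEADING.match(line.strip())
    return match.group(1).split() if match else []
```

With this, `# @foo @bar` yields `["@foo", "@bar"]`, while `# Feature: Hello` yields an empty list and would be handled as a normal header.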

@mpkorstanje
Contributor

mpkorstanje commented Oct 5, 2020

If we make this flexible, as you seem to suggest, I think it will be harder for people to retain and understand the parsing rules.
I take this as a sign that the Markdown syntax they have settled for works for end users, and that it would be safe for us to adopt a similar (or perhaps identical) syntax.

I'm expecting it to be flexible because it is called Markdown. The example is meant to illustrate that. Gauge, on the other hand, calls its specifications "Gauge specifications" and uses the .spec extension with a syntax similar to Markdown (but not actually Markdown). This avoids the problems of mixing Markdown and specifications, and it manages expectations.

@aslakhellesoy
Contributor Author

One idea is to define scenarios like this:

Bla bla

# Feature: Addition

Bla bla

## Scenario: 2+3

Bla bla

* Given I have entered 2

Bla bla

People could use any number of #; we would match based on the Scenario keyword. This also allows internationalising the Markdown parsers.

I think this kind of “overlaying” Gherkin on top of Markdown might work better.

It addresses the concern you had about pure documentation documents.

It also makes it easier for authors and consumers to spot which parts of the documentation are executable.

I also think this would be easy to implement.
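A rough sketch of this overlay idea, in Python for illustration: headings of any level are matched by keyword, list items are matched by step keyword, and everything else is ignored. The keyword lists are hard-coded English assumptions here (a real implementation would presumably take them from the Gherkin dialect), and `overlay` is a hypothetical name.

```python
import re

# Assumed English keywords; any number of leading # characters is accepted.
HEADING = re.compile(r"^#+\s+(Feature|Rule|Scenario|Examples):\s*(.*)$")
STEP = re.compile(r"^\*\s+(Given|When|Then|And|But)\s+(.*)$")


def overlay(markdown):
    """Extract (type, text) pairs for Gherkin constructs; skip all other lines."""
    tokens = []
    for line in markdown.splitlines():
        stripped = line.strip()
        heading = HEADING.match(stripped)
        if heading:
            tokens.append((heading.group(1), heading.group(2)))
            continue
        step = STEP.match(stripped)
        if step:
            tokens.append(("Step", step.group(1) + " " + step.group(2)))
    return tokens
```

A document that contains only prose produces an empty list, so pure documentation files would yield no scenarios instead of being rejected.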

Contributor Author

@aslakhellesoy aslakhellesoy left a comment

Reviewed with @aurelien-reeves and @mattwynne

  • Call it "Markdown with Gherkin" (MdG)
    • Rename token matcher
  • Document why we're not matching dialect (it has to be set globally by Cucumber; cucumber-js has a command line option for this).
  • Document that JSON formatters won't have descriptions for Markdown documents.
    • Tell people who want it that JSON formatter is in maintenance mode - use message format instead.
    • Maybe add descriptions to AST, it might be easy...

@aslakhellesoy aslakhellesoy merged commit 763170c into master May 14, 2021
@aslakhellesoy aslakhellesoy deleted the markdown branch May 14, 2021 12:56
Labels
🥒 core team Candidate for going onto the Cucumber Open Board: https://github.com/orgs/cucumber/projects/8 library: gherkin type: feature
9 participants