Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cross platform DataTable implementation #210

Closed
aslakhellesoy opened this issue May 21, 2017 · 22 comments
Closed

Cross platform DataTable implementation #210

aslakhellesoy opened this issue May 21, 2017 · 22 comments
Labels

Comments

@aslakhellesoy
Copy link
Contributor

There are several DataTable implementations around, (in order of appearance):

We should take the best parts of all these (slightly divergent) implementations and reimplement it as a standalone (cross platform) library with a shared test suite. This has already been done in Cucumber Expressions, Tag Expressions and of course, Gherkin.

I think this implementation should be decoupled from the Gherkin library - it should not depend on Gherkin's DataTable implementation. Instead, it should be possible to construct a new DataTable object with a simple array of array of strings.

For this new implementation I would like data tables to support the Cucumber Expressions' custom parameter types.

This would allow users to define parameter types once, and use them both in steps and data table headers. For example:

Given some stuff:
  | color | currency |
  | red   | GBP      |
  | green | NOK      |
  | blue  | EUR      |

The corresponding DataTable would then automatically convert columns to the right type.

If you're interested in helping out building this new DataTable put your name below!

@osher
Copy link

osher commented May 22, 2017

im here, my name is Osher
I connect to what you say, but I need you to confirm my understanding.

If I get you right - you want instead of:

     | name     | String | Monster Joe |
     | escapes  | Int    | 7           |
     | address  | String | Balfur 413  |
     | district | String | Sohoy       |
     | city     | String | London      |
     | caught   | Bool   | No          |
     | lastSeen | Date   | 1976-06-07  |

to write:

     | name     | "Monster Joe" |
     | escapes  | 7             |
     | address  | "Balfur 413"  |
     | district | "Sohoy"       |
     | city     | "London"      |
     | caught   | No            |
     | lastSeen | 1976-06-07    |

and assume ParameterType like

//date
{
  regexp: /\d{4}-\d{2}-\d{2}/,
  transformer: Date.parse,
  typeName: 'date'
}

@aslakhellesoy, So far, did I get you right?

My only fear is that it makes the table technical.
Normal people (the target audience for feature files) do not enclose values with quotes, technical people do
Without the quotes, we cannot distinct between "No" that should be interpreted as false, and "No" that should be interpreted as a string.

I can generally see the consistency - we imply that we require them to use quotes in steps - But this is not a standard. I saw many step definitions that extract string values from gherkin lines without requiring them to be enclosed with quotes.

@osher
Copy link

osher commented May 22, 2017

one more question - I need to clarify I understand how ParameterType works:
Does the regexp list of each parameter type applied on the capture, or against the gherkin line?
Can I use captures in the 2nd regexp and expect them in the transformer?

i.e - I speculate order of operation, can you confirm it?

  1. step regexp tried on line
  2. step regexp matches -> delivers captures
  3. for each capture
    • for each ParamType
      • for each ParamType.regexp
        match? yes - use transformer & next capture

Can you point me to the relevant code so I can read it?

@osher
Copy link

osher commented May 22, 2017

About quotes -
Can we can default to string if no other type is recognized, and allow to specify strings explicitly by enclosing with quotes?

i.e - enclosed string will take precedence, and defaulting to string if no other type is matched

What happens if a capture matches two types? do we intend to throw like we do with matching two step defs?

@aslakhellesoy
Copy link
Contributor Author

aslakhellesoy commented Sep 19, 2017

I just had an idea for how to use parameter types with tables. The Cucumber Expression can be followed by a type to be used for transformation. This can default to DataTable, which would produce a regular DataTable object, or it could be Something as long as there is also a parameter type with type = Something. The header would be used to inspect the type, possibly querying about field names and types, so it can transform the individual cells in each row of the body. Each body row then becomes an instance of the Something, with fields set to the transformed values of the cells.

The original discussion was in slack

@osher
Copy link

osher commented Sep 28, 2017

Not fully clear...
perhaps I'm not up to date with your DSL...
do you want me to join you on slack?

@aslakhellesoy
Copy link
Contributor Author

aslakhellesoy commented Sep 28, 2017

I'll try to give an example (in JavaScript):

defineParameterType({
  name: 'ingredient',
  transformer: ({name, amount, unit}) => new Person({name, amount, unit}),
  // no regexp - this parameter type is only used for tables, not cucumber expressions
})

A class:

class Ingredient {
  constructor({name, amount, unit}) {}
}

A data table:

Given the following ingredients:
  | name        | amount | unit  |
  | sugar       |      3 | tbsp  |
  | lemons      |      3 | count |
  | water       |      1 | l     |

A step definition:

Given('the following ingredients:', 'ingredient', function(ingredients) {
  // ingredients is an Array of Ingredient
})

This is different from what I said in the comment above (but more aligned with the end of the linked slack discussion)

@osher
Copy link

osher commented Sep 28, 2017

Well, I'm not aware of that discussion, however - I can see that this solution does not work out of the box - it requires users to define transformers.
Generally it does not make me too happy - but I'll go with the flow, and contribute some comments later.

Before that - mind that we started with a PR that aims to let user define data-type of fields vertically, and we end up discussing defining data-type of a complete object horizontally
cucumber/cucumber-js#752

Last groan here - is that the requirement for explicitness and readability is lost, because the Gherkin spec doc does not provide the data-types - they are inferred.

Anyway - I don't see how one is related to the other - these are two different use-cases - we can support both.

Now - to business 😄
I assume you don't mean

  transformer: ({name, amount, unit}) => new Person({name, amount, unit}),

but

  transformer: ({name, amount, unit}) => new Ingredient({name, amount, unit}),

AFAI understand - here the class Ingedient is an implementation detail of the transformer handler, and not a requirement of any type.
We could just as well say:

defineParameterType({
  name: 'ingredient',
  transformer: ({name, amount, unit}) => ({ name, amount: Number(amount), unit }),
  // no regexp - this parameter type is only used for tables, not cucumber expressions
})

or

defineParameterType({
  name: 'itinerary',
  transformer: ({target, fromDate, toDate, fare, curCode}) => ({ 
    target,
    fromDate: moment(fromDate, 'YYYY-MM-DD'),
    toDate: moment(fromDate, 'YYYY-MM-DD'),
    fare: Number(fare),
    currency: currencies[curCode] || `NaC: ${curCode}`
  }),
  // no regexp - this parameter type is only used for tables, not cucumber expressions
})

I hope I'm right here...
By no mean do I wish to impose users work in paradigms they don't like

...

And that the 2nd argument in

Given('the following ingredients:', 'ingredient', function(ingredients) {

refers to

defineParameterType({
  name: 'ingredient',

which is AFAIK - an API change of a future change.

In that case I would propose to try take the argument name from func.toString() and rid of the API change of additional optional argument.

I also can suggest that if we support optional attributes (like regexp) - we can support an additional optional attribute that will tell how the variable name is given for list - this will allow users cascade a default behavior that assumes 'ingredients' is the plural of 'ingredient' for whatever lingual exception, or whatever user preference (e.g. 'ingredientList').
Heck, it can even be a list of such names, if the user needs more flexibility (less consistency).
We can protect her from using the same name twice with an error on load time.

@osher
Copy link

osher commented Sep 28, 2017

one more last cent:

If we really must go for an api change and introducing an additional argument - let it be a destructurable property-bag, and not a string.
this is more extensible and will let us provide any additional directives we wish to provide for a step.

@akostadinov
Copy link

Instead, it should be possible to construct a new DataTable object with a simple array of array of strings.

FYI in ruby we already can call in World:

my_table = table([["a","b","c"],["v","f","r"]])

@mpkorstanje
Copy link
Contributor

mpkorstanje commented Oct 15, 2017

Some notes for Java.

Currently the ParameterTypeRegistry only contains a mapping of (name|regex): String => Type. To make working with DataTables in Java possible (name): TableRow => Type is also needed. This has already been proposed above but the lack of types doesn't make this obvious.

Without naming a table row transform lambda step definitions should be able to handle any non generic type as their arguments. When declaring a table row transform they must be able to handle any type. Annotation based step definitions should be able to handle all types without naming a table row transform. Though they may still do so as a form disambiguation.

When no TableRow => Type transform can be found the default should be a bean mapper. This would be equivalent to transformer: ({name, amount, unit}) => {name, amount, unit}), in javascript. To do the type conversion of individual properties the bean mapper should use the parameter type registry to look up a String => Type transform.

Java provides several mappings from a DataTable. They're currently all Type based but should also get a name based version.

   <T> T convert(DataTable dataTable, Type type, boolean transposed);
    <T> List<T> toList(DataTable dataTable, Type itemType);
    <T> List<List<T>> toLists(DataTable dataTable, Type itemType);
    <K, V> Map<K, V> toMap(DataTable dataTable, Type keyType, Type valueType);
    <K, V> List<Map<K, V>> toMaps(DataTable dataTable, Type keyType, Type valueType);

To facilitate diffing domain objects against data tables java also provides a transform

    DataTable toTable(List<?> objects, String... columnNames);

This would require that the mappings are extended to include a reverse transform. E.g. (name|regex): (String => Type, Type => String) and (name): (TableRow => Type, Type => TableRow). If none is defined they could default to toString and getting bean properties respectively.

edit:

To avoid creating dependencies between the cucumber-expressions library and the data-table library TableRow should be Map<String,Object>.

@aslakhellesoy
Copy link
Contributor Author

A ParameterType has a type: Type field, so it would be trivial to add a #lookupByType method to ParameterTypeRegistry.

If we do that, the Java API for specifying what Type to use for rows could look like this:

Given("the following ingredients:", Ingredient.class, (List<Ingredient> ingredients) -> {
  // ingredients is an Array of Ingredient
})

Or if the generic type can be inferred:

Given("the following ingredients:", (List<Ingredient> ingredients) -> {
  // ingredients is an Array of Ingredient
})

@mpkorstanje
Copy link
Contributor

mpkorstanje commented Oct 15, 2017

A ParameterType has a type: Type field, so it would be trivial to add a #lookupByType method to ParameterTypeRegistry.

Its type is bound to the transform. So the Transform would have to be Transform<X,T> where X is either String or Map<String,String>. We can enforce that in the constructor but the change becomes less trivial. Not too much effort though.

public ParameterType(String name, String regexp, Class<T> type, Transformer<String, T> transformer) {
    this(name, singletonList(regexp), type, transformer, true, false);
}

public ParameterType(String name, Class<T> type, Transformer<Map<String,String>, T> transformer) {
    this(name, emptyList(), type, transformer, false, false);
}

If we do that, the Java API for specifying what Type to use for rows could look like this:

Given("the following ingredients:", Ingredient.class, (List<Ingredient> ingredients) -> {
  // ingredients is an Array of Ingredient
})

Yes and when there are multiple ways to create ingredients from a data table we can also use the name based version:

Given("the following english ingredients:", "english-ingredients" (List<Ingredient> ingredients) -> {
  // ingredients is an Array of Ingredient
})

Given("the following spanish ingredients:", "spanish-ingredients" (List<Ingredient> ingredients) -> {
  // ingredients is an Array of Ingredient
})

edit: There might not always be a SpanishIngredients type. Just the static Ingredients.spanish() constructor so naming things is still required.

@mpkorstanje
Copy link
Contributor

Overall I am not seeing any disambiguations problems with (name): (TableRow => Type, Type => TableRow).

I am seeing about disambiguation problems with (name|regex): (String => Type, Type => String) For example if I wanted to make a list of BankNotes(color: Colour, currency: ValutaType) out of:

 Given some stuff:
  | color | currency |
  | red   | GBP      |
  | green | NOK      |
  | blue  | EUR      |

without explicitly defining a transform for BankNotes. Then for TableRow => Type I'd use the the bean mapper. Now the bean mapper have to look up String => Type. But there can be multiple transforms registered for Colour and ValutaType.

I suppose I could select the one that matches the regex? And if multiple match, throw an exception?

@aslakhellesoy
Copy link
Contributor Author

See e032eba (branch datatable-spike) for a spike implementation. I just wanted to get a feel for it. Any comments? WDYT @mpkorstanje?

@mpkorstanje
Copy link
Contributor

mpkorstanje commented Oct 15, 2017

The code that is now in asList0 should be in a transform Map<String,String> => Type so in asList0 it might look something like:

Transform<Map<String,String>, String> rowTransformer =  registry.lookupByType(rowtype);
for (Map<String, String> tableRow : rows) { 
    list.add(rowTransformer.transform(tableRow);
     ...
}

It shouldn't be too different from the existing TableConverter.java. Replace xStream with registry.lookupByType(rowtype).transform(tableRow) and we've got most of it.

edit:

The transform should work on a Map<String,String> and not a List<String> so the transform knows the table header of each value. This is required for bean mapping.

@aslakhellesoy
Copy link
Contributor Author

You mean Transformer and not Transform right?

The existing ParameterTypeRegistry#lookup* methods return a ParameterType. Are you saying the new #lookupByType method should not do that and return a Transformer instead?

Could illustrate what you mean by adding some tests and code to my spike branch?

@mpkorstanje
Copy link
Contributor

Sure! Just need some time.

@mpkorstanje
Copy link
Contributor

@stale
Copy link

stale bot commented Dec 15, 2017

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs.

@stale stale bot added the ⌛ stale Will soon be closed by stalebot unless there is activity label Dec 15, 2017
@aslakhellesoy aslakhellesoy added the 🧷 pinned Tells Stalebot not to close this issue label Dec 16, 2017
@stale stale bot removed the ⌛ stale Will soon be closed by stalebot unless there is activity label Dec 16, 2017
@mpkorstanje
Copy link
Contributor

I started the data tables/ cucumber expression spike for cucumber-jvm with the assumption that we could use the cucumber expressions to transform single elements in a data table.

For example:

| a | 12 |
| b | 13 |
| c | 14 |

Should be transformed to a Map<String, Int>

This supposedly could be done by looking up an expression for a type say Integer and use the associated transform to transform the field of the data table. In practice however this will run into disambiguation problems and other unexpected chicanery.

For example looking up String forces us to chose between the expressions {word} and {string} with neither being appropriate. The string need not be a word and may contain escaped characters that {string} will unescape.

And this is just the build in types. With custom types it gets even more interesting. So right now I reckon that using cucumber expressions for data tables is just the wrong idea. We might get it to work by force but conceptually the mapping is wrong.

@mpkorstanje
Copy link
Contributor

mpkorstanje commented Jan 7, 2018

There are three ways in which a table might be converted to an instance of a certain type.

  • Using the whole table to create a single instance.
  • Using individual rows to create a collection of instances. The first row may be used as header.
  • Using individual cells to a create a collection of instances.

The the transformer that handles this conversion can either be explicitly defined by name or implicitly by the target type. When trying to find a transformer by type they should be considered in order from most (convert the whole table) to least (convert individual cells).

Dealing with Lists, Lists of Lists and single instances is fairly straightforward. Search the converters in order and apply its transform. If no transform is found. Look for a list of the type you want to create. If no transform is found look for a list of a list of the type. Give up otherwise.

Dealing with maps requires some additional convention. First lookup a single instance transformer. If none exists then a map consists of a keyset and value set. The key set is created by taking the left column of a data table and transforming it to a list of keys. Note that a single column is also a datatable. The value set is created by taking the other columns and transforming them into a list of values. Note that this is recursive (but only once). These two lists are then paired up into a map.

This leads to the following possibilities:

Scenario: Datatable to single object
  Given the following chessboard:
    | |1|2|3|
    |A|♘| |♝|
    |B| | | |
    |C| |♝| |
  Then I have a board with one knight and two bishops

Scenario: Datatable rows to list of object
  Given the following groceries:
    | name  | price |
    | milk  | 9     |
    | bread | 7     |
    | soap  | 5     |
  Then I have:
  """
  [
    {name: "milk", price: "9"}, 
    {name: "bread", price: "7"}, 
    {name: "soap", price: "5"}
  ]
  """

Scenario: Datatable to list of lists of objects
  Given the following groceries:
    | name  | price |
    | milk  | 9     |
    | bread | 7     |
    | soap  | 5     |
  Then I have:
  """
  [
    ["name", 9"],
    ["milk", "7"],
    ["bread", "5"],
  ]
  """

Scenario: Datatable rows to map
  Given the priorities for groceries:
    | low    |  milk | 
    | medium | bread |
    | high   | soap  | 
  Then I have:
  """
  {
    low:  "milk", 
    medium: "bread", 
    high: "soap"
  ]
  """

Scenario: Datatable rows to map of maps
  Given the following groceries:
    |        | name  | price |
    | low    |  milk | 9     |
    | medium | bread | 7     |
    | high   | soap  | 5     |
  Then I have:
  """
  {
    low:  {name: "milk", price: "9"}, 
    medium: {name: "bread", price: "7"}, 
    high: {name: "soap", price: "5"}
  ]
  """

Scenario: Datatable rows to map of lists
  Given the following groceries:
    | jones   |  milk | bread  | soap |
    | smith   | bread | milk   |      | 
    | jackson | soap  |        |      |
  Then I have:
  """
  {
    jones:  ["milk", "bread", "soap" ],
    smith:   ["bread", "milk" ],
    jackson: ["soap" ], 
  ]
  """


Scenario: Datatable rows to map of values
  Given the following groceries:
    | jones   |  milk | bread  | soap |
    | smith   | bread | milk   |      | 
    | jackson | soap  |        |      |
  Then I have:
  """
  {
    jones:  Grocceries  ["milk", "bread", "soap" ],
    smith:   Grocceries ["bread", "milk" ],
    jackson: Grocceries ["soap" ], 
  ]
  """

But I might have missed a few

@aslakhellesoy
Copy link
Contributor Author

I would have loved to see this, but I'm closing this issue.

We're establishing a new issue triage process where we prioritise the oldest issues first.

Given the choice to work on this next or close it, I'm closing it because it's too big. If anyone feels strongly about this, please submit one or more pull requests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants