Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initialise debug_cb{,_data} in jq_init(); Populate captures also for zero-width matches; In the manual, define examples in the "examples" field not in "example" #2724

Merged
merged 3 commits into from
Jul 18, 2023

Conversation

emanuele6
Copy link
Member

Only examples defined in the "examples" field are generated, if they are defined in "example", they are just ignored.

Ref: #2547 (comment)

Screenshots from https://jqlang.github.io/jq/manual:
file
file

@emanuele6
Copy link
Member Author

A few of the manonig.test tests that were not being generated fail/leak


- program: 'sub("[^a-z]*(?<x>[a-z]*)"; "Z\(.x)"; "g")'
examples:
- program: 'sub("[^a-z]*(?<x>[a-z]+)"; "Z\(.x)"; "g")'
Copy link
Member Author

@emanuele6 emanuele6 Jul 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to use + instead of * as a consequece of #2641

EDIT: actually, since the input is "123abc456def", and even if it were "123abc456def7", it should have output "ZabcZdefZ", not "ZabcZdefZnull"; this should not be necessary.
It seems to be a bug in _match_impl/3, if the pattern is able to match the empty string, it always matches the empty string at the end of the input:

$ ./jq -n '"1" | _match_impl("1*"; "g"; false)'
[
  {
    "offset": 0,
    "length": 1,
    "string": "1",
    "captures": []
  },
  {
    "offset": 1,
    "length": 0,
    "string": "",
    "captures": []
  }
]

The reason why this didn't work is that #2677 introduced a breaking change: If a pattern that is able to match the empty string has a match that goes up to the end of string, both the match and the empty string at the end of the string are matched (like for javascript's .replaceAll(), perl's s///g, and python's re.sub()) instead of matching only the match and not the empty string at the end that was jq's old behaviour (similar to vim's and sed's s///g, awk's gsub()).

The reson why "123foo" | sub("[^a-z]*(?<x>[a-z]+)"; "Z\(.x)"; "g") outputs "ZfooZnull" instead of "ZfooZ" is that match/2 currently does not populate the captures array for zero length matches, so for the empty string at the end of "123foo", "Z\(.x)" runs with an empty object as input.
I have now fixed this problem in one the commits in this PR; the captures array is also populated for empty matches.

@pkoppstein
Copy link
Contributor

Hopefully someone can tweak things so that both "example:" and "examples:" are recognized.
That would make things much easier, not least for those who might only want to give one example :-)

@emanuele6
Copy link
Member Author

Hopefully someone can tweak things so that both "example:" and "examples:" are recognized.
That would make things much easier, not least for those who might only want to give one example :-)

I don't see the point of doing that.

@pkoppstein
Copy link
Contributor

I don't see the point of doing that

Oh.

  1. If there's only one example, then the heading should not be "examples:". For one thing, it's misleading; for another, it's just wrong.
  2. If "examples:" is required, then writers who are not aware of the requirement will undoubtedly (as they have done) write "example:" when there is only one example, thus introducing grief, possibly for many people.
  3. If "examples:" is required, then writers who are aware of the requirement and who want to avoid being misleading will write more than one example, even if one example suffices. This leads to bloat and may well be annoying to more than just the aforesaid writer.

@emanuele6
Copy link
Member Author

@pkoppstein

But that is not an heading, it's a yaml file, and that part is the "examples" key of a yaml object.

It would be like having a JSON file in which you can use either {"queries":["foo","bar"]} or {"query":["foo","bar"]} which do exactly the same thing.

I don' think that is good, it just makes the formatted data harder to use for users. e.g. if a user wants to extract the examples from that file, they can do:

$ fq '.sections[].entries[]?.examples | arrays[]' docs/content/manual/manual.yml
{
  "input": "\"Hello, world!\"",
  "output": [
    "\"Hello, world!\""
  ],
  "program": "."
}
{
  "input": "12345678909876543212345",
  "output": [
...

If there are aliases for examples, now the user has to use:

$ fq '.sections[].entries[] | first(.["examples", "example", "Examples"] | arrays)[]' docs/content/manual/manual.yml

And you have to make sure that the precedence order you use for the query is the same that other programs that use that file use (the website generator, the man.test generator, etc).

Those types of things in structured data are the opposite of user friendly imho.

@emanuele6 emanuele6 changed the title Define tests in the "examples" field instead of "example" Initialise debug_cb{,_data} in jq_init(); Populate captures also for zero-width matches; In the manual, define examples in the "examples" field not in "example" Jul 18, 2023
@emanuele6 emanuele6 force-pushed the exampleS branch 3 times, most recently from 0e04167 to 9e1e95d Compare July 18, 2023 07:28
@pkoppstein
Copy link
Contributor

But that is not an heading, it's a yaml file

Well, I see your point, but the manual is primarily for humans (and hopefully for enjoyment and not just enlightenment). YAML is just a tool. Or as Lewis Carrol wrote:

'The question is,' said Humpty Dumpty, 'which is to be master — that's all.

@itchyny
Copy link
Contributor

itchyny commented Jul 18, 2023

If some CI (maybe manpage?) fails against unknown keys with descriptive error messages, that would be okay.

To make debug/0 not call an uninitialised function pointer when using
--run-tests or when using a jq_state on which jq_set_debug_cb() has not
been called.
Instead of just using {"captures":[]}.

sub functions are use captures for replacement expressions.
If we don't populate, captures for empty matches, the replacement
expression is run with an empty object as input instead of an object
containing the named captures with "" as value:

* before:

  $ jq -n '"123foo456bar" | gsub("[^a-z]*(?<x>[a-z]*)"; "Z\(.x)")'
  "ZfooZbarZnull"

* after:

  $ jq -n '"123foo456bar" | gsub("[^a-z]*(?<x>[a-z]*)"; "Z\(.x)")'
  "ZfooZbarZ"

---

I also removed a redundant

  result = NULL;
  if (result) {
    ...
  }
Only examples defined in the "examples" field are generated, if they are
defined in "example", they are just ignored.

Also fix a bunch of incorrect tests.
@nicowilliams
Copy link
Contributor

LGTM!

@nicowilliams nicowilliams merged commit 15fd31d into jqlang:master Jul 18, 2023
@nicowilliams
Copy link
Contributor

Thanks!

@emanuele6 emanuele6 deleted the exampleS branch July 18, 2023 17:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants