Add a way to find and filter tabular data #141

Closed
themccubbins opened this issue Jul 17, 2024 · 2 comments · Fixed by #163
Labels
enhancement New feature or request

Comments

@themccubbins

Find: there are no titles for tables, so maybe the selector could be by column titles, or by some selector on the data.

Select: It would be really cool if you could select particular rows, columns and cells of tables.

@yshavit yshavit added the enhancement New feature or request label Jul 17, 2024
@yshavit
Owner

yshavit commented Jul 18, 2024

Two questions for you:

  1. How did you want to select particular rows, columns, cells? By index, by searching within them? If searching, do you then want to return only the cell that matched, or something like "return all rows that have foo anywhere in any cell"?
  2. Does any particular syntax come to mind for you? That's something I really struggle with for this.

One thing that just occurred to me is that since bash strings are easy to make multiline, I could potentially have the table selector syntax be multiline somehow. Maybe something like:

mdq '|-|
     | /regex for columns, by header/ |
     | /regex for row; output the row if any column matches/ |'

for example, if I had this table:

| hello | fizz  | world |
|-------|-------|-------|
| one   | three | two   |
| four  | six   | five  |

then this:

|-|
| /o/  |
| five |

would result in:

| hello | world |
|-------|-------|
| four  | five  |

  • the columns with headers hello and world matched, since those two header texts matched /o/
  • the header row always matches (since markdown tables have to have headers, at least in github syntax)
  • that | four | five | row matched since its second column matched the five matcher (unquoted string)
  • the |s don't need to be aligned
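
The matching rules in the bullets above can be sketched in a few lines of Python. This is only an illustration of the proposed semantics, not mdq's implementation: `make_matcher` and `select_table` are hypothetical names, the unquoted string is treated as a plain case-sensitive substring match, and rows are tested against the cells that survive column filtering. All three are assumptions the thread does not settle.

```python
import re


def make_matcher(spec):
    """Turn a matcher spec into a predicate over cell text.

    Assumed forms: "" or "*" match anything, "/.../" is a regex search,
    and anything else is a plain substring match.
    """
    spec = spec.strip()
    if spec in ("", "*"):
        return lambda text: True
    if len(spec) >= 2 and spec.startswith("/") and spec.endswith("/"):
        pattern = re.compile(spec[1:-1])
        return lambda text: pattern.search(text) is not None
    return lambda text: spec in text


def select_table(header, rows, column_spec, row_spec):
    """Return a new (header, rows) pair that is a subset of the input table."""
    col_match = make_matcher(column_spec)
    row_match = make_matcher(row_spec)

    # Keep the columns whose header text matches the column matcher.
    keep = [i for i, title in enumerate(header) if col_match(title)]
    new_header = [header[i] for i in keep]  # the header row is always kept

    # Keep a data row if any of its retained cells matches the row matcher.
    new_rows = []
    for row in rows:
        cells = [row[i] for i in keep]
        if any(row_match(cell) for cell in cells):
            new_rows.append(cells)
    return new_header, new_rows


header = ["hello", "fizz", "world"]
rows = [["one", "three", "two"], ["four", "six", "five"]]
print(select_table(header, rows, "/o/", "five"))
# (['hello', 'world'], [['four', 'five']])
```

Testing rows against the original, unfiltered cells would give the same result for this table; the two readings only differ when the matching cell sits in a column that was filtered out.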

In this proposed syntax, |-| is a table selector, but that alone would just match all tables, and return all their data; if you want the additional selector, you have to specify both the row and column selector. That makes it a lot more obvious which is which. Of course, you can always use the empty or * matcher for "any":

mdq '|-|
     | /o/ |
     | *   |'

One additional wrinkle is that right now, I intentionally have the syntax such that every character is unambiguous as you read it -- the only real exception being escapes within quotes. This would break that, since |-| could either be "one token: table selector" or "three tokens: [pipe, list with empty matcher, pipe]". My concern isn't the parsing complexity, but rather the obviousness to a human reader. One option is to use curlies, which I've already considered for advanced #56. That would look something like:

mdq '{|-|}'  # just select tables

mdq '{|-|
      | /o/ |
      | *   |
     }'

How does that strike you?

@yshavit
Owner

yshavit commented Jul 19, 2024

Oh, I could do :-: maybe? That mirrors the separator between the header row and data rows:

| Name | Value |
|:----:|:-----:|  <-- this bit
| Foo  | 123   |

The colons on both sides mean "column is centered", so it's a bit of a misnomer to use them for "tables" in general -- but that's probably okay.
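
For reference, the colon placement in that separator row maps to column alignment in GitHub-flavored markdown as follows. A minimal sketch, with `column_alignment` and `parse_delimiter_row` as hypothetical helper names and no validation of the dashes themselves:

```python
def column_alignment(delimiter_cell):
    """Map one separator cell (e.g. ":---:") to its alignment."""
    cell = delimiter_cell.strip()
    left = cell.startswith(":")
    right = cell.endswith(":")
    if left and right:
        return "center"
    if left:
        return "left"
    if right:
        return "right"
    return None  # no colons: alignment is left unspecified


def parse_delimiter_row(line):
    """Split a separator row into per-column alignments."""
    cells = line.strip().strip("|").split("|")
    return [column_alignment(cell) for cell in cells]


print(parse_delimiter_row("|:----:|:-----:|"))
# ['center', 'center']
```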

Since I always have two matchers, maybe I can do away with all the other table markdown.

mdq ':-: /o/ *'

Hm, maybe I should repeat that between the row and column matchers?

mdq ':-: /o/ :-: five'
mdq ':-: /o/ :-: *'

I think that may be the winner so far.
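
One reason the repeated `:-:` reads well is that it splits mechanically into the two matchers. A rough sketch of that split, assuming the matchers themselves never contain the text `:-:` (a real parser would need to tokenize regexes and quoted strings first), with `parse_table_selector` as a hypothetical name:

```python
def parse_table_selector(selector):
    """Split ':-: <columns> :-: <rows>' into (column_spec, row_spec).

    An empty matcher is treated as "*", meaning "any".
    """
    parts = [part.strip() for part in selector.split(":-:")]
    # A well-formed selector starts with ":-:", so the first piece is empty,
    # and exactly two more pieces follow (one per matcher).
    if len(parts) != 3 or parts[0] != "":
        raise ValueError(f"expected ':-: columns :-: rows', got {selector!r}")
    column_spec, row_spec = parts[1], parts[2]
    return (column_spec or "*", row_spec or "*")


print(parse_table_selector(":-: /o/ :-: five"))
# ('/o/', 'five')
```

The earlier single-separator form, `:-: /o/ *`, is rejected by this sketch because nothing marks where the column matcher ends and the row matcher begins.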

@yshavit yshavit added this to the fill out selectors milestone Jul 21, 2024
yshavit added a commit that referenced this issue Jul 25, 2024
The only thing the interface actually needs is `fn try_select`. The
`fn matches` is just a convenient way of implementing `try_select`. This
commit formalizes that by making Selector only declare `fn try_select`,
and without a default impl.

This is in preparation for #141. That implementation will let us create
"table slices", which aren't just pointers to an existing table, but
actually create a new table which is a subset of the original. That
won't fit into the `fn matches` pattern, so this PR breaks that pattern
up.
@yshavit yshavit closed this as completed in 2ff2d7d Aug 7, 2024