Tutorial

Yuval Shavit edited this page Jul 16, 2024 · 6 revisions

We'll start with some examples to get you oriented, and then we'll dive into the details:

tl;dr

# Select sections containing "usage":
$ cat example.md | mdq '# usage'

# Select sections containing "usage", and within those find all unordered list items:
$ cat example.md | mdq '# usage | -'

# ... or maybe you only want the list items containing "note":
$ cat example.md | mdq '# usage | - note'

You can also output as JSON, which is particularly useful for feeding into jq:

# count uncompleted tasks
$ incomplete_tasks="$(cat example.md | mdq --json '- [ ]' | jq '.items | length')"
$ if [[ "$incomplete_tasks" -gt 0 ]]; then echo 'some tasks are incomplete!'; fi
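If you want to reuse that check in a script or a CI job, one option is to wrap the pass/fail decision in a small function. This is only a sketch: tasks_gate is a hypothetical helper name, not something mdq provides, and it takes the count as an argument so you can try it without mdq or jq installed.

```shell
# tasks_gate is a hypothetical helper (not part of mdq).
# It takes a count of incomplete tasks and returns non-zero if any remain.
tasks_gate() {
  if [ "$1" -gt 0 ]; then
    echo "some tasks are incomplete! ($1 remaining)"
    return 1
  fi
  echo 'all tasks complete'
}

# Try it with literal counts:
tasks_gate 0
tasks_gate 3 || echo 'gate returned non-zero'
```

With mdq and jq installed, the count computed above can be passed straight in: tasks_gate "$incomplete_tasks".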

Most of what you need

mdq treats Markdown as a stream of elements, which are then filtered down using selectors. Each selector selects one kind of Markdown element (sections, list items, etc), optionally filtered by a string matcher you provide.

To use mdq, you basically need to know three things:

  1. How to write an individual selector to find a specific element type (sections, list items, etc)
  2. How to add string matching (for example, sections whose titles match a string)
  3. How to pipe multiple selectors together

None of those are hard, so let's jump into it! The following few items will probably get you 90% of what you need.

Individual selectors

There are just a handful of selector types you need to know about:

  • Find sections:

    $ cat example.md | mdq '# foo'       # find headers whose title contains "foo"
  • Find lists and tasks:

    $ cat example.md | mdq '- foo'       # find unordered list items containing "foo"
    $ cat example.md | mdq '1. foo'      # find ordered list items containing "foo"
                                         # (note: the number must be exactly "1.")
    $ cat example.md | mdq '- [ ] foo'   # find uncompleted task items containing "foo"
    $ cat example.md | mdq '- [x] foo'   # find completed task items containing "foo"
    $ cat example.md | mdq '- [?] foo'   # find all task items containing "foo"
  • Find links and images:

    $ cat example.md | mdq '[foo](bar)'  # find links with display text containing "foo"
                                         # and URL containing "bar"
    $ cat example.md | mdq '![foo](bar)' # ditto for images
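To try these selectors yourself, you need an example.md with something for each of them to find. Here is one possible sample file. Its contents are made up for illustration and do not come from the mdq repository. The loop at the end only runs if mdq is on your PATH, and it makes no claim about what mdq prints.

```shell
# Write a sample file (note: this overwrites any example.md in the current directory).
cat > example.md <<'EOF'
# Foo overview

- foo as an unordered item
- an unrelated item

1. foo as an ordered item
2. another ordered item

- [ ] foo task that is still open
- [x] foo task that is done

See the [foo docs](https://example.com/bar) and ![foo logo](https://example.com/bar.png).
EOF

# Run each selector from the list above, if mdq is available.
if command -v mdq >/dev/null 2>&1; then
  for selector in '# foo' '- foo' '1. foo' '- [ ] foo' '- [x] foo' '- [?] foo' '[foo](bar)' '![foo](bar)'; do
    echo "== mdq '$selector'"
    cat example.md | mdq "$selector" || echo '(mdq exited non-zero for this selector)'
  done
else
  echo 'mdq not found on PATH; install it to run the selectors' >&2
fi
```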

String matching

In the above examples, we always looked for elements containing "foo" or "bar". You can match in other ways, too. Let's look at sections as an example, but all the "foo"s and "bar"s above work the same way:

  • Empty string matchers ("any"):

    # Find all headers, regardless of their titles:
    $ cat example.md | mdq '#'   # an empty matcher means "any"
    $ cat example.md | mdq '# *' # so does *; they're exactly equivalent
  • Unquoted strings:

    # Find sections whose titles contain the text "unquoted strings".
    $ cat example.md | mdq '# unquoted strings'
    • Unquoted strings are case-insensitive.
    • Leading and trailing whitespace is ignored.
    • This matches against any substring (see "anchors" below).
  • Quoted strings:

    # Find sections whose titles contain the text "quoted | strings".
    # (You'll see in a sec why the pipe char requires us to quote this.)
    $ cat example.md | mdq '# "quoted | strings"'
    
    # Quoted strings can have escape sequences (unquoted strings can't)
    $ cat example.md | mdq '# "typical \n \"escape\" \u{2603} sequences" '
    • Quoted strings are case-sensitive.
    • This matches against any substring (see "anchors" below).
  • Anchors:

    As mentioned above, both quoted and unquoted strings match against any substring. You can restrict this by using ^ to anchor to the start of a string, and $ to anchor to the end:

    $ cat example.md | mdq '# ^match start of string'   # unquoted
    $ cat example.md | mdq '# ^"match start of string"' # quoted
    
    $ cat example.md | mdq '# match end of string $'    # whitespace trimmed
    
    $ cat example.md | mdq '# ^ match full string $ '
  • Regular expressions:

    No matching tool would be complete without 'em:

    $ cat example.md | mdq '# /reg(ex|ular expressions)/'
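If you already know grep, the rules above map roughly onto a few familiar flags. Treat this as an analogy only: it is not how mdq is implemented, it matches plain lines rather than Markdown elements, and the three titles are made up for the example.

```shell
# Three made-up section titles, one per line.
titles='Usage Notes
Advanced usage
USAGE'

# Unquoted matcher `usage`: case-insensitive substring.
unquoted=$(printf '%s\n' "$titles" | grep -ciF 'usage')

# Quoted matcher `"usage"`: case-sensitive substring.
quoted=$(printf '%s\n' "$titles" | grep -cF 'usage')

# `^usage`: anchored to the start (unquoted, so still case-insensitive).
starts=$(printf '%s\n' "$titles" | grep -ci '^usage')

# `usage $`: anchored to the end (surrounding whitespace is trimmed).
ends=$(printf '%s\n' "$titles" | grep -ci 'usage$')

# `^ usage $`: must match the full string.
full=$(printf '%s\n' "$titles" | grep -ci '^usage$')

echo "unquoted=$unquoted quoted=$quoted starts=$starts ends=$ends full=$full"
# prints: unquoted=3 quoted=1 starts=2 ends=2 full=1
```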

Chaining selectors

The | character chains selectors together. Make sure to include this within a quoted string in your shell, so that the shell doesn't take it for itself!

# Find sections whose title contains "foo", and within those, find all unordered list items.
$ cat example.md | mdq '# foo | - *'

This is why we needed to quote "quoted | strings" above. Without the quotes, the matcher would end at the |, and mdq would interpret the input as two selectors:

  1. # quoted
  2. strings

The first of those would look for sections whose title contains just quoted (without also checking for | strings). The second is an invalid selector.
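You can see why the shell-level quotes matter by swapping mdq for a stand-in that only reports the arguments it receives. show_args below is a hypothetical function written for this illustration. It also shows a second reason to quote: in a bash script (and in interactive bash with default settings), an unquoted # at the start of a word begins a comment, so the quotes protect the # as well as the |.

```shell
# show_args is a hypothetical stand-in for mdq: it records and prints
# whatever arguments the shell actually passes to it.
show_args() {
  arg_count=$#
  first_arg=${1-}
  echo "received $arg_count argument(s): ${1-<none>}"
}

# Quoted: the whole selector chain arrives as a single argument.
show_args '# foo | - *'
quoted_count=$arg_count
quoted_arg=$first_arg

# Unquoted: bash treats everything from "#" onward as a comment,
# so the command receives no arguments at all.
show_args # foo | - *
unquoted_count=$arg_count
```

The first call reports one argument containing the full selector chain. The second reports none, because the selector never reached the command.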

JSON Output

The --json flag will cause mdq to output the selected items as JSON, instead of Markdown. The gory details are below, but if you just try it out, hopefully it's pretty intuitive.
