phiresky edited this page May 25, 2023 · 8 revisions

Please see https://github.com/phiresky/ripgrep-all/issues/146 for the current state of the project

Custom adapters

Since version 1.0, you can specify custom adapters in the config file. Each custom adapter invokes an external preprocessing script.

For example, the integrated PDF-to-text adapter would look like the following in the config file:

    "custom_adapters": [
        {
            "name": "poppler",
            "version": 1,
            "description": "Uses pdftotext (from poppler-utils) to extract plain text from PDF files",

            "extensions": ["pdf"],
            "mimetypes": ["application/pdf"],

            "binary": "pdftotext",
            "args": ["-", "-"],
            "disabled_by_default": false,
            "match_only_by_mime": false,
            "output_path_hint": "${input_virtual_path}.txt.asciipagebreaks"
        }
    ]

More information about the custom adapter config can be found on docs.rs under CustomAdapterConfig.
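
As a rough illustration of what such an external preprocessing script could look like, here is a minimal Python sketch of an HTML-to-text converter. It assumes that the input file arrives on stdin and that the converted text is read from stdout, which is how the `pdftotext - -` invocation above behaves; check the CustomAdapterConfig docs before relying on that. The script and all names in it (`TextExtractor`, `html_to_text`, `main`) are made up for this example and are not part of rga.

```python
#!/usr/bin/env python3
"""Hypothetical adapter script: HTML on stdin, plain text on stdout."""
import sys
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self.skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML document, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def main():
    # Assumption: the file to convert is piped in on stdin and the
    # converted text is expected on stdout.
    raw = sys.stdin.buffer.read()
    text = html_to_text(raw.decode("utf-8", errors="replace"))
    sys.stdout.buffer.write((text + "\n").encode("utf-8"))
```

To try it, save it as an executable script that ends with a call to `main()` under the usual `if __name__ == "__main__":` guard, then point the `binary` field of a custom adapter entry at it. Because the script takes no arguments, the `args` list would presumably be empty.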

Integrated adapters vs custom adapters vs rg --pre

With custom adapters, there are now three ways to search custom file types. Here are the advantages and disadvantages of each.

  • rg --pre and rg --search-zip: rg has built-in support for custom preprocessors and for searching some compressed files. Compared to rga, the disadvantages are:
    • Simplicity. '--pre' applies one and the same script to all file types, so you have to write the decision logic yourself.
    • Caching. What makes adapters in rga fast is the caching mechanism, which allows fast search even when the preprocessor is slow (which is often the case). With '--pre' you'd have to implement this caching yourself, which isn't trivial. That's how rga got started ;).
    • Recursion. rga can recurse into archives and return contents at any depth as a binary stream. The same can be implemented for other things that aren't strictly archives, like a PDF file that contains images, where the images may be searched by a different extractor.
  • Custom adapters. Custom adapters are great because they allow you to write an adapter in a language other than Rust and use external libraries. You could even hook lesspipe into it. Their limitation is that they can only output a single file per input file, so they cannot handle archives like zip.
  • Integrated adapters. Integrated adapters are fastest and most flexible because they are written in Rust and don't require external spawns.
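
To make the "decision logic" and "caching" points about '--pre' concrete, here is a minimal Python sketch of what a hand-written preprocessor would have to do. It relies on rg calling the '--pre' command with the file path as its first argument. Everything else is an assumption made for illustration: the `EXTRACTORS` table, the cache directory, and the cache key (absolute path, size and mtime) are hypothetical and are not how rga implements its own cache.

```python
import hashlib
import os
import subprocess
import sys

# Hypothetical table: file extension -> function that builds the extractor command.
EXTRACTORS = {
    ".pdf": lambda path: ["pdftotext", path, "-"],
}


def pick_command(path):
    """Decision logic: return the extractor command for path, or None."""
    ext = os.path.splitext(path)[1].lower()
    build = EXTRACTORS.get(ext)
    return build(path) if build else None


def extract(path):
    """Run the chosen extractor, or pass the file through unchanged."""
    command = pick_command(path)
    if command is None:
        with open(path, "rb") as handle:
            return handle.read()
    return subprocess.run(command, check=True, stdout=subprocess.PIPE).stdout


def cache_key(path):
    """Key on absolute path, size and mtime so edited files are re-extracted."""
    info = os.stat(path)
    digest = hashlib.sha256()
    digest.update(os.fsencode(os.path.abspath(path)))
    digest.update(f"\0{info.st_size}\0{info.st_mtime_ns}".encode("ascii"))
    return digest.hexdigest()


def cached_extract(path, cache_dir, extractor=extract):
    """Return extractor(path), reusing a stored result when the key matches."""
    os.makedirs(cache_dir, exist_ok=True)
    entry = os.path.join(cache_dir, cache_key(path))
    if os.path.exists(entry):
        with open(entry, "rb") as handle:
            return handle.read()
    output = extractor(path)
    with open(entry, "wb") as handle:
        handle.write(output)
    return output


def main(argv):
    # rg --pre passes the path of the file being searched as the first argument.
    cache_dir = os.path.expanduser("~/.cache/pre-sketch")  # hypothetical location
    sys.stdout.buffer.write(cached_extract(argv[1], cache_dir))
```

Saved as an executable script that calls `main(sys.argv)`, this could be passed to `rg --pre`. Even this small version ignores cache eviction, compression, concurrent writers and extractor upgrades, which hints at why the caching is not trivial to do well.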

If you think your adapter config is useful, you can share it by adding it to the wiki.

[Todo: tesseract OCR adapter]
