Is there a way to exclude (sub)folders or specific files from the index? I'm getting duplicate results because the same article shows up in different folders. I'm using Pelican as my SSG and I can't figure out how to configure this.
For clarity, here is my output/source folder, indicating which folders and files should be removed:
.
├── 2019 /// REMOVE
├── 2020 /// REMOVE
├── 2021 /// REMOVE
├── 2022 /// REMOVE
├── ABOUT /// REMOVE
├── BingSiteAuth.xml
├── CNAME
├── ai
├── apps
├── archives.html /// REMOVE
├── author /// REMOVE
├── authors.html /// REMOVE
├── b2b-sales
├── books
├── careerplaybook
├── categories
├── csvs /// REMOVE
├── extra /// REMOVE
├── favicon.ico
├── googlexxxxxxxxxxx.html /// REMOVE
├── helpers
├── home-office
├── images
├── index.html
├── interests
├── leadership
├── learning
├── movies
├── pagefind /// my Pagefind index folder
├── pdfs
├── projects
├── python
├── random
├── robots.txt
├── sitemap.xml
├── startups
├── tag /// REMOVE
├── tags.html /// REMOVE
└── theme /// REMOVE
Replies: 1 comment 8 replies
Hello! 👋
Hmm, Pagefind could probably do with a better way of excluding things if there are this many folders and files to exclude, but there is a way now that might just take a few experiments.
Using the glob option on the Pagefind CLI, you can control which files are ingested, instead of the default **/*.{html}, which captures everything. The glob will be slightly ugly when it gets this complicated, since you'll need to opt in to files for now. I'd definitely recommend configuring it in a Pagefind config file rather than over the CLI.
For example, I could use the glob "{index.html,about/**/*.html}" to index only the root index.html and the HTML files under about/.
Looking at your file tree, this will unfortunately get to be quite a long glob string, so I'll also open an issue for better configuration support here.
(I've just added a test for complex globs to make sure it works as expected: "Scenario: Complex exclusionary file glob can be configured", so you can see a real implementation there.)
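Since the opt-in glob for a tree like the one in the question gets long, it may be easier to generate it than to type it by hand. Below is a minimal Python sketch (Pelican sites already have Python available). The helper name `build_include_glob` and the `KEEP_DIRS` / `KEEP_FILES` lists are hypothetical: the lists are simply the entries from the question's tree that are not marked REMOVE, and leaving out the pagefind index folder is an assumption. It follows the same brace pattern as the "{index.html,about/**/*.html}" example, but the result should still be checked against a real Pagefind run.

```python
"""Sketch: assemble an opt-in glob string for Pagefind's glob option."""

# Folders to keep, copied from the tree in the question: everything not
# marked REMOVE. Assumption: the pagefind index folder is left out too.
KEEP_DIRS = [
    "ai", "apps", "b2b-sales", "books", "careerplaybook", "categories",
    "helpers", "home-office", "images", "interests", "leadership",
    "learning", "movies", "pdfs", "projects", "python", "random", "startups",
]

# Top-level HTML files to keep.
KEEP_FILES = ["index.html"]


def build_include_glob(files, dirs):
    """Return a brace glob covering the given files plus all HTML under dirs."""
    # Each directory contributes a "<dir>/**/*.html" alternative, mirroring
    # the "about/**/*.html" part of the example glob.
    parts = list(files) + [f"{d}/**/*.html" for d in dirs]
    return "{" + ",".join(parts) + "}"


if __name__ == "__main__":
    # Print the string so it can be pasted into the glob option.
    print(build_include_glob(KEEP_FILES, KEEP_DIRS))
```

The printed string would then go into the glob option, ideally in a Pagefind config file as recommended above, and the lists can be adjusted after each experiment.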