Improve documentation

- More explicitly state what is returned by each filter.
- Make it clearer that the `pipeline` does not have to be configured if no filters are desired.
- Show examples of using builtin spell checker modes instead of the pipeline and mention that they _could_ be used together.
facelessuser · Sep 2, 2023 · 42a57b3
1 parent c05e0fe
commit 42a57b3

Showing 15 changed files with 131 additions and 27 deletions.
diff --git a/docs/src/dictionary/en-custom.txt b/docs/src/dictionary/en-custom.txt
@@ -8,6 +8,7 @@ Booleans
 CPP
 CSS
 Changelog
+CommonMark
 Cygwin
 GitHub
 GitLab

diff --git a/docs/src/markdown/configuration.md b/docs/src/markdown/configuration.md
@@ -206,6 +206,34 @@ matrix:
 
 ### Pipeline
 
+/// note
+PySpelling's `pipeline` is designed to provide advanced, custom filtering above and beyond what spell checkers normally
+provide, but it may often be the case that what the spell checker provides is more than sufficient. It should be noted
+that `pipeline` filters are processed before sending the buffer to Aspell or Hunspell. By default, we disable any
+special modes of the spell checkers.
+
+Spell checkers like Aspell have builtin filtering. If all you need are Aspell's builtin filters, the `pipeline`
+configuration can be omitted. For instance, to use Aspell's builtin Markdown mode, simply set the Aspell option directly
+and omit the pipeline.
+
+```yaml
+- name: markdown
+  group: docs
+  sources:
+  - README.md
+  aspell:
+    lang: en
+    d: en_US
+    mode: markdown
+  dictionary:
+    wordlists:
+    - .spell-dict
+    output: build/dictionary/markdown.dic
+```
+
+You can also use PySpelling `pipeline` filters and enable special modes of the underlying spell checker if desired.
+///
+
 PySpelling allows you to define tasks that outline what kind of files you want to spell check, and then sends them down
 a pipeline that filters the content returning chunks of text with some associated context. Each chunk is sent down each
 step of the pipeline until it reaches the final step, the spell check step. Between filter steps, you can also insert
@@ -399,12 +427,13 @@ matrix:
 
 ### Spell Checker Options
 
-Since PySpelling is a wrapper around both Aspell and Hunspell, there are a number of spell checker specific options. As
-only a few options are present in both, it was decided to expose them via spell checker specific keywords: `aspell` and
-`hunspell` for Aspell and Hunspell respectively. Here you can set options like the default dictionary and search
-options. Not all options are exposed though, only relevant search options are passed directly to the spell checker.
-Things like replace options (which aren't relevant in PySpelling) and encoding (which are handled internally by
-PySpelling) are not accessible.
+Since PySpelling is a wrapper around both Aspell and Hunspell, there are a number of spell checker specific options.
+Spell checker specific options can be set under the `aspell` and `hunspell` keywords for Aspell and Hunspell respectively.
+Here you can set options like the default dictionary and search options.
+
+We will not list all available options here. In general, we expose all of the spell checker's options and exclude only
+those we know to be problematic. For instance, we do not have an interface for interactive suggestions, so such
+options are not allowed with PySpelling.
 
 Spell checker specific options basically translate directly to the spell checker's command line options and only
 requires you to remove the leading `-`s you would normally specify on the command line. For instance, a short form

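A minimal Python sketch of the flow described in the `Pipeline` section, where each filter takes one buffer and returns one or more chunks that feed the next step. The `Chunk` type, `run_pipeline`, and both example filters are hypothetical illustrations, not PySpelling's plugin API:

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Chunk:
    """Hypothetical stand-in for a text buffer plus a label describing its context."""
    text: str
    context: str


# Hypothetical filter signature: one chunk in, a list of chunks out.
Filter = Callable[[Chunk], List[Chunk]]


def run_pipeline(source: Chunk, filters: List[Filter]) -> List[Chunk]:
    """Send every chunk through each filter step in order."""
    chunks = [source]
    for step in filters:
        next_chunks: List[Chunk] = []
        for chunk in chunks:
            next_chunks.extend(step(chunk))
        chunks = next_chunks
    # With no filters, the original buffer is returned unchanged.
    return chunks


def split_paragraphs(chunk: Chunk) -> List[Chunk]:
    """Example filter: one buffer in, one chunk per non-empty paragraph out."""
    parts = [p.strip() for p in chunk.text.split("\n\n") if p.strip()]
    return [Chunk(p, f"{chunk.context}: paragraph {i}") for i, p in enumerate(parts, 1)]


def collapse_whitespace(chunk: Chunk) -> List[Chunk]:
    """Example filter: one buffer in, one cleaned-up buffer out."""
    return [Chunk(" ".join(chunk.text.split()), chunk.context)]


if __name__ == "__main__":
    doc = Chunk("First   paragraph.\n\nSecond paragraph.", "README.md")
    for c in run_pipeline(doc, [split_paragraphs, collapse_whitespace]):
        print(f"{c.context} -> {c.text}")
    # README.md: paragraph 1 -> First paragraph.
    # README.md: paragraph 2 -> Second paragraph.
```

An empty filter list passes the original buffer straight through, which mirrors the note that `pipeline` can be omitted when a spell checker's builtin mode is enough.
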
diff --git a/docs/src/markdown/filters/context.md b/docs/src/markdown/filters/context.md
@@ -3,9 +3,12 @@
 ## Usage
 
 The Context filter is used to create regular expression context delimiters for filtering out content you want from
-content you don't want. Depending on how the filter is configured, the opening delimiter will swap from ignoring text to
-gathering text. When the closing delimiter is met, the filter will swap back from gathering text to ignoring text.  If
-`context_visible_first` is set to `true`, the logic will be reversed.
+content you don't want. It takes a text buffer and returns one or more text buffers with the undesired content
+filtered out.
+
+Depending on how the filter is configured, the opening delimiter will swap from ignoring text to gathering text. When
+the closing delimiter is met, the filter will swap back from gathering text to ignoring text. If `context_visible_first`
+is set to `true`, the logic will be reversed.
 
 Regular expressions are compiled with the MULTILINE flag so that `^` represents the start of a line and `$` represents
 the end of a line. `\A` and `\Z` would represent the start and end of the buffer.

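A minimal Python sketch of the swap logic described in the Context filter usage notes, assuming a single pair of opening and closing patterns. `context_filter` and its `visible_first` argument are hypothetical stand-ins for the filter's delimiter handling and `context_visible_first`; the real filter's other options are not modeled:

```python
import re
from typing import List


def context_filter(text: str, open_pattern: str, close_pattern: str,
                   visible_first: bool = False) -> List[str]:
    """Hypothetical sketch of delimiter-based context filtering.

    Default: text is ignored until an opening delimiter matches, then gathered
    until the next closing delimiter. With visible_first=True the logic is
    reversed: text is gathered by default and delimited regions are ignored.
    """
    # MULTILINE so ^ and $ match at line boundaries; \A and \Z still mean buffer start/end.
    opener = re.compile(open_pattern, re.MULTILINE)
    closer = re.compile(close_pattern, re.MULTILINE)
    chunks: List[str] = []
    pos = 0
    while pos < len(text):
        m_open = opener.search(text, pos)
        if m_open is None:
            # No more delimiters: the remainder is only kept in visible_first mode.
            if visible_first:
                chunks.append(text[pos:])
            break
        if visible_first:
            # Text before the opening delimiter is visible.
            chunks.append(text[pos:m_open.start()])
        m_close = closer.search(text, m_open.end())
        # An unclosed region runs to the end of the buffer.
        region_end = m_close.start() if m_close else len(text)
        if not visible_first:
            chunks.append(text[m_open.end():region_end])
        new_pos = m_close.end() if m_close else len(text)
        if new_pos <= pos:
            # Guard against zero-width delimiters looping forever.
            break
        pos = new_pos
    # Drop chunks that contain only whitespace.
    return [c for c in chunks if c.strip()]
```

For example, with `^BEGIN$` and `^END$` as delimiters, only the lines between them are gathered by default; passing `visible_first=True` returns the text outside them instead.
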
diff --git a/docs/src/markdown/filters/cpp.md b/docs/src/markdown/filters/cpp.md
@@ -2,9 +2,12 @@
 
 ## Usage
 
-The CPP plugin is designed to find and return C/C++ style comments. When first in the chain, the CPP filter uses no
-special encoding detection. It will assume `utf-8` if no encoding BOM is found, and the user has not overridden the
-fallback encoding. Text is returned in chunks based on the context of the text: block, inline, or string (if enabled).
+The CPP filter is designed to find and return C/C++ style comments and strings. It accepts a text buffer and will return
+one or more text buffers containing content from comments and/or strings.
+
+When first in the chain, the CPP filter uses no special encoding detection. It will assume `utf-8` if no encoding BOM is
+found, and the user has not overridden the fallback encoding. Text is returned in chunks based on the context of the
+text: block, inline, or string (if enabled).
 
 When the `strings` [option](#options) is enabled, content will be extracted from strings (not character constants).
 Support is available for all the modern C++ strings shown below. CPP will also handle decoding string escapes as well,

diff --git a/docs/src/markdown/filters/html.md b/docs/src/markdown/filters/html.md
@@ -2,9 +2,12 @@
 
 ## Usage
 
-The HTML filter is designed to capture HTML content, comments, and even attributes. It allows for filtering out specific
+The HTML filter is designed to capture HTML content, comments, and even attributes. It allows for filtering out specific 
 tags, and you can even filter them out with basic selectors.
 
+The filter accepts an HTML buffer and returns one or more buffers containing just the text from HTML content, comments,
+and/or attributes. The returned content is no longer treated as HTML.
+
 When first in the chain, the HTML filter will look for the encoding of the HTML in its header and convert the buffer to
 Unicode. It will assume `utf-8` if no encoding header is found, and the user has not overridden the fallback encoding.
 

diff --git a/docs/src/markdown/filters/javascript.md b/docs/src/markdown/filters/javascript.md
@@ -2,6 +2,9 @@
 
 ## Usage
 
+The JavaScript filter is designed to find and return only content from comments and/or strings. It takes a JavaScript
+buffer and returns one or more buffers containing the content of the comments and/or strings.
+
 When first in the chain, the JavaScript filter uses no special encoding detection. It will assume `utf-8` if no encoding
 BOM is found, and the user has not overridden the fallback encoding. Text is returned in chunks based on the context of
 the text.  The filter can return JSDoc comments, block comment, inline comment, string, and template literal content.

diff --git a/docs/src/markdown/filters/markdown.md b/docs/src/markdown/filters/markdown.md
@@ -6,6 +6,41 @@ The Markdown filter converts a text file's buffer using Python Markdown and retu
 containing the text as HTML. It can be included via `pyspelling.filters.markdown`. When first in the chain, the file's
 default, assumed encoding is `utf-8` unless otherwise overridden by the user.
 
+/// tip
+The Markdown filter is not always needed. Aspell has a built-in Markdown mode which, while somewhat limited when it
+comes to ignoring content in advanced cases, is often sufficient if all you need is basic Markdown support. In that
+case, you can simply use Aspell's Markdown mode instead.
+
+```yaml
+- name: markdown
+  group: docs
+  sources:
+  - README.md
+  - INSTALL.md
+  - LICENSE.md
+  - CODE_OF_CONDUCT.md
+  aspell:
+    lang: en
+    d: en_US
+    mode: markdown
+  dictionary:
+    wordlists:
+    - .spell-dict
+    output: build/dictionary/markdown.dic
+```
+
+PySpelling's Markdown filter is useful if you:
+
+-   Already use Python Markdown and its custom extensions and need support for those extensions.
+-   Need to convert the content to HTML to use PySpelling's advanced HTML filter to ignore content with CSS selectors.
+
+Python Markdown is also not a CommonMark parser, so if you need one, you may have to find and/or write your own
+filter.
+///
+
+To configure the Python Markdown filter, you can include it in the pipeline and set up various Markdown extensions if
+desired.
+
 ```yaml
 matrix:
 - name: markdown

diff --git a/docs/src/markdown/filters/odf.md b/docs/src/markdown/filters/odf.md
@@ -7,6 +7,8 @@ presentations (`odp`). It also supports their flat format as well: `fodt`, `fods
 return one chunk containing all the checkable strings in the file. In the case of presentations, it will actually send
 multiple chunks, one for each slide.
 
+Under the hood, content is parsed via the XML filter.
+
 ```yaml
 - name: odf
   sources:

diff --git a/docs/src/markdown/filters/ooxml.md b/docs/src/markdown/filters/ooxml.md
@@ -7,6 +7,8 @@ documents (`docx`), spreadsheets (`xlsx`), and presentations (`pptx`). In genera
 all the checkable strings in the file. In the case of presentations, it will actually send multiple chunks, one for each
 slide. Documents may return additional chunks for headers, footers, etc.
 
+Under the hood, content is parsed via the XML filter.
+
 ```yaml
 - name: ooxml
   sources:

diff --git a/docs/src/markdown/filters/python.md b/docs/src/markdown/filters/python.md
@@ -2,6 +2,9 @@
 
 ## Usage
 
+The Python filter is designed to find and return only content from comments and/or strings. It takes a Python buffer and
+returns one or more buffers containing the content of the comments and/or strings.
+
 When first in the chain, the Python filter will look for the encoding of the file in the header, and convert to Unicode
 accordingly. It will assume `utf-8` if no encoding header is found, and the user has not overridden the fallback
 encoding.

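The Python filter is described as returning only comment and string content. As a rough illustration only (not PySpelling's implementation, which also performs the encoding detection described above and has its own options), the standard library `tokenize` module can pull those pieces out of a Python buffer. The function name `comments_and_strings` is hypothetical:

```python
import io
import tokenize
from typing import List, Tuple


def comments_and_strings(source: str) -> List[Tuple[str, str]]:
    """Return (category, text) pairs for each comment and string token in source."""
    results: List[Tuple[str, str]] = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            # Drop the leading '#' and surrounding whitespace.
            results.append(("comment", tok.string.lstrip("#").strip()))
        elif tok.type == tokenize.STRING:
            # Raw source text of the literal, prefix and quotes included.
            # Note: on Python 3.12+ f-strings arrive as FSTRING_* tokens and are skipped here.
            results.append(("string", tok.string))
    return results


if __name__ == "__main__":
    src = 'x = "hello world"  # a greeting\n# standalone note\n'
    for category, text in comments_and_strings(src):
        print(category, text)
    # string "hello world"
    # comment a greeting
    # comment standalone note
```

Docstrings are ordinary `STRING` tokens to `tokenize`, so this sketch does not distinguish them from other strings.
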
diff --git a/docs/src/markdown/filters/stylesheets.md b/docs/src/markdown/filters/stylesheets.md
@@ -3,9 +3,11 @@
 ## Usage
 
 The Stylesheets plugin is designed to find and return comments in CSS, SCSS, and SASS (CSS does not support inline
-comments). When first in the chain, the filter uses no special encoding detection. It will assume `utf-8` if no encoding
-BOM is found, and the user has not overridden the fallback encoding. Text is returned in chunks based on the context of
-the text: block or inline.
+comments). The filter takes a stylesheet buffer and returns one or more buffers containing the content of its comments.
+
+When first in the chain, the filter uses no special encoding detection. It will assume `utf-8` if no encoding BOM is
+found, and the user has not overridden the fallback encoding. Text is returned in chunks based on the context of the
+text: block or inline.
 
 You can specify `sass` or `scss` in the option `stylesheets` if you need to capture inline comments.
 

diff --git a/docs/src/markdown/filters/url.md b/docs/src/markdown/filters/url.md
@@ -2,8 +2,9 @@
 
 ## Usage
 
-This is a filter that simply strips URLs and/or email address from a file or text buffer.  It takes a file or file
-buffer and returns a single `SourceText` object containing all the text in the file without URLs or email addresses.
+This is a filter that simply strips URLs and/or email addresses from a file or text buffer. It takes an input buffer and
+returns the buffer with URLs and/or email addresses removed.
+
 When first in the chain, the file's default, assumed encoding is `utf-8` unless otherwise overridden by the user.
 
 ```yaml

diff --git a/docs/src/markdown/filters/xml.md b/docs/src/markdown/filters/xml.md
@@ -3,7 +3,9 @@
 ## Usage
 
 The XML filter is designed to capture XML content, comments, and even attributes. It allows for filtering out specific
-tags, and you can even filter them out with CSS selectors (even though this is XML content :slightly_smiling:).
+tags, and you can even filter them out with CSS selectors (even though this is XML content :slightly_smiling:). The
+filter takes an XML buffer and returns one or more text buffers containing the content of XML comments, attributes,
+etc. The returned content should no longer be considered XML.
 
 When first in the chain, the XML filter will look for the encoding of the file in its header and convert the buffer to
 Unicode. It will assume `utf-8` if no encoding header is found, and the user has not overridden the fallback encoding.

diff --git a/docs/src/markdown/index.md b/docs/src/markdown/index.md
@@ -1,6 +1,4 @@
-# Basic Usage
-
-## Overview
+# Setup & Overview
 
 PySpelling is a module to help with automating spell checking in a project with [Aspell][aspell] or
 [Hunspell][hunspell]. It is essentially a wrapper around the command line utility of these two spell checking tools,

diff --git a/mkdocs.yml b/mkdocs.yml
@@ -34,7 +34,7 @@ theme:
 
 nav:
   - Usage:
-    - Basic Usage: index.md
+    - Setup & Overview: index.md
     - Configuration: configuration.md
     - Spelling Pipeline: pipeline.md
     - Plugin API: api.md
@@ -63,12 +63,10 @@ markdown_extensions:
   - markdown.extensions.toc:
       slugify: !!python/object/apply:pymdownx.slugs.slugify {kwds: {case: lower}}
       permalink: ""
-  - markdown.extensions.admonition:
   - markdown.extensions.smarty:
       smart_quotes: false
   - pymdownx.betterem:
   - markdown.extensions.attr_list:
-  - markdown.extensions.def_list:
   - markdown.extensions.tables:
   - markdown.extensions.abbr:
   - markdown.extensions.footnotes:
@@ -114,9 +112,28 @@ markdown_extensions:
       - refs.md
   - pymdownx.keys:
       separator: "\uff0b"
-  - pymdownx.details:
-  - pymdownx.tabbed:
   - pymdownx.saneheaders:
+  - pymdownx.blocks.admonition:
+      types:
+      - new
+      - settings
+      - note
+      - abstract
+      - info
+      - tip
+      - success
+      - question
+      - warning
+      - failure
+      - danger
+      - bug
+      - example
+      - quote
+  - pymdownx.blocks.details:
+  - pymdownx.blocks.html:
+  - pymdownx.blocks.definition:
+  - pymdownx.blocks.tab:
+      alternate_style: True
 
 extra:
   social: