Improve documentation
- More explicitly state what is returned by each filter.
- Make more clear that the `pipeline` does not have to be configured
  if no filters are desired.
- Show examples of using builtin spell checker modes instead of
  pipeline and mention they _could_ be used together.
facelessuser committed Sep 2, 2023
1 parent c05e0fe commit 42a57b3
Showing 15 changed files with 131 additions and 27 deletions.
1 change: 1 addition & 0 deletions docs/src/dictionary/en-custom.txt
@@ -8,6 +8,7 @@ Booleans
CPP
CSS
Changelog
CommonMark
Cygwin
GitHub
GitLab
41 changes: 35 additions & 6 deletions docs/src/markdown/configuration.md
@@ -206,6 +206,34 @@ matrix:

### Pipeline

/// note
PySpelling's `pipeline` is designed to provide advanced, custom filtering above and beyond what spell checkers normally
provide, but what the spell checker provides is often more than sufficient. Note that `pipeline` filters are processed
before the buffer is sent to Aspell or Hunspell. By default, we disable any special modes of the spell checkers.

Spellcheckers like Aspell have builtin filtering. If all you need is the builtin filters from Aspell, the `pipeline`
configuration can be omitted. For instance, to use Aspell's builtin Markdown mode, simply set the Aspell option directly
and omit the pipeline.

```yaml
- name: markdown
  group: docs
  sources:
  - README.md
  aspell:
    lang: en
    d: en_US
    mode: markdown
  dictionary:
    wordlists:
    - .spell-dict
    output: build/dictionary/markdown.dic
```

You can also use PySpelling `pipeline` filters and enable special modes of the underlying spell checker if desired.
///

PySpelling allows you to define tasks that outline what kind of files you want to spell check, and then sends them down
a pipeline that filters the content returning chunks of text with some associated context. Each chunk is sent down each
step of the pipeline until it reaches the final step, the spell check step. Between filter steps, you can also insert
@@ -399,12 +427,13 @@ matrix:

### Spell Checker Options

Since PySpelling is a wrapper around both Aspell and Hunspell, there are a number of spell checker specific options. As
only a few options are present in both, it was decided to expose them via spell checker specific keywords: `aspell` and
`hunspell` for Aspell and Hunspell respectively. Here you can set options like the default dictionary and search
options. Not all options are exposed though, only relevant search options are passed directly to the spell checker.
Things like replace options (which aren't relevant in PySpelling) and encoding (which are handled internally by
PySpelling) are not accessible.
Since PySpelling is a wrapper around both Aspell and Hunspell, there are a number of spell checker specific options.
Spell checker specific options can be set under keywords: `aspell` and `hunspell` for Aspell and Hunspell respectively.
Here you can set options like the default dictionary and search options.

We will not list all available options here. In general, we expose all options and exclude only those we know could be
problematic. For instance, we do not have an interface for interactive suggestions, so such options are not allowed
with PySpelling.

Spell checker specific options basically translate directly to the spell checker's command line options and only
requires you to remove the leading `-`s you would normally specify on the command line. For instance, a short form
9 changes: 6 additions & 3 deletions docs/src/markdown/filters/context.md
@@ -3,9 +3,12 @@
## Usage

The Context filter is used to create regular expression context delimiters for filtering out content you want from
content you don't want. Depending on how the filter is configured, the opening delimiter will swap from ignoring text to
gathering text. When the closing delimiter is met, the filter will swap back from gathering text to ignoring text. If
`context_visible_first` is set to `true`, the logic will be reversed.
content you don't want. It takes a text buffer in and will return one or more text buffers with undesirable content
filtered out.

Depending on how the filter is configured, the opening delimiter will swap from ignoring text to gathering text. When
the closing delimiter is met, the filter will swap back from gathering text to ignoring text. If `context_visible_first`
is set to `true`, the logic will be reversed.
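
The toggling described above can be sketched in a few lines of Python. This is only an illustration of the idea, not PySpelling's implementation: the function name `filter_context` and its `visible_first` parameter are hypothetical, and it handles a single delimiter pair with no escape handling.

```python
import re


def filter_context(text, open_pattern, close_pattern, visible_first=False):
    """Hypothetical sketch: keep text inside (or outside) a delimiter pair."""
    opener = re.compile(open_pattern, re.MULTILINE)
    closer = re.compile(close_pattern, re.MULTILINE)
    kept = []
    inside = False  # True while between an opening and a closing delimiter
    pos = 0
    while True:
        match = (closer if inside else opener).search(text, pos)
        end = match.start() if match else len(text)
        # By default only text inside the delimiters is gathered;
        # visible_first reverses that.
        if inside != visible_first:
            kept.append(text[pos:end])
        if match is None:
            break
        if match.end() == match.start():
            raise ValueError("delimiters must not match the empty string")
        pos = match.end()
        inside = not inside
    return "".join(kept)
```

With the defaults, `filter_context("say `code` here", "`", "`")` gathers only the text between the backticks. Passing `visible_first=True` keeps everything except the delimited spans.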

Regular expressions are compiled with the MULTILINE flag so that `^` represents the start of a line and `$` represents
the end of a line. `\A` and `\Z` would represent the start and end of the buffer.
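
As a quick illustration of the anchor behavior, the following uses Python's standard `re` module directly. The sample text is made up for the example.

```python
import re

text = "first line\nsecond line"

# With MULTILINE, ^ and $ anchor at every line boundary.
line_starts = re.findall(r"^\w+", text, re.MULTILINE)
line_ends = re.findall(r"\w+$", text, re.MULTILINE)

# \A and \Z ignore MULTILINE and anchor only at the buffer boundaries.
buffer_start = re.findall(r"\A\w+", text, re.MULTILINE)
buffer_end = re.findall(r"\w+\Z", text, re.MULTILINE)

print(line_starts)   # ['first', 'second']
print(line_ends)     # ['line', 'line']
print(buffer_start)  # ['first']
print(buffer_end)    # ['line']
```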
9 changes: 6 additions & 3 deletions docs/src/markdown/filters/cpp.md
@@ -2,9 +2,12 @@

## Usage

The CPP plugin is designed to find and return C/C++ style comments. When first in the chain, the CPP filter uses no
special encoding detection. It will assume `utf-8` if no encoding BOM is found, and the user has not overridden the
fallback encoding. Text is returned in chunks based on the context of the text: block, inline, or string (if enabled).
The CPP filter is designed to find and return C/C++ style comments and strings. It accepts a text buffer and will return
one or more text buffers containing content from comments and/or strings.

When first in the chain, the CPP filter uses no special encoding detection. It will assume `utf-8` if no encoding BOM is
found, and the user has not overridden the fallback encoding. Text is returned in chunks based on the context of the
text: block, inline, or string (if enabled).

When the `strings` [option](#options) is enabled, content will be extracted from strings (not character constants).
Support is available for all the modern C++ strings shown below. CPP will also handle decoding string escapes as well,
5 changes: 4 additions & 1 deletion docs/src/markdown/filters/html.md
@@ -2,9 +2,12 @@

## Usage

The HTML filter is designed to capture HTML content, comments, and even attributes. It allows for filtering out specific
The HTML filter is designed to capture HTML content, comments, and even attributes. It allows for filtering out specific
tags, and you can even filter them out with basic selectors.

The filter accepts an HTML content buffer and will return one or more buffers containing just the text from HTML
attributes and/or tags. The content will no longer be considered HTML.

When first in the chain, the HTML filter will look for the encoding of the HTML in its header and convert the buffer to
Unicode. It will assume `utf-8` if no encoding header is found, and the user has not overridden the fallback encoding.
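
To make the "HTML in, plain text out" idea concrete, here is a rough sketch built on Python's standard `html.parser`. It is not how PySpelling's HTML filter is implemented. The `TextCollector` class, the `html_text` function, and the choice of the `alt` and `title` attributes are assumptions made for illustration, and selector-based filtering is not shown.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: gathers text, comments, and selected attributes."""

    def __init__(self, attributes=("alt", "title")):
        super().__init__()
        self.attributes = set(attributes)
        self.chunks = []

    def _add(self, text):
        # Store only non-empty, whitespace-trimmed text.
        text = text.strip()
        if text:
            self.chunks.append(text)

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in self.attributes and value:
                self._add(value)

    def handle_data(self, data):
        self._add(data)

    def handle_comment(self, data):
        self._add(data)


def html_text(markup):
    """Return plain-text chunks found in an HTML buffer."""
    parser = TextCollector()
    parser.feed(markup)
    parser.close()
    return parser.chunks
```

The returned chunks are plain strings, which mirrors the point above that the content is no longer considered HTML once it leaves the filter.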

3 changes: 3 additions & 0 deletions docs/src/markdown/filters/javascript.md
@@ -2,6 +2,9 @@

## Usage

The JavaScript filter is designed to find and return only content from comments and/or strings. It takes a JavaScript
buffer and returns one or more buffers containing the content of the comments and/or strings.

When first in the chain, the JavaScript filter uses no special encoding detection. It will assume `utf-8` if no encoding
BOM is found, and the user has not overridden the fallback encoding. Text is returned in chunks based on the context of
the text. The filter can return JSDoc comments, block comment, inline comment, string, and template literal content.
35 changes: 35 additions & 0 deletions docs/src/markdown/filters/markdown.md
@@ -6,6 +6,41 @@ The Markdown filter converts a text file's buffer using Python Markdown and retu
containing the text as HTML. It can be included via `pyspelling.filters.markdown`. When first in the chain, the file's
default, assumed encoding is `utf-8` unless otherwise overridden by the user.

/// tip
The Markdown filter is not always needed. Aspell has a built-in Markdown mode. It can be somewhat limited in ignoring
content for advanced cases, but if all you need is basic Markdown support, you can often just use Aspell's Markdown
mode.

```yaml
- name: markdown
  group: docs
  sources:
  - README.md
  - INSTALL.md
  - LICENSE.md
  - CODE_OF_CONDUCT.md
  aspell:
    lang: en
    d: en_US
    mode: markdown
  dictionary:
    wordlists:
    - .spell-dict
    output: build/dictionary/markdown.dic
```

PySpelling's Markdown filter is useful if you:

- Already use Python Markdown and its custom extensions and need support for the custom extensions.
- Need to convert the content to HTML to use PySpelling's advanced HTML filter to ignore content with CSS selectors.

Python Markdown is not a CommonMark parser either, so if you need such a parser, you may have to find and/or write your
own.
///

To configure the Python Markdown filter, you can include it in the pipeline and set up various Markdown extensions if
desired.
```yaml
matrix:
- name: markdown
2 changes: 2 additions & 0 deletions docs/src/markdown/filters/odf.md
@@ -7,6 +7,8 @@ presentations (`odp`). It also supports their flat format as well: `fodt`, `fods
return one chunk containing all the checkable strings in the file. In the case of presentations, it will actually send
multiple chunks, one for each slide.

Under the hood, content is parsed via the XML filter.

```yaml
- name: odf
sources:
2 changes: 2 additions & 0 deletions docs/src/markdown/filters/ooxml.md
@@ -7,6 +7,8 @@ documents (`docx`), spreadsheets (`xlsx`), and presentations (`pptx`). In genera
all the checkable strings in the file. In the case of presentations, it will actually send multiple chunks, one for each
slide. Documents may return additional chunks for headers, footers, etc.

Under the hood, content is parsed via the XML filter.

```yaml
- name: ooxml
sources:
3 changes: 3 additions & 0 deletions docs/src/markdown/filters/python.md
@@ -2,6 +2,9 @@

## Usage

The Python filter is designed to find and return only content from comments and/or strings. It takes a Python buffer and
returns one or more buffers containing the content of the comments and/or strings.

When first in the chain, the Python filter will look for the encoding of the file in the header, and convert to Unicode
accordingly. It will assume `utf-8` if no encoding header is found, and the user has not overridden the fallback
encoding.
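
For a feel of what "content from comments and/or strings" means, the sketch below uses Python's standard `tokenize` module. It is an approximation written for this note, not the filter's actual code. The name `comments_and_strings` is hypothetical, and literals that cannot be evaluated statically (f-strings, for example) are simply skipped.

```python
import ast
import io
import tokenize


def comments_and_strings(source):
    """Hypothetical sketch: return (comments, strings) found in Python source."""
    comments, strings = [], []
    for token in tokenize.generate_tokens(io.StringIO(source).readline):
        if token.type == tokenize.COMMENT:
            # Drop the leading '#' and surrounding whitespace.
            comments.append(token.string.lstrip("#").strip())
        elif token.type == tokenize.STRING:
            try:
                value = ast.literal_eval(token.string)
            except (ValueError, SyntaxError):
                continue  # e.g. f-strings on older Python versions
            if isinstance(value, str):
                strings.append(value)
    return comments, strings
```

Each returned item is plain text that could be handed to a spell checker without any surrounding Python syntax.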
8 changes: 5 additions & 3 deletions docs/src/markdown/filters/stylesheets.md
@@ -3,9 +3,11 @@
## Usage

The Stylesheets plugin is designed to find and return comments in CSS, SCSS, and SASS (CSS does not support inline
comments). When first in the chain, the filter uses no special encoding detection. It will assume `utf-8` if no encoding
BOM is found, and the user has not overridden the fallback encoding. Text is returned in chunks based on the context of
the text: block or inline.
comments). The filter takes a CSS buffer and returns one or more buffers containing the content of comments.

When first in the chain, the filter uses no special encoding detection. It will assume `utf-8` if no encoding BOM is
found, and the user has not overridden the fallback encoding. Text is returned in chunks based on the context of the
text: block or inline.

You can specify `sass` or `scss` in the option `stylesheets` if you need to capture inline comments.
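
The difference between the modes can be pictured with a naive regular-expression sketch. This is an assumption-laden illustration rather than the plugin's real parser: the function name `stylesheet_comments` is hypothetical, and the pattern will misfire on `//` inside strings or URLs.

```python
import re


def stylesheet_comments(text, stylesheets="css"):
    """Hypothetical sketch: return comment bodies from a stylesheet buffer."""
    pattern = r"/\*(.*?)\*/"  # block comments, valid in CSS, SCSS, and SASS
    if stylesheets in ("scss", "sass"):
        pattern += r"|//([^\n]*)"  # inline comments, not valid in plain CSS
    comments = []
    for match in re.finditer(pattern, text, re.DOTALL):
        body = match.group(1)
        if body is None:
            body = match.group(2)
        comments.append(body.strip())
    return comments
```

In `css` mode only block comments are returned. Switching the hypothetical `stylesheets` argument to `scss` or `sass` also picks up inline comments.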

5 changes: 3 additions & 2 deletions docs/src/markdown/filters/url.md
@@ -2,8 +2,9 @@

## Usage

This is a filter that simply strips URLs and/or email address from a file or text buffer. It takes a file or file
buffer and returns a single `SourceText` object containing all the text in the file without URLs or email addresses.
This is a filter that simply strips URLs and/or email addresses from a file or text buffer. It takes an input buffer and
will return the buffer with URLs and/or email addresses removed.
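
A minimal sketch of this kind of stripping is shown below. The two patterns are deliberately simple assumptions chosen for the example and are not the expressions PySpelling uses. The function name `strip_urls_and_emails` and its keyword arguments are likewise hypothetical.

```python
import re

# Deliberately simple patterns for illustration only.
URL_PATTERN = re.compile(r"https?://\S+")
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def strip_urls_and_emails(text, urls=True, emails=True):
    """Hypothetical sketch: return text with URLs and/or emails removed."""
    if urls:
        text = URL_PATTERN.sub("", text)
    if emails:
        text = EMAIL_PATTERN.sub("", text)
    return text
```

The input and output are both a single buffer. Only the matched spans are removed, so the surrounding whitespace is left untouched.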

When first in the chain, the file's default, assumed encoding is `utf-8` unless otherwise overridden by the user.

```yaml
4 changes: 3 additions & 1 deletion docs/src/markdown/filters/xml.md
@@ -3,7 +3,9 @@
## Usage

The XML filter is designed to capture XML content, comments, and even attributes. It allows for filtering out specific
tags, and you can even filter them out with CSS selectors (even though this is XML content :slightly_smiling:).
tags, and you can even filter them out with CSS selectors (even though this is XML content :slightly_smiling:). The
filter takes an XML buffer and returns one or more text buffers containing the content of XML comments, attributes,
etc. The returned content should no longer be considered XML.

When first in the chain, the XML filter will look for the encoding of the file in its header and convert the buffer to
Unicode. It will assume `utf-8` if no encoding header is found, and the user has not overridden the fallback encoding.
4 changes: 1 addition & 3 deletions docs/src/markdown/index.md
@@ -1,6 +1,4 @@
# Basic Usage

## Overview
# Setup & Overview

PySpelling is a module to help with automating spell checking in a project with [Aspell][aspell] or
[Hunspell][hunspell]. It is essentially a wrapper around the command line utility of these two spell checking tools,
27 changes: 22 additions & 5 deletions mkdocs.yml
@@ -34,7 +34,7 @@ theme:

nav:
- Usage:
- Basic Usage: index.md
- Setup & Overview: index.md
- Configuration: configuration.md
- Spelling Pipeline: pipeline.md
- Plugin API: api.md
@@ -63,12 +63,10 @@ markdown_extensions:
- markdown.extensions.toc:
slugify: !!python/object/apply:pymdownx.slugs.slugify {kwds: {case: lower}}
permalink: ""
- markdown.extensions.admonition:
- markdown.extensions.smarty:
smart_quotes: false
- pymdownx.betterem:
- markdown.extensions.attr_list:
- markdown.extensions.def_list:
- markdown.extensions.tables:
- markdown.extensions.abbr:
- markdown.extensions.footnotes:
@@ -114,9 +112,28 @@ markdown_extensions:
- refs.md
- pymdownx.keys:
separator: "\uff0b"
- pymdownx.details:
- pymdownx.tabbed:
- pymdownx.saneheaders:
- pymdownx.blocks.admonition:
types:
- new
- settings
- note
- abstract
- info
- tip
- success
- question
- warning
- failure
- danger
- bug
- example
- quote
- pymdownx.blocks.details:
- pymdownx.blocks.html:
- pymdownx.blocks.definition:
- pymdownx.blocks.tab:
alternate_style: True

extra:
social: