Add improved docs

wooorm · Nov 11, 2022 · 4d1626d
1 parent 54baf82
Showing 1 changed file with 123 additions and 45 deletions.
diff --git a/readme.md b/readme.md
@@ -6,28 +6,71 @@
 [![Size][size-badge]][size]
 [![Chat][chat-badge]][chat]
 
-A Latin-script language parser for [**retext**][retext] producing **[nlcst][]**
-nodes.
+A natural language parser, for Latin-script languages, that produces [nlcst][].
+
+## Contents
+
+*   [What is this?](#what-is-this)
+*   [When should I use this?](#when-should-i-use-this)
+*   [Install](#install)
+*   [Use](#use)
+*   [API](#api)
+    *   [`ParseLatin()`](#parselatin)
+*   [Algorithm](#algorithm)
+*   [Types](#types)
+*   [Compatibility](#compatibility)
+*   [Related](#related)
+*   [Contribute](#contribute)
+*   [Security](#security)
+*   [License](#license)
+
+## What is this?
+
+This package exposes a parser that takes Latin-script natural language and
+produces a syntax tree.
+
+## When should I use this?
+
+If you want to handle natural language as syntax trees manually, use this.
+
+Alternatively, you can use the retext plugin [`retext-latin`][retext-latin],
+which wraps this project to also parse natural language at a higher-level
+(easier) abstraction.
 
 Whether Old-English (“þā gewearþ þǣm hlāforde and þǣm hȳrigmannum wiþ ānum
 penninge”), Icelandic (“Hvað er að frétta”), French (“Où sont les toilettes?”),
-`parse-latin` does a good job at tokenizing it.
+this project does a good job at tokenizing it.
 
-Note also that `parse-latin` does a decent job at tokenizing Latin-like scripts,
-Cyrillic (“Добро пожаловать!”), Georgian (“როგორა ხარ?”), Armenian (“Շատ հաճելի
-է”), and such.
+For English and Dutch, you can instead use [`parse-english`][parse-english] and
+[`parse-dutch`][parse-dutch].
 
-## Install
+You can somewhat use this for Latin-like scripts, such as Cyrillic
+(“Добро пожаловать!”), Georgian (“როგორა ხარ?”), Armenian (“Շատ հաճելի է”),
+and such.
 
-This package is ESM only: Node 12+ is needed to use it and it must be `import`ed
-instead of `require`d.
+## Install
 
-[npm][]:
+This package is [ESM only][esm].
+In Node.js (version 14.14+, 16.0+), install with [npm][]:
 
 ```sh
 npm install parse-latin
 ```
 
+In Deno with [`esm.sh`][esmsh]:
+
+```js
+import {ParseLatin} from 'https://esm.sh/parse-latin@5'
+```
+
+In browsers with [`esm.sh`][esmsh]:
+
+```html
+<script type="module">
+  import {ParseLatin} from 'https://esm.sh/parse-latin@5?bundle'
+</script>
+```
+
 ## Use
 
 ```js
@@ -39,7 +82,7 @@ const tree = new ParseLatin().parse('A simple sentence.')
 console.log(inspect(tree))
 ```
 
-Which, when inspecting, yields:
+Yields:
 
 ```txt
 RootNode[1] (1:1-1:19, 0-18)
@@ -58,58 +101,79 @@ RootNode[1] (1:1-1:19, 0-18)
 
 ## API
 
-This package exports the following identifiers: `ParseLatin`.
+This package exports the identifier `ParseLatin`.
 There is no default export.
 
-### `ParseLatin(value)`
-
-Exposes the functionality needed to tokenize natural Latin-script languages into
-a syntax tree.
-If `value` is passed here, it’s not needed to give it to `#parse()`.
+### `ParseLatin()`
 
-#### `ParseLatin#tokenize(value)`
+Create a new parser.
 
-Tokenize `value` (`string`) into letters and numbers (words), white space, and
-everything else (punctuation).
-The returned nodes are a flat list without paragraphs or sentences.
+#### `ParseLatin#parse(value)`
 
-###### Returns
+Turn natural language into a syntax tree.
 
-[`Array.<Node>`][nlcst] — Nodes.
+##### Parameters
 
-#### `ParseLatin#parse(value)`
+###### `value`
 
-Tokenize `value` (`string`) into an [NLCST][] tree.
-The returned node is a `RootNode` with in it paragraphs and sentences.
+Value to parse (`string`).
 
-###### Returns
+##### Returns
 
-[`Node`][nlcst] — Root node.
+[`RootNode`][root].
 
 ## Algorithm
 
-> Note: The easiest way to see **how parse-latin tokenizes and parses**, is by
-> using the [online parser demo][demo], which
-> shows the syntax tree corresponding to the typed text.
+> 👉 **Note**:
+> The easiest way to see how `parse-latin` parses, is by using the
+> [online parser demo][demo], which shows the syntax tree corresponding to
+> the typed text.
 
-`parse-latin` splits text into white space, word, and punctuation tokens.
-`parse-latin` starts out with a pretty easy definition, one that most other
-tokenizers use:
+`parse-latin` splits text into white space, punctuation, symbol, and word
+tokens:
 
-*   A “word” is one or more letter or number characters
-*   A “white space” is one or more white space characters
-*   A “punctuation” is one or more of anything else
+*   “word” is one or more unicode letters or numbers
+*   “white space” is one or more unicode white space characters
+*   “punctuation” is one or more unicode punctuation characters
+*   “symbol” is one or more of anything else
 
-Then, it manipulates and merges those tokens into a ([nlcst][]) syntax tree,
-adding sentences and paragraphs where needed.
+Then, it manipulates and merges those tokens into a syntax tree, adding
+sentences and paragraphs where needed.
 
-*   Some punctuation marks are part of the word they occur in, such as
+*   some punctuation marks are part of the word they occur in, such as
     `non-profit`, `she’s`, `G.I.`, `11:00`, `N/A`, `&c`, `nineteenth- and…`
-*   Some full-stops do not mark a sentence end, such as `1.`, `e.g.`, `id.`
-*   Although full-stops, question marks, and exclamation marks (sometimes) end a
+*   some periods do not mark a sentence end, such as `1.`, `e.g.`, `id.`
+*   although periods, question marks, and exclamation marks (sometimes) end a
     sentence, that end might not occur directly after the mark, such as `.)`,
     `."`
-*   And many more exceptions
+*   …and many more exceptions
+
+## Types
+
+This package is fully typed with [TypeScript][].
+It exports no additional types.
+
+## Compatibility
+
+This package is at least compatible with all maintained versions of Node.js.
+As of now, that is Node.js 14.14+ and 16.0+.
+It also works in Deno and modern browsers.
+
+## Related
+
+*   [`parse-english`](https://github.com/wooorm/parse-english)
+    — English (natural language) parser
+*   [`parse-dutch`](https://github.com/wooorm/parse-dutch)
+    — Dutch (natural language) parser
+
+## Contribute
+
+Yes please!
+See [How to Contribute to Open Source][contribute].
+
+## Security
+
+This package is safe.
 
 ## License
 
@@ -141,10 +205,24 @@ adding sentences and paragraphs where needed.
 
 [demo]: https://wooorm.com/parse-latin/
 
+[esm]: https://gist.github.com/sindresorhus/a39789f98801d908bbc7ff3ecc99d99c
+
+[esmsh]: https://esm.sh
+
+[typescript]: https://www.typescriptlang.org
+
+[contribute]: https://opensource.guide/how-to-contribute/
+
 [license]: license
 
 [author]: https://wooorm.com
 
-[retext]: https://github.com/retextjs/retext
-
 [nlcst]: https://github.com/syntax-tree/nlcst
+
+[root]: https://github.com/syntax-tree/nlcst#root
+
+[retext-latin]: https://github.com/retextjs/retext/tree/main/packages/retext-latin
+
+[parse-english]: https://github.com/wooorm/parse-english
+
+[parse-dutch]: https://github.com/wooorm/parse-dutch
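
Note on the new "Algorithm" text in this diff: the four token kinds it lists (word, white space, punctuation, symbol) can be illustrated with a small stand-alone classifier. The sketch below is an assumption-laden illustration, not `parse-latin`'s actual implementation: the `kinds` table and the `tokenize` function are hypothetical names, and the Unicode property escapes are one plausible reading of "unicode letters or numbers", "unicode white space", and "unicode punctuation". It only covers the first, flat tokenizing step and does none of the later merging into sentences and paragraphs.

```js
// Illustrative only: a simplified classifier for the four token kinds named
// in the readme's "Algorithm" section. This is NOT parse-latin's own code.
// Assumption: each kind maps onto a Unicode property escape as shown here.
const kinds = [
  ['word', /^[\p{L}\p{N}]+/u],
  ['whiteSpace', /^\p{White_Space}+/u],
  ['punctuation', /^\p{P}+/u],
  // "symbol" is defined as anything the three kinds above do not match.
  ['symbol', /^[^\p{L}\p{N}\p{White_Space}\p{P}]+/u]
]

// Hypothetical helper: split `value` into a flat list of `{type, value}`.
function tokenize(value) {
  const tokens = []
  let rest = String(value)

  while (rest.length > 0) {
    // Exactly one kind matches at any position, because "symbol" is the
    // complement of the other three.
    for (const [type, expression] of kinds) {
      const match = expression.exec(rest)

      if (match) {
        tokens.push({type, value: match[0]})
        rest = rest.slice(match[0].length)
        break
      }
    }
  }

  return tokens
}

console.log(tokenize('A simple sentence.'))
```

A flat list like this is only the starting point. The exceptions listed in the readme (`she’s`, `e.g.`, `.)`, and so on) are exactly the cases where such a naive split is wrong and later merging is needed.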