Replace rehype and remark in GDSchool #80

NathanLovato · 2024-12-19T20:32:43Z

This task tracks replacements planned for https://github.com/GDQuest/school/tree/main/build_scripts/remark

Xananax · 2024-12-23T13:56:52Z

All the scripts in this directory and its sub-directories need to be ported.

I'll explain the purpose of each file below.

The two top-level scripts, rehype-mdx-gdschool and remark-mdx-gdschool, mostly just run the scripts in /lib, but also do a few things on their own.

The unified ecosystem is a bit complex. It works by chaining:

- a parser that transforms text into an AST that follows a common spec.
- one or more transformers that modify the AST, possibly into a different AST altogether.
- a compiler that turns the AST back into something else.
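To make the chaining concrete, here is a dependency-free miniature of the parser, transformer, compiler idea. It is only a sketch: the node types, `parse`, `shiftHeadings`, `compile`, and `runPipeline` are hypothetical names invented for illustration, not the unified API, and the "parser" only understands `#` headings and plain paragraphs.

```typescript
// Hypothetical miniature of a unified-style pipeline (not the real unified API).
type TextNode = { type: "text"; value: string };
type HeadingNode = { type: "heading"; depth: number; children: TextNode[] };
type ParagraphNode = { type: "paragraph"; children: TextNode[] };
type BlockNode = HeadingNode | ParagraphNode;
type Root = { type: "root"; children: BlockNode[] };

type Parser = (source: string) => Root;
type Transformer = (tree: Root) => Root;
type Compiler = (tree: Root) => string;

// Parser: text -> AST. Only "# heading" lines and plain paragraphs are handled.
const parse: Parser = (source) => ({
  type: "root",
  children: source
    .split(/\n+/)
    .filter((line) => line.trim() !== "")
    .map((line): BlockNode => {
      const [, hashes = "", text = ""] = /^(#{1,6})\s+(.*)$/.exec(line) ?? [];
      return hashes
        ? { type: "heading", depth: hashes.length, children: [{ type: "text", value: text }] }
        : { type: "paragraph", children: [{ type: "text", value: line }] };
    }),
});

// Transformer: AST -> AST. Demotes every heading by one level (capped at 6).
const shiftHeadings: Transformer = (tree) => ({
  ...tree,
  children: tree.children.map((node) =>
    node.type === "heading" ? { ...node, depth: Math.min(node.depth + 1, 6) } : node
  ),
});

// Compiler: AST -> string. No HTML escaping, to keep the sketch short.
const compile: Compiler = (tree) =>
  tree.children
    .map((node) => {
      const text = node.children.map((child) => child.value).join("");
      return node.type === "heading"
        ? `<h${node.depth}>${text}</h${node.depth}>`
        : `<p>${text}</p>`;
    })
    .join("\n");

// Chains the three stages: parse once, apply each transformer in order, compile once.
function runPipeline(source: string, transformers: Transformer[]): string {
  const tree = transformers.reduce((current, transform) => transform(current), parse(source));
  return compile(tree);
}
```

For example, `runPipeline("# Title\n\nHello", [shiftHeadings])` returns an `<h2>` followed by a `<p>`. A port could keep this shape while replacing each stage with its own implementation.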

In our case, we use:

- remark to parse markdown into an AST called mdast (for "markdown abstract syntax tree").
- several remark plugins to operate on the mdast (they really should be called "mdast plugins", but that's not the naming convention).
- rehype to transform mdast into hast (for "hypertext abstract syntax tree").
- several rehype plugins to operate on the hast.
- finally, a rehype compiler that outputs an HTML string.

Parsers/transformers of note that we use:

- rehype-prism-plus: a plugin that parses code blocks and adds <span/> blocks with class names so we can do syntax highlighting.
- remark-mdx: a plugin that parses React nodes and embedded JavaScript such as import statements and JSX. It augments the mdast with a lot of new node types and attributes.

So, to recap:

- rehype-mdx-gdschool is a rehype plugin. It transforms the hast, i.e., it operates on an AST representing HTML nodes.
- remark-mdx-gdschool is a remark plugin. It uses sub remark plugins to transform the mdast, i.e., it operates on an AST representing markdown nodes.

In the case of our parser, we might not need the distinction at all. Here, the rehype plugins mostly exist because the complexity of the mdast made some operations easier to do in the rehype phase.

Additionally, further processing happens at build time inside the Next runtime.

Compilation plugins, file by file

`rehype-mdx-gdschool`

All this plugin does is rework code blocks. Mainly:

- it finds <pre><code> blocks and adds the classes gdquest-code-container and gdquest-code to them respectively.
- if the code block is a diff code block (i.e., the class contains the string -diff), then:
  - it counts the added and removed lines
  - it numbers each line type with a separate counter and adds a line attribute to each line (<span class="code-line line-number inserted" line="3">). This is not used at the moment, but the idea was to be able to number lines like in GitHub diffs
  - it removes the + and - characters at the beginning of lines (they get re-added with CSS in the site)
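The diff handling above can be sketched on plain strings, without any hast types. This is a minimal illustration, not the plugin's code: `processDiffLines`, `DiffLine`, and the `deleted`/`unchanged` kind names are assumptions (only `inserted` appears in the class list above), and the real plugin works on `<span>` nodes rather than raw text.

```typescript
// Hypothetical names; only the "inserted" kind is taken from the issue text.
type DiffLineKind = "inserted" | "deleted" | "unchanged";

interface DiffLine {
  kind: DiffLineKind;
  line: number; // position among lines of the same kind, starting at 1
  text: string; // line content with the leading + or - removed
}

interface DiffSummary {
  added: number;
  removed: number;
  lines: DiffLine[];
}

function processDiffLines(code: string): DiffSummary {
  // One counter per line kind, so each kind is numbered independently.
  const counters: Record<DiffLineKind, number> = { inserted: 0, deleted: 0, unchanged: 0 };

  const lines = code.split("\n").map((raw): DiffLine => {
    const kind: DiffLineKind = raw.startsWith("+")
      ? "inserted"
      : raw.startsWith("-")
        ? "deleted"
        : "unchanged";
    counters[kind] += 1;
    // Strip the marker character; the site re-adds it with CSS.
    const text = kind === "unchanged" ? raw : raw.slice(1);
    return { kind, line: counters[kind], text };
  });

  return { added: counters.inserted, removed: counters.deleted, lines };
}
```

Each returned `line` value corresponds to what would end up in the `line` attribute, and `added`/`removed` are the two totals.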

`remark-mdx-gdschool`

Uses plugins from /lib, but also does a few things of its own. Namely:

counts all headings so they can be added to the TOC automatically
extracts potential data and metadata JSX nodes from the markdown.
extracts frontMatter yaml.
extracts properties title, unlocked, description, use_when, examples from the frontmatter, with defaults for each (the default for title is to use the first heading in the markdown)
extracts optional properties image, thumbnail, cover from the frontmatter if they are present, verifies the image paths are correct, and augments the property with width, height, and alt text.
extracts the reading time (currently not used anywhere)
generates the lastModified property from the file's last modified date (currently not used anywhere)

Finally, ensures:

metadata contains the key title
data is consistent and contains at least the important keys that were extracted.
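As a rough illustration of the defaulting and the final title check, here is a sketch. Everything about it is assumed except the property names and the rule that the title falls back to the first heading: the field types, the default values for `unlocked`, `description`, `use_when`, and `examples`, and the names `FrontMatter`, `PageMetadata`, and `resolveMetadata` are all hypothetical.

```typescript
// Property names come from the issue; their types are assumptions.
interface FrontMatter {
  title?: string;
  unlocked?: boolean;
  description?: string;
  use_when?: string;
  examples?: string[];
}

interface PageMetadata {
  title: string;
  unlocked: boolean;
  description: string;
  use_when: string;
  examples: string[];
}

function resolveMetadata(
  frontMatter: FrontMatter,
  firstHeadingText: string | undefined
): PageMetadata {
  // The only default stated in the issue: title falls back to the first heading.
  const title = frontMatter.title ?? firstHeadingText;
  if (title === undefined || title.trim() === "") {
    throw new Error("metadata must contain a title");
  }
  return {
    title,
    // The defaults below are placeholders, not the build scripts' real values.
    unlocked: frontMatter.unlocked ?? false,
    description: frontMatter.description ?? "",
    use_when: frontMatter.use_when ?? "",
    examples: frontMatter.examples ?? [],
  };
}
```

Failing loudly when no title can be found mirrors the "ensures" step: the check happens once at compile time instead of in every consumer.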

Note: the parser currently also attempts to create "next" and "previous" links for each page, but this is not used anywhere in the site, because it depends on alphabetical order and is not accurate.

The file headingNodeToProps transforms a heading node into an object {index, depth, text, slug}.
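A possible shape for that transformation is sketched below. The node type is a simplified stand-in for an mdast heading, and the `slugify` rule (lowercase, non-alphanumeric runs replaced by a dash) is an assumption; the real file may slug differently and may handle nested inline nodes such as emphasis, which this sketch ignores.

```typescript
// Simplified stand-in for an mdast heading node.
interface HeadingNode {
  type: "heading";
  depth: number;
  children: Array<{ type: string; value?: string }>;
}

interface HeadingProps {
  index: number;
  depth: number;
  text: string;
  slug: string;
}

// Assumed slug rule: lowercase, runs of other characters become "-", edges trimmed.
function slugify(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

function headingNodeToProps(node: HeadingNode, index: number): HeadingProps {
  // Only direct children carrying a value are concatenated.
  const text = node.children.map((child) => child.value ?? "").join("");
  return { index, depth: node.depth, text, slug: slugify(text) };
}
```

Mapping this over all headings in document order would give the list a table of contents needs.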

The file GeneratedMDXTypes lists the properties generated by the build, and can serve as reference.

`extractSourcesToImports`

Transforms

## Title!

![logo](imagepath.png)

to

import imagepath from 'imagepath.png'

<img src={imagepath} width={} height={} alt="logo"/>

in the markdown.

In short, finds images and videos and:

verifies image path
adds width, height, and alt text to the image object
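The rewrite from an image node to an import plus an `<img>` element could look roughly like this. It is a sketch working on strings: `imageToImport` and `toIdentifier` are hypothetical helpers, the identifier rule is an assumption, and the image dimensions are passed in rather than read from disk, since how the real plugin measures images is not described here.

```typescript
interface ImageNode {
  url: string;
  alt: string;
}

interface ImageSize {
  width: number;
  height: number;
}

interface ImportedImage {
  importStatement: string;
  jsx: string;
}

// Assumed rule: file name without extension, invalid identifier characters replaced by "_".
function toIdentifier(path: string): string {
  const fileName = path.split("/").pop() ?? path;
  const base = fileName.replace(/\.[^.]+$/, "");
  const cleaned = base.replace(/[^A-Za-z0-9_$]/g, "_");
  return /^[0-9]/.test(cleaned) ? `_${cleaned}` : cleaned;
}

function imageToImport(node: ImageNode, size: ImageSize): ImportedImage {
  const name = toIdentifier(node.url);
  return {
    importStatement: `import ${name} from '${node.url}'`,
    jsx: `<img src={${name}} width={${size.width}} height={${size.height}} alt="${node.alt}"/>`,
  };
}
```

Two files with the same base name in different folders would collide under this naming rule, so a real port would need a uniqueness step.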

`addSpecialTypeProperty`

This is a plugin made to work around a React limitation.

Motivation: I wanted editors to be able to insert nodes in any order inside other nodes, and have them grouped. For example, I wanted to be able to write:

<Practice>

<Note>A note</Note>

<Hint>Some hint</Hint>

<Description>Some text</Description>

<Hint>Another hint</Hint>

<Requirement>Some requirement</Requirement>

</Practice>

And have the resulting mdast represent:

<Practice>

<main>
  <Description>Some text</Description>
  <Requirements>
    <Requirement>Some requirement</Requirement>
  </Requirements>
</main>

<section className="additional">
  <Note>A note</Note>
</section>

<Hints>
  <Hint>Some hint</Hint>
  <Hint>Another hint</Hint>
</Hints>

</Practice>

Ideally, I would've recomposed the mdast tree directly. But since it was complicated and unwieldy, I fell back on the less than ideal solution of letting React reorder the nodes at runtime.

Because React does not differentiate between children types (children are "opaque"), I add a special __TYPE property to each React component. So, <Requirement/> becomes <Requirement __TYPE="Requirement"/>.

Inside the React component, I can then loop over the children and be certain the __TYPE property exists and allows me to discriminate on the type of child.

Ideally, this wouldn't be ported, but rather, the logic that sorts nodes would, so it can be done at compile time.
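A compile-time version of that sorting might look like the sketch below, which regroups the children of a `<Practice>` node into the target structure shown earlier. The `ElementNode` shape and the `el` and `regroupPractice` helpers are hypothetical; a real port would operate on whatever tree the new parser produces. In this sketch, children with any other name are dropped and empty groups are still emitted, which a real implementation would probably want to handle differently.

```typescript
// Hypothetical, simplified element tree (not mdast).
interface ElementNode {
  name: string;
  attributes: Record<string, string>;
  children: Array<ElementNode | string>;
}

const el = (
  name: string,
  attributes: Record<string, string>,
  ...children: Array<ElementNode | string>
): ElementNode => ({ name, attributes, children });

function regroupPractice(practice: ElementNode): ElementNode {
  // Collects direct element children with the given name, preserving source order.
  const pick = (name: string): ElementNode[] =>
    practice.children.filter(
      (child): child is ElementNode => typeof child !== "string" && child.name === name
    );

  return el(
    "Practice",
    practice.attributes,
    el("main", {}, ...pick("Description"), el("Requirements", {}, ...pick("Requirement"))),
    el("section", { className: "additional" }, ...pick("Note")),
    el("Hints", {}, ...pick("Hint"))
  );
}
```

Because the grouping is keyed on the element name, no `__TYPE` attribute is needed once this runs before React sees the tree.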

`joinPractices` and `joinSearchables`

Both those files do a similar thing.

joinPractices finds consecutive <Practice> nodes (ignoring whitespace) and:

- ensures they are contained inside a <PracticesContainer/> node
- gives each practice an id of the form practice-[uid], where [uid] is the unique id we add to each practice
- extracts the number of requirements from each practice and adds them to a dictionary of {[uid]: amount}
- adds that dictionary to the PracticesContainer node; this allows it to track whether all the requirements are met for each practice

This happens even if there's only one practice.
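The grouping step can be sketched as a single pass over a list of sibling nodes. This is an illustration under assumptions: the `MdxNode` shape is invented, the uid is assumed to already be present as a `uid` attribute (how it is generated is not covered here), and requirements are counted as direct `<Requirement>` children only.

```typescript
// Hypothetical node shape; text nodes use the name "text" and carry a value.
interface MdxNode {
  name: string;
  value?: string;
  attributes: Record<string, unknown>;
  children: MdxNode[];
}

const isWhitespace = (node: MdxNode): boolean =>
  node.name === "text" && (node.value ?? "").trim() === "";

function joinPractices(siblings: MdxNode[]): MdxNode[] {
  const output: MdxNode[] = [];
  let run: MdxNode[] = []; // consecutive practices collected so far
  let pendingWhitespace: MdxNode[] = []; // whitespace seen after the last practice

  const flush = (): void => {
    if (run.length > 0) {
      const requirements: Record<string, number> = {};
      const practices = run.map((practice): MdxNode => {
        const uid = String(practice.attributes["uid"]);
        requirements[uid] = practice.children.filter((c) => c.name === "Requirement").length;
        return { ...practice, attributes: { ...practice.attributes, id: `practice-${uid}` } };
      });
      output.push({ name: "PracticesContainer", attributes: { requirements }, children: practices });
    }
    output.push(...pendingWhitespace);
    run = [];
    pendingWhitespace = [];
  };

  for (const node of siblings) {
    if (node.name === "Practice") {
      run.push(node);
      pendingWhitespace = []; // whitespace between two practices is ignored
    } else if (isWhitespace(node) && run.length > 0) {
      pendingWhitespace.push(node);
    } else {
      flush();
      output.push(node);
    }
  }
  flush();
  return output;
}
```

A lone practice still ends up in its own container, matching the behaviour described above.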

joinSearchables does the same thing, but for <Searchable> nodes. It:

Ensures they are contained inside a <SearchablesContainer/> node
Extracts the "title" and "search" properties from each searchable and adds them to a list of {title, search, id, index} objects. This will be used in React to select the searchables when the user types in the search bar.
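For the searchables, the extracted list and one way it might be consumed are sketched below. The `{title, search, id, index}` shape is from the description above; the id format and the case-insensitive substring match are assumptions, and `buildSearchableEntries` and `matchSearchables` are hypothetical names.

```typescript
interface SearchableProps {
  title: string;
  search: string;
}

interface SearchableEntry extends SearchableProps {
  id: string;
  index: number;
}

// Builds the list that would be attached to the SearchablesContainer node.
function buildSearchableEntries(searchables: SearchableProps[]): SearchableEntry[] {
  return searchables.map((props, index) => ({
    ...props,
    index,
    id: `searchable-${index}`, // assumed id format
  }));
}

// Assumed matching rule: case-insensitive substring over title and search text.
function matchSearchables(entries: SearchableEntry[], query: string): SearchableEntry[] {
  const needle = query.trim().toLowerCase();
  if (needle === "") return entries;
  return entries.filter((entry) =>
    `${entry.title} ${entry.search}`.toLowerCase().includes(needle)
  );
}
```

Precomputing the entries at compile time means the search bar only has to filter a flat array.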

Further processing

After all of this, there are three more processing steps which could, presumably, be done at compile time:

- At startup time, the markdown files that have been generically parsed get transformed more granularly depending on their type: lessons, products, etc.
- At runtime, inside some components, React further processes some elements like Practices.
- At build time (the Next.js one, that is), the markdown files are processed to generate HTML. Some pages that can't be generated statically will be rendered at runtime instead.

Files involved in Startup Time:

Courses Processing

Each course gets processed individually. In practice, all courses use the same processor except node essentials (both Godot 3 and Godot 4 versions).

This default processor is used in, for example, src/app/courses/(godot4)/learn_2d_gamedev_godot_4/chapters/index.ts.

The meat is in src/utils/courses/prepareModule.ts, which mostly ensures the TOC is correct, with previous/next links, annotates the module IDs, marks which modules are free/unlocked, and so on.
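As one small piece of that work, previous/next links can be derived from an explicitly ordered list of lesson slugs. This is not the content of prepareModule.ts, only a sketch of the idea; `LessonLink` and `linkLessons` are hypothetical names.

```typescript
interface LessonLink {
  slug: string;
  previous: string | null;
  next: string | null;
}

// Links each lesson to its neighbours in the given order; the ends get null.
function linkLessons(orderedSlugs: readonly string[]): LessonLink[] {
  return orderedSlugs.map((slug, index) => ({
    slug,
    previous: orderedSlugs[index - 1] ?? null,
    next: orderedSlugs[index + 1] ?? null,
  }));
}
```

Taking the order as input, rather than sorting alphabetically, keeps the links consistent with whatever order the table of contents declares.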

The processing of node essentials is all in-situ, in src/app/courses/(godot4)/node_essentials_godot_4/chapters/index.ts

Components Processing

The components:

All use the __TYPE special property to reorder and group children.

Products Processing

fileDB: processes products and courses data, and ensures all of them are put into database-like objects which can be iterated over (with .map() for example), but also accessed by slug. The slugs are strongly typed, so using products.items['non-existing-slug'] raises a TypeScript error.
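The "iterable and addressable by typed slug" idea can be illustrated with a small generic helper. This is a sketch, not the actual fileDB code: `createFileDB`, the item shape, and the slugs `product-a` and `product-b` are made up for the example.

```typescript
// Hypothetical helper: wraps a slug -> item record and keeps the slug union in the type.
function createFileDB<Slug extends string, Item>(items: Record<Slug, Item>) {
  const slugs = Object.keys(items) as Slug[];
  return {
    items,
    slugs,
    // Iterates in key order, passing both the item and its slug.
    map<Result>(fn: (item: Item, slug: Slug) => Result): Result[] {
      return slugs.map((slug) => fn(items[slug], slug));
    },
  };
}

// Made-up data; the slug union "product-a" | "product-b" is inferred from the keys.
const products = createFileDB({
  "product-a": { title: "Product A" },
  "product-b": { title: "Product B" },
});

const titles = products.map((item) => item.title);
const first = products.items["product-a"].title;

// With strict compiler settings, the next line would be rejected at compile time:
// products.items["non-existing-slug"];
```

The typing relies on TypeScript inferring the key union from the object literal, so the slugs never have to be listed twice.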

Products are processed in the src/app/products/data.ts file.

The default processing of a course happens in src/app/courses/data.ts

Markdown -> HTML

This transformation happens either automatically (when accessing a url that has a page.mdx file) or through inserting the resulting component, for example in CoursePage.tsx.

There's nothing that makes this step necessary; we could insert an html string instead for a strictly equal result.

NathanLovato mentioned this issue Dec 19, 2024

Priorities (tracker) #76

Open

14 tasks

NathanLovato added the enhancement New feature or request label Dec 19, 2024

NathanLovato changed the title ~~Replace the rehype plugins in GDSchool~~ Replace rehype and remark in GDSchool Dec 23, 2024
