Content Collections #373

Merged
merged 51 commits into from
Dec 19, 2022

Conversation

bholmesdev
Contributor

@bholmesdev bholmesdev commented Nov 3, 2022

  • Start Date: 2022-10-03
  • Status: Draft

Summary

Content Collections are a way to fetch Markdown and MDX frontmatter in your Astro projects in a consistent, performant, and type-safe way.

This is paired with Render Content, a solution to script and style bleed when importing globs of Markdown and MDX Content.

Both proposals are presented below, and are meant to be reviewed and accepted as a pair.

astro-content-smol.mp4

Links

  1. Content Collections Full Rendered Proposal
  2. Render Content Full Rendered Proposal

@bholmesdev bholmesdev changed the title Content schemas Content Schemas Nov 3, 2022
@natemoo-re
Member

Overall I am super thrilled by this, great work!

I have one very nitpicky bikeshed, which is that the ~schema.ts file probably doesn't need the ~ prefix? If we do want to use a prefix, perhaps _schema.ts is better?

@JLarky

JLarky commented Nov 4, 2022

I like the changes that you described in the video. The part that is unclear to me right now is whether the Zod schema is optional or not.

The ~schema.ts and .astro remind me of auto-importing in Vue, so I actually wonder if you could generate the schema from frontmatter automatically? At least in a "field X exists in at least one file, let's call it x?: any" way, or like you can do with the CLI in Rails: rails g model NameOfModel column_name:datatype column_name2:datatype2 :)

@bholmesdev
Contributor Author

bholmesdev commented Nov 4, 2022

@JLarky, yes schema files are totally optional! You can still call fetchContent for entries in a collection, but you'll get an array of type any. You'll need a Zod schema to get type checking.

And that's a good suggestion! It was avoided since types like Date or email are hard to infer. That said, we could offer some sensible defaults for strings and numbers. I'd also love to lean into our CLI to generate schema starters / recipes for you, similar to that Ruby example you presented there.
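
As a rough illustration of the "sensible defaults for strings and numbers" idea, the sketch below infers a loose field map from already-parsed frontmatter objects. All names here (`inferFields`, `InferredField`, and so on) are hypothetical and not part of Astro; anything that is not a plain string, number, or boolean is deliberately left as "unknown", since types like Date or email are hard to infer.

```typescript
// Hypothetical sketch: infer a loose field map from parsed frontmatter objects.
// None of these names come from Astro; they only illustrate the idea.
type InferredKind = "string" | "number" | "boolean" | "unknown";

interface InferredField {
  kind: InferredKind;
  optional: boolean; // true when at least one entry lacks the field
}

interface FieldStats {
  kinds: InferredKind[];
  count: number;
}

function kindOf(value: unknown): InferredKind {
  if (typeof value === "string") return "string";
  if (typeof value === "number") return "number";
  if (typeof value === "boolean") return "boolean";
  return "unknown"; // dates, arrays, objects: left for a hand-written schema
}

function inferFields(
  frontmatters: Array<Record<string, unknown>>
): Record<string, InferredField> {
  const seen = new Map<string, FieldStats>();
  for (const frontmatter of frontmatters) {
    for (const key of Object.keys(frontmatter)) {
      const slot: FieldStats = seen.get(key) ?? { kinds: [], count: 0 };
      slot.kinds.push(kindOf(frontmatter[key]));
      slot.count += 1;
      seen.set(key, slot);
    }
  }
  const result: Record<string, InferredField> = {};
  seen.forEach((slot, key) => {
    const first: InferredKind = slot.kinds[0] ?? "unknown";
    const consistent = slot.kinds.every((kind) => kind === first);
    result[key] = {
      // Conflicting kinds across files fall back to "unknown".
      kind: consistent ? first : "unknown",
      optional: slot.count < frontmatters.length,
    };
  });
  return result;
}
```

A field that appears in only some files comes out as optional, which matches the `x?: any` starting point suggested above.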

@bholmesdev
Contributor Author

bholmesdev commented Nov 4, 2022

@natemoo-re Fair point! The ~ was a loose preference more than anything, since it:

  1. Indicates schema is "magic"
  2. Future-proofs us if we support collections of ts or js files in the future
  3. Sorts schema definitions to the top of the directory. It always bothered me to have a folder of posts with a schema.ts sorted alphabetically in the middle.

I understand ~ is not an Astro convention though. I may shy away from _ since it tells Astro to ignore files in src/pages. I've heard @schema.ts floated as well, which would be consistent with CMS conventions I've seen.

@louiss0

louiss0 commented Nov 30, 2022

I'm confused about what you did with schema, but more importantly, could you talk about defineCollection? What else does it do besides letting you define a schema? Does it have other configuration details? Oh, and good luck. I hope this RFC doesn't hurt performance at all or slow down SSG.

defineCollection({
  schema: {
    title: z.string(),
    slug: z.string(),
    // mark optional properties with `.optional()`
    image: z.string().optional(),
    tags: z.array(z.string()),
    // transform to another data type with `transform`
    // ex. convert date strings to Date objects
    publishedDate: z.string().transform((str) => new Date(str)),
  },
});

@bholmesdev
Contributor Author

@louiss0 Ah yes, I should clarify: we may introduce collection config options other than schema in the future. One that's been raised is a custom slug mapper if you want to compute entry slugs yourself:

// not final code
const blog = defineCollection({
  slug: ({ id, data }) => data.customSlug ?? slugify(id),
  schema: {...},
})

Nesting schema as its own key should keep doors open like this.

And thanks! Happy to share we're seeing perf gains from content schemas over Astro.glob if anything 👍

@bholmesdev bholmesdev changed the title Content Schemas Content Collections Dec 9, 2022
@naiyerasif

naiyerasif commented Dec 11, 2022

Are there any plans to introduce a concept of relationships between collections? For example, a blog collection may have an array of authors which may be part of an author collection. Usually, maintaining such relationships manually is a huge pain and having some good DX around this might be helpful.

Another thing I'd really love is to have some search primitive akin to SQL. For example,

const allBlogPostsAfter2020 = await search(`
  blog.* from blog
  where publishedDate.year > 2020
  order by publishedDate asc
`);

where publishedDate.year gets resolved by a function defined in the schema (if the function does not exist on the primitive itself).

Furthermore, the search API can flatten the getCollection and getEntry into one API.

// this gives you all the blog posts
const allBlogPosts = await search(`blog.* from blog`)

// this gives you an entry
const firstBlogPost = await search(`blog.* from blog where title = "First Blog Post"`)

// this gives you the latest entry
const latestBlogPost = await search(`blog.* from blog order by publishedDate desc limit 1`)

This might also work nicely with relationships using joins.

const blogPostsByAstro = await search(`
  blog.* from blog, author 
  where blog.authorId = author.id 
  and author.name = "Astro"
`)

Lume does something similar using its search and relations plugins.

@bholmesdev
Contributor Author

@naiyerasif Ah, I love these ideas!

  • Relationships: we've definitely built schemas with relations in mind. Since you can define a type for every field, we could certainly introduce a "reference" type to refer to other collections. This was kept out of the RFC to keep our scope well-defined, but it's a feature we're very excited to explore.
  • Relational querying: this is an interesting thought, and matches our analogy of collections to database tables. I'd point to Nuxt's Content feature for some prior art here. They decided to use MongoDB's query language to treat content as document-based. I'd be hesitant to 100% mimic SQL querying per your example since it would be difficult to offer intellisense for a generic string vs. helper functions. Still, I'd love to find an answer here that's beginner-friendly, while still powerful enough for advanced users.
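
To make the "reference" idea above concrete, here is a minimal sketch of resolving author references by hand, which is roughly the work a built-in reference type could take over. The entry shapes, the `authorIds` field, and `resolveAuthors` are all hypothetical; nothing here is part of the RFC.

```typescript
// Hypothetical entry shapes; "authorIds" and resolveAuthors are illustrative only.
interface AuthorEntry {
  id: string;
  data: { displayName: string };
}

interface BlogEntry {
  slug: string;
  data: { title: string; authorIds: string[] };
}

// Look up each referenced author, failing loudly on a dangling reference.
function resolveAuthors(post: BlogEntry, authors: AuthorEntry[]): AuthorEntry[] {
  const byId = new Map<string, AuthorEntry>();
  authors.forEach((author) => byId.set(author.id, author));
  return post.data.authorIds.map((authorId) => {
    const found = byId.get(authorId);
    if (!found) {
      throw new Error(`Unknown author "${authorId}" referenced by "${post.slug}"`);
    }
    return found;
  });
}
```

Failing at lookup time is one possible design; a schema-level reference type could presumably report the same problem during validation instead.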

@louiss0

louiss0 commented Dec 11, 2022

How far have you gotten in the last few days? Did you fix the Windows problem? Is the magic layouts feature going to go away after this feature is standardized?

@naiyerasif

Relationships: we've definitely built schemas with relations in mind. Since you can define a type for every field, we could certainly introduce a "reference" type to refer to other collections. This was kept out of the RFC to keep our scope well-defined, but it's a feature we're very excited to explore.

This can be a separate RFC if you think the current RFC may become too big.

Relational querying: this is an interesting thought, and matches our analogy of collections to database tables. I'd point to Nuxt's Content feature for some prior art here. They decided to use MongoDB's query language to treat content as document-based. I'd be hesitant to 100% mimic SQL querying per your example since it would be difficult to offer intellisense for a generic string vs. helper functions. Still, I'd love to find an answer here that's beginner-friendly, while still powerful enough for advanced users.

Any fluent query DSL (like Nuxt's Content) should be fine. I agree that having helper functions for such an API would be immensely helpful. I think this should be a part of this RFC since you're already planning for something similar with getCollection and getEntry.
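
For comparison with the SQL-string examples, a fluent helper could look something like the sketch below. It is a hypothetical in-memory illustration (`EntryQuery`, `QueryEntry`, and the sample data are invented here), not Nuxt Content's API or anything proposed in the RFC. The point is that predicates receive typed data, so field names can be autocompleted and checked.

```typescript
// Hypothetical sketch of a typed, fluent query helper over in-memory entries.
interface QueryEntry<T> {
  slug: string;
  data: T;
}

class EntryQuery<T> {
  private readonly items: ReadonlyArray<QueryEntry<T>>;

  constructor(items: ReadonlyArray<QueryEntry<T>>) {
    this.items = items;
  }

  // The predicate receives typed data, so editors can autocomplete field names.
  where(predicate: (data: T) => boolean): EntryQuery<T> {
    return new EntryQuery(this.items.filter((entry) => predicate(entry.data)));
  }

  // Sort by a numeric key; dates can be passed as `date.getTime()`.
  orderBy(key: (data: T) => number, direction: "asc" | "desc" = "asc"): EntryQuery<T> {
    const sign = direction === "asc" ? 1 : -1;
    const sorted = this.items
      .slice()
      .sort((a, b) => sign * (key(a.data) - key(b.data)));
    return new EntryQuery(sorted);
  }

  limit(count: number): EntryQuery<T> {
    return new EntryQuery(this.items.slice(0, count));
  }

  all(): QueryEntry<T>[] {
    return this.items.slice();
  }
}

interface BlogData {
  title: string;
  publishedDate: Date;
}

const samplePosts: QueryEntry<BlogData>[] = [
  { slug: "old", data: { title: "Old", publishedDate: new Date("2019-06-01") } },
  { slug: "mid", data: { title: "Mid", publishedDate: new Date("2021-03-15") } },
  { slug: "new", data: { title: "New", publishedDate: new Date("2022-11-30") } },
];

// Roughly: "from blog where publishedDate.year > 2020 order by publishedDate desc limit 1"
const latestAfter2020 = new EntryQuery(samplePosts)
  .where((data) => data.publishedDate.getFullYear() > 2020)
  .orderBy((data) => data.publishedDate.getTime(), "desc")
  .limit(1)
  .all();
```

A typo such as `data.publishDate` would be a compile-time error here, whereas the same typo inside a query string would only surface at runtime.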

@pilcrowonpaper

pilcrowonpaper commented Dec 13, 2022

Is it possible for the collection argument for getCollection() to be a route as well? So if I have content/blog/en, I can use either of these?

const blogs = await getCollection("blog");
const englishBlogs = await getCollection("blog/en");

To not introduce complexity, schemas will still be limited to top-level (blog in this case).

@pilcrowonpaper

pilcrowonpaper commented Dec 13, 2022

Also, why not move renderEntry() inside the entry object? Does entry have to be a POJO?

// now
const { Content } = renderEntry(entry);
// idea
const { Content } = entry.render();

@bholmesdev
Contributor Author

@pilcrowonpaper Good questions!

  1. No, admittedly. Since collections are considered one level deep, you can only query for the top-level blog collection with the collection argument. However, we do offer a filter function where you can check for /en at the front of each entry slug. More on that in the blue Tip section on the landing page example.
  2. This has been suggested, and was avoided in the initial RFC due to technical limitations. But with @matthewp's recent changes to our content renderer, this may be possible! We're very close to an experimental release so I plan to table this for future refinement. I'm glad to hear there's interest though.
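
The filter approach from the first answer could be sketched as a small predicate factory. `underPrefix` and `SluggedEntry` are hypothetical names, and the assumption that entry slugs look like `en/first-post` (no leading slash) is mine rather than something the RFC text confirms.

```typescript
// Hypothetical helper: build a filter that keeps entries under a slug prefix.
interface SluggedEntry {
  slug: string;
}

function underPrefix(prefix: string): (entry: SluggedEntry) => boolean {
  // Normalize "/en", "en/", and "en" to "en/" so "english-notes" does not match.
  const normalized = prefix.replace(/^\/+|\/+$/g, "") + "/";
  return (entry) => entry.slug.startsWith(normalized);
}

// Intended usage, assuming getCollection accepts a filter as described above:
//   const englishBlogs = await getCollection("blog", underPrefix("en"));
```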

@bholmesdev
Contributor Author

Alright everyone, thank you so much for your input and excitement over these past few weeks. We plan to discuss Content Collections during the RFC call on our discord tomorrow (2pm ET), and hope to reach consensus for an experimental release!

I'll highlight 2 final tweaks that were made:

  1. We now support slug configuration from our src/content/config. This is useful for generating slugs based on frontmatter, or mapping your preferred directory structure (ex. /content/blog/2022-05-10/post.md) to URLs on your site (ex. /content/blog/post). You can use the slug argument like so:
import { defineCollection } from 'astro:content';

const blog = defineCollection({
  slug({ id, data }) {
    return data.slug ?? myCustomSlugify(id);
  },
  schema: {...}
});

export const collections = { blog };
  2. @matthewp has heroically made renderEntry more powerful and stable with some new head-hoisting internals! If that jargon has your head spinning, here's the big takeaway: MDX styles and scripts are only injected when the <Content /> component is used. This means you can safely call renderEntry for headings and injectedFrontmatter without worrying about a bloated bundle (cc @andersk). Read the updated Detailed Design for full details.
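
For the slug option in the first tweak, one way `myCustomSlugify` might be written is sketched below. It drops date-named directories so that an id like `2022-05-10/post.md` maps to `post`. The function names, the `SlugInput` shape, and the assumption that ids are paths relative to the collection with a `.md` or `.mdx` extension are illustrative guesses, not confirmed details.

```typescript
// Hypothetical input shape mirroring the `{ id, data }` argument shown above.
interface SlugInput {
  id: string;
  data: { slug?: string };
}

// Remove the file extension and any path segment that looks like YYYY-MM-DD.
function stripDateSegments(id: string): string {
  return id
    .replace(/\.(md|mdx)$/i, "")
    .split("/")
    .filter((segment) => !/^\d{4}-\d{2}-\d{2}$/.test(segment))
    .join("/");
}

// Frontmatter wins when present; otherwise derive the slug from the id.
function computeSlug(input: SlugInput): string {
  return input.data.slug ?? stripDateSegments(input.id);
}
```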

And that's it! Hope to see y'all on the call tomorrow 👋

@matthewp
Contributor

Things to figure out before unflagging:

  • How should users colocate related data that is not the content from the schema? Currently suggesting _ folders are ignored.
  • What about relative image links? Should those be treated differently from images inside of md files outside of the content/ folder?
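
The "_ folders are ignored" suggestion could be expressed as a path check like the one below. This is only a sketch of the rule as stated: `isIgnoredContentPath` is a hypothetical name, and treating only directories (not file names) as ignorable is an assumption.

```typescript
// Hypothetical check: a path is ignored when any *directory* segment starts with "_".
function isIgnoredContentPath(relativePath: string): boolean {
  const directories = relativePath.split("/").slice(0, -1); // drop the file name
  return directories.some((segment) => segment.startsWith("_"));
}
```

Under this rule, `blog/_assets/diagram.png` would be skipped while `blog/post.md` would still be treated as an entry.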

@louiss0

louiss0 commented Dec 17, 2022


Content Collections Criticism

First off, I'd like to say good job on this RFC. It is good enough for me to build a project around, and maybe even good enough to scale one. I like that you decided to put the render function on the entry instead of making us import it. I like the collectionToPaths() function; please add it! I even like that Zod was chosen for this RFC. You said that other formats might be supported in the future; I hope the next one is JSON. But no tool is perfect, and this is only the beginning, so I have decided to raise some big changes to consider.

RSS Feeds

Right now I can't seem to get RSS feeds working with content collections. It's not that important, but I want to have that capability. The issue is "Can't use RSS with content folder", and I feel it needs to be fixed immediately. I even copied and pasted the stack trace into the issue so that the error can be addressed quickly.

Magic Properties

In Astro there are so far two magic frontmatter properties: the first is draft and the second is layout. I believe both of them should be removed, since people are now expected to create their blogs with Content Collections, or they should at least not be usable in the /content folder. The reason is that these properties are just not needed anymore. Using the layout property would lead to bad design and coupling: when you use the layout key, the render function activates the mechanism responsible for rendering layouts. With this RFC, it is better to make the developer import the layout inside the page where he or she wants to use it. Having magic layouts in the /content folder can only lead to bad behavior.

I could argue for removing the magic properties completely, but then users would have to write lots of boilerplate code inside pages. Now that the /content folder exists, the only thing .mdx is good for is being used as a templating language and importing components. So I think it's better to remove the magic properties from the content folder only. As far as draft is concerned, I'd argue that if someone is unsure whether or not a page should be a draft, it should just be put in the content folder instead. But draft is a minor thing. If it is going to exist as a first-class citizen, I'd suggest letting the developer know about it through types whenever a collection is created.

Extending and Defining a default Schema

There is a problem with the schema property of defineCollection. It only allows me to define a ZodRawShape; I can't use z.object() on it at all. This means I can't use other features of Zod to construct my schemas, which is not good. If a developer wants to reuse field definitions like title, author, draft, updatedDate, and even pubDate, they have to write them out all over again for each collection. Remember the DRY principle.

import { z, defineCollection } from 'astro:content';

// In this example the title is repeated twice 
const releases = defineCollection({
  schema: {
    title: z.string(), // here
    version: z.number(),
  },
});

const engineeringBlog = defineCollection({
  schema: {
    title: z.string(), //and  here
    tags: z.array(z.string()),
    image: z.string().optional(),
  },
});

export const collections = {
  releases: releases,
  // Don't forget 'quotes' for collection names containing dashes
  'engineering-blog': engineeringBlog,
};

I think there should be a default schema for people to use. I'd put it in the astro config but it could also exist in the /config.ts file for the content folder.

{
  contentCollections: {
    defaultSchema: {
      title: z.string().max(90),
      author: z.string().default("Authors name"),
    },
  },
}

Users would probably then have the power to extend it by importing it from astro:content

// defaultSchema is the proposed export, not an existing one
import { defaultSchema, defineCollection, z } from "astro:content"

export const collections = {
  blog: defineCollection({
    schema: defaultSchema
  })
}

I'm asking for the developer to have more access to Zod's features. The schema key expects just a plain object, so I can't use the other functions that Zod provides. If you don't intend to give developers full access to Zod's API through defineCollection, you could at least give back some of its capabilities with an extends: key in defineCollection, plus an omit: key for omitting some keys from a schema.
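
Because the schema is described as a plain object, ordinary object spread and rest destructuring may already cover part of the extends/omit request. The sketch below shows the technique with plain validator functions standing in for Zod types, an assumption made only to keep the example dependency-free. `sharedFields`, `releaseShape`, and `validateShape` are hypothetical names.

```typescript
// Plain validator functions stand in for Zod types here (an assumption made
// to keep the sketch dependency-free); the spread/rest technique is the point.
type FieldCheck = (value: unknown) => unknown;
type FieldShape = Record<string, FieldCheck>;

const asString = (value: unknown): string => {
  if (typeof value !== "string") throw new Error("expected a string");
  return value;
};

const asNumber = (value: unknown): number => {
  if (typeof value !== "number") throw new Error("expected a number");
  return value;
};

// Fields shared by several collections, defined once.
const sharedFields = { title: asString, author: asString };

// "extends": spread the shared fields into a larger shape.
const releaseShape = { ...sharedFields, version: asNumber };

// "omit": rest destructuring drops a key without mutating the original.
const { author: _droppedAuthor, ...anonymousShape } = sharedFields;

// Run every check in the shape against the matching input key.
function validateShape(
  shape: FieldShape,
  input: Record<string, unknown>
): Record<string, unknown> {
  const output: Record<string, unknown> = {};
  for (const key of Object.keys(shape)) {
    const check = shape[key];
    if (check) output[key] = check(input[key]);
  }
  return output;
}
```

If the same pattern holds for an object of Zod types, `title` would only need to be declared once and spread into each collection's schema.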

Schema for injected frontmatter

At the moment, injected frontmatter is just not typeable at all. I think the user should have access to types for all the frontmatter that is injected into the pages. I would like to have an injectedFrontmatterSchema: key available for collections. If that is not possible, the render() function should have a generic argument that allows the user to pass in a type.

Injected Frontmatter Schema

const blog = defineCollection({
  injectedFrontmatterSchema: {
    readingTime: z.string().datetime(),
    author: z.string().default("Shelton Louis"),
  }
})

Example with the render function

render<T extends Record<string, unknown>>(): Promise<{
  Content: AstroComponentFactory;
  headings: MarkdownHeading[];
  injectedFrontmatter: T;
}>
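
One difference between the two options is worth spelling out: a generic argument only informs the compiler, while a schema could also validate at runtime. The sketch below illustrates the generic variant with a stand-in function. `renderSketch` and `RenderSketchResult` are hypothetical, the function is synchronous for brevity, and `Content` is left out because it is an Astro-internal type.

```typescript
// Stand-in for the value a render() call might resolve to; Content is omitted
// because it is an Astro-internal component type.
interface RenderSketchResult<T extends Record<string, unknown>> {
  headings: string[];
  injectedFrontmatter: T;
}

// `rawInjected` plays the role of whatever remark/rehype plugins produced.
function renderSketch<T extends Record<string, unknown>>(
  rawInjected: Record<string, unknown>
): RenderSketchResult<T> {
  // The cast is unchecked: the generic only informs the compiler.
  return { headings: [], injectedFrontmatter: rawInjected as T };
}

type ReadingTimeFrontmatter = { readingTime: string };

const renderedDemo = renderSketch<ReadingTimeFrontmatter>({
  readingTime: "3 min read",
});

// Typed as string at compile time, with no runtime guarantee behind it.
const readingTimeLabel: string = renderedDemo.injectedFrontmatter.readingTime;
```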

Lastly

I don't have many other concerns from here, but are you going to find a better way of generating types from collections? Generating an entry map for each individual collection seems good in the short run but bad in the long run. The map can become huge, TypeScript may not be able to give us the answers we need, and there could be scalability issues when it comes to writing and erasing types.

I wish there was a way to specify injected frontmatter both as a default and for each collection. That way people don't have to keep rereading the file where they put their remark plugins to find out what was injected. A key for specifying injected frontmatter schemas would be nice.

@ispringle

Would it be possible to provide a function to the collection config so that we can transform/create slugs in our own way? For example, the current slug normalizer that's creating the slug value is not dealing with Unicode characters. It's pretty common that websites in languages which have characters beyond the standard ASCII alphanumerics will strip those characters out of URLs. Another example is that the current setup doesn't remove whitespace.

It seems the simplest solution is to continue to provide a simple slugifier and then allow people who want a more advanced one to provide that to the config. Of course, you could just map over the collection returned by getCollection, update the slug field, and then use that new object, but this would need to be redone in every file that uses that collection, which is less than ideal.

@louiss0

louiss0 commented Dec 19, 2022

The solution is built in already.

import { defineCollection } from 'astro:content';

const blog = defineCollection({
  slug({ id, data }) {
    return data.slug ?? myCustomSlugify(id);
  },
  schema: {...}
});

export const collections = { blog };

@bholmesdev
Contributor Author

@ispringle fair point! Other RFC reviewers raised slug customization as well, so we decided to ship a slug option as part of your collection config (see added section). This should address the more advanced use cases you raised.

Also curious to hear how our default slugger can be improved! We have an existing issue for handling file name spaces, but open to further ideas as well.
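
As one possible direction for the slugger, the sketch below strips diacritics and collapses whitespace using only standard string methods. `slugifySegment` and `slugifyPath` are hypothetical names, this is not Astro's default slugger, and characters with no ASCII base letter are simply dropped. The input is assumed to have no file extension.

```typescript
// Hypothetical slugger for a single path segment.
function slugifySegment(input: string): string {
  return input
    .normalize("NFKD") // split accented letters into base letter + combining mark
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // whitespace and other characters become hyphens
    .replace(/^-+|-+$/g, ""); // trim leading/trailing hyphens
}

// Apply the slugger per segment so directory separators are preserved.
function slugifyPath(path: string): string {
  return path.split("/").map(slugifySegment).join("/");
}
```

A function like this could be passed through the slug option by anyone who needs different behavior from the default.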

@bholmesdev
Contributor Author

Things to figure out before unflagging:

  • How should users colocate related data that is not the content from the schema? Currently suggesting _ folders are ignored.
  • What about relative image links? Should those be treated differently from images inside of md files outside of the content/ folder?

Both points have been addressed in the final RFC draft. With these resolved... this RFC is officially accepted and good-to-merge 🥳

Thanks again to everyone for your time and input. You can try out the experimental release with our shiny new Content Collections docs.

We'll also be marking this RFC as closed. So if you have future ideas, we encourage you to start a new discussion. Thanks again 🙌

@bholmesdev bholmesdev merged commit 11cf879 into main Dec 19, 2022