Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Work In Progress: hugolib: Extract date and slug from filename #4436

Closed
wants to merge 13 commits into from

Conversation

bep
Copy link
Member

@bep bep commented Feb 21, 2018

This commit adds a new config option which, when enabled and no date is set in front matter, will make Hugo try to parse the date from the content filename.

Also, the filenames in these cases will make for very poor permalinks, so we will also use the remaining part as the page slug if that value is not set in front matter.

This should make it easier to move content from Jekyll to Hugo.

To enable, put this in your config.toml:

[frontmatter]
dateFallbacks = ["filename"]

Fixes #285
Closes #3310

@bep bep force-pushed the date-filename branch 4 times, most recently from af20b02 to 3314a40 Compare February 21, 2018 13:51
@bep
Copy link
Member Author

bep commented Feb 21, 2018

@vassudanagunta does my description above make sense for this particular case?

I'm not sure about the config, but it can maybe force a cleaner implementation. We use metadata lots of places in the code, but for the end user I think it makes more sense to talk about default values etc. for the page front matter values.

[frontmatter]
defaultDate = ["filename"]

@vassudanagunta
Copy link
Contributor

vassudanagunta commented Feb 21, 2018

@bep the description makes sense.

It looks like you're choosing something more future-proof than a boolean setting for this which I obviously love :)

frontmatter vs metadata: It depends on which mental model you want the end users to have:

  • A page has metadata, which you can override in front matter.
  • A page has front matter, which can have default values.

I agree that whichever one you choose, it would be good to be consistent in the docs and API. Which mental model makes the docs easier to write and understand?

I think I prefer "default" to "fallback", because they mean the same thing and "default" is already used in many other places:

[frontmatter]
defaultDate = ["filename"]

or perhaps the following because you can pluralize "default" (I see why you want that):

[frontmatter]
dateDefaults = ["filename"]

btw, you can add many of the other issues/PRs I reference in #4341 to your Closes list.

Is page_frontmatter.go the beginning of a refactor?

@bep
Copy link
Member Author

bep commented Feb 21, 2018

Is page_frontmatter.go the beginning of a refactor?

Yes, or at least for all the dates for starters. The code I committed is just the first start, I will ping you when there is something to look at.

´defaultDate` is a better name, thanks.

@bep bep force-pushed the date-filename branch 5 times, most recently from bcb92ab to 7f897ed Compare February 22, 2018 09:24
@bep
Copy link
Member Author

bep commented Feb 22, 2018

@vassudanagunta if I could pick your brain:

There are currently some failing tests:

--- FAIL: TestMetadataDates (0.17s)
	page_test.go:1097: test 4, Date is: 2018-02-22 11:02:34.215129233 +0100 CET m=+0.040275985. Expected: 1969-01-10 09:17:42 +0000 UTC
	page_test.go:1097: test 4, LastMod is: 2018-02-22 11:02:34.215129233 +0100 CET m=+0.040275985. Expected: 1969-01-10 09:17:42 +0000 UTC
	page_test.go:1097: test 4, param date is: 2018-02-22 11:02:34.215129233 +0100 CET m=+0.040275985. Expected: 1969-01-10 09:17:42 +0000 UTC
	page_test.go:1097: test 8, Date is: 2018-02-22 11:02:34.259801735 +0100 CET m=+0.084947146. Expected: 1969-01-10 09:17:42 +0000 UTC
	page_test.go:1097: test 8, param date is: 2018-02-22 11:02:34.259801735 +0100 CET m=+0.084947146. Expected: 1969-01-10 09:17:42 +0000 UTC
	page_test.go:1097: test 16, PubDate is: 2018-02-22 11:02:34.338479224 +0100 CET m=+0.163622275. Expected: 0001-01-01 00:00:00 +0000 UTC
	page_test.go:1097: test 16, param publishdate is: 2018-02-22 11:02:34.338479224 +0100 CET m=+0.163622275. Expected: 0001-01-01 00:00:00 +0000 UTC
	page_test.go:1097: test 17, PubDate is: 2018-02-22 11:02:34.346865549 +0100 CET m=+0.172008349. Expected: 0001-01-01 00:00:00 +0000 UTC
	page_test.go:1097: test 17, param publishdate is: 2018-02-22 11:02:34.346865549 +0100 CET m=+0.172008349. Expected: 0001-01-01 00:00:00 +0000 UTC

These are all related to to the modTimeFallback -- and I understand why they fail, but I'm not 100% sure what the sensible behaviour would be (this also relates to the new date filename fallback; where there currently is missing some tests).

But the behaviour for is now at least algorithmic with loops and stuff, and easier to understand and change than a massive if/else/switch.

To sum up. In order, this is what happens:

  1. Date is set. The default date fallbacks (modTime, filename) works only on Date. So if date is missing from front matter and there is a fallback configured, use the fallback value for Date.
  2. Lastmod is set, first date found in front matter with one of "lastmod", "modified". These are also set in .Params, but only converted to a date if it is a date.
  3. PublishDate: same as above with "publishdate", "pubdate", "published"
  4. ExpiryDate: same as above with "expirydate", "unpublishdate"
  5. .Date is set to .PublishDate if zero, .PublishDate set to .Date if zero, and .Lastmod gets the value from .Date if zero.
  6. Param date is set if .Date is non-zero, similar with lastmod with .LastMod, publishdate with .PublishDate, expirydate with ExpiryDate.

The above means that .Params for date, lastmod, and publishdate will always get a non-zero and non-nil value.

My main question is the 1): I'm tempted to keep it as it is now, in this PR (and adjust tests) as it is pretty easy to reason about. And it fits with the semantics (at least for the filename fallback): It is the author date (not last modified or anything else).

bep added 3 commits February 22, 2018 18:14
This commit adds a new config option  which, when enabled and no date is set in front matter, will make Hugo try to parse the date from the content filename.

Also, the filenames in these cases will make for very poor permalinks, so we will also use the remaining part as the page `slug` if that value is not set in front matter.

This should make it easier to move content from Jekyll to Hugo.

To enable, put this in your `config.toml`:

```toml
[frontmatter]
defaultDate  = ["filename"]
```

Fixes gohugoio#285
Closes gohugoio#3310
Closes gohugoio#3762
Closes gohugoio#4340
@vassudanagunta
Copy link
Contributor

vassudanagunta commented Feb 22, 2018

@bep of course.

I'm not 100% sure what the sensible behaviour would be

Buddy, I'm with you.

I think the use of the file system's modification timestamp is problematic:

  • It is volatile. It gets updated for extraneous / meaningless reasons.
  • It is not guaranteed to be preserved by Git (e.g. for a fresh clone of a Git repo, the modification time stamp of all files is set to the time of the clone) or many other VCS or CMS systems.

As such, despite similarities in name, I don't think the file's modification timestamp should be treated as synonymous with the LastMod front matter value. LastMod thus has a reliable, meaningful value, either explicitly set, or synced to Git's author date, which has stable semantics. From the docs:

.Lastmod

the date the content was last modified. .Lastmod pulls from the lastmodfield in a content’s front matter.

  • If lastmod is not set, and .GitInfo feature is disabled, the front matter date field will be used.
  • If lastmod is not set, and .GitInfo feature is enabled, .GitInfo.AuthorDate will be used instead.

But in your new logic, as Tests 4 and 8 show, the file's modification timestamp takes precedence over PublishDate as the default Date. I'm not sure the timestamp should be used at all, but certainly it shouldn't beat out a more reliable and explicitly set value.

Likewise, I'm not sure PublishDate should ever be set to the file's modification timestamp (tests 16 and 17).

The following "copy" precedence:

Date <- PublishDate, LastMod, (modification timestamp only if enabled), nil/zero

PublishDate <- Date, nil/zero

LastMod <- Date, PublishDate, (modification timestamp only if enabled), nil/zero

A value of nil or zero date is useful. It can be reasoned about, since template logic can check for it: if publishdate is nil, do special logic. There is no way to express if publishdate was set to the filesystem mod timestamp, do special logic. I'd prefer "unknown" or "not set" were represented by nil rather than the zero date (0001-01-01T00:00:00Z, but that's less of a problem.

@vassudanagunta
Copy link
Contributor

vassudanagunta commented Feb 22, 2018

@bep I recommend merging PR #4340. I think the test enhancements are germane to this effort. See 1cea4d3 for a full description of the commit. I just rebased it, and updated it as follows:

  • moved improved tests from page_metadata_test.go to page_frontmatter_test.go in keeping with your work here.
  • incorporated the changes you made to TestMetadataDates into it, except your // TODO(bep) date from modTime which I'll let you put back in depending on what you think about my previous comment.:

Also, rather than disabling test cases to get the test suite to pass, simply change the symbol in the "output" column of the test case. The point of the table is to visualize various cases of missing values and how it gets filled. You could, for example, if you agreed with my "copy" precedences above, with near zero effort quickly change the tests to match.

Also, the framework has support for filename based dates. To test how it would interact with the other dates, simply uncomment these lines and modify the output column values:

https://github.com/vassudanagunta/hugo/blob/1cea4d37e851c04bc47c329b2aa6806665275a3b/hugolib/page_frontmatter_test.go#L166

https://github.com/vassudanagunta/hugo/blob/1cea4d37e851c04bc47c329b2aa6806665275a3b/hugolib/page_frontmatter_test.go#L177

https://github.com/vassudanagunta/hugo/blob/1cea4d37e851c04bc47c329b2aa6806665275a3b/hugolib/page_frontmatter_test.go#L284-L287

@bep
Copy link
Member Author

bep commented Feb 22, 2018

@vassudanagunta the "file system's modification timestamp" is deprecated and will be removed in a couple of Hugo versions, so I'm not spending philosophical time on that.

Could you revise your comments above and ignore (forget about it, pretend it's not there) the "useModTimeAsFallback"?

@vassudanagunta
Copy link
Contributor

vassudanagunta commented Feb 22, 2018

@bep I'm totally down to take it out. But are you sure you want to take it out altogether? See #2239.

Below is what I think, based on the semantics of these fields, whichever way you decide re useModTimeAsFallback. Note that these are independent / not transitive / not a sequence of steps. For example, PublishDate should never be set from LastMod, even if Date is.

If useModTimeAsFallback is removed:

Date <- PublishDate, LastMod, nil/zero

PublishDate <- Date, nil/zero

LastMod <- Date, PublishDate, nil/zero

If useModTimeAsFallback remains an option:

Date <- PublishDate, LastMod, (modification timestamp only if enabled), nil/zero

PublishDate <- Date, nil/zero

LastMod <- Date, PublishDate, (modification timestamp only if enabled), nil/zero

@vassudanagunta
Copy link
Contributor

@bep consider also #3977 (where you have many comments) which just fell off the map (stale bot).

@kaushalmodi
Copy link
Contributor

Once done with these changes, can someone explain the change in 69b97b2.

I find that change concerning for my use case.. I depend on the publishdate to be always non-nil i.e. If publishdate is not set, it should take date's value. Seems like, that commit now stops that from happening now?

@@ -1059,6 +1113,9 @@ func TestMetadataDates(t *testing.T) {
checkDate(t, i+1, "LastMod", p.ExpiryDate, test.expExp, fi)

// check Page Params
// TODO(bep) we need to rewrite these date tests to more unit style.
// The params checks below are currently flawed, as they don't check for the
// absense (nil) of a date.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed this in PR #4340. Please see my comment on why you really should just merge it.

I'm not sure what you mean by "more unit style". I don't think you mean tests of smaller "units" such as an individual functions. What I did with these tests is to cover "the complex" that governs date values (front matter, filenames, file timestamps)... the "unit" is the blackbox the is Hugo's content processing. Or perhaps the table of inputs and outputs don't seem unit style to you? But that table is the data for data-driven unit tests. Inputs and outputs stripped of all the scaffolding code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A creating a site, then do a full build, and check the result will by most people categorised as an integration test. We do that a lot in Hugo, because building all is often cheaper/easier than creating testable and smaller units. My new frontmatterHandler has no dependencies to Page or Site and can be tested as an unit.

@bep
Copy link
Member Author

bep commented Feb 23, 2018

To all of you, please stop with this on-going commentary of my code. The PR is clearly marked IN PROGRESS in two places.

@bep
Copy link
Member Author

bep commented Feb 23, 2018

OK, some hours skiing have (maybe) created some clarity in my head about all of this. I think my hesitation to adding more config has been hiding the obvious solution.

Quickly illustrated by its default (what you get when doing nothing):

[frontmatter]
date = ["date", "publishDate", "lastMod"]
lastMod = ["lastMod", "date"]
publishDate  = ["publishDate", "date"]
expiryDate = ["expiryDate"]
  • Note that some of the fields above (lastMod, publishDate) maps to several others in "front matter terms".
  • The above may looks complex, but it is trivial to implement, avoids all discussions about how the different dates behaves when not set, and most users will not have to do anything.
  • I will also make sure that any custom frontmatter config will be merged, so you can just create the date setting, if that is what you want.

And custom config may look like this:

[frontmatter]
date = ["date",  "filename", "publishDate", "lastMod"]
lastMod = ["lastMod", "modTime", "date"]

This should also solve #3977

There is also the --enableGitInfo which trumps everything about lastMod, but I think that will have to be left out of it.

Too complex? @kaushalmodi @vassudanagunta

@bep
Copy link
Member Author

bep commented Feb 23, 2018

A quick note to the above:

  • This will also keep the modTime feature (we just deprecate the old flag)
  • And with one func per setting, it should be easy to plug in new ones ... if we really need that.

@kaushalmodi
Copy link
Contributor

@bep

[frontmatter]
date = ["date", "publishDate", "lastMod"]

Too complex?

Reads like plain English to me :) 👍

@vassudanagunta
Copy link
Contributor

vassudanagunta commented Feb 25, 2018

hiding the obvious solution

Too complex?

It's exactly what I've been recommending all along.

  1. Should either of the following be allowed?

    [frontmatter]
    date = ["filename"]
    publishDate = ["date", "publishDate"]
    

    I don't think so. If publishDate is specified in front matter, that's the page's publishDate, set in stone. I can't think of any legitimate case to allow otherwise, and if there is some esoteric use case out there, people accidentally shooting themselves in the foot will be far more common that that esoteric case. So for every page value, the same-named front-matter field should always be its implicit and non-overridable first source. This config is about allowing people to configure defaults. We should make that clear by putting the world "default" in there somewhere. (If internally the code prepends the list with same-named front matter field for a convenient, elegant implementation, that's totally cool.)

  2. lastMod and modTime sound like the same thing. Confusing, error prone, not self-documenting. How about fileModTimestamp for the latter?

  3. A gitAuthorDate option is how you would incorporate GitInfo into this model, whether now or later.

  4. This model can work for all front matter defaults. If so, we'd eventually want to support literals. To do that we should consider distinguishing literals from symbolic identifiers. I'm not sure this is necessary, but should be considered.

I've incorporated all of the above thoughts below. I've including some items that are out-of-scope for this PR because they test the flexibility of the design:

[frontmatter]
format = "yaml"

[frontmatter.defaults]
date = ["<filename>", "<publishDate>", "<lastMod>"]
lastMod = ["<gitAuthorDate>", "<date>", "<fileModTimestamp>"]
title = ["<filename>"]
copyright = ["<publishDate>", "2018"]
isCJKLanguage = true

@bep
Copy link
Member Author

bep commented Feb 26, 2018

A gitAuthorDate option is how you would incorporate GitInfo into this model,

Yes, but it isn't "front matter", so that will have to wait.

Should either of the following be allowed?

Let me say it this way: If you add a custom setting, all must be specificed, so:

date = ["filename"]

Is allowed, but that will be limiting (files without date pattern will not get a date).

By adding "magic logic" to the above, you will be back to the "if then else" spaghetti we already had, which was hard to explain and understand.

@vassudanagunta
Copy link
Contributor

vassudanagunta commented Feb 26, 2018

Yes, but it isn't "front matter", so that will have to wait.

gitAuthorDate is the same as filename: neither are front matter but they both can act as sources for a front matter value. I included out of scope items because this is a design discussion.

By adding "magic logic"

It's not magic or hard to understand if the semantics are about customizing default values, e.g. what PublishDate will be if you do not explicitly set PublishDate it in the front matter.

Say your config.toml has:

[frontmatter.defaults]
publishDate  = ["date"]
title = ["<filename>"]
copyrightYear = ["<publishDate>"]

and your content/post/customize-front-matter-defaults.md is:

---
title: How to Customize Hugo Front Matter Default Values
date: 2017-12-27
publishDate: 2018-01-01
copyrightYear: 2017
---
content

What would you expect the post's title, publishDate and copyrightYear to be? Which behavior would be harder to understand?

@bep
Copy link
Member Author

bep commented Feb 26, 2018

What would you expect the post's title, publishDate and copyrightYear to be?

This is about four specific dates (see above). Not about copyrightYear(whatever that is).

@vassudanagunta
Copy link
Contributor

When walking, driving, skiing or designing, it is better to look ahead than at the ground immediately in front of you.

I've said all I can say. All yours.

@bep
Copy link
Member Author

bep commented Feb 26, 2018

I will postpone this until some later time, as I can see that we're on different pages.

I will let this sit for a while.

But the "predisposed plan" is to find a way to tell Hugo where to pick the values for these 4 fields:

type PageDates struct {
	Date        time.Time
	Lastmod     time.Time
	PublishDate time.Time
	ExpiryDate  time.Time
}

In my head talking about "default values for front matter params" is giving false expectations.

Given:

date = ["date", "publishDate", "lastMod"]

We could remove "date" and specify that as a hardcoded default, but that would (I think) give away flexibility and clarity for a slightly less verbose config.

I agree that it could be useful to create a mapping for other front matter parameters, too, but default values should be handled in the templates (use the .Param func).

As a slightly stupid example, having this as a default setting would be really surprising (to me):

[frontmatter]
defaultColor = ["blue"]

Which I can guarantee that someone will try if the above was the naming scheme.

@bep bep closed this Feb 26, 2018
@kaushalmodi
Copy link
Contributor

kaushalmodi commented Feb 26, 2018

The spec can be made tighter. I agree that adding this feature will give people to try illogical things and open unnecessary bug reports and discussions.

How about this spec?:

  1. The only 4 front-matter variables that can be assigned the precedence list are date, lastMod, publishDate and expiryDate.

  2. Regardless of this precedence configuration, if a front-matter variable has a value assigned, that will take the highest precedence.

    This goes in line with what @bep said above:

    We could remove "date" and specify that as a hardcoded default, but that would (I think) give away flexibility and clarity for a slightly less verbose config.

    I wouldn't call that giving up flexibility. With a tighter spec like we, there will be lesser maintenance overhead, and more flexibility than what we have right now... Why would someone set a foo date in front-matter and not intend to use it?

  3. The values in the precedence list have to be names of front-matter variables that are of type time.Time (example: 2018-02-26) or string, but with the condition that the string values can be casted to time.Time (example: "2018-02-26").

Example of a valid configuration:

[frontmatter]
date = ["publishDate", "lastMod"] # date (implied) > publishDate > lastMod
lastMod = ["date"]                # lastMod (implied) > date
publishDate  = ["date"]           # publishDate (implied) > date

@moorereason
Copy link
Contributor

@kaushalmodi,
For point 3, we should take any value that cast.ToTimeE will accept.

@kaushalmodi
Copy link
Contributor

we should take any value that cast.ToTimeE will accept.

Right, I was just giving my perspective of a non-Go coder :)

@github-actions
Copy link

github-actions bot commented Feb 6, 2022

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 6, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Extract date and permalink from filename
4 participants