Scaladoc: Warn about special characters in filenames according to the default Jekyll rules #14657

Merged
merged 1 commit into from
Apr 25, 2022

Conversation

jchyb
Contributor

@jchyb jchyb commented Mar 10, 2022

This PR adds escaping of special characters in generated filenames. The escaping follows the default Jekyll rules, which disallow certain characters in certain positions of the names of deployed files; files that break these rules are not generated. Since GitHub Pages uses Jekyll for deployment, these rules can affect many users (even though Jekyll itself can be configured to change them).
Those rules can be found here.

They consist of:
* _ at the beginning of the filenames
* ~ at the beginning of the filenames (the docs above mention the end of the filenames, but neither the original issue nor my tests confirm this)
* # at the beginning of the filenames
* . at the beginning of the filenames (doesn’t really matter here, but added for clarity and to future-proof)
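
As a rough illustration of the rules listed above, a check could look like the sketch below. The names `jekyllSpecialStarts` and `isJekyllIncompatible` are hypothetical and not taken from this PR; the sketch only looks at the first character of each path segment.

```scala
import java.nio.file.{Path, Paths}
import scala.jdk.CollectionConverters.*

// Characters that the default Jekyll rules reject at the start of a name
// (hypothetical helper, not the code from this PR).
val jekyllSpecialStarts: Set[Char] = Set('_', '~', '#', '.')

// True if any segment of the path starts with one of the characters above.
def isJekyllIncompatible(path: Path): Boolean =
  path.iterator().asScala.exists { segment =>
    segment.toString.headOption.exists(jekyllSpecialStarts)
  }
```

For example, `isJekyllIncompatible(Paths.get("_docs", "index.html"))` is true, while the same call on `docs/index.html` is false.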

Tests were adjusted to accommodate the changes.
I also tested GitHub Pages manually: without the PR, with the PR.

Instead of escaping the characters, we collect them and report them in a warning after generating the documentation, hinting about adding a .nojekyll file if using GitHub Pages.
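
The collect-then-warn approach can be sketched roughly as follows. This is a simplified stand-in, not the implementation in this PR: the class name `JekyllWarnings`, its methods, and the message wording are all assumptions made for illustration.

```scala
import scala.collection.mutable

// Hypothetical collector: gathers offending links during generation and
// builds one summary warning at the end instead of one warning per file.
final class JekyllWarnings:
  private val special = Set('_', '~', '#', '.')
  private val incompatLinks = mutable.LinkedHashSet[String]()

  // Record the link if any of its '/'-separated segments starts with a
  // character rejected by the default Jekyll rules.
  def check(link: String): Unit =
    val flagged = link.split('/').exists(_.headOption.exists(special))
    if flagged then incompatLinks += link

  // None when nothing was collected, otherwise a single warning message.
  def report: Option[String] =
    if incompatLinks.isEmpty then None
    else
      Some(
        "Some generated files are incompatible with the default Jekyll rules:\n" +
          incompatLinks.mkString("  ", "\n  ", "\n") +
          "If you deploy with GitHub Pages, consider adding a .nojekyll file."
      )
```

Printing all paths in one message keeps them next to each other, which should make it easier to copy them into a Jekyll configuration if a user prefers that over `.nojekyll`.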

Important: will break links (previous _docs will become docs if -Yapi-subdirectory is not used).

Fixes #14612

@jchyb jchyb requested a review from pikinier20 March 10, 2022 10:44
@jchyb jchyb force-pushed the scaladoc/jekyll-escapes branch from cce9982 to 1c4489c Compare March 16, 2022 01:32
@julienrf
Contributor

If the problem manifests only with Jekyll, I’d be tempted to find a fix on the Jekyll side rather than the Scaladoc side.

What happens if you do .nojekyll?

Otherwise, could you please provide a list of page URLs in the Scala API that are affected by this change?

@jchyb
Contributor Author

jchyb commented Mar 16, 2022

Yes, .nojekyll would fix it, I imagine. Same with the include option in the _config.yml file. What made me provide the fix was the idea that Scala 2 Scaladoc handles cases like this by default (as mentioned in #14612), but otherwise I have no strong opinions about this. I realize that breaking link compatibility is always problematic. In the Scala 3 API I only found api/scala/%23::$.html, which would become api/scala/$hash::$.html.
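
To make the renaming mentioned above concrete, here is a minimal sketch of the escaping originally proposed in this PR. Only the `$hash` and `$underline` replacements appear in this thread; the helper name `escapeLeadingChar` is hypothetical, and other characters are left untouched because their escapes are not stated here.

```scala
import java.net.URLEncoder

// Escapes mentioned in this thread: '#' -> "$hash", '_' -> "$underline".
val leadingEscapes: Map[Char, String] = Map('#' -> "$hash", '_' -> "$underline")

// Replace a leading special character of a file name with its escape;
// names that start with any other character are returned unchanged.
def escapeLeadingChar(fileName: String): String =
  fileName.headOption.flatMap(leadingEscapes.get) match
    case Some(escape) => escape + fileName.tail
    case None         => fileName

// '#' has to be percent-encoded in a URL, which is why the unescaped page
// shows up as %23::$.html in links.
val encodedHash: String = URLEncoder.encode("#", "UTF-8")
```

With this sketch, `escapeLeadingChar("#::$.html")` yields `$hash::$.html`, matching the single affected page found in the Scala 3 API.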

@pikinier20
Contributor

I'd say we need to merge that before 3.1.3-RC1 release because otherwise users would need to work with Jekyll to have their static sites deployed.

@julienrf
Contributor

We need to make a change in static site to make it generate under docs instead of _docs.

IIUC, the output of Scaladoc does not contain _docs? (unless you name your package _docs, but this is the responsibility of the programmer, not of Scaladoc) So, that would not be a problem to be deployed on GitHub Pages?

@pikinier20
Contributor

> We need to make a change in static site to make it generate under docs instead of _docs.
>
> IIUC, the output of Scaladoc does not contain _docs? (unless you name your package _docs, but this is the responsibility of the programmer, not of Scaladoc) So, that would not be a problem to be deployed on GitHub Pages?

It does contain _docs when you create static site without -Yapi-subdirectory

Contributor

@pikinier20 pikinier20 left a comment

There's only one thing to change here. Besides that, LGTM.

Comment on lines 30 to 34
val relativised = root.toPath.relativize(path).toString()
// we remove all the '_' from _docs, _blog ... as having
// $underlinedocs in the url by default will look quite ugly
val withUnderlineRemoved = relativised.replaceFirst("^\\_", "")
Paths.get(withUnderlineRemoved)
Contributor

We should just do this there:

root.toPath.resolve("docs").relativize(path)

@julienrf
Contributor

> It does contain _docs when you create static site without -Yapi-subdirectory

Oh, that’s an issue then. I thought _docs was only the name of the source directory, not of the output directory.

@pikinier20
Contributor

> If the problem manifests only with Jekyll, I’d be tempted to find a fix on the Jekyll side rather than the Scaladoc side.
>
> What happens if you do .nojekyll?
>
> Otherwise, could you please provide a list of page URLs in the Scala API that are affected by this change?

I agree that when a user decides to create a page that Jekyll does not accept, they should take care of that in Jekyll.

In case of the _docs the problem is on our side since user can't do anything to change it.

@@ -101,7 +101,7 @@ abstract class Renderer(rootPackage: Member, val members: Map[DRI, Member], prot
   val all = navigablePage +: redirectPages
   // We need to check for conflicts only if we have top-level member called docs
   val hasPotentialConflict =
-    rootPackage.members.exists(m => m.name.startsWith("_docs"))
+    rootPackage.members.exists(m => m.name.startsWith("docs"))
Contributor

Isn’t it the case only if apiSubdirectory is false? Also, would a package named blog cause a conflict too?

Contributor

blog won't cause a conflict because it's rendered under docs

This is just a heuristic that informs the user about a potential conflict. Even so, checking the apiSubdirectory flag seems sensible.
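
A standalone sketch of the heuristic with the flag taken into account might look like this. The function signature is hypothetical (the real check lives inside `Renderer` and works on members, not plain names), and it assumes that a conflict is only possible when the API is not rendered into its own subdirectory.

```scala
// Hypothetical standalone version of the heuristic: a top-level member
// whose name starts with "docs" can only clash with the static site when
// the API is not rendered into its own subdirectory.
def hasPotentialConflict(topLevelNames: Seq[String], apiSubdirectory: Boolean): Boolean =
  !apiSubdirectory && topLevelNames.exists(_.startsWith("docs"))
```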

@julienrf
Contributor

> In case of the _docs the problem is on our side since user can't do anything to change it.

I agree with changing this in Scaladoc too. I am not sure about escaping some symbols because of Jekyll. I prefer if we tell the users to put a .nojekyll file.

@pikinier20
Contributor

> In case of the _docs the problem is on our side since user can't do anything to change it.
>
> I agree with changing this in Scaladoc too. I am not sure about escaping some symbols because of Jekyll. I prefer if we tell the users to put a .nojekyll file.

We can check generated links if they contain invalid characters and produce a warning with possible solution. Should we have a -Yjekyll-friendly option to escape characters?

@julienrf
Contributor

> We can check generated links if they contain invalid characters and produce a warning with possible solution.

I like this idea! I prefer not introducing another configuration flag 😄

@jchyb
Contributor Author

jchyb commented Mar 16, 2022

Sounds good to me as well. I will redo this PR so that we:

  • replace _docs with docs, as is already being done here
  • emit a warning listing all the problematic links collected, with a suggestion to add a .nojekyll file if using GitHub Pages (the list is also useful for users who, for whatever reason, still want to use Jekyll, since they can then adjust the Jekyll config file manually for those files)
  • maybe add a general warning about Jekyll to the Scala 3 Scaladoc documentation

@jchyb jchyb force-pushed the scaladoc/jekyll-escapes branch from 1c4489c to 4402151 Compare March 17, 2022 09:08
Comment on lines 26 to 34
val relativizeFrom = if args.apiSubdirectory then docsPath else root.toPath
def relativize(path: Path): Path =
if args.apiSubdirectory then
docsPath.relativize(path)
else
val relativised = root.toPath.relativize(path).toString()
// we remove all the '_' from _docs to avoid unnecessary
// incompatibilities with Jekyll
val withUnderlineRemoved = relativised.replaceFirst("^\\_", "")
Paths.get(withUnderlineRemoved)
Contributor Author

Unfortunately, I was not able to simplify this: @pikinier20's suggestion produced paths like ../_docs/A.md instead of the docs/A.md we want.
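
The `../_docs/A.md` result can be reproduced with plain `java.nio.file` calls. In the sketch below, the directory name `site` and the values `root`, `page`, and `suggested` are made up for illustration; only the `resolve("docs").relativize(path)` shape comes from the suggestion.

```scala
import java.nio.file.{Path, Paths}

// Hypothetical layout: the site root contains the source directory _docs.
val root: Path = Paths.get("site")
val page: Path = root.resolve("_docs").resolve("A.md")

// Relativizing against root/docs has to climb out of the non-matching
// "docs" segment first, which produces the leading "..".
val suggested: Path = root.resolve("docs").relativize(page)
```

Here `suggested` equals `Paths.get("..", "_docs", "A.md")`, because `site/docs` and `site/_docs` only share the `site` prefix.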

Contributor

Could we then just be specific to _docs?

ie, relativised.replaceFirst("^\\_docs", "docs") (it’s a pity that there is no replaceFirstLiterally)
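
Since `replaceFirst` treats its first argument as a regular expression, a literal variant can be sketched with `startsWith` and `stripPrefix`. The helper name `replacePrefixLiterally` is hypothetical and not part of the standard library.

```scala
// Replace `prefix` with `replacement` only when `s` literally starts with it;
// no regex is involved, so nothing needs escaping.
def replacePrefixLiterally(s: String, prefix: String, replacement: String): String =
  if s.startsWith(prefix) then replacement + s.stripPrefix(prefix) else s
```

Called as `replacePrefixLiterally("_docs/A.md", "_docs", "docs")`, this gives `docs/A.md` and leaves paths that merely contain `_docs` further down untouched.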

Contributor

@pikinier20 pikinier20 Mar 17, 2022

  def relativize(path: Path): Path =
    if args.apiSubdirectory then
      docsPath.relativize(path)
    else
      val relativised = docsPath.relativize(path)
      Paths.get("docs").resolve(relativised)
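
The effect of the non-apiSubdirectory branch above can be checked in isolation. This is a small sketch under assumed paths: the `site/_docs` location and the name `relativizeUnderDocs` are hypothetical.

```scala
import java.nio.file.{Path, Paths}

// Hypothetical layout: sources live in site/_docs, output should use docs/.
val docsPath: Path = Paths.get("site", "_docs")

// Same shape as the non-apiSubdirectory branch above: strip the _docs
// prefix by relativizing, then re-root the result under "docs".
def relativizeUnderDocs(path: Path): Path =
  Paths.get("docs").resolve(docsPath.relativize(path))
```

This avoids string manipulation entirely: `site/_docs/A.md` maps to `docs/A.md` without any regex on the path text.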

@jchyb jchyb changed the title from "Scaladoc: Escape special characters in filenames according to the default Jekyll rules" to "Scaladoc: Warn about special characters in filenames according to the default Jekyll rules" Mar 17, 2022
@jchyb jchyb force-pushed the scaladoc/jekyll-escapes branch from 4402151 to 60a0ed7 Compare March 17, 2022 11:40


// We collect and report any generated files incompatible with Jekyll
private lazy val jekyllIncompatLinks = mutable.HashSet[String]()
Contributor

What is the scope of this cache?

Contributor Author

It's used wherever Paths are generated, from TASTy reading to rendering, so DocContext felt like the right place for it.

Contributor

One problem that might come up is that this set exists throughout the life of the entire Scaladoc process. However, it shouldn't get that big, so that's probably OK.

Contributor

I don’t think it would be too big. My concern is that if we run scaladoc twice it may still warn about the files of the first run, or something like that.
Would it be possible to emit the warnings on the fly, rather than keeping a global mutable set?

Contributor Author

Sorry for the late response. It would be possible to emit them on the fly, but in my opinion it would be much less elegant, with repeated suggestions that are harder to read. Since any empty package can cause multiple paths to be affected, users are going to see this warning a lot, so I would personally prefer it to be as concise as possible. Having the paths printed right next to each other will also make them easier to work with in the Jekyll config. I'm unsure about the scope now: I was under the impression that the scaladoc process always ended after generation, but I don't actually know how it's handled by sbt etc. @pikinier20 What do you think? Sorry for dragging this issue out for such a long time.

Contributor

The DocContext is re-created on each run, so we don't need to worry about warnings from previous runs. Also, I don't have a strong opinion about keeping links in a set instead of warning on the fly. For me, it can stay as it is.
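
The scoping argument can be illustrated with a toy model. `RunContext` and `runOnce` below are hypothetical stand-ins, not the real `DocContext`; the only property they model is that the mutable set belongs to a context instance that is created once per run.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a per-run context: each run creates its own
// instance, so the set of collected links starts empty every time.
final class RunContext:
  val jekyllIncompatLinks: mutable.HashSet[String] = mutable.HashSet.empty

// Simulates one documentation run that records the given links and
// returns what was collected during that run only.
def runOnce(links: Seq[String]): Set[String] =
  val ctx = RunContext()
  links.foreach(ctx.jekyllIncompatLinks += _)
  ctx.jekyllIncompatLinks.toSet
```

Two consecutive calls to `runOnce` do not see each other's links, which is the behaviour described for re-created contexts.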

@jchyb jchyb force-pushed the scaladoc/jekyll-escapes branch from 60a0ed7 to 4d6fd7e Compare March 17, 2022 14:16
@@ -17,6 +17,8 @@ import java.io.PrintStream
import scala.io.Codec
import java.net.URL
import scala.util.Try
import scala.collection.mutable
import dotty.tools.scaladoc.util.Check._
Contributor

Where is this import used?

Contributor Author

I've now made the import more explicit, since only one method was needed. I'm not sure why I wrote it like this before.

Contributor

seems like nothing changed here.
There were problems with GitHub last week. Perhaps something didn't get pushed

Contributor Author

That's true, thanks for noticing, I've force pushed it just now.

@jchyb jchyb force-pushed the scaladoc/jekyll-escapes branch from 4d6fd7e to b8ffef5 Compare March 21, 2022 10:44
As Jekyll (and by extension GitHub Pages) makes starting a file/folder
name with a couple of characters illegal, an additional check is added for
those and problematic paths are reported.
To avoid always having "_docs" in the url by default in the static site
when not using -Yapi-subdirectory, we also replace that with "docs".
Some tests concerning links were adjusted to accommodate the changes.
An example testcase for illegal Jekyll chars in Scaladoc was also added.
@pikinier20 pikinier20 force-pushed the scaladoc/jekyll-escapes branch from b8ffef5 to d3ce129 Compare April 25, 2022 07:05
@pikinier20 pikinier20 enabled auto-merge April 25, 2022 07:05
@pikinier20 pikinier20 merged commit 4e41a4c into scala:main Apr 25, 2022
@Kordyjan Kordyjan added this to the 3.2.0 milestone Aug 2, 2023
Development

Successfully merging this pull request may close these issues.

Scaladoc: special characters not escaped in filename generated by the Scaladoc (sbt doc)
4 participants