-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
First refactor of sphinxHtmlToMarkdown.ts #584
Conversation
scripts/lib/sphinx/unifiedParser.ts
Outdated
export async function sphinxHtmlToMarkdownUnifiedPlugin( | ||
htmlToProcess: string, | ||
handlers?: Record<string, Handle>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know your opinion about moving the unified plugin to this file. I'm not sure it's the best option because to end up with the same result, we need all the transformations done in the sphinxHtmlToMarkdown.ts
. The only advantage I see is to reduce the amount of code we have in that script, but all our unified plugins are very different from each other to make something general (we have 6 plugins in total).
Even though I'm more on board with not moving it, I wanted to show you the change in this PR to know your opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I wouldn't move it until we in the same PR switch other code to use the same config.
We might still want to split out sphinxHtmlToMarkdown.ts into several smaller follows. But that would be splitting out more than only this code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved it back to the sphinxHtmlToMarkdown.ts
file!
function preprocessHtml( | ||
$: CheerioAPI, | ||
$main: Cheerio<any>, | ||
baseSourceUrl: string | undefined, | ||
isReleaseNotes: boolean, | ||
releaseNotesTitle: string | undefined, | ||
) { | ||
// remove html extensions in relative links | ||
$main.find("a").each((_, link) => { | ||
const $link = $(link); | ||
const href = $link.attr("href"); | ||
if (href && !href.startsWith("http")) { | ||
$link.attr("href", href.replaceAll(".html", "")); | ||
} | ||
}); | ||
|
||
// Custom heading for release notes | ||
if (isReleaseNotes && releaseNotesTitle) { | ||
$("h1").html(releaseNotesTitle); | ||
} | ||
|
||
// remove permalink links |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same code as before. I just created the helper function here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
const main = $(`[role='main']`); | ||
const $main = $(main); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you happen to know what the difference is? Either way, it's good to keep the code the same as before like you are doing. Makes this PR easier to review and less risky.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't sure, but I think it's the same thing, at least with all the tests I did. I removed the second one and renamed the first to $main
.
scripts/lib/sphinx/unifiedParser.ts
Outdated
export async function sphinxHtmlToMarkdownUnifiedPlugin( | ||
htmlToProcess: string, | ||
handlers?: Record<string, Handle>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I wouldn't move it until we in the same PR switch other code to use the same config.
We might still want to split out sphinxHtmlToMarkdown.ts into several smaller follows. But that would be splitting out more than only this code.
@@ -581,3 +507,41 @@ function buildSpanId(id: string): MdxJsxFlowElement { | |||
children: [], | |||
}; | |||
} | |||
|
|||
// translate type headings to titles |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is confusing because the function is more generic than this. I recommend using proper docstring and something like this:
Find the element that both matches the
selector
and whose content is the same astext
.
No need to mention how this function gets used. It could be used for many purposes.
--
Also the return type should be Cheerio
. Ditto that all the functions should have a return type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. I kept all the original comments like that one, and it's confusing. I was planning to add the docstring to all functions and improve all the comments in a follow-up. This one, in particular, was inside the "main" function before, and it's only copied outside. Nonetheless, I added the comment you wrote, thanks 👍
if ($dl.hasClass("class")) { | ||
return "class"; | ||
} else if ($dl.hasClass("function")) { | ||
return "function"; | ||
} else if ($dl.hasClass("exception")) { | ||
return "exception"; | ||
} else if ($dl.hasClass("property")) { | ||
return "property"; | ||
} else if ($dl.hasClass("method")) { | ||
return "method"; | ||
} else if ($dl.hasClass("attribute")) { | ||
return "attribute"; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TS might be mad, but you could deduplicate this with something like this:
for (const className of ["function", ...]) {
if ($d1.hasClass(className)) {
return className
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much better. To avoid nesting a lot of long lists of types, I added a type definition in PythonObjectMeta.ts
that perhaps we can use in more places. This is an example of what I mean by a long list of types (without the as
in the return, the compiler complains about the return type being a string):
function getPythonApiType(
$dl: Cheerio<any>,
):
| "function"
| "class"
| "exception"
| "method"
| "property"
| "attribute"
| "module"
| undefined {
for (const className of [
"function",
"class",
"exception",
"method",
"property",
"attribute",
"module",
]) {
if ($dl.hasClass(className)) {
return className as
| "function"
| "class"
| "exception"
| "method"
| "property"
| "attribute"
| "module";
}
}
if (href?.includes(`/_modules/`)) { | ||
//_modules/qiskit_ibm_runtime/ibm_backend | ||
const match = href?.match(/_modules\/(.*?)(#|$)/); | ||
if (match) { | ||
const newHref = `${baseSourceUrl ?? ""}${match[1]}.py`; | ||
$a.attr("href", newHref); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This technically should be preserving source links to GitHub. Interesting..To investigate more in #519.
Co-authored-by: Eric Arellano <[email protected]>
$child.find(".viewcode-link").closest("a").remove(); | ||
const id = $dl.find("dt").attr("id") || ""; | ||
|
||
if (child.name === "dt" && $dl.hasClass("class")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only part of the refactor that is tricky to review because it mixes a few refactors:
- changing variable names like
$page
to$
- using
getPythonApiType
helper (great!) - using early returns and changing the control flow
That last one is making it hard to review. What do you think about reverting that change for now and only making the first two changes in this PR? If that's going to be hard to pull off, lmk and I can more closely review this to convince myself this is safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, no problem. I brought back the code we had before just adding the minimum changes possible to make it work (the new variable names and two arguments for one helper function).
The way I test these changes is by regenerating the three APIs and seeing if we introduce any differences. The refactor changed the flow of that part because there are several if conditions with the same code, so I moved the default return to be an early return at the beginning and the duplicate code to be the default at the end. Once you do that, there are redundant ifs that can be removed as well.
I think it's an interesting refactor to do, but I agree we can add it in a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Beautiful!
This PR is a follow-up from #584. None of the logic is changed, only the control flow. This uses early returns to make it more clear how each distinct API type is handled. It also DRYs this code: ```typescript const priorPythonApiType = meta.python_api_type; if (!priorPythonApiType) { meta.python_api_type = python_type; meta.python_api_name = id; } ``` --------- Co-authored-by: Eric Arellano <[email protected]>
Part of #223. First small refactor of the `sphinxHtmlToMarkdown.ts` script. This PR divides the `sphinxHtmlToMarkdown` function into different helper functions, reducing its length and allowing the possibility of adding tests for different parts of the code. Changes in this PR: - Renamed the `$page` variable to `$` to reduce the noise introduced by the number of calls we had and to follow the [Cheerio](https://cheerio.js.org/docs/basics/loading#load) convention. - Moved the code to find all the images into a helper function. - Grouped similar transformations to the HTML (done previous to its conversion to markdown) into a helper function called `preprocess`. - Moved the conversion from HTML to markdown process to a different function to simplify the `sphinxHtmlToMarkdown` function (divide more in the future). - Refactored the recursive search of methods, adding early returns and merging some parts. - Moved the unified plugin to the `unifiedParser.ts` file in order to abstract the transformation, which can now be called from other files. That script can group together all the unified plugins, such as the link checker, the mergeClassMembers.ts, and this one. In this PR, the code has been copied almost identically to how it was before, except for some little adjustments, like in the unified plugin where we now use an anonymous function and have the two visitors merged. More changes will be introduced in a follow-up. --------- Co-authored-by: Eric Arellano <[email protected]>
This PR is a follow-up from #584. None of the logic is changed, only the control flow. This uses early returns to make it more clear how each distinct API type is handled. It also DRYs this code: ```typescript const priorPythonApiType = meta.python_api_type; if (!priorPythonApiType) { meta.python_api_type = python_type; meta.python_api_name = id; } ``` --------- Co-authored-by: Eric Arellano <[email protected]>
Part of Qiskit#223. First small refactor of the `sphinxHtmlToMarkdown.ts` script. This PR divides the `sphinxHtmlToMarkdown` function into different helper functions, reducing its length and allowing the possibility of adding tests for different parts of the code. Changes in this PR: - Renamed the `$page` variable to `$` to reduce the noise introduced by the number of calls we had and to follow the [Cheerio](https://cheerio.js.org/docs/basics/loading#load) convention. - Moved the code to find all the images into a helper function. - Grouped similar transformations to the HTML (done previous to its conversion to markdown) into a helper function called `preprocess`. - Moved the conversion from HTML to markdown process to a different function to simplify the `sphinxHtmlToMarkdown` function (divide more in the future). - Refactored the recursive search of methods, adding early returns and merging some parts. - Moved the unified plugin to the `unifiedParser.ts` file in order to abstract the transformation, which can now be called from other files. That script can group together all the unified plugins, such as the link checker, the mergeClassMembers.ts, and this one. In this PR, the code has been copied almost identically to how it was before, except for some little adjustments, like in the unified plugin where we now use an anonymous function and have the two visitors merged. More changes will be introduced in a follow-up. --------- Co-authored-by: Eric Arellano <[email protected]>
) This PR is a follow-up from Qiskit#584. None of the logic is changed, only the control flow. This uses early returns to make it more clear how each distinct API type is handled. It also DRYs this code: ```typescript const priorPythonApiType = meta.python_api_type; if (!priorPythonApiType) { meta.python_api_type = python_type; meta.python_api_name = id; } ``` --------- Co-authored-by: Eric Arellano <[email protected]>
Part of #223.
First small refactor of the
sphinxHtmlToMarkdown.ts
script. This PR divides thesphinxHtmlToMarkdown
function into different helper functions, reducing its length and allowing the possibility of adding tests for different parts of the code.Changes in this PR:
$page
variable to$
to reduce the noise introduced by the number of calls we had and to follow the Cheerio convention.preprocess
.sphinxHtmlToMarkdown
function (divide more in the future).unifiedParser.ts
file in order to abstract the transformation, which can now be called from other files. That script can group together all the unified plugins, such as the link checker, the mergeClassMembers.ts, and this one.In this PR, the code has been copied almost identically to how it was before, except for some little adjustments, like in the unified plugin where we now use an anonymous function and have the two visitors merged.
More changes will be introduced in a follow-up.