Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First refactor of sphinxHtmlToMarkdown.ts #584

Merged
merged 4 commits into from
Jan 5, 2024

Conversation

arnaucasau
Copy link
Collaborator

@arnaucasau arnaucasau commented Jan 4, 2024

Part of #223.

First small refactor of the sphinxHtmlToMarkdown.ts script. This PR divides the sphinxHtmlToMarkdown function into different helper functions, reducing its length and allowing the possibility of adding tests for different parts of the code.

Changes in this PR:

  • Renamed the $page variable to $ to reduce the noise introduced by the number of calls we had and to follow the Cheerio convention.
  • Moved the code to find all the images into a helper function.
  • Grouped similar transformations to the HTML (done previous to its conversion to markdown) into a helper function called preprocess.
  • Moved the conversion from HTML to markdown process to a different function to simplify the sphinxHtmlToMarkdown function (divide more in the future).
  • Refactored the recursive search of methods, adding early returns and merging some parts.
  • Moved the unified plugin to the unifiedParser.ts file in order to abstract the transformation, which can now be called from other files. That script can group together all the unified plugins, such as the link checker, the mergeClassMembers.ts, and this one.

In this PR, the code has been copied almost identically to how it was before, except for some little adjustments, like in the unified plugin where we now use an anonymous function and have the two visitors merged.

More changes will be introduced in a follow-up.

Comment on lines 31 to 33
export async function sphinxHtmlToMarkdownUnifiedPlugin(
htmlToProcess: string,
handlers?: Record<string, Handle>,
Copy link
Collaborator Author

@arnaucasau arnaucasau Jan 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know your opinion about moving the unified plugin to this file. I'm not sure it's the best option because to end up with the same result, we need all the transformations done in the sphinxHtmlToMarkdown.ts. The only advantage I see is to reduce the amount of code we have in that script, but all our unified plugins are very different from each other to make something general (we have 6 plugins in total).

Even though I'm more on board with not moving it, I wanted to show you the change in this PR to know your opinion.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I wouldn't move it until we in the same PR switch other code to use the same config.

We might still want to split out sphinxHtmlToMarkdown.ts into several smaller follows. But that would be splitting out more than only this code.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved it back to the sphinxHtmlToMarkdown.ts file!

Comment on lines 260 to 281
function preprocessHtml(
$: CheerioAPI,
$main: Cheerio<any>,
baseSourceUrl: string | undefined,
isReleaseNotes: boolean,
releaseNotesTitle: string | undefined,
) {
// remove html extensions in relative links
$main.find("a").each((_, link) => {
const $link = $(link);
const href = $link.attr("href");
if (href && !href.startsWith("http")) {
$link.attr("href", href.replaceAll(".html", ""));
}
});

// Custom heading for release notes
if (isReleaseNotes && releaseNotesTitle) {
$("h1").html(releaseNotesTitle);
}

// remove permalink links
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same code as before. I just created the helper function here.

@arnaucasau arnaucasau marked this pull request as draft January 4, 2024 21:58
@arnaucasau arnaucasau marked this pull request as ready for review January 5, 2024 14:13
Copy link
Collaborator

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

Comment on lines 43 to 44
const main = $(`[role='main']`);
const $main = $(main);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you happen to know what the difference is? Either way, it's good to keep the code the same as before like you are doing. Makes this PR easier to review and less risky.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure, but I think it's the same thing, at least with all the tests I did. I removed the second one and renamed the first to $main.

Comment on lines 31 to 33
export async function sphinxHtmlToMarkdownUnifiedPlugin(
htmlToProcess: string,
handlers?: Record<string, Handle>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I wouldn't move it until we in the same PR switch other code to use the same config.

We might still want to split out sphinxHtmlToMarkdown.ts into several smaller follows. But that would be splitting out more than only this code.

@@ -581,3 +507,41 @@ function buildSpanId(id: string): MdxJsxFlowElement {
children: [],
};
}

// translate type headings to titles
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is confusing because the function is more generic than this. I recommend using proper docstring and something like this:

Find the element that both matches the selector and whose content is the same as text.

No need to mention how this function gets used. It could be used for many purposes.

--

Also the return type should be Cheerio. Ditto that all the functions should have a return type.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I kept all the original comments like that one, and it's confusing. I was planning to add the docstring to all functions and improve all the comments in a follow-up. This one, in particular, was inside the "main" function before, and it's only copied outside. Nonetheless, I added the comment you wrote, thanks 👍

Comment on lines 532 to 544
if ($dl.hasClass("class")) {
return "class";
} else if ($dl.hasClass("function")) {
return "function";
} else if ($dl.hasClass("exception")) {
return "exception";
} else if ($dl.hasClass("property")) {
return "property";
} else if ($dl.hasClass("method")) {
return "method";
} else if ($dl.hasClass("attribute")) {
return "attribute";
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TS might be mad, but you could deduplicate this with something like this:

for (const className of ["function", ...]) {
  if ($d1.hasClass(className)) {
      return className
  }
}

Copy link
Collaborator Author

@arnaucasau arnaucasau Jan 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much better. To avoid nesting a lot of long lists of types, I added a type definition in PythonObjectMeta.ts that perhaps we can use in more places. This is an example of what I mean by a long list of types (without the as in the return, the compiler complains about the return type being a string):

function getPythonApiType(
  $dl: Cheerio<any>,
):
  | "function"
  | "class"
  | "exception"
  | "method"
  | "property"
  | "attribute"
  | "module"
  | undefined {
  for (const className of [
    "function",
    "class",
    "exception",
    "method",
    "property",
    "attribute",
    "module",
  ]) {
    if ($dl.hasClass(className)) {
      return className as
        | "function"
        | "class"
        | "exception"
        | "method"
        | "property"
        | "attribute"
        | "module";
    }
  }

scripts/lib/sphinx/sphinxHtmlToMarkdown.ts Outdated Show resolved Hide resolved
Comment on lines +299 to +306
if (href?.includes(`/_modules/`)) {
//_modules/qiskit_ibm_runtime/ibm_backend
const match = href?.match(/_modules\/(.*?)(#|$)/);
if (match) {
const newHref = `${baseSourceUrl ?? ""}${match[1]}.py`;
$a.attr("href", newHref);
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This technically should be preserving source links to GitHub. Interesting..To investigate more in #519.

Co-authored-by: Eric Arellano <[email protected]>
scripts/lib/sphinx/PythonObjectMeta.ts Outdated Show resolved Hide resolved
$child.find(".viewcode-link").closest("a").remove();
const id = $dl.find("dt").attr("id") || "";

if (child.name === "dt" && $dl.hasClass("class")) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only part of the refactor that is tricky to review because it mixes a few refactors:

  • changing variable names like $page to $
  • using getPythonApiType helper (great!)
  • using early returns and changing the control flow

That last one is making it hard to review. What do you think about reverting that change for now and only making the first two changes in this PR? If that's going to be hard to pull off, lmk and I can more closely review this to convince myself this is safe.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, no problem. I brought back the code we had before just adding the minimum changes possible to make it work (the new variable names and two arguments for one helper function).

The way I test these changes is by regenerating the three APIs and seeing if we introduce any differences. The refactor changed the flow of that part because there are several if conditions with the same code, so I moved the default return to be an early return at the beginning and the duplicate code to be the default at the end. Once you do that, there are redundant ifs that can be removed as well.

I think it's an interesting refactor to do, but I agree we can add it in a follow-up.

Copy link
Collaborator

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beautiful!

@arnaucasau arnaucasau added this pull request to the merge queue Jan 5, 2024
Merged via the queue into Qiskit:main with commit 601016b Jan 5, 2024
2 checks passed
@arnaucasau arnaucasau deleted the refactor-HtmlToMarkdown branch January 5, 2024 18:03
github-merge-queue bot pushed a commit that referenced this pull request Jan 8, 2024
This PR is a follow-up from #584.

None of the logic is changed, only the control flow. This uses early
returns to make it more clear how each distinct API type is handled. It
also DRYs this code:

```typescript
        const priorPythonApiType = meta.python_api_type;
        if (!priorPythonApiType) {
          meta.python_api_type = python_type;
          meta.python_api_name = id;
        }
```

---------

Co-authored-by: Eric Arellano <[email protected]>
github-actions bot pushed a commit that referenced this pull request Jan 9, 2024
Part of #223.

First small refactor of the `sphinxHtmlToMarkdown.ts` script. This PR
divides the `sphinxHtmlToMarkdown` function into different helper
functions, reducing its length and allowing the possibility of adding
tests for different parts of the code.

Changes in this PR:

- Renamed the `$page` variable to `$` to reduce the noise introduced by
the number of calls we had and to follow the
[Cheerio](https://cheerio.js.org/docs/basics/loading#load) convention.
- Moved the code to find all the images into a helper function.
- Grouped similar transformations to the HTML (done previous to its
conversion to markdown) into a helper function called `preprocess`.
- Moved the conversion from HTML to markdown process to a different
function to simplify the `sphinxHtmlToMarkdown` function (divide more in
the future).
- Refactored the recursive search of methods, adding early returns and
merging some parts.
- Moved the unified plugin to the `unifiedParser.ts` file in order to
abstract the transformation, which can now be called from other files.
That script can group together all the unified plugins, such as the link
checker, the mergeClassMembers.ts, and this one.

In this PR, the code has been copied almost identically to how it was
before, except for some little adjustments, like in the unified plugin
where we now use an anonymous function and have the two visitors merged.

More changes will be introduced in a follow-up.

---------

Co-authored-by: Eric Arellano <[email protected]>
github-actions bot pushed a commit that referenced this pull request Jan 9, 2024
This PR is a follow-up from #584.

None of the logic is changed, only the control flow. This uses early
returns to make it more clear how each distinct API type is handled. It
also DRYs this code:

```typescript
        const priorPythonApiType = meta.python_api_type;
        if (!priorPythonApiType) {
          meta.python_api_type = python_type;
          meta.python_api_name = id;
        }
```

---------

Co-authored-by: Eric Arellano <[email protected]>
frankharkins pushed a commit to frankharkins/documentation that referenced this pull request Jul 22, 2024
Part of Qiskit#223.

First small refactor of the `sphinxHtmlToMarkdown.ts` script. This PR
divides the `sphinxHtmlToMarkdown` function into different helper
functions, reducing its length and allowing the possibility of adding
tests for different parts of the code.

Changes in this PR:

- Renamed the `$page` variable to `$` to reduce the noise introduced by
the number of calls we had and to follow the
[Cheerio](https://cheerio.js.org/docs/basics/loading#load) convention.
- Moved the code to find all the images into a helper function.
- Grouped similar transformations to the HTML (done previous to its
conversion to markdown) into a helper function called `preprocess`.
- Moved the conversion from HTML to markdown process to a different
function to simplify the `sphinxHtmlToMarkdown` function (divide more in
the future).
- Refactored the recursive search of methods, adding early returns and
merging some parts.
- Moved the unified plugin to the `unifiedParser.ts` file in order to
abstract the transformation, which can now be called from other files.
That script can group together all the unified plugins, such as the link
checker, the mergeClassMembers.ts, and this one.

In this PR, the code has been copied almost identically to how it was
before, except for some little adjustments, like in the unified plugin
where we now use an anonymous function and have the two visitors merged.

More changes will be introduced in a follow-up.

---------

Co-authored-by: Eric Arellano <[email protected]>
frankharkins pushed a commit to frankharkins/documentation that referenced this pull request Jul 22, 2024
)

This PR is a follow-up from Qiskit#584.

None of the logic is changed, only the control flow. This uses early
returns to make it more clear how each distinct API type is handled. It
also DRYs this code:

```typescript
        const priorPythonApiType = meta.python_api_type;
        if (!priorPythonApiType) {
          meta.python_api_type = python_type;
          meta.python_api_name = id;
        }
```

---------

Co-authored-by: Eric Arellano <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

2 participants