Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Desktop: Fixes #10946: Stop crashing HTML/MD importer when content has link with very long name #10947

Closed
wants to merge 12 commits into from

Conversation

pedr
Copy link
Collaborator

@pedr pedr commented Aug 30, 2024

Fixes #10946

Related to OneNote Importer PR #10642

Summary

When working on OneNote PR I caught a bug where if a HTML had a link to a local resource with a very long name (>255 characters) fsDriver-node stat would throw an ENAMETOOLONG.

Initially my solution was to ignore certain link types (like mailto:), but while this can help ignore certain cases, I think th best solution is to treat this error similar to what already happens with ENOENT

I'm not sure if this change should also be implemented on others fsDrivers, so I just implemented where this seems to be a issue.

Testing

I added an automated test.

It is also possible to test manually:

  1. Create a HTML from the example HTML bellow
  2. Import as a file
  3. It should not crash
Example HTML:
<html>
<body>
    <a href="1234567812345678123456781234567812345678123456781234567812345678123456781234567812345678123456781234567812345678123456781234567812345678123456781234567812345678123456781234567812345678123456781234567812345678123456781234567812345678123456781234567812345678.pdf" />
</body>
</html>

@pedr pedr added bug It's a bug desktop All desktop platforms labels Aug 30, 2024
@pedr pedr requested a review from laurent22 August 30, 2024 18:23
@@ -97,6 +97,7 @@ export default class FsDriverNode extends FsDriverBase {
};
} catch (error) {
if (error.code === 'ENOENT') return null;
if (error.code === 'ENAMETOOLONG') return null;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this logic should be here. The way stats work is that it either returns the file info, or null if the file doesn't exist. But in this case it exists, except it cannot be processed. So whatever we need to solve it's definitely not here that the fix should be

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR name made this more confusing, my bad.

These links are not local resources. The way that importer works is by reading the HTML for links and trying to check if they exist locally or not. When it is a will be a very long link like, or a mailto: to a support address with a template included, for example, fsDriver stat will throw the ENAMETOOLONG.

There isn't any way to file exist because ENAMETOOLONG since it means it is a name longer than the system supports.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that a link is being fed to this function, so that should not happen. The current behaviour is correct - it crashes when invalid data is passed to this function. Now the fix is to prevent that data to get there in the first place

@pedr pedr changed the title Desktop: Fixes #10946: Allow import of files with links with very long names Desktop: Fixes #10946: Allow processing links with very long names on HTML/MD importer Sep 2, 2024
@pedr pedr changed the title Desktop: Fixes #10946: Allow processing links with very long names on HTML/MD importer Desktop: Fixes #10946: Stop crashing HTML/MD importer when content has link with very long name Sep 2, 2024
@pedr
Copy link
Collaborator Author

pedr commented Sep 4, 2024

While this is still helpful to ignore these links that are not a local file, I found that this wouldn't be able to fix the problem by itself. By looking more on it I discovered an unexpected behaviour in the onenote-converter

The issue is that when we are rendering the onenote to html when the paragraph starts with an specific unicode (0x000B) character, the styles get shifted by one making a text that should be a <span> to become a <a> (by default if a style is a hyperlink it will add the text itself as href). By fixing this here 50c8697 it should address the behaviour reported on this PR/issue.

if (error && error.code === 'ERR_INVALID_URL') return true;
throw error;
}
return false;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your logic now is that if the protocol is defined you return false, and otherwise you return false. Basically I don't think your check on the protocol is doing anything

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way is the URL object defined on mobile?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was that if the link has a protocol it isn't a local file (https:, mailto:, tel:, onenote:, etc), so it would fail if we tried to instantiate the URL object.

I don't think it is an issue if the URL exists on mobile or not because this Importer_Md is one of the importer classes that are imported dynamically (because it isn't used by mobile anyways, we only use Importer/Exporter for Jex on mobile). Should I avoid using it anyway?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I avoid using it anyway?

I don't know, would you mind checking? Do we use URL elsewhere on mobile, how do we usually parse URLs on mobile?

The idea was that if the link has a protocol it isn't a local file (https:, mailto:, tel:, onenote:, etc), so it would fail if we tried to instantiate the URL object.

That's not what your code does though - it instantiate URL and either returns true if it fails, or false otherwise. Protocol has no relevance to your code.


public isLinkToLocalFile(path: string) {
try {
new URL(path);
Copy link
Owner

@laurent22 laurent22 Sep 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way I've just checked c:\\test.txt or c:/test.txt and new URL doesn't throw an error which means they won't be considered link to files. Not to mention that file:///path/to/file.txt is both a valid URL and valid link to a local file. Actually I'm wondering what we're doing here testing for URLs when we are not looking for URLs?

Are there no other reasonable ways to know if something look like a valid local path? I feel like this is a solved problem and there must be plenty of libraries or code snippets out there that can do this. Maybe even in Joplin source code we already have this.

@pedr
Copy link
Collaborator Author

pedr commented Sep 19, 2024

  • Implement a simpler solution: exclude protocols that could cause this error, mainly mailto

@laurent22
Copy link
Owner

Closing for now to clear the PR backlog. We can reopen once it's ready

@laurent22 laurent22 closed this Oct 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug It's a bug desktop All desktop platforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Importing HTML with very long links can crash because file name too long
3 participants