Desktop: Fixes #10946: Stop crashing HTML/MD importer when content has link with very long name #10947

pedr · 2024-08-30T18:23:32Z

Related to OneNote Importer PR #10642

Summary

While working on the OneNote PR, I caught a bug: if an HTML file had a link to a local resource with a very long name (>255 characters), the fsDriver-node stat would throw ENAMETOOLONG.

Initially my solution was to ignore certain link types (like mailto:), but while this helps in some cases, I think the best solution is to treat this error the same way ENOENT is already treated.

I'm not sure if this change should also be implemented in the other fsDrivers, so I only implemented it where this seems to be an issue.

Testing

I added an automated test.

It is also possible to test manually:

1. Create an HTML file from the example HTML below
2. Import it as a file
3. It should not crash

Example HTML:

<html>
<body>
    <a href="1234567812345678123456781234567812345678123456781234567812345678123456781234567812345678123456781234567812345678123456781234567812345678123456781234567812345678123456781234567812345678123456781234567812345678123456781234567812345678123456781234567812345678.pdf" />
</body>
</html>

laurent22 · 2024-08-31T14:44:00Z

packages/lib/fs-driver-node.ts

@@ -97,6 +97,7 @@ export default class FsDriverNode extends FsDriverBase {
 			};
 		} catch (error) {
 			if (error.code === 'ENOENT') return null;
+			if (error.code === 'ENAMETOOLONG') return null;


I don't think this logic should be here. The way stat works is that it either returns the file info, or null if the file doesn't exist. But in this case it exists, except it cannot be processed. So whatever we need to do to solve it, it's definitely not here that the fix should be.

The PR name made this more confusing, my bad.

These links are not local resources. The importer works by reading the HTML for links and checking whether they exist locally. When the link is very long, for example a mailto: to a support address with a template included, fsDriver stat will throw ENAMETOOLONG.

There is no way for the file to exist when ENAMETOOLONG is thrown, since it means the name is longer than the system supports.

The problem is that a link is being fed to this function, and that should not happen. The current behaviour is correct: it crashes when invalid data is passed to this function. The fix is to prevent that data from getting there in the first place.
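
As a reference for the contract discussed above, here is a minimal sketch of a stat wrapper that returns null only for a missing file and lets every other error propagate. It is an illustration under assumptions, not the actual FsDriverNode.stat code: the name statOrNull and the StatSketch shape are hypothetical, and it uses the synchronous Node API for brevity.

```typescript
import * as fs from 'fs';

// Hypothetical result shape, for illustration only.
interface StatSketch {
	path: string;
	size: number;
	isDirectory: boolean;
}

// Hypothetical standalone helper; not the real FsDriverNode.stat.
function statOrNull(path: string): StatSketch | null {
	try {
		const s = fs.statSync(path);
		return { path, size: s.size, isDirectory: s.isDirectory() };
	} catch (error) {
		// A missing file is an expected outcome and is reported as null.
		if ((error as { code?: string }).code === 'ENOENT') return null;
		// Any other error (ENAMETOOLONG, EACCES, ...) is rethrown, so that
		// invalid input is surfaced to the caller instead of being hidden.
		throw error;
	}
}
```

With this shape, a caller that passes a mailto: link or another non-path string still gets an exception, which is the behaviour the review asks to keep.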

pedr · 2024-09-04T19:15:31Z

While it is still helpful to ignore links that are not local files, I found that this wouldn't fix the problem by itself. Looking into it further, I discovered an unexpected behaviour in the onenote-converter.

The issue is that when we render the OneNote content to HTML and a paragraph starts with a specific Unicode character (0x000B), the styles get shifted by one, so text that should be a <span> becomes an <a> (by default, if a style is a hyperlink, the text itself is added as the href). Fixing this in 50c8697 should address the behaviour reported in this PR/issue.

laurent22 · 2024-09-13T17:00:37Z

packages/lib/services/interop/InteropService_Importer_Md.ts

+			if (error && error.code === 'ERR_INVALID_URL') return true;
+			throw error;
+		}
+		return false;


I think your logic now is that if the protocol is defined you return false, and otherwise you return false. Basically I don't think your check on the protocol is doing anything

By the way is the URL object defined on mobile?

The idea was that if the link has a protocol it isn't a local file (https:, mailto:, tel:, onenote:, etc), so it would fail if we tried to instantiate the URL object.

I don't think it matters whether URL exists on mobile or not, because Importer_Md is one of the importer classes that are imported dynamically (it isn't used by mobile anyway; we only use the Importer/Exporter for Jex on mobile). Should I avoid using it anyway?

Should I avoid using it anyway?

I don't know, would you mind checking? Do we use URL elsewhere on mobile, how do we usually parse URLs on mobile?

The idea was that if the link has a protocol it isn't a local file (https:, mailto:, tel:, onenote:, etc), so it would fail if we tried to instantiate the URL object.

That's not what your code does though - it instantiates URL and either returns true if it fails, or false otherwise. Protocol has no relevance to your code.

laurent22 · 2024-09-17T18:42:32Z

packages/lib/services/interop/InteropService_Importer_Md.ts

+
+	public isLinkToLocalFile(path: string) {
+		try {
+			new URL(path);


By the way I've just checked c:\\test.txt and c:/test.txt, and new URL doesn't throw an error for either, which means they won't be considered links to files. Not to mention that file:///path/to/file.txt is both a valid URL and a valid link to a local file. Actually I'm wondering what we're doing here testing for URLs when we are not looking for URLs?

Are there no other reasonable ways to know if something looks like a valid local path? I feel like this is a solved problem and there must be plenty of libraries or code snippets out there that can do this. Maybe even in Joplin source code we already have this.
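
To make the cases above concrete, the two diff fragments quoted in this review can be assembled into a standalone function and run against the inputs mentioned. This is a sketch, not the PR's exact code: the name isLinkToLocalFileSketch is hypothetical, and it checks for TypeError instead of the Node-specific error.code === 'ERR_INVALID_URL' test from the diff, so that it behaves the same in any runtime with a WHATWG URL implementation.

```typescript
// Hypothetical standalone version of the check discussed in the review.
// The input counts as a "link to a local file" only when URL parsing fails.
function isLinkToLocalFileSketch(link: string): boolean {
	try {
		new URL(link);
	} catch (error) {
		// Assumption: the PR diff tested error.code === 'ERR_INVALID_URL';
		// the URL constructor throws a TypeError for unparsable input.
		if (error instanceof TypeError) return true;
		throw error;
	}
	return false;
}

const samples = [
	'attachment.pdf',            // relative path: URL parsing fails
	'mailto:support@example.com',
	'c:/test.txt',               // parses as a URL with scheme "c"
	'c:\\test.txt',              // also parses, with an opaque path
	'file:///path/to/file.txt',  // valid URL and valid local file link
];

for (const sample of samples) {
	console.log(sample, isLinkToLocalFileSketch(sample));
}
```

Only the relative path is classified as a local file. Both Windows-style paths and the file: URL are rejected, which is the misclassification pointed out above.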

pedr · 2024-09-19T14:36:07Z

Implement a simpler solution: exclude protocols that could cause this error, mainly mailto
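
A minimal sketch of what such an exclusion could look like, assuming a simple prefix check. The helper name hasNonFileProtocol and the exact protocol list are hypothetical and not taken from the PR; the list only reflects the protocols mentioned in this conversation, plus http:.

```typescript
// Hypothetical list: mailto: is the main offender named in the thread;
// https:, tel: and onenote: were mentioned as other non-file links.
const nonFileProtocols = ['mailto:', 'tel:', 'http:', 'https:', 'onenote:'];

// Hypothetical helper name, not taken from the Joplin source.
function hasNonFileProtocol(link: string): boolean {
	const normalised = link.trim().toLowerCase();
	return nonFileProtocols.some(protocol => normalised.startsWith(protocol));
}

// Usage: filter links before they reach any file-system call.
const links = [
	`mailto:support@example.com?body=${'x'.repeat(300)}`,
	'attachments/report.pdf',
	'c:/test.txt',
];
const candidates = links.filter(link => !hasNonFileProtocol(link));
console.log(candidates);
```

Unlike the URL-based check, an explicit list leaves Windows-style paths such as c:/test.txt untouched, at the cost of having to enumerate the protocols to skip.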

laurent22 · 2024-10-26T20:25:02Z

Closing for now to clear the PR backlog. We can reopen once it's ready

pedr added 4 commits August 30, 2024 10:57

wip adding new checks to md import

b1db308

changing checks in favour of handling error

deceb77

handle error on node fsDriver stat

7d25c00

reverting changes

f82c6d3

pedr added the bug and desktop labels Aug 30, 2024

pedr requested a review from laurent22 August 30, 2024 18:23

laurent22 reviewed Aug 31, 2024

pedr changed the title ~~Desktop: Fixes #10946: Allow import of files with links with very long names~~ Desktop: Fixes #10946: Allow processing links with very long names on HTML/MD importer Sep 2, 2024

pedr changed the title ~~Desktop: Fixes #10946: Allow processing links with very long names on HTML/MD importer~~ Desktop: Fixes #10946: Stop crashing HTML/MD importer when content has link with very long name Sep 2, 2024

pedr and others added 4 commits September 2, 2024 15:07

removing error handling from stat

f7cf69f

adding a new way to check if it is to a localfile

0223773

simplifying importLocalFile and removing unused function

07e7ff7

Merge branch 'dev' into skip-creating-resource-from-uris

fbc7aea

laurent22 and others added 2 commits September 10, 2024 22:28

Merge branch 'dev' into skip-creating-resource-from-uris

3a8f417

Merge branch 'dev' into skip-creating-resource-from-uris

706bdc0

laurent22 reviewed Sep 13, 2024

pedr added 2 commits September 17, 2024 15:09

remove repeated logic

bf7e03b

Merge remote-tracking branch 'refs/remotes/origin/skip-creating-resource-from-uris' into skip-creating-resource-from-uris

f49b092

laurent22 reviewed Sep 17, 2024

laurent22 closed this Oct 26, 2024
