parse error: https://ubuntu.com/blog/feed #13401

Open
nobuto-m opened this issue Dec 13, 2023 · 2 comments

Comments

@nobuto-m
Contributor

Summary

https://ubuntu.com/blog/feed intermittently fails to parse, and public feed validators confirm the failure as well.

https://validator.w3.org/feed/check.cgi?url=https%3A%2F%2Fubuntu.com%2Fblog%2Ffeed

This feed does not validate.

'utf-8' codec can't decode byte 0x83 in position 0: invalid start byte (maybe a high-bit character?) [help]


line 1, column 0: XML parsing error: <unknown>:1:0: not well-formed (invalid token) [help]

    ??�???ʝD?1?=o?"?æh?ϛ???^u?�

Source: https://ubuntu.com/blog/feed
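
The decode error quoted above can be reproduced locally. The following is a minimal sketch, not the validator's actual code: the helper name `try_decode_utf8` and the sample bytes are illustrative assumptions, chosen only because the validator reported byte 0x83 at position 0. It shows why a payload starting with that byte cannot be read as UTF-8 text.

```python
def try_decode_utf8(payload: bytes) -> str:
    """Return 'ok' if payload is valid UTF-8, else the decoder's error message."""
    try:
        payload.decode("utf-8")
    except UnicodeDecodeError as exc:
        return str(exc)
    return "ok"


# 0x83 is 0b10000011, a UTF-8 continuation byte, so it can never begin a
# character. The sample bytes below are illustrative, not the real payload.
print(try_decode_utf8(b"\x83w"))
print(try_decode_utf8(b"<?xml version='1.0' encoding='UTF-8'?>"))
```

A response that fails this check at position 0 is not text at all, which matches the XML parser's "invalid token" complaint at line 1, column 0.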

Process

Read the feed with any feed reader or parser.

Current and expected result

Current: The feed intermittently cannot be parsed by feed readers.
Expected: The feed parses without errors and its content is visible in the reader.

Screenshot

image

Browser details

NodeOperationError: Non-whitespace before first tag.
Line: 0
Column: 1
Char: �
    at Object.execute (/usr/local/lib/node_modules/n8n/node_modules/n8n-nodes-base/nodes/RssFeedRead/RssFeedRead.node.ts:75:11)
    at processTicksAndRejections (node:internal/process/task_queues:95:5)
    at Workflow.runNode (/usr/local/lib/node_modules/n8n/node_modules/n8n-workflow/src/Workflow.ts:1284:8)
    at /usr/local/lib/node_modules/n8n/node_modules/n8n-core/src/WorkflowExecute.ts:1018:29

image

@nobuto-m
Contributor Author

nobuto-m commented Feb 7, 2024

Ah, it looks like it's intermittently reproducible.

$ for _ in {1..10}; do wget https://ubuntu.com/blog/feed; done
--2024-02-07 11:13:05--  https://ubuntu.com/blog/feed
Resolving ubuntu.com (ubuntu.com)... 2620:2d:4000:1::26, 2620:2d:4000:1::28, 2620:2d:4000:1::27, ...
Connecting to ubuntu.com (ubuntu.com)|2620:2d:4000:1::26|:443... connected.
HTTP request sent, awaiting response... 200 
Length: 116522 (114K) [application/rss+xml]
Saving to: ‘feed’

2024-02-07 11:13:07 (174 KB/s) - ‘feed’ saved [116522/116522]

--2024-02-07 11:13:07--  https://ubuntu.com/blog/feed
Resolving ubuntu.com (ubuntu.com)... 2620:2d:4000:1::28, 2620:2d:4000:1::26, 2620:2d:4000:1::27, ...
Connecting to ubuntu.com (ubuntu.com)|2620:2d:4000:1::28|:443... connected.
HTTP request sent, awaiting response... 200 
Length: unspecified [application/rss+xml]
Saving to: ‘feed.1’

2024-02-07 11:13:09 (137 KB/s) - ‘feed.1’ saved [34371]

...
$ file feed*
feed:   XML 1.0 document, Unicode text, UTF-8 text, with very long lines (1643)
feed.1: data
feed.2: data
feed.3: data
feed.4: data
feed.5: XML 1.0 document, Unicode text, UTF-8 text, with very long lines (1643)
feed.6: XML 1.0 document, Unicode text, UTF-8 text, with very long lines (1643)
feed.7: XML 1.0 document, Unicode text, UTF-8 text, with very long lines (1643)
feed.8: XML 1.0 document, Unicode text, UTF-8 text, with very long lines (1643)
feed.9: XML 1.0 document, Unicode text, UTF-8 text, with very long lines (1643)

$ du -h feed*
116K	feed
36K	feed.1
36K	feed.2
36K	feed.3
36K	feed.4
116K	feed.5
116K	feed.6
116K	feed.7
116K	feed.8
116K	feed.9


$ xmllint --noout feed; echo $?
0

$ xmllint --noout feed.1; echo $?
feed.1:1: parser error : Start tag expected, '<' not found
�w
^
1

feed and feed.{5..9} are well-formed XML (116K each), but feed.{1..4} are bad data (36K each).

feed_good.xml.gz
feed_bad.xml.gz
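
The wget / file / xmllint check above could also be scripted. This is a rough sketch under a few assumptions: the function names `classify`, `fetch`, and `survey` are hypothetical, and Python's standard-library XML parser stands in for xmllint. It only reports whether each response is well-formed XML, along with its size and first bytes. It does not try to explain why the bad responses are bad.

```python
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "https://ubuntu.com/blog/feed"


def classify(payload: bytes) -> str:
    """Return 'xml' if payload is well-formed XML, else 'data'."""
    try:
        ET.fromstring(payload)
    except ET.ParseError:
        return "data"
    return "xml"


def fetch(url: str = FEED_URL, timeout: float = 30.0) -> bytes:
    """Download the raw response body without interpreting it."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def survey(attempts: int = 10, fetcher=fetch) -> list[tuple[int, str, str]]:
    """Fetch repeatedly; record (size in bytes, classification, first 4 bytes as hex)."""
    results = []
    for _ in range(attempts):
        payload = fetcher()
        results.append((len(payload), classify(payload), payload[:4].hex()))
    return results
```

Calling `survey()` with no arguments makes ten requests to the feed URL. Passing a different `fetcher` (for example, one that reads the saved `feed.*` files) allows the same classification to run offline.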

@nobuto-m
Contributor Author

nobuto-m commented Feb 7, 2024

I also filed this internally, since I'm not sure whether it's a content-generation issue or an environment/infrastructure issue.
https://portal.admin.canonical.com/C161991/
