Does web crawling work for PDF links? #11

ajaytiwa · 2024-11-25T14:06:39Z

Hello Team,

Does web crawling work for PDF links as well?

I want to crawl a web page that contains PDF links and save the linked PDF content as files, but I am getting the error below.

Can you please tell me whether the web crawler supports fetching PDF links or PDF documents while crawling?

ERROR 2024-11-25 19:29:01,158 [[MuleRuntime].uber.14: [webcrawling].webcrawlingFlow2.CPU_LITE @1a9b3a5f] [processor: webcrawlingFlow2/processors/0; event: cc5227c1-ab34-11ef-bdcd-ac74b1e63e98] com.mule.mulechain.crawler.internal.MulechainwebcrawlerOperations: org.jsoup.UnsupportedMimeTypeException: Unhandled content type. Must be text/*, */xml, or */*+xml. Mimetype=application/pdf, URL=http://example.pdf
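
The exception says that jsoup parses only text and XML responses by default, so a response with Content-Type application/pdf is rejected before any content is read. The sketch below shows one way a crawler could check the Content-Type first and save PDF responses as raw bytes instead of parsing them. It uses only the JDK's java.net.http client, not the connector's own API. The class name PdfAwareFetcher and all of its methods are hypothetical and exist only for illustration. (If jsoup itself is used for the download, its Connection.ignoreContentType(true) setting may be another way around the check, though I have not verified how the connector configures jsoup.)

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

// Hypothetical helper for illustration; not part of the MuleChain web crawler connector.
class PdfAwareFetcher {

    // Strips parameters such as "; charset=UTF-8" and lower-cases the media type.
    static String mediaType(String contentTypeHeader) {
        if (contentTypeHeader == null) {
            return "";
        }
        int semicolon = contentTypeHeader.indexOf(';');
        String type = semicolon >= 0
                ? contentTypeHeader.substring(0, semicolon)
                : contentTypeHeader;
        return type.trim().toLowerCase(Locale.ROOT);
    }

    // Approximates the rule quoted in the error message: text/*, */xml, or */*+xml.
    static boolean isHtmlParsable(String contentTypeHeader) {
        String type = mediaType(contentTypeHeader);
        return type.startsWith("text/") || type.endsWith("/xml") || type.endsWith("+xml");
    }

    static boolean isPdf(String contentTypeHeader) {
        return mediaType(contentTypeHeader).equals("application/pdf");
    }

    // Fetches the URL and writes the body to targetFile only when the server
    // answers 200 with a PDF content type. Returns true if a file was written.
    static boolean savePdfIfPresent(String url, Path targetFile)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<byte[]> response =
                client.send(request, HttpResponse.BodyHandlers.ofByteArray());
        String contentType = response.headers().firstValue("Content-Type").orElse("");
        if (response.statusCode() == 200 && isPdf(contentType)) {
            Files.write(targetFile, response.body());
            return true;
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 2) {
            boolean saved = savePdfIfPresent(args[0], Path.of(args[1]));
            System.out.println(saved ? "PDF saved" : "Not a PDF, nothing saved");
        } else {
            System.out.println("usage: PdfAwareFetcher <url> <target-file>");
        }
    }
}
```

With a check like this, HTML pages could still go to the parser (isHtmlParsable), while PDF links found on those pages would be downloaded separately rather than passed to jsoup.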

Thanks
