Asynchronous URL Parsing #61

Closed
Deorigami opened this issue Sep 9, 2024 · 2 comments

@Deorigami

Hey there, great libs btw.
I've never used Ksoup or Jsoup, so this question might be dumb.

It looks like Ksoup only captures the HTML from the initial page load and doesn't wait until the page has finished loading all of its HTML elements (correct me if I'm wrong).

Is there a function for this that I don't know about?
All I need is a URL preview.

Thank you

@itboy87
Collaborator

itboy87 commented Sep 9, 2024

@Deorigami Hi, you can do it like this:

        data class MetaData(
            val ogTitle: String? = null,
            val ogDescription: String? = null,
            val ogImage: String? = null,
            val ogUrl: String? = null,
            val twitterTitle: String? = null,
            val twitterDescription: String? = null,
            val twitterImage: String? = null,
            val title: String? = null,
            val description: String? = null,
            val canonical: String? = null,
            val htmlTitle: String? = null,
            val favicon: String? = null
        )

        
        val url = "https://github.com/fleeksoft/ksoup"

        // parseGetRequest is a suspend function, so call it from a coroutine
        val doc = Ksoup.parseGetRequest(url)
        // Read the <head> tag
        val head = doc.head()

        // Extract Open Graph metadata
        val ogTitle = head.selectFirst("meta[property=og:title]")?.attr("content")
        val ogDescription = head.selectFirst("meta[property=og:description]")?.attr("content")
        val ogImage = head.selectFirst("meta[property=og:image]")?.attr("content")
        val ogUrl = head.selectFirst("meta[property=og:url]")?.attr("content")

        // Extract Twitter metadata
        val twitterTitle = head.selectFirst("meta[name=twitter:title]")?.attr("content")
        val twitterDescription = head.selectFirst("meta[name=twitter:description]")?.attr("content")
        val twitterImage = head.selectFirst("meta[name=twitter:image]")?.attr("content")

        // Extract standard metadata
        val titleTag = head.selectFirst("meta[name=title]")?.attr("content")
        val descriptionTag = head.selectFirst("meta[name=description]")?.attr("content")

        // Extract canonical URL
        val canonicalTag = head.selectFirst("link[rel=canonical]")?.attr("href")

        // Fetch standard <title> tag
        val htmlTitle = doc.title()

        // Fetch favicon; absUrl resolves a relative href against the document's base URI
        // instead of concatenating strings, which breaks for paths like "/favicon.ico"
        val faviconTag = head.selectFirst("link[rel~=icon]")?.absUrl("href")

        // Create a MetaData object
        val metaData = MetaData(
            ogTitle = ogTitle,
            ogDescription = ogDescription,
            ogImage = ogImage,
            ogUrl = ogUrl,
            twitterTitle = twitterTitle,
            twitterDescription = twitterDescription,
            twitterImage = twitterImage,
            title = titleTag,
            description = descriptionTag,
            canonical = canonicalTag,
            htmlTitle = htmlTitle,
            favicon = faviconTag
        )

        // Print out all the extracted metadata
        println("MetaData Object: $metaData")
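
For a URL preview you usually want a single title, description, and image rather than every tag. A minimal sketch of one way to collapse the fields above, assuming Open Graph values should win over Twitter and plain HTML ones. `UrlPreview`, `firstNonBlank`, and `toPreview` are hypothetical names for illustration and are not part of ksoup; the `MetaData` class is repeated only so the snippet compiles on its own.

```kotlin
// Same shape as the MetaData class above, repeated so this sketch is self-contained.
data class MetaData(
    val ogTitle: String? = null,
    val ogDescription: String? = null,
    val ogImage: String? = null,
    val ogUrl: String? = null,
    val twitterTitle: String? = null,
    val twitterDescription: String? = null,
    val twitterImage: String? = null,
    val title: String? = null,
    val description: String? = null,
    val canonical: String? = null,
    val htmlTitle: String? = null,
    val favicon: String? = null
)

// Hypothetical type for illustration; not part of ksoup.
data class UrlPreview(
    val title: String?,
    val description: String?,
    val image: String?,
    val url: String?,
    val favicon: String?
)

// Returns the first candidate that is neither null nor blank, or null if there is none.
fun firstNonBlank(vararg candidates: String?): String? =
    candidates.firstOrNull { !it.isNullOrBlank() }

// Assumed priority: Open Graph, then Twitter, then standard tags, then <title>.
fun MetaData.toPreview(): UrlPreview = UrlPreview(
    title = firstNonBlank(ogTitle, twitterTitle, title, htmlTitle),
    description = firstNonBlank(ogDescription, twitterDescription, description),
    image = firstNonBlank(ogImage, twitterImage),
    url = firstNonBlank(ogUrl, canonical),
    favicon = firstNonBlank(favicon)
)
```

Calling `metaData.toPreview()` on the object built above would then give one value per preview field, with `null` wherever the page exposes none of the candidate tags.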

@itboy87 itboy87 closed this as completed Sep 9, 2024
@itboy87
Collaborator

itboy87 commented Sep 23, 2024

@Deorigami In ksoup 0.1.9 I added the function Ksoup.parseMetaData to parse website metadata.
