Fetch HTML Page Meta Data (Preview Information) #64

Closed
itboy87 opened this issue Sep 9, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@itboy87
Collaborator

itboy87 commented Sep 9, 2024

Create a helper function that simplifies fetching metadata from a webpage. The function should extract key information such as Open Graph tags, Twitter card data, and standard metadata from the page's <head> tag, so that users can retrieve essential meta information efficiently.

Existing solution:

data class MetaData(
    val ogTitle: String? = null,
    val ogDescription: String? = null,
    val ogImage: String? = null,
    val ogUrl: String? = null,
    val twitterTitle: String? = null,
    val twitterDescription: String? = null,
    val twitterImage: String? = null,
    val title: String? = null,
    val description: String? = null,
    val canonical: String? = null,
    val htmlTitle: String? = null,
    val favicon: String? = null
)

val url = "https://github.com/fleeksoft/ksoup"

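// Fetch and parse the page (parseGetRequest comes from the ksoup network module and is a suspend function)
val doc = Ksoup.parseGetRequest(url)
// Read the <head> tag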
val head = doc.head()

// Extract Open Graph metadata
val ogTitle = head.selectFirst("meta[property=og:title]")?.attr("content")
val ogDescription = head.selectFirst("meta[property=og:description]")?.attr("content")
val ogImage = head.selectFirst("meta[property=og:image]")?.attr("content")
val ogUrl = head.selectFirst("meta[property=og:url]")?.attr("content")

// Extract Twitter metadata
val twitterTitle = head.selectFirst("meta[name=twitter:title]")?.attr("content")
val twitterDescription = head.selectFirst("meta[name=twitter:description]")?.attr("content")
val twitterImage = head.selectFirst("meta[name=twitter:image]")?.attr("content")

// Extract standard metadata
val titleTag = head.selectFirst("meta[name=title]")?.attr("content")
val descriptionTag = head.selectFirst("meta[name=description]")?.attr("content")

// Extract canonical URL
val canonicalTag = head.selectFirst("link[rel=canonical]")?.attr("href")

// Fetch standard <title> tag
val htmlTitle = doc.title()

// Fetch favicon; absUrl resolves a relative href against the document's base URI
// (plain concatenation breaks for hrefs such as "/favicon.ico" or "//cdn.example.com/icon.png")
val faviconTag = head.selectFirst("link[rel~=icon]")?.absUrl("href")?.takeIf { it.isNotEmpty() }

// Create a MetaData object
val metaData = MetaData(
    ogTitle = ogTitle,
    ogDescription = ogDescription,
    ogImage = ogImage,
    ogUrl = ogUrl,
    twitterTitle = twitterTitle,
    twitterDescription = twitterDescription,
    twitterImage = twitterImage,
    title = titleTag,
    description = descriptionTag,
    canonical = canonicalTag,
    htmlTitle = htmlTitle,
    favicon = faviconTag
)

// Print out all the extracted metadata
println("MetaData Object: $metaData")

This code demonstrates how to extract the various meta tags from a webpage by hand today and package them into a MetaData data class for easy access and manipulation. The proposed helper function would replace this boilerplate.
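Once the values are collected, a preview usually needs a single title and a single description. The following is a minimal sketch of that step, not part of ksoup: the reduced MetaData class, the helper names previewTitle and previewDescription, and the precedence order (Open Graph, then Twitter, then standard tags, then <title>) are all assumptions made for illustration.

```kotlin
// Reduced, hypothetical copy of the MetaData class above (only the fields used here).
data class MetaData(
    val ogTitle: String? = null,
    val ogDescription: String? = null,
    val twitterTitle: String? = null,
    val twitterDescription: String? = null,
    val title: String? = null,
    val description: String? = null,
    val htmlTitle: String? = null,
)

// Returns the first candidate that is neither null nor blank, or null if there is none.
private fun firstNonBlank(vararg candidates: String?): String? =
    candidates.firstOrNull { !it.isNullOrBlank() }

// Hypothetical helpers; the precedence order is an assumption, not a ksoup rule.
fun MetaData.previewTitle(): String? =
    firstNonBlank(ogTitle, twitterTitle, title, htmlTitle)

fun MetaData.previewDescription(): String? =
    firstNonBlank(ogDescription, twitterDescription, description)

fun main() {
    // An empty og:title is skipped in favour of the next available value.
    val meta = MetaData(
        ogTitle = "",
        twitterTitle = "Ksoup on Twitter",
        htmlTitle = "fleeksoft/ksoup",
    )
    println(meta.previewTitle())       // prints "Ksoup on Twitter"
    println(meta.previewDescription()) // prints "null"
}
```

Treating blank strings like missing values matters in practice, because attr("content") returns an empty string rather than null when the attribute is absent.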

Proposed Solution:

// Parse metadata directly from a URL
Ksoup.parseMetaData(SourceReader)

// Element extension functions
Element.parseMetaData()
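As a rough illustration of the Element extension proposed above, here is a sketch of how it might be written on top of the existing selector API. It is not the library's implementation: PageMeta, attrOrNull and parsePageMeta are hypothetical names chosen to avoid clashing with whatever ksoup ships, the field set is trimmed, and the example assumes the ksoup core artifact is on the classpath.

```kotlin
import com.fleeksoft.ksoup.Ksoup
import com.fleeksoft.ksoup.nodes.Element

// Trimmed-down, hypothetical stand-in for the MetaData class in this issue.
data class PageMeta(
    val ogTitle: String? = null,
    val twitterTitle: String? = null,
    val description: String? = null,
    val canonical: String? = null,
    val htmlTitle: String? = null,
    val favicon: String? = null,
)

// Attribute of the first matching element, or null when the tag or value is missing.
private fun Element.attrOrNull(cssQuery: String, attribute: String): String? =
    selectFirst(cssQuery)?.attr(attribute)?.takeIf { it.isNotBlank() }

// Hypothetical extension in the spirit of the proposed Element.parseMetaData().
fun Element.parsePageMeta(): PageMeta = PageMeta(
    ogTitle = attrOrNull("meta[property=og:title]", "content"),
    twitterTitle = attrOrNull("meta[name=twitter:title]", "content"),
    description = attrOrNull("meta[name=description]", "content"),
    canonical = attrOrNull("link[rel=canonical]", "href"),
    htmlTitle = selectFirst("title")?.text()?.takeIf { it.isNotBlank() },
    // absUrl resolves a relative href against the document's base URI.
    favicon = selectFirst("link[rel~=icon]")?.absUrl("href")?.takeIf { it.isNotBlank() },
)

fun main() {
    val html = """
        <html><head>
          <title>Example page</title>
          <meta property="og:title" content="Example OG title">
          <meta name="description" content="A short description">
          <link rel="icon" href="/favicon.ico">
        </head><body></body></html>
    """.trimIndent()

    // The base URI (an assumed example value) lets relative links be resolved.
    val doc = Ksoup.parse(html, "https://example.com/")
    println(doc.parsePageMeta())
}
```

Because Document is itself an Element, the same extension can be called on the whole document or on doc.head() alone.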

Linked to: #63

@itboy87 itboy87 added the enhancement New feature or request label Sep 9, 2024
@itboy87 itboy87 changed the title Fetch Page Meta Data Fetch HTML Page Meta Data (Preview Data) Sep 9, 2024
@itboy87 itboy87 changed the title Fetch HTML Page Meta Data (Preview Data) Fetch HTML Page Meta Data (Preview Information) Sep 9, 2024
@itboy87 itboy87 closed this as completed Sep 23, 2024