Category descriptions #89

Open
wants to merge 5 commits into
base: main

max-ostapenko

  • Generated category descriptions and updated the sync to the BQ table

@max-ostapenko max-ostapenko marked this pull request as ready for review December 20, 2024 19:03
@max-ostapenko max-ostapenko changed the title generated descriptions Category descriptions Dec 20, 2024
@rviscomi
Member

So these are all AI-generated?

For some reason most of them try to connect the category back to web performance, which seems strange. For example:

Domain parking solutions redirect domains to a different location or page. These should be lightweight and avoid performance issues.

@max-ostapenko
Author

max-ostapenko commented Dec 20, 2024

So these are all AI-generated?

Yes.

For some reason most of them try to connect the category back to web performance, which seems strange.

I have mentioned:

The HTTP Archive Tracks How the Web is Built.
We periodically crawl ...

to keep these aligned with the use cases, but yes, it leans too heavily on the performance topic.
Let's just drop the second part.

@max-ostapenko
Author

@pmeenan I see category descriptions are being pulled into detected_technologies in our test run, which is nice.
But could it break anything on the WPT side?

@pmeenan
Member

pmeenan commented Dec 20, 2024

Shouldn't. Not sure what the technologies page in WPT uses but new fields are usually not a problem.
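
A minimal sketch of why an added field is usually harmless: a consumer that reads only the keys it knows about produces the same result once `description` appears. The function `summarize_category` is hypothetical and is not WPT's actual code, and the `groups` key is assumed from the shape of the category entries in this PR.

```python
import json


def summarize_category(category: dict) -> str:
    """Hypothetical consumer: reads only the keys it knows and ignores the rest."""
    return f'{category["name"]} (priority {category["priority"]})'


# The same category entry before and after the change.
before = json.loads('{"groups": [3], "name": "CMS", "priority": 1}')
after = json.loads(
    '{"groups": [3], "name": "CMS", "priority": 1, '
    '"description": "Platforms used to create, manage, and modify content on a website '
    'without needing specialized technical knowledge"}'
)

# The extra "description" key does not change what this consumer produces.
print(summarize_category(before) == summarize_category(after))
```

A consumer that validates against a strict schema, or inserts into a table with a fixed set of columns, could still reject the extra key, so this only covers the common case.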

@@ -4,7 +4,8 @@
       3
     ],
     "name": "CMS",
-    "priority": 1
+    "priority": 1,
+    "description": "Content Management Systems (CMS) are platforms used to create, manage, and modify content on a website without needing specialized technical knowledge."
Member


Minor criticism, but I think it'd read better if the descriptions didn't always repeat the technology name, seeing as how it'll always be accompanied by the name of the technology in the UI. Is that easily fixable?

Left a few suggested edits with what I had in mind. They kind of read as incomplete sentences, which is ok, but maybe we should also remove the period at the end to show that it was intentional?

Suggested change
- "description": "Content Management Systems (CMS) are platforms used to create, manage, and modify content on a website without needing specialized technical knowledge."
+ "description": "Platforms used to create, manage, and modify content on a website without needing specialized technical knowledge"

Author


@rviscomi What if we shorten it even more, to avoid repeating terms like "platforms", "systems", and "tools" in each description?
Or is that too much?

E.g.

Suggested change
- "description": "Content Management Systems (CMS) are platforms used to create, manage, and modify content on a website without needing specialized technical knowledge."
+ "description": "Create, manage, and modify content on a website without needing specialized technical knowledge"

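The two style points raised in this review thread (don't repeat the category name, no trailing period) could be checked mechanically. The following is only a sketch under assumed rules: the function `description_issues` and its two checks are hypothetical and are not part of this repository.

```python
import re


def description_issues(name: str, description: str) -> list[str]:
    """Return style problems for one category description (hypothetical rules)."""
    issues = []
    # Whole-word, case-insensitive match, so "CMS" is found inside "(CMS)".
    if re.search(rf"\b{re.escape(name)}\b", description, flags=re.IGNORECASE):
        issues.append("repeats the category name")
    if description.rstrip().endswith("."):
        issues.append("ends with a period")
    return issues


original = (
    "Content Management Systems (CMS) are platforms used to create, manage, "
    "and modify content on a website without needing specialized technical knowledge."
)
suggested = (
    "Platforms used to create, manage, and modify content on a website "
    "without needing specialized technical knowledge"
)

print(description_issues("CMS", original))
print(description_issues("CMS", suggested))
```

The first call reports both problems and the second reports none. The check only looks for the short category name, so a spelled-out form such as "Content Management Systems" on its own would need an additional rule.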
src/categories.json: 4 outdated review threads (resolved)

WPT test run for https://almanac.httparchive.org/en/2022/

WPT test run results: http://webpagetest.httparchive.org/results.php?test=250114_52_1
Detected technologies:

{
    "detected": {
        "IaaS": "Google Cloud",
        "JavaScript libraries": "web-vitals",
        "RUM": "web-vitals",
        "Performance": "Priority Hints,Google Cloud Trace",
        "Security": "HSTS",
        "Webmail": "Google Workspace",
        "Email": "Google Workspace",
        "Analytics": "Google Analytics",
        "CDN": "Cloudflare",
        "Miscellaneous": "RSS,Open Graph"
    },
    "detected_apps": {
        "Google Cloud": "",
        "web-vitals": "",
        "Priority Hints": "",
        "HSTS": "",
        "Google Workspace": "",
        "Google Cloud Trace": "",
        "Google Analytics": "",
        "Cloudflare": "",
        "RSS": "",
        "Open Graph": ""
    },
    "detected_technologies": {
        "Google Cloud": {
            "name": "Google Cloud",
            "description": "Google Cloud is a suite of cloud computing services.",
            "slug": "google-cloud",
            "categories": [
                {
                    "id": 63,
                    "slug": "iaas",
                    "groups": [
                        7
                    ],
                    "name": "IaaS",
                    "priority": 8,
                    "description": "Provides computing resources"
                }
            ],
            "confidence": 100,
            "version": "",
            "icon": "Google Cloud.svg",
            "website": "https://cloud.google.com",
            "pricing": [],
            "cpe": "cpe:2.3:a:google:cloud_platform:*:*:*:*:*:*:*:*"
        },
        "web-vitals": {
            "name": "web-vitals",
            "description": "The web-vitals JavaScript is a tiny, modular library for measuring all the web vitals metrics on real users.",
            "slug": "web-vitals",
            "categories": [
                {
                    "id": 59,
                    "slug": "javascript-libraries",
                    "groups": [
                        9
                    ],
                    "name": "JavaScript libraries",
                    "priority": 9,
                    "description": "Provide pre-written code"
                },
                {
                    "id": 78,
                    "slug": "rum",
                    "groups": [
                        2
                    ],
                    "name": "RUM",
                    "priority": 9,
                    "description": "Tools that track performance as experienced by users"
                }
            ],
            "confidence": 100,
            "version": "",
            "icon": "web-vitals.svg",
            "website": "https://github.com/GoogleChrome/web-vitals",
            "pricing": [],
            "cpe": null
        },
        "Priority Hints": {
            "name": "Priority Hints",
            "description": "Priority Hints exposes a mechanism for developers to signal a relative priority for browsers to consider when fetching resources.",
            "slug": "priority-hints",
            "categories": [
                {
                    "id": 92,
                    "slug": "performance",
                    "groups": [
                        7
                    ],
                    "name": "Performance",
                    "priority": 9,
                    "description": "Tools that measure and optimize site speed"
                }
            ],
            "confidence": 100,
            "version": "",
            "icon": "Priority Hints.svg",
            "website": "https://wicg.github.io/priority-hints/",
            "pricing": [],
            "cpe": null
        },
        "HSTS": {
            "name": "HSTS",
            "description": "HTTP Strict Transport Security (HSTS) informs browsers that the site should only be accessed using HTTPS.",
            "slug": "hsts",
            "categories": [
                {
                    "id": 16,
                    "slug": "security",
                    "groups": [
                        11
                    ],
                    "name": "Security",
                    "priority": 9,
                    "description": "Technologies that protect websites from vulnerabilities and attacks"
                }
            ],
            "confidence": 100,
            "version": "",
            "icon": "default.svg",
            "website": "https://www.rfc-editor.org/rfc/rfc6797#section-6.1",
            "pricing": [],
            "cpe": null
        },
        "Google Workspace": {
            "name": "Google Workspace",
            "description": "Google Workspace, formerly G Suite, is a collection of cloud computing, productivity and collaboration tools.",
            "slug": "google-workspace",
            "categories": [
                {
                    "id": 30,
                    "slug": "webmail",
                    "groups": [
                        4
                    ],
                    "name": "Webmail",
                    "priority": 2,
                    "description": "Systems that allow users to send and receive emails through a browser"
                },
                {
                    "id": 75,
                    "slug": "email",
                    "groups": [
                        4,
                        2
                    ],
                    "name": "Email",
                    "priority": 9,
                    "description": "Integration technologies that affect user communication"
                }
            ],
            "confidence": 100,
            "version": "",
            "icon": "Google.svg",
            "website": "https://workspace.google.com/",
            "pricing": [],
            "cpe": null
        },
        "Google Cloud Trace": {
            "name": "Google Cloud Trace",
            "description": "Google Cloud Trace is a distributed tracing system that collects latency data from applications and displays it in the Google Cloud Console.",
            "slug": "google-cloud-trace",
            "categories": [
                {
                    "id": 92,
                    "slug": "performance",
                    "groups": [
                        7
                    ],
                    "name": "Performance",
                    "priority": 9,
                    "description": "Tools that measure and optimize site speed"
                }
            ],
            "confidence": 100,
            "version": "",
            "icon": "google-cloud-trace.svg",
            "website": "https://cloud.google.com/trace",
            "pricing": [],
            "cpe": null
        },
        "Google Analytics": {
            "name": "Google Analytics",
            "description": "Google Analytics is a free web analytics service that tracks and reports website traffic.",
            "slug": "google-analytics",
            "categories": [
                {
                    "id": 10,
                    "slug": "analytics",
                    "groups": [
                        8
                    ],
                    "name": "Analytics",
                    "priority": 9,
                    "description": "Tools that track user behavior and provide insights into website performance"
                }
            ],
            "confidence": 100,
            "version": "",
            "icon": "Google Analytics.svg",
            "website": "https://google.com/analytics",
            "pricing": [],
            "cpe": null
        },
        "Cloudflare": {
            "name": "Cloudflare",
            "description": "Cloudflare is a web-infrastructure and website-security company, providing content-delivery-network services, DDoS mitigation, Internet security, and distributed domain-name-server services.",
            "slug": "cloudflare",
            "categories": [
                {
                    "id": 31,
                    "slug": "cdn",
                    "groups": [
                        7
                    ],
                    "name": "CDN",
                    "priority": 9,
                    "description": "Distribute website content globally to improve load times for users"
                }
            ],
            "confidence": 100,
            "version": "",
            "icon": "CloudFlare.svg",
            "website": "https://www.cloudflare.com",
            "pricing": [],
            "cpe": null
        },
        "RSS": {
            "name": "RSS",
            "description": "RSS is a family of web feed formats used to publish frequently updated works—such as blog entries, news headlines, audio, and video—in a standardized format.",
            "slug": "rss",
            "categories": [
                {
                    "id": 19,
                    "slug": "miscellaneous",
                    "groups": [
                        6
                    ],
                    "name": "Miscellaneous",
                    "priority": 10,
                    "description": "Tools and technologies that don't fit into other categories"
                }
            ],
            "confidence": 100,
            "version": "",
            "icon": "RSS.svg",
            "website": "https://www.rssboard.org/rss-specification",
            "pricing": [],
            "cpe": null
        },
        "Open Graph": {
            "name": "Open Graph",
            "description": "Open Graph is a protocol that is used to integrate any web page into the social graph.",
            "slug": "open-graph",
            "categories": [
                {
                    "id": 19,
                    "slug": "miscellaneous",
                    "groups": [
                        6
                    ],
                    "name": "Miscellaneous",
                    "priority": 10,
                    "description": "Tools and technologies that don't fit into other categories"
                }
            ],
            "confidence": 100,
            "version": "",
            "icon": "Open Graph.png",
            "website": "https://ogp.me",
            "pricing": [],
            "cpe": null
        }
    }
}
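
To read the new field out of a payload like the one above, something along these lines would do. It is a sketch only: the helper `category_descriptions` is hypothetical, and the sample is a trimmed copy of the test-run output rather than the full response.

```python
import json


def category_descriptions(payload: dict) -> dict[str, str]:
    """Map each category name to its description, deduplicated across technologies."""
    result = {}
    for tech in payload.get("detected_technologies", {}).values():
        for category in tech.get("categories", []):
            # .get() keeps this working for data produced before descriptions existed.
            result[category["name"]] = category.get("description", "")
    return result


# Trimmed sample in the shape of the test-run output.
sample = json.loads("""
{
  "detected_technologies": {
    "Google Cloud": {
      "categories": [
        {"id": 63, "name": "IaaS", "description": "Provides computing resources"}
      ]
    },
    "web-vitals": {
      "categories": [
        {"id": 59, "name": "JavaScript libraries", "description": "Provide pre-written code"},
        {"id": 78, "name": "RUM", "description": "Tools that track performance as experienced by users"}
      ]
    }
  }
}
""")

for name, description in category_descriptions(sample).items():
    print(f"{name}: {description}")
```

Keying by category name means a category shared by several technologies (for example "Performance" in the run above) appears once.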

3 participants