External use of Internet.nl #1255
Hi, product owner of Digital Insights Platform here. We are happy to add (more) content about internet.nl within our product. As for now it is:
Hey @simonbesters, thanks for joining the conversation. Currently zero attribution is required, but we're thinking of some 'advised' attribution to make a clearer distinction between Internet.nl and tools using Internet.nl results. I wondered: do you currently 'scrape' the site, or do you use the JSON REST API via batch.internet.nl? BTW, Internet.nl is in the process of updating the API server to Docker (see #1253). There are still a few release blockers, but after that it should be fairly easy to set up a private Internet.nl API instance (and then you could also set up a brand in the scraper UA, see #1257).
We scrape, outside office hours.
@simonbesters: Please note the new rule number 7 of the application form (which is not yet deployed on the online application form).
Hi @bwbroersma, DIP programmer here. We really like the 'scraping' method, because it gives us a nice internet.nl page to show the user (like a municipality CISO). We poll the request status very slowly, and we visit the result page only once. We request about 300 domains per day, in random order (in our checks queue), so your server gets them spread across 6 to 12 hours. We scrape some key results from the result page (e.g. https://internet.nl/site/www.waterland.nl/2722343/) and save the permalink, and that's what we give the user if they want to see why their website didn't get a 100%. We really need that internet.nl result page for the CISO. I don't think the API makes one, or does it? And if it does, why use the API instead of the front-end?
See other users of the API: batch calls will, besides JSON, also give a result page, e.g. https://batch.internet.nl/site/www.rijksoverheid.nl/5899563/ in this case. The benefit of using the API is that it performs fair scheduling and makes better use of resources: you will be able to make one request for 300 domains, or 2000 domains, and don't have to guess how best to handle the scheduling. Furthermore, the batch resources are separate from the single-test (internet.nl) instance, so large batches will never slow down regular users of the site.
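To illustrate the "one request for many domains" point above, here is a minimal sketch of building such a batch request. The field names, label, and endpoint noted in the comments are assumptions modeled loosely on the Internet.nl batch API, not a verified client; check the official API documentation before relying on them.

```python
import json

# Assumed base URL for the batch API (illustrative, verify against the docs).
API_BASE = "https://batch.internet.nl/api/batch/v2"

def build_batch_request(domains, name="dip-daily"):
    """Build the JSON body for one hypothetical 'web' (site) batch request."""
    return {
        "type": "web",       # 'web' for site tests, 'mail' for mail tests (assumed)
        "name": name,        # free-form label for this batch (assumed field)
        "domains": domains,  # all 300 (or 2000) domains in a single request
    }

payload = build_batch_request(["www.waterland.nl", "www.rijksoverheid.nl"])
body = json.dumps(payload)  # would be POSTed to the batch request endpoint
```

The design point is that scheduling moves server-side: the client submits the whole list once instead of queueing 300 separate single-domain tests.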
What do you mean by scheduling? Our jobs run synchronously, so we wait for results (about 20 seconds on average in 2023 and 2024). Will the batch API requests be much slower, or more unpredictable? We will still do requests per single domain, not all 300 at the same time, because the queue doesn't know that. If possible we'll ask
From the TOS:
That sounds like a problem... Even if I completely changed the way the queue works, it would be 7 batches per week. And users run ad-hoc tests for a single site (site & mail), so there would also be batches of 1, OR those would still use the scraping method. I'm gonna sleep on it. I've requested API access, and I'll give the batch API a try soon.
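Trying the batch API as described above amounts to replacing the synchronous per-domain wait with slow polling of one request's status. A minimal sketch, where the status strings and the `fetch_status` callable are illustrative assumptions rather than the real API:

```python
import time

def wait_until_done(fetch_status, poll_interval=60, max_polls=120):
    """Call fetch_status() until it returns 'done'; True on success.

    fetch_status is a stand-in for an HTTP call that reads the batch
    request's status; the status values here are hypothetical.
    """
    for _ in range(max_polls):
        if fetch_status() == "done":
            return True
        time.sleep(poll_interval)  # poll slowly to be kind to the server
    return False

# Usage with a stub standing in for the real HTTP status call:
states = iter(["registering", "running", "done"])
done = wait_until_done(lambda: next(states), poll_interval=0)
```

One batch of 300 domains polled once a minute generates far fewer requests than 300 synchronous tests of ~20 seconds each.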
@bwbroersma Our batch API implementation is live, and it works beautifully ❤️ so this should take some load off the internet.nl instance. Thanks guys.