
External use of Internet.nl #1255

Open
1 of 2 tasks
bwbroersma opened this issue Jan 29, 2024 · 9 comments

@bwbroersma
Collaborator

bwbroersma commented Jan 29, 2024

@simonbesters

Hi, product owner of the Digital Insights Platform here. We are happy to add (more) content about internet.nl within our product. As of now it reads:

To determine the security score, we employ the standards and benchmarks established by the Dutch government. These are tested through the platform internet.nl.

@bwbroersma
Collaborator Author

Hey @simonbesters, thanks for joining the conversation. Currently no attribution is required, but we're considering an 'advised' attribution to make a clearer distinction between Internet.nl and tools using Internet.nl results.

I wondered: do you currently 'scrape' the site, or do you use the JSON REST API via batch.internet.nl?

BTW, Internet.nl is in the process of moving the API server to Docker (see #1253). There are still a few release blockers, but after that it should be fairly easy to set up a private Internet.nl API instance (and then you could also set up a brand in the scraper UA, see #1257).
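For readers unfamiliar with the API route being discussed: a minimal sketch of what a batch submission against the v2 endpoint on batch.internet.nl could look like. The account credentials and request name below are placeholders, and the exact response shape should be checked against the official batch API documentation.

```python
# Sketch: submit one batch measurement to the Internet.nl batch API (v2).
# Credentials and the request name are placeholders, not real values.
import base64
import json
import urllib.request

BATCH_API = "https://batch.internet.nl/api/batch/v2/requests"

def build_batch_request(test_type: str, domains: list[str], name: str) -> dict:
    """Assemble the JSON body for one batch submission ('web' or 'mail')."""
    if test_type not in ("web", "mail"):
        raise ValueError("test_type must be 'web' or 'mail'")
    return {"type": test_type, "name": name, "domains": domains}

def submit(payload: dict, user: str, password: str) -> dict:
    """POST the batch request with HTTP basic auth and return the parsed reply."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        BATCH_API,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

One such call can carry the whole day's domain list, instead of one page visit per domain.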

@simonbesters

We scrape, outside office hours.

@bwbroersma bwbroersma added this to the v1.9 milestone Mar 18, 2024
@bwbroersma
Collaborator Author

We scrape, outside office hours.

@simonbesters: Please note new rule number 7 in the application form (not yet deployed on the online application form).

@dipnluser

Hi @bwbroersma, DIP programmer here. We really like the 'scraping' method, because it gives us a nice internet.nl page to hand to the user (like a municipality CISO). We poll the request status very slowly, and we visit the result page only once. We request about 300 domains per day, in random order (in our checks queue), so your server receives them spread over 6 to 12 hours. We scrape some key results from the result page (e.g. https://internet.nl/site/www.waterland.nl/2722343/) and save the permalink, and that's what we give the user if they want to see why their website didn't get a 100%. We really need that internet.nl result page for the CISO. I don't think the API makes one, or does it? And if it does, why use the API instead of the front-end?

@bwbroersma
Collaborator Author

See other users of the API: batch calls will, next to JSON, also give a result page, e.g. https://batch.internet.nl/site/www.rijksoverheid.nl/5899563/ in this case. The benefit of using the API is that it performs fair scheduling and makes better use of the resources: you can make one request for 300 domains, or 2000 domains, and don't have to guess how best to handle the scheduling. Furthermore, the batch resources are separate from the single-test (internet.nl) instance, so large batches will never slow down regular users of the site.
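The "poll the request status very slowly" pattern described above translates directly to the batch API: submit once, then check the request's status at a generous interval until it reaches a terminal state. The sketch below assumes the status names and the response key path from the v2 API; verify both against the batch API documentation before relying on them.

```python
# Sketch: slow-polling loop for a submitted batch request.
# The status values and the "request"/"status" key path are assumptions
# based on the v2 batch API and should be checked against its docs.
import json
import time
import urllib.request

TERMINAL_STATUSES = {"done", "error", "cancelled"}

def is_finished(status: str) -> bool:
    """A batch request needs no further polling once its status is terminal."""
    return status.lower() in TERMINAL_STATUSES

def wait_for_report(status_url: str, interval: int = 600) -> str:
    """Poll the status endpoint every `interval` seconds; return the final status."""
    while True:
        with urllib.request.urlopen(status_url) as resp:
            status = json.load(resp)["request"]["status"]
        if is_finished(status):
            return status
        time.sleep(interval)
```

With a 10-minute interval, a full day's batch costs only a handful of status requests instead of hundreds of page loads.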

@dipnluser

What do you mean by scheduling? Our jobs run synchronously, so we wait for results (about 20 seconds on average in 2023 and 2024). Will the batch API requests be much slower, or more unpredictable? We will still make requests per single domain, not all 300 at the same time, because the queue doesn't know about the others. If possible we'll request mail and site for one domain in the same request, so 300 requests × 2, but not 2 requests × 300.

@dipnluser

From the TOS:

Causing heavy loads for the Service makes things slower for other users. We therefore request the users to honor the following ‘fair use’ rules:

  • Maximum 2 batch requests per week;
  • Per batch request a maximum of 5000 domain names;

That sounds like a problem... Even if I completely change the way the queue works, it would be 7 batches per week. And users do ad-hoc tests for a single site (site & mail), so there would also be batches of 1, or those would still use the scraping method. I'm gonna sleep on it. I've requested API access, and I'll give the batch API a try soon.
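For illustration: folding a week of daily checks (roughly 300 domains per day) into fair-use-sized batches is mostly arithmetic, since 7 × 300 = 2100 domains fits comfortably in a single 5000-domain batch. The helper below is a hypothetical sketch of that grouping, not DIP's actual queue code.

```python
# Sketch: split one week's worth of domains into at most two batches of
# at most 5000 domains each, per the fair-use rules quoted above.
MAX_BATCHES_PER_WEEK = 2
MAX_DOMAINS_PER_BATCH = 5000

def weekly_batches(domains: list[str]) -> list[list[str]]:
    """Deduplicate the week's domains and chunk them into fair-use-sized batches."""
    unique = sorted(set(domains))
    batches = [
        unique[i:i + MAX_DOMAINS_PER_BATCH]
        for i in range(0, len(unique), MAX_DOMAINS_PER_BATCH)
    ]
    if len(batches) > MAX_BATCHES_PER_WEEK:
        raise ValueError("domain list exceeds the weekly fair-use capacity")
    return batches
```

Ad-hoc single-domain tests would still fall outside this weekly grouping, which is the open question in the comment above.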

@dipnluser

@bwbroersma Our batch API implementation is live, and it works beautifully ❤️, so this should take some load off the internet.nl instance. Thanks, guys.
