Identical URLs requested multiple times #38
Comments
I assume you are referring to the message in #34 (comment) saying that the caching is "best effort". In my use case, that best effort results in hitting the same URLs many times over. As my test case shows, even the most trivial case of multiple HTML files sharing the same link results in multiple hits. Oh well. I'll see what I can do on my fork, then.
Can you try this branch which visits each URL only once?
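For readers following along, "visits each URL only once" can be illustrated with a small sketch. This is not muffet's code (muffet is written in Go) and not the code on the branch; it is a minimal Python illustration of one common approach, assuming a hypothetical `fetch` callable. The first caller for a URL claims it under a lock and performs the fetch, while concurrent callers for the same URL wait for that result instead of issuing their own request. The class name `OnceFetcher` is made up for this example.

```python
import threading
from concurrent.futures import Future


class OnceFetcher:
    """Call fetch(url) at most once per URL, even under concurrent access."""

    def __init__(self, fetch):
        self._fetch = fetch            # hypothetical callable: url -> result
        self._lock = threading.Lock()  # guards the dictionary below
        self._results = {}             # url -> Future that will hold the result

    def get(self, url):
        # Claim the URL atomically: exactly one caller becomes the owner.
        with self._lock:
            future = self._results.get(url)
            owner = future is None
            if owner:
                future = Future()
                self._results[url] = future

        if owner:
            # Only the owner performs the request; failures are stored too,
            # so a broken URL is not retried by every waiting caller.
            try:
                future.set_result(self._fetch(url))
            except Exception as exc:
                future.set_exception(exc)

        # Everyone, owner included, reads the shared result (blocks until set).
        return future.result()
```

The point of the design is that the check and the claim happen in one locked step. A cache that is only filled after the request completes leaves a window in which several workers can miss on the same URL and all fetch it, which would be consistent with the "best effort" behaviour described above.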
I just tried that branch and it does seem to solve the test case that I posted. When I run it against my real site, however, I'm still seeing some multiple hits to the same URLs. That said, even in the master branch the duplicate effort is perhaps not as bad as I thought. I instrumented a fork of the code (as of yesterday's master) to log each cache hit and miss. In a run against my site under test, 11192 cache misses overall were spread across 9870 distinct URLs. Of the distinct URLs, 408 had more than one cache miss, with the worst offenders at 34 and 32 misses. I was running with
If you don't mind, could you tell me which server program and website you tested with so that I can try it on my machine? The branch should remove any duplicate requests to the same URL, so I guess it has a bug.
Sorry, the web server I'm testing against is not yet public. It will be soon though; cleaning up broken links after a migration is one of our last steps before going live.
I should say, our content management team was delighted with the report that I provided based directly on muffet output. Thanks for making the tool available to us all.
Thank you, Fred. I'm glad to hear that. I ran the new branch on my mid-sized website of around 40 pages and it didn't have any problems. I did see some duplicate log entries like the ones you mentioned, but those are expected. For example, although the URLs So maybe the duplicate accesses you saw were because of those reasons.
When I run muffet against a local site, I see in the logs that some pages are requested many times in a single run. This seems unnecessary and puts extra load on the server being tested.
Here is a simple example. Create "test.html" with this content:
and "test2.html" with this:
Then serve this content with
python3 -m http.server
and run
muffet http://localhost:8000/test.html
The Python http.server output I get is this:
This shows that "/foo.html" was requested multiple times.
Strangely, small changes to those HTML files produce different results. If I add a link to test.html, muffet requests foo.html only once in the run.
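To compare runs like these without reading the server log by eye, the repeated requests can be tallied with a short script. The following is a rough sketch, assuming the default `http.server` access-log line format (the sample line in the comment is illustrative, not taken from an actual run); the function name `repeated_requests` is made up for this example.

```python
import re
from collections import Counter

# Matches the quoted request in an http.server log line. Illustrative example:
# 127.0.0.1 - - [01/Jan/2019 12:00:00] "GET /foo.html HTTP/1.1" 200 -
REQUEST = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+"')


def repeated_requests(log_lines):
    """Return {path: count} for every path requested more than once."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST.search(line)
        if match:
            counts[match.group(1)] += 1
    return {path: n for path, n in counts.items() if n > 1}
```

Since `http.server` writes its access log to stderr, one way to use this is to start the server with `python3 -m http.server 2> server.log`, run muffet, and then call `repeated_requests(open("server.log"))`.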