CookieJar improvements #7583

Dreamsorcerer · 2023-09-06T17:34:19Z

While looking through #7577, we found a few details that could possibly be improved.

It should really output duplicate cookies in the Cookie header (e.g. when cookies with different paths have the same name) to match the recommended behaviour in the spec. This may be fairly complex to refactor, as the cookies/morsels used from http.cookies don't seem to support this. Upon reviewing the documentation, I get the impression that http.cookies is intended for server-side use, while http.cookiejar is intended for client-side use. So, we may need to replace http.cookies to support this behaviour.
Cookies are now ordered by path length, but the ordering should also use the creation timestamp as a tie-breaker (point 2: https://www.rfc-editor.org/rfc/rfc6265.html#section-5.4). This is probably most relevant when cookies with the same name and path both match but have different domains (e.g. a root domain and a subdomain). I don't think we currently store the creation timestamp, so we'd probably need to add it.
(Implement filter_cookies() with domain-matching and path-matching #7944) .filter_cookies() currently just iterates over all cookies and checks whether each one is valid. This could be made more efficient: because the cookies are stored as self._cookies[(domain, path)][name], we could do domain-matching and path-matching on the keys instead of testing every single cookie.

aiohttp/aiohttp/cookiejar.py

Lines 141 to 142 in 4639b36

```python
for val in self._cookies.values():
    yield from val.values()
```
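The three points above could fit together roughly as follows. This is a minimal, stdlib-only sketch, not aiohttp's code: `ToyCookieJar`, `candidate_domains()` and `candidate_paths()` are hypothetical names, the jar ignores expiry, host-only flags, Secure/HttpOnly and public-suffix checks, and an increasing counter stands in for the creation timestamp. It keeps the `(domain, path) -> name` layout, but probes only the keys that could match instead of scanning every cookie, orders results by path length and then creation order (RFC 6265, section 5.4), and emits same-named cookies more than once.

```python
import ipaddress
from itertools import count
from typing import Dict, List, Set, Tuple


def _is_ip(host: str) -> bool:
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False
    return True


def candidate_domains(host: str) -> List[str]:
    """Cookie domains that domain-match `host`; an IP address only matches itself."""
    if _is_ip(host):
        return [host]
    labels = host.split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]


def candidate_paths(request_path: str) -> Set[str]:
    """Cookie paths that path-match `request_path` (RFC 6265, section 5.1.4)."""
    if not request_path.startswith("/"):
        request_path = "/"
    paths = {request_path}
    for i, ch in enumerate(request_path):
        if ch == "/":
            paths.add(request_path[: i + 1])  # prefix ending with "/"
            if i > 0:
                paths.add(request_path[:i])  # prefix followed by "/" in the request
    return paths


class ToyCookieJar:
    """Hypothetical jar laid out like `_cookies[(domain, path)][name]`."""

    def __init__(self) -> None:
        self._cookies: Dict[Tuple[str, str], Dict[str, Tuple[str, int]]] = {}
        self._clock = count()  # increasing counter standing in for creation time

    def set(self, domain: str, path: str, name: str, value: str) -> None:
        bucket = self._cookies.setdefault((domain, path), {})
        # Replacing a cookie keeps the old creation time (RFC 6265, section 5.3).
        created = bucket[name][1] if name in bucket else next(self._clock)
        bucket[name] = (value, created)

    def cookie_header(self, host: str, path: str) -> str:
        matched = []
        paths = candidate_paths(path)
        # Probe only the keys that could match instead of scanning every cookie.
        for domain in candidate_domains(host):
            for cookie_path in paths:
                for name, (value, created) in self._cookies.get((domain, cookie_path), {}).items():
                    matched.append((cookie_path, created, name, value))
        # Longer paths first; equal lengths by earlier creation (RFC 6265, 5.4 point 2).
        matched.sort(key=lambda c: (-len(c[0]), c[1]))
        # Same-named cookies from different paths/domains are all emitted.
        return "; ".join(f"{name}={value}" for _, _, name, value in matched)


jar = ToyCookieJar()
jar.set("example.com", "/", "sid", "root")
jar.set("example.com", "/api", "sid", "api")
jar.set("www.example.com", "/", "theme", "dark")
print(jar.cookie_header("www.example.com", "/api/items"))  # sid=api; sid=root; theme=dark
```

With this layout the number of dictionary probes depends on the number of host labels and path segments in the request URL, not on how many cookies the jar holds.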


Rongronggg9 · 2023-11-10T19:35:33Z

I profiled my program with yappi and found that .filter_cookies() consumed 27.5% (23.1s/83.9s) of the total CPU time consumed by requests.

As we can see, the preparation before filtering is very expensive.

aiohttp/aiohttp/cookiejar.py

Lines 237 to 252 in 7ed2dd3

```python
self._do_expiration()
if not isinstance(request_url, URL):
    warnings.warn(
        "The method accepts yarl.URL instances only, got {}".format(
            type(request_url)
        ),
        DeprecationWarning,
    )
    request_url = URL(request_url)
filtered: Union["SimpleCookie[str]", "BaseCookie[str]"] = (
    SimpleCookie() if self._quote_cookie else BaseCookie()
)
hostname = request_url.raw_host or ""
request_origin = URL()
with contextlib.suppress(ValueError):
    request_origin = request_url.origin()
```

However, the jar is not always populated when a request is made: for example, on the initial request, or when the session is only used to request URLs that never set cookies (images, videos, files, etc.).

So I have another suggestion: check whether there are any cookies in the jar before doing anything else.
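A minimal sketch of that early exit, assuming a toy jar rather than aiohttp's CookieJar: `ToyJar`, `set()` and the `slow_path_runs` counter are hypothetical, and the expensive preparation is reduced to incrementing that counter. Following the reasoning later given in #7822, the emptiness check runs both before and after expiration, since a dict truthiness test is far cheaper than expiring cookies.

```python
import time
from typing import Dict, Tuple


class ToyJar:
    """Hypothetical jar illustrating the early exit; not aiohttp's CookieJar."""

    def __init__(self) -> None:
        self._cookies: Dict[str, Tuple[str, float]] = {}  # name -> (value, expires_at)
        self.slow_path_runs = 0  # how often the expensive preparation ran

    def set(self, name: str, value: str, max_age: float) -> None:
        self._cookies[name] = (value, time.time() + max_age)

    def _do_expiration(self) -> None:
        now = time.time()
        for name in [n for n, (_, exp) in self._cookies.items() if exp <= now]:
            del self._cookies[name]

    def filter_cookies(self) -> Dict[str, str]:
        if not self._cookies:  # cheap: nothing stored at all
            return {}
        self._do_expiration()
        if not self._cookies:  # still cheap: everything has just expired
            return {}
        # Only now pay for the preparation (URL normalisation, origin, ...).
        self.slow_path_runs += 1
        return {name: value for name, (value, _) in self._cookies.items()}
```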

Dreamsorcerer · 2023-11-10T23:14:57Z

Open PRs that probably resolve these performance issues: #7784 #7777 #7790

Rongronggg9 · 2023-11-11T14:17:13Z

> Open PRs that probably resolve these performance issues: #7784 #7777 #7790

I see. But they do not eliminate the need to call URL.origin(), which is also expensive, even when the jar is empty. Would you think that my suggestion is a good idea? If so, I can open a PR.

Dreamsorcerer · 2023-11-11T15:37:26Z

If it's an easy change, feel free to make a PR; it's easier for me to evaluate the code.

bdraco · 2023-11-11T16:41:30Z

> I see. But they do not eliminate the need to call URL.origin(), which is also expensive, even when the jar is empty. Would you think that my suggestion is a good idea? If so, I can open a PR.

I see origin being expensive in the profile as well. It's much more expensive if it's an IP address instead of a hostname, because it has to recreate the ip_address object. I think you'll need to do another PR for that one.
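One way to keep the origin computation off the common path, roughly what the later #7821 ("Only check origin if insecure scheme and there are origins to treat as secure") describes, is to compute it only when it can change the answer. This sketch uses the stdlib `urllib.parse` instead of yarl; `origin_of()`, `is_secure()` and the `treat_as_secure` set are hypothetical names, not aiohttp's API.

```python
from typing import FrozenSet, Tuple
from urllib.parse import urlsplit

Origin = Tuple[str, str, int]
SECURE_SCHEMES = frozenset({"https", "wss"})
DEFAULT_PORTS = {"http": 80, "ws": 80, "https": 443, "wss": 443}


def origin_of(url: str) -> Origin:
    """Stand-in for an expensive origin computation such as yarl's URL.origin()."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme, 0)
    return (parts.scheme, parts.hostname or "", port)


def is_secure(url: str, treat_as_secure: FrozenSet[Origin]) -> bool:
    if urlsplit(url).scheme in SECURE_SCHEMES:
        return True  # secure scheme: the origin is never needed
    if not treat_as_secure:
        return False  # nothing configured: skip the origin computation as well
    return origin_of(url) in treat_as_secure  # only this path pays for the origin
```

In the common cases (an https request, or no origins configured as secure) the origin is never built, so its cost for IP-address hosts only matters when the option is actually in use.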

bdraco · 2023-11-11T17:23:57Z

It would be nice if we had a simple benchmark script to compare before and after changes for the cookie jar (and probably the URL dispatcher as well).

The cookie jar and the URL dispatcher tend to be the bottlenecks for large aiohttp installs, so anything we can do to improve them will make things scale much better.

Dreamsorcerer · 2023-11-11T17:33:12Z

There is a benchmarks repo, which I've not looked at yet; maybe it could be used if it were dusted off?
https://github.com/aio-libs/aiohttp-benchmarks

bdraco · 2023-11-11T17:40:52Z

It looks like those are mostly(?) end-to-end benchmarks. Since we already know where the bottlenecks are, I'd be more interested in something that adds 10000 cookies to the cookie jar and times how long it takes to call filter_cookies. Probably one case should have an IP address in the URL, and one should have a hostname.
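A rough, stdlib-only harness along those lines might look like this. It does not import aiohttp, so `naive_filter()` only imitates the full scan described in the third item at the top of the issue; pointing the timing loop at a real CookieJar and its filter_cookies() would be the actual benchmark. `build_jar()`, `domain_match()`, `path_match()` and `naive_filter()` are hypothetical names, and no timings are claimed here. IP hosts skip suffix matching, which is one reason to time them separately.

```python
import ipaddress
import timeit
from typing import Dict, List, Tuple

Jar = Dict[Tuple[str, str], Dict[str, str]]  # (domain, path) -> {name: value}


def domain_match(host: str, domain: str, host_is_ip: bool) -> bool:
    """RFC 6265 domain-match; IP addresses must match exactly."""
    return host == domain or (not host_is_ip and host.endswith("." + domain))


def path_match(request_path: str, cookie_path: str) -> bool:
    """RFC 6265 path-match (section 5.1.4)."""
    if request_path == cookie_path:
        return True
    return request_path.startswith(cookie_path) and (
        cookie_path.endswith("/") or request_path[len(cookie_path)] == "/"
    )


def naive_filter(jar: Jar, host: str, path: str) -> List[str]:
    """Test every stored cookie, i.e. the full scan the issue wants to avoid."""
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False
    return [
        f"{name}={value}"
        for (domain, cookie_path), bucket in jar.items()
        if domain_match(host, domain, host_is_ip) and path_match(path, cookie_path)
        for name, value in bucket.items()
    ]


def build_jar(n: int) -> Jar:
    """n cookies spread over n hypothetical subdomains of example.com."""
    jar: Jar = {}
    for i in range(n):
        jar.setdefault((f"host{i}.example.com", "/"), {})[f"c{i}"] = str(i)
    return jar


if __name__ == "__main__":
    jar = build_jar(10_000)
    for host in ("host42.example.com", "192.0.2.1"):
        best = min(timeit.repeat(lambda: naive_filter(jar, host, "/"), number=5, repeat=3))
        print(f"{host}: {best / 5 * 1000:.2f} ms per filter call")
```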

For the URL dispatcher, add 5000 resources and see how much time it takes to dispatch to the last one in the list vs the first one in the list.
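For the dispatcher half, a toy linear router is enough to illustrate the first-vs-last measurement. The routes, `build_routes()` and `resolve()` are hypothetical and much simpler than aiohttp's UrlDispatcher; the point is only that trying resources in registration order makes the last resource the worst case.

```python
import re
import timeit
from typing import List, Optional, Tuple

Route = Tuple[re.Pattern, str]


def build_routes(n: int) -> List[Route]:
    """n hypothetical dynamic resources /resource{i}/{id}, compiled to regexes."""
    return [(re.compile(rf"/resource{i}/(?P<id>[^/]+)"), f"handler{i}") for i in range(n)]


def resolve(routes: List[Route], path: str) -> Optional[str]:
    """Linear dispatch: try each resource in registration order."""
    for pattern, handler in routes:
        if pattern.fullmatch(path):
            return handler
    return None


if __name__ == "__main__":
    routes = build_routes(5000)
    for path in ("/resource0/abc", "/resource4999/abc"):
        best = min(timeit.repeat(lambda: resolve(routes, path), number=20, repeat=3))
        print(f"{path}: {best / 20 * 1e6:.1f} us per dispatch")
```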


…he jar is empty or all cookies have expired (#7822)

**This is a backport of PR #7819 as merged into master (dfc3f89).**

## What do these changes do?

The filtering itself and its preparation in `CookieJar.filter_cookies()` is expensive. Sometimes there are no cookies in the jar or all cookies have expired. Skip filtering and its preparation in this case.

Because the empty check is much cheaper than `_do_expiration()`, I think it deserves to be duplicated before and after calling `_do_expiration()`.

```console
$ python3.11 -m timeit -s 'from collections import defaultdict; d=defaultdict(foo="bar")' \
> 'if not d: pass'
50000000 loops, best of 5: 8.3 nsec per loop
$ python3.11 -m timeit -s 'from collections import defaultdict; d=defaultdict()' \
> 'if not d: pass'
50000000 loops, best of 5: 8.74 nsec per loop
$ python3.11 -m timeit -s 'from aiohttp import CookieJar; cj = CookieJar()' \
> 'cj._do_expiration()'
200000 loops, best of 5: 1.86 usec per loop
```

## Are there changes in behavior for the user?

No.

## Related issue number

#7583 (comment)

## Checklist

- [x] I think the code is well written
- [ ] Unit tests for the changes exist
- [ ] Documentation reflects the changes
- [x] If you provide code modification, please add yourself to `CONTRIBUTORS.txt`
  * The format is <Name> <Surname>.
  * Please keep alphabetical order, the file is sorted by names.
- [x] Add a new news fragment into the `CHANGES` folder
  * name it `<issue_id>.<type>` for example (588.bugfix)
  * if you don't have an `issue_id` change it to the pr id after creating the pr
  * ensure type is one of the following:
    * `.feature`: Signifying a new feature.
    * `.bugfix`: Signifying a bug fix.
    * `.doc`: Signifying a documentation improvement.
    * `.removal`: Signifying a deprecation or removal of public API.
    * `.misc`: A ticket has been closed, but it is not of interest to users.
  * Make sure to use full sentences with correct case and punctuation, for example: "Fix issue with non-ascii contents in doctest text files."


Dreamsorcerer added the enhancement label Sep 6, 2023

Dreamsorcerer mentioned this issue Sep 6, 2023

CookieJar - return 'best-match' and not LIFO #7577

Merged


xiangxli mentioned this issue Nov 2, 2023

Implement filter_cookies() with domain-matching and path-matching #7777

Closed


Rongronggg9 mentioned this issue Nov 11, 2023

Skip filtering CookieJar when the jar is empty or all cookies have expired #7819

Merged


Rongronggg9 added a commit to Rongronggg9/RSS-to-Telegram-Bot that referenced this issue Nov 12, 2023

perf(web): avoid CookieJar's overhead if no cookie

664da37

See also aio-libs/aiohttp#7583 Signed-off-by: Rongrong <[email protected]>

This was referenced Nov 12, 2023

Only check origin if insecure scheme and there are origins to treat as secure, in CookieJar.filter_cookies() #7821

Merged

[PR #7819/dfc3f899 backport][3.9] Skip filtering CookieJar when the jar is empty or all cookies have expired #7822

Merged

Rongronggg9 mentioned this issue Nov 12, 2023

Use timestamp instead of datetime to achieve faster cookie expiration in CookieJar #7824

Merged


patchback bot mentioned this issue Nov 12, 2023

[PR #7821/366ba40f backport][3.9] Only check origin if insecure scheme and there are origins to treat as secure, in CookieJar.filter_cookies() #7825

Merged


Dreamsorcerer pushed a commit that referenced this issue Nov 14, 2023

Use timestamp instead of datetime to achieve faster cookie expiration…

8ae650b

… in CookieJar (#7824) #7583 #7819 (comment)

xiangxli added a commit to xiangxli/aiohttp that referenced this issue Dec 4, 2023

aio-libs#7583

e91ec9d

xiangxli mentioned this issue Dec 4, 2023

Implement filter_cookies() with domain-matching and path-matching #7944

Merged


silvered-shark added a commit to silvered-shark/RSS-to-Telegram-Bot that referenced this issue Jun 17, 2024

perf(web): avoid CookieJar's overhead if no cookie

2c3f465

See also aio-libs/aiohttp#7583 Signed-off-by: Rongrong <[email protected]>
