Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use functools.lru_cache iso. _sqlite.Cache in sqlite3 module #87028

Closed
erlend-aasland opened this issue Jan 8, 2021 · 18 comments
Closed

Use functools.lru_cache iso. _sqlite.Cache in sqlite3 module #87028

erlend-aasland opened this issue Jan 8, 2021 · 18 comments
Labels
3.11 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@erlend-aasland
Copy link
Contributor

BPO 42862
Nosy @rhettinger, @berkerpeksag, @serhiy-storchaka, @corona10, @pablogsal, @erlend-aasland
PRs
  • bpo-40956: Convert _sqlite3.Cache to Argument Clinic #24135
  • bpo-42862: Use functools.lru_cache iso. _sqlite.Cache in sqlite3 module #24203
  • bpo-42862: Strip stale sqlite3 cache ignores from c-analyzer #26876
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2021-06-03.20:01:40.172>
    created_at = <Date 2021-01-08.00:39:03.919>
    labels = ['type-feature', 'library', '3.11']
    title = 'Use functools.lru_cache iso. _sqlite.Cache in sqlite3 module'
    updated_at = <Date 2021-06-23.14:04:34.134>
    user = 'https://github.com/erlend-aasland'

    bugs.python.org fields:

    activity = <Date 2021-06-23.14:04:34.134>
    actor = 'corona10'
    assignee = 'none'
    closed = True
    closed_date = <Date 2021-06-03.20:01:40.172>
    closer = 'pablogsal'
    components = ['Library (Lib)']
    creation = <Date 2021-01-08.00:39:03.919>
    creator = 'erlendaasland'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 42862
    keywords = ['patch']
    message_count = 18.0
    messages = ['384625', '384627', '384691', '384892', '384923', '384926', '384927', '384928', '384929', '384935', '384976', '385193', '385376', '385444', '385445', '393932', '395039', '396422']
    nosy_count = 6.0
    nosy_names = ['rhettinger', 'berker.peksag', 'serhiy.storchaka', 'corona10', 'pablogsal', 'erlendaasland']
    pr_nums = ['24135', '24203', '26876']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue42862'
    versions = ['Python 3.11']

    @erlend-aasland
    Copy link
    Contributor Author

    Pro: less code, less complexity, improved maintainability
    Con: minor performance hit

    PoC here: https://github.com/erlend-aasland/cpython/commits/sqlite-cache

    $ ./python.exe
    >>> import sqlite3
    >>> con = sqlite3.connect(":memory:")
    >>> con.execute("select * from sqlite_master")
    >>> con.execute("select * from sqlite_master")
    >>> c = con.cache()
    >>> c.cache_info()
    CacheInfo(hits=1, misses=1, maxsize=128, currsize=1)

    The test suite runs approx. 10-20 ms slower with this change. Using _functools._lru_cache_wrapper iso. functools.lru_cache almost removes this performance regression.

    Berker, is it worth pursuing?

    @erlend-aasland erlend-aasland added 3.10 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Jan 8, 2021
    @erlend-aasland
    Copy link
    Contributor Author

    Short diffstat: 8 files changed, 85 insertions(+), 406 deletions(-)

    @erlend-aasland
    Copy link
    Contributor Author

    $ python3.10 -m timeit -s 'import sqlite3; con = sqlite3.connect(":memory:"); query="select * from sqlite_master"' 'con.execute(query); con.execute(query)'
    100000 loops, best of 5: 2.95 usec per loop
    $ ./python.exe -m timeit -s 'import sqlite3; con = sqlite3.connect(":memory:"); query="select * from sqlite_master"' 'con.execute(query); con.execute(query)'
    100000 loops, best of 5: 2.68 usec per loop

    @rhettinger
    Copy link
    Contributor

    +1 This seems reasonable to me.

    @erlend-aasland
    Copy link
    Contributor Author

    I can throw up the PoC branch as a draft PR after #68323 is merged. We can just close the PR if this is uninteresting or something we want to postpone.

    @berkerpeksag
    Copy link
    Member

    I don't see any reason to merge #68323 if we are going to remove cache.[ch] in this issue.

    I was -0 before but since Raymond gave his +1, you can count me as +1 too.

    @erlend-aasland
    Copy link
    Contributor Author

    I don't see any reason to merge #68323 if we are going to remove cache.[ch] in this issue.

    Yes, I've thought about that myself. A small argument pro merging #68323 would be that if we for some reason decide to revert this change, then cache.[ch] would still be prepared for module state (which is needed for multi-phase init).

    @berkerpeksag
    Copy link
    Member

    We can always reopen #68323 and merge it even if we revert this one for some reason :)

    @erlend-aasland
    Copy link
    Contributor Author

    True that :) I'll close #68323 for now and open a PR for this later today.

    @serhiy-storchaka
    Copy link
    Member

    I do not like using _functools._lru_cache_wrapper. It is a deep implementation detail, private function of private module. Use functools.lru_cache. If it is few nanoseconds slower, that cost is only added at connection creation time. It is insignificant in any real application in comparison with IO and any real work with data. Thousands of short-living in-memory DB instances are only created in tests.

    @erlend-aasland
    Copy link
    Contributor Author

    I do not like using _functools._lru_cache_wrapper. It is a deep implementation detail, private function of private module. Use functools.lru_cache.

    All right, thanks.

    @erlend-aasland
    Copy link
    Contributor Author

    I need some help debugging the Windows issues. There are a handful of tests that fail because the sqlite3 is clinging on to objects or file descriptors. AFAICT, a GC bug. For example, test_open_uri fails with:

    PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: [...]

    I've tried the following, with no success:

    • Remove GC from the Connection type
    • Remove the clear method, and modify traverse to only visit Py_TYPE(self) (https://bugs.python.org/issue42866#msg384675)
    • Add heap type GC (visit type, GC tracking) to all pysqlite types

    Maybe it's best to reopen #68323 and continue the work with multi-phase init? That is finalise bpo-40956, add heap type GC (https://bugs.python.org/issue42866#msg384675), establish module state, implement multi-phase init, and then revisit this issue.

    @erlend-aasland
    Copy link
    Contributor Author

    Tried applying bpo-42972 to sqlite and functools, but the error persists.

    @erlend-aasland
    Copy link
    Contributor Author

    This works:

    1. fully implement GC in connection (bpo-42972)
    2. also visit statement_cache
    3. explicitly close connections _and_ call GC in problematic tests

    The first point might not be needed for this particular fix.
    The last point is a workaround, not a solution. Or is it ok to call gc.collect() in the test suite?

    @erlend-aasland
    Copy link
    Contributor Author

    Or is it ok to call gc.collect() in the test suite?

    Seems like it's ok:

    $ grep -r gc.collect Lib/test | wc -l
        366

    @erlend-aasland erlend-aasland added 3.11 only security fixes and removed 3.10 only security fixes labels May 14, 2021
    @berkerpeksag
    Copy link
    Member

    Or is it ok to call gc.collect() in the test suite?

    It is but in this case I'd say it's a bit weird that we need to call it. Unfortunately, I don't have much time to investigate it at the moment.

    @pablogsal
    Copy link
    Member

    New changeset f461a7f by Erlend Egeberg Aasland in branch 'main':
    bpo-42862: Use functools.lru_cache iso. _sqlite.Cache in sqlite3 module (GH-24203)
    f461a7f

    @corona10
    Copy link
    Member

    New changeset 34356a0 by Erlend Egeberg Aasland in branch 'main':
    bpo-42862: Strip stale sqlite3 cache ignores from c-analyzer (GH-26876)
    34356a0

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.11 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    6 participants