Database corruption with the shelve module #91228

Open
HubTou mannequin opened this issue Mar 20, 2022 · 8 comments
Labels
3.10 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@HubTou
Mannequin

HubTou mannequin commented Mar 20, 2022

BPO 47072
Nosy @malemburg, @terryjreedy, @koobs, @HubTou
Files
  • shelve-test.zip: Small test program to reproduce the bug
  • shelve-test-3.10.zip: Small test program and results
  • shelve-test-3.10-b.zip: Small test program and results, with better record size
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2022-03-20.17:29:23.398>
    labels = ['type-bug', '3.8', '3.10']
    title = 'Database corruption with the shelve module'
    updated_at = <Date 2022-03-27.12:01:09.272>
    user = 'https://github.com/HubTou'

    bugs.python.org fields:

    activity = <Date 2022-03-27.12:01:09.272>
    actor = 'Leilei'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Demos and Tools']
    creation = <Date 2022-03-20.17:29:23.398>
    creator = 'HubTou'
    dependencies = []
    files = ['50693', '50703', '50704']
    hgrepos = []
    issue_num = 47072
    keywords = []
    message_count = 7.0
    messages = ['415625', '416036', '416045', '416063', '416108', '416110', '416119']
    nosy_count = 4.0
    nosy_names = ['lemburg', 'terry.reedy', 'koobs', 'HubTou']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue47072'
    versions = ['Python 3.8', 'Python 3.10']

    @HubTou
    Mannequin Author

    HubTou mannequin commented Mar 20, 2022

    After adding a few records, the shelve module corrupts the database keys (the database is still readable if an element key is known, but it is no longer iterable):

    Traceback (most recent call last):
      File "./shelve-test.py", line 81, in <module>
        _verify_whois_cache()
      File "./shelve-test.py", line 61, in _verify_whois_cache
        for key in db.keys():
      File "/usr/local/lib/python3.8/_collections_abc.py", line 720, in __iter__
        yield from self._mapping
      File "/usr/local/lib/python3.8/shelve.py", line 95, in __iter__
        for k in self.dict.keys():
    SystemError: Negative size passed to PyBytes_FromStringAndSize

    I provide a short test program and data that systematically reproduce the bug. I also added a script showing the execution messages, and the resulting database in DB and text formats.

    Tested with Python 3.8.12 on FreeBSD 13.0-RELEASE-p8.
    I suppose Python is using my system package db5-5.3.28_8 (Oracle Berkeley DB, revision 5.3).

    See also similar issues:
    https://bugs.python.org/issue33074
    https://bugs.python.org/issue30388
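
The failure mode described above can be probed with a minimal sketch along these lines. This is not the attached test program: the shelf name, the record contents, and the helpers `fill` and `check_iterable` are hypothetical, and on a healthy backend the check simply reports that iteration works.

```python
import os
import shelve
import tempfile


def fill(path, n_records, payload_size):
    """Write n_records hypothetical records with a payload of payload_size characters."""
    with shelve.open(path) as db:
        for i in range(n_records):
            db[f"192.0.2.{i}"] = {"raw": "x" * payload_size, "index": i}


def check_iterable(path):
    """Return (ok, count): whether all keys of the shelf at `path` can be iterated."""
    with shelve.open(path) as db:
        try:
            count = sum(1 for _ in db.keys())
        except SystemError:
            # The error reported in this issue is raised while iterating the keys.
            return False, 0
    return True, count


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "whois_cache")
        fill(path, 100, 4096)
        print(check_iterable(path))
```

Running the check after every batch of insertions would narrow down which write first leaves the key list unreadable.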

    @HubTou HubTou mannequin added stdlib Python modules in the Lib dir 3.8 (EOL) end of life type-bug An unexpected behavior, bug, or error labels Mar 20, 2022
    @terryjreedy
    Member

    3.8 only gets security patches. If you can, please test with a newer version.

    @HubTou
    Mannequin Author

    HubTou mannequin commented Mar 26, 2022

    Hello,
    Same results with Python 3.10.4:

    [...]
    Adding 185.220.102.6
    Database has 62 records for 442368 bytes. Last record was 640 bytes long
    Traceback (most recent call last):
      File "./shelve-test.py", line 84, in <module>
        _verify_whois_cache()
      File "./shelve-test.py", line 63, in _verify_whois_cache
        for key in db.keys():
      File "/usr/local/lib/python3.10/_collections_abc.py", line 881, in __iter__
        yield from self._mapping
      File "/usr/local/lib/python3.10/shelve.py", line 95, in __iter__
        for k in self.dict.keys():
    SystemError: Negative size passed to PyBytes_FromStringAndSize
    # freebsd-version -uk
    13.0-RELEASE-p8
    13.0-RELEASE-p10
    # python3.10 --version
    Python 3.10.4

    The point at which the database breaks varies (from 50 to 500+ records), and the size of the database doesn't seem to be relevant (from 400K to 1800K).

    The size of the record doesn't *appear* to be relevant either (but I'm not 100% sure I measured the right figure). However, my other uses of the shelve module, with many more records that were much smaller and less complex, have shown no issues.

    @HubTou
    Mannequin Author

    HubTou mannequin commented Mar 26, 2022

    I modified the test program to better reflect the size of the data structures stored in shelve (sys.getsizeof() which I used was far off the real size).
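
A rough illustration of why `sys.getsizeof()` is misleading here: it measures only the outermost object, whereas shelve stores a pickle of the whole structure. The sketch below assumes the default pickle protocol and a hypothetical record; the helper name `pickled_size` is not from the test program, and the dbm backend adds its own overhead on top.

```python
import pickle
import sys


def pickled_size(obj, protocol=pickle.DEFAULT_PROTOCOL):
    """Size in bytes of the pickled object, roughly what shelve passes to the dbm backend."""
    return len(pickle.dumps(obj, protocol=protocol))


# Hypothetical record, loosely shaped like a cached whois answer.
record = {"raw": "x" * 5000, "fields": {"country": "ZZ", "netname": "EXAMPLE"}}

shallow = sys.getsizeof(record)  # counts only the outer dict object
stored = pickled_size(record)    # counts the whole serialized structure
print(f"sys.getsizeof: {shallow} bytes, pickled: {stored} bytes")
```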

    I saw that the database became corrupted on big records, though even bigger earlier records had not corrupted it. Records larger than 1K (mentioned in one of the other problem reports) were routinely OK. Records larger than 4K (also mentioned in another problem report) were sometimes OK.

    When I took a problematic record and used it alone or with a few other records, no corruption occurred.

    Any ideas?

    @HubTou
    Mannequin Author

    HubTou mannequin commented Mar 27, 2022

    Additional note: the test code WORKS under Windows 8.1 / Python 3.9.1 (though the data file is suffixed .dat instead of .db) resulting in a 4 MB database with 1065 records, some of them > 11 KB.

    So maybe the bug is system dependent.

    @HubTou HubTou mannequin added 3.10 only security fixes labels Mar 27, 2022
    @HubTou
    Mannequin Author

    HubTou mannequin commented Mar 27, 2022

    The storage format used under Windows is completely different from the one used under Unix (or *BSD).

    Apart from the .dat datafile, there is a .dir index file with CSV lines such as "'key', (offset, length)".
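
As an illustration of that index layout, here is a small sketch that creates a `dbm.dumb` database and parses its `.dir` file. The helper `read_dumb_index` and the file name are hypothetical, and the index format is an implementation detail of `dbm.dumb` that may change between Python versions.

```python
import ast
import dbm.dumb
import os
import tempfile


def read_dumb_index(basename):
    """Parse a dbm.dumb '.dir' file into {key: (offset, length)}."""
    index = {}
    with open(basename + ".dir", encoding="latin-1") as f:
        for line in f:
            line = line.rstrip()
            if not line:
                continue
            # Each line looks like: 'key', (offset, length)
            key, (offset, length) = ast.literal_eval(line)
            index[key] = (offset, length)
    return index


with tempfile.TemporaryDirectory() as tmp:
    base = os.path.join(tmp, "demo")
    with dbm.dumb.open(base, "c") as db:
        db[b"key"] = b"value"
    # The index is written out when the database is closed.
    print(read_dumb_index(base))
```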

    Whereas under Unix (or *BSD), I have:

    # file whois_cache.db
    whois_cache.db: Berkeley DB 1.85 (Hash, version 2, native byte-order)

    I'll run a test on a Linux Raspberry Pi, to see if the issue is *BSD specific...

    @malemburg
    Member

    On 27.03.2022 09:56, Hubert Tournier wrote:

    > The storage format used under Windows is completely different from the one used under Unix (or *BSD).

    The shelve module uses the dbm module underneath and this will pick
    its storage mechanism based on what's available on the platform:

    https://docs.python.org/3/library/dbm.html
    https://github.com/python/cpython/blob/3.10/Lib/dbm/__init__.py

    It's likely that you'll get the dbm.dumb interface on Windows.
    On Linux, you typically have one of gdbm or the Berkeley DB installed.

    dbm.whichdb() will tell you which type of dbm implementation your
    files are likely using.
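
A short sketch of that check, assuming a throwaway shelf in a temporary directory (the shelf name and the helper `backend_of` are hypothetical). Pass `dbm.whichdb()` the same name that was given to `shelve.open()`, without any extension the backend may have appended.

```python
import dbm
import os
import shelve
import tempfile


def backend_of(path):
    """Return the dbm module name guessed for the shelf at `path`.

    dbm.whichdb() returns None if the file cannot be opened or read,
    and an empty string if the format cannot be guessed.
    """
    return dbm.whichdb(path)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "whois_cache")  # hypothetical shelf name
    with shelve.open(path) as db:
        db["198.51.100.1"] = {"country": "ZZ"}
    # The result depends on the platform, e.g. 'dbm.gnu', 'dbm.ndbm' or 'dbm.dumb'.
    print(backend_of(path))
```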

    More on the differences of DBM style libs:
    http://www.ccl.net/cca/software/UNIX/apache/apacheRH7.0/local-copies/dbm.html

    Aside: You are probably better off using SQLite with a pickle
    layer to store arbitrary objects. This is much more mature than
    the dbm modules.
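
One way to read that suggestion is sketched below: a tiny shelve-like mapping that pickles values into a SQLite table. The class name `SqliteShelf`, the table layout, and the sample data are assumptions made for illustration, not an existing API. As with shelve, unpickling is only safe for data you trust.

```python
import pickle
import sqlite3


class SqliteShelf:
    """Tiny shelve-like mapping: string keys, pickled values, stored in SQLite."""

    def __init__(self, path):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS shelf (key TEXT PRIMARY KEY, value BLOB NOT NULL)"
        )
        self._conn.commit()

    def __setitem__(self, key, value):
        blob = pickle.dumps(value, protocol=pickle.HIGHEST_PROTOCOL)
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT OR REPLACE INTO shelf (key, value) VALUES (?, ?)", (key, blob)
            )

    def __getitem__(self, key):
        row = self._conn.execute(
            "SELECT value FROM shelf WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return pickle.loads(row[0])

    def keys(self):
        """Return all keys, sorted, as a list."""
        return [k for (k,) in self._conn.execute("SELECT key FROM shelf ORDER BY key")]

    def close(self):
        self._conn.close()


db = SqliteShelf(":memory:")  # use a file path instead for persistence
db["198.51.100.1"] = {"country": "ZZ", "raw": "x" * 10000}
print(db.keys())
db.close()
```

Because the keys live in an ordinary table, listing them is a plain `SELECT` and does not depend on which dbm library the platform provides.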

    @noobLei noobLei mannequin removed stdlib Python modules in the Lib dir labels Mar 27, 2022
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @ronaldoussoren
    Contributor

    #74573 might be related; it is an older, similar issue on macOS.

    @terryjreedy terryjreedy removed the 3.8 (EOL) end of life label Apr 11, 2022
    @iritkatriel iritkatriel added the stdlib Python modules in the Lib dir label Nov 29, 2023
    offbyone added a commit to offbyone/planet that referenced this issue Oct 10, 2024