Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate from VULCOID to VCID #811 #896

Merged
merged 1 commit into from
Sep 8, 2022
Merged

Conversation

TG1999
Copy link
Contributor

@TG1999 TG1999 commented Sep 2, 2022

Use uuid instead of base36
Reference: #811

Signed-off-by: Tushar Goel [email protected]

@TG1999 TG1999 linked an issue Sep 2, 2022 that may be closed by this pull request
3 tasks
Copy link
Contributor

@tdruez tdruez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using 36 chars UUID seems overkill, do we really have a need for that many unique possibilities?
Using the first 18 chars (or even less) would be plenty imho.

vulnerabilities/models.py Outdated Show resolved Hide resolved
@mjherzog
Copy link
Member

mjherzog commented Sep 5, 2022

I suspect that with 36 (or even 18) we will end up with a situation similar to Docker image layers where you will need a truncated VCID for reports and other human-oriented purposes. This may be something we want to do from the start given that something like 10 characters will probably be unique within a particular set of VCIDs for a project or codebase.

@TG1999
Copy link
Contributor Author

TG1999 commented Sep 5, 2022

I have truncated the uuid length to 10, this is how a sample VCID looks like now VCID-2f5e133a-b

@tdruez
Copy link
Contributor

tdruez commented Sep 6, 2022

VCID looks like now VCID-2f5e133a-b

In that example, the last dash is not very useful. I would suggest to split the chars towards the middle for better readability.

This worth more discussion in any cases before implementation, using uuid4 maybe not be the best for our need.

@TG1999
Copy link
Contributor Author

TG1999 commented Sep 6, 2022

@tdruez I experimented a bit with shortUUID, and formed an ID like this VCID-JebnC-khqZw, does this look good ?

@tdruez
Copy link
Contributor

tdruez commented Sep 6, 2022

formed an ID like this VCID-JebnC-khqZw, does this look good ?

@TG1999 Yes, I think it looks much better than a uuid4. Do we really need a new dependency to generate this though?
It feels like this could probably done in a couple lines of Python.
Also, did you include the integers to raise the number of possibility per characters?

@pombredanne
Copy link
Member

Here is what I suggest after a discussion in our weekly call. Reuse something such as "VCID-25fx-gxgp-m24r" where we use three groups of 4 letters, in the same style the GitHub GHSA vulnerability ids.
Basically we could quite likely encode 60 to 64 bits of ids in 12 characters ( 96 bits ) alright. For instance 60 bits could be used with base32 (5 bits encoded per character). We likely can use base36 or base32 or else. Being case sensitive is better.
May be using a base32 and avoiding confusable characters such as : 0 (zero) and O (big and small O) and i (small i and big I) and l (small l and big L)... all the ASCII letters and numbers (36 signs) minus these 4 is 32 characters

"VCID-25fx-gxgp-m24r" encoded using 60 bits in three groups of base32 4-chars strings.

@pombredanne
Copy link
Member

Here is a snippet based on @TG1999 code:

>>> import uuid, hashlib, base64
>>> uid = base64.b32encode(hashlib.sha1(uuid.uuid4().bytes).digest())[:12].decode('utf-8').lower()
>>> f'VCID-{uid[:4]}-{uid[4:8]}-{uid[8:12]}'
'VCID-x2at-frm3-gg6j'
>>> uid
'x2atfrm3gg6j'

This could be something such as:

from hashlib import sha256
from uuid import uuid4


def build_vcid(prefix='VCID'):
    """
    Return a new VulnerableCode VCID unique identifier string using the ``prefix``.

    For example::
    >>> import re
    >>> vcid = build_vcid()
    >>> # VCID-6npv-94wz-hhuq
    >>> assert re.match('VCID(-[a-z1-9]{4}){3}', vcid), vcid
    """
    # we keep only 64 bits (e.g. 8 bytes)
    uid = sha256(uuid4().bytes).digest()[:8]
    # we keep only 12 encoded bytes (which corresponds to 60 bits)
    uid = base32_custom(uid)[:12].decode('utf-8').lower()
    return f'{prefix}-{uid[:4]}-{uid[4:8]}-{uid[8:12]}'


_base32_alphabet = b'ABCDEFGHJKMNPQRSTUVWXYZ123456789'
_base32_table = None


def base32_custom(btes):
    """
    Encode the ``btes`` bytes object using a Base32 encoding using a custom
    alphabet and return a bytes object.

    Code copied and modified from the Python Standard Library:
    base64.b32encode function

    SPDX-License-Identifier: Python-2.0
    Copyright (c) The Python Software Foundation

    For example::
    >>> assert base32_custom(b'abcd') == b'ABTZE25E', base32_custom(b'abcd')
    >>> assert base32_custom(b'abcde00000xxxxxPPPPP') == b'PFUGG3DFGA2DAPBTSB6HT8D2MBJFAXCT'
    """
    global _base32_table
    # Delay the initialization of the table to not waste memory
    # if the function is never called
    if _base32_table is None:
        b32tab = [bytes((i,)) for i in _base32_alphabet]
        _base32_table = [a + b for a in b32tab for b in b32tab]

    encoded = bytearray()
    from_bytes = int.from_bytes

    for i in range(0, len(btes), 5):
        c = from_bytes(btes[i: i + 5], 'big')
        encoded += (
            _base32_table[c >> 30] +  # bits 1 - 10
            _base32_table[(c >> 20) & 0x3ff] +  # bits 11 - 20
            _base32_table[(c >> 10) & 0x3ff] +  # bits 21 - 30
            _base32_table[c & 0x3ff]  # bits 31 - 40
        )
    return bytes(encoded)

Use uuid instead of base36
Reference: aboutcode-org#811

Signed-off-by: Tushar Goel <[email protected]>
('vulnerabilities', '0022_alter_vulnerability_vulnerability_id'),
]

def save_vulnerability_id(apps, schema_editor):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TG1999 would using a bulk_update() be a better approach?
@tdruez ^ ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, I've never used it, so you'll have to give it a try and let me know :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks! let me give it a shot with and without

@pombredanne pombredanne merged commit 2a70836 into aboutcode-org:main Sep 8, 2022
@pombredanne
Copy link
Member

I used bulk_update for good results in 7c708e8 and merged it all. Thank you ++ @TG1999

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Revisit VULCOID ...
4 participants