Proposal: Unification of Unique Identifier Generation #6376

Open
GitHK opened this issue Sep 17, 2024 · 0 comments
Labels
t:enhancement Improvement or request on an existing feature

GitHK commented Sep 17, 2024

## Proposal: Unification of Unique Identifier Generation

CC: @sanderegg @matusdrobuliak66 @giancarloromeo @GitHK

We currently have multiple methods across the codebase for generating unique identifiers. To improve consistency and flexibility, I propose unifying this functionality under servicelib.identifiers_utils. The main objectives are:

  1. Generate Context-Specific Unique Name Identifiers:

    • The ability to generate unique identifiers based on different contexts/scopes, such as globally unique, unique within a project, process, hostname, or cluster.
    • The context can be passed as discriminators that will define the scope of uniqueness.
  2. Standardized Identifier Formats:

    • Provide support for generating both standard UUIDs and human-readable identifiers with optional prefixes.
    • UUIDs should follow a standard format like uuid4 for general uniqueness or uuid5 (namespace-based) for deterministic IDs based on specific discriminators.
    • Human-readable identifiers should support optional prefixes (e.g., pay_123456124 for payment identifiers).
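
To make the uuid4/uuid5 distinction above concrete, here is a minimal sketch using only the standard library. The namespace constant and the helper names are hypothetical and chosen for illustration; they are not part of servicelib.

```python
import uuid

# Hypothetical namespace for illustration: any fixed UUID can serve as a uuid5 namespace.
PROJECT_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.invalid")


def deterministic_id(*discriminators: object) -> uuid.UUID:
    """Same namespace and discriminators always produce the same UUID."""
    return uuid.uuid5(PROJECT_NAMESPACE, "/".join(map(str, discriminators)))


def random_id() -> uuid.UUID:
    """A fresh random UUID on every call, with no link to any context."""
    return uuid.uuid4()


first = deterministic_id("project-1", "node-7")
second = deterministic_id("project-1", "node-7")
other = deterministic_id("project-2", "node-7")

print(first == second)  # reproducible for identical discriminators
print(first == other)   # changes when any discriminator changes
print(random_id() == random_id())
```

The scope of uniqueness is carried entirely by the namespace and the discriminators: two callers that agree on both can derive the same identifier independently, without coordination.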

Example Implementation:

import hashlib
import time
import uuid
import socket
from models_library.basic_types import IdStr

def short_sha256(input_string: str, length: int = 8) -> IdStr:
    """Generates a truncated SHA-256 hash of the input string."""
    sha_signature = hashlib.sha256(input_string.encode()).hexdigest()
    return IdStr(sha_signature[:length])


def generate_name_identifier(*discriminators, prefix: str | None = None, length: int = 8) -> IdStr:
    """
    Generates a unique identifier based on the provided discriminators (e.g., project name, hostname).
    Optionally includes a human-readable prefix and truncates the identifier to the desired length.
    """
    idr = short_sha256("/".join(map(str, discriminators)), length=length)
    if prefix:
        # Wrap in IdStr again so the return value matches the annotated type.
        idr = IdStr(f"{prefix}_{idr}")
    return idr


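def generate_uuid(*discriminators, base_uuid: uuid.UUID | None = None) -> uuid.UUID:
    """
    Generates a uuid5 (namespace-based) UUID from the provided discriminators.
    With an explicit base_uuid the result is deterministic: the same namespace and
    discriminators always give the same UUID. Without it, a random uuid4 namespace
    is used, so the result is unique but not reproducible across calls.
    """
    if base_uuid is None:
        base_uuid = uuid.uuid4()
    return uuid.uuid5(base_uuid, "/".join(map(str, discriminators)))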


# Example usage
def get_rabbitmq_client_unique_name(prefix: str) -> IdStr:
    """
    Generates a unique RabbitMQ client name based on the hostname and current time,
    prepended with the given prefix.
    """
    hostname = socket.gethostname()
    return generate_name_identifier(time.time(), hostname, prefix=prefix, length=8)
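
One detail of the example worth checking before adoption: joining discriminators with "/" is ambiguous when a discriminator can itself contain "/". The sketch below is self-contained (it uses plain str instead of IdStr), and encode_unambiguously is a hypothetical alternative, not something the proposal specifies.

```python
import hashlib
import json


def short_sha256(input_string: str, length: int = 8) -> str:
    """Truncated SHA-256 hex digest, mirroring the helper in the proposal."""
    return hashlib.sha256(input_string.encode()).hexdigest()[:length]


def join_with_slash(*discriminators: object) -> str:
    """The joining scheme used in the example implementation."""
    return "/".join(map(str, discriminators))


def encode_unambiguously(*discriminators: object) -> str:
    """Hypothetical alternative: a JSON list keeps element boundaries explicit."""
    return json.dumps([str(d) for d in discriminators])


# Two different discriminator tuples collapse to the same joined string ...
same_input = join_with_slash("a/b", "c") == join_with_slash("a", "b/c")
# ... and therefore to the same identifier.
same_id = short_sha256(join_with_slash("a/b", "c")) == short_sha256(
    join_with_slash("a", "b/c")
)
# The JSON encoding keeps them apart before hashing.
distinct_encoding = encode_unambiguously("a/b", "c") != encode_unambiguously("a", "b/c")

print(same_input, same_id, distinct_encoding)
```

If discriminators such as paths or URLs are expected, an encoding that preserves boundaries (or escaping the separator) avoids this class of accidental collision.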

Key Points and Improvements:

  1. Contextual Uniqueness: The generate_name_identifier function allows you to pass any relevant context (e.g., hostname, project, or process) to ensure uniqueness within the intended scope.

  2. Prefix Support: Human-readable prefixes can be added to identifiers for better clarity and debugging, such as pay_ for payment identifiers or user_ for user-related identifiers.

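  3. Shortened SHA-256 Identifiers: For identifiers that require truncation, we use a shortened SHA-256 hash, which can be configured via the length parameter to balance uniqueness against brevity. The default of 8 hexadecimal characters keeps only 32 bits of the hash, so by the birthday bound the chance of at least one collision reaches about 50% at roughly 77,000 identifiers within the same scope. Use a longer truncation wherever that many identifiers are plausible.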

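  4. UUID Generation: For cases requiring globally unique or deterministic identifiers, the generate_uuid function uses uuid5 to generate namespace-based UUIDs (preferred over uuid3 because uuid5 hashes with SHA-1 rather than MD5). The result is deterministic only when the same base_uuid is passed; with the default random namespace, every call yields a different UUID.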

  5. Flexibility: Both generate_name_identifier and generate_uuid functions are flexible, allowing users to define how discriminators affect uniqueness within their system.
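
To help choose the length parameter mentioned in point 3, the following sketch estimates collision risk with the standard birthday approximation. It assumes truncated hashes behave like uniformly random values; the function names are hypothetical.

```python
import math


def collision_probability(num_ids: int, hex_length: int) -> float:
    """Approximate chance that at least two of num_ids truncated hashes collide.

    Uses the birthday approximation p = 1 - exp(-n(n-1) / (2N)),
    where N = 16**hex_length is the number of possible identifiers.
    """
    space = 16**hex_length
    return -math.expm1(-num_ids * (num_ids - 1) / (2 * space))


def ids_until_probability(target: float, hex_length: int) -> int:
    """Approximate number of identifiers at which the collision chance reaches target."""
    space = 16**hex_length
    return math.ceil(math.sqrt(2 * space * math.log(1 / (1 - target))))


for length in (8, 12, 16):
    print(
        f"length={length}: "
        f"p(10_000 ids)={collision_probability(10_000, length):.3e}, "
        f"ids for 1% risk={ids_until_probability(0.01, length):,}"
    )
```

With 8 hexadecimal characters, ten thousand identifiers in one scope already carry a collision risk on the order of one percent, which may or may not be acceptable depending on what a collision would break.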

Next Steps:

  • We can further extend this utility by allowing specific discriminators, such as user ID or session ID, for more fine-grained uniqueness when necessary.
  • Additional formats (e.g., base62 encoding for compact identifiers) can be considered if we find a need to reduce the length of identifiers without sacrificing uniqueness.
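
As a rough illustration of the base62 idea above, the sketch below encodes the SHA-256 digest with a 62-character alphabet instead of hexadecimal. The alphabet order and the function names are assumptions made for this example, not an agreed format.

```python
import hashlib
import string

# Assumed alphabet order: digits, then uppercase, then lowercase.
BASE62_ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def base62_encode(number: int) -> str:
    """Encodes a non-negative integer using the 62-character alphabet."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return BASE62_ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, 62)
        digits.append(BASE62_ALPHABET[remainder])
    return "".join(reversed(digits))


def short_base62_id(*discriminators: object, length: int = 8) -> str:
    """Hypothetical compact identifier: SHA-256 digest, base62-encoded, truncated."""
    digest = hashlib.sha256("/".join(map(str, discriminators)).encode()).digest()
    encoded = base62_encode(int.from_bytes(digest, "big"))
    # Keep the low-order characters: the leading base62 digit of a 256-bit
    # number is not uniformly distributed, the trailing ones nearly are.
    return encoded.rjust(length, BASE62_ALPHABET[0])[-length:]


print(short_base62_id("project-1", "node-7"))
```

Eight base62 characters carry about 47.6 bits, compared with 32 bits for eight hexadecimal characters, so the same visible length tolerates far more identifiers before collisions become likely. The trade-off is case sensitivity, which matters if identifiers ever pass through case-insensitive systems.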

Originally posted by @pcrespov in #6365 (comment)

@GitHK GitHK changed the title ## Proposal: Unification of Unique Identifier Generation Proposal: Unification of Unique Identifier Generation Sep 17, 2024
@GitHK GitHK added the t:enhancement Improvement or request on an existing feature label Sep 17, 2024