
[PLA-2023] Improve metadata caching #260

Merged 17 commits into master on Oct 16, 2024

Conversation

@enjinabner (Contributor) commented Oct 4, 2024

PR Type

enhancement, configuration changes


Description

  • Added a new command SyncAttributeMetadata to sync attribute metadata to cache, improving performance.
  • Introduced caching in MetadataService with methods fetchAndCache and getCache for efficient metadata retrieval.
  • Updated Token model to utilize cached metadata instead of fetching it directly.
  • Added configuration for attribute metadata syncing, including a data_chunk_size setting.
  • Registered the new command in the CoreServiceProvider.
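
To make the caching pattern described above concrete, here is a minimal, self-contained sketch. The method names (fetchAndCache, getCache) and the "prefix:suffix" cache-key format come from this PR; the in-memory array is a stand-in for Laravel's cache store, and fetch() is a placeholder for the real HTTP request, so treat this as illustrative rather than the actual implementation.

```php
<?php

// Illustrative sketch of the MetadataService caching pattern from this PR.
class MetadataServiceSketch
{
    protected static string $cacheKey = 'metadata';

    /** @var array<string, array> In-memory stand-in for Laravel's cache store. */
    protected array $cache = [];

    // Build the cache key from the attribute's URI value.
    public function cacheKey(string $suffix): string
    {
        return self::$cacheKey . ':' . $suffix;
    }

    // Fetch the metadata and store it under its cache key.
    public function fetchAndCache(string $uri): array
    {
        $metadata = $this->fetch($uri);
        $this->cache[$this->cacheKey($uri)] = $metadata;

        return $metadata;
    }

    // Return the cached metadata, or null on a cache miss.
    public function getCache(string $uri): ?array
    {
        return $this->cache[$this->cacheKey($uri)] ?? null;
    }

    // Placeholder for the real fetch over HTTP.
    protected function fetch(string $uri): array
    {
        return ['uri' => $uri, 'name' => 'Example Token'];
    }
}

$service = new MetadataServiceSketch();
$service->fetchAndCache('https://example.com/token/1.json');
$cached = $service->getCache('https://example.com/token/1.json'); // cache hit
```

With this shape, the Token model can read from getCache() first and only fall back to a network fetch on a miss, which is the change described in the walkthrough below.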

Changes walkthrough 📝

Relevant files

Configuration changes
config/enjin-platform.php: Add configuration for attribute metadata syncing (+12/-0)
  • Added configuration for attribute metadata syncing.
  • Introduced sync_metadata array with data_chunk_size setting.

Enhancement
src/Commands/SyncAttributeMetadata.php: Implement SyncAttributeMetadata command for caching (+53/-0)
  • Created new command SyncAttributeMetadata.
  • Command syncs attribute metadata to cache.
  • Utilizes progress bar for tracking sync progress.

src/CoreServiceProvider.php: Register SyncAttributeMetadata command in service provider (+2/-0)
  • Registered SyncAttributeMetadata command.

src/Models/Laravel/Token.php: Use cached metadata in Token model (+3/-3)
  • Changed metadata fetching to use cache.
  • Replaced MetadataService::fetch with MetadataService::getCache.

src/Services/Database/MetadataService.php: Enhance MetadataService with caching capabilities (+33/-0)
  • Added caching functionality for metadata.
  • Introduced fetchAndCache and getCache methods.
  • Implemented cache key generation.
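
Based on the walkthrough above, the new block in config/enjin-platform.php plausibly looks something like the following. Only the key names (sync_metadata, data_chunk_size) are taken from this PR; the default value shown is an assumption.

```php
<?php

// Hypothetical excerpt of config/enjin-platform.php.
return [
    // ...
    'sync_metadata' => [
        // How many Attribute rows the SyncAttributeMetadata command
        // processes per chunk. The value 1000 is an assumed default.
        'data_chunk_size' => 1000,
    ],
];
```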



    github-actions bot commented Oct 4, 2024

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Cache Key Collision
    The method cacheKey generates cache keys that might collide if different attributes have similar value suffixes. Consider including a more unique identifier in the cache key.

    Error Handling
    The command lacks error handling for potential failures in the caching process, which could lead to incomplete sync operations without proper logging or user notification.
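
One hedged way to address the cache-key collision observation above is to hash the full attribute value into the key instead of using the raw value suffix. This is a sketch of the suggestion, not what the PR actually implements; only the 'metadata' prefix mirrors the service's static $cacheKey.

```php
<?php

// Collision-resistant variant of cacheKey(): hashing the full value
// guarantees distinct values map to distinct keys, at the cost of
// human-readable cache keys. Illustrative only.
function metadataCacheKey(string $value): string
{
    return 'metadata:' . sha1($value);
}
```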


    github-actions bot commented Oct 4, 2024

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    Possible bug
    Add exception handling to the attribute metadata syncing process to enhance robustness

    Consider handling exceptions that may arise from database queries or cache
    operations to prevent the command from failing unexpectedly.

    src/Commands/SyncAttributeMetadata.php [40-49]

    -$query->chunk(
    -    config('enjin-platform.sync_metadata.data_chunk_size'),
    -    function ($attributes) use ($progress, $service): void {
    -        $attributes->each(function (Attribute $attribute) use ($progress, $service): void {
    -            $attribute->value = HexConverter::hexToString($attribute->value);
    -            $service->fetchAndCache($attribute);
    -            $progress->advance(1);
    -        });
    -    }
    -);
    +try {
    +    $query->chunk(
    +        config('enjin-platform.sync_metadata.data_chunk_size'),
    +        function ($attributes) use ($progress, $service): void {
    +            $attributes->each(function (Attribute $attribute) use ($progress, $service): void {
    +                $attribute->value = HexConverter::hexToString($attribute->value);
    +                $service->fetchAndCache($attribute);
    +                $progress->advance(1);
    +            });
    +        }
    +    );
    +} catch (\Exception $e) {
    +    $this->error('Failed to sync attribute metadata: ' . $e->getMessage());
    +}
    Suggestion importance[1-10]: 8

    Why: Adding exception handling is a valuable improvement as it prevents the command from failing unexpectedly due to database or cache operation errors, enhancing the robustness and reliability of the code.

    Ensure the validity of URLs before using them to generate cache keys

    Validate the URL format of the attribute's value before attempting to generate a
    cache key to avoid potential errors or unexpected behaviors.

    src/Services/Database/MetadataService.php [57-59]

    +if (!filter_var($suffix, FILTER_VALIDATE_URL)) {
    +    throw new \InvalidArgumentException("Invalid URL provided for cache key.");
    +}
     return self::$cacheKey . ':' . $suffix;
    Suggestion importance[1-10]: 7

    Why: Validating URLs before generating cache keys is a good practice to prevent potential errors or unexpected behaviors, contributing to the stability and correctness of the caching mechanism.

    Performance
    Optimize data processing in chunks to handle large datasets more efficiently

    Implement a mechanism to handle large data sets efficiently in the chunk processing
    to avoid memory overflow or performance bottlenecks.

    src/Commands/SyncAttributeMetadata.php [40-49]

    -$query->chunk(
    +$query->chunkById(
         config('enjin-platform.sync_metadata.data_chunk_size'),
         function ($attributes) use ($progress, $service): void {
             $attributes->each(function (Attribute $attribute) use ($progress, $service): void {
                 $attribute->value = HexConverter::hexToString($attribute->value);
                 $service->fetchAndCache($attribute);
                 $progress->advance(1);
             });
         }
     );
    Suggestion importance[1-10]: 7

    Why: Using chunkById instead of chunk can improve performance and memory usage when processing large datasets, making the code more efficient and scalable.

    Possible issue
    Validate cached metadata before use to ensure data integrity

    Ensure that the attribute fetched from the cache is valid and not expired before
    using it, to prevent using stale or incorrect data.

    src/Models/Laravel/Token.php [157]

     $fetchedMetadata = $this->attributes['metadata'] ?? MetadataService::getCache($tokenUriAttribute);
    +if (!$fetchedMetadata) {
    +    $fetchedMetadata = MetadataService::fetchAndCache($tokenUriAttribute);
    +}
    Suggestion importance[1-10]: 6

    Why: Ensuring that cached metadata is valid before use helps maintain data integrity and prevents the use of stale or incorrect data, which is beneficial for the application's reliability.


@leonardocustodio (Member) left a comment

I've not tested it yet, but I wonder if it wouldn't be better to dispatch the requests as jobs instead of using a worker.

And maybe use a "lower priority" queue.

@enjinabner (Contributor Author)

I've not tested it yet, but I wonder if it wouldn't be better to dispatch the requests as jobs instead of using a worker.
And maybe use a "lower priority" queue.

I'm considering using concurrency in Laravel instead of a queue so I can still show the actual progress in the progress bar. Throwing it to the queue is good, but I won't know what actually happened in the queue.

@leonardocustodio (Member)

My concern is making the requests too fast; concurrency could make it even worse.

    I'm thinking about this scenario:

• Let's say a lot of tokens/users store their metadata with the same hosting provider. Or it could even be their own provider, but they have lots of tokens...

    Making too many requests at once could get us flagged as being a "bot/spammer" and rate-limit us or even get us blocked.

I don't think there is an "urgency" to complete this "job" as fast as possible. Making it slower but steadier could be a better approach; I'm also not sure that we need to follow the progress.

Maybe we could store something like "metadata_last_updated_at"?

    Just throwing some ideas here, let me know what you think

@enjinabner (Contributor Author) commented Oct 7, 2024

My concern is making the requests too fast; concurrency could make it even worse.

I'm thinking about this scenario:

• Let's say a lot of tokens/users store their metadata with the same hosting provider. Or it could even be their own provider, but they have lots of tokens...

Making too many requests at once could get us flagged as being a "bot/spammer" and rate-limit us or even get us blocked.

I don't think there is an "urgency" to complete this "job" as fast as possible. Making it slower but steadier could be a better approach; I'm also not sure that we need to follow the progress.

Maybe we could store something like "metadata_last_updated_at"?

Just throwing some ideas here, let me know what you think

That's a good point. However, aside from performance, I want to make sure that the user will always get the updated metadata; that's why it's important to finish it fast. I've also considered metadata_last_updated_at, but I'd still need to read the URL and compare the data, so it's kind of pointless.

Maybe we can consider a webhook that adopters need to call so the platform knows the metadata has been updated; otherwise we'll sync it in a scheduled manner. In that case, I'm okay with syncing it in a slow manner.

@leonardocustodio (Member)

Wouldn't a refreshMetadata mutation solve the issue? Just like in the indexer?
If the adopter knows the metadata has changed, they can call it with the collectionId / tokenId and it will refresh instantly.

@enjinabner (Contributor Author)

Wouldn't a refreshMetadata mutation solve the issue? Just like in the indexer? If the adopter knows the metadata has changed, they can call it with the collectionId / tokenId and it will refresh instantly.

Yeah, creating a mutation is fine as well, as long as the adopter has the ability to refresh it themselves. But we need to mention this in the docs.
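
For reference, such a mutation might be shaped roughly like the fragment below. The operation name and argument names are assumptions inferred from this discussion, not the platform's actual GraphQL schema:

```graphql
# Hypothetical sketch: an adopter-triggered metadata refresh
# for a single token, keyed by collection and token id.
mutation {
  RefreshMetadata(collectionId: "2000", tokenId: "1")
}
```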

    @enjinabner enjinabner merged commit fff121c into master Oct 16, 2024
    7 checks passed
    @enjinabner enjinabner deleted the feature/pla-2023/metadata-caching branch October 16, 2024 00:21