
[PLA-2023] Improve metadata caching #260

Merged 17 commits into master on Oct 16, 2024

Conversation

@enjinabner (Contributor) commented Oct 4, 2024

PR Type

enhancement, configuration changes


Description

  • Added a new command SyncAttributeMetadata to sync attribute metadata to cache, improving performance.
  • Introduced caching in MetadataService with methods fetchAndCache and getCache for efficient metadata retrieval.
  • Updated Token model to utilize cached metadata instead of fetching it directly.
  • Added configuration for attribute metadata syncing, including a data_chunk_size setting.
  • Registered the new command in the CoreServiceProvider.
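
To make the caching pattern described above concrete, here is a minimal, self-contained sketch. The method names (fetchAndCache, getCache) and the "prefix:suffix" cache-key format come from this PR; the in-memory array is a stand-in for Laravel's cache store, and fetch() is a placeholder for the real HTTP request, so treat this as illustrative rather than the actual implementation.

```php
<?php

// Illustrative sketch of the MetadataService caching pattern from this PR.
class MetadataServiceSketch
{
    protected static string $cacheKey = 'metadata';

    /** @var array<string, array> In-memory stand-in for Laravel's cache store. */
    protected array $cache = [];

    // Build the cache key from the attribute's URI value.
    public function cacheKey(string $suffix): string
    {
        return self::$cacheKey . ':' . $suffix;
    }

    // Fetch the metadata and store it under its cache key.
    public function fetchAndCache(string $uri): array
    {
        $metadata = $this->fetch($uri);
        $this->cache[$this->cacheKey($uri)] = $metadata;

        return $metadata;
    }

    // Return the cached metadata, or null on a cache miss.
    public function getCache(string $uri): ?array
    {
        return $this->cache[$this->cacheKey($uri)] ?? null;
    }

    // Placeholder for the real fetch over HTTP.
    protected function fetch(string $uri): array
    {
        return ['uri' => $uri, 'name' => 'Example Token'];
    }
}

$service = new MetadataServiceSketch();
$service->fetchAndCache('https://example.com/token/1.json');
$cached = $service->getCache('https://example.com/token/1.json'); // cache hit
```

With this shape, the Token model can read from getCache() first and only fall back to a network fetch on a miss, which is the change described in the walkthrough below.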

Changes walkthrough 📝

Relevant files

Configuration changes
config/enjin-platform.php: Add configuration for attribute metadata syncing (+12/-0)
  • Added configuration for attribute metadata syncing.
  • Introduced sync_metadata array with data_chunk_size setting.

Enhancement
src/Commands/SyncAttributeMetadata.php: Implement SyncAttributeMetadata command for caching (+53/-0)
  • Created new command SyncAttributeMetadata.
  • Command syncs attribute metadata to cache.
  • Utilizes progress bar for tracking sync progress.

src/CoreServiceProvider.php: Register SyncAttributeMetadata command in service provider (+2/-0)
  • Registered SyncAttributeMetadata command.

src/Models/Laravel/Token.php: Use cached metadata in Token model (+3/-3)
  • Changed metadata fetching to use cache.
  • Replaced MetadataService::fetch with MetadataService::getCache.

src/Services/Database/MetadataService.php: Enhance MetadataService with caching capabilities (+33/-0)
  • Added caching functionality for metadata.
  • Introduced fetchAndCache and getCache methods.
  • Implemented cache key generation.
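
Based on the walkthrough above, the new block in config/enjin-platform.php plausibly looks something like the following. Only the key names (sync_metadata, data_chunk_size) are taken from this PR; the default value shown is an assumption.

```php
<?php

// Hypothetical excerpt of config/enjin-platform.php.
return [
    // ...
    'sync_metadata' => [
        // How many Attribute rows the SyncAttributeMetadata command
        // processes per chunk. The value 1000 is an assumed default.
        'data_chunk_size' => 1000,
    ],
];
```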



    github-actions bot commented Oct 4, 2024

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Cache Key Collision
    The method cacheKey generates cache keys that might collide if different attributes have similar value suffixes. Consider including a more unique identifier in the cache key.

    Error Handling
    The command lacks error handling for potential failures in the caching process, which could lead to incomplete sync operations without proper logging or user notification.
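
One hedged way to address the cache-key collision observation above is to hash the full attribute value into the key instead of using the raw value suffix. This is a sketch of the suggestion, not what the PR actually implements; only the 'metadata' prefix mirrors the service's static $cacheKey.

```php
<?php

// Collision-resistant variant of cacheKey(): hashing the full value
// guarantees distinct values map to distinct keys, at the cost of
// human-readable cache keys. Illustrative only.
function metadataCacheKey(string $value): string
{
    return 'metadata:' . sha1($value);
}
```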


    github-actions bot commented Oct 4, 2024

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    Possible bug
    Add exception handling to the attribute metadata syncing process to enhance robustness

    Consider handling exceptions that may arise from database queries or cache
    operations to prevent the command from failing unexpectedly.

    src/Commands/SyncAttributeMetadata.php [40-49]

    -$query->chunk(
    -    config('enjin-platform.sync_metadata.data_chunk_size'),
    -    function ($attributes) use ($progress, $service): void {
    -        $attributes->each(function (Attribute $attribute) use ($progress, $service): void {
    -            $attribute->value = HexConverter::hexToString($attribute->value);
    -            $service->fetchAndCache($attribute);
    -            $progress->advance(1);
    -        });
    -    }
    -);
    +try {
    +    $query->chunk(
    +        config('enjin-platform.sync_metadata.data_chunk_size'),
    +        function ($attributes) use ($progress, $service): void {
    +            $attributes->each(function (Attribute $attribute) use ($progress, $service): void {
    +                $attribute->value = HexConverter::hexToString($attribute->value);
    +                $service->fetchAndCache($attribute);
    +                $progress->advance(1);
    +            });
    +        }
    +    );
    +} catch (\Exception $e) {
    +    $this->error('Failed to sync attribute metadata: ' . $e->getMessage());
    +}
    Suggestion importance[1-10]: 8

    Why: Adding exception handling is a valuable improvement as it prevents the command from failing unexpectedly due to database or cache operation errors, enhancing the robustness and reliability of the code.

    Ensure the validity of URLs before using them to generate cache keys

    Validate the URL format of the attribute's value before attempting to generate a
    cache key to avoid potential errors or unexpected behaviors.

    src/Services/Database/MetadataService.php [57-59]

    +if (!filter_var($suffix, FILTER_VALIDATE_URL)) {
    +    throw new \InvalidArgumentException("Invalid URL provided for cache key.");
    +}
     return self::$cacheKey . ':' . $suffix;
    Suggestion importance[1-10]: 7

    Why: Validating URLs before generating cache keys is a good practice to prevent potential errors or unexpected behaviors, contributing to the stability and correctness of the caching mechanism.

    Performance
    Optimize data processing in chunks to handle large datasets more efficiently

    Implement a mechanism to handle large data sets efficiently in the chunk processing
    to avoid memory overflow or performance bottlenecks.

    src/Commands/SyncAttributeMetadata.php [40-49]

    -$query->chunk(
    +$query->chunkById(
         config('enjin-platform.sync_metadata.data_chunk_size'),
         function ($attributes) use ($progress, $service): void {
             $attributes->each(function (Attribute $attribute) use ($progress, $service): void {
                 $attribute->value = HexConverter::hexToString($attribute->value);
                 $service->fetchAndCache($attribute);
                 $progress->advance(1);
             });
         }
     );
    Suggestion importance[1-10]: 7

    Why: Using chunkById instead of chunk can improve performance and memory usage when processing large datasets, making the code more efficient and scalable.

    Possible issue
    Validate cached metadata before use to ensure data integrity

    Ensure that the attribute fetched from the cache is valid and not expired before
    using it, to prevent using stale or incorrect data.

    src/Models/Laravel/Token.php [157]

     $fetchedMetadata = $this->attributes['metadata'] ?? MetadataService::getCache($tokenUriAttribute);
    +if (!$fetchedMetadata) {
    +    $fetchedMetadata = MetadataService::fetchAndCache($tokenUriAttribute);
    +}
    Suggestion importance[1-10]: 6

    Why: Ensuring that cached metadata is valid before use helps maintain data integrity and prevents the use of stale or incorrect data, which is beneficial for the application's reliability.


@leonardocustodio (Member) left a comment

I've not tested it yet, but I wonder if it wouldn't be better to dispatch the requests as jobs instead of using a worker.

And maybe use a "lower priority" queue.

@enjinabner (Contributor Author)

I've not tested it yet, but I wonder if it wouldn't be better to dispatch the requests as jobs instead of using a worker.
And maybe use a "lower priority" queue.

I'm considering using concurrency in Laravel instead of a queue so I can still show the actual progress in the progress bar. Throwing it to the queue is good, but I won't know what actually happened in the queue.

@leonardocustodio (Member)

My concern is making the requests too fast; concurrency could make it even worse.

    I'm thinking about this scenario:

• Let's say a lot of tokens/users store their metadata with the same hosting provider. Or it could even be their own provider, but they have lots of tokens...

    Making too many requests at once could get us flagged as being a "bot/spammer" and rate-limit us or even get us blocked.

I don't think there is an "urgency" to complete this "job" as fast as possible. Making it slower but steadier could be a better approach; I'm also not sure that we need to follow the progress.

Maybe we could store something like "metadata_last_updated_at"?

    Just throwing some ideas here, let me know what you think

@enjinabner (Contributor Author) commented Oct 7, 2024

My concern is making the requests too fast; concurrency could make it even worse.

I'm thinking about this scenario:

• Let's say a lot of tokens/users store their metadata with the same hosting provider. Or it could even be their own provider, but they have lots of tokens...

Making too many requests at once could get us flagged as being a "bot/spammer" and rate-limit us or even get us blocked.

I don't think there is an "urgency" to complete this "job" as fast as possible. Making it slower but steadier could be a better approach; I'm also not sure that we need to follow the progress.

Maybe we could store something like "metadata_last_updated_at"?

Just throwing some ideas here, let me know what you think

That's a good point. However, aside from performance, I want to make sure that the user will always get the updated metadata; that's why it's important to finish it fast. I've also considered metadata_last_updated_at, but I'd still need to read the URL and compare the data, so it's kind of pointless.

Maybe we can consider a webhook that adopters need to call so the platform knows the metadata has been updated; otherwise we'll sync it in a scheduled manner. In that case, I'm okay with syncing it in a slow manner.

@leonardocustodio (Member)

Wouldn't a refreshMetadata mutation solve the issue? Just like in the indexer?
If the adopter knows the metadata has changed, they can call it with the collectionId / tokenId and it will refresh instantly.

@enjinabner (Contributor Author)

Wouldn't a refreshMetadata mutation solve the issue? Just like in the indexer? If the adopter knows the metadata has changed, they can call it with the collectionId / tokenId and it will refresh instantly.

Yeah, creating a mutation is fine as well, as long as the adopter has the ability to refresh it themselves. But we need to mention this in the docs.
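
For reference, such a mutation might be shaped roughly like the fragment below. The operation name and argument names are assumptions inferred from this discussion, not the platform's actual GraphQL schema:

```graphql
# Hypothetical sketch: an adopter-triggered metadata refresh
# for a single token, keyed by collection and token id.
mutation {
  RefreshMetadata(collectionId: "2000", tokenId: "1")
}
```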

    @enjinabner enjinabner merged commit fff121c into master Oct 16, 2024
    7 checks passed
    @enjinabner enjinabner deleted the feature/pla-2023/metadata-caching branch October 16, 2024 00:21