Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optionally specify limit for number of entities in a record. #373

Merged
merged 4 commits into from
Dec 3, 2024

Conversation

blackwinter
Copy link
Member

@blackwinter blackwinter commented Oct 10, 2024

Just a proof-of-concept to gauge interest in such a feature; the immediate need has been rendered moot (due to the record's holdings being suppressed from publishing now).

This is a brute-force approach to dealing with OOM situations when Alma records have an excessive number of items (e.g. 99374518570506441: >12000 entities = ~10 GB heap for the Record instance). Loosely modeled after JAXP's totalEntitySizeLimit, except that we don't abort processing but instead skip the remaining entities.

Use Metafix instance setter setMaxEntityCount(int) or set system property org.metafacture.metafix.maxEntityCount=<int>.

Alternative options:

  • Increase maximum heap size for JVM.
  • Significantly reduce memory requirement for Record instances.

Possible optimizations (implemented in dc4f1af):

  1. Drop the setter and make the limit static final; this way the JVM can (potentially) eliminate the check altogether when no limit has been set (less than zero).
  2. Skip all remaining literals (instead of just the ones inside entities), thereby simplifying the check in literal().

@dr0i dr0i self-requested a review October 14, 2024 09:54
This is a brute-force approach to dealing with OOM situations when Alma records have an excessive number of items (e.g. 99374518570506441: >12000 entities = ~10 GB heap for the Record instance).

Use Metafix instance setter `setMaxEntityCount(int)` or set system property `org.metafacture.metafix.maxEntityCount=<int>`.

Alternative options:

- Increase maximum heap size for JVM.
- Significantly reduce memory requirement for Record instances.
Copy link
Member

@dr0i dr0i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good. It prevents possible flooding, as we already experienced. So I like this feature.

@blackwinter blackwinter marked this pull request as ready for review November 14, 2024 15:56
@blackwinter blackwinter requested a review from fsteeg November 14, 2024 15:56
@blackwinter blackwinter assigned fsteeg and unassigned blackwinter Nov 14, 2024
Copy link
Member

@fsteeg fsteeg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but should probably include a test.

@fsteeg fsteeg assigned blackwinter and unassigned fsteeg Nov 15, 2024
@blackwinter
Copy link
Member Author

Regarding test: As currently implemented, the feature cannot be exercised under test conditions because the limit is already set (system property already evaluated) before the test gets the chance to influence it. Either change the implementation to make it testable (keeping performance implications in mind) or skip test coverage in this case (assuming that this feature is only used in extremely rare edge cases).

@blackwinter
Copy link
Member Author

Adding a unit test proved to be quite messy when attempting to keep the instance member variable final. Added an integration test instead (a10b3d8). This has the added benefit that it actually exercises the retrieval of the system property.

@blackwinter blackwinter merged commit 8b65b6e into master Dec 3, 2024
1 check passed
@blackwinter blackwinter deleted the maxEntityCount branch December 3, 2024 15:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants