[HUDI-8534] Ensure index creation is idempotent in face of failures #12308

Open
wants to merge 9 commits into base: master

Conversation

lokeshj1703
Contributor

Change Logs

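We need to ensure that a user can run CREATE INDEX with the same index name repeatedly: attempts that fail (for example, because of wrong syntax or parameters) must not prevent a later attempt with the right syntax/parameters from succeeding. In the session below, the second attempt uses the corrected type bloom_filters, yet the error still reports bloom_filter from the first attempt.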

spark-sql (default)> create index idx_bloom on hudi_table using bloom_filter(city) options(func='lower');

24/11/16 09:58:44 ERROR SparkSQLDriver: Failed in [create index idx_bloom on hudi_table using bloom_filter(city) options(func='lower')]
java.lang.IllegalArgumentException: The value of hoodie.functional.index.type should be one of COLUMN_STATS,BLOOM_FILTERS,SECONDARY_INDEX, but was bloom_filter

spark-sql (default)> create index idx_bloom on hudi_table using bloom_filters(city) options(func='lower');
24/11/16 09:59:11 ERROR SparkSQLDriver: Failed in [create index idx_bloom on hudi_table using bloom_filters(city) options(func='lower')]
java.lang.IllegalArgumentException: The value of hoodie.functional.index.type should be one of COLUMN_STATS,BLOOM_FILTERS,SECONDARY_INDEX, but was bloom_filter

This PR ensures that the metadata related to an index is deleted if its creation fails. To do this, we call drop on the corresponding index whenever creation fails; the drop call removes any metadata already written for that index.
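The cleanup-on-failure pattern described above can be sketched as follows. This is a minimal illustration, not the code in this PR: `IndexRegistry`, `createIndex`, and `build` are hypothetical names, the registry is an in-memory stand-in for the table's persisted index metadata, and the validation in `build` only mimics the error shown in the log. The sketch assumes the index definition is recorded before the build step runs, so a failed build would otherwise leave the name registered.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class IdempotentIndexCreation {
    // Allowed index types, copied from the error message in the log above.
    static final Set<String> VALID_TYPES =
            Set.of("COLUMN_STATS", "BLOOM_FILTERS", "SECONDARY_INDEX");

    /** Hypothetical in-memory stand-in for persisted index metadata. */
    static final class IndexRegistry {
        private final Map<String, String> definitions = new HashMap<>();

        void register(String name, String type) { definitions.put(name, type); }
        void drop(String name) { definitions.remove(name); }
        boolean contains(String name) { return definitions.containsKey(name); }
    }

    /** Registers the index, then builds it; drops the registration if the build fails. */
    static void createIndex(IndexRegistry registry, String name, String type) {
        if (registry.contains(name)) {
            throw new IllegalStateException("Index already exists: " + name);
        }
        // Assumption: metadata is written before the index is built.
        registry.register(name, type);
        try {
            build(type);
        } catch (RuntimeException e) {
            // Undo the partial state so a retry with the same name starts clean.
            registry.drop(name);
            throw e;
        }
    }

    /** Hypothetical build step; here it only validates the index type. */
    static void build(String type) {
        if (!VALID_TYPES.contains(type.toUpperCase(Locale.ROOT))) {
            throw new IllegalArgumentException(
                    "Index type should be one of " + VALID_TYPES + ", but was " + type);
        }
    }
}
```

With this shape, a call such as `createIndex(registry, "idx_bloom", "bloom_filter")` fails and leaves nothing registered, so a follow-up call with `"bloom_filters"` under the same name can succeed. Without the `drop` in the catch block, the retry would hit the leftover registration.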

Impact

NA

Risk level (write none, low, medium or high below)

low

Documentation Update

NA

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

lokeshj1703 and others added 9 commits November 21, 2024 16:03
 - Take the new tests in TestSecondaryIndexSupport; query predicate verification asserts still need to be added.
 - See the changes in both production files, around fixing one-off errors and the wrong file path/file name used to build the bloom index.
   - This needs to be done in a clean way.
 - See the changes in TestFunctionalIndex that unignore it and get it passing. Take these changes.
 - Ignore my comments on memory management for now.
@hudi-bot

CI report:

Bot commands

@hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

github-actions bot added the label size:L (PR with lines of changes in (300, 1000]) on Nov 21, 2024
4 participants