
Add a configurable bufferPeriod between when a segment is marked unused and deleted by KillUnusedSegments duty #12599

Merged — 50 commits merged into apache:master on Aug 18, 2023

Conversation

@capistrant (Contributor) commented Jun 2, 2022

This PR addresses part of the proposal in #12526

Description

Made an incompatible change to the druid_segments table

Added a new column, used_flag_last_updated VARCHAR(255), to the druid_segments table. When creating the druid_segments table from scratch, this column is not nullable. However, to more easily facilitate an upgrade to a version of Druid with this code, the migration steps that ALTER druid_segments allow the column to have null values. Perhaps we can document an optional post-upgrade step to ALTER the table to disallow nulls, contingent on first completing the optional post-upgrade step of populating used_flag_last_updated for already-unused rows. The coordinator will incrementally update these nulls to valid date strings for all already-unused segments post-upgrade, which automatically brings the metastore into compliance with the new version of Druid.

This column holds a UTC date string recording the last time the used column was modified.
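For illustration, the upgrade-path migration amounts to something like the following DDL (a MySQL-flavored sketch; the real statement is generated by SQLMetadataConnector and varies by metadata store):

```sql
-- Sketch of the upgrade-path ALTER: the column is added as nullable so
-- that pre-existing rows do not block the migration.
ALTER TABLE druid_segments
    ADD COLUMN used_flag_last_updated VARCHAR(255);
```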

Added a new configuration in the druid.coordinator.kill.* family: druid.coordinator.kill.bufferPeriod

druid.coordinator.kill.bufferPeriod is a Duration defining how long a segment must have been unused before KillUnusedSegments may kill it. For example, with the default of PT24H, if I mark segment X unused at 2022-06-01T00:05:00.000Z, then by rule this segment cannot be killed until 2022-06-02T00:05:00.000Z at the earliest.
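As a sketch, the relevant coordinator runtime.properties might look like the following (the P30D value is illustrative; druid.coordinator.kill.on is the pre-existing toggle for the kill duty):

```properties
# Enable the kill duty and give operators a 30-day window to recover
# segments that were marked unused by mistake.
druid.coordinator.kill.on=true
druid.coordinator.kill.bufferPeriod=P30D
```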

The most prominent use of this new configuration is to prevent accidental deletion of data from Druid. It can be thought of like the trash folder in HDFS: marking segments unused is the rm. Without trash, the data is just gone; with trash, we can recover it. The buffer period is the amount of time before pending trash is sent away for good.

Added a single-threaded Executor on the coordinator for updating null values of the new column

Segments that are already unused at the time of the upgrade will have a null value for used_flag_last_updated, and in this state they will never be killed. The Coordinator therefore starts a background thread that updates these null values for segments already marked unused (segments with null values that are still used don't need a valid date string for this column; they will get one if and when they are marked unused).
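A minimal sketch of the backfill that thread performs, assuming MySQL syntax and a hypothetical batch size (the actual batching logic lives in the coordinator):

```sql
-- Executed repeatedly until no rows match. :now is the current UTC time
-- rendered as a date string, e.g. '2023-08-18T00:00:00.000Z'.
UPDATE druid_segments
SET used_flag_last_updated = :now
WHERE used = false
  AND used_flag_last_updated IS NULL
LIMIT 100;
```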

Alternative Design

Alternatively, I could have embedded used_flag_last_updated within the payload column. The end result would be functionally the same, but I have listed some pros and cons below.

Pros

  • No table schema change needed to upgrade Druid, making life easier for an operator.

Cons

  • If you were adding this feature in the initial pre-release development of Druid, you'd probably steer away from accumulating too much in the payload column and instead create native columns in the database table.
    • Embedding it in the payload almost hides it, or at least makes it harder for others to discover and understand.
  • You need to pull the payload back in the query for unused segments, increasing DB I/O, network traffic, and Druid memory and compute demands.
  • You need to process the payloads in Druid code instead of simply filtering on the column value as you could if it were native to the metastore table.

Upgrade Plan

Requiring a database table alteration is a breaking change for any cluster whose coordinator doesn't have DDL permission on startup. If there are no such privileges, the operator of the cluster will need to handle adding the column in order to upgrade to 0.24+. I have done my best to limit the complexity of upgrading and provide as much assistance as possible to the Druid operators out there.

No work necessary case

There is a case where no extra work is necessary for the Druid operator to upgrade. If the operator has kept the default of true for druid.metadata.storage.connector.createTables, and their metastore user has DDL privileges, the table will automatically be altered at startup if the used_flag_last_updated column is missing from the druid_segments table.

The coordinator and overlord will start up just fine, and all future changes to the used column (plus new segment creation) will populate used_flag_last_updated. Existing segments will have a null value for used_flag_last_updated after the ALTER. The segment killing mechanism ignores rows with a null used_flag_last_updated, so such segments cannot be killed; for rows that are already unused, the Coordinator will eventually populate the column with a real date string, thus starting the bufferPeriod clock.
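Conceptually, the kill duty's search for killable segments then looks something like this (sketch only; :maxLastUpdated is a hypothetical parameter equal to now minus bufferPeriod, and ISO-8601 UTC date strings compare correctly as strings):

```sql
-- Rows with a NULL used_flag_last_updated are skipped until the
-- coordinator backfills them, so they are never killed prematurely.
SELECT payload
FROM druid_segments
WHERE used = false
  AND used_flag_last_updated IS NOT NULL
  AND used_flag_last_updated <= :maxLastUpdated;
```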

Some work necessary case

If the operator has either:

  • set druid.metadata.storage.connector.createTables to false
  • not given their metastore user DDL privs

they will need to execute an ALTER statement themselves before the upgrade. To make things easy for them, I have created a new Druid CLI tool called UpdateTables that can perform this for them by executing the same alter-table code path as the no-work-necessary case described above. This tool, as well as the actual ALTER command in case the operator wants to run it themselves, is documented in a new upgrade-prep.md resource.

Optional post upgrade action

If the operator is interested in populating all of their legacy unused segments with a used_flag_last_updated date string following the upgrade, I have included another action in the new UpdateTables CLI tool that will do so. Again, I have documented the tool and provided the actual UPDATE statement that could be run manually. This step is completely optional and only needed if the operator wants their legacy unused segments to come under the purview of the new bufferPeriod config post-upgrade.

This is no longer required now that the coordinator handles populating used_flag_last_updated for segments that were already unused at the time of the upgrade.


Key changed/added classes in this PR
  • MetadataStorageConnector - added 2 methods to the interface
  • IndexerSQLMetadataStorageCoordinator - create new segments with the used_flag_last_updated column populated
  • SQLMetadataConnector - code for altering and validating the druid_segments table
  • SQLMetadataSegmentPublisher - publish segments with the used_flag_last_updated column populated
  • SegmentsMetadataManager - updated signature of an interface method
  • SqlSegmentsMetadataManager - manage and use the used_flag_last_updated column properly
  • SqlSegmentsMetadataQuery - manage used_flag_last_updated properly
  • DruidCoordinatorConfig - new config
  • Main - wire up the new CLI tool
  • UpdateTables - new CLI tool

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@gianm (Contributor) commented Jun 14, 2022

#12404 (currently-open PR) also contemplates doing schema changes in the metadata store. @capistrant I see you've commented in there too. Will the feature in this PR be able to use that other PR's mechanism? (Once it's merged?)

@gianm (Contributor) commented Jun 14, 2022

Alternatively, should #12404 use the schema migration mechanism from this patch? I haven't reviewed both closely. I'm not sure which mechanism is preferable. Curious what @AmatyaAvadhanula and @imply-cheddar and @kfaraz think too.

@gianm (Contributor) commented Jun 14, 2022

As to the main change: what do you think of calling the new column last_updated? IMO, that's clearer, since last_used sounds like it might be the last time the segment was loaded or queried or something like that.

Generally, this change would be very welcome. Measuring time-to-kill from the last update time makes more sense than measuring it from the data interval.

@capistrant (Contributor, Author) commented Jun 15, 2022

> As to the main change: what do you think of calling the new column last_updated? IMO, that's clearer, since last_used sounds like it might be the last time the segment was loaded or queried or something like that.
>
> Generally, this change would be very welcome. Measuring time-to-kill from the last update time makes more sense than measuring it from the data interval.

I'm certainly open to different naming. last_updated spooks me because there could be a future where we change a row for some reason other than flipping the used flag. Perhaps used_last_updated or [marked_]unused_since (although the latter might require us to re-think what the timestamp means when used == true).

@capistrant (Contributor, Author) commented:

> #12404 (currently-open PR) also contemplates doing schema changes in the metadata store. @capistrant I see you've commented in there too. Will the feature in this PR be able to use that other PR's mechanism? (Once it's merged?)

My plan is to reuse as much of that PR's process as is feasible. Our approaches are nearly identical for the actual alteration of the table on startup of the controlling service (overlord for that one, coordinator for mine). The difference really emerges in the migration process. The linked PR is more complicated than mine because it is replacing existing always-on functionality, which requires them to be deliberate about how they transition from querying SQL the old way to the new way: they background a thread on startup that incrementally brings the table up to spec by populating rows. I could certainly implement a similar approach instead of having the operator optionally make their existing unused segments compatible with the buffer period via a Druid CLI tool or an ad hoc SQL statement.

Long story short, I don't think there is anything 12404 would or should take from my implementation. However, there is certainly shared code that I can take from them once it is merged. I will contemplate the idea of a background thread that brings the table up to spec; it would be pretty straightforward: if there are no unused rows with a null last_used, exit immediately; otherwise, use 12404's incremental batching process to update them with the current date until there are no null values in the column. That would eliminate overhead and confusion for our operators.

@capistrant (Contributor, Author) commented:

Todo reminder for myself: evaluate the feasibility of an integration test for the UpdateTables CLI tool.

@abhishekagarwal87 (Contributor) commented:

@capistrant - are you still working on this PR? Asking since this is a really useful feature. And after this, we can enable the auto-killing of unused segments.

@capistrant (Contributor, Author) commented:

> @capistrant - are you still working on this PR? Asking since this is a really useful feature. And after this, we can enable the auto-killing of unused segments.

@abhishekagarwal87 yes, I am still very much interested in getting this merged upstream. I have continued to try and keep it up to date with master. I see that I have some recent conflicts to resolve; I will try to fit that in this week.

@abhishekagarwal87 (Contributor) commented:

@capistrant - Just checking to see if you are still planning to take this through. If not, are you open to someone else picking it up?

@jon-wei (Contributor) commented Jul 25, 2023

@capistrant - Checking again to see if you're interested in picking this up again; if not, would you mind if I picked it up?

@jon-wei (Contributor) commented Aug 14, 2023

@capistrant Thanks for the update, taking a look at this today

@jon-wei (Contributor) left a review


@maytasm mentioned this pull request Aug 16, 2023
@maytasm (Contributor) left a review:


LGTM

@maytasm (Contributor) commented Aug 17, 2023

@jon-wei @capistrant Is this PR good to merge?

@jon-wei, I tried clicking on https://github.com/apache/druid/pull/12599/files#r940734962 in your previous comment, but I'm not seeing any comment when I open the link.

@jon-wei merged commit 9c124f2 into apache:master on Aug 18, 2023 (74 checks passed)
@kfaraz mentioned this pull request Aug 18, 2023
kfaraz added a commit that referenced this pull request Aug 18, 2023
Follow up changes to #12599

Changes:
- Rename column `used_flag_last_updated` to `used_status_last_updated`
- Remove new CLI tool `UpdateTables`.
  - We already have a `CreateTables` with similar functionality, which should be able to handle update cases too.
  - Any user running the cluster for the first time should either just have `connector.createTables` enabled or run `CreateTables`, which should create tables at the latest version.
  - For instance, the `UpdateTables` tool would be inadequate when a new metadata table has been added to Druid, and users would have to run `CreateTables` anyway.
- Remove `upgrade-prep.md` and include that info in `metadata-init.md`.
- Fix log messages to adhere to Druid style
- Use lambdas
@LakshSingla added this to the 28.0 milestone Oct 12, 2023
@maytasm (Contributor) commented Jan 23, 2024

Should the used_flag_last_updated column be part of the index in the segment table?
