Add a configurable bufferPeriod between when a segment is marked unused and deleted by KillUnusedSegments duty #12599
Conversation
#12404 (currently-open PR) also contemplates doing schema changes in the metadata store. @capistrant I see you've commented in there too. Will the feature in this PR be able to use that other PR's mechanism? (Once it's merged?)
Alternatively, should #12404 use the schema migration mechanism from this patch? I haven't reviewed both closely, so I'm not sure which mechanism is preferable. Curious what @AmatyaAvadhanula and @imply-cheddar and @kfaraz think too.
As to the main change: what do you think of calling the new column? Generally, this change would be very welcome. Measuring time-to-kill from the last update time makes more sense than measuring it from the data interval.
I'm certainly open to different naming.
My plan is to use as much of this PR's process as is feasible. Our approaches are nearly identical for the actual alteration of the table on startup of the controlling service (overlord for that one, coordinator for mine). The difference really emerges in the migration process. The linked PR is more complicated than mine because it is replacing existing always-on functionality, which requires them to be deliberate about how they transition from querying SQL the old way to the new way. They background a thread on startup that incrementally brings the table up to spec by populating rows. I could certainly implement a similar approach instead of allowing the operator to optionally make their existing unused segments compatible with the buffer period by using a Druid CLI tool or an ad hoc SQL statement. Long story short, I don't think there is anything #12404 would/should take from my implementation. However, there is certainly shared code that I can take from them once they are merged. I will contemplate the idea of a background thread that brings the table up to spec; it would really be pretty straightforward. If there are no unused rows with null
Todo reminder for myself: evaluate feasibility of an integration test of the UpdateTable.java CLI tool
@capistrant - are you still working on this PR? Asking since this is a really useful feature. And after this, we can enable the auto-killing of unused segments.
@abhishekagarwal87 yes, I am still very much interested in getting these merged upstream. I have continued to try and keep it up to date with master. I see that I have some recent conflicts that I have to resolve. I will try to fit that in this week.
@capistrant - Just checking to see if you are still planning to take this through. If not, are you open to someone else picking it up?
@capistrant - Checking again to see if you're interested in picking this up again; if not, would you mind if I picked it up?
@capistrant Thanks for the update, taking a look at this today
@capistrant LGTM, can you address https://github.com/apache/druid/pull/12599/files#r940734962
LGTM
@jon-wei @capistrant Is this PR good to merge? @jon-wei, I tried clicking on https://github.com/apache/druid/pull/12599/files#r940734962 in your previous comment but I'm not seeing any comment when I open the link.
Follow-up changes to #12599. Changes:
- Rename column `used_flag_last_updated` to `used_status_last_updated`
- Remove new CLI tool `UpdateTables`.
  - We already have a `CreateTables` with similar functionality, which should be able to handle update cases too.
  - Any user running the cluster for the first time should either just have `connector.createTables` enabled or run `CreateTables`, which should create tables at the latest version.
  - For instance, the `UpdateTables` tool would be inadequate when a new metadata table has been added to Druid, and users would have to run `CreateTables` anyway.
- Remove `upgrade-prep.md` and include that info in `metadata-init.md`.
- Fix log messages to adhere to Druid style
- Use lambdas
Should the `used_flag_last_updated` column be part of the index in the segment table?
This PR addresses part of the proposal in #12526
Description
Made an incompatible change to the druid_segments table

Added a new column, `used_flag_last_updated VARCHAR(255)`, to the druid_segments table. When creating the druid_segments table from scratch, this column is not nullable. However, to more easily facilitate an upgrade to a version of Druid with this code, the migration steps that ALTER druid_segments allow the column to have null values. The coordinator will incrementally update these nulls to valid date strings for all already-unused segments post-upgrade, which automatically brings the metastore into compliance with the new version of Druid. This column is a UTC date string corresponding to the last time that the `used` column was modified.

Added a configuration to the `druid.coordinator.kill.` family: `druid.coordinator.kill.bufferPeriod`
`druid.coordinator.kill.bufferPeriod` is a Duration that defines how long a segment must have been unused before KillUnusedSegments will potentially kill it. For example, with the default of `PT24H`: if I mark a segment X unused at `2022-06-01T00:05:00.000Z`, then by rule this segment cannot be killed until at or after `2022-06-02T00:05:00.000Z`. The most prominent use of this new configuration is to prevent deletion of data from Druid by mistake. It can be thought of like the trash folder in HDFS: marking segments unused is the `rm`. Without trash, the data is just gone; with trash, we can recover. The period is the amount of time before pending trash is sent away for good.
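The buffer-period rule above can be sketched with plain timestamp arithmetic. This is a minimal illustration, not Druid's actual implementation; the helper name `killable_at` is an assumption introduced here:

```python
from datetime import datetime, timedelta, timezone

# druid.coordinator.kill.bufferPeriod defaults to PT24H (24 hours).
BUFFER_PERIOD = timedelta(hours=24)

def killable_at(marked_unused_at: datetime) -> datetime:
    """Earliest instant at which KillUnusedSegments may kill the segment."""
    return marked_unused_at + BUFFER_PERIOD

# Segment X marked unused at 2022-06-01T00:05:00Z ...
marked = datetime(2022, 6, 1, 0, 5, tzinfo=timezone.utc)
earliest_kill = killable_at(marked)
# ... cannot be killed until 24 hours later.
print(earliest_kill.isoformat())  # 2022-06-02T00:05:00+00:00
```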
Added a single-threaded Executor on the coordinator for updating null values of the new column

Segments that are already unused at the time of the upgrade will have a null value for `used_flag_last_updated`. In this state, these segments will never be killed. The Coordinator will start a background thread to update all of these null values for segments that are already marked unused (segments with null values that are still used don't need a valid date string for this column; they will get one if/when they are marked unused).
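The backfill behavior described above can be sketched against an in-memory SQLite database. The table layout, batch size, and function name are assumptions for illustration; Druid's real metadata stores are MySQL, PostgreSQL, or Derby:

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE druid_segments (
    id TEXT PRIMARY KEY,
    used INTEGER NOT NULL,
    used_flag_last_updated TEXT)""")
conn.executemany(
    "INSERT INTO druid_segments VALUES (?, ?, ?)",
    [("seg1", 0, None),   # unused pre-upgrade: null timestamp, never killed as-is
     ("seg2", 1, None),   # still used: null is fine until it is marked unused
     ("seg3", 0, "2022-06-01T00:05:00.000Z")])  # already stamped

def backfill_batch(conn, batch_size=100):
    """One pass of the coordinator's background backfill: stamp unused
    rows whose used_flag_last_updated is still null. Returns rows updated."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z")
    cur = conn.execute(
        "SELECT id FROM druid_segments "
        "WHERE used = 0 AND used_flag_last_updated IS NULL LIMIT ?",
        (batch_size,))
    ids = [row[0] for row in cur]
    conn.executemany(
        "UPDATE druid_segments SET used_flag_last_updated = ? WHERE id = ?",
        [(now, seg_id) for seg_id in ids])
    return len(ids)

backfill_batch(conn)  # stamps seg1 only; seg2 is still used, seg3 already stamped
```

Running in batches keeps each metadata-store transaction small, which matters when a long-lived cluster has accumulated many unused rows.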
Alternative Design

Alternatively, I could have embedded `last_used` within the segment payload. The functionality at the end of the day would be the same, but I have listed some pros and cons below.
Pros
Cons
Upgrade Plan
Requiring a database table alteration is a breaking change for any cluster whose coordinator doesn't have DDL permission on startup. If there are no such privileges, the operator of the cluster will need to handle adding the column in order to update to 0.24+. I have done my best to limit the complexity of upgrading and provide as much assistance as possible to the Druid operators out there.
No work necessary case
There is a case where there is no extra work necessary for the Druid operator to upgrade. If the operator has used the default of `true` for `druid.metadata.storage.connector.createTables`, and their metastore user has DDL privileges, the table will automatically be altered at startup if the `used_flag_last_updated` column is missing from the druid_segments table. The coordinator and overlord will start up just fine, and all future changes to the `used` column (plus new segment creation) will populate `used_flag_last_updated`. For existing segments that match `used = true`, the value of `used_flag_last_updated` will be `null`. The segment killing mechanism will ignore these rows with null `used_flag_last_updated`. The Coordinator will eventually populate the column with a real date string, thus starting the bufferPeriod clock.
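The kill-eligibility filter implied above can be sketched against SQLite (schema and cutoff handling are assumptions for illustration): rows with a null `used_flag_last_updated` are skipped entirely, and stamped rows qualify only once the buffer period has elapsed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE druid_segments (
    id TEXT PRIMARY KEY,
    used INTEGER NOT NULL,
    used_flag_last_updated TEXT)""")
conn.executemany(
    "INSERT INTO druid_segments VALUES (?, ?, ?)",
    [("old_null", 0, None),                        # pre-upgrade row: ignored until backfilled
     ("recent", 0, "2022-06-02T09:00:00.000Z"),    # still inside the buffer period
     ("expired", 0, "2022-06-01T00:05:00.000Z")])  # buffer period has elapsed

# "now" minus bufferPeriod (PT24H); zero-padded ISO-8601 strings compare
# correctly as plain text, so no date parsing is needed in the query.
cutoff = "2022-06-02T00:05:00.000Z"
killable = [row[0] for row in conn.execute(
    "SELECT id FROM druid_segments "
    "WHERE used = 0 "
    "AND used_flag_last_updated IS NOT NULL "
    "AND used_flag_last_updated <= ?", (cutoff,))]
print(killable)  # ['expired']
```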
Some work necessary case

If the operator has either:
- set `druid.metadata.storage.connector.createTables` to false

They will need to execute an ALTER statement themselves before the upgrade. To make things easy for them, I have created a new Druid CLI tool called UpdateTables that can perform this for them by executing the same ALTER TABLE code path as the no-work-necessary case described above. This tool, as well as the actual ALTER command in case the operator wants to run it themselves, is documented in a new upgrade-prep.md resource.
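The manual migration path boils down to an idempotent "add the column if it is missing" step. A hedged sketch against SQLite follows; the real DDL varies by metadata store and lives in the PR's SQLMetadataConnector changes, and the `column_names` helper is an assumption here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE druid_segments (id TEXT PRIMARY KEY, used INTEGER NOT NULL)")

def column_names(conn, table):
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

# Same idea as the automatic startup path: add the column only if missing.
# It is nullable on purpose, so existing rows remain valid after the upgrade.
if "used_flag_last_updated" not in column_names(conn, "druid_segments"):
    conn.execute(
        "ALTER TABLE druid_segments "
        "ADD COLUMN used_flag_last_updated VARCHAR(255)")

print(column_names(conn, "druid_segments"))
# ['id', 'used', 'used_flag_last_updated']
```

Guarding the ALTER behind the column check makes the step safe to run more than once, which is useful when an operator is unsure whether the automatic path already applied it.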
Optional post upgrade action
If the operator is interested in populating all of their legacy unused segments with a last_used date string following the upgrade, I have included another action in the new UpdateTables cli tool that will do so. Again, I have documented the tool and provided the actual UPDATE statement that they could use manually. This step is completely optional and only needed if the operator desires that their legacy unused segments come under the purview of the new bufferPeriod config post-upgrade.This is no longer required now that the coordinator will handle populating
used_flag_last_updated
with already unused segments at the time of the upgrade.Key changed/added classes in this PR
This PR has: