sql: report drop and truncate in jobs table #19004
Comments
tentatively adding 1.2 milestone.
odd... I guess I always assumed those were considered schema changes... off the top of my head, I don't know what we would take off to do this, so I think a "tentative" add sounds good. I'll follow up once I figure out what capacity looks like. Thanks for filing.
+1 on this. From a logical point of view, all SQL statements belong in either SHOW QUERIES or SHOW JOBS. Since the other schema changes show up in SHOW JOBS, that seems like a reasonable place to put DROP and TRUNCATE.
Seems okay. The only difference from other schema changes is that DROP and TRUNCATE are complete as far as the user is concerned (the table is gone or the table is empty). In other words, the schema change is complete. The surprise comes from the deleted data being GC-ed in the background after the schema change, while users expect disk usage to go down significantly right after the operation.
In our production testing, where clusters are bound to run out of disk space eventually, we've found that the current functionality is insufficient. It turns out that There is likely performance that we need to optimize, but knowing how far the actual deletion has progressed is a baseline that doesn't seem too difficult to achieve. See also #19329 (comment) where we discuss using
This raises the question of why we count certain commands as schema changes while others are not. We currently define them based on whether or not the operation requires backfilling, but I've been getting feedback (and I'm personally confused) that this matches the technical implementation, not how users would expect things to be classified.
@dianasaur323 commands that change the schema are called schema changes. Some of these commands need to backfill data and thus take time to execute. In the case of DROP, once SQL stops referencing the data, SQL responds to the user that it has completed the schema change, which it indeed has. The only problem is that users also expect the command to delete all the data associated with the table, which we do asynchronously. We do the same with TRUNCATE, where we don't delete all the underlying data immediately, but just provide the illusion via SQL that it's gone.
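The split described above, where SQL reports the DROP as complete while the data lingers on disk, can be sketched as a toy model. This is not CockroachDB code: the `store` type and its `create`, `drop`, `gcOne`, and `diskUsage` methods are hypothetical names chosen for illustration, and background deletion is modeled as an explicit call rather than a real asynchronous process.

```go
package main

import "fmt"

// store is a toy model, not CockroachDB code: it separates what SQL can
// resolve from what still occupies disk.
type store struct {
	visible map[string]bool  // table names SQL can still reference
	bytes   map[string]int64 // data still on disk, keyed by table name
	pending []string         // dropped tables awaiting background deletion
}

func newStore() *store {
	return &store{visible: map[string]bool{}, bytes: map[string]int64{}}
}

// create registers a table holding size bytes of data.
func (s *store) create(name string, size int64) {
	s.visible[name] = true
	s.bytes[name] = size
}

// drop hides the table from SQL and returns right away. The data stays on
// disk until gcOne processes the table.
func (s *store) drop(name string) error {
	if !s.visible[name] {
		return fmt.Errorf("table %q does not exist", name)
	}
	delete(s.visible, name)
	s.pending = append(s.pending, name)
	return nil
}

// gcOne deletes the data of one pending table and reports whether it did
// any work.
func (s *store) gcOne() bool {
	if len(s.pending) == 0 {
		return false
	}
	name := s.pending[0]
	s.pending = s.pending[1:]
	delete(s.bytes, name)
	return true
}

// diskUsage sums the bytes still held on disk.
func (s *store) diskUsage() int64 {
	var total int64
	for _, b := range s.bytes {
		total += b
	}
	return total
}

func main() {
	s := newStore()
	s.create("kv", 1<<30)
	if err := s.drop("kv"); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("after DROP, bytes on disk:", s.diskUsage())
	s.gcOne()
	fmt.Println("after GC, bytes on disk:", s.diskUsage())
}
```

In this model the user-visible state changes at `drop`, but disk usage only changes at `gcOne`, which is the gap in visibility this issue is about.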
@vivekmenezes On that point, the reason I'm asking is that the jobs table only shows progress for schema changes that require backfilling, hence the confusion. Both the things you mentioned (DROP and TRUNCATE) have confused some users out in the field. I'm wondering if we should do something about that?
We did it this way because users were complaining about DROP/TRUNCATE being too slow. The correct solution is to drop this data quickly, which we can't accomplish in the near term. There is also the added concern that users need to query the old data using AS OF SYSTEM TIME, which prevents us from just blowing away the ranges. I agree we need to do something.
I think we want to rely on I'd really just like to track the status of the deletion (which is a kind of backfill) in the jobs tab. I think it's OK that
Motivated by the fact that previously, `DROP TABLE` did not allow any kind of introspection. Now, at least there's something you can look at in `/debug/requests` to see activity. Touches cockroachdb#19004 Release note: None
I'm tentatively marking this as 1.3 right now. I don't think we have capacity to fix this now, and I think it might make sense to wrap it into some jobs work that is going to tentatively get scheduled for 1.3...
Creates a job for truncate and drop table statements, which is completed when the GC TTL expires and the table data and ID are deleted. Fixes cockroachdb#19004 Release note: None
Creates a job for statements involving dropping or truncating tables, including DROP DATABASE. The job is completed when the GC TTL expires and both table data and ID are deleted for each of the tables involved. Fixes cockroachdb#19004 Release note: None
Creates a job for statements involving dropping or truncating tables, including DROP DATABASE. The job is completed when the GC TTL expires and both table data and ID are deleted for each of the tables involved. Detailed running statuses are added to provide visibility into the progress of dropping or truncating tables. This is surfaced by adding an additional status field to the payload proto of jobs, which is concatenated to the running status when populating the internal jobs table. For dropping or truncating jobs, the detailed running status is determined by the status of the table at the earliest stage of the schema change. Fixes cockroachdb#19004 Release note: None
29993: sql: create jobs for truncated and dropped tables r=eriktrinh a=eriktrinh Creates a job for statements involving dropping or truncating tables, including DROP DATABASE. The job is completed when the GC TTL expires and both table data and ID are deleted for each of the tables involved. Detailed running statuses are added to provide visibility into the progress of dropping or truncating tables. This is surfaced by adding an additional status field to the payload proto of jobs, which is concatenated to the running status when populating the internal jobs table. For dropping or truncating jobs, the detailed running status is determined by the status of the table at the earliest stage of the schema change. Fixes #19004 Co-authored-by: Erik Trinh <[email protected]>
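The "earliest stage" rule in the description above can be illustrated with a small sketch. The stage names, their ordering, the `tableStage` type, and the `"running: <detail>"` output format are all assumptions made for this example; the actual status strings and proto field in the merged change may differ.

```go
package main

import "fmt"

// tableStage is a hypothetical ordering of the stages a dropped or
// truncated table passes through. Lower values are earlier stages.
type tableStage int

const (
	stageDrainingNames tableStage = iota
	stageWaitingForGCTTL
	stageDeletingData
	stageDone
)

// String returns an illustrative label for the stage.
func (s tableStage) String() string {
	switch s {
	case stageDrainingNames:
		return "draining names"
	case stageWaitingForGCTTL:
		return "waiting for GC TTL"
	case stageDeletingData:
		return "deleting data"
	case stageDone:
		return "done"
	default:
		return fmt.Sprintf("unknown stage %d", int(s))
	}
}

// earliestStage returns the least advanced stage among the job's tables.
// With no tables it returns stageDone.
func earliestStage(stages []tableStage) tableStage {
	earliest := stageDone
	for _, s := range stages {
		if s < earliest {
			earliest = s
		}
	}
	return earliest
}

// jobStatus concatenates the coarse job status with the detailed running
// status taken from the table at the earliest stage.
func jobStatus(stages []tableStage) string {
	e := earliestStage(stages)
	if e == stageDone {
		return "succeeded"
	}
	return fmt.Sprintf("running: %s", e)
}

func main() {
	// A DROP DATABASE job covering three tables at different stages.
	stages := []tableStage{stageDeletingData, stageWaitingForGCTTL, stageDone}
	fmt.Println(jobStatus(stages))
}
```

Reporting the least advanced table keeps the job from looking further along than it is when a multi-table statement such as DROP DATABASE has tables at different stages.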
Both operations can take a long long long time. For added visibility, we should include those in the jobs table.
eg: I dropped the only user table in a 1.7TB (total data used) cluster. Hours later, I can't see much change in usage (it's there, just subtle) and there's absolutely no visibility into the progress or even whether it's doing anything (other than a single goroutine on the node executing the drop).