sql: report drop and truncate in jobs table #19004

mberhault · 2017-10-03T20:51:54Z

Both operations can take a long long long time. For added visibility, we should include those in the jobs table.
eg: I dropped the only user table in a 1.7TB (total data used) cluster. Hours later, I can't see much change in usage (it's there, just subtle) and there's absolutely no visibility into the progress or even whether it's doing anything (other than a single goroutine on the node executing the drop).

mberhault · 2017-10-03T20:52:11Z

tentatively adding 1.2 milestone.

dianasaur323 · 2017-10-04T23:35:34Z

odd... I guess I always assumed those were considered schema changes... off the top of my head, I don't know what we would take off to do this, so I think a "tentative" add sounds good. I'll follow up once I figure out what capacity looks like. Thanks for filing.

cuongdo · 2017-10-05T00:37:28Z

+1 on this. From a logical point of view, all SQL statements belong in either SHOW QUERIES or SHOW JOBS. Since the other schema changes show up in SHOW JOBS, that seems like a reasonable place to put DROP and TRUNCATE.

vivekmenezes · 2017-10-05T01:29:47Z

Seems okay. The only difference from other schema changes is that the DROP and TRUNCATE are complete as far as the user is concerned (The table is gone/or the table is empty). In other words the schema change is complete. It's just the deleted data being GC-ed in the background after the schema change and the expectation that the disk usages goes down significantly after the operation that is surprising.

tbg · 2017-11-15T16:26:13Z

In our production testing, where clusters are bound to run out of disk space eventually, we've found that the currently functionality is insufficient. It turns out that DROP TABLE is essentially never fast enough to thwart impending doom, even with aggressive GC TTLs. This could be due to the actual background deletion or slowness of the GC queue in purging the deletions, but either way it's not acceptable that there is no indication of the progress even though the background task is still ongoing.

There is likely performance that we need to optimize, but knowing how far the actual deletion has processed is a baseline that doesn't seem too difficult to achieve.

See also #19329 (comment) where we discuss using DROP for that purpose.

dianasaur323 · 2017-11-15T18:37:18Z

This begs to question why we count certain commands as schema changes while other are not. We currently define them based on whether or not the operation requires backfilling, but I've been getting feedback (and I'm personally confused) since it matches the technical implementation, not how users would expect things to be classified.

vivekmenezes · 2017-11-16T21:13:58Z

@dianasaur323 commands that change the schema are called schema changes. Some of these commands need to backfill data and thus take time to execute. In the case of DROP once SQL stops referencing the data SQL responds back to the user that it has completed the schema change, which it indeed has. The only problem is that users also expect the command to delete all the data associated with the table, which we do asynchronously. We do the same with TRUNCATE, where we don't delete the all the underlying data immediately, just provide the illusion via SQL that it's gone.

dianasaur323 · 2017-11-16T21:53:33Z

@vivekmenezes On that point, the reason why I'm asking is because in the jobs table, we only show progress of schema changes that require backfilling, hence the confusion. Both the things you mentioned (DROP and TRUNCATE) have confused some users out in the field. I'm wondering if we should do something about that?

vivekmenezes · 2017-11-16T22:04:21Z

The reason why we did it this way is because users were complaining about DROP/TRUNCATE being too slow. The correct solution is to drop this data quickly which we can't accomplish in the near term. There is also the added concern that users need to query the old data using AS OF SYSTEM TIME which prevents us from just blowing away the ranges.

I agree we need to do something

tbg · 2017-11-16T22:09:17Z

I think we want to rely on AS OF SYSTEM TIME as a recovery mechanism for fat-fingering a drop table. I don't think that's principally in the way of removing the data more quickly if desired, though (a bit of care has been taken that the quick-delete can't be fat-fingered easily).

I'd really just like to track the status of the deletion (which is a kind of backfill) in the jobs tab. I think it's OK that DROP returns and the job is still running. After all, the DB is gone.

Motivated by the fact that previously, `DROP TABLE` did not allow any kind of introspection. Now, at least there's something you can look at in `/debug/requests` to see activity. Touches cockroachdb#19004 Release note: None

dianasaur323 · 2017-12-10T23:14:12Z

I'm tentatively marking this as 1.3 right now. I don't think we have capacity to fix this now, and I think it might make sense to wrap it into some jobs work that is going to tentatively get scheduled for 1.3...

Creates a job for truncate and drop table statements, which is completed when the GC TTL expires and the table data and ID is deleted. Fixes cockroachdb#19004 Release note: None

Creates a job for statements involving dropping or truncated tables, including DROP DATABASE. The job is completed when the GC TTL expires and both table data and ID is deleted for each of the tables involved. Fixes cockroachdb#19004 Release note: None

Creates a job for statements involving dropping or truncated tables, including DROP DATABASE. The job is completed when the GC TTL expires and both table data and ID is deleted for each of the tables involved. Detailed running statuses are added to provide visibility to the progress of the dropping or truncating of tables. This is surfaced by adding an additional status field to the payload proto of jobs, and concatenated to the running status when populating the interal jobs table. For dropping or truncating jobs, the detailed running status is determined by the status of the table at the earliest stage of the schema change. Fixes cockroachdb#19004 Release note: None

29993: sql: create jobs for truncated and dropped tables r=eriktrinh a=eriktrinh Creates a job for statements involving dropping or truncated tables, including DROP DATABASE. The job is completed when the GC TTL expires and both table data and ID is deleted for each of the tables involved. Detailed running statuses are added to provide visibility to the progress of the dropping or truncating of tables. This is surfaced by adding an additional status field to the payload proto of jobs, and concatenated to the running status when populating the interal jobs table. For dropping or truncating jobs, the detailed running status is determined by the status of the table at the earliest stage of the schema change. Fixes #19004 Co-authored-by: Erik Trinh <[email protected]>

mberhault added this to the 1.2 milestone Oct 3, 2017

tbg mentioned this issue Dec 5, 2017

sql: trace schema changes #20491

Merged

tbg mentioned this issue Dec 8, 2017

cherrypick-1.1: sql: trace schema changes #20576

Merged

dianasaur323 modified the milestones: 1.2, 1.3 Dec 10, 2017

knz added the A-telemetry label May 9, 2018

knz added A-bulkio-schema-changes C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) labels May 9, 2018

vivekmenezes assigned vivekmenezes and unassigned vivekmenezes Jun 4, 2018

eriktrinh mentioned this issue Sep 10, 2018

sql: create jobs for truncated and dropped tables #29993

Merged

This was referenced Sep 18, 2018

sql: investigate zone config behaviour during dropping of a {table,database} #30375

Closed

sql: Inconsistent zone config behavior when dropping database or table #24179

Closed

craig bot closed this as completed in #29993 Sep 20, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: report drop and truncate in jobs table #19004

sql: report drop and truncate in jobs table #19004

mberhault commented Oct 3, 2017

mberhault commented Oct 3, 2017

dianasaur323 commented Oct 4, 2017

cuongdo commented Oct 5, 2017

vivekmenezes commented Oct 5, 2017

tbg commented Nov 15, 2017

dianasaur323 commented Nov 15, 2017

vivekmenezes commented Nov 16, 2017

dianasaur323 commented Nov 16, 2017

vivekmenezes commented Nov 16, 2017

tbg commented Nov 16, 2017

dianasaur323 commented Dec 10, 2017

sql: report drop and truncate in jobs table #19004

sql: report drop and truncate in jobs table #19004

Comments

mberhault commented Oct 3, 2017

mberhault commented Oct 3, 2017

dianasaur323 commented Oct 4, 2017

cuongdo commented Oct 5, 2017

vivekmenezes commented Oct 5, 2017

tbg commented Nov 15, 2017

dianasaur323 commented Nov 15, 2017

vivekmenezes commented Nov 16, 2017

dianasaur323 commented Nov 16, 2017

vivekmenezes commented Nov 16, 2017

tbg commented Nov 16, 2017

dianasaur323 commented Dec 10, 2017