KSQL sometimes deletes internal topics when the server shuts down #4654

Closed
rodesai opened this issue Feb 27, 2020 · 5 comments · Fixed by #4658

@rodesai
Contributor

rodesai commented Feb 27, 2020

When KSQL shuts down, it closes all the queries in the engine. One of the steps of closing a query is to delete the internal topics.

To Reproduce
Start KSQL and run a query, then shut KSQL down (presumably by sending a SIGTERM).

Expected behavior
KSQL exits cleanly but leaves all internal topics intact.

Actual behavior
KSQL deletes the internal topics (and therefore their data)

@rodesai rodesai added the bug label Feb 27, 2020
@apurvam
Contributor

apurvam commented Feb 27, 2020

What do you mean deletes internal topics? For persistent queries, or just for transient queries?

@rodesai
Contributor Author

rodesai commented Feb 27, 2020

For persistent queries, so it deletes all the repartition topics and changelogs. This is racy, which might explain why we don't hit this in dev environments. Adding some details now.

@apurvam
Contributor

apurvam commented Feb 27, 2020

thanks. Would be good to understand when it was introduced.

@rodesai
Contributor Author

rodesai commented Feb 27, 2020

So the underlying issue is that the shutdown handling logic iterates over all the running queries and terminates them (calls QueryMetadata.close), and the engine makes no distinction between terminating a query as part of a shutdown and terminating it in response to a user's statement. As a result, it tries to delete every query's internal topics (see KsqlEngine.close).
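To illustrate the missing distinction, here is a minimal sketch that keeps the two paths separate. Every name in it (`CloseVersusTerminateSketch`, `Engine`, `Query`, `terminate`, `shutdown`) is hypothetical, and the in-memory topic set stands in for Kafka. It is not the KSQL code and not necessarily how #4658 fixes the problem.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: none of these classes are the real KSQL types.
public class CloseVersusTerminateSketch {

  /** Stand-in for a persistent query that owns some internal topics. */
  static final class Query {
    final String id;
    final List<String> internalTopics;
    boolean running = true;

    Query(final String id, final List<String> internalTopics) {
      this.id = id;
      this.internalTopics = internalTopics;
    }
  }

  /** Stand-in for the engine; 'topics' plays the role of the Kafka cluster. */
  static final class Engine {
    final Set<String> topics = new TreeSet<>();
    final Map<String, Query> liveQueries = new LinkedHashMap<>();

    void start(final Query query) {
      liveQueries.put(query.id, query);
      topics.addAll(query.internalTopics);
    }

    /** User-initiated terminate: stop the query and delete its internal topics. */
    void terminate(final String queryId) {
      final Query query = liveQueries.remove(queryId);
      if (query == null) {
        return;
      }
      query.running = false;
      topics.removeAll(query.internalTopics);
    }

    /** Server shutdown: stop every query but leave the internal topics alone. */
    void shutdown() {
      for (final Query query : liveQueries.values()) {
        query.running = false;
      }
      liveQueries.clear();
    }
  }

  public static void main(final String[] args) {
    final Engine engine = new Engine();
    engine.start(new Query("q1", List.of("q1-repartition", "q1-changelog")));
    engine.start(new Query("q2", List.of("q2-changelog")));

    engine.terminate("q2"); // the user dropped q2, so its topics go away
    engine.shutdown();      // the server is stopping, so q1's topics must survive

    System.out.println("Topics left after shutdown: " + engine.topics);
  }
}
```

The point of the split is that only the user-facing path ever touches topics, so a restart can pick up the repartition topics and changelogs where it left off.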

It looks like this bug is being hidden by a race condition, which is why I didn't see the internal topics deleted when I tried to reproduce it by just sending the server a SIGTERM.

Let's assume I've started KSQL and run 1 query.

When the JVM gets the SIGTERM, 2 threads react to it:

Jetty's ShutdownThread (which registers with the JVM as a shutdown hook), which calls ApplicationServer.doStop, which eventually invokes KsqlRestApplication.triggerShutdown

main, which also calls KsqlRestApplication.triggerShutdown

Both calls wind up in KsqlEngine.close, and then eventually in KsqlEngine.unregisterQuery. One thread (Thread A) removes the query from allLiveQueries and moves on, and the other thread (let's call it Thread B) sees that allLiveQueries doesn't have that query and moves on. Thread A will then go on and try to delete the query's internal topics. However, if Thread B manages to close the admin client before this happens, the topics won't get deleted.
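The two outcomes of that race can be replayed deterministically. The sketch below is a single-threaded model of the interleaving, not the real code: `ShutdownRaceSketch`, `FakeAdminClient`, and `topicSurvives` are made-up names, and it assumes that a delete issued after the admin client is closed has no effect, which is what the behavior described above suggests.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the Thread A / Thread B interleaving; no real threads or Kafka.
public class ShutdownRaceSketch {

  /** Stand-in for the Kafka admin client, backed by an in-memory topic set. */
  static final class FakeAdminClient {
    final Set<String> topics = new HashSet<>();
    boolean closed = false;

    void deleteTopic(final String topic) {
      if (closed) {
        return; // assumption: a delete after close does nothing
      }
      topics.remove(topic);
    }

    void close() {
      closed = true;
    }
  }

  /** Stand-in for the engine's bookkeeping of live queries. */
  static final class Engine {
    final Set<String> allLiveQueries = new HashSet<>();
    final FakeAdminClient admin;

    Engine(final FakeAdminClient admin) {
      this.admin = admin;
    }

    /** Returns true only for the caller that actually removed the query. */
    boolean unregisterQuery(final String queryId) {
      return allLiveQueries.remove(queryId);
    }

    void deleteInternalTopics(final String queryId) {
      admin.deleteTopic(queryId + "-changelog");
    }
  }

  /** Replays one ordering of the race and reports whether the topic is still there. */
  static boolean topicSurvives(final boolean adminClosedBeforeDelete) {
    final FakeAdminClient admin = new FakeAdminClient();
    admin.topics.add("q1-changelog");
    final Engine engine = new Engine(admin);
    engine.allLiveQueries.add("q1");

    final boolean threadAOwnsCleanup = engine.unregisterQuery("q1"); // Thread A removes q1
    engine.unregisterQuery("q1"); // Thread B finds nothing to remove and moves on

    if (adminClosedBeforeDelete) {
      admin.close(); // Thread B wins: the client is closed first
      if (threadAOwnsCleanup) {
        engine.deleteInternalTopics("q1");
      }
    } else {
      if (threadAOwnsCleanup) {
        engine.deleteInternalTopics("q1"); // Thread A wins: the topic is deleted
      }
      admin.close();
    }
    return admin.topics.contains("q1-changelog");
  }

  public static void main(final String[] args) {
    System.out.println("B closes admin first, topic survives: " + topicSurvives(true));
    System.out.println("A deletes first, topic survives: " + topicSurvives(false));
  }
}
```

In this model the topic only survives when the close happens first, which matches why delaying the AdminClient close makes the deletion show up.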

If you introduce a sleep just before the AdminClient is closed, the bug reproduces pretty reliably:

diff --git a/ksql-engine/src/main/java/io/confluent/ksql/services/DefaultServiceContext.java b/ksql-engine/src/main/java/io/confluent/ksql/services/DefaultServiceContext.java
index d7e495d1c..acdb1a00a 100644
--- a/ksql-engine/src/main/java/io/confluent/ksql/services/DefaultServiceContext.java
+++ b/ksql-engine/src/main/java/io/confluent/ksql/services/DefaultServiceContext.java
@@ -140,6 +140,11 @@ public class DefaultServiceContext implements ServiceContext {

   @Override
   public void close() {
+    try {
+      Thread.sleep(30000);
+    } catch (final Exception e) {
+      System.out.println("WE GOT AN INTERRUPT");
+    }
     if (adminClientSupplier.isInitialized()) {
       adminClientSupplier.get().close();
     }

@apurvam apurvam added this to the 0.8.0 milestone Feb 27, 2020
@rodesai rodesai changed the title KSQL deletes internal topics when the server shuts down KSQL sometimes deletes internal topics when the server shuts down Feb 27, 2020
@apurvam apurvam modified the milestones: 0.8.0, 0.7.1 Feb 27, 2020
@agavra
Contributor

agavra commented Feb 27, 2020

Fixed by #4658

@agavra agavra closed this as completed Feb 27, 2020