The ticdc_owner_status is not removed when removing a failed changefeed #10760

Closed
kennytm opened this issue Mar 12, 2024 · 8 comments
Labels
affects-6.5 This bug affects the 6.5.x(LTS) versions. area/ticdc Issues or PRs related to TiCDC. found/gs severity/moderate type/bug The issue is confirmed as a bug.

Comments


kennytm commented Mar 12, 2024

What did you do?

# upstream: 127.0.0.1:4000, downstream: 127.0.0.1:4001

cdc cli changefeed create --sink-uri 'mysql://root@127.0.0.1:4001/' -c testcdc3
mysql -u root -h 127.0.0.1 -P 4000 test -e 'create table a(id bigint primary key, val varchar(200)); insert into a values (1, "one"), (2, "two"), (3, "three");'

# force an error
mysql -u root -h 127.0.0.1 -P 4001 test -e 'drop table a;';
mysql -u root -h 127.0.0.1 -P 4000 test -e 'insert into a values (4, "four");'

# speed up "warning" → "failed", optional.
cdc cli unsafe delete-service-gc-safepoint

# wait until ticdc_owner_status changed from 6 to 2
curl -s http://127.0.0.1:8300/metrics | grep ticdc_owner_status

# remove the changefeed after it has failed
cdc cli changefeed remove -c testcdc3

# check the metrics again.
curl -s http://127.0.0.1:8300/metrics | grep ticdc_owner_status
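The "wait until ticdc_owner_status changed from 6 to 2" step can be automated instead of re-running curl by hand. The following is a minimal sketch, not part of the original report: the helper names `owner_status` and `wait_for_status`, the 5-second poll interval, and the retry count are all assumptions made for illustration. It assumes the metrics endpoint emits lines of the form shown in this issue, with the value as the last field.

```shell
# Print the value of ticdc_owner_status for one changefeed, reading
# Prometheus text-format metrics from stdin. Prints nothing if the
# series is absent. (Hypothetical helper, not part of TiCDC.)
owner_status() {
  awk -v cf="changefeed=\"$1\"" '
    index($0, "ticdc_owner_status{") == 1 && index($0, cf) > 0 { print $NF }
  '
}

# Poll the metrics endpoint until the changefeed reports the wanted value.
# Returns 0 on success, 1 if the value never appeared within the retries.
wait_for_status() {
  local changefeed="$1" want="$2"
  local url="${3:-http://127.0.0.1:8300/metrics}"
  local tries="${4:-60}" got
  while [ "$tries" -gt 0 ]; do
    got="$(curl -s "$url" | owner_status "$changefeed")"
    [ "$got" = "$want" ] && return 0
    sleep 5   # poll interval is an arbitrary choice
    tries=$((tries - 1))
  done
  return 1
}
```

With these defined, `wait_for_status testcdc3 2 && cdc cli changefeed remove -c testcdc3` would remove the changefeed once it has reached the failed state.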

What did you expect to see?

The series ticdc_owner_status{changefeed="testcdc3",namespace="default"} no longer exists

What did you see instead?

It remains at value 2 (failed)

This means a Prometheus Alert will keep firing for a changefeed that is already gone.
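For a regression check, the expected behaviour can be stated as "no ticdc_owner_status series mentions the removed changefeed". A rough sketch of such a check follows; the function name `series_gone` is hypothetical, and it assumes the metrics text is piped in on stdin in the format shown above.

```shell
# Succeeds (exit 0) when the metrics text on stdin contains no
# ticdc_owner_status series for the given changefeed; fails (exit 1)
# when a stale series is still being exported.
# (Hypothetical helper, not part of TiCDC.)
series_gone() {
  awk -v cf="changefeed=\"$1\"" '
    index($0, "ticdc_owner_status{") == 1 && index($0, cf) > 0 { found = 1 }
    END { exit found ? 1 : 0 }
  '
}
```

After `cdc cli changefeed remove -c testcdc3`, running `curl -s http://127.0.0.1:8300/metrics | series_gone testcdc3` should succeed on a fixed version and fail on v6.5.5 as reported here.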

Versions of the cluster

v6.5.5

kennytm added the type/bug, area/ticdc, and affects-6.5 labels Mar 12, 2024

kennytm commented Mar 12, 2024

While #10513 has not been merged to release-6.5, I don't think that PR has any effect on this issue. I haven't tested release-7.5 though.

EDIT: Not reproducible on v7.5.1.


fubinzh commented Mar 13, 2024

/severity moderate


kennytm commented Mar 13, 2024

if there is a way to immediately fail a changefeed we could quickly check if the recently unstuck #10513 has fixed this 😉


asddongmen commented Mar 13, 2024

> if there is a way to immediately fail a changefeed we could quickly check if the recently unstuck #10513 has fixed this 😉

Yes, there is a way to do it.

  1. Set cdc server config gc-ttl to 1.
  2. Start cdc server with this config.
  3. Create a changefeed and pause it, then wait about 30 minutes to make sure GC has advanced in the upstream (the default gc-life-time and gc-interval in upstream TiDB are both 10 minutes).

By then the changefeed should have failed.
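The steps above can be sketched as shell commands. This is only an illustration under stated assumptions: the function names are hypothetical, the config file is assumed to need nothing but the `gc-ttl` key (in seconds), and a real `cdc server` invocation will usually need further flags for the deployment, which are omitted here. The 30-minute wait is left to the operator.

```shell
# Step 1: write a server config file that sets gc-ttl to 1 second.
# (Assumes no other settings are required in the file.)
write_cdc_config() {
  printf 'gc-ttl = 1\n' > "$1"
}

# Steps 2 and 3: start the server with that config, then create and
# pause a changefeed. Not invoked automatically; assumes `cdc` is on
# PATH and that any other server flags needed by the deployment are
# added by the caller.
start_and_pause() {
  local config="$1" sink_uri="$2" changefeed="$3"
  cdc server --config "$config" &
  sleep 10   # arbitrary delay to let the server come up
  cdc cli changefeed create --sink-uri "$sink_uri" -c "$changefeed"
  cdc cli changefeed pause -c "$changefeed"
}
```

For example, `write_cdc_config ./cdc.toml` followed by `start_and_pause ./cdc.toml 'mysql://root@127.0.0.1:4001/' testcdc3` would set up the paused changefeed, after which upstream GC still has to advance before the changefeed fails.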


kennytm commented Mar 13, 2024

> 1. Create a changefeed and pause it, wait about 30 minutes to make sure GC is advanced in upstream.

i mean this is the step i'd like to skip 😅

asddongmen commented

> > 1. Create a changefeed and pause it, wait about 30 minutes to make sure GC is advanced in upstream.
>
> i mean this is the step i'd like to skip 😅

Maybe we can add an error injecting API to do it?


kennytm commented Mar 14, 2024

> > > 1. Create a changefeed and pause it, wait about 30 minutes to make sure GC is advanced in upstream.
> >
> > i mean this is the step i'd like to skip 😅
>
> Maybe we can add an error injecting API to do it?

Yeah. But not really high priority if you need to introduce another PR to get this.

ti-chi-bot added the affects-8.1 label Apr 9, 2024
asddongmen removed the affects-8.1 label May 21, 2024
asddongmen self-assigned this May 21, 2024

asddongmen commented May 28, 2024

Duplicate of #10449; fixed by #10513.
