
fixing Cassandra shutdown example to avoid data corruption #39199

Merged
merged 1 commit into from Jan 23, 2017
Conversation

deimosfr
Contributor

@deimosfr deimosfr commented Dec 23, 2016

Hi,

I was playing with the Cassandra example stored in the Kubernetes project and I encountered issues on shutdown (not every time). After checking, it looks like the shutdown of a node is abrupt, and data corruption can occur during a flush to disk. To avoid that, I'm suggesting a preStop hook to gracefully shut down Cassandra before stopping the container.
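
The hook itself is shown in the merged diff further down this thread; for context, here is a sketch of where it sits in the example's container spec (the surrounding fields and container name are assumed for illustration, not taken from this comment):

containers:
- name: cassandra          # container name assumed for illustration
  lifecycle:
    preStop:
      exec:
        # Send SIGTERM to the Cassandra JVM and wait for it to exit
        # before the kubelet stops the container.
        command: ["/bin/sh", "-c", "PID=$(pidof java) && kill $PID && while ps -p $PID > /dev/null; do sleep 1; done"]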

Here are logs of corruption after a pod delete:

/10.2.76.4:[-8699848499000118463, -8567123670484406873, -8496767951391579058, -8426990834929543369, -7697118318683556771, -6942779781591907873, -6795880495022459877, -6496399078175245235, -5450122121479522544, -5002551029990001224, -4914532712178218138, -4884518674849288097, -3667338763252443465, -3316742521554936832, -2844544359955291760, -1291351295404368159, -794348397160283083, -705240847455001090, -652995206518489298, -284127251294286231, 173240967232234690, 616476682204879844, 826670457841382100, 1815369334084765465, 4431706613761077084, 4743606016174161647, 5637469692783959686, 5802957011124852712, 6759688243703331970, 7679657413128857702, 7713766696628426028, 9098158217036036188]

ERROR 16:23:06 Exception in thread Thread[CompactionExecutor:2,1,main]
org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: /cassandra_data/data/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/mc-2-big-Data.db
	at org.apache.cassandra.io.sstable.format.big.BigTableScanner$KeyScanningIterator.computeNext(BigTableScanner.java:351) ~[apache-cassandra-3.9.jar:3.9]
	at org.apache.cassandra.io.sstable.format.big.BigTableScanner$KeyScanningIterator.computeNext(BigTableScanner.java:265) ~[apache-cassandra-3.9.jar:3.9]
	at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[apache-cassandra-3.9.jar:3.9]
	at org.apache.cassandra.io.sstable.format.big.BigTableScanner.hasNext(BigTableScanner.java:245) ~[apache-cassandra-3.9.jar:3.9]
	at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:374) ~[apache-cassandra-3.9.jar:3.9]
	at org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:186) ~[apache-cassandra-3.9.jar:3.9]
	at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:155) ~[apache-cassandra-3.9.jar:3.9]
	at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[apache-cassandra-3.9.jar:3.9]
	at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.hasNext(UnfilteredPartitionIterators.java:150) ~[apache-cassandra-3.9.jar:3.9]
	at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:92) ~[apache-cassandra-3.9.jar:3.9]
	at org.apache.cassandra.db.compaction.CompactionIterator.hasNext(CompactionIterator.java:232) ~[apache-cassandra-3.9.jar:3.9]
	at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:184) ~[apache-cassandra-3.9.jar:3.9]
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.9.jar:3.9]
	at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:82) ~[apache-cassandra-3.9.jar:3.9]
	at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60) ~[apache-cassandra-3.9.jar:3.9]
	at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:264) ~[apache-cassandra-3.9.jar:3.9]

It works well for me now and I no longer see data corruption.

@k8s-ci-robot
Contributor

Hi @deimosfr. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with @k8s-bot ok to test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

If you have questions or suggestions related to this bot's behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-reviewable

This change is Reviewable

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Dec 23, 2016
@k8s-github-robot k8s-github-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. release-note-label-needed labels Dec 23, 2016
@brendandburns
Contributor

@k8s-bot ok to test

LGTM.

Thanks!

@brendandburns brendandburns added release-note Denotes a PR that will be considered when it comes time to generate release notes. lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed release-note-label-needed labels Jan 2, 2017
@k8s-ci-robot
Contributor

Jenkins verification failed for commit 22b936649d8acdfe9c9a42a07f81d07daed3445b. Full PR test history.

The magic incantation to run this job again is @k8s-bot verify test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@brendandburns
Contributor

@deimosfr ugh, looks like the switch from 2016 to 2017 confused some automation. Can you rebase?

thanks (and sorry!)

--brendan

@k8s-github-robot k8s-github-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jan 4, 2017
@deimosfr
Contributor Author

deimosfr commented Jan 4, 2017

@brendandburns done, please let me know if it's ok for you now

@k8s-github-robot k8s-github-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jan 4, 2017
@0xmichalis
Contributor

@deimosfr this needs another rebase. It seems you have pulled a bunch of preexisting commits somehow.

@k8s-github-robot k8s-github-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jan 8, 2017
@k8s-github-robot k8s-github-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jan 12, 2017
@0xmichalis
Contributor

@jonashuckestein you guys may be interested in this.

@deimosfr
Contributor Author

@Kargakis is something missing for this to be merged?

@0xmichalis
Contributor

@chrislovecnm can you also have a look? I would expect termination signaling to be handled by the kubelet, and a higher terminationGracePeriodSeconds in the StatefulSet would make more sense to me, but I am not familiar with C*.
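
For reference, the alternative hinted at above (not what this PR implements) would rely on the kubelet's SIGTERM and simply give Cassandra more time to exit by raising the grace period on the pod template; a minimal sketch, with an illustrative value:

kind: StatefulSet
spec:
  template:
    spec:
      # Extend the window between SIGTERM and SIGKILL so Cassandra has time
      # to flush and exit on its own. 1800 is illustrative; the Kubernetes
      # default is 30 seconds.
      terminationGracePeriodSeconds: 1800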

@brendandburns
Contributor

/lgtm

Sorry for the delay.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 21, 2017
@k8s-ci-robot
Contributor

Jenkins Kubemark GCE e2e failed for commit c165e90. Full PR test history.

cc @deimosfr, your PR dashboard

The magic incantation to run this job again is @k8s-bot kubemark e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@0xmichalis
Contributor

@k8s-bot kubemark e2e test this

@k8s-github-robot

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot

Automatic merge from submit-queue (batch tested with PRs 39199, 37273, 29183, 39638, 40199)

lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "PID=$(pidof java) && kill $PID && while ps -p $PID > /dev/null; do sleep 1; done"]
Contributor


This was merged ages ago, but isn't it better to call nodetool drain? It should stop accepting new writes and flush data to disk.

cc @chrislovecnm @jondubois

Contributor Author


Good idea, nodetool drain is better for ensuring all data has been flushed.
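
A minimal sketch of what a drain-based preStop could look like, assuming nodetool is on the container's PATH (the exact command adopted in #49618 is not reproduced here):

lifecycle:
  preStop:
    exec:
      # nodetool drain stops the node from accepting new writes and flushes
      # all memtables to disk before the container is stopped.
      command: ["/bin/sh", "-c", "nodetool drain"]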

Contributor


I have opened #49618 to fix this.

liggitt pushed a commit to liggitt/kubernetes that referenced this pull request Aug 12, 2017
…p-drain

Automatic merge from submit-queue (batch tested with PRs 47724, 49984, 49785, 49803, 49618)

Cassandra example, use nodetool drain in preStop

Related to kubernetes#39199 (comment)