[WIP] Make CSI launch Fuse pod #15165

Closed
wants to merge 1 commit into from

Conversation

ssz1997
Contributor

@ssz1997 ssz1997 commented Mar 17, 2022

What changes are proposed in this pull request?

Make CSI launch a separate Fuse pod instead of running the Fuse process inside the CSI nodeserver container.

Why are the changes needed?

If the nodeserver container or the node-plugin pod goes down for any reason, we lose the Alluxio Fuse process, and bringing it back is very cumbersome. With a separate Fuse pod, a failure of the CSI pod won't affect the Fuse process.

Solves #14917
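
A minimal sketch of the idea, not the code in this PR: the nodeserver builds a manifest for a standalone Fuse pod pinned to its own node, so the Fuse process lives outside the nodeserver container. The `fusePodFor` helper, the pod and container names, and the image are all hypothetical, and the small structs stand in for the real Kubernetes API types. A real pod would also need the device access and mount propagation settings that FUSE requires, which are omitted here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Minimal stand-ins for the few Pod fields this sketch uses.
// A real driver would use the Kubernetes API types instead.
type container struct {
	Name  string `json:"name"`
	Image string `json:"image"`
}

type podSpec struct {
	// NodeName pins the pod to the node the nodeserver runs on.
	NodeName   string      `json:"nodeName"`
	Containers []container `json:"containers"`
}

type metadata struct {
	Name string `json:"name"`
}

type pod struct {
	APIVersion string   `json:"apiVersion"`
	Kind       string   `json:"kind"`
	Metadata   metadata `json:"metadata"`
	Spec       podSpec  `json:"spec"`
}

// fusePodFor is a hypothetical helper: it returns a manifest for a
// Fuse pod that runs separately from the CSI nodeserver container.
func fusePodFor(nodeName, image string) pod {
	return pod{
		APIVersion: "v1",
		Kind:       "Pod",
		Metadata:   metadata{Name: "alluxio-fuse-" + nodeName}, // assumed naming scheme
		Spec: podSpec{
			NodeName:   nodeName,
			Containers: []container{{Name: "alluxio-fuse", Image: image}},
		},
	}
}

func main() {
	// The node name and image below are placeholders.
	p := fusePodFor("node-1", "example/alluxio-fuse:latest")
	out, err := json.MarshalIndent(p, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Because the pod is its own Kubernetes object, restarting the nodeserver container does not restart the Fuse container in this layout.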

@alluxio-bot
Contributor

Automated checks report:

  • Commits associated with Github account: PASS
  • PR title follows the conventions: FAIL
    • The title of the PR does not pass all the checks. Please fix the following issues:
      • Supported title prefixes are: [WIP], [SMALLFIX], [DOCFIX]

Some checks failed. Please fix the reported issues and reply 'alluxio-bot, check this please' to re-run checks.

@ssz1997 ssz1997 changed the title from "[DRAFT] Make CSI launch Fuse pod" to "[WIP] Make CSI launch Fuse pod" Mar 17, 2022
@alluxio-bot
Contributor

Automated checks report:

  • Commits associated with Github account: PASS
  • PR title follows the conventions: PASS

All checks passed!

@HelloHorizon HelloHorizon marked this pull request as draft March 17, 2022 21:55
@HelloHorizon HelloHorizon added the area-k8s Alluxio Kubernetes Integration label Mar 23, 2022
alluxio-bot pushed a commit that referenced this pull request Mar 29, 2022
### What changes are proposed in this pull request?
Unmount the corrupted folder in case the csi-nodeserver restarted.

Related to issue #14917

This change only mitigates the issue. If the nodeserver restarts, it will
try to remount the folder, but user jobs still cannot access data during
the restart.

Note that there is another solution for this issue,
#15165. I believe that solution is
more robust and better suited to critical jobs, but it needs more
effort to become mature enough for production workloads. At this stage,
we can perhaps keep both.

**Notice: this change requires k8s version above 1.18**
This change has not been tested yet. To test it, kill the nodeserver and,
after it restarts, check whether the mount path is remounted.

pr-link: #15191
change-id: cid-fa4311a52288a0349084c08f6d202cf3e4069da4
@ssz1997
Contributor Author

ssz1997 commented Mar 29, 2022

Opened a new PR at #15221. Closing this one.

@ssz1997 ssz1997 closed this Mar 29, 2022