fix: stable beluga as a petals service process not stopping #542

Closed

biswaroop1547
Contributor

Use SIGKILL as a failsafe when SIGTERM doesn't make the process exit.
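
For readers following along, here is a minimal sketch of the SIGTERM-then-SIGKILL pattern described above. It is not the code from this PR: `send_sigterm`, `stop_with_failsafe`, and the grace period are hypothetical names chosen for illustration. SIGTERM is sent through the shell's built-in `kill` because Rust's standard library only exposes `Child::kill`, which sends SIGKILL on Unix.

```rust
use std::io;
use std::process::{Child, Command};
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical helper: send SIGTERM to `pid` via the shell's built-in `kill`.
/// Returns Ok(true) if the signal was delivered.
fn send_sigterm(pid: u32) -> io::Result<bool> {
    let status = Command::new("sh")
        .arg("-c")
        .arg(format!("kill -TERM {}", pid))
        .status()?;
    Ok(status.success())
}

/// Hypothetical helper: ask `child` to exit, escalating to SIGKILL once
/// `grace` has elapsed. Returns Ok(true) if SIGKILL was needed.
fn stop_with_failsafe(child: &mut Child, grace: Duration) -> io::Result<bool> {
    send_sigterm(child.id())?;

    // Poll until the child exits or the grace period runs out.
    let deadline = Instant::now() + grace;
    while Instant::now() < deadline {
        if child.try_wait()?.is_some() {
            return Ok(false);
        }
        sleep(Duration::from_millis(50));
    }

    // Failsafe: SIGKILL cannot be caught or ignored.
    child.kill()?;
    child.wait()?; // reap the child so it does not linger as a zombie
    Ok(true)
}
```

Calling `stop_with_failsafe(&mut child, Duration::from_secs(5))` on a well-behaved child should return `Ok(false)`; the SIGKILL branch only runs when the child ignores or blocks SIGTERM for the whole grace period.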

@biswaroop1547
Contributor Author

@swarnimarun @Janaka-Steph review please 🙏🏻

@Janaka-Steph
Contributor

Janaka-Steph commented Nov 10, 2023

p2pd process is still not killed.

Service running

python3.1 37862 stephane   43u  IPv4 0x3c8cec623f100c6d      0t0  TCP *:8734 (LISTEN)
p2pd      37867 stephane   10u  IPv4 0x3c8cec623edac4d5      0t0  TCP 127.0.0.1:65336 (LISTEN)

Ask to stop

src/controller_binaries.rs:211 2023-11-10T12:57:45 [INFO] - stopping service service_id = stable-beluga-2
src/controller_binaries.rs:220 2023-11-10T12:57:45 [INFO] - service pid = 37840
src/controller_binaries.rs:227 2023-11-10T12:57:45 [INFO] - terminating service: process_name(bash) process_id(37840)
src/controller_binaries.rs:254 2023-11-10T12:57:45 [INFO] - service stopped!

Service stopped

p2pd      37867 stephane   10u  IPv4 0x3c8cec623edac4d5      0t0  TCP 127.0.0.1:65336 (LISTEN)
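
The listing above shows p2pd (37867) still listening after bash (37840) was terminated: a signal sent to one PID does not reach that process's descendants. One common way to cover them is to start the service in its own process group and signal the whole group. The sketch below shows that idea only. `spawn_in_own_group` and `terminate_group` are hypothetical names, not functions from this repository, and it assumes the child processes stay in the launcher's group (a daemon that calls `setsid` would escape it).

```rust
use std::io;
use std::os::unix::process::CommandExt;
use std::process::{Child, Command};

/// Hypothetical helper: run `script` under `sh` as the leader of a new
/// process group, so its descendants can be signalled together.
fn spawn_in_own_group(script: &str) -> io::Result<Child> {
    Command::new("sh")
        .arg("-c")
        .arg(script)
        .process_group(0) // 0 = use the child's own PID as the group ID
        .spawn()
}

/// Hypothetical helper: send SIGTERM to every process in the group led by
/// `leader`. Returns Ok(true) if the signal was delivered.
fn terminate_group(leader: &Child) -> io::Result<bool> {
    // A negative PID tells `kill` to signal the whole process group.
    let status = Command::new("sh")
        .arg("-c")
        .arg(format!("kill -TERM -{}", leader.id()))
        .status()?;
    Ok(status.success())
}
```

With this layout, a single `terminate_group(&child)` call reaches both the shell and anything it started in the background, instead of only the shell.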

@Janaka-Steph
Contributor

Killing by name works, even though I get the error message: pkill: signalling pid 30017: Operation not permitted

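    use std::process::Command;

    // Signal every process named "p2pd". `spawn` only errors if pkill itself cannot be launched.
    let process_name = "p2pd";
    if let Err(err) = Command::new("pkill").arg(process_name).spawn() {
        eprintln!("Failed to kill process {}: {}", process_name, err);
    }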

The setup-petals.sh script could assign a more specific name to this process to distinguish it from Swarm processes.

@biswaroop1547
Contributor Author

@Janaka-Steph this PR doesn't take care of #511 currently due to this issue

@biswaroop1547
Contributor Author

biswaroop1547 commented Nov 10, 2023

The setup-petals.sh script could assign a more specific name to this process to distinguish it from Swarm processes.

But this process gets started internally, and setup-petals.sh doesn't have access to it right after creation. Are you suggesting we rename the process right after the setup-petals.sh script starts the petals server, so that it can be found easily?

@Janaka-Steph
Contributor

But which issue are we trying to solve here, then?
On main, the python3.1 process is killed; p2pd is not, but that's another issue. I don't have any other processes running.

@biswaroop1547
Contributor Author

biswaroop1547 commented Nov 10, 2023

It does seem to kill the process, but this spinner never resolves for me. If I understand correctly, that is due to a race condition, so I added drop statements; with those in place it works and no longer hangs in this spinner state.
image

Also, if SIGTERM doesn't work, SIGKILL is sent to end the process.
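
The drop statements themselves are not shown in this thread, so the following is only a guess at the kind of change meant: releasing a lock guard explicitly so that later code (or another task) can take the same lock instead of waiting on it forever. `Registry`, `is_running`, and `stop_service` are hypothetical names used for illustration, not identifiers from this repository.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical shared state: running services keyed by id, mapped to a PID.
type Registry = Mutex<HashMap<String, u32>>;

/// Takes the lock briefly to check whether `id` is registered.
fn is_running(registry: &Registry, id: &str) -> bool {
    registry.lock().unwrap().contains_key(id)
}

/// Removes `id` from the registry and returns its PID, if any.
fn stop_service(registry: &Registry, id: &str) -> Option<u32> {
    let mut running = registry.lock().unwrap();
    let pid = running.remove(id);
    // Without this drop the guard would live until the end of the function,
    // and the `is_running` call below would try to lock the same mutex again.
    drop(running);
    debug_assert!(!is_running(registry, id));
    pid
}
```

The same reasoning applies when the second lock attempt comes from a different thread or async task, such as one that updates UI state: holding the guard longer than needed can leave that side waiting indefinitely.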

@biswaroop1547
Contributor Author

Closing this PR: after updating to the latest commit on the main branch and starting from a clean local state, the stopping issue no longer happens. Thanks for the help 🙏🏻

2 participants