Memory leak and channel closure issues when reusing/dropping Model #865

Open · solaoi opened this issue Oct 19, 2024 · 1 comment
Labels: bug (Something isn't working)
solaoi commented Oct 19, 2024

Describe the bug

When initializing and dropping the Model repeatedly:

  1. Memory usage continuously increases because GGUF model weights aren't freed on drop
  2. The channel is erroneously closed after the first iteration

Steps to Reproduce

  1. Create a service that initializes and drops the model multiple times
  2. Run the following code:
use anyhow::Result;
use mistralrs::{GgufModelBuilder, PagedAttentionMetaBuilder, TextMessageRole, TextMessages};
use std::time::Duration;
use tokio::time::sleep;

// The model is held in an Option so it can be initialized after construction
// and dropped together with the service.
struct ChatService {
    model: Option<mistralrs::Model>,
}

impl ChatService {
    async fn new() -> Result<Self> {
        Ok(Self { model: None })
    }

    // Loads the GGUF model and enables paged attention.
    async fn initialize_model(&mut self) -> Result<()> {
        self.model = Some(
            GgufModelBuilder::new(
                "gguf_models/mistral_v0.1/",
                vec!["mistral-7b-instruct-v0.1.Q4_K_M.gguf"],
            )
            .with_chat_template("chat_templates/mistral.json")
            .with_paged_attn(|| PagedAttentionMetaBuilder::default().build())?
            .build()
            .await?,
        );
        Ok(())
    }

    // Sends a single-turn chat request and returns the assistant's reply.
    async fn chat(&self, prompt: &str) -> Result<String> {
        let messages = TextMessages::new().add_message(TextMessageRole::User, prompt);

        let response = self
            .model
            .as_ref()
            .unwrap()
            .send_chat_request(messages)
            .await?;

        Ok(response.choices[0]
            .message
            .content
            .clone()
            .unwrap_or_default())
    }
}

#[tokio::main]
async fn main() -> Result<()> {
    for i in 0..3 {
        println!("Iteration {}", i);

        let mut service = ChatService::new().await?;
        service.initialize_model().await?;

        let response = service.chat("Write a short greeting").await?;
        println!("Response: {}", response);

        // Model is dropped here, but GGUF remains in memory
        drop(service);

        // Wait to make memory usage observable
        sleep(Duration::from_secs(5)).await;
    }

    Ok(())
}

The Cargo.toml used for the reproduction:

[package]
name = "memory_bug_mistral"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
tokio = { version = "1", features = ["full"] }
anyhow = "1.0"
mistralrs = { git = "https://github.com/EricLBuehler/mistral.rs.git", branch = "master", features = [
    "metal",
] }
regex = "1.10.6"

Observed Behavior

  1. Memory usage increases with each iteration, even after the explicit drop
  2. Every iteration after the first fails with:

Error: Channel was erroneously closed!
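
To make the first symptom concrete, the process RSS can be sampled each iteration. A minimal sketch, assuming the third-party memory-stats crate (memory-stats = "1" added to [dependencies]; not part of the reproduction above):

// Prints the current resident set size, or nothing on unsupported platforms.
fn print_rss(label: &str) {
    if let Some(usage) = memory_stats::memory_stats() {
        // physical_mem is the process resident set size in bytes.
        println!("{label}: RSS = {} MiB", usage.physical_mem / (1024 * 1024));
    }
}

Calling print_rss before and after drop(service) in the loop shows RSS never returning to baseline.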

Expected Behavior

  1. Memory should be freed when the model is dropped
  2. The channel should remain functional across iterations
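
In the meantime, a workaround sketch (my assumption, not verified against the library): initialize the model once and reuse it, so drop-and-reinitialize never happens.

// Reuses the ChatService from the reproduction above.
#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let mut service = ChatService::new().await?;
    service.initialize_model().await?;

    for i in 0..3 {
        println!("Iteration {}", i);
        let response = service.chat("Write a short greeting").await?;
        println!("Response: {}", response);
    }
    Ok(())
}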

Latest commit or version


solaoi commented Oct 31, 2024

@EricLBuehler
I'm wondering if you have any plans to address this memory management issue in the library?
While I could work around it using a web server or child processes for now, I'd like to understand your timeline for implementing a native solution. This would help me decide whether to proceed with a temporary workaround or wait for an official fix. Could you share your thoughts on this?
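
For anyone else hitting this, a sketch of the child-process workaround mentioned above (the chat_worker binary name and its argv interface are hypothetical):

use std::process::Command;

// Runs one chat in a short-lived child process so the OS reclaims all
// model memory when the process exits.
fn chat_in_child(prompt: &str) -> std::io::Result<String> {
    let output = Command::new("./target/release/chat_worker")
        .arg(prompt) // hypothetical worker: prints the reply for argv[1]
        .output()?;  // blocks until the child exits
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}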
