
Controller concurrency appears to cause reconciliations to be skipped #1322

Closed
einarmo opened this issue Oct 24, 2023 · 4 comments · Fixed by #1324
Labels
bug (Something isn't working) · help wanted (Not immediately prioritised, please help!) · runtime (controller runtime related)

Comments

einarmo commented Oct 24, 2023

Current and expected behavior

I have a controller that uses reconcile_on to model a relationship where my CRD references an external resource (a pod). When that resource is modified, I reconcile the subset of my resource's instances that reference it.

My actual code uses reconcile_on, but the same thing happens with reconcile_all_on, which this example demonstrates:

// Imports added for completeness; `MyCRD`, `ControllerError`, and the
// `Result` alias are assumed to be defined elsewhere in the project.
use std::{sync::Arc, time::Duration};

use futures::StreamExt;
use k8s_openapi::api::core::v1::Pod;
use kube::{
    api::Api,
    runtime::{
        controller::{Action, Config},
        reflector, watcher, Controller, WatchStreamExt,
    },
    Client,
};
use tracing::{info, warn};

#[tokio::main]
async fn main() -> Result<()> {
    println!("Starting controller");
    let client = Client::try_default().await?;

    let things = Api::<MyCRD>::namespaced(client.clone(), "dev");
    let pods = Api::<Pod>::namespaced(client.clone(), "dev");

    // Reflector over pods; its object stream is used below to trigger
    // reconciliations of MyCRD instances.
    let (_, writer) = reflector::store();
    let pods = kube::runtime::reflector(
        writer,
        watcher(pods, watcher::Config::default()),
    );

    let (mut tx, rx) = futures::channel::mpsc::channel(100);

    // Forward every pod change as a trigger for reconcile_all_on below.
    let mut stream = Box::pin(pods.touched_objects());
    tokio::task::spawn(async move {
        while let Some(_evt) = stream.next().await {
            println!("Trigger reconciliation from change to pod");
            let _ = tx.try_send(());
        }
    });

    let config = Config::default().concurrency(2);

    Controller::new(things, watcher::Config::default())
        .with_config(config)
        .reconcile_all_on(rx)
        .shutdown_on_signal()
        .run(
            |j, _| async move {
                println!("Reconcile {:?}", j.metadata.name);
                Ok::<_, ControllerError>(Action::requeue(Duration::from_secs(100)))
            },
            |_, _, _| Action::requeue(Duration::from_secs(10)),
            Arc::new(()),
        )
        .for_each(|res| async move {
            match res {
                Ok(o) => info!("Reconciled {o:?}"),
                Err(e) => warn!("Failed to reconcile: {e}"),
            }
        })
        .await;

    Ok(())
}
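
The MyCRD definition is not shown here; a minimal sketch that would fit this reproduction, with a hypothetical pod_name field as the pod reference, could look like this:

// Hypothetical sketch only; the real MyCRD spec is not shown in the issue.
use kube::CustomResource;
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};

#[derive(CustomResource, Clone, Debug, Deserialize, Serialize, JsonSchema)]
#[kube(group = "example.com", version = "v1", kind = "MyCRD", namespaced)]
pub struct MyCRDSpec {
    // Name of the pod this instance references (illustrative field).
    pub pod_name: String,
}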

I have 3 instances of MyCRD named "inst-1", "inst-2", and "inst-3". I expect roughly the following log output when I start the controller and then modify a pod:

Starting controller
Reconcile Some("inst-1")
Reconcile Some("inst-2")
Reconcile Some("inst-3")
Trigger reconciliation from change to pod
Reconcile Some("inst-1")
Reconcile Some("inst-2")
Reconcile Some("inst-3")

Instead, what I get is

Starting controller
Reconcile Some("inst-1")
Reconcile Some("inst-2")
Reconcile Some("inst-3")
Trigger reconciliation from change to pod
Reconcile Some("inst-2")
Reconcile Some("inst-3")
...

Consistently, one of the instances is skipped. If I set concurrency to 1 instead, two are skipped; removing the concurrency setting entirely, none are skipped.

Is this intentional? If so, what am I doing wrong? I can work around it by implementing the concurrency limit myself with a semaphore in my reconcile method, but this feels like a bug to me.
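
A minimal sketch of that semaphore workaround, assuming the same MyCRD and ControllerError types as above and a hypothetical Ctx context struct passed to .run() in place of Arc::new(()):

use std::{sync::Arc, time::Duration};

use kube::runtime::controller::Action;
use tokio::sync::Semaphore;

// Hypothetical context holding a manual concurrency limit.
struct Ctx {
    permits: Semaphore,
}

async fn reconcile(obj: Arc<MyCRD>, ctx: Arc<Ctx>) -> Result<Action, ControllerError> {
    // Only N reconciliations hold a permit at a time; the rest wait here,
    // instead of relying on Config::concurrency.
    let _permit = ctx.permits.acquire().await.expect("semaphore is never closed");
    println!("Reconcile {:?}", obj.metadata.name);
    Ok(Action::requeue(Duration::from_secs(100)))
}

// Wired up as:
// .run(reconcile, |_, _, _| Action::requeue(Duration::from_secs(10)),
//      Arc::new(Ctx { permits: Semaphore::new(2) }))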

Possible solution

No response

Additional context

No response

Environment

Server Version: v1.27.4
Running minikube

Configuration and features

k8s-openapi = { version = "0.20.0", features = ["latest"] }
kube = { version = "0.86.0", features = ["runtime", "derive", "unstable-runtime"] }

Affected crates

kube-runtime

Would you like to work on fixing this bug?

None

einarmo added the bug label Oct 24, 2023
clux (Member) commented Oct 24, 2023

Thanks for the report.

If true, this is not intentional. At most, you should expect to see deduplication of the objects that are queued to run (i.e. if multiple inst-1 reconciliations were queued while inst-1 was already running, you would only get the last one).

The concurrency option was added relatively recently in https://github.com/kube-rs/kube/pull/1277/files. I don't have time to look at this at the moment, so I'm putting a help wanted label on it.

Might be good to get a failing test case for this setup as a starting point.

clux added the help wanted and runtime labels Oct 24, 2023
clux modified the milestone: 0.87.0 Oct 24, 2023
github-project-automation bot moved this to Defining in Kube Roadmap Oct 24, 2023
clux moved this from Defining to Backlog in Kube Roadmap Oct 24, 2023
einarmo (Author) commented Oct 24, 2023

Alright, thanks. If I find the time I'll see if I can figure out why this is happening, since I'm able to reproduce it.

co42 (Contributor) commented Oct 25, 2023

@einarmo you can try #1324; it should fix this issue.

einarmo (Author) commented Oct 25, 2023

Commented on the PR. I arrived at the same solution when experimenting with this just now, and it does fix the sample I posted above.
