
Controller concurrency appears to cause reconciliations to be skipped #1322

Closed
einarmo opened this issue Oct 24, 2023 · 4 comments · Fixed by #1324
Labels
bug (Something isn't working) · help wanted (Not immediately prioritised, please help!) · runtime (controller runtime related)

Comments

einarmo commented Oct 24, 2023

Current and expected behavior

I have a controller that uses reconcile_on to model a relationship where my CRD references an external resource (a pod). When that resource is modified, I reconcile the subset of my resource's instances that reference it.

My actual code uses reconcile_on, but the same thing happens with reconcile_all_on, which this example demonstrates:

// Imports added for completeness; `MyCRD`, `ControllerError`, and the
// `Result` alias are assumed to be defined elsewhere in the project.
use std::{sync::Arc, time::Duration};

use futures::StreamExt;
use k8s_openapi::api::core::v1::Pod;
use kube::{
    api::Api,
    runtime::{
        controller::{Action, Config},
        reflector, watcher, Controller, WatchStreamExt,
    },
    Client,
};
use tracing::{info, warn};

#[tokio::main]
async fn main() -> Result<()> {
    println!("Starting controller");
    let client = Client::try_default().await?;

    let things = Api::<MyCRD>::namespaced(client.clone(), "dev");
    let pods = Api::<Pod>::namespaced(client.clone(), "dev");

    // Reflector over pods; its object stream is used below to trigger
    // reconciliations of MyCRD instances.
    let (_, writer) = reflector::store();
    let pods = kube::runtime::reflector(
        writer,
        watcher(pods, watcher::Config::default()),
    );

    let (mut tx, rx) = futures::channel::mpsc::channel(100);

    // Forward every pod change as a trigger for reconcile_all_on below.
    let mut stream = Box::pin(pods.touched_objects());
    tokio::task::spawn(async move {
        while let Some(_evt) = stream.next().await {
            println!("Trigger reconciliation from change to pod");
            let _ = tx.try_send(());
        }
    });

    let config = Config::default().concurrency(2);

    Controller::new(things, watcher::Config::default())
        .with_config(config)
        .reconcile_all_on(rx)
        .shutdown_on_signal()
        .run(
            |j, _| async move {
                println!("Reconcile {:?}", j.metadata.name);
                Ok::<_, ControllerError>(Action::requeue(Duration::from_secs(100)))
            },
            |_, _, _| Action::requeue(Duration::from_secs(10)),
            Arc::new(()),
        )
        .for_each(|res| async move {
            match res {
                Ok(o) => info!("Reconciled {o:?}"),
                Err(e) => warn!("Failed to reconcile: {e}"),
            }
        })
        .await;

    Ok(())
}
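
The MyCRD definition is not shown here; a minimal sketch that would fit this reproduction, with a hypothetical pod_name field as the pod reference, could look like this:

// Hypothetical sketch only; the real MyCRD spec is not shown in the issue.
use kube::CustomResource;
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};

#[derive(CustomResource, Clone, Debug, Deserialize, Serialize, JsonSchema)]
#[kube(group = "example.com", version = "v1", kind = "MyCRD", namespaced)]
pub struct MyCRDSpec {
    // Name of the pod this instance references (illustrative field).
    pub pod_name: String,
}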

I have 3 instances of MyCRD named "inst-1", "inst-2", and "inst-3". I expect roughly the following log output when I start the controller and then modify a pod:

Starting controller
Reconcile Some("inst-1")
Reconcile Some("inst-2")
Reconcile Some("inst-3")
Trigger reconciliation from change to pod
Reconcile Some("inst-1")
Reconcile Some("inst-2")
Reconcile Some("inst-3")

Instead, what I get is

Starting controller
Reconcile Some("inst-1")
Reconcile Some("inst-2")
Reconcile Some("inst-3")
Trigger reconciliation from change to pod
Reconcile Some("inst-2")
Reconcile Some("inst-3")
...

Consistently, one of the instances is skipped. If I set concurrency to 1 instead, two are skipped; removing the concurrency setting entirely, none are skipped.

Is this intentional? If so, what am I doing wrong? I can work around it by implementing the concurrency limit myself with a semaphore in my reconcile method, but this feels like a bug to me.
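
A minimal sketch of that semaphore workaround, assuming the same MyCRD and ControllerError types as above and a hypothetical Ctx context struct passed to .run() in place of Arc::new(()):

use std::{sync::Arc, time::Duration};

use kube::runtime::controller::Action;
use tokio::sync::Semaphore;

// Hypothetical context holding a manual concurrency limit.
struct Ctx {
    permits: Semaphore,
}

async fn reconcile(obj: Arc<MyCRD>, ctx: Arc<Ctx>) -> Result<Action, ControllerError> {
    // Only N reconciliations hold a permit at a time; the rest wait here,
    // instead of relying on Config::concurrency.
    let _permit = ctx.permits.acquire().await.expect("semaphore is never closed");
    println!("Reconcile {:?}", obj.metadata.name);
    Ok(Action::requeue(Duration::from_secs(100)))
}

// Wired up as:
// .run(reconcile, |_, _, _| Action::requeue(Duration::from_secs(10)),
//      Arc::new(Ctx { permits: Semaphore::new(2) }))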

Possible solution

No response

Additional context

No response

Environment

Server Version: v1.27.4
Running minikube

Configuration and features

k8s-openapi = { version = "0.20.0", features = ["latest"] }
kube = { version = "0.86.0", features = ["runtime", "derive", "unstable-runtime"] }

Affected crates

kube-runtime

Would you like to work on fixing this bug?

None

einarmo added the bug label Oct 24, 2023
clux (Member) commented Oct 24, 2023

Thanks for the report.

If true, this is not intentional. At most, you should expect to see deduplication of the objects that are queued to run (i.e. if multiple inst-1 reconciliations were queued while inst-1 was already running, you would only get the last one).

The concurrency option was added relatively recently in https://github.com/kube-rs/kube/pull/1277/files. I don't have time to look at this at the moment, so I'm putting a help wanted label on it.

Might be good to get a failing test case for this setup as a starting point.

clux added the help wanted and runtime labels Oct 24, 2023
clux modified the milestone: 0.87.0 Oct 24, 2023
github-project-automation bot moved this to Defining in Kube Roadmap Oct 24, 2023
clux moved this from Defining to Backlog in Kube Roadmap Oct 24, 2023
einarmo (Author) commented Oct 24, 2023

Alright, thanks. If I find the time I'll see if I can figure out why this is happening, since I'm able to reproduce it.

co42 (Contributor) commented Oct 25, 2023

@einarmo you can try #1324; it should fix this issue.

einarmo (Author) commented Oct 25, 2023

Commented on the PR. I arrived at the same solution when experimenting with this just now, and it does fix the sample I posted above.
