
fix that task gets inserted multiple times into unfinishedTasks #4

Merged

Conversation

friederici
Contributor

There is a problem with the pod/task termination, so that a task is inserted multiple times into the unfinishedTasks List.

I believe the cause is that onPodTermination() resets tasks that are already FINISHED or FINISHED_WITH_ERRORS back to PROCESSING_FINISHED.

This change fixes the issue in my local test setup.

@Lehmann-Fabian
Member

Lehmann-Fabian commented Oct 25, 2023

Thank you!
That case is definitely not considered. How does this error show up? Is there any warning?
Your solution, however, needs to be safe against race conditions.
Let's add a level to the State and check that the next level is higher.

public enum State {

    RECEIVED_CONFIG(0),
    UNSCHEDULED(1),
    SCHEDULED(2),
    PREPARED(3),
    INIT_WITH_ERRORS(3),
    PROCESSING_FINISHED(4),
    FINISHED(5),
    FINISHED_WITH_ERROR(5),
    ERROR(1000),
    DELETED(Integer.MAX_VALUE);

    public final int level;

    State(int level) {
        this.level = level;
    }

}
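As a quick, standalone illustration of the ordering (this demo class is not part of the scheduler's code), the levels let a guard compare any two states numerically; equal-level states such as FINISHED and FINISHED_WITH_ERROR are peers, and ERROR/DELETED outrank everything else:

```java
public class StateLevelDemo {

    public enum State {
        RECEIVED_CONFIG(0), UNSCHEDULED(1), SCHEDULED(2), PREPARED(3),
        INIT_WITH_ERRORS(3), PROCESSING_FINISHED(4), FINISHED(5),
        FINISHED_WITH_ERROR(5), ERROR(1000), DELETED(Integer.MAX_VALUE);

        public final int level;

        State(int level) {
            this.level = level;
        }
    }

    public static void main(String[] args) {
        // A finished task outranks PROCESSING_FINISHED, so it must not fall back:
        System.out.println(State.FINISHED.level > State.PROCESSING_FINISHED.level); // true
        // Both terminal outcomes sit on the same level:
        System.out.println(State.FINISHED.level == State.FINISHED_WITH_ERROR.level); // true
        // ERROR and DELETED can always be reached from a regular state:
        System.out.println(State.ERROR.level > State.FINISHED.level); // true
    }
}
```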

And in the Scheduler adjust only this method:

    Task changeStateOfTask(Pod pod, State state){
        Task t = getTaskByPod( pod );
        if( t != null ){
            synchronized ( t.getState() ){
                if( t.getState().getState().level > state.level ){
                    t.getState().setState( state );
                    return t;
                } else {
                    log.warn( "Task {} was already in state {} and cannot be changed to {}", t.getConfig().getRunName(), t.getState().getState(), state );
                    return null;
                }
            }
        }
        return null;
    }

Can you please confirm whether this meets your needs?

@friederici
Contributor Author

friederici commented Oct 26, 2023

That case is definitely not considered. How does this error show up? Is there any warning?

It was not noticeable in this version of KubernetesScheduler, because no important actions were taken after tasks finished. However, in my fork I want to collect tasks' resource usage after they finish. So I put my hook into taskWasFinished() and found that it was called around four times per task, which spoiled my statistics. That is why I had to hunt down the cause.

Can you please confirm whether this meets your needs?

The comparison t.getState().getState().level > state.level must be less-than instead, but otherwise that does the trick.

I've updated the PR to reflect your proposed solution.
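With the comparison flipped to less-than, the guarded transition can be exercised in isolation roughly as follows. This is a minimal sketch; TaskState and the method names below are simplified stand-ins, not the scheduler's actual classes:

```java
public class StateGuardDemo {

    enum State {
        RECEIVED_CONFIG(0), UNSCHEDULED(1), SCHEDULED(2), PREPARED(3),
        INIT_WITH_ERRORS(3), PROCESSING_FINISHED(4), FINISHED(5),
        FINISHED_WITH_ERROR(5), ERROR(1000), DELETED(Integer.MAX_VALUE);

        final int level;

        State(int level) { this.level = level; }
    }

    static class TaskState {
        private State state = State.RECEIVED_CONFIG;

        // Accepts the new state only if its level is strictly higher,
        // so a task can never move backwards (e.g. FINISHED -> PROCESSING_FINISHED).
        synchronized boolean advance(State next) {
            if (state.level < next.level) {
                state = next;
                return true;
            }
            return false;
        }

        synchronized State get() { return state; }
    }

    public static void main(String[] args) {
        TaskState ts = new TaskState();
        System.out.println(ts.advance(State.PROCESSING_FINISHED)); // true
        System.out.println(ts.advance(State.FINISHED));            // true
        // A late pod-termination event trying to reset the task is rejected:
        System.out.println(ts.advance(State.PROCESSING_FINISHED)); // false
        System.out.println(ts.get());                              // FINISHED
    }
}
```

Because advance() is synchronized and the check and assignment happen atomically, two pod-termination callbacks racing on the same task cannot both succeed in downgrading it, which is the race-condition safety discussed above.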

@Lehmann-Fabian
Member

Great. As you spotted this problem, the commit should be fully attributed to you.

Please change the log level from warning to debug; four warnings per task is way too much.

@friederici
Contributor Author

I've updated the PR accordingly.

@Lehmann-Fabian Lehmann-Fabian merged commit f968cb5 into CommonWorkflowScheduler:master Oct 26, 2023
@Lehmann-Fabian
Member

Merged. Thank you!
