[Bug][master server ] when get exception,dead loop can not get out #4226

Closed
crazycarry opened this issue Dec 15, 2020 · 5 comments
Labels: discussion

Comments

@crazycarry
Contributor

When I upgraded DolphinScheduler to 1.3.3, I found the same bug as before, in the class MasterSchedulerService:


    public void run() {
        logger.info("master scheduler started");
        while (Stopper.isRunning()){
            InterProcessMutex mutex = null;
            try {
                boolean runCheckFlag = OSUtils.checkResource(masterConfig.getMasterMaxCpuloadAvg(), masterConfig.getMasterReservedMemory());
                if(!runCheckFlag) {
                    Thread.sleep(Constants.SLEEP_TIME_MILLIS);
                    continue;
                }
                if (zkMasterClient.getZkClient().getState() == CuratorFrameworkState.STARTED) {

                    mutex = zkMasterClient.blockAcquireMutex();

                    int activeCount = masterExecService.getActiveCount();
                    // make sure to scan and delete command  table in one transaction
                    Command command = processService.findOneCommand();
                    if (command != null) {
                        logger.info("find one command: id: {}, type: {}", command.getId(),command.getCommandType());

                        try{

                            ProcessInstance processInstance = processService.handleCommand(logger,
                                    getLocalAddress(),
                                    this.masterConfig.getMasterExecThreads() - activeCount, command);
                            if (processInstance != null) {
                                logger.info("start master exec thread , split DAG ...");
                                masterExecService.execute(new MasterExecThread(processInstance, processService, nettyRemotingClient));
                            }
                        }catch (Exception e){
                            logger.error("scan command error ", e);
                            processService.moveToErrorCommand(command, e.toString());
                        }
                    } else{
                        //indicate that no command ,sleep for 1s
                        Thread.sleep(Constants.SLEEP_TIME_MILLIS);
                    }
                }
            } catch (Exception e){
                logger.error("master scheduler thread error",e);
            } finally{
                zkMasterClient.releaseMutex(mutex);
            }
        }
    }

When the database returns an error, or some other exception is thrown, the loop has no way to terminate.
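To make the request concrete, here is a minimal sketch of one way a loop like this could give up after repeated failures. It is not DolphinScheduler code: `BoundedRetryLoop`, `Step`, and the failure threshold are hypothetical names introduced only for illustration, and the `maxIterations` guard exists only so the example terminates.

```java
/** Hypothetical sketch, not DolphinScheduler code. */
final class BoundedRetryLoop {

    /** Hypothetical unit of work; stands in for one pass of the scheduler loop. */
    interface Step {
        void run() throws Exception;
    }

    /**
     * Runs the step repeatedly and stops once it has failed
     * maxConsecutiveFailures times in a row, or after maxIterations passes.
     * Returns the number of passes that were executed.
     */
    static int runUntilTooManyFailures(Step step, int maxConsecutiveFailures, int maxIterations) {
        int consecutiveFailures = 0;
        int iterations = 0;
        while (iterations < maxIterations) {
            iterations++;
            try {
                step.run();
                consecutiveFailures = 0; // a successful pass resets the counter
            } catch (Exception e) {
                consecutiveFailures++;
                if (consecutiveFailures >= maxConsecutiveFailures) {
                    break; // give up instead of retrying forever
                }
            }
        }
        return iterations;
    }

    public static void main(String[] args) {
        // A step that always fails, like a database that stays unavailable.
        Step alwaysFails = () -> { throw new IllegalStateException("db unavailable"); };
        int executed = runUntilTooManyFailures(alwaysFails, 3, 1000);
        System.out.println("passes executed before giving up: " + executed);
    }
}
```

Whether giving up is the right behaviour for a master server is a separate question; the sketch only shows what "a way out of the loop" could look like.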

crazycarry added the bug label Dec 15, 2020
@zt-1997
Contributor

zt-1997 commented Dec 25, 2020

I will fix it.

@CalvinKirs
Member

CalvinKirs commented Dec 25, 2020

I don't think the loop should exit here. There is no difference between exiting the loop and shutting down the master server.

CalvinKirs added the discussion label and removed the bug label Dec 25, 2020
@CalvinKirs
Member

Many thanks for your enthusiastic participation. Regarding this question, I don't think it is a bug. When this kind of problem occurs, what we should consider is whether MySQL has a health problem, whether there is a version compatibility problem, and so on. As you can see, there is a sleep time here; the design is intended to act as a circuit breaker. If the loop quit abruptly, that would be the same as shutting down the service.
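As a rough illustration of "pause between failed attempts instead of exiting", the sketch below computes a wait time that grows with the number of consecutive failures. It is not the project's implementation: `BackoffSketch`, `nextDelayMillis`, the doubling rule, and the 30 second cap are all assumptions made for the example; only the 1000 ms base mirrors the "sleep for 1s" comment in the snippet above.

```java
/** Hypothetical sketch, not DolphinScheduler code. */
final class BackoffSketch {

    /**
     * Returns how long to wait before the next attempt:
     * 0 after a success, otherwise baseMillis * 2^(failures - 1), capped at maxMillis.
     */
    static long nextDelayMillis(int consecutiveFailures, long baseMillis, long maxMillis) {
        if (consecutiveFailures <= 0) {
            return 0L;
        }
        // Limit the shift so a long run of failures cannot overflow for small base delays.
        int shift = Math.min(consecutiveFailures - 1, 20);
        return Math.min(baseMillis << shift, maxMillis);
    }

    public static void main(String[] args) {
        for (int failures = 0; failures <= 6; failures++) {
            long delay = nextDelayMillis(failures, 1000L, 30_000L);
            System.out.println(failures + " consecutive failures -> wait " + delay + " ms");
        }
    }
}
```

A loop using this would call `Thread.sleep(nextDelayMillis(...))` in its catch block, so a failing database slows the loop down without stopping the service.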

@CalvinKirs
Member

> I will fix it.

Many thanks for your enthusiastic participation. If you are interested, you can look through #4124 for tasks you would like to take on. If you need any help, you can also leave a message on the issue right away.

@caishunfeng
Contributor

This issue is not a bug. It will be closed because there has been no update for a long time. You can reopen it if needed.
