[1.7.0] CircleCI API 404 response kills the job #79

Closed
davet1985 opened this issue Apr 27, 2022 · 16 comments
Labels
bug Something isn't working question Further information is requested

Comments

@davet1985

Orb version

1.7.0

What happened

The job exited unexpectedly upon receiving a 404 from the CircleCI API when making a call to get a workflow.

Full job logs

#!/bin/bash -eo pipefail
tag_pattern=""

# If a pattern is wrapped with slashes, remove them.
if [[ "$tag_pattern" == /*/ ]]; then
  tag_pattern=${tag_pattern:1:-1}
fi

fetch(){
  echo "DEBUG: Making API Call to ${1}"
  url=$1
  target=$2
  http_response=$(curl -f -s -X GET -H "Circle-Token:${CIRCLECI_API_KEY}" -o "${target}" -w "%{http_code}" "${url}")
  if [ $http_response != "200" ]; then
      echo "ERROR: Server returned error code: $http_response"
      cat ${target}
      exit 1
  else
      echo "DEBUG: API Success"
  fi
}

load_variables(){
  # just confirm our required variables are present
  : ${CIRCLE_BUILD_NUM:?"Required Env Variable not found!"}
  : ${CIRCLE_PROJECT_USERNAME:?"Required Env Variable not found!"}
  : ${CIRCLE_PROJECT_REPONAME:?"Required Env Variable not found!"}
  : ${CIRCLE_REPOSITORY_URL:?"Required Env Variable not found!"}
  : ${CIRCLE_JOB:?"Required Env Variable not found!"}
  # Only needed for private projects
  if [ -z "${CIRCLECI_API_KEY}" ]; then
    echo "CIRCLECI_API_KEY not set. Private projects will be inaccessible."
  else
    fetch "https://circleci.com/api/v2/me" "/tmp/me.cci"
    me=$(jq -e '.id' /tmp/me.cci)
    echo "Using API key for user: ${me}"
  fi
  VCS_TYPE="github"
}


fetch_filtered_active_builds(){
  if [ "false" != "true" ];then
    echo "Orb parameter 'consider-branch' is false, will block previous builds on any branch." 
    jobs_api_url_template="https://circleci.com/api/v1.1/project/${VCS_TYPE}/${CIRCLE_PROJECT_USERNAME}/${CIRCLE_PROJECT_REPONAME}?filter=running"
  elif [ -n "${CIRCLE_TAG:x}" ] && [ "$tag_pattern" != "" ]; then
    # I'm not sure why this is here, seems identical to above?
    echo "CIRCLE_TAG and orb parameter tag-pattern is set, fetch active builds"
    jobs_api_url_template="https://circleci.com/api/v1.1/project/${VCS_TYPE}/${CIRCLE_PROJECT_USERNAME}/${CIRCLE_PROJECT_REPONAME}?filter=running"
  else
    : ${CIRCLE_BRANCH:?"Required Env Variable not found!"}
    echo "Only blocking execution if running previous jobs on branch: ${CIRCLE_BRANCH}"
    jobs_api_url_template="https://circleci.com/api/v1.1/project/${VCS_TYPE}/${CIRCLE_PROJECT_USERNAME}/${CIRCLE_PROJECT_REPONAME}/tree/${CIRCLE_BRANCH}?filter=running"
  fi

  if [ ! -z $TESTING_MOCK_RESPONSE ] && [ -f $TESTING_MOCK_RESPONSE ];then
    echo "Using test mock response"
    cat $TESTING_MOCK_RESPONSE > /tmp/jobstatus.json
  else
    echo "Attempting to access CircleCI api. If the build process fails after this step, ensure your CIRCLECI_API_KEY  is set."
    fetch "$jobs_api_url_template" "/tmp/jobstatus.json"
    if [ -n "${CIRCLE_TAG:x}" ] && [ "$tag_pattern" != "" ]; then
      jq "[ .[] | select((.build_num | . == \"${CIRCLE_BUILD_NUM}\") or (.vcs_tag | (. != null and test(\"${tag_pattern}\"))) ) ]" /tmp/jobstatus.json >/tmp/jobstatus_tag.json
      mv /tmp/jobstatus_tag.json /tmp/jobstatus.json
    fi
    echo "API access successful"
  fi
}

fetch_active_workflows(){
  cp /tmp/jobstatus.json /tmp/augmented_jobstatus.json
  for workflow in `jq -r ".[] | .workflows.workflow_id //empty" /tmp/augmented_jobstatus.json | uniq`
  do
    echo "Checking time of workflow: ${workflow}"
    workflow_file=/tmp/workflow-${workflow}.json
    if [ ! -z $TESTING_MOCK_WORKFLOW_RESPONSES ] && [ -f $TESTING_MOCK_WORKFLOW_RESPONSES/${workflow}.json ]; then
      echo "Using test mock workflow response"
      cat $TESTING_MOCK_WORKFLOW_RESPONSES/${workflow}.json > ${workflow_file}
    else
      fetch "https://circleci.com/api/v2/workflow/${workflow}" "${workflow_file}"
    fi
    created_at=`jq -r '.created_at' ${workflow_file}`
    echo "Workflow was created at: ${created_at}"
    cat /tmp/augmented_jobstatus.json | jq --arg created_at "${created_at}" --arg workflow "${workflow}" '(.[] | select(.workflows.workflow_id == $workflow) | .workflows) |= . + {created_at:$created_at}' > /tmp/augmented_jobstatus-${workflow}.json
    #DEBUG echo "new augmented_jobstatus:"
    #DEBUG cat /tmp/augmented_jobstatus-${workflow}.json
    mv /tmp/augmented_jobstatus-${workflow}.json /tmp/augmented_jobstatus.json
  done
}

update_comparables(){     
  fetch_filtered_active_builds

  fetch_active_workflows

  load_current_workflow_values
  
  JOB_NAME="${CIRCLE_JOB}"
  if [ "^validate-controllers$" ] ;then
    JOB_NAME="^validate-controllers$"
  fi

  # falsey parameters are empty strings, so always compare against 'true' 
  if [ "false" = "true" ] ;then
    echo "Orb parameter block-workflow is true."
    echo "This job will block until no previous workflows have *any* jobs running."
    oldest_running_build_num=`jq 'sort_by(.workflows.created_at)| .[0].build_num' /tmp/augmented_jobstatus.json`
    oldest_commit_time=`jq 'sort_by(.workflows.created_at)| .[0].workflows.created_at' /tmp/augmented_jobstatus.json`
  else
    echo "Orb parameter block-workflow is false."
    echo "Only blocking execution if running previous jobs matching this job: ${JOB_NAME}"
    oldest_running_build_num=`jq ". | map(select(.workflows.job_name | test(\"${JOB_NAME}\";\"sx\"))) | sort_by(.workflows.created_at)|  .[0].build_num" /tmp/augmented_jobstatus.json`
    oldest_commit_time=`jq ". | map(select(.workflows.job_name | test(\"${JOB_NAME}\";\"sx\"))) | sort_by(.workflows.created_at)|  .[0].workflows.created_at" /tmp/augmented_jobstatus.json`
  fi
  if [ -z "$oldest_commit_time" ]; then
    echo "API Error - unable to load previous job timings. Report to developer."
    exit 1
  fi
  echo "Oldest job: $oldest_running_build_num"
  if [ -z $oldest_commit_time ];then
    echo "API Call for existing jobs failed, failing this build.  Please check API token"
    echo "All running jobs:"
    cat /tmp/jobstatus.json || exit 0
    echo "All running jobs with created_at:"
    cat /tmp/augmented_jobstatus.json || exit 0
    echo "All worfklow details."
    cat /tmp/workflow-*.json
    exit 1
  fi
}

load_current_workflow_values(){
   my_commit_time=`jq '.[] | select( .build_num == '"${CIRCLE_BUILD_NUM}"').workflows.created_at' /tmp/augmented_jobstatus.json`
}

cancel_current_build(){
  echo "Cancelleing build ${CIRCLE_BUILD_NUM}"
  cancel_api_url_template="https://circleci.com/api/v1.1/project/${VCS_TYPE}/${CIRCLE_PROJECT_USERNAME}/${CIRCLE_PROJECT_REPONAME}/${CIRCLE_BUILD_NUM}/cancel?circle-token=${CIRCLECI_API_KEY}"
  curl -s -X POST $cancel_api_url_template > /dev/null
}



#
# We can skip a few use cases without calling API
#
if [ ! -z "$CIRCLE_PR_REPONAME" ]; then
  echo "Queueing on forks is not supported. Skipping queue..."
  # It's important that we not fail here because it could cause issues on the main repo's branch
  exit 0
fi
if [ "*" = "*" ] || [ "*" = "${CIRCLE_BRANCH}" ]; then
  echo "${CIRCLE_BRANCH} queueable"
else
  echo "Queueing only happens on * branch, skipping queue"
  exit 0
fi

#
# Set values that wont change while we wait
# 
load_variables
max_time=20
echo "This build will block until all previous builds complete."
echo "Max Queue Time: ${max_time} minutes."
wait_time=0
loop_time=11
max_time_seconds=$((max_time * 60))

#
# Queue Loop
#
confidence=0
while true; do
  update_comparables
  echo "This Workflow Timestamp: $my_commit_time"
  echo "Oldest Workflow Timestamp: $oldest_commit_time"
  if [[ ! -z "$my_commit_time" ]] && [[ "$oldest_commit_time" > "$my_commit_time" || "$oldest_commit_time" = "$my_commit_time" ]] ; then
    # API returns Y-M-D HH:MM (with 24 hour clock) so alphabetical string compare is accurate to timestamp compare as well
    # recent-jobs API does not include pending, so it is posisble we queried in between a workfow transition, and we;re NOT really front of line.
    if [ $confidence -lt 1 ];then
      # To grow confidence, we check again with a delay.
      confidence=$((confidence+1))
      echo "API shows no previous jobs/workflows, but it is possible a previous workflow has pending jobs not yet visible in API."
      echo "Rerunning check ${confidence}/1"
    else
      echo "Front of the line, WooHoo!, Build continuing"
      break
    fi
  else
    # If we fail, reset confidence
    confidence=0
    echo "This build (${CIRCLE_BUILD_NUM}) is queued, waiting for build number (${oldest_running_build_num}) to complete."
    echo "Total Queue time: ${wait_time} seconds."
  fi

  if [ $wait_time -ge $max_time_seconds ]; then
    echo "Max wait time exceeded, considering response."
    if [ "false" == "true" ];then
      echo "Orb parameter dont-quit is set to true, letting this job proceed!"
      exit 0
    else
      cancel_current_build
      sleep 10 # wait for API to cancel this job, rather than showing as failure
      exit 1 # but just in case, fail job
    fi
  fi

  sleep $loop_time
  wait_time=$(( loop_time + wait_time ))
done

wf-963-cluster-autoscaler-tag-fix queueable
DEBUG: Making API Call to https://circleci.com/api/v2/me
DEBUG: API Success
Using API key for user: "2eddcc82-ce3a-478e-bf5c-a2f9fe456784"
This build will block until all previous builds complete.
Max Queue Time: 20 minutes.
Orb parameter 'consider-branch' is false, will block previous builds on any branch.
Attempting to access CircleCI api. If the build process fails after this step, ensure your CIRCLECI_API_KEY  is set.
DEBUG: Making API Call to https://circleci.com/api/v1.1/project/github/appvia/wayfinder?filter=running
DEBUG: API Success
API access successful
Checking time of workflow: 6fe3a27b-8e02-4cdb-845d-1de09c2cced1
DEBUG: Making API Call to https://circleci.com/api/v2/workflow/6fe3a27b-8e02-4cdb-845d-1de09c2cced1
DEBUG: API Success
Workflow was created at: 2022-04-27T11:39:52Z
Checking time of workflow: 5e49bff4-b61c-48f4-8fd2-8ec7da55769d
DEBUG: Making API Call to https://circleci.com/api/v2/workflow/5e49bff4-b61c-48f4-8fd2-8ec7da55769d
DEBUG: API Success
Workflow was created at: 2022-04-27T11:40:24Z
Checking time of workflow: 6fe3a27b-8e02-4cdb-845d-1de09c2cced1
DEBUG: Making API Call to https://circleci.com/api/v2/workflow/6fe3a27b-8e02-4cdb-845d-1de09c2cced1
DEBUG: API Success
Workflow was created at: 2022-04-27T11:39:52Z
Checking time of workflow: bf767bb5-f023-45c9-9432-29e8ac9088c6
DEBUG: Making API Call to https://circleci.com/api/v2/workflow/bf767bb5-f023-45c9-9432-29e8ac9088c6
DEBUG: API Success
Workflow was created at: 2022-04-27T11:33:35Z
Checking time of workflow: 63242a50-754b-4d33-9c80-c7cc979aa6d3
DEBUG: Making API Call to https://circleci.com/api/v2/workflow/63242a50-754b-4d33-9c80-c7cc979aa6d3
DEBUG: API Success
Workflow was created at: 2022-04-27T11:10:46Z
Checking time of workflow: 013b74f9-6aea-4c77-a0c9-5142b78514c4
DEBUG: Making API Call to https://circleci.com/api/v2/workflow/013b74f9-6aea-4c77-a0c9-5142b78514c4
DEBUG: API Success
Workflow was created at: 2022-04-21T19:30:12Z
Checking time of workflow: f7cc0a81-2c84-46a2-8a0c-3d3fbd6cd87b
DEBUG: Making API Call to https://circleci.com/api/v2/workflow/f7cc0a81-2c84-46a2-8a0c-3d3fbd6cd87b

Exited with code exit status 22
CircleCI received exit code 22

Expected behavior

If the workflow is not found, it should be ignored and the process should continue onto the next workflow to check.
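The exit code 22 in the log above is consistent with how `curl -f` and `bash -e` interact: `curl -f` exits with status 22 on an HTTP error such as a 404, and under `set -e` a failing command substitution in a plain assignment aborts the script before the following `if` can inspect the status code. A minimal sketch of that interaction, where `fake_curl` is a hypothetical stand-in for the real `curl -f` call so that nothing touches the network:

```shell
#!/bin/bash
# fake_curl (hypothetical) imitates `curl -f ... -w "%{http_code}"` on a 404:
# it prints the status code and returns 22, as curl -f does on HTTP errors.

# Strict variant: same shape as the orb's fetch().
strict_script='
fake_curl() { printf 404; return 22; }
http_response=$(fake_curl)
echo "status check reached: ${http_response}"
'

# Tolerant variant: `|| true` stops the failed assignment from tripping -e,
# so the status code can still be inspected afterwards.
tolerant_script='
fake_curl() { printf 404; return 22; }
http_response=$(fake_curl) || true
echo "status check reached: ${http_response}"
'

# Run each variant in its own bash process with the same flags as the job.
strict_rc=0
strict_out=$(bash -eo pipefail -c "$strict_script") || strict_rc=$?
tolerant_rc=0
tolerant_out=$(bash -eo pipefail -c "$tolerant_script") || tolerant_rc=$?

echo "strict:   rc=${strict_rc} out='${strict_out}'"
echo "tolerant: rc=${tolerant_rc} out='${tolerant_out}'"
```

In the strict variant the `echo` is never reached, which matches the job log stopping right after the "Making API Call" line.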

@lorenzocadamuro

lorenzocadamuro commented Apr 27, 2022

I have the same problem 👆

@mrsheepuk

Same problem here...

@davet1985
Author

If you're looking for a workaround, you can pull the script directly into your CircleCI config.yml, with the updates made in #80

For example:

  job-requiring-queue:
    # add all the parameters and set the defaults as required
    parameters:
      consider-branch:
        type: boolean
        default: false
        description: "Should we only consider jobs running on the same branch?"
      block-workflow:
        type: boolean
        # this is false at COMMAND level as intention is to only block CURRENT job.
        default: false
        description: "If true, this job will block until no other workflows with an earlier timestamp are running. Typically used as first job."
      time:
        type: string
        default: "20"
        description: "How many minutes to wait before giving up."
      dont-quit:
        type: boolean
        default: false
        description: "Quitting is for losers. Force job through once time expires instead of failing."
      only-on-branch:
        type: string
        default: "*"
        description: "Only queue on specified branch"
      vcs-type:
        type: string
        default: "github"
        description: "Override VCS to 'bitbucket' if needed."
      confidence:
        type: string
        default: "1"
        description: "Due to scarce API, we need to requery the recent jobs list to ensure we're not just in a pending state for previous jobs.  This number indicates the threhold for API returning no previous pending jobs. Default is a single confirmation."
      circleci-api-key:
        type: env_var_name
        default: CIRCLECI_API_KEY
        description: "In case you use a different Environment Variable Name than CIRCLECI_API_KEY, supply it here."
      tag-pattern:
        type: string
        default: ""
        description: "Set to queue jobs using a regex pattern f.ex '^v[0-9]+\\.[0-9]+\\.[0-9]+$' to filter CIRCLECI_TAG"
      job-regex:
        type: string
        default: ""
        description: "Allow multiple job names to be blocked until front of line f.ex '^runTests*'"
    steps:
      - checkout
      # run block including the modified queue script
      - run:
          name: Queue Until Front of Line
          command: |
            tag_pattern="<<parameters.tag-pattern>>"

            # If a pattern is wrapped with slashes, remove them.
            if [[ "$tag_pattern" == /*/ ]]; then
              tag_pattern=${tag_pattern:1:-1}
            fi

            fetch(){
              echo "DEBUG: Making API Call to ${1}"
              url=$1
              target=$2
              http_response=$(curl -s -X GET -H "Circle-Token:${<< parameters.circleci-api-key >>}" -o "${target}" -w "%{http_code}" "${url}")
              if [ "$http_response" = "404" ]; then
                # Not found: report it and let the caller decide how to handle the missing resource.
                echo "DEBUG: API Not found"
              elif [ "$http_response" != "200" ]; then
                echo "ERROR: Server returned error code: $http_response"
                cat "${target}"
                exit 1
              else
                echo "DEBUG: API Success"
              fi
            }

            load_variables(){
              # just confirm our required variables are present
              : ${CIRCLE_BUILD_NUM:?"Required Env Variable not found!"}
              : ${CIRCLE_PROJECT_USERNAME:?"Required Env Variable not found!"}
              : ${CIRCLE_PROJECT_REPONAME:?"Required Env Variable not found!"}
              : ${CIRCLE_REPOSITORY_URL:?"Required Env Variable not found!"}
              : ${CIRCLE_JOB:?"Required Env Variable not found!"}
              # Only needed for private projects
              if [ -z "${<< parameters.circleci-api-key >>}" ]; then
                echo "<< parameters.circleci-api-key >> not set. Private projects will be inaccessible."
              else
                fetch "https://circleci.com/api/v2/me" "/tmp/me.cci"
                me=$(jq -e '.id' /tmp/me.cci)
                echo "Using API key for user: ${me}"
              fi
              VCS_TYPE="<<parameters.vcs-type>>"
            }


            fetch_filtered_active_builds(){
              if [ "<<parameters.consider-branch>>" != "true" ];then
                echo "Orb parameter 'consider-branch' is false, will block previous builds on any branch." 
                jobs_api_url_template="https://circleci.com/api/v1.1/project/${VCS_TYPE}/${CIRCLE_PROJECT_USERNAME}/${CIRCLE_PROJECT_REPONAME}?filter=running"
              elif [ -n "${CIRCLE_TAG:x}" ] && [ "$tag_pattern" != "" ]; then
                # I'm not sure why this is here, seems identical to above?
                echo "CIRCLE_TAG and orb parameter tag-pattern is set, fetch active builds"
                jobs_api_url_template="https://circleci.com/api/v1.1/project/${VCS_TYPE}/${CIRCLE_PROJECT_USERNAME}/${CIRCLE_PROJECT_REPONAME}?filter=running"
              else
                : ${CIRCLE_BRANCH:?"Required Env Variable not found!"}
                echo "Only blocking execution if running previous jobs on branch: ${CIRCLE_BRANCH}"
                jobs_api_url_template="https://circleci.com/api/v1.1/project/${VCS_TYPE}/${CIRCLE_PROJECT_USERNAME}/${CIRCLE_PROJECT_REPONAME}/tree/${CIRCLE_BRANCH}?filter=running"
              fi

              if [ ! -z $TESTING_MOCK_RESPONSE ] && [ -f $TESTING_MOCK_RESPONSE ];then
                echo "Using test mock response"
                cat $TESTING_MOCK_RESPONSE > /tmp/jobstatus.json
              else
                echo "Attempting to access CircleCI api. If the build process fails after this step, ensure your << parameters.circleci-api-key >>  is set."
                fetch "$jobs_api_url_template" "/tmp/jobstatus.json"
                if [ -n "${CIRCLE_TAG:x}" ] && [ "$tag_pattern" != "" ]; then
                  jq "[ .[] | select((.build_num | . == \"${CIRCLE_BUILD_NUM}\") or (.vcs_tag | (. != null and test(\"${tag_pattern}\"))) ) ]" /tmp/jobstatus.json >/tmp/jobstatus_tag.json
                  mv /tmp/jobstatus_tag.json /tmp/jobstatus.json
                fi
                echo "API access successful"
              fi
            }

            fetch_active_workflows(){
              cp /tmp/jobstatus.json /tmp/augmented_jobstatus.json
              for workflow in `jq -r ".[] | .workflows.workflow_id //empty" /tmp/augmented_jobstatus.json | uniq`
              do
                echo "Checking time of workflow: ${workflow}"
                workflow_file=/tmp/workflow-${workflow}.json
                if [ ! -z $TESTING_MOCK_WORKFLOW_RESPONSES ] && [ -f $TESTING_MOCK_WORKFLOW_RESPONSES/${workflow}.json ]; then
                  echo "Using test mock workflow response"
                  cat $TESTING_MOCK_WORKFLOW_RESPONSES/${workflow}.json > ${workflow_file}
                else
                  fetch "https://circleci.com/api/v2/workflow/${workflow}" "${workflow_file}"
                fi
                created_at=`jq -r '.created_at' ${workflow_file}`
                if [ -n "$created_at" ] && [ "$created_at" != "null" ]; then
                  echo "Workflow was created at: ${created_at}"
                  cat /tmp/augmented_jobstatus.json | jq --arg created_at "${created_at}" --arg workflow "${workflow}" '(.[] | select(.workflows.workflow_id == $workflow) | .workflows) |= . + {created_at:$created_at}' > /tmp/augmented_jobstatus-${workflow}.json
                  #DEBUG echo "new augmented_jobstatus:"
                  #DEBUG cat /tmp/augmented_jobstatus-${workflow}.json
                  mv /tmp/augmented_jobstatus-${workflow}.json /tmp/augmented_jobstatus.json
                else
                  echo "Workflow not found: ${workflow}"
                fi
              done
            }

            update_comparables(){     
              fetch_filtered_active_builds

              fetch_active_workflows

              load_current_workflow_values
              
              JOB_NAME="${CIRCLE_JOB}"
              if [ "<<parameters.job-regex>>" ] ;then
                JOB_NAME="<<parameters.job-regex>>"
              fi

              # falsey parameters are empty strings, so always compare against 'true' 
              if [ "<<parameters.block-workflow>>" = "true" ] ;then
                echo "Orb parameter block-workflow is true."
                echo "This job will block until no previous workflows have *any* jobs running."
                oldest_running_build_num=`jq 'sort_by(.workflows.created_at)| .[0].build_num' /tmp/augmented_jobstatus.json`
                oldest_commit_time=`jq 'sort_by(.workflows.created_at)| .[0].workflows.created_at' /tmp/augmented_jobstatus.json`
              else
                echo "Orb parameter block-workflow is false."
                echo "Only blocking execution if running previous jobs matching this job: ${JOB_NAME}"
                oldest_running_build_num=`jq ". | map(select(.workflows.job_name | test(\"${JOB_NAME}\";\"sx\"))) | sort_by(.workflows.created_at)|  .[0].build_num" /tmp/augmented_jobstatus.json`
                oldest_commit_time=`jq ". | map(select(.workflows.job_name | test(\"${JOB_NAME}\";\"sx\"))) | sort_by(.workflows.created_at)|  .[0].workflows.created_at" /tmp/augmented_jobstatus.json`
              fi
              if [ -z "$oldest_commit_time" ]; then
                echo "API Error - unable to load previous job timings. Report to developer."
                exit 1
              fi
              echo "Oldest job: $oldest_running_build_num"
              if [ -z $oldest_commit_time ];then
                echo "API Call for existing jobs failed, failing this build.  Please check API token"
                echo "All running jobs:"
                cat /tmp/jobstatus.json || exit 0
                echo "All running jobs with created_at:"
                cat /tmp/augmented_jobstatus.json || exit 0
                echo "All workflow details."
                cat /tmp/workflow-*.json
                exit 1
              fi
            }

            load_current_workflow_values(){
              my_commit_time=`jq '.[] | select( .build_num == '"${CIRCLE_BUILD_NUM}"').workflows.created_at' /tmp/augmented_jobstatus.json`
            }

            cancel_current_build(){
              echo "Cancelling build ${CIRCLE_BUILD_NUM}"
              cancel_api_url_template="https://circleci.com/api/v1.1/project/${VCS_TYPE}/${CIRCLE_PROJECT_USERNAME}/${CIRCLE_PROJECT_REPONAME}/${CIRCLE_BUILD_NUM}/cancel?circle-token=${<< parameters.circleci-api-key >>}"
              curl -s -X POST "$cancel_api_url_template" > /dev/null
            }



            #
            # We can skip a few use cases without calling API
            #
            if [ ! -z "$CIRCLE_PR_REPONAME" ]; then
              echo "Queueing on forks is not supported. Skipping queue..."
              # It's important that we not fail here because it could cause issues on the main repo's branch
              exit 0
            fi
            if [ "<<parameters.only-on-branch>>" = "*" ] || [ "<<parameters.only-on-branch>>" = "${CIRCLE_BRANCH}" ]; then
              echo "${CIRCLE_BRANCH} queueable"
            else
              echo "Queueing only happens on <<parameters.only-on-branch>> branch, skipping queue"
              exit 0
            fi

            #
            # Set values that wont change while we wait
            # 
            load_variables
            max_time=<<parameters.time>>
            echo "This build will block until all previous builds complete."
            echo "Max Queue Time: ${max_time} minutes."
            wait_time=0
            loop_time=11
            max_time_seconds=$((max_time * 60))

            #
            # Queue Loop
            #
            confidence=0
            while true; do
              update_comparables
              echo "This Workflow Timestamp: $my_commit_time"
              echo "Oldest Workflow Timestamp: $oldest_commit_time"
              if [[ ! -z "$my_commit_time" ]] && [[ "$oldest_commit_time" > "$my_commit_time" || "$oldest_commit_time" = "$my_commit_time" ]] ; then
                # API returns Y-M-D HH:MM (with 24 hour clock) so alphabetical string compare is accurate to timestamp compare as well
                # recent-jobs API does not include pending, so it is possible we queried in between a workflow transition, and we're NOT really front of line.
                if [ $confidence -lt <<parameters.confidence>> ];then
                  # To grow confidence, we check again with a delay.
                  confidence=$((confidence+1))
                  echo "API shows no previous jobs/workflows, but it is possible a previous workflow has pending jobs not yet visible in API."
                  echo "Rerunning check ${confidence}/<<parameters.confidence>>"
                else
                  echo "Front of the line, WooHoo!, Build continuing"
                  break
                fi
              else
                # If we fail, reset confidence
                confidence=0
                echo "This build (${CIRCLE_BUILD_NUM}) is queued, waiting for build number (${oldest_running_build_num}) to complete."
                echo "Total Queue time: ${wait_time} seconds."
              fi

              if [ $wait_time -ge $max_time_seconds ]; then
                echo "Max wait time exceeded, considering response."
                if [ "<<parameters.dont-quit>>" == "true" ];then
                  echo "Orb parameter dont-quit is set to true, letting this job proceed!"
                  exit 0
                else
                  cancel_current_build
                  sleep 10 # wait for API to cancel this job, rather than showing as failure
                  exit 1 # but just in case, fail job
                fi
              fi

              sleep $loop_time
              wait_time=$(( loop_time + wait_time ))
            done
      - run:
          name: The job to do next

@eddiewebb
Owner

@davet1985 thanks for report and PR.

I'm not clear though why a workflow declared by a running job would not exist when queried.

And if it doesn't, it indicates faulty data to make a decision on.

Do you have any insight on the underlying cause?

@lorenzocadamuro

lorenzocadamuro commented Apr 28, 2022

Hey @eddiewebb

> I'm not clear though why a workflow declared by a running job would not exist when queried.

This is what I've been trying to figure out for the past two days.

From my side, that problem started on 04/26, when CircleCI apparently released a new update: https://circleci.com/changelog/#updated-cli-commands-for-private-orbs

Don't know if it can help.

@davet1985
Author

> @davet1985 thanks for report and PR.
>
> I'm not clear though why a workflow declared by a running job would not exist when queried.
>
> And if it doesn't, it indicates faulty data to make a decision on.
>
> Do you have any insight on the underlying cause?

Hi @eddiewebb, I totally agree, it's very strange behaviour from CircleCI and no I don't have any insight on why it's happening, but it certainly seems to be affecting multiple people. Possibly a bug in a new release of CircleCI's API?! I am seeing the same issue when I call the API manually.

The change to the code makes it slightly more defensive when a 404 response is found.
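One way to picture the more defensive handling is a small helper that maps the HTTP status onto an action. This is only a sketch; `classify_status` and its `ok`/`skip`/`fail` labels are hypothetical names, not part of the orb:

```shell
#!/bin/bash
# classify_status: hypothetical helper mapping an HTTP status code to an action.
#   200   -> "ok"    (use the response body)
#   404   -> "skip"  (resource not found; ignore it and carry on)
#   other -> "fail"  (unexpected; the caller should abort)
classify_status() {
  case "$1" in
    200) echo "ok" ;;
    404) echo "skip" ;;
    *)   echo "fail" ;;
  esac
}

# Show the mapping for a few representative codes.
for code in 200 404 500; do
  echo "${code} -> $(classify_status "${code}")"
done
```

As raised earlier in the thread, a missing workflow can also mean the data is unreliable, so it is probably safest to apply the `skip` branch to the workflow lookup only.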

andrewseguin pushed a commit to angular/components that referenced this issue Apr 28, 2022
@nebolsin

We've also experienced this issue, and in my research I found a very old job from 2019 stuck in the running state. It was not visible in the UI, and any attempt to directly access its build_url or workflow resulted in a 404. I guess something has changed on the CircleCI side; maybe they just cleared up old data.

Anyway, I was able to cancel this old job by calling the API directly and unblock our deployment pipeline.

First, find out whether there are any old jobs stuck in the running state on your deploy branch, and make a note of their build_num:

curl -H 'Circle-Token: <your token>' 'https://circleci.com/api/v1.1/project/github/<org>/<repo>/tree/<branch>?filter=running' | jq -c 'map(.queued_at, .build_num)'

and cancel those builds via Cancel a build API:

curl -H 'Circle-Token: <your token>' -X POST 'https://circleci.com/api/v1.1/project/github/<org>/<repo>/<build_num>/cancel'
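If several builds are stuck, the cancel call above can be generated in a loop. A rough sketch that only prints the commands for review rather than running them; `cancel_url`, `print_cancel_commands`, and the `my-org`/`my-repo`/build-number values are hypothetical placeholders, and the token is left as the literal `<your token>`:

```shell
#!/bin/bash
# cancel_url: hypothetical helper that builds the v1.1 "cancel a build" URL
# used in the command above. Arguments: org, repo, build number.
cancel_url() {
  printf 'https://circleci.com/api/v1.1/project/github/%s/%s/%s/cancel\n' "$1" "$2" "$3"
}

# print_cancel_commands: dry run. Echoes one curl command per build number
# so the list can be checked before anything is executed by hand.
print_cancel_commands() {
  local org=$1 repo=$2
  shift 2
  local build_num
  for build_num in "$@"; do
    printf "curl -H 'Circle-Token: <your token>' -X POST '%s'\n" \
      "$(cancel_url "$org" "$repo" "$build_num")"
  done
}

# Example with placeholder values (not real builds):
print_cancel_commands my-org my-repo 101 102
```

Printing instead of executing keeps the step reversible: cancelling the wrong build cannot be undone, so reviewing the list first seems worth the extra copy and paste.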

@asselinpaul

@nebolsin same here, had a stuck job from November 2019, appreciate the command

@davet1985
Author

@nebolsin thanks very much for sharing this, I found two workflows stuck which I have been able to cancel.

@eddiewebb
Owner

@nebolsin thank you! I was just about to point out that the missing workflow seems to be much older -- it was AT LEAST older than 1 week

Checking time of workflow: 013b74f9-6aea-4c77-a0c9-5142b78514c4
DEBUG: Making API Call to https://circleci.com/api/v2/workflow/013b74f9-6aea-4c77-a0c9-5142b78514c4
DEBUG: API Success
Workflow was created at: 2022-04-21T19:30:12Z

And so I suspect we must have a retention period issue on the workflow side. I will raise this internally but it seems like the right "fix" is to address hanging builds.

Would it still make sense to make the orb tolerant of a 404? I think only for the workflow lookup, and perhaps with a trivial date check...

@davet1985 - is your org using the new retention policy controls?
https://circleci.com/docs/2.0/persist-data/#custom-storage-usage
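The "trivial date check" floated above could look something like the sketch below. It assumes the job's own timestamp (for example the `queued_at` field from the recent-builds response) is available as an ISO-8601 UTC string; `is_older_than` is a hypothetical helper and the cutoff value is an arbitrary example, not a documented retention limit:

```shell
#!/bin/bash
# is_older_than: hypothetical helper. Succeeds when timestamp $1 sorts before
# cutoff $2. Both are expected in the same ISO-8601 UTC layout
# (e.g. 2022-04-21T19:30:12Z), where string order matches chronological order.
is_older_than() {
  [[ "$1" < "$2" ]]
}

# The cutoff is hard-coded here for illustration. With GNU date it could be
# computed instead, e.g.: date -u -d '3 months ago' +%Y-%m-%dT%H:%M:%SZ
cutoff="2022-01-27T00:00:00Z"

for job_time in "2019-12-05T23:53:42Z" "2022-04-26T18:05:19Z"; do
  if is_older_than "$job_time" "$cutoff"; then
    echo "${job_time}: older than cutoff, a 404 for its workflow could be tolerated"
  else
    echo "${job_time}: recent, a 404 for its workflow should still fail the job"
  fi
done
```

Limiting the tolerance to old jobs keeps a 404 on a recent workflow visible as a real error.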

@eddiewebb
Owner

@nebolsin , @asselinpaul , @davet1985 - did any of you happen to know for those stuck jobs if the workflow was in fact missing? I would love a JOB id to debug.

@davet1985
Author

> @davet1985 - is your org using the new retention policy controls?

@eddiewebb those all seem to be set to the defaults.

@eddiewebb eddiewebb added bug Something isn't working question Further information is requested labels Apr 28, 2022
@nebolsin

@eddiewebb I agree that the root cause here is CircleCI still reporting running jobs, even though their workflows are no longer accessible. Unfortunately, in my case it's a private project, so I cannot help with a job id.

I checked: our last successful deploy before this issue manifested itself was on 2022-04-25T19:23:00Z, and the logs indicate that the stuck job was definitely there, but its workflow was accessible. It didn't block the deploy at that time because the job name was different:

DEBUG: Making API Call to https://circleci.com/api/v1.1/project/github/<org>/<repo>/tree/master?filter=running
DEBUG: API Success
API access successful
Checking time of workflow: ca97a6cf-86da-4d81-9de4-1cdf13bd2f8b
DEBUG: Making API Call to https://circleci.com/api/v2/workflow/ca97a6cf-86da-4d81-9de4-1cdf13bd2f8b
DEBUG: API Success
Workflow was created at: 2022-04-25T19:23:00Z
Checking time of workflow: 6a5834ae-e451-498b-8d17-e3376893ea5c
DEBUG: Making API Call to https://circleci.com/api/v2/workflow/6a5834ae-e451-498b-8d17-e3376893ea5c
DEBUG: API Success
Workflow was created at: 2019-12-05T23:53:42Z
Orb parameter block-workflow is false.
Only blocking execution if running previous jobs matching this job: web_deploy
Oldest job: 320754
This Workflow Timestamp: "2022-04-25T19:23:00Z"
Oldest Workflow Timestamp: "2022-04-25T19:23:00Z"

And the first problematic deploy was on 2022-04-26T18:05:19Z:

DEBUG: Making API Call to https://circleci.com/api/v1.1/project/github/<org>/<repo>/tree/master?filter=running
DEBUG: API Success
API access successful
Checking time of workflow: dba1bb0f-e0db-4511-87c7-1f7035d6b0fe
DEBUG: Making API Call to https://circleci.com/api/v2/workflow/dba1bb0f-e0db-4511-87c7-1f7035d6b0fe
DEBUG: API Success
Workflow was created at: 2022-04-26T18:05:19Z
Checking time of workflow: 6a5834ae-e451-498b-8d17-e3376893ea5c
DEBUG: Making API Call to https://circleci.com/api/v2/workflow/6a5834ae-e451-498b-8d17-e3376893ea5c

Exited with code exit status 22

CircleCI received exit code 22

You can see that the API call for the exact same workflow now results in an error, and I checked it manually via curl: it definitely was a 404. (In fact, I can still get this old job's details via both the API v1: Single job and API v2: Get job details endpoints, and I still get a 404 when trying to access its workflow.)

@eddiewebb
Owner

Ok, got some results that bring clarity.

  1. This is related to CircleCI's recently released retention policies.
  2. We are temporarily blocking access to data older than 3 months, and will eventually delete it permanently.
  3. There is a pending inconsistency in which APIs enforce this (workflows does, recent jobs does not).
  4. In the future state, those pending/old/stale jobs will also be deleted, so the data will be consistent.

I'm not on the team that controls those decisions, so I am happy to convey feedback, but I encourage anyone impacted to raise a support ticket or vote on a related idea at ideas.circleci.com.

@eddiewebb
Owner

I believe changes recently went live restricting data in the recent-builds API that will prevent this specific scenario from occurring.

I am going to close this issue for now, unless anybody still sees it or @nebolsin's fix (#79 (comment)) does not address the stale data from hung jobs.

Thank you all for the contribution and discussion.

@asfaltboy

In my case I still had to cancel a workflow that was "stuck in running state" from 2019. After that, all seems fine (for now)
