Maintenance window's "NextExecutionTime" is updated as soon as execution begins, causing instances to be shut down at the next scheduler interval #101

georgematthew · 2019-06-26T17:54:10Z

After working around #99 and #100, I am still unable to use the SSM maintenance window functionality. I've attempted to outline the behavior I am seeing below. Please let me know if I can elaborate on anything.

The instances that are configured with a schedule that references a maintenance window are started at least 10 minutes before the maintenance window based on the schedule/period created from the maintenance window's NextExecutionTime. The running period is 2 hours in duration, as expected. This matches the maintenance window duration.
The SSM maintenance window tasks begin. By this time the instances are running and recognized by SSM. I am executing Run Command tasks to run the AWS-UpdateSSMAgent and AWS-RunPatchBaseline documents.
At the next scheduler interval (10 minutes later, for example), the instances are stopped because the scheduler has created a new schedule/period based on the maintenance window's updated NextExecutionTime. It appears that the previously created period/schedule is overwritten and the scheduler believes that the desired state is "stopped". In my case, the NextExecutionTime is one week in the future, as the maintenance window is scheduled once per week. This causes the pending Run Command tasks to fail and tasks that have yet to start to report NoInstancesInTag.

The expected behavior is that the scheduler would keep the instances running for the duration of the maintenance window.
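The failure mode described in the steps above can be sketched in a few lines. This is only an illustration of the reported behavior, not the scheduler's actual code: the function names, the 10-minute lead, the 2-hour duration, and the dates are assumptions chosen to match the description.

```python
from datetime import datetime, timedelta, timezone

# Assumed values for illustration: a 10-minute start lead and a 2-hour window.
START_LEAD = timedelta(minutes=10)
WINDOW_DURATION = timedelta(hours=2)


def running_period(next_execution_time):
    """Return the (start, end) period derived only from NextExecutionTime."""
    return (next_execution_time - START_LEAD,
            next_execution_time + WINDOW_DURATION)


def desired_state(now, next_execution_time):
    """Return 'running' if now falls inside the derived period, else 'stopped'."""
    start, end = running_period(next_execution_time)
    return "running" if start <= now < end else "stopped"


# Hypothetical weekly window occurrence.
window_start = datetime(2019, 6, 26, 18, 30, tzinfo=timezone.utc)

# Before the window begins, this occurrence is reported as NextExecutionTime.
before = desired_state(window_start - timedelta(minutes=5), window_start)

# Once execution begins, next week's occurrence is reported instead, so the
# period for the window that is still in progress is no longer derived.
advanced = window_start + timedelta(weeks=1)
during = desired_state(window_start + timedelta(minutes=5), advanced)

print(before, during)  # running stopped
```

If the period is rebuilt from NextExecutionTime on every scheduler run, the second call is what produces the "stopped" decision in the middle of the window.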

Is this a bug in the scheduler's maintenance window functionality or am I failing to understand something about how this solution is intended to be used?


georgebearden · 2019-07-03T01:02:53Z

Hi George - Again, sorry for the delay on this :) Let me get a test environment set up so I can run through this scenario specifically, and then update this issue with findings.

georgematthew · 2019-07-16T19:16:47Z

Hi George, have you had a chance to test this?

tapughose · 2019-09-26T17:42:02Z

@georgematthew, I tried to reproduce the issue. I was only able to reproduce it when ssm_maintenance_window and use_maintenance_window were set up incorrectly. For the scheduler to honor the maintenance window, ssm_maintenance_window must be set to the maintenance window's name, and use_maintenance_window must also be set to true.

Here is a working example of a schedule item from my test table in DynamoDB:

{
  "name": {
    "S": "test-schedule"
  },
  "periods": {
    "SS": [
      "test-period"
    ]
  },
  "ssm_maintenance_window": {
    "S": "test-ssm-mw"
  },
  "timezone": {
    "S": "UTC"
  },
  "type": {
    "S": "schedule"
  },
  "use_maintenance_window": {
    "BOOL": true
  }
}

The test-period referenced in periods looks as follows:

{
  "begintime": {
    "S": "6:00"
  },
  "endtime": {
    "S": "17:00"
  },
  "name": {
    "S": "test-period"
  },
  "type": {
    "S": "period"
  },
  "weekdays": {
    "SS": [
      "mon-sun"
    ]
  }
}

The instance scheduler first checks whether the instance has a maintenance window (here, test-ssm-mw) during which it must be running. If not, the scheduler checks the conditions for the period (here, test-period).
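As a rough sketch of that evaluation order (not the scheduler's real implementation), the check could look like the following. The function names and the window times for test-ssm-mw are hypothetical, and weekday handling is left out because test-period covers mon-sun.

```python
from datetime import datetime, time


def in_period(now, begintime, endtime):
    """True if the time of day of `now` falls within [begintime, endtime)."""
    return begintime <= now.time() < endtime


def desired_state(now, maintenance_window, period):
    """Check the maintenance window first, then fall back to the period.

    maintenance_window is a (start, end) datetime pair or None;
    period is a (begintime, endtime) pair of datetime.time values.
    """
    if maintenance_window is not None:
        start, end = maintenance_window
        if start <= now < end:
            return "running"
    return "running" if in_period(now, *period) else "stopped"


test_period = (time(6, 0), time(17, 0))  # begintime/endtime of test-period
# Hypothetical window for test-ssm-mw, outside the 6:00-17:00 period.
test_mw = (datetime(2019, 9, 26, 20, 0), datetime(2019, 9, 26, 22, 0))

print(desired_state(datetime(2019, 9, 26, 21, 0), test_mw, test_period))  # running
print(desired_state(datetime(2019, 9, 26, 12, 0), test_mw, test_period))  # running
print(desired_state(datetime(2019, 9, 26, 23, 0), test_mw, test_period))  # stopped
```

Under this ordering, a window that falls inside 6:00 to 17:00 would leave the instance running through the period fallback alone, which may be worth ruling out when comparing configurations.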

I was wondering if you can confirm that you have used both ssm_maintenance_window and use_maintenance_window properties as outlined.

georgebearden · 2019-10-16T17:47:33Z

This issue should now be resolved. Please let us know if this is not the case.

georgematthew · 2019-10-16T19:06:34Z

@tapughose And the scheduler kept your instance running for the duration of the maintenance window? This issue is still occurring for me in v1.3. I'm seeing the same behavior that I describe above.

Here is my schedule:

{
  "description": "Keep instances off except for the maintenance window.",
  "enforced": true,
  "name": "AlwaysOff",
  "periods": [
    "AlwaysOff"
  ],
  "ssm_maintenance_window": "test-maintenance-window",
  "type": "schedule",
  "use_maintenance_window": true
}

and the period:

{
  "description": "Keep instances off.",
  "endtime": "00:00",
  "name": "AlwaysOff",
  "type": "period"
}

and I've pasted the scheduler logs below, where you can see that the scheduler successfully detects the maintenance window and starts the instance at 18:20. When the scheduler runs again at 18:25, it leaves the instance in the running state. When the scheduler runs at 18:30, which is the start of the 2hr maintenance window, it shuts the instance down while the maintenance window is still InProgress. Based on the logs, the scheduler has created a new running period for the next execution of the maintenance window (tomorrow).

2019-10-16 - 18:20:19.716 - INFO : Handler SchedulerRequestHandler scheduling request for service(s) ec2, account(s) [xxxxxxxxxxx], region(s) us-east-1 at 2019-10-16 18:20:19.716431
2019-10-16 - 18:20:19.934 - INFO : Running EC2 scheduler for account [xxxxxxxxxxx] in region(s) us-east-1
2019-10-16 - 18:20:20.694 - INFO : Fetching ec2 instances for account [xxxxxxxxxxx] in region us-east-1
2019-10-16 - 18:20:21.489 - INFO : Created schedule test-maintenance-window from SSM maintence window, start is 2019-10-16T14:20:00-04:00, end is 2019-10-16T16:30:00-04:00
2019-10-16 - 18:20:21.489 - INFO : SSM maintenance window disabled (mw-[xxxxxxxxxxx]) is disabled
2019-10-16 - 18:20:21.490 - DEBUG : Selected ec2 instance i-[xxxxxxxxxxx] in state (stopped)
2019-10-16 - 18:20:21.490 - INFO : Number of fetched ec2 instances is 1, number of instances in a schedulable state is 1
2019-10-16 - 18:20:21.751 - DEBUG : [ Instance EC2:i-[xxxxxxxxxxx] (Test) ]
2019-10-16 - 18:20:21.751 - DEBUG : Current state is stopped, instance type is t2.micro, schedule is "AlwaysOff"
2019-10-16 - 18:20:21.751 - INFO : Maintenance window "test-maintenance-window" used as running period found for instance i-[xxxxxxxxxxx]
2019-10-16 - 18:20:21.752 - DEBUG : Time used to determine desired for instance is Wed Oct 16 14:20:21 2019
2019-10-16 - 18:20:21.752 - DEBUG : Checking conditions for period "test-maintenance-window-period"
2019-10-16 - 18:20:21.752 - DEBUG : [running] Month "oct" in months (oct)
2019-10-16 - 18:20:21.752 - DEBUG : [running] Day of month 16 in month days (16)
2019-10-16 - 18:20:21.752 - DEBUG : [running] Time 14:20:21 is within 14:20:00-16:30:00, returned state is running
2019-10-16 - 18:20:21.752 - DEBUG : Active period in schedule "test-maintenance-window": "test-maintenance-window-period"
2019-10-16 - 18:20:21.752 - DEBUG : Desired state for instance from schedule "AlwaysOff" is running, last desired state was stopped, actual state is stopped
2019-10-16 - 18:20:21.752 - DEBUG : Using enforcement flag of schedule to set actual state of instance EC2:i-[xxxxxxxxxxx] (Test) from stopped to running
2019-10-16 - 18:20:21.752 - DEBUG : Listing instance EC2:i-[xxxxxxxxxxx] (Test) in region us-east-1 with instance type t2.micro to be started by scheduler
2019-10-16 - 18:20:21.752 - INFO : Starting instances EC2:i-[xxxxxxxxxxx] (Test) in region us-east-1
2019-10-16 - 18:20:22.534 - INFO : Scheduler result {'[xxxxxxxxxxx]': {'started': {'us-east-1': [{'i-[xxxxxxxxxxx]': {'schedule': 'AlwaysOff'}}]}, 'stopped': {}, 'resized': {}}}
2019-10-16 - 18:25:19.630 - INFO : Handler SchedulerRequestHandler scheduling request for service(s) ec2, account(s) [xxxxxxxxxxx], region(s) us-east-1 at 2019-10-16 18:25:19.630564
2019-10-16 - 18:25:19.849 - INFO : Running EC2 scheduler for account [xxxxxxxxxxx] in region(s) us-east-1
2019-10-16 - 18:25:20.488 - INFO : Fetching ec2 instances for account [xxxxxxxxxxx] in region us-east-1
2019-10-16 - 18:25:21.200 - INFO : Created schedule test-maintenance-window from SSM maintence window, start is 2019-10-16T14:20:00-04:00, end is 2019-10-16T16:30:00-04:00
2019-10-16 - 18:25:21.200 - INFO : SSM maintenance window disabled (mw-[xxxxxxxxxxx]) is disabled
2019-10-16 - 18:25:21.202 - DEBUG : Selected ec2 instance i-[xxxxxxxxxxx] in state (running)
2019-10-16 - 18:25:21.202 - INFO : Number of fetched ec2 instances is 1, number of instances in a schedulable state is 1
2019-10-16 - 18:25:21.469 - DEBUG : [ Instance EC2:i-[xxxxxxxxxxx] (Test) ]
2019-10-16 - 18:25:21.469 - DEBUG : Current state is running, instance type is t2.micro, schedule is "AlwaysOff"
2019-10-16 - 18:25:21.469 - INFO : Maintenance window "test-maintenance-window" used as running period found for instance i-[xxxxxxxxxxx]
2019-10-16 - 18:25:21.469 - DEBUG : Time used to determine desired for instance is Wed Oct 16 14:25:21 2019
2019-10-16 - 18:25:21.469 - DEBUG : Checking conditions for period "test-maintenance-window-period"
2019-10-16 - 18:25:21.469 - DEBUG : [running] Month "oct" in months (oct)
2019-10-16 - 18:25:21.469 - DEBUG : [running] Day of month 16 in month days (16)
2019-10-16 - 18:25:21.469 - DEBUG : [running] Time 14:25:21 is within 14:20:00-16:30:00, returned state is running
2019-10-16 - 18:25:21.469 - DEBUG : Active period in schedule "test-maintenance-window": "test-maintenance-window-period"
2019-10-16 - 18:25:21.469 - DEBUG : Desired state for instance from schedule "AlwaysOff" is running, last desired state was running, actual state is running
2019-10-16 - 18:25:21.469 - INFO : Scheduler result {'[xxxxxxxxxxx]': {'started': {}, 'stopped': {}, 'resized': {}}}
2019-10-16 - 18:30:20.476 - INFO : Handler SchedulerRequestHandler scheduling request for service(s) ec2, account(s) [xxxxxxxxxxx], region(s) us-east-1 at 2019-10-16 18:30:20.476061
2019-10-16 - 18:30:20.693 - INFO : Running EC2 scheduler for account [xxxxxxxxxxx] in region(s) us-east-1
2019-10-16 - 18:30:21.234 - INFO : Fetching ec2 instances for account [xxxxxxxxxxx] in region us-east-1
2019-10-16 - 18:30:21.888 - INFO : Created schedule test-maintenance-window from SSM maintence window, start is 2019-10-17T14:20:00-04:00, end is 2019-10-17T16:30:00-04:00
2019-10-16 - 18:30:21.888 - INFO : SSM maintenance window disabled (mw-[xxxxxxxxxxx]) is disabled
2019-10-16 - 18:30:21.894 - DEBUG : Selected ec2 instance i-[xxxxxxxxxxx] in state (running)
2019-10-16 - 18:30:21.894 - INFO : Number of fetched ec2 instances is 1, number of instances in a schedulable state is 1
2019-10-16 - 18:30:22.155 - DEBUG : [ Instance EC2:i-[xxxxxxxxxxx] (Test) ]
2019-10-16 - 18:30:22.155 - DEBUG : Current state is running, instance type is t2.micro, schedule is "AlwaysOff"
2019-10-16 - 18:30:22.155 - INFO : Maintenance window "test-maintenance-window" used as running period found for instance i-[xxxxxxxxxxx]
2019-10-16 - 18:30:22.155 - DEBUG : Time used to determine desired for instance is Wed Oct 16 14:30:22 2019
2019-10-16 - 18:30:22.155 - DEBUG : Checking conditions for period "test-maintenance-window-period"
2019-10-16 - 18:30:22.155 - DEBUG : [running] Month "oct" in months (oct)
2019-10-16 - 18:30:22.155 - DEBUG : [stopped] Day of month 16 not in month days (17)
2019-10-16 - 18:30:22.155 - DEBUG : No running periods at this time found in schedule "test-maintenance-window" for this time, desired state is stopped
2019-10-16 - 18:30:22.155 - DEBUG : Time used to determine desired for instance is Wed Oct 16 14:30:17 2019
2019-10-16 - 18:30:22.155 - DEBUG : Checking conditions for period "AlwaysOff"
2019-10-16 - 18:30:22.155 - DEBUG : [stopped] Time 14:30:17 is after stoptime 00:00:00, returned state is stopped
2019-10-16 - 18:30:22.155 - DEBUG : No running periods at this time found in schedule "AlwaysOff" for this time, desired state is stopped
2019-10-16 - 18:30:22.155 - DEBUG : Desired state for instance from schedule "AlwaysOff" is stopped, last desired state was running, actual state is running
2019-10-16 - 18:30:22.155 - DEBUG : Using enforcement flag of schedule to set actual state of instance EC2:i-[xxxxxxxxxxx] (Test) from running to stopped
2019-10-16 - 18:30:22.155 - DEBUG : Listing instance EC2:i-[xxxxxxxxxxx] (Test) in region us-east-1 to be stopped by scheduler
2019-10-16 - 18:30:22.155 - INFO : Stopping instances EC2:i-[xxxxxxxxxxx] (Test) in region us-east-1
2019-10-16 - 18:30:22.654 - INFO : Scheduler result {'[xxxxxxxxxxx]': {'started': {}, 'stopped': {'us-east-1': [{'i-[xxxxxxxxxxx]': {'schedule': 'AlwaysOff'}}]}, 'resized': {}}}

Is there a check that is done to see if the ssm_maintenance_window that is defined in the schedule is currently running? The scheduler seems to only be taking the NextExecutionTime of the maintenance window and not its current execution.
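To make the question concrete, here is a minimal sketch of what such a check might look like. It is an assumption about one possible approach, not something taken from the scheduler's code. It relies on the SSM DescribeMaintenanceWindowExecutions operation as exposed by boto3; the helper name, the stand-in client class, and the "mw-example" ID are hypothetical, and only the first page of results is inspected.

```python
from datetime import datetime, timezone


def window_in_progress(ssm_client, window_id):
    """Return True if any listed execution of the window is IN_PROGRESS.

    ssm_client is expected to behave like boto3.client("ssm").
    Only the first page of results is checked (no NextToken handling).
    """
    response = ssm_client.describe_maintenance_window_executions(
        WindowId=window_id
    )
    return any(
        execution.get("Status") == "IN_PROGRESS"
        for execution in response.get("WindowExecutions", [])
    )


class FakeSsmClient:
    """Hypothetical stand-in for the SSM client so the sketch runs offline."""

    def __init__(self, executions):
        self._executions = executions

    def describe_maintenance_window_executions(self, WindowId):
        return {"WindowExecutions": self._executions}


client = FakeSsmClient([
    {"WindowId": "mw-example", "Status": "IN_PROGRESS",
     "StartTime": datetime(2019, 10, 16, 18, 30, tzinfo=timezone.utc)},
])
print(window_in_progress(client, "mw-example"))  # True
```

With a real client this would be called as window_in_progress(boto3.client("ssm"), window_id), and a True result could be used to keep the existing running period instead of replacing it with one built from the updated NextExecutionTime.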

tapughose · 2019-11-01T00:52:14Z

@georgematthew, yes, the scheduler kept my instance running for the duration of the maintenance window. I will create a schedule and period like yours and see if I can find something.

georgematthew · 2019-11-18T20:24:08Z

@tapughose I tried blowing away the scheduler stack and redeploying with the newest version, no luck. I experienced the same behavior that I describe above.

@georgebearden I don't yet consider this issue resolved. The instance scheduler is not keeping instances running for the duration of the maintenance window when configured with the schedule/period I have posted above. I would consider this issue resolved if someone is able to point to an error in the configuration I have posted above or an update is released that resolves the issue with the provided configuration.

Please let me know if I can provide any additional debugging information. Thank you.

hoppalotta · 2020-01-02T22:29:44Z

I can confirm I am seeing the exact same behavior as @georgematthew.
Using 1.3 and a schedule that is effectively always off other than the ssm maintenance period.

hross-frae · 2020-01-03T11:03:52Z

I am also experiencing the same problem, with the scheduler immediately turning off an instance after it has been started for a maintenance window.

chaitand28 · 2020-03-10T20:43:52Z

This issue has been fixed in the release 1.3.1. Please deploy the latest template to get the updated code.

mahammadism · 2021-01-05T07:00:44Z

Hi,
I am working with Instance Scheduler and SSM maintenance windows for the first time. My requirement is to enable the SSM maintenance window for Instance Scheduler and to start instances 2 hours before the SSM maintenance window task execution.
Please guide me on how to enable the SSM maintenance window in Instance Scheduler using CloudFormation.

Thanks,
Ismail. S

georgebearden closed this as completed Oct 16, 2019

maykays mentioned this issue Oct 8, 2020

Throttling exception using ssm maintenance window #191

Closed
