AWS DeepRacer application: Modes of operation

Overview

The AWS DeepRacer core application on the AWS DeepRacer device operates in three modes: autonomous mode, calibration mode, and manual mode. The following sections provide details about how the AWS DeepRacer core application works in autonomous and manual modes. You can follow the instructions in Getting started with AWS DeepRacer OpenSource to set up your car with the latest version of the AWS DeepRacer device software to use the autonomous, calibration, and manual modes of the AWS DeepRacer core application.

The autonomous and manual modes are accessible on the Control vehicle page of the AWS DeepRacer device console. In autonomous mode, you can load a reinforcement learning model trained on the AWS DeepRacer service and run inference on the physical car. In manual mode, you can steer and move the car using a simple joystick on the device console. In both modes, you can also use the device console to control the speed at which the car drives.

Both autonomous and manual modes leverage the servo node as their navigation component. The following sections describe how the various ROS nodes that make up the AWS DeepRacer device software interact with the device console in autonomous and manual modes.

Autonomous mode

While running inference on the reinforcement learning model loaded in autonomous mode, the AWS DeepRacer device reads the required state information at the rate of the camera sensor and takes an action that sets the speed and steering angle. Each perception-inference-action step involves a pipeline of ROS messages published and subscribed across various nodes: the camera and LiDAR data are published, a model takes the image (or image + LiDAR) data as input, and finally the servo and motor attached to the wheels change the steering angle or speed.

Perception

As part of the perception step, images are read from the camera and published by the camera_node. If a single camera is connected, each camera message contains one image; if two cameras are connected (as in the AWS DeepRacer Evo), each message contains one image from each camera. On the AWS DeepRacer Evo, the data read from the connected LiDAR is also published. The camera and LiDAR data are combined into a single sensor message and published at the rate of the camera sensor by the sensor_fusion_node.
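To make the fusion behavior concrete, here is a minimal, dependency-free sketch of the logic described above. The class and message names are illustrative stand-ins, not the actual ROS interfaces in the device software:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class FusedSensorMessage:
    """Illustrative stand-in for the fused sensor message:
    one image per connected camera plus the latest LiDAR sweep."""
    images: List[bytes]                          # one image (single camera) or two (Evo)
    lidar_ranges: Optional[List[float]] = None   # None when no LiDAR is connected

class SensorFusion:
    """Caches the most recent LiDAR data and attaches it to every
    incoming camera frame, so fused messages go out at the camera rate."""
    def __init__(self) -> None:
        self.latest_lidar: Optional[List[float]] = None

    def on_lidar(self, ranges: List[float]) -> None:
        # LiDAR updates only refresh the cache; they do not trigger publishing.
        self.latest_lidar = ranges

    def on_camera(self, images: List[bytes]) -> FusedSensorMessage:
        # Every camera frame produces exactly one fused message.
        return FusedSensorMessage(images=images, lidar_ranges=self.latest_lidar)
```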

Inference (decision)

The reinforcement learning model used in the AWS DeepRacer device is trained in the AWS DeepRacer service and uploaded to the device. Before running the inference, the reinforcement learning model is optimized on the device using the Intel OpenVINO Model Optimizer in the model_optimizer_node to create the intermediate representation artifacts of the network, which can be read, loaded, and inferred with the Intel OpenVINO Inference Engine. As part of the inference step, the AWS DeepRacer application takes in the state information sent from the sensor_fusion_node and runs inference using the Inference Engine APIs in the inference_node to publish the reinforcement learning inference results.
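The snippet below is a minimal standalone sketch of this step using the OpenVINO Python API. The file names and device choice are assumptions for illustration; the real inference_node wraps the equivalent Inference Engine calls in a ROS node:

```python
import numpy as np
from openvino.runtime import Core

# Load the intermediate representation (IR) artifacts produced by the
# model optimizer; "model.xml" is a placeholder path (the .bin weights
# file is found alongside it).
core = Core()
network = core.read_model("model.xml")
compiled = core.compile_model(network, "CPU")

def run_inference(state: np.ndarray) -> np.ndarray:
    """One inference step: sensor state in, action values out."""
    results = compiled([state])        # dict keyed by the model's output ports
    return next(iter(results.values()))
```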

Action (navigation)

The action step differs slightly based on the action space type on which the model was trained. The AWS DeepRacer service supports two action space types for trained models: discrete and continuous. For the discrete action space, a set of discrete values is returned from the neural network, interpreted as a probability distribution, and mapped to a set of actions. For the continuous action space, the policy outputs only two values, which are interpreted as the mean and standard deviation of a continuous normal distribution. You can learn more in the "The ins and outs of action spaces" section of this AWS blog post.

As part of the action step, the results of the inference published by the inference_node are read, mapped, and scaled to steering and speed values in the deepracer_navigation_node. These scaled values are published as servo messages and converted in the servo_node to raw PWM duty cycle values bounded by the min, mid, and max values set during calibration.

Differences between the discrete action space and the continuous action space

The differences between the discrete and continuous action spaces are primarily in the way the neural network is architected to output the values.

With discrete action values, the network outputs a probability distribution over a predefined set of actions, each a fixed {steering_angle, speed} combination. We choose the {steering_angle, speed} combination with the highest probability for scaling.

With continuous action values, the network outputs a numerical mean value clipped to [-1.0, 1.0], which is rescaled linearly to [minimum_value, maximum_value] for both speed and steering_angle.

At this point, the steering value is scaled based on its maximum value and the speed value is nonlinearly scaled to the range of [0.0, 1.0]. For more details about this nonlinear scaling, see the following Nonlinear speed-mapping equations section.
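A short sketch of the two interpretations, assuming a hypothetical three-action table for the discrete case (the real table comes from model_metadata.json):

```python
import numpy as np

# Hypothetical discrete action space, as it might appear in model_metadata.json:
# each entry is a fixed {steering_angle, speed} combination.
ACTION_SPACE = [
    {"steering_angle": -30.0, "speed": 0.4},
    {"steering_angle":   0.0, "speed": 0.8},
    {"steering_angle":  30.0, "speed": 0.4},
]

def discrete_action(probabilities: np.ndarray) -> dict:
    """Discrete: pick the most probable predefined action."""
    return ACTION_SPACE[int(np.argmax(probabilities))]

def continuous_value(mean: float, minimum: float, maximum: float) -> float:
    """Continuous: clip the network's mean output to [-1.0, 1.0],
    then rescale it linearly to [minimum, maximum]."""
    clipped = max(-1.0, min(1.0, mean))
    return minimum + (clipped + 1.0) * (maximum - minimum) / 2.0
```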

Manual mode

In manual mode, you can drive the car manually using a joystick on a rectangular trackpad in the device console. Dragging the joystick toward the top of the screen increases the speed, dragging it down decreases the speed, and dragging it right or left turns the car right or left.

The actual values are calculated as the difference between the top-left coordinate of the rectangular trackpad and the coordinates of the joystick itself. The x and y values of these coordinates are measured from the top-left corner of the displayed screen in the browser. The raw joystick coordinates are shifted so that the origin sits at the center of the joystick's rest position, and the (x, y) values are scaled to percentages. The resulting raw values in [-1.0, 1.0] are passed to the backend web server (see the sketch after the following list), where we categorize the angle and categorize and nonlinearly map the speed.

  • Categorize the throttle and angle values: This converts the joystick trackpad into concentric rectangles of stepped values, giving the user better feedback and control when moving the joystick.

  • Use the maximum speed percentage and throttle as inputs to calculate the nonlinear mapping: This allows us to flatten the curve when the maximum speed percentage is lower, thereby reducing the impact of the raw joystick value on the final value. For more details about this nonlinear mapping, see the following Nonlinear speed-mapping equations section.

These rescaled values are mapped to the PWM values of the servo and motor.
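The following sketch shows one way the joystick normalization described above could be computed; the sign conventions and clamping are assumptions consistent with the description:

```python
def normalize_joystick(x: float, y: float, width: float, height: float):
    """Convert trackpad coordinates (measured from the top-left corner)
    into (angle, throttle) values in [-1.0, 1.0], with the origin shifted
    to the joystick's rest position at the center of the trackpad."""
    half_w, half_h = width / 2.0, height / 2.0
    angle = (x - half_w) / half_w        # right of center -> positive angle
    throttle = (half_h - y) / half_h     # screen y grows downward, so flip for "up = faster"
    clamp = lambda v: max(-1.0, min(1.0, v))
    return clamp(angle), clamp(throttle)
```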

Distinctions between the autonomous mode action flow and the manual mode action flow include the following:

  • In manual mode, both the steering angle and speed values are already in the range [-1.0, 1.0], and there are no minimum and maximum speed values associated with them. Autonomous mode requires an additional mapping to use the values passed in the model_metadata JSON file.

  • In manual mode, the maximum speed value is used to nonlinearly scale the speed itself (it determines the mapping curve), unlike in autonomous mode, where the maximum speed percentage is applied afterwards to take a fraction of the calculated value.

Nonlinear speed-mapping equations

The AWS DeepRacer device does not have a mechanism to detect the actual speed of the car and depends on the raw PWM values to control the speed. These PWM values have nonlinear mapping to the RPM of the wheel, which makes the car go faster or slower. In order to connect the magnitude of speed to the actual RPM on the car, we use a nonlinear mapping of the input speed values as a proxy for the PWM values responsible for increasing speed. One of the important functions in the AWS DeepRacer device-side software is a method that maps the action space value for speed into this transformed car speed value.

It's implemented by mapping the action-space values [maximum_speed, maximum_speed/2] to the values [1.0, 0.8] using the quadratic equation:

y = ax**2 + bx

The equation corresponds to a parabola. In our use case, we have to find a parabola that passes through both points (maximum_speed, 1.0) and (maximum_speed/2, 0.8).

Why do we need to fit the points (maximum_speed, 1.0) and (maximum_speed/2, 0.8) on the ax**2 + bx curve?

The values 1.0 and 0.8 correspond to the percent of speed that is nonlinearly mapped to the maximum speed and half-maximum speed values passed as part of the model_metadata.json file (in autonomous mode). This implies that for a speed of half the maximum speed in model_metadata.json, we use 80% of the PWM value. The mapping values selected have been empirically tested to provide a close and consistent mapping across various vehicle battery levels, devices, and testing scenarios.

To begin understanding this in detail, consider the following curves:

These curves show the mapping of different [maximum_speed, maximum_speed/2] values in model_metadata to the [1.0, 0.8] values on the y axis. It is important to note that these curves also pass through (0, 0), indicating that there is no constant c term in our parabola expression ax**2 + bx + c.

Building some intuition around the mapping curves

Whenever we consider the effects of coefficients a, b, and c on the parabolic graph, we can observe that:

  1. Changing the value of a changes the width of the opening of the parabola and that the sign of a determines whether the parabola opens upwards or downwards.
  2. Changing the value of b moves the axis of symmetry of the parabola (located at x = -b/(2a)) from side to side.
  3. Changing the value of c moves the vertex of the parabola up or down and c is always the value of the y-intercept.

We have already seen that the value of c in our expression is set to 0.

Finding the values of coefficients a and b

Consider the equation of the parabola:

y = ax**2 + bx

In our code, we denote the anchor values used for mapping as DEFAULT_SPEED_SCALES = [1.0, 0.8]. With this notation, we can find the values of coefficients a and b by solving the equation for two points: (maximum_speed, DEFAULT_SPEED_SCALES[0]) and (maximum_speed/2, DEFAULT_SPEED_SCALES[1]).

Substitute the values of x and y into the parabola equation:

DEFAULT_SPEED_SCALES[0] = a * maximum_speed**2 + b * maximum_speed    (1)

DEFAULT_SPEED_SCALES[1] = (a * maximum_speed**2) / 4 + (b * maximum_speed) / 2    (2)

Multiply equation (2) by 4:

4 * DEFAULT_SPEED_SCALES[1] = (a * maximum_speed**2) + 2 * (b * maximum_speed)    (3)

Subtract equation (1) from equation (3) and solve for b:

b = (1 / maximum_speed) * (4 * DEFAULT_SPEED_SCALES[1] - DEFAULT_SPEED_SCALES[0])

b = (1 / maximum_speed) * 2.2

Replace b in equation (3) and solve for a:

4 * DEFAULT_SPEED_SCALES[1] = (a * maximum_speed**2) + 2 * (4 * DEFAULT_SPEED_SCALES[1] - DEFAULT_SPEED_SCALES[0])

a = (4 * DEFAULT_SPEED_SCALES[1] - 8 * DEFAULT_SPEED_SCALES[1] + 2 * DEFAULT_SPEED_SCALES[0]) / maximum_speed**2

a = (1 / maximum_speed**2) * (2 * DEFAULT_SPEED_SCALES[0] - 4 * DEFAULT_SPEED_SCALES[1])

a = - (1 / maximum_speed**2) * 1.2

The coefficients a and b are inversely related to the maximum speed. We can confirm that increasing the maximum speed increases the width of the opening of the parabola and moves its axis of symmetry from side to side.
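The closed-form solution above translates directly into code. A small sketch that computes the coefficients and verifies that the parabola passes through both anchor points:

```python
DEFAULT_SPEED_SCALES = [1.0, 0.8]

def speed_mapping_coefficients(maximum_speed: float):
    """Solve y = a*x**2 + b*x for the anchor points
    (maximum_speed, 1.0) and (maximum_speed / 2, 0.8)."""
    s0, s1 = DEFAULT_SPEED_SCALES
    b = (4.0 * s1 - s0) / maximum_speed              # = 2.2 / maximum_speed
    a = (2.0 * s0 - 4.0 * s1) / maximum_speed ** 2   # = -1.2 / maximum_speed**2
    return a, b

# Sanity check for a maximum speed of 0.8 m/s: a = -1.875, b = 2.75.
a, b = speed_mapping_coefficients(0.8)
assert abs(a * 0.8 ** 2 + b * 0.8 - 1.0) < 1e-9     # passes through (0.8, 1.0)
assert abs(a * 0.4 ** 2 + b * 0.4 - 0.8) < 1e-9     # passes through (0.4, 0.8)
```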

Additional rescaling in the continuous action space to map the network output to the original range

In the continuous action space, we train the neural network to output mean values for steering_angle and speed in the range of [-1.0, 1.0]. These values are to be rescaled back to the corresponding user-provided minimum and maximum values. This is done by linearly scaling the mean value obtained for steering_angle to the range of [minimum_steering_angle, maximum_steering_angle] and the mean value of speed to the range of [minimum_speed_value, maximum_speed_value].

This scaled speed value is then mapped nonlinearly to the range [0.0, 1.0] using the formula y = ax**2 + bx. The following examples map the value: [-1.0, 1.0] → [minimum_speed_value, maximum_speed_value] → [0.0, 1.0].
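As a sketch, the full chain for the continuous case (reusing speed_mapping_coefficients from the previous snippet) could look like this:

```python
def rescale_continuous_speed(network_mean: float,
                             minimum_speed: float,
                             maximum_speed: float) -> float:
    """[-1.0, 1.0] -> [minimum_speed, maximum_speed] -> [0.0, 1.0]."""
    clipped = max(-1.0, min(1.0, network_mean))
    # Linear rescale to the user-provided speed range.
    speed = minimum_speed + (clipped + 1.0) * (maximum_speed - minimum_speed) / 2.0
    # Nonlinear mapping with the parabola fitted for this maximum speed.
    a, b = speed_mapping_coefficients(maximum_speed)
    return a * speed ** 2 + b * speed
```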

Autonomous mode: Impact of the maximum speed percent value set by the user on the device console on the speed value

The AWS DeepRacer console allows the user to set the maximum speed percent value between [0, 100]%, with a default of 50%. In autonomous mode, the value set from the front end is passed back to the navigation node via the control_node, where the nonlinearly scaled values are multiplied by this percentage.

For example, if the user has set the maximum speed value to be 40%, and the model selected has a maximum speed of 0.8 m/s, then the throttle value set in the servo message in the navigation node for a neural network output speed of 0.4 m/s is:

non_linear_scaled_speed = 0.8 (0.4 m/s corresponds to a nonlinearly scaled speed of 0.8 for a maximum speed of 0.8 m/s)

servo_msg.throttle = maximum_speed_threshold * non_linear_scaled_speed = 0.4 * 0.8 = 0.32

For the same maximum speed threshold of 40% and maximum speed of 0.8 m/s, if the neural network outputs a speed of 0.2 m/s, then the value of the non_linear_scaled_speed decreases according to the equation derived before:

coefficients: a = -1.875, b = 2.75 (for a maximum speed of 0.8 m/s); speed = 0.2 m/s

non_linear_scaled_speed = a * s**2 + b * s = -1.875 * 0.2**2 + 2.75 * 0.2 = 0.475 (0.2 m/s corresponds to 0.475 for a maximum speed of 0.8 m/s)

servo_msg.throttle = maximum_speed_threshold * non_linear_scaled_speed = 0.4 * 0.475 = 0.19
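Both worked examples can be reproduced with the coefficient helper from earlier:

```python
a, b = speed_mapping_coefficients(0.8)   # maximum speed 0.8 m/s: a = -1.875, b = 2.75
maximum_speed_threshold = 0.4            # 40% set on the device console

for network_speed in (0.4, 0.2):
    non_linear_scaled_speed = a * network_speed ** 2 + b * network_speed
    throttle = maximum_speed_threshold * non_linear_scaled_speed
    print(f"{network_speed} m/s -> scaled {non_linear_scaled_speed:.3f}, throttle {throttle:.3f}")
# 0.4 m/s -> scaled 0.800, throttle 0.320
# 0.2 m/s -> scaled 0.475, throttle 0.190
```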

Manual mode: Math behind the nonlinear speed mapping equations

As with the continuous action space in autonomous mode, we need to nonlinearly map the raw value obtained from the joystick movement to the PWM values of the servo and motor for the car to move.

Because we do not have a maximum speed value defined to calculate the coefficients of the equation y = ax**2 + bx, we map the maximum speed percent from the device console to a speed scale value in the range [1.0, 5.0]. This allows us to recalculate the curve for each maximum speed percent value and use it to map the joystick input to the throttle value in the servo message.

The idea is that a lower maximum speed percent should map to a higher speed scale value when calculating the coefficients, so that the curve is flatter and the actual speed values have less impact, as shown in the following example:

Why do we need to map the maximum speed percent values [0.0, 1.0] to another speed scale value in the range [5.0, 1.0] to calculate coefficients a and b?

The maximum_speed_percentage values that we get as user input are in the range [0.0, 1.0], and higher values mean the user wants more out of the joystick movement. In other words, a higher maximum_speed_percentage means we need a steeper curve when mapping the possible speed values to their transformed counterparts (the throttle value in the servo message). We therefore need to inversely map the maximum speed percentage values to the speed scale value used to calculate the non_linear_scaled_speed.

We map the 100% maximum speed to 1.0 because the possible speed values from the joystick range from [0.0, 1.0] and 1.0 is the minimum value where the curve peaks before reversing in the preceding figure. This indicates that we are guaranteed to have a nonlinearly increasing mapping for all joystick values [0.0, 1.0].

We map the 0% to 5.0 to add a safe buffer to accommodate different vehicle battery charge levels. For example, for a lower battery charge, the user might use a maximum speed percentage of 70% to control the car, but when fully charged, the user can reduce the maximum_speed_percentage to 10% to have a similar effect.
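The exact percentage-to-scale function is not spelled out above, but a linear inverse map is consistent with the coefficients in the worked examples that follow; treat this sketch as a reconstruction under that assumption:

```python
def speed_scale_from_percentage(max_speed_pct: float) -> float:
    """Inversely map the maximum speed percentage [0.0, 1.0] to a speed
    scale in [5.0, 1.0]: 100% -> 1.0 (steepest curve), 0% -> 5.0 (flattest).
    Assumed linear; it reproduces the example coefficients below
    (pct 0.5 -> scale 3.0, pct 0.6 -> scale 2.6)."""
    return 5.0 - 4.0 * max_speed_pct

a, b = speed_mapping_coefficients(speed_scale_from_percentage(0.5))
# a = -0.1333..., b = 0.7333...  (matches Example 1 below)
```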

Example 1

input:
throttle (from the joystick) = 0.567; 
max_speed_pct (from +/- adjust maximum speed buttons) = 0.5

calculated values:
categorized_throttle = 0.5
a =  -0.13333333333333336, b =  0.7333333333333334

mapped_speed = a * categorized_throttle**2 + b * categorized_throttle
             = -0.13333333333333336 * 0.5**2 + 0.7333333333333334 * 0.5 
             = -0.13333333333333336 * 0.25 +  0.7333333333333334 * 0.5
             = -0.033333333 + 0.366666667
             = 0.333333334

Example 2

input:
throttle (from the joystick) = 0.567; 
max_speed_pct (from +/- adjust maximum speed buttons) = 0.6

calculated values:
categorized_throttle = 0.5
a =  -0.17751479289940836, b =  0.8461538461538464

mapped_speed = -0.17751479289940836 * 0.25 + 0.8461538461538464 * 0.5 = 0.378698225

Example 3

input:
throttle (from the joystick) = 0.467; 
max_speed_pct (from +/- adjust maximum speed buttons) = 0.6

calculated values:
categorized_throttle = 0.3
a =  -0.17751479289940836, b =  0.8461538461538464

mapped_speed = -0.17751479289940836 * 0.09 + 0.8461538461538464 * 0.3 = 0.237869822

After the final categorization and mapping are completed for speed and steering_angle, the output is in the range [-1.0, 1.0]. These values are passed to the servo node, as in autonomous mode, to move the car.
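Putting the manual-mode pieces together: the sketch below assumes stepped categorization thresholds that reproduce the examples above (0.567 → 0.5, 0.467 → 0.3); the real thresholds live in the device web server:

```python
import math

def categorize_throttle(throttle: float) -> float:
    """Snap the raw throttle onto stepped values (the concentric
    rectangles on the trackpad). Thresholds are assumptions consistent
    with the examples above, not the exact web server values."""
    magnitude = abs(throttle)
    if magnitude >= 0.8:
        step = 0.8
    elif magnitude >= 0.5:
        step = 0.5
    elif magnitude > 0.0:
        step = 0.3
    else:
        return 0.0
    return math.copysign(step, throttle)

def manual_mapped_speed(throttle: float, max_speed_pct: float) -> float:
    """Categorize, fit the parabola for the scale implied by
    max_speed_pct, then nonlinearly map the categorized throttle."""
    t = categorize_throttle(throttle)
    a, b = speed_mapping_coefficients(speed_scale_from_percentage(max_speed_pct))
    return math.copysign(a * abs(t) ** 2 + b * abs(t), t)

print(manual_mapped_speed(0.567, 0.5))   # ~0.3333 (Example 1)
print(manual_mapped_speed(0.467, 0.6))   # ~0.2379 (Example 3)
```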

Setting the servo and motor PWM duty values

The servo node expects the speed and steering_angle values sent as part of the servo message to be in the range of [-1.0, 1.0]. In autonomous mode, the speed is in the range of [0.0, 1.0], as we do not support reverse driving for our car.

The AWS DeepRacer device is an open-loop system and has no feedback to recognize the speed of the car. We linearly map the speed and steering_angle values received at the servo node to the duty cycle values that regulate the RPM of the motor and the angle of the servo. The exact value written as the PWM duty to the motor and servo device files further depends on the bounding calibration values set during the calibration flow.
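One plausible shape for that linear mapping, with hypothetical calibration duty values; the actual servo node reads the calibrated min/mid/max set during the calibration flow:

```python
def value_to_pwm_duty(value: float, cal_min: int, cal_mid: int, cal_max: int) -> int:
    """Linearly map a [-1.0, 1.0] servo-message value onto the calibrated
    PWM duty range: -1.0 -> cal_min, 0.0 -> cal_mid, +1.0 -> cal_max."""
    value = max(-1.0, min(1.0, value))
    if value >= 0.0:
        return int(cal_mid + value * (cal_max - cal_mid))
    return int(cal_mid + value * (cal_mid - cal_min))

# Hypothetical calibration values for illustration only:
duty = value_to_pwm_duty(0.19, cal_min=1200000, cal_mid=1450000, cal_max=1700000)
```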

Summary

This component is one of the core features of the AWS DeepRacer application. The Follow the Leader (FTL) sample project leverages most of the concepts used in manual mode to build the ftl_navigation node. To learn more, see the AWS DeepRacer Follow the Leader (FTL) sample project.
