The AWS DeepRacer core application on the AWS DeepRacer device operates in three
modes: autonomous mode, calibration mode, and manual mode. The following sections provide
details about how the AWS DeepRacer core application works in autonomous and manual
modes. You can follow the instructions in
Getting started with AWS DeepRacer OpenSource to set up your car with the latest
version of the AWS DeepRacer device software to use the autonomous
, calibration
, and
manual
modes of the AWS DeepRacer core application.
The autonomous
and manual
modes are accessible on the Control vehicle page of the AWS DeepRacer device console. In autonomous
mode, you can load the reinforcement learning model trained on the AWS DeepRacer service and run inference on the physical car. In manual
mode, you can manually control the car to steer and move it using a simple joystick on the device console. You can also change the speed using the device console to control the speed at which the car is driving in both the modes.
Both autonomous
and manual
mode leverage the servo node as their navigation component. You can read about how the various ROS nodes that are part of the AWS DeepRacer device software and the device console interact in autonomous
and manual
mode in the following sections.
While running inference on the reinforcement learning model loaded in
autonomous
mode, the AWS DeepRacer device takes in the state information required
at the rate of the camera sensor and takes an action to either increase or decrease the
speed and steering angle. Each perception-inference-action step involves a
pipeline of a series of ROS messages published or subscribed at various nodes from
the point the camera and LiDAR data is published; then a model takes the image or
image+LiDAR data as an input; to the point where the servo and motors attached to the
wheel change angle or speed.
As part of the perception step, the camera images are read from the camera and published in the camera_node. If you have a single camera connected, then the camera messages contain only one image per message, whereas if there are two cameras connected (as in the AWS DeepRacer Evo), the camera messages contain one image from each camera per message. In the AWS DeepRacer Evo, we also publish the LiDAR data read from the connected LiDAR. The camera and LiDAR data are combined together as a sensor message and published at the rate of the camera sensor in the sensor_fusion_node.
The reinforcement learning model used in the AWS DeepRacer device is trained in the AWS DeepRacer service and uploaded to the device. Before running the inference, the reinforcement learning model is optimized on the device using the Intel OpenVino Model Optimizer in the model_optimizer_node to create the intermediate representation artifacts of the network, which can be read, loaded, and inferred with the Intel OpenVino Inference Engine. As part of the inference step, the AWS DeepRacer application takes in the state information sent from the sensor_fusion_node and runs inference using the Inference Engine APIs in the inference_node to publish the reinforcement learning inference results.
The action step differs slightly based on the action space type on which the model was trained. The AWS DeepRacer service support two types of action space types for the models that are trained: discrete and continuous. For the discrete action space, a set of discrete values are returned from the neural network, interpreted as a probability distribution, and mapped to a set of actions. For the continuous action space, the policy only outputs two discrete values. These values are interpreted to be the mean and standard deviation of a continuous normal distribution. You can learn more about these in the The ins and outs of action spaces section of this AWS blog post.
As part of the action step, the results of the inference published by the
inference_node
are read, mapped, and scaled to steering and speed values in the
deepracer_navigation_node
. These scaled values are published as
servo messages and converted to raw PWM duty cycle values that are centered
around the mid, max, and min values set during calibration setup in the
servo_node
.
The differences between the discrete and continuous action spaces are primarily in the way the neural network is architected to output the values.
In the discrete action values, there is a probability distribution for a predefined
set of actions, each containing a fixed {steering_angle
, speed
} combination. We choose the maximum probable {steering_angle
, speed
} combination values
for scaling.
In the continuous action values, there is a numerical mean value clipped between
[-1.0, 1.0], which is rescaled linearly to the [minimum_value, maximum_value] for
speed
and steering_angle
.
At this point, the steering value is scaled based on its maximum value and the speed value is nonlinearly scaled to the range of [0.0, 1.0]. For more details about this nonlinear scaling, see the following Nonlinear speed-mapping equations section.
In manual
mode, you can drive the car manually using a joystick on a
rectangle trackpad in the device side console. Dragging a joystick up toward
the top of screen increases the speed, down decreases the
speed, right turns right and left turns left.
The actual values are calculated as a difference between the top-left coordinate
of the rectangle trackpad and the coordinate of joystick itself. The x
and y
values of these coordinates are calculated from the top-left corner of the
displayed screen in the browser. These raw joystick coordinates are shifted
to have the origin at the center of joystick at rest position, and the (x
, y
)
values are scaled to percentages. The raw values between [-1.0, 1.0] are
passed to the backend web server, where we categorize and map the speed nonlinearly and categorize the angle.
-
Categorize the throttle and angle values: This allows us to convert the joystick trackpad into concentric rectangles of stepped values, allowing better feedback and control for the user to move the joystick.
-
Use the maximum speed percentage and throttle as an input to calculate the nonlinear mapping: This allows us to flatten the curve if the maximum speed percentage is lower, thereby reducing the impact of the raw joystick value on the final value. For more details about this nonlinear mapping, see the following Nonlinear speed-mapping equations section.
These rescaled values are mapped to the PWM values of the servo and motor.
Distinctions between the autonomous
mode action flow and the manual
mode action flow include the following:
-
Both steering angle and speed values are already in the range of [-1.0, 1.0] and there is no minimum and maximum speed associated with this.
Autonomous
mode has an additional mapping required to use the values passed in themodel_metatdata
JSON. -
The maximum speed value is used to nonlinearly scale the speed, unlike in
autonomous
mode, where the maximum throttle value is used to get a fraction of the total value calculated.
The AWS DeepRacer device does not have a mechanism to detect the actual speed of the car and depends on the raw PWM values to control the speed. These PWM values have nonlinear mapping to the RPM of the wheel, which makes the car go faster or slower. In order to connect the magnitude of speed to the actual RPM on the car, we use a nonlinear mapping of the input speed values as a proxy for the PWM values responsible for increasing speed. One of the important functions in the AWS DeepRacer device-side software is a method that maps the action space value for speed into this transformed car speed value.
It's implemented by mapping the [maximum speed, half of the maximum speed] from the action space to the values of [1.0, 0.8] using the quadratic equation:
y = ax**2 + bx
The equation corresponds to a parabola. In our use case, we have to find a parabola that contains both the values [maximum_speed, 1.0] and [maximum_speed/2, 0.8].
Why do we need to fit [maximum_speed, 1.0] and [maximum_speed/2, 0.8] on the ax**2 + bx curve?
The values 1.0 and 0.8 correspond to the percent of speed that needs
to be nonlinearly mapped to the maximum speed and the half of maximum speed values
that are passed as part of the model_metadata.json
file (in autonomous
mode). It
implies that for a value of half of maximum speed in the model_metadata.json
, we
consider 80% of PWM value. The mapping values selected have been empirically
tested to provide a close and consistent mapping under various vehicle battery
levels, devices, and testing scenarios.
To begin understanding this in detail, consider the following curves:
These show the mapping of different [maximum_speed, maximum_speed/2] values in
model_metadata
to the [1.0, 0.8] value on the y
axis. It is important to note that the
these values also pass through [0, 0], indicating that there is no constant c
value in
our parabola expression, ax**2 + bx + c
.
Whenever we consider the effects of coefficients a
, b
, and c
on the parabolic
graph, we can observe that:
- Changing the value of
a
changes the width of the opening of the parabola and that the sign ofa
determines whether the parabola opens upwards or downwards. - Changing the value of
b
moves the axis of symmetry of the parabola from side to side; increasingb
moves the axis in the opposite direction. - Changing the value of
c
moves the vertex of the parabola up or down andc
is always the value of they
-intercept.
We have already seen that the value of c
in our expression is set to 0.
Consider the equation of the parabola:
y = ax**2 + bx
In our code, we notify the anchor values used for mapping as DEFAULT_SPEED_SCALES = [1.0, 0.8]
. With this notation, we can find the value of coefficients a
and
b
by solving the equation for two points: (maximum_speed, DEFAULT_SPEED_SCALES[0]
) and (maximum_speed/2, DEFAULT_SPEED_SCALES[1]
)
Replace the values of x
and y
in the parabola equation:
DEFAULT_SPEED_SCALES[0] = a * maximum_speed**2 + b * maximum_speed
DEFAULT_SPEED_SCALES[1] = (a * maximum_speed**2 ) / 4 + (b * maximum_speed) / 2
Solve for a
and b
:
4 * DEFAULT_SPEED_SCALES[1] = (a * maximum_speed**2 ) + 2 * (b * maximum_speed)
b = (1 / maximum_speed) * ( 4 * DEFAULT_SPEED_SCALES[1] - DEFAULT_SPEED_SCALES[0])
b = (1 / maximum_speed) * 2.2
Replace b
in equation 3:
DEFAULT_SPEED_SCALES[1] = (a * maximum_speed**2 ) / 4 + ( 4 * DEFAULT_SPEED_SCALES[1] - DEFAULT_SPEED_SCALES[0] ) * maximum_speed/ (2 * maximum_speed)
4 * DEFAULT_SPEED_SCALES[1] = (a * maximum_speed**2 ) + 2 * ( 4 * DEFAULT_SPEED_SCALES[1] - DEFAULT_SPEED_SCALES[0])
a = (4 * DEFAULT_SPEED_SCALES[1] - 8 * DEFAULT_SPEED_SCALES[1] + 2 * DEFAULT_SPEED_SCALES[0] ) / maximum_speed**2
a = (1 / maximum_speed**2) * ( 2 * DEFAULT_SPEED_SCALES[0] - 4 * DEFAULT_SPEED_SCALES[1] )
a = - (1 / maximum_speed**2) * 1.2
The coefficients a
and b
are inversely related to the maximum speed. We can
confirm that increasing the maximum speed increases the width of the opening of
parabola and moves the axis of the parabola side to side.
In the continuous action space, we train the neural network to output mean values for steering_angle
and speed
in the range of [-1.0, 1.0]. These values are to be rescaled back to the corresponding user-provided minimum and maximum values. This is done by linearly scaling the mean value obtained for steering_angle
to the range of [minimum_steering_angle, maximum_steering_angle] and the mean value of speed
to the range of [minimum_speed_value, maximum_speed_value].
This scaled speed
value is mapped nonlinearly to a range of [0.0, 1.0] using the formula y = ax^2 + bx
. The following examples of map the value [-1.0, 1.0] → [minimum_speed_value, maximum_speed_value] → [0.0, 1.0].
Autonomous
mode: Impact of the maximum speed percent value set by the user on the device console on the speed value
The AWS DeepRacer console allows the user to set the maximum speed percent values between [0, 100]%, with a default value set to 50%. In autonomous
mode, the value set from the front end is passed back to the navigation node via the control_node
, where the nonlinearly scaled values are multiplied to this percentage.
For example, if the user has set the maximum speed value to be 40%, and the model selected has a maximum speed of 0.8 m/s, then the throttle value set in the servo message in the navigation node for a neural network output speed of 0.4 m/s is:
non_linear_scaled_speed = 0.8 (0.4m/s corresponds to 0.8 of non linear scaled speed for a maximum speed of 0.8m/s )
servo_msg.throttle = maximum_speed_threshold * non_linear_scaled_speed = 0.4 * 0.8 = 0.32
For the same maximum speed threshold of 40% and maximum speed of 0.8 m/s, if the neural network outputs a speed of 0.2 m/s, then the value of the non_linear_scaled_speed
decreases according to the equation derived before:
coefficients = a, b = -1.875, 2.75 (for max speed 0.8 m/s) speed= 0.2 m/s
non_linear_scaled_speed = a * s**2 + b * s = -1.875 * 0.2**2 + 2.75 * 0.2 = 0.475 (0.2m/s corresponds to 0.475 for a maximum speed of 0.8 m/s)
servo_msg.throttle = maximum_speed_threshold * non_linear_scaled_speed* = 0.4 * 0.475 = 0.19
As seen in the continuous action space in autonomous
mode, we now need to nonlinearly
map the raw value obtained from the joystick movement to the PWM values of the servo
and motor for the car to move.
As we do not have a maximum speed value defined to calculate the coefficients of the
equation y = ax**2 + bx
, we use the maximum speed percent from the device console to map to a
range of [1.0, 5.0] speed scale values. This allows us to recalculate the curve for
each maximum speed percent value and use that to map it to the throttle value in the servo message.
The idea behind this is that a lower percentage of maximum speed percent should map to a higher speed scale value while calculating the coefficients so that the curve is flatter and the impact of actual speed values is less for a lower max speed percent as shown in the following example:
Why do we need to map the maximum speed percent values [0.0, 1.0] to another speed scale
value in the range [5.0, 1.0] to calculate coefficients a
and b
?
The maximum_speed_percentage
values that we get as an user input are in the range of [0.
0, 1.0], and higher values mean the user wants more out of the joystick movement. In
other words, higher values for the maximum_speed_percentage
means that we need to
have a steep curve while mapping the possible speed values to their transformed
counterpart (the throttle value in the servo message). We need to inversely map the maximum speed percentage values to the
speed scale value used to calculate the non_linear_scaled_speed
.
We map the 100% maximum speed to 1.0 because the possible speed values from the joystick range from [0.0, 1.0] and 1.0 is the minimum value where the curve peaks before reversing in the preceding figure. This indicates that we are guaranteed to have a nonlinearly increasing mapping for all joystick values [0.0, 1.0].
We map the 0% to 5.0 to add a safe buffer to accommodate different
vehicle battery charge levels. For example, for a lower battery charge, the user might use a
maximum speed percentage of 70% to control the car, but when fully charged, the user can
reduce the maximum_speed_percentage
to 10% to have a similar effect.
input:
throttle (from the joystick) = 0.567;
max_speed_pct (from +/- adjust maximum speed buttons) = 0.5
calculated values:
categorized_throttle = 0.5
a = -0.13333333333333336, b = 0.7333333333333334
mapped_speed = a * categorized_throttle**2 + b * categorized_throttle
= -0.13333333333333336 * 0.5**2 + 0.7333333333333334 * 0.5
= -0.13333333333333336 * 0.25 + 0.7333333333333334 * 0.5
= -0.033333333 + 0.366666667
= 0.333333334
input:
throttle (from the joystick) = 0.567;
max_speed_pct (from +/- adjust maximum speed buttons) = 0.6
calculated values:
categorized_throttle = 0.5
a = -0.17751479289940836, b = 0.8461538461538464
mapped_speed = -0.17751479289940836 * 0.25 + 0.8461538461538464 * 0.5 = 0.378698225
input:
throttle (from the joystick) = 0.467;
max_speed_pct (from +/- adjust maximum speed buttons) = 0.6
calculated values:
categorized_throttle = 0.3
a = -0.17751479289940836, b = 0.8461538461538464
mapped_speed = -0.17751479289940836 * 0.09 + 0.8461538461538464 * 0.3 = 0.237869822
After we have the final categorizing and mapping completed for the speed
and steering_angle
, we have the output in the range of [-1.0, 1.0]. These values are passed to the servo node, similar to autonomous
mode, to move the car.
The servo node expects the speed
and steering_angle
values sent as part of the servo message to be in the range of [-1.0, 1.0]. In autonomous
mode, the speed is in the range of [0.0, 1.0], as we do not support reverse driving for our car.
The AWS DeepRacer device is an open-loop system and does not have feedback to recognize the speed of the car. We linearly map the value of speed
and steering_angle
obtained at the servo node to the duty cycle values, which regulate the RPM of the servo and motor. The exact value written as the PWM duty on to the motor and servo file further depends on the bounding calibration values set during the calibration flow.
This component is one of the core features of the AWS DeepRacer application. The Follow the Leader (FTL) sample project leverages most of the concepts used in the manual mode to build the ftl_navigation
node. To learn more about the Follow the Leader(FTL) sample project, see AWS DeepRacer Follow the Leader (FTL) sample project.