
Simulation Components

The simulator is built from composable entities that allow changing the physics engine parameters, adding environment-handling functionalities to manipulate the environment at runtime, loading assets, selecting robots and controllers, and designing custom sensors.

Registries

The code contains registries that provide a named mapping to the various configuration files and classes. The registries ease configuration through parameters and make it possible to mix and match settings, robots, environments and sensors within the simulation. New configurations can be created on-the-fly and registered for programmatic use by the running code, without having to stop the simulation and configure it manually.

The code contains registries for the main simulation components, such as simulation parameters, environments, robots, controllers and tasks.

To register your own custom components, please take a look at the Customization page.
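
As a rough illustration of the idea, the sketch below shows the general registry pattern: a named mapping from strings to configuration classes that can be extended at runtime. This is a generic sketch, not the simulator's actual registry API; the class and object names here are placeholders, and the Customization page documents the real interface.

# Generic sketch of the registry pattern (names are illustrative placeholders).
class ConfigRegistry:
    def __init__(self):
        self._configs = {}

    def register(self, name, config_class):
        # store a named mapping to a configuration class
        self._configs[name] = config_class

    def make(self, name):
        # look up a registered configuration by name
        return self._configs[name]


class MyCustomEnvCfg:  # hypothetical configuration class for illustration
    num_envs = 16


env_config_registry = ConfigRegistry()
env_config_registry.register("my_custom_env", MyCustomEnvCfg)
env_cfg = env_config_registry.make("my_custom_env")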

Simulation Parameters

We use NVIDIA's Isaac Gym as the simulation engine. The simulator allows for selecting different physics backends, such as PhysX and Flex, and provides a set of APIs to interact with the environment, such as setting the gravity, time step, and rendering options. We provide a set of default configurations for the physics engine (based on PhysX) that can be adjusted to the user's needs. The simulation parameters are set as follows:

Default simulation parameters
class BaseSimConfig:
    # viewer camera:
    class viewer:
        headless = False
        ref_env = 0
        camera_position = [-5, -5, 4]  # [m]
        lookat = [0, 0, 0]  # [m]
        camera_orientation_euler_deg = [0, 0, 0]  # [deg]
        camera_follow_type = "FOLLOW_TRANSFORM"
        width = 1280
        height = 720
        max_range = 100.0  # [m]
        min_range = 0.1
        horizontal_fov_deg = 90
        use_collision_geometry = False
        camera_follow_transform_local_offset = [-1.0, 0.0, 0.2]  # [m]
        camera_follow_position_global_offset = [-1.0, 0.0, 0.4]  # [m]

    class sim:
        dt = 0.01
        substeps = 1
        gravity = [0.0, 0.0, -9.81]  # [m/s^2]
        up_axis = 1  # 0 is y, 1 is z
        use_gpu_pipeline = True

        class physx:
            num_threads = 10
            solver_type = 1  # 0: pgs, 1: tgs
            num_position_iterations = 4
            num_velocity_iterations = 2
            contact_offset = 0.002  # [m]
            rest_offset = 0.001  # [m]
            bounce_threshold_velocity = 0.1  # 0.5 [m/s]
            max_depenetration_velocity = 1.0
            max_gpu_contact_pairs = 2**24  # 2**24 -> needed for 8000 envs and more
            default_buffer_size_multiplier = 10
            contact_collection = 1  # 0: never, 1: last sub-step, 2: all sub-steps (default=2)

While PhysX is the default physics backend for this simulator, it can be changed to Flex with minor modifications; however, this has not been tested.
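
For instance, the default physics settings can be adjusted by deriving from the configuration above and overriding only the fields of interest. The sketch below assumes BaseSimConfig is importable as shown; the exact import path is an assumption and may differ in your installation.

# Sketch: a custom simulation configuration derived from the defaults.
# The import path below is an assumption; adapt it to where BaseSimConfig is defined.
from aerial_gym.config.sim_config.base_sim_config import BaseSimConfig


class CustomSimConfig(BaseSimConfig):
    class sim(BaseSimConfig.sim):
        dt = 0.005   # smaller physics time step [s]
        substeps = 2

        class physx(BaseSimConfig.sim.physx):
            num_position_iterations = 8  # trade speed for contact accuracy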

Assets

Simulation assets in Aerial Gym are generally URDF files, each of which can have its own simulation parameters. The asset configuration files are stored in config/asset_config. In our implementation we have defined asset classes based on the type of asset for our purposes and grouped their properties together. Assets of each type can be represented by different URDF files, inherently enabling randomization across environments without the need to specify which asset is to be loaded. Simply adding additional URDFs to the appropriate folder suffices to add them to the pool available for selection in the simulation.

The parameters for each asset are derived from the BaseAssetParams class, which includes the number of assets of that type to be loaded per environment and specifies the root asset folder, the position and orientation ratios, and the physics properties of the asset such as damping coefficients, density, etc. Additional parameters can be used to specify properties such as the presence of force sensors on the asset, per-link or whole-body segmentation labels, etc.

The BaseAssetParams class is as follows:

Example of configurable BaseAssetParams class
class BaseAssetParams:
    num_assets = 1  # number of assets to include per environment

    asset_folder = f"{AERIAL_GYM_DIRECTORY}/resources/models/environment_assets"
    file = None  # if file=None, random assets will be selected. If not None, this file will be used

    min_position_ratio = [0.5, 0.5, 0.5]  # min position as a ratio of the bounds
    max_position_ratio = [0.5, 0.5, 0.5]  # max position as a ratio of the bounds

    collision_mask = 1

    disable_gravity = False
    replace_cylinder_with_capsule = True  # replace collision cylinders with capsules, leads to faster/more stable simulation
    flip_visual_attachments = (
        True  # Some .obj meshes must be flipped from y-up to z-up
    )
    density = 0.000001
    angular_damping = 0.0001
    linear_damping = 0.0001
    max_angular_velocity = 100.0
    max_linear_velocity = 100.0
    armature = 0.001

    collapse_fixed_joints = True
    fix_base_link = True
    color = None
    keep_in_env = False

    body_semantic_label = 0
    link_semantic_label = 0
    per_link_semantic = False
    semantic_masked_links = {}
    place_force_sensor = False
    force_sensor_parent_link = "base_link"
    force_sensor_transform = [
        0.0,
        0.0,
        0.0,
        0.0,
        0.0,
        0.0,
        1.0,
    ]  # position, quat x, y, z, w
    use_collision_mesh_instead_of_visual = False
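
For example, a new asset type can be added by subclassing BaseAssetParams and overriding only the fields that differ from the defaults. In the sketch below the class name, folder path and semantic label are illustrative placeholders, not assets shipped with the simulator.

# Sketch: a hypothetical asset type derived from BaseAssetParams.
# All fields that are not overridden are inherited from the base class.
class pillar_asset_params(BaseAssetParams):
    num_assets = 4  # spawn four pillars per environment

    asset_folder = f"{AERIAL_GYM_DIRECTORY}/resources/models/environment_assets/pillars"  # assumed folder

    min_position_ratio = [0.1, 0.1, 0.0]  # randomize positions within the environment bounds
    max_position_ratio = [0.9, 0.9, 0.0]

    fix_base_link = True       # pillars are static
    body_semantic_label = 3    # illustrative whole-body segmentation label
    per_link_semantic = False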

Environments

The environment specification determines which components make up a simulation environment. A configuration file can be used to select a particular robot (with its sensors), a specific controller for the robot, and the set of obstacles present in the environment, along with a strategy to spawn and randomize their positions w.r.t. the environment bounds. The environment configuration files are stored in the config/env_config folder. The environment manager calls each of the environment entities to allow specific user-coded behaviors at each timestep or to perform certain actions when the environment is reset. It is responsible for spawning the assets, robots and obstacles in the environment and for managing their interactions.

Example of an Environment Configuration for an empty environment with a robot
class EmptyEnvCfg:
    class env:
        num_envs = 3  # number of environments
        num_env_actions = 0  # this is the number of actions handled by the environment
        # these are the actions that are sent to environment entities
        # and some of them may be used to control various entities in the environment
        # e.g. motion of obstacles, etc.
        env_spacing = 1.0  # not used with heightfields/trimeshes
        num_physics_steps_per_env_step_mean = 1  # number of steps between camera renders mean
        num_physics_steps_per_env_step_std = 0  # number of steps between camera renders std
        render_viewer_every_n_steps = 10  # render the viewer every n steps
        collision_force_threshold = 0.010  # collision force threshold
        manual_camera_trigger = False  # trigger camera captures manually
        reset_on_collision = (
            True  # reset environment when contact force on quadrotor is above a threshold
        )
        create_ground_plane = False  # create a ground plane
        sample_timestep_for_latency = True  # sample the timestep for the latency noise
        perturb_observations = True
        keep_same_env_for_num_episodes = 1

        use_warp = False
        e_s = env_spacing
        lower_bound_min = [-e_s, -e_s, -e_s]  # lower bound for the environment space
        lower_bound_max = [-e_s, -e_s, -e_s]  # lower bound for the environment space
        upper_bound_min = [e_s, e_s, e_s]  # upper bound for the environment space
        upper_bound_max = [e_s, e_s, e_s]  # upper bound for the environment space

    class env_config:
        include_asset_type = {}

        asset_type_to_dict_map = {}

To add assets to the environment, the include_asset_type dictionary specifies which asset types are included. The asset_type_to_dict_map dictionary maps each asset type to the class defining its parameters.

The environment configuration for the case of an environment with obstacles can be seen below:

Environment configuration file for an environment with obstacles

from aerial_gym.config.env_config.env_object_config import EnvObjectConfig

import numpy as np


class EnvWithObstaclesCfg(EnvObjectConfig):
    class env:
        num_envs = 64
        num_env_actions = 4  # this is the number of actions handled by the environment
        # potentially some of these can be input from the RL agent for the robot and
        # some of them can be used to control various entities in the environment
        # e.g. motion of obstacles, etc.
        env_spacing = 5.0  # not used with heightfields/trimeshes

        num_physics_steps_per_env_step_mean = 10  # number of steps between camera renders mean
        num_physics_steps_per_env_step_std = 0  # number of steps between camera renders std

        render_viewer_every_n_steps = 1  # render the viewer every n steps
        reset_on_collision = (
            True  # reset environment when contact force on quadrotor is above a threshold
        )
        collision_force_threshold = 0.05  # collision force threshold [N]
        create_ground_plane = False  # create a ground plane
        sample_timestep_for_latency = True  # sample the timestep for the latency noise
        perturb_observations = True
        keep_same_env_for_num_episodes = 1

        use_warp = True
        lower_bound_min = [-2.0, -4.0, -3.0]  # lower bound for the environment space
        lower_bound_max = [-1.0, -2.5, -2.0]  # lower bound for the environment space
        upper_bound_min = [9.0, 2.5, 2.0]  # upper bound for the environment space
        upper_bound_max = [10.0, 4.0, 3.0]  # upper bound for the environment space

    class env_config:
        include_asset_type = {
            "panels": True,
            "thin": False,
            "trees": False,
            "objects": True,
            "left_wall": True,
            "right_wall": True,
            "back_wall": True,
            "front_wall": True,
            "top_wall": True,
            "bottom_wall": True,
        }

        # maps the above names to the classes defining the assets. They can be enabled and disabled above in include_asset_type
        asset_type_to_dict_map = {
            "panels": EnvObjectConfig.panel_asset_params,
            "thin": EnvObjectConfig.thin_asset_params,
            "trees": EnvObjectConfig.tree_asset_params,
            "objects": EnvObjectConfig.object_asset_params,
            "left_wall": EnvObjectConfig.left_wall,
            "right_wall": EnvObjectConfig.right_wall,
            "back_wall": EnvObjectConfig.back_wall,
            "front_wall": EnvObjectConfig.front_wall,
            "bottom_wall": EnvObjectConfig.bottom_wall,
            "top_wall": EnvObjectConfig.top_wall,
        }

Tasks

An environment specification determines what is populated in an independent simulation instance and how the collective simulation steps through time based on commanded actions. A task, however, is slightly different: we use this term for the interpretation of task-specific information from the environment. A task class instantiates the entire simulation with all its parallel robots and assets and therefore has access to all the simulation information. We use this class to determine how the environment is interpreted for RL tasks. For example, a given simulation instance with its sim params, environment and asset specification, robot, sensor and controller specifications can be used to train policies for completely different tasks. Examples include setpoint navigation through clutter, observing a specific asset in the simulation, or perching on that specific asset. All of these tasks can be performed with the same set of objects in the environment, but require a different interpretation of the environment data to train the RL algorithm appropriately. This is done within the Task classes. A task can be specified in the files in the config/task_config folder as:

Example of a Task Configuration
import torch  # needed for the action-scaling tensors below


class task_config:
    seed = 10
    sim_name = "base_sim"
    env_name = "empty_env"
    robot_name = "base_quadrotor"
    args = {}
    num_envs = 2
    device = "cuda:0"
    observation_space_dim = 13
    privileged_observation_space_dim = 0
    action_space_dim = 4
    episode_len_steps = 1000 # real physics time for simulation is this value multiplied by sim.dt
    return_state_before_reset = False
    reward_parameters = {
        "pos_error_gain1": [2.0, 2.0, 2.0],
        "pos_error_exp1": [1/3.5, 1/3.5, 1/3.5],
        "pos_error_gain2": [2.0, 2.0, 2.0],
        "pos_error_exp2": [2.0, 2.0, 2.0],
        "dist_reward_coefficient": 7.5,
        "max_dist": 15.0,
        "action_diff_penalty_gain": [1.0, 1.0, 1.0],
        "absolute_action_reward_gain": [2.0, 2.0, 2.0],
        "crash_penalty": -100,
    }

    # a + bx for action scaling
    consant_for_action = torch.tensor([0.0, 0.0, 0.0, 0.0], dtype=torch.float32, device=device)

    scale_for_action = torch.tensor([3.0, 3.0, 3.0, 1.50], dtype=torch.float32, device=device)


    def action_transformation_function(action):
        clamped_action = torch.clamp(action, -1.0, 1.0)
        return task_config.consant_for_action + task_config.scale_for_action * clamped_action     
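
To make the a + b*x scaling concrete, the standalone snippet below (using the same constants as above) shows how a raw agent action is clamped and mapped to command values:

import torch

# Illustration of the a + b*x action transformation defined above.
action = torch.tensor([[0.5, -1.2, 0.0, 1.0]])   # raw action from the agent
clamped = torch.clamp(action, -1.0, 1.0)          # -> [[0.5, -1.0, 0.0, 1.0]]
a = torch.tensor([0.0, 0.0, 0.0, 0.0])
b = torch.tensor([3.0, 3.0, 3.0, 1.5])
command = a + b * clamped                          # -> [[1.5, -3.0, 0.0, 1.5]]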

A sample task for position setpoint navigation (without sensors or obstacles) is provided as an example:

PositionSetpointTask class definition example
class PositionSetpointTask(BaseTask):
    def __init__(self, task_config):
        super().__init__(task_config)
        self.device = self.task_config.device
        # set each element of the reward parameters to a torch tensor
        # common boilerplate code here

        # Currently only the "observations" are sent to the actor and critic.
        # The "privileged_obs" are not handled so far in sample-factory

        self.task_obs = {
            "observations": torch.zeros(
                (self.sim_env.num_envs, self.task_config.observation_space_dim),
                device=self.device,
                requires_grad=False,
            ),
            "priviliged_obs": torch.zeros(
                (self.sim_env.num_envs, self.task_config.privileged_observation_space_dim),
                device=self.device,
                requires_grad=False,
            ),
            "collisions": torch.zeros(
                (self.sim_env.num_envs, 1), device=self.device, requires_grad=False
            ),
            "rewards": torch.zeros(
                (self.sim_env.num_envs, 1), device=self.device, requires_grad=False
            ),
        }

    # common boilerplate code here

    def step(self, actions):
        # this uses the action, gets observations
        # calculates rewards, returns tuples
        # In this case, the episodes that are terminated need to be
        # first reset, and the first observation of the new episode
        # needs to be returned.

        transformed_action = self.action_transformation_function(actions)
        self.sim_env.step(actions=transformed_action)

        # This step must be done since the reset is done after the reward is calculated.
        # This enables the robot to send back an updated state, and an updated observation to the RL agent after the reset.
        # This is important for the RL agent to get the correct state after the reset.
        self.rewards[:], self.terminations[:] = self.compute_rewards_and_crashes(self.obs_dict)

        if self.task_config.return_state_before_reset == True:
            return_tuple = self.get_return_tuple()


        self.truncations[:] = torch.where(self.sim_env.sim_steps > self.task_config.episode_len_steps, 1, 0)
        self.sim_env.post_reward_calculation_step()

        self.infos = {}  # self.obs_dict["infos"]

        if self.task_config.return_state_before_reset == False:
            return_tuple = self.get_return_tuple()
        return return_tuple

    ...

    def process_obs_for_task(self):
        self.task_obs["observations"][:, 0:3] = (
            self.target_position - self.obs_dict["robot_position"] # position in environment/world frame
        )
        self.task_obs["observations"][:, 3:7] = self.obs_dict["robot_orientation"] # orientation in environment/world frame
        self.task_obs["observations"][:, 7:10] = self.obs_dict["robot_body_linvel"] # linear velocity in the body/imu frame
        self.task_obs["observations"][:, 10:13] = self.obs_dict["robot_body_angvel"] # angular velocity in the body/imu frame
        self.task_obs["rewards"] = self.rewards # reward for the time step after it is calculated
        self.task_obs["terminations"] = self.terminations # terminations/crashes 
        self.task_obs["truncations"] = self.truncations # truncations or premature resets (for purposes of diversifying data and episodes)


@torch.jit.script
def exp_func(x, gain, exp):
    return gain * torch.exp(-exp * x * x)


@torch.jit.script
def compute_reward(
    pos_error, crashes, action, prev_action, curriculum_level_multiplier, parameter_dict
):
    # type: (Tensor, Tensor, Tensor, Tensor, float, Dict[str, Tensor]) -> Tuple[Tensor, Tensor]
    dist = torch.norm(pos_error, dim=1)

    pos_reward = 2.0 / ( 1.0 + dist * dist)

    dist_reward = (20 - dist) / 20.0

    total_reward = (
        pos_reward + dist_reward # + up_reward + action_diff_reward + absolute_action_reward
    )
    total_reward[:] = curriculum_level_multiplier * total_reward
    crashes[:] = torch.where(dist > 8.0, torch.ones_like(crashes), crashes)

    total_reward[:] = torch.where(
        crashes > 0.0, -2 * torch.ones_like(total_reward), total_reward
    )
    return total_reward, crashes

The Task class is ultimately designed to be used with RL frameworks and therefore conforms to the Gymnasium API specification. In the class above, the step(...) function first translates the commands from the RL agent into a control command for the specific robot platform by transforming the action input. This command is then sent to the robot and the environment is stepped. Finally, a reward is computed for the new state, truncations and terminations are determined, and the resulting tuple is returned for use by the RL framework. Similarly, for trajectory tracking, only the reward function and observations need to be changed to train the RL algorithm, without making any changes to the assets, robots or the environment.
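
As a rough sketch of how such a task is driven from an RL framework, the loop below follows the Gymnasium convention described above. The construction call via the task registry, the registered task name, and the exact return signature of step(...) are assumptions here; refer to the RL training examples in the repository for the actual interface.

import torch

# Sketch of a Gymnasium-style interaction loop; names and paths are assumptions.
from aerial_gym.registry.task_registry import task_registry  # assumed import path

task = task_registry.make_task("position_setpoint_task")  # assumed registered name
obs = task.reset()
for _ in range(task.task_config.episode_len_steps):
    # random actions in [-1, 1] stand in for a trained policy
    actions = 2.0 * torch.rand(
        (task.task_config.num_envs, task.task_config.action_space_dim),
        device=task.task_config.device,
    ) - 1.0
    obs, rewards, terminations, truncations, infos = task.step(actions)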

To add your own custom tasks, please refer to the section on customizing the simulator.

Difference between Environment and Task

Many simulator implementations use these terminologies interchangeably. In our case, we view the environment as the components that define the robot and its physical surroundings, i.e., the assets near the robot, the parameters of the physics engine that determine how the various entities in the simulation world interact with one another, and the sensor parameters that determine how the sensors perceive data.

The task, on the other hand, is an interpretation of the simulation world and the information provided by or collected from it in order to reach a particular goal desired by the user. The same environment can be used to train multiple tasks, and the tasks can be changed without changing the environment definition.

For example, an empty environment with a quadrotor can be used to train a position setpoint task, or a trajectory tracking task. An environment with a set of obstacles can be used to train a policy that can either navigate through the obstacles or perch on a specific asset in the environment. The task is the interpretation of the environment data for the RL algorithm to learn the desired behavior.

To relate this to a familiar environment from the OpenAI Gym suite of tasks, an "environment" in our case could refer to a CartPole world with its associated dynamics, whereas a "task" would allow the same cartpole to be controlled to balance the pole upright, to keep swinging the pole at a given angular rate, or to hold the endpoint of the pole at a given location in the environment, all of which require different formulations of rewards and observations for the RL algorithm to learn the desired behavior.

Robots

Robots can be specified and configured independently of the sensors and the environment using a robot registry. More about robots can be found on the page for Robots and Controllers.

Controllers

Controllers can be specified and selected independently of the robot platform. Note, however, that not all combinations of controllers and platforms will produce optimal results. The controllers can be registered and selected from the controller registry. More about controllers can be found on the page for Robots and Controllers.

Sensors

Similar to the above, the sensors can be specified and selected independently. However, since the sensors are mounted on the robot platform, we have chosen to select the sensors for the robot in the robot config file rather than directly through a registry (it is possible to do so yourself with very minor changes to the code). More about the capabilities of the sensors can be found on the page for Sensors and Rendering.