voxelgym2D.envs.env_one_step module

Voxel environment corresponding to the one-step action space

class voxelgym2D.envs.env_one_step.VoxelGymOneStep(render_mode=None, mapfile='600x600.npy', view_size=21, image_size=42, max_collisions=0, max_steps=60, show_path=True, multi_output=False, partial_reward=True, inference_mode=False, discrete_actions=True, log_level='ERROR')[source]

Bases: BaseEnv

Voxel environment corresponding to the one-step action space

__init__(render_mode=None, mapfile='600x600.npy', view_size=21, image_size=42, max_collisions=0, max_steps=60, show_path=True, multi_output=False, partial_reward=True, inference_mode=False, discrete_actions=True, log_level='ERROR')[source]
Parameters:
  • render_mode (Optional[str], optional) – render mode, by default None

  • mapfile (str) – name of the map file in the maps folder

  • view_size (int) – size of the view window for observation

  • image_size (int) – size of the image to be returned as observation

  • max_collisions (int) – maximum number of collisions allowed before episode ends

  • max_steps (int) – maximum number of steps allowed before episode ends

  • show_path (bool) – whether to show the last traversed action path in the observation

  • multi_output (bool) – whether to add additional outputs in the observation

  • partial_reward (bool) – whether to give rewards for each step

  • inference_mode (bool) – whether to run in inference mode

  • discrete_actions (bool) – whether to use discrete actions

  • log_level (str, optional) – log level, by default “ERROR”. One of “DEBUG”, “INFO”, “WARNING”, “ERROR”, “CRITICAL”
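
As a quick orientation, the environment can be instantiated directly from this class; the keyword values below simply restate the documented defaults, so any of them can be overridden as needed.

    from voxelgym2D.envs.env_one_step import VoxelGymOneStep

    # Instantiate with the defaults shown in the signature above.
    env = VoxelGymOneStep(
        render_mode=None,
        mapfile="600x600.npy",    # map file expected in the maps folder
        view_size=21,             # view window size; observations cover 2*view_size cells per side
        image_size=42,            # side length of the observation image
        max_collisions=0,         # episode ends once this many collisions occur
        max_steps=60,             # episode ends once this many steps are taken
        show_path=True,           # draw the last traversed path in the observation
        multi_output=False,
        partial_reward=True,      # give a reward every step instead of only at the end
        inference_mode=False,
        discrete_actions=True,
        log_level="ERROR",
    )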

action_space: spaces.Space[ActType]
_compute_reward(completion_reward=False)[source]

Computes the reward for the current step

Parameters:

completion_reward (bool) – if True, returns the terminal reward for reaching the target, else returns the reward for the current step

Returns:

reward – reward for the current step

Return type:

float

static action_to_bins(action)[source]

Converts the action into bins of size 1/4 and returns the bin number in the range [0, 7]

Parameters:

action (np.ndarray) – action to be converted to bin number

Returns:

bin – bin number in the range [0, 7]

Return type:

int
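
For intuition, here is a minimal sketch of the kind of mapping this describes, assuming the continuous action is a single value in [-1, 1]; that range is an assumption, while the 1/4 bin width and the [0, 7] output range come from the description above.

    import numpy as np

    def action_to_bin_sketch(action: np.ndarray) -> int:
        # Illustrative only: shift an assumed [-1, 1] action to [0, 2], divide by
        # the 1/4 bin width, and clip so the result stays within [0, 7].
        return int(np.clip((action[0] + 1.0) // 0.25, 0, 7))

Since action_to_bins is a static method, it can also be called directly, e.g. VoxelGymOneStep.action_to_bins(np.array([0.3])).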

_create_base_obs()

Creates the base observation for the episode which can be reused throughout the episode

Return type:

None

_find_target()

Finds a target location for the agent to move to

Returns:

target_location – target location

Return type:

np.ndarray

Raises:

RuntimeError – If a target location cannot be found

_get_info()

Returns the info dictionary for the current step of the episode

Returns:

info – info dictionary

Return type:

Dict

_get_new_index_from_counts(counts_mat, alpha_p=1.0)

Returns a new index sampled from the counts matrix

Parameters:
  • counts_mat (np.ndarray) – counts matrix used to sample the new index

  • alpha_p (float) – parameter to control the sampling probability

Returns:

sampled_index – sampled index from the counts matrix in the form (y, x)

Return type:

Tuple[int, int]
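
The exact sampling rule is not spelled out above; a plausible sketch is inverse-count weighting, where a larger alpha_p pushes sampling more strongly towards rarely used cells (the weighting scheme below is an assumption, not the confirmed implementation).

    import numpy as np

    def sample_index_sketch(counts_mat: np.ndarray, alpha_p: float = 1.0, rng=None):
        rng = rng or np.random.default_rng()
        # Weight each cell inversely to how often it has been used, raised to alpha_p.
        weights = 1.0 / np.power(counts_mat.astype(float) + 1.0, alpha_p)
        probs = weights.ravel() / weights.sum()
        flat_idx = rng.choice(probs.size, p=probs)
        y, x = np.unravel_index(flat_idx, counts_mat.shape)
        return int(y), int(x)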

_get_obs()

Returns the observation for the current step of the episode

Returns:

obs – observation for the current step of the episode

Return type:

Union[np.ndarray, OrderedDict]

_is_protocol = False
_make_astar_matrix()

Creates the astar matrix for the current world map and sets the astar grid

Return type:

None

_np_random: np.random.Generator | None = None
_run_astar(target)

Runs the A* algorithm on the current world map and returns the path, path cost and number of nodes visited

Parameters:

target (np.ndarray) – target location

Return type:

Tuple[List[Tuple[int, int]], float, int]

Returns:

  • path (List[Tuple[int, int]]) – path from agent to target

  • path_cost (float) – cost of the path

  • runs (int) – number of nodes visited
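
For reference, a path/cost/node-count triple of this shape can be produced with the python-pathfinding package; whether this environment uses that package internally is not stated here, and the unit step cost below is an assumption.

    import numpy as np
    from pathfinding.core.diagonal_movement import DiagonalMovement
    from pathfinding.core.grid import Grid
    from pathfinding.finder.a_star import AStarFinder

    def run_astar_sketch(world_map: np.ndarray, start_yx, target_yx):
        # python-pathfinding treats values <= 0 as blocked, so invert a map where
        # 1 marks an obstacle and 0 marks free space.
        grid = Grid(matrix=(1 - world_map).tolist())
        start = grid.node(start_yx[1], start_yx[0])  # node() takes (x, y)
        end = grid.node(target_yx[1], target_yx[0])
        finder = AStarFinder(diagonal_movement=DiagonalMovement.always)
        path, runs = finder.find_path(start, end, grid)
        path_cost = float(len(path) - 1) if path else float("inf")  # assumed unit cost per step
        return path, path_cost, runs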

_slice_grid_map()

Slices the grid map into a 2D numpy array of size (2*view_size, 2*view_size) and generates a mapping from the sliced grid map to the original grid map

Return type:

Tuple[Callable, Optional[ndarray]]

Returns:

  • mapping (Callable(int, int)) – mapping from the sliced grid map to the original grid map

  • potential_start_location (Union[np.ndarray, None]) – potential start location for the agent
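
The returned mapping is essentially a coordinate translation between the local window and the full map; below is a simplified sketch of that idea, with edge handling and the start-location search omitted (the slicing details are assumptions).

    import numpy as np
    from typing import Callable, Tuple

    def slice_grid_map_sketch(grid_map: np.ndarray, center_yx: Tuple[int, int],
                              view_size: int) -> Tuple[np.ndarray, Callable]:
        y0 = center_yx[0] - view_size
        x0 = center_yx[1] - view_size
        window = grid_map[y0:y0 + 2 * view_size, x0:x0 + 2 * view_size]

        def to_original(y: int, x: int) -> Tuple[int, int]:
            # Translate a (y, x) index in the sliced window back to the full map.
            return y + y0, x + x0

        return window, to_original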

_soft_reset()

Moves the agent to the center of the map and resets the target

Return type:

None

_start_end_counts()

Creates arrays to keep track of the start and end cell counts

Return type:

Tuple[ndarray, ndarray]

Returns:

  • start_counts (np.ndarray) – array with the same shape as self.grid_map holding the count of start cells

  • end_counts (np.ndarray) – array with the same shape as self.grid_map holding the count of end cells

_take_action(action)[source]

Takes the action and updates the agent location

Parameters:

action (np.ndarray) – action to be taken

Return type:

Tuple[List, bool]

Returns:

  • new_agent_location (List) – new agent location along with the path taken

  • collision (bool) – True if the agent collides with an obstacle, else False
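
With the default discrete actions, a one-step move amounts to selecting one of the eight neighbouring cells and checking it against the obstacle map; the direction table and collision rule below are illustrative assumptions rather than the confirmed implementation.

    import numpy as np

    # Hypothetical bin -> (dy, dx) table for the eight one-step moves.
    DIRECTIONS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]

    def take_action_sketch(grid_map: np.ndarray, agent_yx, action_bin: int):
        dy, dx = DIRECTIONS[action_bin]
        new_yx = (agent_yx[0] + dy, agent_yx[1] + dx)
        # Collision if the target cell lies outside the map or is marked as an obstacle (1).
        inside = 0 <= new_yx[0] < grid_map.shape[0] and 0 <= new_yx[1] < grid_map.shape[1]
        collision = (not inside) or grid_map[new_yx] == 1
        return (agent_yx if collision else new_yx), collision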

close()

Closes all open matplotlib figures

Return type:

None

static find_obstacle_neighbor_count(grid_map)

Finds the number of neighboring obstacles for each cell in the grid map

Parameters:

grid_map (np.ndarray) – grid map with obstacles marked as 1s and free cells marked as 0s

Returns:

neighbors – number of neighboring obstacles for each cell in the grid map

Return type:

np.ndarray
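
One way such a per-cell count can be computed is with a padded 3x3 neighbourhood sum; whether the actual method counts 4- or 8-neighbours is not stated above, so the 8-neighbour version here is an assumption.

    import numpy as np

    def obstacle_neighbor_count_sketch(grid_map: np.ndarray) -> np.ndarray:
        padded = np.pad(grid_map, 1, mode="constant", constant_values=0)
        neighbors = np.zeros_like(grid_map, dtype=int)
        # Sum the eight surrounding cells by shifting the padded map.
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if dy == 0 and dx == 0:
                    continue
                neighbors += padded[1 + dy:1 + dy + grid_map.shape[0],
                                    1 + dx:1 + dx + grid_map.shape[1]]
        return neighbors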

get_logger()

Returns the logger

Returns:

logger – logger object

Return type:

Logger

get_wrapper_attr(name)

Gets the attribute name from the environment.

Return type:

Any

Parameters:

name (str) –

metadata: Dict[str, Any] = {'render_fps': 1, 'render_modes': ['None']}
property np_random: Generator

Returns the environment’s internal _np_random; if it is not set, it will be initialised with a random seed.

Returns:

Instances of np.random.Generator

static ordinal(num)

Returns the ordinal of a number

Parameters:

num (int) – the number to get the ordinal of

Returns:

the ordinal of the number

Return type:

str
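
For example, assuming the usual numeric-suffix form (the exact string format is not specified above):

    from voxelgym2D.envs.env_one_step import VoxelGymOneStep

    print(VoxelGymOneStep.ordinal(1))   # expected something like "1st"
    print(VoxelGymOneStep.ordinal(23))  # expected something like "23rd"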

render()

Renders the environment

Return type:

None

render_mode: str | None = None
reset(*, seed=None, options=None)

Resets the environment to the initial state and returns the initial observation and info

Parameters:
  • seed (Union[int, None]) – seed to use for the environment

  • options (Union[Dict, None]) – options to use for the environment

Return type:

Tuple[Union[ndarray, OrderedDict], Dict]

Returns:

  • obs (np.ndarray or OrderedDict) – initial observation from the environment

  • info (Dict) – info dictionary for the initial state

reward_range = (-inf, inf)
spec: EnvSpec | None = None
step(action)

Takes a step in the environment and returns the observation, reward, terminated, truncated and info

Parameters:

action (np.ndarray) – the action to take

Return type:

Tuple[Union[ndarray, OrderedDict], float, bool, bool, Dict]

Returns:

  • observation (np.ndarray or OrderedDict) – observation

  • reward (float) – reward

  • terminated (bool) – whether the episode terminated

  • truncated (bool) – whether the episode was truncated

  • info (Dict) – info dictionary
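
Putting reset() and step() together, a minimal interaction loop follows the standard gymnasium pattern; the randomly sampled action below is only a placeholder policy.

    from voxelgym2D.envs.env_one_step import VoxelGymOneStep

    env = VoxelGymOneStep()          # defaults as documented in __init__ above
    obs, info = env.reset(seed=42)   # seeding is done through reset()

    terminated = truncated = False
    while not (terminated or truncated):
        action = env.action_space.sample()  # placeholder policy
        obs, reward, terminated, truncated, info = env.step(action)

    env.close()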

property unwrapped: Env[ObsType, ActType]

Returns the base non-wrapped environment.

Returns:

The base non-wrapped gymnasium.Env instance

Return type:

Env

_new_world_center: np.ndarray
_next_new_world_center: np.ndarray
_agent_location: np.ndarray
_target_location: np.ndarray
_path: List
ini_astarPath: List
astarPath: List
random_gen: np.random.Generator
observation_space: spaces.Space[ObsType]