voxelgym2D.envs.env_one_step module

Voxel environment corresponding to the one-step action space

class voxelgym2D.envs.env_one_step.VoxelGymOneStep(render_mode=None, mapfile='600x600.npy', view_size=21, image_size=42, max_collisions=0, max_steps=60, show_path=True, multi_output=False, partial_reward=True, inference_mode=False, discrete_actions=True, log_level='ERROR')[source]

Bases: BaseEnv

Voxel environment corresponding to the one-step action space

__init__(render_mode=None, mapfile='600x600.npy', view_size=21, image_size=42, max_collisions=0, max_steps=60, show_path=True, multi_output=False, partial_reward=True, inference_mode=False, discrete_actions=True, log_level='ERROR')[source]
Parameters:
  • render_mode (Optional[str], optional) – render mode, by default None

  • mapfile (str) – name of the map file in the maps folder

  • view_size (int) – size of the view window for observation

  • image_size (int) – size of the image to be returned as observation

  • max_collisions (int) – maximum number of collisions allowed before episode ends

  • max_steps (int) – maximum number of steps allowed before episode ends

  • show_path (bool) – whether to show the last traversed action path in the observation

  • multi_output (bool) – whether to add additional outputs in the observation

  • partial_reward (bool) – whether to give rewards for each step

  • inference_mode (bool) – whether to run in inference mode

  • discrete_actions (bool) – whether to use discrete actions

  • log_level (str, optional) – log level, by default “ERROR”. One of “DEBUG”, “INFO”, “WARNING”, “ERROR”, “CRITICAL”
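
As a quick orientation, the environment can be instantiated directly from this class; the keyword values below simply restate the documented defaults, so any of them can be overridden as needed.

    from voxelgym2D.envs.env_one_step import VoxelGymOneStep

    # Instantiate with the defaults shown in the signature above.
    env = VoxelGymOneStep(
        render_mode=None,
        mapfile="600x600.npy",    # map file expected in the maps folder
        view_size=21,             # view window size; observations cover 2*view_size cells per side
        image_size=42,            # side length of the observation image
        max_collisions=0,         # episode ends once this many collisions occur
        max_steps=60,             # episode ends once this many steps are taken
        show_path=True,           # draw the last traversed path in the observation
        multi_output=False,
        partial_reward=True,      # give a reward every step instead of only at the end
        inference_mode=False,
        discrete_actions=True,
        log_level="ERROR",
    )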

action_space: spaces.Space[ActType]
_compute_reward(completion_reward=False)[source]

Computes the reward for the current step

Parameters:

completion_reward (bool) – if True, returns the terminal reward for reaching the target, else returns the reward for the current step

Returns:

reward – reward for the current step

Return type:

float

static action_to_bins(action)[source]

Converts the action into bins of size 1/4 and returns the bin number in the range [0, 7]

Parameters:

action (np.ndarray) – action to be converted to bin number

Returns:

bin – bin number in the range [0, 7]

Return type:

int
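
For intuition, here is a minimal sketch of the kind of mapping this describes, assuming the continuous action is a single value in [-1, 1]; that range is an assumption, while the 1/4 bin width and the [0, 7] output range come from the description above.

    import numpy as np

    def action_to_bin_sketch(action: np.ndarray) -> int:
        # Illustrative only: shift an assumed [-1, 1] action to [0, 2], divide by
        # the 1/4 bin width, and clip so the result stays within [0, 7].
        return int(np.clip((action[0] + 1.0) // 0.25, 0, 7))

Since action_to_bins is a static method, it can also be called directly, e.g. VoxelGymOneStep.action_to_bins(np.array([0.3])).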

_create_base_obs()

Creates the base observation for the episode which can be reused throughout the episode

Return type:

None

_find_target()

Finds a target location for the agent to move to

Returns:

target_location – target location

Return type:

np.ndarray

Raises:

RuntimeError – If a target location cannot be found

_get_info()

Returns the info dictionary for the current step of the episode

Returns:

info – info dictionary

Return type:

Dict

_get_new_index_from_counts(counts_mat, alpha_p=1.0)

Returns a new index sampled from the counts matrix

Parameters:
  • counts_mat (np.ndarray) – counts matrix used to sample the new index

  • alpha_p (float) – parameter to control the sampling probability

Returns:

sampled_index – sampled index from the counts matrix in the form (y, x)

Return type:

Tuple[int, int]
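
The exact sampling rule is not spelled out above; a plausible sketch is inverse-count weighting, where a larger alpha_p pushes sampling more strongly towards rarely used cells (the weighting scheme below is an assumption, not the confirmed implementation).

    import numpy as np

    def sample_index_sketch(counts_mat: np.ndarray, alpha_p: float = 1.0, rng=None):
        rng = rng or np.random.default_rng()
        # Weight each cell inversely to how often it has been used, raised to alpha_p.
        weights = 1.0 / np.power(counts_mat.astype(float) + 1.0, alpha_p)
        probs = weights.ravel() / weights.sum()
        flat_idx = rng.choice(probs.size, p=probs)
        y, x = np.unravel_index(flat_idx, counts_mat.shape)
        return int(y), int(x)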

_get_obs()

Returns the observation for the current step of the episode

Returns:

obs – observation for the current step of the episode

Return type:

Union[np.ndarray, OrderedDict]

_is_protocol = False
_make_astar_matrix()

Creates the astar matrix for the current world map and sets the astar grid

Return type:

None

_np_random: np.random.Generator | None = None
_run_astar(target)

Runs the A* algorithm on the current world map and returns the path, path cost and number of nodes visited

Parameters:

target (np.ndarray) – target location

Return type:

Tuple[List[Tuple[int, int]], float, int]

Returns:

  • path (List[Tuple[int, int]]) – path from agent to target

  • path_cost (float) – cost of the path

  • runs (int) – number of nodes visited
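
For reference, a path/cost/node-count triple of this shape can be produced with the python-pathfinding package; whether this environment uses that package internally is not stated here, and the unit step cost below is an assumption.

    import numpy as np
    from pathfinding.core.diagonal_movement import DiagonalMovement
    from pathfinding.core.grid import Grid
    from pathfinding.finder.a_star import AStarFinder

    def run_astar_sketch(world_map: np.ndarray, start_yx, target_yx):
        # python-pathfinding treats values <= 0 as blocked, so invert a map where
        # 1 marks an obstacle and 0 marks free space.
        grid = Grid(matrix=(1 - world_map).tolist())
        start = grid.node(start_yx[1], start_yx[0])  # node() takes (x, y)
        end = grid.node(target_yx[1], target_yx[0])
        finder = AStarFinder(diagonal_movement=DiagonalMovement.always)
        path, runs = finder.find_path(start, end, grid)
        path_cost = float(len(path) - 1) if path else float("inf")  # assumed unit cost per step
        return path, path_cost, runs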

_slice_grid_map()

Slices the grid map into a 2D numpy array of size (2*view_size, 2*view_size) and generates a mapping from the sliced grid map to the original grid map

Return type:

Tuple[Callable, Optional[ndarray]]

Returns:

  • mapping (Callable(int, int)) – mapping from the sliced grid map to the original grid map

  • potential_start_location (Union[np.ndarray, None]) – potential start location for the agent
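
The returned mapping is essentially a coordinate translation between the local window and the full map; below is a simplified sketch of that idea, with edge handling and the start-location search omitted (the slicing details are assumptions).

    import numpy as np
    from typing import Callable, Tuple

    def slice_grid_map_sketch(grid_map: np.ndarray, center_yx: Tuple[int, int],
                              view_size: int) -> Tuple[np.ndarray, Callable]:
        y0 = center_yx[0] - view_size
        x0 = center_yx[1] - view_size
        window = grid_map[y0:y0 + 2 * view_size, x0:x0 + 2 * view_size]

        def to_original(y: int, x: int) -> Tuple[int, int]:
            # Translate a (y, x) index in the sliced window back to the full map.
            return y + y0, x + x0

        return window, to_original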

_soft_reset()

Moves the agent to the center of the map and resets the target

Return type:

None

_start_end_counts()

Creates arrays to keep track of the start and end cell counts

Return type:

Tuple[ndarray, ndarray]

Returns:

  • start_counts (np.ndarray) – array with the same shape as self.grid_map holding the count of start cells

  • end_counts (np.ndarray) – array with the same shape as self.grid_map holding the count of end cells

_take_action(action)[source]

Takes the action and updates the agent location

Parameters:

action (np.ndarray) – action to be taken

Return type:

Tuple[List, bool]

Returns:

  • new_agent_location (List) – new agent location along with the path taken

  • collision (bool) – True if the agent collides with an obstacle, else False
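
With the default discrete actions, a one-step move amounts to selecting one of the eight neighbouring cells and checking it against the obstacle map; the direction table and collision rule below are illustrative assumptions rather than the confirmed implementation.

    import numpy as np

    # Hypothetical bin -> (dy, dx) table for the eight one-step moves.
    DIRECTIONS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]

    def take_action_sketch(grid_map: np.ndarray, agent_yx, action_bin: int):
        dy, dx = DIRECTIONS[action_bin]
        new_yx = (agent_yx[0] + dy, agent_yx[1] + dx)
        # Collision if the target cell lies outside the map or is marked as an obstacle (1).
        inside = 0 <= new_yx[0] < grid_map.shape[0] and 0 <= new_yx[1] < grid_map.shape[1]
        collision = (not inside) or grid_map[new_yx] == 1
        return (agent_yx if collision else new_yx), collision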

close()

Closes all open matplotlib figures

Return type:

None

static find_obstacle_neighbor_count(grid_map)

Finds the number of neighboring obstacles for each cell in the grid map

Parameters:

grid_map (np.ndarray) – grid map with obstacles marked as 1s and free cells marked as 0s

Returns:

neighbors – number of neighboring obstacles for each cell in the grid map

Return type:

np.ndarray
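
One way such a per-cell count can be computed is with a padded 3x3 neighbourhood sum; whether the actual method counts 4- or 8-neighbours is not stated above, so the 8-neighbour version here is an assumption.

    import numpy as np

    def obstacle_neighbor_count_sketch(grid_map: np.ndarray) -> np.ndarray:
        padded = np.pad(grid_map, 1, mode="constant", constant_values=0)
        neighbors = np.zeros_like(grid_map, dtype=int)
        # Sum the eight surrounding cells by shifting the padded map.
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if dy == 0 and dx == 0:
                    continue
                neighbors += padded[1 + dy:1 + dy + grid_map.shape[0],
                                    1 + dx:1 + dx + grid_map.shape[1]]
        return neighbors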

get_logger()

Returns the logger

Returns:

logger – logger object

Return type:

Logger

get_wrapper_attr(name)

Gets the attribute name from the environment.

Return type:

Any

Parameters:

name (str) –

metadata: Dict[str, Any] = {'render_fps': 1, 'render_modes': ['None']}
property np_random: Generator

Returns the environment’s internal _np_random; if it is not set, it will be initialised with a random seed.

Returns:

Instances of np.random.Generator

static ordinal(num)

Returns the ordinal of a number

Parameters:

num (int) – the number to get the ordinal of

Returns:

the ordinal of the number

Return type:

str
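
For example, assuming the usual numeric-suffix form (the exact string format is not specified above):

    from voxelgym2D.envs.env_one_step import VoxelGymOneStep

    print(VoxelGymOneStep.ordinal(1))   # expected something like "1st"
    print(VoxelGymOneStep.ordinal(23))  # expected something like "23rd"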

render()

Renders the environment

Return type:

None

render_mode: str | None = None
reset(*, seed=None, options=None)

Resets the environment to the initial state and returns the initial observation and info

Parameters:
  • seed (Union[int, None]) – seed to use for the environment

  • options (Union[Dict, None]) – options to use for the environment

Return type:

Tuple[Union[ndarray, OrderedDict], Dict]

Returns:

  • obs (np.ndarray or OrderedDict) – initial observation from the environment

  • info (Dict) – info dictionary for the initial state

reward_range = (-inf, inf)
spec: EnvSpec | None = None
step(action)

Takes a step in the environment and returns the observation, reward, terminated, truncated and info

Parameters:

action (np.ndarray) – the action to take

Return type:

Tuple[Union[ndarray, OrderedDict], float, bool, bool, Dict]

Returns:

  • observation (np.ndarray or OrderedDict) – observation

  • reward (float) – reward

  • terminated (bool) – whether the episode terminated

  • truncated (bool) – whether the episode was truncated

  • info (Dict) – info dictionary
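
Putting reset() and step() together, a minimal interaction loop follows the standard gymnasium pattern; the randomly sampled action below is only a placeholder policy.

    from voxelgym2D.envs.env_one_step import VoxelGymOneStep

    env = VoxelGymOneStep()          # defaults as documented in __init__ above
    obs, info = env.reset(seed=42)   # seeding is done through reset()

    terminated = truncated = False
    while not (terminated or truncated):
        action = env.action_space.sample()  # placeholder policy
        obs, reward, terminated, truncated, info = env.step(action)

    env.close()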

property unwrapped: Env[ObsType, ActType]

Returns the base non-wrapped environment.

Returns:

The base non-wrapped gymnasium.Env instance

Return type:

Env

_new_world_center: np.ndarray
_next_new_world_center: np.ndarray
_agent_location: np.ndarray
_target_location: np.ndarray
_path: List
ini_astarPath: List
astarPath: List
random_gen: np.random.Generator
observation_space: spaces.Space[ObsType]