# Environment builder
You can easily create your own customized HierarchyCraft environment with all the benefits (graphical user interface, tasks, reward shaping, solving behavior, requirements graph).
Each HierarchyCraft environment is defined by a list of transformations and an initial state.
Thus, you just need to understand how to create a list of
[`hcraft.transformation`](https://irll.github.io/HierarchyCraft/hcraft/transformation.html)
and how to build a world with an initial state from those.
The initial state defines the starting state of the environment, including the agent's position, inventory, and zone inventories. By combining transformations and an initial state, users can simply create complex hierarchical environments with a high degree of flexibility and control.
See [`hcraft.state`](https://irll.github.io/HierarchyCraft/hcraft/state.html) for more details on the HierarchyCraft environment state.
You can also check more complex examples in `hcraft.examples`.
# Example: Simple custom environment
Let's make a simple environment where the goal is to open the treasure chest and take its gold.
## Create items
First, we need to represent the items we want to be able to manipulate.
For now, we only have two items, which we can simply build using the `Item` class from `hcraft.world`:
```python
from hcraft import Item

CHEST = Item("treasure_chest")
GOLD = Item("gold")
```
## Link items with transformations
We want to remove the chest from the zone where our player is, and add gold to the player's inventory.
We can then link those two items with a `Transformation` from `hcraft.transformation`:
```python
from hcraft.transformation import Transformation, Use, Yield, PLAYER, CURRENT_ZONE

TAKE_GOLD_FROM_CHEST = Transformation(
    inventory_changes=[
        Use(CURRENT_ZONE, CHEST, consume=1),
        Yield(PLAYER, GOLD),
    ]
)
```
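As a side note, a `Transformation` can also be given an optional display name as its first positional argument; we will rely on this when packaging the environment into a class at the end of this tutorial. A minimal sketch of the same transformation, named:

```python
# Same transformation as above, but with an explicit name:
TAKE_GOLD_FROM_CHEST = Transformation(
    "take-gold-from-chest",
    inventory_changes=[
        Use(CURRENT_ZONE, CHEST, consume=1),
        Yield(PLAYER, GOLD),
    ],
)
```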
Of course, `TAKE_GOLD_FROM_CHEST` will not be valid unless there is a `CHEST` in the zone.
Let's create a zone where we want our `CHEST` to be.
## Create a zone
Like items, zones are created with a `Zone` object from `hcraft.world`:
```python
from hcraft import Zone

TREASURE_ROOM = Zone("treasure_room")
```
To place our `CHEST` in the `TREASURE_ROOM`, we need to build a `World` from `hcraft.world` that will define our environment.
## Build a World from transformations
Items and zones used in transformations will automatically be indexed by the `World` to be stored in the environment state (see `hcraft.state` for more details).
We can simply build a world from a list of transformations:
```python
from hcraft.world import world_from_transformations

WORLD = world_from_transformations(
    transformations=[TAKE_GOLD_FROM_CHEST],
    start_zone=TREASURE_ROOM,
    start_zones_items={TREASURE_ROOM: [CHEST]},
)
```
Note that the world stores the initial state of the environment, so this is where we can place our `CHEST` in the `TREASURE_ROOM`!
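As a quick sanity check, the world exposes counters of what got indexed (a sketch; the `n_items`, `n_zones` and `n_zones_items` attributes are the ones used internally by the environment state, and the exact counts depend on how hcraft splits player items from zone items):

```python
# Hypothetical quick check of what got indexed by the world:
print(WORLD.n_items, WORLD.n_zones, WORLD.n_zones_items)
```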
## Complete your first HierarchyCraft environment
To build a complete hcraft environment, we simply need to pass our `WORLD` to `HcraftEnv` from `hcraft.env`:
```python
from hcraft import HcraftEnv

env = HcraftEnv(WORLD)
```
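The resulting env exposes gym-like spaces derived from the world: one discrete action per transformation, and a Box observation concatenating the player inventory, the one-hot current zone, and zone inventories. For example (exact repr depends on whether gymnasium is installed):

```python
print(env.action_space)       # one discrete action per transformation
print(env.observation_space)  # player items + current zone + zone items
```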
We can already render it in the GUI:
```python
from hcraft import render_env_with_human

render_env_with_human(env)
```
## Add a goal
For now, our environment is a sandbox that never ends and has no goal.
We can simply add a `Purpose` from `hcraft.purpose` like so:
```python
from hcraft.purpose import GetItemTask

get_gold_task = GetItemTask(GOLD)
env = HcraftEnv(WORLD, purpose=get_gold_task)
render_env_with_human(env)
```
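Playing by hand is not the only option: the env follows the gym API, so a programmatic loop works too. A minimal random-agent sketch (`action_space.sample()` assumes gymnasium is installed):

```python
done = False
observation, _info = env.reset()
while not done:
    action = env.action_space.sample()  # a random transformation index
    observation, reward, terminated, truncated, _info = env.step(action)
    done = terminated or truncated
```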
## Turn up the challenge
Now that we have the basics done, let's have a bit more fun with our environment! Let's lock the chest with keys, and add two rooms: a start room and a key room.
First, let's build the `KEY` item and the `KEY_ROOM`.
```python
KEY = Item("key")
KEY_ROOM = Zone("key_room")
```
Now let's make the `KEY_ROOM` a source of at most 2 `KEY` items with a transformation:
```python
SEARCH_KEY = Transformation(
    inventory_changes=[
        Yield(PLAYER, KEY, max=1),
    ],
    zone=KEY_ROOM,
)
```
Note that `max=1` here because `max` is the maximum amount allowed *before* the transformation: the player can only search for a key while holding at most one, and can thus accumulate at most two.
Then let's add the 'new state' for the `CHEST`: we simply build a new item `LOCKED_CHEST`, and add a transformation that unlocks the `LOCKED_CHEST` into a `CHEST`, consuming two `KEY`s.
```python
LOCKED_CHEST = Item("locked_chest")
UNLOCK_CHEST = Transformation(
    inventory_changes=[
        Use(PLAYER, KEY, 2),
        Use(CURRENT_ZONE, LOCKED_CHEST, consume=1),
        Yield(CURRENT_ZONE, CHEST),
    ],
)
```
Now we need to be able to move between zones; for this we use (again) transformations. Let's make the `START_ROOM` the link between the two other rooms.
```python
START_ROOM = Zone("start_room")
MOVE_TO_KEY_ROOM = Transformation(
    destination=KEY_ROOM,
    zone=START_ROOM,
)
MOVE_TO_TREASURE_ROOM = Transformation(
    destination=TREASURE_ROOM,
    zone=START_ROOM,
)
MOVE_TO_START_ROOM = Transformation(
    destination=START_ROOM,
)
```
We are ready for our V2! Again, we build the world from all our transformations and the env from the world. But now the chest inside the `TREASURE_ROOM` is the `LOCKED_CHEST`, and our player starts in the `START_ROOM` (note that `MOVE_TO_START_ROOM` is not restricted to a zone, so the player can come back from either room). Also, let's add a time limit to spice things up.
```python
from hcraft.world import world_from_transformations

WORLD_2 = world_from_transformations(
    transformations=[
        TAKE_GOLD_FROM_CHEST,
        SEARCH_KEY,
        UNLOCK_CHEST,
        MOVE_TO_KEY_ROOM,
        MOVE_TO_TREASURE_ROOM,
        MOVE_TO_START_ROOM,
    ],
    start_zone=START_ROOM,
    start_zones_items={TREASURE_ROOM: [LOCKED_CHEST]},
)
env = HcraftEnv(WORLD_2, purpose=get_gold_task, max_step=10)
render_env_with_human(env)
```
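Among the benefits listed in the introduction are solving behaviors: HierarchyCraft can build a behavior that solves a given task, which is a handy way to check that the environment is actually solvable. A sketch using the `solving_behavior` method documented in the API reference below:

```python
solving_behavior = env.solving_behavior(get_gold_task)

done = False
observation, _info = env.reset()
while not done:
    action = solving_behavior(observation)
    observation, _reward, terminated, truncated, _info = env.step(action)
    done = terminated or truncated

assert get_gold_task.is_terminated  # The gold was successfully obtained!
```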
## Add graphics
For now, our environment is a bit... ugly. Text is cool, but images are better!
For that, we need to give our world a resource path where images are located. To simplify our case, we can use the already-built folder under the treasure example:
```python
from pathlib import Path
import hcraft

WORLD_2.resources_path = Path(hcraft.__file__).parent.joinpath(
    "examples", "treasure", "resources"
)
render_env_with_human(env)
```
And we now have cool images for items!

Under the hood, this can simply be replicated by getting some assets (like those [2D assets from Pixel_Poem on itch.io](https://pixel-poem.itch.io/dungeon-assetpuck)).
We then simply put them into a folder like so, with matching names for items and zones:
```bash
cwd
├───myscript.py
├───resources
│   ├───items
│   │   ├───gold.png
│   │   ├───key.png
│   │   ├───locked_chest.png
│   │   └───treasure_chest.png
│   ├───zones
│   └───font.ttf
```
And setting that path as the world's `resources_path`:
```python
WORLD_2.resources_path = Path("resources")
render_env_with_human(env)
```
Try to do the same with zones, and change the font as well!
## Package into a class
If you wish to have someone else use your environment, you should pack it up into a class inheriting from `HcraftEnv` directly, like so:
```python
from pathlib import Path
from typing import List

from hcraft.elements import Item, Zone
from hcraft.env import HcraftEnv
from hcraft.purpose import GetItemTask
from hcraft.transformation import Transformation, Use, Yield, PLAYER, CURRENT_ZONE
from hcraft.world import world_from_transformations


class TreasureEnv(HcraftEnv):
    """A simple environment used for the env building tutorial."""

    TREASURE_ROOM = Zone("treasure_room")
    """Room containing the treasure."""
    KEY_ROOM = Zone("key_room")
    """Where all the keys are stored."""
    START_ROOM = Zone("start_room")
    """Where the player starts."""

    CHEST = Item("treasure_chest")
    """Treasure chest containing gold."""
    LOCKED_CHEST = Item("locked_chest")
    """Treasure chest containing gold ... but it's locked."""
    GOLD = Item("gold")
    """Gold! Well, the pixel version at least."""
    KEY = Item("key")
    """A key ... it can probably unlock things."""

    def __init__(self, **kwargs) -> None:
        transformations = self._build_transformations()
        world = world_from_transformations(
            transformations=transformations,
            start_zone=self.START_ROOM,
            start_zones_items={self.TREASURE_ROOM: [self.LOCKED_CHEST]},
        )
        world.resources_path = Path(__file__).parent / "resources"
        super().__init__(
            world, purpose=GetItemTask(self.GOLD), name="TreasureHcraft", **kwargs
        )

    def _build_transformations(self) -> List[Transformation]:
        TAKE_GOLD_FROM_CHEST = Transformation(
            "take-gold-from-chest",
            inventory_changes=[
                Use(CURRENT_ZONE, self.CHEST, consume=1),
                Yield(PLAYER, self.GOLD),
            ],
        )
        SEARCH_KEY = Transformation(
            "search-key",
            inventory_changes=[
                Yield(PLAYER, self.KEY, max=1),
            ],
            zone=self.KEY_ROOM,
        )
        UNLOCK_CHEST = Transformation(
            "unlock-chest",
            inventory_changes=[
                Use(PLAYER, self.KEY, 2),
                Use(CURRENT_ZONE, self.LOCKED_CHEST, consume=1),
                Yield(CURRENT_ZONE, self.CHEST),
            ],
        )
        MOVE_TO_KEY_ROOM = Transformation(
            "move-to-key_room",
            destination=self.KEY_ROOM,
            zone=self.START_ROOM,
        )
        MOVE_TO_TREASURE_ROOM = Transformation(
            "move-to-treasure_room",
            destination=self.TREASURE_ROOM,
            zone=self.START_ROOM,
        )
        MOVE_TO_START_ROOM = Transformation(
            "move-to-start_room",
            destination=self.START_ROOM,
        )
        return [
            TAKE_GOLD_FROM_CHEST,
            SEARCH_KEY,
            UNLOCK_CHEST,
            MOVE_TO_KEY_ROOM,
            MOVE_TO_TREASURE_ROOM,
            MOVE_TO_START_ROOM,
        ]
```
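Users can then instantiate it like any other HierarchyCraft environment; constructor keyword arguments such as `max_step` are forwarded to `HcraftEnv`:

```python
from hcraft import render_env_with_human

env = TreasureEnv(max_step=20)
render_env_with_human(env)
```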
That's it for this small customized env! If you want more, be sure to check `Transformation` from `hcraft.transformation`; there is plenty we didn't cover here.
1"""# Environment builder 2 3You can easily create your own customized HierarchyCraft environment with the all the benefits 4(graphical user interface, tasks, reward shaping, solving behavior, requirements graph). 5 6Each HierarchyCraft environment is defined by a list of transformations and an initial state. 7 8Thus, you just need to understand how to create a list of 9[`hcraft.transformation`](https://irll.github.io/HierarchyCraft/hcraft/transformation.html) 10and how to build a world with an initial state from those. 11 12The initial state defines the starting state of the environment, 13including the agent's position, inventory, and zones inventories. 14By combining transformations and an initial state, users can simply create complex hierarchical environments 15with a high degree of flexibility and control. 16 17See [`hcraft.state`](https://irll.github.io/HierarchyCraft/hcraft/state.html) 18for more details on the HierarchyCraft environements state. 19 20You can also check more complex examples in `hcraft.examples`. 21 22# Example: Simple customed environment 23 24Let's make a simple environment, where the goal is to open a the treasure chest and take it's gold. 25 26## Create items 27 28First, we need to represent the items we want to be able to manipulate. 29 30For now, we only have two items we can simply build using the Item class from `hcraft.world`: 31 32```python 33from hcraft import Item 34 35CHEST = Item("treasure_chest") 36GOLD = Item("gold") 37``` 38 39## Link items with transformations 40 41We want to remove the chest from the zone where our player is, and add it to his inventory. 42 43We can then link those two items with a Tranformation from `hcraft.transformation`: 44 45```python 46from hcraft.transformation import Transformation, Use, Yield, PLAYER, CURRENT_ZONE 47 48TAKE_GOLD_FROM_CHEST = Transformation( 49 inventory_changes=[ 50 Use(CURRENT_ZONE, CHEST, consume=1), 51 Yield(PLAYER, GOLD), 52 ] 53) 54``` 55 56Of course, `TAKE_GOLD_FROM_CHEST` will not be valid unless there is a `CHEST` in the zone. 57 58Let's create a zone where we want our `CHEST` to be. 59 60## Create a zone 61 62Like items, zones are created with a Zone object from `hcraft.world`: 63 64```python 65from hcraft import Zone 66 67TREASURE_ROOM = Zone("treasure_room") 68``` 69 70To place our `CHEST` in the `TREASURE_ROOM`, we need to build a World 71from `hcraft.world` that will define our environment. 72 73## Build a World from transformations 74 75Items and zones in transformations will automaticaly be indexed by the World 76to be stored in the environment state. (See `hcraft.state` for more details) 77We can simply build a world from a list of transformations: 78 79```python 80from hcraft.world import world_from_transformations 81 82WORLD = world_from_transformations( 83 transformations=[TAKE_GOLD_FROM_CHEST], 84 start_zone=TREASURE_ROOM, 85 start_zones_items={TREASURE_ROOM: [CHEST]}, 86) 87``` 88 89Note that the world stores the initial state of the environment. 90So we can add our `CHEST` in the `TREASURE_ROOM` here ! 
91 92## Complete your first HierarchyCraft environment 93 94To build a complete hcraft environment, 95we simply need to pass our `WORLD` to HcraftEnv from `hcraft.env`: 96 97```python 98from hcraft import HcraftEnv 99 100env = HcraftEnv(WORLD) 101``` 102 103We can already render it in the GUI: 104 105```python 106from hcraft import render_env_with_human 107 108render_env_with_human(env) 109``` 110 111 112## Add a goal 113 114For now, our environment is a sandbox that never ends and has no goal. 115We can simply add a Purpose from `hcraft.purpose` like so: 116 117```python 118from hcraft.purpose import GetItemTask 119 120get_gold_task = GetItemTask(GOLD) 121env = HcraftEnv(WORLD, purpose=get_gold_task) 122render_env_with_human(env) 123``` 124 125## Turn up the challenge 126 127Now that we have the basics done, let's have a bit more fun with our environment! 128Let's lock the chest with keys, and add two room, a start room and a keys room. 129 130First let's build the `KEY` item and the `KEY_ROOM`. 131 132```python 133KEY = Item("key") 134KEY_ROOM = Zone("key_room") 135``` 136 137Now let's make the `KEY_ROOM` a source of maximum 2 `KEY` with a transformation: 138 139```python 140SEARCH_KEY = Transformation( 141 inventory_changes=[ 142 Yield(PLAYER, KEY, max=1), 143 ], 144 zone=KEY_ROOM, 145) 146``` 147Note that `max=1` because max is the maximum *before* the transformation. 148 149Then add the 'new state' for the `CHEST`, for this we simply build a new item `LOCKED_CHEST`, 150and we add a transformation that will unlock the `LOCKED_CHEST` into a `CHEST` consuming two `KEYS`. 151 152```python 153LOCKED_CHEST = Item("locked_chest") 154UNLOCK_CHEST = Transformation( 155 inventory_changes=[ 156 Use(PLAYER, KEY, 2), 157 Use(CURRENT_ZONE, LOCKED_CHEST, consume=1), 158 Yield(CURRENT_ZONE, CHEST), 159 ], 160) 161``` 162 163Now we need to be able to move between zones, for this we use (again) transformations: 164 165Let's make the `START_ROOM` the link between the two other rooms. 166 167```python 168START_ROOM = Zone("start_room") 169MOVE_TO_KEY_ROOM = Transformation( 170 destination=KEY_ROOM, 171 zone=START_ROOM, 172) 173MOVE_TO_TREASURE_ROOM = Transformation( 174 destination=TREASURE_ROOM, 175 zone=START_ROOM, 176) 177MOVE_TO_START_ROOM = Transformation( 178 destination=START_ROOM, 179) 180``` 181 182We are ready for our V2 ! 183Again, we build the world from all our transformations and the env from the world. 184 185But now the chest inside the `TREASURE_ROOM` is the `LOCKED_CHEST` 186and our player start in `START_ROOM`. 187 188Also, let's add a time limit to spice things up. 189 190```python 191from hcraft.world import world_from_transformations 192 193WORLD_2 = world_from_transformations( 194 transformations=[ 195 TAKE_GOLD_FROM_CHEST, 196 SEARCH_KEY, 197 UNLOCK_CHEST, 198 MOVE_TO_KEY_ROOM, 199 MOVE_TO_TREASURE_ROOM, 200 MOVE_TO_START_ROOM, 201 ], 202 start_zone=START_ROOM, 203 start_zones_items={TREASURE_ROOM: [LOCKED_CHEST]}, 204) 205env = HcraftEnv(WORLD_2, purpose=get_gold_task, max_step=10) 206render_env_with_human(env) 207``` 208 209## Add graphics 210 211For now, our environment is a bit ... ugly. 212Text is cool, but images are better ! 213 214For that, we need to give our world a ressource path where images are located. 
215 216To simplify our case, we can use the already built folder under the treasure example: 217 218```python 219from pathlib import Path 220import hcraft 221 222WORLD_2.resources_path = Path(hcraft.__file__).parent.joinpath( 223 "examples", "treasure", "resources" 224) 225render_env_with_human(env) 226``` 227And we now have cool images for items ! 228 229Under the hood, this can simply be replicated by getting some assets. 230(Like those previous [2D assets from Pixel_Poem on itch.io](https://pixel-poem.itch.io/dungeon-assetpuck) 231) 232 233We then simply put them into a folder like so, with matching names for items and zones: 234```bash 235cwd 236├───myscript.py 237├───resources 238│ ├───items 239│ │ ├───gold.png 240│ │ ├───key.png 241│ │ ├───locked_chest.png 242│ │ └───treasure_chest.png 243│ ├───zones 244│ └───font.ttf 245``` 246 247And setting that path as the world's ressources_path: 248 249```python 250WORLD_2.resources_path = Path("resources") 251render_env_with_human(env) 252``` 253 254Try to do the same with zones and change the font aswell! 255 256 257 258## Package into a class 259 260If you wish to have someone else use your enviroment, 261you should pack it up into a class and inherit HcraftEnv directly like so: 262 263```python 264.. include:: examples/treasure/env.py 265``` 266 267That's it for this small customized env if you want more, be sure to check Transformation 268 form `hcraft.transformation`, there is plenty we didn't cover here. 269 270 271""" 272 273import collections 274from typing import TYPE_CHECKING, Dict, List, Optional, Tuple, Union 275 276import numpy as np 277 278from hcraft.metrics import SuccessCounter 279from hcraft.purpose import Purpose 280from hcraft.render.render import HcraftWindow 281from hcraft.render.utils import surface_to_rgb_array 282from hcraft.solving_behaviors import ( 283 Behavior, 284 build_all_solving_behaviors, 285 task_to_behavior_name, 286) 287from hcraft.planning import HcraftPlanningProblem 288from hcraft.state import HcraftState 289 290if TYPE_CHECKING: 291 from hcraft.task import Task 292 from hcraft.world import World 293 294# Gym is an optional dependency. 295try: 296 import gymnasium as gym 297 298 DiscreteSpace = gym.spaces.Discrete 299 BoxSpace = gym.spaces.Box 300 TupleSpace = gym.spaces.Tuple 301 MultiBinarySpace = gym.spaces.MultiBinary 302 Env = gym.Env 303except ImportError: 304 DiscreteSpace = collections.namedtuple("DiscreteSpace", "n") 305 BoxSpace = collections.namedtuple("BoxSpace", "low, high, shape, dtype") 306 TupleSpace = collections.namedtuple("TupleSpace", "spaces") 307 MultiBinarySpace = collections.namedtuple("MultiBinary", "n") 308 Env = object 309 310 311class HcraftEnv(Env): 312 """Environment to simulate inventory management.""" 313 314 def __init__( 315 self, 316 world: "World", 317 purpose: Optional[Union[Purpose, List["Task"], "Task"]] = None, 318 invalid_reward: float = -1.0, 319 render_window: Optional[HcraftWindow] = None, 320 name: str = "HierarchyCraft", 321 max_step: Optional[int] = None, 322 ) -> None: 323 """ 324 Args: 325 world: World defining the environment. 326 purpose: Purpose of the player, defining rewards and termination. 327 Defaults to None, hence a sandbox environment. 328 invalid_reward: Reward given to the agent for invalid actions. 329 Defaults to -1.0. 330 render_window: Window using to render the environment with pygame. 331 name: Name of the environement. Defaults to 'HierarchyCraft'. 
332 max_step: (Optional[int], optional): Maximum number of steps before episode truncation. 333 If None, never truncates the episode. Defaults to None. 334 """ 335 self.world = world 336 self.invalid_reward = invalid_reward 337 self.max_step = max_step 338 self.name = name 339 self._all_behaviors = None 340 341 self.render_window = render_window 342 self.render_mode = "rgb_array" 343 344 self.state = HcraftState(self.world) 345 self.current_step = 0 346 self.current_score = 0 347 self.cumulated_score = 0 348 self.episodes = 0 349 self.task_successes: Optional[SuccessCounter] = None 350 self.terminal_successes: Optional[SuccessCounter] = None 351 352 if purpose is None: 353 purpose = Purpose(None) 354 if not isinstance(purpose, Purpose): 355 purpose = Purpose(tasks=purpose) 356 self.purpose = purpose 357 self.metadata = {} 358 359 @property 360 def truncated(self) -> bool: 361 """Whether the time limit has been exceeded.""" 362 if self.max_step is None: 363 return False 364 return self.current_step >= self.max_step 365 366 @property 367 def observation_space(self) -> Union[BoxSpace, TupleSpace]: 368 """Observation space for the Agent.""" 369 obs_space = BoxSpace( 370 low=np.array( 371 [0 for _ in range(self.world.n_items)] 372 + [0 for _ in range(self.world.n_zones)] 373 + [0 for _ in range(self.world.n_zones_items)] 374 ), 375 high=np.array( 376 [np.inf for _ in range(self.world.n_items)] 377 + [1 for _ in range(self.world.n_zones)] 378 + [np.inf for _ in range(self.world.n_zones_items)] 379 ), 380 ) 381 382 return obs_space 383 384 @property 385 def action_space(self) -> DiscreteSpace: 386 """Action space for the Agent. 387 388 Actions are expected to often be invalid. 389 """ 390 return DiscreteSpace(len(self.world.transformations)) 391 392 def action_masks(self) -> np.ndarray: 393 """Return boolean mask of valid actions.""" 394 return np.array([t.is_valid(self.state) for t in self.world.transformations]) 395 396 def step( 397 self, action: Union[int, str, np.ndarray] 398 ) -> Tuple[np.ndarray, float, bool, bool, dict]: 399 """Perform one step in the environment given the index of a wanted transformation. 400 401 If the selected transformation can be performed, the state is updated and 402 a reward is given depending of the environment tasks. 403 Else the state is left unchanged and the `invalid_reward` is given to the player. 404 405 """ 406 407 if isinstance(action, np.ndarray): 408 if not action.size == 1: 409 raise TypeError( 410 "Actions should be integers corresponding the a transformation index" 411 f", got array with multiple elements:\n{action}." 412 ) 413 action = action.flatten()[0] 414 try: 415 action = int(action) 416 except (TypeError, ValueError) as e: 417 raise TypeError( 418 "Actions should be integers corresponding the a transformation index." 
419 ) from e 420 421 self.current_step += 1 422 423 self.task_successes.step_reset() 424 self.terminal_successes.step_reset() 425 426 success = self.state.apply(action) 427 if success: 428 reward = self.purpose.reward(self.state) 429 else: 430 reward = self.invalid_reward 431 432 terminated = self.purpose.is_terminal(self.state) 433 434 self.task_successes.update(self.episodes) 435 self.terminal_successes.update(self.episodes) 436 437 self.current_score += reward 438 self.cumulated_score += reward 439 return ( 440 self.state.observation, 441 reward, 442 terminated, 443 self.truncated, 444 self.infos(), 445 ) 446 447 def render(self, mode: Optional[str] = None, **_kwargs) -> Union[str, np.ndarray]: 448 """Render the observation of the agent in a format depending on `render_mode`.""" 449 if mode is not None: 450 self.render_mode = mode 451 452 if self.render_mode in ("human", "rgb_array"): # for human interaction 453 return self._render_rgb_array() 454 if self.render_mode == "console": # for console print 455 raise NotImplementedError 456 raise NotImplementedError 457 458 def reset( 459 self, 460 *, 461 seed: Optional[int] = None, 462 options: Optional[dict] = None, 463 ) -> Tuple[np.ndarray,]: 464 """Resets the state of the environement. 465 466 Returns: 467 (np.ndarray): The first observation. 468 """ 469 470 if not self.purpose.built: 471 self.purpose.build(self) 472 self.task_successes = SuccessCounter(self.purpose.tasks) 473 self.terminal_successes = SuccessCounter(self.purpose.terminal_groups) 474 475 self.current_step = 0 476 self.current_score = 0 477 self.episodes += 1 478 479 self.task_successes.new_episode(self.episodes) 480 self.terminal_successes.new_episode(self.episodes) 481 482 self.state.reset() 483 self.purpose.reset() 484 return self.state.observation, self.infos() 485 486 def close(self): 487 """Closes the environment.""" 488 if self.render_window is not None: 489 self.render_window.close() 490 491 @property 492 def all_behaviors(self) -> Dict[str, "Behavior"]: 493 """All solving behaviors using hebg.""" 494 if self._all_behaviors is None: 495 self._all_behaviors = build_all_solving_behaviors(self) 496 return self._all_behaviors 497 498 def solving_behavior(self, task: "Task") -> "Behavior": 499 """Get the solving behavior for a given task. 500 501 Args: 502 task: Task to solve. 503 504 Returns: 505 Behavior: Behavior solving the task. 506 507 Example: 508 ```python 509 solving_behavior = env.solving_behavior(task) 510 511 done = False 512 observation, _info = env.reset() 513 while not done: 514 action = solving_behavior(observation) 515 observation, _reward, terminated, truncated, _info = env.step(action) 516 done = terminated or truncated 517 518 assert terminated # Env is successfuly terminated 519 assert task.is_terminated # Task is successfuly terminated 520 ``` 521 """ 522 return self.all_behaviors[task_to_behavior_name(task)] 523 524 def planning_problem(self, **kwargs) -> HcraftPlanningProblem: 525 """Build this hcraft environment planning problem. 526 527 Returns: 528 Problem: Unified planning problem cooresponding to that environment. 
529 530 Example: 531 Write as PDDL files: 532 ```python 533 from unified_planning.io import PDDLWriter 534 problem = env.planning_problem() 535 writer = PDDLWriter(problem.upf_problem) 536 writer.write_domain("domain.pddl") 537 writer.write_problem("problem.pddl") 538 ``` 539 540 Using a plan to solve a HierarchyCraft gym environment: 541 ```python 542 hcraft_problem = env.planning_problem() 543 544 done = False 545 546 _observation, _info = env.reset() 547 while not done: 548 # Observations are not used when blindly following a plan 549 # But the state in required in order to replan if there is no plan left 550 action = hcraft_problem.action_from_plan(env.state) 551 _observation, _reward, terminated, truncated, _info = env.step(action) 552 done = terminated or truncated 553 assert env.purpose.is_terminated # Purpose is achieved 554 ``` 555 """ 556 return HcraftPlanningProblem(self.state, self.name, self.purpose, **kwargs) 557 558 def infos(self) -> dict: 559 infos = { 560 "action_is_legal": self.action_masks(), 561 "score": self.current_score, 562 "score_average": self.cumulated_score / self.episodes, 563 } 564 infos.update(self._tasks_infos()) 565 return infos 566 567 def _tasks_infos(self): 568 infos = {} 569 infos.update(self.task_successes.done_infos) 570 infos.update(self.task_successes.rates_infos) 571 infos.update(self.terminal_successes.done_infos) 572 infos.update(self.terminal_successes.rates_infos) 573 return infos 574 575 def _render_rgb_array(self) -> np.ndarray: 576 """Render an image of the game. 577 578 Create the rendering window if not existing yet. 579 """ 580 if self.render_window is None: 581 self.render_window = HcraftWindow() 582 if not self.render_window.built: 583 self.render_window.build(self) 584 fps = self.metadata.get("video.frames_per_second") 585 self.render_window.update_rendering(fps=fps) 586 return surface_to_rgb_array(self.render_window.screen)
# API Documentation
## HcraftEnv
Environment to simulate inventory management.
```python
HcraftEnv(
    world: "World",
    purpose: Optional[Union[Purpose, List["Task"], "Task"]] = None,
    invalid_reward: float = -1.0,
    render_window: Optional[HcraftWindow] = None,
    name: str = "HierarchyCraft",
    max_step: Optional[int] = None,
)
```
Arguments:
- world: World defining the environment.
- purpose: Purpose of the player, defining rewards and termination. Defaults to None, hence a sandbox environment.
- invalid_reward: Reward given to the agent for invalid actions. Defaults to -1.0.
- render_window: Window used to render the environment with pygame.
- name: Name of the environment. Defaults to 'HierarchyCraft'.
- max_step: Maximum number of steps before episode truncation. If None, never truncates the episode. Defaults to None.
### truncated (property)
Whether the time limit has been exceeded.
### observation_space (property)
Observation space for the Agent.
### action_space (property)
Action space for the Agent.
Actions are expected to often be invalid.
### action_masks()
Return a boolean mask of valid actions.
### step(action)
Perform one step in the environment given the index of a wanted transformation.

If the selected transformation can be performed, the state is updated and a reward is given depending on the environment tasks. Otherwise, the state is left unchanged and the `invalid_reward` is given to the player.
### render(mode=None)
Render the observation of the agent in a format depending on `render_mode`.
### reset(*, seed=None, options=None)
Resets the state of the environment.

Returns:
- (np.ndarray, dict): The first observation and an info dictionary.
### close()
Closes the environment.
### all_behaviors (property)
All solving behaviors using hebg.
### solving_behavior(task)
Get the solving behavior for a given task.

Arguments:
- task: Task to solve.

Returns:
- Behavior: Behavior solving the task.

Example:
```python
solving_behavior = env.solving_behavior(task)

done = False
observation, _info = env.reset()
while not done:
    action = solving_behavior(observation)
    observation, _reward, terminated, truncated, _info = env.step(action)
    done = terminated or truncated

assert terminated  # Env is successfully terminated
assert task.is_terminated  # Task is successfully terminated
```
### planning_problem(**kwargs)
Build this hcraft environment planning problem.

Returns:
- Problem: Unified planning problem corresponding to that environment.

Example:

Write as PDDL files:
```python
from unified_planning.io import PDDLWriter

problem = env.planning_problem()
writer = PDDLWriter(problem.upf_problem)
writer.write_domain("domain.pddl")
writer.write_problem("problem.pddl")
```

Using a plan to solve a HierarchyCraft gym environment:
```python
hcraft_problem = env.planning_problem()

done = False
_observation, _info = env.reset()
while not done:
    # Observations are not used when blindly following a plan,
    # but the state is required in order to replan if there is no plan left.
    action = hcraft_problem.action_from_plan(env.state)
    _observation, _reward, terminated, truncated, _info = env.step(action)
    done = terminated or truncated

assert env.purpose.is_terminated  # Purpose is achieved
```