# Environment builder
You can easily create your own customized HierarchyCraft environment with all the benefits (graphical user interface, tasks, reward shaping, solving behavior, requirements graph).
Each HierarchyCraft environment is defined by a list of transformations and an initial state.
Thus, you just need to understand how to create a list of
[`hcraft.transformation`](https://irll.github.io/HierarchyCraft/hcraft/transformation.html)
and how to build a world with an initial state from those.
The initial state defines the starting state of the environment, including the agent's position, inventory, and zone inventories. By combining transformations and an initial state, users can simply create complex hierarchical environments with a high degree of flexibility and control.
See [`hcraft.state`](https://irll.github.io/HierarchyCraft/hcraft/state.html) for more details on the HierarchyCraft environment state.
You can also check more complex examples in `hcraft.examples`.
# Example: Simple custom environment
Let's make a simple environment where the goal is to open the treasure chest and take its gold.
## Create items
First, we need to represent the items we want to be able to manipulate.
For now, we only have two items, which we can simply build using the `Item` class from `hcraft.world`:
```python
from hcraft import Item

CHEST = Item("treasure_chest")
GOLD = Item("gold")
```
## Link items with transformations
We want to remove the chest from the zone where our player is, and add gold to the player's inventory.
We can then link those two items with a `Transformation` from `hcraft.transformation`:
```python
from hcraft.transformation import Transformation, Use, Yield, PLAYER, CURRENT_ZONE

TAKE_GOLD_FROM_CHEST = Transformation(
    inventory_changes=[
        Use(CURRENT_ZONE, CHEST, consume=1),
        Yield(PLAYER, GOLD),
    ]
)
```
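As a side note, a `Transformation` can also be given an optional display name as its first positional argument; we will rely on this when packaging the environment into a class at the end of this tutorial. A minimal sketch of the same transformation, named:

```python
# Same transformation as above, but with an explicit name:
TAKE_GOLD_FROM_CHEST = Transformation(
    "take-gold-from-chest",
    inventory_changes=[
        Use(CURRENT_ZONE, CHEST, consume=1),
        Yield(PLAYER, GOLD),
    ],
)
```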
Of course, `TAKE_GOLD_FROM_CHEST` will not be valid unless there is a `CHEST` in the zone.
Let's create a zone where we want our `CHEST` to be.
## Create a zone
Like items, zones are created with a `Zone` object from `hcraft.world`:
```python
from hcraft import Zone

TREASURE_ROOM = Zone("treasure_room")
```
To place our `CHEST` in the `TREASURE_ROOM`, we need to build a `World` from `hcraft.world` that will define our environment.
## Build a World from transformations
Items and zones used in transformations will automatically be indexed by the `World` to be stored in the environment state (see `hcraft.state` for more details).
We can simply build a world from a list of transformations:
```python
from hcraft.world import world_from_transformations

WORLD = world_from_transformations(
    transformations=[TAKE_GOLD_FROM_CHEST],
    start_zone=TREASURE_ROOM,
    start_zones_items={TREASURE_ROOM: [CHEST]},
)
```
Note that the world stores the initial state of the environment, so this is where we can place our `CHEST` in the `TREASURE_ROOM`!
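As a quick sanity check, the world exposes counters of what got indexed (a sketch; the `n_items`, `n_zones` and `n_zones_items` attributes are the ones used internally by the environment state, and the exact counts depend on how hcraft splits player items from zone items):

```python
# Hypothetical quick check of what got indexed by the world:
print(WORLD.n_items, WORLD.n_zones, WORLD.n_zones_items)
```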
## Complete your first HierarchyCraft environment
To build a complete hcraft environment, we simply need to pass our `WORLD` to `HcraftEnv` from `hcraft.env`:
```python
from hcraft import HcraftEnv

env = HcraftEnv(WORLD)
```
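The resulting env exposes gym-like spaces derived from the world: one discrete action per transformation, and a Box observation concatenating the player inventory, the one-hot current zone, and zone inventories. For example (exact repr depends on whether gymnasium is installed):

```python
print(env.action_space)       # one discrete action per transformation
print(env.observation_space)  # player items + current zone + zone items
```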
We can already render it in the GUI:
```python
from hcraft import render_env_with_human

render_env_with_human(env)
```
## Add a goal
For now, our environment is a sandbox that never ends and has no goal.
We can simply add a `Purpose` from `hcraft.purpose` like so:
```python
from hcraft.purpose import GetItemTask

get_gold_task = GetItemTask(GOLD)
env = HcraftEnv(WORLD, purpose=get_gold_task)
render_env_with_human(env)
```
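Playing by hand is not the only option: the env follows the gym API, so a programmatic loop works too. A minimal random-agent sketch (`action_space.sample()` assumes gymnasium is installed):

```python
done = False
observation, _info = env.reset()
while not done:
    action = env.action_space.sample()  # a random transformation index
    observation, reward, terminated, truncated, _info = env.step(action)
    done = terminated or truncated
```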
## Turn up the challenge
Now that we have the basics done, let's have a bit more fun with our environment! Let's lock the chest with keys, and add two rooms: a start room and a key room.
First, let's build the `KEY` item and the `KEY_ROOM`.
```python
KEY = Item("key")
KEY_ROOM = Zone("key_room")
```
Now let's make the `KEY_ROOM` a source of at most 2 `KEY` items with a transformation:
```python
SEARCH_KEY = Transformation(
    inventory_changes=[
        Yield(PLAYER, KEY, max=1),
    ],
    zone=KEY_ROOM,
)
```
Note that `max=1` here because `max` is the maximum amount allowed *before* the transformation: the player can only search for a key while holding at most one, and can thus accumulate at most two.
Then let's add the 'new state' for the `CHEST`: we simply build a new item `LOCKED_CHEST`, and add a transformation that unlocks the `LOCKED_CHEST` into a `CHEST`, consuming two `KEY`s.
```python
LOCKED_CHEST = Item("locked_chest")
UNLOCK_CHEST = Transformation(
    inventory_changes=[
        Use(PLAYER, KEY, 2),
        Use(CURRENT_ZONE, LOCKED_CHEST, consume=1),
        Yield(CURRENT_ZONE, CHEST),
    ],
)
```
Now we need to be able to move between zones; for this we use (again) transformations. Let's make the `START_ROOM` the link between the two other rooms.
```python
START_ROOM = Zone("start_room")
MOVE_TO_KEY_ROOM = Transformation(
    destination=KEY_ROOM,
    zone=START_ROOM,
)
MOVE_TO_TREASURE_ROOM = Transformation(
    destination=TREASURE_ROOM,
    zone=START_ROOM,
)
MOVE_TO_START_ROOM = Transformation(
    destination=START_ROOM,
)
```
We are ready for our V2! Again, we build the world from all our transformations and the env from the world. But now the chest inside the `TREASURE_ROOM` is the `LOCKED_CHEST`, and our player starts in the `START_ROOM` (note that `MOVE_TO_START_ROOM` is not restricted to a zone, so the player can come back from either room). Also, let's add a time limit to spice things up.
```python
from hcraft.world import world_from_transformations

WORLD_2 = world_from_transformations(
    transformations=[
        TAKE_GOLD_FROM_CHEST,
        SEARCH_KEY,
        UNLOCK_CHEST,
        MOVE_TO_KEY_ROOM,
        MOVE_TO_TREASURE_ROOM,
        MOVE_TO_START_ROOM,
    ],
    start_zone=START_ROOM,
    start_zones_items={TREASURE_ROOM: [LOCKED_CHEST]},
)
env = HcraftEnv(WORLD_2, purpose=get_gold_task, max_step=10)
render_env_with_human(env)
```
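Among the benefits listed in the introduction are solving behaviors: HierarchyCraft can build a behavior that solves a given task, which is a handy way to check that the environment is actually solvable. A sketch using the `solving_behavior` method documented in the API reference below:

```python
solving_behavior = env.solving_behavior(get_gold_task)

done = False
observation, _info = env.reset()
while not done:
    action = solving_behavior(observation)
    observation, _reward, terminated, truncated, _info = env.step(action)
    done = terminated or truncated

assert get_gold_task.is_terminated  # The gold was successfully obtained!
```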
## Add graphics
For now, our environment is a bit... ugly. Text is cool, but images are better!
For that, we need to give our world a resource path where images are located. To simplify our case, we can use the already-built folder under the treasure example:
```python
from pathlib import Path
import hcraft

WORLD_2.resources_path = Path(hcraft.__file__).parent.joinpath(
    "examples", "treasure", "resources"
)
render_env_with_human(env)
```
And we now have cool images for items!

Under the hood, this can simply be replicated by getting some assets (like those [2D assets from Pixel_Poem on itch.io](https://pixel-poem.itch.io/dungeon-assetpuck)).
We then simply put them into a folder like so, with matching names for items and zones:
```bash
cwd
├───myscript.py
├───resources
│   ├───items
│   │   ├───gold.png
│   │   ├───key.png
│   │   ├───locked_chest.png
│   │   └───treasure_chest.png
│   ├───zones
│   └───font.ttf
```
And setting that path as the world's `resources_path`:
```python
WORLD_2.resources_path = Path("resources")
render_env_with_human(env)
```
Try to do the same with zones, and change the font as well!
## Package into a class
If you wish to have someone else use your environment, you should pack it up into a class inheriting from `HcraftEnv` directly, like so:
```python
from pathlib import Path
from typing import List

from hcraft.elements import Item, Zone
from hcraft.env import HcraftEnv
from hcraft.purpose import GetItemTask
from hcraft.transformation import Transformation, Use, Yield, PLAYER, CURRENT_ZONE
from hcraft.world import world_from_transformations


class TreasureEnv(HcraftEnv):
    """A simple environment used for the env building tutorial."""

    TREASURE_ROOM = Zone("treasure_room")
    """Room containing the treasure."""
    KEY_ROOM = Zone("key_room")
    """Where all the keys are stored."""
    START_ROOM = Zone("start_room")
    """Where the player starts."""

    CHEST = Item("treasure_chest")
    """Treasure chest containing gold."""
    LOCKED_CHEST = Item("locked_chest")
    """Treasure chest containing gold ... but it's locked."""
    GOLD = Item("gold")
    """Gold! Well, the pixel version at least."""
    KEY = Item("key")
    """A key ... it can probably unlock things."""

    def __init__(self, **kwargs) -> None:
        transformations = self._build_transformations()
        world = world_from_transformations(
            transformations=transformations,
            start_zone=self.START_ROOM,
            start_zones_items={self.TREASURE_ROOM: [self.LOCKED_CHEST]},
        )
        world.resources_path = Path(__file__).parent / "resources"
        super().__init__(
            world, purpose=GetItemTask(self.GOLD), name="TreasureHcraft", **kwargs
        )

    def _build_transformations(self) -> List[Transformation]:
        TAKE_GOLD_FROM_CHEST = Transformation(
            "take-gold-from-chest",
            inventory_changes=[
                Use(CURRENT_ZONE, self.CHEST, consume=1),
                Yield(PLAYER, self.GOLD),
            ],
        )
        SEARCH_KEY = Transformation(
            "search-key",
            inventory_changes=[
                Yield(PLAYER, self.KEY, max=1),
            ],
            zone=self.KEY_ROOM,
        )
        UNLOCK_CHEST = Transformation(
            "unlock-chest",
            inventory_changes=[
                Use(PLAYER, self.KEY, 2),
                Use(CURRENT_ZONE, self.LOCKED_CHEST, consume=1),
                Yield(CURRENT_ZONE, self.CHEST),
            ],
        )
        MOVE_TO_KEY_ROOM = Transformation(
            "move-to-key_room",
            destination=self.KEY_ROOM,
            zone=self.START_ROOM,
        )
        MOVE_TO_TREASURE_ROOM = Transformation(
            "move-to-treasure_room",
            destination=self.TREASURE_ROOM,
            zone=self.START_ROOM,
        )
        MOVE_TO_START_ROOM = Transformation(
            "move-to-start_room",
            destination=self.START_ROOM,
        )
        return [
            TAKE_GOLD_FROM_CHEST,
            SEARCH_KEY,
            UNLOCK_CHEST,
            MOVE_TO_KEY_ROOM,
            MOVE_TO_TREASURE_ROOM,
            MOVE_TO_START_ROOM,
        ]
```
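Users can then instantiate it like any other HierarchyCraft environment; constructor keyword arguments such as `max_step` are forwarded to `HcraftEnv`:

```python
from hcraft import render_env_with_human

env = TreasureEnv(max_step=20)
render_env_with_human(env)
```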
That's it for this small customized env! If you want more, be sure to check `Transformation` from `hcraft.transformation`; there is plenty we didn't cover here.
1"""# Environment builder 2 3You can easily create your own customized HierarchyCraft environment with the all the benefits 4(graphical user interface, tasks, reward shaping, solving behavior, requirements graph). 5 6Each HierarchyCraft environment is defined by a list of transformations and an initial state. 7 8Thus, you just need to understand how to create a list of 9[`hcraft.transformation`](https://irll.github.io/HierarchyCraft/hcraft/transformation.html) 10and how to build a world with an initial state from those. 11 12The initial state defines the starting state of the environment, 13including the agent's position, inventory, and zones inventories. 14By combining transformations and an initial state, users can simply create complex hierarchical environments 15with a high degree of flexibility and control. 16 17See [`hcraft.state`](https://irll.github.io/HierarchyCraft/hcraft/state.html) 18for more details on the HierarchyCraft environements state. 19 20You can also check more complex examples in `hcraft.examples`. 21 22# Example: Simple customed environment 23 24Let's make a simple environment, where the goal is to open a the treasure chest and take it's gold. 25 26## Create items 27 28First, we need to represent the items we want to be able to manipulate. 29 30For now, we only have two items we can simply build using the Item class from `hcraft.world`: 31 32```python 33from hcraft import Item 34 35CHEST = Item("treasure_chest") 36GOLD = Item("gold") 37``` 38 39## Link items with transformations 40 41We want to remove the chest from the zone where our player is, and add it to his inventory. 42 43We can then link those two items with a Tranformation from `hcraft.transformation`: 44 45```python 46from hcraft.transformation import Transformation, Use, Yield, PLAYER, CURRENT_ZONE 47 48TAKE_GOLD_FROM_CHEST = Transformation( 49 inventory_changes=[ 50 Use(CURRENT_ZONE, CHEST, consume=1), 51 Yield(PLAYER, GOLD), 52 ] 53) 54``` 55 56Of course, `TAKE_GOLD_FROM_CHEST` will not be valid unless there is a `CHEST` in the zone. 57 58Let's create a zone where we want our `CHEST` to be. 59 60## Create a zone 61 62Like items, zones are created with a Zone object from `hcraft.world`: 63 64```python 65from hcraft import Zone 66 67TREASURE_ROOM = Zone("treasure_room") 68``` 69 70To place our `CHEST` in the `TREASURE_ROOM`, we need to build a World 71from `hcraft.world` that will define our environment. 72 73## Build a World from transformations 74 75Items and zones in transformations will automaticaly be indexed by the World 76to be stored in the environment state. (See `hcraft.state` for more details) 77We can simply build a world from a list of transformations: 78 79```python 80from hcraft.world import world_from_transformations 81 82WORLD = world_from_transformations( 83 transformations=[TAKE_GOLD_FROM_CHEST], 84 start_zone=TREASURE_ROOM, 85 start_zones_items={TREASURE_ROOM: [CHEST]}, 86) 87``` 88 89Note that the world stores the initial state of the environment. 90So we can add our `CHEST` in the `TREASURE_ROOM` here ! 
91 92## Complete your first HierarchyCraft environment 93 94To build a complete hcraft environment, 95we simply need to pass our `WORLD` to HcraftEnv from `hcraft.env`: 96 97```python 98from hcraft import HcraftEnv 99 100env = HcraftEnv(WORLD) 101``` 102 103We can already render it in the GUI: 104 105```python 106from hcraft import render_env_with_human 107 108render_env_with_human(env) 109``` 110 111 112## Add a goal 113 114For now, our environment is a sandbox that never ends and has no goal. 115We can simply add a Purpose from `hcraft.purpose` like so: 116 117```python 118from hcraft.purpose import GetItemTask 119 120get_gold_task = GetItemTask(GOLD) 121env = HcraftEnv(WORLD, purpose=get_gold_task) 122render_env_with_human(env) 123``` 124 125## Turn up the challenge 126 127Now that we have the basics done, let's have a bit more fun with our environment! 128Let's lock the chest with keys, and add two room, a start room and a keys room. 129 130First let's build the `KEY` item and the `KEY_ROOM`. 131 132```python 133KEY = Item("key") 134KEY_ROOM = Zone("key_room") 135``` 136 137Now let's make the `KEY_ROOM` a source of maximum 2 `KEY` with a transformation: 138 139```python 140SEARCH_KEY = Transformation( 141 inventory_changes=[ 142 Yield(PLAYER, KEY, max=1), 143 ], 144 zone=KEY_ROOM, 145) 146``` 147Note that `max=1` because max is the maximum *before* the transformation. 148 149Then add the 'new state' for the `CHEST`, for this we simply build a new item `LOCKED_CHEST`, 150and we add a transformation that will unlock the `LOCKED_CHEST` into a `CHEST` consuming two `KEYS`. 151 152```python 153LOCKED_CHEST = Item("locked_chest") 154UNLOCK_CHEST = Transformation( 155 inventory_changes=[ 156 Use(PLAYER, KEY, 2), 157 Use(CURRENT_ZONE, LOCKED_CHEST, consume=1), 158 Yield(CURRENT_ZONE, CHEST), 159 ], 160) 161``` 162 163Now we need to be able to move between zones, for this we use (again) transformations: 164 165Let's make the `START_ROOM` the link between the two other rooms. 166 167```python 168START_ROOM = Zone("start_room") 169MOVE_TO_KEY_ROOM = Transformation( 170 destination=KEY_ROOM, 171 zone=START_ROOM, 172) 173MOVE_TO_TREASURE_ROOM = Transformation( 174 destination=TREASURE_ROOM, 175 zone=START_ROOM, 176) 177MOVE_TO_START_ROOM = Transformation( 178 destination=START_ROOM, 179) 180``` 181 182We are ready for our V2 ! 183Again, we build the world from all our transformations and the env from the world. 184 185But now the chest inside the `TREASURE_ROOM` is the `LOCKED_CHEST` 186and our player start in `START_ROOM`. 187 188Also, let's add a time limit to spice things up. 189 190```python 191from hcraft.world import world_from_transformations 192 193WORLD_2 = world_from_transformations( 194 transformations=[ 195 TAKE_GOLD_FROM_CHEST, 196 SEARCH_KEY, 197 UNLOCK_CHEST, 198 MOVE_TO_KEY_ROOM, 199 MOVE_TO_TREASURE_ROOM, 200 MOVE_TO_START_ROOM, 201 ], 202 start_zone=START_ROOM, 203 start_zones_items={TREASURE_ROOM: [LOCKED_CHEST]}, 204) 205env = HcraftEnv(WORLD_2, purpose=get_gold_task, max_step=10) 206render_env_with_human(env) 207``` 208 209## Add graphics 210 211For now, our environment is a bit ... ugly. 212Text is cool, but images are better ! 213 214For that, we need to give our world a ressource path where images are located. 
215 216To simplify our case, we can use the already built folder under the treasure example: 217 218```python 219from pathlib import Path 220import hcraft 221 222WORLD_2.resources_path = Path(hcraft.__file__).parent.joinpath( 223 "examples", "treasure", "resources" 224) 225render_env_with_human(env) 226``` 227And we now have cool images for items ! 228 229Under the hood, this can simply be replicated by getting some assets. 230(Like those previous [2D assets from Pixel_Poem on itch.io](https://pixel-poem.itch.io/dungeon-assetpuck) 231) 232 233We then simply put them into a folder like so, with matching names for items and zones: 234```bash 235cwd 236├───myscript.py 237├───resources 238│ ├───items 239│ │ ├───gold.png 240│ │ ├───key.png 241│ │ ├───locked_chest.png 242│ │ └───treasure_chest.png 243│ ├───zones 244│ └───font.ttf 245``` 246 247And setting that path as the world's ressources_path: 248 249```python 250WORLD_2.resources_path = Path("resources") 251render_env_with_human(env) 252``` 253 254Try to do the same with zones and change the font aswell! 255 256 257 258## Package into a class 259 260If you wish to have someone else use your enviroment, 261you should pack it up into a class and inherit HcraftEnv directly like so: 262 263```python 264.. include:: examples/treasure/env.py 265``` 266 267That's it for this small customized env if you want more, be sure to check Transformation 268 form `hcraft.transformation`, there is plenty we didn't cover here. 269 270 271""" 272 273import collections 274from typing import TYPE_CHECKING, Dict, List, Optional, Tuple, Union 275 276import numpy as np 277 278from hcraft.metrics import SuccessCounter 279from hcraft.purpose import Purpose 280from hcraft.render.render import HcraftWindow 281from hcraft.render.utils import surface_to_rgb_array 282from hcraft.solving_behaviors import ( 283 Behavior, 284 build_all_solving_behaviors, 285 task_to_behavior_name, 286) 287from hcraft.planning import HcraftPlanningProblem 288from hcraft.state import HcraftState 289 290if TYPE_CHECKING: 291 from hcraft.task import Task 292 from hcraft.world import World 293 294# Gym is an optional dependency. 295try: 296 import gymnasium as gym 297 298 DiscreteSpace = gym.spaces.Discrete 299 BoxSpace = gym.spaces.Box 300 TupleSpace = gym.spaces.Tuple 301 MultiBinarySpace = gym.spaces.MultiBinary 302 Env = gym.Env 303except ImportError: 304 DiscreteSpace = collections.namedtuple("DiscreteSpace", "n") 305 BoxSpace = collections.namedtuple("BoxSpace", "low, high, shape, dtype") 306 TupleSpace = collections.namedtuple("TupleSpace", "spaces") 307 MultiBinarySpace = collections.namedtuple("MultiBinary", "n") 308 Env = object 309 310 311class HcraftEnv(Env): 312 """Environment to simulate inventory management.""" 313 314 def __init__( 315 self, 316 world: "World", 317 purpose: Optional[Union[Purpose, List["Task"], "Task"]] = None, 318 invalid_reward: float = -1.0, 319 render_window: Optional[HcraftWindow] = None, 320 name: str = "HierarchyCraft", 321 max_step: Optional[int] = None, 322 ) -> None: 323 """ 324 Args: 325 world: World defining the environment. 326 purpose: Purpose of the player, defining rewards and termination. 327 Defaults to None, hence a sandbox environment. 328 invalid_reward: Reward given to the agent for invalid actions. 329 Defaults to -1.0. 330 render_window: Window using to render the environment with pygame. 331 name: Name of the environement. Defaults to 'HierarchyCraft'. 
332 max_step: (Optional[int], optional): Maximum number of steps before episode truncation. 333 If None, never truncates the episode. Defaults to None. 334 """ 335 self.world = world 336 self.invalid_reward = invalid_reward 337 self.max_step = max_step 338 self.name = name 339 self._all_behaviors = None 340 341 self.render_window = render_window 342 self.render_mode = "rgb_array" 343 344 self.state = HcraftState(self.world) 345 self.current_step = 0 346 self.current_score = 0 347 self.cumulated_score = 0 348 self.episodes = 0 349 self.task_successes: Optional[SuccessCounter] = None 350 self.terminal_successes: Optional[SuccessCounter] = None 351 352 if purpose is None: 353 purpose = Purpose(None) 354 if not isinstance(purpose, Purpose): 355 purpose = Purpose(tasks=purpose) 356 self.purpose = purpose 357 self.metadata = {} 358 359 @property 360 def truncated(self) -> bool: 361 """Whether the time limit has been exceeded.""" 362 if self.max_step is None: 363 return False 364 return self.current_step >= self.max_step 365 366 @property 367 def observation_space(self) -> Union[BoxSpace, TupleSpace]: 368 """Observation space for the Agent.""" 369 obs_space = BoxSpace( 370 low=np.array( 371 [0 for _ in range(self.world.n_items)] 372 + [0 for _ in range(self.world.n_zones)] 373 + [0 for _ in range(self.world.n_zones_items)] 374 ), 375 high=np.array( 376 [np.inf for _ in range(self.world.n_items)] 377 + [1 for _ in range(self.world.n_zones)] 378 + [np.inf for _ in range(self.world.n_zones_items)] 379 ), 380 ) 381 382 return obs_space 383 384 @property 385 def action_space(self) -> DiscreteSpace: 386 """Action space for the Agent. 387 388 Actions are expected to often be invalid. 389 """ 390 return DiscreteSpace(len(self.world.transformations)) 391 392 def action_masks(self) -> np.ndarray: 393 """Return boolean mask of valid actions.""" 394 return np.array([t.is_valid(self.state) for t in self.world.transformations]) 395 396 def step( 397 self, action: Union[int, str, np.ndarray] 398 ) -> Tuple[np.ndarray, float, bool, bool, dict]: 399 """Perform one step in the environment given the index of a wanted transformation. 400 401 If the selected transformation can be performed, the state is updated and 402 a reward is given depending of the environment tasks. 403 Else the state is left unchanged and the `invalid_reward` is given to the player. 404 405 """ 406 407 if isinstance(action, np.ndarray): 408 if not action.size == 1: 409 raise TypeError( 410 "Actions should be integers corresponding the a transformation index" 411 f", got array with multiple elements:\n{action}." 412 ) 413 action = action.flatten()[0] 414 try: 415 action = int(action) 416 except (TypeError, ValueError) as e: 417 raise TypeError( 418 "Actions should be integers corresponding the a transformation index." 
419 ) from e 420 421 self.current_step += 1 422 423 self.task_successes.step_reset() 424 self.terminal_successes.step_reset() 425 426 success = self.state.apply(action) 427 if success: 428 reward = self.purpose.reward(self.state) 429 else: 430 reward = self.invalid_reward 431 432 terminated = self.purpose.is_terminal(self.state) 433 434 self.task_successes.update(self.episodes) 435 self.terminal_successes.update(self.episodes) 436 437 self.current_score += reward 438 self.cumulated_score += reward 439 return ( 440 self.state.observation, 441 reward, 442 terminated, 443 self.truncated, 444 self.infos(), 445 ) 446 447 def render(self, mode: Optional[str] = None, **_kwargs) -> Union[str, np.ndarray]: 448 """Render the observation of the agent in a format depending on `render_mode`.""" 449 if mode is not None: 450 self.render_mode = mode 451 452 if self.render_mode in ("human", "rgb_array"): # for human interaction 453 return self._render_rgb_array() 454 if self.render_mode == "console": # for console print 455 raise NotImplementedError 456 raise NotImplementedError 457 458 def reset( 459 self, 460 *, 461 seed: Optional[int] = None, 462 options: Optional[dict] = None, 463 ) -> Tuple[np.ndarray,]: 464 """Resets the state of the environement. 465 466 Returns: 467 (np.ndarray): The first observation. 468 """ 469 470 if not self.purpose.built: 471 self.purpose.build(self) 472 self.task_successes = SuccessCounter(self.purpose.tasks) 473 self.terminal_successes = SuccessCounter(self.purpose.terminal_groups) 474 475 self.current_step = 0 476 self.current_score = 0 477 self.episodes += 1 478 479 self.task_successes.new_episode(self.episodes) 480 self.terminal_successes.new_episode(self.episodes) 481 482 self.state.reset() 483 self.purpose.reset() 484 return self.state.observation, self.infos() 485 486 def close(self): 487 """Closes the environment.""" 488 if self.render_window is not None: 489 self.render_window.close() 490 491 @property 492 def all_behaviors(self) -> Dict[str, "Behavior"]: 493 """All solving behaviors using hebg.""" 494 if self._all_behaviors is None: 495 self._all_behaviors = build_all_solving_behaviors(self) 496 return self._all_behaviors 497 498 def solving_behavior(self, task: "Task") -> "Behavior": 499 """Get the solving behavior for a given task. 500 501 Args: 502 task: Task to solve. 503 504 Returns: 505 Behavior: Behavior solving the task. 506 507 Example: 508 ```python 509 solving_behavior = env.solving_behavior(task) 510 511 done = False 512 observation, _info = env.reset() 513 while not done: 514 action = solving_behavior(observation) 515 observation, _reward, terminated, truncated, _info = env.step(action) 516 done = terminated or truncated 517 518 assert terminated # Env is successfuly terminated 519 assert task.is_terminated # Task is successfuly terminated 520 ``` 521 """ 522 return self.all_behaviors[task_to_behavior_name(task)] 523 524 def planning_problem(self, **kwargs) -> HcraftPlanningProblem: 525 """Build this hcraft environment planning problem. 526 527 Returns: 528 Problem: Unified planning problem cooresponding to that environment. 
529 530 Example: 531 Write as PDDL files: 532 ```python 533 from unified_planning.io import PDDLWriter 534 problem = env.planning_problem() 535 writer = PDDLWriter(problem.upf_problem) 536 writer.write_domain("domain.pddl") 537 writer.write_problem("problem.pddl") 538 ``` 539 540 Using a plan to solve a HierarchyCraft gym environment: 541 ```python 542 hcraft_problem = env.planning_problem() 543 544 done = False 545 546 _observation, _info = env.reset() 547 while not done: 548 # Observations are not used when blindly following a plan 549 # But the state in required in order to replan if there is no plan left 550 action = hcraft_problem.action_from_plan(env.state) 551 _observation, _reward, terminated, truncated, _info = env.step(action) 552 done = terminated or truncated 553 assert env.purpose.is_terminated # Purpose is achieved 554 ``` 555 """ 556 return HcraftPlanningProblem(self.state, self.name, self.purpose, **kwargs) 557 558 def infos(self) -> dict: 559 infos = { 560 "action_is_legal": self.action_masks(), 561 "score": self.current_score, 562 "score_average": self.cumulated_score / self.episodes, 563 } 564 infos.update(self._tasks_infos()) 565 return infos 566 567 def _tasks_infos(self): 568 infos = {} 569 infos.update(self.task_successes.done_infos) 570 infos.update(self.task_successes.rates_infos) 571 infos.update(self.terminal_successes.done_infos) 572 infos.update(self.terminal_successes.rates_infos) 573 return infos 574 575 def _render_rgb_array(self) -> np.ndarray: 576 """Render an image of the game. 577 578 Create the rendering window if not existing yet. 579 """ 580 if self.render_window is None: 581 self.render_window = HcraftWindow() 582 if not self.render_window.built: 583 self.render_window.build(self) 584 fps = self.metadata.get("video.frames_per_second") 585 self.render_window.update_rendering(fps=fps) 586 return surface_to_rgb_array(self.render_window.screen)
# API Documentation
## HcraftEnv
Environment to simulate inventory management.
```python
HcraftEnv(
    world: "World",
    purpose: Optional[Union[Purpose, List["Task"], "Task"]] = None,
    invalid_reward: float = -1.0,
    render_window: Optional[HcraftWindow] = None,
    name: str = "HierarchyCraft",
    max_step: Optional[int] = None,
)
```
Arguments:
- world: World defining the environment.
- purpose: Purpose of the player, defining rewards and termination. Defaults to None, hence a sandbox environment.
- invalid_reward: Reward given to the agent for invalid actions. Defaults to -1.0.
- render_window: Window used to render the environment with pygame.
- name: Name of the environment. Defaults to 'HierarchyCraft'.
- max_step: Maximum number of steps before episode truncation. If None, never truncates the episode. Defaults to None.
### truncated (property)
Whether the time limit has been exceeded.
### observation_space (property)
Observation space for the Agent.
### action_space (property)
Action space for the Agent.
Actions are expected to often be invalid.
### action_masks()
Return a boolean mask of valid actions.
### step(action)
Perform one step in the environment given the index of a wanted transformation.

If the selected transformation can be performed, the state is updated and a reward is given depending on the environment tasks. Otherwise, the state is left unchanged and the `invalid_reward` is given to the player.
### render(mode=None)
Render the observation of the agent in a format depending on `render_mode`.
### reset(*, seed=None, options=None)
Resets the state of the environment.

Returns:
- (np.ndarray, dict): The first observation and an info dictionary.
### close()
Closes the environment.
### all_behaviors (property)
All solving behaviors using hebg.
### solving_behavior(task)
Get the solving behavior for a given task.

Arguments:
- task: Task to solve.

Returns:
- Behavior: Behavior solving the task.

Example:
```python
solving_behavior = env.solving_behavior(task)

done = False
observation, _info = env.reset()
while not done:
    action = solving_behavior(observation)
    observation, _reward, terminated, truncated, _info = env.step(action)
    done = terminated or truncated

assert terminated  # Env is successfully terminated
assert task.is_terminated  # Task is successfully terminated
```
### planning_problem(**kwargs)
Build this hcraft environment planning problem.

Returns:
- Problem: Unified planning problem corresponding to that environment.

Example:

Write as PDDL files:
```python
from unified_planning.io import PDDLWriter

problem = env.planning_problem()
writer = PDDLWriter(problem.upf_problem)
writer.write_domain("domain.pddl")
writer.write_problem("problem.pddl")
```

Using a plan to solve a HierarchyCraft gym environment:
```python
hcraft_problem = env.planning_problem()

done = False
_observation, _info = env.reset()
while not done:
    # Observations are not used when blindly following a plan,
    # but the state is required in order to replan if there is no plan left.
    action = hcraft_problem.action_from_plan(env.state)
    _observation, _reward, terminated, truncated, _info = env.step(action)
    done = terminated or truncated

assert env.purpose.is_terminated  # Purpose is achieved
```