HierarchyCraft - Environments builder for hierarchical reasoning research

HierarchyCraft

HierarchyCraft (hcraft for short) is a Python library designed to create arbitrary hierarchical environments that are compatible with both the OpenAI Gym Reinforcement Learning Framework and AIPlan4EU Unified Planning Framework. This library enables users to easily create complex hierarchical structures that can be used to test and develop various reinforcement learning or planning algorithms.

In environments built with HierarchyCraft the agent (player) has an inventory and can navigate into abstract zones that themselves have inventories.

The action space of HierarchyCraft environments consists of sub-tasks, referred to as Transformations, as opposed to detailed movements and controls. Each Transformation has specific requirements to be valid (e.g. having enough of an item, being in the right place), and these requirements may necessitate executing other Transformations first, which inherently creates a hierarchical structure in HierarchyCraft environments.
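To make the idea concrete, here is a minimal, library-independent sketch of how requirements chain Transformations together. The `ToyTransformation` class, its fields, and the item names are illustrative assumptions and are not part of the hcraft API; the sketch only models inventory requirements, not zones.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class ToyTransformation:
    """Hypothetical stand-in for a Transformation (not the hcraft class)."""

    name: str
    consumes: Dict[str, int] = field(default_factory=dict)  # items removed from the inventory
    produces: Dict[str, int] = field(default_factory=dict)  # items added to the inventory

    def is_valid(self, inventory: Dict[str, int]) -> bool:
        # Valid only if the inventory holds enough of every consumed item.
        return all(inventory.get(item, 0) >= qty for item, qty in self.consumes.items())

    def apply(self, inventory: Dict[str, int]) -> bool:
        # Leave the inventory untouched and report failure when invalid.
        if not self.is_valid(inventory):
            return False
        for item, qty in self.consumes.items():
            inventory[item] -= qty
        for item, qty in self.produces.items():
            inventory[item] = inventory.get(item, 0) + qty
        return True


search_wood = ToyTransformation("search_wood", produces={"wood": 1})
craft_plank = ToyTransformation("craft_plank", consumes={"wood": 1}, produces={"plank": 4})

inventory: Dict[str, int] = {}
crafted_too_early = craft_plank.apply(inventory)  # no wood yet, so this fails
gathered = search_wood.apply(inventory)
crafted = craft_plank.apply(inventory)
```

Crafting a plank only succeeds once wood has been gathered, which is the kind of dependency that gives these environments their hierarchy.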

This concept is visually represented by the Requirements graph, which depicts the hierarchical relationships within each HierarchyCraft environment. The Requirements graph is constructed directly from the list of Transformations composing the environment.

More details about the requirements graph can be found in the documentation at hcraft.requirements, and examples of requirements graphs for some HierarchyCraft environments can be found in hcraft.examples.
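As a rough illustration of how a graph can be derived from a list of transformations, the sketch below maps each produced item to the items it requires and computes a depth for each item. The tuple layout, item names, and helper functions are hypothetical; the sketch merges alternative recipes into a single set and assumes no cycles, which the real requirements graph in hcraft.requirements may handle differently.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical transformations as (name, consumed items, produced items).
toy_transformations: List[Tuple[str, Set[str], Set[str]]] = [
    ("search_wood", set(), {"wood"}),
    ("craft_plank", {"wood"}, {"plank"}),
    ("craft_stick", {"plank"}, {"stick"}),
    ("craft_pickaxe", {"plank", "stick"}, {"pickaxe"}),
]


def build_requirements(transformations) -> Dict[str, Set[str]]:
    """Map each produced item to the set of items required to obtain it."""
    requirements: Dict[str, Set[str]] = {}
    for _name, consumed, produced in transformations:
        for item in produced:
            requirements.setdefault(item, set()).update(consumed)
    return requirements


def hierarchy_level(item: str, requirements: Dict[str, Set[str]]) -> int:
    """Depth of an item in the graph: 0 when it requires nothing (assumes no cycles)."""
    needed = requirements.get(item, set())
    if not needed:
        return 0
    return 1 + max(hierarchy_level(req, requirements) for req in needed)


requirements = build_requirements(toy_transformations)
levels = {item: hierarchy_level(item, requirements) for item in requirements}
```

Here `levels` places wood at depth 0 and the pickaxe at depth 3, mirroring the layered look of a requirements graph.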

No feature extraction for fast research even with low compute

HierarchyCraft returns vectorized state information, which plainly and directly describes the player's inventory, current position, and the inventory of the current zone. Compared to benchmarks that return grids, pixel arrays, text or sound, we directly return a low-dimensional latent representation that doesn't need to be learned. This saves compute time and allows researchers to focus only on the hierarchical reasoning part.
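The layout of such a vector can be sketched without the library. The example below concatenates player item counts, a position encoding, and current-zone item counts, in the same order as the `observation_space` bounds documented for `HcraftEnv`. The item and zone names are hypothetical, and the one-hot position encoding is an assumption suggested by the 0 to 1 bounds on the zone slots, not a confirmed detail of hcraft.state.

```python
from typing import Dict, List

ITEMS = ["wood", "plank"]        # hypothetical player items
ZONES = ["forest", "workshop"]   # hypothetical zones
ZONE_ITEMS = ["table"]           # hypothetical items that live in zones


def toy_observation(
    inventory: Dict[str, int],
    position: str,
    zone_inventory: Dict[str, int],
) -> List[int]:
    """Concatenate item counts, a one-hot position, and current-zone item counts."""
    player_part = [inventory.get(item, 0) for item in ITEMS]
    position_part = [1 if zone == position else 0 for zone in ZONES]
    zone_part = [zone_inventory.get(item, 0) for item in ZONE_ITEMS]
    return player_part + position_part + zone_part


observation = toy_observation({"wood": 3}, "workshop", {"table": 1})
```

The resulting vector has one slot per item, zone, and zone item, so its size depends only on the world definition and not on any rendering.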

See hcraft.state for more details.

Create your own tailored HierarchyCraft environments

You can use HierarchyCraft to create various custom hierarchical environments from a list of customized Transformations.

See hcraft.env for a complete tutorial on creating custom environments.

Installation

Using pip

Without optional dependencies:

pip install hcraft

All hcraft environments share a common graphical user interface, available with the gui requirements:

pip install hcraft[gui]

Gym environments can be obtained with the gym requirements:

pip install hcraft[gym]

Planning problems can be obtained through the upf interface with the planning requirements:

pip install hcraft[planning]

Some complex graphs can be rendered as interactive HTML visualizations with the htmlvis requirements:

pip install hcraft[htmlvis]

Quickstart

Play yourself!

A player who knows Minecraft will find MineHcraft easy.

Install the graphical user interface optional dependencies:

pip install hcraft[gui]

Using the command line interface

You can directly try to play yourself with the GUI available for any HierarchyCraft environment, for example:

hcraft minecraft

For more examples:

hcraft --help

Using the programmatic interface:

from hcraft import get_human_action
from hcraft.examples import MineHcraftEnv

env = MineHcraftEnv()
# or env: MineHcraftEnv = gym.make("MineHcraft-NoReward-v1")
n_episodes = 2
for _ in range(n_episodes):
    env.reset()
    done = False
    total_reward = 0
    while not done:
        env.render()
        action = get_human_action(env)
        print(f"Human pressed: {env.world.transformations[action]}")

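        # step returns a 5-tuple: observation, reward, terminated, truncated, info
        _observation, reward, terminated, truncated, _info = env.step(action)
        done = terminated or truncated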
        total_reward += reward

    print(f"SCORE: {total_reward}")

As a Gym RL environment

Using the programmatic interface, any HierarchyCraft environment can easily be interfaced with classic reinforcement learning agents.

import numpy as np
from hcraft.examples import MineHcraftEnv

def random_legal_agent(observation, action_is_legal):
    action = np.random.choice(np.nonzero(action_is_legal)[0])
    return int(action)

env = MineHcraftEnv(max_step=10)
done = False
observation, _info = env.reset()
while not done:
    action_is_legal = env.action_masks()
    action = random_legal_agent(observation, action_is_legal)
    observation, _reward, terminated, truncated, _info = env.step(action)
    # Without this update the loop would never end
    done = terminated or truncated

# Other examples of HierarchyCraft environments
from hcraft.examples import TowerHcraftEnv, RecursiveHcraftEnv, RandomHcraftEnv

tower_env = TowerHcraftEnv(height=3, width=2)
# or tower_env = gym.make("TowerHcraft-v1", height=3, width=2)
recursive_env = RecursiveHcraftEnv(n_items=6)
# or recursive_env = gym.make("RecursiveHcraft-v1", n_items=6)
random_env = RandomHcraftEnv(n_items_per_n_inputs={0:2, 1:5, 2:10}, seed=42)
# or random_env = gym.make("RandomHcraft-v1", n_items_per_n_inputs={0:2, 1:5, 2:10}, seed=42)

See hcraft.env for a more complete description.

As a UPF problem for planning

HierarchyCraft environments can be converted to a planning problem in one line thanks to the Unified Planning Framework (UPF):

# Example env
from hcraft.examples import TowerHcraftEnv

env = TowerHcraftEnv(height=3, width=2)

# Make it into a unified planning problem
planning_problem = env.planning_problem()
print(planning_problem.upf_problem)

Then they can be solved with any compatible planner for UPF:

# Solve the planning problem and show the plan
planning_problem.solve()
print(planning_problem.plan)

The planning_problem can also give actions to do in the environment, triggering replanning if necessary:

done = False
terminated = False  # defined up front in case the very first plan is empty
_observation, _info = env.reset()
while not done:
    # Automatically replan at the end of each plan until env termination

    # Observations are not used when blindly following a current plan,
    # but the state is required in order to replan if there is no plan left
    action = planning_problem.action_from_plan(env.state)
    if action is None:
        # The plan exists but is empty: nothing left to do, so stop
        done = True
        continue
    _observation, _reward, terminated, truncated, _info = env.step(action)
    done = terminated or truncated

if terminated:
    print("Success! The plan worked in the actual environment!")
else:
    print("Failed... Something went wrong with the plan or the episode was truncated.")

See hcraft.planning for a more complete description.

More about HierarchyCraft

Online documentation

Learn more in the DOCUMENTATION

Contributing

Want to contribute to HierarchyCraft? See our contribution guidelines and join us!

Custom purposes for agents in HierarchyCraft environments

HierarchyCraft allows users to specify custom purposes (one or multiple tasks) for agents in their environments. This feature provides a high degree of flexibility and allows users to design environments that are tailored to specific applications or scenarios. It also makes it possible to study multi-task or lifelong learning settings.

See hcraft.purpose for more details.

Solving behavior for all tasks of most HierarchyCraft environments

HierarchyCraft also includes solving behaviors that can be used to generate actions from observations that will complete most tasks in any HierarchyCraft environment, including user-designed ones. Solving behaviors are handcrafted and may not work in some edge cases when some items are required in specific zones. This feature makes it easy for users to obtain a strong baseline in their custom environments.

See hcraft.solving_behaviors for more details.

Visualizing the underlying hierarchy of the environment (requirements graph)

HierarchyCraft can visualize the hierarchy of the environment as a requirements graph. This graph provides a potentially complex but complete representation of what is required to obtain each item or to go in each zone, allowing users to easily understand the structure of the environment and identify its key items.

For example, here is the graph of the 'MiniHCraftUnlock' environment where the goal is to open a door using a key: [Figure: Unlock requirements graph]

And here is the much more complex graph of the 'MineHcraft' environment shown previously: [Figure: Minehcraft requirements graph]

See hcraft.requirements for more details.

View Source

"""
.. include:: ../../README.md

## Custom purposes for agents in HierarchyCraft environments

HierarchyCraft allows users to specify custom purposes (one or multiple tasks) for agents in their environments.
This feature provides a high degree of flexibility and allows users to design environments that
are tailored to specific applications or scenarios.
This feature makes it possible to study multi-task or lifelong learning settings.

See [`hcraft.purpose`](https://irll.github.io/HierarchyCraft/hcraft/purpose.html) for more details.

## Solving behavior for all tasks of most HierarchyCraft environments

HierarchyCraft also includes solving behaviors that can be used to generate actions
from observations that will complete most tasks in any HierarchyCraft environment, including user-designed ones.
Solving behaviors are handcrafted and may not work in some edge cases when some items are required in specific zones.
This feature makes it easy for users to obtain a strong baseline in their custom environments.

See [`hcraft.solving_behaviors`](https://irll.github.io/HierarchyCraft/hcraft/solving_behaviors.html) for more details.

## Visualizing the underlying hierarchy of the environment (requirements graph)

HierarchyCraft gives the ability to visualize the hierarchy of the environment as a requirements graph.
This graph provides a potentially complex but complete representation of what is required
to obtain each item or to go in each zone, allowing users to easily understand the structure
of the environment and identify key items of the environment.

For example, here is the graph of the 'MiniHCraftUnlock' environment where the goal is to open a door using a key:
![Unlock requirements graph](../../docs/images/requirements_graphs/MiniHCraftUnlock.png)


And here is the much more complex graph of the 'MineHcraft' environment shown previously:
![Minehcraft requirements graph](../../docs/images/requirements_graphs/MineHcraft.png)

See [`hcraft.requirements`](https://irll.github.io/HierarchyCraft/hcraft/requirements.html) for more details.

"""

import hcraft.state as state
import hcraft.solving_behaviors as solving_behaviors
import hcraft.purpose as purpose
import hcraft.transformation as transformation
import hcraft.requirements as requirements
import hcraft.env as env
import hcraft.examples as examples
import hcraft.world as world
import hcraft.planning as planning

from hcraft.elements import Item, Stack, Zone
from hcraft.transformation import Transformation
from hcraft.env import HcraftEnv, HcraftState
from hcraft.purpose import Purpose
from hcraft.render.human import get_human_action, render_env_with_human
from hcraft.task import GetItemTask, GoToZoneTask, PlaceItemTask


__all__ = [
    "HcraftState",
    "Transformation",
    "Item",
    "Stack",
    "Zone",
    "HcraftEnv",
    "get_human_action",
    "render_env_with_human",
    "Purpose",
    "GetItemTask",
    "GoToZoneTask",
    "PlaceItemTask",
    "state",
    "transformation",
    "purpose",
    "solving_behaviors",
    "requirements",
    "world",
    "env",
    "planning",
    "examples",
]

API Documentation

@dataclass(frozen=True)

class Item: View Source

@dataclass(frozen=True)
class Item:
    """Represent an item for any hcraft environment."""

    name: str

Represent an item for any hcraft environment.

Item(name: str)

name: str

@dataclass(frozen=True)

class Stack: View Source

@dataclass(frozen=True)
class Stack:
    """Represent a stack of an item for any hcraft environment."""

    item: Item
    quantity: int = 1

    def __str__(self) -> str:
        quantity_str = f"[{self.quantity}]" if self.quantity > 1 else ""
        return f"{quantity_str}{self.item.name}"

Represent a stack of an item for any hcraft environment.

Stack(item: Item, quantity: int = 1)

item: Item

quantity: int = 1
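The string form of a Stack is easiest to see on an example. The snippet below re-declares the two dataclasses exactly as in the source shown above so that it runs without hcraft installed; with the library available, `from hcraft import Item, Stack` should be equivalent, since both names are listed in `__all__`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str


@dataclass(frozen=True)
class Stack:
    item: Item
    quantity: int = 1

    def __str__(self) -> str:
        # The quantity prefix is only shown for stacks of more than one item.
        quantity_str = f"[{self.quantity}]" if self.quantity > 1 else ""
        return f"{quantity_str}{self.item.name}"


single = str(Stack(Item("wood")))      # "wood"
several = str(Stack(Item("wood"), 3))  # "[3]wood"
```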

@dataclass(frozen=True)

class Zone: View Source

@dataclass(frozen=True)
class Zone:
    """Represent a zone for any hcraft environment."""

    name: str

Represent a zone for any hcraft environment.

Zone(name: str)

name: str

class HcraftEnv(typing.Generic[~ObsType, ~ActType]): View Source

class HcraftEnv(Env):
    """Environment to simulate inventory management."""

    def __init__(
        self,
        world: "World",
        purpose: Optional[Union[Purpose, List["Task"], "Task"]] = None,
        invalid_reward: float = -1.0,
        render_window: Optional[HcraftWindow] = None,
        name: str = "HierarchyCraft",
        max_step: Optional[int] = None,
    ) -> None:
        """
        Args:
            world: World defining the environment.
            purpose: Purpose of the player, defining rewards and termination.
                Defaults to None, hence a sandbox environment.
            invalid_reward: Reward given to the agent for invalid actions.
                Defaults to -1.0.
            render_window: Window used to render the environment with pygame.
            name: Name of the environment. Defaults to 'HierarchyCraft'.
            max_step: (Optional[int], optional): Maximum number of steps before episode truncation.
                If None, never truncates the episode. Defaults to None.
        """
        self.world = world
        self.invalid_reward = invalid_reward
        self.max_step = max_step
        self.name = name
        self._all_behaviors = None

        self.render_window = render_window
        self.render_mode = "rgb_array"

        self.state = HcraftState(self.world)
        self.current_step = 0
        self.current_score = 0
        self.cumulated_score = 0
        self.episodes = 0
        self.task_successes: Optional[SuccessCounter] = None
        self.terminal_successes: Optional[SuccessCounter] = None

        if purpose is None:
            purpose = Purpose(None)
        if not isinstance(purpose, Purpose):
            purpose = Purpose(tasks=purpose)
        self.purpose = purpose
        self.metadata = {}

    @property
    def truncated(self) -> bool:
        """Whether the time limit has been exceeded."""
        if self.max_step is None:
            return False
        return self.current_step >= self.max_step

    @property
    def observation_space(self) -> Union[BoxSpace, TupleSpace]:
        """Observation space for the Agent."""
        obs_space = BoxSpace(
            low=np.array(
                [0 for _ in range(self.world.n_items)]
                + [0 for _ in range(self.world.n_zones)]
                + [0 for _ in range(self.world.n_zones_items)]
            ),
            high=np.array(
                [np.inf for _ in range(self.world.n_items)]
                + [1 for _ in range(self.world.n_zones)]
                + [np.inf for _ in range(self.world.n_zones_items)]
            ),
        )

        return obs_space

    @property
    def action_space(self) -> DiscreteSpace:
        """Action space for the Agent.

        Actions are expected to often be invalid.
        """
        return DiscreteSpace(len(self.world.transformations))

    def action_masks(self) -> np.ndarray:
        """Return boolean mask of valid actions."""
        return np.array([t.is_valid(self.state) for t in self.world.transformations])

    def step(
        self, action: Union[int, str, np.ndarray]
    ) -> Tuple[np.ndarray, float, bool, bool, dict]:
        """Perform one step in the environment given the index of a wanted transformation.

        If the selected transformation can be performed, the state is updated and
        a reward is given depending on the environment tasks.
        Else the state is left unchanged and the `invalid_reward` is given to the player.

        """

        if isinstance(action, np.ndarray):
            if not action.size == 1:
                raise TypeError(
                    "Actions should be integers corresponding to a transformation index"
                    f", got array with multiple elements:\n{action}."
                )
            action = action.flatten()[0]
        try:
            action = int(action)
        except (TypeError, ValueError) as e:
            raise TypeError(
                "Actions should be integers corresponding to a transformation index."
            ) from e

        self.current_step += 1

        self.task_successes.step_reset()
        self.terminal_successes.step_reset()

        success = self.state.apply(action)
        if success:
            reward = self.purpose.reward(self.state)
        else:
            reward = self.invalid_reward

        terminated = self.purpose.is_terminal(self.state)

        self.task_successes.update(self.episodes)
        self.terminal_successes.update(self.episodes)

        self.current_score += reward
        self.cumulated_score += reward
        return (
            self.state.observation,
            reward,
            terminated,
            self.truncated,
            self.infos(),
        )

    def render(self, mode: Optional[str] = None, **_kwargs) -> Union[str, np.ndarray]:
        """Render the observation of the agent in a format depending on `render_mode`."""
        if mode is not None:
            self.render_mode = mode

        if self.render_mode in ("human", "rgb_array"):  # for human interaction
            return self._render_rgb_array()
        if self.render_mode == "console":  # for console print
            raise NotImplementedError
        raise NotImplementedError

    def reset(
        self,
        *,
        seed: Optional[int] = None,
        options: Optional[dict] = None,
    ) -> Tuple[np.ndarray, dict]:
        """Resets the state of the environment.

        Returns:
            (np.ndarray, dict): The first observation and the infos dictionary.
        """

        if not self.purpose.built:
            self.purpose.build(self)
            self.task_successes = SuccessCounter(self.purpose.tasks)
            self.terminal_successes = SuccessCounter(self.purpose.terminal_groups)

        self.current_step = 0
        self.current_score = 0
        self.episodes += 1

        self.task_successes.new_episode(self.episodes)
        self.terminal_successes.new_episode(self.episodes)

        self.state.reset()
        self.purpose.reset()
        return self.state.observation, self.infos()

    def close(self):
        """Closes the environment."""
        if self.render_window is not None:
            self.render_window.close()

    @property
    def all_behaviors(self) -> Dict[str, "Behavior"]:
        """All solving behaviors using hebg."""
        if self._all_behaviors is None:
            self._all_behaviors = build_all_solving_behaviors(self)
        return self._all_behaviors

    def solving_behavior(self, task: "Task") -> "Behavior":
        """Get the solving behavior for a given task.

        Args:
            task: Task to solve.

        Returns:
            Behavior: Behavior solving the task.

        Example:
            ```python
            solving_behavior = env.solving_behavior(task)

            done = False
            observation, _info = env.reset()
            while not done:
                action = solving_behavior(observation)
                observation, _reward, terminated, truncated, _info = env.step(action)
                done = terminated or truncated

            assert terminated  # Env is successfully terminated
            assert task.is_terminated # Task is successfully terminated
            ```
        """
        return self.all_behaviors[task_to_behavior_name(task)]

    def planning_problem(self, **kwargs) -> HcraftPlanningProblem:
        """Build this hcraft environment planning problem.

        Returns:
            Problem: Unified planning problem corresponding to that environment.

        Example:
            Write as PDDL files:
            ```python
            from unified_planning.io import PDDLWriter
            problem = env.planning_problem()
            writer = PDDLWriter(problem.upf_problem)
            writer.write_domain("domain.pddl")
            writer.write_problem("problem.pddl")
            ```

            Using a plan to solve a HierarchyCraft gym environment:
            ```python
            hcraft_problem = env.planning_problem()

            done = False

            _observation, _info = env.reset()
            while not done:
                # Observations are not used when blindly following a plan
                # But the state is required in order to replan if there is no plan left
                action = hcraft_problem.action_from_plan(env.state)
                _observation, _reward, terminated, truncated, _info = env.step(action)
                done = terminated or truncated
            assert env.purpose.is_terminated # Purpose is achieved
            ```
        """
        return HcraftPlanningProblem(self.state, self.name, self.purpose, **kwargs)

    def infos(self) -> dict:
        infos = {
            "action_is_legal": self.action_masks(),
            "score": self.current_score,
            "score_average": self.cumulated_score / self.episodes,
        }
        infos.update(self._tasks_infos())
        return infos

    def _tasks_infos(self):
        infos = {}
        infos.update(self.task_successes.done_infos)
        infos.update(self.task_successes.rates_infos)
        infos.update(self.terminal_successes.done_infos)
        infos.update(self.terminal_successes.rates_infos)
        return infos

    def _render_rgb_array(self) -> np.ndarray:
        """Render an image of the game.

        Create the rendering window if not existing yet.
        """
        if self.render_window is None:
            self.render_window = HcraftWindow()
        if not self.render_window.built:
            self.render_window.build(self)
        fps = self.metadata.get("video.frames_per_second")
        self.render_window.update_rendering(fps=fps)
        return surface_to_rgb_array(self.render_window.screen)

Environment to simulate inventory management.

HcraftEnv( world: hcraft.world.World, purpose: Union[Purpose, List[hcraft.task.Task], hcraft.task.Task, NoneType] = None, invalid_reward: float = -1.0, render_window: Optional[hcraft.render.render.HcraftWindow] = None, name: str = 'HierarchyCraft', max_step: Optional[int] = None)

Arguments:

world: World defining the environment.
purpose: Purpose of the player, defining rewards and termination. Defaults to None, hence a sandbox environment.
invalid_reward: Reward given to the agent for invalid actions. Defaults to -1.0.
render_window: Window used to render the environment with pygame.
name: Name of the environment. Defaults to 'HierarchyCraft'.
max_step: Maximum number of steps before episode truncation. If None, never truncates the episode. Defaults to None.

world

invalid_reward

max_step

name

render_window

render_mode = None

state

current_step

current_score

cumulated_score

episodes

task_successes: Optional[hcraft.metrics.SuccessCounter]

terminal_successes: Optional[hcraft.metrics.SuccessCounter]

purpose

metadata = {'render_modes': []}

truncated: bool

Whether the time limit has been exceeded.

observation_space: Union[gymnasium.spaces.box.Box, gymnasium.spaces.tuple.Tuple]

Observation space for the Agent.

action_space: gymnasium.spaces.discrete.Discrete

Action space for the Agent.

Actions are expected to often be invalid.

def action_masks(self) -> numpy.ndarray:

Return boolean mask of valid actions.

def step( self, action: Union[int, str, numpy.ndarray]) -> Tuple[numpy.ndarray, float, bool, bool, dict]:

Perform one step in the environment given the index of a wanted transformation.

If the selected transformation can be performed, the state is updated and a reward is given depending on the environment tasks. Otherwise, the state is left unchanged and the invalid_reward is given to the player.
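The step contract described here (penalty on invalid actions, separate terminated and truncated flags) can be illustrated with a toy environment. `ToyEnv` below is a hypothetical counter environment written only for this sketch; it is not HcraftEnv and does not use Gymnasium, but it returns the same five-element tuple and applies the same `max_step` truncation rule as the `truncated` property.

```python
from typing import List, Optional, Tuple


class ToyEnv:
    """Hypothetical counter environment mimicking the step contract (not HcraftEnv)."""

    def __init__(self, goal: int, invalid_reward: float = -1.0, max_step: Optional[int] = None):
        self.goal = goal
        self.invalid_reward = invalid_reward
        self.max_step = max_step
        self.count = 0
        self.current_step = 0

    @property
    def truncated(self) -> bool:
        # Same rule as the documented property: no limit means never truncated.
        if self.max_step is None:
            return False
        return self.current_step >= self.max_step

    def step(self, action: int) -> Tuple[List[int], float, bool, bool, dict]:
        self.current_step += 1
        if action == 0:  # the only valid action: increment the counter
            self.count += 1
            reward = 1.0 if self.count == self.goal else 0.0
        else:  # invalid action: state unchanged, penalty given
            reward = self.invalid_reward
        terminated = self.count >= self.goal
        return [self.count], reward, terminated, self.truncated, {}


toy_env = ToyEnv(goal=2, max_step=3)
first = toy_env.step(1)   # invalid action
second = toy_env.step(0)  # valid action
third = toy_env.step(0)   # valid action, reaches the goal on the last allowed step
```

Note that an episode can be both terminated and truncated on the same step, which is why callers usually stop on `terminated or truncated`.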

def render(self, mode: Optional[str] = None, **_kwargs) -> Union[str, numpy.ndarray]:

Render the observation of the agent in a format depending on render_mode.

def reset( self, *, seed: Optional[int] = None, options: Optional[dict] = None) -> Tuple[numpy.ndarray, dict]:

Resets the state of the environment.

Returns:

(np.ndarray, dict): The first observation and the infos dictionary.

def close(self):

    def close(self):
        """Closes the environment."""
        if self.render_window is not None:
            self.render_window.close()

Closes the environment.

all_behaviors: Dict[str, hebg.behavior.Behavior]

    @property
    def all_behaviors(self) -> Dict[str, "Behavior"]:
        """All solving behaviors using hebg."""
        if self._all_behaviors is None:
            self._all_behaviors = build_all_solving_behaviors(self)
        return self._all_behaviors

All solving behaviors using hebg.
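The property above builds the behaviors only on first access and then reuses the cached result. A minimal sketch of that lazy-caching pattern follows; `LazyBehaviors`, its `builder` argument, and the `build_calls` counter are hypothetical names used for illustration, not part of hcraft or hebg.

```python
from typing import Callable, Dict, Optional


class LazyBehaviors:
    """Hypothetical holder that builds its behaviors on first access only."""

    def __init__(self, builder: Callable[[], Dict[str, str]]) -> None:
        self._builder = builder
        self._all_behaviors: Optional[Dict[str, str]] = None
        self.build_calls = 0  # counts how often the expensive build runs

    @property
    def all_behaviors(self) -> Dict[str, str]:
        if self._all_behaviors is None:
            # First access: run the (potentially expensive) build once.
            self.build_calls += 1
            self._all_behaviors = self._builder()
        return self._all_behaviors


holder = LazyBehaviors(lambda: {"Get wood": "behavior placeholder"})
first = holder.all_behaviors
second = holder.all_behaviors  # served from the cache, no rebuild
```

The practical consequence is that the first call to a method relying on this property pays the full build cost, while later calls are cheap.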

def solving_behavior(self, task: hcraft.task.Task) -> hebg.behavior.Behavior:

    def solving_behavior(self, task: "Task") -> "Behavior":
        """Get the solving behavior for a given task.

        Args:
            task: Task to solve.

        Returns:
            Behavior: Behavior solving the task.

        Example:
            ```python
            solving_behavior = env.solving_behavior(task)

            done = False
            observation, _info = env.reset()
            while not done:
                action = solving_behavior(observation)
                observation, _reward, terminated, truncated, _info = env.step(action)
                done = terminated or truncated

            assert terminated  # Env is successfully terminated
            assert task.terminated  # Task is successfully terminated
            ```
        """
        return self.all_behaviors[task_to_behavior_name(task)]

Get the solving behavior for a given task.

Arguments:

task: Task to solve.

Returns:

Behavior: Behavior solving the task.

Example:

solving_behavior = env.solving_behavior(task)

done = False
observation, _info = env.reset()
while not done:
    action = solving_behavior(observation)
    observation, _reward, terminated, truncated, _info = env.step(action)
    done = terminated or truncated

assert terminated  # Env is successfully terminated
assert task.terminated  # Task is successfully terminated

def planning_problem(self, **kwargs) -> hcraft.planning.HcraftPlanningProblem:

    def planning_problem(self, **kwargs) -> HcraftPlanningProblem:
        """Build this hcraft environment planning problem.

        Returns:
            Problem: Unified planning problem corresponding to that environment.

        Example:
            Write as PDDL files:
            ```python
            from unified_planning.io import PDDLWriter
            problem = env.planning_problem()
            writer = PDDLWriter(problem.upf_problem)
            writer.write_domain("domain.pddl")
            writer.write_problem("problem.pddl")
            ```

            Using a plan to solve a HierarchyCraft gym environment:
            ```python
            hcraft_problem = env.planning_problem()

            done = False

            _observation, _info = env.reset()
            while not done:
                # Observations are not used when blindly following a plan,
                # but the state is required in order to replan if there is no plan left
                action = hcraft_problem.action_from_plan(env.state)
                _observation, _reward, terminated, truncated, _info = env.step(action)
                done = terminated or truncated
            assert env.purpose.is_terminated # Purpose is achieved
            ```
        """
        return HcraftPlanningProblem(self.state, self.name, self.purpose, **kwargs)

Build this hcraft environment planning problem.

Returns:

Problem: Unified planning problem corresponding to that environment.

Example:

Write as PDDL files:

from unified_planning.io import PDDLWriter
problem = env.planning_problem()
writer = PDDLWriter(problem.upf_problem)
writer.write_domain("domain.pddl")
writer.write_problem("problem.pddl")

Using a plan to solve a HierarchyCraft gym environment:

hcraft_problem = env.planning_problem()

done = False

_observation, _info = env.reset()
while not done:
    # Observations are not used when blindly following a plan,
    # but the state is required in order to replan if there is no plan left
    action = hcraft_problem.action_from_plan(env.state)
    _observation, _reward, terminated, truncated, _info = env.step(action)
    done = terminated or truncated
assert env.purpose.is_terminated # Purpose is achieved

def infos(self) -> dict:

    def infos(self) -> dict:
        infos = {
            "action_is_legal": self.action_masks(),
            "score": self.current_score,
            "score_average": self.cumulated_score / self.episodes,
        }
        infos.update(self._tasks_infos())
        return infos
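The `"action_is_legal"` entry of the infos dictionary can be used to avoid invalid actions. The sketch below assumes the mask is a sequence of booleans indexed by action; the helper name `sample_legal_action` is hypothetical and the mask is a hand-written stand-in rather than real environment output.

```python
import random
from typing import Optional, Sequence


def sample_legal_action(
    action_is_legal: Sequence[bool], rng: random.Random
) -> Optional[int]:
    """Pick a random action index among those flagged as legal, if any."""
    legal = [index for index, ok in enumerate(action_is_legal) if ok]
    if not legal:
        return None  # no legal action available
    return rng.choice(legal)


# Stand-in for infos["action_is_legal"] (assumed shape: one flag per action).
mask = [False, True, False, True]
action = sample_legal_action(mask, random.Random(0))
```

A random agent built this way never receives the invalid reward, which can be a useful baseline when comparing against learned policies.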

Inherited Members

gymnasium.core.Env: spec; unwrapped; np_random_seed; np_random; has_wrapper_attr; get_wrapper_attr; set_wrapper_attr

def get_human_action( env: HcraftEnv, additional_events: List[pygame.event.Event] = None, can_be_none: bool = False, fps: Optional[float] = None):

def get_human_action(
    env: "HcraftEnv",
    additional_events: List["Event"] = None,
    can_be_none: bool = False,
    fps: Optional[float] = None,
):
    """Update the environment rendering and gather potential action given by the UI.

    Args:
        env: The running HierarchyCraft environment.
        additional_events (Optional): Additional simulated pygame events.
        can_be_none: If False, this function will loop on rendering until an action is found.
            If True, will return None if no action was found after one rendering update.

    Returns:
        The action found using the UI.

    """
    action_chosen = False
    while not action_chosen:
        action = env.render_window.update_rendering(additional_events, fps)
        action_chosen = action is not None or can_be_none
    return action

Update the environment rendering and gather potential action given by the UI.

Arguments:

env: The running HierarchyCraft environment.
additional_events (Optional): Additional simulated pygame events.
can_be_none: If False, this function will loop on rendering until an action is found. If True, will return None if no action was found after one rendering update.

Returns:

The action found using the UI.
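The effect of `can_be_none` on the polling loop can be seen in isolation with a small sketch. `FakeWindow` and `poll_action` are hypothetical stand-ins that replace the pygame window with a scripted sequence of results; they are not part of hcraft.

```python
from typing import Iterable, Optional


class FakeWindow:
    """Hypothetical window returning scripted results, None meaning 'no click yet'."""

    def __init__(self, results: Iterable[Optional[int]]) -> None:
        self._results = iter(results)
        self.updates = 0

    def update_rendering(self) -> Optional[int]:
        self.updates += 1
        return next(self._results)


def poll_action(window: FakeWindow, can_be_none: bool = False) -> Optional[int]:
    """Same loop shape as get_human_action, without pygame."""
    action_chosen = False
    while not action_chosen:
        action = window.update_rendering()
        action_chosen = action is not None or can_be_none
    return action


blocking = FakeWindow([None, None, 3])
blocking_action = poll_action(blocking)  # loops until an action appears

single_pass = FakeWindow([None, 5])
single_pass_action = poll_action(single_pass, can_be_none=True)  # one update only
```

With `can_be_none=True` the caller keeps control of the loop, which suits cases where rendering must be interleaved with other work.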

def render_env_with_human(env: HcraftEnv, n_episodes: int = 1):

def render_env_with_human(env: "HcraftEnv", n_episodes: int = 1):
    """Render the given environment with human interactions.

    Args:
        env (HcraftEnv): The HierarchyCraft environment to run.
        n_episodes (int, optional): Number of episodes to run. Defaults to 1.
    """
    print("Purpose: ", env.purpose)

    for _ in range(n_episodes):
        env.reset()
        done = False
        total_reward = 0
        while not done:
            env.render()
            action = get_human_action(env)
            print(f"Human did: {env.world.transformations[action]}")

            _observation, reward, terminated, truncated, _info = env.step(action)
            done = terminated or truncated
            total_reward += reward

        print("SCORE: ", total_reward)

Render the given environment with human interactions.

Arguments:

env (HcraftEnv): The HierarchyCraft environment to run.
n_episodes (int, optional): Number of episodes to run. Defaults to 1.

class GetItemTask(hcraft.task.AchievementTask):

class GetItemTask(AchievementTask):
    """Task of getting a given quantity of an item."""

    def __init__(self, item_stack: Union[Item, Stack], reward: float = 1.0):
        self.item_stack = _stack_item(item_stack)
        super().__init__(name=self.get_name(self.item_stack), reward=reward)

    def build(self, world: "World") -> None:
        super().build(world)
        item_slot = world.items.index(self.item_stack.item)
        self._terminate_player_items[item_slot] = self.item_stack.quantity

    def _is_terminal(self, state: "HcraftState") -> bool:
        return np.all(state.player_inventory >= self._terminate_player_items)

    @staticmethod
    def get_name(stack: Stack):
        """Name of the task for a given Stack"""
        quantity_str = _quantity_str(stack.quantity)
        return f"Get{quantity_str}{stack.item.name}"

Task of getting a given quantity of an item.

GetItemTask( item_stack: Union[Item, Stack], reward: float = 1.0)

    def __init__(self, item_stack: Union[Item, Stack], reward: float = 1.0):
        self.item_stack = _stack_item(item_stack)
        super().__init__(name=self.get_name(self.item_stack), reward=reward)

item_stack

def build(self, world: hcraft.world.World) -> None:

    def build(self, world: "World") -> None:
        super().build(world)
        item_slot = world.items.index(self.item_stack.item)
        self._terminate_player_items[item_slot] = self.item_stack.quantity

Build the task operation arrays based on the given world.
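To make the role of the built array concrete, here is a minimal NumPy sketch of the same idea: a per-item threshold vector compared element-wise against the player inventory. The item names and the standalone `is_terminal` function are hypothetical; only the `np.all(inventory >= required)` comparison mirrors the listing above.

```python
import numpy as np

items = ["wood", "stone", "plank"]  # hypothetical item ordering

# Threshold vector: zero everywhere except the slot of the wanted item.
required = np.zeros(len(items), dtype=np.int32)
required[items.index("plank")] = 2  # task: get 2 planks


def is_terminal(player_inventory: np.ndarray) -> bool:
    """True once every slot meets its threshold (only 'plank' is non-zero)."""
    return bool(np.all(player_inventory >= required))


not_enough = is_terminal(np.array([5, 0, 1]))  # only 1 plank
exact = is_terminal(np.array([0, 0, 2]))       # exactly 2 planks
surplus = is_terminal(np.array([0, 3, 4]))     # more than needed
```

Because unrelated slots have a threshold of zero, any non-negative inventory satisfies them and only the requested item decides the outcome.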

@staticmethod

def get_name(stack: Stack):

    @staticmethod
    def get_name(stack: Stack):
        """Name of the task for a given Stack"""
        quantity_str = _quantity_str(stack.quantity)
        return f"Get{quantity_str}{stack.item.name}"

Name of the task for a given Stack

Inherited Members

hcraft.task.AchievementTask: reward
hcraft.task.Task: name; terminated; is_terminal; reset

class GoToZoneTask(hcraft.task.AchievementTask):

class GoToZoneTask(AchievementTask):
    """Task to go to a given zone."""

    def __init__(self, zone: Zone, reward: float = 1.0) -> None:
        super().__init__(name=self.get_name(zone), reward=reward)
        self.zone = zone

    def build(self, world: "World"):
        super().build(world)
        zone_slot = world.zones.index(self.zone)
        self._terminate_position[zone_slot] = 1

    def _is_terminal(self, state: "HcraftState") -> bool:
        return np.all(state.position == self._terminate_position)

    @staticmethod
    def get_name(zone: Zone):
        """Name of the task for a given Zone"""
        return f"Go to {zone.name}"

Task to go to a given zone.

GoToZoneTask(zone: Zone, reward: float = 1.0)

    def __init__(self, zone: Zone, reward: float = 1.0) -> None:
        super().__init__(name=self.get_name(zone), reward=reward)
        self.zone = zone

zone

def build(self, world: hcraft.world.World):

    def build(self, world: "World"):
        super().build(world)
        zone_slot = world.zones.index(self.zone)
        self._terminate_position[zone_slot] = 1

Build the task operation arrays based on the given world.
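For this task the built array is a target position vector. The sketch below assumes the position is one-hot encoded over the zones, which the equality test in `_is_terminal` suggests but this page does not state explicitly; the zone names and helper functions are hypothetical.

```python
import numpy as np

zones = ["forest", "cave", "village"]  # hypothetical zone ordering


def one_hot_position(zone_name: str) -> np.ndarray:
    """Encode a zone as a vector with a single 1 at the zone's slot."""
    position = np.zeros(len(zones), dtype=np.int32)
    position[zones.index(zone_name)] = 1
    return position


terminate_position = one_hot_position("cave")  # task: go to the cave


def is_terminal(position: np.ndarray) -> bool:
    """True only when the current position matches the target exactly."""
    return bool(np.all(position == terminate_position))


in_forest = is_terminal(one_hot_position("forest"))
in_cave = is_terminal(one_hot_position("cave"))
```

Unlike the item tasks, the comparison here is an exact match rather than a threshold, since being in a zone is all-or-nothing.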

@staticmethod

def get_name(zone: Zone):

    @staticmethod
    def get_name(zone: Zone):
        """Name of the task for a given Zone"""
        return f"Go to {zone.name}"

Name of the task for a given Zone

Inherited Members

hcraft.task.AchievementTask: reward
hcraft.task.Task: name; terminated; is_terminal; reset

class PlaceItemTask(hcraft.task.AchievementTask):

class PlaceItemTask(AchievementTask):
    """Task to place a quantity of item in a given zone.

    If no zone is given, consider placing the item anywhere.

    """

    def __init__(
        self,
        item_stack: Union[Item, Stack],
        zone: Optional[Union[Zone, List[Zone]]] = None,
        reward: float = 1.0,
    ):
        item_stack = _stack_item(item_stack)
        self.item_stack = item_stack
        self.zone = zone
        super().__init__(name=self.get_name(item_stack, zone), reward=reward)

    def build(self, world: "World"):
        super().build(world)
        if self.zone is None:
            zones_slots = np.arange(self._terminate_zones_items.shape[0])
        else:
            zones_slots = np.array([world.slot_from_zone(self.zone)])
        zone_item_slot = world.zones_items.index(self.item_stack.item)
        self._terminate_zones_items[zones_slots, zone_item_slot] = (
            self.item_stack.quantity
        )

    def _is_terminal(self, state: "HcraftState") -> bool:
        if self.zone is None:
            return np.any(
                np.all(state.zones_inventories >= self._terminate_zones_items, axis=1)
            )
        return np.all(state.zones_inventories >= self._terminate_zones_items)

    @staticmethod
    def get_name(stack: Stack, zone: Optional[Zone]):
        """Name of the task for a given Stack and list of Zone"""
        quantity_str = _quantity_str(stack.quantity)
        zones_str = _zones_str(zone)
        return f"Place{quantity_str}{stack.item.name}{zones_str}"

Task to place a quantity of item in a given zone.

If no zone is given, consider placing the item anywhere.

PlaceItemTask( item_stack: Union[Item, Stack], zone: Union[Zone, List[Zone], NoneType] = None, reward: float = 1.0)

    def __init__(
        self,
        item_stack: Union[Item, Stack],
        zone: Optional[Union[Zone, List[Zone]]] = None,
        reward: float = 1.0,
    ):
        item_stack = _stack_item(item_stack)
        self.item_stack = item_stack
        self.zone = zone
        super().__init__(name=self.get_name(item_stack, zone), reward=reward)

item_stack

zone

def build(self, world: hcraft.world.World):

    def build(self, world: "World"):
        super().build(world)
        if self.zone is None:
            zones_slots = np.arange(self._terminate_zones_items.shape[0])
        else:
            zones_slots = np.array([world.slot_from_zone(self.zone)])
        zone_item_slot = world.zones_items.index(self.item_stack.item)
        self._terminate_zones_items[zones_slots, zone_item_slot] = (
            self.item_stack.quantity
        )

Build the task operation arrays based on the given world.
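The difference between the "anywhere" case and the single-zone case comes down to which rows of the requirement matrix are filled and how the comparison is reduced. The following NumPy sketch reproduces that logic on a toy matrix; the sizes, the `TABLE` column index, and both helper functions are hypothetical, and zone inventories are assumed to be non-negative.

```python
from typing import Optional

import numpy as np

N_ZONES, N_ZONE_ITEMS = 3, 2
TABLE = 1  # hypothetical column index of the item to place


def build_requirements(zone_slot: Optional[int] = None, quantity: int = 1) -> np.ndarray:
    """Fill the item column for one zone row, or for every row if no zone is given."""
    required = np.zeros((N_ZONES, N_ZONE_ITEMS), dtype=np.int32)
    rows = np.arange(N_ZONES) if zone_slot is None else np.array([zone_slot])
    required[rows, TABLE] = quantity
    return required


def is_terminal(zones_inventories: np.ndarray, required: np.ndarray, anywhere: bool) -> bool:
    if anywhere:
        # One fully satisfied zone row is enough.
        return bool(np.any(np.all(zones_inventories >= required, axis=1)))
    # Rows of other zones require nothing, so only the target row matters.
    return bool(np.all(zones_inventories >= required))


inventories = np.zeros((N_ZONES, N_ZONE_ITEMS), dtype=np.int32)
inventories[2, TABLE] = 1  # a table has been placed in zone 2

placed_anywhere = is_terminal(inventories, build_requirements(), anywhere=True)
placed_in_zone_0 = is_terminal(inventories, build_requirements(zone_slot=0), anywhere=False)
placed_in_zone_2 = is_terminal(inventories, build_requirements(zone_slot=2), anywhere=False)
```

Reducing with `axis=1` first and `np.any` second is what turns "every zone must hold the item" into "at least one zone holds the item".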

@staticmethod

def get_name(stack: Stack, zone: Optional[Zone]):

    @staticmethod
    def get_name(stack: Stack, zone: Optional[Zone]):
        """Name of the task for a given Stack and list of Zone"""
        quantity_str = _quantity_str(stack.quantity)
        zones_str = _zones_str(zone)
        return f"Place{quantity_str}{stack.item.name}{zones_str}"

Name of the task for a given Stack and list of Zone

Inherited Members

hcraft.task.AchievementTask: reward
hcraft.task.Task: name; terminated; is_terminal; reset