Purpose in HierarchyCraft

All hcraft environments are sandbox environments and have no precise purpose by default. A purpose can be added to any HierarchyCraft environment by setting up one or multiple tasks.

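Tasks can be one of:

* Get the given item: hcraft.task.GetItemTask
* Go to the given zone: hcraft.task.GoToZoneTask
* Place the given item in the given zone (or any zone if none given): hcraft.task.PlaceItemTask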

Single task purpose

When a single task is passed to a HierarchyCraft environment, it will automatically build a purpose. The environment will then terminate when the task is completed.

Let's take an example with the MineHcraft environment. (This would work with any other HierarchyCraft environment.)

from hcraft.examples import MineHcraftEnv
from hcraft.purpose import GetItemTask
from hcraft.examples.minecraft.items import DIAMOND

get_diamond = GetItemTask(DIAMOND, reward=10)
env = MineHcraftEnv(purpose=get_diamond)
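
The Purpose constructor shown in the source listing accepts a single task, a list of tasks or None, and normalizes all three to a list. The following is a minimal sketch of that normalization only; `Task` here is a hypothetical stand-in rather than `hcraft.task.Task`, and `as_task_list` is an illustrative name that does not exist in hcraft.

```python
from typing import List, Optional, Union


class Task:
    """Hypothetical stand-in for hcraft.task.Task, holding only a name."""

    def __init__(self, name: str) -> None:
        self.name = name


def as_task_list(tasks: Optional[Union[Task, List[Task]]]) -> List[Task]:
    """Normalize a single task, a list of tasks or None to a list."""
    if isinstance(tasks, Task):
        return [tasks]
    if tasks is None:
        return []
    return list(tasks)


get_diamond = Task("get_diamond")
single = as_task_list(get_diamond)  # a bare task becomes a one-element list
several = as_task_list([get_diamond, Task("get_gold")])
empty = as_task_list(None)  # no task at all: the purpose stays empty
```

With an empty task list, the `is_terminal` method in the source listing always returns False, which matches the sandbox behaviour described at the top of this page.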

Reward shaping

Achievement tasks only reward the player when completed, but such long-term feedback is known to make learning challenging. To ease learning such tasks, a HierarchyCraft Purpose can generate subtasks that give intermediate feedback; this process is known as reward shaping. See RewardShaping for more details.

For example, let's add the "required" reward shaping to the get_diamond task:

from hcraft.examples import MineHcraftEnv
from hcraft.purpose import Purpose, GetItemTask
from hcraft.examples.minecraft.items import DIAMOND

get_diamond = GetItemTask(DIAMOND, reward=10)
purpose = Purpose(shaping_value=2)
purpose.add_task(get_diamond, reward_shaping="required")

env = MineHcraftEnv(purpose=purpose)

Then getting the IRON_INGOT item for the first time will give a reward of 2.0 to the player, because IRON_INGOT is used to craft the IRON_PICKAXE that is itself used to get a DIAMOND.
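
To illustrate how "required" shaping selects its subtasks, here is a minimal sketch that collects everything recursively required by a goal item. The `REQUIRES` mapping is a hypothetical toy graph written for this example, not the actual MineHcraft requirements, and the helper names are illustrative. The library itself walks the ancestors of the goal in the environment's requirements graph, as shown in `_required_subtasks` in the source listing.

```python
from typing import Dict, List, Set

# Hypothetical toy requirements: each item maps to the items it directly requires.
# These edges are illustrative and are NOT read from MineHcraft.
REQUIRES: Dict[str, List[str]] = {
    "DIAMOND": ["IRON_PICKAXE"],
    "IRON_PICKAXE": ["IRON_INGOT", "STICK"],
    "IRON_INGOT": ["IRON_ORE"],
    "STICK": ["WOOD_PLANK"],
    "IRON_ORE": [],
    "WOOD_PLANK": [],
    "GOLD_INGOT": ["GOLD_ORE"],
    "GOLD_ORE": [],
}


def required_items(goal: str, requires: Dict[str, List[str]]) -> Set[str]:
    """Collect everything recursively required to obtain `goal` (goal excluded)."""
    found: Set[str] = set()
    stack = list(requires.get(goal, []))
    while stack:
        item = stack.pop()
        if item not in found:
            found.add(item)
            stack.extend(requires.get(item, []))
    return found


def shaping_rewards(goal: str, shaping_value: float) -> Dict[str, float]:
    """Associate one shaping reward with each required item."""
    return {item: shaping_value for item in sorted(required_items(goal, REQUIRES))}


rewards = shaping_rewards("DIAMOND", shaping_value=2.0)
print(rewards["IRON_INGOT"])  # 2.0
```

Items unrelated to the goal (here GOLD_INGOT and GOLD_ORE) receive no shaping subtask, and the goal itself keeps only its own task reward.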

Multi-tasks and terminal groups

In a sandbox environment, why limit ourselves to only one task? In HierarchyCraft, a purpose can be composed of multiple tasks. But when does the purpose terminate? When any task is done? When all tasks are done?

To solve this, we need to introduce terminal groups. Terminal groups are represented with strings.

The purpose will terminate if ANY of the terminal groups has ALL its tasks done.

When adding a task to a purpose, one can choose one or multiple terminal groups like so:

from hcraft.examples import MineHcraftEnv
from hcraft.purpose import Purpose, GetItemTask, GoToZoneTask
from hcraft.examples.minecraft.items import DIAMOND, GOLD_INGOT, EGG
from hcraft.examples.minecraft.zones import END

get_diamond = GetItemTask(DIAMOND, reward=10)
get_gold = GetItemTask(GOLD_INGOT, reward=5)
get_egg = GetItemTask(EGG, reward=100)
go_to_end = GoToZoneTask(END, reward=20)

purpose = Purpose()
purpose.add_task(get_diamond, reward_shaping="required", terminal_groups="get rich!")
purpose.add_task(get_gold, terminal_groups=["golden end", "get rich!"])
purpose.add_task(go_to_end, reward_shaping="inputs", terminal_groups="golden end")
purpose.add_task(get_egg, terminal_groups=None)

env = MineHcraftEnv(purpose=purpose)

Here the environment will terminate if the player gets both the diamond and gold_ingot items ("get rich!" group), or if the player gets a gold_ingot and reaches the end zone ("golden end" group). The get_egg task is optional and can never terminate the purpose, but it will still reward the player if completed.

Just like this last task, reward shaping subtasks are always optional.
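
The termination rule can be checked by hand with a small stand-in. The sketch below is not `hcraft.purpose.Purpose`: `ToyPurpose` is a hypothetical class that tracks tasks by name only, and its `complete` method is an assumption made for the example. It reproduces the group layout of the example above and the rule that ANY group with ALL its tasks done terminates the purpose.

```python
from typing import Dict, List, Set, Union


class ToyPurpose:
    """Stand-in that only tracks terminal groups by task name (not hcraft's Purpose)."""

    def __init__(self) -> None:
        self.groups: Dict[str, List[str]] = {}
        self.done: Set[str] = set()

    def add_task(
        self, task: str, terminal_groups: Union[str, List[str], None] = "default"
    ) -> None:
        if not terminal_groups:  # None or "" makes the task optional
            return
        if isinstance(terminal_groups, str):
            terminal_groups = [terminal_groups]
        for name in terminal_groups:
            self.groups.setdefault(name, []).append(task)

    def complete(self, task: str) -> None:
        """Mark a task as done (hypothetical helper for this sketch)."""
        self.done.add(task)

    @property
    def terminated(self) -> bool:
        # ANY group with ALL of its tasks done terminates the purpose.
        return any(
            all(task in self.done for task in tasks) for tasks in self.groups.values()
        )


purpose = ToyPurpose()
purpose.add_task("get_diamond", terminal_groups="get rich!")
purpose.add_task("get_gold", terminal_groups=["golden end", "get rich!"])
purpose.add_task("go_to_end", terminal_groups="golden end")
purpose.add_task("get_egg", terminal_groups=None)

purpose.complete("get_egg")
after_egg = purpose.terminated  # optional task: no group is affected
purpose.complete("get_gold")
after_gold = purpose.terminated  # both groups are still missing one task
purpose.complete("go_to_end")
after_end = purpose.terminated  # "golden end" is now complete
print(after_egg, after_gold, after_end)  # False False True
```

Getting the diamond instead of reaching the end zone would complete the "get rich!" group and terminate the purpose in the same way.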

"""# Purpose in HierarchyCraft

**All** hcraft environments are sandbox environments
and have no precise purpose by default.
A purpose can be added to **any** HierarchyCraft environment
by setting up one or multiple tasks.

Tasks can be one of:
* Get the given item: `hcraft.task.GetItemTask`
* Go to the given zone: `hcraft.task.GoToZoneTask`
* Place the given item in the given zone (or any zone if none given): `hcraft.task.PlaceItemTask`


## Single task purpose

When a single task is passed to a HierarchyCraft environment, it will automatically build a purpose.
The environment will then terminate when the task is completed.

Let's take an example with the MineHcraft environment.
(This would work with any other HierarchyCraft environment.)
```python
from hcraft.examples import MineHcraftEnv
from hcraft.purpose import GetItemTask
from hcraft.examples.minecraft.items import DIAMOND

get_diamond = GetItemTask(DIAMOND, reward=10)
env = MineHcraftEnv(purpose=get_diamond)
```

## Reward shaping

Achievement tasks only reward the player when completed, but such long-term feedback is known
to make learning challenging. To ease learning such tasks, a HierarchyCraft Purpose can generate
subtasks that give intermediate feedback; this process is known as reward shaping.
See `hcraft.purpose.RewardShaping` for more details.

For example, let's add the "required" reward shaping to the get_diamond task:

```python
from hcraft.examples import MineHcraftEnv
from hcraft.purpose import Purpose, GetItemTask
from hcraft.examples.minecraft.items import DIAMOND

get_diamond = GetItemTask(DIAMOND, reward=10)
purpose = Purpose(shaping_value=2)
purpose.add_task(get_diamond, reward_shaping="required")

env = MineHcraftEnv(purpose=purpose)
```

Then getting the IRON_INGOT item for the first time will give a reward of 2.0 to the player, because
IRON_INGOT is used to craft the IRON_PICKAXE that is itself used to get a DIAMOND.

## Multi-tasks and terminal groups

In a sandbox environment, why limit ourselves to only one task?
In HierarchyCraft, a purpose can be composed of multiple tasks.
But when does the purpose terminate? When any task is done? When all tasks are done?

To solve this, we need to introduce terminal groups.
Terminal groups are represented with strings.

The purpose will terminate if ANY of the terminal groups has ALL its tasks done.

When adding a task to a purpose, one can choose one or multiple terminal groups like so:

```python
from hcraft.examples import MineHcraftEnv
from hcraft.purpose import Purpose, GetItemTask, GoToZoneTask
from hcraft.examples.minecraft.items import DIAMOND, GOLD_INGOT, EGG
from hcraft.examples.minecraft.zones import END

get_diamond = GetItemTask(DIAMOND, reward=10)
get_gold = GetItemTask(GOLD_INGOT, reward=5)
get_egg = GetItemTask(EGG, reward=100)
go_to_end = GoToZoneTask(END, reward=20)

purpose = Purpose()
purpose.add_task(get_diamond, reward_shaping="required", terminal_groups="get rich!")
purpose.add_task(get_gold, terminal_groups=["golden end", "get rich!"])
purpose.add_task(go_to_end, reward_shaping="inputs", terminal_groups="golden end")
purpose.add_task(get_egg, terminal_groups=None)

env = MineHcraftEnv(purpose=purpose)
```

Here the environment will terminate if the player gets both the diamond
and gold_ingot items ("get rich!" group), or if the player gets a gold_ingot
and reaches the end zone ("golden end" group).
The get_egg task is optional and can never terminate the purpose,
but it will still reward the player if completed.

Just like this last task, reward shaping subtasks are always optional.

"""

from dataclasses import dataclass, field
from enum import Enum
from typing import TYPE_CHECKING, Dict, List, Optional, Set, Union

import networkx as nx
import numpy as np

from hcraft.requirements import RequirementNode, req_node_name
from hcraft.task import GetItemTask, GoToZoneTask, PlaceItemTask, Task
from hcraft.elements import Item, Zone


if TYPE_CHECKING:
    from hcraft.env import HcraftEnv, HcraftState
    from hcraft.world import World


class RewardShaping(Enum):
    """Enumeration of all reward shapings possible."""

    NONE = "none"
    """No reward shaping"""
    ALL_ACHIVEMENTS = "all"
    """All items and zones will be associated with an achievement subtask."""
    REQUIREMENTS_ACHIVEMENTS = "required"
    """All (recursively) required items and zones for the given task
    will be associated with an achievement subtask."""
    INPUTS_ACHIVEMENT = "inputs"
    """Items and zones consumed by any transformation solving the task
    will be associated with an achievement subtask."""


@dataclass
class TerminalGroup:
    """Terminal groups are groups of tasks that can terminate the purpose.

    The purpose will terminate if ANY of the terminal groups has ALL its tasks done.
    """

    name: str
    tasks: List[Task] = field(default_factory=list)

    @property
    def terminated(self) -> bool:
        """True if all tasks of the terminal group are terminated."""
        return all(task.terminated for task in self.tasks)

    def __eq__(self, other) -> bool:
        # Groups compare by name only, so a plain string can be used for lookups.
        if isinstance(other, str):
            return self.name == other
        if isinstance(other, TerminalGroup):
            return self.name == other.name
        return False

    def __hash__(self) -> int:
        return self.name.__hash__()


class Purpose:
    """A purpose for a HierarchyCraft player based on a list of tasks."""

    def __init__(
        self,
        tasks: Optional[Union[Task, List[Task]]] = None,
        timestep_reward: float = 0.0,
        default_reward_shaping: RewardShaping = RewardShaping.NONE,
        shaping_value: float = 1.0,
    ) -> None:
        """
        Args:
            tasks: Tasks to add to the Purpose.
                Defaults to None.
            timestep_reward: Reward for each timestep.
                Defaults to 0.0.
            default_reward_shaping: Default reward shaping for tasks.
                Defaults to RewardShaping.NONE.
            shaping_value: Reward value used in reward shaping if any.
                Defaults to 1.0.
        """
        self.tasks: List[Task] = []
        self.timestep_reward = timestep_reward
        self.shaping_value = shaping_value
        self.default_reward_shaping = default_reward_shaping
        self.built = False

        self.reward_shaping: Dict[Task, RewardShaping] = {}
        self.terminal_groups: List[TerminalGroup] = []

        if isinstance(tasks, Task):
            tasks = [tasks]
        elif tasks is None:
            tasks = []
        for task in tasks:
            self.add_task(task, reward_shaping=default_reward_shaping)

        self._best_terminal_group = None

    def add_task(
        self,
        task: Task,
        reward_shaping: Optional[RewardShaping] = None,
        terminal_groups: Optional[Union[str, List[str]]] = "default",
    ):
        """Add a new task to the purpose.

        Args:
            task: Task to be added to the purpose.
            reward_shaping: Reward shaping for this task.
                Defaults to purpose's default reward shaping.
            terminal_groups: Purpose terminates when ALL the tasks of ANY terminal group terminate.
                If terminal groups is "" or None, the task will be optional and will
                never terminate the purpose.
                By default, tasks are added in the "default" group and hence
                ALL tasks have to be done to terminate the purpose.
        """
        if reward_shaping is None:
            reward_shaping = self.default_reward_shaping
        # Accept either a RewardShaping member or its string value.
        reward_shaping = RewardShaping(reward_shaping)
        if terminal_groups:
            if isinstance(terminal_groups, str):
                terminal_groups = [terminal_groups]
            for terminal_group in terminal_groups:
                existing_group = self._terminal_group_from_name(terminal_group)
                if not existing_group:
                    existing_group = TerminalGroup(terminal_group)
                    self.terminal_groups.append(existing_group)
                existing_group.tasks.append(task)

        self.reward_shaping[task] = reward_shaping
        self.tasks.append(task)

    def build(self, env: "HcraftEnv"):
        """
        Builds the purpose of the player relative to the given environment.

        Args:
            env: The HierarchyCraft environment to build upon.
        """
        if self.built:
            return

        if not self.tasks:
            return
        # Add reward shaping subtasks.
        # Iterate over a copy because add_task appends subtasks to self.tasks.
        for task in list(self.tasks):
            subtasks = self._add_reward_shaping_subtasks(
                task, env, self.reward_shaping[task]
            )
            for subtask in subtasks:
                self.add_task(subtask, RewardShaping.NONE, terminal_groups=None)

        # Build all tasks
        for task in self.tasks:
            task.build(env.world)

        self.built = True

    def reward(self, state: "HcraftState") -> float:
        """
        Returns the purpose reward for the given state based on tasks.
        """
        reward = self.timestep_reward
        if not self.tasks:
            return reward
        for task in self.tasks:
            reward += task.reward(state)
        return reward

    def is_terminal(self, state: "HcraftState") -> bool:
        """
        Returns True if the given state is terminal for the whole purpose.
        """
        if not self.tasks:
            return False
        for task in self.tasks:
            task.is_terminal(state)
        for terminal_group in self.terminal_groups:
            if terminal_group.terminated:
                return True
        return False

    def reset(self) -> None:
        """Reset the purpose."""
        for task in self.tasks:
            task.reset()

    @property
    def optional_tasks(self) -> List[Task]:
        """List of tasks that are in no terminal group and hence optional."""
        terminal_tasks = []
        for group in self.terminal_groups:
            terminal_tasks += group.tasks
        return [task for task in self.tasks if task not in terminal_tasks]

    @property
    def terminated(self) -> bool:
        """True if any of the terminal groups are terminated."""
        return any(
            all(task.terminated for task in terminal_group.tasks)
            for terminal_group in self.terminal_groups
        )

    @property
    def best_terminal_group(self) -> TerminalGroup:
        """Best rewarding terminal group."""
        if self._best_terminal_group is not None:
            return self._best_terminal_group

        best_terminal_group, best_terminal_value = None, -np.inf
        for terminal_group in self.terminal_groups:
            terminal_value = sum(task._reward for task in terminal_group.tasks)
            if terminal_value > best_terminal_value:
                best_terminal_value = terminal_value
                best_terminal_group = terminal_group

        self._best_terminal_group = best_terminal_group
        return best_terminal_group

    def _terminal_group_from_name(self, name: str) -> Optional[TerminalGroup]:
        # Relies on TerminalGroup.__eq__ accepting strings.
        if name not in self.terminal_groups:
            return None
        group_id = self.terminal_groups.index(name)
        return self.terminal_groups[group_id]

    def _add_reward_shaping_subtasks(
        self, task: Task, env: "HcraftEnv", reward_shaping: RewardShaping
    ) -> List[Task]:
        if reward_shaping == RewardShaping.NONE:
            return []
        if reward_shaping == RewardShaping.ALL_ACHIVEMENTS:
            return _all_subtasks(env.world, self.shaping_value)
        if reward_shaping == RewardShaping.INPUTS_ACHIVEMENT:
            return _inputs_subtasks(task, env.world, self.shaping_value)
        if reward_shaping == RewardShaping.REQUIREMENTS_ACHIVEMENTS:
            return _required_subtasks(task, env, self.shaping_value)
        raise NotImplementedError

    def __str__(self) -> str:
        terminal_groups_str = []
        for terminal_group in self.terminal_groups:
            tasks_str_joined = self._tasks_str(terminal_group.tasks)
            group_str = f"{terminal_group.name}:[{tasks_str_joined}]"
            terminal_groups_str.append(group_str)
        optional_tasks_str = self._tasks_str(self.optional_tasks)
        if optional_tasks_str:
            group_str = f"optional:[{optional_tasks_str}]"
            terminal_groups_str.append(group_str)
        joined_groups_str = ", ".join(terminal_groups_str)
        return f"Purpose({joined_groups_str})"

    def _tasks_str(self, tasks: List[Task]) -> str:
        tasks_str = []
        for task in tasks:
            shaping = self.reward_shaping[task]
            shaping_str = f"#{shaping.value}" if shaping != RewardShaping.NONE else ""
            tasks_str.append(f"{task}{shaping_str}")
        return ",".join(tasks_str)


def platinium_purpose(
    items: List[Item],
    zones: List[Zone],
    zones_items: List[Item],
    success_reward: float = 10.0,
    timestep_reward: float = -0.1,
) -> Purpose:
    """Build a purpose with one task per given item, zone and zone item.

    All tasks are added to the "default" terminal group,
    hence ALL of them have to be done to terminate the purpose.
    """
    purpose = Purpose(timestep_reward=timestep_reward)
    for item in items:
        purpose.add_task(GetItemTask(item, reward=success_reward))
    for zone in zones:
        purpose.add_task(GoToZoneTask(zone, reward=success_reward))
    for item in zones_items:
        purpose.add_task(PlaceItemTask(item, reward=success_reward))
    return purpose


def _all_subtasks(world: "World", shaping_reward: float) -> List[Task]:
    return _build_reward_shaping_subtasks(
        world.items, world.zones, world.zones_items, shaping_reward
    )


def _required_subtasks(
    task: Task, env: "HcraftEnv", shaping_reward: float
) -> List[Task]:
    relevant_items = set()
    relevant_zones = set()
    relevant_zone_items = set()

    if isinstance(task, GetItemTask):
        goal_item = task.item_stack.item
        goal_requirement_nodes = [req_node_name(goal_item, RequirementNode.ITEM)]
    elif isinstance(task, PlaceItemTask):
        goal_item = task.item_stack.item
        goal_requirement_nodes = [req_node_name(goal_item, RequirementNode.ZONE_ITEM)]
        goal_zones = task.zone
        if goal_zones is not None:
            relevant_zones.add(goal_zones)
            goal_requirement_nodes.append(
                req_node_name(goal_zones, RequirementNode.ZONE)
            )
    elif isinstance(task, GoToZoneTask):
        goal_requirement_nodes = [req_node_name(task.zone, RequirementNode.ZONE)]
    else:
        raise NotImplementedError(
            f"Unsupported reward shaping {RewardShaping.REQUIREMENTS_ACHIVEMENTS} "
            f"for given task type: {type(task)} of {task}"
        )

    # Every ancestor of a goal node in the requirements graph is a requirement.
    requirements_acydigraph = env.world.requirements.acydigraph
    for requirement_node in goal_requirement_nodes:
        for ancestor in nx.ancestors(requirements_acydigraph, requirement_node):
            if ancestor == "START#":
                continue
            ancestor_node = requirements_acydigraph.nodes[ancestor]
            item_or_zone: Union["Item", "Zone"] = ancestor_node["obj"]
            ancestor_type = RequirementNode(ancestor_node["type"])
            if ancestor_type is RequirementNode.ITEM:
                relevant_items.add(item_or_zone)
            if ancestor_type is RequirementNode.ZONE:
                relevant_zones.add(item_or_zone)
            if ancestor_type is RequirementNode.ZONE_ITEM:
                relevant_zone_items.add(item_or_zone)
    return _build_reward_shaping_subtasks(
        relevant_items,
        relevant_zones,
        relevant_zone_items,
        shaping_reward,
    )


def _inputs_subtasks(task: Task, world: "World", shaping_reward: float) -> List[Task]:
    relevant_items = set()
    relevant_zones = set()
    relevant_zone_items = set()

    goal_zone = None
    goal_item = None
    goal_zone_item = None
    if isinstance(task, GetItemTask):
        goal_item = task.item_stack.item
    elif isinstance(task, GoToZoneTask):
        goal_zone = task.zone
    elif isinstance(task, PlaceItemTask):
        goal_zone_item = task.item_stack.item
        if task.zone:
            goal_zone = task.zone
            relevant_zones.add(task.zone)
    else:
        raise NotImplementedError(
            f"Unsupported reward shaping {RewardShaping.INPUTS_ACHIVEMENT} "
            f"for given task type: {type(task)} of {task}"
        )
    # Transformations that solve the task directly.
    transfo_giving_item = [
        transfo
        for transfo in world.transformations
        if goal_item in transfo.production("player")
        and goal_item not in transfo.min_required("player")
    ]
    transfo_placing_zone_item = [
        transfo
        for transfo in world.transformations
        if goal_zone_item in transfo.produced_zones_items
        and goal_zone_item not in transfo.min_required_zones_items
    ]
    transfo_going_to_goal_zone = [
        transfo
        for transfo in world.transformations
        if transfo.destination is not None and transfo.destination == goal_zone
    ]
    relevant_transformations = (
        transfo_giving_item + transfo_placing_zone_item + transfo_going_to_goal_zone
    )

    # Everything those transformations consume becomes a shaping subtask.
    for transfo in relevant_transformations:
        relevant_items |= transfo.consumption("player")
        relevant_zone_items |= transfo.consumption("current_zone")
        relevant_zone_items |= transfo.consumption("destination")
        relevant_zone_items |= transfo.consumption("zones")
        if transfo.zone:
            relevant_zones.add(transfo.zone)

    return _build_reward_shaping_subtasks(
        relevant_items,
        relevant_zones,
        relevant_zone_items,
        shaping_reward,
    )


def _build_reward_shaping_subtasks(
    items: Optional[Union[List[Item], Set[Item]]] = None,
    zones: Optional[Union[List[Zone], Set[Zone]]] = None,
    zone_items: Optional[Union[List[Item], Set[Item]]] = None,
    shaping_reward: float = 1.0,
) -> List[Task]:
    subtasks = []
    if items:
        subtasks += [GetItemTask(item, reward=shaping_reward) for item in items]
    if zones:
        subtasks += [GoToZoneTask(zone, reward=shaping_reward) for zone in zones]
    if zone_items:
        subtasks += [PlaceItemTask(item, reward=shaping_reward) for item in zone_items]
    return subtasks

API Documentation

class RewardShaping(enum.Enum):

Enumeration of all reward shapings possible.

NONE = <RewardShaping.NONE: 'none'>

No reward shaping

ALL_ACHIVEMENTS = <RewardShaping.ALL_ACHIVEMENTS: 'all'>

All items and zones will be associated with an achievement subtask.

REQUIREMENTS_ACHIVEMENTS = <RewardShaping.REQUIREMENTS_ACHIVEMENTS: 'required'>

All (recursively) required items and zones for the given task will be associated with an achievement subtask.

INPUTS_ACHIVEMENT = <RewardShaping.INPUTS_ACHIVEMENT: 'inputs'>

Items and zones consumed by any transformation solving the task will be associated with an achievement subtask.
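
Because `add_task` passes its `reward_shaping` argument through `RewardShaping(...)`, the string values "none", "all", "required" and "inputs" can be used in place of the enum members. The sketch below re-declares the enumeration locally so that it runs without hcraft; the variable names are illustrative.

```python
from enum import Enum


class RewardShaping(Enum):
    """Local copy of the enumeration, so the snippet runs without hcraft."""

    NONE = "none"
    ALL_ACHIVEMENTS = "all"
    REQUIREMENTS_ACHIVEMENTS = "required"
    INPUTS_ACHIVEMENT = "inputs"


# Looking a member up by value is what lets add_task accept plain strings.
from_string = RewardShaping("required")
# Passing a member through the constructor returns the same member.
from_member = RewardShaping(RewardShaping.INPUTS_ACHIVEMENT)

# Unknown values are rejected by Enum with a ValueError.
try:
    RewardShaping("everything")
    unknown_rejected = False
except ValueError:
    unknown_rejected = True
```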

Inherited Members
enum.Enum
name
value
@dataclass
class TerminalGroup:

Terminal groups are groups of tasks that can terminate the purpose.

The purpose will terminate if ANY of the terminal groups has ALL its tasks done.

TerminalGroup(name: str, tasks: List[hcraft.task.Task] = <factory>)
name: str
tasks: List[hcraft.task.Task]
terminated: bool

True if all tasks of the terminal group are terminated.
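
The `__eq__` method of TerminalGroup in the source listing compares a group with a plain string by name, which is what allows `Purpose._terminal_group_from_name` to use `in` and `index` on a list of groups. The following sketch re-declares the dataclass locally, with tasks reduced to strings for brevity; `group_from_name` is an illustrative free function mirroring that private method.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TerminalGroup:
    """Local copy of the dataclass; tasks are plain strings here for brevity."""

    name: str
    tasks: List[str] = field(default_factory=list)

    def __eq__(self, other) -> bool:
        if isinstance(other, str):
            return self.name == other
        if isinstance(other, TerminalGroup):
            return self.name == other.name
        return False

    def __hash__(self) -> int:
        return hash(self.name)


def group_from_name(groups: List[TerminalGroup], name: str) -> Optional[TerminalGroup]:
    """Find a group by name; `in` and `index` both rely on __eq__."""
    if name not in groups:
        return None
    return groups[groups.index(name)]


groups = [TerminalGroup("get rich!"), TerminalGroup("golden end")]
found = group_from_name(groups, "golden end")
missing = group_from_name(groups, "default")
# Equality ignores the task list: only the name matters.
same_name = TerminalGroup("get rich!", ["get_diamond"]) == groups[0]
```

One consequence of comparing by name only is that two groups with the same name but different tasks are considered equal.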

class Purpose:

A purpose for a HierarchyCraft player based on a list of tasks.

Purpose(tasks: Union[hcraft.task.Task, List[hcraft.task.Task], NoneType] = None, timestep_reward: float = 0.0, default_reward_shaping: RewardShaping = <RewardShaping.NONE: 'none'>, shaping_value: float = 1.0)
    def __init__(
        self,
        tasks: Optional[Union[Task, List[Task]]] = None,
        timestep_reward: float = 0.0,
        default_reward_shaping: RewardShaping = RewardShaping.NONE,
        shaping_value: float = 1.0,
    ) -> None:
        """
        Args:
            tasks: Tasks to add to the Purpose.
                Defaults to None.
            timestep_reward: Reward for each timestep.
                Defaults to 0.0.
            default_reward_shaping: Default reward shaping for tasks.
                Defaults to RewardShaping.NONE.
            shaping_value: Reward value used in reward shaping if any.
                Defaults to 1.0.
        """
        self.tasks: List[Task] = []
        self.timestep_reward = timestep_reward
        self.shaping_value = shaping_value
        self.default_reward_shaping = default_reward_shaping
        self.built = False

        self.reward_shaping: Dict[Task, RewardShaping] = {}
        self.terminal_groups: List[TerminalGroup] = []

        if isinstance(tasks, Task):
            tasks = [tasks]
        elif tasks is None:
            tasks = []
        for task in tasks:
            self.add_task(task, reward_shaping=default_reward_shaping)

        self._best_terminal_group = None
Arguments:
  • tasks: Tasks to add to the Purpose. Defaults to None.
  • timestep_reward: Reward for each timestep. Defaults to 0.0.
  • default_reward_shaping: Default reward shaping for tasks. Defaults to RewardShaping.NONE.
  • shaping_value: Reward value used in reward shaping if any. Defaults to 1.0.
tasks: List[hcraft.task.Task]
timestep_reward
shaping_value
default_reward_shaping
built
reward_shaping: Dict[hcraft.task.Task, RewardShaping]
terminal_groups: List[TerminalGroup]
def add_task(self, task: hcraft.task.Task, reward_shaping: Optional[RewardShaping] = None, terminal_groups: Union[str, List[str], NoneType] = 'default'):
    def add_task(
        self,
        task: Task,
        reward_shaping: Optional[RewardShaping] = None,
        terminal_groups: Optional[Union[str, List[str]]] = "default",
    ):
        """Add a new task to the purpose.

        Args:
            task: Task to be added to the purpose.
            reward_shaping: Reward shaping for this task.
                Defaults to purpose's default reward shaping.
            terminal_groups: Purpose terminates when ALL the tasks of ANY terminal group terminate.
                If terminal_groups is "" or None, the task is optional and
                cannot terminate the purpose.
                By default, tasks are added to the "default" group and hence
                ALL tasks have to be done to terminate the purpose.
        """
        if reward_shaping is None:
            reward_shaping = self.default_reward_shaping
        reward_shaping = RewardShaping(reward_shaping)
        if terminal_groups:
            if isinstance(terminal_groups, str):
                terminal_groups = [terminal_groups]
            for terminal_group in terminal_groups:
                existing_group = self._terminal_group_from_name(terminal_group)
                if not existing_group:
                    existing_group = TerminalGroup(terminal_group)
                    self.terminal_groups.append(existing_group)
                existing_group.tasks.append(task)

        self.reward_shaping[task] = reward_shaping
        self.tasks.append(task)

Add a new task to the purpose.

Arguments:
  • task: Task to be added to the purpose.
  • reward_shaping: Reward shaping for this task. Defaults to purpose's default reward shaping.
  • terminal_groups: Purpose terminates when ALL the tasks of ANY terminal group terminate. If terminal_groups is "" or None, the task is optional and cannot terminate the purpose. By default, tasks are added to the "default" group and hence ALL tasks have to be done to terminate the purpose.
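
The ALL-of-ANY rule can be illustrated without hcraft itself. The sketch below only mirrors the logic of the source shown on this page; `FakeTask`, `FakeGroup` and `purpose_terminated` are hypothetical stand-ins introduced for illustration and are not part of the library.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeTask:
    """Hypothetical stand-in for an hcraft Task (not the library class)."""

    name: str
    terminated: bool = False


@dataclass
class FakeGroup:
    """Hypothetical stand-in for a terminal group: a named list of tasks."""

    name: str
    tasks: List[FakeTask] = field(default_factory=list)

    @property
    def terminated(self) -> bool:
        # A group is terminated only when ALL of its tasks are terminated.
        return all(task.terminated for task in self.tasks)


def purpose_terminated(groups: List[FakeGroup]) -> bool:
    # The purpose is terminated as soon as ANY group is terminated.
    return any(group.terminated for group in groups)


wood, stone, diamond = FakeTask("wood"), FakeTask("stone"), FakeTask("diamond")
groups = [FakeGroup("basics", [wood, stone]), FakeGroup("rich", [diamond])]

wood.terminated = True
partial = purpose_terminated(groups)  # "basics" still lacks stone, "rich" lacks diamond

diamond.terminated = True
finished = purpose_terminated(groups)  # "rich" is now complete
```

In this sketch `partial` stays false because no group has all of its tasks done, while `finished` becomes true once the single task of the "rich" group is done, even though "basics" is still incomplete.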
def build(self, env: hcraft.HcraftEnv):
    def build(self, env: "HcraftEnv"):
        """
        Builds the purpose of the player relative to the given environment.

        Args:
            env: The HierarchyCraft environment to build upon.
        """
        if self.built:
            return

        if not self.tasks:
            return
        # Add reward shaping subtasks
        for task in self.tasks:
            subtasks = self._add_reward_shaping_subtasks(
                task, env, self.reward_shaping[task]
            )
            for subtask in subtasks:
                self.add_task(subtask, RewardShaping.NONE, terminal_groups=None)

        # Build all tasks
        for task in self.tasks:
            task.build(env.world)

        self.built = True

Builds the purpose of the player relative to the given environment.

Arguments:
  • env: The HierarchyCraft environment to build upon.
def reward(self, state: hcraft.HcraftState) -> float:
    def reward(self, state: "HcraftState") -> float:
        """
        Returns the purpose reward for the given state based on tasks.
        """
        reward = self.timestep_reward
        if not self.tasks:
            return reward
        for task in self.tasks:
            reward += task.reward(state)
        return reward

Returns the purpose reward for the given state based on tasks.
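
As a rough illustration of how the per-timestep reward and the task rewards add up, here is a minimal sketch. It assumes a task can be reduced to a plain function from a state to a reward; `purpose_reward`, `got_wood`, `got_diamond` and the dictionary-based state are hypothetical and do not come from hcraft.

```python
from typing import Callable, Dict, List

# Hypothetical model: a state is an item inventory, a task is a reward function.
State = Dict[str, int]
TaskReward = Callable[[State], float]


def purpose_reward(
    state: State, task_rewards: List[TaskReward], timestep_reward: float = 0.0
) -> float:
    """Timestep reward plus the sum of every task's reward for this state."""
    reward = timestep_reward
    for task_reward in task_rewards:
        reward += task_reward(state)
    return reward


def got_wood(state: State) -> float:
    # Illustrative task: 1.0 when the inventory holds at least one wood.
    return 1.0 if state.get("wood", 0) > 0 else 0.0


def got_diamond(state: State) -> float:
    # Illustrative task: 10.0 when the inventory holds at least one diamond.
    return 10.0 if state.get("diamond", 0) > 0 else 0.0


step_reward = purpose_reward({"wood": 1}, [got_wood, got_diamond], timestep_reward=-0.1)
idle_reward = purpose_reward({}, [], timestep_reward=-0.1)
```

With no task at all, only the timestep reward is returned, which matches the early return in the source above.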

def is_terminal(self, state: hcraft.HcraftState) -> bool:
    def is_terminal(self, state: "HcraftState") -> bool:
        """
        Returns True if the given state is terminal for the whole purpose.
        """
        if not self.tasks:
            return False
        for task in self.tasks:
            task.is_terminal(state)
        for terminal_group in self.terminal_groups:
            if terminal_group.terminated:
                return True
        return False

Returns True if the given state is terminal for the whole purpose.

def reset(self) -> None:
    def reset(self) -> None:
        """Reset the purpose."""
        for task in self.tasks:
            task.reset()

Reset the purpose.

optional_tasks: List[hcraft.task.Task]
    @property
    def optional_tasks(self) -> List[Task]:
        """List of tasks that belong to no terminal group and are therefore optional."""
        terminal_tasks = []
        for group in self.terminal_groups:
            terminal_tasks += group.tasks
        return [task for task in self.tasks if task not in terminal_tasks]

List of tasks that belong to no terminal group and are therefore optional.
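
Since `build` adds reward shaping subtasks with `terminal_groups=None`, those subtasks end up in this list. The sketch below reproduces the filtering on a simplified model where tasks are plain strings; the function and task names are hypothetical and only serve the illustration.

```python
from typing import Dict, List


def optional_tasks(tasks: List[str], groups: Dict[str, List[str]]) -> List[str]:
    """Tasks that appear in no terminal group (simplified string-based model)."""
    terminal_tasks: List[str] = []
    for group_tasks in groups.values():
        terminal_tasks += group_tasks
    # Keep the original order of `tasks`, dropping anything found in a group.
    return [task for task in tasks if task not in terminal_tasks]


# One terminal task plus two shaping subtasks that were added without a group.
tasks = ["get_diamond", "get_iron_ingot", "get_iron_pickaxe"]
groups = {"default": ["get_diamond"]}
shaping_only = optional_tasks(tasks, groups)
```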

terminated: bool
    @property
    def terminated(self) -> bool:
        """True if any of the terminal groups are terminated."""
        return any(
            all(task.terminated for task in terminal_group.tasks)
            for terminal_group in self.terminal_groups
        )

True if any of the terminal groups are terminated.

best_terminal_group: TerminalGroup
    @property
    def best_terminal_group(self) -> TerminalGroup:
        """Best rewarding terminal group."""
        if self._best_terminal_group is not None:
            return self._best_terminal_group

        best_terminal_group, best_terminal_value = None, -np.inf
        for terminal_group in self.terminal_groups:
            terminal_value = sum(task._reward for task in terminal_group.tasks)
            if terminal_value > best_terminal_value:
                best_terminal_value = terminal_value
                best_terminal_group = terminal_group

        self._best_terminal_group = best_terminal_group
        return best_terminal_group

Best rewarding terminal group.
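
The selection above is an argmax over the summed task rewards of each group. A minimal sketch of the same idea, assuming groups are given as a mapping from a name to a list of task rewards (`best_group` and this data layout are hypothetical, not the library's API):

```python
from typing import Dict, List, Optional


def best_group(group_rewards: Dict[str, List[float]]) -> Optional[str]:
    """Name of the group whose task rewards sum highest, or None without groups."""
    best_name, best_value = None, float("-inf")
    for name, rewards in group_rewards.items():
        value = sum(rewards)
        # Strict comparison: on a tie, the group seen first is kept.
        if value > best_value:
            best_name, best_value = name, value
    return best_name


richest = best_group({"basics": [1.0, 2.0], "rich": [10.0]})
tie_winner = best_group({"first": [5.0], "second": [5.0]})
no_group = best_group({})
```

As in the source, a purpose without any terminal group yields `None` here, despite the `TerminalGroup` return annotation of the property.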

def platinium_purpose(items: List[hcraft.Item], zones: List[hcraft.Zone], zones_items: List[hcraft.Item], success_reward: float = 10.0, timestep_reward: float = -0.1):
def platinium_purpose(
    items: List[Item],
    zones: List[Zone],
    zones_items: List[Item],
    success_reward: float = 10.0,
    timestep_reward: float = -0.1,
):
    purpose = Purpose(timestep_reward=timestep_reward)
    for item in items:
        purpose.add_task(GetItemTask(item, reward=success_reward))
    for zone in zones:
        purpose.add_task(GoToZoneTask(zone, reward=success_reward))
    for item in zones_items:
        purpose.add_task(PlaceItemTask(item, reward=success_reward))
    return purpose