Purpose in HierarchyCraft
All hcraft environments are sandbox environments and have no precise purpose by default. A purpose can, however, be added to any HierarchyCraft environment by setting up one or more tasks.
Tasks can be one of:
- Get the given item: hcraft.GetItemTask
- Go to the given zone: hcraft.GoToZoneTask
- Place the given item in the given zone (or any zone if none given): hcraft.PlaceItemTask
Single task purpose
When a single task is passed to a HierarchyCraft environment, a purpose is automatically built from it. The environment then terminates when the task is completed.
Let's take an example with the MineHcraft environment. (This would work with any other HierarchyCraft environment.)
from hcraft.examples import MineHcraftEnv
from hcraft.purpose import GetItemTask
from hcraft.examples.minecraft.items import DIAMOND
get_diamond = GetItemTask(DIAMOND, reward=10)
env = MineHcraftEnv(purpose=get_diamond)
Reward shaping
Achievement tasks only reward the player when completed, but such long-term feedback is known
to make learning challenging. To ease learning such tasks, a HierarchyCraft Purpose can generate subtasks that give
intermediate feedback; this process is known as reward shaping.
See RewardShaping for more details.
For example, let's add the "required" reward shaping to the get_diamond task:
from hcraft.examples import MineHcraftEnv
from hcraft.purpose import Purpose, GetItemTask
from hcraft.examples.minecraft.items import DIAMOND
get_diamond = GetItemTask(DIAMOND, reward=10)
purpose = Purpose(shaping_value=2)
purpose.add_task(get_diamond, reward_shaping="required")
env = MineHcraftEnv(purpose=purpose)
Then getting the IRON_INGOT item for the first time will give a reward of 2.0 to the player, because IRON_INGOT is used to craft the IRON_PICKAXE that is itself used to get a DIAMOND.
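To make the "required" shaping more concrete, here is a minimal sketch of the underlying idea: one shaping subtask is created for everything the goal recursively requires. It does not use hcraft at all. The small requirement table (including IRON_ORE) is made up for illustration and is not MineHcraft's actual requirement graph, and `required_ancestors` is a hypothetical helper.

```python
from typing import Dict, List, Set

# Toy requirement table: each key lists what it directly requires.
# Hypothetical data for illustration only, not the real MineHcraft graph.
REQUIRES: Dict[str, List[str]] = {
    "DIAMOND": ["IRON_PICKAXE"],
    "IRON_PICKAXE": ["IRON_INGOT"],
    "IRON_INGOT": ["IRON_ORE"],
    "IRON_ORE": [],
    "EGG": [],
}


def required_ancestors(goal: str, requires: Dict[str, List[str]]) -> Set[str]:
    """Return everything recursively required to obtain `goal` (goal excluded)."""
    found: Set[str] = set()
    stack = list(requires.get(goal, []))
    while stack:
        node = stack.pop()
        if node not in found:
            found.add(node)
            stack.extend(requires.get(node, []))
    return found


shaping_value = 2.0
# One shaping subtask (rewarded with `shaping_value`) per required element.
shaping_rewards = {
    name: shaping_value for name in required_ancestors("DIAMOND", REQUIRES)
}
print(sorted(shaping_rewards))
```

In this toy table, EGG gets no shaping reward because nothing on the path to DIAMOND requires it.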
Multi-tasks and terminal groups
In a sandbox environment, why limit ourselves to only one task? In HierarchyCraft, a purpose can be composed of multiple tasks. But when does the purpose terminate? When any task is done? When all tasks are done?
To answer this, we need to introduce terminal groups. Terminal groups are identified by strings.
The purpose will terminate if ANY of the terminal groups has ALL of its tasks done.
When adding a task to a purpose, one can choose one or multiple terminal groups like so:
from hcraft.examples import MineHcraftEnv
from hcraft.purpose import Purpose, GetItemTask, GoToZoneTask
from hcraft.examples.minecraft.items import DIAMOND, GOLD_INGOT, EGG
from hcraft.examples.minecraft.zones import END
get_diamond = GetItemTask(DIAMOND, reward=10)
get_gold = GetItemTask(GOLD_INGOT, reward=5)
get_egg = GetItemTask(EGG, reward=100)
go_to_end = GoToZoneTask(END, reward=20)
purpose = Purpose()
purpose.add_task(get_diamond, reward_shaping="required", terminal_groups="get rich!")
purpose.add_task(get_gold, terminal_groups=["golden end", "get rich!"])
purpose.add_task(go_to_end, reward_shaping="inputs", terminal_groups="golden end")
purpose.add_task(get_egg, terminal_groups=None)
env = MineHcraftEnv(purpose=purpose)
Here the environment will terminate if the player gets both the DIAMOND and GOLD_INGOT items ("get rich!" group), or if the player gets a GOLD_INGOT and reaches the END zone ("golden end" group). The get_egg task is optional and can never terminate the purpose, but it will still reward the player if completed.
Just like this last task, reward shaping subtasks are always optional.
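The ANY/ALL rule can be checked by hand with a small stand-in that does not depend on hcraft. This is only a sketch: tasks are represented by their variable names from the example above, and `purpose_terminated` is a hypothetical helper rather than part of the library.

```python
from typing import Dict, List, Set

# Terminal groups from the example above, as plain lists of task names.
terminal_groups: Dict[str, List[str]] = {
    "get rich!": ["get_diamond", "get_gold"],
    "golden end": ["get_gold", "go_to_end"],
}


def purpose_terminated(done: Set[str], groups: Dict[str, List[str]]) -> bool:
    """True if ANY group has ALL of its tasks in `done`."""
    return any(all(task in done for task in tasks) for tasks in groups.values())


# Gold alone completes no group.
print(purpose_terminated({"get_gold"}, terminal_groups))
# The optional egg task never helps to terminate.
print(purpose_terminated({"get_gold", "get_egg"}, terminal_groups))
# Gold plus reaching the end completes the "golden end" group.
print(purpose_terminated({"get_gold", "go_to_end"}, terminal_groups))
```

Only the last call reports termination, since it is the only one where a whole group is done.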
from dataclasses import dataclass, field
from enum import Enum
from typing import TYPE_CHECKING, Dict, List, Optional, Set, Union

import networkx as nx
import numpy as np

from hcraft.requirements import RequirementNode, req_node_name
from hcraft.task import GetItemTask, GoToZoneTask, PlaceItemTask, Task
from hcraft.elements import Item, Zone


if TYPE_CHECKING:
    from hcraft.env import HcraftEnv, HcraftState
    from hcraft.world import World


class RewardShaping(Enum):
    """Enumeration of all reward shapings possible."""

    NONE = "none"
    """No reward shaping"""
    ALL_ACHIVEMENTS = "all"
    """All items and zones will be associated with an achievement subtask."""
    REQUIREMENTS_ACHIVEMENTS = "required"
    """All (recursively) required items and zones for the given task
    will be associated with an achievement subtask."""
    INPUTS_ACHIVEMENT = "inputs"
    """Items and zones consumed by any transformation solving the task
    will be associated with an achievement subtask."""


@dataclass
class TerminalGroup:
    """Terminal groups are groups of tasks that can terminate the purpose.

    The purpose will terminate if ANY of the terminal groups has ALL of its tasks done.
    """

    name: str
    tasks: List[Task] = field(default_factory=list)

    @property
    def terminated(self) -> bool:
        """True if all tasks of the terminal group are terminated."""
        return all(task.terminated for task in self.tasks)

    def __eq__(self, other) -> bool:
        if isinstance(other, str):
            return self.name == other
        if isinstance(other, TerminalGroup):
            return self.name == other.name
        return False

    def __hash__(self) -> int:
        return self.name.__hash__()


class Purpose:
    """A purpose for a HierarchyCraft player based on a list of tasks."""

    def __init__(
        self,
        tasks: Optional[Union[Task, List[Task]]] = None,
        timestep_reward: float = 0.0,
        default_reward_shaping: RewardShaping = RewardShaping.NONE,
        shaping_value: float = 1.0,
    ) -> None:
        """
        Args:
            tasks: Tasks to add to the Purpose.
                Defaults to None.
            timestep_reward: Reward for each timestep.
                Defaults to 0.0.
            default_reward_shaping: Default reward shaping for tasks.
                Defaults to RewardShaping.NONE.
            shaping_value: Reward value used in reward shaping if any.
                Defaults to 1.0.
        """
        self.tasks: List[Task] = []
        self.timestep_reward = timestep_reward
        self.shaping_value = shaping_value
        self.default_reward_shaping = default_reward_shaping
        self.built = False

        self.reward_shaping: Dict[Task, RewardShaping] = {}
        self.terminal_groups: List[TerminalGroup] = []

        if isinstance(tasks, Task):
            tasks = [tasks]
        elif tasks is None:
            tasks = []
        for task in tasks:
            self.add_task(task, reward_shaping=default_reward_shaping)

        self._best_terminal_group = None

    def add_task(
        self,
        task: Task,
        reward_shaping: Optional[RewardShaping] = None,
        terminal_groups: Optional[Union[str, List[str]]] = "default",
    ):
        """Add a new task to the purpose.

        Args:
            task: Task to be added to the purpose.
            reward_shaping: Reward shaping for this task.
                Defaults to purpose's default reward shaping.
            terminal_groups: The purpose terminates when ALL the tasks of ANY terminal group
                are terminated. If terminal_groups is "" or None, the task will be optional
                and will never terminate the purpose.
                By default, tasks are added to the "default" group, hence
                ALL tasks have to be done to terminate the purpose.
        """
        if reward_shaping is None:
            reward_shaping = self.default_reward_shaping
        reward_shaping = RewardShaping(reward_shaping)
        if terminal_groups:
            if isinstance(terminal_groups, str):
                terminal_groups = [terminal_groups]
            for terminal_group in terminal_groups:
                existing_group = self._terminal_group_from_name(terminal_group)
                if not existing_group:
                    existing_group = TerminalGroup(terminal_group)
                    self.terminal_groups.append(existing_group)
                existing_group.tasks.append(task)

        self.reward_shaping[task] = reward_shaping
        self.tasks.append(task)

    def build(self, env: "HcraftEnv"):
        """
        Builds the purpose of the player relative to the given environment.

        Args:
            env: The HierarchyCraft environment to build upon.
        """
        if self.built:
            return

        if not self.tasks:
            return
        # Add reward shaping subtasks
        for task in self.tasks:
            subtasks = self._add_reward_shaping_subtasks(
                task, env, self.reward_shaping[task]
            )
            for subtask in subtasks:
                self.add_task(subtask, RewardShaping.NONE, terminal_groups=None)

        # Build all tasks
        for task in self.tasks:
            task.build(env.world)

        self.built = True

    def reward(self, state: "HcraftState") -> float:
        """
        Returns the purpose reward for the given state based on tasks.
        """
        reward = self.timestep_reward
        if not self.tasks:
            return reward
        for task in self.tasks:
            reward += task.reward(state)
        return reward

    def is_terminal(self, state: "HcraftState") -> bool:
        """
        Returns True if the given state is terminal for the whole purpose.
        """
        if not self.tasks:
            return False
        for task in self.tasks:
            task.is_terminal(state)
        for terminal_group in self.terminal_groups:
            if terminal_group.terminated:
                return True
        return False

    def reset(self) -> None:
        """Reset the purpose."""
        for task in self.tasks:
            task.reset()

    @property
    def optional_tasks(self) -> List[Task]:
        """List of tasks that are in no terminal group and are hence optional."""
        terminal_tasks = []
        for group in self.terminal_groups:
            terminal_tasks += group.tasks
        return [task for task in self.tasks if task not in terminal_tasks]

    @property
    def terminated(self) -> bool:
        """True if any of the terminal groups are terminated."""
        return any(
            all(task.terminated for task in terminal_group.tasks)
            for terminal_group in self.terminal_groups
        )

    @property
    def best_terminal_group(self) -> TerminalGroup:
        """Best rewarding terminal group."""
        if self._best_terminal_group is not None:
            return self._best_terminal_group

        best_terminal_group, best_terminal_value = None, -np.inf
        for terminal_group in self.terminal_groups:
            terminal_value = sum(task._reward for task in terminal_group.tasks)
            if terminal_value > best_terminal_value:
                best_terminal_value = terminal_value
                best_terminal_group = terminal_group

        self._best_terminal_group = best_terminal_group
        return best_terminal_group

    def _terminal_group_from_name(self, name: str) -> Optional[TerminalGroup]:
        if name not in self.terminal_groups:
            return None
        group_id = self.terminal_groups.index(name)
        return self.terminal_groups[group_id]

    def _add_reward_shaping_subtasks(
        self, task: Task, env: "HcraftEnv", reward_shaping: RewardShaping
    ) -> List[Task]:
        if reward_shaping == RewardShaping.NONE:
            return []
        if reward_shaping == RewardShaping.ALL_ACHIVEMENTS:
            return _all_subtasks(env.world, self.shaping_value)
        if reward_shaping == RewardShaping.INPUTS_ACHIVEMENT:
            return _inputs_subtasks(task, env.world, self.shaping_value)
        if reward_shaping == RewardShaping.REQUIREMENTS_ACHIVEMENTS:
            return _required_subtasks(task, env, self.shaping_value)
        raise NotImplementedError

    def __str__(self) -> str:
        terminal_groups_str = []
        for terminal_group in self.terminal_groups:
            tasks_str_joined = self._tasks_str(terminal_group.tasks)
            group_str = f"{terminal_group.name}:[{tasks_str_joined}]"
            terminal_groups_str.append(group_str)
        optional_tasks_str = self._tasks_str(self.optional_tasks)
        if optional_tasks_str:
            group_str = f"optional:[{optional_tasks_str}]"
            terminal_groups_str.append(group_str)
        joined_groups_str = ", ".join(terminal_groups_str)
        return f"Purpose({joined_groups_str})"

    def _tasks_str(self, tasks: List[Task]) -> str:
        tasks_str = []
        for task in tasks:
            shaping = self.reward_shaping[task]
            shaping_str = f"#{shaping.value}" if shaping != RewardShaping.NONE else ""
            tasks_str.append(f"{task}{shaping_str}")
        return ",".join(tasks_str)


def platinium_purpose(
    items: List[Item],
    zones: List[Zone],
    zones_items: List[Item],
    success_reward: float = 10.0,
    timestep_reward: float = -0.1,
):
    purpose = Purpose(timestep_reward=timestep_reward)
    for item in items:
        purpose.add_task(GetItemTask(item, reward=success_reward))
    for zone in zones:
        purpose.add_task(GoToZoneTask(zone, reward=success_reward))
    for item in zones_items:
        purpose.add_task(PlaceItemTask(item, reward=success_reward))
    return purpose


def _all_subtasks(world: "World", shaping_reward: float) -> List[Task]:
    return _build_reward_shaping_subtasks(
        world.items, world.zones, world.zones_items, shaping_reward
    )


def _required_subtasks(
    task: Task, env: "HcraftEnv", shaping_reward: float
) -> List[Task]:
    relevant_items = set()
    relevant_zones = set()
    relevant_zone_items = set()

    if isinstance(task, GetItemTask):
        goal_item = task.item_stack.item
        goal_requirement_nodes = [req_node_name(goal_item, RequirementNode.ITEM)]
    elif isinstance(task, PlaceItemTask):
        goal_item = task.item_stack.item
        goal_requirement_nodes = [req_node_name(goal_item, RequirementNode.ZONE_ITEM)]
        goal_zones = task.zone
        if goal_zones is not None:
            relevant_zones.add(goal_zones)
            goal_requirement_nodes.append(
                req_node_name(goal_zones, RequirementNode.ZONE)
            )
    elif isinstance(task, GoToZoneTask):
        goal_requirement_nodes = [req_node_name(task.zone, RequirementNode.ZONE)]
    else:
        raise NotImplementedError(
            f"Unsupported reward shaping {RewardShaping.REQUIREMENTS_ACHIVEMENTS}"
            f"for given task type: {type(task)} of {task}"
        )

    requirements_acydigraph = env.world.requirements.acydigraph
    for requirement_node in goal_requirement_nodes:
        for ancestor in nx.ancestors(requirements_acydigraph, requirement_node):
            if ancestor == "START#":
                continue
            ancestor_node = requirements_acydigraph.nodes[ancestor]
            item_or_zone: Union["Item", "Zone"] = ancestor_node["obj"]
            ancestor_type = RequirementNode(ancestor_node["type"])
            if ancestor_type is RequirementNode.ITEM:
                relevant_items.add(item_or_zone)
            if ancestor_type is RequirementNode.ZONE:
                relevant_zones.add(item_or_zone)
            if ancestor_type is RequirementNode.ZONE_ITEM:
                relevant_zone_items.add(item_or_zone)
    return _build_reward_shaping_subtasks(
        relevant_items,
        relevant_zones,
        relevant_zone_items,
        shaping_reward,
    )


def _inputs_subtasks(task: Task, world: "World", shaping_reward: float) -> List[Task]:
    relevant_items = set()
    relevant_zones = set()
    relevant_zone_items = set()

    goal_zone = None
    goal_item = None
    goal_zone_item = None
    if isinstance(task, GetItemTask):
        goal_item = task.item_stack.item
    elif isinstance(task, GoToZoneTask):
        goal_zone = task.zone
    elif isinstance(task, PlaceItemTask):
        goal_zone_item = task.item_stack.item
        if task.zone:
            goal_zone = task.zone
            relevant_zones.add(task.zone)
    else:
        raise NotImplementedError(
            f"Unsupported reward shaping {RewardShaping.INPUTS_ACHIVEMENT}"
            f"for given task type: {type(task)} of {task}"
        )
    transfo_giving_item = [
        transfo
        for transfo in world.transformations
        if goal_item in transfo.production("player")
        and goal_item not in transfo.min_required("player")
    ]
    transfo_placing_zone_item = [
        transfo
        for transfo in world.transformations
        if goal_zone_item in transfo.produced_zones_items
        and goal_zone_item not in transfo.min_required_zones_items
    ]
    transfo_going_to_goal_zone = [
        transfo
        for transfo in world.transformations
        if transfo.destination is not None and transfo.destination == goal_zone
    ]
    relevant_transformations = (
        transfo_giving_item + transfo_placing_zone_item + transfo_going_to_goal_zone
    )

    for transfo in relevant_transformations:
        relevant_items |= transfo.consumption("player")
        relevant_zone_items |= transfo.consumption("current_zone")
        relevant_zone_items |= transfo.consumption("destination")
        relevant_zone_items |= transfo.consumption("zones")
        if transfo.zone:
            relevant_zones.add(transfo.zone)

    return _build_reward_shaping_subtasks(
        relevant_items,
        relevant_zones,
        relevant_zone_items,
        shaping_reward,
    )


def _build_reward_shaping_subtasks(
    items: Optional[Union[List[Item], Set[Item]]] = None,
    zones: Optional[Union[List[Zone], Set[Zone]]] = None,
    zone_items: Optional[Union[List[Item], Set[Item]]] = None,
    shaping_reward: float = 1.0,
) -> List[Task]:
    subtasks = []
    if items:
        subtasks += [GetItemTask(item, reward=shaping_reward) for item in items]
    if zones:
        subtasks += [GoToZoneTask(zone, reward=shaping_reward) for zone in zones]
    if zone_items:
        subtasks += [PlaceItemTask(item, reward=shaping_reward) for item in zone_items]
    return subtasks
API Documentation
class RewardShaping(Enum):
    """Enumeration of all reward shapings possible."""

    NONE = "none"
    """No reward shaping"""
    ALL_ACHIVEMENTS = "all"
    """All items and zones will be associated with an achievement subtask."""
    REQUIREMENTS_ACHIVEMENTS = "required"
    """All (recursively) required items and zones for the given task
    will be associated with an achievement subtask."""
    INPUTS_ACHIVEMENT = "inputs"
    """Items and zones consumed by any transformation solving the task
    will be associated with an achievement subtask."""
Enumeration of all reward shapings possible.
- NONE ("none"): No reward shaping.
- ALL_ACHIVEMENTS ("all"): All items and zones will be associated with an achievement subtask.
- REQUIREMENTS_ACHIVEMENTS ("required"): All (recursively) required items and zones for the given task will be associated with an achievement subtask.
- INPUTS_ACHIVEMENT ("inputs"): Items and zones consumed by any transformation solving the task will be associated with an achievement subtask.
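Because add_task passes its reward_shaping argument through RewardShaping(...), both enum members and their string values are accepted. The sketch below shows that standard Enum behavior on a local copy of the enumeration, so it runs without hcraft; `to_shaping` is a hypothetical helper used only for illustration.

```python
from enum import Enum


class RewardShaping(Enum):
    # Local copy of the enumeration documented above, for illustration only.
    NONE = "none"
    ALL_ACHIVEMENTS = "all"
    REQUIREMENTS_ACHIVEMENTS = "required"
    INPUTS_ACHIVEMENT = "inputs"


def to_shaping(value) -> RewardShaping:
    """Accept either a RewardShaping member or its string value."""
    return RewardShaping(value)


# A string value is looked up and mapped to the matching member.
print(to_shaping("required"))
# Passing a member returns that same member.
print(to_shaping(RewardShaping.NONE))

# An unknown string raises ValueError instead of silently disabling shaping.
try:
    to_shaping("requirements")
except ValueError as error:
    print(f"rejected: {error}")
```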
Inherited Members
- enum.Enum
- name
- value
@dataclass
class TerminalGroup:
    """Terminal groups are groups of tasks that can terminate the purpose.

    The purpose will terminate if ANY of the terminal groups has ALL of its tasks done.
    """

    name: str
    tasks: List[Task] = field(default_factory=list)

    @property
    def terminated(self) -> bool:
        """True if all tasks of the terminal group are terminated."""
        return all(task.terminated for task in self.tasks)

    def __eq__(self, other) -> bool:
        if isinstance(other, str):
            return self.name == other
        if isinstance(other, TerminalGroup):
            return self.name == other.name
        return False

    def __hash__(self) -> int:
        return self.name.__hash__()
Terminal groups are groups of tasks that can terminate the purpose.
The purpose will terminate if ANY of the terminal groups has ALL of its tasks done.
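The name-based __eq__ is what lets a list of groups be searched directly with a plain string, as Purpose does when looking up a group by name. Here is a small sketch of that behavior using a stand-in class (`Group` is hypothetical and holds task names instead of Task objects), so it runs without hcraft.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Group:
    """Stand-in mirroring TerminalGroup's name-based equality."""

    name: str
    tasks: List[str] = field(default_factory=list)

    def __eq__(self, other) -> bool:
        if isinstance(other, str):
            return self.name == other
        if isinstance(other, Group):
            return self.name == other.name
        return False

    def __hash__(self) -> int:
        return hash(self.name)


groups = [Group("get rich!"), Group("golden end")]

# Membership tests and index lookups work with the group name alone.
print("golden end" in groups)
print(groups.index("golden end"))
print("default" in groups)
```

One consequence of this design is that two groups with the same name compare equal even if their task lists differ.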
class Purpose:
A purpose for a HierarchyCraft player based on a list of tasks.
    def __init__(
        self,
        tasks: Optional[Union[Task, List[Task]]] = None,
        timestep_reward: float = 0.0,
        default_reward_shaping: RewardShaping = RewardShaping.NONE,
        shaping_value: float = 1.0,
    ) -> None:
        """
        Args:
            tasks: Tasks to add to the Purpose.
                Defaults to None.
            timestep_reward: Reward for each timestep.
                Defaults to 0.0.
            default_reward_shaping: Default reward shaping for tasks.
                Defaults to RewardShaping.NONE.
            shaping_value: Reward value used in reward shaping if any.
                Defaults to 1.0.
        """
        self.tasks: List[Task] = []
        self.timestep_reward = timestep_reward
        self.shaping_value = shaping_value
        self.default_reward_shaping = default_reward_shaping
        self.built = False

        self.reward_shaping: Dict[Task, RewardShaping] = {}
        self.terminal_groups: List[TerminalGroup] = []

        if isinstance(tasks, Task):
            tasks = [tasks]
        elif tasks is None:
            tasks = []
        for task in tasks:
            self.add_task(task, reward_shaping=default_reward_shaping)

        self._best_terminal_group = None
Arguments:
- tasks: Tasks to add to the Purpose. Defaults to None.
- timestep_reward: Reward for each timestep. Defaults to 0.0.
- default_reward_shaping: Default reward shaping for tasks. Defaults to RewardShaping.NONE.
- shaping_value: Reward value used in reward shaping if any. Defaults to 1.0.
    def add_task(
        self,
        task: Task,
        reward_shaping: Optional[RewardShaping] = None,
        terminal_groups: Optional[Union[str, List[str]]] = "default",
    ):
        """Add a new task to the purpose.

        Args:
            task: Task to be added to the purpose.
            reward_shaping: Reward shaping for this task.
                Defaults to purpose's default reward shaping.
            terminal_groups: The purpose terminates when ALL the tasks of ANY terminal group
                are terminated. If terminal_groups is "" or None, the task will be optional
                and will never terminate the purpose.
                By default, tasks are added to the "default" group, hence
                ALL tasks have to be done to terminate the purpose.
        """
        if reward_shaping is None:
            reward_shaping = self.default_reward_shaping
        reward_shaping = RewardShaping(reward_shaping)
        if terminal_groups:
            if isinstance(terminal_groups, str):
                terminal_groups = [terminal_groups]
            for terminal_group in terminal_groups:
                existing_group = self._terminal_group_from_name(terminal_group)
                if not existing_group:
                    existing_group = TerminalGroup(terminal_group)
                    self.terminal_groups.append(existing_group)
                existing_group.tasks.append(task)

        self.reward_shaping[task] = reward_shaping
        self.tasks.append(task)
Add a new task to the purpose.
Arguments:
- task: Task to be added to the purpose.
- reward_shaping: Reward shaping for this task. Defaults to purpose's default reward shaping.
- terminal_groups: The purpose terminates when ALL the tasks of ANY terminal group are terminated. If terminal_groups is "" or None, the task will be optional and will never terminate the purpose. By default, tasks are added to the "default" group, hence ALL tasks have to be done to terminate the purpose.
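The three accepted forms of terminal_groups (a single name, a list of names, or None/"" for an optional task) can be illustrated with a small stand-in. This is a sketch of the normalization described above, not the library code: `register` is a hypothetical helper and groups are plain dictionaries of task names.

```python
from typing import Dict, List, Optional, Union


def register(
    groups: Dict[str, List[str]],
    task: str,
    terminal_groups: Optional[Union[str, List[str]]] = "default",
) -> None:
    """Add `task` to each named group; None or "" leaves the task optional."""
    if not terminal_groups:
        return
    if isinstance(terminal_groups, str):
        # A single name is treated as a one-element list.
        terminal_groups = [terminal_groups]
    for name in terminal_groups:
        # Groups are created the first time their name is used.
        groups.setdefault(name, []).append(task)


groups: Dict[str, List[str]] = {}
register(groups, "get_diamond")                     # goes to "default"
register(groups, "get_gold", ["default", "bonus"])  # goes to both groups
register(groups, "get_egg", None)                   # optional, in no group
print(groups)
```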
    def build(self, env: "HcraftEnv"):
        """
        Builds the purpose of the player relative to the given environment.

        Args:
            env: The HierarchyCraft environment to build upon.
        """
        if self.built:
            return

        if not self.tasks:
            return
        # Add reward shaping subtasks
        for task in self.tasks:
            subtasks = self._add_reward_shaping_subtasks(
                task, env, self.reward_shaping[task]
            )
            for subtask in subtasks:
                self.add_task(subtask, RewardShaping.NONE, terminal_groups=None)

        # Build all tasks
        for task in self.tasks:
            task.build(env.world)

        self.built = True
Builds the purpose of the player relative to the given environment.
Arguments:
- env: The HierarchyCraft environment to build upon.
    def reward(self, state: "HcraftState") -> float:
        """
        Returns the purpose reward for the given state based on tasks.
        """
        reward = self.timestep_reward
        if not self.tasks:
            return reward
        for task in self.tasks:
            reward += task.reward(state)
        return reward
Returns the purpose reward for the given state based on tasks.
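As a rough illustration of how the per-timestep reward combines with task rewards, here is a sketch with stand-in tasks. Each stand-in task is just a function from a state to a reward, the state is a plain set of item names, and the values used (a -0.5 timestep reward, a 10.0 task reward) are arbitrary assumptions; none of these names come from hcraft.

```python
from typing import Callable, List, Set

State = Set[str]  # stand-in for HcraftState: the items currently held
TaskReward = Callable[[State], float]


def purpose_reward(
    state: State, task_rewards: List[TaskReward], timestep_reward: float = 0.0
) -> float:
    """Sum the timestep reward and every task's reward for this state."""
    reward = timestep_reward
    for task_reward in task_rewards:
        reward += task_reward(state)
    return reward


def get_diamond_reward(state: State) -> float:
    # Hypothetical task: rewards 10.0 when a diamond is held.
    return 10.0 if "DIAMOND" in state else 0.0


print(purpose_reward(set(), [get_diamond_reward], timestep_reward=-0.5))
print(purpose_reward({"DIAMOND"}, [get_diamond_reward], timestep_reward=-0.5))
```

A negative timestep reward, as in this sketch, penalizes every step and therefore favors reaching the tasks quickly.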
    def is_terminal(self, state: "HcraftState") -> bool:
        """
        Returns True if the given state is terminal for the whole purpose.
        """
        if not self.tasks:
            return False
        for task in self.tasks:
            task.is_terminal(state)
        for terminal_group in self.terminal_groups:
            if terminal_group.terminated:
                return True
        return False
Returns True if the given state is terminal for the whole purpose.
    def reset(self) -> None:
        """Reset the purpose."""
        for task in self.tasks:
            task.reset()
Reset the purpose.
    @property
    def optional_tasks(self) -> List[Task]:
        """List of tasks that are in no terminal group and are hence optional."""
        terminal_tasks = []
        for group in self.terminal_groups:
            terminal_tasks += group.tasks
        return [task for task in self.tasks if task not in terminal_tasks]
List of tasks that are in no terminal group and are hence optional.
    @property
    def terminated(self) -> bool:
        """True if any of the terminal groups are terminated."""
        return any(
            all(task.terminated for task in terminal_group.tasks)
            for terminal_group in self.terminal_groups
        )
True if any of the terminal groups are terminated.
    @property
    def best_terminal_group(self) -> TerminalGroup:
        """Best rewarding terminal group."""
        if self._best_terminal_group is not None:
            return self._best_terminal_group

        best_terminal_group, best_terminal_value = None, -np.inf
        for terminal_group in self.terminal_groups:
            terminal_value = sum(task._reward for task in terminal_group.tasks)
            if terminal_value > best_terminal_value:
                best_terminal_value = terminal_value
                best_terminal_group = terminal_group

        self._best_terminal_group = best_terminal_group
        return best_terminal_group
Best rewarding terminal group.
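"Best rewarding" here means the group whose task rewards sum to the largest value. The sketch below reproduces that selection on plain data, using the rewards from the multi-task example earlier (diamond 10, gold 5, end zone 20); `best_group` is a hypothetical stand-in, not the library property.

```python
from typing import Dict, List, Optional


def best_group(group_rewards: Dict[str, List[float]]) -> Optional[str]:
    """Return the name of the group with the highest total task reward."""
    best_name, best_value = None, float("-inf")
    for name, rewards in group_rewards.items():
        value = sum(rewards)
        # Strict comparison: on a tie, the group seen first is kept.
        if value > best_value:
            best_name, best_value = name, value
    return best_name


group_rewards = {
    "get rich!": [10.0, 5.0],    # get_diamond + get_gold
    "golden end": [5.0, 20.0],   # get_gold + go_to_end
}
print(best_group(group_rewards))
```

With no terminal groups at all, this selection returns None, which matches the initial value of the search in the property above.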
def platinium_purpose(
    items: List[Item],
    zones: List[Zone],
    zones_items: List[Item],
    success_reward: float = 10.0,
    timestep_reward: float = -0.1,
):
    purpose = Purpose(timestep_reward=timestep_reward)
    for item in items:
        purpose.add_task(GetItemTask(item, reward=success_reward))
    for zone in zones:
        purpose.add_task(GoToZoneTask(zone, reward=success_reward))
    for item in zones_items:
        purpose.add_task(PlaceItemTask(item, reward=success_reward))
    return purpose