HierarchyCraft - Environment builder for hierarchical reasoning research
HierarchyCraft
HierarchyCraft (hcraft for short) is a Python library designed to create arbitrary hierarchical environments that are compatible with both the OpenAI Gym Reinforcement Learning Framework and AIPlan4EU Unified Planning Framework. This library enables users to easily create complex hierarchical structures that can be used to test and develop various reinforcement learning or planning algorithms.
In environments built with HierarchyCraft the agent (player) has an inventory and can navigate into abstract zones that themselves have inventories.
The action space of HierarchyCraft environments consists of sub-tasks, referred to as Transformations, as opposed to detailed movements and controls. Each Transformation has specific requirements to be valid (e.g. having enough of an item, being in the right place), and these requirements may necessitate the execution of other Transformations first, inherently creating a hierarchical structure in HierarchyCraft environments.
This concept is visually represented by the Requirements graph, which depicts the hierarchical relationships within each HierarchyCraft environment. The Requirements graph is constructed directly from the list of Transformations composing the environment.
More details about the requirements graph can be found in the documentation at hcraft.requirements, and example requirements graphs for some HierarchyCraft environments can be found in hcraft.examples.
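To make the link between Transformations and the requirements graph concrete, here is a minimal, library-free sketch. It does not use the hcraft API: the toy transformations, item names and helper functions are all hypothetical, and the sketch assumes the requirements contain no cycles.

```python
from collections import defaultdict

# Toy transformations: name -> (items required, items produced).
# These names are illustrative and are not taken from any hcraft environment.
TOY_TRANSFORMATIONS = {
    "gather_wood": (set(), {"wood"}),
    "craft_plank": ({"wood"}, {"plank"}),
    "craft_table": ({"plank"}, {"table"}),
    "craft_stick": ({"plank"}, {"stick"}),
}


def build_requirements(transformations):
    """Map each produced item to the set of items needed to produce it."""
    requirements = defaultdict(set)
    for required, produced in transformations.values():
        for item in produced:
            requirements[item] |= required
    return dict(requirements)


def requirement_depth(item, requirements):
    """Length of the longest chain of prerequisites below an item (acyclic only)."""
    needed = requirements.get(item, set())
    if not needed:
        return 0
    return 1 + max(requirement_depth(dep, requirements) for dep in needed)


requirements = build_requirements(TOY_TRANSFORMATIONS)
depths = {item: requirement_depth(item, requirements) for item in requirements}
print(depths)
```

Items with a larger depth sit higher in the hierarchy: the agent must complete more prerequisite sub-tasks before the corresponding transformation becomes valid.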
No feature extraction for fast research even with low compute
HierarchyCraft returns vectorized state information, which plainly and directly describes the player's inventory, current position, and the inventory of the current zone. Compared to benchmarks that return grids, pixel arrays, text or sound, we directly return a low-dimensional latent representation that doesn't need to be learned. This saves compute time and lets researchers focus only on the hierarchical reasoning part.
See hcraft.state for more details.
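As an illustration of this layout, the following sketch builds such an observation with plain Python lists. It mirrors the order described above (player inventory, one-hot position, current zone inventory), but the helper names and the example values are hypothetical and this is not the hcraft implementation.

```python
def one_hot(index, size):
    """Return a list of zeros with a single 1 at the given index."""
    return [1 if slot == index else 0 for slot in range(size)]


def build_observation(player_inventory, zone_index, zones_inventories):
    """Concatenate player inventory, one-hot position and current zone inventory."""
    position = one_hot(zone_index, len(zones_inventories))
    current_zone_inventory = zones_inventories[zone_index]
    return player_inventory + position + current_zone_inventory


# Hypothetical world: 3 player items, 2 zones, 2 zone items.
player_inventory = [2, 0, 1]
zones_inventories = [[0, 0], [3, 1]]
observation = build_observation(player_inventory, 1, zones_inventories)
print(observation)
```

The observation length stays fixed for a given world (number of items + number of zones + number of zone items), which is what makes it directly usable as input to a learning agent.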
Create your own tailored HierarchyCraft environments
You can use HierarchyCraft to create various custom hierarchical environments from a list of customized Transformations.
See hcraft.env for a complete tutorial on creating custom environments.
Installation
Using pip
Without optional dependencies:
pip install hcraft
All hcraft environments share a common graphical user interface, available with the gui requirements:
pip install hcraft[gui]
Gym environments can be obtained with the gym requirements:
pip install hcraft[gym]
Planning problems can be obtained through the upf interface with the planning requirements:
pip install hcraft[planning]
Some complex graphs can be rendered as interactive HTML visualizations with the htmlvis requirements:
pip install hcraft[htmlvis]
Quickstart
Play yourself!
Install the graphical user interface optional dependencies:
pip install hcraft[gui]
Using the command line interface
You can play any HierarchyCraft environment yourself through the GUI, for example:
hcraft minecraft
For more examples:
hcraft --help
Using the programmatic interface:
from hcraft import get_human_action
from hcraft.examples import MineHcraftEnv

env = MineHcraftEnv()
# or env: MineHcraftEnv = gym.make("MineHcraft-NoReward-v1")
n_episodes = 2
for _ in range(n_episodes):
    env.reset()
    done = False
    total_reward = 0
    while not done:
        env.render()
        action = get_human_action(env)
        print(f"Human pressed: {env.world.transformations[action]}")
        # step returns five values, as in the other examples of this page
        _observation, reward, terminated, truncated, _info = env.step(action)
        done = terminated or truncated
        total_reward += reward
    print(f"SCORE: {total_reward}")
As a Gym RL environment
Using the programmatic interface, any HierarchyCraft environment can easily be interfaced with classic reinforcement learning agents.
import numpy as np
from hcraft.examples import MineHcraftEnv


def random_legal_agent(observation, action_is_legal):
    # Pick uniformly among the actions flagged as legal by the mask
    action = np.random.choice(np.nonzero(action_is_legal)[0])
    return int(action)


env = MineHcraftEnv(max_step=10)
done = False
observation, _info = env.reset()
while not done:
    action_is_legal = env.action_masks()
    action = random_legal_agent(observation, action_is_legal)
    observation, _reward, terminated, truncated, _info = env.step(action)
    # Stop when the episode ends, otherwise the loop never terminates
    done = terminated or truncated

# Other examples of HierarchyCraft environments
from hcraft.examples import TowerHcraftEnv, RecursiveHcraftEnv, RandomHcraftEnv
tower_env = TowerHcraftEnv(height=3, width=2)
# or tower_env = gym.make("TowerHcraft-v1", height=3, width=2)
recursive_env = RecursiveHcraftEnv(n_items=6)
# or recursive_env = gym.make("RecursiveHcraft-v1", n_items=6)
random_env = RandomHcraftEnv(n_items_per_n_inputs={0:2, 1:5, 2:10}, seed=42)
# or random_env = gym.make("RandomHcraft-v1", n_items_per_n_inputs={0:2, 1:5, 2:10}, seed=42)
See hcraft.env for a more complete description.
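The random agent above relies on an action mask to avoid invalid transformations. The same idea can be sketched with the standard library only; the mask values below are made up for illustration and `random_legal_action` is a hypothetical helper, not part of hcraft.

```python
import random


def random_legal_action(action_is_legal, rng):
    """Sample uniformly among the indexes flagged as legal in the mask."""
    legal_actions = [index for index, legal in enumerate(action_is_legal) if legal]
    if not legal_actions:
        raise ValueError("No legal action available")
    return rng.choice(legal_actions)


rng = random.Random(0)  # Seeded generator for reproducible sampling
mask = [False, True, False, True]  # Hypothetical mask over 4 transformations
samples = [random_legal_action(mask, rng) for _ in range(20)]
print(samples)
```

Passing an explicit generator rather than using the global random state keeps experiments reproducible when several agents or environments run in the same process.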
As a UPF problem for planning
HierarchyCraft environments can be converted to a planning problem in one line thanks to the Unified Planning Framework (UPF):
from hcraft.examples import TowerHcraftEnv

# Example env
env = TowerHcraftEnv(height=3, width=2)
# Make it into a unified planning problem
planning_problem = env.planning_problem()
print(planning_problem.upf_problem)
Then they can be solved with any compatible planner for UPF:
# Solve the planning problem and show the plan
planning_problem.solve()
print(planning_problem.plan)
The planning_problem can also give actions to do in the environment, triggering replanning if necessary:
done = False
terminated = False  # Defined up front in case the very first plan is empty
_observation, _info = env.reset()
while not done:
    # Automatically replan at the end of each plan until env termination
    # Observations are not used when blindly following a current plan
    # But the state is required in order to replan if there is no plan left
    action = planning_problem.action_from_plan(env.state)
    if action is None:
        # Plan is existing but empty, thus nothing to do, thus terminates
        done = True
        continue
    _observation, _reward, terminated, truncated, _info = env.step(action)
    done = terminated or truncated

if terminated:
    print("Success ! The plan worked in the actual environment !")
else:
    print("Failed ... Something went wrong with the plan or the episode was truncated.")
See hcraft.planning for a more complete description.
More about HierarchyCraft
Online documentation
Learn more in the DOCUMENTATION
Contributing
Want to contribute to HierarchyCraft? See our contribution guidelines and join us!
Custom purposes for agents in HierarchyCraft environments
HierarchyCraft allows users to specify custom purposes (one or multiple tasks) for agents in their environments. This feature provides a high degree of flexibility and allows users to design environments that are tailored to specific applications or scenarios. It also makes it possible to study multi-task or lifelong learning settings.
See hcraft.purpose for more details.
Solving behavior for all tasks of most HierarchyCraft environments
HierarchyCraft also includes solving behaviors that can be used to generate actions from observations that will complete most tasks in any HierarchyCraft environment, including user-designed ones. Solving behaviors are handcrafted, and may not work in some edge cases when some items are required in specific zones. This feature makes it easy for users to obtain a strong baseline in their custom environments.
See hcraft.solving_behaviors for more details.
Visualizing the underlying hierarchy of the environment (requirements graph)
HierarchyCraft gives the ability to visualize the hierarchy of the environment as a requirements graph. This graph provides a potentially complex but complete representation of what is required to obtain each item or to go in each zone, allowing users to easily understand the structure of the environment and identify its key items.
For example, here is the graph of the 'MiniCraftUnlock' environment where the goal is to open a door using a key:
And here is the much more complex graph of the 'MineHcraft' environment shown previously:
"""
.. include:: ../../README.md

## Custom purposes for agents in HierarchyCraft environments

HierarchyCraft allows users to specify custom purposes (one or multiple tasks) for agents in their environments.
This feature provides a high degree of flexibility and allows users to design environments that
are tailored to specific applications or scenarios.
It also makes it possible to study multi-task or lifelong learning settings.

See [`hcraft.purpose`](https://irll.github.io/HierarchyCraft/hcraft/purpose.html) for more details.

## Solving behavior for all tasks of most HierarchyCraft environments

HierarchyCraft also includes solving behaviors that can be used to generate actions
from observations that will complete most tasks in any HierarchyCraft environment, including user-designed ones.
Solving behaviors are handcrafted, and may not work in some edge cases when some items are required in specific zones.
This feature makes it easy for users to obtain a strong baseline in their custom environments.

See [`hcraft.solving_behaviors`](https://irll.github.io/HierarchyCraft/hcraft/solving_behaviors.html) for more details.

## Visualizing the underlying hierarchy of the environment (requirements graph)

HierarchyCraft gives the ability to visualize the hierarchy of the environment as a requirements graph.
This graph provides a potentially complex but complete representation of what is required
to obtain each item or to go in each zone, allowing users to easily understand the structure
of the environment and identify its key items.

For example, here is the graph of the 'MiniCraftUnlock' environment where the goal is to open a door using a key:

And here is the much more complex graph of the 'MineHcraft' environment shown previously:

See [`hcraft.requirements`](https://irll.github.io/HierarchyCraft/hcraft/requirements.html) for more details.

"""

import hcraft.state as state
import hcraft.solving_behaviors as solving_behaviors
import hcraft.purpose as purpose
import hcraft.transformation as transformation
import hcraft.requirements as requirements
import hcraft.env as env
import hcraft.examples as examples
import hcraft.world as world
import hcraft.planning as planning

from hcraft.elements import Item, Stack, Zone
from hcraft.transformation import Transformation
from hcraft.env import HcraftEnv, HcraftState
from hcraft.purpose import Purpose
from hcraft.render.human import get_human_action, render_env_with_human
from hcraft.task import GetItemTask, GoToZoneTask, PlaceItemTask


__all__ = [
    "HcraftState",
    "Transformation",
    "Item",
    "Stack",
    "Zone",
    "HcraftEnv",
    "get_human_action",
    "render_env_with_human",
    "Purpose",
    "GetItemTask",
    "GoToZoneTask",
    "PlaceItemTask",
    "state",
    "transformation",
    "purpose",
    "solving_behaviors",
    "requirements",
    "world",
    "env",
    "planning",
    "examples",
]
API Documentation
class HcraftState:
    """State manager of HierarchyCraft environments.

    The state of every HierarchyCraft environment is composed of three parts:
    * The player's inventory: `state.player_inventory`
    * The one-hot encoded player's position: `state.position`
    * All zones inventories: `state.zones_inventories`

    The mapping of items, zones, and zones items to their respective indexes is done through
    the given World. (See `hcraft.world`)

    """

    def __init__(self, world: "World") -> None:
        """
        Args:
            world: World to build the state for.
        """
        self.player_inventory = np.array([], dtype=np.int32)
        self.position = np.array([], dtype=np.int32)
        self.zones_inventories = np.array([], dtype=np.int32)

        self.discovered_items = np.array([], dtype=np.ubyte)
        self.discovered_zones = np.array([], dtype=np.ubyte)
        self.discovered_zones_items = np.array([], dtype=np.ubyte)
        self.discovered_transformations = np.array([], dtype=np.ubyte)

        self.world = world
        self.reset()

    @property
    def current_zone_inventory(self) -> np.ndarray:
        """Inventory of the zone where the player is."""
        if self.position.shape[0] == 0:
            return np.array([])  # No Zone
        return self.zones_inventories[self._current_zone_slot, :][0]

    @property
    def observation(self) -> np.ndarray:
        """The player's observation is a subset of the state.

        Only the inventory of the current zone is shown.

        """
        return np.concatenate(
            (
                self.player_inventory,
                self.position,
                self.current_zone_inventory,
            )
        )

    def amount_of(self, item: "Item", owner: Optional["Zone"] = "player") -> int:
        """Current amount of the given item owned by owner.

        Args:
            item: Item to get the amount of.
            owner: Owner of the inventory to check. Defaults to player.

        Returns:
            int: Amount of the item in the owner's inventory.
        """

        if owner in self.world.zones:
            zone_index = self.world.zones.index(owner)
            zone_item_index = self.world.zones_items.index(item)
            return int(self.zones_inventories[zone_index, zone_item_index])

        item_index = self.world.items.index(item)
        return int(self.player_inventory[item_index])

    def has_discovered(self, zone: "Zone") -> bool:
        """Whether the given zone was discovered.

        Args:
            zone (Zone): Zone to check.

        Returns:
            bool: True if the zone was discovered.
        """
        zone_index = self.world.zones.index(zone)
        return bool(self.discovered_zones[zone_index])

    @property
    def current_zone(self) -> Optional["Zone"]:
        """Current position of the player."""
        if self.world.n_zones == 0:
            return None
        return self.world.zones[self._current_zone_slot[0]]

    @property
    def _current_zone_slot(self) -> int:
        return self.position.nonzero()[0]

    @property
    def player_inventory_dict(self) -> Dict["Item", int]:
        """Current inventory of the player."""
        return self._inv_as_dict(self.player_inventory, self.world.items)

    @property
    def zones_inventories_dict(self) -> Dict["Zone", Dict["Item", int]]:
        """Current inventories of the current zone and each zone containing items."""
        zones_invs = {}
        for zone_slot, zone_inv in enumerate(self.zones_inventories):
            zone = self.world.zones[zone_slot]
            zone_inv = self._inv_as_dict(zone_inv, self.world.zones_items)
            if zone_slot == self._current_zone_slot or zone_inv:
                zones_invs[zone] = zone_inv
        return zones_invs

    def apply(self, action: int) -> bool:
        """Apply the given action to update the state.

        Args:
            action (int): Index of the transformation to apply.

        Returns:
            bool: True if the transformation was applied successfully. False otherwise.
        """
        choosen_transformation = self.world.transformations[action]
        if not choosen_transformation.is_valid(self):
            return False
        choosen_transformation.apply(
            self.player_inventory,
            self.position,
            self.zones_inventories,
        )
        self._update_discoveries(action)
        return True

    def reset(self) -> None:
        """Reset the state to its initial value."""
        self.player_inventory = np.zeros(self.world.n_items, dtype=np.int32)
        for stack in self.world.start_items:
            item_slot = self.world.items.index(stack.item)
            self.player_inventory[item_slot] = stack.quantity

        self.position = np.zeros(self.world.n_zones, dtype=np.int32)
        start_slot = 0  # Start in first Zone by default
        if self.world.start_zone is not None:
            start_slot = self.world.slot_from_zone(self.world.start_zone)
        if self.position.shape[0] > 0:
            self.position[start_slot] = 1

        self.zones_inventories = np.zeros(
            (self.world.n_zones, self.world.n_zones_items), dtype=np.int32
        )
        for zone, zone_stacks in self.world.start_zones_items.items():
            zone_slot = self.world.slot_from_zone(zone)
            for stack in zone_stacks:
                item_slot = self.world.zones_items.index(stack.item)
                self.zones_inventories[zone_slot, item_slot] = stack.quantity

        self.discovered_items = np.zeros(self.world.n_items, dtype=np.ubyte)
        self.discovered_zones_items = np.zeros(self.world.n_zones_items, dtype=np.ubyte)
        self.discovered_zones = np.zeros(self.world.n_zones, dtype=np.ubyte)
        self.discovered_transformations = np.zeros(
            len(self.world.transformations), dtype=np.ubyte
        )
        self._update_discoveries()

    def _update_discoveries(self, action: Optional[int] = None) -> None:
        self.discovered_items = np.bitwise_or(
            self.discovered_items, self.player_inventory > 0
        )
        self.discovered_zones_items = np.bitwise_or(
            self.discovered_zones_items, self.current_zone_inventory > 0
        )
        self.discovered_zones = np.bitwise_or(self.discovered_zones, self.position > 0)
        if action is not None:
            self.discovered_transformations[action] = 1

    @staticmethod
    def _inv_as_dict(inventory_array: np.ndarray, obj_registry: list):
        return {
            obj_registry[index]: value
            for index, value in enumerate(inventory_array)
            if value > 0
        }

    def as_dict(self) -> dict:
        state_dict = {
            "pos": self.current_zone,
            InventoryOwner.PLAYER.value: self.player_inventory_dict,
        }
        state_dict.update(self.zones_inventories_dict)
        return state_dict
State manager of HierarchyCraft environments.
The state of every HierarchyCraft environment is composed of three parts:
- The player's inventory: state.player_inventory
- The one-hot encoded player's position: state.position
- All zones inventories: state.zones_inventories
The mapping of items, zones, and zones items to their respective indexes is done through the given World. (See hcraft.world)
    def __init__(self, world: "World") -> None:
        """
        Args:
            world: World to build the state for.
        """
        self.player_inventory = np.array([], dtype=np.int32)
        self.position = np.array([], dtype=np.int32)
        self.zones_inventories = np.array([], dtype=np.int32)

        self.discovered_items = np.array([], dtype=np.ubyte)
        self.discovered_zones = np.array([], dtype=np.ubyte)
        self.discovered_zones_items = np.array([], dtype=np.ubyte)
        self.discovered_transformations = np.array([], dtype=np.ubyte)

        self.world = world
        self.reset()
Arguments:
- world: World to build the state for.
    @property
    def current_zone_inventory(self) -> np.ndarray:
        """Inventory of the zone where the player is."""
        if self.position.shape[0] == 0:
            return np.array([])  # No Zone
        return self.zones_inventories[self._current_zone_slot, :][0]
Inventory of the zone where the player is.
    @property
    def observation(self) -> np.ndarray:
        """The player's observation is a subset of the state.

        Only the inventory of the current zone is shown.

        """
        return np.concatenate(
            (
                self.player_inventory,
                self.position,
                self.current_zone_inventory,
            )
        )
The player's observation is a subset of the state.
Only the inventory of the current zone is shown.
    def amount_of(self, item: "Item", owner: Optional["Zone"] = "player") -> int:
        """Current amount of the given item owned by owner.

        Args:
            item: Item to get the amount of.
            owner: Owner of the inventory to check. Defaults to player.

        Returns:
            int: Amount of the item in the owner's inventory.
        """

        if owner in self.world.zones:
            zone_index = self.world.zones.index(owner)
            zone_item_index = self.world.zones_items.index(item)
            return int(self.zones_inventories[zone_index, zone_item_index])

        item_index = self.world.items.index(item)
        return int(self.player_inventory[item_index])
Current amount of the given item owned by owner.
Arguments:
- item: Item to get the amount of.
- owner: Owner of the inventory to check. Defaults to player.
Returns:
int: Amount of the item in the owner's inventory.
    def has_discovered(self, zone: "Zone") -> bool:
        """Whether the given zone was discovered.

        Args:
            zone (Zone): Zone to check.

        Returns:
            bool: True if the zone was discovered.
        """
        zone_index = self.world.zones.index(zone)
        return bool(self.discovered_zones[zone_index])
Whether the given zone was discovered.
Arguments:
- zone (Zone): Zone to check.
Returns:
bool: True if the zone was discovered.
    @property
    def current_zone(self) -> Optional["Zone"]:
        """Current position of the player."""
        if self.world.n_zones == 0:
            return None
        return self.world.zones[self._current_zone_slot[0]]
Current position of the player.
    @property
    def player_inventory_dict(self) -> Dict["Item", int]:
        """Current inventory of the player."""
        return self._inv_as_dict(self.player_inventory, self.world.items)
Current inventory of the player.
    @property
    def zones_inventories_dict(self) -> Dict["Zone", Dict["Item", int]]:
        """Current inventories of the current zone and each zone containing items."""
        zones_invs = {}
        for zone_slot, zone_inv in enumerate(self.zones_inventories):
            zone = self.world.zones[zone_slot]
            zone_inv = self._inv_as_dict(zone_inv, self.world.zones_items)
            if zone_slot == self._current_zone_slot or zone_inv:
                zones_invs[zone] = zone_inv
        return zones_invs
Current inventories of the current zone and each zone containing items.
    def apply(self, action: int) -> bool:
        """Apply the given action to update the state.

        Args:
            action (int): Index of the transformation to apply.

        Returns:
            bool: True if the transformation was applied successfully. False otherwise.
        """
        choosen_transformation = self.world.transformations[action]
        if not choosen_transformation.is_valid(self):
            return False
        choosen_transformation.apply(
            self.player_inventory,
            self.position,
            self.zones_inventories,
        )
        self._update_discoveries(action)
        return True
Apply the given action to update the state.
Arguments:
- action (int): Index of the transformation to apply.
Returns:
bool: True if the transformation was applied successfully. False otherwise.
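The check-then-apply pattern used by this method can be illustrated with a small standalone sketch. It uses plain dictionaries instead of the array-based inventories of hcraft, and the `is_valid` and `apply` helpers below are hypothetical stand-ins, not the library functions.

```python
def is_valid(inventory, removed):
    """A change is valid only if every removed item is available in sufficient quantity."""
    return all(inventory.get(item, 0) >= quantity for item, quantity in removed.items())


def apply(inventory, removed, added):
    """Apply the change in place and report whether it was applied."""
    if not is_valid(inventory, removed):
        return False  # Invalid: the inventory is left untouched
    for item, quantity in removed.items():
        inventory[item] -= quantity
    for item, quantity in added.items():
        inventory[item] = inventory.get(item, 0) + quantity
    return True


inventory = {"wood": 1}
first = apply(inventory, removed={"wood": 1}, added={"plank": 4})
second = apply(inventory, removed={"wood": 1}, added={"plank": 4})
print(first, second, inventory)
```

Returning a boolean rather than raising on an invalid action lets the caller decide how to handle illegal moves, for example by penalizing them or by masking them beforehand.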
    def reset(self) -> None:
        """Reset the state to its initial value."""
        self.player_inventory = np.zeros(self.world.n_items, dtype=np.int32)
        for stack in self.world.start_items:
            item_slot = self.world.items.index(stack.item)
            self.player_inventory[item_slot] = stack.quantity

        self.position = np.zeros(self.world.n_zones, dtype=np.int32)
        start_slot = 0  # Start in first Zone by default
        if self.world.start_zone is not None:
            start_slot = self.world.slot_from_zone(self.world.start_zone)
        if self.position.shape[0] > 0:
            self.position[start_slot] = 1

        self.zones_inventories = np.zeros(
            (self.world.n_zones, self.world.n_zones_items), dtype=np.int32
        )
        for zone, zone_stacks in self.world.start_zones_items.items():
            zone_slot = self.world.slot_from_zone(zone)
            for stack in zone_stacks:
                item_slot = self.world.zones_items.index(stack.item)
                self.zones_inventories[zone_slot, item_slot] = stack.quantity

        self.discovered_items = np.zeros(self.world.n_items, dtype=np.ubyte)
        self.discovered_zones_items = np.zeros(self.world.n_zones_items, dtype=np.ubyte)
        self.discovered_zones = np.zeros(self.world.n_zones, dtype=np.ubyte)
        self.discovered_transformations = np.zeros(
            len(self.world.transformations), dtype=np.ubyte
        )
        self._update_discoveries()
Reset the state to its initial value.
class Transformation:
    """The building blocks of every HierarchyCraft environment.

    A list of transformations is what defines each HierarchyCraft environment.
    Transformations become the available actions and all available transitions of the environment.

    Each transformation defines changes of:

    * the player inventory
    * the player position to a given destination
    * the current zone inventory
    * the destination zone inventory (if a destination is specified).
    * all specific zones inventories

    Each inventory change is a list of removed (-) and added (+) Stack.

    If specified, they may be restricted to only a subset of valid zones,
    all zones are valid by default.

    A Transformation can only be applied if valid in the given state.
    A transformation is only valid if the player is in a valid zone
    and all relevant inventories have enough items to be removed *before* adding new items.

    The picture below illustrates the impact of
    an example transformation on a given `hcraft.HcraftState`:
    <img
        src="https://raw.githubusercontent.com/IRLL/HierarchyCraft/master/docs/images/hcraft_transformation.png"
    width="90%"/>

    In this example, when applied, the transformation will:

    * <span style="color:red">(-)</span>
        Remove 1 item "0", then <span style="color:red">(+)</span>
        Add 4 item "3" in the <span style="color:red">player inventory</span>.
    * Update the <span style="color:gray">player position</span>
        from the <span style="color:green">current zone</span> "1"
        to the <span style="color:orange">destination zone</span> "3".
    * <span style="color:green">(-)</span>
        Remove 2 zone item "0" and 1 zone item "1", then <span style="color:green">(+)</span>
        Add 1 item "1" in the <span style="color:green">current zone</span> inventory.
    * <span style="color:orange">(-)</span>
        Remove 1 zone item "2", then <span style="color:orange">(+)</span>
        Add 1 item "0" in the <span style="color:orange">destination zone</span> inventory.
    * <span style="color:blue">(-)</span>
        Remove 1 zone item "0" in the zone "1" inventory
        and 2 zone item "2" in the zone "2" inventory,
        then <span style="color:blue">(+)</span>
        Add 1 zone item "1" in the zone "0" inventory
        and 1 zone item "2" in the zone "1" inventory.

    """

    def __init__(
        self,
        name: Optional[str] = None,
        destination: Optional[Zone] = None,
        inventory_changes: Optional[List[InventoryChange]] = None,
        zone: Optional[Zone] = None,
    ) -> None:
        """The building blocks of every HierarchyCraft environment.

        Args:
            name: Name given to the Transformation. If None use repr instead.
                Defaults to None.
            destination: Destination zone.
                Defaults to None.
            inventory_changes: List of inventory changes done by this transformation.
                Defaults to None.
            zone: Zone to which Transformation is restricted. Unrestricted if None.
                Defaults to None.
        """
        self.destination = destination
        self._destination = None

        self.zone = zone
        self._zone = None

        self._changes_list = inventory_changes
        self.inventory_changes = _format_inventory_changes(inventory_changes)
        self._inventory_operations: Optional[
            Dict[InventoryOwner, InventoryOperations]
        ] = None

        self.name = name if name is not None else self.__repr__()

    def apply(
        self,
        player_inventory: np.ndarray,
        position: np.ndarray,
        zones_inventories: np.ndarray,
    ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
        """Apply the transformation in place on the given state."""

        for owner, operations in self._inventory_operations.items():
            operation_arr = operations[InventoryOperation.APPLY]
            if operation_arr is not None:
                _update_inventory(
                    owner,
                    player_inventory,
                    position,
                    zones_inventories,
                    self._destination,
                    operation_arr,
                )
        if self._destination is not None:
            position[...] = self._destination

    def is_valid(self, state: "HcraftState") -> bool:
        """Is the transformation valid in the given state?"""
        if not self._is_valid_position(state.position):
            return False
        if not self._is_valid_player_inventory(state.player_inventory):
            return False
        if not self._is_valid_zones_inventory(state.zones_inventories, state.position):
            return False
        return True

    def build(self, world: "World") -> None:
        """Build the transformation array operations on the given world."""
        self._build_destination_op(world)
        self._build_inventory_ops(world)
        self._build_zones_op(world)

    def get_changes(
        self, owner: InventoryOwner, operation: InventoryOperation, default: Any = None
    ) -> Optional[Union[List[Stack], Dict[Zone, List[Stack]]]]:
        """Get individual changes for a given owner and a given operation.

        Args:
            owner: Owner of the inventory changes to get.
            operation: Operation on the inventory to get.

        Returns:
            Changes of the inventory of the given owner with the given operation.
        """
        owner = InventoryOwner(owner)
        operation = InventoryOperation(operation)
        operations = self.inventory_changes.get(owner, {})
        return operations.get(operation, default)

    def production(self, owner: InventoryOwner) -> Set["Item"]:
        """Set of produced items for the given owner by this transformation."""
        return self._relevant_items_changed(owner, InventoryOperation.ADD)

    def consumption(self, owner: InventoryOwner) -> Set["Item"]:
        """Set of consumed items for the given owner by this transformation."""
        return self._relevant_items_changed(owner, InventoryOperation.REMOVE)

    def min_required(self, owner: InventoryOwner) -> Set["Item"]:
        """Set of items for which a minimum is required by this transformation
        for the given owner."""
        return self._relevant_items_changed(owner, InventoryOperation.MIN)

    def max_required(self, owner: InventoryOwner) -> Set["Item"]:
        """Set of items for which a maximum is required by this transformation
        for the given owner."""
        return self._relevant_items_changed(owner, InventoryOperation.MAX)

    @property
    def produced_zones_items(self) -> Set["Item"]:
        """Set of produced zones items by this transformation."""
        return (
            self.production(CURRENT_ZONE)
            | self.production(DESTINATION)
            | self.production(InventoryOwner.ZONES)
        )

    @property
    def consumed_zones_items(self) -> Set["Item"]:
        """Set of consumed zones items by this transformation."""
        return (
            self.consumption(CURRENT_ZONE)
            | self.consumption(DESTINATION)
            | self.consumption(InventoryOwner.ZONES)
        )

    @property
    def min_required_zones_items(self) -> Set["Item"]:
        """Set of zone items for which a minimum is required by this transformation."""
        return (
            self.min_required(CURRENT_ZONE)
            | self.min_required(DESTINATION)
            | self.min_required(InventoryOwner.ZONES)
        )

    @property
    def max_required_zones_items(self) -> Set["Item"]:
        """Set of zone items for which a maximum is required by this transformation."""
        return (
            self.max_required(CURRENT_ZONE)
            | self.max_required(DESTINATION)
            | self.max_required(InventoryOwner.ZONES)
        )

    def _relevant_items_changed(
        self, owner: InventoryOwner, operation: InventoryOperation
    ):
        added_stacks = self.get_changes(owner, operation)
        items = set()

        if added_stacks:
            if owner is not InventoryOwner.ZONES:
                return _items_from_stack_list(added_stacks)

            for _zone, stacks in added_stacks.items():
                items |= _items_from_stack_list(stacks)

        return items

    def _is_valid_position(self, position: np.ndarray):
        if self._zone is not None and not np.any(np.multiply(self._zone, position)):
            return False
        if self._destination is not None and np.all(self._destination == position):
            return False
        return True

    def _is_valid_inventory(
        self,
        inventory: np.ndarray,
        added: Optional[np.ndarray],
        removed: Optional[np.ndarray],
        max_items: Optional[np.ndarray],
        min_items: Optional[np.ndarray],
    ):
        added = 0 if added is None else added
        removed = 0 if removed is None else removed
        if max_items is not None and np.any(inventory > max_items):
            return False
        if min_items is not None and np.any(inventory < min_items):
            return False
        return True

    def _is_valid_player_inventory(self, player_inventory: np.ndarray):
        items_changes = self._inventory_operations.get(InventoryOwner.PLAYER, {})
        added = items_changes.get(InventoryOperation.ADD, 0)
        removed = items_changes.get(InventoryOperation.REMOVE)
        max_items = items_changes.get(InventoryOperation.MAX)
        min_items = items_changes.get(InventoryOperation.MIN)
        return self._is_valid_inventory(
            player_inventory, added, removed, max_items, min_items
        )

    def _is_valid_zones_inventory(
        self, zones_inventories: np.ndarray, position: np.ndarray
    ):
        if zones_inventories.size == 0:
            return True

        # Specific zones operations
        zones_changes = self._inventory_operations.get(InventoryOwner.ZONES, {})
        zeros = np.zeros_like(zones_inventories)
        added = zones_changes.get(InventoryOperation.ADD, zeros.copy())
        removed = zones_changes.get(InventoryOperation.REMOVE, zeros.copy())
        infs = np.inf * np.ones_like(zones_inventories)
        max_items = zones_changes.get(InventoryOperation.MAX, infs.copy())
        min_items = zones_changes.get(InventoryOperation.MIN, zeros.copy())

        # Current zone
        current_changes = self._inventory_operations.get(InventoryOwner.CURRENT, {})
        current_slot = position.nonzero()[0]
        added[current_slot] += current_changes.get(InventoryOperation.ADD, 0)
        removed[current_slot] += current_changes.get(InventoryOperation.REMOVE, 0)
        max_items[current_slot] = np.minimum(
            max_items[current_slot],
            current_changes.get(InventoryOperation.MAX, np.inf),
        )
        min_items[current_slot] = np.maximum(
            min_items[current_slot],
            current_changes.get(InventoryOperation.MIN, -np.inf),
        )

        # Destination
        if self._destination is not None:
            dest_changes = self._inventory_operations.get(
                InventoryOwner.DESTINATION, {}
            )
            dest_slot = self._destination.nonzero()[0]
            added[dest_slot] += dest_changes.get(InventoryOperation.ADD, 0)
            removed[dest_slot] += dest_changes.get(InventoryOperation.REMOVE, 0)
            max_items[dest_slot] = np.minimum(
                max_items[dest_slot],
                dest_changes.get(InventoryOperation.MAX, np.inf),
            )
            min_items[dest_slot] = np.maximum(
                min_items[dest_slot],
                dest_changes.get(InventoryOperation.MIN, -np.inf),
            )

        return self._is_valid_inventory(
            zones_inventories, added, removed, max_items, min_items
        )

    def _build_destination_op(self, world: "World") -> None:
        if self.destination is None:
            return
        self._destination = np.zeros(world.n_zones, dtype=np.int32)
        self._destination[world.slot_from_zone(self.destination)] = 1

    def _build_zones_op(self, world: "World") -> None:
        if self.zone is None:
            return
        self._zone = np.zeros(world.n_zones, dtype=np.int32)
        self._zone[world.slot_from_zone(self.zone)] = 1

    def _build_inventory_ops(self, world: "World"):
        self._inventory_operations = {}
        for owner, operations in self.inventory_changes.items():
            self._build_inventory_operation(owner, operations, world)
        self._build_apply_operations()

    def _build_inventory_operation(
        self, owner: InventoryOwner, operations: InventoryChanges, world: "World"
    ):
        owner = InventoryOwner(owner)
        if owner is InventoryOwner.PLAYER:
            world_items_list = world.items
        else:
            world_items_list = world.zones_items

        for operation, stacks in operations.items():
            operation = InventoryOperation(operation)
            default_value = 0
            if operation is InventoryOperation.MAX:
                default_value = np.inf
            if owner is InventoryOwner.ZONES:
                operation_arr = self._build_zones_items_op(
                    stacks, world.zones, world.zones_items, default_value
                )
            else:
                operation_arr = self._build_operation_array(
                    stacks, world_items_list, default_value
                )
            if owner not in self._inventory_operations:
                self._inventory_operations[owner] = {}
            self._inventory_operations[owner][operation] = operation_arr

    def _build_apply_operations(self):
        for owner, operations in self._inventory_operations.items():
            apply_op = InventoryOperation.APPLY
            apply_arr = _build_apply_operation_array(operations)
            self._inventory_operations[owner][apply_op] = apply_arr

    def _build_operation_array(
        self,
        stacks: List[Stack],
        world_items_list: List["Item"],
        default_value: int = 0,
    ) -> np.ndarray:
        operation = default_value * np.ones(len(world_items_list),
dtype=np.int32) 591 for stack in stacks: 592 item_slot = world_items_list.index(stack.item) 593 operation[item_slot] = stack.quantity 594 return operation 595 596 def _build_zones_items_op( 597 self, 598 stacks_per_zone: Dict[Zone, List["Stack"]], 599 zones: List[Zone], 600 zones_items: List["Item"], 601 default_value: float = 0.0, 602 ) -> np.ndarray: 603 operation = default_value * np.ones( 604 (len(zones), len(zones_items)), dtype=np.int32 605 ) 606 for zone, stacks in stacks_per_zone.items(): 607 zone_slot = zones.index(zone) 608 for stack in stacks: 609 item_slot = zones_items.index(stack.item) 610 operation[zone_slot, item_slot] = stack.quantity 611 return operation 612 613 def __str__(self) -> str: 614 return self.name 615 616 def __repr__(self) -> str: 617 return f"{self._preconditions_repr()}⟹{self._effects_repr()}" 618 619 def _preconditions_repr(self) -> str: 620 preconditions_text = "" 621 622 owners_brackets = { 623 PLAYER: ".", 624 CURRENT_ZONE: "Zone(.)", 625 DESTINATION: "Dest(.)", 626 } 627 628 for owner in InventoryOwner: 629 if owner is InventoryOwner.ZONES: 630 continue 631 owner_texts = [] 632 owner_texts += _stacks_precontions_str( 633 self.get_changes(owner, InventoryOperation.MIN), 634 symbol="≥", 635 ) 636 owner_texts += _stacks_precontions_str( 637 self.get_changes(owner, InventoryOperation.MAX), 638 symbol="≤", 639 ) 640 stacks_text = ",".join(owner_texts) 641 if not owner_texts: 642 continue 643 if preconditions_text: 644 preconditions_text += " " 645 preconditions_text += owners_brackets[owner].replace(".", stacks_text) 646 647 zones_specific_ops: Dict[Zone, Dict[InventoryOperation, List[Stack]]] = {} 648 for op, zones_stacks in self.inventory_changes.get( 649 InventoryOwner.ZONES, {} 650 ).items(): 651 for zone, stacks in zones_stacks.items(): 652 if zone not in zones_specific_ops: 653 zones_specific_ops[zone] = {} 654 if op not in zones_specific_ops[zone]: 655 zones_specific_ops[zone][op] = [] 656 zones_specific_ops[zone][op] += 
stacks 657 658 for zone, operations in zones_specific_ops.items(): 659 owner_texts = [] 660 owner_texts += _stacks_precontions_str( 661 operations.get(InventoryOperation.MIN, []), 662 symbol="≥", 663 ) 664 owner_texts += _stacks_precontions_str( 665 operations.get(InventoryOperation.MAX, []), 666 symbol="≤", 667 ) 668 stacks_text = ",".join(owner_texts) 669 if not owner_texts: 670 continue 671 if preconditions_text: 672 preconditions_text += " " 673 preconditions_text += f"{zone.name}({stacks_text})" 674 675 if self.zone is not None: 676 if preconditions_text: 677 preconditions_text += " " 678 preconditions_text += f"| at {self.zone.name}" 679 680 if preconditions_text: 681 preconditions_text += " " 682 683 return preconditions_text 684 685 def _effects_repr(self) -> str: 686 effects_text = "" 687 owners_brackets = { 688 PLAYER: ".", 689 CURRENT_ZONE: "Zone(.)", 690 DESTINATION: "Dest(.)", 691 } 692 693 for owner in InventoryOwner: 694 if owner is InventoryOwner.ZONES: 695 continue 696 owner_texts = [] 697 owner_texts += _stacks_effects_str( 698 self.get_changes(owner, InventoryOperation.REMOVE), 699 stack_prefix="-", 700 ) 701 owner_texts += _stacks_effects_str( 702 self.get_changes(owner, InventoryOperation.ADD), 703 stack_prefix="+", 704 ) 705 stacks_text = ",".join(owner_texts) 706 if not owner_texts: 707 continue 708 effects_text += " " 709 effects_text += owners_brackets[owner].replace(".", stacks_text) 710 711 zones_specific_ops: Dict[Zone, Dict[InventoryOperation, List[Stack]]] = {} 712 for op, zones_stacks in self.inventory_changes.get( 713 InventoryOwner.ZONES, {} 714 ).items(): 715 for zone, stacks in zones_stacks.items(): 716 if zone not in zones_specific_ops: 717 zones_specific_ops[zone] = {} 718 if op not in zones_specific_ops[zone]: 719 zones_specific_ops[zone][op] = [] 720 zones_specific_ops[zone][op] += stacks 721 722 for zone, operations in zones_specific_ops.items(): 723 owner_texts = [] 724 owner_texts += _stacks_effects_str( 725 
operations.get(InventoryOperation.REMOVE, []), 726 stack_prefix="-", 727 ) 728 owner_texts += _stacks_effects_str( 729 operations.get(InventoryOperation.ADD, []), 730 stack_prefix="+", 731 ) 732 stacks_text = ",".join(owner_texts) 733 if not owner_texts: 734 continue 735 effects_text += " " 736 effects_text += f"{zone.name}({stacks_text})" 737 738 if self.destination is not None: 739 effects_text += " " 740 effects_text += f"| at {self.destination.name}" 741 742 return effects_text
The building blocks of every HierarchyCraft environment.
A list of transformations is what defines each HierarchyCraft environment. Transformations become the available actions and define all available transitions of the environment.
Each transformation defines changes of:
- the player inventory
- the player position to a given destination
- the current zone inventory
- the destination zone inventory (if a destination is specified)
- all specific zones inventories
Each inventory change is a list of removed (-) and added (+) Stacks.
If specified, a transformation may be restricted to a subset of valid zones; all zones are valid by default.
A Transformation can only be applied if it is valid in the given state. A transformation is valid only if the player is in a valid zone and all relevant inventories hold enough of the items to be removed, before any new items are added.
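The validity rule can be sketched in a few lines of plain Python. This is an illustration only, not hcraft's implementation: `ToyTransformation` and its fields are hypothetical stand-ins that use dictionaries instead of the library's arrays and cover only the player inventory.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToyTransformation:
    """Hypothetical stand-in for a Transformation (not the hcraft class)."""

    removed: Dict[str, int] = field(default_factory=dict)
    added: Dict[str, int] = field(default_factory=dict)
    zone: Optional[str] = None  # None means the transformation is unrestricted

    def is_valid(self, inventory: Dict[str, int], current_zone: str) -> bool:
        # The player must be in the required zone, if one is given.
        if self.zone is not None and current_zone != self.zone:
            return False
        # Every removed item must be available in sufficient quantity.
        return all(
            inventory.get(item, 0) >= quantity
            for item, quantity in self.removed.items()
        )

    def apply(self, inventory: Dict[str, int]) -> None:
        # Removals happen first, then additions, both in place.
        for item, quantity in self.removed.items():
            inventory[item] -= quantity
        for item, quantity in self.added.items():
            inventory[item] = inventory.get(item, 0) + quantity


craft = ToyTransformation(removed={"wood": 2}, added={"plank": 4}, zone="workshop")
inventory = {"wood": 3}

valid_in_forest = craft.is_valid(inventory, current_zone="forest")      # wrong zone
valid_in_workshop = craft.is_valid(inventory, current_zone="workshop")  # enough wood
if valid_in_workshop:
    craft.apply(inventory)
```

After the call, the toy inventory holds 1 wood and 4 planks. A second application would be rejected because only 1 wood remains.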
The picture below illustrates the impact of an example transformation on a given HcraftState:
In this example, when applied, the transformation will:
- (-) Remove 1 item "0", then (+) Add 4 items "3" in the player inventory.
- Update the player position from the current zone "1" to the destination zone "3".
- (-) Remove 2 zone items "0" and 1 zone item "1", then (+) Add 1 item "1" in the current zone inventory.
- (-) Remove 1 zone item "2", then (+) Add 1 item "0" in the destination zone inventory.
- (-) Remove 1 zone item "0" in the zone "1" inventory and 2 zone items "2" in the zone "2" inventory, then (+) Add 1 zone item "1" in the zone "0" inventory and 1 zone item "2" in the zone "1" inventory.
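The combined effect of the zone-related bullets can be checked by summing them into one net change per zone. The sketch below is plain arithmetic rather than hcraft code; it assumes a world with 4 zones and 3 zone items (the smallest sizes compatible with the example) and treats zone and item names as indices.

```python
from typing import List, Tuple

N_ZONES, N_ZONE_ITEMS = 4, 3  # assumed sizes, just large enough for the example
CURRENT, DESTINATION = 1, 3   # zone indices taken from the example

# (zone index, zone item index, signed quantity) for each change listed above
zone_changes: List[Tuple[int, int, int]] = [
    (CURRENT, 0, -2), (CURRENT, 1, -1), (CURRENT, 1, +1),  # current zone
    (DESTINATION, 2, -1), (DESTINATION, 0, +1),            # destination zone
    (1, 0, -1), (2, 2, -2), (0, 1, +1), (1, 2, +1),        # specific zones
]

# Accumulate every change into a zones x zone-items table of net quantities.
net = [[0] * N_ZONE_ITEMS for _ in range(N_ZONES)]
for zone, item, quantity in zone_changes:
    net[zone][item] += quantity

for zone, row in enumerate(net):
    print(f"zone {zone}: {row}")
```

Under this reading, zone "1" is affected twice, once as the current zone and once as a specific zone, so it loses 3 zone items "0" in total.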
    def __init__(
        self,
        name: Optional[str] = None,
        destination: Optional[Zone] = None,
        inventory_changes: Optional[List[InventoryChange]] = None,
        zone: Optional[Zone] = None,
    ) -> None:
        """The building blocks of every HierarchyCraft environment.

        Args:
            name: Name given to the Transformation. If None use repr instead.
                Defaults to None.
            destination: Destination zone.
                Defaults to None.
            inventory_changes: List of inventory changes done by this transformation.
                Defaults to None.
            zone: Zone to which Transformation is restricted. Unrestricted if None.
                Defaults to None.
        """
        self.destination = destination
        self._destination = None

        self.zone = zone
        self._zone = None

        self._changes_list = inventory_changes
        self.inventory_changes = _format_inventory_changes(inventory_changes)
        self._inventory_operations: Optional[
            Dict[InventoryOwner, InventoryOperations]
        ] = None

        self.name = name if name is not None else self.__repr__()
The building blocks of every HierarchyCraft environment.
Arguments:
- name: Name given to the Transformation. If None use repr instead. Defaults to None.
- destination: Destination zone. Defaults to None.
- inventory_changes: List of inventory changes done by this transformation. Defaults to None.
- zone: Zone to which Transformation is restricted. Unrestricted if None. Defaults to None.
    def apply(
        self,
        player_inventory: np.ndarray,
        position: np.ndarray,
        zones_inventories: np.ndarray,
    ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
        """Apply the transformation in place on the given state."""

        for owner, operations in self._inventory_operations.items():
            operation_arr = operations[InventoryOperation.APPLY]
            if operation_arr is not None:
                _update_inventory(
                    owner,
                    player_inventory,
                    position,
                    zones_inventories,
                    self._destination,
                    operation_arr,
                )
        if self._destination is not None:
            position[...] = self._destination
Apply the transformation in place on the given state.
    def is_valid(self, state: "HcraftState") -> bool:
        """Is the transformation valid in the given state?"""
        if not self._is_valid_position(state.position):
            return False
        if not self._is_valid_player_inventory(state.player_inventory):
            return False
        if not self._is_valid_zones_inventory(state.zones_inventories, state.position):
            return False
        return True
Is the transformation valid in the given state?
    def build(self, world: "World") -> None:
        """Build the transformation array operations on the given world."""
        self._build_destination_op(world)
        self._build_inventory_ops(world)
        self._build_zones_op(world)
Build the transformation array operations on the given world.
    def get_changes(
        self, owner: InventoryOwner, operation: InventoryOperation, default: Any = None
    ) -> Optional[Union[List[Stack], Dict[Zone, List[Stack]]]]:
        """Get individual changes for a given owner and a given operation.

        Args:
            owner: Owner of the inventory changes to get.
            operation: Operation on the inventory to get.

        Returns:
            Changes of the inventory of the given owner with the given operation.
        """
        owner = InventoryOwner(owner)
        operation = InventoryOperation(operation)
        operations = self.inventory_changes.get(owner, {})
        return operations.get(operation, default)
Get individual changes for a given owner and a given operation.
Arguments:
- owner: Owner of the inventory changes to get.
- operation: Operation on the inventory to get.
Returns:
Changes of the inventory of the given owner with the given operation.
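The lookup performed by `get_changes` is a two-level dictionary access with a fallback value. The sketch below reproduces that pattern with hypothetical stand-ins: the `Owner` and `Operation` enums, their string values, and the tuple-based stacks are assumptions made for illustration, not the library's actual definitions.

```python
from enum import Enum
from typing import Any, Dict, List, Tuple


class Owner(Enum):  # hypothetical stand-in for InventoryOwner
    PLAYER = "player"
    CURRENT = "current_zone"


class Operation(Enum):  # hypothetical stand-in for InventoryOperation
    REMOVE = "remove"
    ADD = "add"


Stacks = List[Tuple[str, int]]  # (item name, quantity)

changes: Dict[Owner, Dict[Operation, Stacks]] = {
    Owner.PLAYER: {Operation.REMOVE: [("wood", 2)], Operation.ADD: [("plank", 4)]},
}


def get_changes(owner: Any, operation: Any, default: Any = None) -> Any:
    # Coerce raw values to enum members, as the documented method does.
    owner = Owner(owner)
    operation = Operation(operation)
    # A missing owner yields an empty mapping, so the default is returned.
    return changes.get(owner, {}).get(operation, default)


added = get_changes("player", "add")                      # raw values are coerced
missing = get_changes(Owner.CURRENT, Operation.ADD)       # no entry: None
fallback = get_changes(Owner.CURRENT, Operation.ADD, [])  # explicit default
```

Passing an explicit default such as an empty list lets callers iterate over the result without checking for None first.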
    def production(self, owner: InventoryOwner) -> Set["Item"]:
        """Set of produced items for the given owner by this transformation."""
        return self._relevant_items_changed(owner, InventoryOperation.ADD)
Set of produced items for the given owner by this transformation.
    def consumption(self, owner: InventoryOwner) -> Set["Item"]:
        """Set of consumed items for the given owner by this transformation."""
        return self._relevant_items_changed(owner, InventoryOperation.REMOVE)
Set of consumed items for the given owner by this transformation.
    def min_required(self, owner: InventoryOwner) -> Set["Item"]:
        """Set of items for which a minimum is required by this transformation
        for the given owner."""
        return self._relevant_items_changed(owner, InventoryOperation.MIN)
Set of items for which a minimum is required by this transformation for the given owner.
    def max_required(self, owner: InventoryOwner) -> Set["Item"]:
        """Set of items for which a maximum is required by this transformation
        for the given owner."""
        return self._relevant_items_changed(owner, InventoryOperation.MAX)
Set of items for which a maximum is required by this transformation for the given owner.
    @property
    def produced_zones_items(self) -> Set["Item"]:
        """Set of produced zones items by this transformation."""
        return (
            self.production(CURRENT_ZONE)
            | self.production(DESTINATION)
            | self.production(InventoryOwner.ZONES)
        )
Set of zone items produced by this transformation.
    @property
    def consumed_zones_items(self) -> Set["Item"]:
        """Set of consumed zones items by this transformation."""
        return (
            self.consumption(CURRENT_ZONE)
            | self.consumption(DESTINATION)
            | self.consumption(InventoryOwner.ZONES)
        )
Set of zone items consumed by this transformation.
    @property
    def min_required_zones_items(self) -> Set["Item"]:
        """Set of zone items for which a minimum is required by this transformation."""
        return (
            self.min_required(CURRENT_ZONE)
            | self.min_required(DESTINATION)
            | self.min_required(InventoryOwner.ZONES)
        )
Set of zone items for which a minimum is required by this transformation.
    @property
    def max_required_zones_items(self) -> Set["Item"]:
        """Set of zone items for which a maximum is required by this transformation."""
        return (
            self.max_required(CURRENT_ZONE)
            | self.max_required(DESTINATION)
            | self.max_required(InventoryOwner.ZONES)
        )
Set of zone items for which a maximum is required by this transformation.
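The four `*_zones_items` properties share one pattern: a set union over the current zone, the destination zone, and every specific zone. A minimal sketch of that pattern follows, using made-up item and zone names and tuple-based stacks as assumptions; the per-zone case needs one extra level of iteration because its stacks are grouped by zone.

```python
from typing import Dict, List, Set, Tuple

Stacks = List[Tuple[str, int]]  # (item name, quantity)

# Hypothetical additions made by one transformation, grouped by owner.
added_current: Stacks = [("fire", 1)]
added_destination: Stacks = [("footprint", 1)]
added_by_zone: Dict[str, Stacks] = {"forest": [("stump", 1)], "mine": [("fire", 2)]}


def items_of(stacks: Stacks) -> Set[str]:
    """Keep only the item of each stack, dropping quantities."""
    return {item for item, _quantity in stacks}


produced_zones_items = (
    items_of(added_current)
    | items_of(added_destination)
    # Specific zones: flatten the per-zone stack lists before collecting items.
    | {item for stacks in added_by_zone.values() for item in items_of(stacks)}
)
```

Because the result is a set, "fire" appears once even though it is added both in the current zone and in the "mine" zone.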
@dataclass(frozen=True)
class Item:
    """Represent an item for any hcraft environement."""

    name: str
Represent an item for any hcraft environment.
@dataclass(frozen=True)
class Stack:
    """Represent a stack of an item for any hcraft environement"""

    item: Item
    quantity: int = 1

    def __str__(self) -> str:
        quantity_str = f"[{self.quantity}]" if self.quantity > 1 else ""
        return f"{quantity_str}{self.item.name}"
Represent a stack of an item for any hcraft environment.
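The listing above is enough to see how stacks are displayed and compared. The snippet below redeclares the two dataclasses locally, without their docstrings, so that it runs without hcraft installed; the item names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str


@dataclass(frozen=True)
class Stack:
    item: Item
    quantity: int = 1

    def __str__(self) -> str:
        # The quantity prefix is only shown for stacks of more than one item.
        quantity_str = f"[{self.quantity}]" if self.quantity > 1 else ""
        return f"{quantity_str}{self.item.name}"


wood = Item("wood")
single = str(Stack(wood))      # quantity defaults to 1, so no prefix
several = str(Stack(wood, 4))  # prefixed with the quantity in brackets

# Frozen dataclasses compare by value and are hashable,
# so equal items collapse in sets and can serve as dictionary keys.
unique_items = {Item("wood"), Item("wood"), Item("stone")}
```

Value-based equality is what allows a world to look up an item's slot with `list.index`, as the operation-building code of Transformation does.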
@dataclass(frozen=True)
class Zone:
    """Represent a zone for any hcraft environement."""

    name: str
Represent a zone for any hcraft environment.
class HcraftEnv(Env):
    """Environment to simulate inventory management."""

    def __init__(
        self,
        world: "World",
        purpose: Optional[Union[Purpose, List["Task"], "Task"]] = None,
        invalid_reward: float = -1.0,
        render_window: Optional[HcraftWindow] = None,
        name: str = "HierarchyCraft",
        max_step: Optional[int] = None,
    ) -> None:
        """
        Args:
            world: World defining the environment.
            purpose: Purpose of the player, defining rewards and termination.
                Defaults to None, hence a sandbox environment.
            invalid_reward: Reward given to the agent for invalid actions.
                Defaults to -1.0.
            render_window: Window using to render the environment with pygame.
            name: Name of the environement. Defaults to 'HierarchyCraft'.
            max_step: (Optional[int], optional): Maximum number of steps before episode truncation.
                If None, never truncates the episode. Defaults to None.
        """
        self.world = world
        self.invalid_reward = invalid_reward
        self.max_step = max_step
        self.name = name
        self._all_behaviors = None

        self.render_window = render_window
        self.render_mode = "rgb_array"

        self.state = HcraftState(self.world)
        self.current_step = 0
        self.current_score = 0
        self.cumulated_score = 0
        self.episodes = 0
        self.task_successes: Optional[SuccessCounter] = None
        self.terminal_successes: Optional[SuccessCounter] = None

        if purpose is None:
            purpose = Purpose(None)
        if not isinstance(purpose, Purpose):
            purpose = Purpose(tasks=purpose)
        self.purpose = purpose
        self.metadata = {}

    @property
    def truncated(self) -> bool:
        """Whether the time limit has been exceeded."""
        if self.max_step is None:
            return False
        return self.current_step >= self.max_step

    @property
    def observation_space(self) -> Union[BoxSpace, TupleSpace]:
        """Observation space for the Agent."""
        obs_space = BoxSpace(
            low=np.array(
                [0 for _ in range(self.world.n_items)]
                + [0 for _ in range(self.world.n_zones)]
                + [0 for _ in range(self.world.n_zones_items)]
            ),
            high=np.array(
                [np.inf for _ in range(self.world.n_items)]
                + [1 for _ in range(self.world.n_zones)]
                + [np.inf for _ in range(self.world.n_zones_items)]
            ),
        )

        return obs_space

    @property
    def action_space(self) -> DiscreteSpace:
        """Action space for the Agent.

        Actions are expected to often be invalid.
        """
        return DiscreteSpace(len(self.world.transformations))

    def action_masks(self) -> np.ndarray:
        """Return boolean mask of valid actions."""
        return np.array([t.is_valid(self.state) for t in self.world.transformations])

    def step(
        self, action: Union[int, str, np.ndarray]
    ) -> Tuple[np.ndarray, float, bool, bool, dict]:
        """Perform one step in the environment given the index of a wanted transformation.

        If the selected transformation can be performed, the state is updated and
        a reward is given depending of the environment tasks.
        Else the state is left unchanged and the `invalid_reward` is given to the player.

        """

        if isinstance(action, np.ndarray):
            if not action.size == 1:
                raise TypeError(
                    "Actions should be integers corresponding the a transformation index"
                    f", got array with multiple elements:\n{action}."
                )
            action = action.flatten()[0]
        try:
            action = int(action)
        except (TypeError, ValueError) as e:
            raise TypeError(
                "Actions should be integers corresponding the a transformation index."
            ) from e

        self.current_step += 1

        self.task_successes.step_reset()
        self.terminal_successes.step_reset()

        success = self.state.apply(action)
        if success:
            reward = self.purpose.reward(self.state)
        else:
            reward = self.invalid_reward

        terminated = self.purpose.is_terminal(self.state)

        self.task_successes.update(self.episodes)
        self.terminal_successes.update(self.episodes)

        self.current_score += reward
        self.cumulated_score += reward
        return (
            self.state.observation,
            reward,
            terminated,
            self.truncated,
            self.infos(),
        )

    def render(self, mode: Optional[str] = None, **_kwargs) -> Union[str, np.ndarray]:
        """Render the observation of the agent in a format depending on `render_mode`."""
        if mode is not None:
            self.render_mode = mode

        if self.render_mode in ("human", "rgb_array"):  # for human interaction
            return self._render_rgb_array()
        if self.render_mode == "console":  # for console print
            raise NotImplementedError
        raise NotImplementedError

    def reset(
        self,
        *,
        seed: Optional[int] = None,
        options: Optional[dict] = None,
    ) -> Tuple[np.ndarray,]:
        """Resets the state of the environement.

        Returns:
            (np.ndarray): The first observation.
        """

        if not self.purpose.built:
            self.purpose.build(self)
            self.task_successes = SuccessCounter(self.purpose.tasks)
            self.terminal_successes = SuccessCounter(self.purpose.terminal_groups)

        self.current_step = 0
        self.current_score = 0
        self.episodes += 1

        self.task_successes.new_episode(self.episodes)
        self.terminal_successes.new_episode(self.episodes)

        self.state.reset()
        self.purpose.reset()
        return self.state.observation, self.infos()

    def close(self):
        """Closes the environment."""
        if self.render_window is not None:
            self.render_window.close()

    @property
    def all_behaviors(self) -> Dict[str, "Behavior"]:
        """All solving behaviors using hebg."""
        if self._all_behaviors is None:
            self._all_behaviors = build_all_solving_behaviors(self)
        return self._all_behaviors

    def solving_behavior(self, task: "Task") -> "Behavior":
        """Get the solving behavior for a given task.

        Args:
            task: Task to solve.

        Returns:
            Behavior: Behavior solving the task.

        Example:
            ```python
            solving_behavior = env.solving_behavior(task)

            done = False
            observation, _info = env.reset()
            while not done:
                action = solving_behavior(observation)
                observation, _reward, terminated, truncated, _info = env.step(action)
                done = terminated or truncated

            assert terminated  # Env is successfuly terminated
            assert task.is_terminated # Task is successfuly terminated
            ```
        """
        return self.all_behaviors[task_to_behavior_name(task)]

    def planning_problem(self, **kwargs) -> HcraftPlanningProblem:
        """Build this hcraft environment planning problem.

        Returns:
            Problem: Unified planning problem cooresponding to that environment.

        Example:
            Write as PDDL files:
            ```python
            from unified_planning.io import PDDLWriter
            problem = env.planning_problem()
            writer = PDDLWriter(problem.upf_problem)
            writer.write_domain("domain.pddl")
            writer.write_problem("problem.pddl")
            ```

            Using a plan to solve a HierarchyCraft gym environment:
            ```python
            hcraft_problem = env.planning_problem()

            done = False

            _observation, _info = env.reset()
            while not done:
                # Observations are not used when blindly following a plan
                # But the state in required in order to replan if there is no plan left
                action = hcraft_problem.action_from_plan(env.state)
                _observation, _reward, terminated, truncated, _info = env.step(action)
                done = terminated or truncated
            assert env.purpose.is_terminated # Purpose is achieved
            ```
        """
        return HcraftPlanningProblem(self.state, self.name, self.purpose, **kwargs)

    def infos(self) -> dict:
        infos = {
            "action_is_legal": self.action_masks(),
            "score": self.current_score,
            "score_average": self.cumulated_score / self.episodes,
        }
        infos.update(self._tasks_infos())
        return infos

    def _tasks_infos(self):
        infos = {}
        infos.update(self.task_successes.done_infos)
        infos.update(self.task_successes.rates_infos)
        infos.update(self.terminal_successes.done_infos)
        infos.update(self.terminal_successes.rates_infos)
        return infos

    def _render_rgb_array(self) -> np.ndarray:
        """Render an image of the game.

        Create the rendering window if not existing yet.
        """
        if self.render_window is None:
            self.render_window = HcraftWindow()
        if not self.render_window.built:
            self.render_window.build(self)
        fps = self.metadata.get("video.frames_per_second")
        self.render_window.update_rendering(fps=fps)
        return surface_to_rgb_array(self.render_window.screen)
Environment to simulate inventory management.
    def __init__(
        self,
        world: "World",
        purpose: Optional[Union[Purpose, List["Task"], "Task"]] = None,
        invalid_reward: float = -1.0,
        render_window: Optional[HcraftWindow] = None,
        name: str = "HierarchyCraft",
        max_step: Optional[int] = None,
    ) -> None:
        """
        Args:
            world: World defining the environment.
            purpose: Purpose of the player, defining rewards and termination.
                Defaults to None, hence a sandbox environment.
            invalid_reward: Reward given to the agent for invalid actions.
                Defaults to -1.0.
            render_window: Window using to render the environment with pygame.
            name: Name of the environement. Defaults to 'HierarchyCraft'.
            max_step: (Optional[int], optional): Maximum number of steps before episode truncation.
                If None, never truncates the episode. Defaults to None.
        """
        self.world = world
        self.invalid_reward = invalid_reward
        self.max_step = max_step
        self.name = name
        self._all_behaviors = None

        self.render_window = render_window
        self.render_mode = "rgb_array"

        self.state = HcraftState(self.world)
        self.current_step = 0
        self.current_score = 0
        self.cumulated_score = 0
        self.episodes = 0
        self.task_successes: Optional[SuccessCounter] = None
        self.terminal_successes: Optional[SuccessCounter] = None

        if purpose is None:
            purpose = Purpose(None)
        if not isinstance(purpose, Purpose):
            purpose = Purpose(tasks=purpose)
        self.purpose = purpose
        self.metadata = {}
Arguments:
- world: World defining the environment.
- purpose: Purpose of the player, defining rewards and termination. Defaults to None, hence a sandbox environment.
- invalid_reward: Reward given to the agent for invalid actions. Defaults to -1.0.
- render_window: Window used to render the environment with pygame.
- name: Name of the environment. Defaults to 'HierarchyCraft'.
- max_step: Maximum number of steps before episode truncation. If None, never truncates the episode. Defaults to None.
    @property
    def truncated(self) -> bool:
        """Whether the time limit has been exceeded."""
        if self.max_step is None:
            return False
        return self.current_step >= self.max_step
Whether the time limit has been exceeded.
    @property
    def observation_space(self) -> Union[BoxSpace, TupleSpace]:
        """Observation space for the Agent."""
        obs_space = BoxSpace(
            low=np.array(
                [0 for _ in range(self.world.n_items)]
                + [0 for _ in range(self.world.n_zones)]
                + [0 for _ in range(self.world.n_zones_items)]
            ),
            high=np.array(
                [np.inf for _ in range(self.world.n_items)]
                + [1 for _ in range(self.world.n_zones)]
                + [np.inf for _ in range(self.world.n_zones_items)]
            ),
        )

        return obs_space
Observation space for the Agent.
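The bounds above imply a flat observation made of three consecutive segments: player inventory, position, and current zone inventory. The sketch below rebuilds the bounds with plain lists and slices an observation accordingly. The sizes are made up, the helper `split_observation` is hypothetical, and the position is assumed to be a one-hot vector, as its [0, 1] bounds suggest.

```python
from typing import Dict, List

# Assumed sizes for illustration; a real world defines its own.
n_items, n_zones, n_zones_items = 3, 2, 2

# Same layout as the bounds in the source: items, then zones, then zone items.
low = [0.0] * (n_items + n_zones + n_zones_items)
high = [float("inf")] * n_items + [1.0] * n_zones + [float("inf")] * n_zones_items


def split_observation(observation: List[float]) -> Dict[str, List[float]]:
    """Slice a flat observation into its three segments."""
    zones_end = n_items + n_zones
    return {
        "player_inventory": observation[:n_items],
        "position": observation[n_items:zones_end],
        "current_zone_inventory": observation[zones_end:],
    }


# Player holds 2 of item 0 and stands in zone 1, which holds 5 of zone item 0.
observation = [2, 0, 0, 0, 1, 5, 0]
parts = split_observation(observation)
in_bounds = all(lo <= value <= hi for lo, value, hi in zip(low, observation, high))
```

Since no segment needs to be learned from pixels or text, an agent can consume these slices directly.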
    @property
    def action_space(self) -> DiscreteSpace:
        """Action space for the Agent.

        Actions are expected to often be invalid.
        """
        return DiscreteSpace(len(self.world.transformations))
Action space for the Agent.
Actions are expected to often be invalid.
    def action_masks(self) -> np.ndarray:
        """Return boolean mask of valid actions."""
        return np.array([t.is_valid(self.state) for t in self.world.transformations])
Return boolean mask of valid actions.
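Because actions are often invalid, a mask is typically used to restrict sampling to valid transformations. The helper below is a hypothetical sketch, not part of hcraft; it works on any boolean sequence, with a hard-coded list standing in for the result of `env.action_masks()`.

```python
import random
from typing import Optional, Sequence


def sample_valid_action(mask: Sequence[bool], rng: random.Random) -> Optional[int]:
    """Pick a random index among the valid actions, or None if there is none."""
    valid_actions = [index for index, is_valid in enumerate(mask) if is_valid]
    if not valid_actions:
        return None
    return rng.choice(valid_actions)


mask = [False, True, False, True]  # stand-in for env.action_masks()
action = sample_valid_action(mask, random.Random(0))
```

Returning None when nothing is valid leaves the caller free to decide how to handle a state with no applicable transformation.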
    def step(
        self, action: Union[int, str, np.ndarray]
    ) -> Tuple[np.ndarray, float, bool, bool, dict]:
        """Perform one step in the environment given the index of a wanted transformation.

        If the selected transformation can be performed, the state is updated and
        a reward is given depending of the environment tasks.
        Else the state is left unchanged and the `invalid_reward` is given to the player.

        """

        if isinstance(action, np.ndarray):
            if not action.size == 1:
                raise TypeError(
                    "Actions should be integers corresponding the a transformation index"
                    f", got array with multiple elements:\n{action}."
                )
            action = action.flatten()[0]
        try:
            action = int(action)
        except (TypeError, ValueError) as e:
            raise TypeError(
                "Actions should be integers corresponding the a transformation index."
            ) from e

        self.current_step += 1

        self.task_successes.step_reset()
        self.terminal_successes.step_reset()

        success = self.state.apply(action)
        if success:
            reward = self.purpose.reward(self.state)
        else:
            reward = self.invalid_reward

        terminated = self.purpose.is_terminal(self.state)

        self.task_successes.update(self.episodes)
        self.terminal_successes.update(self.episodes)

        self.current_score += reward
        self.cumulated_score += reward
        return (
            self.state.observation,
            reward,
            terminated,
            self.truncated,
            self.infos(),
        )
Perform one step in the environment given the index of a wanted transformation.
If the selected transformation can be performed, the state is updated and a reward is given depending on the environment tasks. Otherwise, the state is left unchanged and the invalid_reward is given to the player.
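The five-element return value of `step` and the `truncated` property fit the usual interaction loop. The sketch below runs that loop against `ToyEnv`, a hypothetical stand-in that only mimics the documented return shapes; its rule that only action 0 is valid and its reward of 1.0 are assumptions made so the example runs without hcraft.

```python
import random
from typing import List, Tuple


class ToyEnv:
    """Hypothetical stand-in mimicking the documented reset/step return shapes."""

    def __init__(self, n_actions: int = 3, max_step: int = 5) -> None:
        self.n_actions = n_actions
        self.max_step = max_step
        self.invalid_reward = -1.0
        self.current_step = 0

    def reset(self) -> Tuple[List[int], dict]:
        self.current_step = 0
        return [0], {}

    def step(self, action: int) -> Tuple[List[int], float, bool, bool, dict]:
        self.current_step += 1
        is_valid = action == 0  # toy rule: only action 0 is ever valid
        reward = 1.0 if is_valid else self.invalid_reward
        terminated = False  # no purpose to fulfil, as in a sandbox environment
        truncated = self.current_step >= self.max_step
        return [self.current_step], reward, terminated, truncated, {}


env = ToyEnv()
rng = random.Random(0)
observation, _info = env.reset()
rewards = []
done = False
while not done:
    action = rng.randrange(env.n_actions)
    observation, reward, terminated, truncated, _info = env.step(action)
    rewards.append(reward)
    done = terminated or truncated
```

With no terminal condition, the episode ends only through truncation, after exactly `max_step` steps, and invalid actions still consume a step.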
    def render(self, mode: Optional[str] = None, **_kwargs) -> Union[str, np.ndarray]:
        """Render the observation of the agent in a format depending on `render_mode`."""
        if mode is not None:
            self.render_mode = mode

        if self.render_mode in ("human", "rgb_array"):  # for human interaction
            return self._render_rgb_array()
        if self.render_mode == "console":  # for console print
            raise NotImplementedError
        raise NotImplementedError
Render the observation of the agent in a format depending on render_mode.
    def reset(
        self,
        *,
        seed: Optional[int] = None,
        options: Optional[dict] = None,
    ) -> Tuple[np.ndarray,]:
        """Resets the state of the environement.

        Returns:
            (np.ndarray): The first observation.
        """

        if not self.purpose.built:
            self.purpose.build(self)
            self.task_successes = SuccessCounter(self.purpose.tasks)
            self.terminal_successes = SuccessCounter(self.purpose.terminal_groups)

        self.current_step = 0
        self.current_score = 0
        self.episodes += 1

        self.task_successes.new_episode(self.episodes)
        self.terminal_successes.new_episode(self.episodes)

        self.state.reset()
        self.purpose.reset()
        return self.state.observation, self.infos()
Resets the state of the environment.
Returns:
(np.ndarray): The first observation. As the source above shows, it is returned together with the infos dictionary.
    def close(self):
        """Closes the environment."""
        if self.render_window is not None:
            self.render_window.close()
Closes the environment.
    @property
    def all_behaviors(self) -> Dict[str, "Behavior"]:
        """All solving behaviors using hebg."""
        if self._all_behaviors is None:
            self._all_behaviors = build_all_solving_behaviors(self)
        return self._all_behaviors
All solving behaviors using hebg.
    def solving_behavior(self, task: "Task") -> "Behavior":
        """Get the solving behavior for a given task.

        Args:
            task: Task to solve.

        Returns:
            Behavior: Behavior solving the task.

        Example:
            ```python
            solving_behavior = env.solving_behavior(task)

            done = False
            observation, _info = env.reset()
            while not done:
                action = solving_behavior(observation)
                observation, _reward, terminated, truncated, _info = env.step(action)
                done = terminated or truncated

            assert terminated  # Env is successfully terminated
            assert task.is_terminated  # Task is successfully terminated
            ```
        """
        return self.all_behaviors[task_to_behavior_name(task)]
Get the solving behavior for a given task.
Arguments:
- task: Task to solve.
Returns:
Behavior: Behavior solving the task.
Example:
    solving_behavior = env.solving_behavior(task)

    done = False
    observation, _info = env.reset()
    while not done:
        action = solving_behavior(observation)
        observation, _reward, terminated, truncated, _info = env.step(action)
        done = terminated or truncated

    assert terminated  # Env is successfully terminated
    assert task.is_terminated  # Task is successfully terminated
    def planning_problem(self, **kwargs) -> HcraftPlanningProblem:
        """Build this hcraft environment planning problem.

        Returns:
            Problem: Unified planning problem corresponding to that environment.

        Example:
            Write as PDDL files:
            ```python
            from unified_planning.io import PDDLWriter
            problem = env.planning_problem()
            writer = PDDLWriter(problem.upf_problem)
            writer.write_domain("domain.pddl")
            writer.write_problem("problem.pddl")
            ```

            Using a plan to solve a HierarchyCraft gym environment:
            ```python
            hcraft_problem = env.planning_problem()

            done = False

            _observation, _info = env.reset()
            while not done:
                # Observations are not used when blindly following a plan
                # But the state is required in order to replan if there is no plan left
                action = hcraft_problem.action_from_plan(env.state)
                _observation, _reward, terminated, truncated, _info = env.step(action)
                done = terminated or truncated
            assert env.purpose.is_terminated  # Purpose is achieved
            ```
        """
        return HcraftPlanningProblem(self.state, self.name, self.purpose, **kwargs)
Build this hcraft environment planning problem.
Returns:
Problem: Unified planning problem corresponding to that environment.
Example:
Write as PDDL files:
    from unified_planning.io import PDDLWriter
    problem = env.planning_problem()
    writer = PDDLWriter(problem.upf_problem)
    writer.write_domain("domain.pddl")
    writer.write_problem("problem.pddl")
Using a plan to solve a HierarchyCraft gym environment:
    hcraft_problem = env.planning_problem()

    done = False

    _observation, _info = env.reset()
    while not done:
        # Observations are not used when blindly following a plan
        # But the state is required in order to replan if there is no plan left
        action = hcraft_problem.action_from_plan(env.state)
        _observation, _reward, terminated, truncated, _info = env.step(action)
        done = terminated or truncated
    assert env.purpose.is_terminated  # Purpose is achieved
Inherited Members
- gymnasium.core.Env
- spec
- unwrapped
- np_random_seed
- np_random
- has_wrapper_attr
- get_wrapper_attr
- set_wrapper_attr
def get_human_action(
    env: "HcraftEnv",
    additional_events: Optional[List["Event"]] = None,
    can_be_none: bool = False,
    fps: Optional[float] = None,
):
    """Update the environment rendering and gather potential action given by the UI.

    Args:
        env: The running HierarchyCraft environment.
        additional_events (Optional): Additional simulated pygame events.
        can_be_none: If False, this function will loop on rendering until an action is found.
            If True, will return None if no action was found after one rendering update.

    Returns:
        The action found using the UI.

    """
    action_chosen = False
    while not action_chosen:
        action = env.render_window.update_rendering(additional_events, fps)
        action_chosen = action is not None or can_be_none
    return action
Update the environment rendering and gather potential action given by the UI.
Arguments:
- env: The running HierarchyCraft environment.
- additional_events (Optional): Additional simulated pygame events.
- can_be_none: If False, this function will loop on rendering until an action is found. If True, will return None if no action was found after one rendering update.
Returns:
The action found using the UI.
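The effect of `can_be_none` is easiest to see with the loop isolated from pygame. The sketch below reuses the loop shape of `get_human_action`. `FakeWindow` and `poll_action` are hypothetical names introduced for illustration, and the scripted results stand in for real UI events.

```python
from typing import List, Optional


class FakeWindow:
    """Hypothetical render window returning one scripted result per update."""

    def __init__(self, scripted: List[Optional[int]]) -> None:
        self._scripted = list(scripted)
        self.updates = 0

    def update_rendering(self) -> Optional[int]:
        self.updates += 1
        return self._scripted.pop(0)


def poll_action(window: FakeWindow, can_be_none: bool = False) -> Optional[int]:
    """Same loop shape as get_human_action, without pygame."""
    action_chosen = False
    while not action_chosen:
        action = window.update_rendering()
        # Stop as soon as an action exists, or after one update if None is allowed.
        action_chosen = action is not None or can_be_none
    return action


blocking_window = FakeWindow([None, None, 3])
blocking_result = poll_action(blocking_window)  # keeps rendering until an action appears

single_window = FakeWindow([None, None, 3])
single_result = poll_action(single_window, can_be_none=True)  # a single rendering update
```

With `can_be_none=True` the caller keeps control of the main loop and must handle a `None` result itself, which suits scripts that interleave rendering with other work.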
def render_env_with_human(env: "HcraftEnv", n_episodes: int = 1):
    """Render the given environment with human interactions.

    Args:
        env (HcraftEnv): The HierarchyCraft environment to run.
        n_episodes (int, optional): Number of episodes to run. Defaults to 1.
    """
    print("Purpose: ", env.purpose)

    for _ in range(n_episodes):
        env.reset()
        done = False
        total_reward = 0
        while not done:
            env.render()
            action = get_human_action(env)
            print(f"Human did: {env.world.transformations[action]}")

            _observation, reward, terminated, truncated, _info = env.step(action)
            done = terminated or truncated
            total_reward += reward

        print("SCORE: ", total_reward)
Render the given environment with human interactions.
Arguments:
- env (HcraftEnv): The HierarchyCraft environment to run.
- n_episodes (int, optional): Number of episodes to run. Defaults to 1.
class Purpose:
    """A purpose for a HierarchyCraft player based on a list of tasks."""

    def __init__(
        self,
        tasks: Optional[Union[Task, List[Task]]] = None,
        timestep_reward: float = 0.0,
        default_reward_shaping: RewardShaping = RewardShaping.NONE,
        shaping_value: float = 1.0,
    ) -> None:
        """
        Args:
            tasks: Tasks to add to the Purpose.
                Defaults to None.
            timestep_reward: Reward for each timestep.
                Defaults to 0.0.
            default_reward_shaping: Default reward shaping for tasks.
                Defaults to RewardShaping.NONE.
            shaping_value: Reward value used in reward shaping if any.
                Defaults to 1.0.
        """
        self.tasks: List[Task] = []
        self.timestep_reward = timestep_reward
        self.shaping_value = shaping_value
        self.default_reward_shaping = default_reward_shaping
        self.built = False

        self.reward_shaping: Dict[Task, RewardShaping] = {}
        self.terminal_groups: List[TerminalGroup] = []

        if isinstance(tasks, Task):
            tasks = [tasks]
        elif tasks is None:
            tasks = []
        for task in tasks:
            self.add_task(task, reward_shaping=default_reward_shaping)

        self._best_terminal_group = None

    def add_task(
        self,
        task: Task,
        reward_shaping: Optional[RewardShaping] = None,
        terminal_groups: Optional[Union[str, List[str]]] = "default",
    ):
        """Add a new task to the purpose.

        Args:
            task: Task to be added to the purpose.
            reward_shaping: Reward shaping for this task.
                Defaults to purpose's default reward shaping.
            terminal_groups: Purpose terminates when ALL the tasks of ANY terminal group terminate.
                If terminal groups is "" or None, task will be optional and will
                not allow to terminate the purpose at all.
                By default, tasks are added in the "default" group and hence
                ALL tasks have to be done to terminate the purpose.
        """
        if reward_shaping is None:
            reward_shaping = self.default_reward_shaping
        reward_shaping = RewardShaping(reward_shaping)
        if terminal_groups:
            if isinstance(terminal_groups, str):
                terminal_groups = [terminal_groups]
            for terminal_group in terminal_groups:
                existing_group = self._terminal_group_from_name(terminal_group)
                if not existing_group:
                    existing_group = TerminalGroup(terminal_group)
                    self.terminal_groups.append(existing_group)
                existing_group.tasks.append(task)

        self.reward_shaping[task] = reward_shaping
        self.tasks.append(task)

    def build(self, env: "HcraftEnv"):
        """
        Builds the purpose of the player relative to the given environment.

        Args:
            env: The HierarchyCraft environment to build upon.
        """
        if self.built:
            return

        if not self.tasks:
            return
        # Add reward shaping subtasks
        for task in self.tasks:
            subtasks = self._add_reward_shaping_subtasks(
                task, env, self.reward_shaping[task]
            )
            for subtask in subtasks:
                self.add_task(subtask, RewardShaping.NONE, terminal_groups=None)

        # Build all tasks
        for task in self.tasks:
            task.build(env.world)

        self.built = True

    def reward(self, state: "HcraftState") -> float:
        """
        Returns the purpose reward for the given state based on tasks.
        """
        reward = self.timestep_reward
        if not self.tasks:
            return reward
        for task in self.tasks:
            reward += task.reward(state)
        return reward

    def is_terminal(self, state: "HcraftState") -> bool:
        """
        Returns True if the given state is terminal for the whole purpose.
        """
        if not self.tasks:
            return False
        for task in self.tasks:
            task.is_terminal(state)
        for terminal_group in self.terminal_groups:
            if terminal_group.terminated:
                return True
        return False

    def reset(self) -> None:
        """Reset the purpose."""
        for task in self.tasks:
            task.reset()

    @property
    def optional_tasks(self) -> List[Task]:
        """List of tasks in no terminal group hence being optional."""
        terminal_tasks = []
        for group in self.terminal_groups:
            terminal_tasks += group.tasks
        return [task for task in self.tasks if task not in terminal_tasks]

    @property
    def terminated(self) -> bool:
        """True if any of the terminal groups are terminated."""
        return any(
            all(task.terminated for task in terminal_group.tasks)
            for terminal_group in self.terminal_groups
        )

    @property
    def best_terminal_group(self) -> TerminalGroup:
        """Best rewarding terminal group."""
        if self._best_terminal_group is not None:
            return self._best_terminal_group

        best_terminal_group, best_terminal_value = None, -np.inf
        for terminal_group in self.terminal_groups:
            terminal_value = sum(task._reward for task in terminal_group.tasks)
            if terminal_value > best_terminal_value:
                best_terminal_value = terminal_value
                best_terminal_group = terminal_group

        self._best_terminal_group = best_terminal_group
        return best_terminal_group

    def _terminal_group_from_name(self, name: str) -> Optional[TerminalGroup]:
        if name not in self.terminal_groups:
            return None
        group_id = self.terminal_groups.index(name)
        return self.terminal_groups[group_id]

    def _add_reward_shaping_subtasks(
        self, task: Task, env: "HcraftEnv", reward_shaping: RewardShaping
    ) -> List[Task]:
        if reward_shaping == RewardShaping.NONE:
            return []
        if reward_shaping == RewardShaping.ALL_ACHIVEMENTS:
            return _all_subtasks(env.world, self.shaping_value)
        if reward_shaping == RewardShaping.INPUTS_ACHIVEMENT:
            return _inputs_subtasks(task, env.world, self.shaping_value)
        if reward_shaping == RewardShaping.REQUIREMENTS_ACHIVEMENTS:
            return _required_subtasks(task, env, self.shaping_value)
        raise NotImplementedError

    def __str__(self) -> str:
        terminal_groups_str = []
        for terminal_group in self.terminal_groups:
            tasks_str_joined = self._tasks_str(terminal_group.tasks)
            group_str = f"{terminal_group.name}:[{tasks_str_joined}]"
            terminal_groups_str.append(group_str)
        optional_tasks_str = self._tasks_str(self.optional_tasks)
        if optional_tasks_str:
            group_str = f"optional:[{optional_tasks_str}]"
            terminal_groups_str.append(group_str)
        joined_groups_str = ", ".join(terminal_groups_str)
        return f"Purpose({joined_groups_str})"

    def _tasks_str(self, tasks: List[Task]) -> str:
        tasks_str = []
        for task in tasks:
            shaping = self.reward_shaping[task]
            shaping_str = f"#{shaping.value}" if shaping != RewardShaping.NONE else ""
            tasks_str.append(f"{task}{shaping_str}")
        return ",".join(tasks_str)
A purpose for a HierarchyCraft player based on a list of tasks.
    def __init__(
        self,
        tasks: Optional[Union[Task, List[Task]]] = None,
        timestep_reward: float = 0.0,
        default_reward_shaping: RewardShaping = RewardShaping.NONE,
        shaping_value: float = 1.0,
    ) -> None:
        """
        Args:
            tasks: Tasks to add to the Purpose.
                Defaults to None.
            timestep_reward: Reward for each timestep.
                Defaults to 0.0.
            default_reward_shaping: Default reward shaping for tasks.
                Defaults to RewardShaping.NONE.
            shaping_value: Reward value used in reward shaping if any.
                Defaults to 1.0.
        """
        self.tasks: List[Task] = []
        self.timestep_reward = timestep_reward
        self.shaping_value = shaping_value
        self.default_reward_shaping = default_reward_shaping
        self.built = False

        self.reward_shaping: Dict[Task, RewardShaping] = {}
        self.terminal_groups: List[TerminalGroup] = []

        if isinstance(tasks, Task):
            tasks = [tasks]
        elif tasks is None:
            tasks = []
        for task in tasks:
            self.add_task(task, reward_shaping=default_reward_shaping)

        self._best_terminal_group = None
Arguments:
- tasks: Tasks to add to the Purpose. Defaults to None.
- timestep_reward: Reward for each timestep. Defaults to 0.0.
- default_reward_shaping: Default reward shaping for tasks. Defaults to RewardShaping.NONE.
- shaping_value: Reward value used in reward shaping if any. Defaults to 1.0.
    def add_task(
        self,
        task: Task,
        reward_shaping: Optional[RewardShaping] = None,
        terminal_groups: Optional[Union[str, List[str]]] = "default",
    ):
        """Add a new task to the purpose.

        Args:
            task: Task to be added to the purpose.
            reward_shaping: Reward shaping for this task.
                Defaults to purpose's default reward shaping.
            terminal_groups: Purpose terminates when ALL the tasks of ANY terminal group terminate.
                If terminal groups is "" or None, task will be optional and will
                not allow to terminate the purpose at all.
                By default, tasks are added in the "default" group and hence
                ALL tasks have to be done to terminate the purpose.
        """
        if reward_shaping is None:
            reward_shaping = self.default_reward_shaping
        reward_shaping = RewardShaping(reward_shaping)
        if terminal_groups:
            if isinstance(terminal_groups, str):
                terminal_groups = [terminal_groups]
            for terminal_group in terminal_groups:
                existing_group = self._terminal_group_from_name(terminal_group)
                if not existing_group:
                    existing_group = TerminalGroup(terminal_group)
                    self.terminal_groups.append(existing_group)
                existing_group.tasks.append(task)

        self.reward_shaping[task] = reward_shaping
        self.tasks.append(task)
Add a new task to the purpose.
Arguments:
- task: Task to be added to the purpose.
- reward_shaping: Reward shaping for this task. Defaults to purpose's default reward shaping.
- terminal_groups: Names of the terminal groups the task belongs to. The purpose terminates when ALL the tasks of ANY terminal group have terminated. If terminal_groups is "" or None, the task is optional and can never terminate the purpose. By default, tasks are added to the "default" group, so ALL of them have to be done to terminate the purpose.
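The "ALL tasks of ANY group" rule can be sketched with plain strings in place of `Task` objects. The function name `purpose_terminated` and the task names below are hypothetical; the `any(all(...))` shape follows the `terminated` property of `Purpose`.

```python
from typing import Dict, List, Set


def purpose_terminated(groups: Dict[str, List[str]], done: Set[str]) -> bool:
    """True when ALL tasks of ANY terminal group are in `done`."""
    return any(
        all(task in done for task in tasks)
        for tasks in groups.values()
    )


# Two terminal groups: finishing either one terminates the purpose.
groups = {
    "default": ["get_wood", "get_stone"],
    "shortcut": ["get_diamond"],
}

# A task belonging to no terminal group is optional.
all_tasks = ["get_wood", "get_stone", "get_diamond", "get_dirt"]
grouped = {task for tasks in groups.values() for task in tasks}
optional = [task for task in all_tasks if task not in grouped]
```

Here finishing `get_wood` alone is not enough, finishing both `default` tasks or the single `shortcut` task is, and `get_dirt` can still be rewarded without ever ending the episode.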
    def build(self, env: "HcraftEnv"):
        """
        Builds the purpose of the player relative to the given environment.

        Args:
            env: The HierarchyCraft environment to build upon.
        """
        if self.built:
            return

        if not self.tasks:
            return
        # Add reward shaping subtasks
        for task in self.tasks:
            subtasks = self._add_reward_shaping_subtasks(
                task, env, self.reward_shaping[task]
            )
            for subtask in subtasks:
                self.add_task(subtask, RewardShaping.NONE, terminal_groups=None)

        # Build all tasks
        for task in self.tasks:
            task.build(env.world)

        self.built = True
Builds the purpose of the player relative to the given environment.
Arguments:
- env: The HierarchyCraft environment to build upon.
    def reward(self, state: "HcraftState") -> float:
        """
        Returns the purpose reward for the given state based on tasks.
        """
        reward = self.timestep_reward
        if not self.tasks:
            return reward
        for task in self.tasks:
            reward += task.reward(state)
        return reward
Returns the purpose reward for the given state based on tasks.
    def is_terminal(self, state: "HcraftState") -> bool:
        """
        Returns True if the given state is terminal for the whole purpose.
        """
        if not self.tasks:
            return False
        for task in self.tasks:
            task.is_terminal(state)
        for terminal_group in self.terminal_groups:
            if terminal_group.terminated:
                return True
        return False
Returns True if the given state is terminal for the whole purpose.
    def reset(self) -> None:
        """Reset the purpose."""
        for task in self.tasks:
            task.reset()
Reset the purpose.
    @property
    def optional_tasks(self) -> List[Task]:
        """List of tasks in no terminal group hence being optional."""
        terminal_tasks = []
        for group in self.terminal_groups:
            terminal_tasks += group.tasks
        return [task for task in self.tasks if task not in terminal_tasks]
List of tasks that belong to no terminal group and are therefore optional.
    @property
    def terminated(self) -> bool:
        """True if any of the terminal groups are terminated."""
        return any(
            all(task.terminated for task in terminal_group.tasks)
            for terminal_group in self.terminal_groups
        )
True if any of the terminal groups are terminated.
    @property
    def best_terminal_group(self) -> TerminalGroup:
        """Best rewarding terminal group."""
        if self._best_terminal_group is not None:
            return self._best_terminal_group

        best_terminal_group, best_terminal_value = None, -np.inf
        for terminal_group in self.terminal_groups:
            terminal_value = sum(task._reward for task in terminal_group.tasks)
            if terminal_value > best_terminal_value:
                best_terminal_value = terminal_value
                best_terminal_group = terminal_group

        self._best_terminal_group = best_terminal_group
        return best_terminal_group
Best rewarding terminal group.
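As a rough illustration of how the best group is selected, the sketch below sums per-task rewards for each group and keeps the largest total. `best_group` and the reward values are hypothetical, and the caching done by the real property is left out.

```python
from typing import Dict, List, Optional


def best_group(groups: Dict[str, List[float]]) -> Optional[str]:
    """Name of the group whose task rewards sum to the largest value."""
    best_name, best_value = None, float("-inf")
    for name, rewards in groups.items():
        value = sum(rewards)
        if value > best_value:  # strict comparison: the first group wins ties
            best_name, best_value = name, value
    return best_name


winner = best_group({"a": [1.0, 1.0], "b": [5.0], "c": [2.0, 2.0]})
tie_winner = best_group({"first": [2.0], "second": [1.0, 1.0]})
no_groups = best_group({})
```

Because the comparison is strict, ties are resolved in favour of the group added first, and a purpose without any terminal group yields `None`.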
class GetItemTask(AchievementTask):
    """Task of getting a given quantity of an item."""

    def __init__(self, item_stack: Union[Item, Stack], reward: float = 1.0):
        self.item_stack = _stack_item(item_stack)
        super().__init__(name=self.get_name(self.item_stack), reward=reward)

    def build(self, world: "World") -> None:
        super().build(world)
        item_slot = world.items.index(self.item_stack.item)
        self._terminate_player_items[item_slot] = self.item_stack.quantity

    def _is_terminal(self, state: "HcraftState") -> bool:
        return np.all(state.player_inventory >= self._terminate_player_items)

    @staticmethod
    def get_name(stack: Stack):
        """Name of the task for a given Stack"""
        quantity_str = _quantity_str(stack.quantity)
        return f"Get{quantity_str}{stack.item.name}"
Task of getting a given quantity of an item.
    def build(self, world: "World") -> None:
        super().build(world)
        item_slot = world.items.index(self.item_stack.item)
        self._terminate_player_items[item_slot] = self.item_stack.quantity
Build the task operation arrays based on the given world.
    @staticmethod
    def get_name(stack: Stack):
        """Name of the task for a given Stack"""
        quantity_str = _quantity_str(stack.quantity)
        return f"Get{quantity_str}{stack.item.name}"
Name of the task for a given Stack
Inherited Members
- hcraft.task.AchievementTask
- reward
- hcraft.task.Task
- name
- terminated
- is_terminal
- reset
class GoToZoneTask(AchievementTask):
    """Task to go to a given zone."""

    def __init__(self, zone: Zone, reward: float = 1.0) -> None:
        super().__init__(name=self.get_name(zone), reward=reward)
        self.zone = zone

    def build(self, world: "World"):
        super().build(world)
        zone_slot = world.zones.index(self.zone)
        self._terminate_position[zone_slot] = 1

    def _is_terminal(self, state: "HcraftState") -> bool:
        return np.all(state.position == self._terminate_position)

    @staticmethod
    def get_name(zone: Zone):
        """Name of the task for a given Zone"""
        return f"Go to {zone.name}"
Task to go to a given zone.
    def build(self, world: "World"):
        super().build(world)
        zone_slot = world.zones.index(self.zone)
        self._terminate_position[zone_slot] = 1
Build the task operation arrays based on the given world.
    @staticmethod
    def get_name(zone: Zone):
        """Name of the task for a given Zone"""
        return f"Go to {zone.name}"
Name of the task for a given Zone
Inherited Members
- hcraft.task.AchievementTask
- reward
- hcraft.task.Task
- name
- terminated
- is_terminal
- reset
class PlaceItemTask(AchievementTask):
    """Task to place a quantity of item in a given zone.

    If no zone is given, consider placing the item anywhere.

    """

    def __init__(
        self,
        item_stack: Union[Item, Stack],
        zone: Optional[Union[Zone, List[Zone]]] = None,
        reward: float = 1.0,
    ):
        item_stack = _stack_item(item_stack)
        self.item_stack = item_stack
        self.zone = zone
        super().__init__(name=self.get_name(item_stack, zone), reward=reward)

    def build(self, world: "World"):
        super().build(world)
        if self.zone is None:
            zones_slots = np.arange(self._terminate_zones_items.shape[0])
        else:
            zones_slots = np.array([world.slot_from_zone(self.zone)])
        zone_item_slot = world.zones_items.index(self.item_stack.item)
        self._terminate_zones_items[zones_slots, zone_item_slot] = (
            self.item_stack.quantity
        )

    def _is_terminal(self, state: "HcraftState") -> bool:
        if self.zone is None:
            return np.any(
                np.all(state.zones_inventories >= self._terminate_zones_items, axis=1)
            )
        return np.all(state.zones_inventories >= self._terminate_zones_items)

    @staticmethod
    def get_name(stack: Stack, zone: Optional[Zone]):
        """Name of the task for a given Stack and list of Zone"""
        quantity_str = _quantity_str(stack.quantity)
        zones_str = _zones_str(zone)
        return f"Place{quantity_str}{stack.item.name}{zones_str}"
Task to place a quantity of item in a given zone.
If no zone is given, consider placing the item anywhere.
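The difference between a targeted zone and "anywhere" can be sketched with nested lists instead of NumPy arrays. This is a simplified illustration, not the library's implementation: `item_placed` is a hypothetical helper, and it assumes that the requirement for every other item is zero, so only the chosen item column matters.

```python
from typing import List, Optional


def item_placed(
    zones_inventories: List[List[int]],
    item_slot: int,
    quantity: int,
    zone_slot: Optional[int] = None,
) -> bool:
    """Check whether `quantity` of an item sits in a zone.

    `zones_inventories[z][i]` is the amount of item `i` stored in zone `z`.
    With `zone_slot=None`, any zone holding enough of the item satisfies the task.
    """
    if zone_slot is None:
        return any(row[item_slot] >= quantity for row in zones_inventories)
    return zones_inventories[zone_slot][item_slot] >= quantity


# Three zones, two zone items: zone 0 holds 2 of item 1, zone 1 holds 1 of item 0.
zones = [[0, 2], [1, 0], [0, 0]]
anywhere = item_placed(zones, item_slot=1, quantity=2)
wrong_zone = item_placed(zones, item_slot=1, quantity=2, zone_slot=1)
right_zone = item_placed(zones, item_slot=0, quantity=1, zone_slot=1)
not_enough = item_placed(zones, item_slot=0, quantity=2)
```

The "anywhere" case corresponds to the `np.any(np.all(..., axis=1))` branch of `_is_terminal`, where a single satisfied zone row is enough.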
    def __init__(
        self,
        item_stack: Union[Item, Stack],
        zone: Optional[Union[Zone, List[Zone]]] = None,
        reward: float = 1.0,
    ):
        item_stack = _stack_item(item_stack)
        self.item_stack = item_stack
        self.zone = zone
        super().__init__(name=self.get_name(item_stack, zone), reward=reward)
    def build(self, world: "World"):
        super().build(world)
        if self.zone is None:
            zones_slots = np.arange(self._terminate_zones_items.shape[0])
        else:
            zones_slots = np.array([world.slot_from_zone(self.zone)])
        zone_item_slot = world.zones_items.index(self.item_stack.item)
        self._terminate_zones_items[zones_slots, zone_item_slot] = (
            self.item_stack.quantity
        )
Build the task operation arrays based on the given world.
    @staticmethod
    def get_name(stack: Stack, zone: Optional[Zone]):
        """Name of the task for a given Stack and list of Zone"""
        quantity_str = _quantity_str(stack.quantity)
        zones_str = _zones_str(zone)
        return f"Place{quantity_str}{stack.item.name}{zones_str}"
Name of the task for a given Stack and list of Zone
Inherited Members
- hcraft.task.AchievementTask
- reward
- hcraft.task.Task
- name
- terminated
- is_terminal
- reset