Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Halite helpers #43

Merged
merged 21 commits into from
Jun 12, 2020
Merged

Halite helpers #43

merged 21 commits into from
Jun 12, 2020

Conversation

harrisse
Copy link
Contributor

@harrisse harrisse commented Jun 3, 2020

REVIEWERS' NOTE: This PR is kind of a monster, apologies in advance. I've annotated each file with some commentary about what's worth reading and what's worth skipping, large parts of the PR are boring refactors on which I welcome comments but are not likely the best use of your time.

Feel free to skip any parts you don't feel are relevant to your interests. We are on a relatively tight timeline to get this out before Halite so if there are any larger pieces of feedback I'll likely save those for a future iteration. If you don't have time to read through this, then no worries and feel free to disregard the PR.

This PR provides wrapper classes for the halite environment and state as well as a convenient hook for the interpreter. This also reimplements the bulk of the original halite interpreter in the new board class, straightens out some logic inconsistencies in corner cases of the old implementation, and hopefully makes it a bit easier to parse what's happening in the code.

I tested the interpreter reimplementation by writing unit tests and by running actual episodes using the agents submitted to the shared repo. I can't validate that behavior is exactly the same but I can at least confirm that everyone's agents played complete and reasonably strategic-looking games that seemed similar or identical to the current implementation.


return board.action
@board_agent
def random_agent(board):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Random weights have been tweaked for readability as random is meant to be an example for users not a strong bot.



agents = {"random": random_agent}


def interpreter(state, env):
def populate_board(state, env):
Copy link
Contributor Author

@harrisse harrisse Jun 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything in the populate_board method is the same as it was before, just refactored into its own method. Reviewers can skip to the next comment.

for uid, index in list(ships.items()):
del ships[uid]
del obs.players[index][2][uid]
def interpreter(state, env):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewers should continue reading here.

All observation update logic is now handled in the Board class, this includes applying actions, resolving collisions, gathering halite, depositing halite, and regenerating halite. Logic to eliminate players or otherwise control the meta-state of the game remains in this method.

THash = TypeVar('TComparable')


def group_by(elements: Iterable[TElement], selector: Callable[[TElement], THash]) -> Dict[THash, List[TElement]]:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needed to bring a friend in from Linq 😄

TValue = TypeVar('TValue')


class ReadOnlyDict(Generic[TKey, TValue]):
Copy link
Contributor Author

@harrisse harrisse Jun 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything from here to the start of the Board class is mainly just data binding models. Reviewers can mostly skim these, but there are some small bits of logic like ShipAction.to_point and Player.next_actions

Please let me know though if you have suggestions or questions on the format, structure, or purpose of the binding models. I'd generally like to reuse this pattern for future competitions if the community gets good use out of it.



class Board:
def __init__(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The constructor is basically just used to deserialize the raw observation and configuration received from the halite interpreter into the richer Board model that we'll make use of in the next() method down below.


@property
def observation(self) -> Dict[str, Any]:
"""Converts a Board back to the normalized observation that constructed it."""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the observation serializer to the constructor's observation deserializer


def __deepcopy__(self, _) -> 'Board':
actions = [player.next_actions for player in self.players.values()]
return Board(self.observation, self.configuration, actions)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the transformation to an observation copies all of the data into new data structures we can use it for deepcopy as well to isolate a copied instance from its parent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❓ Could someone still mutate next_actions dictionaries though after this call? Should those be explicitly deepcopy'd as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

next_actions is a computed property constructed from the player's ships and shipyards so we don't have to worry about modification

Player 1 is letter a/A
Player 2 is letter b/B
etc.
"""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious if people think this belongs in __str__ or another method like print_board or something like that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine in __str__. It's easy to add an additional helper later. You could also argue that observations could be __repr__, but I think being explicit there makes sense because __repr__ is usually just a string.

result += '|\n'
return result

def _add_ship(self: 'Board', ship: Ship) -> None:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Undocumented as these are meant to be internal only

The current board is unmodified.
This can form a halite interpreter, e.g.
next_observation = Board(current_observation, configuration, actions).next.observation
"""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the replacement for the observation update logic in the halite interpreter.

@@ -63,6 +63,9 @@
parser.add_argument(
"--host", type=str, help="http-server Host (default=127.0.0.1)."
)
parser.add_argument(
"--out", type=str, help="Output file to write the results of the episode."
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This allows you to write the results to an html file while still getting print statements in std out.



def test_no_move_on_halite_gathers_halite():
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are lots of other scenarios we could test in here (like multiship collisions on shipyards, spawning collisions, etc) but I tried to cover most core rules with this pass.

print(buffer.getvalue())
if fallback != None:
output = buffer.getvalue()
if output:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change gets rid of most of those annoying blank lines in the output

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏

@@ -15,6 +15,7 @@
import json
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is all style and formatting updates, reviewers can likely skip this file.

@harrisse harrisse requested a review from alexisbcook June 12, 2020 06:47
@harrisse harrisse requested review from moserware, myles-oneill, pculliton and dster2 and removed request for myles-oneill and alexisbcook June 12, 2020 06:47
Copy link
Member

@moserware moserware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exciting to see this launch soon! Looks pretty good. I left a few comments/suggestions.

If you haven't already, make sure to run pylint and mypy and either correct the errors given or fix them (and/or explicitly ignore ones you want to ignore with the appropriate comment) to get them to pass without warnings.

Also, I'm assuming that agents can't exploit the runtime internal variables to allow cheating that otherwise wouldn't be possible?

shipyards = me.shipyards
for shipyard in shipyards:
# 20% chance to spawn
shipyard.next_action = choice([ShipyardAction.SPAWN, None, None, None, None])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the weighted random.choices might be clearer here.


actions = [agent.action for agent in state]
board = Board(obs, config, actions)
board = board.next()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice and concise 👏

# region Helper Classes and Methods
Point = NewType('Point', Tuple[int, int])


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could potentially be useful to have a comment and/or ASCII art to show the coordinate system.

See index_to_position for the inverse.
"""
x, y = point
return (size - y - 1) * size + x
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❓ This seems like a change in coordinate system from before? Did you intentionally want (0,0) to be at (size - 1) * size? Is that clearer than it being at 0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This puts 0 in the lower left corner and (size - 1, size - 1) in the upper right which I personally found a lot more intuitive, curious to get others' thoughts on this though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I'm fully understanding, but my personal take is that (0, 0) as top left makes more sense because that's how 2d arrays work, and in particular the format used in the replay files. I see how your way is intuitive relative to normal XY plane but IMO it's better to match the programming methodology. But just a personal take and IIUC a lot / all of this is internal and the higher-level structure here means users aren't necessarily forced into either paradigm?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dster2 Yeah users would only even notice the coordinate system if they were constructing offsets by hand. If they only used NORTH/SOUTH/EAST/WEST they'd likely never even notice there are coordinates. I'll consider this a bit more, thanks for the feedback all.

if isinstance(data, dict):
self._data = data
else:
# If it's not a Dict it must be a ReadOnlyDict based on our type
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✋ Note that they are type hints and not actually enforced by the runtime. You might want to explicitly check and raise explicitly if not true.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My goal here wasn't to validate that it must be a readonlydict but that if the user passed in the correct type and it's not a dict then it must be a readonlydict.


player._halite += leftover_convert_halite

def resolve_collision(ships: List[Ship]) -> Tuple[Optional[Ship], List[Ship]]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These methods are specific to the game's rules and could be staticmethod? I wonder if it's worth factoring out to a separate class for clarity?

board._delete_shipyard(shipyard)
board._delete_ship(ship)

# Collect halite from cells into ships
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❓ Just to confirm, if a smaller ship collides with a bigger ship, it should also collect in that same turn?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll double check the rules to make sure they're consistent with this behavior but I believe so

delta_halite = int(cell.halite * configuration.collect_rate)
if cell.shipyard_id is None and delta_halite > 0:
ship._halite += delta_halite
cell._halite -= delta_halite
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might want another assert cell._halite > 0 after this? Just to confirm, a cell cannot run out of halite?

def agent_wrapper(obs, config) -> Dict[str, str]:
board = Board(obs, config)
agent(board)
return board.current_player.next_actions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am assuming the JSON type/domain checking is already verifying these actions are valid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These get filtered out and ignored in the board constructor if they're not valid

print(buffer.getvalue())
if fallback != None:
output = buffer.getvalue()
if output:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏

@harrisse
Copy link
Contributor Author

@moserware Re: internal only methods and exploits, no bots can't use them unless users are running untrusted code locally (which we ultimately can't protect against). In production we isolate agents in different containers so there's no risk of them being able to call each others' methods.

@harrisse harrisse merged commit 06bd76f into master Jun 12, 2020
johan-gras added a commit to johan-gras/kaggle-environments that referenced this pull request Dec 15, 2020
Human rendering was not working anymore since Kaggle#43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants