Welcome to Reactor's learning guide! This section describes the latest version of Reactor. Strong players could use this implementation to average high winrates in many variants of 3P Hanabi. No knowledge of version 1.0 is required to learn version 2.0.
The guide is structured into various levels, similar to the H-Group guide. All of the Level 0 and Level 1 conventions are fundamental to Reactor 2.0 and thus are required learning for the system. Adding conventions from Level 2 is encouraged. Conventions from Level 3 onwards can be added in for more experienced players.
Reactor is a conventional framework oriented around giving, receiving, and taking safe actions. Players expect to receive safe actions from clues if they are the next player without one. When players receive information, the primary type of information they expect to receive is actionable information. While every system has to provide actions to a degree, Reactor implementations take that idea to a limit. Reactor is built with the belief that in most gamestates, immediately actionable information is significantly better than inactionable information.
Reactor began as an attempt to tackle the Winter 2022 Holiday Mix Hanabi competition. Referential Sieve originally motivated Reactor: one perspective is that Reactor 1.0 transforms Referential Sieve to further appeal to players who love giving playful positionals. Reactor 2.0 shifted the system towards resembling (non-referential) Sieve, while keeping clues that heavily resemble the aforementioned playful positionals.
Why play Reactor?
The notion of a reaction is fundamental to high level Hanabi. In many systems, when Alice gives Cathy a clue, Bob has a chance to modify that clue's meaning. The most common example of this across conventional systems is a finesse: Alice tells Cathy to play a card, but it isn't playable yet, so Bob plays a card to make Cathy's card playable. In addition, many systems allow Bob to play different cards to modify the original meaning in different ways. These conventions are typically called signal shifts, with specific reactions named finesse/bluff/ignition, ejection, discharge, charm, blast, and alakazam (collectively shortened to EDCB).
For example, in the H-Group conventions, focusing a 5 with a color clue is called a 5 color ejection. Bob needs to blind play his 2nd-to-leftmost unclued card or else Cathy will believe her 5 is some other rank, which is likely to cause the team significant problems.
Most conventions implement these signal-shifts as value-shifts on the card: Cathy received incorrect information about a card, but Bob's reaction fixes it and provides true information to that same card. One convention that does it differently is Referential Sieve's playful positionals: in this convention, when a rank clue with bad meaning is given to a card, Bob's reaction fixes it and gives a true play signal to a different card. Playful positionals combine a value shift with a target shift. Reactor is filled with target shifts.
There are three main ways teams lose games of Hanabi:
A big reason systems take BDRs is because they do not have the clue tokens to keep them. A big reason players desync is because they have to rely on deep context when the number of clue tokens gets low. Designing a system to maintain a high clue count can help with all of these. It also lets teams be more careful with their discards than allowed by even the recent innovation of the Universal Rank Save. In order to maintain a high clue count, we desire players to be able to take safe actions as often as possible. Reactor uses target shifting because target shifting is required to provide maximal information about safe actions.
With minimal change, already-existing systems can become compatible with a reactive framework. All that is needed is for the system to redefine EDCB to be a target-shift providing a safe action rather than a value-shift providing inactionable information.
In order to implement reactive clues safely, system implementations should have the following properties:
Players should assume their touched cards will someday be useful until told otherwise. If that means the card is playable, then it can play. This is called the Good Touch Principle. Unlike many systems using the Good Touch Principle, Reactor does not mind bad touch and likes holding on to cross-hand dupes when possible.
Reactor is designed around safe actions. Most conventional information provides plays and discards.
A player with nothing else to do may discard their leftmost unestablished card. Typically this is the leftmost card that is both unclued and has not been called to play by an earlier clue. At high level there are additional rules for chops.
A player with multiple cards all of whose empathy is playable or trash is expected to play them from right to left. For example, if all the 2s but red 2 have been played and Alice has three 2s clued in her hand, then she is expected to play the rightmost one as red 2 unless a fix is given.
The exceptions to this rule are for clued 1s, which always play left to right, and the focus of a rank direct play clue which is the leftmost card touched.
Reactor is designed to handle gamestates with a high level of safety. Its conventionally weakest clues have careful meaning, optimized for awkward hands. In order to afford such safety, we also allow for many powerful clues.
The first clues to understand are Stable clues. Stable clues are the most basic clues Reactor uses. They usually provide one play or one discard and are described in this section.
This green clue says that the green card is playable. In general, color clues say to play the leftmost card touched.
This 5 clue says to discard the card in slot 2. In general, rank clues provide a discard instruction on the previously-untouched card to the left of the card touched. When multiple cards are touched, the instruction is on the leftmost that could make sense.
Even though this is a rank clue, it is not a referential discard signal! As long as the 1 can be playable, it is just a signal to play that 1. Once all unplayable 2s are accounted for, a 2s rank clue is a play signal, and so on.
When a clue only touches new cards and marks them as globally known trash, it simply means to discard those cards. For example, if all of the 1s have been played then a rank 1 clue is a direct discard clue. Similarly, if yellow 4 has been played and Cathy holds a globally known y5, then a yellow clue to Bob is also a direct discard clue.
Fill-ins occur when an established (previously-clued) card is touched, revealing its identity to be actionable. There are three types of identities to consider:
The first two are largely self-explanatory. For an example of the third, consider the following:
Alice has given Bob a stable color clue. Normally, this would be a sign to play the newly-touched card. Here, because the clue fills in red 4, a card identity Cathy also holds, Bob knows that the clue says something else: it says to discard the red 4.
Reactions occur when Alice gives Cathy a clue, but Bob doesn't yet have a safe action! It's Alice's job to make sure that with every clue she leaves Bob with something to do, so when Alice clues Cathy, Bob knows he has an action too. Here we'll learn slot 1 plays, one of the most common reactions.
Here, Alice has reached over Bob to give a color clue to Cathy. In addition to this clue telling Cathy to play her yellow 1, Alice reaching over Bob provides him an action: Bob should play his slot 1 card.
Alice has reached over Bob to give a rank clue to Cathy. This clue works just like with our Stolen Play Clue example. Alice has told Cathy to discard her slot 2, and the fact that Alice reached over Bob tells him to play his slot 1.
In this example, Alice has reached over Bob to tell Cathy to play p2. That's not playable yet! Well, the fact that Alice has reached over Bob means that Bob will play his slot 1. If Bob's slot 1 is purple 1, then the purple 2 will be playable by the time it's Cathy's turn. Bob will trust that his slot 1 is purple 1 and play it.
Here's an example of how a level 1 Reactor game might play out.
Signal modification is one of the most important concepts to understand for playing Reactor. We have two types: target slides and action flips. These two types can even be combined within a single reaction!
Alice has reached over Bob to tell Cathy to play her slot 3 card, but that card isn't playable. If Bob were to play his slot 1, Cathy could immediately bomb purple 5! But there's another playable card in Cathy's hand: her red 1. Bob should indicate how far this playable is from slot 3 by playing a different card. He plays his slot 2 card (one slot to the right of slot 1) to tell Cathy to play one card to the left of slot 3.
Bob's Reaction | Signal Modification to Cathy |
---|---|
Play slot 1 | Keep the same target |
Play slot 2 | Slide one card to the left |
Play slot 3 | Slide two cards to the left |
Play slot 4 | Slide three cards to the left |
Play slot 5 | Slide four cards to the left |
Alice has jumped over Bob to tell Cathy to discard red 1, but red 1 should be playing, not discarding! Bob needs to flip the signal. Instead of playing his leftmost card, he should discard his leftmost card to tell Cathy to play the card instead.
Target sliding and action flipping can be combined. Playing keeps the same type of action, while discarding flips it. The slot that plays or discards determines how many cards the target slides.
Bob's Reaction | Signal Modification to Cathy |
---|---|
Discard slot 1 | Flip the signal but keep the target |
Discard slot 2 | Flip the signal and slide one card to the left |
Discard slot 3 | Flip the signal and slide two cards to the left |
Discard slot 4 | Flip the signal and slide three cards to the left |
Discard slot 5 | Flip the signal and slide four cards to the left |
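The two tables above can be condensed into a short sketch. This is only an illustration under stated assumptions: the function name `modified_signal`, its parameters, and the `"play"`/`"discard"` labels are hypothetical, a 5-card hand is assumed, and the slide is taken to wrap around the hand when it runs off the left edge.

```python
# Illustrative sketch only; names and labels are hypothetical, not part of Reactor itself.
FLIPPED = {"play": "discard", "discard": "play"}


def modified_signal(initial_action, initial_slot, reaction_action, reaction_slot, hand_size=5):
    """Return the (action, slot) Cathy ends up with after Bob's reaction.

    Slots are 1-indexed from the left. Bob acting on slot k slides the
    target k - 1 cards to the left, wrapping around the hand if needed
    (the wrap-around is an assumption of this sketch). A play reaction
    keeps the action type; a discard reaction flips it.
    """
    if initial_action not in FLIPPED or reaction_action not in FLIPPED:
        raise ValueError("actions must be 'play' or 'discard'")
    if not (1 <= initial_slot <= hand_size and 1 <= reaction_slot <= hand_size):
        raise ValueError("slots must be between 1 and hand_size")
    slide = reaction_slot - 1
    target_slot = (initial_slot - 1 - slide) % hand_size + 1
    if reaction_action == "play":
        action = initial_action
    else:
        action = FLIPPED[initial_action]
    return action, target_slot
```

For instance, `modified_signal("play", 3, "play", 2)` returns `("play", 2)`, matching the target slide example, and `modified_signal("discard", 4, "discard", 1)` returns `("play", 4)`: the signal is flipped but the target is kept.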
With Bob constantly reacting, we have to be careful that everyone is on the same page about how reactive clues work.
When Alice reaches over Bob to give a clue to Cathy, but Cathy holds multiple playables or trash, how does Bob know which of Cathy's actions he should modify the signal onto?
In this example, every slot in Cathy's hand is actionable. She could discard her first or fourth cards from the left, and play her second, third, or fifth cards. All of these are true signals to Cathy, but Alice needs Bob to pick a specific one so that he takes an action that Alice could predict when she gave the clue.
Because Cathy has plays to be informed about, Bob should give her a play signal. Even though playing slot 1 would give a true play signal, he is supposed to signal her green 1: Cathy's leftmost card needing a play-signal.
It's important to keep these considerations in mind when determining the expected action in Cathy's hand:
Here, some cards have prior positive information in Bob's and Cathy's hands. None of that changes Bob's reactions. Unlike systems where unexpectedly playing a touched card is distinguished from an untouched card, in Reactor, every action Bob could take provides a different signal modification.
In this example, to count leftward to Cathy's expected action the team has to wrap around from slot 2, to slot 1, to slot 5, and finally to slot 4.
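The leftward count with wrap-around can also be read in reverse, from Bob's point of view: given the initial signal and the signal he wants Cathy to end up with, which card should he act on? The sketch below is a hypothetical helper (the name `required_reaction` and its parameters are illustrative, and a 5-card hand is assumed), not an official part of the system.

```python
# Illustrative sketch only; the helper name and parameters are hypothetical.
def required_reaction(initial_action, initial_slot, wanted_action, wanted_slot, hand_size=5):
    """Return the (reaction_action, reaction_slot) Bob should use.

    Slots are 1-indexed from the left. The number of leftward slides is
    counted from the initial target to the wanted target, wrapping from
    slot 1 around to the rightmost slot. Bob plays to keep the action
    type and discards to flip it.
    """
    for action in (initial_action, wanted_action):
        if action not in ("play", "discard"):
            raise ValueError("actions must be 'play' or 'discard'")
    slides = (initial_slot - wanted_slot) % hand_size
    reaction_action = "play" if wanted_action == initial_action else "discard"
    return reaction_action, slides + 1
```

With an initial play signal on slot 2 and a wanted play on slot 4, the count goes slot 2, slot 1, slot 5, slot 4, which is three slides, so `required_reaction("play", 2, "play", 4)` returns `("play", 4)`.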
In this example, Alice has been given a color play signal to her blue 1, so she knows exactly what the card is. Then, when she gives a reactive clue to Cathy, she is expecting Bob to tell Cathy to discard her blue 1 rather than to play it.
Sometimes, a reactive color clue does not make sense as a playful positional. This can happen when the receiver has no playable cards or 1-away cards that make sense as targets. In this case, the clue is a discard directive. The reacter should discard a card to indicate to the receiver which card to discard.
The focus and focus count of a discard directive work in the opposite order: the count starts with the second-to-leftmost previously-unclued card in hand.
Here's a replay of how a level-3 Reactor game might play out.
Consider the following situation:
Alice has given Bob a clue, but he already had a card to play whereas Cathy did not! In this scenario, Cathy expects Alice to always be giving her an action. So, she will react to the clue as if she and Bob were swapped.
If Alice had instead given Cathy a clue, nobody needs to react -- Bob already has something to do. So, as in the below example, the clue would simply be stable.
In order to codify this pattern, we talk about the reacter and the receiver.
Determining the reacter relies on different criteria depending on whether Alice is locked or not.
When Alice is locked, her clues mostly work as normal. The main exception: If Bob and Cathy both have safe actions, Alice may give OK stable play clues to either player and may stall with rank clues to either player.
When Alice gives a clue, Bob has a known 1, queued to play via good touch. So, he is the receiver, and Cathy (lacking any information about her cards) is the reacter. Despite their roles, when Alice clues green to Bob, this clue is not reactive - it simply fixes Bob's play to a discard.
The endgame threshold is reached when pace is strictly less than the number of players. Generally we would want more clues to be stable in the endgame since it is less likely multiple players will have actionable targets at the same time. Role reversals do not apply, and additionally,
For example, the pace is currently +2 and all fours have been played. Cathy has an unclued green 5 on slot 1 and Alice clues Cathy three 1s on slots 2, 3, and 5. The initial meaning of this clue is for Cathy to discard green 5, so Bob flips the signal by blind playing slot 1.
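The threshold check itself is simple arithmetic. The sketch below assumes the commonly used definition of pace (current score plus deck size plus number of players minus maximum score); the guide does not spell out that formula, so treat it, along with the hypothetical function names, as an assumption.

```python
# Illustrative sketch only; the pace formula is the commonly used definition
# and is an assumption here, since the guide does not define pace.
def pace(score, deck_size, num_players, max_score):
    """Commonly used pace: how many discards the team can still afford."""
    return score + deck_size + num_players - max_score


def endgame_threshold_reached(pace_value, num_players):
    """The endgame threshold is reached when pace is strictly less than the player count."""
    return pace_value < num_players
```

In a 3-player game at pace +2, `endgame_threshold_reached(2, 3)` is `True`, while at pace +3 it is `False`.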
The first turn of the game, stable rank clues touching slot 2 are starting hand stalls (unless the clue is to 1s).
In easy variants, players have no chops until the team has at most 15 points remaining to score. In a 5 suit win, this means that players do not have chops until the tenth card gets played. The last card automatically chop-moved is the one drawn to replace that tenth card.
So, unless they've been told to play or discard, players should assume that they are locked.
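As a quick check of the arithmetic, here is a minimal sketch of the chopless threshold. The function name and parameters are hypothetical, and the default maximum score of 25 assumes a 5-suit game in which no critical cards have been lost.

```python
# Illustrative sketch only; names and defaults are hypothetical.
def players_have_chops(score, max_score=25, threshold=15):
    """Players have chops once at most `threshold` points remain to score."""
    return max_score - score <= threshold
```

With 9 cards played, 16 points remain and `players_have_chops(9)` is `False`; once the tenth card is played, `players_have_chops(10)` is `True`.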
When multiple cards with equivalent information on them are clued with 1s, they are expected to play left-to-right. Playing out of order then means something special.
To a player with a chop, it moves the chop inward. To a player without a chop, it provides a discard inward as many untouched cards as 1s skipped.
For example, it is the early game and Alice receives a rank clue from Cathy which touches three 1s. Bob has a single 3 clued on slot 2. Playing the left 1 means nothing special, playing the middle 1 calls for Bob to discard slot 1, and playing the right 1 calls for Bob to discard slot 3 - skipping over the clued 3.
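The counting in this example can be sketched as follows, for the case of a player without a chop. The function name and its parameters are hypothetical, and a 5-card hand is assumed; the case of a player with a chop (moving the chop inward) is not modeled.

```python
# Illustrative sketch only; covers the chopless case, names are hypothetical.
def out_of_order_discard_slot(ones_skipped, touched_slots, hand_size=5):
    """Return the slot the next player is called to discard, or None.

    Slots are 1-indexed from the left. Skipping n clued 1s points at the
    n-th untouched card counting from the left, so touched cards are
    stepped over. Skipping nothing means nothing special.
    """
    if ones_skipped <= 0:
        return None
    untouched = [slot for slot in range(1, hand_size + 1) if slot not in touched_slots]
    if ones_skipped > len(untouched):
        return None  # not enough untouched cards to count onto
    return untouched[ones_skipped - 1]
```

With Bob's 3 clued on slot 2, `out_of_order_discard_slot(1, {2})` returns `1` and `out_of_order_discard_slot(2, {2})` returns `3`, matching the example above.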
Discarding a card that had been told to play promises its identity in someone's hand. The location is the rightmost possible card.
When a card is called to play in Alice's hand but the identity of that card is unknown, then Alice is expected to play all of the cards (known or unknown) that were previously called to play which could lead into the new unknown playable card before playing that card.
However, if Alice has multiple globally known playable cards, the expected order of play of these cards follows the below priority table:
If Alice has multiple globally known playables and does not respect this priority table from the perspective of another player, it triggers one or more priority plays on the first such player, who is said to have been "prioritized". The prioritized player is promised that they hold the card which Alice's card leads into, and are also promised that they may repeatedly play the rightmost possible card, starting from clued cards and continuing into the unclued cards, until they find the promised card.
For example, Alice has a globally known red 2 and a green 3 in her hand, both of which are playable. Alice has no other playable cards in her hand. Blue 3 and yellow 3 are also played on the stacks. Cathy has two cards clued in her hand - a yellow 4 clued as 4 on slot 5, an unclued blue 4 on slot 4, and an unclued green 4 on slot 3. Cathy knows that red 2 should play before green 3 since it has a lower rank, so Cathy knows she has been prioritized for green 4. Cathy will first play the yellow 4 from slot 5 since that card could match, and then blind play the remaining cards starting from the right until she finds green 4.
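The order of Cathy's plays in this example can be sketched as below. This is a simplified, omniscient view: the function name, the status labels, and the placeholder cards in slots 1 and 2 are hypothetical, cards drawn during the sequence are not modeled, and the priority bluff exception is ignored.

```python
# Illustrative sketch only; names, labels, and placeholder cards are hypothetical.
def priority_play_sequence(hand, promised):
    """Return the cards the prioritized player plays, in order.

    `hand` lists (card, status) pairs from slot 1 (leftmost) to the
    rightmost slot. Status is "clued_match" for clued cards that could be
    the promised card, "clued_other" for clued cards that cannot, and
    "unclued" otherwise. Clued candidates go first, then unclued cards,
    each group from right to left, stopping at the promised card.
    """
    from_right = list(reversed(hand))
    candidates = [card for card, status in from_right if status == "clued_match"]
    candidates += [card for card, status in from_right if status == "unclued"]
    sequence = []
    for card in candidates:
        sequence.append(card)
        if card == promised:
            break
    return sequence


cathy = [("x1", "unclued"), ("x2", "unclued"), ("g4", "unclued"),
         ("b4", "unclued"), ("y4", "clued_match")]
```

Here `priority_play_sequence(cathy, "g4")` returns `["y4", "b4", "g4"]`: the clued 4 from slot 5 first, then blind plays from the right until green 4 is found.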
Fix clues can be given to a player who will bomb a card from a priority play at some point. This is referred to as a load clue.
The exception is if the prioritized player is Bob and Bob has no clued cards that could match the card being prioritized. In this case, Bob is expected to blind play his rightmost unclued card and nothing else. This is referred to as a priority bluff.
Here, Alice has told Bob to play a yellow card, but she and Bob can both see yellow playable cards in Cathy's hand. This is a help yourself play clue. Bob's card is yellow 3, and he needs to help to make it playable by getting Cathy to play her yellow 1 and 2.
Help yourself play clues should be used as a last resort: perhaps Alice and Bob are both locked, and this will help unlock ASAP. Or, perhaps Bob is locked and he needs to know which of Cathy's playables to prioritize. Or, Cathy's playables could be hard to access with color, but from the elimination notes the clue provides, a rank clue will work.
In this example, g3 has been played already. This 5 clue says that the green card is not green 5, and so if it is good it must be green 4 and is playable. So, it should play. We call this a rank negative fill-in clue; it is not a discard signal.
In this scenario, Bob has seemingly been told to play yellow 1. Why hasn't Alice gotten it reactively while also picking up Cathy's purple 1? Surely the clue means something else: Bob's yellow 1 is actually the other card. He can check: does that make sense? In order to get Bob's slot 4 with Cathy's purple 1, Alice would have to give Cathy a color clue focusing the green 3, which is not possible. So, it makes sense for Alice to get yellow 1 this way.
If a stable color clue requires focus inversion, the interpretation is to play the second-to-leftmost-newly-touched if multiple cards are newly-touched, and to play the rightmost untouched card if only one card is newly-touched or the clue is a color reclue.
If a stable rank clue requires focus inversion, the interpretation is to play the second-to-leftmost referred-to card if multiple cards are newly-touched, and to discard the rightmost unreferred-to card if only one card is newly-touched or the clue is a rank reclue.
Suppose the following sequence of events occur:
Bob's clue to Cathy will require a focus inversion to be interpreted (or be a delayed play through Alice's hand). Alice decided to give Bob a discard signal rather than take her action because she can see that (without a focus inversion) he had no good stable clue to give Cathy.
A reactive color clue has filled in the yellow 4 and newly-touched the yellow 5. What should Bob do? Let's analyze:
This clue is an example where clue focus depends on the identity of Bob's play as well as the slot he plays: if Bob plays any other card from slot 1, Cathy is told to play her yellow 5! Alice needs to be careful that Bob actually holds yellow 3 in slot 1 and not some other playable.
Alice has given Bob a rank clue, filling in red 3 and newly-touching yellow 3. What could this clue mean? Because there is no expected signal to Cathy, Bob has to ask himself a question: "what could happen if I play slot 1?" There are several possibilities:
Since there's no clear expected signal from Bob's point of view, he should trust that one of the signals holds and play his slot 1. Generally speaking, Bob should respond to an ambiguous rank clue by acting on his leftmost slot, providing Cathy a possibly-true signal.
Let's do an example with color clues.
Here, Cathy has no playable cards. She even has no 1-away from playable cards! So, Bob must indicate a discard to Cathy. If Bob discards slot 1, Cathy will mark her discard on her yellow card, which is critical! If Bob discards his second slot, this will shift the discard onto red 3, which could be a dupe. So, that's what Bob should do.
When Alice jumps over Bob and Cathy to clue Donald, she is providing all three of Bob, Cathy, and Donald actions. Bob and Cathy's reactions combine to provide Donald his potential signal shift and/or action flip. We also still have role reversal. We'll need a reacter, a receiver, and one other role: the interferer?
Reactor should be strong with minimal changes. As a baseline, I suggest the following:
In this section we provide a reference to clarify the guide. Warning: the details get messy.
For any potential signal, there is an expected stable clue to provide it. When one clue potentially provides multiple signals, we use a signal priority to determine which is called for.
There are also some unexpected ways to signal actions combining empathy and the Good Touch Principle:
When a clue touches multiple cards, it will (often) be the expected clue to signal multiple actions. This is a method to determine what expected action(s) the clue provides.
In order, from highest priority to lowest:
Negative fill-ins and rank direct play signals generally take priority over (revoke) referential discard signals. Negative fill-ins do not revoke color play signals.
This configuration optimizes to match reactive initial meanings with stable clue meanings, frequently enable slot-1 finesses, and maintain full signalling diversity.
The most common, highest priority initial reactive signals are the same as for stable clues. They diverge in order to maintain full signaling diversity including considering initial signals which would be nonsensical for stable clues to provide.
Important differences to remember:
When a clue could provide multiple signals due to touching multiple cards, the following signal priority determines the initial reactive signal:
Remaining ties are broken by leftmost.
A card in the receiver's hand is playable if its identity is not already globally gotten and will play successfully once the receiver exhausts their play queue.
To determine whether a clue may be stable, there are two cases to consider based on whether Alice has an action.
When Alice is locked, she may give a stable clue to the next player without a safe action. Her clues to the other player are reactive. If Bob and Cathy both have safe actions, Alice may give a stable clue to either of them.
When Alice is unlocked, she may give a stable clue to the next player to run out of plays. Her clues to the other player are reactive.
When a play signal is given, players may have other cards to play first. Cards should play in the order they are signaled. Discards are not generally expected to occur before plays.
Reactions occur in response to clues. If an action is not in response to a clue, it is not a reaction.
A play is a reaction if it was not globally expected. It is not a reaction if it was previously signalled, was good touch playable, was due to global elimination notes, or was caused by other expected global context.
A discard is a reaction in response to a clue if it was not globally expected. It is also a reaction if Alice is unlocked and the reacter discards, even if the discard was globally expected. Otherwise, it is not a reaction.
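These rules reduce to a pair of small predicates. The sketch below is one possible encoding: the function and parameter names are hypothetical, and "globally expected" is taken as a single boolean input covering prior signals, good touch, elimination notes, and other global context.

```python
# Illustrative sketch only; function and parameter names are hypothetical.
def play_is_reaction(in_response_to_clue, globally_expected):
    """A play is a reaction only if it answers a clue and was not globally expected."""
    return in_response_to_clue and not globally_expected


def discard_is_reaction(in_response_to_clue, globally_expected,
                        clue_giver_unlocked, discarder_is_reacter):
    """A discard answering a clue is a reaction if it was not globally expected,
    or if the clue giver is unlocked and the reacter is the one discarding."""
    if not in_response_to_clue:
        return False
    if not globally_expected:
        return True
    return clue_giver_unlocked and discarder_is_reacter
```

For example, `discard_is_reaction(True, True, True, True)` is `True` even though the discard was globally expected, while `discard_is_reaction(True, True, False, True)` is `False` because Alice is locked.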
There's a tradeoff between direct and referential play signals. With referential signals, we can have increased clue predictability and have empathy on cards remaining in hand. With direct play signals, we can more accurately decide whether it is better to play vs. give a clue. Direct signals also enable delayed play clues to speed up unlocks. Finally, direct play clues allow players to more easily decide when to focus invert.
The pithy answer: because we can. To win Hanabi as often as possible is to optimize all our actions: plays and discards, and if we're efficient enough with clues to not need a default discard, then why have one?
The real reason is that when a player is locked, it's nice for the others to not have to say it and instead be able to expect them to give a clue. The downside is that when the otherwise-chop card is discardable, we can lose tempo and clue count.
One promising alternative to a chopless early game is semi-spookiness.
Direct play signals with color can be really helpful. Knowing what card is playing helps the player to accurately decide whether to play or give a clue. Also, direct play signals open up space for some fancy finesse conventions where the focus of a playful positional depends on the empathy it provides.
On the other hand, knowing the identity of a card that is told to discard (and generally knowing rank info) is somewhat less useful. While this is information that could be used to lock in sneaky ways, such as with known sacrifice, the current implementation provides more information to cards remaining in hand and allows for trick finesses.