Signatures for Go game records
One of the most frustrating tasks in using a collection of game records
is locating any particular game; and correspondingly, in maintaining a
collection, eliminating duplications when adding in new material. Except
for title matches in the final rounds of major titles, there is no "coordinate
system" that is commonly used to identify games.
The difficulty of locating a particular game in machine readable form
depends on what information you have available. This brief paper descibes
a simple and effective technique, suitable to find a game when printed
diagram of the game is available.
It is is also useful for detecting duplicates when integrating new material
into a collection.
Theory of Game Signatures
This method is based on the observation thatany set of unrelated
moves from the middle of a game is likely to be unique. If we assume that
all intersections are equally likely to be played (not true, but it's the
right order-of-magnitude), then there are 361*360 = 129,960 possible combinations.
Any three moves gives 46,655,640 possibilities, and so on.
I use two groups of three moves, for a raw probability on the order
of 1 in 10^15; in other words, any two game records with six middle-game
moves the same are extremely likely to be the same game.
In practice, with the modest sized collections that are available today,
any two moves are likely to be unique, so to verify if a particular
game is present in an electronic collection, you need only locate two moves
a printed record.
Game Signatures in practice
Here is the precise specification I use for game signatures.
-
Signatures are represented the same as moves are in
Smart
Go Board format, with missing elements marked by ??. Elements can be
missing because the game doesn't contain that many moves. One of the sample
games is short.
-
Signature-A consists of moves 20,40,60
-
Signature-B consists of moves 31,51,71
-
In handicap games, White's first move is number 1. (I guess that makes
black's first move "move zero".) In any case, any handicap or other fixed
setup is considered move zero. One of the sample games below is a handicap
game.
-
If the game record contains variations, then the signatures are based on
the first variation, which is presumably the true line of play. One of
the sample games contains variations.
Sample Data
Here's some sample data; three games and corresponding keys. I've printed
the keys as plain lisp forms. It might make sense to standardize on printing
the key in a form that would be acceptable as smart go board tags, but
no standard representation has been settled.
Game kage-5-stone-1
and below, the signatures
:SIGNATURE-A "qoqghn"
:SIGNATURE-B "phfqln"
Game 1970-rin-kajiwara
and below, the signatures
:SIGNATURE-A "qqol??"
:SIGNATURE-B "rpip??"
Game YI203
and below, the signatures
:SIGNATURE-A "rcqnnk"
:SIGNATURE-B "bfoojl"
So What?
It is my hope that this scheme will be widely adopted to help build, maintain
and use game databases. There are some hopeful signs. Jan van der Steen
has incorporated signature-based retrieval into his game database (which
is, incidentally, the largest and best organized database available to
the public). Try
it!. (Or maybe you would prefer to read this brief tutorial
first.
comments/suggestions to:
ddyer@real-me.net