The Computer Stratego World Championships is something of which I’ve been aware for a while now (since before its second event in 2008 in fact), but this year I’ve made my mind up to actually compete. Contrary to what its name suggests, it is a rather small competition (though indeed international and the only one of it sort), with only 5 individuals taking part last year. (Now I slightly regret not entering then while I at least had a statistically better chance, despite the lack of time and even experience.) This year there may be as many as twice as that number, and it appears as if there is even some academic interest from various places as well. Also, it seems that I will be submitting the first bot (computer player) written in .NET, thanks to a web service interface provided by Sven, one of the current and last year’s participants. (The standard interfaces provided with the SDK are only for Java and C++, requiring some form of IPC to get the client working with other languages.) Given that I have almost a full year to write, test, and improve upon my submission, I very much hope to submit a competitive bot by the time of the competition in December.
One of the great things about this tournament is the opportunity to test the bot you have coded at any time within a sizable online Stratego community at Metaforge (the host of the actual event), whose admin was nice enough to provide all participants with free testing accounts. If I’m lucky I may be able to get some useful feedback from some of the experts. The main method of testing and learning (perhaps automated learning if I’m feeling adventurous) will be offline, running marginally different versions of the bot against each other.
To me this competition is especially attractive because it deals with a largely unstudied area of game AI (hence the recent attraction of academic interest). Although it could be in vain, I’m hoping that my lack of formal education in this area may not serve as too big a disadvantage to me. It also means I can experiment with some really interesting theories without feeling I’m wasting time or reinventing the wheel. Until now, the most complex of the very few AI-related projects that I’ve worked on are a basic (though surprisingly skillful!) four-in-a-row player and an automatic solver for the Herbert challenge (part of Microsoft Imagine Cup in previous years – see my Herbert.NET project for more information). In fact, the AI that I ended up coding for a four-in-a-row game was quite overkill, since I implemented it primarily based on descriptions out of articles for advanced chess AI – now I hope to actually put it to good use.
Of course, programming a decent Stratego player is going to require quite unique algorithms and tactics. I will try and summarise a few of my theories here in brief, but don’t expect me to give too much away. (Admittedly, my ideas are currently little more developed than what I write here.) My plan is essentially to use something based on the alpha-beta search algorithm for traversing the tree of all possible future moves, almost identically to chess AIs. A transposition table and a few other traditional methods should come in handy, but that’s about where it ends and the statistical nature of the game requires its own approach. Clearly, considering an unknown opponent piece to be anything would allow the tree to branch far too quickly for any useful results, and would in many situations be near-pointless anyway. Very basically, my proposed solution is to use Bayesian probability and inference to calculate the likelihood of certain outcomes and thus assess the risk of different moves and strategies. Maintaining a list of probabilities of each opponent piece being of a certain type (for all nodes within the tree) is therefore a fundamental requirement, though with all likelihood not enough. In addition, it is perhaps equally important to keep track of what the opponent might guess its own pieces to be. Now, simply choosing the move with the statistically best expected outcome would however produce a very poor player, since a good (human or computer) player does not base their moves purely on probabilistic calculations. For this reason, the program must have some understanding of the concept of bluffing among other things; it is generally not wise to attack an unknown piece with the potential of a large loss despite it being relatively likely that that it will defeat it anyway. In summary, I expect the design of the cost function for the search algorithm will be crucial in that it needs to account for movement behaviour of pieces as well as macro-decisions made by the opponent, apart from the usual point counts and piece positions. I cannot yet tell whether it will be more effective to simulate the human decision-making process when playing Stratego or that of existing AIs for board games (or quite possibly a hybrid of the two).
So that concludes the first of my posts on the subject. Look out for a follow-up post as I begin serious work on the bot and get some interesting results to show. For the meanwhile there is a rather long list of post ideas to keep me going.
Update
Imer Satz, the creator of Probe (the reigning AI of the Stratego Computer World Championships), has kindly pointed out that there does in fact exist a host of files (roughly 75,000) of recorded classic Stratego games. The are available for download from the gravon.de website. Imer has stated that he might be willing to put up an archive of all these games; I will update the post again if there is any success in this respect.
Yeah… so if you need to hard-code bluffing tactics into a bot, then you’re using the wrong cost function.
What I *think* you should be trying to do is create a bot that always selects the move which maximises P(winning | move selected, current state of the board). If you create a program which plays a lot of games (either at random or with the aim of losing, against semi-skilled players), and then does a post-mortem from both the winning and losing perspective in order to determine statistically what P(winning | move selected, current state of the board) is (A tactic that’s approximated by Edward de Bono in “The Five-Day Course in Thinking” as a way for humans to learn to play new games).
The most difficult thing to do will then be to determine what representation should be used for “state of the board”, and the resulting probability function: A good choice will generalise well, and allow the system to learn quickly with less training data. A poor choice will make learning impossible.
Now I don’t know enough about the game to know what features make a good “state of the board” representation, but here’s how I’d probably try to proceed:
If you can get your hands on a few thousand human-on-human games (or computer on computer games) and then select random subsets of the data to plot scatter graphs of p(winning | value of metric) for a few metrics, then you might start to see some patterns emerging.
A few example metrics might be:
* p(killing enemy piece next move | next move)
* p(friendly piece dying next move | next move)
* ratio thereof
* expected cost to you of the next move (p(piece dying) * value of piece)
* expected cost to opponent of the next move
* ratio thereof
* what happens if you assign different values to different pieces?
The ratio of one of the above metrics for the move chosen against the average of the same metric for all moves not chosen.
(note that all of the above metrics relate to the value of the move you’re about to pick for each turn of the game. They should all be measurable before the move is played.)
You could then investigate whether the scatter graphs look different if you have different ranges of:
* Number of friendly pieces on the board
* Number of enemy pieces on the board
* Total number of dead pieces
* Number of moves made
* Centre of mass of your pieces, and the enemy’s
This would let you select different tactics for different stages of the game, if this is appropriate.
Then, when you’re designing your bot, try to make it always pick the move which maximises p(winning | whatever metric you’re using). Try building a bot that only uses one metric to begin with, since p(winning | metric1, metric2) != mean(p(winning | metric1), p(winning | metric2))
When you have the framework in place, it should be possible to plug in new metrics almost at random, and start to get an idea of how well they will perform.
I probably won’t be on MSN much, since I need to work, so discuss stuff here/in email.
Be sure to send me any scatter graphs you get out.
That’s an interesting approach. Of course, I already knew you would propose something learning-based… My initial reaction was that it will require far too much training simply because of the huge number of possible states. You really can’t ignore the macroscopic state of the board in such a game. (Notice as well that the state doesn’t just consist of the positions of the pieces on board, but also a list of those which have been revealed to the opponent.) As you seem to suggest, the state does need to be unique for each combination of piece positions/histories. I’ll see how things go, but at the moment I think automated learning is only going to be part of the solution here. Your idea of pluggable metrics does rather intruige me too, and isn’t totally tied in with the other suggestions, so this may be the first idea I put to test.
Now unfortunately, unlike chess (which has the PGN file format), there aren’t thousands of high quality Stratego games available for download. Yet there is still the possibility of making some rough estimates of the probabilities/parameters, and then playing different versions of the AI against each other (or even putting the bot online to play against people once it’s developed enough).
One thing for which I certainly plan on using learning is adjusting the parameter values/piece costs, which should be fairly trivial really. (I knew I was writing Darwin.NET for some reason…) Learning the cost of losing certain pieces when only certain numbers of others remain is aditionally something I would consider implementing as a start.
Ok, I’ll definitely let you know how things go with this. Don’t expect too much progress on the interesting bits yet however, as I still have the matters of board representation (I’ve pretty much decided on bitboards) and move generation to resolve (the crazy N-squares rule may take a bit of work). We’ll likely be able to talk about this online before I get around to coding it.
Also, if you fancy we could even play a few games online… maybe you will be able to develop or alter your ideas somewhat after getting a feel for the kind of decisions/strategy involved. (The impression that although you know basically how the game works, you have never actually played the game but just skimmed over the Wikipedia article.)
The pluggable metrics thing is essentially the only practical way to implement a learning-based bot.
“Metric” == “Feature” == “State of the game”
The reason a learning-based bot which just uses the bitboards representation as the “state of the game” is that it doesn’t generalise *at all*. This is why the evaluation of *useful* metrics is important.
Task number 1 is *definitely* to get yourself a data set though. Doesn’t matter how you get it, or how high quality it is, as long as it’s large (try about 1000 randomly played games, and use them to evaluate some simple metrics: see if the statistics confirm your intuition. If there’s a lot of noise in your estimate of p(winning | metric), then try doubling the amount of data you collect a few times
).
Hey Alex, how’s it coming along? We’ll be excited to have you enter the competition this year.
Probe is written in Delphi, and Imer is able to use the NativeBridge stuff included in the Bot SDK – as long as you can have your .Net stuff compile down to a true native DLL rather than bytecode, I would think it would work. In any case, maybe you or Sven could donate whatever IPC stuff you have cooked up to make it easier for other entrants?
Hey Chip,
I haven’t done a great deal of work on it since initially setting things up and considering my approach, but I very much plan to spend a large amount of time on it during summer (after exams finish in a few weeks), and I look forward to testing on the Metaforge servers during that period.
Regarding the .NET IPC stuff, I haven’t actually heard from Sven in a while. I shall email him and find out what the progress is – otherwise I will have to create something of my own, though it shouldn’t be too difficult… Either way, Sven or I will certainly release the code for public use as soon as possible.
Hi, guys. This is a great thread. It’s high time for a more formal approach to Stratego AI.
You may be interested to know there is a database of about 75,000 classic Stratego games on the gravon site. You could process the games to gain some of the information you want. I downloaded them into a single folder and analyzed them for the new version of Probe (available now, by the way). If you like, I could post the archive for you to download.
Hi Imer,
I had no idea that such a site existed. This should be of immense use to many of us. Indeed, I do plan on applying some structured form of machine learning to my bot, so this would be perfect. If you could upload the archive, that would be much appreciated! I would be glad to edit the post to link to it.