At a packed event in San Francisco, OpenAI Five (OpenAI’s autonomous system) competed against Europe’s OG, an esports collective that became the first to win four Dota Major Championships in 2017, in a series of rounds commentated on by players William “Blitz” Lee, Austin “Capitalist” Walsh, Owen “ODPixel” Davies, Kevin “Purge” Godec, and Jorien “Sheever” van der Heijden. The stakes were somewhat higher than in OpenAI’s previous matches; in a best-of-three match at Valve’s The International 2018 esports competition, two teams of Chinese pro gamers overcame OpenAI Five.
This time around, the bots won the first two matches of three.
The rules were the same as those last summer, at The International: the bots didn’t have invulnerable couriers (i.e., NPCs that deliver items to heroes), which in earlier rounds they used to ferry a stream of healing potions to their player characters. OpenAI also played on the latest Dota 2 patch, and benefited both from a “more fluid” process and significantly more training. According to OpenAI cofounder and chairman Greg Brockman, it now has a collective 45,000 years of Dota 2 gameplay experience under its belt.
Historically, an absence of long-term planning has been OpenAI Five’s Achilles’ heel: it emphasized short-term payoffs as opposed to long-term rewards. Dota 2 games generally last 30 to 45 minutes, and OpenAI says its AI agents have a “reward half-life” (the length of time they can wait for future payoffs) of 14 minutes. Another of the bot’s disadvantages: it doesn’t learn between games.
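One way to read the 14-minute “reward half-life” is as a discount factor in reinforcement learning. The sketch below assumes plain exponential discounting, where a reward that arrives one half-life in the future counts half as much as one received now. The decision rate (`STEPS_PER_SECOND`) is an assumption made for illustration and is not a figure from OpenAI.

```python
import math


def gamma_from_half_life(half_life_seconds: float, steps_per_second: float) -> float:
    """Per-step discount factor whose weight halves after half_life_seconds."""
    steps = half_life_seconds * steps_per_second
    return 0.5 ** (1.0 / steps)


def half_life_from_gamma(gamma: float, steps_per_second: float) -> float:
    """Inverse mapping: seconds until a discounted reward is worth half."""
    return math.log(0.5) / math.log(gamma) / steps_per_second


# Assumption for illustration: the agent makes 7.5 decisions per second.
STEPS_PER_SECOND = 7.5

gamma = gamma_from_half_life(14 * 60, STEPS_PER_SECOND)
recovered_half_life = half_life_from_gamma(gamma, STEPS_PER_SECOND)
print(f"gamma per step: {gamma:.6f}")
print(f"half-life in minutes: {recovered_half_life / 60:.2f}")
```

The longer the half-life, the closer the per-step discount sits to 1, which is why small changes in that factor translate into large changes in how far ahead an agent effectively plans.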
In today’s matches, OpenAI preferred to defend its towers, although it occasionally brought over a hero to strike proactively. It made a few misplays, like directing one of its player characters, Death Prophet, to use its ultimate skill against an enemy hero, Riki, after which the latter went invisible and ran off. But it demonstrated a knack for “jungling,” or killing creatures away from the main action (despite the fact that it strayed away from resource gathering, attacking towers, and getting objectives). And it directed heroes to walk away in situations where damage-over-time was likely to kill them, using unusual items and constantly flickering in and out of invisibility to avoid being killed.
“OG played extremely weirdly the entire time, and we saw sometimes it worked, and sometimes it really, really didn’t,” RAEng research fellow Mike Cook wrote on Twitter. “I’m not sure what to make of the new bots … They’re clearly very different … But I also feel like OG’s draft and play was very different to what we’ve seen from human teams facing them before.”
How OpenAI tackled Dota 2
Valve’s Dota 2 (a follow-up to Defense of the Ancients, or DotA, a community-created mod for Blizzard’s Warcraft III: Reign of Chaos) is what’s known as a multiplayer online battle arena, or MOBA. Two groups of five players, each of which is given a base to occupy and defend, attempt to destroy a structure, the Ancient, at the opposing team’s base. Each player character (hero) has a distinct set of abilities and collects experience points and items, which unlock new attacks and defensive moves.
It’s more complex than it sounds. The average match contains 80,000 individual frames, during which each character can perform dozens of 170,000 possible actions. Heroes on the board finish an average of 10,000 moves each frame, contributing to the game’s more than 20,000 total dimensions. And each of those heroes — of which there are over 100 — can pick up or purchase hundreds of in-game items.
OpenAI Five isn’t able to handle the full game yet: it can only play 18 out of the 115 different heroes, and it can’t use abilities like summons and illusions. And in a somewhat controversial design decision, OpenAI’s engineers opted not to have it read pixels from the game to retrieve information, as human players do. It uses Dota 2’s bot API instead, obviating the need for it to search the map to check where its team might be, check if a spell is ready, or estimate an enemy’s health or distance.
That said, it’s able to draft a team entirely on its own that takes into account the opposing side’s choices.
OpenAI’s been chipping away at the Dota 2 dilemma for a while now, and demoed an early iteration of its MOBA-playing bot — one which beat one of the world’s top players, Danil “Dendi” Ishutin, in a 1-on-1 match — in August 2017. It kicked things up a notch in June with OpenAI Five, an improved system capable of playing five-on-five matches that managed to beat a team of OpenAI employees, a team of audience members, a Valve employee team, an amateur team, and a semi-pro team.
Perhaps more impressively, in early August, it won two out of three matches against a team ranked in the 99.95th percentile. In the first of the two matches, OpenAI Five started and finished strongly, preventing its human opponents from destroying any of its defensive towers. The second match was a tad less one-sided (the humans took out one of OpenAI Five’s towers), but the AI emerged victorious nonetheless. Only in the third match did the human players eke out a victory.
OpenAI Five consists of five single-layer, 1,024-unit long short-term memory (LSTM) networks, a type of recurrent neural network that can “remember” values over an arbitrary length of time, each assigned to a single hero. The networks are trained using a deep reinforcement learning model that incentivizes their self-improvement with rewards. In OpenAI Five’s case, those rewards are based on kills, deaths, assists, last hits, net worth, and other stats that track progress in Dota 2.
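A reward built from game statistics can be pictured as a weighted sum of how much each stat changed since the previous step. The sketch below is only an illustration of that idea: the `HeroStats` fields mirror the stats named above, but the class, the function, and every weight in `REWARD_WEIGHTS` are hypothetical and are not OpenAI’s actual values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeroStats:
    """Snapshot of one hero's progress stats (hypothetical structure)."""
    kills: int = 0
    deaths: int = 0
    assists: int = 0
    last_hits: int = 0
    net_worth: float = 0.0


# Hypothetical weights, chosen only for illustration.
REWARD_WEIGHTS = {
    "kills": 1.0,
    "deaths": -1.0,  # dying is penalized
    "assists": 0.5,
    "last_hits": 0.2,
    "net_worth": 0.005,
}


def shaped_reward(previous: HeroStats, current: HeroStats) -> float:
    """Weighted sum of stat changes between two consecutive snapshots."""
    return sum(
        weight * (getattr(current, name) - getattr(previous, name))
        for name, weight in REWARD_WEIGHTS.items()
    )


before = HeroStats(kills=1, deaths=0, assists=2, last_hits=30, net_worth=4000.0)
after = HeroStats(kills=2, deaths=1, assists=2, last_hits=33, net_worth=4200.0)
reward = shaped_reward(before, after)
print(f"reward for this step: {reward:.2f}")
```

Rewarding changes rather than totals means the agent is paid for progress made on each step, not for progress it had already banked.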
OpenAI’s training framework, Rapid, consists of two parts: a set of rollout workers that run a copy of Dota 2 and an LSTM network, and optimizer nodes that perform synchronous gradient descent (an essential step in machine learning) across a fleet of graphics cards. As the rollout workers gain experience, they inform the optimizer nodes, and another set of workers compares the trained LSTM networks (agents) to reference agents.
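“Synchronous” here means that every optimizer contributes a gradient before a single shared update is applied. The toy sketch below shows that pattern on a one-parameter regression problem. It is not Rapid’s implementation; the data, function names, and learning rate are all made up for illustration.

```python
from typing import List, Tuple

Batch = List[Tuple[float, float]]  # (x, y) pairs held by one worker


def local_gradient(w: float, batch: Batch) -> float:
    """Gradient of the mean squared error of y = w * x on one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def synchronous_step(w: float, batches: List[Batch], lr: float) -> float:
    """Collect a gradient from every worker, average them, update once."""
    grads = [local_gradient(w, batch) for batch in batches]
    return w - lr * sum(grads) / len(grads)


# Two hypothetical workers, each holding data generated by y = 3 * x.
worker_batches = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0), (4.0, 12.0)],
]

w = 0.0
for _ in range(200):
    w = synchronous_step(w, worker_batches, lr=0.05)
print(f"learned weight: {w:.4f}")
```

Because the update waits for all gradients, every worker always computes against the same copy of the parameters, at the cost of moving only as fast as the slowest worker.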
To self-improve, OpenAI Five plays 180 years’ worth of games every day (80 percent against itself and 20 percent against past selves) on 256 Nvidia Tesla P100 graphics cards and 128,000 processor cores on Google Cloud Platform. Months ago, when OpenAI kicked off training, the AI-controlled Dota 2 heroes “walked aimlessly around the map.” But it wasn’t long before the AI mastered basics like lane defense and farming, and soon after nailed advanced strategies like rotating heroes around the map and stealing items from opponents.
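The 80/20 split between the current agent and its past selves amounts to a simple sampling rule when choosing an opponent for each game. The sketch below assumes past versions are picked uniformly at random, which the article does not specify; the function and version names are hypothetical.

```python
import random
from typing import List


def pick_opponent(
    current: str,
    past_versions: List[str],
    rng: random.Random,
    self_play_fraction: float = 0.8,
) -> str:
    """Return the current agent most of the time, otherwise an older snapshot."""
    if not past_versions or rng.random() < self_play_fraction:
        return current
    # Assumption: older snapshots are sampled uniformly.
    return rng.choice(past_versions)


rng = random.Random(0)  # fixed seed so the run is repeatable
past = ["v1", "v2", "v3"]
picks = [pick_opponent("latest", past, rng) for _ in range(10_000)]
share_latest = picks.count("latest") / len(picks)
print(f"share of games against the latest agent: {share_latest:.3f}")
```

Mixing in older opponents is a common guard against an agent overfitting to its own most recent habits and forgetting how to beat strategies it had already learned to counter.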
“People used to think that this kind of thing was impossible using today’s deep learning,” Brockman told VentureBeat in an interview last year. “But it turns out that these networks [are] able to play at the professional level in terms of some of the strategies they discover … and really do some long-term planning. The shocking thing to me is that it’s using algorithms that are already here, that we already have, that people said were flawed in very specific ways.”
Fully trained OpenAI Five agents are surprisingly sophisticated. Despite being unable to communicate with each other (a “team spirit” hyperparameter value determines how much or how little each agent prioritizes individual rewards over the team’s reward), they’re masters of projectile avoidance and experience points sharing, and even of advanced tactics like “creep blocking,” in which a hero physically blocks the path of a hostile creep (a basic unit in Dota 2) to slow its progress.
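The “team spirit” setting can be sketched as a blend between each hero’s own reward and the team average. The version below assumes a linear interpolation controlled by a value between 0 (fully selfish) and 1 (fully shared); the article does not give the exact formula, so treat the function as an illustration.

```python
from typing import List


def blend_rewards(individual: List[float], team_spirit: float) -> List[float]:
    """Mix each hero's reward with the team mean (assumed linear blend)."""
    if not 0.0 <= team_spirit <= 1.0:
        raise ValueError("team_spirit must be between 0 and 1")
    team_mean = sum(individual) / len(individual)
    return [
        (1.0 - team_spirit) * reward + team_spirit * team_mean
        for reward in individual
    ]


# Hypothetical rewards for five heroes in one step.
rewards = [4.0, 0.0, 0.0, 0.0, 1.0]

selfish = blend_rewards(rewards, team_spirit=0.0)
balanced = blend_rewards(rewards, team_spirit=0.5)
shared = blend_rewards(rewards, team_spirit=1.0)
print(selfish, balanced, shared)
```

With a blend like this, the team’s total reward stays the same at every setting; only its distribution among the five heroes changes, which is what nudges agents toward cooperation without any explicit communication channel.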
Dota 2 players are already studying OpenAI Five’s styles of play, some of which are surprisingly creative. (In one match, the bots adopted a mechanic which allowed their heroes to quickly recharge a certain weapon by staying out of range of enemies.) As for OpenAI, it’s applying some of the insights gleaned from OpenAI Five to other fields: last February, it released Hindsight Experience Replay (HER), an open source algorithm that effectively helps robots to learn from failure, and later in the year published research on a self-learning robotics system that can manipulate objects with humanlike dexterity.
Brockman said that while today’s match was the final public demonstration, OpenAI will “continue to work” on OpenAI Five.
“The beauty of this technology is that it doesn’t even know it’s [playing] Dota … It’s about letting people connect the strange, exotic but still very tangible intelligences that are created … modern AI technology,” he said. “Games have really been the benchmark [in AI research] … These complex strategy games are the milestone that we … have all been working towards because they start to capture aspects of the real world.”