| 幸福大叔 |
2022-07-09 23:25 |
How to get better at video games, according to babies
In 2013, a group of researchers wanted to create an AI system that could beat every Atari game. They developed a system called Deep Q Networks (DQN) and less than two years later, it was superhuman. But there was one notable exception. When playing Montezuma's Revenge, DQN couldn't score a single point. What was it that made this game so vexingly difficult for AI? Brian Christian investigates. [Directed by Gavin Edwards, Movult, narrated by Jack Cutmore-Scott, music by Salil Bhayani, cAMP Studio].
518,236 views 2021~~2022| Brian Christian • TED-Ed
Brian Christian Educator
In 2013, a group of researchers at DeepMind in London had set their sights on a grand challenge. They wanted to create an AI system that could beat, not just a single Atari game, but every Atari game. They developed a system they called Deep Q Networks, or DQN, and less than two years later, it was superhuman. DQN was getting scores 13 times better than professional human games testers at “Breakout,” 17 times better at “Boxing,” and 25 times better at “Video Pinball.”
00:48 But there was one notable, and glaring, exception. When playing “Montezuma’s Revenge” DQN couldn’t score a single point, even after playing for weeks. What was it that made this particular game so vexingly difficult for AI? And what would it take to solve it?
01:10 Spoiler alert: babies. We’ll come back to that in a minute.
01:16 Playing Atari games with AI involves what’s called reinforcement learning, where the system is designed to maximize some kind of numerical rewards. In this case, those rewards were simply the game's points. This underlying goal drives the system to learn which buttons to press and when to press them to get the most points. Some systems use model-based approaches, where they have a model of the environment that they can use to predict what will happen next once they take a certain action. DQN, however, is model free. Instead of explicitly modeling its environment, it just learns to predict, based on the images on screen, how many future points it can expect to earn by pressing different buttons. For instance, “if the ball is here and I move left, more points, but if I move right, no more points.”
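As a rough illustration of the model-free idea described above, the sketch below uses tabular Q-learning, a simpler relative of DQN. DQN itself estimates these values with a neural network reading screen images; the lookup table, the state names (`ball_here`, `after_hit`, `missed`), and the `alpha` and `gamma` values are illustrative assumptions, not details from the talk.

```python
from collections import defaultdict

ACTIONS = ["left", "right"]


def q_learning_update(q, state, action, reward, next_state,
                      alpha=0.1, gamma=0.99):
    """Nudge the estimate for (state, action) toward
    reward + discounted best estimate from the next state.
    No model of the game is used, only observed outcomes."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Every estimate starts at zero: the learner knows nothing yet.
q = defaultdict(float)

# Hypothetical experience: moving left hits the ball and earns a
# point, moving right misses and earns nothing.
for _ in range(50):
    q_learning_update(q, "ball_here", "left", 1.0, "after_hit")
    q_learning_update(q, "ball_here", "right", 0.0, "missed")

# The learned estimates now favor moving left.
best_action = max(ACTIONS, key=lambda a: q[("ball_here", a)])
```

After these updates the estimate for moving left is close to 1 and the estimate for moving right is still 0, which is the "move left, more points" pattern from the example.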
02:12 But learning these connections requires a lot of trial and error. The DQN system would start by mashing buttons randomly, and then slowly piece together which buttons to mash when in order to maximize its score. But in playing “Montezuma’s Revenge,” this approach of random button-mashing fell flat on its face. A player would have to perform this entire sequence just to score their first points at the very end. A mistake? Game over. So how could DQN even know it was on the right track?
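To give a feel for why random button-mashing stalls when points only arrive at the end of a long sequence, here is a back-of-the-envelope calculation. It assumes, purely for illustration, that exactly one button is correct at each step and that presses are chosen uniformly at random; the numbers 4 and 10 are made up and are not measurements of "Montezuma's Revenge."

```python
def chance_of_random_success(num_buttons, sequence_length):
    """Probability that uniformly random presses reproduce one
    specific sequence of the given length."""
    return (1.0 / num_buttons) ** sequence_length


# Hypothetical numbers: 4 possible buttons, 10 steps that must
# all be right before the first point is scored.
p = chance_of_random_success(4, 10)

# Average number of attempts before the first success.
expected_attempts = 1 / p
```

Under these assumptions the learner would need about a million attempts on average before seeing any reward at all, and until then every attempt looks equally pointless.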
02:47 This is where babies come in. In studies, infants consistently look longer at pictures they haven’t seen before than ones they have. There just seems to be something intrinsically rewarding about novelty. This behavior has been essential in understanding the infant mind. It also turned out to be the secret to beating “Montezuma’s Revenge.”
03:12 The DeepMind researchers worked out an ingenious way to plug this preference for novelty into reinforcement learning. They made it so that unusual or new images appearing on the screen were every bit as rewarding as real in-game points. Suddenly, DQN was behaving totally differently from before. It wanted to explore the room it was in, to grab the key and escape through the locked door— not because it was worth 100 points, but for the same reason we would: to see what was on the other side. With this new drive, DQN not only managed to grab that first key— it explored all the way through 15 of the temple’s 24 chambers.
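One simple way to picture a novelty reward is a bonus that shrinks each time the same observation is seen again. The sketch below is a count-based stand-in, not the DeepMind method; the class name `NoveltyBonus`, the `1 / sqrt(count)` formula, and the room labels are assumptions chosen for illustration. It also assumes observations can be used as dictionary keys, which raw screen images generally cannot without further processing.

```python
import math
from collections import Counter


class NoveltyBonus:
    """Pays a bonus that is largest for never-seen observations
    and shrinks as the same observation is visited again."""

    def __init__(self, scale=1.0):
        self.scale = scale
        self.visits = Counter()

    def __call__(self, observation):
        self.visits[observation] += 1
        return self.scale / math.sqrt(self.visits[observation])


def shaped_reward(game_points, observation, bonus):
    """Total reward = in-game points + novelty bonus."""
    return game_points + bonus(observation)


bonus = NoveltyBonus()
first_visit = shaped_reward(0, "room_1", bonus)   # new room, full bonus
second_visit = shaped_reward(0, "room_1", bonus)  # same room, smaller bonus
new_room = shaped_reward(0, "room_2", bonus)      # new room, full bonus again
```

With a reward like this, walking into an unseen room pays off even when the game awards no points, so the learner gets a signal long before it reaches the first key.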
03:58 But emphasizing novelty-based rewards can sometimes create more problems than it solves. A novelty-seeking system that’s played a game too long will eventually lose motivation. If it’s seen it all before, why go anywhere? Alternately, if it encounters, say, a television, it will freeze. The constant novel images are essentially paralyzing.
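Both failure modes can be reproduced with a toy count-based novelty bonus. This is a sketch under assumptions: the `1 / sqrt(count)` bonus, the function name `novelty`, and the random 64-bit numbers standing in for television frames are illustrative choices, not details from the talk.

```python
import math
import random
from collections import Counter


def novelty(visits, observation):
    """Bonus that shrinks each time an observation repeats."""
    visits[observation] += 1
    return 1.0 / math.sqrt(visits[observation])


# Failure mode 1: boredom. Seeing the same room over and over
# drives the bonus toward zero, so nothing is worth doing.
visits = Counter()
bored = [novelty(visits, "same_room") for _ in range(10_000)]

# Failure mode 2: the television. Each frame is modeled as a
# fresh random pattern, so it almost never repeats and the
# bonus stays at its maximum while the learner just watches.
rng = random.Random(0)
visits = Counter()
tv = [novelty(visits, rng.getrandbits(64)) for _ in range(10_000)]

bored_final = bored[-1]
tv_average = sum(tv) / len(tv)
```

The bonus for the familiar room ends up a hundred times smaller than it started, while the average bonus for watching the random frames stays near its maximum, which is why staring at the screen beats exploring.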
04:23 The ideas and inspiration here go in both directions. AI researchers stuck on a practical problem, like how to get DQN to beat a difficult game, are turning increasingly to experts in human intelligence for ideas. At the same time, AI is giving us new insights into the ways we get stuck and unstuck: into boredom, depression, and addiction, along with curiosity, creativity, and play.
psjmz mz, Translator Helen Chang, Reviewer
How to get better at video games, according to babies
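In 2013, a group of researchers at DeepMind in London set their sights on a grand challenge. They wanted to create an AI system that could beat not just one Atari game, but every Atari game. They developed a system called Deep Q Networks (DQN), and in less than two years it had surpassed humans. DQN scored 13 times higher than professional human game testers at “Breakout,” 17 times higher at “Boxing,” and 25 times higher at “Video Pinball.”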
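But there was one glaring exception. When playing “Montezuma’s Revenge,” DQN could not score a single point, even after playing for weeks. What made this particular game so hard for AI to beat? And what would it take to solve it?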
Spoiler alert: babies. We’ll come back to that in a minute.
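Playing Atari games with AI involves reinforcement learning, in which the system is designed to maximize some kind of numerical reward. In this case, the rewards were the game’s points. This underlying goal drives the system to learn which buttons to press, and when to press them, to get the highest score. Some systems use model-based approaches: they have a model of the environment that they can use to predict what will happen next after they take a particular action. DQN, however, is model-free. Rather than explicitly modeling its environment, it simply learns to predict, from the images on screen, how many points it can expect to earn by pressing different buttons. For example: “If the ball is here and I move left, I get more points, but if I move right, I get none.”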
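But learning these connections takes a great deal of trial and error. The DQN system would start by pressing buttons at random, then slowly piece together which buttons to press, and when, to get the highest score. But in “Montezuma’s Revenge,” this random button-pressing approach failed completely. A player has to complete this entire sequence of actions just to score the first points. One mistake? Game over. So how could DQN know it was on the right track?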
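This is where babies come in. In studies, infants spend more time looking at pictures they have not seen before than at ones they have. Novelty itself seems to be a kind of intrinsic reward. This behavior has been essential to understanding the infant mind. It also turned out to be the secret to playing “Montezuma’s Revenge” well.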
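The DeepMind researchers found an ingenious way to plug this preference for novelty into reinforcement learning. They made unusual or new images appearing on screen just as rewarding as real in-game points. Suddenly, DQN behaved completely differently from before. It wanted to explore the room it was in, to grab the key and escape through the locked door, not because doing so was worth 100 points, but for the same reason we would: to see what was on the other side. With this new drive, DQN not only managed to grab the first key; it also explored 15 of the 24 rooms.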
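But emphasizing novelty-based rewards can sometimes create more problems than it solves. A novelty-seeking system that has played a game for too long will eventually lose motivation. If it has seen everything before, why go anywhere? Alternatively, if it encounters a television, it will freeze. The constant stream of novel images is essentially paralyzing.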
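The ideas and inspiration here flow in both directions. AI researchers stuck on a practical problem, such as how to get DQN to beat a difficult game, are increasingly turning to experts in human intelligence for ideas. At the same time, AI is giving us new insight into how we get stuck and unstuck: into boredom, depression, and addiction, along with curiosity, creativity, and play.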