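晨大, the supervised learning stage used 30 million moves (3000万手) from human games; how did the 万 (ten thousand) get dropped from your figure? At 300 moves per game, 30,000,000 / 300 comes to 100,000 games (probably more, in fact, since few games run to 300 moves). That basically means every professional tournament game record was force-fed to it, with no curation.
Also, Demis Hassabis has said that the supervised learning step could be skipped entirely: starting from a blank slate, reinforcement learning alone could reach the current level of strength.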
Is that possible because of where the algorithm has reached now?
No, no, we could have done that before. It wouldn’t have made the program stronger, it just would have been pure learning. So there would’ve been no supervised part. We think this algorithm can work without any supervision. The Atari games that we did last year, playing from the pixels, that didn’t bootstrap from any human knowledge, that started literally from doing random things on screen.