Having a dev set and metric speeds up iterations
It is very difficult to know in advance what approach will work best for a new problem. Even experienced machine learning researchers will usually try out many dozens of ideas before they discover something satisfactory. When building a machine learning system, I will often:
- Start off with some idea on how to build the system.
- Implement the idea in code.
- Carry out an experiment which tells me how well the idea worked. (Usually my first few ideas don’t work!) Based on what I learn, I go back to generate more ideas, and keep on iterating.
This is an iterative process. The faster you can go around this loop, the faster you will make progress. This is why having dev/test sets and a metric is important: each time you try an idea, measuring its performance on the dev set lets you quickly decide whether you’re heading in the right direction.
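As a rough illustration of this loop, here is a minimal Python sketch that scores two candidate ideas on the same dev set and keeps the better one. Everything in it is hypothetical and made up for illustration: the `dev_accuracy` helper, the toy feature dictionaries, and the two rule-based "ideas", which stand in for trained classifiers.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical example format: (features, is_cat label).
Example = Tuple[Dict[str, bool], bool]
Classifier = Callable[[Dict[str, bool]], bool]


def dev_accuracy(classifier: Classifier, dev_set: List[Example]) -> float:
    """Return the fraction of dev-set examples the classifier labels correctly."""
    correct = sum(1 for features, label in dev_set if classifier(features) == label)
    return correct / len(dev_set)


# Toy dev set, invented for this sketch only.
DEV_SET: List[Example] = [
    ({"whiskers": True, "barks": False}, True),
    ({"whiskers": True, "barks": True}, False),
    ({"whiskers": False, "barks": False}, False),
    ({"whiskers": True, "barks": False}, True),
]

# Two hypothetical ideas; each rule stands in for a trained model.
IDEAS: Dict[str, Classifier] = {
    "whiskers_only": lambda f: f["whiskers"],
    "whiskers_and_no_bark": lambda f: f["whiskers"] and not f["barks"],
}

# One pass around the loop: score every idea with the same metric and dev set.
scores = {name: dev_accuracy(clf, DEV_SET) for name, clf in IDEAS.items()}
best_idea = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score:.2%}")
print("keep refining:", best_idea)
```

Because every idea is scored by the same function on the same data, comparing two ideas takes seconds rather than hours of manual inspection.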
In contrast, suppose you don’t have a specific dev set and metric. Then each time your team develops a new cat classifier, you have to incorporate it into your app and play with the app for a few hours to get a sense of whether the new classifier is an improvement. This would be incredibly slow! Also, if your team improves the classifier’s accuracy from 95.0% to 95.1%, you might not be able to detect that 0.1% improvement from playing with the app. Yet a lot of progress in your system will be made by gradually accumulating dozens of these 0.1% improvements. Having a dev set and metric allows you to very quickly detect which ideas are successfully giving you small (or large) improvements, and therefore lets you quickly decide which ideas to keep refining and which ones to discard.
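To make the 95.0% versus 95.1% comparison concrete, the sketch below measures a baseline and a candidate on the same dev set. The dev set size of 10,000 and the simulated predictions are assumptions chosen for illustration; they are not figures from the text.

```python
from typing import List

DEV_SIZE = 10_000  # assumed dev set size, for illustration only


def num_correct(predictions: List[bool], labels: List[bool]) -> int:
    """Count the predictions that match their labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    return sum(p == y for p, y in zip(predictions, labels))


def accuracy(predictions: List[bool], labels: List[bool]) -> float:
    """Fraction of predictions that match their labels."""
    return num_correct(predictions, labels) / len(labels)


def simulated_predictions(labels: List[bool], n_right: int) -> List[bool]:
    """Fake predictions: right on the first n_right examples, wrong on the rest."""
    return [y if i < n_right else not y for i, y in enumerate(labels)]


# Simulated dev labels (alternating cat / not-cat), invented for this sketch.
labels = [i % 2 == 0 for i in range(DEV_SIZE)]

baseline = simulated_predictions(labels, 9_500)   # stands in for the old classifier
candidate = simulated_predictions(labels, 9_510)  # stands in for the new classifier

baseline_acc = accuracy(baseline, labels)
candidate_acc = accuracy(candidate, labels)
extra_correct = num_correct(candidate, labels) - num_correct(baseline, labels)

print(f"baseline:  {baseline_acc:.1%}")
print(f"candidate: {candidate_acc:.1%}")
print(f"additional dev examples classified correctly: {extra_correct}")
```

Under these assumptions the 0.1% gain is only 10 additional correct examples out of 10,000. That is hard to notice by hand but shows up immediately in the metric. On a much smaller dev set the same 0.1% would amount to less than one example, so the dev set has to be large enough for differences of this size to register.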