Betting on methods that scale
After reading The Bitter Lesson, how do I actually put weight on scalable learning, search, and compute?
What does the Bitter Lesson mean in practice, and how do I tell if a method follows it?
In one line: hardcode less of your own judgment, and build something that gets better on its own when you give it more data, more compute, longer training, and a bigger search budget.
Three layers. Learning: let the model work out its own features and strategies from data, instead of you hand-coding them. Search: generate candidates, score them, keep the good ones, repeat. You set the goal and how to grade it, not every step. Compute: the method has to batch well, run in parallel across many chips, keep communication between them cheap, and access memory sequentially, so it gets faster for free as the hardware grows.
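To make the search layer concrete, here is a minimal sketch of the generate, score, keep, repeat loop in Python. Everything in it is an assumption for illustration: the names `search`, `propose`, `score`, `budget`, and `keep` are hypothetical, and the toy goal (get a number close to 3) stands in for whatever you actually care about. You only write the goal and the grader; the loop does the rest, and a bigger `budget` never makes the result worse.

```python
import random

def search(propose, score, budget, keep=4, seed=0):
    """Generate candidates, score them, keep the best few, repeat `budget` times."""
    rng = random.Random(seed)
    # Start from a few fresh candidates (parent=None means "start from scratch").
    pool = [propose(None, rng) for _ in range(keep)]
    for _ in range(budget):
        # Propose variations of the current survivors.
        children = [propose(rng.choice(pool), rng) for _ in range(keep * 4)]
        # Survivors stay in the running, so the best score never goes down.
        pool = sorted(pool + children, key=score, reverse=True)[:keep]
    return pool[0]

# Toy problem (hypothetical): the grader says "closer to 3 is better".
def propose(parent, rng):
    if parent is None:
        return rng.uniform(-10.0, 10.0)
    return parent + rng.gauss(0.0, 0.5)

def score(x):
    return -(x - 3.0) ** 2

small = search(propose, score, budget=2)
large = search(propose, score, budget=200)
print("budget=2   ->", small, "score", score(small))
print("budget=200 ->", large, "score", score(large))
```

Note that nothing in `search` knows about the number 3. The knowledge lives in `score`, which is the point: swap in a different grader and the same loop keeps working.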
Put what you know into the goal, the data, the environment, and the metrics, not into a pile of fragile if-else. Tell the model what a good answer looks like and give it lots of data and a verifier, instead of writing rules to fake being smart.
The test is simple. 10x the data, does it get better? 10x the model? 10x the training time? If the answer is mostly “no, not unless I keep tuning by hand,” it doesn’t scale well enough.
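The 10x test can be run as a small experiment. The sketch below is a toy under stated assumptions, not a benchmark: `scaling_curve`, `nearest_neighbor`, and `hand_rule` are hypothetical names, the task (predict x squared) is made up, and nearest-neighbor lookup is just a stand-in for "a method that learns from data". It measures error at 1x, 10x, and 100x data for a learned method and for a fixed hand-written rule.

```python
import random

def target(x):
    # The true relationship; neither method gets to see this directly.
    return x * x

def hand_rule(_train):
    # A hand-written guess. It ignores the training data entirely.
    return lambda x: abs(x)

def nearest_neighbor(train):
    # A learned method: answer with the label of the closest training point.
    def predict(x):
        return min(train, key=lambda pair: abs(pair[0] - x))[1]
    return predict

def mean_abs_error(predict, test_xs):
    return sum(abs(predict(x) - target(x)) for x in test_xs) / len(test_xs)

def scaling_curve(make_method, sizes, seed=0):
    """Return the test error of a method at each training-set size."""
    rng = random.Random(seed)
    test_xs = [rng.uniform(-1.0, 1.0) for _ in range(200)]
    errors = []
    for n in sizes:
        train = [(x, target(x)) for x in (rng.uniform(-1.0, 1.0) for _ in range(n))]
        errors.append(mean_abs_error(make_method(train), test_xs))
    return errors

sizes = [10, 100, 1000]  # 1x, 10x, 100x data
learned = scaling_curve(nearest_neighbor, sizes)
fixed = scaling_curve(hand_rule, sizes)
for n, l_err, f_err in zip(sizes, learned, fixed):
    print(f"n={n:5d}  learned error={l_err:.4f}  hand rule error={f_err:.4f}")
```

The learned method's error should drop each time the data grows, with no retuning. The hand rule's error stays flat, because more data gives it nothing to use. A flat curve like that is the "no, not unless I keep tuning by hand" case.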
Two things stuck: judge a method by whether 10x the data, model, or training time makes it better on its own; put what you know into the goal, the data, the environment, and the metrics, not into a pile of fragile rules.