AI Blindspots

Scientific Debugging

When there is a bug, there are broadly two ways you can try to fix it. One is to randomly try things based on vibes and hope you get lucky. The other is to systematically examine your assumptions about how the system works and figure out where reality diverges from your expectations. I generally think the second approach, scientific debugging, is better in the long run; even if it takes more time, you will walk away with a better understanding of the codebase for next time.
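
To make the contrast concrete, here is a minimal sketch of what hypothesis-driven debugging can look like. The `normalize_scores` function and its bugs are hypothetical; the point is the workflow of stating an assumption and writing a check that can falsify it before touching the code.

```python
# A minimal sketch of hypothesis-driven debugging. The function, bugs, and
# data are hypothetical; the workflow is: state an assumption, write a check
# that can falsify it, and only then change the code.

def normalize_scores(scores):
    """Hypothetical function under suspicion: scales scores to [0, 1]."""
    top = max(scores)
    return [s / top for s in scores]

# Hypothesis 1: the function breaks when all scores are zero.
try:
    normalize_scores([0, 0, 0])
    print("Hypothesis 1 rejected: all-zero input is handled")
except ZeroDivisionError:
    print("Hypothesis 1 confirmed: all-zero input divides by zero")

# Hypothesis 2: negative scores fall outside the expected [0, 1] range.
result = normalize_scores([-2, 1])
if all(0 <= s <= 1 for s in result):
    print("Hypothesis 2 rejected: output stays in [0, 1]")
else:
    print(f"Hypothesis 2 confirmed: output leaves [0, 1]: {result}")
```

Each check either confirms or rejects a specific assumption, so whatever fix you make afterwards is grounded in what you actually observed rather than in a lucky guess.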

A non-reasoning model is not going to apply the scientific method. It is going to “guess” what the fix is and try to one-shot it. If you are in an agent loop, it can rerun the test suite to see whether the fix worked, and then randomly try things until it succeeds (but more likely, gets into a death loop).

The general word on the street is that you should use a reasoning model for debugging (in practice, people tend to prefer models like Grok 3 or DeepSeek-R1 for this sort of problem). Personally, when I do AI coding, I am still maintaining a pretty detailed mental model of how the code is supposed to work, so I think it’s also pretty reasonable to root-cause the bug yourself: you don’t have to fix it, since if you tell the model what’s wrong it will usually do the right thing.

Examples