Imagine you have just written a complex piece of code (maybe an agent?) that interacts with several APIs and possibly uses large language models (LLMs). Now you want to write tests, but calling the APIs or LLMs during those tests would make them slow, non-deterministic, and dependent on network availability.
You could write mock implementations for every external call, but that is time-consuming, difficult to maintain, and prone to breaking whenever the code changes. Wouldn’t it be great if you could simply record one execution of your code and reuse that data to stub external calls during tests?
Cached stubs let you record the results of function calls once and replay those cached results in your tests. This removes the need to write and maintain complex mocks by hand. Because the tests replay recorded results instead of calling external services, they run fast and deterministically, and they fit into an ordinary testing workflow.
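To make the idea concrete, here is a minimal sketch of a cached stub as a Python decorator. It is not necessarily how this post implements the pattern: the `cached_stub` name, the `record` flag, the JSON cache file, and the requirement that arguments and return values be JSON-serializable are all illustrative assumptions. In record mode the wrapper calls the real function and saves its result keyed by the arguments; in replay mode it only reads the cache and fails loudly on a miss.

```python
import functools
import json
import tempfile
from pathlib import Path
from typing import Any, Callable


def cached_stub(cache_file: Path, record: bool) -> Callable:
    """Hypothetical decorator: record real results once, replay them in tests.

    Assumes arguments and return values are JSON-serializable.
    """
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # Deterministic key built from the function name and its arguments.
            key = json.dumps([func.__name__, args, kwargs], sort_keys=True)
            cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}
            if record:
                # Record mode: call the real (slow, external) function and store the result.
                result = func(*args, **kwargs)
                cache[key] = result
                cache_file.write_text(json.dumps(cache, indent=2))
                return result
            # Replay mode: never call the real function; a missing entry is an error.
            if key not in cache:
                raise KeyError(f"No recorded result for {key}")
            return cache[key]
        return wrapper
    return decorator


calls = []


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for an expensive API/LLM call; counts real invocations.
    calls.append(prompt)
    return prompt.upper()


cache_path = Path(tempfile.mkdtemp()) / "stubs.json"

# First run: execute the real function once and record its result.
recorded = cached_stub(cache_path, record=True)(fake_llm)
print(recorded("hello"))  # HELLO

# Test run: replay the stored result without calling fake_llm again.
replayed = cached_stub(cache_path, record=False)(fake_llm)
print(replayed("hello"))  # HELLO
print(len(calls))  # 1
```

Making replay mode raise on a cache miss, rather than silently falling back to the real call, keeps tests from quietly hitting the network when the code starts making calls that were never recorded.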
I came up with the name “cached stubs” myself, since I have not seen this pattern described anywhere else. If you end up using this approach in your code, feel free to link back to this post! And if you have seen something similar before, I would love to hear about it.