TL;DR: you’re probably wasting time, resources, and revenue running unnecessary A/B tests. Offline policy evaluation can predict how changes to your production systems will affect metrics and help you A/B test only the most promising changes.
A/B tests are essential for measuring the impact of a change or new feature, and they provide the information needed for making data-driven decisions. But they also take a long time to run, are subject to false positives and negatives, and can expose real users to bad experiences.
Just like A/B tests became standard practice in the 2010s, offline policy evaluation (OPE) is going to become standard practice in the 2020s as part of every experimentation stack.
At a certain point, every company runs A/B tests. Simple changes, like changing the text on a landing page, are easy to test with any A/B service available today. Companies can also test something more complex, like a decision making system: a recommendation engine, a set of handcrafted fraud rules, or a push notification system. In machine learning jargon, decision making systems are called “policies”. A policy simply takes in some context (e.g. time of day) and outputs a decision (e.g. send a push notification). A perfectly data-driven company would measure the impact of any and every change to a policy.
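To make the context-in, decision-out shape concrete, here is a minimal sketch of a push notification policy in Python. The context keys and thresholds are entirely hypothetical, chosen just to illustrate the idea; a real policy might be a learned model rather than hand-written rules.

```python
def push_notification_policy(context: dict) -> str:
    """Toy policy: map a context to a decision.

    The keys ("hour_of_day", "days_since_last_visit") and the
    thresholds below are made-up examples, not a real system.
    """
    hour = context["hour_of_day"]
    recently_active = context["days_since_last_visit"] <= 7

    # Hand-crafted rule: only notify recently active users
    # during waking hours.
    if 9 <= hour <= 21 and recently_active:
        return "send_push"
    return "no_push"


decision = push_notification_policy(
    {"hour_of_day": 14, "days_since_last_visit": 2}
)
print(decision)  # send_push
```

Swapping the rules for a trained model, or just tweaking the thresholds, produces a new policy — and each such change is exactly the kind of thing a perfectly data-driven company would want to measure.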
At Facebook, I worked on the applied reinforcement learning (RL) team where our goal was to develop better policies for push notifications, feed ranking, and many other problems. Every time we created a new policy, or tweaked an existing policy, we would run an A/B test…and then wait several weeks to gather data to find out if this new policy was better than the current one.