An AI/ML accuracy tale

Submitted by
Style Pass
2023-11-19 10:00:07

On the other hand there was the discovery that most Search improvements are manually reviewed by engineers through ‘side-by-side’ comparisons between old and new results…on spreadsheets!

The above quote reminded me of how hard, and how often understated, quality assurance (QA) in AI/ML systems is. Each change to a model needs to be validated, and validation is hard and cumbersome. The fact that models go stale over time does not help either: it means quality assurance must be done continuously and treated as a service level.

At a former employer we had a system that categorized a stream of financial transactions using ML. For example, “McDonald’s” was categorized as “Restaurant”, “H&M” as “Clothing”, and so on. Users could re-categorize transactions that had been classified incorrectly. Our goal was to measure how accurately the ML model applied these categories.

Initially, we considered asking the user for explicit feedback (“Was this category correct?”) in the UI. However, we concluded that we did not want to bloat our UI. So we asked ourselves: can we figure out whether our classification is accurate without modifying the UI at all?
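One natural signal already in the product is the re-categorization events themselves: if a user never corrects a transaction, you can treat the model's category as implicitly accepted. Below is a minimal sketch of that idea; the record shape and field names are hypothetical, not taken from the article, and the estimate is knowingly optimistic because users miss or ignore some errors.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical record shape; field names are illustrative only.
@dataclass
class Transaction:
    merchant: str
    model_category: str
    user_category: Optional[str] = None  # set only if the user re-categorized

def implicit_accuracy(transactions: list[Transaction]) -> float:
    """Estimate accuracy from correction events alone.

    A transaction the user never re-categorized counts as implicitly
    correct, so this is an upper bound on true accuracy.
    """
    if not transactions:
        return 0.0
    corrected = sum(
        1 for t in transactions
        if t.user_category is not None and t.user_category != t.model_category
    )
    return 1 - corrected / len(transactions)

txns = [
    Transaction("McDonald's", "Restaurant"),
    Transaction("H&M", "Clothing"),
    Transaction("Shell", "Restaurant", user_category="Transport"),
    Transaction("IKEA", "Furniture"),
]
print(implicit_accuracy(txns))  # 3 of 4 left uncorrected -> 0.75
```

Because corrections stream in continuously, the same computation can run on a rolling window, which fits the "QA as a service level" framing above.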
