

Submitted by Style Pass on 2025-01-04 22:00:13

This project extends the work presented in the paper "Human Expertise in Algorithmic Prediction". The paper discusses how algorithms, such as deep models trained via empirical risk minimization, have generally outperformed human experts in various domains. It proposes a framework for identifying regions of the input space where human experts are more likely to outperform the model, and therefore where an algorithm should "ask" a human expert for help. In their experiments, the authors demonstrate this framework on a medical binary classification task with expert human annotators (doctors). This project explores the same framework in a multi-class classification setting with less reliable human annotators (Mechanical Turk workers).

This repository extends their experiments to a multi-class classification setting: the CIFAR-10 dataset. It lets $F$ (the feasible set of predictors) be a finite set of deep model architectures, each either trained from scratch or fine-tuned on the CIFAR-10 training set. The humans the framework can escalate to, in the regions where the model's predictions are less reliable or more uncertain, are Mechanical Turk workers who annotated the CIFAR-10 test set. To evaluate the framework, I use the CIFAR-10H dataset, which contains the raw annotations from these workers. Each example in the test set is annotated by approximately 50 annotators.
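To make the setup concrete, here is a minimal sketch of how per-image annotation counts could be reduced to a single human label and compared with a model inside each region. It is an illustration only, not the repository's code: the function names, the toy arrays, and the `region_ids` partition are all hypothetical, and majority vote is just one assumed way to aggregate the roughly 50 annotations per image.

```python
import numpy as np


def human_majority_labels(counts):
    """Pick the most frequently chosen class per image.

    counts: (n_images, n_classes) array of annotator vote counts (assumed layout).
    Ties go to the lowest class index, which is how np.argmax behaves.
    """
    return counts.argmax(axis=1)


def accuracy_by_region(y_true, model_pred, human_pred, region_ids):
    """Return {region: (model accuracy, human accuracy, region size)}."""
    report = {}
    for region in np.unique(region_ids):
        mask = region_ids == region
        report[int(region)] = (
            float((model_pred[mask] == y_true[mask]).mean()),
            float((human_pred[mask] == y_true[mask]).mean()),
            int(mask.sum()),
        )
    return report


# Toy data: 4 images, 3 classes, 50 votes per image (all values made up).
counts = np.array([[48, 2, 0], [5, 40, 5], [10, 10, 30], [25, 24, 1]])
y_true = np.array([0, 1, 2, 1])
model_pred = np.array([0, 1, 0, 0])
region_ids = np.array([0, 0, 1, 1])  # hypothetical partition of the inputs

human_pred = human_majority_labels(counts)
report = accuracy_by_region(y_true, model_pred, human_pred, region_ids)
```

In this toy example the model and the humans agree on region 0, while in region 1 the humans are right more often than the model, which is the kind of region where escalating to a human would pay off.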

The framework behaved as described in the paper. The Chebyshev distance metric outperformed the Hamming distance metric: judging by the plots (see the results folder), it yielded subsets of the input space that came closer to $\alpha$-indistinguishability. With a small number of clusters, it is not clear that there is any subset of $X$ where requesting human help is meaningfully beneficial, unless the model is a ResNet (an architecture that is almost 10 years old!).
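For reference, the two distance metrics and a covariance-based indistinguishability check can be sketched as follows. This is a rough illustration under stated assumptions, not the repository's implementation: it assumes each example is represented by a vector of outputs from the predictors in $F$, it reads $\alpha$-indistinguishability as "every predictor's covariance with the outcome is at most $\alpha$ in absolute value within the subset", and it treats the outcome as a scalar (for example a 0/1 indicator for one class), since this README does not say how the multi-class target is scalarized. All function names are hypothetical.

```python
import numpy as np


def chebyshev(u, v):
    """Largest absolute coordinate-wise difference between two vectors."""
    return float(np.max(np.abs(u - v)))


def hamming(u, v):
    """Fraction of coordinates on which two vectors differ."""
    return float(np.mean(u != v))


def max_abs_covariance(preds, y):
    """Largest |Cov(f(X), Y)| over predictors, computed within one subset.

    preds: (n_examples, n_predictors) scalar outputs on the subset (assumed layout).
    y:     (n_examples,) scalar outcome on the same subset.
    Uses the population covariance (divides by n).
    """
    y_centered = y - y.mean()
    preds_centered = preds - preds.mean(axis=0)
    covs = (preds_centered * y_centered[:, None]).mean(axis=0)
    return float(np.max(np.abs(covs)))


# Two examples described by the outputs of three predictors (made-up values).
u = np.array([0.9, 0.2, 0.4])
v = np.array([0.9, 0.5, 0.3])
d_cheb = chebyshev(u, v)  # driven by the single largest disagreement
d_hamm = hamming(u, v)    # counts how many predictors disagree at all

# A subset of 4 examples scored by 2 predictors, with a binary outcome.
preds = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 1.0, 0.0, 1.0])
alpha_needed = max_abs_covariance(preds, y)
```

Under this reading, a subset is $\alpha$-indistinguishable whenever `max_abs_covariance` is at most $\alpha$. In the toy subset the first predictor is constant and contributes zero covariance, while the second tracks the outcome exactly, so the subset only qualifies for fairly large $\alpha$.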
