Real-time inference is crucial for computer vision applications. In some domains, a 1-second delay in inference could mean the difference between life and death. Whether a model can run in real time often determines whether it gets deployed at all, and in many cases you can only pick one or the other: accuracy or speed.
By the end of this post, you’ll learn how to supercharge the inference speed of any image model from TIMM with optimized ONNX Runtime and TensorRT.
Let’s begin with the installation. I will be using a conda environment to install the packages required for this post. Feel free to use the environment manager of your choice.
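A rough sketch of the setup is shown below; the exact package names and versions are illustrative rather than a verbatim command list, so adjust them to your system:

```bash
# Illustrative setup; versions and the TensorRT install route may differ on your system.
conda create -n timm-onnx python=3.10 -y
conda activate timm-onnx

# PyTorch, timm, and the ONNX / ONNX Runtime GPU stack
pip install torch torchvision timm onnx onnxruntime-gpu

# TensorRT Python bindings (used by the TensorRT execution provider)
pip install tensorrt
```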
Look closely: the EVA02 model achieves top ImageNet accuracy (90.05% top-1, 99.06% top-5) but lags in speed. Check out the model on the timm leaderboard here.
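If you want to follow along, a minimal sketch like the one below pulls the model from timm; note that the exact checkpoint name is an assumption for the 90.05% top-1 leaderboard entry, so double-check it against the leaderboard:

```python
import timm

# The exact checkpoint name is an assumption for the leaderboard entry quoted above.
MODEL_NAME = "eva02_large_patch14_448.mim_m38m_ft_in22k_in1k"

model = timm.create_model(MODEL_NAME, pretrained=True).eval()

# Resolve the preprocessing (resize, interpolation, normalization) baked into the checkpoint.
config = timm.data.resolve_data_config({}, model=model)
transform = timm.data.create_transform(**config)
print(config["input_size"])  # e.g. (3, 448, 448)
```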
Although the performance on the GPU is not bad, 12+ FPS is still not fast enough for real-time inference. On my reasonably modern CPU, it took over 1.5 seconds to run an inference.
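Numbers like these come from a naive latency benchmark along the following lines; the warmup and iteration counts below are arbitrary illustrative choices, not an exact measurement script:

```python
import time

import timm
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Checkpoint name is an assumption (see the earlier snippet).
model = timm.create_model(
    "eva02_large_patch14_448.mim_m38m_ft_in22k_in1k", pretrained=True
).eval().to(device)

# Dummy batch at the model's native 448x448 input resolution.
x = torch.randn(1, 3, 448, 448, device=device)

with torch.inference_mode():
    # Warm up so one-off CUDA/cuDNN initialization doesn't skew the timing.
    for _ in range(10):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()

    n_runs = 50
    start = time.perf_counter()
    for _ in range(n_runs):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    elapsed = time.perf_counter() - start

print(f"Average latency: {elapsed / n_runs * 1000:.1f} ms ({n_runs / elapsed:.1f} FPS)")
```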