This repository contains benchmarks comparing MLX and PyTorch, two popular machine learning frameworks that run on Apple Silicon devices.
We ran five benchmarks several times each to emulate day-to-day usage. For more information about them, refer to the section "Details about each benchmark".
We executed each test for ten iterations, except the language model training and BERT training benchmarks, which we ran for only three iterations because of their longer runtime.
The tables below show the average time across the iterations we ran. For the median execution times of each benchmark, refer to raw_results.txt.
Every Python file in the root folder represents a different benchmark. Each one requires two arguments: the number of times to run the benchmark and the framework to use. For example, to run the TinyLlama inference benchmark ten times using PyTorch, execute:
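A minimal sketch of the invocation, assuming the benchmark script is named `tinyllama_inference.py` and that the iteration count and framework are passed as positional arguments (the exact filename and argument spelling may differ in this repository):

```shell
# Run the TinyLlama inference benchmark 10 times with PyTorch.
# Script name and argument order are assumptions based on the description above.
python tinyllama_inference.py 10 pytorch
```

To benchmark the same workload with MLX instead, swap the framework argument (e.g. `mlx`), keeping the iteration count the same so the averages remain comparable.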
The whisper_inference benchmark only works with the latest commit of the PyTorch repository, so you must build PyTorch from source to run it.