Introducing Benchmarks v2

We believe in giving back to the community, so today we are introducing Prem Benchmarks. This is a fully open-source project whose primary objective is to benchmark popular LLM inference engines (currently 13+ of them), such as vLLM, TensorRT-LLM, and HuggingFace Transformers, at different precisions: float32, float16, int8, and int4. We benchmark across several points of comparison, hoping to give the open-source LLM community and enterprises a clearer picture of LLM inference performance. The benchmarks are dedicated exclusively to comparing distinct open-source implementations. Don't forget to star the repository and follow Prem; we are constantly improving it and adding benchmarks for newer engines.
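
To make the idea concrete, here is a minimal sketch (not Prem's actual harness) of the kind of measurement such a benchmark performs: timing a generation with HuggingFace Transformers at float16 and reporting tokens per second. The model name and prompt are illustrative placeholders, and a CUDA GPU is assumed.

```python
# Minimal sketch of a single inference measurement: load a model at a given
# precision, time one generation, and report tokens per second. This is an
# illustration of the metric, not the benchmark repository's actual code.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # hypothetical placeholder; the real benchmarks target larger LLMs

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.float16  # float16 precision, as in the benchmarks
).to("cuda")

inputs = tokenizer("The quick brown fox", return_tensors="pt").to("cuda")

start = time.perf_counter()
output = model.generate(**inputs, max_new_tokens=128)
elapsed = time.perf_counter() - start

# Count only the newly generated tokens, not the prompt tokens.
new_tokens = output.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/sec at float16")
```

A full benchmark would typically repeat measurements like this across engines, precisions, and prompt lengths, with warm-up runs and averaging to smooth out noise.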

You might wonder why one open-source repository is comparing other open-source implementations. Ultimately, it all comes down to making important decisions and the cost associated with them. The goal is to settle on an option that fulfills our requirements, is cost-effective, and delivers the quality we expect, and benchmarks aim to capture all of these aspects. To be specific, here are three points that summarize the need for benchmarks:

Our benchmarking process is simple. We refer to each implementation as an 'engine', and each engine has its own dedicated folder. Every engine folder contains the same four files. Here is an example:
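The example itself is not included in this excerpt. Purely as an illustration, a per-engine folder might look like the sketch below; the file names are hypothetical, not the repository's actual contents:

```
bench_vllm/           # one folder per engine (name illustrative)
├── README.md         # engine-specific setup and usage notes
├── requirements.txt  # pinned dependencies for this engine
├── setup.sh          # environment preparation / model download
└── bench.py          # runs the benchmark and reports the metrics
```

Keeping an identical layout across engines is what makes the results comparable: every engine is exercised through the same entry points and reports the same metrics.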
