Off-Peak Computing: the most affordable batch LLM inference provider, 50-90% cheaper than any other provider

Submitted by
Style Pass
2024-10-18 13:30:08

Off-Peak Computing is the most cost-efficient batch inference API for open-source models. Get high-quality output at the lowest price, for every use case that can tolerate some delay!

Pricing for Llama-3.1-70B-Instruct FP16: $0.30 per million input tokens and $0.50 per million output tokens.
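As a quick illustration of what that pricing means in practice, here is a minimal cost calculation; the token counts in the example are hypothetical, not EXXA figures:

```python
INPUT_RATE = 0.30   # USD per million input tokens (Llama-3.1-70B-Instruct FP16)
OUTPUT_RATE = 0.50  # USD per million output tokens

def batch_cost(input_tokens_m: float, output_tokens_m: float) -> float:
    """Cost in USD for a batch job, with token counts given in millions."""
    return input_tokens_m * INPUT_RATE + output_tokens_m * OUTPUT_RATE

# Hypothetical job: 10M input tokens, 2M output tokens.
print(batch_cost(10, 2))  # 10 * 0.30 + 2 * 0.50 = 4.00 USD
```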

Data centers have *many* short intervals of unused compute time—minutes or hours—that go to waste. Traditional systems cannot efficiently capture these brief windows, and once they are gone, the opportunity is lost. At EXXA, we have created a custom scheduler and orchestrator that aggregates these unused fragments across multiple data centers, enabling us to run AI workloads efficiently on underutilized compute acquired at a discount. We then pass those savings on to you.
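EXXA has not published how its scheduler works, but the core idea can be sketched as a greedy packer that places queued batch jobs into whatever idle windows are currently open across data centers. The `IdleWindow` and `BatchJob` fields below are illustrative assumptions, not EXXA's actual data model:

```python
import heapq
from dataclasses import dataclass

@dataclass
class IdleWindow:
    datacenter: str
    minutes_left: int   # remaining idle time on this slice of compute

@dataclass
class BatchJob:
    job_id: str
    est_minutes: int    # rough runtime estimate for the job

def schedule(jobs: list[BatchJob], windows: list[IdleWindow]) -> dict[str, str]:
    """Greedily pack each job into the largest idle window that still fits."""
    # Max-heap keyed on remaining idle minutes (negated for heapq's min-heap).
    heap = [(-w.minutes_left, w.datacenter) for w in windows]
    heapq.heapify(heap)
    placement: dict[str, str] = {}
    for job in sorted(jobs, key=lambda j: j.est_minutes, reverse=True):
        if not heap:
            break
        neg_left, dc = heapq.heappop(heap)
        left = -neg_left
        if job.est_minutes <= left:
            placement[job.job_id] = dc
            left -= job.est_minutes
        if left > 0:
            heapq.heappush(heap, (-left, dc))
    return placement
```

In a sketch like this, jobs that fit in no current window simply stay queued until a new window opens, which is where the tolerance for delay comes from.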

Custom inference engine optimized for the batch API (including a persistent KV cache, with cross-platform and cross-GPU support)
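EXXA does not document how its persistent KV cache works. The sketch below only illustrates the general technique of persisting the attention KV state for a shared prompt prefix and reusing it across batch items; `model.prefill(tokens, past=...)` is a hypothetical interface standing in for whatever the real engine exposes:

```python
import hashlib

class PersistentKVCache:
    """Illustrative prefix cache: reuse attention KV states across requests."""

    def __init__(self, model):
        self.model = model
        self.store: dict[str, object] = {}   # prefix hash -> cached KV state

    def _key(self, tokens: list[int]) -> str:
        return hashlib.sha256(str(tokens).encode("utf-8")).hexdigest()

    def prefill(self, prefix: list[int], suffix: list[int]):
        key = self._key(prefix)
        if key not in self.store:
            # Compute the shared prefix once; persist it for later batch items.
            self.store[key] = self.model.prefill(prefix)
        past = self.store[key]
        # Only the per-request suffix needs fresh computation.
        return self.model.prefill(suffix, past=past)
```

For batch workloads where thousands of requests share a long system prompt, this kind of reuse is what makes skipping redundant prefill computation possible.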
