Need for Speed: LLMs Beyond OpenAI with C#, .NET 8 SSE + Channels, Llama3, and Fireworks.ai

submitted by

Style Pass

2024-05-06 11:00:06

While we await GPT-5, few would dispute that, as of May 2024, OpenAI’s GPT-4 is still the champ when it comes to overall LLM performance. Where it comes up short is its relatively low throughput and high latency, which can make it suboptimal when the UX calls for a more interactive experience.

A recent Hacker News thread led me to TheFastest.ai, and I was intrigued by the high throughput of Meta’s Llama 3 as well as by two platforms: Groq.com and Fireworks.ai.

In this article, we’ll explore building an app with Fireworks.ai, Meta Llama 3 8B/70B, .NET 8, System.Threading.Channels, and Server-Sent Events (SSE).

💡 The full repo is available here: https://github.com/CharlieDigital/dn8-sk-llama3-fireworks. Follow the instructions in the README.md to get it up and running. Start by signing up for a free account and credits at Fireworks.ai.

The top of the stack is dominated by Llama 3 and Groq, with Fireworks.ai rounding out the top 5 (we’ll discuss in a bit why teams should probably choose Fireworks).