GPUs on-demand: Not serverless, not reserved, but some third thing


One of the most rewarding things at Fireworks is being a part of the scaling journey for many exciting AI start-ups. Over the last few months, we’ve seen an explosion in the number of companies beginning to productionize generative AI. A question we commonly get is what to do once a workload outgrows serverless.

Fireworks hosts our most popular models “serverless”, meaning that we operate GPUs 24/7 to serve these models and provide an API that any of our users can call. Our serverless offering is the fastest widely available platform, and we’re proud of its production readiness. Serverless is the perfect option for running limited traffic and experimenting with different LLM setups. However, serverless has limitations:
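To make “serverless” concrete, here is a minimal sketch of what calling one of these hosted models looks like from a client. It assumes an OpenAI-compatible chat-completions endpoint; the base URL, environment-variable name, and model identifier below are illustrative assumptions rather than exact values, so check the API documentation before using them.

```python
# Minimal sketch of calling a serverless hosted model over HTTP.
# The base URL, model identifier, and response shape are assumptions
# for illustration; consult the provider's API docs for exact values.
import os
import requests

API_KEY = os.environ["FIREWORKS_API_KEY"]  # hypothetical env var name
BASE_URL = "https://api.fireworks.ai/inference/v1"  # assumed OpenAI-compatible base URL


def chat(prompt: str) -> str:
    """Send one chat-completion request to a serverless hosted model."""
    resp = requests.post(
        f"{BASE_URL}/chat/completions",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "accounts/fireworks/models/llama-v3-8b-instruct",  # illustrative model id
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 256,
        },
        timeout=30,
    )
    resp.raise_for_status()
    # Assumes an OpenAI-style response payload.
    return resp.json()["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(chat("Why does serverless inference suit low-traffic experimentation?"))
```

The appeal of this model is that the client never thinks about GPUs at all: capacity, scaling, and model loading are handled behind the API.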

Serverless is not personalized for you

Your LLM serving can be made faster, higher quality, or lower cost through personalization at several levels.

Our serverless platform is designed to excel across a variety of goals and to support hundreds of base and fine-tuned models. However, a personally tailored stack may still provide a better experience.
