Hugging Face puts the squeeze on Nvidia's software ambitions


Hugging Face this week announced HUGS, its answer to Nvidia's Inference Microservices (NIMs), which the AI repo claims will let customers deploy and run LLMs and other models on a much wider variety of hardware.

Like Nvidia's previously announced NIMs, Hugging Face Generative AI Services (HUGS) are essentially containerized model images bundled with everything a user needs to deploy and run the model. The idea is that rather than having to futz with vLLM or TensorRT LLM to get a large language model running optimally at scale, users can instead spin up a preconfigured container image in Docker or Kubernetes and connect to it via standard OpenAI API calls.
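To give a feel for that workflow, here's a minimal sketch of what talking to such a container looks like, assuming one has already been launched locally and exposes an OpenAI-compatible endpoint. The image name, port, and model identifier below are placeholders for illustration, not confirmed HUGS specifics:

```python
# Sketch of calling an OpenAI-compatible endpoint like the one a HUGS
# (or NIM) container exposes. Assumes a container was started along the
# lines of:
#   docker run -p 8080:80 <hugs-image>   # image name and port are placeholders
# Requires the `openai` Python package (pip install openai).
from openai import OpenAI

# Point the standard OpenAI client at the local container instead of
# api.openai.com. Local servers typically ignore the API key, but the
# client requires some value to be set.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

response = client.chat.completions.create(
    model="tgi",  # placeholder model id; local servers often accept any name
    messages=[{"role": "user", "content": "Summarize what HUGS is in one sentence."}],
    max_tokens=128,
)

print(response.choices[0].message.content)
```

The point of the OpenAI-compatible interface is that existing application code written against OpenAI's API can be repointed at the container by changing only the base URL.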

HUGS are built around Hugging Face's open source Text Generation Inference (TGI) and Transformers frameworks and libraries, which means they can be deployed on a variety of hardware platforms, including Nvidia and AMD GPUs, with support eventually extending to more specialized AI accelerators like Amazon's Inferentia or Google's TPUs. Apparently there's no love for Intel Gaudi just yet.

Despite being based on open source technologies, HUGS, like NIMs, aren't free. If deployed in AWS or Google Cloud, they'll run you about $1 an hour per container.
