This project provides a Docker-based inference engine for running Large Language Models (LLMs) on AMD GPUs. It's designed to work with models from Hugging Face, with a focus on the LLaMA model family.
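Because the GPU is accessed through ROCm, the container has to be started with the host's AMD devices mapped in. A minimal sketch, assuming the image is tagged `llm-rocm-inference` (a hypothetical name); the `--device` and `--group-add` flags are the standard ones ROCm containers need to see the GPU:

```sh
# Map the AMD kernel driver interfaces (/dev/kfd, /dev/dri) into the
# container and grant access via the video group; the image tag is
# illustrative, not the project's actual name.
docker run -it \
  --device=/dev/kfd \
  --device=/dev/dri \
  --group-add video \
  llm-rocm-inference
```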

The project includes an Aptfile listing the ROCm packages to install in the Docker container. This makes the ROCm user-space libraries the inference engine needs available inside the container so it can drive the AMD GPU (the kernel driver itself is supplied by the host).
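As a rough illustration of what such an Aptfile might contain (the project's actual package list may differ), the ROCm meta-packages below pull in the runtime libraries and a diagnostic tool:

```
# Illustrative Aptfile: one apt package name per line.
rocm-dev    # headers and development tools
rocm-libs   # runtime math libraries (rocBLAS, hipBLAS, etc.)
rocminfo    # utility to verify the GPU is visible
```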

If you need to change how inference is performed, modify the run_inference.py file, and remember to rebuild the Docker image afterwards so the change takes effect.
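As a point of reference, a minimal run_inference.py built on Hugging Face transformers might look like the sketch below; the model ID and generation settings are illustrative, not the project's actual defaults. Note that ROCm builds of PyTorch expose AMD GPUs through the usual "cuda" device string, so no AMD-specific code is needed here.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Llama-2-7b-hf"  # hypothetical; substitute your model

# On ROCm builds of PyTorch, torch.cuda targets the AMD GPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16
).to(device)

prompt = "Explain ROCm in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

After editing, rebuild with something like `docker build -t llm-rocm-inference .` (tag again hypothetical) so the container picks up the new script.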
