This project provides a Docker-based inference engine for running Large Language Models (LLMs) on AMD GPUs. It's designed to work with models from Hugging Face, with a focus on the LLaMA model family.
The project includes an Aptfile that lists the ROCm packages to be installed in the Docker container. This ensures that all required ROCm drivers and libraries are available so the inference engine can use the AMD GPU effectively.
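The exact package set depends on your ROCm version and target GPU. As a purely illustrative sketch (these package names come from AMD's ROCm apt repository and may differ across releases), an Aptfile might look like:

```text
# Hypothetical Aptfile: one apt package name per line.
# Actual names and versions vary by ROCm release.
rocm-hip-runtime
hipblas
rocblas
miopen-hip
rocminfo
```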
To change how inference is performed, modify the run_inference.py file. Remember to rebuild the Docker image after making changes so they take effect in the container.
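For orientation, here is a minimal sketch of what an inference script like run_inference.py could contain. It assumes the Hugging Face transformers library and a ROCm build of PyTorch are installed in the image; the model ID and generation settings are placeholders, not the project's actual configuration. Note that PyTorch's ROCm build exposes AMD GPUs through the `cuda` device API.

```python
# Illustrative sketch only: assumes transformers and a ROCm build of
# PyTorch. The model ID and generation settings are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Llama-2-7b-hf"  # placeholder LLaMA-family model

def main():
    # PyTorch's ROCm build reports AMD GPUs via the "cuda" device API.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16
    ).to(device)

    prompt = "Explain what ROCm is in one sentence."
    inputs = tokenizer(prompt, return_tensors="pt").to(device)

    # Generate a short completion without tracking gradients.
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=64)

    print(tokenizer.decode(output[0], skip_special_tokens=True))

if __name__ == "__main__":
    main()
```

After editing the script, rebuild the image with, for example, `docker build -t <image-name> .` (substitute your actual image name and tag) so the updated script is baked into the container.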