llama.cpp guide - Running LLMs locally, on any hardware, from scratch

…and it’s pretty fun. I was very skeptical about the AI/LLM “boom” back when it started. I thought, like many other people, that they were mostly just making stuff up and generating uncanny-valley-tier nonsense. Boy, was I wrong. I’ve used ChatGPT once or twice to test the waters - it made a pretty good first impression, despite hallucinating a bit. That was back when GPT-3.5 was the top model. We’ve come a pretty long way since then.

However, despite ChatGPT not disappointing me, I was still skeptical. Everything I wrote, and every piece of the response, was fully available to OpenAI, or whatever other provider I’d want to use. This is not a big deal, but it rubs me the wrong way, and it also means I can’t use LLMs for any work-related, non-open-source stuff. Also, ChatGPT is free only to some degree - if I wanted to go all-in on AI, I’d probably have to start paying. Which, obviously, I’d rather avoid.

At some point I started looking at open-source models. I had no idea how to use them, but the moment I saw the sizes of “small” models like Llama 2 7B, I realized that my RTX 2070 Super, with a mere 8GB of VRAM, would probably have issues running them (I was wrong about that too!), and that running them on the CPU would probably yield very bad performance. And then I bought a new GPU - an RX 7900 XT with 20GB of VRAM, which is definitely more than enough to run small-to-medium LLMs. Yay!
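For a rough sense of why 8GB looked like too little (and why it turned out to be enough): at FP16, a model’s weights take about 2 bytes per parameter, while quantized formats like llama.cpp’s Q8_0 or Q4_K_M shrink that considerably. Here’s a minimal back-of-the-envelope sketch - the bytes-per-weight figures are approximations, and real usage also needs room for the KV cache and runtime overhead:

```python
# Rough VRAM estimate for the weights of a 7B-parameter model.
# Bytes-per-weight values are approximate; actual llama.cpp memory
# usage also includes the KV cache and some runtime overhead.
PARAMS = 7e9  # Llama 2 7B

formats = {
    "FP16 (unquantized)": 2.0,    # 16 bits per weight
    "Q8_0 (8-bit quant)": 1.0,    # ~8 bits per weight
    "Q4_K_M (4-bit quant)": 0.5,  # ~4-5 bits per weight, rounded down
}

for name, bytes_per_weight in formats.items():
    gib = PARAMS * bytes_per_weight / 1024**3
    print(f"{name}: ~{gib:.1f} GiB")

# FP16 (unquantized):   ~13.0 GiB -> doesn't fit in 8GB
# Q8_0 (8-bit quant):    ~6.5 GiB -> tight, but fits
# Q4_K_M (4-bit quant):  ~3.3 GiB -> fits comfortably
```

This is why quantization matters so much for local inference - and why that 8GB card turned out to be more capable than I expected.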
