I was a bit surprised Meta didn't publish an example of how to invoke one of these LLMs with only torch (or some similarly minimal set of dependencies), though I am of course grateful for, and so pleased with, their contribution of the public weights! There are other popular ways to invoke these models, such as Ollama and Hugging Face's general-purpose transformers package, but those hide the interesting details behind an API. I want to peel back the layers, poke, prod, and understand these models, and help you do the same.
The three global variables in run_inference.py (MODEL_NAME, LLAMA_MODELS_DIR, and INPUT_STRING) take the values you'd expect; adjacent comments in the script give examples and more detail. Modify them as you see fit.
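As a rough sketch (the variable names come from the script, but these particular values are mine and purely illustrative), the top of run_inference.py might look something like this:

```python
# Illustrative values only; the actual defaults and comments in
# run_inference.py may differ. Point these at your own model and prompt.
MODEL_NAME = "Llama3.1-8B-Instruct"         # assumed: which checkpoint to load
LLAMA_MODELS_DIR = "/path/to/llama-models"  # assumed: where the downloaded weights live
INPUT_STRING = "Explain attention in one paragraph."  # assumed: the prompt to complete
```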
The minimal set of dependencies I found includes torch (perhaps obviously); fairscale, a lesser-known library also published by Meta that implements a variety of highly scalable/parallelizable analogues of torch operators; and blobfile, which provides a general file I/O mechanism that Meta's Tokenizer implementation uses.
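After installing those three packages (e.g. with pip install torch fairscale blobfile), a quick sanity check that the environment is ready might look like this; no version pins are implied, any reasonably recent releases should work:

```python
# Verify the minimal dependency set is importable and report versions.
import torch      # core tensor ops and model execution
import fairscale  # Meta's scalable/parallelizable analogues of torch operators
import blobfile   # general file I/O used by Meta's Tokenizer implementation

for mod in (torch, fairscale, blobfile):
    # Not every package exposes __version__, so fall back gracefully.
    print(mod.__name__, getattr(mod, "__version__", "unknown"))
```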