Run state-of-the-art language models directly from Ruby. No Python, no APIs, no external services - just Ruby with blazing-fast Rust under the hood, hardware accelerated with Metal (Mac) and CUDA (NVIDIA). Red Candle leverages the Rust ecosystem, notably Candle and Magnus, to provide a fast and efficient way to run LLMs in Ruby. See Dependencies for more.
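A minimal quickstart sketch, assuming the gem's `Candle::LLM.from_pretrained` API and a quantized TinyLlama build on Hugging Face (the model ID, file name, and keyword arguments here are illustrative and may differ from the current API):

```ruby
require 'candle'

# Download (and cache) a quantized TinyLlama from Hugging Face, then
# run it entirely in-process. Model ID and gguf_file are assumptions;
# substitute any GGUF repository you prefer.
llm = Candle::LLM.from_pretrained(
  "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",
  gguf_file: "tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"
)

puts llm.generate("Explain Ruby blocks in one sentence.")
```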
You just ran a 1.1-billion-parameter AI model inside Ruby. The model lives in your process memory, runs on your hardware (CPU or GPU), and responds with no network latency.
Note on GGUF Support: Red Candle now uses a unified GGUF loader that automatically detects the model architecture from the GGUF file, so all GGUF models (including Mistral models from TheBloke) should work correctly. The loader also selects the appropriate tokenizer for the detected model type to ensure proper text generation.
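For example, a hedged sketch of loading one of TheBloke's Mistral GGUF builds (the repository and file name are assumptions; any GGUF model should load the same way):

```ruby
require 'candle'

# The unified loader reads the architecture from the GGUF metadata and
# picks a matching tokenizer automatically; no per-model setup needed.
llm = Candle::LLM.from_pretrained(
  "TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
  gguf_file: "mistral-7b-instruct-v0.2.Q4_K_M.gguf"
)
```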
Warning: Q2_K quantization can lead to "weight is negative, too large or not a valid number" errors during inference. Use Q3_K_M or higher for stable operation.