LLM monitoring is about collecting, visualizing, and setting up alerts on general metrics (latency, tokens, cost, …) or custom KPIs such as evaluation scores.
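As a minimal sketch of what collecting these per-call metrics can look like, here is a small Python example; the pricing figures, alert threshold, and the stubbed LLM call are illustrative assumptions, not values from any particular provider or tool:

```python
import time

# Illustrative per-token pricing and alert threshold (assumptions, not real price lists).
PRICE_PER_1K_PROMPT_TOKENS = 0.0005
PRICE_PER_1K_COMPLETION_TOKENS = 0.0015
LATENCY_ALERT_THRESHOLD_S = 5.0

def record_llm_metrics(call_llm, prompt: str) -> dict:
    """Wrap an LLM call and collect general metrics: latency, token usage, cost."""
    start = time.perf_counter()
    text, prompt_tokens, completion_tokens = call_llm(prompt)
    latency_s = time.perf_counter() - start

    cost_usd = (
        prompt_tokens / 1000 * PRICE_PER_1K_PROMPT_TOKENS
        + completion_tokens / 1000 * PRICE_PER_1K_COMPLETION_TOKENS
    )
    metrics = {
        "latency_s": round(latency_s, 4),
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "cost_usd": round(cost_usd, 6),
    }

    # A simple alert rule: flag calls that exceed the latency threshold.
    if latency_s > LATENCY_ALERT_THRESHOLD_S:
        print(f"ALERT: slow LLM call ({latency_s:.2f}s): {prompt[:60]!r}")
    return metrics

# Example usage with a stubbed LLM call (replace with a real provider call).
fake_llm = lambda prompt: ("Paris is the capital of France.", 12, 8)
print(record_llm_metrics(fake_llm, "What is the capital of France?"))
```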
LLM observability tools provide either an SDK to log LLM calls from your code or an LLM proxy that intercepts requests. You can use the SDK to manually log the inputs and outputs of LLM calls, as well as other steps such as preprocessing or retrieval from a vector database, as shown in the sketch below.
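The snippet below sketches the logging pattern such an SDK typically exposes; `log_step` is a stand-in helper written here for illustration (it just prints JSON), not the API of any specific observability product, and the retrieval result and LLM output are placeholders:

```python
import json
import time
import uuid
from contextlib import contextmanager

@contextmanager
def log_step(trace_id: str, name: str, inputs: dict):
    """Stand-in for an observability SDK span: logs a step's inputs, outputs, and duration."""
    record = {"trace_id": trace_id, "step": name, "inputs": inputs}
    start = time.perf_counter()
    try:
        yield record  # the caller attaches outputs to the record
    finally:
        record["duration_s"] = round(time.perf_counter() - start, 4)
        print(json.dumps(record))  # a real SDK would ship this to its backend

trace_id = str(uuid.uuid4())

# Log the retrieval step from a (hypothetical) vector database.
with log_step(trace_id, "retrieval", {"query": "How do I reset my password?"}) as step:
    documents = ["To reset your password, go to Settings > Security."]  # placeholder result
    step["outputs"] = {"documents": documents}

# Log the LLM call itself with its full input and output.
with log_step(trace_id, "llm_call", {"prompt": f"Answer using: {documents[0]}"}) as step:
    answer = "Go to Settings > Security and click 'Reset password'."  # placeholder LLM output
    step["outputs"] = {"answer": answer}
```

Tying both steps to the same `trace_id` is what lets the tool reconstruct the full request, from retrieval to generation, in one trace.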
Unlike traditional software, where the same input always produces the same output, LLMs are probabilistic systems that introduce variability by design.
Monitoring what your AI/LLM system is doing, by collecting data and deriving system metrics from it, is an effective way to track the health of the system.
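Building on the per-call records above, a small aggregation can turn raw data into system-health metrics; the window of records and the choice of metrics (error rate, p95 latency, total cost) are illustrative assumptions:

```python
import statistics

def system_health_metrics(call_records: list[dict]) -> dict:
    """Aggregate per-call records into system-level health metrics over a window."""
    latencies = [r["latency_s"] for r in call_records if "latency_s" in r]
    errors = sum(1 for r in call_records if r.get("error"))
    costs = [r.get("cost_usd", 0.0) for r in call_records]
    return {
        "calls": len(call_records),
        "error_rate": errors / len(call_records) if call_records else 0.0,
        # 95th percentile latency (needs at least two samples).
        "p95_latency_s": statistics.quantiles(latencies, n=20)[-1] if len(latencies) >= 2 else None,
        "total_cost_usd": round(sum(costs), 4),
    }

# Example: records as produced by the per-call metric collection sketched earlier.
window = [
    {"latency_s": 1.2, "cost_usd": 0.002},
    {"latency_s": 4.8, "cost_usd": 0.004},
    {"latency_s": 0.9, "cost_usd": 0.001, "error": True},
]
print(system_health_metrics(window))
```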
Observability tools provide a complete view into the health and behavior of the application, which is crucial when building on a non-deterministic component like an LLM.
You can use an LLM with a dedicated evaluation prompt to judge LLM generations, scoring them for hallucinations, toxicity, context relevancy (for RAG applications), or any other criteria you find useful.
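A minimal sketch of this LLM-as-a-judge pattern, here using the OpenAI Python client purely for illustration; the model name, criteria, scoring scale, and prompt wording are assumptions you would adapt to your own application:

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

JUDGE_PROMPT = """You are an evaluator for a RAG application.
Given a context, a question, and an answer, rate the answer from 1 to 5 on:
- hallucination: does the answer invent facts not supported by the context?
- context_relevancy: is the retrieved context relevant to the question?
- toxicity: does the answer contain harmful or offensive content?
Respond with a JSON object: {"hallucination": int, "context_relevancy": int, "toxicity": int, "reasoning": str}."""

def judge_generation(context: str, question: str, answer: str) -> dict:
    """Ask a judge LLM to score a generation on custom evaluation criteria."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative choice; any capable model works
        messages=[
            {"role": "system", "content": JUDGE_PROMPT},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion:\n{question}\n\nAnswer:\n{answer}"},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

scores = judge_generation(
    context="Our refund policy allows returns within 30 days of purchase.",
    question="How long do I have to return an item?",
    answer="You can return items within 30 days of purchase.",
)
print(scores)
```

The resulting scores can then be logged alongside the general metrics above and used as the custom KPIs you alert on.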