Posted on Sep 18, 2024 — Reading 14 minutes

Serving AI From The Basement — Part II

Tags: Inference, Agents, AI, AI Server

In this blogpost:

- SWE Agentic Framework – think of it as the puppet master for coders plus Replit’s next nemesis.
- MoEs – imagine a team of AI experts, each shouting answers when it’s their topic.
- Quantizations & Mixed Precision – turning AI from gourmet to fast food without losing the flavor.
- Batch Inference – AKA AI’s quiz night, answering all questions at once.
- LLM Architectures – blueprints for our chatty AI friends.
- vLLM and Tensor Parallelism – or the thing that makes big AI models run lean.
- DeepSeek v2.5 – our open weights savior.
- Embedding Models – translating human words into AI-understandable numbers.
- Speculative Decoding – or AI’s attempt at mind-reading, guessing your sentences before you finish them.

For about three weeks now, I have been working on a multi-agent system that simulates a team of software engineers. The system assigns projects, creates teams and adds members to them based on areas of expertise and need, and asks team members to build features, assign story points, hold pair programming sessions together, and so on. I started it mainly for fun and exploration; however, last week the following paper was released: Agents in Software Engineering.
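To make the team-building step a bit more concrete, here is a minimal sketch of matching members to a project by expertise and need. This is not the actual system: the `Engineer` and `Project` classes, the `build_team` helper, and the greedy "cover the most missing skills first" rule are all hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class Engineer:
    name: str
    skills: set[str]


@dataclass
class Project:
    title: str
    needs: set[str]  # skills the project requires
    team: list[Engineer] = field(default_factory=list)


def build_team(project: Project, pool: list[Engineer]) -> set[str]:
    """Greedily add engineers until every needed skill is covered.

    Returns the set of skills that nobody in the pool could cover.
    (Hypothetical helper; the greedy rule is an assumption.)
    """
    uncovered = set(project.needs)
    available = list(pool)
    while uncovered and available:
        # Pick the engineer who covers the most still-uncovered skills.
        best = max(available, key=lambda e: len(e.skills & uncovered))
        if not best.skills & uncovered:
            break  # nobody left can contribute a missing skill
        project.team.append(best)
        uncovered -= best.skills
        available.remove(best)
    return uncovered


pool = [
    Engineer("Ada", {"backend", "databases"}),
    Engineer("Linus", {"devops"}),
    Engineer("Grace", {"frontend", "backend"}),
]
project = Project("Checkout page", needs={"frontend", "backend", "devops"})
missing = build_team(project, pool)
print([e.name for e in project.team], missing)
```

In a real multi-agent setup each `Engineer` would presumably wrap an LLM-backed agent rather than a plain record, and the skill matching could itself be delegated to a model; the sketch only shows the bookkeeping.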