Speedrunning ML Ops

Let's quickly learn how to do ML Ops.

2025-01-07

My background is in building custom DevOps workflows for developing and testing AI accelerators, which I guess is a specific kind of ML Ops. But I've never sat down to make a study of it, so here we are.

In DevOps you want to manage the entire developer workflow, from coding to deployment in production. Doing this means automatically tracking the following:

ML Ops is the same, but it also requires managing the machine learning workflow. Whereas DevOps just manages code and code artifacts, ML Ops manages:

These three things are the elements of any ML Ops system. And they compound with each other, meaning you'll have around 3³ = 27 combinations of changes to track.
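One way to read the compounding: if each of the three elements can independently be at one of three versions, the configurations multiply rather than add. A quick sketch (the version labels are made up for illustration):

```python
# Illustrative: three compounding elements, each with three hypothetical
# versions, yield 3**3 = 27 trackable combinations.
from itertools import product

versions = {
    "code":  ["c1", "c2", "c3"],
    "data":  ["d1", "d2", "d3"],
    "model": ["m1", "m2", "m3"],
}
combinations = list(product(*versions.values()))
print(len(combinations))  # 27
```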

Let's assume we have access to GPUs somewhere, like in a data center or cloud cluster. And assume that the code is tracked with git as is common in developer workflows already. There are still two major sources of variance in the workflow that need to be locked down:

In tutorials and docs you often see datasets being pulled from pytorch or sklearn or huggingface directly. This works for tutorials, but for production systems you need to be sure that the data you train with doesn't change out from under you: the content of the data must be hashed and tracked in some stable storage.
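A minimal sketch of that hashing step, assuming the dataset can be read as bytes (the function name and sample data here are illustrative, not from any particular library):

```python
# Content-address the dataset: record the hash once, verify it before training.
import hashlib

def dataset_fingerprint(data: bytes) -> str:
    """Return a stable SHA-256 hex digest of the dataset contents."""
    return hashlib.sha256(data).hexdigest()

snapshot = b"label,text\n0,hello\n1,world\n"   # stand-in for real training data
recorded = dataset_fingerprint(snapshot)       # store this in stable storage
assert dataset_fingerprint(snapshot) == recorded  # data unchanged since recording
```

Before each training run, recompute the fingerprint of the data you actually loaded and compare it to the recorded one; a mismatch means the dataset drifted.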
