The first step to speeding up your code: identify your code’s performance bottlenecks. But that’s harder than you’d like.
Production is different from your laptop, from CPUs to memory to disk speed to network latency. Is loading from S3 a problem? Are you swapping? Is your program running with less parallelism when there are more CPUs? Plus, performance problems with production data won’t necessarily show up when using test data.
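To make the laptop-versus-production gap concrete, here is a minimal sketch that records a few environment facts at the start of a job, using only the Python standard library. The function name `environment_summary` is hypothetical, and the numbers it reports are only a starting point: it does not account for every kind of limit (container CPU quotas, for example), and it says nothing about memory, disk, or network.

```python
import os
import platform


def environment_summary() -> dict:
    """Collect a few facts that often differ between a laptop and production."""
    summary = {
        "platform": platform.platform(),
        "machine": platform.machine(),
        # Logical CPUs on the machine; os.cpu_count() may return None.
        "cpu_count": os.cpu_count(),
    }
    # On Linux, the process may be restricted to a subset of the CPUs.
    # os.sched_getaffinity is not available on every platform, so check first.
    if hasattr(os, "sched_getaffinity"):
        summary["usable_cpus"] = len(os.sched_getaffinity(0))
    else:
        summary["usable_cpus"] = summary["cpu_count"]
    return summary


if __name__ == "__main__":
    for key, value in environment_summary().items():
        print(f"{key}: {value}")
```

Logging this alongside each job’s output gives you something to compare when a job that was fast on your laptop turns out to be slow in production.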
The best way to get an accurate understanding of performance bottlenecks is to observe production. But how do you get that data?
By running all your production jobs with profiling enabled from the start, by default. Most profilers are not designed to run constantly in production, of course. You need a profiler that’s designed to have low performance overhead, and that’s robust enough to run in production.
This is where the Sciagraph profiler comes in: it’s designed to run in production and to profile data processing batch jobs.