Benchmarking IO throughput for networking- and storage-intensive workloads in Python is trickier than in other languages because of CPython's Global Interpreter Lock (GIL), which lets only one thread execute Python bytecode at a time. To work around it, we spawn sibling processes that run the client and server code independently, so they don't affect each other or compete for resources inside a single interpreter instance.
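A minimal sketch of the sibling-process setup, assuming a UDP echo server on the loopback interface. The server runs in its own interpreter launched with `subprocess`, reports its OS-assigned port on stdout, and the parent times each round trip with `time.perf_counter`. `SERVER_CODE` and `measure_udp_round_trips` are hypothetical names for illustration, not taken from the benchmark suite.

```python
import socket
import subprocess
import sys
import time

# Hypothetical UDP echo server, run as a separate interpreter so it never
# competes with the client for the GIL. It echoes datagrams until it gets b"stop".
SERVER_CODE = r"""
import socket
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("127.0.0.1", 0))               # let the OS pick a free port
print(sock.getsockname()[1], flush=True)  # report the port to the parent
while True:
    data, addr = sock.recvfrom(65536)
    if data == b"stop":
        break
    sock.sendto(data, addr)
sock.close()
"""


def measure_udp_round_trips(count: int, payload: bytes = b"x" * 64) -> list[float]:
    """Hypothetical helper: time `count` echo round trips against a sibling process."""
    with subprocess.Popen(
        [sys.executable, "-c", SERVER_CODE], stdout=subprocess.PIPE, text=True
    ) as server:
        try:
            port = int(server.stdout.readline())
            timings = []
            with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
                client.settimeout(2.0)  # fail instead of hanging on a lost datagram
                for _ in range(count):
                    start = time.perf_counter()
                    client.sendto(payload, ("127.0.0.1", port))
                    reply, _ = client.recvfrom(65536)
                    timings.append(time.perf_counter() - start)
                    if reply != payload:
                        raise RuntimeError("echo mismatch")
                client.sendto(b"stop", ("127.0.0.1", port))
            return timings
        finally:
            try:
                server.wait(timeout=5)
            except subprocess.TimeoutExpired:
                server.kill()  # the server never got b"stop"; don't leak it
```

Calling `measure_udp_round_trips(10_000)` returns per-request latencies in seconds. Launching the server with `sys.executable -c` rather than `multiprocessing` avoids re-importing the main module on platforms that use the spawn start method.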
Most developers don't even consider stateless UDP sockets and go straight for the heavier, stateful TCP/IP stack. Similarly, many assume that async code will make IO faster, but the event loop adds its own scheduling overhead, which can backfire in a well-connected data center where network round trips are already short. Moreover, state management in high-level libraries is often more expensive than sending a packet to a neighboring country or running inference on an AI model 🤯
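To make the async point concrete, here is a rough micro-benchmark, an illustrative assumption rather than one of the benchmarks reported here. It compares a plain function call with one `await asyncio.sleep(0)`, which suspends the coroutine and reschedules it through the event loop. It measures only scheduling overhead, not network IO; the function names are hypothetical.

```python
import asyncio
import time


def time_plain_calls(n: int) -> float:
    """Hypothetical helper: seconds per call of an ordinary synchronous function."""
    def handler(x: int) -> int:
        return x + 1

    start = time.perf_counter()
    for i in range(n):
        handler(i)
    return (time.perf_counter() - start) / n


def time_event_loop_yields(n: int) -> float:
    """Hypothetical helper: seconds per `await asyncio.sleep(0)` (one event-loop trip)."""
    async def run() -> float:
        start = time.perf_counter()
        for _ in range(n):
            await asyncio.sleep(0)  # suspends the coroutine and reschedules it
        return (time.perf_counter() - start) / n

    return asyncio.run(run())


if __name__ == "__main__":
    n = 100_000
    print(f"plain call:       {time_plain_calls(n) * 1e9:8.1f} ns")
    print(f"event-loop yield: {time_event_loop_yields(n) * 1e9:8.1f} ns")
```

Absolute numbers depend on the machine and Python version. What matters is that every `await` on a real socket pays at least this scheduling cost, which is noticeable when the network round trip itself is short.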
Frankly, pytest-benchmark is not the most accurate benchmarking tool, but most of the numbers here are large enough that timer precision isn't a concern. Here are the highlights: