In a production environment, filesystem faults might occur due to various incidents such as disk failures and administrator errors. As a Chaos Enginee

How to Simulate I/O Faults at Runtime | PingCAP

submited by
Style Pass
2021-07-23 01:30:08

In a production environment, filesystem faults might occur due to various incidents such as disk failures and administrator errors. As a Chaos Engineering platform, Chaos Mesh has supported simulating I/O faults in a filesystem ever since its early versions. By simply adding an IOChaos CustomResourceDefinition (CRD), we can watch how the filesystem fails and returns errors.

However, before Chaos Mesh 1.0, this experiment was not easy and may have consumed a lot of resources. We needed to inject sidecar containers to the Pod through the mutating admission webhooks and rewrite the ENTRYPOINT command. Even if no fault was injected, the injected sidecar container caused a substantial amount of overhead.

Chaos Mesh 1.0 has changed all this. Now, we can use IOChaos to inject faults to a filesystem at runtime. This simplifies the process and greatly reduces system overhead. This blog post introduces how we implement the IOChaos experiment without using a sidecar.

To simulate I/O faults at runtime, we need to inject faults into a filesystem after the program starts system calls (such as reads and writes) but before the call requests arrive at the target filesystem. We can do that in one of two ways:

Leave a Comment