Chris's Wiki :: blog/unix/NFSLosingLocksAndSIGLOST

Submitted by
Style Pass
2024-11-17 13:30:02

NFS is a network filesystem that famously also has a network locking protocol associated with it (or built into it, for NFSv4). This means that NFS has to deal with an NFS client losing a lock that it thinks it holds. Clients normally lose locks as part of NFS(v3) lock recovery, which is triggered when an NFS server reboots. On server reboot, clients are told to re-acquire all of their locks, and this re-acquisition can explicitly fail (it can also go wrong in various other ways, which is one way to get stuck NFS locks). When an NFS client's kernel attempts to reclaim a lock and the attempt fails, it has a problem: some process on the local machine thinks that it holds an NFS lock, but as far as the NFS server and other NFS clients are concerned, it doesn't.
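To make the process's side of this concrete, here is a minimal sketch of how a program typically takes such a lock. It assumes the lock is a POSIX byte-range lock obtained through fcntl(), which is the kind that Unix NFS clients usually forward to the server; the helper names lock_whole_file() and unlock_whole_file() are made up for illustration. The point is that the successful return from fcntl() is the only thing the process learns about the lock. Nothing in this interface later reports that the lock has gone away.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: try to take an exclusive lock on the whole file.
   Returns 0 on success, -1 (with errno set) if the lock is unavailable. */
int lock_whole_file(int fd)
{
    struct flock fl = {0};

    fl.l_type = F_WRLCK;    /* exclusive (write) lock */
    fl.l_whence = SEEK_SET; /* l_start is relative to the start of the file */
    fl.l_start = 0;
    fl.l_len = 0;           /* 0 means "through end of file" */

    /* F_SETLK is non-blocking; F_SETLKW would wait for the lock instead. */
    return fcntl(fd, F_SETLK, &fl);
}

/* Hypothetical helper: release the lock taken by lock_whole_file(). */
int unlock_whole_file(int fd)
{
    struct flock fl = {0};

    fl.l_type = F_UNLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;

    return fcntl(fd, F_SETLK, &fl);
}
```

After lock_whole_file() returns 0, the program simply proceeds as if it owns the file until it unlocks or closes it, which is exactly the assumption that a failed lock reclaim quietly breaks.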

Sun's original version of NFS dealt with this problem with a special signal, SIGLOST. When the NFS client's kernel detected that an NFS lock had been lost, it sent SIGLOST to whatever process held the lock. SIGLOST was a regular signal, so by default the process would exit abruptly; a process that wanted to do something special could register a signal handler for SIGLOST and then do whatever it could. SIGLOST appeared no later than SunOS 3.4 (cf) and still lives on today in Illumos, where you can find it discussed in uts/common/klm/nlm_client.c and uts/common/fs/nfs/nfs4_recovery.c (it's also mentioned in fcntl(2)). How popular it is to actually handle SIGLOST may be indicated by the fact that no program in the Illumos source tree seems to set a signal handler for it.
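For illustration, registering such a handler might look something like the following sketch. It is not taken from any real program (as noted, nothing in the Illumos tree appears to do this), and the names on_lock_lost(), install_siglost_handler() and lock_was_lost() are invented here. Since not every Unix defines SIGLOST, the registration is wrapped in an #ifdef, and the handler only sets a flag that the main program is assumed to check before it trusts its lock again.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set from the signal handler. volatile sig_atomic_t is the type that
   is safe to write from inside a handler. */
static volatile sig_atomic_t lock_lost = 0;

/* Hypothetical handler: just record that a lock was lost. */
static void on_lock_lost(int signo)
{
    (void)signo;
    lock_lost = 1;
}

/* Hypothetical helper. Returns 0 if a SIGLOST handler was installed,
   1 if this platform has no SIGLOST, and -1 if sigaction() failed. */
int install_siglost_handler(void)
{
#ifdef SIGLOST
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_lock_lost;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGLOST, &sa, NULL);
#else
    (void)on_lock_lost; /* unused on platforms without SIGLOST */
    return 1;
#endif
}

/* Hypothetical helper: has SIGLOST been seen since startup? */
int lock_was_lost(void)
{
    return lock_lost != 0;
}
```

A program using this would call install_siglost_handler() once at startup and then check lock_was_lost() before each operation that depends on the lock. What it should do when the answer is yes is left open here; as the text says, it can only do whatever it can.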
