The other day I watched the launch of the Apple Vision Pro. The whole thing was very interesting, but one thing that caught my attention was the finger input: pinching your fingers acts as a sort of cursor or mouse click, and it seems very intuitive. I figured I wanted to try it out, so I took it upon myself to build something similar.
The goal here is to use the hand as an input device for the computer. It should be able to handle clicking and moving the mouse cursor. To do this, we obviously need a camera, and for starters, let's point it facing downwards, since that is roughly where your hands are when using a computer. Next, we need some way to detect the position of the hand and fingers for controlling the mouse. For this, I will be using MediaPipe from Google. The best way I can describe MediaPipe is as a set of prebuilt ML solutions, and it happens to have a hand landmarker, which is exactly what we need. Lastly, we need some way to simulate mouse input.
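To make the plan concrete, here is a minimal sketch of the two geometric pieces involved: detecting a pinch and turning a landmark position into a cursor position. MediaPipe's 21-point hand model really does use index 4 for the thumb tip and 8 for the index fingertip; the distance threshold and screen resolution below are made-up values you would tune for your own setup.

```python
import math

# MediaPipe hand landmark indices (from its 21-point hand model).
THUMB_TIP, INDEX_TIP = 4, 8

def is_pinching(thumb, index, threshold=0.05):
    """True when the thumb tip and index tip are close enough to count as a 'click'.

    `thumb` and `index` are (x, y) in MediaPipe's normalized [0, 1] image
    coordinates; `threshold` is an assumed value, tuned per camera setup.
    """
    return math.dist(thumb, index) < threshold

def to_screen(x, y, screen_w=1920, screen_h=1080):
    """Map normalized landmark coordinates to screen pixel coordinates."""
    return int(x * screen_w), int(y * screen_h)
```

Because the landmarks come back normalized to the camera frame, the mapping to the screen is just a scale; the pinch test is a plain Euclidean distance check.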
Starting with this, we can use the Python version of MediaPipe, have OpenCV read the camera feed and pass it to MediaPipe, then use the hand landmarks to simulate the mouse. Simple, right? Except that this doesn't work very well: as soon as I got OpenCV running with MediaPipe, I found it super laggy.
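For reference, the naive pipeline described above looks roughly like this. This is a hypothetical sketch, not the post's actual code: it uses MediaPipe's `mp.solutions.hands` API and `pyautogui` for the mouse side (the post only says "simulate the mouse input", so that library choice is mine). The `smooth` helper is an assumed exponential moving average to damp landmark jitter.

```python
def smooth(prev, new, alpha=0.3):
    """Exponential moving average over (x, y) points to damp jitter."""
    if prev is None:
        return new
    return (prev[0] + alpha * (new[0] - prev[0]),
            prev[1] + alpha * (new[1] - prev[1]))

def main():
    # Heavy dependencies imported lazily so the helper above stays stdlib-only.
    import cv2
    import mediapipe as mp
    import pyautogui

    screen_w, screen_h = pyautogui.size()
    cap = cv2.VideoCapture(0)                      # the downward-facing camera
    hands = mp.solutions.hands.Hands(max_num_hands=1)
    cursor = None
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB; OpenCV delivers BGR.
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            tip = results.multi_hand_landmarks[0].landmark[8]  # index fingertip
            cursor = smooth(cursor, (tip.x * screen_w, tip.y * screen_h))
            pyautogui.moveTo(*cursor)
        if cv2.waitKey(1) & 0xFF == ord("q"):      # press q to quit
            break
    cap.release()

if __name__ == "__main__":
    main()
```

Everything here runs on one thread: grab a frame, run inference, move the cursor, repeat, which is exactly why the naive version feels laggy.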