Sign Language Recognition with Computer Vision

Quick Intro
Building a real-time action-tracking model requires a blend of computer vision savvy and a good grasp of anatomy. But here’s the good news: the heavy technical lifting has already been done, thanks to Google’s MediaPipe library.
This nifty tool lets us use its holistic model to track body movements in real time, so the main puzzle left for us is developing and training a model that classifies those movements. For sequence tasks like this, an LSTM is one of the top choices.
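As a rough illustration, here is a minimal sketch of the keypoint-extraction step with MediaPipe Holistic. The `extract_keypoints` helper and the landmark counts follow MediaPipe’s documented defaults, but this is an assumed setup, not necessarily the exact code used in this project:

```python
# Minimal sketch: stream webcam frames through MediaPipe Holistic and
# flatten the pose/face/hand landmarks into one feature vector per frame.
# Landmark counts (33 pose, 468 face, 21 per hand) are MediaPipe's defaults.
import cv2
import mediapipe as mp
import numpy as np

mp_holistic = mp.solutions.holistic

def extract_keypoints(results):
    """Concatenate all landmarks into one 1D vector (zeros when a part is not detected)."""
    pose = (np.array([[p.x, p.y, p.z, p.visibility] for p in results.pose_landmarks.landmark]).flatten()
            if results.pose_landmarks else np.zeros(33 * 4))
    face = (np.array([[p.x, p.y, p.z] for p in results.face_landmarks.landmark]).flatten()
            if results.face_landmarks else np.zeros(468 * 3))
    lh = (np.array([[p.x, p.y, p.z] for p in results.left_hand_landmarks.landmark]).flatten()
          if results.left_hand_landmarks else np.zeros(21 * 3))
    rh = (np.array([[p.x, p.y, p.z] for p in results.right_hand_landmarks.landmark]).flatten()
          if results.right_hand_landmarks else np.zeros(21 * 3))
    return np.concatenate([pose, face, lh, rh])  # 1662 values per frame

cap = cv2.VideoCapture(0)
with mp_holistic.Holistic(min_detection_confidence=0.5, min_tracking_confidence=0.5) as holistic:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        keypoints = extract_keypoints(results)  # feed these into the sequence model
cap.release()
```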
The Workflow

The Result



Final Thoughts
- Data quality ALWAYS comes first: the quality of your data (think camera resolution, lighting, backgrounds, frame duration) is a game-changer for the model’s performance. Sometimes a simple, shallow LSTM model works wonders with top-notch data.
- Holistic + LSTM = a winning combo: the holistic model is a star – quick, accurate, and dependable. Pair it with a few shallow LSTM layers and you have an efficient setup for action recognition (see the sketch after this list).
- Feature selection tricks: in this project I played around with face, hand, and body landmarks. My guess is that dropping the face landmarks would speed up prediction even more without hurting the results much, but I haven’t tested this yet.
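For concreteness, here is roughly what such a shallow LSTM setup could look like in Keras. The sequence length, feature size, and layer widths below are illustrative assumptions rather than the project’s exact configuration:

```python
# Illustrative shallow LSTM classifier over sequences of holistic keypoints.
# Shapes and layer sizes are assumptions for the sake of the example.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

NUM_FRAMES = 30      # frames per action clip (assumed)
NUM_FEATURES = 1662  # flattened holistic keypoints per frame (from the sketch above)
NUM_ACTIONS = 5      # size of the gesture vocabulary (assumed)

model = Sequential([
    LSTM(64, return_sequences=True, input_shape=(NUM_FRAMES, NUM_FEATURES)),
    LSTM(128),
    Dense(64, activation="relu"),
    Dense(NUM_ACTIONS, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["categorical_accuracy"])
```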
Improvements
- Training is not smooth and the model overfits: adding some dropout and normalization layers might improve training stability, since the training curves are far from smooth (see the sketch after this list).
- Recognition speed: not yet fast enough for real-world hand gesture translation. I currently use 50% as the threshold, meaning an action is only emitted once its predicted probability reaches 50%. Lowering that threshold and improving the model’s generalization might lead to much faster predictions (the sketch after this list shows where the threshold sits in the inference loop).
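To make both improvement ideas concrete, here is a hedged sketch: dropout and batch normalization slotted between the recurrent layers, plus the 50% probability threshold applied in the inference loop. Layer sizes, the action vocabulary, and the 30-frame rolling window are assumptions for the sake of the example:

```python
# Sketch of the two suggested improvements; names and sizes are assumptions.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout, BatchNormalization

actions = ["action_a", "action_b", "action_c"]  # placeholder gesture vocabulary (assumed)

# 1) Regularization: dropout + normalization between the recurrent layers.
model = Sequential([
    LSTM(64, return_sequences=True, input_shape=(30, 1662)),
    BatchNormalization(),
    Dropout(0.3),
    LSTM(128),
    Dropout(0.3),
    Dense(64, activation="relu"),
    Dense(len(actions), activation="softmax"),
])

# 2) Thresholded inference: only emit an action once its probability clears the cut-off.
THRESHOLD = 0.5  # the 50% cut-off discussed above; lowering it yields results sooner
sequence = []    # rolling window of the most recent frames' keypoints

def update_and_predict(keypoints):
    """Append the latest frame's keypoints and predict once 30 frames are buffered."""
    global sequence
    sequence = (sequence + [keypoints])[-30:]
    if len(sequence) < 30:
        return None
    probs = model.predict(np.expand_dims(sequence, axis=0), verbose=0)[0]
    best = int(np.argmax(probs))
    return actions[best] if probs[best] > THRESHOLD else None
```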
Project Repository
👉 GitHub