Sign Language Recognition with Computer Vision

Quick Intro

Building a real-time action-tracking model requires a blend of computer vision savvy and a good grasp of anatomy. But here’s the good news: the heavy lifting has already been done, thanks to Google’s MediaPipe library.

This nifty tool lets us use its Holistic model to track body movements in real time. So the main puzzle left for us to solve is developing and training a model that can classify these movements. For sequence tasks like this, an LSTM is usually one of the top choices.
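To make the hand-off between the two pieces concrete, here is a minimal sketch of how one frame of Holistic output could be flattened into a single feature vector for the LSTM. It assumes the legacy `mp.solutions.holistic` result layout (`pose_landmarks`, `face_landmarks`, `left_hand_landmarks`, `right_hand_landmarks`, each with a `.landmark` list) and the default 468-point face mesh. The function names and the zero-filling of missing groups are my own illustration and not necessarily what the repository does. A stand-in object replaces a real MediaPipe result so the snippet runs without a camera.

```python
from types import SimpleNamespace

# Landmark counts for MediaPipe Holistic (face mesh without refined landmarks).
POSE_N, FACE_N, HAND_N = 33, 468, 21


def _flatten(landmark_list, count, with_visibility=False):
    """Flatten one landmark group into a list of floats, or zeros if it is missing."""
    width = 4 if with_visibility else 3
    if landmark_list is None:
        # Group not detected in this frame: keep the vector length fixed.
        return [0.0] * (count * width)
    values = []
    for lm in landmark_list.landmark:
        values.extend([lm.x, lm.y, lm.z])
        if with_visibility:
            values.append(lm.visibility)
    return values


def extract_keypoints(results):
    """Concatenate pose, face, and both hands into one per-frame feature vector."""
    return (
        _flatten(results.pose_landmarks, POSE_N, with_visibility=True)
        + _flatten(results.face_landmarks, FACE_N)
        + _flatten(results.left_hand_landmarks, HAND_N)
        + _flatten(results.right_hand_landmarks, HAND_N)
    )


# Stand-in for a Holistic result where only the left hand was detected.
hand = SimpleNamespace(landmark=[SimpleNamespace(x=0.5, y=0.5, z=0.0)] * HAND_N)
fake_results = SimpleNamespace(
    pose_landmarks=None,
    face_landmarks=None,
    left_hand_landmarks=hand,
    right_hand_landmarks=None,
)

frame_vector = extract_keypoints(fake_results)
print(len(frame_vector))  # 33*4 + 468*3 + 21*3 + 21*3 = 1662
```

A sequence of these per-frame vectors, stacked in time order, is the kind of input an LSTM classifier expects. In practice you would probably convert each vector to a NumPy array before stacking.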

The Workflow

Figure: Basic model workflow (flowchart of action recognition)

The Result

Figure: Training accuracy
Figure: Training loss
Figure: Real-time sign language recognition

Final Thoughts

  • Data Quality ALWAYS Comes First: The quality of your data (think camera resolution, lighting, backgrounds, frame duration) is a game-changer for your model’s performance. With top-notch data, even a simple shallow LSTM model can work wonders.

  • Holistic + LSTM = A Winning Combo: The Holistic model is a star – it’s quick, accurate, and dependable. Pair it with a few shallow LSTM layers, and you have an efficient setup for action recognition.

  • Feature Selection Tricks: In this project, I used face, hand, and body landmarks. I suspect that dropping the face landmarks would speed up prediction even more without hurting the results much, but I haven’t tested this yet.
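To get a feel for how much the face landmarks weigh in the input, here is a quick back-of-the-envelope calculation. It assumes the standard MediaPipe Holistic landmark counts (33 pose points with x, y, z, and visibility; 468 face points and 21 points per hand with x, y, z). The `feature_size` helper is hypothetical, and this only measures input size, not actual accuracy or latency, which would still need to be tested.

```python
# Per-frame feature counts for MediaPipe Holistic landmarks.
GROUPS = {
    "pose": 33 * 4,        # x, y, z, visibility
    "face": 468 * 3,       # x, y, z
    "left_hand": 21 * 3,   # x, y, z
    "right_hand": 21 * 3,  # x, y, z
}


def feature_size(include_face=True):
    """Number of values per frame, with or without the face landmarks."""
    return sum(n for name, n in GROUPS.items() if include_face or name != "face")


full = feature_size(include_face=True)
slim = feature_size(include_face=False)
reduction = 1 - slim / full

print(full, slim, f"{reduction:.1%}")  # 1662 258 84.5%
```

So the face mesh makes up the bulk of each frame’s features, which is why dropping it looks like a promising experiment.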

Improvements

  • Training is not smooth and shows signs of overfitting: As the training curves show, training is not smooth at all. Adding some dropout and normalization layers might improve the model’s training stability.

  • The recognition speed: Not fast enough for real-world hand gesture translation tasks. I am using 50% as the threshold, meaning the model only yields an action as the result once its prediction probability reaches 50%. I think lowering that threshold and improving the model’s generalization ability might lead to much faster predictions.
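The threshold rule described above can be sketched in a few lines. The `decide` helper, the class names, and the probability values are all made up for illustration and are not taken from the repository. The point is only to show how the same stream of predictions fires earlier with a lower threshold.

```python
def decide(probabilities, labels, threshold=0.5):
    """Return the top label if its probability reaches the threshold, else None."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    if probabilities[best] >= threshold:
        return labels[best]
    return None


LABELS = ["hello", "thanks", "iloveyou"]  # hypothetical class names

# Hypothetical softmax outputs for three consecutive frame windows.
stream = [
    [0.40, 0.35, 0.25],  # nothing reaches 0.45 or 0.50
    [0.48, 0.30, 0.22],  # below 0.50 but above 0.45
    [0.70, 0.20, 0.10],  # confident
]

at_050 = [decide(p, LABELS, threshold=0.50) for p in stream]
at_045 = [decide(p, LABELS, threshold=0.45) for p in stream]

print(at_050)  # [None, None, 'hello']
print(at_045)  # [None, 'hello', 'hello']
```

With the lower threshold the sign is reported one window earlier. The flip side is that a lower threshold also lets through more wrong guesses, so it likely only pays off if the model generalizes well.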

Project Repository

👉 GitHub