A library for prototyping realtime hand detection (bounding box), directly in the browser.
View a live demo in your browser here.
Note: Version 0.0.13 is the old version of handtrack.js, which tracks only hands and was trained on the egohands dataset. It is slightly more stable than the recent version v0.1.x, which is trained on a new dataset (still in active development) and supports more classes (open, closed, pinch, point, zoom, etc.). You might see some issues with the new version; feel free to downgrade to 0.0.13 as needed, and please report any issues you see.
Handtrack.js is a library for prototyping realtime hand detection (bounding box), directly in the browser. It frames hand tracking as an object detection problem, and uses a trained convolutional neural network to predict bounding boxes for the location of hands in an image.
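Object detectors of this kind usually propose several overlapping boxes for the same hand and then prune them with non-maximum suppression (NMS), which relies on the intersection-over-union (IoU) between boxes. The block below is a minimal sketch of that standard technique, not handtrack.js's own code; the [x, y, width, height] box layout and the names iou and nonMaxSuppression are assumptions for illustration.

```javascript
// Boxes use an assumed [x, y, width, height] layout (top-left corner plus size).

// Intersection-over-union: overlap area divided by the area covered by either box.
function iou(a, b) {
  const [ax, ay, aw, ah] = a;
  const [bx, by, bw, bh] = b;
  const interW = Math.max(0, Math.min(ax + aw, bx + bw) - Math.max(ax, bx));
  const interH = Math.max(0, Math.min(ay + ah, by + bh) - Math.max(ay, by));
  const inter = interW * interH;
  const union = aw * ah + bw * bh - inter;
  return union > 0 ? inter / union : 0;
}

// Greedy NMS: visit boxes from highest to lowest score and keep a box only if
// it does not overlap any already-kept box by more than iouThreshold.
function nonMaxSuppression(detections, iouThreshold) {
  const sorted = [...detections].sort((p, q) => q.score - p.score);
  const kept = [];
  for (const det of sorted) {
    if (kept.every((k) => iou(k.bbox, det.bbox) <= iouThreshold)) {
      kept.push(det);
    }
  }
  return kept;
}

// Hypothetical raw detections: two near-duplicate boxes on one hand, one on another.
const raw = [
  { bbox: [10, 10, 100, 100], score: 0.9 },
  { bbox: [15, 12, 100, 100], score: 0.7 }, // overlaps the first box heavily
  { bbox: [300, 50, 80, 80], score: 0.8 },  // a separate hand
];
console.log(nonMaxSuppression(raw, 0.5).map((d) => d.score)); // → [ 0.9, 0.8 ]
```

The lower-scoring duplicate is dropped because its IoU with the first box is about 0.87, above the 0.5 threshold.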
Handtrack.js is currently being updated (mostly around optimizations for speed/accuracy and functionality). Here is a list of recent changes:
New dataset curation: A new dataset (~2000 images, 6000 labels) has been curated to cover new hand poses (discussed below) and focuses on the viewpoint of a user facing a webcam. Note that the dataset is not released, mostly because it contains personal information about the participants; an effort is still underway to extract a subset that is free of PII. In the meantime, the project can still be reproduced using the public egohands dataset.
New Classes: Following a review of the use cases that developers have created so far with handtrack.js (e.g. game controls, detecting face touching to minimize COVID spread, air guitar, etc.), a new set of hand pose labels has been curated:
Reduced Model size: Handtrack.js now supports multiple models (e.g. ssd320fpnlite, ssd640fpnlite) in multiple sizes (large, medium and small). The large size is the default fp32 version of each model, while medium and small are fp16 and Int8 quantized versions respectively. In my experiments, the small version yields comparable accuracy with a much smaller weight file. For example, the ssd320fpnlite weights are about 12MB (large), 6MB (medium) and 3MB (small).
Note that smaller models don't translate into faster inference: all three sizes yield about the same FPS.
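The halving from large to medium to small follows from the bytes needed per weight: 4 for fp32, 2 for fp16 and 1 for Int8. The sketch below reproduces the ssd320fpnlite figures quoted above; it assumes the weight file is dominated by the raw weights (ignoring metadata and quantization scales) and treats 1 MB as 10^6 bytes. The helper approxWeightMB is hypothetical.

```javascript
// Bytes used to store one weight at each model size's precision.
const BYTES_PER_WEIGHT = { large: 4, medium: 2, small: 1 }; // fp32, fp16, Int8

// Approximate weight-file size in MB (10^6 bytes) for a given parameter count.
function approxWeightMB(numParams, size) {
  return (numParams * BYTES_PER_WEIGHT[size]) / 1e6;
}

// Parameter count implied by the ~12MB fp32 figure: about 3 million weights.
const params = (12 * 1e6) / BYTES_PER_WEIGHT.large;
for (const size of ["large", "medium", "small"]) {
  console.log(size, approxWeightMB(params, size), "MB");
}
// large 12 MB
// medium 6 MB
// small 3 MB
```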
The underlying models are trained using the TensorFlow Object Detection API (see here).
| FPS | Image Size (px) | Device | Browser | Comments |
|---|---|---|---|---|
| 26 | 450 × 380 | MacBook Pro (i7, 2.2GHz, 2018) | Chrome 72.0.3626 | – |
| 14 | 450 × 380 | MacBook Pro (i7, 2.2GHz, mid 2014) | Chrome 72.0.3626 | – |
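FPS figures like those in the table count how many detections complete per second of wall-clock time. As a rough way to take similar measurements in your own page, independently of the library's own getFPS method, the hypothetical FpsMeter below averages over a sliding window of frame timestamps (in a browser these would typically come from performance.now()).

```javascript
// Rolling frames-per-second estimate over the most recent `windowSize` frames.
class FpsMeter {
  constructor(windowSize = 30) {
    this.windowSize = windowSize;
    this.timestamps = []; // completion times in milliseconds
  }

  // Record that one detection finished at time `nowMs`.
  tick(nowMs) {
    this.timestamps.push(nowMs);
    if (this.timestamps.length > this.windowSize) {
      this.timestamps.shift(); // drop the oldest timestamp
    }
  }

  // Frames per second across the window; 0 until at least two frames are recorded.
  fps() {
    const n = this.timestamps.length;
    if (n < 2) return 0;
    const elapsedMs = this.timestamps[n - 1] - this.timestamps[0];
    return elapsedMs > 0 ? ((n - 1) * 1000) / elapsedMs : 0;
  }
}

const meter = new FpsMeter(5);
for (let i = 0; i < 10; i++) meter.tick(i * 40); // simulate one frame every 40 ms
console.log(meter.fps()); // 25
```

A sliding window keeps the estimate responsive when speed changes (for example when the fans kick in and the CPU throttles) instead of averaging over the whole session.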
Note: Handtrack.js has not been extensively tested on mobile browsers. There have been some known inconsistencies still being investigated.
Handtrack.js is provided as a useful wrapper that allows you to prototype hand/gesture-based interactions in your web applications without the need to understand machine learning. It takes in an HTML image element (img, video or canvas elements, for example) and returns an array of bounding boxes, class names and confidence scores.
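Applications usually post-process that array before acting on it, for example keeping only confident detections of the poses they care about and computing each box's centre. The sketch below assumes each prediction has a bbox of [x, y, width, height] in input-image pixels, a label string and a score; the sample predictions and the helper selectHands are hypothetical, and the score is passed through Number() in case it arrives as a string.

```javascript
// Hypothetical predictions in the assumed shape.
const predictions = [
  { bbox: [40, 60, 120, 150], label: "open", score: 0.91 },
  { bbox: [400, 80, 90, 110], label: "pinch", score: 0.88 },
  { bbox: [220, 300, 60, 70], label: "point", score: 0.42 },
];

// Keep detections whose label is in `wanted` and whose score is at least
// `minScore`, and attach the centre point of each box.
function selectHands(preds, wanted, minScore) {
  return preds
    .filter((p) => wanted.includes(p.label) && Number(p.score) >= minScore)
    .map((p) => {
      const [x, y, w, h] = p.bbox;
      return { label: p.label, score: Number(p.score), center: [x + w / 2, y + h / 2] };
    });
}

// Keeps only the "open" hand (the "point" box is below 0.5), centred at (100, 135).
console.log(selectHands(predictions, ["open", "point"], 0.5));
```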
Note that the current version of the handtrack.js library is designed to work in the browser (frontend JavaScript) and not in Node.js.
Handtrack.js can be imported into your application either via a script tag or via npm.
Once imported, handtrack.js provides an asynchronous load() method, which returns a promise for an object detection model object.
<!-- Load the handtrackjs model. -->
<script src="https://cdn.jsdelivr.net/npm/handtrackjs@latest/dist/handtrack.min.js"> </script>
<!-- Replace this with your image. Make sure CORS settings allow reading the image! -->
<img id="img" src="hand.jpg"/>
<!-- Place your code in the script tag below. You can also use an external .js file -->
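<script>
  // A classic script cannot use top-level await, so wrap the calls in an async function.
  (async () => {
    const img = document.getElementById('img');
    await img.decode(); // wait until the image has loaded before running detection
    const model = await handTrack.load();
    const predictions = await model.detect(img);
    console.log('Predictions: ', predictions);
  })();
</script>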
npm install --save handtrackjs
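import * as handTrack from 'handtrackjs';
// Wrapped in an async function in case your bundler does not support top-level await.
async function run() {
  const img = document.getElementById('img');
  const model = await handTrack.load();
  const predictions = await model.detect(img);
}
run();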
Handtrack.js also provides a set of library helper methods (e.g. to start and stop video playback on a video element) and some model methods (e.g. detect, getFPS, etc.). Please see the project documentation page for more details on the API and examples.
If you are interested in prototyping gesture-based (body-as-input) interactive experiences, Handtrack.js can be useful. The user does not need to attach any additional sensors or hardware and can immediately benefit from the engagement that gesture-based, body-as-input interactions offer.
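Turning per-frame pose labels into interaction events usually needs some smoothing, because a single misclassified frame should not trigger an action. A minimal sketch, assuming you feed it the top label for each frame (or null when no hand is found): the hypothetical PoseDebouncer only reports a new pose once it has been seen for minFrames consecutive frames.

```javascript
// Reports a pose change only after the same label has been the top prediction
// for `minFrames` consecutive frames, suppressing single-frame flicker.
class PoseDebouncer {
  constructor(minFrames = 3) {
    this.minFrames = minFrames;
    this.candidate = null; // label currently being counted
    this.count = 0;        // consecutive frames seen for `candidate`
    this.stable = null;    // last confirmed label (null means "no hand")
  }

  // Feed one frame's top label; returns true when the stable label changes.
  update(label) {
    if (label === this.candidate) {
      this.count += 1;
    } else {
      this.candidate = label;
      this.count = 1;
    }
    if (this.count >= this.minFrames && this.candidate !== this.stable) {
      this.stable = this.candidate;
      return true;
    }
    return false;
  }
}

const debouncer = new PoseDebouncer(3);
const frames = ["open", "open", "closed", "open", "open", "open", "closed", "closed", "closed"];
const events = [];
for (const label of frames) {
  if (debouncer.update(label)) events.push(debouncer.stable);
}
console.log(events); // [ 'open', 'closed' ]
```

The lone "closed" frame in the middle is ignored; a larger minFrames gives steadier events at the cost of more latency.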
Some (not all) relevant scenarios are listed below:
The main limitation currently is that handtrack.js is still a fairly heavy model and there have been some inconsistent results when run on mobile.
The commands below run the demo example in the demo folder.
npm install
npm run start
The start script launches a simple python3 web server from the demo folder using http.server. You should be able to view it in your browser at http://localhost:3005/. You can also view the pong game control demo at http://localhost:3005/pong.html.
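A game control like the pong demo needs to map a hand's position to a paddle position. The sketch below is a hypothetical helper, not code from the demo: it assumes a bbox of [x, y, width, height] in video pixels, maps the box's horizontal centre to a paddle position between 0 and 1, and smooths it with an exponential moving average (weight alpha) so the paddle does not jitter.

```javascript
// Returns an update function mapping a hand box to a smoothed paddle position in [0, 1].
function makePaddleController(frameWidth, alpha = 0.5) {
  let position = 0.5; // start the paddle in the middle
  return function update(bbox) {
    if (bbox) {
      const [x, , w] = bbox; // only the horizontal extent matters here
      const target = Math.min(1, Math.max(0, (x + w / 2) / frameWidth));
      position = alpha * target + (1 - alpha) * position; // exponential moving average
    }
    return position; // hold the last position when no hand is detected
  };
}

const movePaddle = makePaddleController(640, 0.5);
console.log(movePaddle([540, 100, 100, 120])); // 0.7109375 (halfway from 0.5 toward 590 / 640)
console.log(movePaddle(null));                 // 0.7109375 (no hand, position held)
```

Higher alpha follows the hand more closely; lower alpha is smoother but lags behind fast movements.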
The paper abstract is here (a full paper will be added when complete).
If you use this code in your project/paper/research and would like to cite this work, use the citation below.
Victor Dibia, HandTrack: A Library For Prototyping Real-time Hand Tracking Interfaces using Convolutional Neural Networks, https://github.com/victordibia/handtracking
@article{Dibia2017,
  author = {Dibia, Victor},
  title = {HandTrack: A Library For Prototyping Real-time Hand Tracking Interfaces using Convolutional Neural Networks},
year = {2017},
publisher = {GitHub},
journal = {GitHub repository},
url = {https://github.com/victordibia/handtracking/tree/master/docs/handtrack.pdf},
}
[ ] Optimization: This is still compute heavy (your fans may spin up after a while). This is mainly because of the neural network operations needed to predict bounding boxes. I am currently exploring CenterNets (an anchor-free object detection model) as one way to minimize compute requirements.
[ ] Tracking IDs across frames: perhaps some nifty method that assigns an ID to each hand as it enters a frame and tracks it (e.g. based on naive Euclidean distance).
[x] Add some discrete poses (e.g. instead of just hand, detect open hand, closed hand, etc.).
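The ID-tracking item above can be prototyped on top of the returned boxes. A minimal sketch of the naive Euclidean-distance idea, assuming [x, y, width, height] boxes; CentroidTracker and maxDistance are hypothetical names and this is not part of handtrack.js. Each frame, every new box is greedily matched to the nearest previous box centre within maxDistance pixels, and unmatched boxes receive fresh IDs.

```javascript
// Assigns persistent IDs to hand boxes across frames by greedy nearest-centre matching.
class CentroidTracker {
  constructor(maxDistance = 80) {
    this.maxDistance = maxDistance; // pixels; farther matches start a new ID
    this.nextId = 0;
    this.tracks = new Map(); // id -> [cx, cy] from the previous frame
  }

  // bboxes: array of [x, y, width, height]; returns one ID per box, in input order.
  update(bboxes) {
    const centres = bboxes.map(([x, y, w, h]) => [x + w / 2, y + h / 2]);

    // Collect (track, box) pairs within range, closest first.
    const pairs = [];
    for (const [id, [tx, ty]] of this.tracks) {
      centres.forEach(([cx, cy], i) => {
        const d = Math.hypot(cx - tx, cy - ty);
        if (d <= this.maxDistance) pairs.push({ id, i, d });
      });
    }
    pairs.sort((a, b) => a.d - b.d);

    // Greedily accept the closest pairs, using each track and each box at most once.
    const ids = new Array(centres.length).fill(null);
    const usedTracks = new Set();
    for (const { id, i } of pairs) {
      if (ids[i] === null && !usedTracks.has(id)) {
        ids[i] = id;
        usedTracks.add(id);
      }
    }

    // Unmatched boxes get fresh IDs; tracks not seen this frame are dropped.
    const next = new Map();
    ids.forEach((id, i) => {
      const assigned = id === null ? this.nextId++ : id;
      ids[i] = assigned;
      next.set(assigned, centres[i]);
    });
    this.tracks = next;
    return ids;
  }
}

const tracker = new CentroidTracker(80);
console.log(tracker.update([[0, 0, 100, 100], [400, 0, 100, 100]]));  // [ 0, 1 ]
console.log(tracker.update([[410, 10, 100, 100], [10, 5, 100, 100]])); // [ 1, 0 ]
```

Because each track is dropped as soon as it is missed for one frame, a brief detection gap gives the hand a new ID; keeping unmatched tracks alive for a few frames would make the IDs more robust.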