ml5-bodypose-example

A web app that detects the body pose of a vehicle occupant and warns about unsafe positions in real time. The app runs on a phone mounted in the car's interior.

Try it now: https://jjbel.github.io/ml5-bodypose-example/

Motivation

Testing with OptiTrack

(Screenshot: OptiTrack Motive UI)

We test the accuracy of head turn detection by comparing it with OptiTrack, a marker-based 3D tracking system.

  1. An occupant sits in the driving simulator. The iPhone is mounted on the dashboard with the occupant in view.
  2. The occupant turns their head through half a rotation from left to right, pausing at 0, 30, 60, and 90 degrees on each side.
  3. The app records the angle directly; OptiTrack tracks the head via markers mounted on a headband.

The data is collected and analyzed in MATLAB:

  1. The model data has much lower amplitude than the OptiTrack data. The UI has buttons to set the 0° and 90° references, and the rescaled output fits much better (a sketch of this rescaling follows the list):

| OptiTrack Angle | Model Angle |
| --- | --- |
| 0° | 5.955° |
| 30° | 28.22° |
| 60° | 61.80° |
| 90° | 87.98° |

  2. The model has a processing delay, so the model data lags the OptiTrack data by around 500ms.
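
The rescaling implied by the 0/90 buttons is a linear map from the raw model angle to a calibrated one. A minimal sketch, with hypothetical names for the stored reference readings (not taken from the app's source):

```js
// Hypothetical calibration helper. raw0 and raw90 are the raw model
// angles recorded when the UI's "set 0" and "set 90" buttons are
// pressed; any later raw reading is rescaled linearly between them.
function calibratedAngle(raw, raw0, raw90) {
  return ((raw - raw0) / (raw90 - raw0)) * 90;
}
```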

Body Tracking

The app uses the following JavaScript libraries:

  1. TensorFlow MoveNet (https://www.tensorflow.org/hub/tutorials/movenet): real-time pose detection
  2. ml5.js bodypose (https://docs.ml5js.org/#/reference/bodypose): a simple API for using MoveNet from JavaScript
  3. p5.js (https://p5js.org/): graphics functionality such as video capture and canvas rendering
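
Wiring these together takes only a few lines. A minimal p5.js sketch in the style of the ml5.js bodypose examples (simplified; not the app's full source):

```js
let video;
let bodyPose;
let poses = [];

function preload() {
  bodyPose = ml5.bodyPose(); // MoveNet is the default model
}

function setup() {
  createCanvas(640, 480);
  video = createCapture(VIDEO); // p5 wraps getUserMedia
  video.size(640, 480);
  video.hide();
  // Run detection continuously on the camera feed
  bodyPose.detectStart(video, (results) => (poses = results));
}

function draw() {
  image(video, 0, 0, width, height);
  // Mark each confidently-detected keypoint
  noStroke();
  fill(0, 255, 0);
  for (const pose of poses) {
    for (const k of pose.keypoints) {
      if (k.confidence > 0.1) circle(k.x, k.y, 10);
    }
  }
}
```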

The app is hosted on this repo's GitHub Pages: https://jjbel.github.io/ml5-bodypose-example/

Keypoints

BlazePose

The BlazePose model provides 33 keypoints:
0 nose
1 left_eye_inner
2 left_eye
3 left_eye_outer
4 right_eye_inner
5 right_eye
6 right_eye_outer
7 left_ear
8 right_ear
9 mouth_left
10 mouth_right
11 left_shoulder
12 right_shoulder
13 left_elbow
14 right_elbow
15 left_wrist
16 right_wrist
17 left_pinky
18 right_pinky
19 left_index
20 right_index
21 left_thumb
22 right_thumb
23 left_hip
24 right_hip
25 left_knee
26 right_knee
27 left_ankle
28 right_ankle
29 left_heel
30 right_heel
31 left_foot_index
32 right_foot_index
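
With BlazePose selected, ml5.js also returns a keypoints3D array in metric scale, and keypoints can be looked up by the names above. As a purely illustrative sketch of how a head-turn angle could be derived from them (not necessarily the app's actual computation), the ear-to-ear vector can be projected onto the horizontal plane:

```js
// Hypothetical yaw estimate from BlazePose 3D keypoints.
// pose.keypoints3D is only available with the BlazePose model.
function headYawDegrees(pose) {
  const left = pose.keypoints3D.find((k) => k.name === "left_ear");
  const right = pose.keypoints3D.find((k) => k.name === "right_ear");
  // Project the ear-to-ear vector onto the x-z plane; a head facing
  // the camera gives an angle near 0 (sign depends on camera setup).
  const dx = left.x - right.x;
  const dz = left.z - right.z;
  return (Math.atan2(dz, dx) * 180) / Math.PI;
}
```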

Todo

  1. Integrate a biomechanical model
  2. Try to select the phone's wide-angle lens (one possible approach is sketched below)
  3. Make the app usable offline after downloading (e.g. via "Add to Home Screen" in the browser)
  4. Figure out why there are two long delays, one when the page loads and another before the camera feed starts
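
For item 2, one possible approach uses the standard MediaDevices API to enumerate cameras and match on the label (a heuristic sketch; label strings vary by device and browser, and this is not verified against the app):

```js
// Sketch: enumerate cameras and prefer one whose label suggests an
// ultra-wide lens. Labels like "Back Ultra Wide Camera" are
// device-specific, so this is a heuristic, not a guarantee.
async function pickUltraWideCamera() {
  // Labels are only populated after a permission grant.
  await navigator.mediaDevices.getUserMedia({ video: true });
  const devices = await navigator.mediaDevices.enumerateDevices();
  const cam = devices.find(
    (d) => d.kind === "videoinput" && /ultra.?wide/i.test(d.label)
  );
  return navigator.mediaDevices.getUserMedia({
    video: cam ? { deviceId: { exact: cam.deviceId } } : true,
  });
}
```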

Other Pose Detection Approaches Tried

1. ARKit

We initially tried iPhone Pro models in the hope of more accurate pose detection, since they have a time-of-flight LiDAR scanner that provides depth data alongside the RGB data from the camera. We used ARKit's body tracking via Unity's ARFoundation. However, ARKit requires full-body visibility to initialize tracking: a person tracked while standing would be lost once they sat in the driving simulator.

The lack of visibility is the central issue. ARKit pairs the depth data from the LiDAR sensor with RGB data from the iPhone 14 Pro's default 24mm lens. With the iPhone mounted on the car's dashboard, that field of view is too narrow for reliable detection. The iPhone also has a wider 13mm lens, but the ARKit API does not offer a choice of lens.

We tested ARFoundation by building the arfoundation-samples demo app for the iPhone. On a dashboard-mounted iPhone, 2D tracking seemed to detect noticeably better than 3D tracking, which failed outright.

2. OpenPose

https://github.com/CMU-Perceptual-Computing-Lab/openpose

OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation

OpenPose uses only RGB data for detection. It can detect poses from an image, a video, or a live camera feed, and it supports detecting multiple people.

However, OpenPose is unsuitable here because it does not appear to be built for mobile.

Although OpenPose claims realtime detection, testing it on both a laptop and a powerful desktop failed to give realtime results.

We tested 3 videos:

  1. video.avi which comes as a sample with OpenPose
  2. driving-sim 480p: a 30Hz 1920x1080 video of a person in the driving simulator, downscaled to 480p for better performance. The video was taken with the ultrawide 13mm lens of the iPhone 14 Pro.
  3. driving-sim 240p: the same video downscaled to 240p

Testing was conducted on:

  1. Laptop: a Dell XPS with a Snapdragon X Elite (ARM) CPU, running the OpenPose CPU build. This gave unusable framerates of around 0.2Hz, possibly because OpenPose was running under x86 emulation on ARM.
  2. Lab PC: an Intel i9-14900KF CPU and an NVIDIA RTX 4090 GPU. Even this powerful PC ran the OpenPose GPU build at 17Hz on a 30Hz video, i.e. just over half of realtime. GPU utilization was not high, but certain CPU cores were 100% utilized, suggesting the GPU build is still bound by a single CPU thread.

3. ml5.js bodypose

ml5.js bodypose is attractive because:

  1. It provides JavaScript access to the TensorFlow MoveNet and BlazePose models, meaning it can run in a website on any device with a browser
  2. The models provide 3D pose detection without the need for a depth camera (see the sketch below)
  3. The models run in realtime on modern phone hardware
  4. The API is relatively simple to use
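
For point 2, the model is chosen when the detector is created; per the ml5.js bodypose reference, BlazePose adds the 3D output:

```js
// BlazePose gives each pose a keypoints3D array with metric-scale
// x, y, z roughly centered between the hips; MoveNet (the default)
// returns 2D keypoints only.
const bodyPose = ml5.bodyPose("BlazePose");
bodyPose.detectStart(video, (poses) => {
  if (poses.length > 0) {
    const nose = poses[0].keypoints3D.find((k) => k.name === "nose");
    console.log(nose.x, nose.y, nose.z); // meters, relative to hip center
  }
});
```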

Further Reading

MoveNet:

  1. TensorFlow Blog Post: https://blog.tensorflow.org/2021/05/next-generation-pose-detection-with-movenet-and-tensorflowjs.html

  2. Daniel Shiffman (ml5.js contributor): Pose Estimation with ml5.js, 3D Pose Estimation with ml5.js