Primary Goals:
- Develop a machine-learning-based model to recognize and quantify the time offset between the audio and video streams of a video of a single speaker in full-frontal view, with an error of less than 40 ms (Completed)
- Characterize the model's performance across a range of audio/video delays, to determine the range of its best use case (Completed)
Stretch Goals:
- Create a backend program that runs in real time, capturing the on-screen video, determining the time offset, and applying the necessary correction, all automatically (Completed)
- Expand the analysis to similar video types acquired from various web sources (e.g., YouTube) (Completed)
- Implement delay compensation for a common video streaming service (Aborted)
Deliverables
- All datasets – full-frontal recordings of Loic, An, and Mark, and full-frontal online recordings
- Scripts that demultiplex audio/video files, preprocess the audio/video streams, and inject time delay into video streams
- Suitable neural network architecture for determining time offset
- Program that restores audio/video synchronization offline
- Complete demonstration of synchronizing audio and video on a laggy live stream
- Final presentation video
The dataset Google Drive link can be found under the Datasets section.
The demo video, the local-machine test script with its README, and the final presentation video can be found here: Demo Files
YouTube link to the video: EEM202A Final Project Video
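The delay-injection deliverable above can be illustrated with a minimal NumPy sketch that shifts a discrete sample stream by a given number of milliseconds. This is an assumption-laden illustration, not the project's actual script: the function name `inject_delay` and the zero-pad-and-truncate behavior are hypothetical.

```python
import numpy as np

def inject_delay(stream: np.ndarray, delay_ms: float, sample_rate: float) -> np.ndarray:
    """Delay a 1-D sample stream by delay_ms (illustrative sketch):
    zero-pad the front, then truncate so the length is unchanged."""
    n = int(round(delay_ms * sample_rate / 1000.0))  # delay in samples
    return np.concatenate([np.zeros(n), stream])[: len(stream)]
```

For example, at a 1 kHz sample rate a 40 ms delay shifts the stream by 40 samples.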
Success Metrics
- Errors in software (potentially a major obstacle, as many packages and dependencies need to be installed and used correctly)
- Neural network training speed
- Error between estimated and actual time offset
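The last metric above can be evaluated by injecting a known delay and comparing it to the estimate. The sketch below uses a cross-correlation estimator (as listed in the schedule); the function name, the white-noise test signal, and the assumption of uniformly sampled, equal-length streams are all illustrative, not the project's actual pipeline.

```python
import numpy as np

def estimate_offset_ms(ref: np.ndarray, delayed: np.ndarray, sample_rate: float) -> float:
    """Estimate how far `delayed` lags `ref`, in milliseconds,
    from the peak of their full cross-correlation."""
    corr = np.correlate(delayed, ref, mode="full")
    lag = int(np.argmax(corr)) - (len(ref) - 1)  # peak index -> lag in samples
    return 1000.0 * lag / sample_rate

# Evaluation: inject a known 40 ms delay at a 1 kHz rate and measure the error.
rng = np.random.default_rng(0)
ref = rng.standard_normal(500)
delayed = np.concatenate([np.zeros(40), ref])[:500]  # ground-truth offset: 40 ms
error_ms = abs(estimate_offset_ms(ref, delayed, 1000.0) - 40.0)
```

The same comparison against ground truth gives the "error between estimated and actual time offset" that the 40 ms target is measured against.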
Project Timeline
Legend
Status | Color |
---|---|
Scheduled | |
In progress | |
Completed | |
Aborted | |
Schedule
Major Objective with Subtasks | Week 5 | Week 6 | Week 7 | Week 8 | Week 9 | Week 10 |
---|---|---|---|---|---|---|
I. Datasets | ||||||
Find datasets online (full frontal, single person): Loic, Mark, An | ||||||
Collect own datasets (full frontal, single person): Loic, Mark, An | ||||||
Experiment with video streaming sites to test real-time performance (optional): Loic, Mark, An | ||||||
II. Capture audio and video from user end | ||||||
Select hardware/software tools: Mark | ||||||
Implementation (optional): Mark | ||||||
III. Handle the captured audio and video inputs | ||||||
Determine what tools to use for demuxing audio/video of .mp4 files: Mark | ||||||
Convert the demuxed streams into a format usable by the feature extraction program: Mark | ||||||
Expand to other video formats (optional): Mark, An, Loic | ||||||
IV. Data Processing and Time Offset Computation | ||||||
Define features to extract from video/audio: An, Loic, Mark | ||||||
Build software to perform feature extraction: An | ||||||
Cross-correlation implementation: An | ||||||
Complete processing pipeline: An, Mark, Loic | ||||||
Optimization on training data: An | ||||||
V. Create wrapper program | ||||||
Define how to handle time injection: Loic | ||||||
Integrate the processing, resulting delay injection, and video player: Loic | ||||||
VI. Translate into real-time system (optional) | ||||||
Make video/audio capture and processing run in the background: Mark | ||||||
Implement real-time delay injection: Loic | ||||||