Primary Goals:
- Develop a machine-learning-based model to recognize and quantify the time offset between the audio and video streams of a video of a single speaker in full-frontal view, with an error of less than 40 ms (Completed)
- Characterize the model's performance across a range of audio/video delays, to determine the range of its best use case (Completed)
Stretch Goals:
- Create a backend program that runs in real time, capturing the on-screen video, determining the time offset, and applying the necessary correction, all automatically (Completed)
- Expand the analysis to similar video types acquired from various web sources (e.g., YouTube) (Completed)
- Implement delay compensation for a common video streaming service (Aborted)
Deliverables
- All datasets – full-frontal recordings of Loic, An, and Mark, and full-frontal online recordings
- Scripts that demultiplex audio/video files, preprocess the audio/video streams, and inject time delay into video streams
- Suitable neural network architecture for determining time offset
- Program that restores audio/video synchronization offline
- Complete demonstration of synchronizing audio and video on a laggy live stream
- Final presentation video
The dataset Google Drive link can be found under the Datasets section.
The demo video, the local-machine test script with its README, and the final presentation video can be found here: Demo Files
YouTube link to the video: EEM202A Final Project Video
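The delay-injection deliverable above can be illustrated with a minimal NumPy sketch that shifts a discrete sample stream by a given number of milliseconds. This is an assumption-laden illustration, not the project's actual script: the function name `inject_delay` and the zero-pad-and-truncate behavior are hypothetical.

```python
import numpy as np

def inject_delay(stream: np.ndarray, delay_ms: float, sample_rate: float) -> np.ndarray:
    """Delay a 1-D sample stream by delay_ms (illustrative sketch):
    zero-pad the front, then truncate so the length is unchanged."""
    n = int(round(delay_ms * sample_rate / 1000.0))  # delay in samples
    return np.concatenate([np.zeros(n), stream])[: len(stream)]
```

For example, at a 1 kHz sample rate a 40 ms delay shifts the stream by 40 samples.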
Success Metrics
- Errors in software (potentially a major obstacle, as many packages and dependencies need to be installed and used correctly)
- Neural network training speed
- Error between estimated and actual time offset
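The last metric above can be evaluated by injecting a known delay and comparing it to the estimate. The sketch below uses a cross-correlation estimator (as listed in the schedule); the function name, the white-noise test signal, and the assumption of uniformly sampled, equal-length streams are all illustrative, not the project's actual pipeline.

```python
import numpy as np

def estimate_offset_ms(ref: np.ndarray, delayed: np.ndarray, sample_rate: float) -> float:
    """Estimate how far `delayed` lags `ref`, in milliseconds,
    from the peak of their full cross-correlation."""
    corr = np.correlate(delayed, ref, mode="full")
    lag = int(np.argmax(corr)) - (len(ref) - 1)  # peak index -> lag in samples
    return 1000.0 * lag / sample_rate

# Evaluation: inject a known 40 ms delay at a 1 kHz rate and measure the error.
rng = np.random.default_rng(0)
ref = rng.standard_normal(500)
delayed = np.concatenate([np.zeros(40), ref])[:500]  # ground-truth offset: 40 ms
error_ms = abs(estimate_offset_ms(ref, delayed, 1000.0) - 40.0)
```

The same comparison against ground truth gives the "error between estimated and actual time offset" that the 40 ms target is measured against.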
Project Timeline
Legend
Status | Color |
---|---|
Scheduled | |
In progress | |
Completed | |
Aborted | |
Schedule
Major Objective with Subtasks | Week 5 | Week 6 | Week 7 | Week 8 | Week 9 | Week 10 |
---|---|---|---|---|---|---|
I. Datasets | ||||||
Find datasets online (full frontal, single person): Loic, Mark, An | ||||||
Collect own datasets (full frontal, single person): Loic, Mark, An | ||||||
Experiment with video streaming sites to test real-time performance (optional): Loic, Mark, An | ||||||
II. Capture audio and video from user end | ||||||
Select hardware/software tools: Mark | ||||||
Implementation (optional): Mark | ||||||
III. Handle the captured audio and video inputs | ||||||
Determine what tools to use for demuxing audio/video of .mp4 files: Mark | ||||||
Convert the demuxed streams into a format usable by the feature extraction program: Mark | ||||||
Expand to other video formats (optional): Mark, An, Loic | ||||||
IV. Data Processing and Time Offset Computation | ||||||
Define features to extract from video/audio: An, Loic, Mark | ||||||
Build software to perform feature extraction: An | ||||||
Cross-correlation implementation: An | ||||||
Complete processing pipeline: An, Mark, Loic | ||||||
Optimization on training data: An | ||||||
V. Create wrapper program | ||||||
Define how to handle time injection: Loic | ||||||
Integrate the processing, resulting delay injection, and video player: Loic | ||||||
VI. Translate into real-time system (optional) | ||||||
Make video/audio capture and processing run in the background: Mark | ||||||
Implement real-time delay injection: Loic | ||||||