Bootstrapping Sign Language Annotations with Sign Language Models
Summary
AI-based sign language interpretation is limited by a lack of high-quality annotated data. This study develops a pipeline that bootstraps sign language annotation using sign language models.
Key Points
- AI-based sign language interpretation suffers from a lack of high-quality annotated data.
- New datasets such as ASL STEM Wiki and FLEURS-ASL involve expert interpreters and contain hundreds of hours of data, but they are only partially annotated and therefore underutilized.
- This study developed a pseudo-annotation pipeline that takes signed video and the corresponding English text as input and generates annotations with time intervals.
- The pipeline combines K-shot LLM prompting with predictions from a fingerspelling recognizer and an isolated sign recognizer (ISR).
- Professional interpreters annotated nearly 500 videos from ASL STEM Wiki to provide a gold-standard benchmark, which will be released publicly along with over 300 hours of pseudo-annotated data.
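To make the K-shot step more concrete, here is a minimal Python sketch of how fingerspelling and ISR predictions might be packed into a K-shot prompt and how the LLM's reply might be parsed back into timed annotations. It is not the study's implementation: the `Candidate` structure, the helpers `format_candidates`, `build_kshot_prompt`, and `parse_annotations`, the prompt wording, and the one-line-per-annotation `LABEL START END` output format are all hypothetical, and the LLM call itself is omitted.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One recognizer hypothesis for a time interval (hypothetical structure)."""
    source: str   # e.g. "fingerspelling" or "isr"
    label: str    # predicted word or gloss
    start: float  # interval start, in seconds
    end: float    # interval end, in seconds
    score: float  # recognizer confidence


def format_candidates(cands):
    # Sort by start time so the LLM sees the predictions in signing order.
    return "\n".join(
        f"[{c.start:.2f}-{c.end:.2f}] {c.source}: {c.label} ({c.score:.2f})"
        for c in sorted(cands, key=lambda c: c.start)
    )


def build_kshot_prompt(examples, english, cands):
    """examples: K tuples of (english, candidates, gold_annotation_text)."""
    parts = [
        "Align the English sentence to the signed video using the recognizer outputs.",
        "Answer with one 'LABEL START END' line per annotation.\n",
    ]
    # Each solved example shows the LLM the expected input/output mapping.
    for ex_english, ex_cands, ex_gold in examples:
        parts.append(
            f"English: {ex_english}\nRecognizer:\n{format_candidates(ex_cands)}\n"
            f"Annotations:\n{ex_gold}\n"
        )
    # The query ends with an open "Annotations:" slot for the LLM to fill.
    parts.append(f"English: {english}\nRecognizer:\n{format_candidates(cands)}\nAnnotations:")
    return "\n".join(parts)


def parse_annotations(text):
    """Parse 'LABEL START END' lines into (label, start, end), skipping malformed lines."""
    out = []
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        label, start, end = fields
        try:
            s, e = float(start), float(end)
        except ValueError:
            continue
        if s < e:  # keep only well-ordered intervals
            out.append((label, s, e))
    return out
```

In use, the string from `build_kshot_prompt` would be sent to an LLM and the reply passed to `parse_annotations`. Validating intervals on the way back is a deliberate choice: LLM output is free text, so malformed or inverted spans are dropped rather than trusted.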
Notable Quotes & Details
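- 6.7% character error rate (CER) for the fingerspelling recognizer on FSBoard
- 74% top-1 accuracy for the ISR on the ASL Citizen dataset
- Nearly 500 ASL STEM Wiki videos annotated by professional interpreters
- Over 300 hours of pseudo-annotated data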
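The two headline metrics can be made precise with a short Python sketch. CER is computed here as total character-level Levenshtein edits divided by total reference characters, and top-1 accuracy as the fraction of clips whose highest-scoring class matches the label. These are the standard textbook definitions; any text normalization used for FSBoard or ASL Citizen (casing, spaces, punctuation) is not specified in the summary, so this is an assumption and not necessarily the study's evaluation code.

```python
def edit_distance(a: str, b: str) -> int:
    # Levenshtein distance via dynamic programming with one rolling row.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution (0 if equal)
        prev = cur
    return prev[-1]


def cer(refs, hyps):
    # Corpus-level CER: total edits divided by total reference characters.
    edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    chars = sum(len(r) for r in refs)
    return edits / chars


def top1_accuracy(scores, labels):
    # scores: one list of class scores per clip; the prediction is the argmax index.
    correct = sum(
        max(range(len(s)), key=s.__getitem__) == y for s, y in zip(scores, labels)
    )
    return correct / len(labels)
```

For example, `cer(["hello"], ["hallo"])` is 0.2 (one substitution over five reference characters). Summing edits and characters over the whole corpus, rather than averaging per-utterance CER, keeps short utterances from dominating the score.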
Intended Audience
AI researchers, sign language researchers, HCI researchers