Captions are often treated as speech transcripts, but video meaning is carried by much more than dialogue. Music can foreshadow danger. Laughter can change tone. A door slam, a long silence, or a shift in background sound can all be part of the story.
Non-speech information captioning asks how captions can represent these sounds in ways that are accurate, useful, and enjoyable for D/deaf and hard-of-hearing audiences. The challenge is not only detecting sound automatically; it is also deciding which sounds matter, how to phrase them, and how to avoid overwhelming the viewer.
This is where accessibility research becomes design research. Good captions are not merely generated; they are authored with audience needs, context, and agency in mind.
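To make those decisions concrete, here is a minimal sketch of how authored choices might be encoded. It is an illustration only, not an established method: the `SoundEvent` type, the `salience` score, and the `min_salience` and `min_gap` settings are all hypothetical names invented for this example. The labels and salience values are assumed to be written by a human caption author, and the two thresholds stand in for settings a viewer could adjust.

```python
from dataclasses import dataclass


@dataclass
class SoundEvent:
    start: float     # seconds from the start of the video
    label: str       # description written by the caption author
    salience: float  # 0.0-1.0, narrative importance (hypothetical, author-assigned)


def select_captions(events, min_salience=0.5, min_gap=2.0):
    """Keep events that matter, spaced at least min_gap seconds apart."""
    kept = []
    for event in sorted(events, key=lambda e: e.start):
        if event.salience < min_salience:
            continue  # below the viewer's chosen importance threshold
        if kept and event.start - kept[-1].start < min_gap:
            # Too close to the previous caption: keep whichever matters more.
            if event.salience > kept[-1].salience:
                kept[-1] = event
            continue
        kept.append(event)
    return kept


def render(event):
    """Phrase a sound event using the common square-bracket convention."""
    return f"[{event.label}]"


# Example data, invented for illustration.
events = [
    SoundEvent(1.0, "ominous music builds", 0.9),
    SoundEvent(1.5, "clock ticking", 0.3),
    SoundEvent(2.2, "floorboard creaks", 0.6),
    SoundEvent(6.0, "door slams", 0.8),
    SoundEvent(9.0, "long silence", 0.7),
]

captions = [render(e) for e in select_captions(events)]
print(captions)
# ['[ominous music builds]', '[door slams]', '[long silence]']
```

The filter drops the ticking clock as unimportant and the creaking floorboard as too close to a more salient caption. Exposing `min_salience` and `min_gap` as viewer settings is one possible way to give audiences agency over how much non-speech detail they see.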