Subjecthood desk method note: We report the discourse. We do not assert AI systems are or are not conscious. We label position families.

arXiv AI recent: How Do Instructions Shape Speech? Cross-Attention Attribution for Style-Captioned Text-to-Speech

2026-06-20 arxiv.org

Researchers proposed cross-attention attribution for speech diffusion models to understand how individual words influence acoustic output in style-captioned text-to-speech systems. The me...

The study paired each of 120 style captions with 30 text transcripts, yielding 3,600 caption-transcript combinations for conditioned speech generation. The method extracts per-token heatmaps across 25 layers and 24 ODE steps, providing detailed information about the influence of individual words on acoustic out...
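
As a rough check on the figures above, the following Python sketch enumerates the caption-transcript pairings and counts how many heatmaps one token would have if one map is kept per layer and per ODE step. That one-map-per-layer-and-step layout, the helper name mean_over_layers_and_steps, the averaging it performs, and the toy values are all assumptions made for illustration; the summary does not say how the paper stores or aggregates its heatmaps.

```python
from itertools import product

NUM_CAPTIONS = 120     # style captions (figure from the summary)
NUM_TRANSCRIPTS = 30   # transcripts paired with each caption (from the summary)
NUM_LAYERS = 25        # layers with extracted heatmaps (from the summary)
NUM_ODE_STEPS = 24     # ODE steps (from the summary)

# Enumerate every (caption index, transcript index) pairing.
pairs = list(product(range(NUM_CAPTIONS), range(NUM_TRANSCRIPTS)))
num_pairs = len(pairs)

# Assumption: one heatmap per (layer, ODE step) for each token.
maps_per_token = NUM_LAYERS * NUM_ODE_STEPS


def mean_over_layers_and_steps(heatmaps):
    """Average heatmaps indexed [layer][step][frame] into one per-frame curve.

    Plain averaging is a hypothetical aggregation chosen for this sketch,
    not the aggregation described in the paper.
    """
    num_frames = len(heatmaps[0][0])
    totals = [0.0] * num_frames
    count = 0
    for layer in heatmaps:
        for step in layer:
            for frame_index, weight in enumerate(step):
                totals[frame_index] += weight
            count += 1
    return [total / count for total in totals]


# Toy input: constant, made-up attention weights over 4 audio frames.
toy = [[[0.5] * 4 for _ in range(NUM_ODE_STEPS)] for _ in range(NUM_LAYERS)]
curve = mean_over_layers_and_steps(toy)

print(num_pairs)       # 120 * 30 pairings
print(maps_per_token)  # 25 * 24 maps per token under the stated assumption
print(curve)
```

Under these assumptions the script reports 3,600 pairings, matching the summary, and 600 maps per token per generated utterance.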

Sources

arXiv AI recent challenge