arXiv AI recent: FlowEdit: Associative Memory for Lifelong Pronunciation Adaptation in Flow-Matching TTS
The authors present FlowEdit, a lifelong adaptation framework for frozen flow‑matching text‑to‑speech (TTS) models that learns pronunciation corrections as latent conditioning edits inste...
The paper reports that, on a benchmark of 312 multilingual proper nouns across 18 language families, FlowEdit reduces the target‑word phoneme error rate (PER) by 92.7% relative to the zero‑shot baseline while preserving overall speech quality. The correction process takes about 15 seconds on a single GPU.
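To make the headline metric concrete, the following is a minimal sketch of how a phoneme error rate and a relative reduction are commonly computed. It assumes the usual definition (Levenshtein distance between reference and hypothesis phoneme sequences, divided by the reference length); the summary above does not state the paper's exact scoring procedure, and the function names and toy phoneme sequences are hypothetical, not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref, hyp):
    """Assumed PER definition: edit distance / reference length."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline, adapted):
    """Fraction by which `adapted` improves on `baseline`."""
    if baseline == 0:
        raise ValueError("baseline error rate must be non-zero")
    return (baseline - adapted) / baseline


# Toy illustration only; these sequences are invented, not benchmark data.
reference = ["k", "ae", "t"]
zero_shot = ["k", "ah", "d"]   # two substitutions
corrected = ["k", "ae", "t"]   # exact match

baseline_per = phoneme_error_rate(reference, zero_shot)
adapted_per = phoneme_error_rate(reference, corrected)
print(baseline_per, adapted_per, relative_reduction(baseline_per, adapted_per))
```

Under this reading, a 92.7% relative reduction means the adapted PER is 7.3% of the baseline PER; it says nothing about the absolute error rates, which this summary does not give.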