arXiv AI recent: Forced Deferral: Manipulating Routing Decisions in Multimodal LLM Cascades
The authors introduce the Forced Deferral Attack (FDA), an adversarial image attack that reduces the confidence of a weak multimodal language model in a cascade, causing the system to rou...
The paper studies multimodal large language model (MLLM) cascades, which query a cheap weak model first and defer to a strong model only when the weak model’s confidence is low. It identifies a vulnerability in this design: an adversary who can lower the weak model’s confidence can force deferral.
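To make the routing rule concrete, here is a minimal sketch of a confidence-thresholded cascade. It is an illustration under stated assumptions, not the paper's implementation: the confidence score (maximum class probability), the threshold value of 0.8, and the names `max_prob_confidence` and `cascade_answer` are all hypothetical, and the summary does not say how the authors measure confidence.

```python
from typing import Callable, Sequence, Tuple


def max_prob_confidence(probs: Sequence[float]) -> float:
    """Assumed confidence score: the weak model's highest class probability."""
    return max(probs)


def cascade_answer(
    weak_probs: Sequence[float],
    strong_model: Callable[[], int],
    threshold: float = 0.8,  # hypothetical deferral threshold
) -> Tuple[str, int]:
    """Return (which model answered, predicted class index)."""
    confidence = max_prob_confidence(weak_probs)
    if confidence >= threshold:
        # Weak model is confident enough: accept its answer, skip the strong model.
        prediction = max(range(len(weak_probs)), key=lambda i: weak_probs[i])
        return ("weak", prediction)
    # Low confidence: defer to the expensive strong model.
    return ("strong", strong_model())


# Stand-in for the strong model; a real cascade would call a larger MLLM here.
def strong_model() -> int:
    return 0


clean = cascade_answer([0.90, 0.05, 0.05], strong_model)
perturbed = cascade_answer([0.40, 0.35, 0.25], strong_model)
print(clean, perturbed)
```

In this toy setup the first input is answered by the weak model, while the second, whose probabilities are flatter, falls below the threshold and is deferred. An input perturbation that flattens the weak model's output distribution in this way would trigger the strong model on every query, which is the routing behavior the summary describes as forced deferral.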