DMCL Achieves Robust DAI-TIR Performance by Eliminating Hallucinated Visual Cues

Researchers are tackling a significant obstacle in Diffusion-Interactive Text-to-Image Retrieval (DAI-TIR): diffusion models tend to introduce misleading visual ‘hallucinations’ that degrade performance. Zhuocheng Zhang from Hunan University, Kangheng Liang and Paul Henderson from the University of Glasgow, along with Guanxuan Li, Richard McCreadie and Zijun Long, demonstrate empirically that these inaccuracies can substantially reduce retrieval effectiveness. Their new framework, Diffusion-aware Multi-view Contrastive Learning (DMCL), offers a robust training approach that optimises representations of both query intent and target images, filtering out these hallucinated cues and improving the alignment of textual and…
