Overcoming Language Priors via Shuffling Language Bias for Robust Visual Question Answering
Recent research has revealed the notorious language prior problem in visual question answering (VQA) tasks based on visual-textual interaction, which indicates that well-developed VQA models rely on learning shortcuts from questions without fully considering visual evidence. To tackle this problem, most existing methods focus on