Title: CSMC-VQA: Dynamic Code-Switch Multimodal Curriculum Learning for Vietnamese Visual Question Answering

Authors: Khoi Tran Tam, Nhi Nguyen Thi Uyen, Thanh Le Manh, Thuy Nguyen Thi Thanh and Dinh Nguyen Thi

Conference: ACIIDS2026

Tags: Code-switching, Curriculum Learning, Data Augmentation, Difficulty Scoring, Multimodal Learning and Visual Question Answering

Abstract: Visual Question Answering (VQA) on Vietnamese data remains challenging because of the language's complex and heterogeneous structure and because Vietnamese and English are often code-switched in real-life use. To address these issues, this paper proposes CSMC-VQA, a model that uses Dynamic Code-Switch Multimodal Curriculum Learning to improve the accuracy and adaptability of VQA in bilingual and multimodal settings. The proposed method has three main components: (1) semantically guided bilingual data augmentation, in which a cosine similarity check ensures that mixed English-Vietnamese question-answer pairs retain their meaning; (2) difficulty scoring, which determines the complexity of each sample from the semantic difference between its linguistic and visual features; (3) Dynamic Curriculum Learning, which lets the model learn from easy to difficult samples and adjust the training sequence based on its actual performance. Experimental results on two datasets, ViVQA and OpenViVQA, show that CSMC-VQA with the CLIP + PhoBERT combination achieves the highest accuracy, 68.5% and 65.2% respectively. These results demonstrate a significant improvement in accuracy for the VQA problem and open up a new approach to multimodal VQA on English-Vietnamese bilingual data.
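The three components named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything here is an assumption made for illustration: the function names, the `threshold`, `target` and `step` values, the use of `1 - cosine similarity` as the difficulty score, and the competence-based pacing rule are hypothetical and are not the authors' implementation. The vectors stand in for text and image embeddings (for example from PhoBERT and CLIP) that are assumed to share one space.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def keep_augmented(original_vec, mixed_vec, threshold=0.8):
    """Component (1): keep a code-switched sample only if it stays close in
    meaning to the original. The threshold value is an assumption."""
    return cosine_similarity(original_vec, mixed_vec) >= threshold


def difficulty(text_vec, image_vec):
    """Component (2): assumed difficulty score, larger when the question and
    image embeddings are less similar."""
    return 1.0 - cosine_similarity(text_vec, image_vec)


def curriculum_subset(samples, competence):
    """Component (3): expose only the easiest fraction of the data, where
    `competence` in (0, 1] is the fraction currently unlocked."""
    ranked = sorted(samples, key=lambda s: s["difficulty"])
    k = max(1, math.ceil(competence * len(ranked)))
    return ranked[:k]


def update_competence(competence, val_accuracy, target=0.6, step=0.1):
    """Assumed dynamic rule: unlock harder samples only once validation
    accuracy reaches the target; otherwise stay at the current level."""
    if val_accuracy >= target:
        return min(1.0, competence + step)
    return competence
```

In a training loop under these assumptions, each epoch would call `curriculum_subset` with the current competence, train on the returned samples, and then call `update_competence` with the measured validation accuracy, so that the ordering adapts to the model's performance rather than following a fixed schedule.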
