Mimic In-Context Learning for Multimodal Tasks was accepted to CVPR 2025.