Improving high-resolution VQA by learning to crop without bounding box annotations.
nlp machine-learning natural-language-processing computer-vision deep-learning visual-question-answering visual-question-generation visrl uv-cot mini-o3 chain-of-focus deepeyes
Updated Dec 6, 2025 - Python