Advancing MLLM Alignment Through MM-RLHF: A Large-Scale Human Preference Dataset for Multimodal Tasks
Multimodal Large Language Models (MLLMs) have gained significant attention for their ability to handle complex tasks that integrate vision, language, and audio. However, they lack comprehensive alignment beyond basic supervised fine-tuning (SFT). Current state-of-the-art […]
