ppraneth commented Dec 5, 2025

This PR resolves a TODO in slime/utils/data.py by adding support for datasets where message["content"] is a list of dictionaries (e.g., [{"type": "image", ...}, {"type": "text", ...}]), a format common in multimodal instruction-tuning datasets.

Changes

  • Updated _build_messages in slime/utils/data.py to handle list inputs.
  • Added validation to ensure list items are dictionaries with a valid type.
  • Maintained backward compatibility for the legacy string format (text with <image> placeholders).
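
To illustrate the two input shapes described above, here is a minimal sketch of the kind of normalization involved. It is not the code from this PR: the helper name `normalize_content`, the `VALID_TYPES` set, and the exact output dictionaries are assumptions made for illustration, and the real `_build_messages` may differ in signature and behavior.

```python
from typing import Any

# Assumption: the set of accepted item types is illustrative only.
VALID_TYPES = {"text", "image"}


def normalize_content(content: Any) -> list[dict]:
    """Hypothetical helper: return message content as a list of typed dicts."""
    if isinstance(content, str):
        # Legacy format: plain text with <image> placeholders.
        parts = content.split("<image>")
        items: list[dict] = []
        for i, part in enumerate(parts):
            if part:
                items.append({"type": "text", "text": part})
            # Every gap between two parts corresponds to one placeholder.
            if i < len(parts) - 1:
                items.append({"type": "image"})
        return items

    if isinstance(content, list):
        # List format: validate each item before passing it through.
        for item in content:
            if not isinstance(item, dict):
                raise TypeError(f"content item must be a dict, got {type(item).__name__}")
            if item.get("type") not in VALID_TYPES:
                raise ValueError(f"unsupported content type: {item.get('type')!r}")
        return [dict(item) for item in content]

    raise TypeError(f"content must be str or list, got {type(content).__name__}")
```

Under these assumptions, both `"Describe <image> briefly"` and an explicit `[{"type": "image", ...}, {"type": "text", ...}]` list end up in the same list-of-dicts shape, so downstream code only has to handle one representation.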

ppraneth commented Dec 5, 2025

cc @zhuzilin

@ppraneth

@yitianlian Can you check this PR?
