@geekynews: Unifying multimodal understanding and generation has shown impressive capabilities in cutting-edge proprietary systems. In this work, we introduce BAGEL, an open-source foundational model that natively supports multimodal understanding and generation. BAGEL is a unified, decoder-only model pretrained on trillions of tokens curated from large-scale interleaved text, image, video, and web data. When scaled with such diverse multimodal interleaved data, BAGEL exhibits emerging capabilities in complex multimodal reasoning. As a result, it significantly outperforms open-source unified models in both multimodal generation and understanding across standard benchmarks, while exhibiting advanced multimodal reasoning abilities such as free-form image manipulation, future frame prediction, 3D manipulation, and world navigation. In the hope of facilitating further opportunities for multimodal research, we share the key findings, pretraining details, and data creation protocol, and release our code and checkpoints to the community. #bytedance #opensource #multimodal

Geeky News
Region: GB
Monday 26 May 2025 08:08:53 GMT
1062
13
4
3

Comments

trailervault7
TrailerVault :
😎nice
2025-05-28 09:26:47
0
ichigo.kirito5
Ichigo Kirito :
I've tried it, it's so slow and the images aren't that good
2025-05-26 10:00:33
1
onscreeninshort
Films & TV - The Short Version :
🥰
2025-05-29 04:44:22
0