Bento, Snap's ML Platform

I got to listen to a (fantastic, by the way) talk outlining this platform a couple of days ago at a meetup. What struck me about the way Snap described their models is the sheer size of their recommendation systems. Deepseek V3 was trained on about 60 TB of tokens, but Snap’s recommendation system eats about 20 times that, which works out to something on the order of a petabyte.

Caveat, of course: a recommendation model is very sparse, meaning only a small fraction of the model needs to be activated for any given request, whereas most LLMs are still dense. A bigger model doesn’t necessarily mean a harder scaling problem, just a different one.
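
To make the sparsity point concrete, here’s a minimal numpy sketch. The sizes are made up and the embedding-table framing is the generic reason recommendation models are sparse, not a claim about how Bento itself is built: most of the parameters sit in huge embedding tables, and a single request only reads the rows for the feature IDs it contains, while a dense layer uses every weight on every forward pass.

```python
import numpy as np

# Illustrative sizes only -- not Snap's real numbers.
VOCAB_ROWS = 1_000_000   # rows in one embedding table (e.g. hashed feature IDs)
EMBED_DIM = 64           # width of each embedding row
IDS_PER_REQUEST = 100    # feature IDs present in a single request

rng = np.random.default_rng(0)

# In a typical recommendation model, most parameters live in big embedding tables.
embedding_table = rng.standard_normal((VOCAB_ROWS, EMBED_DIM), dtype=np.float32)

# A request only gathers the rows for the IDs it actually contains,
# so only a tiny slice of the parameters is "activated".
request_ids = rng.integers(0, VOCAB_ROWS, size=IDS_PER_REQUEST)
active_rows = embedding_table[request_ids]           # shape (100, 64)

touched = IDS_PER_REQUEST * EMBED_DIM
total = VOCAB_ROWS * EMBED_DIM
print(f"embedding params touched: {touched:,} of {total:,} ({touched / total:.4%})")

# A dense layer, by contrast, uses every one of its weights on every forward pass.
dense_weight = rng.standard_normal((EMBED_DIM, EMBED_DIM), dtype=np.float32)
dense_out = active_rows @ dense_weight               # all 64 * 64 weights participate
```

With these toy numbers, a request touches about 0.01% of the embedding parameters, which is why a recommendation model can be enormous on disk without every byte being in play at once.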