GPTrillion - the 1.5 Trillion parameter model

We’ve created GPTrillion, the latest milestone in Banana’s effort to scale up machine learning. GPTrillion is a large multimodal model that, while less capable than humans in many real-world scenarios, has 1.5 trillion parameters and is the largest open-source model in the world.
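To get a sense of that scale, here is a quick back-of-the-envelope sketch of the memory needed just to store 1.5 trillion parameters at common precisions (illustrative arithmetic only, not a description of GPTrillion's actual serving setup):

```python
# Rough memory footprint for storing 1.5 trillion model weights.
# Real-world inference also needs activations, KV cache, and overhead,
# so treat these as lower bounds.

PARAMS = 1.5e12  # parameter count from the GPTrillion announcement


def weight_bytes(params: float, bytes_per_param: int) -> float:
    """Raw bytes needed to hold the weights at a given precision."""
    return params * bytes_per_param


for name, width in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1)]:
    tb = weight_bytes(PARAMS, width) / 1e12
    print(f"{name}: {tb:.1f} TB just for the weights")
```

At half precision alone that is 3 TB of weights, which is why a model of this size has to be sharded across many GPUs rather than loaded on a single device.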

You can read the full GPTrillion paper here.

Over the past 6 months, we trained GPTrillion on Banana’s serverless GPU cluster, using spare capacity during periods of low activity. Throughout the training process, we found and fixed bugs and improved our theoretical foundations. As a result, our GPTrillion training run was (for us at least!) unprecedentedly stable, making GPTrillion the largest high-performance GPT model by parameter count to be fully open-sourced.

As we continue to focus on reliable serverless GPU scaling, we aim to hone our methodology to help us predict and prepare for future capabilities increasingly far in advance—something we view as critical for our customers.

We are releasing GPTrillion via HuggingFace. We’re open-sourcing everything so that anyone can report shortcomings in the model and help guide further improvements.

Please check out GPTrillion on HuggingFace, read the paper, and download the model to try it for yourself.