Banana

Public Roadmap

Hi there! Banana is Serverless GPUs. But why?

Because we think ML projects should get the same beautiful, effortless, and affordable tooling that the rest of the software world gets. We believe in Software 2.0: the idea that the software of the future will be increasingly powered by ML models rather than hand-coded logic.

At Banana, our mission is to power the next million ML applications by giving them software margins. As cliché as it sounds, we’re democratizing ML.

We’re open-sourcing our roadmap in the spirit of that democracy.

Phase 1: Make it Function

Build Inference API with async worker queue (call pattern sketched below)
Open-source client SDKs
Build programmatic infra for self-healing
Build 0-to-1 autoscaling (cold start)
Build 1-to-many autoscaling
Open-source HTTP server framework
GitHub App integration to build & deploy
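
As a sketch of what the first two items could feel like from the client side: submit a task to the worker queue, then poll for the result. Everything here is hypothetical (the base URL, the /start and /check endpoints, and the payload fields); it illustrates the submit-then-poll pattern, not our actual SDK surface.

    # Hypothetical client for an async inference API: enqueue a task, then poll.
    import time
    import requests

    API_BASE = "https://api.example.dev"  # placeholder base URL

    def run_inference(api_key, model_key, model_inputs):
        # 1) Enqueue the job on the worker queue; the server returns a task id immediately.
        start = requests.post(
            f"{API_BASE}/start",
            json={"apiKey": api_key, "modelKey": model_key, "modelInputs": model_inputs},
            timeout=30,
        )
        start.raise_for_status()
        task_id = start.json()["taskId"]

        # 2) Poll until a worker (possibly cold-started from zero) finishes the task.
        while True:
            check = requests.post(f"{API_BASE}/check", json={"taskId": task_id}, timeout=30)
            check.raise_for_status()
            body = check.json()
            if body.get("finished"):
                return body["modelOutputs"]
            time.sleep(1)  # brief backoff between polls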

Phase 2: Make it Fast

Eliminate GPU provisioning overhead
Eliminate container pulling overhead
Reduce GPU RAM load time
Reduce inference pipeline latency
Add minimum replicas configuration
Add keep-alive configuration (config sketched after this phase's list)

Reduce GPU RAM load time even more
Reduce Kubernetes provisioning overhead
Migrate to advanced GPUs for faster inference
Build model weight pruning for faster inference
Eliminate as many network hops as possible
Multi-region deployments
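
The minimum-replicas and keep-alive items above trade cost against latency. A rough sketch of what such a scaling config could look like (field names are illustrative, not our real schema):

    # Illustrative scaling configuration; field names are hypothetical.
    scaling_config = {
        "min_replicas": 1,          # keep one warm replica so the first call skips the cold start
        "max_replicas": 10,         # ceiling for 1-to-many autoscaling under load
        "keep_alive_seconds": 300,  # how long an idle replica stays warm before scaling down
    }

Setting min_replicas above zero removes cold-start latency entirely but bills for idle GPU time; keep-alive is the cheaper middle ground for bursty traffic.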

Phase 3: Make it Beautiful

Account admin tools (multi-platform auth, teams, billing)
Provide logging and call tracing
Add metrics & dashboard view
Build a Deployment UI

Phase 4: Make it (more) Affordable

Support Multi-tenant GPUs
Fine-tune autoscaling logic
Support model weight sharing
Enable bulk commitments
Support concurrent GPU memory access

Phase 5: Make it (more) Awesome

We’re proud of our agility as a scrappy team, and as we move toward becoming the cloud for ML, it’s important to nail the first value prop before we expand. Further out, we could imagine expanding our feature set to include:

API resell marketplace
Model / data drift dashboard
Model versioning
A/B testing models (a routing sketch follows this list)
Live retraining
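
As one example of where this could go, A/B testing models usually comes down to weighted routing between two deployed versions plus logging which variant served each call. A hypothetical sketch (the model keys and the call_model helper are placeholders, not a real API):

    # Hypothetical A/B test: route traffic between two model versions by weight.
    import random

    VARIANTS = [
        ("model-v1", 0.9),  # 90% of traffic stays on the current model
        ("model-v2", 0.1),  # 10% goes to the candidate model
    ]

    def pick_variant():
        keys, weights = zip(*VARIANTS)
        return random.choices(keys, weights=weights, k=1)[0]

    def call_model(model_key, inputs):
        # Placeholder for an inference call against the chosen deployment.
        raise NotImplementedError

    def ab_inference(inputs):
        variant = pick_variant()
        output = call_model(variant, inputs)
        # Record (variant, inputs, output, latency) so the two versions can be compared offline.
        return output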
