What is Carrot?

Carrot is a cutting-edge vision-language model that performs general-purpose image captioning, visual question answering, and image classification.

Carrot is a significant leap forward in the capabilities of computer vision. With Carrot, you can ask about the contents of an image using plain English prompts, much as you would prompt GPT-3 with text.

Generated Caption:
“A group of people with their horses on the side of a hill next to a mountain covered in mounds of dirt with the top of a volcano visible into the sky above.”

Question: What objects are all in the image?
Answer: horses, people
Question: How many horses are there in the image?
Answer: 10
Question: Is someone at the risk of falling?
Answer: no


  • 1 hour of FREE credits 💸
  • Run on A100 GPUs
  • ML Models up to 16GB
  • Network Payload up to 50MB
  • Autoscaling
  • Spike Tolerance (up to 25 replicas)
10-40% off usage rate
  • Everything in Usage Pricing
  • Minimum purchase of $1,000
  • The more you buy, the more you save
  • Dedicated SLA Response Time
  • Increased Spike Tolerance (25+ replicas)
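
To make the discount range concrete, here is a rough arithmetic sketch in Python. It assumes the 10-40% discount is applied as a multiplier on the standard usage rate, which the plan description does not spell out; the function name `usage_value` is hypothetical and only for illustration.

```python
def usage_value(prepaid_usd: float, discount: float) -> float:
    """Return the standard-rate usage that a prepaid amount covers.

    Assumption: a discount of d lowers the usage rate to (1 - d) times
    the standard rate, so each prepaid dollar covers 1 / (1 - d) dollars
    of usage priced at the standard rate.
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be in the range [0, 1)")
    return prepaid_usd / (1 - discount)


# The minimum purchase of $1,000 at both ends of the stated discount range.
for d in (0.10, 0.40):
    print(f"{d:.0%} off: ${usage_value(1000, d):,.2f} of usage at the standard rate")
```

Under that assumption, the $1,000 minimum covers roughly $1,111 of standard-rate usage at 10% off and roughly $1,667 at 40% off.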
