Mosaic AI Foundation Model Serving

Two ways to purchase

Access and query state-of-the-art open foundation models, and use them to build generative AI applications quickly without maintaining your own model deployment.

Foundation Model Serving DBU rates and Throughput

| Model | Pay-Per-Token Serving: DBU / 1M input tokens (Global) | Pay-Per-Token Serving: DBU / 1M output tokens (Global) | Provisioned Throughput Serving: DBU rate (Global) | Provisioned Throughput Serving: Throughput band [1] (max tokens / sec) [2] |
|---|---|---|---|---|
| Llama 3 70B | 14.286 | 42.857 | 212.143 | 670 |
| DBRX | 10.714 | 32.143 | 171.429 | 600 |
| Llama 2 70B | 7.143 | 21.429 | 157.143 | 635 |
| Mixtral 8x7B | 7.143 | 14.286 | 290.857 | 1,700 |
| Llama 3 8B | n/a | n/a | 106.000 | 3,600 |
| MPT 30B | 14.286 | 14.286 | 112.000 | 580 |
| Llama 2 13B | 13.571 | 13.571 | 78.571 | 1,580 |
| MPT 7B | 7.143 | 7.143 | 20.000 | 2,450 |
| BGE Large | 1.429 | 1.429 | n/a | n/a |

1: The throughput band is a model-specific maximum throughput (tokens per second) provided at the per-hour rate above. Provisioned Throughput Serving provides model throughput in increments of the model's throughput band. For higher throughput, the customer sets an appropriate multiple of the throughput band and is charged the same multiple of the per-hour rate.

2: Shown for serving on Azure. Some numbers differ on AWS, where serving is charged at a different price.
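As a rough illustration of footnote 1, the sketch below estimates how many throughput bands a target throughput would need and the resulting hourly DBU consumption. The band sizes and DBU rates are copied from the table above. The function names, the rounding up to a whole number of bands, and the one-band minimum are assumptions made for this example, not documented billing behavior.

```python
import math

# (throughput band in max tokens/sec, DBU rate per hour per band), from the table above
BANDS = {
    "Llama 3 70B": (670, 212.143),
    "Mixtral 8x7B": (1700, 290.857),
}


def bands_needed(model: str, target_tokens_per_sec: float) -> int:
    """Smallest whole number of bands covering the target throughput.

    Assumption: bands are purchased in whole multiples, with at least one band.
    """
    band_size, _ = BANDS[model]
    return max(1, math.ceil(target_tokens_per_sec / band_size))


def hourly_dbus(model: str, target_tokens_per_sec: float) -> float:
    """Hourly DBU consumption: number of bands times the per-band DBU rate."""
    _, dbu_rate = BANDS[model]
    return bands_needed(model, target_tokens_per_sec) * dbu_rate


# Example: 1,500 tokens/sec on Llama 3 70B needs 3 bands of 670 tokens/sec each.
print(bands_needed("Llama 3 70B", 1500), round(hourly_dbus("Llama 3 70B", 1500), 3))
```

Multiplying the hourly DBU figure by the regional $ / DBU unit price gives an estimated hourly charge.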

Pay-Per-Token Serving Pricing Examples

| Model | Input tokens | Output tokens | Region | Unit price ($ / DBU) | Total price |
|---|---|---|---|---|---|
| Llama 3 70B | 4,000,000 | 1,000,000 | US East | $0.070 | $7.00 |
| DBRX | 4,000,000 | 1,000,000 | US East | $0.070 | $5.25 |
| Llama 2 70B | 4,000,000 | 1,000,000 | Europe (Ireland) | $0.077 | $3.30 |
| Mixtral 8x7B | 4,000,000 | 1,000,000 | AP (Sydney) | $0.088 | $4.40 |
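The arithmetic behind a pay-per-token total can be sketched as follows: convert input and output tokens to DBUs using the per-1M-token rates, then multiply by the regional $ / DBU unit price. The rates are taken from the DBU rate table above. The function name and dictionary layout are hypothetical, and the formula is an assumption inferred from the US East rows rather than an official billing definition.

```python
# (DBU per 1M input tokens, DBU per 1M output tokens), from the DBU rate table
PAY_PER_TOKEN_RATES = {
    "Llama 3 70B": (14.286, 42.857),
    "DBRX": (10.714, 32.143),
}


def pay_per_token_cost(model: str, input_tokens: int, output_tokens: int,
                       usd_per_dbu: float) -> float:
    """Estimated cost in USD for a pay-per-token workload."""
    in_rate, out_rate = PAY_PER_TOKEN_RATES[model]
    dbus = (input_tokens / 1_000_000) * in_rate + (output_tokens / 1_000_000) * out_rate
    return dbus * usd_per_dbu


# Reproduces the two US East rows: 4M input tokens, 1M output tokens at $0.070 / DBU.
print(f"{pay_per_token_cost('Llama 3 70B', 4_000_000, 1_000_000, 0.070):.2f}")  # 7.00
print(f"{pay_per_token_cost('DBRX', 4_000_000, 1_000_000, 0.070):.2f}")         # 5.25
```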

Provisioned Throughput Serving Pricing Examples

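| Model | Hours / month | Region | Unit price ($ / DBU) | Monthly price [3] |
|---|---|---|---|---|
| Llama 3 70B | 720 | US East | $0.070 | $10,692 |
| DBRX | 720 | US East | $0.070 | $8,640 |
| Llama 2 70B | 720 | Europe (Ireland) | $0.077 | $8,712 |
| Mixtral 8x7B | 720 | AP (Sydney) | $0.088 | $18,429 |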

3: Per throughput band
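The monthly figures above appear to follow from the hourly DBU rate per throughput band, the hours per month, and the regional unit price. A minimal sketch under that assumption is shown below. The function name and the `bands` parameter are hypothetical, and the DBU rates are copied from the DBU rate table.

```python
def monthly_provisioned_cost(dbu_rate_per_hour: float, usd_per_dbu: float,
                             hours: float = 720, bands: int = 1) -> float:
    """Estimated monthly cost in USD for provisioned throughput.

    Assumption: cost scales linearly with hours and with the number of bands.
    """
    return dbu_rate_per_hour * hours * usd_per_dbu * bands


# One band of Llama 3 70B in US East (212.143 DBU/hour at $0.070 / DBU)
print(round(monthly_provisioned_cost(212.143, 0.070)))  # 10692
# One band of Mixtral 8x7B in AP (Sydney) (290.857 DBU/hour at $0.088 / DBU)
print(round(monthly_provisioned_cost(290.857, 0.088)))  # 18429
```

Passing `bands=2` doubles the estimate, in line with the footnote stating that prices are per throughput band.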

Pay as you go with a 14-day free trial or contact us for committed-use discounts or custom requirements.

Mosaic AI Model Serving FAQ

Our regional prices are based on the regional cost of infrastructure supporting our serverless products.