GLM 5.1 Inference.
Unlimited Tokens.

Dedicated GPU infrastructure, a full dashboard, and API keys — all for a flat $1,000/mo. We spin up clusters in cohorts of 80–100 customers so the economics work for everyone. Join the waitlist to lock in your seat.

Join the Waitlist

We're gauging interest for our first cohort. Drop your email and we'll notify you as soon as we hit the minimum to launch.

What You Get

Unlimited Tokens

Send as many requests as you need every month. No per-token billing, no surprise invoices — just a flat rate under our fair-use policy.

Dashboard & API Keys

A clean web dashboard to monitor usage, rotate keys, and manage your account. Standard REST API — drop it into any stack in minutes.

Dedicated GPU Cluster

Each cohort runs on its own provisioned GPU infrastructure. No noisy neighbours, no throttling — consistent, low-latency inference.

How Cohorts Work

We group customers into cohorts of 80–100 to share the cost of high-end GPU infrastructure fairly. Here's the process:

1. Join the Waitlist

Drop your email. You'll be placed in the current forming cohort and kept updated on progress toward the minimum.

2. Cohort Fills Up

Once 80 confirmed customers commit, we lock in the cohort. You'll receive an invoice for the first month's payment.

3. Infrastructure Goes Live

First-month payment is collected (non-refundable, unless we fail to deliver). We provision dedicated GPUs and spin up your cluster.

4. You're In

You receive your dashboard login, API keys, and documentation. Start sending requests immediately.

⚡ Can't Wait for the Next Cohort?

If there's already an active cohort with fewer than 100 customers, you can skip the line by paying a premium on top of the regular monthly fee. Once the next cohort launches, you'll be moved over and your price reverts to the standard $1,000/mo — no penalty, completely fair.

Pricing

Premium Tier

$1,000/mo

unlimited AI agent usage under fair use

24/7 API access
Guaranteed uptime SLA: 99.5%+
Support response target: under 4 hours
Soft cap guidance: 20-50M tokens per month
Soft cap guidance: 1-2 concurrent long-running agents
Usage over soft caps is not blocked, but may be throttled and de-prioritized

Enterprise Tier

Custom pricing

for sustained high-volume workloads

Minimum engagement: $2,500/month
Built for customers consistently exceeding Premium soft caps
Higher token quotas: 100M+ tokens per month
Priority queue access
Dedicated support
Custom SLA options

Premium is designed for most teams and includes fair-use unlimited access. If your usage consistently pushes past Premium soft caps, we will guide you into Enterprise so you get higher quotas, faster queue priority, and a stronger support/SLA profile.

FAQ

What is GLM 5.1?

GLM 5.1 is a powerful large language model suitable for a wide range of tasks including code generation, content creation, data analysis, and conversational AI.

What does "unlimited tokens" actually mean?

There is no hard cap on the number of tokens you can send or receive each month. Usage is governed by a fair-use policy (details to be published before launch) that prevents abuse while ensuring legitimate workloads run without interruption.

How does the cohort model work?

We collect signups until we reach 80–100 committed customers. At that point we collect the first month's payment, provision dedicated GPU infrastructure, and hand out dashboard credentials and API keys. Meanwhile, signups for the next cohort continue on a separate waitlist.

What if the cohort doesn't fill?

No payment is collected until the minimum is reached. If it takes longer than expected, we'll keep you updated — but you won't be charged a cent until the cohort is confirmed and infrastructure is being provisioned.

Can I skip the waitlist?

Yes. If there's an active cohort with fewer than 100 customers, you can pay a premium to join immediately. When the next cohort goes live you'll be moved there and your rate drops back to the standard $1,000/mo.

Is the first month really non-refundable?

Only if we deliver. If we fail to provision the infrastructure or the service doesn't go live, you receive a full refund. The non-refundable clause protects us from covering large upfront hardware costs without committed revenue.

Can I cancel after the first month?

Yes. After the first month, the service is month-to-month. Cancel anytime before the next billing cycle and you won't be charged again.

Is my data private?

Your inference requests are processed on dedicated infrastructure and are not used to train or fine-tune any models. We do not sell or share your data with third parties.

What kind of hardware backs the service?

Each cohort runs on enterprise-grade NVIDIA GPUs provisioned specifically for that group. Exact specs will be shared before launch, but expect hardware optimised for high-throughput, low-latency LLM inference.

Also Available

Private OpenClaw Setups

Need your own isolated AI infrastructure instead of shared inference? We also offer one-off OpenClaw deployments — private, fully managed instances on dedicated hardware. We handle setup, configuration, and ongoing maintenance so you can focus on using it.

$2,000 one-time setup
$200/mo managed hosting & support (or $2,000/yr)
Your data stays on your hardware — full sovereignty
Dashboard, messaging-app integrations, and pre-loaded skills included

Ready to Get In?

Seats are limited to 100 per cohort. Drop your email and be the first to know when we launch.

GLM 5.1 Inference.Unlimited Tokens.