The Best Places to Rent GPUs for AI Video Generation in 2026

2026-05-27 · 7 min read · LoreMotion Team

We compared RunPod, Lambda, Vast.ai, CoreWeave, Paperspace, fal.ai, and Replicate on real AI video workloads. Honest pricing, uptime, and per-clip cost for LTX-Video, Wan, and HunyuanVideo.

Owning hardware for AI video generation only makes sense above a certain throughput. If you render fewer than a few hundred clips per month, renting GPUs by the hour is dramatically cheaper than buying an RTX 4090 — and renting an H100 by the hour is the only sensible way to evaluate a model like HunyuanVideo without dropping $30,000 on a workstation.

We've run real workloads on every major GPU rental platform over the past 12 months. This post compares them honestly on the dimensions that matter: actual cost per clip, how long it takes to get a usable GPU, how often spot instances get pre-empted, and which platforms make life easy versus painful.

Disclosure: nothing in this post is sponsored. We pay for these GPUs out of LoreMotion's ad revenue.

TL;DR — what to rent

Cheapest spot pricing on H100s / A100s: Vast.ai. Brutally cheap, but uptime and reliability are inconsistent.
Best balance of price and reliability: RunPod Community Cloud. The default for most workloads under $1k/mo.
Best for production / enterprise: Lambda or CoreWeave. Pricey, but rock-solid SLAs.
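Best for "just run my model" without setup: fal.ai or Replicate. You pay a steep premium over raw GPU cost (roughly 2–4x per GPU-hour on Replicate versus RunPod Community, and 50–80x per clip on fal.ai versus self-hosting) but skip Docker, drivers, and pod management entirely.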
Best for long-running training jobs: Lambda 1-Click Clusters or CoreWeave reserved instances.

Detail below.

How we benchmarked

For each platform we ran the same three workloads:

LTX-Video 2.3 via Wan2GP profile=4 on a single 24 GB GPU (RTX 3090 or 4090)
HunyuanVideo full-precision on a single 80 GB GPU (A100 or H100)
Continuous serving — claim-loop worker processing a queue for 4 hours straight

Each was repeated three times across different days and time zones. We measured: time-to-GPU (from clicking "deploy" to running inference), wall-clock per clip, spot pre-emption rate, dollar cost per clip, and how painful the developer experience was on a 1–5 scale.

RunPod (Community + Secure Cloud)

Pricing (May 2026):

RTX 3090 24 GB: $0.22/hr Community, $0.34/hr Secure
RTX 4090 24 GB: $0.34/hr Community, $0.69/hr Secure
A100 80 GB: $1.19/hr Community, $1.89/hr Secure
H100 PCIe 80 GB: $2.39/hr Community, $3.29/hr Secure
H100 SXM 80 GB: $2.79/hr Community, $4.49/hr Secure

Time-to-GPU: 30–90 seconds for Community pods, instant for Secure Cloud reservations.

Pre-emption rate: Community pods can be pre-empted with a 5-minute warning when the host operator reclaims their hardware. In our testing this happened to roughly 1 in 40 long-running pods on 3090s, and less often on H100s, which operators reclaim less readily.

DX: Good. Pre-built PyTorch templates, easy SSH, persistent volumes work cleanly. The Pod CLI (runpodctl) is genuinely useful for scripting.

Cost per LTX clip on a 3090 Community: $0.0044 ($0.22/hr ÷ 50 clips/hr at 72s each). Cost per HunyuanVideo clip on an H100 PCIe Community: $0.034.
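Every per-clip figure in this post comes from the same arithmetic: hourly rate divided by clips per hour, where clips per hour is 3,600 divided by wall-clock seconds per clip. A minimal sketch, assuming the GPU renders clips back to back with no idle time, model loading, or failed runs:

```python
SECONDS_PER_HOUR = 3600


def clips_per_hour(seconds_per_clip: float) -> float:
    """How many clips fit in one billed hour, assuming zero idle time."""
    return SECONDS_PER_HOUR / seconds_per_clip


def cost_per_clip(hourly_rate_usd: float, seconds_per_clip: float) -> float:
    """Dollar cost of one clip on a GPU billed by the hour."""
    return hourly_rate_usd / clips_per_hour(seconds_per_clip)


# RunPod Community RTX 3090 at $0.22/hr, 72 s per LTX-Video clip
print(f"{clips_per_hour(72):.0f} clips/hr")        # 50 clips/hr
print(f"${cost_per_clip(0.22, 72):.4f} per clip")  # $0.0044 per clip
```

`cost_per_clip(0.15, 72)` reproduces the $0.003 Vast.ai figure below. Real pods also bill for startup and idle gaps, so treat the output as a floor rather than what you will actually pay.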

RunPod is the default we recommend. The Community Cloud price-performance is hard to beat, and Secure Cloud exists when you can't tolerate pre-emption. Documentation is the best of any platform in this comparison.

Vast.ai

Pricing (May 2026):

RTX 3090: $0.13–0.20/hr (depending on host)
RTX 4090: $0.28–0.45/hr
A100 80 GB: $0.79–1.40/hr
H100 SXM: $1.99–2.80/hr

Time-to-GPU: 1–5 minutes typically, though "verified" hosts deploy faster.

Pre-emption rate: Highly variable. Some operators advertise "interruptible" pods at 50% off and pre-empt aggressively (we saw 1 in 15 sessions pre-empted within 4 hours). Non-interruptible pods are more stable but cost about twice as much.

DX: The marketplace UI is functional but feels like eBay for GPUs. You're choosing among individual hosts with wildly varying network speeds, disk types, and CUDA versions. Pre-built templates exist but are less polished than RunPod's.

Cost per LTX clip on a $0.15/hr 3090: $0.003 — the cheapest we've measured anywhere.

Vast.ai is brilliant when you have a stateless, fault-tolerant workload (batch rendering with automatic retries) and you optimise hard for cost. It's painful for interactive work, model development, or anything where a 4-minute deploy delay matters.
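Automatic retries mostly come down to making each clip an idempotent job that can be re-run from scratch when its host disappears. A minimal sketch, assuming a hypothetical `HostLostError` that your own render code raises when the instance is interrupted (it is not part of any Vast.ai API), with exponential backoff and jitter between attempts:

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class HostLostError(Exception):
    """Hypothetical: raised by your render code when the host is interrupted."""


def with_retries(job: Callable[[], T], max_attempts: int = 5,
                 base_delay_s: float = 2.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run an idempotent job, retrying with exponential backoff plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except HostLostError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the batch runner
            # Back off 2 s, 4 s, 8 s, ... plus up to 1 s of random jitter
            sleep(base_delay_s * 2 ** (attempt - 1) + random.random())
    raise AssertionError("unreachable")


# Usage: a render that loses its host twice, then succeeds on the third try.
attempts = {"n": 0}


def flaky_render() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise HostLostError("instance interrupted")
    return "clip-0001.mp4"


print(with_retries(flaky_render, sleep=lambda s: None))  # clip-0001.mp4
```

In a real batch system, `job` would re-provision an instance or requeue the clip rather than call the same function again on a host that is gone.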

Lambda

Pricing (May 2026):

RTX 4090 24 GB: $0.79/hr
A100 40 GB PCIe: $1.10/hr
A100 80 GB SXM: $1.79/hr
H100 PCIe 80 GB: $2.49/hr
H100 SXM 80 GB: $3.29/hr
8× H100 cluster: $25.92/hr

Time-to-GPU: Instant for on-demand instances if capacity is available, but Lambda is popular and H100s sometimes show "out of stock".

Pre-emption rate: Effectively zero. On-demand instances run until you stop them.

DX: Excellent. Real Linux instances over SSH, pre-installed PyTorch and CUDA, real public IPs, real persistent block storage. It feels like a proper IaaS provider rather than a container on someone's PC.

Cost per HunyuanVideo clip on an H100 SXM: $0.048.

Lambda is the platform you graduate to when RunPod Community pre-emptions start costing you more than the price difference. It's also the platform we recommend for anyone doing fine-tuning — the reliability matters when a single training run is 12+ hours.
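"Costing you more than the price difference" can be made concrete. If pre-emptions force you to re-run some of your paid hours, the effective rate is the hourly rate times (1 + overhead), where overhead is wasted hours divided by useful hours. A sketch using the H100 SXM prices above (RunPod Community $2.79/hr, Lambda $3.29/hr), assuming Lambda's overhead is zero:

```python
def effective_rate(hourly_rate: float, overhead: float) -> float:
    """Cost per useful GPU-hour when `overhead` wasted hours are paid per useful hour."""
    return hourly_rate * (1 + overhead)


def break_even_overhead(cheap_rate: float, reliable_rate: float) -> float:
    """Overhead at which the pre-emptible GPU costs the same as the reliable one."""
    return reliable_rate / cheap_rate - 1


RUNPOD_COMMUNITY_H100_SXM = 2.79  # $/hr, from the RunPod pricing list
LAMBDA_H100_SXM = 3.29            # $/hr, assumed zero pre-emption overhead

threshold = break_even_overhead(RUNPOD_COMMUNITY_H100_SXM, LAMBDA_H100_SXM)
print(f"Community stays cheaper while overhead is under {threshold:.1%}")  # 17.9%

# A 12-hour fine-tune without checkpoints, pre-empted once at hour 6,
# wastes 6 hours: 50% overhead, far past the break-even point.
print(f"${effective_rate(RUNPOD_COMMUNITY_H100_SXM, 6 / 12):.2f} per useful hour")
```

The same function shows why checkpointing matters on pre-emptible hardware: frequent checkpoints keep the wasted hours per pre-emption, and therefore the overhead, small.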

CoreWeave

Pricing (May 2026):

A100 80 GB: $2.10/hr on-demand
H100 SXM 80 GB: $4.25/hr on-demand
8× H100 reserved (1-year): $19.20/hr equivalent

Time-to-GPU: Minutes for on-demand, longer for large multi-GPU configurations.

Pre-emption rate: Zero on on-demand.

DX: Enterprise-grade. Kubernetes-first, deeply Terraform-friendly, real BGP networking. The flip side: if you just want to SSH into a box and run a script, CoreWeave is overkill and the learning curve is real.

CoreWeave is where you go when you're running production inference at scale (>1000 GPU-hours per month), need predictable capacity, or need features like fast NVLink-connected multi-GPU pods for tensor-parallel inference. It's not the right platform for "I want to try HunyuanVideo this weekend".
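The CoreWeave prices above imply a utilisation threshold for the 1-year reservation. A sketch under three assumptions: the $19.20/hr figure covers the whole 8-GPU node, it is paid for every hour of the term whether or not the node is busy, and the alternative is eight on-demand H100 SXM GPUs at $4.25/hr each, paid only while in use. Check the actual contract terms before relying on it.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours in a 1-year term


def break_even_utilisation(reserved_hourly: float, on_demand_hourly: float) -> float:
    """Fraction of the term the node must be busy for the reservation to win."""
    return reserved_hourly / on_demand_hourly


reserved_8x_h100 = 19.20       # $/hr equivalent for the reserved 8x H100 node
on_demand_8x_h100 = 8 * 4.25   # eight on-demand H100 SXM GPUs: $34.00/hr

u = break_even_utilisation(reserved_8x_h100, on_demand_8x_h100)
print(f"Reserve only if the node is busy more than {u:.0%} of the time")  # 56%
print(f"about {u * HOURS_PER_YEAR:,.0f} of {HOURS_PER_YEAR:,} hours a year")  # about 4,947 of 8,760 hours a year
```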

Paperspace (DigitalOcean)

Pricing (May 2026):

RTX 4000 16 GB: $0.51/hr
RTX 5000 16 GB: $0.78/hr
A100 80 GB: $3.09/hr
H100 80 GB: $5.95/hr

Time-to-GPU: Instant for Gradient notebooks, 1–2 minutes for Core machines.

Pre-emption rate: Zero.

DX: Notebook-first product (Gradient Notebooks) plus a more traditional VM product (Core). Good if you want JupyterLab pre-configured.

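Paperspace is noticeably pricier than Lambda or RunPod for equivalent hardware: an A100 80 GB is $3.09/hr versus $1.79 on Lambda, and an H100 is $5.95/hr versus $2.49–3.29 on Lambda. It's worth it only if Gradient Notebooks specifically fit your workflow.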

fal.ai

Pricing model: Per second of generated video, with the model already loaded. Roughly:

LTX-Video 2.3 inference: $0.05 per second of generated video
HunyuanVideo inference: $0.45 per second of generated video
Wan 2.1: $0.04 per second of generated video

Time-to-first-inference: 200–800ms for a "warm" model, 30–90s for cold start.

DX: Spectacular for the right use case. You hit a REST endpoint with a prompt, you get back a video URL. No Docker, no pod management, no driver hell. Models are pre-deployed and version-pinned.
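For a sense of what "hit a REST endpoint" looks like, here is a minimal standard-library sketch. The endpoint pattern `https://fal.run/<model-id>`, the `Authorization: Key <your key>` header, the `FAL_KEY` environment variable, and the `{"video": {"url": ...}}` response shape are all assumptions to verify against fal.ai's documentation, and `MODEL_ID` is a placeholder; fal.ai's official client libraries are the sturdier choice in production.

```python
import json
import os
import urllib.request

# Assumptions, not confirmed API details: endpoint pattern, auth header format,
# and response shape. MODEL_ID is a hypothetical placeholder.
MODEL_ID = "fal-ai/ltx-video"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build the POST request that asks the hosted model for one clip."""
    return urllib.request.Request(
        url=f"https://fal.run/{MODEL_ID}",
        data=json.dumps({"prompt": prompt}).encode("utf-8"),
        headers={
            "Authorization": f"Key {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_video_url(response_body: bytes) -> str:
    """Pull the video URL out of the JSON response (assumed shape)."""
    return json.loads(response_body)["video"]["url"]


def generate(prompt: str) -> str:
    """Send the prompt and return the URL of the generated video."""
    req = build_request(prompt, os.environ["FAL_KEY"])
    with urllib.request.urlopen(req, timeout=300) as resp:
        return extract_video_url(resp.read())
```

With `FAL_KEY` set, `generate("a lighthouse at dusk, slow dolly-in")` would return a URL you can download or hand straight to your front end.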

Cost per 5-second LTX clip: $0.25 (5s × $0.05). That's 50–80x the per-clip cost of self-hosting on RunPod or Vast.ai, but integration takes 5 minutes instead of a weekend.

fal.ai is the platform we'd recommend to anyone building a side project or MVP that needs AI video. Pay the premium until your unit economics justify owning the inference stack.

Replicate

Pricing model: Also billed per second, but per second of GPU time and priced by GPU type:

A100 40 GB: $0.00115/sec ($4.14/hr equivalent)
A100 80 GB: $0.00140/sec ($5.04/hr equivalent)
H100 80 GB: $0.00135/sec ($4.86/hr equivalent — yes, cheaper than A100 80 GB during their pricing rebalance)

Time-to-first-inference: 5–30s cold start typical, sub-second when warm.

DX: Best-in-class for "I just want to call this model from my app". Their Cog format makes model packaging trivial, and the API client libraries are clean.

Replicate is fal.ai's main competitor with broadly similar economics. We use both, with Replicate slightly favoured for less common models (their model marketplace is larger) and fal.ai favoured for the specific video models they've optimised.
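The two billing models are easiest to compare per clip: fal.ai's listed prices scale with the length of video returned, Replicate's with how long the GPU runs. Dividing one by the other gives the render time below which Replicate is cheaper. A sketch using the LTX and H100 prices above; it ignores cold starts, and whether cold-start time is billed depends on the platform and model, so check before relying on it.

```python
SECONDS_PER_HOUR = 3600


def hourly_equivalent(usd_per_gpu_second: float) -> float:
    """Convert a per-GPU-second price to $/hr, as in the Replicate list above."""
    return usd_per_gpu_second * SECONDS_PER_HOUR


def per_output_cost(video_seconds: float, usd_per_video_second: float) -> float:
    """fal.ai-style pricing: pay per second of generated video."""
    return video_seconds * usd_per_video_second


def break_even_gpu_seconds(clip_price: float, usd_per_gpu_second: float) -> float:
    """GPU-seconds per clip at which per-GPU-second billing matches a fixed clip price."""
    return clip_price / usd_per_gpu_second


fal_ltx_clip = per_output_cost(5, 0.05)  # $0.25 for a 5-second LTX-Video clip
replicate_h100 = 0.00135                 # $ per GPU-second

print(f"Replicate H100 = ${hourly_equivalent(replicate_h100):.2f}/hr")  # $4.86/hr
limit = break_even_gpu_seconds(fal_ltx_clip, replicate_h100)
print(f"Replicate is cheaper if the clip renders in under {limit:.0f} s")  # 185 s
```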

What we use at LoreMotion

For full transparency:

LTX-Video 2.3 production traffic: Self-owned RTX 3090s in a colocation rack. Cheapest per clip once you exceed ~150 clips/day.
Premium models (Veo, Grok 3, Kling): Vendor APIs directly (geminigen.ai). No GPU rental involved.
R&D and model evaluation: RunPod Community H100 PCIe (~$2.40/hr) for hours of experimentation, then Lambda H100 SXM when we need to do a sustained benchmark run.
One-off "let me try this new open model": fal.ai or Replicate when a model exists there; RunPod Community otherwise.

Decision matrix

Hobby project, <50 clips/month, want zero ops: fal.ai or Replicate.
Small business, <500 clips/month, willing to write some code: RunPod Community.
Production service, >500 clips/month, can't tolerate pre-emption: Lambda on-demand.
Production at scale, >5,000 clips/month: Mix of Lambda reserved + CoreWeave for burst.
Fine-tuning or model R&D: Lambda or CoreWeave.
Optimising hard for cost, fault-tolerant batch work: Vast.ai with retry logic.
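The matrix reads naturally as a first-match rule list. A sketch; the field names, the order in which rows are checked, and the fall-through to RunPod Community for combinations the matrix doesn't cover are assumptions made for this sketch. Clip counts are per month.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    clips_per_month: int
    wants_zero_ops: bool = False        # no Docker, drivers, or pods at all
    tolerates_preemption: bool = True   # can the job survive a lost host?
    is_training_or_rnd: bool = False    # fine-tuning or model R&D
    cost_is_top_priority: bool = False  # optimising hard for price


def recommend(w: Workload) -> str:
    """Return the first matching row of the decision matrix, most specific first."""
    if w.is_training_or_rnd:
        return "Lambda or CoreWeave"
    if w.cost_is_top_priority and w.tolerates_preemption:
        return "Vast.ai with retry logic"
    if w.clips_per_month > 5_000:
        return "Lambda reserved + CoreWeave for burst"
    if w.clips_per_month > 500 and not w.tolerates_preemption:
        return "Lambda on-demand"
    if w.clips_per_month < 50 and w.wants_zero_ops:
        return "fal.ai or Replicate"
    # Anything the matrix doesn't cover falls back to the post's default.
    return "RunPod Community"


print(recommend(Workload(clips_per_month=20, wants_zero_ops=True)))  # fal.ai or Replicate
```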

What to watch for

A few practical gotchas that cost us real money:

Bandwidth charges. RunPod, Vast.ai, and Paperspace don't charge egress. Lambda and CoreWeave do. If your workflow downloads model weights repeatedly (CI runs, scratch environments), egress can become 20%+ of your bill.
Persistent volume cost. RunPod and Lambda both charge for persistent volume storage when the pod is stopped. A 200 GB volume sitting unused for a month is ~$10. Cheap, but it adds up across many idle volumes.
CUDA version drift. Some templates ship CUDA 12.4 when your model needs 12.6 (or vice versa). Always check before assuming "PyTorch is installed" means "PyTorch can find the GPU".
Spot pre-emption mid-render. Build worker logic that can resume from a checkpoint or just retry. We use a heartbeat + claim model in LoreMotion's worker, so a job on a pre-empted pod is automatically picked up by another worker within 90 seconds.
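A quick check for CUDA drift is to ask PyTorch itself, inside the pod, which CUDA version it was built against and whether it can see a device. A minimal sketch using standard `torch` attributes (`torch.version.cuda`, `torch.cuda.is_available()`, `torch.cuda.get_device_name()`); it degrades gracefully when PyTorch isn't installed:

```python
import importlib.util


def gpu_report() -> dict:
    """Report whether PyTorch is installed, which CUDA it targets, and what GPU it sees."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device"] = torch.cuda.get_device_name(0)
    return report


print(gpu_report())
```

If `cuda_available` is False on a GPU pod, the usual suspects are a CPU-only wheel (`built_for_cuda` is None) or a driver too old for the CUDA version the wheel was built for; `nvidia-smi` shows the driver side.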
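A heartbeat + claim model works by giving every claimed job a lease: the worker keeps renewing it, and any job whose lease lapses goes back up for grabs. A minimal in-memory sketch, assuming a 90-second lease to match the recovery time above; this is not LoreMotion's worker code, and a real deployment would keep this state in a shared database rather than a Python dict.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

LEASE_SECONDS = 90.0  # assumed lease length, matching the ~90 s recovery above


@dataclass
class Job:
    job_id: str
    owner: Optional[str] = None
    lease_expires: float = 0.0
    done: bool = False


@dataclass
class JobQueue:
    jobs: Dict[str, Job] = field(default_factory=dict)
    clock: Callable[[], float] = time.monotonic

    def claim(self, worker_id: str) -> Optional[Job]:
        """Claim the first unfinished job that is unowned or whose lease has lapsed."""
        now = self.clock()
        for job in self.jobs.values():
            if not job.done and (job.owner is None or job.lease_expires < now):
                job.owner = worker_id
                job.lease_expires = now + LEASE_SECONDS
                return job
        return None

    def heartbeat(self, job_id: str, worker_id: str) -> bool:
        """Extend the lease; False means the job now belongs to another worker."""
        job = self.jobs[job_id]
        if job.owner != worker_id or job.done:
            return False
        job.lease_expires = self.clock() + LEASE_SECONDS
        return True

    def complete(self, job_id: str, worker_id: str) -> bool:
        """Mark the job done only if this worker still owns it."""
        job = self.jobs[job_id]
        if job.owner != worker_id:
            return False
        job.done = True
        return True
```

A worker would call `claim`, start rendering, call `heartbeat` well inside the lease (say every 30 s) from a background thread, and call `complete` at the end. A `False` from either means another worker took over after this one went silent, so its result should be discarded.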

If you want to try AI video without renting a GPU at all, LoreMotion runs LTX-Video 2.3 on owned hardware and the first clip is free with no signup. For everything else, RunPod Community is where most readers should start.