Running LTX-Video 2.3 on Your Own GPU vs Paying for the API — Real Numbers

2026-05-28 · 7 min read · LoreMotion Team

A no-fluff cost and quality breakdown of self-hosting LTX-Video 2.3 on an RTX 3090/4090 versus using Lightricks' hosted API or third-party providers like Replicate and Fal. With break-even math.

We get this question every week, from indie developers and small studios alike: "Should I just rent a GPU and run LTX-Video 2.3 myself, or pay for the hosted API and skip the headache?" The honest answer is "it depends" — but it depends on a small number of very measurable variables, and most people guess wrong in both directions.

At LoreMotion we do both. We run a small fleet of RTX 3090s for our free tier (the LTX-Video 2.3 generations) and we use a third-party hosted API for our premium models (Veo 3.1, Grok 3, Kling). So we've felt the tradeoffs first-hand in production, and we have the bills to prove it. This post is the breakdown.

TL;DR

Under ~200 videos a month: use the API. The hardware payback never arrives.
200–2,000 videos a month: cloud GPU rental (RunPod, Vast.ai) on a per-second basis. Cheaper than the API, no hardware risk.
2,000+ videos a month and predictable load: buy a used RTX 3090 ($700–900) and self-host. Payback in under 3 months against per-clip API pricing like Lightricks' or Fal's.
You care about latency or queueing, or you want a custom fine-tune: self-host regardless of volume.

The rest of this post is the math and the operational reality behind those numbers.

What "LTX-Video 2.3 API" actually means in 2026

There isn't one official LTX-Video API. There are several routes, and they have different pricing models:

Lightricks' own hosted endpoint (via their developer portal). Pay-per-second of generated video, currently around $0.06–$0.08 per second of output. A 5-second clip is roughly $0.30–$0.40.
Replicate. Per-second of compute, not output — currently $0.000725/sec on A100. A typical LTX 2.3 render takes 35–60 seconds, so a clip costs $0.025–$0.045.
Fal. Per-clip pricing, around $0.20–$0.30 per 5-second 720p clip.
Self-hosted on a cloud GPU (RunPod, Vast.ai, Lambda). You pay for the GPU by the second whether it's busy or not. RTX 3090: ~$0.20/hr. RTX 4090: ~$0.35/hr.
Self-hosted on your own hardware. You pay upfront for the card and electricity afterward.

These five routes are not interchangeable. Replicate is genuinely the cheapest hosted option per clip if your usage is bursty (it auto-scales to zero), but a cold start adds 30–90 seconds to the first request after idle. Lightricks' own endpoint is the most expensive per second but has guaranteed availability and gets the latest model versions immediately. Fal sits in the middle.

The cost-per-clip math

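Let's pin down what a single 5-second 720p clip actually costs in each scenario, assuming the render takes roughly 60 seconds on the rented RTX 4090 and on Replicate's A100, and about 72 seconds on an RTX 3090 (hence 60 and 50 clips per hour in the table).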

Route	Cost per clip	Notes
Lightricks API	$0.30–$0.40	Most expensive, but zero ops
Fal	$0.20–$0.30	Same idea, slightly cheaper
Replicate (A100)	$0.04	Cheapest hosted, watch for cold starts
RunPod RTX 4090 (rented)	$0.006	$0.35/hr ÷ 60 clips/hr
RunPod RTX 3090 (rented)	$0.004	$0.20/hr ÷ 50 clips/hr
Your own RTX 3090	$0.0006–$0.0015	Electricity only; GPU alone is ~250W × $0.12/kWh ÷ 50 clips/hr ≈ $0.0006, and whole-system draw pushes it higher

The gap between the priciest hosted API and your own metal is staggering: roughly 200x ($0.30 per clip against $0.0015). That's not a typo. Hosted APIs aren't really charging for compute; they're charging for software, uptime, support, and the fact that you don't have to think about anything.
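The per-clip figures in the table are simple arithmetic, sketched here in Python. The rates and throughputs are the ones quoted in this post; the function names are ours for illustration, and the electricity line counts GPU draw only.

```python
def per_second_cost(rate_per_sec: float, render_seconds: float) -> float:
    """Cost of one clip when billing is per second of compute (Replicate-style)."""
    return rate_per_sec * render_seconds


def hourly_cost_per_clip(rate_per_hour: float, clips_per_hour: float) -> float:
    """Cost of one clip on a GPU billed by the hour and kept busy the whole time."""
    return rate_per_hour / clips_per_hour


def electricity_per_clip(watts: float, price_per_kwh: float, clips_per_hour: float) -> float:
    """Electricity for one clip on owned hardware (GPU draw only, no idle time)."""
    return (watts / 1000.0) * price_per_kwh / clips_per_hour


costs = {
    "Replicate (A100), 60 s render": per_second_cost(0.000725, 60),
    "RunPod RTX 4090": hourly_cost_per_clip(0.35, 60),
    "RunPod RTX 3090": hourly_cost_per_clip(0.20, 50),
    "Own RTX 3090, GPU draw only": electricity_per_clip(250, 0.12, 50),
}

for route, cost in costs.items():
    print(f"{route}: ${cost:.4f}")
```

GPU draw alone works out to about $0.0006 per clip, below the $0.0015 in the table; the rest of the system and any idle time are not counted in this sketch.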

When the API is the right choice

If you're below ~200 videos a month, pay for the API and move on with your life. Here's why.

A used RTX 3090 is ~$700. Even against Lightricks' $0.30/clip rate, the comparison most favourable to buying hardware, you'd need to generate about 2,300 clips before the card pays for itself, ignoring electricity, space, noise, and time spent on setup. At 200/month, that's a year-long payback, and that assumes the card doesn't fail and you never want to upgrade.
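A minimal sketch of that payback arithmetic, so you can plug in your own card price and API rate. The function names are illustrative, and like the paragraph above it ignores electricity unless you pass a per-clip running cost.

```python
import math


def payback_clips(card_price: float, api_cost_per_clip: float,
                  own_cost_per_clip: float = 0.0) -> int:
    """Number of clips after which buying the card beats paying per clip."""
    saving = api_cost_per_clip - own_cost_per_clip
    if saving <= 0:
        raise ValueError("self-hosting never pays back at these rates")
    return math.ceil(card_price / saving)


def payback_months(card_price: float, api_cost_per_clip: float,
                   clips_per_month: float, own_cost_per_clip: float = 0.0) -> float:
    """Months until the card has paid for itself at a steady monthly volume."""
    return payback_clips(card_price, api_cost_per_clip, own_cost_per_clip) / clips_per_month


# $700 card against the $0.30/clip rate at 200 clips a month
print(payback_clips(700, 0.30))
print(round(payback_months(700, 0.30, 200), 1))

# Same card against Replicate's ~$0.04/clip: the payback stretches to several years
print(round(payback_months(700, 0.04, 200), 1))
```

Against a cheap hosted route the payback period gets much longer, which is why the baseline you compare with matters as much as your volume.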

More importantly: at low volumes, your operational time is worth way more than the compute savings. Setting up Wan2GP, debugging CUDA versions, dealing with NVIDIA driver kernel mismatches, restarting the worker when it OOMs — that's hours per month. At any reasonable hourly rate, those hours cost more than the API fees you'd save.

The API also gives you something the DIY route can't: instant scaling. If you suddenly need to generate 500 clips for a client project, the API just does it. Your single RTX 3090 will take 8+ hours.

When self-hosting wins

The crossover happens around 1,500–2,000 clips per month, depending on which API you'd be using as your baseline.

At 2,000 clips/month:

Lightricks API: $700/month
Replicate: $80/month
Your own RTX 3090: ~$3–$12 in electricity (about $3 of render time at $0.0015/clip, more once idle draw is counted), plus amortised hardware of ~$30/month over 2 years
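The monthly totals can be reproduced with a few lines of Python. This is a sketch under stated assumptions: $0.35 per clip is the midpoint of the Lightricks range quoted earlier, the $12 electricity figure is treated as a flat monthly cost, and the card is a $700 RTX 3090 spread over 24 months.

```python
def monthly_bill(clips: int, cost_per_clip: float, fixed_monthly: float = 0.0) -> float:
    """Monthly spend: per-clip charges plus any fixed monthly costs."""
    return clips * cost_per_clip + fixed_monthly


CLIPS = 2000
amortised_card = 700 / 24  # used RTX 3090 spread over 2 years (assumption)

bills = {
    "Lightricks API": monthly_bill(CLIPS, 0.35),  # midpoint of $0.30-$0.40
    "Replicate": monthly_bill(CLIPS, 0.04),
    "Own RTX 3090": monthly_bill(CLIPS, 0.0, fixed_monthly=12 + amortised_card),
}

for route, bill in bills.items():
    print(f"{route}: ${bill:.2f}/month")
```

Change `CLIPS` to your own volume to see where each route lands; the owned-hardware line barely moves because almost all of its cost is fixed.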

If you're generating that much video — for a SaaS product, a content factory, a research lab — self-hosting is not optional, it's mandatory. The API costs will eat your margins alive. We've talked to several startups who only realised this after their first big growth spike and got hit with $5K+ API bills.

But you also need three other things for self-hosting to make sense:

Predictable load. If you spike 10x once a week and go quiet, the GPU is idle most of the time and your effective cost goes up. Solution: hybrid setup — self-host for baseline, burst to a hosted API for spikes.
Operations capacity. Someone needs to wake up when the worker dies at 3am. We've automated most of this with our janitor system but the first month or two is real work.
A clear plan for upgrades. LTX 2.3 fits on 24GB. LTX 3 might need 32GB. Hunyuan Video 13B already needs 40GB+. Your card has a lifespan as a frontier-model runner — usually 12–18 months.
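The hybrid setup mentioned under predictable load can be sketched as a small routing rule: keep jobs on your own GPU while the queue is short, and burst to a hosted API once the estimated wait gets too long. Everything here is illustrative; `RouterConfig`, the 300-second threshold, and the backend labels are assumptions, not a description of any particular system.

```python
from dataclasses import dataclass


@dataclass
class RouterConfig:
    render_seconds: float = 72.0     # per-clip render time on the local GPU
    local_gpus: int = 1
    max_wait_seconds: float = 300.0  # hypothetical: longest queue wait we accept


def estimated_wait(queued_jobs: int, cfg: RouterConfig) -> float:
    """Seconds until a newly queued job would start on local hardware."""
    return queued_jobs * cfg.render_seconds / cfg.local_gpus


def choose_backend(queued_jobs: int, cfg: RouterConfig) -> str:
    """Stay local while the wait is acceptable, otherwise burst to the hosted API."""
    if estimated_wait(queued_jobs, cfg) <= cfg.max_wait_seconds:
        return "local"
    return "hosted_api"


cfg = RouterConfig()
for depth in (0, 2, 10):
    print(depth, "queued ->", choose_backend(depth, cfg))
```

The threshold is the knob that trades API spend against user wait time, so it is worth tuning against your real traffic rather than guessing.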

The middle ground people forget about

Most "should I self-host" discussions skip the option that's actually best for most early-stage products: rent a GPU on RunPod or Vast.ai by the second, and only spin it up when you have a job.

You write a small dispatcher that:

Receives a render request from your app.
Spawns a RunPod pod with the LTX-Video Docker image (cold start ~45–90 seconds).
Sends the prompt to the pod, waits for completion.
Tears the pod down.
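The four steps above can be sketched as follows. `PodProvider` is a hypothetical interface, not RunPod's or Vast.ai's actual SDK, and the image name is a placeholder; you would map the three methods onto whatever calls your provider really exposes. `FakeProvider` exists only so the sketch runs without a cloud account.

```python
from typing import Protocol


class PodProvider(Protocol):
    """Hypothetical interface. Map these methods onto your provider's real API."""

    def start_pod(self, image: str) -> str: ...
    def render(self, pod_id: str, prompt: str) -> bytes: ...
    def stop_pod(self, pod_id: str) -> None: ...


def dispatch(provider: PodProvider, prompt: str, image: str = "ltx-video:example") -> bytes:
    """Run one render on a fresh pod and always tear the pod down afterwards."""
    pod_id = provider.start_pod(image)          # spawn the pod (cold start happens here)
    try:
        return provider.render(pod_id, prompt)  # send the prompt, block until done
    finally:
        provider.stop_pod(pod_id)               # tear down even if the render failed


class FakeProvider:
    """In-memory stand-in for a real provider, for local testing only."""

    def __init__(self) -> None:
        self.running = set()
        self._next_id = 0

    def start_pod(self, image: str) -> str:
        self._next_id += 1
        pod_id = f"pod-{self._next_id}"
        self.running.add(pod_id)
        return pod_id

    def render(self, pod_id: str, prompt: str) -> bytes:
        if not prompt:
            raise ValueError("empty prompt")
        return f"video for: {prompt}".encode()

    def stop_pod(self, pod_id: str) -> None:
        self.running.discard(pod_id)


provider = FakeProvider()
clip = dispatch(provider, "a fox running through snow")
print(len(clip), "bytes; pods still running:", len(provider.running))
```

The `try`/`finally` is the important part: a pod that outlives a failed render keeps billing by the second until someone notices.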

Total cost for a 5-second clip: ~$0.01–$0.02 on RTX 4090 once you factor in cold start. That's at least 15x cheaper than Lightricks' API and you don't own any hardware. The downside is the cold start: for a chat-style UX where users expect an instant response, this is a bad fit. For an asynchronous "your video is ready in 2 minutes" UX, it's perfect.
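Here is a rough check of that per-clip figure, assuming the pod is billed from start to teardown at $0.35/hr, the cold start takes 45–90 seconds, and the render itself takes about 60 seconds. The function name is illustrative.

```python
def on_demand_cost(rate_per_hour: float, cold_start_s: float, render_s: float) -> float:
    """Cost of one clip on a pod that is started for the job and torn down after it."""
    billed_seconds = cold_start_s + render_s
    return rate_per_hour / 3600.0 * billed_seconds


low = on_demand_cost(0.35, cold_start_s=45, render_s=60)
high = on_demand_cost(0.35, cold_start_s=90, render_s=60)
print(f"per clip: ${low:.4f} to ${high:.4f}")
print(f"Lightricks at $0.30 is about {0.30 / high:.0f}x more")
```

Any teardown delay or minimum billing increment on your provider would add to this, so treat the result as a floor.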

For LoreMotion specifically, we started here. Only when we hit consistent enough volume to keep at least one GPU saturated did we switch to owning hardware.

Quality differences — usually exaggerated

People assume hosted APIs run "the official" model and self-hosted runs "a worse quantised version." In practice the gap is tiny.

LTX-Video 2.3 in int8 quantisation (what we run via Wan2GP profile=4) produces output that's visually indistinguishable from bf16. In the side-by-side blind comparisons we've run, people who work with this stuff full time pick the bf16 clip about half the time, which is no better than guessing.

Where quality differences DO show up:

Newer models, before quantisation is dialled in. When LTX 2.3 first dropped, the int8 quant was noticeably worse for fast motion. It took about three weeks for the community to tune the quantisation properly.
Custom fine-tunes. If you've fine-tuned the model, you obviously have to self-host. No API will run your weights.
Negative prompts and exotic samplers. Hosted APIs often expose a subset of options. Self-hosted, you get everything.

What we actually do at LoreMotion

For full transparency, our setup as of May 2026:

Free tier (LTX-Video 2.3): self-hosted on RTX 3090s. ~72 seconds per 720p/5s clip. Pure margin.
Premium tier (Veo 3.1, Grok 3, Kling): third-party hosted API. We mark up per-credit but the unit economics are tight because the underlying models are expensive.
Spike protection: we used to burst to Replicate for LTX during free-tier spikes but turned it off — the cold starts created bad UX during exactly the busiest moments. Now we just queue.

If we were starting today with no users, we'd run everything through Replicate or Fal until we had enough volume to justify the metal. The "buy hardware on day one" path is romantic but rarely the right call.

Quick decision matrix

Your situation	Right choice
Side project, <50 clips/month	Lightricks or Fal API, easiest
Indie SaaS, 100–500 clips/month	Replicate (cheap per clip, no ops)
Real product, 500–2000 clips/month	Self-host on rented GPU
Production, 2000+ clips/month	Buy your own GPU
Need custom fine-tune	Self-host (only option)
Need <5s queue latency, no cold starts	Self-host (or pay enterprise tier)
Building free public tool	Self-host or it'll bankrupt you
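For readers who prefer code to tables, the matrix above can be written as a small function. This is a sketch: the function and parameter names are ours, and because the matrix leaves 50–100 clips/month unassigned, this version folds that range into the Replicate band as an assumption.

```python
def recommend(clips_per_month: int, custom_finetune: bool = False,
              needs_no_cold_start: bool = False, free_public_tool: bool = False) -> str:
    """Return the route suggested by the decision matrix for a given situation."""
    # Hard requirements override volume entirely.
    if custom_finetune or needs_no_cold_start or free_public_tool:
        return "self-host"
    if clips_per_month < 50:
        return "Lightricks or Fal API"
    if clips_per_month < 500:  # 50-100 folded in here (assumption)
        return "Replicate"
    if clips_per_month < 2000:
        return "self-host on rented GPU"
    return "buy your own GPU"


for volume in (20, 300, 1200, 5000):
    print(volume, "->", recommend(volume))
print("fine-tune at 20/month ->", recommend(20, custom_finetune=True))
```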

If you take one thing from this post: the right answer changes as you scale. The setup that's optimal at 100 clips/month is the wrong one at 5,000. Build for one stage at a time, and re-evaluate every quarter.

Try LTX-Video 2.3 free

If you want to evaluate what LTX-Video 2.3 produces before deciding which route to take, you can generate clips on LoreMotion for free — no signup, no card required. Our backend is the self-hosted Wan2GP setup described above, so what you see is what you'd get if you ran it yourself.

For more on the actual hardware side, see our GPU benchmark roundup and where to rent GPUs.