The Best Open-Source AI Video Generation Models in 2026
2026-05-28 · 7 min read · LoreMotion Team
An honest, hands-on review of the top open-source AI video models you can run yourself in 2026 — LTX-Video 2.3, Wan 2.1, HunyuanVideo, Mochi, CogVideoX, and more.
Open-source AI video generation has come a long way in the past 18 months. When OpenAI announced Sora in early 2024, the gap between proprietary and open models was vast — most open releases produced 2-second flickering clips that looked like fever dreams. By mid-2026 that gap has narrowed dramatically. Several open models now produce video that, in a blind test, is genuinely hard to distinguish from Runway Gen-3 or Pika 2.0.
This guide is a hands-on review of the models we actually run in production on LoreMotion, plus the ones we evaluated and rejected (and why). We focus on what matters: output quality, VRAM requirements, real-world generation time on consumer GPUs, and how usable the model is for production work.
What we mean by "open-source"
For this article we count a model as open if:
- The model weights are downloadable under a licence that allows non-research use
- There is a public inference codebase, not just a paper
- You can self-host it on commodity hardware, or at minimum on rentable cloud GPUs
This excludes Sora, Veo, Kling, Grok 3, and Hailuo (MiniMax), all of which are closed-weight APIs. It also excludes "open" research releases like Make-A-Video that never shipped runnable code. The five models below all meet the bar.
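The three criteria above amount to a simple all-or-nothing checklist. The sketch below is illustrative only: the `ModelRelease` class and its field names are hypothetical, and the two example entries encode this article's own classification rather than any official registry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRelease:
    """Hypothetical record of the three properties this article checks."""
    name: str
    weights_allow_non_research_use: bool
    has_public_inference_code: bool
    self_hostable: bool


def counts_as_open(release: ModelRelease) -> bool:
    # All three criteria must hold; failing any one excludes the model.
    return (
        release.weights_allow_non_research_use
        and release.has_public_inference_code
        and release.self_hostable
    )


# Example entries reflecting the article's own judgement.
wan = ModelRelease("Wan 2.1", True, True, True)
sora = ModelRelease("Sora", False, False, False)

print(counts_as_open(wan))   # True
print(counts_as_open(sora))  # False
```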
1. LTX-Video 2.3 (Lightricks) — our daily driver
- Architecture: 22B-parameter diffusion transformer, distilled from a 130B teacher.
- Licence: OpenRAIL-M (commercial use allowed, some content restrictions).
- VRAM: 24 GB minimum with Wan2GP's profile 4 (int8 + block-swap); 48 GB+ for the raw Lightricks repo.
- Output: 720p MP4, up to 8 seconds, with synced ambient audio.
- Speed: ~72 seconds for a 720p / 5-second clip on a single RTX 3090.
LTX-Video 2.3 is the model that made open AI video genuinely useful. The 2.0 release in late 2025 already had impressive temporal coherence — characters stayed consistent across frames, faces didn't melt, lighting was physically plausible — but 2.3 added two things that matter enormously in production: synced ambient audio (footsteps, wind, room tone) and a properly trained image-to-video mode that respects the input frame's colour palette and composition.
The downside is text rendering. LTX still produces gibberish whenever a scene contains visible text — signs, t-shirts, screens. This is a universal problem in video diffusion right now, but it's worth knowing if your prompts involve readable text.
Why we run it: Wan2GP's profile 4 quantises the model to int8 and uses block-swap offloading to fit it on a single RTX 3090 (24 GB) with no quality loss we could measure. End-to-end time on our worker, from claiming a job to a finished clip, is ~73 seconds. That's the magic number: fast enough to feel responsive, and cheap enough to sustain as a free service.
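A rough back-of-the-envelope calculation shows why int8 matters on a 24 GB card. The sketch below counts weight memory only and assumes the unquantised checkpoint is stored at 16 bits per parameter, which is our assumption rather than something stated by Lightricks. It ignores activations, the text encoder, and the VAE, so real usage will be higher.

```python
def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate memory for the weights alone, in decimal GB."""
    # (params_billions * 1e9 params) * bytes_per_param / 1e9 bytes per GB
    return params_billions * bytes_per_param


LTX_PARAMS_B = 22  # parameter count quoted in this article

# Assumed storage widths: 2 bytes for 16-bit floats, 1 byte for int8.
for label, nbytes in [("16-bit", 2), ("int8", 1)]:
    print(f"{label}: ~{weight_memory_gb(LTX_PARAMS_B, nbytes):.0f} GB")
```

Under these assumptions the weights alone come to about 44 GB at 16-bit and about 22 GB at int8. That is consistent with the 48 GB+ figure for the raw repo, and it suggests why block-swap offloading is still needed on a 24 GB card: 22 GB of weights leaves very little room for everything else.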
2. Wan 2.1 (Alibaba DAMO) — strong open-source competitor
- Architecture: 14B-parameter DiT (Diffusion Transformer) trained on a curated 100M+ clip dataset.
- Licence: Apache 2.0 (genuinely permissive).
- VRAM: 16 GB for the 1.3B variant; 24 GB for the 14B flagship.
- Output: 480p / 720p, up to 5 seconds.
- Speed: ~95 seconds for a 720p / 5-second clip on RTX 3090.
Wan 2.1 is the model we evaluated as a potential LTX replacement in early 2026. It has two things going for it: a genuinely permissive Apache 2.0 licence (LTX's OpenRAIL has content restrictions that can be ambiguous for commercial work) and excellent prompt adherence — give it a complex spatial prompt and it tends to honour the geometry better than LTX.
What kept us on LTX: Wan 2.1 doesn't produce audio, its motion is sometimes too "smooth" (think early Pixar renders rather than handheld realism), and it's noticeably slower per clip. For LoreMotion's use case — quick free generation for casual users — LTX wins. For commercial work where licence matters more than speed, Wan 2.1 is the better choice.
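To put "noticeably slower per clip" in concrete terms, the per-clip times quoted in this article can be converted into throughput. This is a minimal sketch that assumes a single RTX 3090 generating clips back to back, with no queueing, model loading, or upload overhead. The helper name `clips_per_hour` is ours.

```python
SECONDS_PER_HOUR = 3600

# Seconds per 720p / 5-second clip on an RTX 3090, as quoted in this article.
seconds_per_clip = {"LTX-Video 2.3": 72, "Wan 2.1 (14B)": 95}


def clips_per_hour(seconds: float) -> float:
    """Clips one GPU can produce per hour if it runs continuously."""
    return SECONDS_PER_HOUR / seconds


for model, secs in seconds_per_clip.items():
    print(f"{model}: {clips_per_hour(secs):.1f} clips/hour")

slowdown = seconds_per_clip["Wan 2.1 (14B)"] / seconds_per_clip["LTX-Video 2.3"]
print(f"Wan 2.1 takes {slowdown:.2f}x as long per clip")
```

On these numbers a single card produces roughly 50 LTX clips per hour versus roughly 38 for Wan 2.1, which is a meaningful difference for a free service.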
3. HunyuanVideo (Tencent) — best quality, demanding hardware
- Architecture: 13B-parameter MMDiT (multimodal DiT).
- Licence: Custom (requires accepting a Tencent EULA; commercial use allowed with restrictions).
- VRAM: 60 GB for inference; 80 GB+ for fine-tuning.
- Output: 720p, up to 5 seconds (longer with text-to-image-to-video chaining).
- Speed: ~180 seconds for a 5-second clip on H100; not feasible on consumer GPUs without aggressive quantisation.
HunyuanVideo is, on raw output quality, the best open video model we've tested. Its motion looks natural, it handles complex scenes (multiple subjects, camera moves, lighting transitions) better than LTX or Wan, and its prompt adherence is excellent.
The problem is hardware. There's a 4-bit GGUF community port that fits on a 24 GB GPU, but the quality degradation is visible — colours flatten, fine motion gets jittery. To run HunyuanVideo properly you need an H100 (80 GB) or at minimum an A100 80 GB. That puts it out of reach for self-hosting on a budget. If you have the budget — or you're using a managed API like fal.ai or Replicate — HunyuanVideo is worth it.
4. Mochi 1 (Genmo) — open, but a step behind
- Architecture: 10B-parameter AsymmDiT.
- Licence: Apache 2.0.
- VRAM: 60 GB for full precision; FP8 community port fits on 24 GB.
- Output: 480p at 30 fps, up to 5.4 seconds.
- Speed: ~120 seconds for a 5-second clip on RTX 3090 with FP8 quant.
Mochi 1 was the open-source headline of late 2024 and it still gets recommended in forum posts. It has genuinely impressive motion quality — fluid simulations, hair physics, cloth dynamics are all noticeably better than what LTX or Wan produce.
But it's stuck at 480p natively. Upscaling helps but introduces its own artefacts. Generation is slow, the FP8 quant has visible colour issues, and there's been no major model update in 18 months. If you specifically need fluid or particle motion (water, smoke, hair), Mochi is still the best open option. For general-purpose video, it's been overtaken.
5. CogVideoX-5B (Tsinghua KEG) — the best small model
- Architecture: 5B-parameter expert-transformer.
- Licence: Apache 2.0.
- VRAM: 12 GB (yes, really: it runs on an RTX 3060 12 GB).
- Output: 720p, up to 6 seconds.
- Speed: ~150 seconds on RTX 3060; ~45 seconds on RTX 4090.
CogVideoX-5B is the model to run if you don't have a top-tier GPU. It's the only model in this list that fits comfortably on a 12 GB consumer card without quantisation. Output quality is well behind LTX and Wan (motion is jerkier, faces drift, and complex scenes confuse it), but for prototyping, learning, or generating placeholder content it's genuinely useful.
The 5B-I2V variant (image-to-video) is particularly good for animating product photos or AI-generated illustrations. If you're building a hobby project on a single 12 GB card, this is what you should be using.
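The VRAM figures quoted for the five models can be collected into a small lookup to see what fits on a given card. This is a sketch built only from the minimums stated in this article. The dictionary keys and the `models_that_fit` helper are our own naming, and the figures are minimums, not guarantees of comfortable headroom.

```python
# Minimum VRAM in GB per configuration, as quoted in this article.
MIN_VRAM_GB = {
    "LTX-Video 2.3 (Wan2GP profile 4)": 24,
    "Wan 2.1 (1.3B)": 16,
    "Wan 2.1 (14B)": 24,
    "HunyuanVideo": 60,
    "Mochi 1 (FP8 community port)": 24,
    "Mochi 1 (full precision)": 60,
    "CogVideoX-5B": 12,
}


def models_that_fit(vram_gb: int) -> list[str]:
    """Return the configurations whose quoted minimum is within vram_gb."""
    return sorted(name for name, need in MIN_VRAM_GB.items() if need <= vram_gb)


for card_gb in (12, 16, 24, 80):
    print(f"{card_gb} GB: {models_that_fit(card_gb)}")
```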
Models we evaluated and skipped
A quick honourable-mentions list for completeness:
- Stable Video Diffusion (SVD) — Stability AI. Released in late 2023, still the easiest open model to run, but the quality has been overtaken by everything above. Hard to recommend in 2026.
- AnimateDiff. Technically a motion adapter for Stable Diffusion 1.5 / SDXL, not a true video model. Useful for short looping animations, not for free-form text-to-video.
- Open-Sora 1.3 (HPC-AI). Ambitious community attempt to replicate Sora's architecture; output quality is roughly Mochi-tier with worse motion. Promising but not production-ready.
- EasyAnimate (Alibaba). A pre-Wan release that's been superseded by Wan 2.1.
- VideoCrafter 2 (Tencent). Older Tencent release, deprecated in favour of HunyuanVideo.
Which model should you run?
A short decision tree based on what we actually deploy:
- You have an RTX 3090 / 4090 and want general-purpose AI video: LTX-Video 2.3 via Wan2GP. Best quality-per-second on consumer hardware.
- You have an H100 or A100 80 GB and quality matters more than cost: HunyuanVideo. Worth the hardware tax.
- You need a permissive commercial licence: Wan 2.1 (Apache 2.0). LTX is fine for most uses, but its OpenRAIL licence has content restrictions that are easy to violate accidentally.
- You're on a 12 GB GPU: CogVideoX-5B. The only model here that runs cleanly without aggressive quantisation.
- You need fluid / particle motion specifically: Mochi 1 at FP8. Old but specialised.
- You don't want to self-host at all: Use LoreMotion — we run LTX-Video 2.3 on RTX 3090s and the first clip is free with no signup.
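The bullets above can be written as a small function. This is a sketch, not a definitive rule set: the bullets are not strictly ordered, so the priority of the checks below (special requirements first, then VRAM tiers) is our assumption, as are the function name, the parameter names, and the VRAM guards taken from the figures quoted earlier.

```python
from typing import Optional


def recommend(
    vram_gb: Optional[int],
    *,
    quality_over_cost: bool = False,
    needs_permissive_licence: bool = False,
    needs_fluid_motion: bool = False,
) -> str:
    """Map the bullets above to one recommendation (check order is assumed)."""
    if vram_gb is None:
        # No local GPU at all: use a hosted service.
        return "LoreMotion (hosted LTX-Video 2.3)"
    if needs_fluid_motion and vram_gb >= 24:
        return "Mochi 1 (FP8)"
    if needs_permissive_licence and vram_gb >= 16:
        return "Wan 2.1"
    if quality_over_cost and vram_gb >= 80:
        return "HunyuanVideo"
    if vram_gb >= 24:
        return "LTX-Video 2.3 via Wan2GP"
    if vram_gb >= 12:
        return "CogVideoX-5B"
    return "No model in this list fits; consider a hosted option"


print(recommend(24))
print(recommend(80, quality_over_cost=True))
print(recommend(12))
```

Passing `None` for `vram_gb` models the "don't want to self-host" case. A real selector would also need to weigh clip length, audio, and resolution, which this sketch leaves out.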
Where the field is headed
Two trends are worth watching for the rest of 2026:
Longer clips. Every model in this list maxes out at 5–8 seconds. Both Wan 3 and the rumoured LTX-Video 3 are targeting 30-second native generation. That would unlock completely different use cases (real ad creative, mini music videos, short trailers).
Audio integration. LTX-Video 2.3 is currently the only open model with native synced audio. Wan and Hunyuan are both rumoured to be training audio variants. Once audio is table stakes in open video, the gap to closed APIs will shrink further.
We'll publish updated benchmarks as new versions ship. In the meantime, if you want to try LTX-Video 2.3 without any local setup, our free generator runs it on every prompt, with no signup and no watermark.