AI Video Prompt Engineering — How to Actually Get the Shot You Want

2026-05-28 · 8 min read · LoreMotion Team

A practical guide to writing prompts for LTX-Video, Veo, Kling, and other modern text-to-video models. Camera language, motion descriptions, lighting, and the specific words that change output quality.

Most "AI video prompt guides" are recycled image prompt guides with the word "cinematic" sprinkled on top. They miss the things that actually matter for video: motion, camera behaviour, temporal coherence, scene continuity. After running tens of thousands of generations across LTX-Video, Veo 3.1, Kling, and Grok 3, here's what we've actually learned moves the quality needle.

This guide is structured by the kinds of mistakes people make most often. Skip the sections that don't apply to you.

The single biggest mistake: prompting like it's an image

Image models reward dense, layered descriptions. Video models punish them.

Consider this image-style prompt:

A majestic golden retriever sits in a sun-drenched meadow at golden hour, lens flare, shallow depth of field, bokeh, intricately detailed fur with individual whiskers visible, cinematic composition, 8k, masterpiece, trending on artstation

Fed to a video model, this produces a beautiful first frame and then 4 seconds of weirdly static dog with eyes that don't blink. Why? Because the prompt didn't describe a single moment of motion. The model has to invent what happens between frames, and absent guidance, it invents "almost nothing."

The same scene, rewritten for video:

Golden retriever in a meadow at golden hour. Camera slowly pushes in. Dog turns its head left to follow a butterfly, ears lifting. Soft summer breeze moves the grass and the dog's fur. Warm rim light from low sun.

Notice what's different: apart from the opening line, which names the subject and setting, and the closing line, which sets the light, every sentence describes camera movement, subject movement, or environmental movement. The model now has things to do across the timeline. The output goes from "still photo with light shimmer" to "actual short film."

The five things every video prompt should specify

Subject and setting — what is the thing, where is it
Camera behaviour — what does the lens do
Subject motion — what does the subject do
Environmental motion — what moves in the background
Lighting and atmosphere — mood and quality of light

You don't always need all five, but if you have fewer than three, expect a static or chaotic output.
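If you assemble prompts in code, the rule of thumb above is easy to check mechanically. Here is a minimal Python sketch. The class and field names are made up for illustration and are not part of any model's API; the default threshold of three simply mirrors the guideline above.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class VideoPrompt:
    """The five elements from the list above; every field is optional."""
    subject_and_setting: Optional[str] = None
    camera_behaviour: Optional[str] = None
    subject_motion: Optional[str] = None
    environmental_motion: Optional[str] = None
    lighting_and_atmosphere: Optional[str] = None

    def specified(self) -> list[str]:
        """Names of the elements that contain non-empty text."""
        return [
            f.name for f in fields(self)
            if (getattr(self, f.name) or "").strip()
        ]

    def is_underspecified(self, minimum: int = 3) -> bool:
        """True when fewer than `minimum` elements are filled in."""
        return len(self.specified()) < minimum


draft = VideoPrompt(
    subject_and_setting="Golden retriever in a meadow at golden hour",
    camera_behaviour="Camera slowly pushes in",
)
print(draft.is_underspecified())  # True: only two of the five elements are set
```

A check like this will not tell you whether the wording is good, only whether whole categories are missing before you spend a generation on the prompt.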

Camera language that actually works

Modern video models understand cinematography vocabulary remarkably well. The terms that produce reliable, predictable behaviour:

"Push in" / "dolly in" — camera moves toward subject
"Pull out" / "dolly out" — camera moves away
"Pan left/right" — camera rotates horizontally, position fixed
"Tilt up/down" — camera rotates vertically, position fixed
"Tracking shot" — camera moves alongside a moving subject
"Orbit" or "arc around" — camera circles the subject
"Crane up/down" — camera rises or descends vertically
"Static shot" — no camera movement (use this when you want it!)
"Handheld" — adds organic micro-movement, great for documentary feel
"Steadicam" — smooth following movement
"Dutch angle" — tilted frame for unease

Terms that work less reliably:

"Cinematic shot" — too vague, the model picks something
"Beautiful composition" — does nothing useful
"Dynamic camera" — produces unpredictable swooping

A reliable structure: [subject doing thing], [camera behaviour], [lighting].

A samurai walks through bamboo forest, slow dolly in following from behind, dappled light through leaves

This structure has worked on every model we've tested.
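Going the other way, the less reliable camera terms listed above can be caught before you hit generate, along with the filler words flagged later in this guide under style anchors. The following is a rough sketch, assuming simple phrase matching is enough. The dictionary and function names are illustrative, and the reasons are copied from this guide's own lists rather than from any model documentation.

```python
import re

# Phrases this guide reports as unreliable or counterproductive.
FLAGGED = {
    "cinematic shot": "too vague, the model picks something",
    "beautiful composition": "does nothing useful",
    "dynamic camera": "produces unpredictable swooping",
    "masterpiece": "does nothing, wastes tokens",
    "best quality": "does nothing, wastes tokens",
    "8k": "does nothing, wastes tokens",
    "ultra detailed": "does nothing, wastes tokens",
    "trending on artstation": "confuses the model with image data",
    "hyperrealistic": "often produces an uncanny look",
    "ai art style": "produces actual bad AI art",
}


def lint_prompt(prompt: str) -> list[tuple[str, str]]:
    """Return (phrase, reason) pairs for each flagged phrase in the prompt."""
    text = prompt.lower()
    hits = []
    for phrase, reason in FLAGGED.items():
        # Word boundaries keep "8k" from matching inside e.g. "128k".
        if re.search(rf"\b{re.escape(phrase)}\b", text):
            hits.append((phrase, reason))
    return hits


for phrase, reason in lint_prompt("Dynamic camera, 8k, masterpiece"):
    print(f"{phrase}: {reason}")
```

This only flags exact phrases, so treat an empty result as "nothing obviously wrong", not as proof the prompt is well written.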

Lens and framing — the cheat code

If you want to look like you know what you're doing, specify a lens and a shot type:

"Wide shot, 24mm lens" — establishing, lots of environment
"Medium shot, 50mm lens" — natural, conversational
"Close-up, 85mm lens" — intimate, shallow depth of field
"Extreme close-up, macro lens" — detail
"Telephoto, 200mm" — compressed background, far subject feels close

These work because the training data is full of films and stock footage tagged with this vocabulary. The models genuinely understand what a 24mm wide shot looks like versus an 85mm close-up.
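If you generate many prompts, it can help to keep these pairings in one place so the shot type and lens never drift apart. This is a small illustrative sketch; the pairings are the ones listed above, while the dictionary and function names are hypothetical.

```python
# Shot type -> lens phrase, copied from the list above.
SHOT_LENS = {
    "wide shot": "24mm lens",
    "medium shot": "50mm lens",
    "close-up": "85mm lens",
    "extreme close-up": "macro lens",
    "telephoto": "200mm",
}


def framing_phrase(shot_type: str) -> str:
    """Return a 'Shot type, lens' phrase ready to drop into a prompt."""
    key = shot_type.strip().lower()
    if key not in SHOT_LENS:
        raise ValueError(
            f"unknown shot type {shot_type!r}; expected one of {sorted(SHOT_LENS)}"
        )
    phrase = f"{key}, {SHOT_LENS[key]}"
    # Capitalise only the first letter so "mm" stays lower-case.
    return phrase[0].upper() + phrase[1:]


print(framing_phrase("close-up"))  # Close-up, 85mm lens
```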

Motion vocabulary — being specific saves you

"The dog runs" produces wildly variable output. "The dog runs left to right across the frame, paws kicking up dust" produces the shot you imagined.

Useful motion specifiers:

Direction in frame: "left to right," "toward camera," "diagonally away"
Speed: "slowly," "at full speed," "in slow motion"
Body part / detail: "tail wagging," "hair flowing," "eyes blinking," "fingers tapping"
Interaction: "kicks up dust," "knocks over a chair," "splashes through puddle"

Compare:

A car driving on a road

versus

A red Porsche 911 drives at high speed toward camera on a winding mountain road, tires gripping each turn, headlights cutting through dusk

The second prompt is roughly four times as long, but every added word does a job: it names the make and model (which gives the model a clear reference), the speed, the direction, the environment, and the time of day. The output is far more controlled.

Lighting — the difference between amateur and pro output

Lighting is where AI video either looks like a stock clip or a real film. Vocabulary that works:

Time of day: "golden hour," "blue hour," "noon harsh light," "overcast morning"
Quality: "soft diffused light," "hard directional light," "rim light from behind"
Source: "candlelight," "neon signs," "single window light," "fluorescent overhead"
Mood phrases: "moody chiaroscuro," "high-key bright," "low-key dramatic"

Stack 2–3 lighting words. "Cinematic" on its own doesn't work. "Moody chiaroscuro lit by single candle from below" works.

Style anchors that don't backfire

Style words can wreck a video if they're too dominant — the model tries to make the entire output look like that style and forgets about your subject. Use sparingly and prefer concrete references:

Works well:

"Shot on 35mm film"
"Like a Wes Anderson film, symmetric framing, pastel palette"
"Documentary style, handheld, natural light"
"Cinematic, anamorphic, lens flares"
"Music video, fast cuts" (the model will still produce one continuous shot but adopt the aesthetic)

Backfires:

"Masterpiece, best quality, 8k, ultra detailed" — does nothing, wastes tokens
"Trending on Artstation" — confuses the model with image data
"Hyperrealistic" — often produces an uncanny look
"AI art style" — produces actual bad AI art

What every model is bad at (manage your expectations)

Even the best 2026 models struggle with:

Hands doing complex things. Holding a glass: fine. Tying a shoelace: chaos. If your shot relies on hand dexterity, find another shot.
Text within the video. Signs, writing on shirts, logos. Models can sometimes produce plausible-looking text but it's rarely correct.
Multiple distinct people interacting. Two characters in dialogue, three friends laughing — the model often blends identities or duplicates features.
Long shots of crowds. Becomes a mess of half-formed bodies.
Specific real people. Most open models won't reproduce known celebrities reliably; closed models have policy filters.
Continuity across cuts. Each clip is generated independently. Same character in two clips will not look identical.

The trick: write prompts that play to the model's strengths. Single character, clear motion, well-defined environment, good lighting. That's the recipe for output that doesn't look "AI-generated."

Model-specific quirks

After running each of these in production, here's our cheat sheet:

LTX-Video 2.3

Loves cinematic vocabulary. "Anamorphic," "shallow depth of field," "film grain" all work.
Weak on fast motion — keep action measured.
Best for: atmospheric shots, slow camera moves, dialogue-style framing.

Wan 2.1

Best motion physics of the open models. Real running, real water, real fabric.
Less polished aesthetic out of the box — needs more lighting/style direction.
Best for: action, sports, anything with kinetic energy.

Veo 3.1 (Fast/Lite)

Most "movie-like" out of the box of any model we've tested.
Synchronised audio is a game-changer — describe sound in your prompt ("birds chirping," "footsteps on gravel").
Best for: polished short content where audio matters.

Kling 2.5/3

Wins for stylised content — anime, illustration-style, painterly looks.
Slower than the others but quality justifies it for hero shots.
Best for: when you want it to not look photoreal.

Grok 3

Most flexible with weird/creative prompts. Less safety-tuned than competitors.
Quality slightly behind Veo but motion handling is excellent.
Best for: experimental ideas, surreal scenes, when other models refuse the prompt.
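If you route prompts programmatically, the "best for" lines above fit in a small lookup table. This is a rough sketch, assuming plain keyword matching is good enough for a first pass. The names are illustrative, and the text is copied from the cheat sheet above, not from any vendor documentation.

```python
# "Best for" notes copied from the cheat sheet above.
BEST_FOR = {
    "LTX-Video 2.3": "atmospheric shots, slow camera moves, dialogue-style framing",
    "Wan 2.1": "action, sports, anything with kinetic energy",
    "Veo 3.1 (Fast/Lite)": "polished short content where audio matters",
    "Kling 2.5/3": "when you want it to not look photoreal",
    "Grok 3": "experimental ideas, surreal scenes, when other models refuse the prompt",
}


def suggest_models(keyword: str) -> list[str]:
    """Return the models whose 'best for' note mentions the keyword."""
    needle = keyword.strip().lower()
    if not needle:
        return []
    return [model for model, note in BEST_FOR.items() if needle in note]


print(suggest_models("action"))  # ['Wan 2.1']
print(suggest_models("audio"))   # ['Veo 3.1 (Fast/Lite)']
```

Substring matching is crude (it knows nothing about synonyms), so an empty result just means the keyword does not appear in these notes.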

A complete prompt template you can steal

Copy this and fill in the brackets. It has produced strong output on every model we've tested:

[Subject doing specific action] in [setting], [time of day and lighting quality]. Camera [movement]. [One environmental detail in motion]. [Optional: shot on [lens/format]].

Examples:

A lone astronaut planting a flag on a red dune in golden afternoon light. Camera slowly orbits clockwise. Dust kicked up by wind catches the low sun. Shot on 35mm anamorphic.

Chef plating a dish in a busy restaurant kitchen at night under fluorescent overhead light. Camera medium shot, slight handheld. Steam rising from the plate. Shot on 50mm lens.

Surfer paddling into a perfect dawn wave, blue hour. Tracking shot from a drone above and behind. Spray rising from the board. Shot on telephoto, compressed perspective.

Each of these covers all five elements without overstuffing. Try them in our free generator: copy any of the examples directly and see what you get.
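The bracket template above also translates directly into a small helper if you want to fill it from a script or a spreadsheet. This is a minimal sketch, assuming one string per bracket. The function and parameter names are hypothetical, and the sample values reproduce the chef example.

```python
def fill_template(subject_action: str, setting: str, lighting: str,
                  camera: str, environment: str, shot_on: str = "") -> str:
    """Fill the bracket template; `shot_on` is the optional last slot."""
    required = {
        "subject_action": subject_action,
        "setting": setting,
        "lighting": lighting,
        "camera": camera,
        "environment": environment,
    }
    missing = [name for name, value in required.items() if not value.strip()]
    if missing:
        raise ValueError(f"missing template fields: {', '.join(missing)}")

    sentences = [
        f"{subject_action.strip()} in {setting.strip()}, {lighting.strip()}",
        f"Camera {camera.strip()}",
        environment.strip(),
    ]
    if shot_on.strip():
        sentences.append(f"Shot on {shot_on.strip()}")
    # End every sentence with exactly one full stop.
    return " ".join(s.rstrip(".") + "." for s in sentences)


print(fill_template(
    subject_action="Chef plating a dish",
    setting="a busy restaurant kitchen",
    lighting="at night under fluorescent overhead light",
    camera="medium shot, slight handheld",
    environment="Steam rising from the plate",
    shot_on="50mm lens",
))
```

Raising on an empty slot is a deliberate choice here: a template with a silent gap tends to produce exactly the static output the template is meant to avoid.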

The iteration mindset

The biggest unlock isn't a better prompt — it's accepting that you'll generate 3–5 versions before you get the one. Even on the closed frontier models, hit rate for "first try is the keeper" is maybe 30%. Plan for it. Generate variants. Pick the best. Then refine the prompt for the next attempt based on what you actually got versus what you wanted.
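The 3–5 figure lines up with the 30% hit rate if you do the arithmetic. Here is a back-of-the-envelope sketch, assuming each generation is an independent attempt with the same chance of being a keeper. That is a simplification, since refining the prompt between attempts should raise the odds. The function names are illustrative.

```python
import math


def chance_of_keeper(hit_rate: float, attempts: int) -> float:
    """Probability that at least one of `attempts` generations is a keeper."""
    if not 0.0 < hit_rate < 1.0:
        raise ValueError("hit_rate must be between 0 and 1 (exclusive)")
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    return 1.0 - (1.0 - hit_rate) ** attempts


def attempts_needed(hit_rate: float, target: float) -> int:
    """Smallest number of attempts whose chance of a keeper reaches `target`."""
    if not 0.0 < hit_rate < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("hit_rate and target must be between 0 and 1 (exclusive)")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - hit_rate))


print(round(chance_of_keeper(0.3, 3), 3))  # 0.657
print(round(chance_of_keeper(0.3, 5), 3))  # 0.832
print(attempts_needed(0.3, 0.9))           # 7
```

So under these assumptions, three attempts give you about a two-in-three chance of a usable clip and five give you better than four in five, which is why budgeting for a handful of variants is realistic.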

The people who get great AI video output aren't writing magic prompts on the first try. They're iterating fast and learning the model's personality. That's the actual skill — and it transfers between models.

Try it yourself

You can generate free clips on LoreMotion without an account to practise prompt writing. We run LTX-Video 2.3 on the free tier and offer the premium models (Veo, Kling, Grok) on credits. Try the same prompt across different models to see how each one interprets it. That's the fastest way to learn what each one is good at.

Also see: our model comparison for which model fits which use case, and our open-source model roundup if you're picking what to run yourself.