AI Video Prompt Engineering — How to Actually Get the Shot You Want

2026-05-28 · 8 min read · LoreMotion Team

A practical guide to writing prompts for LTX-Video, Veo, Kling, and other modern text-to-video models. Camera language, motion descriptions, lighting, and the specific words that change output quality.

Most "AI video prompt guides" are recycled image prompt guides with the word "cinematic" sprinkled on top. They miss the things that actually matter for video: motion, camera behaviour, temporal coherence, scene continuity. After running tens of thousands of generations across LTX-Video, Veo 3.1, Kling, and Grok 3, here's what we've learned actually moves the quality needle.

This guide is structured by the kinds of mistakes people make most often. Skip the sections that don't apply to you.

The single biggest mistake: prompting like it's an image

Image models reward dense, layered descriptions. Video models punish them.

Consider this image-style prompt:

A majestic golden retriever sits in a sun-drenched meadow at golden hour, lens flare, shallow depth of field, bokeh, intricately detailed fur with individual whiskers visible, cinematic composition, 8k, masterpiece, trending on artstation

Fed to a video model, this produces a beautiful first frame and then 4 seconds of weirdly static dog with eyes that don't blink. Why? Because the prompt didn't describe a single moment of motion. The model has to invent what happens between frames, and absent guidance, it invents "almost nothing."

The same scene, rewritten for video:

Golden retriever in a meadow at golden hour. Camera slowly pushes in. Dog turns its head left to follow a butterfly, ears lifting. Soft summer breeze moves the grass and the dog's fur. Warm rim light from low sun.

Notice what's different: apart from the opening line that sets the scene and the closing line about light, every sentence describes camera movement, subject movement, or environmental movement. The model now has something to do across the whole timeline, and the output goes from "still photo with light shimmer" to "actual short film."

The five things every video prompt should specify

  1. Subject and setting — what is the thing, where is it
  2. Camera behaviour — what does the lens do
  3. Subject motion — what does the subject do
  4. Environmental motion — what moves in the background
  5. Lighting and atmosphere — mood and quality of light

You don't always need all five, but if you have fewer than three, expect a static or chaotic output.
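To make the "fewer than three" rule concrete, here is a minimal Python sketch that treats a prompt as five optional fields and flags a spec that fills in fewer than three of them. The `ShotSpec` class and the `count_elements` and `check_spec` helpers are hypothetical names for illustration, not part of any model's API; the threshold of three is simply the rule of thumb above.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ShotSpec:
    """Hypothetical container for the five elements of a video prompt."""
    subject_and_setting: Optional[str] = None
    camera: Optional[str] = None
    subject_motion: Optional[str] = None
    environment_motion: Optional[str] = None
    lighting: Optional[str] = None


def count_elements(spec: ShotSpec) -> int:
    """Count how many of the five elements are filled in (None or blank counts as missing)."""
    values = (getattr(spec, f.name) for f in fields(spec))
    return sum(1 for v in values if v is not None and v.strip())


def check_spec(spec: ShotSpec, minimum: int = 3) -> Optional[str]:
    """Return a warning if fewer than `minimum` elements are specified, else None."""
    n = count_elements(spec)
    if n < minimum:
        return f"Only {n} of 5 elements specified; expect static or chaotic output."
    return None


# The image-style golden retriever prompt covers only subject/setting and lighting.
image_style = ShotSpec(
    subject_and_setting="A majestic golden retriever sits in a sun-drenched meadow",
    lighting="golden hour, lens flare",
)

# The video-style rewrite covers all five.
video_style = ShotSpec(
    subject_and_setting="Golden retriever in a meadow at golden hour",
    camera="Camera slowly pushes in",
    subject_motion="Dog turns its head left to follow a butterfly, ears lifting",
    environment_motion="Soft summer breeze moves the grass and the dog's fur",
    lighting="Warm rim light from low sun",
)

print(check_spec(image_style))  # prints the warning (2 of 5 elements)
print(check_spec(video_style))  # prints None
```

Splitting the prompt into named slots like this is mostly useful when you generate prompts in bulk: it makes a missing camera or motion line obvious before you spend a generation on it.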

Camera language that actually works

Modern video models understand cinematography vocabulary remarkably well. The terms that produce reliable, predictable behaviour:

Terms that work less reliably:

A reliable structure: [subject doing thing], [camera behaviour], [lighting].

A samurai walks through bamboo forest, slow dolly in following from behind, dappled light through leaves

This works on every modern video model.
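If you keep prompts in a spreadsheet or script, it can help to store the three slots separately and join them only at the end. A minimal Python sketch follows; `build_prompt` is a hypothetical helper, and its output is plain text you would paste into whichever model you use.

```python
def build_prompt(action: str, camera: str, lighting: str) -> str:
    """Join [subject doing thing], [camera behaviour], [lighting] with commas."""
    parts = [p.strip() for p in (action, camera, lighting)]
    if not all(parts):
        # An empty slot usually means a static or unlit shot, so fail loudly.
        raise ValueError("all three parts are required")
    return ", ".join(parts)


prompt = build_prompt(
    "A samurai walks through bamboo forest",
    "slow dolly in following from behind",
    "dappled light through leaves",
)
print(prompt)
```

Run on the samurai example, this reproduces the prompt above word for word.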

Lens and framing — the cheat code

If you want to look like you know what you're doing, specify a lens and a shot type:

These work because the training data is full of films and stock footage tagged with this vocabulary. The models genuinely understand what a 24mm wide shot looks like versus an 85mm close-up.

Motion vocabulary — being specific saves you

"The dog runs" produces wildly variable output. "The dog runs left to right across the frame, paws kicking up dust" produces the shot you imagined.

Useful motion specifiers:

Compare:

A car driving on a road

versus

A red Porsche 911 drives at high speed toward camera on a winding mountain road, tires gripping each turn, headlights cutting through dusk

The second prompt is longer, but every added word does work: it specifies make and model (a clear visual reference for the model), speed, direction, environmental detail, and time of day. The output is dramatically more controlled.

Lighting — the difference between amateur and pro output

Lighting decides whether AI video looks like a stock clip or a real film. Vocabulary that works:

Stack 2–3 specific lighting terms. "Cinematic" on its own doesn't work; "moody chiaroscuro lit by a single candle from below" does.

Style anchors that don't backfire

Style words can wreck a video if they're too dominant — the model tries to make the entire output look like that style and forgets about your subject. Use sparingly and prefer concrete references:

Works well:

Backfires:

What every model is bad at (manage your expectations)

Even the best 2026 models struggle with:

The trick: write prompts that play to the model's strengths. Single character, clear motion, well-defined environment, good lighting. That's the recipe for output that doesn't look "AI-generated."

Model-specific quirks

After running each of these in production, here's our cheat sheet:

LTX-Video 2.3

Wan 2.1

Veo 3.1 (Fast/Lite)

Kling 2.5/3

Grok 3

A complete prompt template you can steal

Copy this and fill in the brackets. It produces strong output on every modern model:

[Subject doing specific action] in [setting], [time of day and lighting quality]. Camera [movement]. [One environmental detail in motion]. [Optional: shot on [lens/format]].

Examples:

A lone astronaut planting a flag on a red dune in golden afternoon light. Camera slowly orbits clockwise. Dust kicked up by wind catches the low sun. Shot on 35mm anamorphic.

Chef plating a dish in a busy restaurant kitchen at night under fluorescent overhead light. Camera medium shot, slight handheld. Steam rising from the plate. Shot on 50mm lens.

Surfer paddling into a perfect dawn wave, blue hour. Tracking shot from a drone above and behind. Spray rising from the board. Shot on telephoto, compressed perspective.

Each of these specifies all five elements without overstuffing. Try them in our free generator: copy any of the examples directly and see what you get.
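If you are producing many variants, the template can also be filled programmatically. This is a minimal Python sketch; `fill_template` and its parameter names are hypothetical, and the output only approximates the hand-written examples (for instance, it always puts a comma between the setting and the lighting, which the chef prompt above does not).

```python
from typing import Optional


def _clean(text: str) -> str:
    """Trim whitespace and any trailing period so sentences don't end in '..'."""
    return text.strip().rstrip(".")


def fill_template(
    action: str,
    setting: str,
    lighting: str,
    camera: str,
    environment: str,
    lens: Optional[str] = None,
) -> str:
    """Fill: [action] in [setting], [lighting]. Camera [camera]. [environment]. [Shot on [lens].]"""
    sentences = [
        f"{_clean(action)} in {_clean(setting)}, {_clean(lighting)}.",
        f"Camera {_clean(camera)}.",
        f"{_clean(environment)}.",
    ]
    if lens:  # the lens/format sentence is optional in the template
        sentences.append(f"Shot on {_clean(lens)}.")
    return " ".join(sentences)


prompt = fill_template(
    action="Chef plating a dish",
    setting="a busy restaurant kitchen",
    lighting="at night under fluorescent overhead light",
    camera="medium shot, slight handheld",
    environment="Steam rising from the plate",
    lens="50mm lens",
)
print(prompt)
```

Keeping each slot as its own argument makes it easy to hold four elements fixed and vary one (say, the camera move) across a batch of generations.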

The iteration mindset

The biggest unlock isn't a better prompt — it's accepting that you'll generate 3–5 versions before you get the one. Even on the closed frontier models, hit rate for "first try is the keeper" is maybe 30%. Plan for it. Generate variants. Pick the best. Then refine the prompt for the next attempt based on what you actually got versus what you wanted.
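The arithmetic behind "plan for 3–5 versions" is worth seeing once. The sketch below assumes each generation independently has the same roughly 30% chance of being a keeper, which is a simplification (in practice you refine the prompt between attempts, so later tries should do better). `p_at_least_one_keeper` is a hypothetical helper name.

```python
def p_at_least_one_keeper(hit_rate: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent generations is a keeper.

    Assumes every attempt succeeds independently with probability `hit_rate`,
    so the chance that all of them miss is (1 - hit_rate) ** attempts.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return 1.0 - (1.0 - hit_rate) ** attempts


for n in (1, 3, 5):
    print(f"{n} attempt(s): {p_at_least_one_keeper(0.30, n):.0%}")
```

Under that assumption, one attempt gives a 30% chance, three attempts give about 66%, and five give about 83%, which is why a batch of 3–5 is a sensible default.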

The people who get great AI video output aren't writing magic prompts on the first try. They're iterating fast and learning the model's personality. That's the actual skill — and it transfers between models.

Try it yourself

You can generate free clips on LoreMotion without an account to practice prompt writing — we run LTX-Video 2.3 on the free tier and offer the premium models (Veo, Kling, Grok) on credits. Try the same prompt across different models to see how each one interprets it. That's the fastest way to learn what each one is good at.

Also see: our model comparison for which model fits which use case, and our open-source model roundup if you're picking what to run yourself.