Complete Guide · Updated May 2026

Wan 3.0 Prompt Guide — Write Prompts That Work

The complete reference for structuring Wan 3.0 prompts — covering camera control, multi-shot AI Director mode, character consistency with Identity Lock, native audio, and the @reference syntax. Includes copy-paste templates for real production scenarios.

4K Native Output · 30-Second Single-Pass · 6-Shot AI Director · 12 @reference Assets · Native Stereo Audio · Apache 2.0 Open Source

01 — Prompt Anatomy

What Makes a Wan 3.0 Prompt Different

Wan 3.0 prompts control an entire video production pipeline — camera movement, character appearance, scene lighting, audio mix, and multi-shot structure — all from a single text input. Unlike image prompts or standard chatbot instructions, a well-structured Wan 3.0 prompt can reference up to 12 uploaded assets using @reference syntax, and trigger automated multi-shot editing via AI Director mode.

The anatomy of a complete prompt follows this five-segment structure:

[Scene + Lighting] -> Camera [Movement] -> [Subject + Appearance] -> [Audio Tone] -> @Image1 @Audio1

Segment | Purpose | Example
Scene + Lighting | Sets environment, time of day, atmosphere | rain-soaked Tokyo street at night, neon reflections on wet pavement
Camera Movement | Defines framing and physical motion of the camera | Camera slowly pushes forward into frame
Subject + Appearance | Describes character or object in specific detail | woman in red trench coat, dark shoulder-length hair
Audio Tone | Directs dialogue, ambient sound, score, and effects | ambient city traffic, distant jazz saxophone
@Reference Tags | Anchors uploaded assets to specific prompt roles | @Image1 @Video1 @Audio1

02 — Formula System

The Three Core Formulas

Every strong Wan 3.0 prompt maps to one of three formulas. Pick the right formula for your mode before writing a single word — it determines what Wan 3.0 optimizes for.

1. Base Formula: Text-to-Video & Image-to-Video

T2V · I2V
[Shot Type] of [Subject + Appearance], [Action], [Setting + Lighting]. Camera [Movement]. [Audio description]. [Style note]. 4K.
Medium tracking shot of a man in a navy wool coat standing at a fog-covered harbor pier at sunrise, gazing toward open water. Camera slowly cranes upward to reveal the full skyline. Low distant foghorns, gentle waves, orchestral score swells softly. Cinematic, 35mm film grain. 4K.
In T2V, your prompt is the only visual reference Wan 3.0 has. Lead with shot type, then subject appearance, then action — never lead with adjectives.
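
For batch work, the base formula can be filled in from structured fields. The sketch below is a hypothetical helper: the `BasePrompt` class and its field names are illustrative and not part of Wan 3.0. It only assembles the prompt text in the order the formula gives.

```python
from dataclasses import dataclass


@dataclass
class BasePrompt:
    """Fields of the base formula; names are illustrative, not a Wan 3.0 API."""
    shot_type: str
    subject: str
    action: str
    setting: str
    camera: str
    audio: str
    style: str = ""
    resolution: str = "4K"

    def render(self) -> str:
        # Shot type leads, followed by subject, action, and setting.
        lead = f"{self.shot_type} of {self.subject}, {self.action}, {self.setting}."
        parts = [lead, f"Camera {self.camera}.", f"{self.audio}."]
        if self.style:
            parts.append(f"{self.style}.")
        parts.append(f"{self.resolution}.")
        return " ".join(parts)


prompt = BasePrompt(
    shot_type="Medium tracking shot",
    subject="a man in a navy wool coat",
    action="gazing toward open water",
    setting="fog-covered harbor pier at sunrise",
    camera="slowly cranes upward to reveal the full skyline",
    audio="Low distant foghorns, gentle waves",
    style="Cinematic, 35mm film grain",
).render()
print(prompt)
```

Keeping the fields separate makes it harder to open a prompt with a style adjective by accident, because `shot_type` is always rendered first.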
2. AI Director Formula: Multi-Shot Sequences

AI Director · 6-shot
Shot 1 [0–5s]: [Shot type] — [Scene content].
Shot 2 [5–12s]: [Shot type] — [Scene content].
Shot 3 [12–20s]: [Shot type] — [Scene content].
Shot 4 [20–28s]: [Shot type] — [Scene content].
Overall tone: [audio and mood description].
Character reference: @Image1.
Each shot gets its own shot type and time range. Wan 3.0 handles transitions and cross-shot consistency automatically. You do not need to describe transitions — they are inferred from shot-to-shot context.
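
Computing the time ranges by hand is error-prone, so the sketch below derives them from per-shot durations. `director_prompt` is a hypothetical helper, not a Wan 3.0 function. The 6-shot and 30-second limits are taken from this guide, and the dash characters are copied from the formula above.

```python
def director_prompt(shots, tone, character_ref=None):
    """Render (duration_s, shot_type, content) tuples as an AI Director prompt.

    Hypothetical helper: the limits below are the ones quoted in this guide.
    """
    if not 1 <= len(shots) <= 6:
        raise ValueError("AI Director mode takes 1 to 6 shots")
    lines, start = [], 0
    for number, (duration, shot_type, content) in enumerate(shots, start=1):
        end = start + duration
        # \u2013 (en dash) and \u2014 match the separators used in the formula.
        lines.append(f"Shot {number} [{start}\u2013{end}s]: {shot_type} \u2014 {content}.")
        start = end  # the next shot begins where this one ends
    if start > 30:
        raise ValueError(f"total duration {start}s exceeds 30s")
    lines.append(f"Overall tone: {tone}.")
    if character_ref:
        lines.append(f"Character reference: @{character_ref}.")
    return "\n".join(lines)


text = director_prompt(
    [
        (5, "Wide", "city skyline at dusk"),
        (7, "Medium", "character enters building lobby"),
        (8, "Close-up", "face in elevator reflection"),
    ],
    tone="sparse score, no dialogue",
    character_ref="Image1",
)
print(text)
```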
3. Reference-Anchored Formula: Asset-Driven Generation

I2V · R2V · @reference
[Scene description]. Character appearance: @Image1. Camera style reference: @Video1. Background music tone: @Audio1. [Style and resolution notes].
Tag each reference by type and index: @Image1 through @Image9, @Video1 through @Video3, @Audio1 through @Audio3. Wan 3.0 resolves each tag to the matching uploaded asset. Upload order within each type determines the index number.
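
A mistyped tag is easy to miss in a long prompt. The following sketch scans a prompt for @reference tags and flags any index outside the ranges given above. The function name and messages are hypothetical. The 12-asset cap is the figure quoted elsewhere in this guide, and the check assumes that cap counts distinct tags.

```python
import re

# Highest index per tag type, as listed in this guide.
MAX_INDEX = {"Image": 9, "Video": 3, "Audio": 3}
MAX_ASSETS = 12  # overall cap quoted in this guide (assumed to count distinct tags)

TAG_PATTERN = re.compile(r"@(Image|Video|Audio)(\d+)")


def check_reference_tags(prompt: str) -> list[str]:
    """Return a list of problems with the @reference tags in a prompt."""
    problems = []
    tags = {(kind, int(index)) for kind, index in TAG_PATTERN.findall(prompt)}
    for kind, index in sorted(tags):
        if not 1 <= index <= MAX_INDEX[kind]:
            problems.append(f"@{kind}{index} is outside 1..{MAX_INDEX[kind]}")
    if len(tags) > MAX_ASSETS:
        problems.append(f"{len(tags)} distinct assets referenced, limit is {MAX_ASSETS}")
    return problems


print(check_reference_tags("Character appearance: @Image1. Music: @Audio4."))
```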

03 — Camera Prompts

Camera Control — Movement, Shot Type & Framing

Camera language is the single biggest lever in Wan 3.0 output quality. Prompts with explicit shot type and named camera movement consistently outperform generic descriptions. Use the vocabulary below exactly — Wan 3.0 is trained on these terms.

Shot Type Prompts

Always lead your prompt with the shot type. This anchors the spatial composition before any other instruction is applied.

Wide shot · Medium shot · Close-up · Extreme close-up · Over-the-shoulder shot · POV shot · Aerial shot · Dutch angle · Two-shot · Insert shot
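
Whether a prompt really opens with a shot type can be checked mechanically before submitting it. This is a minimal sketch with a hypothetical function name. It matches on the stem of each listed term, so an opener like "Medium tracking shot" still counts as a medium shot. That stem matching is an assumption about what should pass, not documented Wan 3.0 behavior.

```python
import re

# Stems of the shot types listed above, mapped to the full term.
# Matching on the stem lets "Medium tracking shot" count as a medium shot.
SHOT_STEMS = {
    "extreme close-up": "Extreme close-up",
    "close-up": "Close-up",
    "wide": "Wide shot",
    "medium": "Medium shot",
    "over-the-shoulder": "Over-the-shoulder shot",
    "pov": "POV shot",
    "aerial": "Aerial shot",
    "dutch angle": "Dutch angle",
    "two-shot": "Two-shot",
    "insert": "Insert shot",
}


def leading_shot_type(prompt: str):
    """Return the listed shot type the prompt opens with, or None."""
    text = prompt.strip().lower()
    for stem, name in SHOT_STEMS.items():
        # \b stops "wide" from matching "widely".
        if re.match(re.escape(stem) + r"\b", text):
            return name
    return None


print(leading_shot_type("Medium tracking shot of a woman in a cream linen dress"))
print(leading_shot_type("Beautiful cinematic video of a woman walking."))
```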

Camera Movement Prompts

Name camera movements with standard cinematography terms. Generic descriptions like "moving camera" produce inconsistent results.

- Slow push in          -> camera moves toward subject
- Tracking shot         -> camera moves alongside moving subject
- Dolly zoom            -> camera dollies while zooming in the opposite direction, creates a vertigo effect
- Crane up              -> camera rises vertically, reveals skyline or context
- Crane down            -> camera descends, closes in on subject
- Orbit                 -> camera circles subject (specify clockwise/counterclockwise)
- Handheld              -> intentional shake, naturalistic, documentary feel
- Static locked         -> no movement, deliberate stillness
- Whip pan              -> fast horizontal snap between subjects
- Dutch tilt            -> angled frame, psychological unease
- Rack focus            -> shifts focus between foreground and background
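
To confirm that a draft names at least one movement from the list above, a literal search is enough as a first pass. The sketch below is hypothetical and deliberately strict. It finds only the exact terms, so inflected phrasing such as "pushes in" or "orbits" is not detected.

```python
import re

# Movement terms from the list above, lowercased for matching.
CAMERA_MOVES = [
    "slow push in", "tracking shot", "dolly zoom", "crane up", "crane down",
    "orbit", "handheld", "static locked", "whip pan", "dutch tilt", "rack focus",
]


def named_camera_moves(prompt: str) -> list[str]:
    """Return the listed movement terms that appear verbatim in the prompt."""
    text = prompt.lower()
    return [
        move for move in CAMERA_MOVES
        if re.search(r"\b" + re.escape(move) + r"\b", text)
    ]


print(named_camera_moves("Medium tracking shot of a runner. Handheld energy."))
print(named_camera_moves("A moving camera follows the subject."))
```

An empty result suggests the draft relies on a generic description and may be worth rewording.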

Combined Camera Prompt Example

Extreme close-up of coffee being poured into a white ceramic cup on a marble countertop, morning light through frosted glass. Camera slowly orbits counterclockwise while rack focusing from steam to the cup surface. Rich coffee pour sounds, quiet kitchen ambience, gentle jazz in background. Warm golden color grade. 4K.

04 — Multi-Shot / AI Director

AI Director Mode — Writing 6-Shot Sequences

AI Director mode lets you specify up to 6 independent shots in a single generation pass. Wan 3.0 handles framing, transitions, and cross-shot character and environment consistency automatically — you define what happens in each shot, not how the cuts work.

Rules for AI Director Prompts

Four things to include in every AI Director prompt:

1. Shot number and time range for each shot, for example Shot 1 [0–6s]
2. Shot type for each shot (Wide, Medium, Close-up, etc.)
3. A brief scene description per shot — one to two sentences max
4. An overall tone line at the end covering audio and mood

Full AI Director Example — Product Launch

Shot 1 [0–6s]: Aerial shot — overhead view of a minimalist product on white marble, soft studio light, camera slowly descends.
Shot 2 [6–12s]: Medium shot — hands carefully unbox the product, crisp packaging sounds, warm studio light.
Shot 3 [12–18s]: Close-up — product features highlighted with subtle camera orbit, macro lens detail.
Shot 4 [18–24s]: Over-the-shoulder shot — person using the product naturally in a home setting, golden hour light.
Shot 5 [24–30s]: Wide shot — product placed on table, camera pulls back slowly to reveal lifestyle context, fade to brand color.
Overall tone: minimal ambient sound design, no dialogue, understated orchestral swell at Shot 5. Color grade: clean, warm whites. 4K.
Character reference: @Image1.
Do not describe transitions between shots. Wan 3.0 infers transitions from the shot content. Adding "cut to" or "then" between shots can confuse the model and break sequence structure.
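
The rules in this section lend themselves to an automated pre-flight check. The sketch below is a hypothetical linter, not a Wan 3.0 tool. It checks shot numbering, time ranges, the 6-shot and 30-second limits, transition words, and the overall tone line. Requiring each range to start where the previous one ended is an assumption drawn from the examples, and the transition-word check can produce false positives when "then" appears inside a scene description. Shot type is not validated here.

```python
import re

# "Shot N [start-end s]: ..." with either an en dash or a hyphen in the range.
SHOT_LINE = re.compile(r"^Shot (\d+) \[(\d+)[\u2013-](\d+)s\]: \S")
TRANSITION_WORDS = re.compile(r"\b(cut to|then)\b", re.IGNORECASE)


def lint_director_prompt(prompt: str) -> list[str]:
    """Check an AI Director prompt against the rules listed in this guide."""
    problems = []
    lines = [line.strip() for line in prompt.splitlines() if line.strip()]
    shots = [(line, SHOT_LINE.match(line)) for line in lines if line.startswith("Shot ")]
    if not 1 <= len(shots) <= 6:
        problems.append(f"expected 1 to 6 shots, found {len(shots)}")
    previous_end = 0
    for position, (line, match) in enumerate(shots, start=1):
        if TRANSITION_WORDS.search(line):
            problems.append(f"shot line {position} describes a transition")
        if match is None:
            problems.append(f"shot line {position} lacks a number and time range")
            continue
        number, start, end = (int(group) for group in match.groups())
        if number != position:
            problems.append(f"shot line {position} is numbered {number}")
        if start != previous_end or end <= start:
            problems.append(
                f"shot line {position} range {start}-{end}s does not follow on from {previous_end}s"
            )
        previous_end = end
    if previous_end > 30:
        problems.append(f"sequence runs {previous_end}s, limit is 30s")
    if not any(line.startswith("Overall tone:") for line in lines):
        problems.append("missing 'Overall tone:' line")
    return problems


weak = "Shot 1: Wide shot of city. Cut to Shot 2: Medium shot of character entering."
for problem in lint_director_prompt(weak):
    print(problem)
```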

05 — Character Consistency

Identity Lock — Consistent Characters Across Sessions

Identity Lock saves a character's visual profile after the first generation. Referencing that profile in later sessions produces the same character in new scenes without re-describing appearance. This is Wan 3.0's answer to one of the biggest pain points in AI video: character drift across clips.

First Generation — Establishing the Character

On the first generation, describe the character in full physical detail. The more specific, the stronger the Identity Lock profile.

Medium shot of a woman, early 30s, East Asian features, straight shoulder-length black hair, warm skin tone, wearing a structured white blazer over a cream silk top. She stands at a glass-walled office, afternoon city light behind her. Camera holds static. Audio: quiet office ambience, no dialogue. 4K. [Save as: brand-spokesperson-01]

Subsequent Generations — Calling the Profile

Character: @brand-spokesperson-01. Wide shot of the same character walking through a sunlit park, casual weekend outfit — dark jeans, white linen shirt. Camera tracks alongside her. Audio: birdsong, light acoustic guitar, no dialogue. 4K.
You do not need to re-describe physical appearance when using a saved Identity Lock profile. Wan 3.0 pulls the full character definition from the saved profile. Only describe what changes — outfit, setting, action.

06 — Audio & Lip-Sync

Native Audio Prompts & Lip-Sync Across 12 Languages

Wan 3.0 generates audio — dialogue, ambient sound, effects, and background score — in the same pass as the video. No post-production audio sync is required. Your prompt controls every layer of the audio output separately.

Audio Prompt Structure

Audio: [Dialogue description or "no dialogue"]. [Ambient environment sound]. [Sound effects]. [Score/music description]. [Room tone or acoustic space].
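
The five audio layers can be assembled in a fixed order so none is forgotten. The sketch below uses a hypothetical `audio_segment` helper. Writing out "no dialogue" and "no music" when those layers are omitted is a design choice that mirrors the examples in this guide, not documented Wan 3.0 behavior.

```python
def audio_segment(dialogue=None, ambient=None, effects=None, score=None, room=None):
    """Join the audio layers in the order: dialogue, ambient, effects, score, room.

    A missing dialogue or score layer is written out explicitly as
    "no dialogue" / "no music"; other missing layers are skipped.
    """
    layers = [dialogue or "no dialogue", ambient, effects, score or "no music", room]
    # Trim whitespace and trailing periods so each layer ends with exactly one period.
    cleaned = (layer.strip().rstrip(".") for layer in layers if layer)
    parts = [part for part in cleaned if part]
    return "Audio: " + " ".join(f"{part}." for part in parts)


print(audio_segment(ambient="Studio silence", score="Cinematic bass hum"))
```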

Audio Prompt Examples by Use Case

- Product reveal:
  Audio: no dialogue. Studio silence. Subtle product design sound (click, texture). Cinematic bass hum. Luxury brand aesthetic.

- Spokesperson (lip sync from @Audio1):
  Audio: phoneme-accurate lip sync to @Audio1. Natural room reverb, small studio acoustic. No background music.

- Short film scene:
  Audio: sparse dialogue between two characters — tense, quiet. Interior ambient hum. No score until Shot 4.

- Social content:
  Audio: trending upbeat lo-fi instrumental. No dialogue. Satisfying product interaction sounds at key moments.

- Multilingual ad (12 languages supported):
  Audio: lip sync to @Audio1 (Spanish voiceover). Phoneme-accurate. Neutral room tone. No music.
Lip-sync works at the phoneme level across 12 languages including English, Spanish, French, German, Japanese, Korean, Mandarin, Arabic, and more. Upload your voiceover as @Audio1 and reference it in the audio segment of your prompt.

07 — Copy-Paste Templates

Ready-to-Use Wan 3.0 Prompt Templates

Adjust the bracketed fields and generate. Every template is structured for the right formula and audio syntax.

📦

Product Commercial — Luxury Reveal

T2V · 4K · 30s
Slow-motion product reveal of a [product name] rotating on a white marble pedestal, soft directional studio lighting, subtle smoke rising from below. Camera orbits clockwise at 45-degree angle. Close-up on key design details at 15s mark. Audio: deep cinematic bass hum, subtle product texture sound, no dialogue. Luxury brand aesthetic, desaturated background, product in full saturation. 4K.
📱

Social Content — TikTok / Reels (9:16)

T2V · 9:16 · 15s
Vertical 9:16 frame. Close-up of [subject action] in [setting], natural light. Handheld energy, dynamic. Quick cuts implied by subject movement. Audio: trending upbeat lo-fi music, no dialogue, satisfying sound effect at key moment. Text overlay space at bottom third. Warm vivid color grade. 4K. 15 seconds.
🎬

Short Film — 4-Shot Narrative (AI Director)

AI Director · 4K · 28s
Shot 1 [0–7s]: Wide establishing — [location], [time of day], no characters yet.
Shot 2 [7–15s]: Medium — [character description] enters frame, [action].
Shot 3 [15–22s]: Close-up — [emotional beat, object detail, or reaction].
Shot 4 [22–28s]: Wide pullback — [resolution or open ending].
Overall tone: sparse [genre] score, minimal dialogue, naturalistic ambient sound. 4K.
Character: @Image1.
🛒

E-Commerce Product Demo — Unboxing

I2V · @Image1 required
Product reference: @Image1. The product is unboxed on a clean white surface, hands gently lift it from clean packaging. Camera slowly pushes in as the product is held up for inspection in bright studio light. Audio: satisfying unboxing sounds — tissue paper, crisp packaging material, subtle product weight sounds. No music, no dialogue. 4K.
🏢

Corporate Brand Video — 5-Shot (AI Director)

AI Director · 4K · 30s
Shot 1 [0–6s]: Aerial — [company HQ or city skyline], sunrise, camera descends slowly.
Shot 2 [6–14s]: Medium — diverse team collaborating around a table, natural office light, no posed expressions.
Shot 3 [14–20s]: Close-up — hands at keyboard or on product, focused detail work.
Shot 4 [20–26s]: Wide — open-plan office, energy and motion.
Shot 5 [26–30s]: Wide pullback — group moment, faces visible, natural smiles.
Overall tone: corporate orchestral lift, subtle, no dialogue. 4K.
Brand color reference: @Image1 (logo).
🌍

Multilingual Spokesperson Ad — Lip Sync

I2V · @Image1 + @Audio1
Character appearance: @Image1. Medium close-up — character speaks directly to camera in a modern, softly lit home studio setting. Phoneme-accurate lip sync to @Audio1 ([language] voiceover). Audio: natural room tone, slight reverb, no background music. Clean neutral background, no distractions. 4K. Suitable for multilingual ad campaign.
🏃

Fitness / Action Content

T2V · 4K · 20s
Medium tracking shot of an athlete in [sport/activity], outdoor setting, [time of day] light. Camera tracks alongside at matched speed. High-energy handheld feel with intentional motion blur on fast movement. Audio: fast rhythmic music — no lyrics — rising intensity, footstep sounds synced to movement, no dialogue. High contrast color grade, vivid colors. 4K. 20 seconds.
🍽️

Food & Beverage — Cinematic Pour

T2V · 4K · 12s
Extreme close-up of [food or drink] being prepared on [surface], natural window light from left. Slow motion at 60fps. Camera holds static then slow push in at 6s. Audio: satisfying preparation sounds — sizzle, pour, texture — no music, no dialogue. Warm golden color grade, shallow depth of field. 4K. 12 seconds.

08 — Weak vs Strong

Prompt Mistakes — and How to Fix Them

The most common Wan 3.0 prompt failures share a pattern: too vague, leading with adjectives, or missing camera and audio direction entirely. Here are four common mistakes with corrected versions.

Mistake 1 — Leading with adjectives instead of shot type

Weak
Beautiful cinematic video of a woman walking.

Adjectives do not anchor spatial composition. Wan 3.0 has no shot type to work from.

Strong
Medium tracking shot of a woman in a cream linen dress walking along a coastal cliff path at golden hour. Camera tracks alongside at walking pace. 4K.

Shot type first, then subject with detail, then action, then setting and light.

Mistake 2 — No camera movement specified

Weak
Close-up of coffee being poured. Morning light. Warm.

Strong
Extreme close-up of espresso being poured into a white ceramic cup on a marble surface, morning window light. Camera holds static then slowly pushes in at 5s. Audio: rich pour sounds, quiet kitchen ambience. 4K.

Mistake 3 — Missing audio direction

Weak
A woman running through a forest at dawn. 4K.

Strong
Medium tracking shot of a woman trail running through a dense green forest at dawn, mist rising. Camera tracks alongside at running pace, handheld energy. Audio: rhythmic footsteps on earth, birdsong, gentle instrumental score rising — no dialogue. 4K.

Mistake 4 — Describing transitions in AI Director mode

Weak
Shot 1: Wide shot of city. Cut to Shot 2: Medium shot of character entering. Then close-up of their face.

"Cut to" and "Then" confuse AI Director's sequence model.

Strong
Shot 1 [0–6s]: Wide — city skyline at dusk.
Shot 2 [6–14s]: Medium — character enters building lobby, confident stride.
Shot 3 [14–20s]: Close-up — face in elevator reflection, determined expression.

09 — Keyword Cheat Sheet

Wan 3.0 Prompt Keyword Reference

Copy these terms directly into your prompts. Wan 3.0 maps each to specific generation behavior.

Shot Types

Wide shot · Medium shot · Close-up · Extreme close-up · Over-the-shoulder · POV shot · Aerial shot · Dutch angle · Two-shot · Insert shot

Camera Movement

Slow push in · Tracking shot · Dolly zoom · Crane up · Crane down · Orbit clockwise · Handheld · Static locked · Whip pan · Rack focus

Lighting

Golden hour · Blue hour · Studio softbox · Natural window light · Neon reflections · Overcast diffused · Hard directional · Backlit silhouette · Candlelight · Fluorescent interior

Audio Cues

No dialogue · Natural room tone · Phoneme-accurate lip sync · Ambient [environment] · Orchestral swell · Lo-fi instrumental · Cinematic bass hum · Sound design only · No music · Sparse score

Style / Grade

35mm film grain · Cinematic · Warm golden grade · High contrast · Desaturated muted · Vivid saturated · Clean studio white · Shallow depth of field · Anamorphic lens flare · Slow motion 60fps

@Reference Syntax

@Image1 to @Image9 · @Video1 to @Video3 · @Audio1 to @Audio3 · Character: @Image1 · Camera style: @Video1 · Background music: @Audio1 · Lip sync to: @Audio1 · Brand color: @Image1

10 — FAQ

Prompt Guide — Frequently Asked Questions

How long should a Wan 3.0 prompt be?

For T2V and I2V, 50–120 words is the optimal range. Long enough to anchor shot type, camera movement, subject appearance, and audio direction — short enough to stay unambiguous. AI Director prompts run longer because each shot gets its own line, but each individual shot description should stay under 20 words.
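
A quick length check can be run on drafts before generation. The sketch below counts whitespace-separated words and compares the total with the 50 to 120 word range above. The function names are hypothetical, and counting by whitespace is an assumption, since this guide does not say how Wan 3.0 measures prompt length.

```python
import re


def word_count(text: str) -> int:
    """Count whitespace-separated tokens that contain a letter or digit."""
    return sum(1 for token in text.split() if re.search(r"\w", token))


def check_length(prompt: str, low: int = 50, high: int = 120) -> str:
    """Compare a T2V/I2V prompt with the word range suggested in this guide."""
    count = word_count(prompt)
    if count < low:
        return f"{count} words: below {low}"
    if count > high:
        return f"{count} words: above {high}"
    return f"{count} words: in range"


print(check_length("Close-up of coffee being poured. Morning light. Warm."))
```

For AI Director prompts, the same `word_count` can be applied to each shot description with a limit of 20.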

Does prompt order matter in Wan 3.0?

Yes, significantly. Wan 3.0 applies heavier weight to the first and last elements in your prompt. Lead with shot type and subject appearance — these are the highest-leverage positions. Audio and style notes work best at the end of the prompt after the main visual description is complete.

Can I use Chinese prompts in Wan 3.0?

Yes. Wan 3.0 processes prompts in both English and Chinese natively. For multilingual lip-sync output, provide the voiceover as @Audio1 in the target language and specify the language in the audio section of your prompt. The model supports 12 languages for phoneme-level lip sync.

How many @reference assets can I attach per generation?

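Up to 12 reference assets per generation. Tags are available in three ranges: @Image1 through @Image9 (images), @Video1 through @Video3 (video clips), and @Audio1 through @Audio3 (audio files). These ranges cover 15 possible tags, so the 12-asset limit presumably applies to the combined total across all three types. Each tag in your prompt maps to the uploaded asset by type and index number in order of upload.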

Why is my multi-shot output losing character consistency between shots?

This usually happens in one of two ways: either no character reference image was attached, or the character description differs too much from one shot block to the next. In AI Director mode, attach your character photo as @Image1 and reference it once with "Character: @Image1" on its own line after the overall tone line, not inside individual shots. Let Identity Lock handle consistency across shots automatically.

What resolution should I specify in my prompt?

Add "4K" at the end of every prompt to activate native 4K output. For vertical social content, specify "9:16" alongside the 4K note. For standard widescreen, "16:9" is the default and does not need to be stated.

How do I generate a 30-second clip in a single pass?

Wan 3.0 supports up to 30 seconds in a single generation pass. To target the full 30 seconds, use AI Director mode with 4–6 shots filling the full time range (for example, Shot 1 [0–6s] through Shot 5 [24–30s]). Single-shot T2V prompts can also reach 30 seconds — specify "30 seconds" explicitly at the end of your prompt.

Ready to Generate?

Apply any template from this guide in the Wan 3.0 preview workflow, then follow updates as the full release opens up.