Clipia.
© 2026 Clipia.ai. All rights reserved.
Wan 2.7 Video by Alibaba

Next-generation AI video generation from Alibaba's Tongyi Lab. Four modes: Text-to-Video, Image-to-Video, Reference-to-Video, and VideoEdit. Output up to 1080p and up to 15 seconds, with thinking mode and audio support.

1080p resolution
15 sec max duration
from 2 credits/sec
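
The price is quoted as "from 2 credits/sec", so the listed rate gives only a lower bound on what a clip costs. As a rough sketch, assuming the cost scales linearly with duration (the page does not say how the rate varies by mode or resolution), the minimum can be estimated as below. The function name `min_credits` is made up for illustration and is not part of any Clipia API.

```python
# Assumption: cost = seconds * rate. The page only states "from 2 credits/sec",
# so the real rate for a given mode or resolution may be higher.
MIN_CREDITS_PER_SECOND = 2
MIN_DURATION_S = 2    # shortest clip length listed on this page
MAX_DURATION_S = 15   # longest clip length listed on this page


def min_credits(duration_s: int, rate: int = MIN_CREDITS_PER_SECOND) -> int:
    """Return a lower-bound credit estimate for a clip of the given length."""
    if not MIN_DURATION_S <= duration_s <= MAX_DURATION_S:
        raise ValueError(
            f"duration must be between {MIN_DURATION_S} and {MAX_DURATION_S} seconds"
        )
    return duration_s * rate


print(min_credits(5))   # 10
print(min_credits(15))  # 30
```

Under these assumptions a full 15-second clip costs at least 30 credits. Treat the result as a floor, not a quote.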
Prompt

POV shot from inside a lunar lander cockpit, the spacecraft slowly crests over the far side of the Moon. The camera reveals an impossible sight: four green-skinned aliens casually sitting around a campfire on the lunar surface, drinking beer from Earth-brand bottles. No helmets, just Hawaiian shirts. One alien raises his bottle toward the camera in a friendly toast. Earth visible on the horizon. Warm orange campfire glow contrasting with cold blue moonlight. Steven Spielberg style, photorealistic.

Get Started with Wan 2.7

Three steps to your first video

1

Select Mode and Upload

Choose T2V, I2V, R2V, or VideoEdit. Upload source images, videos, or audio references depending on the mode.

2

Write Prompt and Configure

Describe the scene or editing instruction. Set resolution, duration, aspect ratio, and toggle thinking mode.

3

Generate and Download

Hit generate and your video, at up to 1080p, is typically ready in 30-90 seconds. Download it for any commercial use.

Four Modes of Wan 2.7

Choose the right mode for your creative task

T2V

Text-to-Video

Generate video from a text prompt. Thinking mode for complex scenes, audio track support, prompt extension via LLM

  • 720p / 1080p resolution
  • 2 to 15 seconds duration
  • 5 aspect ratios (16:9, 9:16, 1:1, 4:3, 3:4)
  • Thinking mode for better coherence
  • External audio track overlay

I2V

Image-to-Video

Animate any image or 9-grid composition. Control first and last frames, add driving audio for motion sync

  • Single image or 9-grid input
  • First and last frame control
  • 2 to 15 seconds duration
  • Driving audio for motion sync
  • 720p / 1080p output

R2V

Reference-to-Video

Use up to 5 references: images, video clips, and audio. Voice cloning, lip sync, and motion replication for character consistency

  • Up to 5 reference inputs
  • Image, video, and audio refs
  • Voice clone and lip sync
  • Motion replication from video
  • 2 to 10 seconds duration

VIDEOEDIT

VideoEdit

Edit existing video via text instructions. Style transfer, colorization, object replacement, and local edits without regenerating

  • Text instruction-based editing
  • Style transfer and colorization
  • Object and background replacement
  • Local area edits
  • 2 to 10 seconds source video
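
The per-mode limits listed above can be collected into a small pre-flight check. This is only a sketch of how one might encode them: the `Request` class and `validate` function are hypothetical names, not part of any Clipia or Wan API, and the limits are copied from this page rather than from an official specification. Resolution is checked only for T2V and I2V, and aspect ratio only for T2V, because those are the only modes for which the page lists them.

```python
from dataclasses import dataclass

# Limits as listed on this page (assumed to be accurate and complete).
ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4"}  # listed under T2V
RESOLUTIONS = {"720p", "1080p"}                         # listed under T2V and I2V
DURATION_LIMITS = {            # (min seconds, max seconds) per mode
    "T2V": (2, 15),
    "I2V": (2, 15),
    "R2V": (2, 10),
    "VideoEdit": (2, 10),      # here the limit applies to the source video
}
MAX_R2V_REFERENCES = 5


@dataclass
class Request:
    """Hypothetical container for generation settings."""
    mode: str
    duration_s: int
    resolution: str = "1080p"
    aspect_ratio: str = "16:9"
    reference_count: int = 0


def validate(req: Request) -> list[str]:
    """Return a list of problems; an empty list means the settings fit the limits."""
    if req.mode not in DURATION_LIMITS:
        return [f"unknown mode: {req.mode}"]
    errors = []
    low, high = DURATION_LIMITS[req.mode]
    if not low <= req.duration_s <= high:
        errors.append(f"{req.mode} duration must be {low}-{high} seconds")
    if req.mode in {"T2V", "I2V"} and req.resolution not in RESOLUTIONS:
        errors.append(f"unsupported resolution: {req.resolution}")
    if req.mode == "T2V" and req.aspect_ratio not in ASPECT_RATIOS:
        errors.append(f"unsupported aspect ratio: {req.aspect_ratio}")
    if req.mode == "R2V" and req.reference_count > MAX_R2V_REFERENCES:
        errors.append(f"R2V accepts at most {MAX_R2V_REFERENCES} references")
    return errors


print(validate(Request("T2V", 15)))                      # []
print(validate(Request("R2V", 12, reference_count=3)))   # one duration error
```

Returning a list of messages rather than raising on the first problem lets a caller show every issue at once before spending credits.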

Key Features of Wan 2.7

What sets Wan 2.7 Video apart from the previous generation

Native 1080p Output

Full HD video by default. Clean detail, no upscaling artifacts. Suitable for social media, ads, and professional projects

First & Last Frame Control

Set the start and end points of your animation precisely. Ensures smooth motion arcs and consistent composition across the clip

9-Grid Image Input

Upload a 3x3 grid of reference shots. The model reads them as a storyboard and generates coherent multi-element scenes

Voice Clone & Character Reference

Provide an image and a voice sample. The model creates a speaking character with consistent appearance and cloned voice across generations

Instruction-Based Video Editing

Describe what to change in natural language. Swap styles, recolor, replace objects, or refine specific regions without touching the rest

Use Cases

What Wan 2.7 Video is built for

Short-Form Content

TikTok, Reels, Shorts in any aspect ratio. Rapid iteration from text or a single reference image

Advertising & Product Demos

Consistent characters and voice across ad variations. VideoEdit mode for fast creative iteration

Visual Storytelling

Character consistency via R2V references. Voice clone for narration. 9-grid storyboard input for complex scenes

Post-Production & Editing

Restyle footage, fix color grading, swap backgrounds, or adjust local details without re-shooting

Why Use Wan 2.7 on Clipia

The advantages of generating through our platform

No API Keys or Setup

Start generating immediately. No registration with third-party services, no quota management, no infrastructure to maintain

All Four Modes in One Place

T2V, I2V, R2V, and VideoEdit available from the same interface with unified credit-based pricing

20+ Models Under One Roof

Switch between Wan 2.7, Kling 3, Seedance 2, Veo 3, and other models without changing platforms or managing separate accounts

Frequently Asked Questions

Answers to common questions about Wan 2.7 Video

Wan 2.7 Video is the latest video generation model from Alibaba's Tongyi Lab. It supports four modes: Text-to-Video, Image-to-Video, Reference-to-Video, and VideoEdit. Output up to 1080p at 2-15 seconds with thinking mode, audio support, and prompt extension.

Four modes: T2V generates video from text with optional audio overlay. I2V animates images including 9-grid compositions with first/last frame control. R2V uses up to 5 references (image, video, audio) for voice clone and motion replication. VideoEdit edits existing video via text instructions.

Text prompts up to 5,000 characters. Images (single or 9-grid). Video clips for R2V and VideoEdit (2-10 sec). Audio files for driving audio, voice cloning, and audio overlay. Negative prompts up to 500 characters.
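
The text limits above are easy to check before submitting. A minimal sketch follows, assuming the limits are counted in characters as Python's `len` counts them (the page does not say how the service counts multi-byte or combined characters). The name `check_prompts` is hypothetical.

```python
MAX_PROMPT_CHARS = 5000           # "Text prompts up to 5,000 characters"
MAX_NEGATIVE_PROMPT_CHARS = 500   # "Negative prompts up to 500 characters"


def check_prompts(prompt: str, negative_prompt: str = "") -> None:
    """Raise ValueError if either text exceeds the limits stated on this page."""
    # Assumption: len() matches how the service counts characters.
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(
            f"prompt is {len(prompt)} characters; the limit is {MAX_PROMPT_CHARS}"
        )
    if len(negative_prompt) > MAX_NEGATIVE_PROMPT_CHARS:
        raise ValueError(
            f"negative prompt is {len(negative_prompt)} characters; "
            f"the limit is {MAX_NEGATIVE_PROMPT_CHARS}"
        )


# Passes silently: both texts are well under the limits.
check_prompts("POV shot from inside a lunar lander cockpit", "blurry, low detail")
```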

VideoEdit takes an existing video and a text instruction, then applies the change: style transfer (anime, oil painting, etc.), colorization, object replacement, background swap, or local region editing. Source video must be 2-10 seconds.

Use T2V when you start from scratch with a text description. Thinking mode helps with complex multi-element scenes. Use I2V when you have a specific starting image and want to animate it with precise control over the first and last frames.

Wan 2.7 adds two new modes (R2V and VideoEdit), increases max duration to 15 seconds, enables thinking mode for complex scenes, supports 5 aspect ratios (added 4:3, 3:4), allows prompts up to 5,000 characters, and defaults to 1080p resolution.

Typical generation time is 30-90 seconds depending on mode, resolution, and duration. T2V with thinking mode may take slightly longer. VideoEdit and R2V with multiple references are usually in the 60-120 second range.

Create Videos with Wan 2.7

Four generation modes, 1080p output, and thinking mode from Alibaba Tongyi Lab

No credit card required
