Video generation model by xAI on the autoregressive Aurora engine. Turns your image into a clip with built-in audio: speech, effects, and music. Up to 720p, 24 fps, up to 15 seconds, and 7 aspect ratios.
Portrait of a woman by a cafe window: she smiles warmly and waves, the camera slowly pushes in to a close-up, with quiet city ambience and soft background music
Grok Imagine Video 1.5 by xAI shipped in late May 2026 (build dated May 30, 2026). The Grok Imagine line debuted in first place on the Artificial Analysis Video Arena — a blind pairwise comparison of video generators — ahead of Google Veo 3.1, Kling 2.5 Turbo, and Runway Gen-4.5. The arena is a live leaderboard and changes over time.
Source: Artificial Analysis Video Arena (artificialanalysis.ai). Rankings update as models and votes are added.
Four standout strengths of the xAI model
Grok generates video and audio together: dialogue with intonation and pauses, ambient sound, effects, and background music — synced with the visuals, with no separate audio step.
Upload a photo and the model turns it into a living clip with natural motion, camera movement, and transitions, keeping the composition and character recognizable.
The Aurora engine builds video frame by frame: smooth facial expressions, consistent lighting, and plausible motion physics deliver a lifelike picture with minimal jitter or scene drift.
Take the last frame of a finished clip and continue the scene — motion, character position, and lighting stay seamless for longer stories.
What you need to know about Grok Imagine Video 1.5
Crisp output up to 720p (1280×704), plus a fast 480p draft mode for tests.
Smooth cinematic motion at the standard 24 frames-per-second film rate.
Pick a clip length from 1 to 15 seconds; the default is about 8 seconds, and longer scenes are built via continuation.
1:1, 16:9, 9:16, 4:3, 3:4, 3:2, and 2:3 — for any platform and orientation.
Speech, sound effects, and music are generated together with the video in one pass.
The finished clip is universal MP4 — ready for social, ads, and editing.
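The limits quoted on this page can be captured in a small pre-flight check. This is only an illustrative sketch: no Clipia or xAI API is described here, so `validate_settings`, `frame_count`, and the constant names are hypothetical. The values themselves (24 fps, 1 to 15 seconds, seven aspect ratios, 720p and 480p) are taken from the specs on this page.

```python
# Hypothetical helper: checks clip settings against the limits quoted on this page.
FPS = 24                              # frames per second
MIN_SECONDS, MAX_SECONDS = 1, 15      # length of a single generation
ASPECT_RATIOS = {"1:1", "16:9", "9:16", "4:3", "3:4", "3:2", "2:3"}
RESOLUTIONS = {"720p", "480p"}        # 480p is the fast draft mode


def validate_settings(duration_s, aspect_ratio, resolution="720p"):
    """Return a list of human-readable problems; an empty list means the settings fit."""
    problems = []
    if not MIN_SECONDS <= duration_s <= MAX_SECONDS:
        problems.append(f"duration must be {MIN_SECONDS}-{MAX_SECONDS} s, got {duration_s}")
    if aspect_ratio not in ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio: {aspect_ratio}")
    if resolution not in RESOLUTIONS:
        problems.append(f"unsupported resolution: {resolution}")
    return problems


def frame_count(duration_s):
    """Number of frames in a clip of the given length at 24 fps."""
    return round(duration_s * FPS)
```

For example, a default 8-second vertical clip passes the check and contains 192 frames, while a 20-second request would be flagged and would have to be built via continuation instead.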
From image to a clip with audio in a couple of minutes
Choose the photo or frame you want to animate. It becomes the first frame of your clip.
Tell the model what happens in the frame: action, camera, mood, lines, and sound. Set the duration and aspect ratio.
Grok Imagine Video 1.5 builds a clip with built-in audio, typically within a couple of minutes. Download it as MP4 for commercial use.
Four scenarios where Grok Imagine Video 1.5 shines
Vertical 9:16 clips for Reels, TikTok, and YouTube Shorts — with motion and audio out of the box.
Product clips, promos, and ad creatives with sound — fast and without a film crew.
Turn static shots, portraits, and artwork into living scenes with natural motion.
Talking characters, ambient scenes, and music clips: wherever synced sound matters more than silent video.
The convenient way to access the xAI model
No waitlists, no API keys, no setup. Open the editor, upload a photo, and create right away.
Credits are charged per generation, so there is no mandatory subscription and no paying for quota you don't use.
Every clip you create can be used in ads, client projects, and published content — no royalties.
Everything you need to know about Grok Imagine Video 1.5 on Clipia
It's a video generation model by xAI (Elon Musk's team) on the autoregressive Aurora engine. On Clipia it turns an uploaded image into a short clip with built-in audio — dialogue, effects, and background music.
The developer is xAI. The Grok Imagine Video 1.5 Preview build is dated May 30, 2026, and runs on the in-house Aurora engine.
Yes. Video and audio are created together in a single pass: speech, ambient sound, sound effects, and music are synced to the visuals, with no separate audio step needed.
On Clipia, Grok Imagine Video 1.5 runs in image-to-video mode: you upload an image and the model animates it while preserving the composition and recognizability of the frame.
Up to 720p (1280×704) at 24 frames per second; there is also a fast 480p draft mode for tests.
From 1 to 15 seconds per generation (6 to 15 seconds is ideal; the default is about 8 seconds). Longer scenes are built by continuing the clip.
Seven: 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, and 2:3 — for landscape, vertical (Reels, TikTok, Shorts), and square clips.
Yes. The continuation feature takes the last frame of a clip and builds the scene further, preserving motion, character position, and lighting.
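As a rough planning aid, a longer scene can be split into evenly sized segments before generating. The sketch below is hypothetical (`plan_segments` is not part of any Clipia or xAI API) and assumes each continued segment is subject to the same 15-second cap as a single generation, which this page does not state explicitly.

```python
import math

MAX_SECONDS = 15  # per-generation cap quoted on this page; assumed to apply to continuations too


def plan_segments(total_seconds, max_len=MAX_SECONDS):
    """Split a target length (whole seconds) into the fewest segments of near-equal length."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    count = math.ceil(total_seconds / max_len)       # fewest segments that fit under the cap
    base, extra = divmod(total_seconds, count)       # spread the remainder over the first segments
    return [base + (1 if i < extra else 0) for i in range(count)]
```

For a 40-second story, `plan_segments(40)` returns `[14, 13, 13]`: one initial clip plus two continuations, each starting from the last frame of the previous segment.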
Grok Imagine Video 1.5 shipped in late May 2026 (build dated May 30, 2026). The Grok Imagine line debuted at #1 on the Artificial Analysis Video Arena (in both text-to-video and image-to-video), ahead of Veo 3.1, Kling 2.5 Turbo, and Runway Gen-4.5. The arena is a live leaderboard and changes over time.
Fine details such as typography, fabric textures, and complex patterns may drift slightly during heavy motion, and very dense scenes are less stable than simple ones with clear action. The model is still in Preview status.