Introduction: The HappyHorse Video Creation Ecosystem
In 2026, HappyHorse is building a complete AI video model family that spans text-to-video, image animation, multi-character reference generation, and precision video editing. Two capabilities set it apart from many other platforms: character-consistent reference workflows and long-duration video editing.
Many users still ask the same question when choosing among v1-text-to-video, v1-image-to-video, v1-reference-to-video, and v1-video-edit: what actually differs between them, and which model fits my task?
This article maps all four HappyHorse variants using their complete API parameter structures so you can quickly decide which route to call.
HappyHorse tool hub: /home/happy-horse
I. Family Snapshot: One Table to Understand All Four
| No. | Model version | Core function | Input type | Duration | Unique advantage | Best scenario |
|---|---|---|---|---|---|---|
| 1 | v1-text-to-video | Text to video | Prompt | 3-15s | Up to 15s + multi-ratio support | Concept clips, creative ads |
| 2 | v1-image-to-video | Image to video | 1 image + optional prompt | 3-15s | First-frame driven animation | Animate static visuals |
| 3 | v1-reference-to-video | Reference image to video | 1-9 images + prompt | 3-15s | Multi-character consistency | IP/character consistency videos |
| 4 | v1-video-edit | Video editing | 1 video + 0-5 refs | 3-60s | Up to 60s editing + audio retain | Local modification and replacement |
II. Four Models Deep Dive
Model 1: v1-text-to-video
This is HappyHorse's base text-to-video model for direct prompt-driven generation.
{
  "prompt": "Text prompt for scene/style (max 5000 chars)",
  "resolution": "720p / 1080p",
  "aspect_ratio": "16:9 / 9:16 / 1:1 / 4:3 / 3:4",
  "duration": 5,
  "seed": "Optional (0-2147483647)"
}
Key points: duration ranges from 3 to 15s; five aspect ratios are supported; prompts may be bilingual and up to 5000 characters long.
Use cases: concept videos, creative ads, product demos, atmosphere clips.
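The limits above can be checked on the client before a request is sent. The following is a minimal sketch, assuming the parameter names and ranges listed in this section. `build_text_to_video_payload` is a hypothetical helper, not part of any HappyHorse SDK, and the default values are illustrative only.

```python
ALLOWED_RESOLUTIONS = {"720p", "1080p"}
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4"}
MAX_PROMPT_CHARS = 5000
MAX_SEED = 2147483647


def build_text_to_video_payload(prompt, resolution="720p", aspect_ratio="16:9",
                                duration=5, seed=None):
    """Hypothetical helper: validate inputs and return a request body dict."""
    if not prompt or len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError("prompt must be 1-5000 characters")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(ALLOWED_RESOLUTIONS)}")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {sorted(ALLOWED_ASPECT_RATIOS)}")
    if not 3 <= duration <= 15:
        raise ValueError("duration must be 3-15 seconds")

    payload = {
        "prompt": prompt,
        "resolution": resolution,
        "aspect_ratio": aspect_ratio,
        "duration": duration,
    }
    # The seed is optional, so it is only included when the caller sets it.
    if seed is not None:
        if not 0 <= seed <= MAX_SEED:
            raise ValueError("seed must be in 0-2147483647")
        payload["seed"] = seed
    return payload
```

Calling `build_text_to_video_payload("A red kite over a beach", aspect_ratio="9:16", duration=8)` returns a dict that can be serialized as the JSON body; how that body is sent depends on the route you use.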
Model 2: v1-image-to-video
This model generates motion from one first-frame image plus an optional prompt.
{
  "prompt": "Optional text prompt for constraints (max 5000 chars)",
  "image_urls": ["First-frame image URL (required, exactly one)"],
  "resolution": "720p / 1080p",
  "duration": 5,
  "seed": "Optional"
}
Image limits: JPEG/JPG/PNG/WEBP; width and height >= 300px; ratio 1:2.5 to 2.5:1; file <= 10MB.
Key points: the prompt is optional; without a prompt, motion is driven mainly by the image content; duration is 3-15s.
Use cases: animate old photos, illustration animation, dynamic product visuals.
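The image limits above lend themselves to a pre-upload check. This is a sketch under stated assumptions: `check_first_frame` is a hypothetical helper, the width, height, size, and format are assumed to be known already (for example, read with an imaging library), and "10MB" is interpreted as 10 MiB.

```python
ALLOWED_FORMATS = {"jpeg", "jpg", "png", "webp"}
MAX_IMAGE_BYTES = 10 * 1024 * 1024  # assumption: "10MB" means 10 MiB
MIN_SIDE_PX = 300
MAX_RATIO = 2.5  # covers both 1:2.5 and 2.5:1


def check_first_frame(width, height, size_bytes, fmt):
    """Return a list of limit violations; an empty list means the image passes."""
    problems = []
    if fmt.lower() not in ALLOWED_FORMATS:
        problems.append("format must be JPEG/JPG/PNG/WEBP")
    if width < MIN_SIDE_PX or height < MIN_SIDE_PX:
        problems.append("width and height must be >= 300px")
    # Comparing the long side against 2.5x the short side avoids a division.
    if max(width, height) > MAX_RATIO * min(width, height):
        problems.append("aspect ratio must be within 1:2.5 to 2.5:1")
    if size_bytes > MAX_IMAGE_BYTES:
        problems.append("file must be <= 10MB")
    return problems
```

Returning a list rather than raising on the first failure lets a caller report every problem with an image at once.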
Model 3: v1-reference-to-video (Signature Feature)
This is HappyHorse's signature model for multi-character or multi-object consistency based on ordered reference images.
{
  "prompt": "Use character1/character2/... to refer to subjects (max 5000 chars)",
  "image_urls": ["Reference images mapped to character1, character2..."],
  "resolution": "720p / 1080p",
  "aspect_ratio": "16:9 / 9:16 / 1:1 / 4:3 / 3:4",
  "duration": 5,
  "seed": "Optional"
}
Core mechanism: pass 1-9 images; order matters because the images map to character1, character2, and so on in sequence; the prompt refers to subjects by those names; the model keeps each subject's appearance consistent.
Image requirements: JPEG/JPG/PNG/WEBP; short side >= 400px (720p+ recommended); avoid blurry or over-compressed images; <= 10MB.
Use cases: character-consistent videos, multi-character ads, IP animation.
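Because the mapping depends on image order, it can help to make it explicit and to confirm that the prompt only names characters that have a reference image. The sketch below assumes 1-based numbering (the first URL is character1) and that the prompt uses the literal tokens `character1`, `character2`, and so on. `map_characters` is a hypothetical helper and the URLs are placeholders.

```python
import re


def map_characters(prompt, image_urls):
    """Map character1..characterN to the ordered image URLs and check the prompt."""
    if not 1 <= len(image_urls) <= 9:
        raise ValueError("reference-to-video takes 1-9 images")

    # Position in the list decides which name an image receives.
    mapping = {f"character{i}": url for i, url in enumerate(image_urls, start=1)}

    # Collect every characterN index mentioned in the prompt.
    referenced = {int(n) for n in re.findall(r"character(\d+)", prompt)}
    out_of_range = sorted(n for n in referenced if not 1 <= n <= len(image_urls))
    if out_of_range:
        raise ValueError(f"prompt names characters without an image: {out_of_range}")
    return mapping


mapping = map_characters(
    "character1 hands a coffee cup to character2 in a sunlit kitchen",
    ["https://example.com/anna.png", "https://example.com/ben.png"],
)
```

Here `mapping["character2"]` is the second URL, so swapping the two list entries would swap which person each name refers to.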
Model 4: v1-video-edit (Signature Feature)
This is the dedicated HappyHorse editing model with up to 60-second edit length and natural-language instruction control.
{
  "prompt": "Natural language edit instruction (max 5000 chars)",
  "video_url": "Input video URL (required, one video)",
  "reference_image": "Optional reference image URLs (0-5)",
  "resolution": "720p / 1080p",
  "audio_setting": "auto / origin",
  "seed": "Optional"
}
Video limits: MP4/MOV (H.264 recommended); 3-60s; long side <= 2160px; short side >= 320px; ratio 1:2.5 to 2.5:1; file <= 100MB; fps > 8.
Reference limits: 0-5 images; JPEG/JPG/PNG/WEBP; width and height >= 300px; ratio 1:2.5 to 2.5:1; <= 10MB.
Use cases: outfit swap, background replacement, style transfer, local edits.
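The video limits above can be sketched as a check over metadata, followed by payload assembly. Several things here are assumptions: `VideoInfo`, `check_edit_video`, and `build_edit_payload` are hypothetical names, the metadata is assumed to come from a separate probing step, "100MB" is read as 100 MiB, and `reference_image` is sent as a list of URLs because the parameter description mentions 0-5 URLs.

```python
from dataclasses import dataclass

MAX_VIDEO_BYTES = 100 * 1024 * 1024  # assumption: "100MB" means 100 MiB


@dataclass
class VideoInfo:
    """Metadata gathered elsewhere, for example with a media probing tool."""
    container: str
    duration_s: float
    width: int
    height: int
    fps: float
    size_bytes: int


def check_edit_video(info):
    """Return a list of violations of the v1-video-edit input limits."""
    problems = []
    long_side = max(info.width, info.height)
    short_side = min(info.width, info.height)
    if info.container.lower() not in {"mp4", "mov"}:
        problems.append("container must be MP4 or MOV")
    if not 3 <= info.duration_s <= 60:
        problems.append("duration must be 3-60s")
    if long_side > 2160:
        problems.append("long side must be <= 2160px")
    if short_side < 320:
        problems.append("short side must be >= 320px")
    if long_side > 2.5 * short_side:
        problems.append("aspect ratio must be within 1:2.5 to 2.5:1")
    if info.fps <= 8:
        problems.append("fps must be greater than 8")
    if info.size_bytes > MAX_VIDEO_BYTES:
        problems.append("file must be <= 100MB")
    return problems


def build_edit_payload(prompt, video_url, reference_images=(),
                       resolution="720p", audio_setting="auto", seed=None):
    """Hypothetical helper: assemble a v1-video-edit request body."""
    if not prompt or len(prompt) > 5000:
        raise ValueError("prompt must be 1-5000 characters")
    if len(reference_images) > 5:
        raise ValueError("at most 5 reference images")
    if resolution not in {"720p", "1080p"}:
        raise ValueError("resolution must be 720p or 1080p")
    if audio_setting not in {"auto", "origin"}:
        raise ValueError("audio_setting must be 'auto' or 'origin'")
    payload = {"prompt": prompt, "video_url": video_url,
               "resolution": resolution, "audio_setting": audio_setting}
    # Optional fields are left out entirely when unused.
    if reference_images:
        payload["reference_image"] = list(reference_images)
    if seed is not None:
        payload["seed"] = seed
    return payload
```

Running `check_edit_video` before upload catches clips that exceed the 60-second or 100MB limits without spending a request on them.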
III. Four-Model Comparison Summary
3.1 Core feature matrix
| Feature | v1-text-to-video | v1-image-to-video | v1-reference-to-video | v1-video-edit |
|---|---|---|---|---|
| Text to video | ✅ | ❌ | ❌ | ❌ |
| Image to video | ❌ | ✅ | ❌ | ❌ |
| Multi-character reference | ❌ | ❌ | ✅ | ❌ |
| Video editing | ❌ | ❌ | ❌ | ✅ |
| Aspect ratio control | ✅ | ❌ | ✅ | ❌ |
| Audio retain | ❌ | ❌ | ❌ | ✅ |
| Max duration | 15s | 15s | 15s | 60s |
| Reference image count | - | 1 (first frame) | 1-9 | 0-5 |
3.2 Parameter complexity matrix
| Model | Required params | Optional params | Learning curve | Best for |
|---|---|---|---|---|
| v1-text-to-video | 1 | 4 | Easy | Beginners |
| v1-image-to-video | 1 | 4 | Easy | Beginners |
| v1-reference-to-video | 2 | 4 | Medium | Character consistency workflows |
| v1-video-edit | 2 | 4 | Medium | Precise editing workflows |
3.3 Input/output matrix
| Model | Input | Output duration | Resolution |
|---|---|---|---|
| v1-text-to-video | Prompt | 3-15s | 720p/1080p |
| v1-image-to-video | 1 image + optional prompt | 3-15s | 720p/1080p |
| v1-reference-to-video | 1-9 images + prompt | 3-15s | 720p/1080p |
| v1-video-edit | 1 video + 0-5 images + prompt | 3-60s | 720p/1080p |
IV. Model Selection Decision Tree
What is your task?
|
|-- Generate a new video from scratch
|   |-- No reference image -> v1-text-to-video
|   |-- One first-frame image (animate a still image) -> v1-image-to-video
|   `-- 1-9 reference images (character consistency) -> v1-reference-to-video
|
`-- Edit an existing video
    `-- Need local edit / outfit swap / background swap -> v1-video-edit
        |-- Text instruction only -> no reference image
        `-- Need style/outfit reference -> pass 1-5 reference_image entries
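The decision tree above can be expressed as a small routing function. This is an illustrative sketch: `choose_model` and its argument names are hypothetical, and the `need_character_consistency` flag is an assumption used to separate the two single-image cases, which the tree distinguishes by intent.

```python
def choose_model(has_source_video, image_count=0, need_character_consistency=False):
    """Return the model name suggested by the decision tree."""
    if has_source_video:
        # Editing branch: reference images are optional, up to five.
        if image_count > 5:
            raise ValueError("v1-video-edit accepts at most 5 reference images")
        return "v1-video-edit"

    # Generation branch.
    if image_count == 0:
        return "v1-text-to-video"
    if image_count == 1 and not need_character_consistency:
        return "v1-image-to-video"
    if image_count <= 9:
        return "v1-reference-to-video"
    raise ValueError("v1-reference-to-video accepts at most 9 reference images")
```

For example, `choose_model(False, 1)` selects v1-image-to-video, while `choose_model(False, 1, need_character_consistency=True)` selects v1-reference-to-video for the same single image.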
V. Unique Advantages of HappyHorse
5.1 Up to 60-second video editing
Many video editing models focus on clips of 5-15 seconds. HappyHorse v1-video-edit accepts up to 60 seconds, which makes it better suited to local modifications in longer footage.
5.2 Multi-character reference system
The character1/character2 mapping mechanism in v1-reference-to-video enables practical multi-character consistency for ads, animation, and IP content.
5.3 Flexible reference-image ranges
- Image-to-video: 1 first-frame image
- Reference-to-video: 1-9 reference images
- Video-edit: 0-5 reference images
These ranges cover lightweight, medium, and advanced production needs in one product family.
VI. Final Recommendations
| Use case | Recommended model | Core reason |
|---|---|---|
| Fast text-to-video | v1-text-to-video | Simple setup + 5 aspect ratios |
| Animate still images | v1-image-to-video | First-frame driven motion with an optional prompt |
| Character-consistent videos | v1-reference-to-video | Signature 1-9 image reference system |
| Local video edit / replacement | v1-video-edit | Up to 60s edit length + audio retain |
Summary, one line per task:
- Daily prompt-to-video: v1-text-to-video
- Static image animation: v1-image-to-video
- Multi-character consistency: v1-reference-to-video (HappyHorse signature)
- Video editing and outfit/background swap: v1-video-edit (up to 60s)
Ready to start? All four parameter sets can be used directly with the corresponding HappyHorse routes. Explore from the hub page: /home/happy-horse.
