[{"data":1,"prerenderedAt":16},["ShallowReactive",2],{"article-wan-development-history":3},{"errorCode":4,"errorMessage":5,"data":6},"00000","Everything ok",{"title":7,"category":8,"path":9,"description":10,"keyword":11,"content":12,"prevPath":13,"nextPath":14,"gmtCreate":15,"gmtModified":15},"Wan (Tongyi Wanxiang) Development History: From Open-Source Video Gen to Wan 2.6",4,"wan-development-history","A timeline of Alibaba's Wan (Tongyi Wanxiang) open-source video generation models: Wan 2.1 breakthrough, Wan 2.2 MoE, Wan 2.5 native audio-visual sync, and Wan 2.6 multi-shot narrative—with benchmarks, capabilities, and open release details.","Wan, Tongyi Wanxiang, 通义万相, Alibaba video generation, Wan 2.1, Wan 2.2, Wan 2.5, Wan 2.6, open source video model, text-to-video, image-to-video, VBench, MoE video","\u003C!DOCTYPE html>\n\u003Chtml lang=\"en\">\n\u003Chead>\n    \u003Cmeta charset=\"UTF-8\">\n    \u003Cmeta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\">\n    \u003Ctitle>Wan (Tongyi Wanxiang) Development History: From Open-Source Video Gen to Wan 2.6\u003C/title>\n\u003C/head>\n\u003Cbody>\n    \u003Carticle class=\"ai-model-comparison\">\n        \u003Cheader>\n            \u003Ch1>Wan (Tongyi Wanxiang) Development History: From Open-Source Video Gen to Wan 2.6\u003C/h1>\n        \u003C/header>\n\n        \u003Csection class=\"introduction\">\n            \n            \u003Cp>\u003Ca href=\"https://www.fuseaitools.com/home/wan\">Wan\u003C/a> (Tongyi Wanxiang) is Alibaba's open-source video generation model family from the Tongyi lab. It delivers text-to-video (T2V), image-to-video (I2V), and video-to-video (V2V) with multi-shot 1080p output, stable characters, and native audio sync. This article traces its development from Wan 2.1 through Wan 2.6—benchmarks, architecture shifts, and open-release milestones.\u003C/p>\n        \u003C/section>\n\n        \u003Csection class=\"overview\">\n            \u003Ch2>2025: Wan 2.1 and the Open-Source Breakthrough\u003C/h2>\n            \n            \u003Cp>In early 2025, \u003Cstrong>Wan 2.1\u003C/strong> was released as a full open-source video generative model. In January 2025 it topped the VBench leaderboard, outperforming Sora, HunyuanVideo, and other leading video models. 
The technical paper \u003Cem>Wan: Open and Advanced Large-Scale Video Generative Models\u003C/em> was published in March 2025, and the code and weights were opened on GitHub, Hugging Face, and ModelScope under Apache 2.0.\u003C/p>\n\n            \u003Cdiv class=\"comparison-table\">\n                \u003Ctable>\n                    \u003Cthead>\n                        \u003Ctr>\n                            \u003Cth>Date\u003C/th>\n                            \u003Cth>Event\u003C/th>\n                        \u003C/tr>\n                    \u003C/thead>\n                    \u003Ctbody>\n                        \u003Ctr>\n                            \u003Ctd>Jan 2025\u003C/td>\n                            \u003Ctd>Wan 2.1 tops VBench; 1.3B and 14B models\u003C/td>\n                        \u003C/tr>\n                        \u003Ctr>\n                            \u003Ctd>Feb–Mar 2025\u003C/td>\n                            \u003Ctd>Wan 2.1 open source; 8 tasks including T2V, I2V, and video editing\u003C/td>\n                        \u003C/tr>\n                        \u003Ctr>\n                            \u003Ctd>Mar 2025\u003C/td>\n                            \u003Ctd>Technical paper and full code/weights release\u003C/td>\n                        \u003C/tr>\n                    \u003C/tbody>\n                \u003C/table>\n            \u003C/div>\n        \u003C/section>\n\n        \u003Csection class=\"performance-section\">\n            \u003Ch2>Wan 2.1: Scale, Efficiency, and Tasks\u003C/h2>\n            \n            \u003Cdiv class=\"image-container right-aligned\">\n                \u003Cimg src=\"https://media.fuseaitools.com/news/image/pexels-photo-2696249.jpeg\" \n                     alt=\"Wan video model 3D VAE and diffusion architecture\" \n                     width=\"600\" height=\"400\">\n                \u003Cp class=\"image-caption\">Video generative architecture and 3D causal VAE\u003C/p>\n            \u003C/div>\n\n            \u003Ch3>Model Sizes and Efficiency\u003C/h3>\n            \u003Cp>Wan 2.1 offers two scales: \u003Cstrong>1.3B\u003C/strong> for speed and low VRAM (about 8.19 GB), and \u003Cstrong>14B\u003C/strong> for best quality. Per the official release notes, the 1.3B model runs on consumer GPUs such as the RTX 4090 and generates a 5-second 480p clip in roughly four minutes without optimizations such as quantization. The 14B model leads on internal and external benchmarks against other open and commercial video models.\u003C/p>
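\n            \u003Cp>As a concrete starting point, here is a minimal text-to-video sketch using Hugging Face Diffusers. It assumes a recent Diffusers release with Wan support and the \u003Ccode>Wan-AI/Wan2.1-T2V-1.3B-Diffusers\u003C/code> checkpoint; the resolution, frame count, and guidance values are illustrative defaults, not official settings.\u003C/p>\n            \u003Cpre>\u003Ccode># Minimal Wan 2.1 text-to-video sketch (assumes a diffusers build with Wan support).\nimport torch\nfrom diffusers import WanPipeline\nfrom diffusers.utils import export_to_video\n\npipe = WanPipeline.from_pretrained(\n    \"Wan-AI/Wan2.1-T2V-1.3B-Diffusers\", torch_dtype=torch.bfloat16\n)\npipe.to(\"cuda\")  # the 1.3B model targets ~8 GB-class consumer GPUs\n\nframes = pipe(\n    prompt=\"A cat walking through tall grass at sunset\",\n    height=480,\n    width=832,         # 480p-class output for the 1.3B model\n    num_frames=81,     # about 5 seconds at 16 fps\n    guidance_scale=5.0,\n).frames[0]\n\nexport_to_video(frames, \"wan21_t2v.mp4\", fps=16)\u003C/code>\u003C/pre>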
\n            \n            \u003Ch3>Technical Highlights\u003C/h3>\n            \u003Cul>\n                \u003Cli>\u003Cstrong>3D causal VAE:\u003C/strong> Cuts VRAM use by about 60% while keeping visual quality\u003C/li>\n                \u003Cli>\u003Cstrong>Bilingual on-screen text:\u003C/strong> First open video model to generate Chinese and English text within the frame\u003C/li>\n                \u003Cli>\u003Cstrong>Eight tasks:\u003C/strong> Text-to-video, image-to-video, instruction-guided video editing, personalized generation, and more\u003C/li>\n            \u003C/ul>\n\n            \u003Cdiv class=\"comparison-table\">\n                \u003Ctable>\n                    \u003Cthead>\n                        \u003Ctr>\n                            \u003Cth>Model\u003C/th>\n                            \u003Cth>Parameters\u003C/th>\n                            \u003Cth>Focus\u003C/th>\n                        \u003C/tr>\n                    \u003C/thead>\n                    \u003Ctbody>\n                        \u003Ctr>\n                            \u003Ctd>Wan 2.1 small\u003C/td>\n                            \u003Ctd>1.3B\u003C/td>\n                            \u003Ctd>Efficiency; ~8.19 GB VRAM, 480p on consumer GPUs\u003C/td>\n                        \u003C/tr>\n                        \u003Ctr>\n                            \u003Ctd>Wan 2.1 large\u003C/td>\n                            \u003Ctd>14B\u003C/td>\n                            \u003Ctd>Quality; tops VBench and other benchmarks\u003C/td>\n                        \u003C/tr>\n                    \u003C/tbody>\n                \u003C/table>\n            \u003C/div>\n        \u003C/section>\n\n        \u003Csection class=\"technical-improvements\">\n            \u003Ch2>Wan 2.2, 2.5, and 2.6: MoE, Audio, and Multi-Shot\u003C/h2>\n            \n            \u003Cdiv class=\"image-container\">\n                \u003Cimg src=\"https://media.fuseaitools.com/news/image/pexels-photo-1181677.jpeg\" \n                     alt=\"Wan 2.5 audio-visual sync and multi-modal video\" \n                     width=\"800\" height=\"450\">\n                \u003Cp class=\"image-caption\">Native audio-visual sync and multi-modal video generation\u003C/p>\n            \u003C/div>\n\n            \u003Ch3>Wan 2.2 (July 2025)\u003C/h3>\n            \u003Cp>Wan 2.2 introduced \u003Cstrong>MoE (mixture of experts)\u003C/strong> into open video generation: its 27B-parameter T2V/I2V models split denoising between a high-noise expert and a low-noise expert, keeping roughly 14B parameters active per step, for professional-grade visuals; a separate dense 5B variant runs smoothly on consumer cards (e.g. RTX 4090) at 720p@24fps.\u003C/p>
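\n            \u003Cp>Public descriptions of Wan 2.2's MoE frame it as a split by denoising stage rather than classic token-level routing. The toy sketch below illustrates only that timestep-based routing idea; the class and layer names are hypothetical, and the real experts are full diffusion transformers.\u003C/p>\n            \u003Cpre>\u003Ccode># Toy illustration of timestep-split two-expert routing (hypothetical names).\nimport torch\nimport torch.nn as nn\n\nclass TwoExpertDenoiser(nn.Module):\n    def __init__(self, dim=64, boundary=0.5):\n        super().__init__()\n        self.high_noise_expert = nn.Linear(dim, dim)  # early, noisy steps\n        self.low_noise_expert = nn.Linear(dim, dim)   # late, detail-refining steps\n        self.boundary = boundary  # point in the schedule where routing switches\n\n    def forward(self, x, t_frac):\n        # t_frac in [0, 1]: 1.0 = start of denoising (pure noise), 0.0 = end.\n        # Only one expert runs per step, so active parameters stay well below\n        # the total, mirroring Wan 2.2's 14B-active-of-27B design.\n        if t_frac > self.boundary:\n            return self.high_noise_expert(x)\n        return self.low_noise_expert(x)\n\nmodel = TwoExpertDenoiser()\nx = torch.randn(1, 64)\nprint(model(x, t_frac=0.9).shape)  # routed to the high-noise expert\nprint(model(x, t_frac=0.2).shape)  # routed to the low-noise expert\u003C/code>\u003C/pre>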
\n\n            \u003Ch3>Wan 2.5 (September 2025)\u003C/h3>\n            \u003Cp>Wan 2.5 added \u003Cstrong>native audio-visual sync\u003C/strong> in a single pipeline for vision, language, and sound. It supports 480p, 720p, and 1080p output with video length up to 10 seconds, and accepts text, image, and audio in any combination as input; a hedged request sketch follows the table below.\u003C/p>\n\n            \u003Ch3>Wan 2.6 (December 2025)\u003C/h3>\n            \u003Cp>Wan 2.6 focuses on \u003Cstrong>multi-shot narrative\u003C/strong> and \u003Cstrong>character consistency\u003C/strong>, supports videos up to 15 seconds, and further improves audio-visual synchronization.\u003C/p>\n\n            \u003Cdiv class=\"comparison-table\">\n                \u003Ctable>\n                    \u003Cthead>\n                        \u003Ctr>\n                            \u003Cth>Release\u003C/th>\n                            \u003Cth>Key feature\u003C/th>\n                            \u003Cth>Max duration / resolution\u003C/th>\n                        \u003C/tr>\n                    \u003C/thead>\n                    \u003Ctbody>\n                        \u003Ctr>\n                            \u003Ctd>Wan 2.2\u003C/td>\n                            \u003Ctd>MoE for video\u003C/td>\n                            \u003Ctd>5B / 27B; 720p@24fps on consumer GPU\u003C/td>\n                        \u003C/tr>\n                        \u003Ctr>\n                            \u003Ctd>Wan 2.5\u003C/td>\n                            \u003Ctd>Native A/V sync, multi-modal input\u003C/td>\n                            \u003Ctd>Up to 10 s; 480p / 720p / 1080p\u003C/td>\n                        \u003C/tr>\n                        \u003Ctr>\n                            \u003Ctd>Wan 2.6\u003C/td>\n                            \u003Ctd>Multi-shot, character consistency\u003C/td>\n                            \u003Ctd>Up to 15 s\u003C/td>\n                        \u003C/tr>\n                    \u003C/tbody>\n                \u003C/table>\n            \u003C/div>
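\n            \u003Cp>To make the any-combination input concrete, here is a hedged sketch of a multi-modal generation request. The endpoint URL, field names, and model id are hypothetical placeholders rather than a documented API; consult the official Wan or platform documentation for the real interface.\u003C/p>\n            \u003Cpre>\u003Ccode># Hypothetical multi-modal video request; all names below are placeholders.\nimport requests\n\npayload = {\n    \"model\": \"wan-2.5-example\",  # placeholder model id\n    \"prompt\": \"A narrator walks down a rainy neon street\",\n    \"image_url\": \"https://example.com/reference-frame.jpg\",  # optional image input\n    \"audio_url\": \"https://example.com/voiceover.wav\",        # optional audio input\n    \"resolution\": \"1080p\",   # Wan 2.5 supports 480p / 720p / 1080p\n    \"duration_seconds\": 10,  # Wan 2.5 caps clips at 10 seconds\n}\n\nresp = requests.post(\n    \"https://api.example.com/v1/video/generations\",  # placeholder endpoint\n    json=payload,\n    headers={\"Authorization\": \"Bearer YOUR_API_KEY\"},\n    timeout=600,\n)\nresp.raise_for_status()\nprint(resp.json().get(\"video_url\"))  # placeholder response field\u003C/code>\u003C/pre>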
\n        \u003C/section>\n\n        \u003Csection class=\"conclusion\">\n            \u003Ch2>Summary: Why Wan Matters\u003C/h2>\n            \n            \u003Cp>Wan establishes Alibaba Tongyi as a major contributor to open video generation. Open code and weights for Wan 2.1 and 2.2 (Apache 2.0), strong benchmarks (e.g. VBench), and a clear path from 2.1 to 2.6 with MoE, native audio, and longer multi-shot narratives make it a go-to option for researchers and builders who need high-quality, affordable video generation.\u003C/p>\n            \n            \u003Cdiv class=\"key-takeaways\">\n                \u003Ch3>Key Takeaways\u003C/h3>\n                \u003Cul>\n                    \u003Cli>Wan 2.1 (2025): Fully open-source release; 1.3B and 14B; topped VBench; 8 tasks including T2V, I2V, editing\u003C/li>\n                    \u003Cli>Wan 2.2: MoE for video; 5B runs on RTX 4090; 27B for pro quality\u003C/li>\n                    \u003Cli>Wan 2.5: Native audio-visual sync; 10 s; 480p–1080p; text/image/audio input\u003C/li>\n                    \u003Cli>Wan 2.6: Multi-shot narrative, character consistency, 15 s\u003C/li>\n                    \u003Cli>Wan 2.1 and 2.2 code and weights open under Apache 2.0 on GitHub, Hugging Face, and ModelScope\u003C/li>\n                \u003C/ul>\n            \u003C/div>\n\n            \u003Cp class=\"article-cta\">Try \u003Ca href=\"https://www.fuseaitools.com/home/wan\">Wan on FuseAITools\u003C/a> for text-to-video, image-to-video, and video-to-video generation up to 15 seconds with 1080p and native audio.\u003C/p>\n        \u003C/section>\n\n        \u003Cfooter class=\"article-footer\">\n            \u003Cp>\u003Cstrong>Disclaimer:\u003C/strong> Timeline and capabilities are based on public announcements and the paper \u003Cem>Wan: Open and Advanced Large-Scale Video Generative Models\u003C/em>; see Alibaba Tongyi and official repositories for authoritative details.\u003C/p>\n            \u003Cp class=\"update-note\">This article will be updated as new Wan versions and benchmarks are released.\u003C/p>\n        \u003C/footer>\n    \u003C/article>\n\n    \u003Cstyle>\n        article.ai-model-comparison,\n.article-body.html-content article {\n  max-width: 100%;\n  width: 100%;\n  box-sizing: border-box;\n  background: transparent;\n  padding: 0;\n  margin: 0;\n}\n\n.article-body.html-content section {\n  width: 100%;\n  max-width: 100%;\n  box-sizing: border-box;\n  margin-bottom: 1.5rem;\n}\n\n.article-body.html-content h1 {\n  font-size: 2rem;\n  font-weight: 700;\n  color: #1f2937;\n  margin: 0 0 1rem;\n  line-height: 1.25;\n}\n\n.article-body.html-content h2 {\n  font-size: 1.75rem;\n  font-weight: 600;\n  color: #1f2937;\n  margin: 2rem 0 1rem;\n  padding-bottom: 0.5rem;\n  border-bottom: 1px solid #e5e7eb;\n}\n\n.article-body.html-content h3 {\n  font-size: 1.5rem;\n  font-weight: 600;\n  color: #1f2937;\n  margin: 1.5rem 0 0.75rem;\n}\n\n.article-body.html-content p {\n  margin-bottom: 1.25rem;\n  color: #374151;\n  line-height: 1.7;\n}\n\n.article-body.html-content ul,\n.article-body.html-content ol {\n  margin: 1rem 0 1.5rem 1.5rem;\n  padding-left: 1.5rem;\n}\n\n.article-body.html-content li {\n  margin-bottom: 0.5rem;\n}\n\n.article-body.html-content .image-container,\n.article-body.html-content figure {\n  width: 100%;\n  max-width: 100%;\n  margin: 1.5rem 0;\n  box-sizing: border-box;\n}\n\n.article-body.html-content .image-container img,\n.article-body.html-content img {\n  max-width: 100%;\n  width: 100%;\n  height: auto;\n  display: block;\n  border-radius: 8px;\n  box-shadow: 0 2px 8px rgba(0, 0, 0, 0.06);\n}\n\n.article-body.html-content .image-caption {\n  font-size: 0.875rem;\n  color: #6b7280;\n  margin-top: 0.5rem;\n  font-style: italic;\n  text-align: center;\n}\n\n.article-body.html-content .comparison-table {\n  width: 100%;\n  max-width: 100%;\n  
overflow-x: auto;\n  margin: 1.5rem 0;\n  -webkit-overflow-scrolling: touch;\n}\n\n.article-body.html-content table {\n  width: 100%;\n  max-width: 100%;\n  border-collapse: collapse;\n  font-size: 0.9375rem;\n}\n\n.article-body.html-content th,\n.article-body.html-content td {\n  padding: 12px 16px;\n  text-align: left;\n  border: 1px solid #e5e7eb;\n}\n\n.article-body.html-content th {\n  background: #f8fafc;\n  font-weight: 600;\n  color: #1f2937;\n}\n\n.article-body.html-content tr:hover {\n  background: #fafafa;\n}\n\n.article-body.html-content .key-takeaways {\n  width: 100%;\n  max-width: 100%;\n  padding: 1.25rem 1.5rem;\n  background: #f8fafc;\n  border-radius: 8px;\n  border-left: 4px solid #667eea;\n  box-sizing: border-box;\n}\n\n.article-body.html-content .article-cta {\n  margin-top: 1.5rem;\n  font-weight: 500;\n}\n\n.article-body.html-content .article-footer {\n  margin-top: 2rem;\n  padding-top: 1.5rem;\n  border-top: 1px solid #e5e7eb;\n  font-size: 0.875rem;\n  color: #6b7280;\n}\n    \u003C/style>\n\u003C/body>\n\u003C/html>","seedance-emotional-brand-story-text-to-video-tutorial","","2026-03-12 07:18:19",1775264344029]