
Generate a Good Story! StoryDiffusion: Consistent Self-Attention and a Semantic Motion Predictor Are Essential (Nankai & ByteDance)

2024-10-22

We place great importance on original articles. To respect intellectual property and avoid potential copyright issues, we provide only a summary of the article here for an initial overview. For the full, more detailed content, please visit the author's public account page.

Article Summary

Introduction

Diffusion models have shown exceptional potential in content generation, such as images, 3D objects, and videos. However, maintaining subject consistency, such as keeping a character's identity and attire stable across the images and videos that tell a story, remains challenging. The goal of this paper is a method that generates images and videos whose characters stay consistent in identity and attire while preserving the user's control through text prompts.

Related Work

Diffusion models have come to dominate generative modeling thanks to their ability to produce realistic images, with applications spanning image and video generation, 3D modeling, and low-level vision tasks. Text-to-image generation in particular has advanced rapidly with models such as Latent Diffusion, DiT, and Stable Diffusion XL (SDXL).

Method

The paper introduces two novel components: Consistent Self-Attention, which generates subject-consistent images without any training, and the Semantic Motion Predictor, which turns those images into videos. Together they form StoryDiffusion, a framework that can tell a text-based story through consistent images or videos covering a wide range of content.
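
The summary does not spell out the mechanism, but the published idea behind Consistent Self-Attention is to let each image in a batch attend to tokens sampled from the other images in the same batch, so appearance information is shared across the whole story. Below is a minimal, single-head PyTorch sketch of that idea; the function name, the sample_ratio parameter, and the random token-sampling scheme are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def consistent_self_attention(hidden_states, w_q, w_k, w_v, sample_ratio=0.5):
    """Hedged sketch of a Consistent Self-Attention layer (not official code).

    hidden_states: (batch, tokens, dim) features of a batch of images that
    should depict the same subject. For each image, tokens randomly sampled
    from the *other* images are appended to its own tokens before computing
    keys and values, so attention can propagate appearance across images.
    """
    b, n, d = hidden_states.shape
    n_sample = int(n * sample_ratio)

    outputs = []
    for i in range(b):
        own = hidden_states[i]                              # (n, d)

        if b > 1:
            # Pool tokens from every other image and sample a random subset.
            others = torch.cat([hidden_states[j] for j in range(b) if j != i])
            idx = torch.randperm(others.shape[0])[: n_sample * (b - 1)]
            sampled = others[idx]                           # (s, d)
            kv_tokens = torch.cat([own, sampled], dim=0)    # (n + s, d)
        else:
            kv_tokens = own  # single image: plain self-attention

        # Queries come only from the current image; keys/values also see
        # the sampled cross-image tokens, enforcing consistency.
        q = own @ w_q                                       # (n, d)
        k = kv_tokens @ w_k                                 # (n + s, d)
        v = kv_tokens @ w_v                                 # (n + s, d)

        attn = F.softmax(q @ k.t() / d ** 0.5, dim=-1)      # (n, n + s)
        outputs.append(attn @ v)                            # (n, d)

    return torch.stack(outputs)                             # (b, n, d)

# Toy usage: 4 story images, 64 tokens each, feature dim 32.
x = torch.randn(4, 64, 32)
w = [torch.randn(32, 32) for _ in range(3)]
out = consistent_self_attention(x, *w)                      # (4, 64, 32)
```

Because this only changes what the existing self-attention layers attend to, such a mechanism can be dropped into a pretrained text-to-image model, which is consistent with the training-free claim above.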

Experiments

The proposed StoryDiffusion outperforms recent methods in generating consistent images and stable long videos. The method is evaluated qualitatively and quantitatively against other state-of-the-art methods. A user study further confirms StoryDiffusion's superior performance.

Conclusion

StoryDiffusion is a new method that generates consistent images for storytelling without training and transforms these images into videos. It aims to inspire future efforts in controllable image and video generation.

References

[1] StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
