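Before 2020, image generation was dominated by GANs. GANs sample quickly, but they are unstable to train, prone to mode collapse, and often lack sample diversity. Diffusion models take a different approach: rather than generating an image directly, first learn to denoise.
The inspiration comes from nonequilibrium thermodynamics. A drop of ink in water gradually diffuses (noising); if we learn to reverse that process (denoising), we can "extract" a clean image from noise.
Given a clean image x_0, Gaussian noise is added step by step, so that after T steps the image has become essentially pure noise x_T: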
q(x_t | x_{t-1}) = N(x_t; √(1-β_t) · x_{t-1}, β_t · I)
where β_t is a preset noise variance schedule (for example, increasing linearly from 0.0001 to 0.02).
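The forward step above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: the function names `linear_beta_schedule` and `forward_step`, the choice T = 1000, and the four-pixel toy "image" are all assumptions made for the example.

```python
import math
import random

def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Return beta_1..beta_T, increasing linearly from beta_start to beta_end."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def forward_step(x_prev, beta_t, rng):
    """Sample x_t ~ N(sqrt(1 - beta_t) * x_prev, beta_t * I) for a flat list of pixels."""
    mean_scale = math.sqrt(1.0 - beta_t)
    std = math.sqrt(beta_t)
    return [mean_scale * v + std * rng.gauss(0.0, 1.0) for v in x_prev]

rng = random.Random(0)           # fixed seed so the run is repeatable
betas = linear_beta_schedule()   # assumed T = 1000
x = [1.0, -1.0, 0.5, 0.0]        # toy "image" with four pixels (hypothetical data)
for beta_t in betas:             # apply all T noising steps in sequence
    x = forward_step(x, beta_t, rng)
```

After the loop, `x` holds a sample of x_T. Each step shrinks the signal slightly and adds a small amount of fresh noise, which is why many steps are needed before the original values are washed out.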
A key property is that we can jump to any timestep in a single step:
q(x_t | x_0) = N(x_t; √ᾱ_t · x_0, (1-ᾱ_t) · I)
where α_s = 1-β_s and ᾱ_t = ∏_{s=1}^{t} α_s = ∏_{s=1}^{t} (1-β_s).
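As a rough sketch of this closed form, the snippet below computes the cumulative products ᾱ_t and draws x_t directly from x_0 without iterating over the intermediate steps. The helper names `alpha_bars_from_betas` and `q_sample`, the value T = 1000, and the toy input are assumptions for illustration only.

```python
import math
import random

def alpha_bars_from_betas(betas):
    """Cumulative products: alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ q(x_t | x_0) directly; t is 1-indexed as in the text.

    Returns both x_t and the noise eps that was used to build it.
    """
    a_bar = alpha_bars[t - 1]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [math.sqrt(a_bar) * v + math.sqrt(1.0 - a_bar) * e
           for v, e in zip(x0, eps)]
    return x_t, eps

T = 1000  # assumed number of steps
betas = [1e-4 + i * (0.02 - 1e-4) / (T - 1) for i in range(T)]  # linear schedule
alpha_bars = alpha_bars_from_betas(betas)

rng = random.Random(0)
x0 = [1.0, -1.0, 0.5, 0.0]  # toy "image" (hypothetical data)
x_mid, eps_mid = q_sample(x0, 500, alpha_bars, rng)
```

Under these assumed settings ᾱ_T ends up well below 0.001, so x_T retains almost none of x_0. Returning `eps` alongside `x_t` is convenient because the noise is exactly what the network is later asked to predict.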
A neural network is trained to predict the noise added at each step, so that the image can be denoised step by step:
p_θ(x_{t-1} | x_t) = N(x_{t-1}; μ_θ(x_t, t), Σ_θ(x_t, t))
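The core simplification: instead of predicting x_0 directly, the model predicts the noise ε:
# The model predicts the noise
ε_θ = U-Net(x_t, t)
# Derive the mean of x_{t-1} from x_t, with α_t = 1-β_t
μ_θ = (1/√α_t) · (x_t - (β_t/√(1-ᾱ_t)) · ε_θ)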
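The update for μ_θ can be turned into one reverse sampling step as sketched below. Several things here are assumptions rather than anything stated above: the function name `reverse_step`, a fixed variance σ_t² = β_t in place of a general Σ_θ, adding no noise at t = 1, and an "oracle" that supplies the true noise in place of a trained U-Net.

```python
import math
import random

def reverse_step(x_t, t, eps_pred, betas, alpha_bars, rng):
    """One sampling step x_t -> x_{t-1}; t is 1-indexed.

    eps_pred is the noise estimate for x_t (same length as x_t).
    Assumption: fixed variance sigma_t^2 = beta_t, no noise added at t = 1.
    """
    beta_t = betas[t - 1]
    alpha_t = 1.0 - beta_t
    a_bar_t = alpha_bars[t - 1]
    coef = beta_t / math.sqrt(1.0 - a_bar_t)
    # mu_theta = (x_t - coef * eps_pred) / sqrt(alpha_t)
    mean = [(x - coef * e) / math.sqrt(alpha_t) for x, e in zip(x_t, eps_pred)]
    if t == 1:
        return mean
    sigma_t = math.sqrt(beta_t)
    return [m + sigma_t * rng.gauss(0.0, 1.0) for m in mean]

# Toy check: noise x0 by one step, then undo it using the true noise
# (the oracle stands in for a trained network).
T = 1000  # assumed number of steps
betas = [1e-4 + i * (0.02 - 1e-4) / (T - 1) for i in range(T)]
alpha_bars, running = [], 1.0
for b in betas:
    running *= 1.0 - b
    alpha_bars.append(running)

rng = random.Random(0)
x0 = [1.0, -1.0, 0.5, 0.0]  # hypothetical data
eps = [rng.gauss(0.0, 1.0) for _ in x0]
x1 = [math.sqrt(alpha_bars[0]) * v + math.sqrt(1.0 - alpha_bars[0]) * e
      for v, e in zip(x0, eps)]
x0_rec = reverse_step(x1, 1, eps, betas, alpha_bars, rng)
```

With a perfect noise estimate at t = 1, the mean reduces to x_0, so `x0_rec` matches `x0` up to floating-point error. A real sampler would start from Gaussian noise at t = T and call a trained network for `eps_pred` at every step.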
This is one of DDPM's most elegant design choices.
DDPM uses a modified U-Net as the denoising network.
| Dataset | Metric | DDPM result | Previous best |
|---|---|---|---|
| CIFAR-10 | FID (lower is better) | 3.17 | ~3.5 (StyleGAN2) |
| CIFAR-10 | IS (higher is better) | 9.46 | ~9.2 |
| LSUN 256×256 | Sample quality | Comparable to ProgressiveGAN | n/a |
We present high quality image synthesis results using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics. Our best results are obtained by training on a weighted variational bound designed according to a novel connection between diffusion probabilistic models and denoising score matching with Langevin dynamics. On the unconditional CIFAR10 dataset, we obtain an Inception score of 9.46 and a state-of-the-art FID score of 3.17.