[Concept Explanation] Score-based Model

[Blog]

https://yang-song.net/blog/2021/score/
Blog author: Yang Song, the well-known author of the SDEs paper
Recommended reading for picking up the concepts before studying score-based models such as NCSN and SDEs.
This post is a review of that blog.

[Code]

https://colab.research.google.com/drive/120kYYBOVa1i0TD85RjlEkFjaWDxSFUx3?usp=sharing#scrollTo=0H1Rq5DTmW8o

Effects of increasing the number of NCSN noise scales to infinity ⇒ continuous noise scales
- higher quality samples
- exact log-likelihood computation
- controllable generation for inverse problem solving

Perturbing data with an SDE

number of noise scales → infinity ⇒ perturbed data distributions with continuously growing levels of noise
noise perturbation is carried out by a continuous-time stochastic process
stochastic process: a random phenomenon that evolves over time
- $\{X(t): t\in T\}$

Many stochastic processes are solutions of stochastic differential equations (SDEs)

SDE form: $dx=f(x,t)dt+g(t)dw$
- $f(\cdot,t):\mathbb{R}^d\to\mathbb{R}^d$: vector-valued function, drift coefficient
  - determines the direction of the process ⇒ its average tendency to increase/decrease over time
- $g(t)\in\mathbb{R}$: real-valued function, diffusion coefficient
  - determines the volatility $\sigma$ of the process ⇒ noise strength
- $w$: standard Brownian motion
- $dw$: infinitesimal white noise
  - the random change that occurs over a very small time interval $dt$
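To make the roles of the drift and diffusion coefficients concrete, here is a minimal sketch that simulates $dx=f(x,t)dt+g(t)dw$ forward in time with the Euler-Maruyama discretization. The function name `euler_maruyama`, the zero drift, and the constant diffusion $g(t)=2$ are illustrative assumptions, not something taken from the blog.

```python
import numpy as np

def euler_maruyama(x0, f, g, T=1.0, n_steps=1000, rng=None):
    """Simulate dx = f(x, t) dt + g(t) dw from t = 0 to t = T (illustrative helper)."""
    rng = np.random.default_rng() if rng is None else rng
    dt = T / n_steps
    x = np.array(x0, dtype=float)
    for i in range(n_steps):
        t = i * dt
        # Brownian increment over a small interval dt: dw ~ N(0, dt I)
        dw = np.sqrt(dt) * rng.standard_normal(x.shape)
        x = x + f(x, t) * dt + g(t) * dw
    return x

# Hypothetical example: zero drift and constant diffusion g(t) = 2,
# so each path is x(T) = x(0) + 2 w(T) and Var[x(T)] = 4 T.
rng = np.random.default_rng(0)
x_T = euler_maruyama(np.zeros(20000), f=lambda x, t: 0.0 * x, g=lambda t: 2.0, rng=rng)
print("empirical variance at T:", x_T.var())
```

With zero drift the empirical variance should land near $4T$; a non-zero `f` would shift the paths on average, which is the "direction" role described above.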

Solution of an SDE: a continuous collection of random variables $\{x(t)\}_{t\in[0,T]}$
- traces stochastic trajectories as time t grows from 0 to T

$p_t(x)$: marginal probability density function of $x(t)$, $t\in[0,T]$
- analogous to $p_{\sigma_i}(x)$ in the case of a finite number of noise scales
- $p_0(x)=p(x)$: no perturbation at t=0
- for sufficiently large T, $p_T(x)\approx\pi(x)$
  - $\pi(x)$: prior distribution
  - analogous to $p_{\sigma_L}(x)$ in the finite case ⇒ largest noise perturbation $\sigma_L$

How noise perturbations are added, and hence the choice of SDE, is not unique

ex) $dx=e^tdw$
- perturbs data with zero-mean Gaussian noise whose variance grows exponentially in t
- similar to NCSN's $N(0,\sigma_1^2I),\ldots,N(0,\sigma_L^2I)$ when $\sigma_1<\dots<\sigma_L$ is a geometric progression
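For this example SDE the perturbation can be sampled in closed form. Since there is no drift, $x(t)-x(0)=\int_0^t e^s dw$ is Gaussian with variance $\int_0^t e^{2s}ds=(e^{2t}-1)/2$. The sketch below checks this numerically; the helper names `marginal_std` and `perturb` are made up for illustration.

```python
import numpy as np

def marginal_std(t):
    """Std of x(t) given x(0) for dx = e^t dw, i.e. sqrt(int_0^t e^{2s} ds)."""
    return np.sqrt((np.exp(2.0 * t) - 1.0) / 2.0)

def perturb(x0, t, rng):
    """Draw x(t) ~ N(x0, marginal_std(t)^2 I) directly, without simulating the path."""
    return x0 + marginal_std(t) * rng.standard_normal(np.shape(x0))

rng = np.random.default_rng(0)
x0 = np.zeros(50000)  # toy "data": all points at the origin
for t in (0.5, 1.0, 2.0):
    xt = perturb(x0, t, rng)
    print(f"t={t}: empirical std {xt.std():.3f}, analytic std {marginal_std(t):.3f}")
```

Sampling the kernel directly like this is what makes denoising score matching cheap: no SDE solver is needed at training time.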

Representative SDE models
- Variance Exploding SDE (VE SDE)
- Variance Preserving SDE (VP SDE)
- Sub-VP SDE

Reversing the SDE for sample generation

finite noise scales: sampling with annealed Langevin dynamics
infinite noise scales: sampling with a reverse SDE, analogous to the finite case

Every SDE has a corresponding reverse SDE, available in closed form: $dx=[f(x,t)-g^2(t)\nabla_x\log p_t(x)]dt+g(t)dw$
- $dt$: negative infinitesimal time step ⇒ the process runs backward from T to 0
- requires an estimate of $\nabla_x\log p_t(x)$

Estimating the reverse SDE with score-based models and score matching

Solving the reverse SDE requires $p_T(x)$ and $\nabla_x\log p_t(x)$
$p_T(x)\approx\pi(x)$, where $\pi(x)$ is a fully tractable prior distribution
$\nabla_x\log p_t(x)\approx s_\theta(x,t)$: time-dependent score-based model

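Training objective for $s_\theta(x,t)$: $\mathbb{E}_{t\sim U(0,T)}\mathbb{E}_{p_t(x)}[\lambda(t)\|\nabla_x\log p_t(x)-s_\theta(x,t)\|_2^2]$
- continuous weighted combination of Fisher divergences
- $\lambda:\mathbb{R}\to\mathbb{R}_{>0}$, typically $\lambda(t)\propto 1/\mathbb{E}[\|\nabla_{x(t)}\log p(x(t)|x(0))\|_2^2]$
  - balances the magnitude of the score matching losses across t
- optimized with denoising score matching or sliced score matching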
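The denoising score matching version of this objective can be sketched in a few lines. The example assumes a Gaussian perturbation kernel $x(t)=x(0)+\sigma(t)z$, so that $\nabla_{x(t)}\log p(x(t)|x(0))=-z/\sigma(t)$ and the weighting above becomes $\lambda(t)=\sigma^2(t)$. The noise schedule `sigma`, its constants, and the toy data are assumptions for illustration; `score_fn` stands in for $s_\theta(x,t)$.

```python
import numpy as np

SIGMA_MIN, SIGMA_MAX = 0.01, 10.0  # hypothetical noise range

def sigma(t):
    """Hypothetical geometric noise schedule on t in [0, 1]."""
    return SIGMA_MIN * (SIGMA_MAX / SIGMA_MIN) ** t

def dsm_loss(score_fn, x0, rng, eps=1e-5):
    """Monte Carlo estimate of the weighted denoising score matching loss."""
    n = x0.shape[0]
    t = rng.uniform(eps, 1.0, size=(n, 1))     # t ~ U(eps, 1), one time per sample
    z = rng.standard_normal(x0.shape)
    std = sigma(t)
    xt = x0 + std * z                          # sample from p(x(t) | x(0))
    target = -z / std                          # grad log p(x(t) | x(0))
    lam = std ** 2                             # lambda(t) = sigma(t)^2
    return np.mean(np.sum(lam * (score_fn(xt, t) - target) ** 2, axis=1))

# Toy check: for data x(0) ~ N(0, I), the true marginal score is -x / (1 + sigma(t)^2).
rng = np.random.default_rng(0)
x0 = rng.standard_normal((20000, 2))
true_score = lambda x, t: -x / (1.0 + sigma(t) ** 2)
zero_score = lambda x, t: np.zeros_like(x)
print("loss of true score:", dsm_loss(true_score, x0, rng))
print("loss of zero score:", dsm_loss(zero_score, x0, rng))
```

The true score should give the lower loss of the two. In practice `score_fn` would be a neural network and the same quantity would be minimized by gradient descent.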

Estimated reverse SDE: $dx=[f(x,t)-g^2(t)s_\theta(x,t)]dt+g(t)dw$
- start from $x(T)\sim\pi$ and sample down to $x(0)$
- if $s_\theta(x,t)$ is well trained, $x(0)\sim p_\theta\approx p_0$

Connection between Fisher & KL divergence from $p_0$ to $p_\theta$ under some regularity conditions
- when $\lambda(t)=g^2(t)$: $\mathrm{KL}(p_0\|p_\theta)\le\frac{T}{2}\mathbb{E}_{t\sim U(0,T)}\mathbb{E}_{p_t(x)}[\lambda(t)\|\nabla_x\log p_t(x)-s_\theta(x,t)\|_2^2]+\mathrm{KL}(p_T\|\pi)$
- minimizing the weighted Fisher divergences ⇒ minimizing an upper bound on the KL divergence
  - i.e. approximately maximizing likelihood during model training
- $\lambda(t)=g^2(t)$: likelihood weighting function
- yields score-based models with high likelihoods
- ⇒ comparable or even superior to SOTA autoregressive models

How to solve the reverse SDE

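The reverse SDE is solved with numerical SDE solvers
- ⇒ the reverse stochastic process can be simulated for sample generation

Euler-Maruyama method: the simplest numerical SDE solver
- the Milstein method and stochastic Runge-Kutta methods can also be used
Applied to the estimated reverse SDE ⇒ discretizes the SDE using finite time steps and small Gaussian noise
- choose a small negative time step $\Delta t\approx 0$, initialize $t=T$, and repeat until $t\approx 0$:
  - $\Delta x\leftarrow[f(x,t)-g^2(t)s_\theta(x,t)]\Delta t+g(t)\sqrt{|\Delta t|}z_t$
  - $x\leftarrow x+\Delta x$, $t\leftarrow t+\Delta t$
- $z_t\sim N(0,I)$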
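A minimal sketch of this Euler-Maruyama loop for the reverse SDE, using the update $x\leftarrow x+[f(x,t)-g^2(t)s_\theta(x,t)]\Delta t+g(t)\sqrt{|\Delta t|}z_t$ with $\Delta t<0$. To keep it runnable without a trained network, it assumes the example SDE $dx=e^tdw$ (so $f=0$) and 1-D toy data $N(M,S_0^2)$, for which the exact score is known and stands in for $s_\theta$. The constants `M`, `S0`, `T` and all function names are hypothetical.

```python
import numpy as np

M, S0, T = 3.0, 0.5, 3.0  # hypothetical toy data N(M, S0^2) and time horizon

def g(t):
    return np.exp(t)                      # diffusion coefficient of dx = e^t dw

def var_t(t):
    return (np.exp(2.0 * t) - 1.0) / 2.0  # variance added by the SDE up to time t

def score(x, t):
    # Exact score of p_t = N(M, S0^2 + var_t(t)); a stand-in for a trained s_theta(x, t).
    return -(x - M) / (S0 ** 2 + var_t(t))

def reverse_euler_maruyama(n_samples, n_steps, rng):
    dt = -T / n_steps                                        # small negative time step
    x = np.sqrt(var_t(T)) * rng.standard_normal(n_samples)   # x(T) ~ pi = N(0, var_t(T))
    t = T
    for _ in range(n_steps):
        z = rng.standard_normal(n_samples)
        drift = 0.0 - g(t) ** 2 * score(x, t)                # f(x, t) = 0 for this SDE
        x = x + drift * dt + g(t) * np.sqrt(abs(dt)) * z
        t += dt
    return x

rng = np.random.default_rng(0)
samples = reverse_euler_maruyama(n_samples=20000, n_steps=1000, rng=rng)
print("sample mean/std:", samples.mean(), samples.std(), "(data: mean 3.0, std 0.5)")
```

Starting from pure prior noise, the samples should end up close to the toy data distribution, up to discretization error and the small mismatch between $\pi$ and $p_T$.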

SDEs paper: provides a reverse diffusion solver similar to the Euler-Maruyama method
- more tailored for solving reverse-time SDEs
A later paper: adaptive step-size SDE solvers
- “Gotta Go Fast When Generating Data with Score-Based Models”
- faster sampling with good sample quality

Two special properties of the reverse SDE allow more flexible sampling methods
- $\nabla_x\log p_t(x)$ is estimated by the time-dependent score-based model $s_\theta(x,t)$
- only samples from each marginal distribution $p_t(x)$ matter ⇒ samples at different time steps can have arbitrary correlations and need not follow one reverse-SDE trajectory

⇒ MCMC can be applied to fine-tune the trajectories obtained from numerical SDE solvers
⇒ Predictor-Corrector samplers are proposed

Predictor-Corrector samplers
- Predictor: any numerical SDE solver that predicts $x(t+\Delta t)\sim p_{t+\Delta t}(x)$
- Corrector: a score-based MCMC procedure, such as Langevin dynamics or Hamiltonian Monte Carlo, that gradually refines the sample

1. predictor step: predict $x(t+\Delta t)$ with $\Delta t<0$
2. several corrector steps: refine the sample using $s_\theta(x,t+\Delta t)$ so that it becomes a higher-quality sample $x(t+\Delta t)\sim p_{t+\Delta t}(x)$
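The two steps above can be sketched as follows: an Euler-Maruyama predictor followed by Langevin corrector steps at the new time. As a stand-in for a trained $s_\theta$, the sketch assumes the SDE $dx=e^tdw$ and 1-D toy data $N(M,S_0^2)$ with a known exact score. The step-size rule based on a signal-to-noise ratio `snr` is an illustrative choice, and every name and constant here is hypothetical.

```python
import numpy as np

M, S0, T = 3.0, 0.5, 3.0  # hypothetical toy data N(M, S0^2) and time horizon

def g(t):
    return np.exp(t)

def var_t(t):
    return (np.exp(2.0 * t) - 1.0) / 2.0

def score(x, t):
    # Exact score of p_t = N(M, S0^2 + var_t(t)); a stand-in for s_theta(x, t).
    return -(x - M) / (S0 ** 2 + var_t(t))

def pc_sampler(n_samples, n_steps, n_corrector, snr, rng):
    dt = -T / n_steps
    x = np.sqrt(var_t(T)) * rng.standard_normal(n_samples)  # x(T) ~ pi
    t = T
    for _ in range(n_steps):
        # 1. Predictor: one reverse-SDE Euler-Maruyama step from t to t + dt.
        z = rng.standard_normal(n_samples)
        x = x - g(t) ** 2 * score(x, t) * dt + g(t) * np.sqrt(abs(dt)) * z
        t += dt
        # 2. Corrector: Langevin MCMC steps targeting p_{t + dt}.
        for _ in range(n_corrector):
            s = score(x, t)
            eps = 2.0 * (snr / np.mean(np.abs(s))) ** 2      # assumed step-size heuristic
            x = x + eps * s + np.sqrt(2.0 * eps) * rng.standard_normal(n_samples)
    return x

rng = np.random.default_rng(0)
samples = pc_sampler(n_samples=20000, n_steps=200, n_corrector=1, snr=0.16, rng=rng)
print("sample mean/std:", samples.mean(), samples.std(), "(data: mean 3.0, std 0.5)")
```

The corrector only needs the score at the current time, which is why any score-based MCMC method can be plugged in at that point.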

Predictor-Corrector methods & better architectures of score-based models
- ⇒ SOTA sample quality on CIFAR-10
- also scalable to high-dimensional data ⇒ FFHQ dataset

Probability flow ODE

Langevin MCMC & SDE solvers generate high-quality samples
- but they do not provide a way to compute the exact log-likelihood of score-based generative models

sampler based on ordinary differential equations (ODEs) ⇒ exact likelihood computation becomes possible

Any SDE can be converted into an ODE without changing its marginal distributions $\{p_t(x)\}_{t\in[0,T]}$
ODE corresponding to an SDE ⇒ probability flow ODE: $dx=[f(x,t)-\frac{1}{2}g^2(t)\nabla_x\log p_t(x)]dt$
- follows deterministic paths ⇒ stable sampling process
- sampling without noise makes it possible to compute the log-likelihood by tracking the change in density
- SDE: includes the stochastic noise $dw$ ⇒ generates samples along random paths

ODE trajectories are noticeably smoother than SDE trajectories
The ODE and SDE share the same set of marginal distributions $\{p_t(x)\}_{t\in[0,T]}$, so both connect the same data distribution ↔ the same prior distribution

Several unique advantages
- with $\nabla_x\log p_t(x)\approx s_\theta(x,t)$, the ODE becomes a special case of a neural ODE, i.e. a continuous normalizing flow with deterministic paths ⇒ exact log-likelihood computation
- the ODE transforms the data distribution $p_0(x)$ into $p_T(x)$, is fully invertible, and has the same marginal distributions as the SDE
- $p_0$ can be computed from the known prior density $p_T$ with the change-of-variables formula and numerical ODE solvers
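The likelihood computation can be sketched on a 1-D toy problem. The probability flow ODE $dx/dt=f(x,t)-\frac{1}{2}g^2(t)\nabla_x\log p_t(x)$ is integrated from $t=0$ to $T$ together with the divergence of its right-hand side, and the instantaneous change-of-variables formula gives $\log p_0(x(0))=\log p_T(x(T))+\int_0^T\nabla\cdot\tilde f(x(t),t)dt$, where $\tilde f$ is the ODE drift. The sketch assumes the SDE $dx=e^tdw$, zero-mean toy data $N(0,S_0^2)$ with its exact score in place of $s_\theta$, a hand-written RK4 integrator, and a finite-difference divergence that is only practical in 1-D. All names are hypothetical.

```python
import numpy as np

S0, T = 0.5, 3.0  # hypothetical toy data N(0, S0^2) and time horizon

def g(t):
    return np.exp(t)

def var_t(t):
    return (np.exp(2.0 * t) - 1.0) / 2.0

def score(x, t):
    # Exact score of p_t = N(0, S0^2 + var_t(t)); a stand-in for s_theta(x, t).
    return -x / (S0 ** 2 + var_t(t))

def ode_drift(x, t):
    # Probability flow ODE drift with f = 0: -0.5 * g(t)^2 * score(x, t)
    return -0.5 * g(t) ** 2 * score(x, t)

def rhs(state, t, delta=1e-4):
    x = state[0]
    # 1-D divergence d(ode_drift)/dx via central finite differences
    div = (ode_drift(x + delta, t) - ode_drift(x - delta, t)) / (2.0 * delta)
    return np.array([ode_drift(x, t), div])

def log_likelihood(x0, n_steps=300):
    """Integrate (x, accumulated divergence) from t = 0 to T with classical RK4."""
    state = np.array([x0, 0.0])
    dt = T / n_steps
    for i in range(n_steps):
        t = i * dt
        k1 = rhs(state, t)
        k2 = rhs(state + 0.5 * dt * k1, t + 0.5 * dt)
        k3 = rhs(state + 0.5 * dt * k2, t + 0.5 * dt)
        k4 = rhs(state + dt * k3, t + dt)
        state = state + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    x_T, div_integral = state
    log_prior = -0.5 * np.log(2 * np.pi * var_t(T)) - x_T ** 2 / (2 * var_t(T))
    return log_prior + div_integral

x0 = 0.5
exact = -0.5 * np.log(2 * np.pi * S0 ** 2) - x0 ** 2 / (2 * S0 ** 2)
print("ODE log-likelihood:", log_likelihood(x0), " exact:", exact)
```

For image-sized data the divergence cannot be computed this way; a stochastic trace estimator is typically used instead, which is not shown here.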

Controllable generation for inverse problem solving

Score-based generative models are particularly suitable for inverse problems
inverse problems = Bayesian inference problems

Forward process: $p(y|x)$
Inverse problem: $p(x|y)$
Bayes’ rule: $p(x|y)=p(x)p(y|x)/\int p(x)p(y|x)dx$
gradients w.r.t. x: $\nabla_x\log p(x|y)=\nabla_x\log p(x)+\nabla_x\log p(y|x)$
- estimated score function of the unconditional data distribution: $s_\theta(x)\approx\nabla_x\log p(x)$
- posterior score function: $\nabla_x\log p(x|y)$
- known forward process: $p(y|x)$
  - $x_{i+1}\leftarrow x_i+\epsilon(\nabla_x\log p(x_i)+\nabla_x\log p(y|x_i))+\sqrt{2\epsilon}z_i$, $i=0,1,\ldots,K$
- ⇒ samples are drawn from the posterior with Langevin-type sampling
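A minimal sketch of this posterior Langevin update on a 1-D toy inverse problem. It assumes a prior $p(x)=N(0,1)$, whose exact score $-x$ stands in for $s_\theta(x)$, and a forward process $y=x+$ noise with noise $N(0,\tau^2)$, so that $\nabla_x\log p(y|x)=(y-x)/\tau^2$. In this Gaussian case the posterior is $N(y/(1+\tau^2),\ \tau^2/(1+\tau^2))$, which gives something to compare against. The constants `TAU`, `Y` and the function names are hypothetical.

```python
import numpy as np

TAU, Y = 0.5, 2.0  # hypothetical observation noise level and observed value

def prior_score(x):
    return -x                      # score of N(0, 1); a stand-in for s_theta(x)

def likelihood_score(x, y):
    return (y - x) / TAU ** 2      # grad_x log N(y; x, TAU^2), the known forward process

def posterior_langevin(y, n_chains, n_steps, eps, rng):
    x = rng.standard_normal(n_chains)  # arbitrary initialization
    for _ in range(n_steps):
        z = rng.standard_normal(n_chains)
        # x_{i+1} <- x_i + eps * (prior score + likelihood score) + sqrt(2 eps) z_i
        x = x + eps * (prior_score(x) + likelihood_score(x, y)) + np.sqrt(2.0 * eps) * z
    return x

rng = np.random.default_rng(0)
samples = posterior_langevin(Y, n_chains=20000, n_steps=1000, eps=0.01, rng=rng)
print("posterior mean/var estimate:", samples.mean(), samples.var())
print("analytic posterior mean/var:", Y / (1 + TAU ** 2), TAU ** 2 / (1 + TAU ** 2))
```

The unconditional score is never retrained here: conditioning enters only through the added likelihood gradient, which is the point of the Bayes' rule decomposition above.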

Examples of solving inverse problems
- Class-conditional generation
- Image inpainting
- Image colorization


kongshin's Lab
