SD and SDXL architecture

DanGEE 2024. 7. 3. 15:44

https://ostin.tistory.com/231

Stable Diffusion (original)

 

conv_in

 

down_blocks:

    (CrossAttnDownBlock2D:

        ResnetBlock2D

        Transformer2DModel

        ResnetBlock2D

        Transformer2DModel

        Downsample2D

    ) x 3

    DownBlock2D:

        (ResnetBlock2D) x 2

 

mid_block:

        ResnetBlock2D

        Transformer2DModel

        ResnetBlock2D

 

up_blocks:

    UpBlock2D:

        (ResnetBlock2D) x 3

        Upsample2D

    (CrossAttnUpBlock2D:

        (ResnetBlock2D

        Transformer2DModel) x 3

        Upsample2D) x 2

    CrossAttnUpBlock2D:

        (ResnetBlock2D

        Transformer2DModel) x 3

 

out
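
As a rough check on the outline above, the following sketch walks the down path and tracks the side length of the feature map that each block receives. It assumes a 64x64 latent (a 512x512 image) and that each Downsample2D halves height and width. The `SD_DOWN_BLOCKS` list and the `resolutions` helper are illustrative names, not part of any library.

```python
# Down path of the original SD UNet, transcribed from the outline above.
# Each entry: (block name, has Transformer2DModel, ends with Downsample2D)
SD_DOWN_BLOCKS = [
    ("CrossAttnDownBlock2D", True, True),
    ("CrossAttnDownBlock2D", True, True),
    ("CrossAttnDownBlock2D", True, True),
    ("DownBlock2D", False, False),
]


def resolutions(down_blocks, latent_size):
    """Return the feature-map side length that each down block receives."""
    sizes = []
    size = latent_size
    for _name, _has_attention, has_downsample in down_blocks:
        sizes.append(size)
        if has_downsample:
            size //= 2  # assumption: Downsample2D halves height and width
    return sizes


# Assumption: 64x64 latent, i.e. a 512x512 input image.
sd_sizes = resolutions(SD_DOWN_BLOCKS, latent_size=64)
print(sd_sizes)
```

Under these assumptions the final DownBlock2D, and therefore the mid block that follows it, operates on 8x8 feature maps.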


Stable Diffusion XL-v1.0

 

conv_in

 

down_blocks:

    DownBlock2D:

        (ResnetBlock2D) x 2

        Downsample2D

    CrossAttnDownBlock2D:

        (ResnetBlock2D

        Transformer2DModel (BasicTransformerBlock x 2) ) x 2

        Downsample2D

    CrossAttnDownBlock2D:

        (ResnetBlock2D

        Transformer2DModel (BasicTransformerBlock x 10) ) x 2

 

mid_block:

        ResnetBlock2D

        Transformer2DModel (BasicTransformerBlock x 10)

        ResnetBlock2D

 

up_blocks:

    CrossAttnUpBlock2D:

        (ResnetBlock2D

        Transformer2DModel (BasicTransformerBlock x 10) ) x 3

        Upsample2D

    CrossAttnUpBlock2D:

        (ResnetBlock2D

        Transformer2DModel (BasicTransformerBlock x 2) ) x 3

        Upsample2D

    UpBlock2D:

        (ResnetBlock2D) x 3

 

out
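
To see where the attention layers sit in the SDXL outline above, the sketch below counts BasicTransformerBlock instances per resolution level across the down and up paths. It assumes a 128x128 latent (a 1024x1024 image) and that each Downsample2D / Upsample2D changes the side length by a factor of 2. The mid block is left out to keep the focus on the two paths, and `SDXL_DOWN`, `SDXL_UP`, and `blocks_per_resolution` are illustrative names.

```python
# Entries: (block name, repeats, BasicTransformerBlocks per Transformer2DModel,
#           ends with a Downsample2D / Upsample2D)
SDXL_DOWN = [
    ("DownBlock2D", 2, 0, True),
    ("CrossAttnDownBlock2D", 2, 2, True),
    ("CrossAttnDownBlock2D", 2, 10, False),
]
SDXL_UP = [
    ("CrossAttnUpBlock2D", 3, 10, True),
    ("CrossAttnUpBlock2D", 3, 2, True),
    ("UpBlock2D", 3, 0, False),
]


def blocks_per_resolution(down, up, latent_size):
    """Map feature-map side length -> number of BasicTransformerBlocks."""
    counts = {}
    size = latent_size
    for _name, repeats, depth, resample in down:
        counts[size] = counts.get(size, 0) + repeats * depth
        if resample:
            size //= 2  # assumption: Downsample2D halves the side length
    for _name, repeats, depth, resample in up:
        counts[size] = counts.get(size, 0) + repeats * depth
        if resample:
            size *= 2  # assumption: Upsample2D doubles the side length
    return counts


# Assumption: 128x128 latent, i.e. a 1024x1024 input image.
sdxl_counts = blocks_per_resolution(SDXL_DOWN, SDXL_UP, latent_size=128)
print(sdxl_counts)
```

With these inputs the highest resolution level carries no transformer blocks at all, and most of them sit at the lowest level.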

 

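Differences from the original Stable Diffusion:

  • The transformer blocks at the highest resolution are removed: the first down block and the last up block are plain DownBlock2D / UpBlock2D.
  • The lowest resolution level (8x8 feature maps for a 64x64 latent) is removed, so the UNet downsamples twice instead of three times.
  • In SD, every Transformer2DModel contains exactly one BasicTransformerBlock. In SDXL the count depends on the resolution level: 2 at the middle level and 10 at the lowest level.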
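
The last point can be made concrete with a small tally. The sketch below lists the depth (number of BasicTransformerBlock) of every Transformer2DModel in the down and up paths, read off the two outlines, and compares totals. The mid block is omitted from both models to keep the comparison like for like; the names `SD_DEPTHS`, `SDXL_DEPTHS`, and `summarize` are illustrative.

```python
# Depth of each Transformer2DModel in the down and up paths.
# SD: 3 CrossAttnDownBlock2D x 2 transformers, 3 CrossAttnUpBlock2D x 3 transformers,
# each with a single BasicTransformerBlock.
SD_DEPTHS = [1] * (3 * 2) + [1] * (3 * 3)

# SDXL: down path has 2 transformers of depth 2 and 2 of depth 10;
# up path has 3 of depth 10 and 3 of depth 2.
SDXL_DEPTHS = [2] * 2 + [10] * 2 + [10] * 3 + [2] * 3


def summarize(depths):
    """Count Transformer2DModel instances and the BasicTransformerBlocks inside them."""
    return {"transformer_models": len(depths), "basic_blocks": sum(depths)}


sd_summary = summarize(SD_DEPTHS)
sdxl_summary = summarize(SDXL_DEPTHS)
print("SD  :", sd_summary)
print("SDXL:", sdxl_summary)
```

By this count SDXL has fewer Transformer2DModel instances than SD in the down and up paths, but each one is deeper, so the total number of BasicTransformerBlock layers is larger.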