
VirtualTryon

(5)

CLIPVisionModel Projection: notes on porting the PBE image encoder to SDXL
*1: Difference between batch norm and layer norm: https://yonghyuc.wordpress.com/2020/03/04/batch-norm-vs-layer-norm/
*2: CLIPVisionModel output: the output of CLIPVisionModel is the same as the output of CLIPVisionTransformer, because CLIPVisionModel's forward returns self.vision_model(..), and vision_model is a CLIPVisionTransformer. ..

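Virtual Tryon (development ideas)
1. First, I want to understand the effect of img2img. The ControlNet model I trained renders the model's face well, but the clothing region falls apart. Let's see what result img2img gives on that region.
2. Experiments to run? > ControlNet could help with preserving the person, but it is not strictly necessary. So I will leave ControlNet out for now and think more about the effect of the image embedding. For training, I plan to train a 9 channel unet with a masked condition: the unet is given a mask (masked = arms + upper body + hands) and a condition (image embedding) = the garment. Train only part ..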

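clip - ViT & Image projection
A detailed look at the ViT used in clip and at the pooled output of the image encoder.
# ViT structure
Image encoder: the patch size is P. ViT 14 means each patch has size 14. One patch is 14x14 pixels, and it becomes one token. So patchifying an image (224 * 224 * 3) gives 224 / 14 = 16 patches per side, i.e. 16 * 16 = 256 patches, each holding 14 * 14 * 3 = 588 pixel values. One image is therefore treated like a sentence of 256 tokens.
Text encoder: the text prompt is converted into 77 tokens, and each token is embedded into 768 dimensions.
Don't get confused. Point 1) During clip training (contrastive learning) ..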
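The patch arithmetic in this excerpt can be checked with a few lines of Python. This is only a sketch of the counting, not of the encoder itself: `patch_grid` is a hypothetical helper name, and the inputs (224-pixel square image, patch size 14, 3 channels) are the values quoted in the excerpt.

```python
def patch_grid(image_size: int, patch_size: int, channels: int = 3):
    """Return (patches per side, total patches, raw values per patch).

    Hypothetical helper for illustration; assumes a square image whose
    side length is an exact multiple of the patch size.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size          # patches along one edge
    num_patches = per_side * per_side            # one token per patch
    values_per_patch = patch_size * patch_size * channels  # raw pixels in a patch
    return per_side, num_patches, values_per_patch


per_side, num_patches, values_per_patch = patch_grid(224, 14)
print(per_side, num_patches, values_per_patch)  # 16 256 588
```

So the image becomes a sequence of 256 patch tokens. The CLIP vision transformer also prepends a class token, which is why its sequence length for this input is 257 rather than 256.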

Stable diffusion with LoRA!
https://stable-diffusion-art.com/lora/
What are LoRA models and how to use them in AUTOMATIC1111 (Stable Diffusion Art): "LoRA models are small Stable Diffusion models that apply tiny changes to standard checkpoint models. They are usually 10 to 100 times smaller than checkpoint.."
A Korean translation of the post above, with my own understanding of it. A LoRA model is a small Stable Diffusion model that can apply small changes to a standard checkpoint. The original ..

[Paper review] There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge (full translation)
3. Technical Approach
Pre-trained modality-specific teachers predict the bounding boxes (BB) where cars are located in their respective modality spaces. These predictions are fused into a single multi-teacher prediction and used as pseudo labels to train the audio student network. To make effective use of the complementary information (from each domain), we propose the MTA (Multi-Teacher Alignment) loss. In the remainder of Section 3, we first describe the architecture of the teacher-student network and the teacher pre-training procedure, and, to better initialize the audio student, the new pretext task that we propose (pre..

