[Deep Learning] Deep Generative Models (1) - VAE

2023. 5. 29. 17:10
🧑🏻‍💻 Terms covered
Deep Generative Models
Generative Model
Discriminative Model
pre-training
fine-tuning
DBN
VAE
encoder
autoencoder
reparameterization trick

์ด๋ฒˆ์—๋Š” Deep Generative Models์ด๋ผ ๋ถˆ๋ฆฌ๋Š” ์ƒ์„ฑ ๋ชจ๋ธ์— ๋Œ€ํ•ด ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

 

Deep Generative Models

A Generative Model is, literally, a 'model that generates'. That is, it means a model that can generate data.

From a probability distribution with a mean and a variance, we can generate data without limit. This is what we call sampling, or generation.

If we have a standard normal distribution such as N(0, 1), we can generate data from it stochastically. A model like this is called a generative model.
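As a toy illustration (a minimal sketch with made-up data, not any particular model from this post), even fitting a single Gaussian to observed data gives a tiny generative model:

```python
# A toy generative model: fit N(mu, sigma^2) to data, then sample from it.
# The data and sizes here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(loc=5.0, scale=2.0, size=1000)  # "observed" data

# Learn the parameters of the data-generating distribution
mu, sigma = data.mean(), data.std()

# Generate as many new data points as we like
new_points = rng.normal(loc=mu, scale=sigma, size=10)
print(mu, sigma, new_points[:3])
```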

 

๊ณผ๊ฑฐ์—๋Š”, ์ผ๋ฐ˜์ ์œผ๋กœ ์ด generative model์˜ output ์ž์ฒด๊ฐ€ probability distribution ์œผ๋กœ ๋‚˜์˜ค๋Š” ๊ฒƒ์œผ๋กœ ๋งŽ์ด ์ด์•ผ๊ธฐ ํ–ˆ์Šต๋‹ˆ๋‹ค.

 

๋ชจ๋ธ์˜ output ์ž์ฒด๊ฐ€ ํ™•๋ฅ  ๋ถ„ํฌ์˜€๊ธฐ ๋•Œ๋ฌธ์— ๋ฐ์ดํ„ฐ๋ฅผ ์ƒ์„ฑํ•ด๋‚ผ ์ˆ˜ ์žˆ๋Š” ๊ฒƒ์ด์ฃ .

 

 

๊ทธ๋ž˜์„œ true- data-generating distribution์„ ํ•™์Šตํ•ฉ๋‹ˆ๋‹ค.

 

๋ฐ์ดํ„ฐ๊ฐ€ ์–ด๋””๋กœ๋ถ€ํ„ฐ ์™”๋Š”์ง€,

 

๋ฐ์ดํ„ฐ์˜ underlying func.์„ ์ž˜ ํ•™์Šตํ•ฉ๋‹ˆ๋‹ค.

 

ํ•™์Šต์ด ์ž˜ ๋˜์–ด, ์ด ๋ฐ์ดํ„ฐ๊ฐ€ ์–ด๋–ค ํŠน์ • ๋ถ„ํฌ๋ฅผ ์ž˜ ๋”ฐ๋ฅธ๋‹ค๊ณ  ํ–ˆ์„ ๋•Œ,

 

๊ฒฐ๊ตญ, true data-generating distribution์„ ์•ˆ๋‹ค๋Š” ๊ฒƒ์€ ๋ฐ์ดํ„ฐ์˜ ๋ชจ๋“  ๊ฒƒ์„ ์šฐ๋ฆฌ๊ฐ€ ์ž˜ ์ดํ•ดํ–ˆ๋‹ค๋Š” ๊ฒƒ์ด ๋ฉ๋‹ˆ๋‹ค.

 

๊ทธ๋ฆฌ๊ณ  ์ƒˆ๋กœ์šด ๋ฐ์ดํ„ฐ ํฌ์ธํŠธ๋“ค์„ ๊ณ„์†ํ•ด์„œ ์ƒ์„ฑํ•ด๋‚ผ ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค.

 

 

์ด๊ฒƒ์— ๋ฐ˜๋Œ€๋˜๋Š” ๊ฒƒ์ด Discriminative Model์ž…๋‹ˆ๋‹ค.

 

์ง€๊ธˆ๊นŒ์ง€ ์šฐ๋ฆฌ๊ฐ€ ๋ฐฐ์›Œ์˜จ ๋Œ€๋ถ€๋ถ„์˜ ๊ฒƒ๋“ค์ด์ฃ .

 

discriminant func.์„ ๋งŒ๋“ค์–ด๋‚ด๋Š” model๋กœ, ์ด๊ฒƒ์€ ํŒ๋ณ„ ๋ชจ๋ธ์ด๋ผ๊ณ ๋„ ํ•ฉ๋‹ˆ๋‹ค.

 

์ฆ‰, ํŒ๋ณ„์‹์„ ๋งŒ๋“ค์–ด๋‚ด๋Š” ๋ชจ๋ธ๋กœ, ๋จธ์‹ ๋Ÿฌ๋‹์—์„œ decision hyperplane ๋“ฑ์ด ์žˆ์Šต๋‹ˆ๋‹ค.

 

 

๊ฒฐ๊ตญ generative model์€

 

์–ด๋–ค ๋ฐ์ดํ„ฐ๋ฅผ ์ƒ์„ฑํ•ด๋‚ผ ์ˆ˜ ์žˆ๋Š” ๋Œ€๋ถ€๋ถ„์˜ ํ™•๋ฅ  ๋ถ„ํฌ์˜ ๋ชจ์–‘์ธ ๊ทธ๋Ÿฌํ•œ output์ด ๋‚˜์™€์ฃผ์–ด์•ผ ํ•˜๋Š” model๋กœ ํŠน๋ณ„ํ•˜์ง€๋งŒ ๊ต‰์žฅํžˆ ์–ด๋ ต์Šต๋‹ˆ๋‹ค.

 

๊ทธ๋Ÿฐ๋ฐ ์—ฌ๊ธฐ์„œ ๋ชจ๋ธ์˜ output์ด ํ™•๋ฅ  ๋ถ„ํฌ ํ˜•ํƒœ๋ฅผ ๋„์–ด์•ผ ํ•œ๋‹ค๋Š” ๊ฒƒ์ด GAN model์„ ํ†ตํ•ด์„œ ๊ฐ€์ •์ด ๋ฌด๋„ˆ์ง€๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ์ด๊ฒƒ์€ ๋‚˜์ค‘์— ๋ง์”€๋“œ๋ฆฌ๊ฒ ์Šต๋‹ˆ๋‹ค.

 

 

 

 

Deep Belief Networks (DBN)

When the trend of extending neural networks in depth emerged, there was an idea: to beat the vanishing gradient, first pre-train the model, then fine-tune it on our own data. That idea became the DBN.

[Figure: DBN architecture. Upper right: a Restricted Boltzmann Machine (RBM) with visible units v and hidden units above. Bottom right: layer-wise RBM pre-training followed by fine-tuning of the whole network.]

์ด๊ฒƒ์€ ๊ทธ๋ฆผ์˜ ์šฐ์ธก ์ƒ๋‹จ์— ๋‚˜์™€์žˆ๋Š” Restricted Bolzmann Machine (RBM)์ด๋ผ๋Š” ๊ฒƒ์„ ํ†ตํ•ด pre-training์„ ํ•ฉ๋‹ˆ๋‹ค.

 

์ด๊ฒƒ์€ perceptron์ด๋ž‘ ํฌ๊ฒŒ ๋‹ฌ๋ผ๋ณด์ด์ง„ ์•Š์Šต๋‹ˆ๋‹ค.

 

๋จผ์ €, visibleํ•˜๋‹ค๊ณ  ํ•˜์—ฌ input์„ v๋กœ ํ‘œํ˜„ํ•˜์˜€๊ณ , ์œ„๋Š” hidden์ด ๋ฉ๋‹ˆ๋‹ค.

 

๊ทธ๋ž˜์„œ ์ด๊ฒƒ์€ ๋ฐ”๋กœ ๋ฐ ํ•˜๋‹จ์˜ ํ™•๋ฅ  ๋ถ„ํฌ๋ฅผ fitting ์‹œํ‚ค๋Š” ํ•™์Šต์„ ํ•ฉ๋‹ˆ๋‹ค.

 

์ด ํ™•๋ฅ  ๋ถ„ํฌ๋Š” energy func.๋ฅผ ํฌํ•จํ•˜๋Š”๋ฐ, ๊ทธ ์ˆ˜์‹์€ ์ˆ˜์‹ ์•„๋ž˜์™€ ๊ฐ™์Šต๋‹ˆ๋‹ค.
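For reference, the standard RBM energy function and the Boltzmann distribution it defines (with visible biases $a_i$, hidden biases $b_j$, and weights $w_{ij}$) are:

$$E(v, h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i w_{ij} h_j$$

$$P(v, h) = \frac{1}{Z}\, e^{-E(v, h)}, \qquad Z = \sum_{v, h} e^{-E(v, h)}$$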

 

The lower this energy, the better: a lower energy goes into the formula and makes the exponential larger, so the probability becomes higher, and the model is fitted toward that stable state.

Looking at the energy function above: the greater the similarity between the visible and hidden units, the more negative the energy, and consequently the larger the probability. In other words, the more similar the behavior of the visible layer and the hidden layer, the more successfully the model has been trained.

So, as in the bottom-right part of the figure, we first pre-train layer by layer with this Restricted Boltzmann Machine, and then, when the data finally comes in, we simply fine-tune straight through.

The deep layers that used to train poorly because of the vanishing gradient are now roughly fitted in advance, so even if they are not trained perfectly, they turn out good enough.

In short, the pre-training used Restricted Boltzmann Machines, which want to maximize the behavioral similarity between adjacent layers; a rough sketch follows.
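As an illustration only (a minimal sketch under assumed sizes and hyperparameters, not the exact historical procedure), one RBM layer can be pre-trained with contrastive divergence (CD-1):

```python
# Minimal RBM pre-training sketch with contrastive divergence (CD-1) in NumPy.
# Sizes, learning rate, and the dummy batch are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_visible, n_hidden, lr = 784, 256, 0.01
W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
a = np.zeros(n_visible)  # visible biases
b = np.zeros(n_hidden)   # hidden biases

def cd1_step(v0):
    """One CD-1 update on a batch of binary visible vectors v0."""
    global W, a, b
    # Up-pass: hidden probabilities and a sample, given the data
    ph0 = sigmoid(v0 @ W + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Down-pass: reconstruct the visible layer, then one more up-pass
    pv1 = sigmoid(h0 @ W.T + a)
    ph1 = sigmoid(pv1 @ W + b)
    # Push data configurations toward lower energy than reconstructions
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / len(v0)
    a += lr * (v0 - pv1).mean(axis=0)
    b += lr * (ph0 - ph1).mean(axis=0)

# Unsupervised: no labels anywhere. Stack the next RBM on sigmoid(v @ W + b),
# then fine-tune the whole stack with backpropagation.
batch = (rng.random((32, n_visible)) < 0.5).astype(float)  # dummy binary data
for _ in range(10):
    cd1_step(batch)
```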

 

 

NN์ด ์—ญ์‚ฌ์ ์œผ๋กœ ๋ช‡ ๋ฒˆ ์”ฉ ์ฃฝ์–ด์žˆ๋‹ค๊ฐ€,

 

2006๋…„์— ์ด DBN์œผ๋กœ SVM์„ ํ•œ ๋ฒˆ ์ด๊ธฐ๊ณ ,

 

2011๋…„์— ReLU๊ฐ€ ๋‚˜์˜ค๋ฉฐ Vanishing gradient๊ฐ€ ๋งŽ์ด ํ•ด๊ฒฐ๋˜์–ด ์ญ‰ ์ด๊ธฐ๊ณ  ์žˆ๋Š” ์ƒํ™ฉ์ž…๋‹ˆ๋‹ค.

 

์ด DBN์€ deep learning ์—ญ์‚ฌ์ƒ ํฐ ์˜๋ฏธ๋ฅผ ๊ฐ€์ง‘๋‹ˆ๋‹ค.

 

 

๊ทธ๋Ÿฐ๋ฐ ์ด DBN์„ ์‚ฌ์šฉํ•˜๋Š” ๋ฐ ์žˆ์–ด, RBM์„ ์‚ฌ์šฉํ•ด์•ผํ•˜๋Š” ๊ฒŒ ๋„ˆ๋ฌด๋‚˜๋„ ํฌ๊ธฐ ๋•Œ๋ฌธ์— ์ตœ๊ทผ์—๋Š” ์‚ฌ์šฉํ•˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค.

 

์ด๊ฒƒ์„ ๋‹ค ํ™•๋ฅ ๋กœ fitting์„ ํ•ด์•ผํ•˜๊ธฐ ๋•Œ๋ฌธ์— ์‹œ๊ฐ„์ด ๋งค์šฐ ์˜ค๋ž˜๊ฑธ๋ฆฝ๋‹ˆ๋‹ค.

 

 

์—ญ์‚ฌ์ ์ธ ๊ด€์ ์—์„œ, ์—ฌ๊ธฐ์„œ vanishing gradient๋ฅผ ์กฐ๊ธˆ ํ•ด๊ฒฐํ•˜๋ฉด ์ด depth๋ฅผ ํฌ๊ฒŒ ๊ฐ€์ ธ๊ฐ„ NN์ด SVM๋„ ์ด๊ธด๋‹ค๋Š” ๊ฐ€๋Šฅ์„ฑ์„ ๋ณธ ๊ฒƒ์ž…๋‹ˆ๋‹ค.

 

์ด๋Ÿฌํ•œ insight๋ฅผ ์ฃผ์—ˆ๊ธฐ ๋•Œ๋ฌธ์— ์ง€๊ธˆ์˜ ๋”ฅ๋Ÿฌ๋‹์ด ์žˆ๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค.

 

์–ด์งธํŠผ ๊ฒฐ๊ตญ ์ด DBN์€ unsupervised pre-training์„ ํ–ˆ์Šต๋‹ˆ๋‹ค.

 

y ๊ฐ’ ์—†์ด ์œ ์‚ฌ๋„๋ฅผ ๊ฐ€์ง€๊ณ  unsupervised pre-training์„ ํ•ฉ๋‹ˆ๋‹ค.

 

 

Variational Auto-Encoder

๊ทธ ๋‹ค์Œ ๋ง์”€ ๋“œ๋ฆด ๋ชจ๋ธ์€ VAE์ž…๋‹ˆ๋‹ค.

 

 

 

์ด ๋ชจ๋ธ์€ auto encoder์˜ ๊ตฌ์กฐ๋ฅผ ํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.

 

๊ทธ๋Ÿฐ๋ฐ, auto encoder์—๋Š” latent vector๊ฐ€ ๋‚˜์˜ค์ฃ .

 

latent vector๋Š” code๋ผ๊ณ ๋„ ๋ถ€๋ฅด๋Š” ์ˆจ๊ฒจ์ง„, ํ•จ์˜์˜ vector์ด๋ฉฐ ํ•ต์‹ฌ์„ ์„ค๋ช…ํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค.

 

 

๊ทธ๋Ÿฐ๋ฐ ์ผ๋ฐ˜์ ์ธ auto encoder ๊ตฌ์กฐ์™€ ๋‹ฌ๋ฆฌ latent vector๋กœ๋ถ€ํ„ฐ representation์„ ๋ฝ‘์•„๋‚ด๋Š” ๊ฒƒ์ด ์•„๋‹Œ,

 

ํ™•๋ฅ  ๋ถ„ํฌ๋ฅผ ๊ฐ€์ •ํ•˜๊ณ , ํ™•๋ฅ  ๋ถ„ํฌ์˜ parameter ๊ฐ’์„ ์ฐพ์•„๋ƒ…๋‹ˆ๋‹ค.

 

์œ„ ๊ทธ๋ฆผ์œผ๋กœ ๋ดค์„ ๋•Œ, mean variance๊ฐ€ 2์„ธํŠธ๊ฐ€ ์กด์žฌํ•ฉ๋‹ˆ๋‹ค.

 

์ด๊ฒƒ์€ latent vector์˜ latent dimension์ด 2๋ผ๋Š” ๊ฒƒ์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค.

์ด๊ฒƒ์€ ๊ณง,

 

- Z₁ ~ N(μ₁, σ₁²)

- Z₂ ~ N(μ₂, σ₂²)
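A minimal sketch of such an encoder in PyTorch (the 784-dimensional input and hidden width of 256 are illustrative assumptions):

```python
# VAE encoder with latent dimension 2: one (mu, log-variance) pair per
# latent unit, so Z_k ~ N(mu_k, sigma_k^2) for k = 1, 2.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, in_dim=784, hidden=256, latent=2):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent)      # mean vector (mu_1, mu_2)
        self.logvar = nn.Linear(hidden, latent)  # log sigma^2, for stability

    def forward(self, x):
        h = self.body(x)
        return self.mu(h), self.logvar(h)

mu, logvar = Encoder()(torch.randn(1, 784))
```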

 

In an autoencoder, what we want to compress is truly the information itself. In a VAE, what we want to condense into the code is the distribution of the latent.

So if the model trains well and the mean and variance vectors are well estimated, the decoder side can generate the latent values z₁ and z₂ in unlimited quantity. And if training has gone well, generation can work well even when training data is somewhat scarce.

We also gain an understanding of the latent. In the end, what we are looking for is a latent representation: where the autoencoder only tells us the compressed values, the VAE lets us understand the latent space itself, that is, what it looks like.

In fact, if training succeeds, we can throw away the encoder and produce samples endlessly from the decoder alone: the latent vector itself condenses the information in the training data, so once we know its probability distribution, we can generate data without limit.

This VAE, introduced in 2014, is extremely powerful when it works well.

In the basic autoencoder form, a 28 x 28 input means squeezing the 784 input dimensions down to 10 and then extracting the representation again.

๊ทธ๋Ÿฐ๋ฐ, VAE๋ผ๋ฉด,

์œ„์™€ ๊ฐ™์ด mean๊ณผ sigma๋ฅผ µ_1, ∂^2_1 ~ µ_10, ∂^2_10์„ ๋งŒ๋“ญ๋‹ˆ๋‹ค.

 

๊ทธ๋ฆฌ๊ณ , autoencoder์˜ ์ „์ฒด ๋ชจ๋ธ์„ ๊ฐ€์ง€๊ณ  ์ถ”์ •์ด ์ž˜ ๋œ๋‹ค๋ฉด,

 

์ด sampled latent vector๋Š” ๋ฌดํ•œํžˆ ๋งŽ์ด ๋ฝ‘์•„๋‚ผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

 

๊ทธ๋ž˜์„œ encoder๋ฅผ ๋ฒ„๋ฆฌ๊ณ  ๊ทธ ๋’ค์— ์šฐ๋ฆฌ์˜ original data๋ฅผ ๊ต‰์žฅํžˆ ์ž˜ ์„ค๋ช…ํ•˜๊ณ  ์žˆ๋Š” latent vector๋กœ MLP, CNN, SVM ๋“ฑ์˜ task๋“ค์ด ๊ฐ€๋Šฅํ•ด์ง‘๋‹ˆ๋‹ค.
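A minimal sketch of that generation step (the decoder below is an untrained stand-in with assumed sizes; in practice you would load your trained one):

```python
# Keep only the decoder after training: sample z from the prior and decode.
import torch
import torch.nn as nn

latent_dim = 10
decoder = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, 784), nn.Sigmoid(),  # e.g. 28 x 28 pixel intensities
)

z = torch.randn(1000, latent_dim)   # unlimited samples from N(0, I)
with torch.no_grad():
    generated = decoder(z)          # 1000 brand-new 784-dim data points

# The latent codes z are compact, information-rich features, so they can
# also feed a downstream MLP / CNN / SVM.
```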

 

 

This VAE requires probability-distribution estimation, and that is its weakness. With MCMC, Gibbs sampling, MLE, and the rest, it is an enormously heavy model; any model becomes very heavy once distribution estimation enters the picture. Compared with that, backpropagation is remarkably simple.

And for distribution estimation, the difficulty starts with the very first question: how do we sample at all?

Source: https://theclevermachine.wordpress.com/2012/11/05/mcmc-the-gibbs-sampler/

Sampling a multivariate space of very many dimensions so that the samples cover all of it is extremely hard. In such a structure, even drawing a single random sample becomes difficult, let alone drawing enough random samples to explain the entire space.
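To make that concrete, here is a toy Gibbs sampler for just two dimensions (in the spirit of the linked post, not taken from it); covering a space of many more dimensions this way quickly becomes impractical:

```python
# Gibbs sampling from a standard bivariate normal with correlation rho:
# alternately sample each coordinate from its conditional given the other.
import numpy as np

rng = np.random.default_rng(1)
rho, n = 0.8, 5000
x = y = 0.0
samples = np.empty((n, 2))
for t in range(n):
    # For this target, x | y ~ N(rho * y, 1 - rho^2), and symmetrically for y.
    x = rng.normal(rho * y, np.sqrt(1 - rho ** 2))
    y = rng.normal(rho * x, np.sqrt(1 - rho ** 2))
    samples[t] = (x, y)

print(samples.mean(axis=0))          # approximately (0, 0)
print(np.corrcoef(samples.T)[0, 1])  # approximately rho
```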

 

 

Reparameterization Trick

 

 

 

[Figure: on the left, the standard VAE forward pass with an explicit sampling node; on the right, the practical VAE using the reparameterization trick.]

Looking at this figure, the structure on the left is the general form of the VAE.

 

The input goes into the VAE; the encoder produces the mean and variance; from those we sample; the decoder then produces the reconstruction; and finally we compute the reconstruction error. Only once all of this is computed can backpropagation flow round and round.

 

๊ทธ๋Ÿฐ๋ฐ, mean๊ณผ variance ๊ฐ€์ง€๊ณ  ์œ„ ๊ทธ๋ฆผ์˜ ํ™”์‚ดํ‘œ๋ฅผ ํ†ตํ•ด samplingํ•œ ๊ฒƒ์— ๋Œ€ํ•ด ์–ด๋–ป๊ฒŒ backpropagationํ•  ๊ฒƒ์ธ๊ฐ€์— ๋Œ€ํ•œ ๊ณ ๋ฏผ์ด ์ƒ๊น๋‹ˆ๋‹ค.

 

๋งค์šฐ ๋‚œํ•ดํ•ฉ๋‹ˆ๋‹ค.

 

 

So the right side is the practical VAE, the one actually used.

It carries the mean and variance along as plain, deterministic values and uses them as-is. The sample is then formed by simply adding in scaled noise, which is what lets backpropagation flow.

If we had instead drawn a sample, say 0.1, directly from N(0, 1), that sampling process itself would put a block in the way of backpropagation.

So, for mean 0 and variance 1, we write the sample in the form 0 + 1 * (noise); in general, z = μ + σ · ε with ε ~ N(0, 1). This noise ε is where the sampling lives. The sampling step has thus been pulled out of the gradient chain entirely.

So, to actually implement a VAE, backpropagation is made to work by way of this Reparameterization Trick, as in the sketch below.
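A minimal sketch of the trick itself (hypothetical 2-dimensional values): mu and sigma stay deterministic nodes in the graph, and the randomness enters only through eps, which needs no gradient.

```python
# Reparameterization trick: z = mu + sigma * eps keeps mu and logvar
# differentiable, while the random draw lives outside the gradient chain.
import torch

mu = torch.zeros(2, requires_grad=True)      # encoder output (mean)
logvar = torch.zeros(2, requires_grad=True)  # encoder output (log sigma^2)

eps = torch.randn(2)                         # the sampling happens here
z = mu + torch.exp(0.5 * logvar) * eps       # z = mu + sigma * eps

loss = (z ** 2).sum()        # stand-in for the real reconstruction error
loss.backward()
print(mu.grad, logvar.grad)  # gradients reach mu and logvar just fine
```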

 

 



 
