[Seq2Seq with Attention for Natural Language Understanding and Generation] Part 4 - 1

2023. 1. 25. 01:22
🧑🏻‍💻 Terminology

Seq2Seq
Recurrent Neural Networks (RNNs)
Unrolled Illustration

 

Recurrent Neural Networks (RNNs)

  • Along with the CNN, the RNN is a neural network with a specific structure.
  • It is specialized for sequence data.
  • Its defining feature is that the same function is called repeatedly.
  • Suppose a changing input arrives sequentially, and call the input signal at time step t x_t. The RNN function (that is, the neural network layer) takes two inputs: the current input x_t and the hidden state vector h_{t-1} that the same RNN function computed at the previous time step. From these it produces its output h_t, the current hidden state vector.
  • At every time step the same function, i.e. a layer with the same parameter set, is applied repeatedly.
  • At time steps where a prediction is needed, h_t is in turn passed to an output layer to produce the final prediction.

Source: https://www.yuthon.com/post/tutorials/notes-for-cs231n-rnn/
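
The recurrence described in the list above can be written compactly as follows. The symbols f_W (the shared RNN function with parameters W) and W_{hy} (the output layer weights) are notation assumed here for illustration; the post itself does not name them.

```latex
h_t = f_W(h_{t-1},\, x_t), \qquad y_t = W_{hy}\, h_t \quad \text{(only at time steps where a prediction is needed)}
```

The same f_W, with the same parameters, is used at every time step t.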

 

  • Let's look at the unrolled illustration of an RNN.
  • The figure below shows the unrolled version, with the RNN spread out over several time steps.

 

Source: https://www.researchgate.net/figure/An-unrolled-recurrent-neural-network-Example-borrowed-from-Olah-2015-13_fig1_304470393

 

 

  • For any t, the RNN takes an N-dimensional input vector and passes it through a fully-connected layer, producing a vector that serves as an internal intermediate result.
  • The RNN's other input is the previous hidden state vector (h2 in this example). If h2 is, say, 2-dimensional, this 2-dimensional vector is also passed through a fully-connected layer, and the two resulting vectors are summed to give the intermediate result.
  • The tanh activation, the usual choice for a vanilla RNN, is applied to this intermediate result to produce the output vector for the current time step. The previous and current hidden state vectors must have the same dimension, so that the current one can take the place of the previous hidden state vector at the next time step.
  • When a prediction is needed, the RNN output is used as the input to another linear transformation, which yields the final output vector.
  • For multi-class classification, a softmax layer is applied to the final output vector to convert it into a probability distribution.
  • For a regression task, the final real value is used directly as the prediction.
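
The steps in the list above can be sketched in NumPy. This is a minimal illustration, not the post's own code: the weight names (W_xh, W_hh, W_hy), the bias terms, the random initialization, and all sizes are assumptions chosen only for the example.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One vanilla RNN step: h_t = tanh(W_xh x_t + W_hh h_prev + b_h)."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
input_dim, hidden_dim, num_classes = 3, 2, 4   # illustrative sizes (assumed)

# Two fully-connected maps (input and previous hidden state) plus an output layer.
W_xh = rng.normal(size=(hidden_dim, input_dim))
W_hh = rng.normal(size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)
W_hy = rng.normal(size=(num_classes, hidden_dim))
b_y = np.zeros(num_classes)

h = np.zeros(hidden_dim)                  # initial hidden state
xs = rng.normal(size=(5, input_dim))      # a sequence of 5 input vectors
for x_t in xs:
    h = rnn_step(x_t, h, W_xh, W_hh, b_h) # same parameters at every step

# Output layer + softmax for a multi-class prediction at the last step.
probs = softmax(W_hy @ h + b_y)
```

Because h keeps the shape (hidden_dim,) after every step, it can be fed straight back in as h_prev, which is the dimension constraint mentioned above.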

 


  • one-to-one

Consider a setting with no notion of time steps at all: each data item is received independently, one at a time, and a prediction is produced for it. The input is given only at time step 1 and the output follows immediately. This is called the one-to-one form.

 

-> Vanilla Neural Networks

 


  • one-to-many

The input is not sequence data: it is given only at time step 1, while the output is a sequence of predictions generated over multiple time steps. A task with a single input and an output sequence of several elements is called one-to-many.

 

-> Image Captioning (image -> sequence of words)

 

- The RNN module predicts, as its output, the words that describe the image in a particular sequence.

In this RNN task, a zero-filled vector is given as the input at the subsequent time steps.

Each time step then produces a word, and together the words form a sentence.
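
A minimal one-to-many sketch of the description above: the input is given only at the first time step, and zero vectors are fed afterwards. The image feature, vocabulary size, weights, and step count are random stand-ins assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
feat_dim, hidden_dim, vocab_size, num_steps = 4, 3, 5, 6  # assumed sizes

W_xh = rng.normal(size=(hidden_dim, feat_dim))
W_hh = rng.normal(size=(hidden_dim, hidden_dim))
W_hy = rng.normal(size=(vocab_size, hidden_dim))

image_feature = rng.normal(size=feat_dim)  # stand-in for an image feature vector

h = np.zeros(hidden_dim)
word_ids = []
for t in range(num_steps):
    # Real input only at the first step; zero-filled vectors afterwards.
    x_t = image_feature if t == 0 else np.zeros(feat_dim)
    h = np.tanh(W_xh @ x_t + W_hh @ h)
    word_ids.append(int(np.argmax(W_hy @ h)))  # one word index per time step
```

Here word_ids plays the role of the generated sentence. Many captioning models instead feed the previously generated word back in as the next input; the zero-input version follows the description in this post.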


  • many-to-one

Inputs arrive over multiple time steps, and a single output is produced at one time step.

 

-> Sentiment Classification (Sequence of words -> sentiment)

The model reads a sentence and decides whether it is "positive" or "negative".
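
A many-to-one sketch under the same assumptions as before (random weights, random vectors standing in for word embeddings, and a hypothetical output vector w_out): the whole sentence is read first, and only the last hidden state is used for the prediction.

```python
import numpy as np

rng = np.random.default_rng(0)
embed_dim, hidden_dim = 4, 3   # assumed sizes

W_xh = rng.normal(size=(hidden_dim, embed_dim))
W_hh = rng.normal(size=(hidden_dim, hidden_dim))
w_out = rng.normal(size=hidden_dim)   # hypothetical output weights

sentence = rng.normal(size=(7, embed_dim))  # 7 word vectors (random stand-ins)

h = np.zeros(hidden_dim)
for x_t in sentence:                  # read every word, emit nothing yet
    h = np.tanh(W_xh @ x_t + W_hh @ h)

score = 1.0 / (1.0 + np.exp(-(w_out @ h)))   # sigmoid on the final state
label = "positive" if score >= 0.5 else "negative"
```

With untrained random weights the label is meaningless; the point is only where in the sequence the single output is produced.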


  • many-to-many

 

Multiple inputs produce multiple outputs, so multiple time steps are required.

 

-> Machine Translation (Sequence of words -> sequence of words)
e.g., English -> Korean translation

 


  • many-to-many

A variant of the above, used when a prediction must be produced immediately at each step.

 

-> Video Classification on Frame Level

No delay is allowed: a prediction is made at every time step.
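
A sketch of this per-step many-to-many variant, with random vectors standing in for frame features and assumed sizes and weight names: a class is predicted as soon as each frame arrives, without waiting for the rest of the sequence.

```python
import numpy as np

rng = np.random.default_rng(0)
feat_dim, hidden_dim, num_classes = 4, 3, 5   # assumed sizes

W_xh = rng.normal(size=(hidden_dim, feat_dim))
W_hh = rng.normal(size=(hidden_dim, hidden_dim))
W_hy = rng.normal(size=(num_classes, hidden_dim))

frames = rng.normal(size=(8, feat_dim))   # 8 frame features (random stand-ins)

h = np.zeros(hidden_dim)
frame_labels = []
for x_t in frames:
    h = np.tanh(W_xh @ x_t + W_hh @ h)
    # Predict immediately at this time step, with no delay.
    frame_labels.append(int(np.argmax(W_hy @ h)))
```

The result has exactly one label per input frame, in contrast to the machine translation case, where the output sequence can only start after the input has been read.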

 

 

 

 

 

 

 

 
