[NLP] RNN - LSTM, GRU

2023. 4. 4. 15:05
๐Ÿง‘๐Ÿป‍๐Ÿ’ป์šฉ์–ด ์ •๋ฆฌ

Neural Networks
RNN
LSTM
Attention


RNN์€ sequence ์ •๋ณด๋ฅผ ํ•™์Šตํ•˜๊ณ ์ž ํ•˜๋Š”๋ฐ,

 

๋’ค๋กœ ๊ฐˆ์ˆ˜๋ก ์•ž์— ํ•™์Šตํ•œ ๊ฒƒ๋“ค์„ ์ž˜ ์žŠ์–ด๋ฒ„๋ฆฌ๋Š” "Long Term Dependency" ๋ฌธ์ œ๊ฐ€ ์กด์žฌํ•ฉ๋‹ˆ๋‹ค.

 

๊ทธ๋ž˜์„œ LSTM๊ณผ GRU๊ฐ€ ๋‚˜์™”์Šต๋‹ˆ๋‹ค.

 

๋‹ค์Œ๊ณผ ๊ฐ™์ด ์‚ดํŽด๋ด…์‹œ๋‹ค.

 

Long Short-Term Memory (LSTM)


  • Capable of learning long-term dependencies.
    • LSTM networks introduce a new structure called a memory cell.
      • An LSTM can learn to bridge time intervals in excess of 1,000 steps.
    • Gate units learn to open and close access to the past:
      • input gate
      • forget gate
      • output gate
      • a neuron with a self-recurrent connection


์ž ์—ฌ๊ธฐ์„œ ๋” ์‚ดํŽด๋ด…์‹œ๋‹ค.

 

์—ฌ๋Ÿฌ ๊ฐ€์ง€ ๋ฐฉ๋ฒ•๋“ค์ด ์ถ”๊ฐ€๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

 

์—ฌ๊ธฐ์„œ ๊ณ„์‚ฐ๋œ ๊ฒƒ์ด, ํ˜„์žฌ ๋‹จ์–ด๋ณด๋‹ค ๋” ์ค‘์š”ํ•œ, ๋” ๋งŽ์ด ๋ฐ˜์˜ํ•ด์•ผํ•˜๋Š” ๋ถ€๋ถ„์ด ์žˆ๋‹ค๋ฉด ๊ทธ๊ฒƒ์„ ๋” ๋ฐ˜์˜ํ•ด๋ณด์ž !

 

๋ผ๋Š” ์•„์ด๋””์–ด์—์„œ gate๋“ค์ด ์ถ”๊ฐ€๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

 

  • The input gate:
    • when a new input comes in, how much of this current value should be reflected?
  • The forget gate:
    • should s_{t-1} be carried over as-is, or only to some degree?
  • The output gate:
    • how much of the finally computed value should actually be used? (See the sketch right after this list.)
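To make the gates concrete, here is a minimal NumPy sketch of a single LSTM step, in the s/c notation used below (s for hidden state, c for internal memory). The weight names (W_i, U_i, and so on) and the parameter dict p are illustrative assumptions, and biases are omitted for brevity:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, s_prev, c_prev, p):
    # Each gate is a sigmoid over the current input and the previous hidden
    # state, so it lies in (0, 1): near 1 = "let it through", near 0 = "block it".
    i = sigmoid(p["W_i"] @ x_t + p["U_i"] @ s_prev)   # input gate
    f = sigmoid(p["W_f"] @ x_t + p["U_f"] @ s_prev)   # forget gate
    o = sigmoid(p["W_o"] @ x_t + p["U_o"] @ s_prev)   # output gate
    g = np.tanh(p["W_g"] @ x_t + p["U_g"] @ s_prev)   # candidate, in (-1, 1)
    c = f * c_prev + i * g    # internal memory: a simple gated addition
    s = o * np.tanh(c)        # hidden state: output-gated view of the memory
    return s, c

Note how c is updated by plain elementwise addition; that detail is what the rest of this post keeps coming back to.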

 

 

The difference from a plain RNN is that there are two states: the internal memory c_t and the hidden state s_t.

Both behave like hidden state, but s_t is produced by the complex machinery: forget what should be forgotten, take in whatever is new.

c_t, on the other hand, is updated by a simple operation whose whole job is to remember the past well.

 

 

RNN -> when an input arrives, it is simply computed together with the previous hidden state.
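For contrast with the LSTM sketch above, a vanilla RNN step is a single squashed affine map (weight names again illustrative):

import numpy as np

def rnn_step(x_t, s_prev, W, U, b):
    # The entire recurrence: one tanh over input + previous hidden state.
    # Whatever s_prev carried gets re-squashed at every step, which is
    # why information from far back fades away.
    return np.tanh(W @ x_t + U @ s_prev + b)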

 

LSTM => computing g is the same kind of operation, but here it is only one intermediate step toward the hidden state.

 

g_t is computed here and then added to c_{t-1}. Because this is plain addition, the same simple additive step appears again when c_{t+1} is computed, and so on down the chain.

 

๊ทธ๋Ÿฐ๋ฐ, ์—ฌ๊ธฐ์„œ ์กฐ๊ธˆ ๋‹ฌ๋ผ์ง‘๋‹ˆ๋‹ค.


Enter f_t, which computes how much of the past to forget.

 

๊ทธ๋Ÿฐ๋ฐ, g t๋Š” hidden state์ด๊ณ , i, f, o ๋Š” sigmoid๋กœ ๊ณ„์‚ฐ๋œ ๊ฐ’์ด๊ธฐ ๋•Œ๋ฌธ์—, -1 ~ 1 ์‚ฌ์ด์˜ ๊ฐ’์ด ๋ฉ๋‹ˆ๋‹ค.

 

Now, coming back to the diagram:

 

This value f is multiplied elementwise with the previous internal memory c_{t-1}, and the product goes into the sum.

 

Since f_t lies between 0 and 1, this has a clean interpretation:

 

  • f_t -> 1 => remember
  • f_t -> 0 => forget

forget์ด ์ž˜ ๋˜์–ด์•ผ ํ•œ๋‹ค๊ณ  ํ•˜๋ฉด, ์žŠ์œผ๋ ค๊ณ  ํ•˜๋ฉด 0์— ๊ฐ€๊น๊ฒŒ, ๊ธฐ์–ตํ•ด์•ผํ•˜๋ฉด 1์— ๊ฐ€๊น๊ฒŒ ๊ณฑํ•ด์ฃผ๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค.

 

During backpropagation, the weights U and W change according to the loss, and with them the forget gate's value changes.

 

 

g_t is the current hidden-state candidate, computed from h_{t-1} and x_t; the input gate then decides how much of it to reflect. It is multiplied by that gate and added into the memory.

 

๊ทธ๋Ÿฐ๋ฐ, ์—ฌ๊ธฐ์„œ ์ตœ์ข…์ ์œผ๋กœ ๊ตฌํ•ด์ง„ c t๋ฅผ ๊ฐ€์ง€๊ณ , ๋ฐ”๋กœ ์“ฐ์ง€ ์•Š๊ณ , c t๋ฅผ ๊ฐ€์ง€๊ณ  ์ด๊ฒƒ์„ ์–ผ๋งˆ๋‚˜ ์จ์•ผํ•˜๋‚˜๋ฅผ ๋ณด๋Š” ๊ฒƒ์ด o t์ž…๋‹ˆ๋‹ค.

 

์ด ๊ฐ’๋„ 0~ 1์˜ ๊ฐ’์ด๋ฏ€๋กœ ์ด๊ฒƒ์ด ๊ณฑํ•ด์ ธ, h t๋ฅผ ์™„์„ฑ์‹œํ‚ต๋‹ˆ๋‹ค.

 

 

์œ„์™€ ๊ฐ™์ด LSTM์ด ์ƒ๊ฒผ์Šต๋‹ˆ๋‹ค.

 

 

Before moving on, let's revisit why we use it.

 

Why LSTM?

 

-> It is enough to remember that its long-term dependency handling is better than an RNN's.

 

-> If an RNN is not performing well, an LSTM is worth trying.

 

 

It is similar to an RNN, but it keeps track of both c_t and s_t: c_t is the internal memory and s_t is the hidden state, and each is updated in its own way.

 

 

  • c_t tracks how much of the previous state to forget and how much of the new content to remember.

 

  • s_t is the final result, with the output gate deciding how much of it is reflected.

 

Long Term์„ ์ž˜ ํ•˜๋Š” ์ด์œ ๋Š” ๋ฌด์—‡์ผ๊นŒ์š”?

 

 

์œ„์™€ ๊ฐ™์ด Backpropagation์„ ์ง„ํ–‰ํ•˜๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค !

 

 

Let's compare this with ResNet.


For stable backpropagation, a residual network's skip connections pass the gradient directly along to the layers they connect, so the gradient can still be computed well even as the layers get deep. The LSTM's additive cell-state path plays the same role across time steps.

 

GRU๋„ ๋น„์Šทํ•œ ๊ฐœ๋…์œผ๋กœ, reset gate์™€ update gate๊ฐ€ ์กด์žฌํ•ฉ๋‹ˆ๋‹ค.

 

What the two share, in the end, is a focus on how much of the current state to update; a sketch of one GRU step follows below.
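The following minimal sketch assumes the common formulation (update gate z, reset gate r); weight names are illustrative, biases are omitted, and conventions differ on whether z or 1-z scales the old state:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, s_prev, p):
    z = sigmoid(p["W_z"] @ x_t + p["U_z"] @ s_prev)        # update gate: how much to renew
    r = sigmoid(p["W_r"] @ x_t + p["U_r"] @ s_prev)        # reset gate: how much past to expose
    g = np.tanh(p["W_g"] @ x_t + p["U_g"] @ (r * s_prev))  # candidate state
    return (1 - z) * s_prev + z * g                        # blend old state and candidate

With only two gates and no separate memory cell, the GRU is a lighter alternative that often performs comparably.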


In the end, an RNN performs the same computation at every step and keeps forgetting what came earlier,

 

while GRU and LSTM both gate less important information toward 0 and important information toward 1, so the important parts are remembered better.


One last note: every sequence model needs a limit on sequence length.

 

In code, a max length is fixed in advance; longer inputs are cut off and shorter ones padded.
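As a sketch, a hypothetical preprocessing helper that enforces max_length over lists of token ids by truncating and right-padding:

def pad_or_truncate(token_ids, max_length, pad_id=0):
    # Clip anything longer than max_length, then right-pad with pad_id
    # so every example in a batch ends up the same length.
    clipped = token_ids[:max_length]
    return clipped + [pad_id] * (max_length - len(clipped))

print(pad_or_truncate([5, 7, 9], 5))           # [5, 7, 9, 0, 0]
print(pad_or_truncate([1, 2, 3, 4, 5, 6], 5))  # [1, 2, 3, 4, 5]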

