[NLP] RNN

2023. 4. 4. 10:07
๐Ÿง‘๐Ÿป‍๐Ÿ’ป์šฉ์–ด ์ •๋ฆฌ

Neural Networks
RNN
LSTM
Attention

 

 

RNN์ด ์™œ ๋‚˜์™”์„๊นŒ์š”?

 

์˜ค๋Š˜์€ ๊ทธ ์ด์œ ์— ๋Œ€ํ•ด ์‚ดํŽด๋ณด๋„๋ก ํ•ฉ์‹œ๋‹ค.

 

ํ•œ ๋งˆ๋””๋กœ ์ •๋ฆฌํ•˜์ž๋ฉด,

 

Sequential Data๋ฅผ ๋‹ค๋ฃจ๊ธฐ ์œ„ํ•ด์„œ ๋‚˜์˜จ ๊ฒƒ์ด RNN์ž…๋‹ˆ๋‹ค.

 

text Datas๋Š” ๋ณดํ†ต sequential Data๋กœ ์ด๋ฃจ์–ด์ ธ ์žˆ์Šต๋‹ˆ๋‹ค.

 

์ด๋ฅผ ๋” ์ž˜ ํ•™์Šต์‹œํ‚ค๊ธฐ ์œ„ํ•œ RNN ๊ธฐ๋ฐ˜์˜ ๊ตฌ์กฐ๊ฐ€ ๋“ฑ์žฅํ–ˆ์Šต๋‹ˆ๋‹ค.

 

๋ฌผ๋ก  CNN๋„ local ํ•˜๊ฒŒ Convolution์„ ํ•™์Šตํ•˜๋Š” ๊ฒƒ์ด ๊ฐ€๋Šฅํ•˜์ง€๋งŒ, RNN ๋งŒํผ์˜ ์„ฑ๋Šฅ์ด ๋‚˜์˜ค์ง„ ์•Š์•˜์Šต๋‹ˆ๋‹ค.

 

 

RNN (Recurrent Neural Network)

 

Sequential Data Modeling

  • Sequential Data
    • Most data are sequential
    • Speech, text, images, ...
  • Deep Learning for Sequential Data
    • Convolutional Neural Networks (CNN)
      • Try to find local features from a sequence
    • Recurrent Neural Networks: LSTM, GRU
      • Try to capture features of the past

In the figure above, you can see extra values added to the usual NN structure.

This form is what an RNN looks like.

Input, hidden, and output layers exist just as in an ordinary NN,

but the key idea of the RNN is that the previous representation is fed in along with the current input.

 

 

First, let's look at the form of text.

In text, each word is represented as a vector.

We call this a one-hot vector.
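As a quick illustration, here is a minimal sketch of one-hot vectors over a made-up five-word vocabulary (the words and their indices are assumptions for the example, not anything fixed):

```python
import numpy as np

# Toy vocabulary; the words and their order are made up for illustration.
vocab = ["The", "cat", "sat", "on", "mat"]
word_to_idx = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """Return a vector of zeros with a single 1 at the word's index."""
    v = np.zeros(len(vocab))
    v[word_to_idx[word]] = 1.0
    return v

print(one_hot("The"))  # [1. 0. 0. 0. 0.]
```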

 

 

Whatever first enters the hidden layer is stored, reused when the next input arrives, and also used when computing the output, just as in an ordinary NN.

In other words, when the current input is processed, the values of the previous inputs influence the computation.

 

 

Source: https://wikidocs.net/22888

 

The influence propagates through the structure shown above.

 

 

To take an example:

when the word "cat" is processed, the previous word "The" is computed along with it and reflected in the hidden state.

The next word, "sat", is then computed not from its own input alone, but together with everything learned from the previous words, as sketched in the chain below.
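Writing f for the hidden-state update and h_0 for the initial state (symbols chosen here just for illustration), the chain for "The cat sat" looks like:

h_1 = f(x_The, h_0)   ("The" combined with the initial state)
h_2 = f(x_cat, h_1)   ("cat" combined with everything stored in h_1)
h_3 = f(x_sat, h_2)   ("sat" combined with everything learned so far)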

 

This recurrently repeating structure is called a Recurrent Neural Network.

This is the basic building block of the RNN.

 

์–ด๋Š์ •๋„ ์ดํ•ด๊ฐ€ ๋˜์…จ๋‚˜์š”?

 

๊ทธ๋Ÿผ ๋‹ค์Œ ๊ทธ๋ฆผ๋„ ๋ณด์‹œ์ฃ  !

 

 

 

 

๊ทธ๋ ‡๋‹ค๋ฉด ์ง€๊ธˆ๊นŒ์ง€์˜ ๋ฌผ์„ ์œ„์™€ ๊ฐ™์€ ๊ทธ๋ฆผ ๊ตฌ์กฐ๋กœ ์ •๋ฆฌํ•ด๋ณผ ์ˆ˜ ์žˆ๊ฒ ์Šต๋‹ˆ๋‹ค.

 

 

 

And let's write it out as formulas, as in the figure above.

x_t is the input at time t, and h_t is the hidden state computed at step t.

The weight matrices U and W multiply the input x_t and the previous hidden state h_{t-1}, respectively, and their sum passes through an activation function f:

h_t = f(U x_t + W h_{t-1})

As described above, the hidden state is the weighted sum of the current input and of h_{t-1}, which summarizes all the inputs seen so far.

This value is then multiplied by yet another weight matrix (the output weight, written V here) and passed through a separate output activation function g:

y_t = g(V h_t)
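Here is a minimal NumPy sketch of this recurrence. The tanh and softmax choices for f and g, and the toy dimensions, are assumptions for illustration rather than anything fixed by the formulas above:

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim = 5, 8, 5   # toy sizes

U = rng.normal(scale=0.1, size=(hidden_dim, input_dim))   # input -> hidden
W = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # hidden -> hidden
V = rng.normal(scale=0.1, size=(output_dim, hidden_dim))  # hidden -> output

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_forward(xs):
    """Unroll the recurrence over a sequence of input vectors xs."""
    h = np.zeros(hidden_dim)          # h_0: initial hidden state
    hs, ys = [], []
    for x in xs:
        h = np.tanh(U @ x + W @ h)    # h_t = f(U x_t + W h_{t-1})
        y = softmax(V @ h)            # y_t = g(V h_t)
        hs.append(h)
        ys.append(y)
    return hs, ys

# Run it over three one-hot vectors, e.g. "The cat sat":
xs = [np.eye(input_dim)[i] for i in (0, 1, 2)]
hs, ys = rnn_forward(xs)
print(hs[-1].shape, ys[-1].shape)     # (8,) (5,)
```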

 

์ž…๋ ฅ์€ ๊ธฐ์กด์˜ NN๊ณผ ๋‹ค๋ฅด๊ฒŒ,

 

Sequence๊ฐ€ ์กด์žฌํ•˜๋ฏ€๋กœ, sequence์˜ ๊ธธ์ด๋งŒํผ ์กด์žฌํ•ฉ๋‹ˆ๋‹ค.

 

๊ทธ๋ฆฌ๊ณ  ์ด ๊ฐ๊ฐ์˜ sequence๋Š” One-Hot vector๋กœ ์ด๋ฃจ์–ด์ ธ ์žˆ์–ด, ์ด vecto์˜ ๊ธธ์ด๊ฐ€ ๊ฝค ํฌ๊ฒŒ 100000์˜ ๊ธธ์ด๊ฐ€ ๋  ์ˆ˜๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค.

 

์œ„ ๊ตฌ์กฐ์™€ ๊ฐ™์ด input์ด ์—ฌ๋Ÿฌ ์ฐจ์› ์กด์žฌํ•˜๊ณ , output๋„ ์—ฌ๋Ÿฌ ์ฐจ์› ์กด์žฌํ•˜๋Š” ๊ฒƒ์„ "Many-to-Many"๋ผ๊ณ  ํ•ฉ๋‹ˆ๋‹ค.

 

๊ตฌ์กฐ์— ํ•ด๋‹นํ•˜๋Š” ๊ฒƒ์€ ๋‚ด๊ฐ€ ํ•˜๊ณ ์žํ•˜๋Š” ๊ฒƒ์— ๋”ฐ๋ผ RNN๊ตฌ์กฐ๋ฅผ ์„ค์ •ํ•˜๋ฉด ๋ฉ๋‹ˆ๋‹ค.

 

 

In the figure above, h_3 is the hidden state that reflects the inputs x_1 through x_3.

The last hidden state, h_5, is called the embedding of the sequence;

for text, this is a sentence embedding.

 

 

Let's take a closer look at the input here.

 

The [ 1 0 0 0 0 ]

 

If we represent the word "The" as above,

we can also represent it differently using Word2vec or GloVe.

This alternative representation is a vector value that reflects a specific context,

and it can be seen as a pre-trained embedding.

Feeding these vectors in as inputs gives the model a more meaningful embedding to work with.

 

 

 

Long Term Dependency

 

h_t encodes all the information from x_1 up to x_t.

Because an RNN is trained through this recurrent structure, linked one step at a time, it has to carry long-term dependencies.

 

 

If the sequence is extremely long, will this information survive?

An RNN tries to learn the information in a sequence,

but the further it goes, the more it forgets what was learned earlier: the "long-term dependency" problem.

To address this, LSTM and GRU were introduced.
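In PyTorch, for example, they are drop-in replacements that share the plain RNN's interface; a quick sketch with the same toy sizes as before:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 5, 5)   # same toy input shape as before

lstm = nn.LSTM(input_size=5, hidden_size=8, batch_first=True)
out, (h_n, c_n) = lstm(x)  # the LSTM additionally carries a cell state c_n

gru = nn.GRU(input_size=5, hidden_size=8, batch_first=True)
out, h_n = gru(x)          # the GRU keeps the simpler single-state interface
```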
