[NLP] Overview NLP

2023. 3. 21. 15:09
๐Ÿง‘๐Ÿป‍๐Ÿ’ป ์ฃผ์š” ์ •๋ฆฌ
 
NLP
Word Embedding
Modeling Sequences
MLP
CNN
RNN
Language Modeling
Autoregressive Language Modeling
Machine Translation
Seq2Seq
Attention
Transformer

 

 

Background Knowledge

 

To learn NLP, you need to be familiar with the following building blocks.

 

Let's go through them together.

 

 

Word Embedding

Computers cannot understand human language on their own.

 

So we have to map these characters, these semantically meaningful words, onto numbers.

 

This process is called word embedding.

 

Each word can be represented as a vector, based on the distributional hypothesis.
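
To make this concrete, here is a minimal sketch (a toy example of my own, with a made-up vocabulary and random vectors, not the post's code) of mapping words to ids and looking them up in an embedding table:

```python
import numpy as np

# Toy vocabulary: each word gets an integer id (the "mapping to numbers")
vocab = {"i": 0, "like": 1, "natural": 2, "language": 3, "processing": 4}

# Randomly initialized embedding table: one 8-dimensional vector per word.
# In practice these vectors are learned (e.g. with Word2Vec) so that words used
# in similar contexts end up close together (distributional hypothesis).
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), 8))

sentence = ["i", "like", "natural", "language", "processing"]
ids = [vocab[w] for w in sentence]   # words -> integer ids
vectors = embedding_table[ids]       # ids -> dense vectors, shape (5, 8)
print(ids)
print(vectors.shape)
```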

 

 

 

Modeling Sequences

 

Every text, that is, every piece of natural language, comes with its own sentence length and its own word lengths.

 

So we need a way to model these sequences.

 

This is called modeling sequences.

 

Let's look at it in more detail.

 

MLP

First, what happens when the sequence length varies?

 

Using an MLP means committing to a fixed input length, which raises the following issues:

- We cannot reuse the same network to handle sequences of different lengths.

- This is because the network parameters are tied to the length of the sequences.

 

So which length should we pick?

 

1. ์ œ์ผ ์งง์€ ๊ฒƒ

 

2. ์ œ์ผ ๊ธด ๊ฒƒ. ๋‚จ์€ ๊ฒƒ์€ padding (๊ณต๋ฐฑ)

 

Which choice is better depends on the task.
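
As a small illustration of option 2 (the helper name pad_to_longest and the PAD_ID value are my own choices, not from the post), padding every sentence to the longest length so a fixed-size MLP input can be formed:

```python
import numpy as np

PAD_ID = 0  # id reserved for padding

def pad_to_longest(batch_of_id_lists):
    """Pad every sequence in the batch with PAD_ID up to the longest length."""
    max_len = max(len(seq) for seq in batch_of_id_lists)
    return np.array([seq + [PAD_ID] * (max_len - len(seq))
                     for seq in batch_of_id_lists])

batch = [[5, 2, 9], [7, 1], [4, 8, 3, 6]]
padded = pad_to_longest(batch)
print(padded)   # shape (3, 4): every row now has the same fixed length
# An MLP with a fixed input size can now consume the rows, but it stays
# forever tied to this one length (the issue described above).
```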

 

 

 

CNN

 

CNN์„ ์‚ฌ์šฉํ•˜์—ฌ sequence์˜ ๊ธธ์ด๋ฅผ ๋ฐ”๊พธ๋ฉด ์–ด๋–จ๊นŒ์š”?

 

We can reuse the same network by sliding convolution filters over the sequence.

 

CNN์€ Computer Vision์—์„œ์™€ ๊ฐ™์ด,

 

์ด๋ฅผ ํ…Œ๋ฉด 28 x 28 ํฌ๊ธฐ์˜ ๊ณ ์ •๋œ length๋Š” ์ž˜ ์ฒ˜๋ฆฌํ•˜์ง€๋งŒ, ๋ณ€ํ™”ํ•˜๋Š” ๊ฒƒ์€ ์ž˜ ์ฒ˜๋ฆฌํ•˜์ง€ ๋ชป ํ•ฉ๋‹ˆ๋‹ค.

 

CNN์„ ์‚ฌ์šฉํ•˜๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์€ issue๊ฐ€ ๋ฐœ์ƒํ•ฉ๋‹ˆ๋‹ค.

 

- The hidden representations grow with the length of the sequence.

- The receptive field is fixed (both points are illustrated in the sketch below).
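
A minimal PyTorch sketch (the sizes are made up by me) that shows both issues: the same Conv1d filters are reused on sentences of different lengths, but the output length grows with the input and each output position only ever sees a fixed 3-word window:

```python
import torch
import torch.nn as nn

# One shared Conv1d: 8-dim word vectors in, 16 feature maps out, window of 3 words.
conv = nn.Conv1d(in_channels=8, out_channels=16, kernel_size=3)

short = torch.randn(1, 8, 5)    # batch of 1 sentence, 8-dim embeddings, 5 words
long = torch.randn(1, 8, 12)    # same network reused on a 12-word sentence

print(conv(short).shape)   # torch.Size([1, 16, 3])  -> output length 5 - 3 + 1
print(conv(long).shape)    # torch.Size([1, 16, 10]) -> grows with the sequence
# Each output position only sees 3 consecutive words: a fixed receptive field.
```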

 

 

 

 

 

So what is the most effective approach?

 

What really matters in NLP is recursion, that is, recurrent computation.

 

 

Using recurrence brings the following advantages:

 

- A model that reads each input one at a time.

- Parameters can simply be reused (or shared) for each input in a recurrent computation.

 

This is where the RNN comes in.

 

RNN์„ ํ†ตํ•ด sequence pattern ๋ถ„์„์ด ๊ฐ€๋Šฅํ•˜๊ณ , sequence์˜ ๊ธธ์ด ๋งŒํผ ์ž…๋ ฅ์ด ๋“ค์–ด๊ฐ‘๋‹ˆ๋‹ค.

 

Source: https://omicro03.medium.com/자연어처리-nlp-11일차-rnn-299a48d16bf0

 

 

 

 

Language Modeling

 

A language model captures the distribution over all possible sentences

 

 

Given an input word, it predicts the next word and produces that prediction as output.

 

 

  • input : a sentence
  • output : the probability of the input sentence

Source: https://junklee.tistory.com/46

 

The fact that the output is a probability means we can score sentences.

 

In other words, we can now express text as numbers!

 

This matters a great deal when we want to generate sequences.
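
A toy sketch of that input/output contract (the vocabulary and all probabilities are invented purely for illustration): a sentence goes in, a single probability score comes out:

```python
# Toy bigram language model: P(next word | current word), numbers made up.
bigram_prob = {
    ("<s>", "i"): 0.4, ("i", "like"): 0.3, ("like", "nlp"): 0.2, ("nlp", "</s>"): 0.5,
    ("<s>", "nlp"): 0.1, ("i", "hate"): 0.05,
}

def sentence_probability(words):
    """Score a sentence: multiply the probability of each word given the previous one."""
    tokens = ["<s>"] + words + ["</s>"]
    p = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        p *= bigram_prob.get((prev, cur), 1e-6)   # tiny probability for unseen pairs
    return p

print(sentence_probability(["i", "like", "nlp"]))   # higher score: a "likely" sentence
print(sentence_probability(["nlp", "hate", "i"]))   # lower score: an "unlikely" sentence
```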

 

Taking this one step further,

 

 

Autoregressive Language Modeling

 

The distribution over the next token is based on all the previous tokens.

 

That is, the probabilities are computed autoregressively, step by step from the beginning of the sequence.

 

It is a model that predicts the next word from the previous words.
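
Written out explicitly (this is my own rendering of the standard chain-rule factorization; the original figure is not reproduced here):

P(x_1, x_2, ..., x_T) = P(x_1) · P(x_2 | x_1) · P(x_3 | x_1, x_2) · ... · P(x_T | x_1, ..., x_{T-1})

Each factor is the distribution over the next token given all the previous tokens.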

 

 

 

 

Source: https://junklee.tistory.com/47

 

Looking at the factorization above, you can see that it is exactly a product of conditional probability distributions.

 

Source: https://opendatascience.com/build-nlp-and-conversational-ai-apps-with-transformers-and-large-scale-pre-trained-language-models/

 

As shown above, when "I" is the input, the model computes the probability that "like" comes next.

 

 

 

 

 

Machine Translation

 

Machine Translation is the task of translating a sentence from one language into another.

 

This kind of model is at the heart of every generation task.

 

It is a Seq2Seq model, meaning it uses both an encoder and a decoder.

 

 

Sequence to Sequence Learning

 

The encoder and decoder of this model work as follows:

 

  • Encoder : Encodes the source sentence into a set of sentence representation vectors.
  • Decoder : Generates the target sentence conditioned on the source sentence.

 

Put simply, the encoder turns the input sentence into vectors, and the decoder uses those vectors to produce the target sentence, i.e., the desired output.
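
A compressed encoder-decoder sketch in PyTorch (the module names and sizes are mine; a real translation model would add embedding layers, teacher forcing, and a proper decoding loop):

```python
import torch
import torch.nn as nn

class TinySeq2Seq(nn.Module):
    def __init__(self, d_model=32, vocab_size=1000):
        super().__init__()
        self.encoder = nn.GRU(d_model, d_model, batch_first=True)   # source -> vectors
        self.decoder = nn.GRU(d_model, d_model, batch_first=True)   # vectors -> target
        self.out = nn.Linear(d_model, vocab_size)                   # next-token scores

    def forward(self, src_vectors, tgt_vectors):
        _, h = self.encoder(src_vectors)              # h: summary of the source sentence
        dec_states, _ = self.decoder(tgt_vectors, h)  # decoder conditioned on the source
        return self.out(dec_states)                   # scores for each target position

model = TinySeq2Seq()
src = torch.randn(1, 7, 32)    # a 7-"word" source sentence (already embedded)
tgt = torch.randn(1, 5, 32)    # a 5-"word" target prefix
print(model(src, tgt).shape)   # torch.Size([1, 5, 1000])
```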

 

Source: https://wikidocs.net/24996

 

 

 

Attention Mechanism

 

Attention focuses on a relevant part of the source sentence when predicting the next target token.

 

With attention, an RNN encoder and an RNN decoder are used, and the decoder tries to look at the parts of the source that are most relevant.

 

The input words are fed in one at a time, and the outputs also come out one at a time.

 

Source: https://wikidocs.net/22893

 

For a sentence of four words, the model looks over the source four times, once per word it generates.

 

In the transformer, however, this can be done in a single pass.
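
A minimal dot-product attention sketch (my own toy numbers) over the hidden states of a 4-word source sentence: the current decoder state is compared with every encoder state, and a weighted summary of the most relevant parts is produced:

```python
import numpy as np

rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(4, 16))   # one 16-dim hidden state per source word
decoder_state = rng.normal(size=(16,))      # current decoder hidden state

scores = encoder_states @ decoder_state            # how relevant is each source word?
weights = np.exp(scores) / np.exp(scores).sum()    # softmax -> attention weights
context = weights @ encoder_states                 # weighted summary, shape (16,)

print(weights.round(2))   # the decoder "looks at" mostly one or two source words
print(context.shape)
# With an RNN decoder this comparison is repeated once per generated target word;
# the transformer computes all of these attention weights in one matrix operation.
```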

 

 

Transformer

<paper : https://arxiv.org/abs/1706.03762 >

 

Attention Is All You Need


 

The transformer was introduced in this paper.

 

The motivation behind it:

  • No more RNN or CNN modules
  • No more sequential processing

 

Transformers are now also widely used in computer vision, i.e., image recognition.

 

 

It is still a Seq2Seq model, but one built from attention alone.

 

We can also stack several transformer layers on top of each other to improve performance.
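
For reference, stacking several transformer layers is built into PyTorch; a generic sketch (the hyperparameters here are arbitrary, not tied to any particular model):

```python
import torch
import torch.nn as nn

layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=6)   # 6 stacked transformer blocks

tokens = torch.randn(1, 10, 512)   # a 10-token sentence of 512-dim embeddings
print(encoder(tokens).shape)       # torch.Size([1, 10, 512]), processed in one pass
```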

 

 

 

 

Since the transformer appeared, many techniques built on top of it have been developed:

 

  • GPT-1
  • BERT
  • XLNet
  • ALBERT
  • RoBERTa
  • Reformer
  • T5

 

 

 

Introduced in 2017, the transformer has driven ever faster progress.

 

As of 2023 it is still in heavy use.

 

 

 

BERT

 

BERT improves performance on a wide range of NLP tasks.

It is trained with a masked language modeling task and a next sentence prediction task.

 

It can be viewed as the encoder part of the transformer.
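
As a quick, hedged illustration of the masked language modeling objective, the Hugging Face transformers library (assuming it is installed and can download the pretrained weights) can fill in a masked token with BERT:

```python
# pip install transformers  (downloads the pretrained weights on first run)
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for candidate in fill_mask("Natural language processing is a [MASK] field."):
    print(candidate["token_str"], round(candidate["score"], 3))
```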

 

 

 

 

GPT-2

 

Language Models are Unsupervised Multitask Learners

 

It can be viewed as the decoder part of the transformer.
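
And as the decoder-only counterpart (again assuming the transformers library is available), GPT-2 generates text autoregressively, one token at a time:

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("Natural language processing is", max_new_tokens=20)
print(result[0]["generated_text"])
```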

 

๋” ๋งŽ์ด ์Œ“๊ณ  ํ•™์Šต์„ ๋งŽ์ด ์‹œํ‚จ ๊ฒƒ์ด ์ง€๊ธˆ์˜ ๋งŽ์ด ์ƒ์šฉํ™” ๋˜์–ด์žˆ๋Š” ChatGPT3, ChatGPT4๋ผ๊ณ  ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

 

Even looking at parameter count alone, GPT-3 is about 100 times larger than GPT-2.

 

And the number of GPUs used differs by a factor of roughly 1,000 to 10,000.

 

 

 

 

The techniques above are used in areas such as:

 

  • Machine Reading Comprehension
  • Document Summarization
    • Extractive summarization
    • Abstractive summarization
  • Neural Conversational Model

 

 

 

 

 

 

