
[NLP] Introduction to Word Embedding

Han Jang 2023. 3. 26. 22:26
๐Ÿง‘๐Ÿปโ€๐Ÿ’ป ์ฃผ์š” ์ •๋ฆฌ
 
NLP
Word Embedding

 

 

๋ฐฐ๊ฒฝ ์ง€์‹

 

What is an Embedding?

 

We need to vectorize the input sentence, a sequence of words, before we can compute with it.

This is exactly where word embedding comes in.

In other words, it represents text as numbers.
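As a minimal sketch of this idea (the vocabulary, dimensionality, and values below are invented for illustration; real embedding values are learned during training), mapping words to numbers can look like this:

```python
import numpy as np

# Hypothetical toy vocabulary: each word gets an integer index.
vocab = {"king": 0, "queen": 1, "man": 2, "woman": 3}

# Embedding matrix: one row per word, 5 dimensions each.
# Random here purely for illustration -- real values are learned.
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(len(vocab), 5))

def embed(word: str) -> np.ndarray:
    """Look up the numeric vector that represents a word."""
    return embedding_matrix[vocab[word]]

print(embed("king"))  # a 5-dimensional vector standing in for "king"
```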

 

 

 

When we process images with a CNN, we need a tensor of numbers on the order of [28 x 28 x 3 x 256].

The hidden layers then extract hidden representations.

In other words, they distill meaningful representations out of the raw vector values.

 

 

Embeddings are no different.

As above, we convert the input into vectors and compute on those.

 

Source: https://www.analyticsvidhya.com/blog/2017/06/word-embeddings-count-word2veec/

Let's look at the figure above.

Words are mapped into a 3-dimensional space, as shown there.

 

Through operations such as vector[Queen] = vector[King] - vector[Man] + vector[Woman],

we learn that converting words into embeddings (representations) makes this kind of arithmetic possible.
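A minimal numpy sketch of this analogy arithmetic; the 2-dimensional toy vectors are hand-picked so the equation holds exactly, whereas trained embeddings satisfy it only approximately:

```python
import numpy as np

# Hand-picked toy vectors (real embeddings have hundreds of dimensions).
vec = {
    "king":  np.array([0.9, 0.8]),
    "man":   np.array([0.5, 0.1]),
    "woman": np.array([0.4, 0.9]),
    "queen": np.array([0.8, 1.6]),
}

target = vec["king"] - vec["man"] + vec["woman"]

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# The vocabulary word closest to king - man + woman is "queen".
print(max(vec, key=lambda w: cosine(vec[w], target)))
```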

 

Shall we look a little closer?

 

 

NLP is used broadly across the following areas, and each area comes with classic methods like those listed below.

 

1. Word Similarity

 

Classic Methods : Edit Distance, WordNet, Porter's Stemmer, Lemmatization using dictionaries

 

  • Easily identifies similar words and synonyms since they occur in similar contexts.
  • Stemming (thought -> think)
  • Inflections, tense forms
  • e.g. think, thought / ponder, pondering
  • Plane, Aircraft, Flight
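To make one of the classic methods named above concrete, here is a standard Levenshtein edit distance in plain Python (counting character insertions, deletions, and substitutions):

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

print(edit_distance("thought", "think"))  # -> 5
```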

 

 

2. Machine Translation

Classic Methods : Rule-based machine translation, morphological transformation

 

 

3. Part-of-Speech and Named Entity Recognition

 

Classic Methods : Sequential Models (MEMM, Conditional Random Fields), Logistic Regression

 

4. Relation Extraction

 

Classic Methods : OpenIE, Linear programming models, Bootstrapping

 

5. Sentiment Analysis

 

 

Classic Methods : Naive Bayes, Random Forests/SVM

  • Classifying sentences as positive or negative
  • Building sentiment lexicons using seed sentiment sets
  • No need for classifiers; we can just use cosine distances to compare unseen reviews to known reviews.

-> The comparison is computed by taking the distance between word vectors into account.

 

=> L1 distance gets smaller the more similar two vectors are. L2 distance is the Euclidean distance. Cosine distance measures the angle between vectors, which shrinks as they become more similar.
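A small numpy sketch of the three distances just described, with invented vectors:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([1.5, 1.8, 3.2])

l1 = np.sum(np.abs(a - b))        # L1: smaller when vectors are more alike
l2 = np.linalg.norm(a - b)        # L2: Euclidean distance
cos_sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
cos_dist = 1.0 - cos_sim          # cosine: small angle -> small distance

print(l1, l2, cos_dist)
```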

 

 

6. Co-reference Resolution

  • Chaining entity mentions across multiple documents
  • Can we find and unify the multiple contexts in which mentions occur?

 

7. Clustering

  • Words in the same class naturally occur in similar contexts, and this feature vector can be used directly with any conventional clustering algorithm (K-Means, agglomerative, etc.), as in the sketch below.
  • Humans don't have to waste time hand-picking useful word features to cluster on.
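A hedged sketch of that, using scikit-learn's KMeans; the vectors below are random stand-ins for real pretrained embeddings:

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-ins for pretrained word vectors: 6 words, 50 dimensions each.
words = ["king", "queen", "man", "woman", "plane", "aircraft"]
rng = np.random.default_rng(42)
vectors = rng.normal(size=(len(words), 50))

# Cluster the raw vectors directly -- no hand-picked features needed.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)
for word, label in zip(words, labels):
    print(word, "-> cluster", label)
```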

 

8. Semantic Analysis of Documents

  • Build word distributions for various topics, etc.

 

 

How do we actually build all of these word vectors?

 

-> Similar words have similar representations!

 

Lower-dimensional vector representations for words, based on their context

 

  • Co-occurrence Matrix with SVD
  • Word2Vec
  • Global Vector Representations (GloVe)
  • Paragraph Vectors

 

 

 

Co-occurrence Matrix with Singular Value Decomposition

 

 

For each word in the data, we record which other words co-occur with it.

This process yields a co-occurrence table (matrix).

Then, out of all the vector's dimensions, we keep only the meaningful ones.

This step is called dimensionality reduction using SVD.

 

SVD (Singular Value Decomposition) factorizes the co-occurrence matrix; keeping only the top singular values leaves a low-dimensional vector for each word.
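A minimal sketch of the whole pipeline on a toy corpus (window size 1; both the corpus and the choice of k = 2 are for illustration only):

```python
import numpy as np

corpus = [["i", "like", "deep", "learning"],
          ["i", "like", "nlp"],
          ["i", "enjoy", "flying"]]

vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Count how often each pair of words appears side by side.
X = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in (i - 1, i + 1):
            if 0 <= j < len(sent):
                X[idx[w], idx[sent[j]]] += 1

# Truncated SVD: keep only the top-k singular dimensions.
U, S, Vt = np.linalg.svd(X)
k = 2
word_vectors = U[:, :k] * S[:k]   # one k-dimensional vector per word
print(word_vectors.shape)         # (len(vocab), k)
```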

 

The problem with this method is that the matrix can end up with billions of rows and columns, which makes computing the SVD impractical.

Because of this computational cost, the method is not used much in practice.

 

 

Word2Vec

 

  • Represent each word with a low-dimensional vector
  • Word similarity = vector similarity
  • Key idea : Predict surrounding words of every word
  • Faster and can easily incorporate a new sentence/document or add a word to the vocabulary

 

Representing the meaning of a word

 

  • Two basic neural network models (sketched below):
    • Continuous Bag of Words (CBOW) : use a window of surrounding words to predict the middle word.
    • Skip-gram (SG) : use a word to predict the surrounding ones in the window.
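As a hedged sketch using the gensim library (assuming gensim is available; the toy corpus is invented), the sg flag selects between the two models:

```python
from gensim.models import Word2Vec

# Toy tokenized corpus -- in practice, a large text collection.
sentences = [["king", "rules", "the", "kingdom"],
             ["queen", "rules", "the", "kingdom"],
             ["man", "walks", "in", "the", "city"],
             ["woman", "walks", "in", "the", "city"]]

# sg=0 -> CBOW: the context window predicts the middle word.
cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)

# sg=1 -> Skip-gram: the middle word predicts its context window.
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

print(skipgram.wv["king"].shape)                # (50,)
print(skipgram.wv.most_similar("king", topn=2))
```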

 

 

 

The two models differ as shown above.

 

We will look at these differences in detail starting with the next post.
