[Convolutional Neural Networks and Image Classification] Part 3

2023. 1. 24. 18:19
๐Ÿง‘๐Ÿป‍๐Ÿ’ป์šฉ์–ด ์ •๋ฆฌ

Convolutional Neural Networks
fully-connected layer
fully-connected neural network
activation map
Hyperparameters
VGGNet
ResNet

 

 

Convolutional Neural Networks

  • fully-connected layer
    fully-connected neural network
  • Computer Vision (ConvNets)
  • 2012๋…„ AlexNet๋ถ€ํ„ฐ ์„ฑ๋Šฅ์ด ๊ธ‰๊ฒฉํžˆ ์ข‹์•„์ง
  • 2015๋…„ ์‚ฌ๋žŒ๋ณด๋‹ค ๋” ์ธ์‹์„ ์ž˜ ํ•˜๊ฒŒ ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

 

์—ฌ๋Ÿฌ๋ถ„์€ ๊ณ ์–‘์ด์™€ ๊ฐ•์•„์ง€๋ฅผ ์ž˜ ๊ตฌ๋ถ„ํ•  ์ˆ˜ ์žˆ๋‚˜์š”?

 

๊ณ ์–‘์ด์™€ ๊ฐ•์•„์ง€

 

๊ณ ์–‘์ด์™€ ๊ฐ•์•„์ง€๋ฅผ ๊ตฌ๋ถ„ํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š”

 

Computer Vision ๋ถ„์•ผ์—์„œ ๋งŽ์ด ์“ฐ์ด๋Š” CNN, ์ฆ‰ Convolutional Neural Network๊ฐ€ ๊ฐ€์žฅ default๋กœ ์“ฐ์ž…๋‹ˆ๋‹ค.

 

๊ทธ ๊ณผ์ •์—๋Š” pixel์˜ ๊ฐ’์„ ์˜ค์ฐจ ๋ฒ”์œ„์—์„œ ํ•˜๋‚˜ํ•˜๋‚˜ ๋‹ค ๋น„๊ตํ•˜๋Š” ๋ฐฉ๋ฒ•์ด ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค.

 

์ฆ‰, Trying Every Possible Match๊ฐ€ ๊ทธ๊ฒƒ์ž…๋‹ˆ๋‹ค.

 

๊ทธ๋ฆฌ๊ณ , ๋ชจ๋“  ๊ฐ’๋“ค์„ ์•„๋ž˜ ๊ทธ๋ฆผ๊ณผ ๊ฐ™์ด ๋‚˜ํƒ€๋‚ด์–ด ์˜ค์ฐจ๋ฅผ ๊ณ„์‚ฐํ•˜๋ฉฐ activation map ํ˜น์€ ํ™œ์„ฑํ™” ์ง€๋„ ๋ผ๊ณ  ๋ถ€๋ฆ…๋‹ˆ๋‹ค.

(ํ…Œ์ด๋ธ”์— ์จ์žˆ๋Š” ์ˆซ์ž๋ฅผ ์–ธ๊ธ‰ํ•œ ๊ฒƒ์ž…๋‹ˆ๋‹ค.)

์ถœ์ฒ˜ : https://wikidocs.net/135874

 

์œ„ ํŠน์ • ํ™œ์„ฑํ™” ์ง€๋„๋Š” ํŠน์ •ํ•œ convolution filter ์ฃผ์–ด์ง„ ์ž…๋ ฅ ์ด๋ฏธ์ง€์— ๊ฐ€๋Šฅํ•œ ๋ชจ๋“  ์œ„์น˜์— Overlab์„ ์‹œ์ผœ ๋งค์นญ๋˜๋Š” ์ •๋„๋ฅผ ์–ป์—ˆ์„ ๋•Œ, ๋‚˜์˜ค๋Š” ๊ฒฐ๊ณผ ์ด๋ฏธ์ง€ ์ž…๋‹ˆ๋‹ค.

 

์ด ์—ฐ์‚ฐ์€ Convolution ์—ฐ์‚ฐ ํ˜น์€ Operation์ด๋ผ๊ณ  ์ •์˜ํ•ด ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

 

๊ฒฐ๊ตญ ์ด Activation Map์„ ์–ป๊ธฐ ์œ„ํ•ด์„  ์•„๋ž˜์™€ ๊ฐ™์€ ๊ณผ์ •์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค.

 

  1. Convolution filter๋ฅผ ์ด๋ฏธ์ง€ ์ƒ์˜ ๊ฐ๊ฐ์˜ ํŠน์ • ์œ„์น˜์— ์œ„์น˜ ์‹œ์ผœ์„œ ๊ทธ์— ํ•ด๋‹นํ•˜๋Š” ์ด๋ฏธ์ง€ ํŒจ์น˜์™€ align ์‹œ์ผฐ์„ ๋•Œ, pixel ๊ฐ’๋“ค์„ ์„œ๋กœ ๊ณฑํ•ฉ๋‹ˆ๋‹ค.
  2. ๊ทธ๊ฒƒ์„ ๋‹ค ๋”ํ•ฉ๋‹ˆ๋‹ค.
  3. ๊ทธ๋ฆฌ๊ณ  total pixel ๊ฐœ์ˆ˜๋กœ ๋‚˜๋ˆ„์–ด ์ค๋‹ˆ๋‹ค. (์„ ํƒ)

 

Convolution filter

  • Convolution layer์—์„œ๋Š” ์—ฌ๋Ÿฌ ๊ฐœ์˜ ํŠน์ •ํ•œ ๊ณ ์œ ํ•œ ํŒจํ„ด์„ ๊ฐ€์ง€๋Š” ๋‹ค์ˆ˜์˜ convolution filter๊ฐ€ ์กด์žฌํ•ฉ๋‹ˆ๋‹ค.
  • ์ด convolution filter ํ˜น์€ ํŒจ์น˜๋ฅผ ๊ฐ๊ฐ ์ฃผ์–ด์ง„ ์ž…๋ ฅ ์ด๋ฏธ์ง€์— ์ ์šฉํ–ˆ์„ ๋•Œ filter ๋ณ„๋กœ activation map์ด output์œผ๋กœ ๋‚˜์˜ต๋‹ˆ๋‹ค.
  • ํ•˜๋‚˜์˜ activation map์„ ์ด๋ฏธ์ง€๋กœ ๋ณธ๋‹ค๋ฉด, ์ฃผ์–ด์ง„ ์ด๋ฏธ์ง€ ํ•œ ์žฅ์„ ๊ฐ€์ง€๊ณ  filter ๊ฐœ์ˆ˜๋งŒํผ์˜ ์ด๋ฏธ์ง€๋ฅผ ๋งŒ๋“ค์–ด๋ƒ…๋‹ˆ๋‹ค.

 

๊ทธ๋ž˜์„œ, ์ด๋Ÿฌํ•œ Convolution layer๋ฅผ ์—ฌ๋Ÿฌ ๊ณ„์ธต์œผ๋กœ ์Œ“์•˜์„ ๋•Œ, deep neural network๋กœ์„œ์˜ convolution neural nework๋ฅผ ๊ตฌ์„ฑํ–ˆ์„ ๋•Œ, ํ•œ convolution layer ๋‚ด์— ์ฃผ์–ด์ง€๋Š” ์ž…๋ ฅ์ด ํ•œ ์žฅ์ด ์•„๋‹Œ ์—ฌ๋Ÿฌ ์žฅ์ธ ์ž…๋ ฅ์ผ ๊ฒฝ์šฐ๋„ ์ƒ๊ฐํ•ด ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

 

ํ•œ ์žฅ์˜ ์ด๋ฏธ์ง€๋ฅผ ์ฑ„๋„์ด๋ผ๊ณ  ๋ถ€๋ฆ…๋‹ˆ๋‹ค.

 

๊ฐ convolution filter๊ฐ€ ํ•œ ์นธ์”ฉ ์ด๋™ํ•˜๋ฉด์„œ scalar ๊ฐ’์„ ๋ฝ‘๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

 

์ด๋กœ์จ activation map์ด ๋‚˜์˜ค๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

 

Pooling Layer

  • ํŠน์ • ์ด๋ฏธ์ง€ ํŒจ์น˜ ๋‚ด์—์„œ ๊ฐ€์žฅ ์ตœ๋Œ“๊ฐ’์„ ์„ ์ •ํ•ฉ๋‹ˆ๋‹ค.
  • ์ด๋Ÿฌํ•œ ๊ณผ์ •์„ max pooling์ด๋ผ๊ณ  ํ•ฉ๋‹ˆ๋‹ค.
  • ํŠน์ • ์ด๋ฏธ์ง€ ํŒจ์น˜๋ฅผ ์˜ฎ๊ฒจ๊ฐ€๋ฉฐ output layer๋ฅผ ์–ป๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.
  • ์ตœ๋Œ“๊ฐ’๋งŒ ๋ฝ‘๊ธฐ ๋•Œ๋ฌธ์—, ๊ฐ€๋กœ ์„ธ๋กœ์˜ ์‚ฌ์ด์ฆˆ๋ฅผ ๋ฐ˜์œผ๋กœ ์ค„์ด๊ฒŒ ๋˜๋Š” ํšจ๊ณผ๋ฅผ ์–ป๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.
  • ์ฑ„๋„๋ณ„๋กœ ๋”ฐ๋กœ ์ด๋ฃจ์–ด์ง‘๋‹ˆ๋‹ค.

 

ReLU Layer(ReLUs, Rectified Linear Units)

  • ํ•จ์ˆ˜๋ฅผ ๋ณด๋‹ค ์œ ์—ฐํ•˜๊ฒŒ ๋‹ค์–‘ํ•œ ํŒจํ„ด๋“ค์„ ํ‘œํ˜„ํ•˜๊ฒŒ ๋งŒ๋“ค์–ด์ค๋‹ˆ๋‹ค.
  • ์–‘์ˆ˜์˜€๋˜ ๊ฐ’์€ ๊ทธ๋Œ€๋กœ, ์Œ์˜ ๊ฐ’์€ 0์œผ๋กœ clipping ํ•ด์ฃผ๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.
  • ์ฆ‰, ๋ณ€ํ˜•๋œ Output activation map์„ ์ฃผ๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

 

Convolutional Neural Networks ๊ณผ์ •

  1. ๊ฐ€์ค‘ํ•ฉ์„ ๊ตฌํ•˜๊ฒŒ ๋˜๋Š” convolutional operation์„ ๋จผ์ € ์ ์šฉํ•ฉ๋‹ˆ๋‹ค.
  2. ๊ทธ๋ฆฌ๊ณ  ๋ฐ”๋กœ ReLU Layer ๊ณผ์ •์„ ๊ฑฐ์นฉ๋‹ˆ๋‹ค.
  3. ๊ทธ๋ฆฌ๊ณ , 1,2 ๊ณผ์ •์„ ๋ช‡ 1๋ฒˆ stacking ํ•œ ํ›„์— ์ด ์ด๋ฏธ์ง€๋ฅผ ์ข€ ๋” ์ถ•์•ฝ๋œ ํ˜•ํƒœ๋กœ ๋ฐ”๊ฟ”์ฃผ๋Š” max pooling layer ๊ณผ์ •์ด ๋‚˜์˜ค๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.
  4. 1,2,3์˜ ๊ณผ์ •์„ ๋ฐ˜๋ณตํ•˜๋Š” ํ˜•ํƒœ๋กœ ์ด๋ฃจ์–ด์ง‘๋‹ˆ๋‹ค.

 

 

ํŠน์ • Convolution layer์—์„œ Max pooling์ด ์—ฌ๋Ÿฌ ๋ฒˆ์— ๊ฑธ์ณ์„œ ์Œ“์€ ํ›„,

 

์ž‘์€ ์ด๋ฏธ์ง€ size๋กœ ๋ณ€ํ™˜๋œ activation layer๋ฅผ filter์˜ ๊ฐœ์ˆ˜, ์ฆ‰ channel์˜ ๊ฐœ์ˆ˜ ๋งŒํผ์˜ output activation map์„ ๋งŒ๋“ค์—ˆ์„ ๋•Œ,

 

ํŠน์ • Layer์—์„œ ํ•œ ์ค„์— vector๋กœ ํ”ผ๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

 

 ์ด vector๋ฅผ ์ž…๋ ฅ์œผ๋กœ ์ฃผ์–ด์„œ Fully connected layer๋ฅผ ๊ตฌ์„ฑํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

 

๋ชฉ์  ์ด๋ฏธ์ง€์— ๋Œ€ํ•œ output node๋ฅผ ๋งŒ๋“ค๊ฒŒ ๋˜๊ณ , class์— ๋Œ€ํ•œ score๋ฅผ ๊ฐ๊ฐ ๋งŒ๋“ค์–ด ์ค„ ์ˆ˜ ์žˆ๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

 

score ๊ฐ’์— ๋Œ€ํ•ด์„œ ์ตœ์ข… ์ด๋ฏธ์ง€๋ฅผ ๋ถ„๋ฅ˜ํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

 

์ตœ๋Œ€ํ•œ score์˜ ๊ฐ’์ด ์ •๋‹ต๊ฐ’์ด ์ตœ๋Œ€ํ•œ 1, ๋‚˜๋จธ์ง€๋Š” ์ตœ๋Œ€ํ•œ 0์ด ๋˜๋„๋ก ํ•ฉ๋‹ˆ๋‹ค.

 

 

Hyperparameters

  • Convolution
    • Number of filters
    • Size of filters
  • Pooling
    • Window size
    • Window stride // window ํ•œ ๋ฒˆ ์˜ฎ๊ฒผ์„ ๋•Œ ๋ช‡ Pixel ๋งŒํผ ์›€์ง์ด๋Š๋ƒ.
  • Fully Connected
    • Number of layers
    • Number of neurons

 

Various CNN Architectures

  • AlexNet
  • VGGNet
    • CNN filter๋ฅผ 3 by 3 ์œผ๋กœ๋งŒ ํ•ฉ๋‹ˆ๋‹ค.
  • GoogLeNet
  • ResNet (Residual Network)
    • layer ๋“ค์„ ์—ฌ๋Ÿฌ ๊ฐœ ์Œ“์•„์„œ skip ํ•  ์ˆ˜ ์žˆ๊ฒŒ ํ•ฉ๋‹ˆ๋‹ค.

BELATED ARTICLES

more