[Linear Regression] part 2 - 2

2023. 1. 14. 14:33

🎯 Keyword 🎯

- linear model
- MSE
- model parameter
- score
- parameter optimization

Parameter Optimization

- As the model parameters vary, an error arises in the process of fitting the model to the given data.

 

- As the given θ values change, the model draws different curves.

 

- The loss function is therefore a function of the model parameters.

 

As θ0 and θ1 change, the MSE changes, so the error curve changes as well.

 

Our goal?

 

-> To find the θ0 and θ1 that minimize the cost function.

=> From them we obtain the linear model that fits the data.

 

๋ฐ์ดํ„ฐ ๊ฐ’์— ์˜ํ•ด ํ•™์Šต์ด ์ง„ํ–‰ ๋˜๋ฉด์„œ θ0, θ1์ด ์ฃผ์–ด์ง„ ๋ฐ์ดํ„ฐ์— fitting์ด ๋˜๋ฉด์„œ ์ฃผ์–ด์ง„ ๋“ฑ๊ณ ์„ ์˜ ๋‚ฎ์€ ๊ฐ’์œผ๋กœ fitting์ด ๋ฉ๋‹ˆ๋‹ค.

When the loss function reaches its lowest value, the corresponding parameters θ0 and θ1 give the optimized linear model.

The process of finding these θ0 and θ1 is called parameter optimization.
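As a rough illustration of this idea (the toy data points, the grid ranges, and the mse helper below are made up for the sketch, not taken from the lecture), the snippet evaluates the MSE over a grid of (θ0, θ1) values and picks out the lowest point of the error surface:

```python
import numpy as np

# Toy 1-D data (hypothetical values, just for illustration).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])   # roughly y = 2x + 1 with noise

def mse(theta0, theta1):
    """MSE of the line y_hat = theta0 + theta1 * x on the toy data."""
    y_hat = theta0 + theta1 * x
    return np.mean((y_hat - y) ** 2)

# Evaluate the loss over a grid of parameter values: the "error surface".
theta0_grid = np.linspace(-2, 4, 61)
theta1_grid = np.linspace(-1, 5, 61)
losses = np.array([[mse(t0, t1) for t1 in theta1_grid] for t0 in theta0_grid])

# The grid point with the lowest MSE approximates the optimized parameters.
i, j = np.unravel_index(np.argmin(losses), losses.shape)
print("best theta0 ~", theta0_grid[i], ", best theta1 ~", theta1_grid[j])
```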

 

When the input data is arranged in matrix form, the input vector x is d-dimensional and is written as x1 through xd; to represent the offset θ0, however, a 1 is placed at the front of the vector.

 

In linear regression, the target y is likewise an n-dimensional vector.

๊ทธ ์ด์œ ๋Š” n๊ฐœ์˜ sample ๋งˆ๋‹ค ํ•˜๋‚˜์”ฉ์˜ ์ •๋‹ต์ด ์กด์žฌํ•˜๊ธฐ ๋•Œ๋ฌธ์ธ ๊ฒƒ์ž…๋‹ˆ๋‹ค.

 

The parameter vector θ runs from θ0 to θd, so it is a (d + 1)-dimensional vector.

 

Multiplying the input matrix X by θ gives the score.
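A minimal sketch of this matrix setup (the sizes n and d and the variable names are my own, for illustration only): a column of ones is prepended to the n × d input matrix so that θ0 acts as the offset, and the score is the matrix product of X and θ.

```python
import numpy as np

n, d = 5, 3                       # n samples, d input features (arbitrary sizes)
X_raw = np.random.rand(n, d)      # raw inputs: each row is (x1, ..., xd)

# Prepend a column of ones so that theta0 plays the role of the offset.
X = np.hstack([np.ones((n, 1)), X_raw])     # shape (n, d + 1)

theta = np.random.rand(d + 1)     # parameter vector (theta0, ..., thetad)

score = X @ theta                 # shape (n,): one score per sample
print(score.shape)                # (5,)
```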

 

From the squared difference between the score and Y, the loss can be computed in matrix form.

 

๋ชจ๋ธ์˜ ์ถœ๋ ฅ๊ณผ Y์˜ ์ฐจ์ด๋ฅผ ์ œ๊ณฑํ•˜๊ณ  ํ‰๊ท ํ•˜์—ฌ ๊ณ„์‚ฐํ•ฉ๋‹ˆ๋‹ค.

 

The optimized parameter θ is the one that minimizes this cost function.
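Continuing the same sketch (again with made-up sizes and values), the MSE in matrix form is just the averaged squared difference between the score Xθ and the target vector y:

```python
import numpy as np

n, d = 5, 3
X = np.hstack([np.ones((n, 1)), np.random.rand(n, d)])   # inputs with bias column
theta = np.random.rand(d + 1)                             # parameter vector
y = np.random.rand(n)                                     # one target per sample

residual = X @ theta - y          # model output minus target, shape (n,)
loss = np.mean(residual ** 2)     # MSE: square the differences and average
print(loss)
```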

 

 

To solve this, we take the derivative with respect to θ and find the value of θ that makes it zero; this gives the optimized parameter values.

-> This process is called the Least Squares Problem, and the resulting equation is called the Normal Equation.
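As a minimal sketch of that derivation, assuming the MSE cost written in matrix form as above:

```latex
J(\theta) = \frac{1}{n}\lVert X\theta - y \rVert^{2},
\qquad
\nabla_{\theta} J(\theta) = \frac{2}{n} X^{\top}(X\theta - y) = 0
\;\Longrightarrow\;
X^{\top}X\,\theta = X^{\top}y
\;\Longrightarrow\;
\hat{\theta} = \left(X^{\top}X\right)^{-1} X^{\top}y
```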

 

For this to work, the loss function must be differentiable and convex.

 

Solving for θ this way yields the solution in a single step.
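A sketch of that one-step solution on the same hypothetical toy data as above (np.linalg.solve is used here rather than an explicit matrix inverse, which is a common implementation choice rather than anything prescribed by the lecture):

```python
import numpy as np

# Same toy data as in the grid-search sketch above (hypothetical values).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

X = np.column_stack([np.ones_like(x), x])    # bias column + feature

# Normal equation: solve (X^T X) theta = X^T y in one step.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)    # ~ [1.10, 1.96] = [theta0, theta1] for this toy data
```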

 

However, solving with the Normal Equation turns out to be inefficient as the number of data samples grows.

 

N์ด ๋Š˜์–ด๋‚˜๊ฒŒ ๋˜๋ฉด ์ด matrix์˜ dimension์ด ์ฆ๊ฐ€ํ•˜๊ฒŒ ๋˜๊ณ , matrix inverse ๋“ฑ์˜ ์—ฐ์‚ฐ์„ ํ•จ์— ์žˆ์–ด ๊ต‰์žฅํžˆ ํฐ ๋ณต์žก๋„๊ฐ€ ์†Œ์š”๋˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

Moreover, the inverse of the matrix may not exist at all, in which case it is hard to obtain a solution.
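One common workaround in that situation, which goes beyond what this post itself covers, is an SVD-based least-squares solver or the Moore-Penrose pseudo-inverse, sketched below on the same toy data:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])
X = np.column_stack([np.ones_like(x), x])

# lstsq minimizes ||X theta - y||^2 via SVD and works even when X^T X
# is not invertible (it returns the minimum-norm solution in that case).
theta, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)

# Equivalently, the Moore-Penrose pseudo-inverse:
theta_pinv = np.linalg.pinv(X) @ y
print(theta, theta_pinv)
```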

 

 

To address these problems, we use the Gradient Descent method.
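Gradient Descent is covered in the next post; as a minimal preview of the idea on the same toy data (the learning rate and iteration count are arbitrary choices for the sketch):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])
X = np.column_stack([np.ones_like(x), x])
n = len(y)

theta = np.zeros(2)           # start from an arbitrary initial guess
lr = 0.05                     # learning rate (arbitrary)

for _ in range(2000):
    grad = (2.0 / n) * X.T @ (X @ theta - y)   # gradient of the MSE
    theta -= lr * grad                          # step downhill

print(theta)   # converges toward the normal-equation solution ~ [1.10, 1.96]
```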

 
