[Deep Learning] Generating regression model for California housing dataset with Keras functional API

2023. 4. 5. 16:05
Key terms

Neural Networks
Keras
Layer
California housing dataset

 

Generating a regression model for the California housing dataset with the Keras functional API.

 

 

 

 

This step imports the required modules:

matplotlib.pyplot, tensorflow, keras, fetch_california_housing

The dataset is then assigned to the variable housing.

Next, print is called with housing, housing.keys(), and housing.feature_names to output the contents of the dataset, its keys, and its feature names, respectively.

 

 

 

 

 

 

Based on that output, housing.data is assigned to x_data as the feature values of the dataset, and housing.target is assigned to y_data as its target values.

The shape of each array is then printed.

 

 

 

์บ˜๋ฆฌํฌ๋‹ˆ์•„ ์ฃผํƒ ๊ฐ€๊ฒฉ dataset์€ 20640๊ฐœ์˜ ์ƒ˜ํ”Œ๊ณผ 8๊ฐœ์˜ ํŠน์„ฑ์œผ๋กœ ๊ตฌ์„ฑ๋œ dataset์ž…๋‹ˆ๋‹ค.

์ด dataset์€ ์ฃผํƒ์˜ ์ง€๋ฆฌ์  ์œ„์น˜, ์ธ๊ตฌ ํŠน์„ฑ, ์†Œ๋“ ๋“ฑ๊ณผ ๊ฐ™์€ ์—ฌ๋Ÿฌ ๊ฐ€์ง€ ํŠน์„ฑ์„ ๊ธฐ๋ฐ˜์œผ๋กœ ์ฃผํƒ์˜ ์ค‘๊ฐ„ ๊ฐ€๊ฒฉ์„ ์˜ˆ์ธกํ•˜๋Š” ๋ฌธ์ œ๋ฅผ ๋‹ค๋ฃจ๊ธฐ ์œ„ํ•œ ๊ฒƒ์œผ๋กœ, ๊ฐ ์ƒ˜ํ”Œ์€ ์บ˜๋ฆฌํฌ๋‹ˆ์•„์˜ ์–ด๋–ค ์ง€์—ญ์—์„œ ์ธก์ •๋œ data์ด๋ฉฐ, ์ฃผํƒ์˜ ์ค‘๊ฐ„ ๊ฐ€๊ฒฉ์€ dataset์˜ target ๊ฐ’์ž…๋‹ˆ๋‹ค.

 

๊ทธ๋ ‡๋‹ค๋ฉด, x_data.shape์€ dataset์ด 20640๊ฐœ์˜ ์ƒ˜ํ”Œ๊ณผ 8๊ฐœ์˜ ํŠน์„ฑ์œผ๋กœ ์ด๋ฃจ์–ด์กŒ๋‹ค๋Š” ๊ฒƒ์„ ์˜๋ฏธํ•˜๋Š” 2์ฐจ์› ๋ฐฐ์—ด์ด๋ฉฐ, y_data.shape์€ 20640๊ฐœ์˜ ์ƒ˜ํ”Œ์— ๋Œ€ํ•œ target ๊ฐ’ ํ•˜๋‚˜์”ฉ์œผ๋กœ ์ด๋ฃจ์–ด์ ธ ์žˆ๋‹ค๋Š” ๊ฒƒ์„ ์˜๋ฏธํ•˜๋ฉฐ, 1์ฐจ์› ๋ฐฐ์—ด์ž…๋‹ˆ๋‹ค.
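The loading and inspection steps described so far can be sketched as follows. This is a minimal sketch, assuming scikit-learn is installed and that the dataset can be downloaded on first use; the variable names follow the text.

```python
from sklearn.datasets import fetch_california_housing

# Downloads the data on the first call and caches it locally afterwards.
housing = fetch_california_housing()

print(housing.keys())         # keys of the returned Bunch (data, target, feature_names, ...)
print(housing.feature_names)  # names of the 8 features

x_data = housing.data    # feature matrix
y_data = housing.target  # median house value per district, in units of $100,000

print(x_data.shape)  # (20640, 8)
print(y_data.shape)  # (20640,)
```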

 

 

This step uses the train_test_split function from the model_selection module to split the full dataset into a training set and a test set. x_data holds the input features and y_data holds the output targets.

The resulting splits are assigned to the variables x_train, x_test, y_train, and y_test.

print is then used to output the shapes of x_train, x_test, y_train, and y_test.

train_test_split splits the dataset into a training set and a test set at random. The test_size argument determines the size of the test set.

Because test_size is set to 0.2 here, 20% of the full dataset is used as the test set and the remaining 80% as the training set.
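The split can be sketched as below. To keep the snippet runnable without a download, random stand-in arrays with the same shapes as the housing data are used (an assumption), and random_state is an added assumption for reproducibility that is not mentioned in the text.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in arrays with the same shapes as the housing data (assumption).
rng = np.random.default_rng(0)
x_data = rng.normal(size=(20640, 8))
y_data = rng.normal(size=20640)

# test_size=0.2 keeps 20% of the samples for testing.
# random_state is an assumption added so the split is reproducible.
x_train, x_test, y_train, y_test = train_test_split(
    x_data, y_data, test_size=0.2, random_state=42
)

print(x_train.shape, x_test.shape)  # (16512, 8) (4128, 8)
print(y_train.shape, y_test.shape)  # (16512,) (4128,)
```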

 

 

Here, the data is standardized with the StandardScaler class from the sklearn library.

1. A scaler object is created with StandardScaler().

2. The scaler's fit_transform() method standardizes x_train.

Internally, fit_transform() works in two steps: it computes the mean and standard deviation of each feature in the input data and stores them, then uses the stored mean and standard deviation to standardize the input data.

fit_transform() returns the transformed data.

3. x_test is standardized with the same scaler, using transform() rather than fit_transform().

 

 

Next, only the first and second features of x_train_scaled and x_test_scaled are selected to create x_train_selected and x_test_selected.

In [:,0:2], everything is selected along the first axis, and only indices 0 and 1 along the second axis.

The resulting shape is therefore (n_samples, 2).

Note that transform(), not fit_transform(), was used for the test data. This way the test set is scaled with the mean and standard deviation learned from the training set, and no statistics computed from the test data enter the preprocessing.
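A minimal sketch of the scaling and column selection, assuming small random stand-in arrays in place of the real split (the array sizes are hypothetical):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Small stand-in arrays (assumption) in place of the real train/test split.
rng = np.random.default_rng(0)
x_train = rng.normal(loc=5.0, scale=3.0, size=(800, 8))
x_test = rng.normal(loc=5.0, scale=3.0, size=(200, 8))

scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)  # learn mean/std from train, then scale
x_test_scaled = scaler.transform(x_test)        # reuse the mean/std learned from train

# Keep all rows, and only the columns at index 0 and 1.
x_train_selected = x_train_scaled[:, 0:2]
x_test_selected = x_test_scaled[:, 0:2]

print(x_train_selected.shape, x_test_selected.shape)  # (800, 2) (200, 2)
```

After fit_transform(), every column of x_train_scaled has mean 0 and standard deviation 1. The columns of x_test_scaled are only approximately standardized, because they are scaled with the training statistics.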

 

 

This step trains an MLP (multilayer perceptron) model with the MLPRegressor class from the sklearn library and uses it to make predictions.

1. A regr object is created from the MLPRegressor class. hidden_layer_sizes=(4,) specifies a single hidden layer with 4 nodes, and activation='relu' selects ReLU as the activation function.

2. The fit() method trains the model on x_train_selected and y_train.

3. The predict() method computes the predictions y_test_hat from x_test_selected.

The MLPRegressor class exposes the basic parameters for configuring a multilayer perceptron. hidden_layer_sizes is a tuple in which the i-th element gives the number of nodes in the i-th hidden layer.

activation specifies the activation function used in the hidden layers.

fit() trains the model on the given dataset, and predict() uses the trained model to compute predictions for the input data.
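The three steps can be sketched as follows. The data here is a random stand-in with two features (an assumption), and random_state and max_iter are added assumptions that are not taken from the text.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Stand-in data (assumption): two already-scaled features and a noisy target.
rng = np.random.default_rng(0)
x_train_selected = rng.normal(size=(800, 2))
x_test_selected = rng.normal(size=(200, 2))
y_train = x_train_selected @ np.array([1.0, -0.5]) + rng.normal(scale=0.1, size=800)

# One hidden layer with 4 nodes and ReLU activation.
# random_state and max_iter are assumptions added for reproducibility.
regr = MLPRegressor(hidden_layer_sizes=(4,), activation='relu',
                    random_state=0, max_iter=1000)
regr.fit(x_train_selected, y_train)          # train on the two selected features
y_test_hat = regr.predict(x_test_selected)   # one prediction per test sample

print(y_test_hat.shape)                      # (200,)
print([w.shape for w in regr.coefs_])        # [(2, 4), (4, 1)]
```

The weight shapes reflect the architecture: 2 inputs feeding 4 hidden nodes, and 4 hidden nodes feeding 1 output.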

 

 

The print statements that follow output information about the regr object, show the loss values recorded during training, and evaluate the trained model.

1. print(regr) prints the estimator together with its constructor parameters, such as hidden_layer_sizes and activation. Depending on the scikit-learn version, only parameters that differ from their defaults may be shown.

2. print(regr.loss_curve_) prints the loss values recorded during training. The loss measures the difference between the model's predictions and the actual targets, and minimizing it is the goal of training. loss_curve_ stores one loss value per training iteration, in order.

3. print(regr.score(x_test_selected, y_test)) prints the result of evaluating the model on x_test_selected and y_test. For a regressor, score() returns the coefficient of determination R^2 of the predictions. The best possible value is 1.0, and the value can be negative when the model predicts worse than always outputting the mean of y_test.
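To make the meaning of score() concrete, the sketch below recomputes R^2 by hand as 1 - SS_res / SS_tot and compares it with the value returned by score(). The data is a random stand-in (an assumption), as are random_state and max_iter.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.neural_network import MLPRegressor

# Stand-in data (assumption): two features and a noisy linear target.
rng = np.random.default_rng(0)
w = np.array([1.0, -0.5])
x_train = rng.normal(size=(800, 2))
x_test = rng.normal(size=(200, 2))
y_train = x_train @ w + rng.normal(scale=0.1, size=800)
y_test = x_test @ w + rng.normal(scale=0.1, size=200)

regr = MLPRegressor(hidden_layer_sizes=(4,), activation='relu',
                    random_state=0, max_iter=1000).fit(x_train, y_train)

print(regr)                   # estimator and its constructor parameters
print(len(regr.loss_curve_))  # one recorded loss value per training iteration
score = regr.score(x_test, y_test)
print(score)

# R^2 computed by hand: 1 - (residual sum of squares / total sum of squares).
y_hat = regr.predict(x_test)
ss_res = np.sum((y_test - y_hat) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
print(np.isclose(score, 1 - ss_res / ss_tot))  # True

# Deliberately bad predictions give a negative R^2.
print(r2_score(y_test, -y_test) < 0)           # True
```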

 

This step visualizes the distributions of y_test and y_test_hat as histograms.

The plt.hist() function draws the distribution of each array.

The first argument is the data, and the second sets the number of bins the data is divided into. The color argument sets the color and the alpha argument sets the transparency. plt.title() sets the title of the plot, and plt.legend() adds a legend.

The resulting histograms show how closely the distribution of the predicted y_test_hat matches that of the actual y_test. They compare the two distributions as a whole, not individual predictions sample by sample.
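A sketch of the histogram comparison, assuming random stand-in values for y_test and y_test_hat. The title, the labels, and the output file name are hypothetical; the non-interactive Agg backend is chosen only so the snippet also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs headless
import matplotlib.pyplot as plt
import numpy as np

# Stand-in values (assumption) for the real targets and predictions.
rng = np.random.default_rng(0)
y_test = rng.gamma(shape=2.0, scale=1.0, size=1000)
y_test_hat = y_test + rng.normal(scale=0.5, size=1000)

# 30 bins each; alpha keeps the overlap of the two histograms visible.
counts_true, bins_true, _ = plt.hist(y_test, 30, color='red', alpha=0.3, label='y_test')
counts_pred, bins_pred, _ = plt.hist(y_test_hat, 30, color='blue', alpha=0.3, label='y_test_hat')
plt.title('Actual vs. predicted target')  # hypothetical title
plt.legend()
plt.savefig('hist.png')  # in a notebook, plt.show() displays the figure instead
```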

 

 

Next, an MLPRegressor model with hidden_layer_sizes=(4,) is created again, this time trained on all features, and used to compute predictions for the test data.

Training on all features instead of the two selected earlier produced more accurate predictions. The additional features carry information about the target that the first two alone do not provide.

print(regr2) prints the model's configuration and hyperparameters.

print(regr2.loss_curve_) prints the loss values recorded during training.

print(regr2.score(x_test_scaled, y_test)) measures R^2 on the test set; the closer the value is to 1, the more accurate the predictions.

The histogram likewise shows that using more features gives a better result than before.
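The effect of adding features can be illustrated on synthetic data. In the sketch below the target is constructed to depend on all 8 features (an assumption made for illustration, not a property taken from the housing data), so a model restricted to the first two features cannot explain most of the variance. fit_and_score is a hypothetical helper.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Stand-in data (assumption): the target depends equally on all 8 scaled features.
rng = np.random.default_rng(0)
w = np.ones(8)
x_train_scaled = rng.normal(size=(2000, 8))
x_test_scaled = rng.normal(size=(500, 8))
y_train = x_train_scaled @ w + rng.normal(scale=0.1, size=2000)
y_test = x_test_scaled @ w + rng.normal(scale=0.1, size=500)

def fit_and_score(x_tr, x_te):
    """Train the same small MLP on the given columns and return test R^2."""
    # random_state and max_iter are assumptions added for reproducibility.
    model = MLPRegressor(hidden_layer_sizes=(4,), activation='relu',
                         random_state=0, max_iter=1000)
    model.fit(x_tr, y_train)
    return model.score(x_te, y_test)

score_two = fit_and_score(x_train_scaled[:, 0:2], x_test_scaled[:, 0:2])
score_all = fit_and_score(x_train_scaled, x_test_scaled)
print(score_two, score_all)  # R^2 with two features vs. all features
```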

 

 

This step trains a LinearRegression model on all features and computes predictions for the test data.

print(regr3) prints the model's configuration and hyperparameters.

print(regr3.score(x_test_scaled, y_test)) prints the R^2 (coefficient of determination) of the model on the test set.

Unlike MLPRegressor, LinearRegression is a linear model, so its structure and hyperparameters differ from those of MLPRegressor.

It may perform worse than MLPRegressor, but when there are few features a LinearRegression model can be simpler and more effective.
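A minimal sketch of the LinearRegression step, assuming random stand-in data with a known linear relationship so that the fitted coefficients can be checked against the weights used to generate it:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Stand-in data (assumption): y = x @ w + 2 plus a little noise.
rng = np.random.default_rng(0)
w = np.arange(1.0, 9.0)  # true weights 1..8
x_train_scaled = rng.normal(size=(800, 8))
x_test_scaled = rng.normal(size=(200, 8))
y_train = x_train_scaled @ w + 2.0 + rng.normal(scale=0.1, size=800)
y_test = x_test_scaled @ w + 2.0 + rng.normal(scale=0.1, size=200)

regr3 = LinearRegression()
regr3.fit(x_train_scaled, y_train)          # ordinary least squares fit
y_test_hat = regr3.predict(x_test_scaled)

print(regr3)                                # the estimator and its parameters
print(regr3.score(x_test_scaled, y_test))   # R^2 on the test set
print(regr3.coef_.round(2))                 # close to the true weights 1..8
print(round(float(regr3.intercept_), 2))    # close to 2.0
```

Unlike the MLP, the fitted model is fully described by coef_ and intercept_, which is what makes it simple to inspect.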

 

 

 

 

 

This step visualizes the predictions of the trained LinearRegression model.

plt.hist(y_test, 30, color='red', alpha=0.3) plots the actual test-set targets y_test as a histogram with 30 bins.

plt.hist(y_test_hat, 30, color='blue', alpha=0.3) plots the targets y_test_hat predicted by the LinearRegression model as a histogram with 30 bins.

A title and a legend are added to finish the plot.

For the Keras model, input_shape is built from the number of features in x_train_scaled. x_train_scaled is a two-dimensional array whose rows are samples and whose columns are features.

x_train_scaled.shape[1] returns the number of features, and input_shape is defined as the tuple (n_features,).

This tuple is passed as the input_shape parameter of layers.Dense, the first layer of the model. Specifying input_shape on the first layer tells the model the size of its input, so the layer weights can be created as soon as the model is defined.

 

The next part builds a regression model with Keras.

The role of each line is as follows.

  • models.Sequential() : creates a model by stacking layers in order.
  • add() : appends a layer to the model.
  • layers.Dense(units=4, activation='relu', input_shape=input_shape) : adds a fully connected layer. units is the number of output neurons of this layer, activation selects ReLU as the activation function, and input_shape is the shape of the input data. input_shape needs to be specified only on the first layer.
  • layers.Dense(units=2, activation='relu') : adds a fully connected layer with 2 neurons and ReLU activation.
  • layers.Dense(units=1, activation='linear') : adds a fully connected layer with 1 neuron and a linear activation.
  • compile() : compiles the model. The loss parameter specifies the loss function; this example uses mean squared error (MSE).
  • summary() : prints a summary of the model's structure.
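Put together, the lines described in the list can be sketched as follows. The sketch assumes TensorFlow is installed and fixes n_features to 8, the number of features in the housing data; in the text this value comes from x_train_scaled.shape[1]. No optimizer is mentioned in the text, so compile() is left at its default here.

```python
from tensorflow.keras import layers, models

n_features = 8               # stands in for x_train_scaled.shape[1]
input_shape = (n_features,)  # tuple form expected by the first layer

regr_model = models.Sequential()
# First layer: 4 units, ReLU, and the shape of one input sample.
regr_model.add(layers.Dense(units=4, activation='relu', input_shape=input_shape))
# Second hidden layer: 2 units, ReLU.
regr_model.add(layers.Dense(units=2, activation='relu'))
# Output layer: 1 unit with a linear activation, as usual for regression.
regr_model.add(layers.Dense(units=1, activation='linear'))

# Only the loss is specified; the optimizer is left at the Keras default.
regr_model.compile(loss='mean_squared_error')
regr_model.summary()
```

With 8 input features, the model has 8*4+4 = 36, 4*2+2 = 10, and 2*1+1 = 3 parameters in its three layers, 49 in total. Recent Keras versions may warn that an Input object is preferred over the input_shape argument; the model is built the same way.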

To summarize the steps above:

This is the process of training a multilayer perceptron (MLP) with Keras.

A model named regr_model is defined, consisting of 3 Dense layers. Each layer is defined with the Dense class, and units is the number of nodes in that layer.

activation defines the activation function of the layer. input_shape is defined only on the first layer and gives the dimensions of the input.

Once the model is defined, compile() configures it for training. Here 'mean_squared_error' is used as the loss function.

Finally, fit() runs the training. The model is trained on x_train_scaled and y_train, the validation_data argument sets the validation data, and epochs sets the number of training epochs.

After training, the loss values can be plotted to see how they changed.

 

The purpose of this code is to inspect the mean squared error (MSE) on the training set and the test set over the course of training the MLP model.

history = regr_model.fit(x_train_scaled, y_train, validation_data=[x_test_scaled, y_test], epochs=100) trains the MLP model.

Training uses the training set (x_train_scaled, y_train), and x_test_scaled and y_test serve as the validation set for evaluating performance after each epoch. epochs is the number of epochs to train for.

plt.plot(history.history['loss']) plots how the MSE on the training set changes.

plt.plot(history.history['val_loss']) plots how the MSE on the validation set changes.

The plot shows how the MSE evolves. The y-axis is the MSE and the x-axis is the epoch number. plt.title, plt.ylabel, plt.xlabel, and plt.legend set the title, the y-axis label, the x-axis label, and the legend, respectively. plt.show() displays the plot.
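The training and plotting steps can be sketched as below. The data is a random stand-in (an assumption), the number of epochs is reduced from 100 to 5 to keep the run short, and the title, axis labels, legend entries, and file name are hypothetical. validation_data is passed as a tuple, which is the form the Keras documentation describes.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs headless
import matplotlib.pyplot as plt
import numpy as np
from tensorflow.keras import layers, models

# Stand-in data (assumption): 8 scaled features and a noisy linear target.
rng = np.random.default_rng(0)
w = np.ones(8)
x_train_scaled = rng.normal(size=(800, 8)).astype("float32")
x_test_scaled = rng.normal(size=(200, 8)).astype("float32")
y_train = (x_train_scaled @ w + rng.normal(scale=0.1, size=800)).astype("float32")
y_test = (x_test_scaled @ w + rng.normal(scale=0.1, size=200)).astype("float32")

regr_model = models.Sequential([
    layers.Dense(units=4, activation='relu', input_shape=(8,)),
    layers.Dense(units=2, activation='relu'),
    layers.Dense(units=1, activation='linear'),
])
regr_model.compile(loss='mean_squared_error')

epochs = 5  # the text uses 100; reduced here to keep the sketch fast
history = regr_model.fit(x_train_scaled, y_train,
                         validation_data=(x_test_scaled, y_test),
                         epochs=epochs, verbose=0)

# One loss value per epoch for each of the two datasets.
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model loss')             # hypothetical title
plt.ylabel('MSE')
plt.xlabel('epoch')
plt.legend(['train', 'validation'])
plt.savefig('loss.png')  # in a notebook, plt.show() displays the figure instead
```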
