Data Analysis with the Titanic Dataset¶
1. Libraries and Data¶
In [48]:
import pandas as pd
from sklearn.linear_model import LogisticRegression  # logistic regression model
from sklearn.tree import DecisionTreeClassifier  # decision tree model
# scikit-learn: a library of machine learning models
In [49]:
train = pd.read_csv('./data/train.csv')
test = pd.read_csv('./data/test.csv')
submission = pd.read_csv('./data/submission.csv')
2. Exploratory Data Analysis (EDA)¶
PassengerId: unique passenger ID
Survived: survival (0 = died, 1 = survived)
Pclass: ticket class (1 = 1st, 2 = 2nd, 3 = 3rd)
Name: passenger name
Sex: sex
Age: age
SibSp: number of siblings/spouses aboard
Parch: number of parents/children aboard
Ticket: ticket number
Fare: ticket fare
Cabin: cabin number
Embarked: port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)
In [50]:
train.head()
Out[50]:
 | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
---|---|---|---|---|---|---|---|---|---|---|---|---
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
SibSp: number of siblings/spouses aboard with the passenger
Parch: number of parents/children aboard with the passenger
In [51]:
test.head()
Out[51]:
 | PassengerId | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
---|---|---|---|---|---|---|---|---|---|---|---
0 | 892 | 3 | Kelly, Mr. James | male | 34.5 | 0 | 0 | 330911 | 7.8292 | NaN | Q |
1 | 893 | 3 | Wilkes, Mrs. James (Ellen Needs) | female | 47.0 | 1 | 0 | 363272 | 7.0000 | NaN | S |
2 | 894 | 2 | Myles, Mr. Thomas Francis | male | 62.0 | 0 | 0 | 240276 | 9.6875 | NaN | Q |
3 | 895 | 3 | Wirz, Mr. Albert | male | 27.0 | 0 | 0 | 315154 | 8.6625 | NaN | S |
4 | 896 | 3 | Hirvonen, Mrs. Alexander (Helga E Lindqvist) | female | 22.0 | 1 | 1 | 3101298 | 12.2875 | NaN | S |
In [52]:
submission.head()
Out[52]:
 | PassengerId | Survived
---|---|---
0 | 892 | 0 |
1 | 893 | 1 |
2 | 894 | 0 |
3 | 895 | 0 |
4 | 896 | 1 |
In [53]:
print('train shape :', train.shape)
print('test shape :', test.shape)
print('submission shape :', submission.shape)
train shape : (891, 12)
test shape : (418, 11)
submission shape : (418, 2)
pd.DataFrame.info()¶
In [54]:
train.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 PassengerId 891 non-null int64
1 Survived 891 non-null int64
2 Pclass 891 non-null int64
3 Name 891 non-null object
4 Sex 891 non-null object
5 Age 714 non-null float64
6 SibSp 891 non-null int64
7 Parch 891 non-null int64
8 Ticket 891 non-null object
9 Fare 891 non-null float64
10 Cabin 204 non-null object
11 Embarked 889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB
Plotting the Data¶
In [55]:
train.groupby('Sex').mean()
Out[55]:
Sex | PassengerId | Survived | Pclass | Age | SibSp | Parch | Fare
---|---|---|---|---|---|---|---
female | 431.028662 | 0.742038 | 2.159236 | 27.915709 | 0.694268 | 0.649682 | 44.479818 |
male | 454.147314 | 0.188908 | 2.389948 | 30.726645 | 0.429809 | 0.235702 | 25.523893 |
In [56]:
train.groupby('Embarked').mean()
Out[56]:
Embarked | PassengerId | Survived | Pclass | Age | SibSp | Parch | Fare
---|---|---|---|---|---|---|---
C | 445.357143 | 0.553571 | 1.886905 | 30.814769 | 0.386905 | 0.363095 | 59.954144 |
Q | 417.896104 | 0.389610 | 2.909091 | 28.089286 | 0.428571 | 0.168831 | 13.276030 |
S | 449.527950 | 0.336957 | 2.350932 | 29.445397 | 0.571429 | 0.413043 | 27.079812 |
In [103]:
sex_pclass_means = train.groupby(['Sex', 'Pclass']).mean()
In [104]:
sex_pclass_means.Survived.plot(kind='bar')
Out[104]:
<AxesSubplot:xlabel='Sex,Pclass'>
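Survival rates can also be compared side by side by unstacking Pclass into columns; an optional sketch reusing the `sex_pclass_means` frame from above:
In [ ]:
# unstack() moves the Pclass level into columns, so the bar chart groups bars by Sex
sex_pclass_means['Survived'].unstack().plot(kind='bar')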
In [59]:
train.plot(x='Age', y='Fare', kind='scatter')
Out[59]:
<AxesSubplot:xlabel='Age', ylabel='Fare'>
In [60]:
train.sort_values(by='Fare', ascending=False).head(20)
Out[60]:
 | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
---|---|---|---|---|---|---|---|---|---|---|---|---
258 | 259 | 1 | 1 | Ward, Miss. Anna | female | 35.0 | 0 | 0 | PC 17755 | 512.3292 | NaN | C |
737 | 738 | 1 | 1 | Lesurer, Mr. Gustave J | male | 35.0 | 0 | 0 | PC 17755 | 512.3292 | B101 | C |
679 | 680 | 1 | 1 | Cardeza, Mr. Thomas Drake Martinez | male | 36.0 | 0 | 1 | PC 17755 | 512.3292 | B51 B53 B55 | C |
88 | 89 | 1 | 1 | Fortune, Miss. Mabel Helen | female | 23.0 | 3 | 2 | 19950 | 263.0000 | C23 C25 C27 | S |
27 | 28 | 0 | 1 | Fortune, Mr. Charles Alexander | male | 19.0 | 3 | 2 | 19950 | 263.0000 | C23 C25 C27 | S |
341 | 342 | 1 | 1 | Fortune, Miss. Alice Elizabeth | female | 24.0 | 3 | 2 | 19950 | 263.0000 | C23 C25 C27 | S |
438 | 439 | 0 | 1 | Fortune, Mr. Mark | male | 64.0 | 1 | 4 | 19950 | 263.0000 | C23 C25 C27 | S |
311 | 312 | 1 | 1 | Ryerson, Miss. Emily Borie | female | 18.0 | 2 | 2 | PC 17608 | 262.3750 | B57 B59 B63 B66 | C |
742 | 743 | 1 | 1 | Ryerson, Miss. Susan Parker "Suzette" | female | 21.0 | 2 | 2 | PC 17608 | 262.3750 | B57 B59 B63 B66 | C |
118 | 119 | 0 | 1 | Baxter, Mr. Quigg Edmond | male | 24.0 | 0 | 1 | PC 17558 | 247.5208 | B58 B60 | C |
299 | 300 | 1 | 1 | Baxter, Mrs. James (Helene DeLaudeniere Chaput) | female | 50.0 | 0 | 1 | PC 17558 | 247.5208 | B58 B60 | C |
557 | 558 | 0 | 1 | Robbins, Mr. Victor | male | NaN | 0 | 0 | PC 17757 | 227.5250 | NaN | C |
700 | 701 | 1 | 1 | Astor, Mrs. John Jacob (Madeleine Talmadge Force) | female | 18.0 | 1 | 0 | PC 17757 | 227.5250 | C62 C64 | C |
380 | 381 | 1 | 1 | Bidois, Miss. Rosalie | female | 42.0 | 0 | 0 | PC 17757 | 227.5250 | NaN | C |
716 | 717 | 1 | 1 | Endres, Miss. Caroline Louise | female | 38.0 | 0 | 0 | PC 17757 | 227.5250 | C45 | C |
527 | 528 | 0 | 1 | Farthing, Mr. John | male | NaN | 0 | 0 | PC 17483 | 221.7792 | C95 | S |
377 | 378 | 0 | 1 | Widener, Mr. Harry Elkins | male | 27.0 | 0 | 2 | 113503 | 211.5000 | C82 | C |
730 | 731 | 1 | 1 | Allen, Miss. Elisabeth Walton | female | 29.0 | 0 | 0 | 24160 | 211.3375 | B5 | S |
779 | 780 | 1 | 1 | Robert, Mrs. Edward Scott (Elisabeth Walton Mc... | female | 43.0 | 0 | 1 | 24160 | 211.3375 | B3 | S |
689 | 690 | 1 | 1 | Madill, Miss. Georgette Alexandra | female | 15.0 | 0 | 1 | 24160 | 211.3375 | B5 | S |
In [67]:
train.head(5)
Out[67]:
 | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
---|---|---|---|---|---|---|---|---|---|---|---|---
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
In [68]:
train.plot(x='Fare', y='Survived', kind='scatter')
Out[68]:
<AxesSubplot:xlabel='Fare', ylabel='Survived'>
Handling Missing Values¶
In [61]:
train.isna().sum()
Out[61]:
PassengerId 0
Survived 0
Pclass 0
Name 0
Sex 0
Age 177
SibSp 0
Parch 0
Ticket 0
Fare 0
Cabin 687
Embarked 2
dtype: int64
Fill NaN with the Median¶
In [62]:
train.Age.median()
Out[62]:
28.0
In [63]:
train['Age'] = train['Age'].fillna(train['Age'].median())  # fill missing ages with the median (28.0)
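The test set's Age column also contains NaNs; if Age were used as a feature, it should arguably be imputed with the median computed on the train set so no information leaks from the test data (the models below only use Sex and Pclass, so this step is optional). A minimal sketch:
In [ ]:
# fill missing test-set ages with the median computed on the train set
test['Age'] = test['Age'].fillna(train['Age'].median())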
In [64]:
train
Out[64]:
 | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
---|---|---|---|---|---|---|---|---|---|---|---|---
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
886 | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.0000 | NaN | S |
887 | 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.0000 | B42 | S |
888 | 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | 28.0 | 1 | 2 | W./C. 6607 | 23.4500 | NaN | S |
889 | 890 | 1 | 1 | Behr, Mr. Karl Howell | male | 26.0 | 0 | 0 | 111369 | 30.0000 | C148 | C |
890 | 891 | 0 | 3 | Dooley, Mr. Patrick | male | 32.0 | 0 | 0 | 370376 | 7.7500 | NaN | Q |
891 rows × 12 columns
In [65]:
train['Embarked'] = train['Embarked'].fillna('S')  # fill the 2 missing values with the most common port, 'S'
Models can only handle numeric data!¶
- Convert string columns into numbers
In [70]:
train['Sex'] = train['Sex'].map({'male': 0, 'female': 1})  # encode male as 0, female as 1
In [71]:
train
Out[71]:
 | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
---|---|---|---|---|---|---|---|---|---|---|---|---
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | 0 | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | 1 | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | 1 | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | 1 | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
4 | 5 | 0 | 3 | Allen, Mr. William Henry | 0 | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
886 | 887 | 0 | 2 | Montvila, Rev. Juozas | 0 | 27.0 | 0 | 0 | 211536 | 13.0000 | NaN | S |
887 | 888 | 1 | 1 | Graham, Miss. Margaret Edith | 1 | 19.0 | 0 | 0 | 112053 | 30.0000 | B42 | S |
888 | 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie" | 1 | 28.0 | 1 | 2 | W./C. 6607 | 23.4500 | NaN | S |
889 | 890 | 1 | 1 | Behr, Mr. Karl Howell | 0 | 26.0 | 0 | 0 | 111369 | 30.0000 | C148 | C |
890 | 891 | 0 | 3 | Dooley, Mr. Patrick | 0 | 32.0 | 0 | 0 | 370376 | 7.7500 | NaN | Q |
891 rows × 12 columns
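Embarked is still a string column. It is not used as a feature below, but if it were, one-hot encoding is a common way to make it numeric; a sketch using pandas' get_dummies:
In [ ]:
# one-hot encode Embarked into Embarked_C / Embarked_Q / Embarked_S 0-1 columns
embarked_dummies = pd.get_dummies(train['Embarked'], prefix='Embarked')
train = pd.concat([train, embarked_dummies], axis=1)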
X (features) -> MODEL -> y (target)
3. Feature Selection and Model Building¶
Feature Engineering & Initial Modeling¶
In [73]:
X_train = train[['Sex', 'Pclass']]  # features
y_train = train['Survived']  # target
In [74]:
X_train
Out[74]:
 | Sex | Pclass
---|---|---
0 | 0 | 3 |
1 | 1 | 1 |
2 | 1 | 3 |
3 | 1 | 1 |
4 | 0 | 3 |
... | ... | ... |
886 | 0 | 2 |
887 | 1 | 1 |
888 | 1 | 3 |
889 | 0 | 1 |
890 | 0 | 3 |
891 rows × 2 columns
In [76]:
test['Sex'] = test['Sex'].map({'male': 0, 'female': 1})
In [77]:
X_test = test[['Sex', 'Pclass']]
X_test
Out[77]:
 | Sex | Pclass
---|---|---
0 | 0 | 3 |
1 | 1 | 3 |
2 | 0 | 2 |
3 | 0 | 3 |
4 | 1 | 3 |
... | ... | ... |
413 | 0 | 3 |
414 | 1 | 1 |
415 | 0 | 3 |
416 | 0 | 3 |
417 | 0 | 3 |
418 rows × 2 columns
sklearn.linear_model.LogisticRegression()¶
- Logistic regression model
- Outputs values between 0 and 1
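The 0-to-1 output comes from pushing a linear score through the sigmoid function, σ(z) = 1 / (1 + e^(-z)); a minimal illustration with numpy (the inputs here are arbitrary example scores, not model outputs):
In [ ]:
import numpy as np

def sigmoid(z):
    # squashes any real number into the open interval (0, 1)
    return 1 / (1 + np.exp(-z))

print(sigmoid(-2), sigmoid(0), sigmoid(2))  # ≈ 0.119, 0.5, ≈ 0.881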
In [78]:
lr = LogisticRegression()
DecisionTreeClassifier¶
- Decision tree model
In [80]:
dt = DecisionTreeClassifier()
In [82]:
# fit (train) the model on the training data
lr.fit(X_train, y_train)
Out[82]:
LogisticRegression()
In [83]:
dt.fit(X_train, y_train)
Out[83]:
DecisionTreeClassifier()
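Before predicting, it is worth a quick look at how well each fitted model reproduces the training labels; a sketch using scikit-learn's score() (mean accuracy on the training set, not a substitute for a proper validation split):
In [ ]:
# mean training accuracy of each model
print('logistic regression:', lr.score(X_train, y_train))
print('decision tree      :', dt.score(X_train, y_train))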
In [84]:
lr.predict(X_test)
Out[84]:
array([0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0,
1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1,
1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1,
1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1,
1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0,
0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0,
1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1,
0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1,
1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1,
0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0,
1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1,
0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1,
0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0,
0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0,
0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0,
1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0,
0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0,
1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1,
0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0],
dtype=int64)
In [85]:
lr.predict_proba(X_test)
Out[85]:
array([[0.89797923, 0.10202077],
[0.40645631, 0.59354369],
[0.77524063, 0.22475937],
[0.89797923, 0.10202077],
[0.40645631, 0.59354369],
[0.89797923, 0.10202077],
[0.40645631, 0.59354369],
[0.77524063, 0.22475937],
[0.40645631, 0.59354369],
[0.89797923, 0.10202077],
[0.89797923, 0.10202077],
[0.57476411, 0.42523589],
...
[0.40645631, 0.59354369],
[0.40645631, 0.59354369],
[0.40645631, 0.59354369],
[0.09515218, 0.90484782],
[0.40645631, 0.59354369],
[0.89797923, 0.10202077],
[0.09515218, 0.90484782],
[0.89797923, 0.10202077],
[0.89797923, 0.10202077],
[0.89797923, 0.10202077]])
In [87]:
lr.predict_proba(X_test).shape
# one column per class: probability of death and probability of survival
Out[87]:
(418, 2)
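The column order of predict_proba follows lr.classes_, and the two probabilities in each row sum to 1; a quick sanity check:
In [ ]:
print(lr.classes_)  # here [0 1] -> column 0 = P(died), column 1 = P(survived)
print(lr.predict_proba(X_test).sum(axis=1))  # every row sums to 1.0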
In [92]:
lr_pred = lr.predict_proba(X_test)[:, 1]  # keep only the survival probability
In [90]:
type(lr_pred)
Out[90]:
numpy.ndarray
Extracting the Survival Probability¶
In [91]:
lr.predict_proba(X_test)[:, 1]  # column 1 is the survival probability
Out[91]:
array([0.10202077, 0.59354369, 0.22475937, 0.10202077, 0.59354369,
0.10202077, 0.59354369, 0.22475937, 0.59354369, 0.10202077,
0.10202077, 0.42523589, 0.90484782, 0.22475937, 0.90484782,
0.78842569, 0.22475937, 0.10202077, 0.59354369, 0.59354369,
0.42523589, 0.10202077, 0.90484782, 0.42523589, 0.90484782,
0.10202077, 0.90484782, 0.10202077, 0.42523589, 0.10202077,
0.22475937, 0.22475937, 0.59354369, 0.59354369, 0.42523589,
0.10202077, 0.59354369, 0.59354369, 0.10202077, 0.10202077,
0.10202077, 0.42523589, 0.10202077, 0.78842569, 0.90484782,
0.10202077, 0.42523589, 0.10202077, 0.90484782, 0.59354369,
0.42523589, 0.22475937, 0.78842569, 0.90484782, 0.22475937,
0.10202077, 0.10202077, 0.10202077, 0.10202077, 0.90484782,
0.10202077, 0.22475937, 0.10202077, 0.59354369, 0.42523589,
0.78842569, 0.59354369, 0.42523589, 0.42523589, 0.90484782,
...
0.78842569, 0.42523589, 0.10202077, 0.59354369, 0.10202077,
0.42523589, 0.22475937, 0.10202077, 0.22475937, 0.10202077,
0.22475937, 0.10202077, 0.10202077, 0.90484782, 0.10202077,
0.59354369, 0.22475937, 0.59354369, 0.22475937, 0.78842569,
0.90484782, 0.22475937, 0.22475937, 0.22475937, 0.59354369,
0.42523589, 0.90484782, 0.10202077, 0.10202077, 0.59354369,
0.10202077, 0.78842569, 0.78842569, 0.10202077, 0.90484782,
0.59354369, 0.10202077, 0.59354369, 0.90484782, 0.22475937,
0.22475937, 0.90484782, 0.42523589, 0.22475937, 0.90484782,
0.90484782, 0.59354369, 0.22475937, 0.42523589, 0.10202077,
0.10202077, 0.10202077, 0.59354369, 0.59354369, 0.22475937,
0.78842569, 0.10202077, 0.22475937, 0.10202077, 0.10202077,
0.42523589, 0.90484782, 0.10202077, 0.22475937, 0.10202077,
0.90484782, 0.10202077, 0.90484782, 0.10202077, 0.10202077,
0.90484782, 0.22475937, 0.90484782, 0.42523589, 0.42523589,
0.22475937, 0.22475937, 0.42523589, 0.59354369, 0.59354369,
0.59354369, 0.90484782, 0.59354369, 0.10202077, 0.90484782,
0.10202077, 0.10202077, 0.10202077])
In [99]:
dt_pred = dt.predict_proba(X_test)[:, 1]  # survival probability from the decision tree
In [100]:
submission['Survived'] = lr_pred
submission
Out[100]:
 | PassengerId | Survived
---|---|---
0 | 892 | 0.102021
1 | 893 | 0.593544
2 | 894 | 0.224759
3 | 895 | 0.102021
4 | 896 | 0.593544
... | ... | ...
413 | 1305 | 0.102021
414 | 1306 | 0.904848
415 | 1307 | 0.102021
416 | 1308 | 0.102021
417 | 1309 | 0.102021
418 rows × 2 columns
In [101]:
# index=False leaves the DataFrame index out of the saved file
submission.to_csv('logistic_regression_pred.csv', index=False)
In [102]:
submission['Survived'] = dt_pred
submission.to_csv('decision_tree_pred.csv', index=False)
In [98]:
dt_pred
Out[98]:
array([0.13544669, 0.5 , 0.15740741, 0.13544669, 0.5 ,
0.13544669, 0.5 , 0.15740741, 0.5 , 0.13544669,
0.13544669, 0.36885246, 0.96808511, 0.15740741, 0.96808511,
0.92105263, 0.15740741, 0.13544669, 0.5 , 0.5 ,
0.36885246, 0.13544669, 0.96808511, 0.36885246, 0.96808511,
0.13544669, 0.96808511, 0.13544669, 0.36885246, 0.13544669,
0.15740741, 0.15740741, 0.5 , 0.5 , 0.36885246,
0.13544669, 0.5 , 0.5 , 0.13544669, 0.13544669,
0.13544669, 0.36885246, 0.13544669, 0.92105263, 0.96808511,
0.13544669, 0.36885246, 0.13544669, 0.96808511, 0.5 ,
0.36885246, 0.15740741, 0.92105263, 0.96808511, 0.15740741,
0.13544669, 0.13544669, 0.13544669, 0.13544669, 0.96808511,
0.13544669, 0.15740741, 0.13544669, 0.5 , 0.36885246,
0.92105263, 0.5 , 0.36885246, 0.36885246, 0.96808511,
0.5 , 0.13544669, 0.5 , 0.36885246, 0.96808511,
0.36885246, 0.13544669, 0.96808511, 0.15740741, 0.5 ,
0.13544669, 0.36885246, 0.36885246, 0.13544669, 0.15740741,
0.13544669, 0.5 , 0.5 , 0.5 , 0.15740741,
0.5 , 0.13544669, 0.96808511, 0.13544669, 0.36885246,
0.13544669, 0.96808511, 0.13544669, 0.5 , 0.13544669,
0.96808511, 0.15740741, 0.13544669, 0.13544669, 0.5 ,
0.13544669, 0.13544669, 0.13544669, 0.13544669, 0.15740741,
0.15740741, 0.5 , 0.96808511, 0.5 , 0.96808511,
0.13544669, 0.13544669, 0.5 , 0.36885246, 0.92105263,
0.92105263, 0.13544669, 0.96808511, 0.13544669, 0.13544669,
0.5 , 0.13544669, 0.5 , 0.15740741, 0.13544669,
0.13544669, 0.36885246, 0.5 , 0.13544669, 0.13544669,
0.13544669, 0.13544669, 0.15740741, 0.5 , 0.13544669,
0.5 , 0.96808511, 0.36885246, 0.15740741, 0.36885246,
0.13544669, 0.36885246, 0.13544669, 0.36885246, 0.15740741,
0.96808511, 0.13544669, 0.13544669, 0.5 , 0.13544669,
0.13544669, 0.96808511, 0.5 , 0.36885246, 0.5 ,
0.5 , 0.13544669, 0.92105263, 0.13544669, 0.15740741,
0.5 , 0.36885246, 0.13544669, 0.96808511, 0.5 ,
0.13544669, 0.13544669, 0.13544669, 0.13544669, 0.13544669,
0.92105263, 0.92105263, 0.36885246, 0.92105263, 0.96808511,
0.15740741, 0.36885246, 0.96808511, 0.13544669, 0.96808511,
0.15740741, 0.92105263, 0.13544669, 0.5 , 0.15740741,
0.15740741, 0.36885246, 0.13544669, 0.15740741, 0.15740741,
0.13544669, 0.36885246, 0.5 , 0.15740741, 0.5 ,
0.5 , 0.13544669, 0.36885246, 0.92105263, 0.15740741,
0.36885246, 0.5 , 0.15740741, 0.96808511, 0.13544669,
0.13544669, 0.13544669, 0.15740741, 0.92105263, 0.5 ,
0.36885246, 0.5 , 0.36885246, 0.96808511, 0.13544669,
0.92105263, 0.13544669, 0.92105263, 0.13544669, 0.96808511,
0.5 , 0.13544669, 0.5 , 0.13544669, 0.15740741,
0.15740741, 0.96808511, 0.13544669, 0.13544669, 0.36885246,
0.13544669, 0.36885246, 0.13544669, 0.92105263, 0.96808511,
0.96808511, 0.92105263, 0.36885246, 0.13544669, 0.13544669,
0.36885246, 0.92105263, 0.15740741, 0.92105263, 0.5 ,
0.92105263, 0.13544669, 0.36885246, 0.13544669, 0.13544669,
0.13544669, 0.13544669, 0.13544669, 0.92105263, 0.13544669,
0.13544669, 0.13544669, 0.92105263, 0.5 , 0.15740741,
0.13544669, 0.36885246, 0.13544669, 0.5 , 0.13544669,
0.36885246, 0.13544669, 0.96808511, 0.5 , 0.13544669,
0.92105263, 0.15740741, 0.15740741, 0.15740741, 0.15740741,
0.5 , 0.13544669, 0.5 , 0.5 , 0.5 ,
0.13544669, 0.13544669, 0.36885246, 0.13544669, 0.13544669,
0.36885246, 0.5 , 0.13544669, 0.36885246, 0.13544669,
0.13544669, 0.92105263, 0.13544669, 0.36885246, 0.13544669,
0.13544669, 0.15740741, 0.15740741, 0.13544669, 0.5 ,
0.96808511, 0.36885246, 0.13544669, 0.36885246, 0.5 ,
0.13544669, 0.13544669, 0.13544669, 0.5 , 0.96808511,
0.5 , 0.36885246, 0.15740741, 0.13544669, 0.15740741,
0.13544669, 0.13544669, 0.15740741, 0.36885246, 0.96808511,
0.13544669, 0.92105263, 0.36885246, 0.15740741, 0.15740741,
0.92105263, 0.36885246, 0.13544669, 0.5 , 0.13544669,
0.36885246, 0.15740741, 0.13544669, 0.15740741, 0.13544669,
0.15740741, 0.13544669, 0.13544669, 0.96808511, 0.13544669,
0.5 , 0.15740741, 0.5 , 0.15740741, 0.92105263,
0.96808511, 0.15740741, 0.15740741, 0.15740741, 0.5 ,
0.36885246, 0.96808511, 0.13544669, 0.13544669, 0.5 ,
0.13544669, 0.92105263, 0.92105263, 0.13544669, 0.96808511,
0.5 , 0.13544669, 0.5 , 0.96808511, 0.15740741,
0.15740741, 0.96808511, 0.36885246, 0.15740741, 0.96808511,
0.96808511, 0.5 , 0.15740741, 0.36885246, 0.13544669,
0.13544669, 0.13544669, 0.5 , 0.5 , 0.15740741,
0.92105263, 0.13544669, 0.15740741, 0.13544669, 0.13544669,
0.36885246, 0.96808511, 0.13544669, 0.15740741, 0.13544669,
0.96808511, 0.13544669, 0.96808511, 0.13544669, 0.13544669,
0.96808511, 0.15740741, 0.96808511, 0.36885246, 0.36885246,
0.15740741, 0.15740741, 0.36885246, 0.5 , 0.5 ,
0.5 , 0.96808511, 0.5 , 0.13544669, 0.96808511,
0.13544669, 0.13544669, 0.13544669])
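The sample submission file contained hard 0/1 labels rather than probabilities; if the competition expects labels, the probabilities can be thresholded before saving (a sketch, assuming a 0.5 cutoff and using a hypothetical output filename):
In [ ]:
# convert survival probabilities into 0/1 predictions at a 0.5 cutoff
submission['Survived'] = (lr_pred >= 0.5).astype(int)
submission.to_csv('logistic_regression_pred_labels.csv', index=False)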