Me, Going Pro in 100 Days: Day 25 (Python)

September 14, 2020 13:30

Today I trained a model with logistic regression and
got as far as evaluating it.

# Split into training data and test data (X and y were prepared beforehand)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Import the logistic regression model
from sklearn.linear_model import LogisticRegression
lr = LogisticRegression(solver='liblinear')

# Run the training
lr.fit(X_train, y_train)
y_pred = lr.predict(X_test)  # predict with the trained model

y_pred holds the predicted labels.
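Since X and y are not defined in this post, here is a minimal, self-contained sketch of the same split, fit, and predict steps. It assumes a synthetic dataset from `make_classification` as a stand-in; the names ending in `_demo` are hypothetical and the numbers have nothing to do with the real data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for X and y (assumption: the real data is prepared elsewhere)
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

# With no test_size given, train_test_split holds out 25% of the rows for testing
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)
print(X_tr.shape, X_te.shape)  # (150, 5) (50, 5)

lr_demo = LogisticRegression(solver='liblinear')
lr_demo.fit(X_tr, y_tr)

# predict() returns one class label (0 or 1) per test row
y_pred_demo = lr_demo.predict(X_te)
print(y_pred_demo[:10])

# score() reports plain accuracy on the held-out rows
print(lr_demo.score(X_te, y_te))
```

Fixing `random_state` makes the split reproducible, so the scores can be compared from one run to the next.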

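# Build a confusion matrix to check the accuracy
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_true=y_test, y_pred=y_pred)
# np.rot90(cm, 2) rotates the matrix 180 degrees so the positive class comes first
df_cm = pd.DataFrame(np.rot90(cm, 2), index=["actual_Positive", "actual_Negative"], columns=["predict_Positive", "predict_Negative"])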
print(df_cm)

[Screenshot 2020-09-10 9.42.19]

(lol, I couldn't get the figure size right)
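To make the layout of that table easier to read, here is a minimal sketch on made-up labels (the lists `y_true_toy` and `y_hat_toy` are hypothetical and are not the data from this post). It shows how scikit-learn orders the confusion matrix and what the 180-degree rotation with `np.rot90(cm, 2)` changes.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration only (not the data used in this post)
y_true_toy = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_hat_toy = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]

cm_toy = confusion_matrix(y_true=y_true_toy, y_pred=y_hat_toy)
# scikit-learn layout: rows = actual, columns = predicted, label 0 first
# [[TN, FP],
#  [FN, TP]]
print(cm_toy)

flipped = np.rot90(cm_toy, 2)
# After the 180-degree rotation the positive class comes first
# [[TP, FN],
#  [FP, TN]]
print(flipped)

tp, fn = flipped[0]
fp, tn = flipped[1]
precision = tp / (tp + fp)  # share of predicted positives that were right
recall = tp / (tp + fn)     # share of actual positives that were found
print("precision:", precision)
print("recall:", recall)
```

The rotation only changes the display order; the four counts themselves stay the same.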

For now, I drew a ROC curve the same way as last time.

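import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score, roc_curve
# Predicted probability of the positive class (column 1)
y_pred_prob = lr.predict_proba(X_test)[:, 1]
# Compute the AUC score
auc_score = roc_auc_score(y_true=y_test, y_score=y_pred_prob)
print(auc_score)
# Compute the ROC curve components (false positive rate, true positive rate, thresholds)
fpr, tpr, thresholds = roc_curve(y_true=y_test, y_score=y_pred_prob)
# Plot the ROC curve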
plt.plot(fpr, tpr, label='roc curve (area = %0.3f)' % auc_score)
plt.plot([0, 1], [0, 1], linestyle=':', label='random')
plt.plot([0, 0, 1], [0, 1, 1], linestyle=':', label='ideal')
plt.legend()
plt.xlabel('false positive rate')
plt.ylabel('true positive rate')
plt.show()

At this point the AUC (Area Under the Curve) is 0.914.
That's quite a bit higher than last time.

[Screenshot 2020-09-14 13.25.55]
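As a rough intuition for what that AUC number means, here is a small sketch on four made-up scores (`y_true_mini` and `y_score_mini` are hypothetical, not this post's data). When there are no tied scores, the AUC equals the fraction of (positive, negative) pairs in which the positive sample gets the higher score, and the sketch checks that by counting pairs by hand.

```python
from itertools import product
from sklearn.metrics import roc_auc_score, roc_curve

# Made-up labels and scores for illustration only
y_true_mini = [0, 0, 1, 1]
y_score_mini = [0.1, 0.4, 0.35, 0.8]

auc_mini = roc_auc_score(y_true=y_true_mini, y_score=y_score_mini)
print(auc_mini)  # 0.75

# Count the pairs where the positive sample outscores the negative one
pos = [s for t, s in zip(y_true_mini, y_score_mini) if t == 1]
neg = [s for t, s in zip(y_true_mini, y_score_mini) if t == 0]
wins = sum(p > n for p, n in product(pos, neg))
pair_fraction = wins / (len(pos) * len(neg))
print(pair_fraction)  # 0.75, same as the AUC

# Each threshold gives one (fpr, tpr) point on the ROC curve
fpr_mini, tpr_mini, thr_mini = roc_curve(y_true=y_true_mini, y_score=y_score_mini)
for f, t, th in zip(fpr_mini, tpr_mini, thr_mini):
    print(f"threshold={th}: fpr={f}, tpr={t}")
```

Read this way, an AUC of 0.914 would mean that a randomly chosen positive sample outscores a randomly chosen negative one in roughly 91% of pairs.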

However, the score I need this time is at least 0.92.

Doing things the same way as last time didn't get me to 0.92,
so it looks like I'll need to get creative.

I'll try a bunch of things.

Thank you as always for your support. If supporting is difficult, I'd be happy if you left your thoughts in a comment.