[Python] Visualizing the relationship between prefecture-level medical expenses and health insurance premium rates

あおじる

May 4, 2022 21:19

Introduction

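Hello, I'm あおじる, and I'm studying machine learning.
I have written several articles using medical expense data so far.
This time, I look at how medical expenses in each prefecture affect the prefecture-level premium rates of health insurance.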

I used Python on Google Colaboratory.

Data used

Medical expense data

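For medical expenses, I use the same data as in earlier articles: the basic enrollee information and basic medical expense information published by the Japan Health Insurance Association (Kyokai Kenpo).
Specifically, I use df_ytsn_P and df_ytsnk_X, which I aggregated by fiscal year in a previous article.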

$$
\def\arraystretch{1.5}
\begin{array}{c:c:c:c|c}
\textsf{y} & \textsf{t} & \textsf{s} & \textsf{n} & \textsf{P} \\ \hline
2010 & 1 & 1 & 1 & {} \\
2010 & 1 & 1 & 2 & {} \\
\vdots & \vdots & \vdots & \vdots & \vdots \\
2019 & 47 & 2 & 8 & {}
\end{array}
$$

$$
\def\arraystretch{1.5}
\begin{array}{c:c:c:c:c|c:c:c:c}
\textsf{y} & \textsf{t} & \textsf{s} & \textsf{n} & \textsf{k} & \textsf{Ken} & \textsf{Nit} & \textsf{Ten} & \textsf{Ten2} \\ \hline
2010 & 1 & 1 & 1 & 1 & {} & {} & {} & {} \\
2010 & 1 & 1 & 1 & 2 & {} & {} & {} & {} \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
2019 & 47 & 2 & 8 & 3 & {} & {} & {} & {}
\end{array}
$$

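y: fiscal year
10 fiscal years, FY2010 to FY2019 (Heisei 22 to Reiwa 1)
t: prefecture
47 prefectures, 1: Hokkaido, ..., 47: Okinawa
s: sex
1: male, 2: female
n: age group
1: ages 0-9, 2: ages 10-19, ..., 7: ages 60-69, 8: ages 70 and over
k: type of care
1: medical inpatient, 2: medical outpatient, 3: dental
P: number of persons (enrollees)
X: number of claims (Ken), days (Nit), points (Ten), points including dispensing (Ten2)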

import pandas as pd

# Load the medical expense data (aggregated by fiscal year in a previous article)
df_ytsn_P = pd.read_csv('./df_ytsn_P.csv')
df_ytsnk_X = pd.read_csv('./df_ytsnk_X.csv')
print(df_ytsn_P.shape, df_ytsnk_X.shape)
# (7520, 5) (22560, 9)
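The printed shapes follow from the dimensions listed above: 10 fiscal years × 47 prefectures × 2 sexes × 8 age groups = 7,520 rows for df_ytsn_P, and × 3 types of care = 22,560 rows for df_ytsnk_X. A minimal sketch of a completeness check, assuming each key combination should appear exactly once; the helper name check_complete_grid is hypothetical:

```python
import itertools

import pandas as pd


def check_complete_grid(df: pd.DataFrame, levels: dict) -> bool:
    """Hypothetical helper: True if df has exactly one row per combination of the key levels."""
    keys = list(levels)
    # Reject duplicated keys first
    if df.duplicated(subset=keys).any():
        return False
    # Full Cartesian product of the expected key values
    expected = pd.DataFrame(list(itertools.product(*levels.values())), columns=keys)
    # An outer merge marks keys that are missing ('left_only') or unexpected ('right_only')
    merged = expected.merge(df[keys], on=keys, how='outer', indicator=True)
    return bool((merged['_merge'] == 'both').all())


# Key levels of df_ytsn_P as described above
levels_P = {'y': range(2010, 2020), 't': range(1, 48), 's': [1, 2], 'n': range(1, 9)}
print(len(list(itertools.product(*levels_P.values()))))  # 7520
# With the real data: check_complete_grid(df_ytsn_P, levels_P)
```

For df_ytsnk_X, the same call with an extra 'k': [1, 2, 3] entry would be expected to pass.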

Premium rate data

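I use the prefecture-level premium rates of the Japan Health Insurance Association (Kyokai Kenpo).
I would have liked to get these as a CSV or Excel file, as with the medical expense data, but could not find one. Instead, I took them from the PDF files for each fiscal year in "1. Back data for calculating prefecture-level premium rates" (1．都道府県単位保険料率算定用バックデータ) under Statistics > Materials on premium rates. These PDFs are materials for the meeting (the steering committee) held every year to discuss the calculation of the prefecture-level premium rates, and the resulting premium rates appear on the last page. Because I did not know an efficient way to extract data from PDF files, I entered the values into an Excel file by hand (FY2021 back to FY2012, i.e. Reiwa 3 to Heisei 24).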

$$
\def\arraystretch{1.5}
\begin{array}{c:c|c:c:c:c:c:c:c}
\textsf{y} & \textsf{t} & \textsf{r0} & \textsf{rb1} & \textsf{rb2} & \textsf{r1} & \textsf{r2} & \textsf{r3} & \textsf{r} \\ \hline
2012 & 1 & {} & {} & {} & {} & {} & {} & {} \\
2012 & 2 & {} & {} & {} & {} & {} & {} & {} \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots\\
2021 & 47 & {} & {} & {} & {} & {} & {} & {} \\
\end{array}
$$

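y: fiscal year
FY2012 to FY2021 (Heisei 24 to Reiwa 3)
t: prefecture
48: national average, 1: Hokkaido, ..., 47: Okinawa
r0: required premium rate for medical benefit costs, before adjustment (a)
rb1: adjustment (b), age adjustment
rb2: adjustment (b), income adjustment
r1: required premium rate for medical benefit costs, after adjustment (a+b)
r2: required premium rate, excluding settlement (a+b+4.71)
r3: premium rate, including settlement, before incentives (c)
r: premium rate, after settlement and incentives (d)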

Of these, I use the following two:
・r1: required premium rate for medical benefit costs, after adjustment (a+b)
・r: premium rate, after settlement and incentives (d)
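Because the rates were typed in by hand, a quick consistency check may be worthwhile. By the column labels above, r1 (a+b) should equal r0 (a) plus the age adjustment rb1 and the income adjustment rb2, both of which make up (b). A sketch, assuming that reading of the labels and that the published figures are rounded to two decimals (hence the tolerance); the helper name and the sample values are made up for illustration:

```python
import pandas as pd


def flag_inconsistent_r1(df: pd.DataFrame, tol: float = 0.02) -> pd.DataFrame:
    """Hypothetical helper: rows where r1 differs from r0 + rb1 + rb2 by more than tol."""
    # Assumed identity from the labels: r1 (a+b) = r0 (a) + rb1 + rb2 (both parts of b)
    diff = (df['r0'] + df['rb1'] + df['rb2'] - df['r1']).abs()
    return df.loc[diff > tol, ['y', 't', 'r0', 'rb1', 'rb2', 'r1']]


# Made-up example values, not actual premium rates
sample = pd.DataFrame({
    'y': [2021, 2021], 't': [1, 2],
    'r0': [5.60, 5.20], 'rb1': [-0.10, 0.05], 'rb2': [0.20, 0.10],
    'r1': [5.70, 5.53],  # the second row contains a deliberate typo
})
print(flag_inconsistent_r1(sample))  # only the t = 2 row is flagged
```

Run on the raw sheet (before selecting columns), e.g. flag_inconsistent_r1(pd.read_excel('./料率.xlsx')), it would list rows worth re-checking against the PDFs.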

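# Load the hand-entered rates, drop the national average (t == 48), and sort by (y, t)
# (the values were entered from FY2021 back to FY2012; later code relies on this order)
df_ryoritsu = pd.read_excel('./料率.xlsx') \
                .query('t != 48') \
                .sort_values(['y', 't']) \
                .reset_index(drop=True)
# Keep only fiscal year, prefecture, r1 and r
df_ryoritsu = df_ryoritsu.loc[:,['y','t','r1','r']]
print(df_ryoritsu.shape)
# (470, 4)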

Data processing

From the medical expense data, I compute age-adjusted medical expenses per person.
First, I aggregate by fiscal year (y), prefecture (t), and age group (n).

# Aggregate by fiscal year (y), prefecture (t), and age group (n)
df_ytn_P = df_ytsn_P.loc[:,['y','t','n','P']] \
                    .groupby(['y','t','n'], as_index=False) \
                    .sum()
df_ytn_X = df_ytsnk_X.loc[:,['y','t','n','Ten2']] \
                    .groupby(['y','t','n'], as_index=False) \
                    .sum()
print(df_ytn_P.shape, df_ytn_X.shape)
# (3760, 4) (3760, 4)

Next, I compute per-person medical expenses for each age group.

# Per-person medical expenses
df_ytn_XP = pd.merge(df_ytn_P, df_ytn_X,
                     on=['y','t','n'], how='left')
# Ten2 is in points (1 point = 10 yen); P/12 presumably turns a 12-month sum of enrollees into an average
df_ytn_XP = df_ytn_XP.assign(XperP = df_ytn_XP.Ten2*10 / (df_ytn_XP.P/12))
print(df_ytn_XP.shape)
# (3760, 6)
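In formula form, the code above computes annual medical expenses per person for fiscal year y, prefecture t, and age group n. The factor 10 converts points to yen (one point is 10 yen under the fee schedule); dividing P by 12 assumes that P is the sum of monthly enrollee counts over the fiscal year, so that P/12 is the average number of enrollees:

$$
\textsf{XperP}_{y,t,n} = \frac{10 \times \textsf{Ten2}_{y,t,n}}{\textsf{P}_{y,t,n} / 12}
$$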

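Next, I compute the age adjustment using the direct method.
References:
・1. About age-adjusted mortality rates (mhlw.go.jp)
・Calculating incidence and mortality rates, and methods of age adjustment (jacr.info)
・Age-adjusted mortality rate: [National Cancer Center Japan, Cancer Statistics] (ganjoho.jp)
・Age-adjusted incidence rate: [National Cancer Center Japan, Cancer Statistics] (ganjoho.jp)
・Age-adjusted mortality rate - meddic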
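For reference, the direct method weights each age group's per-person expense by that age group's share of a standard population and sums over the age groups. Writing $W_{y,n}$ for the weight column Wn of df_yn_W, which the code merges on (y, n), and $S_{y,n}$ for the (unspecified here) standard population of age group n in year y:

$$
\textsf{adjXperP}_{y,t} = \sum_{n=1}^{8} \textsf{XperP}_{y,t,n} \, W_{y,n}, \qquad W_{y,n} = \frac{S_{y,n}}{\sum_{m=1}^{8} S_{y,m}}
$$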

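# Age adjustment (direct method)
# df_yn_W (columns y, n, Wn: weight of age group n in the standard population of year y)
# is not defined in this section and must be prepared beforehand
df_ytn_XPW = pd.merge(df_ytn_XP, df_yn_W,
                      on=['y','n'], how='left')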
print(df_ytn_XPW.shape)
# (3760, 7)
df_ytn_XPW['adjXperP'] = df_ytn_XPW['XperP'] * df_ytn_XPW['Wn']
print(df_ytn_XPW.shape)
# (3760, 8)
df_yt_adjXperP = df_ytn_XPW.loc[:,['y','t','adjXperP']] \
                           .groupby(['y','t'], as_index=False) \
                           .sum()
print(df_yt_adjXperP.shape)
# (470, 3)
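df_yn_W is used above but not built in this section. One possible construction, assuming (hypothetically) that the standard population is the nationwide enrollee count of each fiscal year, so each weight is an age group's share of all enrollees that year; the article's actual standard population may differ. The toy numbers are made up:

```python
import pandas as pd


def make_weights(df_ytsn_P: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical: Wn = share of nationwide enrollees in age group n, per fiscal year y."""
    # Sum enrollees over prefectures and sexes -> national count per (y, n)
    df_yn = df_ytsn_P.groupby(['y', 'n'], as_index=False)['P'].sum()
    # Divide by the same year's national total so the weights sum to 1 within each year
    df_yn['Wn'] = df_yn['P'] / df_yn.groupby('y')['P'].transform('sum')
    return df_yn[['y', 'n', 'Wn']]


# Toy enrollment: 1 year, 2 prefectures, 1 sex, 2 age groups
toy = pd.DataFrame({'y': [2010] * 4, 't': [1, 1, 2, 2], 's': [1] * 4,
                    'n': [1, 2, 1, 2], 'P': [30, 10, 10, 50]})
W = make_weights(toy)
print(W['Wn'].tolist())  # [0.4, 0.6]

# Direct age adjustment with made-up per-person expenses, mirroring the code above
rates = pd.DataFrame({'y': [2010] * 4, 't': [1, 1, 2, 2], 'n': [1, 2, 1, 2],
                      'XperP': [100.0, 300.0, 200.0, 400.0]})
adj = (rates.merge(W, on=['y', 'n'], how='left')
            .assign(adjXperP=lambda d: d['XperP'] * d['Wn'])
            .groupby(['y', 't'], as_index=False)['adjXperP'].sum())
print(adj['adjXperP'].round(6).tolist())  # [220.0, 320.0]
```

Using one common weight set per year means differences between prefectures no longer reflect differences in their age structure.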

I plot the trend of age-adjusted per-person medical expenses, with one color per prefecture.

%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
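# Prefecture t -> color (LabelEncoder maps t = 1..47 to 0..46);
# the first 47 rows are t = 1..47 of the first year, so color_t[t-1] is prefecture t's color
df = df_yt_adjXperP  # data frame plotted below
label_t = le.fit_transform(df['t'])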
cp = sns.color_palette("hls", n_colors=47+1)
color_t = [cp[x] for x in label_t]

plt.figure(figsize=(12,12))
plt.title('adjXperP : annual medical expenses per person in each prefecture')
plt.scatter(x=df['y'], y=df['adjXperP'], c=color_t)
for t in range(1, 47+1, 1):
  df_tmp = df.query('t == {}'.format(t)).reset_index()
  plt.plot(df_tmp['y'], df_tmp['adjXperP'], c=color_t[t-1])
  plt.text(x=df_tmp['y'][1-1], y=df_tmp['adjXperP'][1-1],
           s=str(df_tmp['t'][1-1])+'  ',
           ha='right', va='center', c=color_t[t-1], fontsize=12)
  plt.text(x=df_tmp['y'][10-1], y=df_tmp['adjXperP'][10-1],
           s='  '+str(df_tmp['t'][10-1]),
           ha='left', va='center', c=color_t[t-1], fontsize=12)

I also plot the trend of the premium rates.

plt.figure(figsize=(12,12))
plt.title('r : insurance premium rate for each prefecture')
plt.scatter(x=df_ryoritsu['y'], y=df_ryoritsu['r'], c=color_t)
for t in range(1, 47+1, 1):
  df_tmp = df_ryoritsu.query('t == {}'.format(t)).reset_index()
  plt.plot(df_tmp['y'], df_tmp['r'], c=color_t[t-1])
  plt.text(x=df_tmp['y'][1-1], y=df_tmp['r'][1-1],
           s=str(df_tmp['t'][1-1])+'  ',
           ha='right', va='center', c=color_t[t-1], fontsize=12)
  plt.text(x=df_tmp['y'][10-1], y=df_tmp['r'][10-1],
           s='  '+str(df_tmp['t'][10-1]),
           ha='left', va='center', c=color_t[t-1], fontsize=12)

plt.figure(figsize=(12,12))
plt.title('r1 : adjusted required premium rate for medical benefits in each prefecture')
plt.scatter(x=df_ryoritsu['y'], y=df_ryoritsu['r1'], c=color_t)
for t in range(1, 47+1, 1):
  df_tmp = df_ryoritsu.query('t == {}'.format(t)).reset_index()
  plt.plot(df_tmp['y'], df_tmp['r1'], c=color_t[t-1])
  plt.text(x=df_tmp['y'][1-1], y=df_tmp['r1'][1-1],
           s=str(df_tmp['t'][1-1])+'  ',
           ha='right', va='center', c=color_t[t-1], fontsize=12)
  plt.text(x=df_tmp['y'][10-1], y=df_tmp['r1'][10-1],
           s='  '+str(df_tmp['t'][10-1]),
           ha='left', va='center', c=color_t[t-1], fontsize=12)
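The three plotting blocks above differ only in the data frame and the column plotted. A possible refactoring, offered as a sketch rather than the original code: pivot to a year × prefecture table with DataFrame.pivot so each prefecture is one column, then draw one line per column with the prefecture number at both ends. The function name plot_trends is hypothetical; colors come from seaborn's "hls" palette as in the original.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def plot_trends(df: pd.DataFrame, value_col: str, title: str):
    """Hypothetical helper: one colored line per prefecture t, labelled at both ends."""
    # Rows: fiscal year y, columns: prefecture t
    wide = df.pivot(index='y', columns='t', values=value_col).sort_index()
    palette = sns.color_palette('hls', n_colors=len(wide.columns) + 1)
    fig, ax = plt.subplots(figsize=(12, 12))
    ax.set_title(title)
    for color, t in zip(palette, wide.columns):
        s = wide[t]
        ax.plot(s.index, s.values, marker='o', color=color)
        # Prefecture number to the left of the first year and right of the last year
        ax.text(s.index[0], s.iloc[0], f'{t}  ', ha='right', va='center', color=color, fontsize=12)
        ax.text(s.index[-1], s.iloc[-1], f'  {t}', ha='left', va='center', color=color, fontsize=12)
    return fig, ax
```

For example, plot_trends(df_ryoritsu, 'r', 'r : insurance premium rate for each prefecture'). Because the pivot is keyed on t, each prefecture's color no longer depends on the row order of the input.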

Visualizing the relationship between premium rates and medical expenses

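For each pair of corresponding fiscal years, I draw a scatter plot of the premium rate against age-adjusted per-person medical expenses.
The fiscal years correspond with a two-year lag, for example:
・the FY2021 (Reiwa 3) premium rate corresponds to FY2019 (Reiwa 1) medical expenses
・the FY2012 (Heisei 24) premium rate corresponds to FY2010 (Heisei 22) medical expenses
(see the linked reference).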
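Since the premium rate for fiscal year y corresponds to medical expenses for fiscal year y-2, one way to pair them (a sketch, not the article's code) is to shift the expense year forward by 2 and merge on (y, t). Pairing by key rather than by row position means the scatter plots do not depend on both tables being sorted the same way. The function name align_with_lag is hypothetical and the values are made up:

```python
import pandas as pd


def align_with_lag(df_exp: pd.DataFrame, df_rate: pd.DataFrame, lag: int = 2) -> pd.DataFrame:
    """Hypothetical helper: pair the rate of fiscal year y with the expense of year y - lag."""
    # Keep the original expense year in y_exp and relabel y as the matching rate year
    shifted = df_exp.assign(y_exp=df_exp['y'], y=df_exp['y'] + lag)
    # validate= raises if any (y, t) appears more than once on either side
    return df_rate.merge(shifted, on=['y', 't'], how='inner', validate='one_to_one')


# Made-up values, not actual data
exp = pd.DataFrame({'y': [2010, 2010, 2011], 't': [1, 2, 1],
                    'adjXperP': [180000.0, 190000.0, 185000.0]})
rate = pd.DataFrame({'y': [2012, 2012, 2013], 't': [1, 2, 1],
                     'r1': [9.9, 10.1, 10.0], 'r': [10.0, 10.1, 10.1]})
pairs = align_with_lag(exp, rate)
print(pairs[['y', 'y_exp', 't', 'r', 'adjXperP']])
```

With the article's data, align_with_lag(df_yt_adjXperP, df_ryoritsu) would give 470 pairs: FY2010 to FY2019 expenses matched to FY2012 to FY2021 rates for 47 prefectures.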

for y in range(2012, 2021+1, 1):
  print('FY{} premium rate vs FY{} per-person medical expenses'.format(y, y-2))
  df_x = df_yt_adjXperP.query('y == {}'.format(y-2)).reset_index(drop=True)
  df_r = df_ryoritsu.query('y == {}'.format(y)).reset_index(drop=True)
  plt.figure(figsize=(12,5))
  plt.subplot(1, 2, 1)
  plt.title('r (y={}) and adjXperP (y={})'.format(y, y-2))
  plt.xlabel('adjXperP (y={})'.format(y-2))
  plt.ylabel('r (y={})'.format(y))
  plt.scatter(x=df_x['adjXperP'], y=df_r['r'], c=color_t[0:47])
  plt.subplot(1, 2, 2)
  plt.title('r (y={}) and adjXperP (y={})'.format(y, y-2))
  plt.xlabel('adjXperP (y={})'.format(y-2))
  plt.ylabel('r (y={})'.format(y))
  xmin, xmax = min(df_x['adjXperP']), max(df_x['adjXperP'])
  ymin, ymax = min(df_r['r']), max(df_r['r'])
  plt.xlim(xmin-(xmax-xmin)/20, xmax+(xmax-xmin)/20)
  plt.ylim(ymin-(ymax-ymin)/20, ymax+(ymax-ymin)/20)
  for t in range(1,47+1,1):
    plt.text(x=df_x['adjXperP'][t-1], y=df_r['r'][t-1], s=t,
            ha='center', va='center', c=color_t[t-1], fontsize=12)
  plt.show()

for y in range(2012, 2021+1, 1):
  print('FY{} premium rate vs FY{} per-person medical expenses'.format(y, y-2))
  df_x = df_yt_adjXperP.query('y == {}'.format(y-2)).reset_index(drop=True)
  df_r = df_ryoritsu.query('y == {}'.format(y)).reset_index(drop=True)
  plt.figure(figsize=(12,5))
  plt.subplot(1, 2, 1)
  plt.title('r1 (y={}) and adjXperP (y={})'.format(y, y-2))
  plt.xlabel('adjXperP (y={})'.format(y-2))
  plt.ylabel('r1 (y={})'.format(y))
  plt.scatter(x=df_x['adjXperP'], y=df_r['r1'], c=color_t[0:47])
  plt.subplot(1, 2, 2)
  plt.title('r1 (y={}) and adjXperP (y={})'.format(y, y-2))
  plt.xlabel('adjXperP (y={})'.format(y-2))
  plt.ylabel('r1 (y={})'.format(y))
  xmin, xmax = min(df_x['adjXperP']), max(df_x['adjXperP'])
  ymin, ymax = min(df_r['r1']), max(df_r['r1'])
  plt.xlim(xmin-(xmax-xmin)/20, xmax+(xmax-xmin)/20)
  plt.ylim(ymin-(ymax-ymin)/20, ymax+(ymax-ymin)/20)
  for t in range(1,47+1,1):
    plt.text(x=df_x['adjXperP'][t-1], y=df_r['r1'][t-1], s=t,
            ha='center', va='center', c=color_t[t-1], fontsize=12)
  plt.show()
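The two loops above are identical except for the rate column. A compact alternative, again a sketch with a hypothetical function name, draws one year's labelled panel from a table that already pairs each rate with its expense by (y, t) (an assumed layout with columns y, t, adjXperP and the rate column). Because text labels do not update the axis limits, the limits are set explicitly with the same (max - min)/20 padding as the original:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def plot_rate_vs_expense(pairs: pd.DataFrame, rate_col: str, y: int, lag: int = 2):
    """Hypothetical helper: rate of year y vs adjXperP of year y - lag, labelled by prefecture."""
    d = pairs[pairs['y'] == y]
    cp = sns.color_palette('hls', n_colors=47 + 1)  # same palette as the original; t -> cp[t-1]
    fig, ax = plt.subplots(figsize=(6, 5))
    ax.set_title(f'{rate_col} (y={y}) and adjXperP (y={y - lag})')
    ax.set_xlabel(f'adjXperP (y={y - lag})')
    ax.set_ylabel(f'{rate_col} (y={y})')
    for _, row in d.iterrows():
        t = int(row['t'])
        ax.text(row['adjXperP'], row[rate_col], str(t),
                ha='center', va='center', color=cp[t - 1], fontsize=12)
    # Text artists do not affect autoscaling, so pad the data range by 1/20 explicitly
    x, r = d['adjXperP'], d[rate_col]
    pad_x, pad_y = (x.max() - x.min()) / 20, (r.max() - r.min()) / 20
    ax.set_xlim(x.min() - pad_x, x.max() + pad_x)
    ax.set_ylim(r.min() - pad_y, r.max() + pad_y)
    return fig, ax
```

Given such a paired table, `for y in range(2012, 2022): plot_rate_vs_expense(pairs, 'r1', y)` would draw the labelled panels for r1, and the same loop with 'r' covers the other rate.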