How to deliver an AI model without disclosing its Python source code

ai_for_everyone

January 15, 2022 19:48

1. Background

As AI moves into practical use, one problem has become apparent:

to deliver an AI model, you usually have to hand over its Python source code.

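Machine learning models such as deep neural networks and gradient-boosted trees achieve high accuracy, but their code is usually written in Python, so delivering a model to a user normally means handing over the Python source as well.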

To keep the recipient from freely analyzing or reusing that source, it is common to rely on a "gentleman's agreement", but there is no guarantee that it is actually honored.

In the end, all you can do is look the other way.

In this post, I'll introduce a way to deliver an AI model without handing over its Python source code.

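■Note:
You could compile the .py files and ship only the bytecode .pyc files instead, but .pyc files are easy to turn back into readable code, so this gives little real protection: the standard dis module disassembles bytecode, and third-party decompilers can reconstruct Python source from it.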
https://docs.python.org/ja/3/library/dis.html
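To illustrate the note above: a .pyc file is essentially a small header followed by a code object serialized with the standard marshal module. The sketch below round-trips a stand-in function through marshal and disassembles it. The function `secret_score`, its 0.75 weight, and the file name `model_lib.py` are made up for illustration; the point is that local names and constants survive intact in the bytecode.

```python
import dis
import marshal
import types

# A stand-in for proprietary code; the name and the 0.75 "weight" are illustrative only
source = """
def secret_score(x):
    weight = 0.75
    return x * weight + 1
"""

code = compile(source, "model_lib.py", "exec")
blob = marshal.dumps(code)       # roughly the body of model_lib.pyc, minus its header
restored = marshal.loads(blob)   # anyone holding the .pyc can recover this object

# Pull the function's code object out of the module-level constants
func_code = next(c for c in restored.co_consts if isinstance(c, types.CodeType))
print(func_code.co_name, func_code.co_varnames)
dis.dis(func_code)  # lists every instruction, including a LOAD_CONST of 0.75
```

Variable names, constants and control flow all come back, which is why shipping .pyc files alone does not keep a model's logic private.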

2. Preparing sample source code

2-1. Overview

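I built a deep learning model with PyTorch on MNIST-style handwritten digit images. (The output in 2-5, 1,797 images with 64 features each, suggests the data is actually scikit-learn's small 8×8 digits dataset rather than the full 28×28 MNIST.)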

■What is MNIST:
　　A dataset of images of the handwritten digits 0 to 9.
　　http://yann.lecun.com/exdb/mnist/

■What is PyTorch:
　　An open-source machine learning library for Python.
　　It is widely used by researchers (Google's researchers being a notable exception).
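The run in 2-5 prints 1,797 images split into 1,078 training, 359 validation and 360 test samples, each with 64 features. Those numbers are exactly what you get from scikit-learn's bundled 8×8 digits dataset (`load_digits`) split 60/40 and then splitting the 40% in half, so the following is a hedged sketch of what load_data.py might do. The function name `load_data` and the split ratios are assumptions, since the file itself is not shown; judging by the `torch.Size` in the log, the author's version also converts the arrays to PyTorch tensors.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split


def load_data(seed=None):
    # 1,797 8x8 digit images, each flattened to 64 pixel features
    images, labels = load_digits(return_X_y=True)
    # 60% for training; the remaining 40% is held out
    x_train, x_rest, y_train, y_rest = train_test_split(
        images, labels, test_size=0.4, random_state=seed)
    # Split the held-out 40% evenly into validation and test sets
    x_valid, x_test, y_valid, y_test = train_test_split(
        x_rest, y_rest, test_size=0.5, random_state=seed)
    return x_train, y_train, x_valid, y_valid, x_test, y_test


x_train, y_train, x_valid, y_valid, x_test, y_test = load_data(seed=0)
print(len(x_train) + len(x_valid) + len(x_test))   # 1797
print(x_train.shape, x_valid.shape, x_test.shape)  # (1078, 64) (359, 64) (360, 64)
```

Because `train_test_split` rounds the test portion up, 40% of 1,797 becomes 719 held-out samples, and half of those becomes 360 test and 359 validation samples, matching the shapes in the log.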

２－２、環境構築

I created a virtual environment named env_test and ran all tests inside it.

Creating and activating the virtual environment (Windows, using Anaconda)

conda create -n env_test python=3.6
conda activate env_test

Creating and activating the virtual environment (Linux, using venv)

python3 -m venv /home/dev/venv/env_test
source /home/dev/venv/env_test/bin/activate

2-3. Source code

Folder structure

└─src
    │  main.py
    │  main_lib.py
    │  requirements.txt
    │
    ├─data
    │      load_data.py
    │
    └─model
            MyModel.py

Contents of requirements.txt (the required libraries)

torch
torchvision
scikit-learn
numpy

2-4. Installing the required libraries

Install the libraries with the following command:
　pip install -r requirements.txt

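Example run:
　Note: Building the pillow wheel failed partway through with "invalid command 'bdist_wheel'", which means the wheel package is not installed in the fresh virtual environment. pip then fell back to "setup.py install" and finished with a success message, so the error can be ignored. Running "pip install wheel" beforehand avoids it.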

(env_test) dev@MSI:~/src$ pip install -r requirements.txt
Collecting torch (from -r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/9a/f5/b76d021f06e50f770d3f6c1a1b50b62a69e587b1f0db7248269c4be21206/torch-1.10.1-cp36-cp36m-manylinux1_x86_64.whl
Collecting torchvision (from -r requirements.txt (line 2))
  Using cached https://files.pythonhosted.org/packages/38/03/c963ecdf98fae15286437ae533750e2c39e988b7d8c86fad4dbc73a3a146/torchvision-0.11.2-cp36-cp36m-manylinux1_x86_64.whl
Collecting scikit-learn (from -r requirements.txt (line 3))
  Using cached https://files.pythonhosted.org/packages/f5/ef/bcd79e8d59250d6e8478eb1290dc6e05be42b3be8a86e3954146adbc171a/scikit_learn-0.24.2-cp36-cp36m-manylinux1_x86_64.whl
Collecting numpy (from -r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/45/b2/6c7545bb7a38754d63048c7696804a0d947328125d81bf12beaa692c3ae3/numpy-1.19.5-cp36-cp36m-manylinux1_x86_64.whl
Collecting dataclasses; python_version < "3.7" (from torch->-r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/fe/ca/75fac5856ab5cfa51bbbcefa250182e50441074fdc3f803f6e76451fab43/dataclasses-0.8-py3-none-any.whl
Collecting typing-extensions (from torch->-r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/05/e4/baf0031e39cf545f0c9edd5b1a2ea12609b7fcba2d58e118b11753d68cf0/typing_extensions-4.0.1-py3-none-any.whl
Collecting pillow!=8.3.0,>=5.3.0 (from torchvision->-r requirements.txt (line 2))
  Using cached https://files.pythonhosted.org/packages/7d/2a/2fc11b54e2742db06297f7fa7f420a0e3069fdcf0e4b57dfec33f0b08622/Pillow-8.4.0.tar.gz
Collecting joblib>=0.11 (from scikit-learn->-r requirements.txt (line 3))
  Using cached https://files.pythonhosted.org/packages/3e/d5/0163eb0cfa0b673aa4fe1cd3ea9d8a81ea0f32e50807b0c295871e4aab2e/joblib-1.1.0-py2.py3-none-any.whl
Collecting scipy>=0.19.1 (from scikit-learn->-r requirements.txt (line 3))
  Using cached https://files.pythonhosted.org/packages/c8/89/63171228d5ced148f5ced50305c89e8576ffc695a90b58fe5bb602b910c2/scipy-1.5.4-cp36-cp36m-manylinux1_x86_64.whl
Collecting threadpoolctl>=2.0.0 (from scikit-learn->-r requirements.txt (line 3))
  Using cached https://files.pythonhosted.org/packages/ff/fe/8aaca2a0db7fd80f0b2cf8a16a034d3eea8102d58ff9331d2aaf1f06766a/threadpoolctl-3.0.0-py3-none-any.whl
Building wheels for collected packages: pillow
  Running setup.py bdist_wheel for pillow ... error
  Complete output from command /home/dev/venv/env_test/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-94bsihh4/pillow/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/tmpcfvl8hsqpip-wheel- --python-tag cp36:
  /usr/lib/python3.6/distutils/dist.py:261: UserWarning: Unknown distribution option: 'long_description_content_type'
    warnings.warn(msg)
  usage: -c [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
     or: -c --help [cmd1 cmd2 ...]
     or: -c --help-commands
     or: -c cmd --help

  error: invalid command 'bdist_wheel'

  ----------------------------------------
  Failed building wheel for pillow
  Running setup.py clean for pillow
Failed to build pillow
Installing collected packages: dataclasses, typing-extensions, torch, numpy, pillow, torchvision, joblib, scipy, threadpoolctl, scikit-learn
  Running setup.py install for pillow ... done
Successfully installed dataclasses-0.8 joblib-1.1.0 numpy-1.19.5 pillow-8.4.0 scikit-learn-0.24.2 scipy-1.5.4 threadpoolctl-3.0.0 torch-1.10.1 torchvision-0.11.2 typing-extensions-4.0.1
(env_test) dev@MSI:~/src$
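After the install finishes, a quick way to confirm that all four libraries actually import is a small script like the hedged sketch below. `check_installed` is a hypothetical helper, not part of the article's code; note that scikit-learn is imported under the name `sklearn`.

```python
import importlib

# Import names for the packages in requirements.txt (scikit-learn imports as "sklearn")
REQUIRED = ["torch", "torchvision", "sklearn", "numpy"]


def check_installed(names):
    # Map each import name to its __version__ string, or None if the import fails
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            found[name] = None
        else:
            found[name] = getattr(module, "__version__", "unknown")
    return found


if __name__ == "__main__":
    for name, version in check_installed(REQUIRED).items():
        print(f"{name:12s} {version or 'NOT INSTALLED'}")
```

For the run logged above, the printed versions should correspond to the "Successfully installed" line (torch 1.10.1, torchvision 0.11.2, scikit-learn 0.24.2, numpy 1.19.5).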

2-5. Results

Output of running the source code (Linux)
　Note: Windows produces similar results.

(env_test) dev@MSI:~/src$ python main.py
len(images)=1797
x_train.shape=torch.Size([1078, 64]) y_train.shape=1078
x_valid.shape=torch.Size([359, 64]) y_valid.shape=359
x_test.shape=torch.Size([360, 64]) y_test.shape=360
Using cpu device
/home/dev/src/model/MyModel.py:23: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  x = f.softmax(x)
Iteration 00000 training-data Loss=2.308062 validation-data Loss=2.296052 early_stop_countdown=10
Iteration 00100 training-data Loss=1.800464 validation-data Loss=1.641524 early_stop_countdown=10
Iteration 00200 training-data Loss=1.684834 validation-data Loss=1.522615 early_stop_countdown=10
Iteration 00300 training-data Loss=1.648941 validation-data Loss=1.511034 early_stop_countdown=10
Iteration 00400 training-data Loss=1.639657 validation-data Loss=1.505251 early_stop_countdown=10
Iteration 00500 training-data Loss=1.648507 validation-data Loss=1.527124 early_stop_countdown=10
Iteration 00600 training-data Loss=1.637450 validation-data Loss=1.491798 early_stop_countdown=9
Iteration 00700 training-data Loss=1.636279 validation-data Loss=1.496764 early_stop_countdown=10
Iteration 00800 training-data Loss=1.650890 validation-data Loss=1.497742 early_stop_countdown=9
Iteration 00900 training-data Loss=1.695880 validation-data Loss=1.551189 early_stop_countdown=8
Iteration 01000 training-data Loss=1.667899 validation-data Loss=1.524146 early_stop_countdown=7
Iteration 01100 training-data Loss=1.664420 validation-data Loss=1.505921 early_stop_countdown=6
Iteration 01200 training-data Loss=1.686359 validation-data Loss=1.509611 early_stop_countdown=5
Iteration 01300 training-data Loss=1.692531 validation-data Loss=1.565203 early_stop_countdown=4
Iteration 01400 training-data Loss=1.692968 validation-data Loss=1.535201 early_stop_countdown=3
Iteration 01500 training-data Loss=1.683306 validation-data Loss=1.519024 early_stop_countdown=2
Iteration 01600 training-data Loss=1.701402 validation-data Loss=1.553685 early_stop_countdown=1
Iteration 01700 training-data Loss=1.698684 validation-data Loss=1.561301 early_stop_countdown=0
Early stopping
Test-data Loss=1.575245
f1=0.8902502587989503
(env_test) dev@MSI:~/src$
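The log above contains a UserWarning because `f.softmax(x)` is called without `dim`; for a (batch, 10) output, passing `dim=1` normalizes over the classes and silences it. The losses also level off around 1.5 instead of approaching 0. MyModel.py is not shown, so this is only a hypothesis: if the softmax output is fed to `torch.nn.CrossEntropyLoss`, which applies its own log-softmax, then for 10 classes the loss can never drop below log(1 + 9/e) ≈ 1.461, even for a perfect prediction. The stdlib-only sketch below computes that floor; the function names are illustrative.

```python
import math


def softmax_cross_entropy(logits, target):
    # Cross-entropy as CrossEntropyLoss defines it: -log(softmax(logits)[target])
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def double_softmax_floor(num_classes):
    # Best case when the "logits" are already probabilities:
    # a perfect one-hot prediction [1, 0, ..., 0]
    probs = [1.0] + [0.0] * (num_classes - 1)
    return softmax_cross_entropy(probs, target=0)


print(round(double_softmax_floor(10), 3))        # 1.461
print(round(math.log(1 + 9 / math.e), 3))        # 1.461, the same value in closed form
print(softmax_cross_entropy([10.0] + [0.0] * 9, 0))  # raw logits can go far lower
```

If the hypothesis holds, the usual remedy is to have the model return raw logits during training and apply softmax only when probabilities are needed at inference time.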

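To mitigate overfitting, each training step uses data drawn at random from the full training set, so the final score (the f1 value above) varies from run to run.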
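The random batch selection described above, together with the early stopping visible in the log, can be sketched in plain Python. This is not the author's main_lib.py (it is not shown): the batch size of 64, the rule "reset the countdown when validation loss improves, decrement it otherwise", and the names `sample_batch_indices`, `EarlyStopper` and `run` are assumptions for illustration, and the sketch will not reproduce the exact countdown values in the log.

```python
import random


def sample_batch_indices(n_train, batch_size, rng):
    # Draw a fresh random subset of training-row indices (without replacement) for one step
    return rng.sample(range(n_train), batch_size)


class EarlyStopper:
    """Stops after `patience` consecutive checks without a new best validation loss."""

    def __init__(self, patience=10):
        self.patience = patience
        self.countdown = patience
        self.best = float("inf")

    def step(self, valid_loss):
        # Reset the countdown on improvement, otherwise count down; True means "stop now"
        if valid_loss < self.best:
            self.best = valid_loss
            self.countdown = self.patience
        else:
            self.countdown -= 1
        return self.countdown <= 0


def run(valid_losses, patience):
    # Feed a sequence of validation losses; return the index where training stops, or None
    stopper = EarlyStopper(patience)
    for i, loss in enumerate(valid_losses):
        if stopper.step(loss):
            return i
    return None


rng = random.Random(0)
batch = sample_batch_indices(1078, 64, rng)
print(len(batch))                                             # 64
print(run([1.0, 0.9, 0.95, 0.96, 0.97, 0.98], patience=3))    # 4
```

In a real PyTorch loop, `batch` would index rows of the training tensors before each forward pass, and `EarlyStopper.step` would be called whenever the validation loss is evaluated.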

main.py is the main file.
All it does is call the work() function in main_lib.py.
The reason for this structure is explained in detail in the next chapter.

# main.py: the entry point; all of the actual processing happens in main_lib.work()
import main_lib

if __name__ == '__main__':
    main_lib.work()
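main.py only needs `import main_lib` to succeed, and Python's import system resolves that name by searching for several kinds of files, not just .py source. As a hedged illustration (not the author's code), the standard importlib module can report which file an import would load and which suffixes it looks for; `where_is` is a hypothetical helper name.

```python
import importlib.machinery
import importlib.util


def where_is(module_name):
    # Return the file an `import module_name` would load, or None if it cannot be found
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin


# File suffixes the path-based importer recognizes, grouped by kind of module
print("source:   ", importlib.machinery.SOURCE_SUFFIXES)
print("bytecode: ", importlib.machinery.BYTECODE_SUFFIXES)
print("extension:", importlib.machinery.EXTENSION_SUFFIXES)

# Run next to main.py to see which file `import main_lib` would actually pick up
print("main_lib ->", where_is("main_lib"))
```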
