Running DeepLab on a free GPU with Google Colaboratory

The GPU in my Windows machine is underpowered and kept running out of memory, so I decided to borrow Google's resources instead.

Notebooks and results

This post got long, so here are the links to the notebooks I used up front.
The last two also include the execution results.

Installing DeepLab

Downloading and converting the PASCAL dataset

Training through model export

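Preparing Google Drive

Add Google Colaboratory from “New > More > Connect more apps”. (It may already be added.)
Under “Settings > Manage apps”, check “Use by default” for Google Colaboratory.
Create a new folder named "ML". (Any folder name will do.)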

Installing DeepLab

Move into the ML folder.
Select “New > More > Google Colaboratory” to create a notebook.

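If Google Colaboratory does not appear under New even though the app was added

Open the following site.

Select the Files tab in the left pane (Table of contents / Code snippets / Files).
Select “Mount Drive”.
The message "Run this cell to mount Google Drive." appears, so click ▶︎ or press Shift+Enter.
Open the displayed URL, grant access to Google Drive File Stream, and enter the code.
Create a new notebook with “File > New Python 3 notebook”.
After these steps, Google Colaboratory should have been added under New in My Drive.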

Run the following code cell.

from google.colab import drive
drive.mount('/content/drive')

The message "Run this cell to mount Google Drive." appears, so click ▶︎ or press Shift+Enter.
Open the displayed URL, grant access to Google Drive File Stream, and enter the code.
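Before cloning anything, it can help to confirm that the mount actually exposes the ML folder. The following is a minimal sketch, assuming the mount point and the "My Drive/ML" layout used in this post; the helper name `drive_folder_ready` is my own and not part of Colab.

```python
import os


def drive_folder_ready(mount_point="/content/drive", folder="ML"):
    """Return True if <mount_point>/My Drive/<folder> exists as a directory.

    The "My Drive" segment and the "ML" folder name follow this post;
    adjust them if your Drive layout differs.
    """
    target = os.path.join(mount_point, "My Drive", folder)
    return os.path.isdir(target)


# Prints True once the mount has succeeded and the folder exists.
print(drive_folder_ready())
```

If this prints False right after mounting, re-run the mount cell and check the folder name on the Drive side.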

Run the following code cell.

%%bash
cd /content/drive/'My Drive'/ML
mkdir tensorflow
cd tensorflow
git clone https://github.com/tensorflow/models.git

Run the following code cell.

%%bash
cd /content/drive/'My Drive'/ML/tensorflow/models/research/
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
python deeplab/model_test.py

OK is displayed.

WARNING: Logging before flag parsing goes to stderr.
W0903 08:49:30.997761 140559798843264 lazy_loader.py:50] 
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
 * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
 * https://github.com/tensorflow/addons
 * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.
W0903 08:49:31.288230 140559798843264 deprecation_wrapper.py:119] From /content/drive/My Drive/ML/tensorflow/models/research/slim/nets/mobilenet/mobilenet.py:397: The name tf.nn.avg_pool is deprecated. Please use tf.nn.avg_pool2d instead.
Running tests under Python 3.6.8: /usr/bin/python3
[ RUN      ] DeeplabModelTest.testBuildDeepLabWithDensePredictionCell
W0903 08:49:31.299340 140559798843264 deprecation.py:323] From /usr/lib/python3.6/contextlib.py:60: TensorFlowTestCase.test_session (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `self.session()` or `self.cached_session()` instead.
2019-09-03 08:49:31.317845: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2200000000 Hz
2019-09-03 08:49:31.318134: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x28d8680 executing computations on platform Host. Devices:
2019-09-03 08:49:31.318162: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
W0903 08:49:31.321006 140559798843264 deprecation_wrapper.py:119] From deeplab/model_test.py:133: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.
W0903 08:49:31.332588 140559798843264 deprecation_wrapper.py:119] From /content/drive/My Drive/ML/tensorflow/models/research/deeplab/model.py:310: The name tf.AUTO_REUSE is deprecated. Please use tf.compat.v1.AUTO_REUSE instead.
W0903 08:49:31.333198 140559798843264 deprecation.py:323] From /content/drive/My Drive/ML/tensorflow/models/research/deeplab/core/feature_extractor.py:196: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0903 08:49:31.335602 140559798843264 deprecation_wrapper.py:119] From /content/drive/My Drive/ML/tensorflow/models/research/deeplab/core/feature_extractor.py:64: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.
W0903 08:49:33.469389 140559798843264 deprecation_wrapper.py:119] From /content/drive/My Drive/ML/tensorflow/models/research/deeplab/model.py:401: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.
I0903 08:49:33.469685 140559798843264 model.py:401] Using dense prediction cell config.
I0903 08:49:33.470008 140559798843264 dense_prediction_cell.py:222] {'kernel': 3, 'rate': [1, 6], 'op': 'conv', 'input': -1}
I0903 08:49:33.665041 140559798843264 dense_prediction_cell.py:222] {'kernel': 3, 'rate': [18, 15], 'op': 'conv', 'input': 0}
W0903 08:49:33.825960 140559798843264 deprecation_wrapper.py:119] From /content/drive/My Drive/ML/tensorflow/models/research/deeplab/core/utils.py:34: The name tf.image.resize_bilinear is deprecated. Please use tf.compat.v1.image.resize_bilinear instead.
[       OK ] DeeplabModelTest.testBuildDeepLabWithDensePredictionCell
[ RUN      ] DeeplabModelTest.testBuildDeepLabv2
[       OK ] DeeplabModelTest.testBuildDeepLabv2
[ RUN      ] DeeplabModelTest.testForwardpassDeepLabv3plus
W0903 08:49:53.043617 140559798843264 deprecation_wrapper.py:119] From deeplab/model_test.py:103: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead.
2019-09-03 08:49:53.458546: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set.  If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU.  To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
[       OK ] DeeplabModelTest.testForwardpassDeepLabv3plus
[ RUN      ] DeeplabModelTest.testWrongDeepLabVariant
[       OK ] DeeplabModelTest.testWrongDeepLabVariant
[ RUN      ] DeeplabModelTest.test_session
[  SKIPPED ] DeeplabModelTest.test_session
----------------------------------------------------------------------
Ran 5 tests in 22.690s
OK (skipped=1)

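Confirming GPU use

In the notebook used above, select “Runtime > Change runtime type”.
Select GPU as the hardware accelerator.
Reload the page.

Run the mount code cell.
Run the test code cell.
Confirm that the output now shows the GPU being used.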

WARNING: Logging before flag parsing goes to stderr.
W0903 08:57:08.706118 139666998724480 lazy_loader.py:50] 
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
 * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
 * https://github.com/tensorflow/addons
 * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.
W0903 08:57:08.729643 139666998724480 deprecation_wrapper.py:119] From /content/drive/My Drive/ML/tensorflow/models/research/slim/nets/mobilenet/mobilenet.py:397: The name tf.nn.avg_pool is deprecated. Please use tf.nn.avg_pool2d instead.
Running tests under Python 3.6.8: /usr/bin/python3
[ RUN      ] DeeplabModelTest.testBuildDeepLabWithDensePredictionCell
W0903 08:57:08.733702 139666998724480 deprecation.py:323] From /usr/lib/python3.6/contextlib.py:60: TensorFlowTestCase.test_session (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `self.session()` or `self.cached_session()` instead.
2019-09-03 08:57:08.739735: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2200000000 Hz
2019-09-03 08:57:08.739957: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2b0fc00 executing computations on platform Host. Devices:
2019-09-03 08:57:08.739984: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2019-09-03 08:57:08.742331: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-09-03 08:57:08.905765: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-03 08:57:08.906538: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2b10680 executing computations on platform CUDA. Devices:
2019-09-03 08:57:08.906571: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Tesla T4, Compute Capability 7.5
2019-09-03 08:57:08.906772: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-03 08:57:08.907321: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:04.0
2019-09-03 08:57:08.907722: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-09-03 08:57:08.908985: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-09-03 08:57:08.910295: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-09-03 08:57:08.910731: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-09-03 08:57:08.912300: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-09-03 08:57:08.913378: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-09-03 08:57:08.916679: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-09-03 08:57:08.916790: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-03 08:57:08.917343: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-03 08:57:08.917866: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-09-03 08:57:08.917937: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-09-03 08:57:08.919140: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-09-03 08:57:08.919166: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
2019-09-03 08:57:08.919176: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
2019-09-03 08:57:08.919295: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-03 08:57:08.919852: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-03 08:57:08.920365: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:40] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2019-09-03 08:57:08.920419: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4523 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5)
W0903 08:57:08.921473 139666998724480 deprecation_wrapper.py:119] From deeplab/model_test.py:133: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.
W0903 08:57:08.927320 139666998724480 deprecation_wrapper.py:119] From /content/drive/My Drive/ML/tensorflow/models/research/deeplab/model.py:310: The name tf.AUTO_REUSE is deprecated. Please use tf.compat.v1.AUTO_REUSE instead.
W0903 08:57:08.927835 139666998724480 deprecation.py:323] From /content/drive/My Drive/ML/tensorflow/models/research/deeplab/core/feature_extractor.py:196: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0903 08:57:08.930071 139666998724480 deprecation_wrapper.py:119] From /content/drive/My Drive/ML/tensorflow/models/research/deeplab/core/feature_extractor.py:64: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.
W0903 08:57:10.773625 139666998724480 deprecation_wrapper.py:119] From /content/drive/My Drive/ML/tensorflow/models/research/deeplab/model.py:401: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.
I0903 08:57:10.773921 139666998724480 model.py:401] Using dense prediction cell config.
I0903 08:57:10.774191 139666998724480 dense_prediction_cell.py:222] {'kernel': 3, 'rate': [1, 6], 'op': 'conv', 'input': -1}
I0903 08:57:10.979656 139666998724480 dense_prediction_cell.py:222] {'kernel': 3, 'rate': [18, 15], 'op': 'conv', 'input': 0}
W0903 08:57:11.117571 139666998724480 deprecation_wrapper.py:119] From /content/drive/My Drive/ML/tensorflow/models/research/deeplab/core/utils.py:34: The name tf.image.resize_bilinear is deprecated. Please use tf.compat.v1.image.resize_bilinear instead.
2019-09-03 08:57:11.120156: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-03 08:57:11.120784: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:04.0
2019-09-03 08:57:11.120893: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-09-03 08:57:11.120928: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-09-03 08:57:11.120948: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-09-03 08:57:11.120968: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-09-03 08:57:11.120986: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-09-03 08:57:11.121005: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-09-03 08:57:11.121051: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-09-03 08:57:11.121147: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-03 08:57:11.121659: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-03 08:57:11.122152: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-09-03 08:57:11.122217: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-09-03 08:57:11.122241: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
2019-09-03 08:57:11.122255: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
2019-09-03 08:57:11.122346: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-03 08:57:11.122862: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-03 08:57:11.123375: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4523 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5)
2019-09-03 08:57:15.714596: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-03 08:57:15.715254: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:04.0
2019-09-03 08:57:15.715377: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-09-03 08:57:15.715404: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-09-03 08:57:15.715423: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-09-03 08:57:15.715441: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-09-03 08:57:15.715460: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-09-03 08:57:15.715482: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-09-03 08:57:15.715502: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-09-03 08:57:15.715587: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-03 08:57:15.716156: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-03 08:57:15.716641: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-09-03 08:57:15.716691: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-09-03 08:57:15.716706: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
2019-09-03 08:57:15.716720: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
2019-09-03 08:57:15.716804: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-03 08:57:15.717369: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-03 08:57:15.717877: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4523 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5)
2019-09-03 08:57:21.427888: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-03 08:57:21.428564: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:04.0
2019-09-03 08:57:21.428699: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-09-03 08:57:21.428733: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-09-03 08:57:21.428760: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-09-03 08:57:21.428781: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-09-03 08:57:21.428801: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-09-03 08:57:21.428821: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-09-03 08:57:21.428842: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-09-03 08:57:21.428930: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-03 08:57:21.429489: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-03 08:57:21.429965: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-09-03 08:57:21.430012: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-09-03 08:57:21.430049: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
2019-09-03 08:57:21.430076: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
2019-09-03 08:57:21.430182: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-03 08:57:21.430680: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-03 08:57:21.431204: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4523 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5)
2019-09-03 08:57:23.376018: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-03 08:57:23.376790: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:04.0
2019-09-03 08:57:23.376956: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-09-03 08:57:23.376983: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-09-03 08:57:23.377002: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-09-03 08:57:23.377021: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-09-03 08:57:23.377072: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-09-03 08:57:23.377093: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-09-03 08:57:23.377113: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-09-03 08:57:23.377213: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-03 08:57:23.377727: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-03 08:57:23.378225: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-09-03 08:57:23.378293: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-09-03 08:57:23.378310: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
2019-09-03 08:57:23.378324: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
2019-09-03 08:57:23.378443: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-03 08:57:23.378969: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-03 08:57:23.379474: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4523 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5)
2019-09-03 08:57:26.025887: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-03 08:57:26.026559: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:04.0
2019-09-03 08:57:26.026697: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-09-03 08:57:26.026723: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-09-03 08:57:26.026743: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-09-03 08:57:26.026761: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-09-03 08:57:26.026785: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-09-03 08:57:26.026803: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-09-03 08:57:26.026823: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-09-03 08:57:26.026917: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-03 08:57:26.027493: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-03 08:57:26.027995: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-09-03 08:57:26.028060: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-09-03 08:57:26.028086: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
2019-09-03 08:57:26.028101: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
2019-09-03 08:57:26.028248: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-03 08:57:26.028767: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-03 08:57:26.029286: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4523 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5)
[       OK ] DeeplabModelTest.testBuildDeepLabWithDensePredictionCell
[ RUN      ] DeeplabModelTest.testBuildDeepLabv2
[       OK ] DeeplabModelTest.testBuildDeepLabv2
[ RUN      ] DeeplabModelTest.testForwardpassDeepLabv3plus
W0903 08:57:28.230751 139666998724480 deprecation_wrapper.py:119] From deeplab/model_test.py:103: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead.
2019-09-03 08:57:28.556293: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set.  If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU.  To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
[       OK ] DeeplabModelTest.testForwardpassDeepLabv3plus
[ RUN      ] DeeplabModelTest.testWrongDeepLabVariant
[       OK ] DeeplabModelTest.testWrongDeepLabVariant
[ RUN      ] DeeplabModelTest.test_session
[  SKIPPED ] DeeplabModelTest.test_session
----------------------------------------------------------------------
Ran 5 tests in 20.135s
OK (skipped=1)
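The GPU log above is long, so scanning it by eye is tedious. As a rough sketch, the "Created TensorFlow device" lines can be pulled out with a regular expression. The function name `gpus_in_log` is hypothetical, and the pattern is written only against the log format shown in this post; other TensorFlow versions may word the line differently.

```python
import re

# Matches lines such as:
# "... Created TensorFlow device (<device> with <N> MB memory)
#  -> physical GPU (device: 0, name: Tesla T4, ..."
GPU_LINE = re.compile(
    r"Created TensorFlow device \((?P<device>\S+) with (?P<mem>\d+) MB memory\)"
    r" -> physical GPU \(device: \d+, name: (?P<name>[^,]+),"
)


def gpus_in_log(log_text):
    """Return sorted unique (device, gpu_name, memory_mb) tuples found in the log."""
    found = {
        (m.group("device"), m.group("name"), int(m.group("mem")))
        for m in GPU_LINE.finditer(log_text)
    }
    return sorted(found)


sample = (
    "2019-09-03 08:57:08.920419: I tensorflow/core/common_runtime/gpu/"
    "gpu_device.cc:1326] Created TensorFlow device "
    "(/job:localhost/replica:0/task:0/device:GPU:0 with 4523 MB memory) "
    "-> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, "
    "compute capability: 7.5)"
)
print(gpus_in_log(sample))
```

An empty list for the whole captured output would suggest the run fell back to the CPU, as in the first log.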

Local test

Next, run the following code cell.

%%bash
cd /content/drive/'My Drive'/ML/tensorflow/models/research/deeplab
sh local_test.sh

The export inside local_test.sh fails. It looks like the space in "My Drive" is not being handled cleanly.

local_test.sh: 33: export: Drive/ML/tensorflow/models/research:/content/drive/My: bad variable name

So I gave in and ported the contents of local_test.sh into notebooks step by step.
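My reading of the error, which I have not verified against the shell's source, is that `sh` splits the unquoted path on the space before `export` sees it. The sketch below models that splitting in a simplified way (plain whitespace splitting, not real shell parsing) and shows that the second word matches the "bad variable name" in the message. `split_like_unquoted_shell` and `pythonpath_for` are names I made up for illustration.

```python
def split_like_unquoted_shell(expanded):
    """Simplified model of word splitting on an unquoted expansion."""
    return expanded.split()


def pythonpath_for(research_dir, existing=""):
    """Join the existing value, research_dir, and research_dir/slim with ':'."""
    parts = [p for p in (existing, research_dir, research_dir + "/slim") if p]
    return ":".join(parts)


research = "/content/drive/My Drive/ML/tensorflow/models/research"

# What export receives once the path containing a space is split apart.
words = split_like_unquoted_shell("PYTHONPATH=" + pythonpath_for(research))
print(words[1])
```

Building the value as a single Python string sidesteps the splitting, which is in effect what the `%env PYTHONPATH` cells in the following sections do.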

Downloading and converting the dataset

Create a new notebook.
Mount the drive with the following code cell.

from google.colab import drive
drive.mount('/content/drive')

Change directory and set PYTHONPATH with the following code cell.

%cd "/content/drive/My Drive/ML/tensorflow/models/research"
researchpath = %pwd
%env PYTHONPATH = $researchpath:$researchpath/slim
%cd deeplab/datasets

Create the working directory and move into it with the following code cell.

CURRENT_DIR = %pwd
WORK_DIR = "./pascal_voc_seg"
!mkdir -p $WORK_DIR
%cd $WORK_DIR

Download and extract the dataset with the following code cell.

import os

BASE_URL = "http://host.robots.ox.ac.uk/pascal/VOC/voc2012/"
FILENAME = "VOCtrainval_11-May-2012.tar"

# Download the archive only if it is not already in the working directory.
if not os.path.exists(FILENAME):
    print("Downloading " + FILENAME + " to " + WORK_DIR)
    !wget -nd -c $BASE_URL/$FILENAME

print("Uncompressing " + FILENAME)
!tar -xf $FILENAME
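The cell above skips the download when the archive already exists, but it extracts the archive again on every run. A possible variation, sketched here with the standard `tarfile` module, skips extraction when the top-level folder is already present. The helper `extract_if_needed` is hypothetical, and the "VOCdevkit" marker is taken from the paths used later in this post.

```python
import os
import tarfile


def extract_if_needed(archive_path, dest_dir, marker="VOCdevkit"):
    """Extract archive_path into dest_dir unless dest_dir/marker already exists.

    Returns True if extraction ran, False if it was skipped.
    Only use this with archives you trust, since extractall writes
    whatever paths the archive contains.
    """
    if os.path.isdir(os.path.join(dest_dir, marker)):
        return False
    with tarfile.open(archive_path) as tar:
        tar.extractall(dest_dir)
    return True


# Example call from inside the working directory:
# extract_if_needed("VOCtrainval_11-May-2012.tar", ".")
```

Note that the check only looks for the folder, so a partially extracted archive would also be skipped; delete the folder to force a fresh extraction.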

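Remove the color map with the following code cell.
If you do not need to measure execution time, the %time magic command can be omitted. (The same applies below.)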

%cd $CURRENT_DIR

PASCAL_ROOT = WORK_DIR + "/VOCdevkit/VOC2012"

SEG_FOLDER = PASCAL_ROOT + "/SegmentationClass"
SEMANTIC_SEG_FOLDER = PASCAL_ROOT + "/SegmentationClassRaw"

print("Removing the color map in ground truth annotations...")

%time !python remove_gt_colormap.py \
 --original_gt_folder=$SEG_FOLDER \
 --output_dir=$SEMANTIC_SEG_FOLDER

Convert the dataset to tfrecords format with the following code cell.

OUTPUT_DIR = WORK_DIR + "/tfrecord"
!mkdir -p $OUTPUT_DIR

IMAGE_FOLDER = PASCAL_ROOT + "/JPEGImages"
LIST_FOLDER = PASCAL_ROOT + "/ImageSets/Segmentation"

print("Converting PASCAL VOC 2012 dataset...")

%time !python build_voc2012_data.py \
 --image_folder=$IMAGE_FOLDER \
 --semantic_segmentation_folder=$SEMANTIC_SEG_FOLDER \
 --list_folder=$LIST_FOLDER \
 --image_format="jpg" \
 --output_dir=$OUTPUT_DIR

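Note: Google Colaboratory can take a while to sync the runtime's disk with Google Drive. In my runs, syncing the extracted files was especially slow.
A folder that looks empty from the Google Drive side does not immediately mean the file output failed. Use the %ls command to check whether the files were written to the runtime's disk.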
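As an alternative to eyeballing the %ls output, a small helper can count the files and total bytes that exist on the runtime side. This is only a sketch: `summarize_outputs` is a name I made up, and the "*.tfrecord" pattern is an assumption about how the converter names its output files.

```python
import glob
import os


def summarize_outputs(directory, pattern="*"):
    """Return (file_count, total_bytes) for regular files matching pattern."""
    paths = [
        p for p in glob.glob(os.path.join(directory, pattern))
        if os.path.isfile(p)
    ]
    return len(paths), sum(os.path.getsize(p) for p in paths)


# Run from deeplab/datasets; (0, 0) means nothing has been written yet.
print(summarize_outputs("./pascal_voc_seg/tfrecord", "*.tfrecord"))
```

A non-zero count here while Drive still shows an empty folder points to sync lag rather than a failed conversion.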

Training, evaluation, visualization, and model export

Create a new notebook. Set the runtime to GPU.
Mount the drive with the following code cell.

from google.colab import drive
drive.mount('/content/drive')

Change directory and set PYTHONPATH with the following code cell.

%cd "/content/drive/My Drive/ML/tensorflow/models/research"
researchpath = %pwd
%env PYTHONPATH = $researchpath:$researchpath/slim
%cd deeplab

Set the various directory names with the following code cell.

CURRENT_DIR = %pwd
DATASET_DIR = "/datasets"

PASCAL_FOLDER = "/pascal_voc_seg"
EXP_FOLDER = "/exp/train_on_trainval_set"
INIT_FOLDER = CURRENT_DIR + DATASET_DIR + PASCAL_FOLDER + "/init_models"
TRAIN_LOGDIR = CURRENT_DIR + DATASET_DIR + PASCAL_FOLDER + EXP_FOLDER + "/train"
EVAL_LOGDIR = CURRENT_DIR + DATASET_DIR + PASCAL_FOLDER + EXP_FOLDER + "/eval"
VIS_LOGDIR = CURRENT_DIR + DATASET_DIR + PASCAL_FOLDER + EXP_FOLDER + "/vis"
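EXPORT_DIR = CURRENT_DIR + DATASET_DIR + PASCAL_FOLDER + EXP_FOLDER + "/export"
# Passed to --dataset_dir later; this is the tfrecord directory written by the conversion notebook.
PASCAL_DATASET = CURRENT_DIR + DATASET_DIR + PASCAL_FOLDER + "/tfrecord"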
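
String concatenation works, but it is easy to drop a slash. As an alternative sketch, the same layout can be built with `os.path.join`, including the tfrecord directory that `--dataset_dir` needs later. `build_dirs` and `make_dirs` are hypothetical helper names of mine.

```python
import os

def build_dirs(deeplab_dir):
    """Return the experiment directories used in this notebook, built with os.path.join."""
    pascal = os.path.join(deeplab_dir, "datasets", "pascal_voc_seg")
    exp = os.path.join(pascal, "exp", "train_on_trainval_set")
    return {
        "INIT_FOLDER": os.path.join(pascal, "init_models"),
        "TRAIN_LOGDIR": os.path.join(exp, "train"),
        "EVAL_LOGDIR": os.path.join(exp, "eval"),
        "VIS_LOGDIR": os.path.join(exp, "vis"),
        "EXPORT_DIR": os.path.join(exp, "export"),
        "PASCAL_DATASET": os.path.join(pascal, "tfrecord"),
    }

def make_dirs(dirs):
    """Create every directory in the mapping; existing ones are left alone."""
    for path in dirs.values():
        os.makedirs(path, exist_ok=True)
```

Calling `make_dirs(build_dirs(CURRENT_DIR))` once up front means a later `%cd` into one of these folders cannot fail because the folder is missing.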

Run the test with the following code cell.

!python model_test.py -v

It returns an error because the -v flag has no value.

WARNING: Logging before flag parsing goes to stderr.
W0904 12:32:23.712459 139621176600448 lazy_loader.py:50] 
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
 * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
 * https://github.com/tensorflow/addons
 * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.
W0904 12:32:23.754800 139621176600448 deprecation_wrapper.py:119] From /content/drive/My Drive/ML/tensorflow/models/research/slim/nets/mobilenet/mobilenet.py:397: The name tf.nn.avg_pool is deprecated. Please use tf.nn.avg_pool2d instead.
Traceback (most recent call last):
 File "/usr/local/lib/python3.6/dist-packages/absl/flags/_flagvalues.py", line 696, in get_value
   return next(args) if value is None else value
StopIteration
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
 File "model_test.py", line 147, in <module>
   tf.test.main()
 File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/test.py", line 64, in main
   return _googletest.main(argv)
 File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/googletest.py", line 65, in main
   benchmark.benchmarks_main(true_main=main_wrapper)
 File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/benchmark.py", line 407, in benchmarks_main
   true_main()
 File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/googletest.py", line 64, in main_wrapper
   return app.run(main=g_main, argv=args)
 File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 40, in run
   _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
 File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 294, in run
   flags_parser,
 File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 351, in _run_init
   flags_parser=flags_parser,
 File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 213, in _register_and_parse_flags_with_usage
   args_to_main = flags_parser(original_argv)
 File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 31, in _parse_flags_tolerate_undef
   return flags.FLAGS(_sys.argv if argv is None else argv, known_only=True)
 File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/flags.py", line 112, in __call__
   return self.__dict__['__wrapped'].__call__(*args, **kwargs)
 File "/usr/local/lib/python3.6/dist-packages/absl/flags/_flagvalues.py", line 626, in __call__
   unknown_flags, unparsed_args = self._parse_args(args, known_only)
 File "/usr/local/lib/python3.6/dist-packages/absl/flags/_flagvalues.py", line 744, in _parse_args
   value = get_value()
 File "/usr/local/lib/python3.6/dist-packages/absl/flags/_flagvalues.py", line 698, in get_value
   raise _exceptions.Error('Missing value for flag ' + arg)  # pylint: disable=undefined-loop-variable
absl.flags._exceptions.Error: Missing value for flag -v

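This has never happened in my other environments, so why...?
For now, the --helpfull option lists every available flag, including the DeepLab-related ones, so I print that.
I found -v under absl.logging.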

-v,--verbosity: Logging verbosity level. Messages logged at this level or
lower will be included. Set to 1 for debug logging. If the flag was not set
or supplied, the value will be changed from the default of -1 (warning) to 0
(info) after flags are parsed.
(default: '-1')
(an integer)

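As far as I can tell, -v expects an integer value, which is why passing it bare fails. If the flag is omitted entirely, the level is changed from the default -1 (warning) to 0 (info) after the flags are parsed.
Setting -1 (warning) explicitly looks like the way to go.
Change the code cell as follows and run it.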

!python model_test.py -v -1

With that, the test passed.
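
To make the "flag needs a value" behavior concrete, here is a small stand-in using the standard library's `argparse`. It is not absl itself, so the details differ: argparse keeps -1 when the flag is omitted, whereas the help text above says absl then switches to 0. `parse_verbosity` is a hypothetical name.

```python
import argparse

def parse_verbosity(argv):
    """Parse an integer -v/--verbosity flag; return None if parsing fails."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-v", "--verbosity", type=int, default=-1)
    try:
        return parser.parse_args(argv).verbosity
    except SystemExit:
        # argparse reports the problem on stderr and exits; treat that as "rejected".
        return None

with_value = parse_verbosity(["-v", "-1"])  # flag followed by its value
bare_flag = parse_verbosity(["-v"])         # flag with nothing after it
omitted = parse_verbosity([])               # flag left out entirely

print(with_value, bare_flag, omitted)
```

As in the traceback above, the bare flag is rejected because an integer is expected after it.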

Download and extract the initial model with the following code cell.

TF_INIT_ROOT = "http://download.tensorflow.org/models"
TF_INIT_CKPT = "deeplabv3_pascal_train_aug_2018_01_04.tar.gz"
# Create the folder first; %cd fails if it does not exist yet.
!mkdir -p "$INIT_FOLDER"
%cd $INIT_FOLDER
!wget -nd -c $TF_INIT_ROOT/$TF_INIT_CKPT
!tar -xf $TF_INIT_CKPT
%cd $CURRENT_DIR

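Run training with the following code cell. NUM_ITERATIONS is referenced again when exporting the model, so it is safer to put its assignment in a separate code cell from the Python script invocation.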

NUM_ITERATIONS=10
%time !python train.py \
 --logtostderr \
 --train_split="trainval" \
 --model_variant="xception_65" \
 --atrous_rates=6 \
 --atrous_rates=12 \
 --atrous_rates=18 \
 --output_stride=16 \
 --decoder_output_stride=4 \
 --train_crop_size="513,513" \
 --train_batch_size=4 \
 --training_number_of_steps=$NUM_ITERATIONS \
 --fine_tune_batch_norm=true \
 --tf_initial_checkpoint="$INIT_FOLDER/deeplabv3_pascal_train_aug/model.ckpt" \
 --train_logdir="$TRAIN_LOGDIR" \
 --dataset_dir="$PASCAL_DATASET"

Result measured with %time: about 2 minutes.
A few "ran out of memory" messages showed up, yet it is incredibly fast.

CPU times: user 477 ms, sys: 52.4 ms, total: 530 ms
Wall time: 1min 51s
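
Every path in this notebook contains the space from "My Drive", so each path flag passed on a `!` line has to survive shell word splitting. One way around that, sketched here, is to build the argument list in Python and hand it to `subprocess`, which does not involve a shell. `build_train_args` is a hypothetical helper. The flags are the ones from the train.py call above.

```python
def build_train_args(num_iterations, init_folder, train_logdir, dataset_dir):
    """Build the train.py argument list used above, one list item per flag."""
    args = [
        "--logtostderr",
        "--train_split=trainval",
        "--model_variant=xception_65",
    ]
    # atrous_rates is a repeated flag, so it appears once per rate.
    args += ["--atrous_rates={}".format(rate) for rate in (6, 12, 18)]
    args += [
        "--output_stride=16",
        "--decoder_output_stride=4",
        "--train_crop_size=513,513",
        "--train_batch_size=4",
        "--training_number_of_steps={}".format(num_iterations),
        "--fine_tune_batch_norm=true",
        "--tf_initial_checkpoint={}/deeplabv3_pascal_train_aug/model.ckpt".format(init_folder),
        "--train_logdir={}".format(train_logdir),
        "--dataset_dir={}".format(dataset_dir),
    ]
    return args

# Usage in the notebook (not run here):
# import subprocess, sys
# subprocess.run([sys.executable, "train.py"]
#                + build_train_args(10, INIT_FOLDER, TRAIN_LOGDIR, PASCAL_DATASET),
#                check=True)
```

Each list item reaches train.py as exactly one argument, spaces included.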

Run evaluation with the following code cell.

%time !python eval.py \
 --logtostderr \
 --eval_split="val" \
 --model_variant="xception_65" \
 --atrous_rates=6 \
 --atrous_rates=12 \
 --atrous_rates=18 \
 --output_stride=16 \
 --decoder_output_stride=4 \
 --eval_crop_size="513,513" \
 --checkpoint_dir="$TRAIN_LOGDIR" \
 --eval_logdir="$EVAL_LOGDIR" \
 --dataset_dir="$PASCAL_DATASET" \
 --max_number_of_evaluations=1

Result measured with %time: about 5 minutes.

CPU times: user 889 ms, sys: 119 ms, total: 1.01 s
Wall time: 5min 1s

Run visualization with the following code cell.

%time !python vis.py \
 --logtostderr \
 --vis_split="val" \
 --model_variant="xception_65" \
 --atrous_rates=6 \
 --atrous_rates=12 \
 --atrous_rates=18 \
 --output_stride=16 \
 --decoder_output_stride=4 \
 --vis_crop_size="513,513" \
 --checkpoint_dir="$TRAIN_LOGDIR" \
 --vis_logdir="$VIS_LOGDIR" \
 --dataset_dir="$PASCAL_DATASET" \
 --max_number_of_iterations=1

Result measured with %time: about 7 minutes.

CPU times: user 2.71 s, sys: 427 ms, total: 3.13 s
Wall time: 6min 41s

Export the model with the following code cell.

CKPT_PATH = TRAIN_LOGDIR + "/model.ckpt-" + str(NUM_ITERATIONS)
EXPORT_PATH = EXPORT_DIR + "/frozen_inference_graph.pb"
%time !python export_model.py \
 --logtostderr \
 --checkpoint_path="$CKPT_PATH" \
 --export_path="$EXPORT_PATH" \
 --model_variant="xception_65" \
 --atrous_rates=6 \
 --atrous_rates=12 \
 --atrous_rates=18 \
 --output_stride=16 \
 --decoder_output_stride=4 \
 --num_classes=21 \
 --crop_size=513 \
 --crop_size=513 \
 --inference_scales=1.0

Result measured with %time: under a minute.

CPU times: user 97.8 ms, sys: 17.7 ms, total: 115 ms
Wall time: 18.4 s
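
CKPT_PATH above depends on NUM_ITERATIONS still holding the value used for training. To avoid that coupling, the newest checkpoint can be looked up on disk instead. This is a stdlib-only sketch that assumes checkpoints are saved as `model.ckpt-<step>.index` in the training log directory. `latest_checkpoint_prefix` is a hypothetical helper, and TensorFlow's own `tf.train.latest_checkpoint` serves the same purpose.

```python
import os
import re

def latest_checkpoint_prefix(train_logdir):
    """Return the model.ckpt-<step> prefix with the largest step, or None if there is none."""
    pattern = re.compile(r"^model\.ckpt-(\d+)\.index$")
    steps = []
    for name in os.listdir(train_logdir):
        match = pattern.match(name)
        if match:
            steps.append(int(match.group(1)))
    if not steps:
        return None
    return os.path.join(train_logdir, "model.ckpt-{}".format(max(steps)))

# In the notebook: CKPT_PATH = latest_checkpoint_prefix(TRAIN_LOGDIR)
```

Steps are compared as integers, so step 10 correctly ranks above step 9.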

Using the TPU

While I'm at it, I switch the runtime type to TPU and run everything again.
Below is a comparison of the %time results.

train.py
----GPU----
CPU times: user 477 ms, sys: 52.4 ms, total: 530 ms
Wall time: 1min 51s
----TPU----
CPU times: user 3.95 s, sys: 548 ms, total: 4.5 s
Wall time: 11min 53s
eval.py
----GPU----
CPU times: user 889 ms, sys: 119 ms, total: 1.01 s
Wall time: 5min 1s
----TPU----
CPU times: user 22.9 s, sys: 3.2 s, total: 26.1 s
Wall time: 1h 14min 48s
vis.py
----GPU----
CPU times: user 2.71 s, sys: 427 ms, total: 3.13 s
Wall time: 6min 41s
----TPU----
CPU times: user 24.6 s, sys: 3.38 s, total: 28 s
Wall time: 1h 15min 36s
export_model.py
----GPU----
CPU times: user 97.8 ms, sys: 17.7 ms, total: 115 ms
Wall time: 18.4 s
----TPU----
CPU times: user 105 ms, sys: 23.6 ms, total: 129 ms
Wall time: 17.1 s
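
To put numbers on the gap, the wall times above can be converted to seconds and divided. This is a small sketch: `wall_seconds` is a hypothetical helper that only understands the `h`, `min`, and `s` units appearing in these results.

```python
import re

def wall_seconds(text):
    """Convert a %time wall-clock string such as '1h 14min 48s' or '18.4 s' to seconds."""
    factors = {"h": 3600.0, "min": 60.0, "s": 1.0}
    total = 0.0
    for value, unit in re.findall(r"(\d+(?:\.\d+)?)\s*(h|min|s)\b", text):
        total += float(value) * factors[unit]
    return total

# Wall times copied from the results above: (GPU runtime, TPU runtime).
results = {
    "train.py": ("1min 51s", "11min 53s"),
    "eval.py": ("5min 1s", "1h 14min 48s"),
    "vis.py": ("6min 41s", "1h 15min 36s"),
    "export_model.py": ("18.4 s", "17.1 s"),
}

for script, (gpu, tpu) in results.items():
    ratio = wall_seconds(tpu) / wall_seconds(gpu)
    print("{}: TPU runtime took {:.1f}x the GPU wall time".format(script, ratio))
```

By this count the TPU runtime took roughly 6x, 15x, and 11x as long for training, evaluation, and visualization, and was marginally faster only for the export.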

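I haven't analyzed this in detail, but switching to the TPU clearly does not turn blazing fast into godlike fast: train.py, eval.py, and vis.py all became far slower, and only export_model.py took about the same time.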

Thoughts

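It is overwhelmingly faster, so I regret not going with this from the start.
Unless you are very confident in your home setup, Google Colaboratory is probably the way to go.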

The output is a bit cluttered, so I'd like to tweak the logging settings before moving on.