Converting a Local LLM to a Core ML Model - How to Use huggingface/exporters

March 3, 2024, 16:22

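What is exporters?

Hugging Face publishes a tool called exporters.

It exports Transformers models to Core ML. It is a wrapper around coremltools, but it absorbs many of the problems that come up during conversion. In short, converting a Transformers model to a Core ML model is easier with this tool than with coremltools alone.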

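Motivation

exporters was introduced in an official blog post, and LLMs such as Llama 2 appear to have been converted with it (or, alternatively, the lessons learned during that conversion were folded into the tool).

I tried the published pre-converted models, but they were still too large for mobile devices, so I decided to try the exporters tool myself, on the theory that converting a smaller model on my own might work better.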

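Note that the "transformers-to-coreml conversion Space" introduced in the same post was returning a 500 error as of March 3, 2024, and could not be used. (I could not try it, so the details are unclear, but it appears to be exporters turned into a web service that can be operated from a GUI.)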

Installation

Clone the repository with git clone, then run:

$ cd exporters
$ pip install -e .

That is all it takes. (I created a virtual environment with venv first.)
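To confirm that the editable install is visible from the active virtual environment, a quick check like the following can help. This is only a minimal sketch: the helper name `report_install` is made up for illustration, and the names passed to it (`exporters`, `coremltools`, `torch`) are assumptions about how the packages are registered with pip.

```python
from importlib import metadata, util


def is_importable(module_name):
    """Return True if a top-level module can be found on the current import path."""
    return util.find_spec(module_name) is not None


def installed_version(dist_name):
    """Return the installed version string, or None if pip does not know the package."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def report_install(names):
    """Hypothetical helper: map each name to a tuple (importable, version)."""
    return {name: (is_importable(name), installed_version(name)) for name in names}


if __name__ == "__main__":
    # The package names below are assumptions, not taken from the exporters docs.
    for name, (ok, version) in report_install(["exporters", "coremltools", "torch"]).items():
        print(f"{name}: importable={ok}, version={version}")
```

If `exporters` shows up as not importable, the most likely cause is that `pip install -e .` was run outside the virtual environment.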

A first conversion

I ran the following command from the README:

python -m exporters.coreml --model=distilbert-base-uncased exported/

Here is what the command means:

Run the exporters.coreml package
- This package converts a model checkpoint to a Core ML model

The `--model` argument specifies the checkpoint of the source model
- It can be a checkpoint on the Hugging Face Hub or one stored locally

The converted Core ML file is saved in the `exported` directory
- The default file name is `Model.mlpackage`
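The same invocation can be scripted, which is convenient when trying several checkpoints in a row. The following is a minimal sketch that only wraps the command shown above; the function names `build_export_command` and `run_export` are hypothetical, and no flags beyond `--model` and the output directory are assumed.

```python
import subprocess
import sys


def build_export_command(model, output_dir):
    """Build the argument list for `python -m exporters.coreml`."""
    # sys.executable keeps the call inside the currently active virtual environment.
    return [sys.executable, "-m", "exporters.coreml", f"--model={model}", output_dir]


def run_export(model, output_dir):
    """Run the conversion as a subprocess and return its exit code."""
    # check=False: an uncaught exception in the exporter (for example a
    # ValueError raised during validation) surfaces as a non-zero exit code
    # instead of raising CalledProcessError here.
    completed = subprocess.run(build_export_command(model, output_dir), check=False)
    return completed.returncode
```

Calling `run_export("distilbert-base-uncased", "exported/")` would then be equivalent to running the command above by hand.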

The conversion progressed this far:

Torch version 2.2.1 has not been tested with coremltools. You may run into unexpected errors. Torch 2.1.0 is the most recent version that has been tested.
tokenizer_config.json: 100%|██████████| 28.0/28.0 [00:00<00:00, 53.4kB/s]
config.json: 100%|██████████| 483/483 [00:00<00:00, 1.17MB/s]
vocab.txt: 100%|██████████| 232k/232k [00:00<00:00, 662kB/s]
tokenizer.json: 100%|██████████| 466k/466k [00:00<00:00, 2.50MB/s]
model.safetensors: 100%|██████████| 268M/268M [00:40<00:00, 6.68MB/s]
Using framework PyTorch: 2.2.1
(omitted)
Validating Core ML model...
	-[✓] Core ML model output names match reference model ({'last_hidden_state'})
	- Validating Core ML model output "last_hidden_state":
		-[✓] (1, 128, 768) matches (1, 128, 768)
		-[x] values not close enough (atol: 0.0001)

But then it failed with the following error:

Validating Core ML model...
	-[✓] Core ML model output names match reference model ({'last_hidden_state'})
	- Validating Core ML model output "last_hidden_state":
		-[✓] (1, 128, 768) matches (1, 128, 768)
		-[x] values not close enough (atol: 0.0001)
Traceback (most recent call last):
(omitted)
ValueError: Output values do not match between reference model and Core ML exported model: Got max absolute difference of: 0.004475116729736328

The model itself was generated successfully, but validation failed.
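The first line of the log also warns that Torch 2.2.1 has not been tested with coremltools and that 2.1.0 is the most recent tested version. Whether that has anything to do with the validation failure is not established here, but the condition is easy to detect before starting a conversion. The sketch below uses hypothetical helper names, and the default of `2.1.0` is taken only from that warning message.

```python
def parse_version(text):
    """Turn '2.2.1' (or '2.2.1+cu121') into a tuple of ints such as (2, 2, 1)."""
    core = text.split("+")[0]  # drop a local build suffix if present
    parts = []
    for piece in core.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_newer_than_tested(installed, tested="2.1.0"):
    """Return True if the installed version is newer than the tested one."""
    return parse_version(installed) > parse_version(tested)
```

With the versions from the log, `is_newer_than_tested("2.2.1")` is true, which matches the warning that was printed.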

According to the README, the log should normally look like this:

Validating Core ML model...
	-[✓] Core ML model output names match reference model ({'last_hidden_state'})
	- Validating Core ML model output "last_hidden_state":
		-[✓] (1, 128, 768) matches (1, 128, 768)
		-[✓] all values close (atol: 0.0001)
All good, model saved at: exported/Model.mlpackage

However, another part of the README says the following, so this final validation failure appears to be within an acceptable range.

If validation fails with an error such as the following, it doesn't necessarily mean the model is broken:

> ValueError: Output values do not match between reference model and Core ML exported model: Got max absolute difference of: 0.12345

The comparison is done using an absolute difference value, which in this example is 0.12345. That is much larger than the default tolerance value of 1e-4, hence the reported error. However, the magnitude of the activations also matters. For a model whose activations are on the order of 1e+3, a maximum absolute difference of 0.12345 would usually be acceptable.

If validation fails with this error and you're not entirely sure if this is a true problem, call `mlmodel.predict()` on a dummy input tensor and look at the largest absolute magnitude in the output tensor.

I will count this as a success for now.

Converting an LLM


Thank you for reading to the end! If any part of this was useful, a like would be very encouraging. I would also be glad if you followed me on Twitter. https://twitter.com/shu223/