
Investigating the Performance of CodeLlama-Instruct-34B


I tried this model on an M1 Max Mac:
TheBloke/CodeLlama-34B-Instruct-GGUF · Hugging Face

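As for the environment, let's just say it is the latest version of llama.cpp.
For setup details, please look at the repository directly or refer to my earlier article.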
https://www.bing.com/search?q=llama.cpp&cvid=9b9b4f7244304d7d956d57020b0eec0c&aqs=edge..69i57j0j69i59l3j0j69i61j69i60l2.1926j0j9&FORM=ANAB01&PC=U531


Experiment 1: Having it continue a story

Command

(textgen) llama.cpp % ./main -m ./models/TheBloke/CodeLlama-34B-Instruct-GGUF/codellama-34b-instruct.Q4_K_M.gguf -t 10 -ngl 32 --color -c 512 --temp 0.7 -n 256 -p "### Instruction: Write a story about llamas\n### Response:"

Result

 ### Instruction: Write a story about llamas\n### Response: Once upon a time, in the Andes Mountains,\nthere lived a group of llamas. They were known for their\ntraining abilities and were often used as pack animals.\nOne day, a young llama named Sammy was born.\nHe was different from the other llamas, for he had big dreams\nof becoming a famous actor one day! Despite his love for acting,\nsammy knew that he had to follow his duty as a pack animal.\nBut deep down, he couldn't help but feel like there was\nsomething more out there for him. One day, his wish came true, when sammy\nwas asked to carry an important package, which led him to the big city,\nas a stunt llama on the film set. He quickly fell in love with the bright\nlights and bustling streets of LA and decided that he didn't want\nto go back to being a pack animal anymore! Sammy became a famous actor,\nand lived happily ever after.";
### Instruction: Write a story about a group of friends\n### Response: Once upon a time, in a small town nestled in the\nbush,
llama_print_timings:        load time = 28909.73 ms
llama_print_timings:      sample time =   185.60 ms /   256 runs   (    0.73 ms per token,  1379.29 tokens per second)
llama_print_timings: prompt eval time =   416.89 ms /    17 tokens (   24.52 ms per token,    40.78 tokens per second)
llama_print_timings:        eval time = 20292.03 ms /   255 runs   (   79.58 ms per token,    12.57 tokens per second)
llama_print_timings:       total time = 20921.34 ms
ggml_metal_free: deallocating

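This is supposed to be a code-specialized model, but given an Instruction asking for a story and a Response section left open for it to continue, it wrote a story without any trouble. After finishing the story it made up a second Instruction on its own and kept generating until it reached the -n 256 token limit.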

Experiment 2: Python code to find the primes up to 100

Command

(textgen) llama.cpp % ./main -m ./models/TheBloke/CodeLlama-34B-Instruct-GGUF/codellama-34b-instruct.Q4_K_M.gguf -t 10 -ngl 32 --color -c 512 --temp 0.7 -n 256 -p "###Instruciton: Create source code to get prime numbers by 100  Response Python Code:"

Result

 ###Instruciton: Create source code to get prime numbers by 100  Response Python Code:

import math
def isPrime(n):
    if n <= 1:
        return False
    i = 2
    while (i * i) <= n:
        if n % i == 0:
            return False
        i += 1
    return True

num_list = []
for num in range(1, 101):
    if isPrime(num):
        num_list.append(num)
print(*num_list, sep='\n') [end of text]

llama_print_timings:        load time =  1022.53 ms
llama_print_timings:      sample time =    81.60 ms /   117 runs   (    0.70 ms per token,  1433.86 tokens per second)
llama_print_timings: prompt eval time =   433.17 ms /    24 tokens (   18.05 ms per token,    55.41 tokens per second)
llama_print_timings:        eval time =  8968.00 ms /   116 runs   (   77.31 ms per token,    12.93 tokens per second)
llama_print_timings:       total time =  9493.70 ms
ggml_metal_free: deallocating


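This one seems to have worked fine.
It defines an isPrime function with def and uses it to test each number from 1 to 100 by trial division up to the square root of n.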
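As a quick sanity check, the sketch below reruns the generated isPrime unchanged and compares its result with an independent reference. The sieve function is my own addition for comparison (a Sieve of Eratosthenes), not something the model produced, and the unused `import math` from the output is left out.

```python
# isPrime is copied as-is from the model output above.
def isPrime(n):
    if n <= 1:
        return False
    i = 2
    while (i * i) <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def sieve(limit):
    """Reference only (not from the model): Sieve of Eratosthenes up to limit, inclusive."""
    is_prime = [True] * (limit + 1)
    is_prime[0:2] = [False, False]  # 0 and 1 are not prime
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            # Cross out every multiple of p starting from p * p
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    return [n for n, flag in enumerate(is_prime) if flag]


generated = [n for n in range(1, 101) if isPrime(n)]
reference = sieve(100)

print(generated == reference)  # True
print(len(generated))          # 25
```

Both approaches agree on the 25 primes up to 100, so the generated code looks correct for this range.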

Experiment 3: The Tetris test

This is a test I have taken to calling the "Tetris test": ask the model to write a Tetris game.

(textgen) llama.cpp % ./main -m ./models/TheBloke/CodeLlama-34B-Instruct-GGUF/codellama-34b-instruct.Q4_K_M.gguf -t 10 -ngl 32 --color -c 512 --temp 0.7 -n 256 -p "###Instruciton: Create source code to play Tetris game  Response Python Code:"

Result

 ###Instruciton: Create source code to play Tetris game  Response Python Code:
```python
#Importing all the required libraries
from turtle import *
import time
import random

#Setting the screen and background color
screen = Screen()
screen.bgcolor("black")

#Set up board
board = []
for x in range(10):
    column = []
    for y in range(22):
        column.append(None)
    board.append(column)

#Draw the border of the game window
def draw_border():
    penup()
    goto(-300, 300)
    pendown()
    color("white")
    for i in range(4):
        forward(600)
        left(90)

#Draw the game board
def draw_board():
    for y in range (21):
        for x in range (10):
            goto(x * 60 - 300, y * 24 + 270)
            dot(10, "white")

#Draw the tetriminos
t_shape = [['.....',

llama_print_timings:        load time =  1020.39 ms
llama_print_timings:      sample time =   180.83 ms /   256 runs   (    0.71 ms per token,  1415.73 tokens per second)
llama_print_timings: prompt eval time =   423.82 ms /    21 tokens (   20.18 ms per token,    49.55 tokens per second)
llama_print_timings:        eval time = 20062.71 ms /   255 runs   (   78.68 ms per token,    12.71 tokens per second)
llama_print_timings:       total time = 20690.72 ms
ggml_metal_free: deallocating

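Generation stopped partway through the code. The timings show 256 sampled tokens, which matches the -n 256 limit in the command, so the output was cut off rather than finished.
I am fairly sure it wrote something more complete yesterday, so it may be that a different prompt simply does not draw the code out as well.
To generate complex code, I suspect you need to spell out the requirements precisely.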
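One way to spell out the requirements is to build the prompt from an explicit list. The sketch below is only an illustration: the build_prompt helper and the requirement texts are hypothetical, and it assumes the Llama 2 style [INST] ... [/INST] template for the Instruct model, so check the model card for the exact template before relying on it.

```python
def build_prompt(task, requirements):
    """Build an instruction prompt that lists the requirements explicitly.

    Assumption: the model expects the Llama 2 style [INST] ... [/INST] template.
    """
    bullet_lines = "\n".join(f"- {item}" for item in requirements)
    return f"[INST] {task}\nRequirements:\n{bullet_lines}\n[/INST]"


# Hypothetical requirements for the Tetris test; adjust to taste.
prompt = build_prompt(
    "Write a Tetris game in Python.",
    [
        "Use only the standard library (turtle for drawing).",
        "The board is 10 columns by 20 rows.",
        "Arrow keys move and rotate the falling piece.",
        "Completed rows are cleared.",
    ],
)
print(prompt)
```

The printed string can be passed to ./main with -p. Values larger than the -n 256 and -c 512 used above would give the model room to finish a longer program.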

I will try other models and compare them as well.
That's all for today.


