Having AIs write a simulation to check whether they understand the Monty Hall problem

September 22, 2023, 21:56

An explanation of the Monty Hall problem is here.

Someone has also kindly built a simulator you can play with on the web.

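In short: exactly one prize is hidden behind three doors. The player picks one door, the host then opens one of the losing doors, and the player is given the right to change doors. The question is whether switching is the better bet in terms of probability.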
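Before handing the problem to the AIs, the rules above can be checked exactly rather than by simulation. The following is a minimal sketch under my own assumptions: the prize and the first pick are uniformly random, and the host chooses uniformly when two losing doors are available. The function name `exact_win_probabilities` is made up for this illustration.

```python
from fractions import Fraction

def exact_win_probabilities():
    """Enumerate every (prize, pick, opened) case for three doors."""
    doors = range(3)
    stay = Fraction(0)
    switch = Fraction(0)
    for prize in doors:
        for pick in doors:
            # The host may open any door that is neither the pick nor the prize.
            openable = [d for d in doors if d != pick and d != prize]
            for opened in openable:
                # Prize and pick are uniform (1/9 per pair); the host's
                # choice splits that weight evenly (an assumption).
                weight = Fraction(1, 9) / len(openable)
                # Switching means taking the one door still closed.
                switched = next(d for d in doors if d not in (pick, opened))
                if pick == prize:
                    stay += weight
                if switched == prize:
                    switch += weight
    return stay, switch

stay, switch = exact_win_probabilities()
print(stay, switch)  # 1/3 2/3
```

Staying wins only in the three cases where the first pick was already the prize, while switching wins in the other six, which is where the factor of two comes from.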

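The Monty Hall problem apparently forces a view of probability that runs hard against human intuition. The correct answer (switching away from your first door doubles your chance of winning) was published in a magazine by a woman writer who held the record for the world's highest IQ. Countless people, American mathematicians first among them, objected to her, put their mistaken understanding of probability on display, and were left unable to show their faces anywhere. According to Wikipedia, it went like this:

Dr. Robert Sachs, George Mason University: "As a professional mathematician, I am concerned about the general public's poor knowledge of mathematics. Admitting your own mistake will improve the situation."
Dr. Scott Smith, University of Florida: "You made an obvious blunder (...) You, the holder of the world's highest IQ, should immediately stop the folly of spreading mathematical ignorance any further and be ashamed!"

Dr. E. Ray Bobo: "(...) How many indignant mathematicians would it take to change your mind?"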

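Ha! Those doctorates are weeping!
Knowing the answer in hindsight, it is tempting to jeer at these responses from academia,
but since we are here, this post checks whether AIs understand the problem.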

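1. ChatGPT 3.5

▼ My question
Write code that simulates the Monty Hall problem

▼ ChatGPT's answer
Here is Python code for simulating the Monty Hall problem. The problem can be summarized as follows: there are three doors, a prize is behind one of them, and nothing is behind the other two. You keep the door you first chose as an option, and the host opens another door. The question is whether you should change your initial choice.

Below is Python code that simulates the Monty Hall problem: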

import random

def monty_hall_simulation(num_simulations):
    switch_wins = 0
    stay_wins = 0

    for _ in range(num_simulations):
        # Create three doors and place the prize behind one of them
        doors = ['goat', 'goat', 'car']
        random.shuffle(doors)  # Shuffle the door arrangement randomly

        # Pick the player's first door at random
        player_choice = random.randint(0, 2)

        # Pick the door the host opens (a door without the prize)
        host_choices = [i for i in range(3) if i != player_choice and doors[i] == 'goat']
        host_choice = random.choice(host_choices)

        # When the player stays
        if doors[player_choice] == 'car':
            stay_wins += 1

        # When the player switches
        switch_choice = [i for i in range(3) if i != player_choice and i != host_choice][0]
        if doors[switch_choice] == 'car':
            switch_wins += 1

    return stay_wins, switch_wins

if __name__ == "__main__":
    num_simulations = 10000
    stay_wins, switch_wins = monty_hall_simulation(num_simulations)
    print(f"Wins when staying: {stay_wins}/{num_simulations} ({stay_wins/num_simulations:.2%})")
    print(f"Wins when switching: {switch_wins}/{num_simulations} ({switch_wins/num_simulations:.2%})")

This code runs the specified number of simulations and counts the wins when the player stays and when the player switches. The results are shown as percentages. In most runs, switching is shown to have the higher win probability.

Execution result

$ python3 montyhall.py
Wins when staying: 3359/10000 (33.59%)
Wins when switching: 6641/10000 (66.41%)

ChatGPT (3.5) nails this one.
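As a rough sanity check that 3359 and 6641 wins out of 10000 really are what 1/3 and 2/3 should produce, here is a small sketch that measures how far each observed rate sits from the theoretical value in units of the binomial standard error. The helper name `z_score` is my own, and treating each round as an independent trial is an assumption about the simulation.

```python
import math

def z_score(wins, trials, p):
    """Distance between the observed win rate and p, in standard errors."""
    standard_error = math.sqrt(p * (1 - p) / trials)
    return (wins / trials - p) / standard_error

# Counts taken from the run above.
print(round(z_score(3359, 10000, 1 / 3), 2))
print(round(z_score(6641, 10000, 2 / 3), 2))
```

Both values come out at roughly half a standard error, so the run is comfortably consistent with the 1/3 versus 2/3 answer.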

2. Codellama 34B

▼ Asking codellama 34B through Open Interpreter

> Please write the python code to simulate monty hall problem
                                                                                
                                                                                

                                                                                
  import random

  def monty_hall(num_doors):
      # Initialize the doors with prizes and goats
      doors = [1] * num_doors
      doors[random.randint(0, num_doors - 1)] = 0

      # Simulate the game
      chosen_door = random.randint(0, num_doors - 1)
      revealed_door = random.randint(0, num_doors - 1)
      while revealed_door == chosen_door:

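You can tell at a glance that codellama 34B
does not seem to be producing a proper simulation.
Running it gives the result below: unlike ChatGPT's code, which ran after a plain copy-paste, this code does not run.

Execution result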

$ python3 codellamamontyhall.py
  File "codellamamontyhall.py", line 1
    import random                                                                 
    ^
IndentationError: unexpected indent
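The error itself comes from the leading spaces that were copied along with the code from the terminal rendering. As a sketch of how that kind of paste could be cleaned up before judging the code, the standard library's `textwrap.dedent` removes the common leading whitespace. The `pasted` string below is a shortened, hypothetical stand-in for the copied text, not codellama's full output, and `compiles` is a helper made up for this example.

```python
import textwrap

# Hypothetical stand-in for the pasted text: every line carries the
# two-space indent copied from the terminal rendering.
pasted = (
    "  import random\n"
    "\n"
    "  def monty_hall(num_doors):\n"
    "      doors = [1] * num_doors\n"
    "      return doors\n"
)

def compiles(source):
    """Return True if source parses as a Python module."""
    try:
        compile(source, "<pasted>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False

cleaned = textwrap.dedent(pasted)
print(compiles(pasted))   # False
print(compiles(cleaned))  # True
```

Dedenting would only remove this particular error. The listing above also stops at the `while` line with no loop body, so the code as captured would still not run.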

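Interim report

codellama 34B cannot correctly understand the Monty Hall problem.
ChatGPT 3.5 understands the Monty Hall problem.

Training methods and other factors surely matter, but I already felt that a 34B-class model
cannot really manage proper problem understanding or coding support, and this has deepened that impression once more. Sorry, folks at Meta.
For now I can only run models up to 34B, and since llama2 70B runs out of RAM, I am considering replacing my MacBook Pro.

Doubling the RAM at the same CPU performance looks like the easiest route,
but I suspect that would soon feel insufficient too, so it is a hard call.

Apple, hurry up and release an MBP with a huge amount of RAM.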

3. Llama 70B chat

I had llama 70B chat write the code on the site below, and it came fairly close.

▼ Me
Please writeh the python code only. I don't need the explanation of monty hall problem

▼AI
Sure, here's the Python code to simulate the Monty Hall problem:

import random

def monty_hall(num_doors=3):
    cars = []
    goats = []
    chosen_door = None

    # Assign cars and goats randomly behind doors
    for i in range(num_doors):
        if random.random() < 0.5:
            cars.append(i)
        else:
            goats.append(i)

    # Choose a door at random
    chosen_door = random.choice([i for i in range(num_doors) if i not in cars])

    # Open a door with a goat behind it
    opened_door = random.choice(goats)

However, the code only defines a function, so nothing happens when you run it.

Execution result

$ python3 monthyhall-llama2-70B.py
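Apart from never being called, the setup step in the code above gives each door a car independently via `random.random() < 0.5`, so it does not guarantee exactly one car. The sketch below enumerates the eight equally likely assignments for three doors to see how often that setup matches the actual game. The function name `car_count_distribution` is my own, and treating each door as an exact fair coin flip is an assumption.

```python
from fractions import Fraction
from itertools import product

def car_count_distribution(num_doors=3):
    """Exact distribution of the number of cars when each door
    independently hides a car with probability 1/2."""
    outcomes = list(product([True, False], repeat=num_doors))
    distribution = {}
    for outcome in outcomes:
        cars = sum(outcome)  # True counts as 1
        distribution[cars] = distribution.get(cars, 0) + Fraction(1, len(outcomes))
    return distribution

dist = car_count_distribution()
for cars in sorted(dist):
    print(cars, "car(s):", dist[cars])
```

Only 3 of the 8 assignments have exactly one car. In the all-cars case, the list of doors without a car is empty, and `random.choice` raises `IndexError` on an empty sequence.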

Of the OSS LLMs I have personally had write some code so far, the one with the best code quality and explanations was Falcon-180B.

4. Falcon 180B

I will use the demo site.

▼ Me
Please write the code to simulate monty hall problem

▼AI
Sure, here's a Python code to simulate the Monty Hall problem:

import random

def play_game(stay):
    doors = [1, 2, 3]  # The three doors
    prize_door = random.choice(doors)  # The door with the prize behind it
    contestant_choice = random.choice(doors)  # The door chosen by the contestant
    
    remaining_doors = list(set(doors) - set([contestant_choice, prize_door]))  # The other two doors
    
    if stay:
        opened_door = random.choice(remaining_doors)  # Monty opens one of the other doors
        print("Monty opens door", opened_door)
        if opened_door == prize_door:
            print("You win!")
        else:
            print("You lose.")
    else:
        opened_door = random.choice(remaining_doors)  # Monty opens one of the other doors
        print("Monty opens door", opened_door)
        
        while True:
            switch_or_stay = input("Do you want to switch your choice or stay with your original choice? ").lower()
            if switch_or_stay == "switch":
                new_choice = random.choice(remaining_doors)  # The contestant switches their choice
                print("You choose door", new_choice)
                break
            elif switch_or_stay == "stay":
                print("You stick with door", contestant_choice)
                break
            else:
                print("Invalid choice. Please enter 'switch' or 'stay'.")
                
    if (contestant_choice == prize_door) or (new_choice == prize_door):
        print("You win!")
    else:
        print("You lose.")
        
# Run the simulation
num_rounds = 1000
wins_staying = 0
wins_switching = 0

for i in range(num_rounds):
    play_game(False)
    wins_switching += 1 if "You win!" in output else 0
    
print("Wins switching:", wins_switching, "out of", num_rounds)

The play_game() function simulates one round of the game, where the contestant chooses a door and Monty opens one of the other two doors. The function then asks the player whether they want to switch their choice or stay with their original choice, and determines whether they win the prize or not based on their decision.

The num_rounds variable specifies how many rounds of the game to run, and the loop at the end runs num_rounds iterations of the play_game() function with the switch option selected. It counts how many times the player wins the prize when switching, and prints the results at the end.

Execution result

$ python montyhall-falcon180B.py
Monty opens door 3
Do you want to switch your choice or stay with your original choice? y
Invalid choice. Please enter 'switch' or 'stay'.
Do you want to switch your choice or stay with your original choice? switch
You choose door 3
You lose.
Traceback (most recent call last):
  File "montyhall-falcon180B.py", line 45, in <module>
    wins_switching += 1 if "You win!" in output else 0
NameError: name 'output' is not defined

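It is not what I intended, but the main part of the code just barely runs as-is and does simulate the game, albeit with an error at the end. With a few more rounds of instructions it would probably fix this as well, so falcon180B looks capable of coding support on the same level as ChatGPT 3.5.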
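For reference, here is a sketch of what a repaired version might look like. This is my own adaptation, not something Falcon produced. It assumes the round should be non-interactive, that `play_game` should return the result instead of printing it (which removes the need for the undefined `output`), and that switching means taking the one door that is still closed. The `rng` and `seed` parameters and the `run` helper are additions for reproducibility.

```python
import random

def play_game(switch, rng=random):
    """Play one round; return True if the contestant wins the prize."""
    doors = [1, 2, 3]
    prize_door = rng.choice(doors)
    contestant_choice = rng.choice(doors)

    # Monty opens a door that is neither the contestant's pick nor the prize.
    openable = [d for d in doors if d not in (contestant_choice, prize_door)]
    opened_door = rng.choice(openable)

    if switch:
        # Switching means taking the one door that is still closed.
        final_choice = next(
            d for d in doors if d not in (contestant_choice, opened_door)
        )
    else:
        final_choice = contestant_choice

    return final_choice == prize_door

def run(num_rounds=1000, seed=None):
    """Count wins for both strategies over num_rounds rounds each."""
    rng = random.Random(seed)
    wins_staying = sum(play_game(False, rng) for _ in range(num_rounds))
    wins_switching = sum(play_game(True, rng) for _ in range(num_rounds))
    return wins_staying, wins_switching

num_rounds = 1000
wins_staying, wins_switching = run(num_rounds)
print("Wins staying:", wins_staying, "out of", num_rounds)
print("Wins switching:", wins_switching, "out of", num_rounds)
```

Returning a boolean keeps the counting loop trivial, and dropping the `input()` prompt lets the script run a thousand rounds without anyone typing "switch" each time.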

Conclusion

Ranked by how well each understands the Monty Hall problem:

ChatGPT > falcon 180B > llama2 70B chat > codellama 34B

The ranking came out roughly as I had expected beforehand.
