Some methods using R for Twitter information "defense warfare" ─Rを用いたTwitter情報検証のためのいくつかの手法

2022年3月16日 02:01

　The military invasion of Ukraine by Russia has resulted in the loss of many innocent and irreplaceable lives and deprived many others of their livelihood. I express my deepest condolences to all the victims and affected people. I pray that those who have been forced to take lives or to lose their own, regardless of the country to which they belong, will find peace as soon as possible. I pray that as few people as possible will be hurt and that a warm and secure life will return.
　Each country has its own history, position, and feelings, and it is impossible for a layman like me to judge which is right. News is being exchanged daily on social networking services such as Twitter, and the situation has become a kind of "information warfare". People are also expressing their opinions in response to the news from various perspectives. Some of the information and opinions are correct, some are incorrect, some are neither, and some people may use the information arbitrarily. Twitter allows tweets to be deleted. In order to verify the validity of information and its senders later, it is important to capture and analyze data comprehensively and to keep evidence in the form of logs.
　In this article, I show how to log Twitter information using R, and how to infer and visualize the relationships between tweet senders.

　ロシアによるウクライナへの軍事侵攻により、罪のない多くのかけがえのない命が失われ、数多くの方がその暮らしを奪われています。
　犠牲となられた全ての方々、被害を受けた方々に深い哀悼の意を表します。どうか、属する国を問わず、命を奪い、奪われる事を強いられた方々に一刻も早い安寧と平和がありますように。どうか、傷つく方が一人でも少なく、温かな安心できる暮らしが戻りますように。

　国々には各々の歴史と立場と思いがあり、私のような素人がどちらが正しい等と判断することはできません。
　Twitter等SNSでは連日のようにニュースが飛び交い、"情報戦"と呼ばれる様相を呈しております。また、それに呼応する方々が様々な立場で意見を表しています。
　その情報や意見の中には正しいもの、誤っているもの、そのどちらでもないものがあり、情報を恣意的に用いる人々もいるでしょう。
　Twitterはツイートを消すことができます。後から情報・発信者の妥当性を確認するためには、包括的にデータを取得・分析し、ログとして証拠を残すことが大切です。

本記事では、Rを用いてTwitter情報のログを残す方法を紹介し、また発信者の関係性について推定・可視化する方法を示します。

●Methods to be introduced ─ 紹介する手法

Propaganda based on incorrect facts tends to contradict itself as time passes, and can then be shown to be erroneous. Therefore, in "information warfare" on SNS, it is important to properly record past statements and sources, and to point out inconsistencies and omissions to the accounts that spread inappropriate information, in order to publicly demonstrate the unreliability of that information.
誤った事実に基づくプロパガンダは、時間が経つにつれて矛盾が生じ、誤りであることが明らかになることがあります。
したがって、SNS上の"情報戦"においては、過去の発言や情報源を適切に記録し、不適切な情報発信者に対して矛盾や抜けを指摘することで、当該情報の信頼性の低さを公に示すことが重要です。
Here are some simple techniques to help with this.
このことに役立つ、簡単な手法を紹介します。

・Retrieving tweets by word search
　語句検索によるツイートの取得
・Mass acquisition of tweets from specific personal accounts
　特定の個人アカウントのツイートの大量取得
・Mass acquisition of screenshots of tweets
　ログ証拠としてのTweetスクリーンショットの大量取得
・Relevance inference for tweeters using retweet data
　リツイートデータを利用したツイート者の関連性推論

●What you need ─ 必要なもの

・personal computer ─ PC
・Internet connection ─ インターネット接続
・Twitter API key (free) ─ Twitter APIキー(無料)

●Step 1 Install R and RStudio ─ RとRStudioのインストール

Install R and RStudio on your PC.
See the following pages.

RとRStudioをPCにインストールします。
以下のページを参照してください。

●Step 2 Obtain a Twitter API key ─ Twitter API keyの取得

If you do not plan to use the API continuously or in large volumes, you can skip this step, because the key bundled with the rtweet package described below can be used temporarily.
継続して・大量に使用しない場合は後述するrtweetパッケージのkeyが一時的に使用できるため本ステップを飛ばすことができます。

See the following pages.

以下のページを参照してください。

The following information is used:
・The name of the configured app
・consumer_key
・consumer_secret
・access_token
・access_secret

以下の情報を使用します。
・設定したapp名
・consumer_key
・consumer_secret
・access_token
・access_secret
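
Because these five values are secrets, one option is to keep them out of the script itself and read them from environment variables. The following is only a minimal sketch: the helper name read_twitter_credentials() and the variable names such as TWITTER_APP are hypothetical choices for this example, not part of rtweet.

```r
# Hypothetical helper: read the five values from environment variables.
# The environment variable names below are assumptions made for this example.
read_twitter_credentials <- function() {
  vars <- c(app             = "TWITTER_APP",
            consumer_key    = "TWITTER_CONSUMER_KEY",
            consumer_secret = "TWITTER_CONSUMER_SECRET",
            access_token    = "TWITTER_ACCESS_TOKEN",
            access_secret   = "TWITTER_ACCESS_SECRET")
  values <- Sys.getenv(vars, unset = "")
  names(values) <- names(vars)
  # Stop early with a clear message if any value has not been set
  missing <- vars[values == ""]
  if (length(missing) > 0) {
    stop("Missing environment variables: ", paste(missing, collapse = ", "))
  }
  as.list(values)
}
```

The returned list has the same element names as the arguments of the create_token() call used later in this article, so its elements could be passed to that call one by one (for example, app = cred$app).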

●Retrieving tweets by word search
　語句検索によるツイートの取得

# load package
if(!require("tidyverse")){
  install.packages("tidyverse", dependencies = TRUE)
  library("tidyverse")
}
if(!require("rtweet")){
  install.packages("rtweet", dependencies = TRUE)
  library("rtweet")
}

# set token if you need
twitter_token <- create_token(app = "***", # app_name
                              consumer_key = "***", # Consumer Key
                              consumer_secret = "***", # Consumer Secret
                              access_token = "***",
                              access_secret = "***")
# Get today's date and time
today <- Sys.Date()
now <- Sys.time()
ti <- format(now, "%H%M%S")

# set search_word
x <- "Russia invasion" 
# Number of tweets acquired
n <- 18000

# Retrieving Tweets
tweet_data <- search_tweets(x, 
                            n = n, 
                            retryonratelimit = TRUE, # Wait and resume when the rate limit is reached. FALSE is enough for 18000 tweets or fewer.
                            include_rts = FALSE # Set TRUE if you want to include retweets.
                            )

# save data as csv
write_as_csv(tweet_data,
             paste(today, "_", x, "_tweettxt.csv",sep = ""), 
             fileEncoding = "UTF-8")

# load data from csv (file.choose() opens a file dialog on any OS)
tweet_data <- readr::read_csv(file.choose(), 
                              locale = readr::locale(encoding = "UTF-8")
                              )

Tweet data can be obtained as follows.
以下のようにツイートデータが取得できます。

The search goes back roughly 7 days, and up to 18,000 tweets can be retrieved per 15-minute window.
When more than 18,000 tweets are requested, retrieval pauses for the 15-minute wait and then continues.

過去およそ7日間のツイートを遡って取得可能で、15分ごとに18,000件のツイートが取得できます。
18,000件を超えるツイートは15分の待ち時間を経てから取得されます。
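
To get a feel for how long a large request will take, the waiting time can be estimated from the figures above. This is a rough sketch that assumes exactly 18,000 tweets per 15-minute window and ignores the download time itself; estimate_wait_minutes() is a hypothetical helper, not an rtweet function.

```r
# Rough estimate of the rate-limit waiting time in minutes.
# Assumes 18,000 tweets per 15-minute window, as described above.
estimate_wait_minutes <- function(n_tweets, per_window = 18000, window_min = 15) {
  windows <- ceiling(n_tweets / per_window)  # number of windows needed
  max(windows - 1, 0) * window_min           # one wait between consecutive windows
}

estimate_wait_minutes(18000)   # 0: fits in a single window
estimate_wait_minutes(100000)  # 75: six windows, five waits
```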

The acquired tweet data can be used for various analyses.
As an example, the following code analyzes when the accounts that posted tweets matching the search term "Russia invasion" were created.

取得したツイートデータは、さまざまな分析に利用することができます。
例として、検索語「Russia invasion」に一致するツイートをしたアカウントがいつ作成されたかを解析するコードを示します。

account_created <- tweet_data %>% 
  distinct(user_id, .keep_all = TRUE) %>% 
  mutate(account_created_at = round_time(account_created_at, n = "months")) %>% 
  group_by(account_created_at) %>% 
  summarise(n = n()) %>% 
  arrange(desc(account_created_at))

# show data
account_created
# total account
sum(account_created$n)

> account_created
# A tibble: 190 x 2
   account_created_at      n
   <dttm>              <int>
 1 2022-02-27 00:00:00   448
 2 2022-01-28 00:00:00   324
 3 2021-12-29 00:00:00   148
 4 2021-11-29 00:00:00   140
 5 2021-10-30 00:00:00   125
 6 2021-09-30 00:00:00   119
 7 2021-08-31 00:00:00   119
 8 2021-08-01 00:00:00    93
 9 2021-07-02 00:00:00    78
10 2021-06-02 00:00:00    71
# ... with 180 more rows

> sum(account_created$n)
[1] 12963

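The 18,000 tweets came from 12,963 accounts. The bins in the output are 30 days apart, and the most recent bin holds the most accounts (448), indicating that many of the accounts were created in roughly the past month.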
18000件のツイートは12963アカウントによって行われており、更にこの1ヶ月でRussian Invasionについてツイートする多くのアカウントが作成されたことがわかります。

The question of what information the newly created accounts are spreading may be worth considering.
新たに作成されたアカウントがどのような情報を拡散しているのかという問題は、検討する価値があるかもしれません。
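
As a starting point for that question, the tweets posted by recently created accounts can be pulled out for reading. The sketch below uses a small made-up data frame in place of the real search result, with only the three columns it needs; tweets_from_new_accounts() and the 30-day threshold are assumptions for illustration.

```r
# Mock data standing in for the result of search_tweets().
# The values are made up for this example.
tweet_data_demo <- data.frame(
  user_id = c("1", "2", "2", "3"),
  account_created_at = as.POSIXct(
    c("2015-04-01", "2022-03-01", "2022-03-01", "2022-02-20"), tz = "UTC"),
  text = c("tweet a", "tweet b", "tweet c", "tweet d"),
  stringsAsFactors = FALSE
)

# Keep tweets from accounts created within `days` days before `as_of`
tweets_from_new_accounts <- function(df, as_of, days = 30) {
  cutoff <- as_of - as.difftime(days, units = "days")
  keep <- df$account_created_at >= cutoff & df$account_created_at <= as_of
  df[keep, , drop = FALSE]
}

as_of <- as.POSIXct("2022-03-16", tz = "UTC")
new_tweets <- tweets_from_new_accounts(tweet_data_demo, as_of)
new_tweets$text                     # tweets to read by hand
length(unique(new_tweets$user_id))  # number of distinct new accounts
```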

For more examples of analysis, please refer to the recipes on the following pages.
より多くの分析例は下記ページのレシピを参考にしてください。

●Mass acquisition of tweets from specific personal accounts
　特定の個人アカウントのツイートの大量取得

If there is a particular personal account you are interested in, you can retrieve up to the 3,200 most recent tweets from that account.
気になる特定の個人アカウントがある場合、そのアカウントの最新ツイートを最大3200件まで取得することができます。

# load package
if(!require("tidyverse")){
  install.packages("tidyverse", dependencies = TRUE)
  library("tidyverse")
}
if(!require("rtweet")){
  install.packages("rtweet", dependencies = TRUE)
  library("rtweet")
}

# set token if you need
twitter_token <- create_token(app = "***", # app_name
                              consumer_key = "***", # Consumer Key
                              consumer_secret = "***", # Consumer Secret
                              access_token = "***",
                              access_secret = "***")
# Get today's date and time
today <- Sys.Date()
now <- Sys.time()
ti <- format(now, "%H%M%S")

## Same as the aforementioned code up to this point. ##

# set account ID
x <- "RusEmbassyJ" 
# https://twitter.com/RusEmbassyJ
# Number of tweets acquired
n <- 3200

# get data
timeline <- get_timeline(x, 
                         n = n
                         )

# save data as csv
write_as_csv(timeline,
             paste(today, "_", x, "_tweettxt.csv",sep = ""), 
             fileEncoding = "UTF-8")

# load data from csv (file.choose() opens a file dialog on any OS)
tweet_data <- readr::read_csv(file.choose(), 
                              locale = readr::locale(encoding = "UTF-8")
                              )

It is possible to quickly obtain and save logs of tweets and retweets from specific accounts as csv data, which is useful for verification activities at a later date.
特定アカウントのツイートやリツイートのログをcsvデータとしてスピーディに取得・保存することが可能であり、後日の検証活動に有用です。
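
One way to use such logs for verification is to compare two snapshots of the same account taken at different times and list the tweets that have disappeared. The sketch below uses two small made-up data frames; find_missing_tweets() is a hypothetical helper, and the IDs are compared as character strings.

```r
# Two mock snapshots of the same timeline, saved at different times.
snapshot_old <- data.frame(
  status_id = c("101", "102", "103"),
  text = c("first", "second", "third"),
  stringsAsFactors = FALSE
)
snapshot_new <- data.frame(
  status_id = c("101", "103", "104"),
  text = c("first", "third", "fourth"),
  stringsAsFactors = FALSE
)

# Tweets present in the older log but absent from the newer one
find_missing_tweets <- function(old, new) {
  old[!old$status_id %in% new$status_id, , drop = FALSE]
}

find_missing_tweets(snapshot_old, snapshot_new)
```

A missing tweet is only a candidate for deletion. Because a timeline is limited to the 3,200 most recent tweets, older tweets can also drop out of a newer snapshot simply because they are no longer within that range.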

●Mass acquisition of screenshots of tweets
　ログ証拠としてのTweetスクリーンショットの大量取得

The csv logs above hold a large amount of information, but they are awkward to present as evidence on the Internet.
Here I show how to obtain a large number of tweet screenshots from the acquired account data.
上記のcsvファイルによるログデータ保存は、情報量が多い反面、インターネット上に提示する証拠としての取り扱いが困難です。
そこで、取得したアカウントデータから、ツイートのスクリーンショットを大量に取得する方法を紹介します。

# load package
if(!require("tidyverse")){
  install.packages("tidyverse", dependencies = TRUE)
  library("tidyverse")
}
if(!require("rtweet")){
  install.packages("rtweet", dependencies = TRUE)
  library("rtweet")
}
if(!require("remotes")){
  install.packages("remotes", dependencies = TRUE)
  library("remotes")
}
if(!require("webshot2")){
  remotes::install_github("rstudio/webshot2")
  library("webshot2")
}



# set token if you need
twitter_token <- create_token(app = "***", # app_name
                              consumer_key = "***", # Consumer Key
                              consumer_secret = "***", # Consumer Secret
                              access_token = "***",
                              access_secret = "***")
# Get today's date and time
today <- Sys.Date()
now <- Sys.time()
ti <- format(now, "%H%M%S")

# set account ID
x <- "RusEmbassyJ" 
# https://twitter.com/RusEmbassyJ
# Number of tweets acquired
n <- 100 # It takes time, but up to 3200.

# Create a folder named YYYY-MM-DD-HHMMSS_<account>pics and make it the working folder
pic_dir <- paste0(today, "-", ti, "_", x, "pics")
dir.create(pic_dir, showWarnings = FALSE, recursive = TRUE)
old_wd <- setwd(pic_dir) # setwd() returns the previous working folder

# get account tweets
timeline <- get_timeline(x, 
                         n = n
                         )

# take one screenshot per tweet URL
for (i in seq_along(timeline$status_url)) {
  webshot2::webshot(timeline$status_url[i], 
                    paste0(x, "_", timeline$status_id[i], ".png"), 
                    zoom = 1, vwidth = 560, vheight = 2500, delay = 1.0)
}

# return to the previous working folder
setwd(old_wd)

The URL of each tweet is extracted from the tweet data acquired by rtweet, and a screenshot of each URL is taken in a for loop using rstudio/webshot2, which drives a headless Chrome browser.
rtweetで取得したツイートデータからURLを抽出し、rstudio/webshot2ヘッドレスクロームブラウザを用いてスクリーンショットをfor loopで取得しています。

Each file is named TwitterID_StatusID.png.
File名はTwitterID_StatusIDとなります。
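
When many screenshots are taken, a single page that fails to load stops the whole loop. The following is a sketch of one way around that, under my own assumptions: capture_all() is a hypothetical wrapper that skips files that already exist and records failures instead of stopping. The capture function is passed in as an argument, so the example runs with a stand-in (fake_capture) that needs no browser; in real use, webshot2::webshot could be passed instead.

```r
# Hypothetical wrapper: call capture(url, file) for each URL,
# skipping files that already exist and recording failures.
capture_all <- function(urls, files, capture) {
  status <- character(length(urls))
  for (i in seq_along(urls)) {
    if (file.exists(files[i])) {
      status[i] <- "skipped"  # already captured in an earlier run
      next
    }
    status[i] <- tryCatch({
      capture(urls[i], files[i])
      "ok"
    }, error = function(e) "failed")
  }
  data.frame(url = urls, file = files, status = status,
             stringsAsFactors = FALSE)
}

# Stand-in for a real screenshot function, used only for this demonstration
fake_capture <- function(url, file) {
  if (grepl("bad", url)) stop("page could not be loaded")
  writeLines(url, file)
}

out_dir <- tempfile("pics")
dir.create(out_dir)
urls <- c("https://example.com/good1",
          "https://example.com/bad",
          "https://example.com/good2")
files <- file.path(out_dir, paste0("demo_", 1:3, ".png"))

result <- capture_all(urls, files, fake_capture)
result$status
```

Running capture_all() again with the same arguments retries only the URLs that failed, since the files that were already saved are skipped.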

●Relevance inference for tweeters using retweet data
　リツイートデータを利用したツイート者の関連性推論

A network analysis of retweets can be performed to check what groups of accounts are retweeting tweets that contain the search term.
This is a useful technique for visualizing which accounts originate the information, how strong their spreading power is, and what groups they form.
リツイートのネットワーク分析を行うことで、どのようなグループのアカウントが検索語を含むツイートをリツイートしているかを確認することができます。
どのアカウントが発信元で拡散力が強く、どのようなグループを形成しているのかを可視化するのに有効な手法です。

# load package
if(!require("tidyverse")){
  install.packages("tidyverse", dependencies = TRUE)
  library("tidyverse")
}
if(!require("rtweet")){
  install.packages("rtweet", dependencies = TRUE)
  library("rtweet")
}
if(!require("remotes")){
  install.packages("remotes", dependencies = TRUE)
  library("remotes")
}
if(!require("twinetverse")){
  remotes::install_github("JohnCoene/twinetverse")
  library("twinetverse")
}

# set token if you need
twitter_token <- create_token(app = "***", # app_name
                              consumer_key = "***", # Consumer Key
                              consumer_secret = "***", # Consumer Secret
                              access_token = "***",
                              access_secret = "***")
# Get today's date and time
today <- Sys.Date()
now <- Sys.time()
ti <- format(now, "%H%M%S")

# set search_word
x <- "deep state filter:retweets"  # add filter:retweets
# Number of tweets acquired
n <- 500

tweets <- search_tweets(q = x, 
                        n = n, include_rts = TRUE) # include_rts = TRUE


net <- tweets %>% 
  gt_edges(screen_name, mentions_screen_name) %>% # Get edge data (account -> accounts it mentions)
  gt_nodes() %>% # Get nodes data
  gt_collect() # collect
lapply(net, class)

# Get edges and nodes
c(edges, nodes) %<-% net 
##edges <- net$edges  
##nodes <- net$nodes 

# Convert to sigmajs data
nodes <- nodes2sg(nodes)
edges <- edges2sg(edges)

sigmajs(height = "800px") %>% 
  sg_nodes(nodes, id, label, size) %>% 
  sg_edges(edges, id, source, target) %>% 
  sg_layout(layout = igraph::layout_components) %>% 
  sg_cluster(
    colors = c(
      "#0084b4",
      "#00aced",
      "#1dcaff",
      "#c0deed"
    )
  ) %>% 
  sg_settings(
    minNodeSize = 1,
    maxNodeSize = 2.5,
    edgeColor = "default",
    defaultEdgeColor = "#d3d3d3"
  )

Retweet network diagram of tweets containing "deep state."
Each point (node) is an account. Each line (edge) links a retweeting account to an account mentioned in the retweet.
It shows that there are about 8 accounts with high spreading power.
By hovering the mouse cursor over each point (node), the account name is displayed, allowing you to examine the accounts with high spreading power.
"deep state"を含むツイートのリツイートネットワーク図。一つ一つの点（ノード）がアカウント、線（エッジ）は、アカウント間のリツイートを示しています。
拡散力の高いアカウントが8つほどあることがわかります。各ポイント（ノード）にマウスカーソルを合わせると、アカウント名が表示され、拡散力の高いアカウントを調べることができます。

This network diagram allows visualization of where information originated and how it spread. It is useful in identifying accounts that are spreaders of information.
このネットワーク図では、情報がどこから発信され、どのように広がっていくかを可視化することができます。情報の拡散者であるアカウントを特定するのに有効です。
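
Besides hovering over the diagram, the accounts with high spreading power can be listed by counting how many edges point at each account (the in-degree). The sketch below uses a made-up edge list and assumes the edge table has source and target columns, as in the sg_edges() call above; rank_by_in_degree() is a hypothetical helper. It counts rows, so if the edge table carries a weight column, that column would need to be summed instead.

```r
# Mock edge list: each row means "source retweeted or mentioned target".
edges_demo <- data.frame(
  source = c("a", "b", "c", "d", "a", "e"),
  target = c("hub1", "hub1", "hub1", "hub2", "hub2", "a"),
  stringsAsFactors = FALSE
)

# Count incoming edges per account and sort from most to least
rank_by_in_degree <- function(edges) {
  counts <- sort(table(edges$target), decreasing = TRUE)
  data.frame(account = names(counts),
             in_degree = as.integer(counts),
             stringsAsFactors = FALSE)
}

rank_by_in_degree(edges_demo)
```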

I have used the code from the following page.
コードは以下のページよりいただきました。

For more detailed information on the use of twinetverse, please refer to the following.

●Conclusion ─ 最後に

Thank you for your interest.
If you have any questions about the program or would like me to analyze something for you, please contact me by DM on Twitter. If the request is appropriate, I will be happy to respond.
I hope you will find this information useful for your information confirmation and defense on social networking sites.
ご興味をお持ちいただき、ありがとうございます。
プログラムについてのご質問や、解析してほしいことなどありましたら、TwitterのDMでご連絡ください。適切な内容であれば、対応させていただきます。
SNSでの情報確認や防衛にお役立ていただければ幸いです。

If you would like to offer support, share impressions or comments, or ask about an analysis, please feel free to get in touch.