Text As Data
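I ended up switching from rtweet to the Academic Twitter API and paying for premium Twitter access, since my request volume went over the rate limit. Below I read in the RDS file and convert the tweet text to ASCII to avoid encoding problems and emojis. I then remove retweet markers ("RT" or "via" followed by usernames), HTML links, punctuation (which also drops the @ before usernames), and extra tabs and spaces, before lowercasing everything and dropping duplicate tweets. Note that the code below does not remove stopwords.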
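library(dplyr) # for glimpse()
tweets <- readRDS("~/networks-blog/boliviatweetsfile")
tweets <- iconv(tweets$text, from = "UTF-8", to = "ASCII", sub = " ") # Keep only the text column; emojis and other non-ASCII bytes become spaces
tweets <- gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "", tweets, perl = TRUE) # Remove "RT"/"via" and the usernames that follow it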
tweets <- gsub("http\\S*", " ", tweets) # Remove links: "http" up to the next whitespace, so words after a link are kept
tweets <- gsub("[[:punct:]]", " ", tweets) # Remove punctuation, including the @ before usernames
tweets <- gsub("[[:space:]]+", " ", tweets) # Collapse tabs, newlines and repeated spaces into a single space
tweets <- trimws(tweets) # Remove leading and trailing blanks
tweets <- tolower(tweets)
tweets <- unique(tweets) # Drop duplicate tweets
glimpse(tweets)
chr [1:40843] "1rudecruzan dredadonx maadynyc bolivia staged a mass strike after america staged a coup and removed their democ"| __truncated__ ...
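To check what the cleaning does to a single tweet, the steps can be wrapped in a function and run on a made-up string. This is a minimal sketch, not the code used for this post: `clean_tweets()` and its `stop_words` argument are hypothetical names, links are matched with `http\\S*` (which stops at the first whitespace, so words after a link survive), and the optional stopword step uses whatever word list you pass in.

```r
# Sketch only: `clean_tweets` and `stop_words` are hypothetical names.
# Runs the cleaning steps on a character vector of tweet texts and,
# optionally, drops words found in `stop_words`.
clean_tweets <- function(text, stop_words = character(0)) {
  x <- iconv(text, from = "UTF-8", to = "ASCII", sub = " ")   # non-ASCII bytes -> spaces
  x <- gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "", x, perl = TRUE) # retweet marker + handles
  x <- gsub("http\\S*", " ", x)                               # links up to next whitespace
  x <- gsub("[[:punct:]]", " ", x)                            # punctuation, incl. @ and #
  x <- tolower(x)
  if (length(stop_words) > 0) {
    # Split each tweet into words, drop empty strings and stopwords, rejoin
    x <- vapply(strsplit(x, "[[:space:]]+"), function(words) {
      paste(words[words != "" & !(words %in% stop_words)], collapse = " ")
    }, character(1))
  }
  x <- trimws(gsub("[[:space:]]+", " ", x))                   # collapse and trim whitespace
  unique(x)                                                   # drop duplicates
}

toy <- c("RT @someuser: Bolivia news http://t.co/abc123 is here!",
         "Bolivia news is here")
clean_tweets(toy)                       # both toy tweets clean to the same string
clean_tweets(toy, stop_words = c("is")) # same, with "is" removed
```

Because the retweet prefix, link and punctuation are all stripped, the two toy tweets collapse into one entry after `unique()`. In practice `stop_words` would come from a published list such as `tm::stopwords("english")`; the function itself only needs base R.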
Creating the corpus