Ognjen Marković | Twitter Text Sentiment Classification

In this class project at Stanford University, I worked in a team to examine and classify Twitter text sentiment. We classified whether tweets support one or the other position on the topic of Brexit and whether a tweet can be classified as coming from a bot account or not. We used classifiers such as logistic regression, naive Bayes, and the passive-aggressive classifier. The data was obtained from both pre-classified datasets and by scraping Twitter accounts for tweets with known sentiments. After training the classifiers, we used unsupervised learning (k-means clustering) and dimensionality reduction (t-SNE) to examine tweet topic clusters. The project achieved 90% classification accuracy of remain/leave camps and bot/not bot accounts using simple logistic regression and bag-of-words approach. The unsupervised learning topic clustering on classified tweets was used to further explore the topics discussed by both camps.