Data Science Projects
Studying the interaction between microblog sentiment and stock returns
- Objective: to test the effect of investor sentiment on stock returns while accounting for a potential causality loop.
- Process: extracted investor sentiment from 18 million StockTwits messages using NLP algorithms; modeled the relationship between investor sentiment and stock returns using panel vector autoregression.
- Outcome: investor sentiment is found to predict stock returns at the hourly frequency.
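As a rough illustration of the lagged relationship tested above, the sketch below estimates a single slope of next-hour return on current-hour sentiment, pooled across stocks after removing each stock's own mean. This is a deliberately reduced stand-in for one equation of a panel VAR (no own-lag term, no reverse equation), not the model used in the project. The function name `lagged_slope`, the data layout, and the numbers are all hypothetical.

```python
from statistics import mean

def lagged_slope(panel):
    """Pooled within-stock OLS slope of return at hour t on sentiment at hour t-1.

    panel maps a ticker to a time-ordered list of (sentiment, return) pairs.
    Demeaning within each stock plays the role of a stock fixed effect.
    """
    xs, ys = [], []
    for series in panel.values():
        lagged_sentiment = [s for s, _ in series[:-1]]
        next_return = [r for _, r in series[1:]]
        if len(next_return) < 2:
            continue  # too short to demean meaningfully
        ms, mr = mean(lagged_sentiment), mean(next_return)
        xs += [s - ms for s in lagged_sentiment]
        ys += [r - mr for r in next_return]
    sxx = sum(x * x for x in xs)
    if sxx == 0:
        raise ValueError("lagged sentiment has no within-stock variation")
    return sum(x * y for x, y in zip(xs, ys)) / sxx

# Made-up toy panel: each return equals 0.5 * previous sentiment plus a stock-specific constant.
toy_panel = {
    "AAA": [(1.0, 0.0), (2.0, 1.5), (3.0, 2.0), (4.0, 2.5)],
    "BBB": [(2.0, 0.0), (0.0, 0.0), (1.0, -1.0), (3.0, -0.5)],
}
slope = lagged_slope(toy_panel)
```

A full panel VAR would also regress sentiment on lagged returns, which is how the causality loop mentioned in the objective is handled.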
Examining design consistency in web pages
- Objective: to test the role of design consistency in web pages.
- Process: created web pages for charity using different levels of design consistency; used randomized experiments (A/B testing) to collect responses from survey subjects; performed multivariate statistical analysis among different treatment groups.
- Outcome: design consistency is found to positively influence users’ donation amount.
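The comparison between treatment groups can be sketched with a two-sample test. The example below computes Welch's t statistic for donation amounts in two groups; it is a univariate simplification of the multivariate analysis described above, and the group names and donation figures are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for the difference in means of two independent samples."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each group needs at least two observations")
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error

# Hypothetical donation amounts per subject (not data from the experiment).
consistent_design = [12.0, 15.0, 11.0, 14.0, 13.0]
inconsistent_design = [9.0, 10.0, 12.0, 8.0, 11.0]

t_stat = welch_t(consistent_design, inconsistent_design)
```

A positive statistic here means the first group donated more on average; turning it into a p-value requires the t distribution, which a statistics library would normally supply.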
Linguistic cues that impact online petition success
- Objective: to examine what linguistic cues may affect the success of online petitions; the findings may help digital activists more effectively promote causes online.
- Process: crawled 30,000 online petition pages from Change.org; measured the level of cognitive, emotional, and moral arguments in petition text using NLP techniques; tested the effect of the three linguistic factors on petition success using logistic regression.
- Outcome: the three linguistic factors are found to significantly impact petition success.
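The final modeling step can be sketched as a logistic regression of petition success on three scores. The version below is a minimal pure-Python fit by gradient descent, assuming each petition is already reduced to hypothetical (cognitive, emotional, moral) scores between 0 and 1. The data, learning rate, and function names are illustrative and are not taken from the project.

```python
from math import exp

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)

def predict_proba(w, b, x):
    """Predicted probability of success for one feature vector."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Fit weights and intercept by batch gradient descent on the log-loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Made-up scores: columns are cognitive, emotional, moral; label 1 = petition succeeded.
X = [[0.1, 0.2, 0.1], [0.2, 0.1, 0.3], [0.3, 0.2, 0.2],
     [0.7, 0.8, 0.6], [0.8, 0.6, 0.9], [0.9, 0.7, 0.8]]
y = [0, 0, 0, 1, 1, 1]

weights, intercept = fit_logistic(X, y)
```

In practice a statistics package would be used instead, since it also reports standard errors and significance for each factor.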
Explaining envy on social media platforms
- Objective: to examine what social media activities are associated with envy; the findings may help facilitate a healthy social media environment.
- Process: streamed Twitter messages containing the keyword "envy"; filtered data to retain only messages expressing envy; identified benign envy and malicious envy tweets using sentiment analysis; modeled the relationship between the two types of envy and follower count, following count, message count, and like count.
- Outcome: found that different types of social media activity are associated with different types of envy.
Predicting labor strikes using online texts
- Objective: to alert companies about potential labor strikes by analyzing web pages.
- Process: collected labor strike data from Department of Labor Statistics; retrieved web pages related to the affected companies using Google News; extracted sentiment from the retrieved web pages.
- Outcome: found a significant spike in negative sentiment a couple of weeks before labor strikes.
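One simple way to quantify such a spike is to compare negative sentiment in the window just before a strike against an earlier baseline. The sketch below assumes a hypothetical daily series of negative-sentiment scores for one company and returns a z-score. The window lengths, function name, and numbers are illustrative choices, not the project's actual procedure.

```python
from statistics import mean, stdev

def pre_event_z(daily_negative, event_index, window=14, baseline=60):
    """z-score of mean negative sentiment in the `window` days before an event,
    relative to the `baseline` days that precede that window."""
    start = event_index - window
    if start < 2:
        raise ValueError("not enough history before the event")
    base = daily_negative[max(0, start - baseline):start]
    recent = daily_negative[start:event_index]
    spread = stdev(base)
    if spread == 0:
        raise ValueError("baseline has no variation")
    return (mean(recent) - mean(base)) / spread

# Made-up series: 60 quiet days, then 14 days of elevated negativity, strike on day 74.
series = [0.2, 0.3] * 30 + [0.6] * 14
z = pre_event_z(series, event_index=74)
```

A large positive value indicates that the pre-strike window is unusually negative compared with the company's own recent history.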
Detecting accounting fraud in financial reports
- Objective: to more effectively detect accounting fraud by analyzing texts in financial reports.
- Process: parsed 10-K reports using Stanford NLP; proposed a convolutional tree kernel for SVM.
- Outcome: improved F-measure by 3% in classifying fraudulent financial reports.
Resolving ambiguity in microblog messages using dependency features
- Objective: to improve the accuracy of short text classification.
- Process: parsed Twitter messages into dependencies using Stanford NLP and CMU Ark.
- Outcome: classification using the dependency features robustly outperforms popular baselines.