Projects

Data Science Projects

Studying the interaction between microblog sentiment and stock returns

  • Objective: to test the effect of investor sentiment on stock returns while accounting for a potential feedback loop between sentiment and returns.
  • Process: extracted investor sentiment from 18 million StockTwits messages using NLP algorithms; modeled the relationship between investor sentiment and stock returns using panel vector autoregression (panel VAR).
  • Outcome: investor sentiment was found to predict stock returns at the hourly frequency.
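
A full panel VAR (firm fixed effects, GMM estimation, impulse responses) does not fit in a short example, but its core idea can be sketched: regress each variable on the previous hour's sentiment and return, pooling all firms. This is a simplified stand-in under stated assumptions, not the model used in the project; the `Panel` layout, tickers, and function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: ticker -> time-ordered hourly (sentiment, return) pairs.
Panel = Dict[str, List[Tuple[float, float]]]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows: List[List[float]], ys: List[float]) -> List[float]:
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve_linear(xtx, xty)


def pooled_var1(panel: Panel) -> Dict[str, List[float]]:
    """Fit both equations of a one-lag VAR on data pooled across firms.

    Lags never cross firm boundaries. Coefficient order in each equation is
    [intercept, lagged sentiment, lagged return]. No fixed effects are removed.
    """
    rows, y_ret, y_sent = [], [], []
    for series in panel.values():
        for (s_prev, r_prev), (s_now, r_now) in zip(series, series[1:]):
            rows.append([1.0, s_prev, r_prev])
            y_ret.append(r_now)
            y_sent.append(s_now)
    return {"return_eq": ols(rows, y_ret), "sentiment_eq": ols(rows, y_sent)}
```

A nonzero lagged-sentiment coefficient in `return_eq` suggests that sentiment leads returns, while the lagged-return coefficient in `sentiment_eq` captures the reverse direction, which is the feedback loop the objective refers to. A real analysis would also report standard errors and handle firm-level heterogeneity.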

Examining design consistency in web pages

  • Objective: to test how design consistency in web pages affects user behavior.
  • Process: created charity web pages with different levels of design consistency; used randomized experiments (A/B testing) to collect responses from survey subjects; performed multivariate statistical analysis across treatment groups.
  • Outcome: design consistency was found to positively influence users’ donation amounts.
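
The project used multivariate analysis; as a simpler illustration of comparing two A/B treatment groups, the sketch below computes Welch's t statistic (which does not assume equal variances) for mean donation amounts. The function name and any donation figures passed to it are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

In practice, `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same statistic and adds a p-value.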

Linguistic cues that impact online petition success

  • Objective: to examine which linguistic cues affect the success of online petitions; the findings may help digital activists promote causes online more effectively.
  • Process: crawled 30,000 online petition pages from Change.org; measured the level of cognitive, emotional, and moral arguments in petition text using NLP techniques; tested the effect of these three linguistic factors on petition success using logistic regression.
  • Outcome: the three linguistic factors are found to significantly impact petition success.
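
A minimal sketch of the logistic regression step, assuming each petition has already been reduced to three scores (cognitive, emotional, moral) and a 0/1 success label. It fits the model by batch gradient descent in plain Python; the learning rate, epoch count, and feature values are illustrative assumptions, and a statistics package would be needed for the significance tests the outcome refers to.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """w[0] is the intercept; w[1:] pair with the features in x."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X: List[List[float]], y: List[int],
                 lr: float = 0.5, epochs: int = 5000) -> List[float]:
    """Minimize mean log-loss with full-batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # derivative of log-loss w.r.t. the logit
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w
```

The sign of each fitted weight indicates whether a higher level of that argument type is associated with higher or lower odds of success.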

Explaining envy on social media platforms

  • Objective: to examine which social media activities are associated with envy; the findings may help foster a healthier social media environment.
  • Process: streamed Twitter messages containing the keyword "envy"; filtered the data to retain only messages expressing envy; identified benign-envy and malicious-envy tweets using sentiment analysis; modeled how the two types of envy relate to follower count, following count, message count, and like count.
  • Outcome: different types of social media activity were found to be associated with different types of envy.
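
The filtering and labeling steps can be sketched with a toy lexicon approach. Everything here is a hypothetical assumption rather than the project's method: the cue word lists, the rule that drops mentions of the HP Envy laptop, and the mapping "net positive tone means benign envy, net negative tone means malicious envy". A real study would use a validated sentiment tool.

```python
import re
from typing import List

# Hypothetical cue lists for illustration only.
POSITIVE = {"congrats", "inspiring", "proud", "amazing", "goals", "admire"}
NEGATIVE = {"ugh", "unfair", "hate", "annoying", "bitter", "undeserved"}


def tokens(text: str) -> List[str]:
    return re.findall(r"[a-z']+", text.lower())


def expresses_envy(text: str) -> bool:
    """Keep tweets that use 'envy' as an emotion, not e.g. the HP Envy product."""
    toks = tokens(text)
    if "envy" not in toks:
        return False
    return not any(a == "hp" and b == "envy" for a, b in zip(toks, toks[1:]))


def envy_type(text: str) -> str:
    """Label envy by net lexicon tone (hypothetical rule)."""
    toks = tokens(text)
    score = sum(t in POSITIVE for t in toks) - sum(t in NEGATIVE for t in toks)
    if score > 0:
        return "benign"
    if score < 0:
        return "malicious"
    return "unclear"
```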

Predicting labor strikes using online text

  • Objective: to alert companies to potential labor strikes by analyzing web pages.
  • Process: collected labor strike data from the Bureau of Labor Statistics; retrieved web pages related to the affected companies using Google News; extracted sentiment from the retrieved web pages.
  • Outcome: found a significant spike in negative sentiment a couple of weeks before labor strikes.
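
One simple way to flag a spike in negative sentiment is a rolling z-score: compare each period's value against the mean and standard deviation of the preceding window. This is a hypothetical sketch, not the project's detection rule; the window length, threshold, and the idea of a daily "share of negative pages" series are assumptions.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def find_spikes(series: Sequence[float], window: int = 7,
                threshold: float = 2.0) -> List[int]:
    """Return indices whose value exceeds the trailing-window mean by
    more than `threshold` standard deviations."""
    spikes = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        sd = pstdev(history)
        if sd > 0 and (series[i] - mean(history)) / sd > threshold:
            spikes.append(i)
    return spikes
```

Spikes found this way could then be checked against known strike dates to see how far ahead of a strike they tend to appear.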

Detecting accounting fraud in financial reports

  • Objective: to detect accounting fraud more effectively by analyzing the text of financial reports.
  • Process: parsed 10-K reports using Stanford NLP; proposed a convolution tree kernel for a support vector machine (SVM) classifier.
  • Outcome: improved F-measure by 3% in classifying fraudulent financial reports.
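
The project's own kernel is not described here, so the sketch below shows only the classic convolution tree kernel of Collins and Duffy, which counts the tree fragments two parse trees share, with a decay factor `lam` that down-weights larger fragments. Trees are assumed to be nested tuples `(label, child, ...)` with words as plain strings; this representation and the function names are hypothetical.

```python
from typing import List, Tuple, Union

Tree = Union[str, Tuple]  # a word (str) or (label, child1, child2, ...)


def production(node: Tuple) -> Tuple:
    """Grammar rule at a node, e.g. ('NP', ('D', 'N'))."""
    return (node[0], tuple(c if isinstance(c, str) else c[0] for c in node[1:]))


def is_preterminal(node: Tuple) -> bool:
    return all(isinstance(c, str) for c in node[1:])


def internal_nodes(tree: Tree) -> List[Tuple]:
    if isinstance(tree, str):
        return []
    nodes = [tree]
    for child in tree[1:]:
        nodes.extend(internal_nodes(child))
    return nodes


def common_fragments(n1: Tuple, n2: Tuple, lam: float) -> float:
    """Weighted count of shared fragments rooted at n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    if is_preterminal(n1):
        return lam
    result = lam
    for c1, c2 in zip(n1[1:], n2[1:]):
        if isinstance(c1, str):  # matching words add no extra fragment choices
            continue
        result *= 1.0 + common_fragments(c1, c2, lam)
    return result


def tree_kernel(t1: Tree, t2: Tree, lam: float = 1.0) -> float:
    # Sums over all node pairs without memoization; fine for a small sketch.
    return sum(common_fragments(a, b, lam)
               for a in internal_nodes(t1) for b in internal_nodes(t2))
```

A Gram matrix of `tree_kernel` values between training documents can be passed to an SVM that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`.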

Resolving ambiguity in microblog messages using dependency features

  • Objective: to improve the accuracy of short text classification.
  • Process: parsed Twitter messages into dependency relations using Stanford NLP and CMU Ark, and used these relations as classification features.
  • Outcome: classification using the dependency features consistently outperformed popular baselines.
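
A minimal sketch of turning parser output into features, assuming the parser has already produced `(relation, head, dependent)` triples; no parser is called here, and the feature-string format and backoff feature are hypothetical choices. A feature such as `neg(love,not)` ties a negation to the word it modifies, information that a plain bag of words loses.

```python
from collections import Counter
from typing import Iterable, Tuple


def dependency_features(triples: Iterable[Tuple[str, str, str]]) -> Counter:
    """Map (relation, head, dependent) triples to sparse feature counts."""
    feats: Counter = Counter()
    for rel, head, dep in triples:
        head, dep = head.lower(), dep.lower()
        feats[f"{rel}({head},{dep})"] += 1  # fully lexicalized relation
        feats[f"{rel}({head},*)"] += 1      # backoff: relation + head only
    return feats
```

The resulting counts could be combined with ordinary word features and fed to any linear classifier.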