Article
Analyzing the Impact of Social Media Sentiment on Stock Investment Decisions Using Machine Learning Techniques
The intersection of social media analytics and financial market analysis has become one of the most active areas of computational finance research. In this paper, we present an extensive empirical analysis of how well machine learning (ML) approaches can capture sentiment information from social media and use it to improve stock investment decisions. Using data collected between 2021 and 2024, comprising more than 48 million social media posts, news articles, and financial analyst reports drawn from X (formerly Twitter), Reddit (r/WallStreetBets and r/investing), StockTwits, and news feeds, we construct and validate an ensemble method that combines transformer-based NLP models with LSTM networks and gradient-boosted algorithms. Our approach uses sentiment polarity scores, topic modeling, named entity recognition, and time-based aggregation to build sentiment feature vectors, which are combined with conventional financial data: price and volume metrics, volatility measures, macroeconomic signals, and sector-level momentum features. Empirical results on portfolios of 200 S&P 500 stocks show that the sentiment-based approach generates statistically significant alpha of 8.3% per year and improves the Sharpe ratio by 0.41 relative to purely technical approaches. We explore the underlying mechanisms, compare the informativeness of retail and institutional investors' discussion on social media, and study how information cascades amplify the impact of sentiment shocks. Our results have practical applications in designing trading algorithms, understanding retail investor behavior, and regulating market manipulation carried out through social media. We also discuss the limitations of the study.
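As a rough illustration of the time-based aggregation step described above, the sketch below rolls post-level polarity scores up into daily per-ticker sentiment features and joins them with a conventional market feature. It is a minimal sketch under stated assumptions, not the pipeline used in the study: the record layout, the feature names (`sent_mean`, `sent_std`, `n_posts`, `ret_1d`), and all sample values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, pstdev

# Hypothetical post-level records: (ISO timestamp, ticker, polarity in [-1, 1]).
posts = [
    ("2023-03-01T09:15:00", "AAPL", 0.6),
    ("2023-03-01T14:40:00", "AAPL", -0.2),
    ("2023-03-01T16:05:00", "MSFT", 0.4),
    ("2023-03-02T10:30:00", "AAPL", 0.8),
]

# Hypothetical conventional features keyed by (date, ticker).
market = {
    ("2023-03-01", "AAPL"): {"ret_1d": 0.004},
    ("2023-03-01", "MSFT"): {"ret_1d": -0.001},
    ("2023-03-02", "AAPL"): {"ret_1d": 0.012},
}


def aggregate_daily(posts):
    """Group polarity scores by (calendar day, ticker) and summarize them."""
    buckets = defaultdict(list)
    for ts, ticker, polarity in posts:
        day = datetime.fromisoformat(ts).date().isoformat()
        buckets[(day, ticker)].append(polarity)
    return {
        key: {
            "sent_mean": mean(vals),    # average polarity for the day
            "sent_std": pstdev(vals),   # disagreement among posts
            "n_posts": len(vals),       # attention / volume proxy
        }
        for key, vals in buckets.items()
    }


def build_feature_rows(posts, market):
    """Join daily sentiment features onto the market features."""
    sentiment = aggregate_daily(posts)
    empty = {"sent_mean": 0.0, "sent_std": 0.0, "n_posts": 0}
    # Days with no posts fall back to neutral sentiment (an assumption).
    return {key: {**feats, **sentiment.get(key, empty)}
            for key, feats in market.items()}


features = build_feature_rows(posts, market)
for key in sorted(features):
    print(key, features[key])
```

Each resulting row is a combined feature vector for one stock on one day, which could then be passed to a downstream model. Treating days without posts as neutral is only one possible choice; a missing-value indicator would be an alternative.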