
In April 2020, S&P Global Market Intelligence and Social Market Analytics, Inc. (SMA) launched ‘Machine Readable Filings’ (MRF), a sophisticated textual data offering that applies parsing and Natural Language Processing to generate machine-readable text extracted from SEC regulatory filings. MRF allows businesses and investors to incorporate more qualitative measures of company performance into their investment strategies by using machine-readable text from full filings or from individual sections. The parsed textual data allows firms to drill down on both historical and new filings in near real time. My last blog introduced the product and illustrated some basic return characteristics present in filing word counts.

This blog explores the predictive nature of filings using SMA’s patented NLP and machine learning. Our analysis starts in 2006 and uses all active securities with a price greater than 5 dollars. Securities are broken into quintiles based on each factor. The factors shown here are samples of the extensive metrics that can be created with this data. Quintiles are rebalanced monthly based on each company’s most recent filing. 10-Qs are compared to prior 10-Qs and 10-Ks are compared to prior 10-Ks. These are not meant to be trading models; they illustrate the predictive power of the data and use as broad a universe as possible. Two interesting distributions are below: the distribution of word counts for 10-Ks (mean 36,000) and the distribution of average sentiment. As you can see, companies try to keep the 10-K as upbeat as possible.
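As a rough illustration of the quintile step described above, the sketch below ranks a set of securities by a factor value and splits them into five equal-sized buckets. The function name, the tie-breaking rule, and the sample tickers are assumptions made for illustration; numbering the lowest values as quintile 1 is also an assumption, chosen to be consistent with the results quoted in this post.

```python
from typing import Dict


def assign_quintiles(factor_by_ticker: Dict[str, float]) -> Dict[str, int]:
    """Rank tickers by factor value and split them into five buckets.

    Quintile 1 holds the lowest factor values, quintile 5 the highest.
    Ties are broken alphabetically by ticker (an arbitrary choice).
    """
    ranked = sorted(factor_by_ticker, key=lambda t: (factor_by_ticker[t], t))
    n = len(ranked)
    # Integer arithmetic maps rank positions 0..n-1 onto buckets 1..5.
    return {ticker: (i * 5) // n + 1 for i, ticker in enumerate(ranked)}


# Hypothetical factor values for ten made-up tickers.
sample = {f"T{i}": float(i) for i in range(10)}
quintiles = assign_quintiles(sample)
```

In a monthly rebalance, a function like this would be re-run each month using the factor value from each company's most recent filing.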

Our first factor is Change in Sentiment Hits. Sentiment hits are the number of times our NLP was able to identify a word or segment in a sentence, calculated as positive hits + negative hits + neutral hits. The green line represents filings with the largest increase in sentiment hits while the red line represents filings with the largest decrease in sentiment hits. Filings with large increases in sentiment hits tend to underperform, and filings with large decreases in sentiment hits tend to outperform their peers.
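A minimal sketch of this factor follows. It sums the three hit counts for a filing and takes the difference against the prior filing of the same form type. The class and function names are hypothetical, the hit counts are invented, and whether the change is measured as a raw difference (as here) or as a percentage is an assumption, since the post does not say.

```python
from dataclasses import dataclass


@dataclass
class FilingHits:
    """Hit counts for one filing (hypothetical container, not an MRF schema)."""
    form_type: str
    positive: int
    negative: int
    neutral: int

    @property
    def total(self) -> int:
        # Sentiment hits = positive hits + negative hits + neutral hits.
        return self.positive + self.negative + self.neutral


def change_in_hits(current: FilingHits, prior: FilingHits) -> int:
    """Raw change in total sentiment hits versus the prior like-for-like filing."""
    if current.form_type != prior.form_type:
        raise ValueError("compare 10-Q to prior 10-Q and 10-K to prior 10-K")
    return current.total - prior.total


# Invented numbers for illustration only.
prior_10k = FilingHits("10-K", positive=120, negative=80, neutral=300)
current_10k = FilingHits("10-K", positive=150, negative=90, neutral=310)
delta = change_in_hits(current_10k, prior_10k)
```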

The quintile performance characteristics are below. Although quintiles 2 and 3 are out of order, the average values for those quintiles are near zero. Quintile 1 outperforms quintile 5 by 3% annualized.
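For readers who want to reproduce a figure like this on their own data, here is one way an annualized quintile spread could be computed from monthly returns. The post does not state how the annualized numbers were derived, so geometric compounding is an assumption, and the function names and return series are hypothetical.

```python
from typing import Sequence


def annualized_return(monthly_returns: Sequence[float]) -> float:
    """Geometrically compound monthly returns and scale to a 12-month rate."""
    if not monthly_returns:
        raise ValueError("need at least one monthly return")
    growth = 1.0
    for r in monthly_returns:
        growth *= 1.0 + r
    return growth ** (12.0 / len(monthly_returns)) - 1.0


def annualized_spread(long_leg: Sequence[float], short_leg: Sequence[float]) -> float:
    """Difference between the annualized returns of two quintiles."""
    return annualized_return(long_leg) - annualized_return(short_leg)


# Invented monthly returns for two quintiles, for illustration only.
quintile_1 = [0.012, 0.004, -0.003, 0.010, 0.007, 0.002]
quintile_5 = [0.009, 0.001, -0.006, 0.008, 0.005, 0.000]
spread = annualized_spread(quintile_1, quintile_5)
```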

The next factor we analyze is the percentage of the document that the parser hits. Many filings are filled with general information that does not necessarily provide meaningful statements. The green line represents filings with the highest percentage of sentiment hits in the document while the red line represents filings with the lowest percentage of sentiment hits in the document. Filings with a higher percentage of sentiment hits tend to outperform, and filings with a lower percentage tend to underperform their peers. Companies with documents containing more meaningful content outperform companies with documents containing less meaningful content by about 3.5% annualized.
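To make the idea of a hit percentage concrete, the toy sketch below counts how many words in a passage match a small word list and divides by the total word count. This is not SMA's NLP: the lexicon, the word-level tokenization, and the choice of total words as the denominator are all assumptions made purely for illustration.

```python
import re
from typing import Set

# A tiny made-up lexicon; a real sentiment dictionary would be far larger.
TOY_LEXICON: Set[str] = {"growth", "strong", "loss", "decline", "litigation"}


def hit_percentage(text: str, lexicon: Set[str] = TOY_LEXICON) -> float:
    """Percentage of word tokens in the text that appear in the lexicon."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in lexicon)
    return 100.0 * hits / len(tokens)


meaningful = "Strong growth offset a decline in margins"
generic = "The company files reports with the commission"
meaningful_pct = hit_percentage(meaningful)
generic_pct = hit_percentage(generic)
```

Under these assumptions the first sentence scores well above the second, which is the kind of contrast this factor is meant to capture between meaningful and generic text.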

Quintile 5 – Quintile 1 annualized is 3.5%

The third factor we explore is the change in negative hits. Companies with increasing negative hits are discussing more negative information than in prior quarters, and they subsequently underperform. The green line represents filings with the largest increase in negative hits while the red line represents filings with the largest decrease in negative hits. Filings with a large increase in negative hits tend to underperform, and filings with a large decrease in negative hits tend to outperform their peers.

The last factor we explore is cumulative document sentiment. Quintiles are based on summations of all sentiment hits in the document. Sentiment is more commonly analyzed by section, and this product identifies parts, sections, and subsections, providing a myriad of ways to analyze the data. Even at the most aggregated level, sentiment is predictive. Document length has a large impact on overall sentiment, so Z-scores of this factor are a good way to compare a document with prior documents. As you can see in the chart, companies with more positive total document sentiment tend to outperform companies with more negative total sentiment.
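The Z-score comparison mentioned above can be sketched as follows: the current filing's cumulative sentiment is measured in standard deviations from the mean of the same company's prior filings. The function name and the numbers are hypothetical, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev
from typing import Sequence


def sentiment_z_score(current: float, history: Sequence[float]) -> float:
    """Z-score of the current document's sentiment versus prior documents."""
    if len(history) < 2:
        raise ValueError("need at least two prior documents")
    spread = stdev(history)  # sample standard deviation
    if spread == 0:
        return 0.0  # no variation in history, so no meaningful score
    return (current - mean(history)) / spread


# Invented cumulative sentiment totals for three prior 10-Ks and the newest one.
prior_totals = [10.0, 12.0, 14.0]
z = sentiment_z_score(16.0, prior_totals)
```

Scoring each company against its own history is one way to reduce the effect of document length, since long filers are compared with their own long filings.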

Quintile 5 outperforms quintile 1 by 1.7 percent annualized.

There are many ways to analyze the MRF data set. Filings are parsed by Item, Section, and sub-Section back to 2006 for historical backtesting. This analysis looked only at 10-Ks and 10-Qs; ‘Machine Readable Filings’ (MRF) covers 20 types of SEC filings. This blog covers a small portion of the research. The U.S. SEC EDGAR data is live on the S&P Xpressfeed. International reports will be released later in 2020. To learn more or to start a trial, please contact ContactUS@SocialMarketAnalyitcs.com.


Social media is a new and rich source of trading ideas. To illustrate this point, below are some recent trading opportunities that social media data presented. In each case, activity and sentiment increased prior to the actual event. Social media is a leading indicator of stock performance, and SMA is the leader in providing metrics based on social media.

Teva acquires Allergan’s generic drug business

Teva Pharmaceutical Industries surged in pre-market trading on July 27, 2015 on news that the company would acquire Allergan’s (AGN) generic drug business. Before this happened, sentiment on Twitter had already become strongly positive. At 4:00 AM EDT, when the stock price was $66.00, sentiment on Twitter was significantly positive and rising rapidly. By 7:24 AM, the stock was trading at $72.30. The stock opened at $67.80, when the sentiment was 3.92, and closed at $72.

Figure 1:  S-Score™ For TEVA Pre- and Post-Announcement.


Historically, daily sentiment scores for TEVA fluctuated near 0 (Neutral), with low social media activity as indicated by the time series of the S-Volume™ metric. This behavior started to change on July 26 with significant upticks in indicative Tweet volumes and sentiment levels. On the morning of July 27, TEVA’s S-Score™ increased sharply to a high positive level, coincident with a spike in S-Volume™ consistent with high social media activity, indicating that SMA’s processing technology had successfully detected the signature of positive sentiment for TEVA embedded in the Twitter data stream. This high positive sentiment level persisted through the open on July 28 and then started to return to typical historical levels as the markets and social media fully integrated the effect of the announcement.

Rumored Announcement of Acquisition:  Twitter (TWTR)

On July 14, 2015 at 11:39 AM EDT, a rumor started spreading on Twitter about Twitter being acquired by Bloomberg. At 11:40 AM, there was a Tweet from user ‘beckyhiu’ indicating that Bloomberg had offered $31 billion to buy Twitter and that Twitter was considering the offer. This rumor caused the stock price to rise rapidly. A Tweet about 30 seconds later, at 11:41 AM, by ‘zerosum24’ confirmed that the rumor had reached Twitter and that people had started talking about it. Sentiment had started rising rapidly by this time. The changes in S-Score™ and S-Delta™ were significantly positive. At 11:42 AM, the sentiment was over 2 and was statistically significant.

It was soon realized this might be a hoax and that no offer was made. At 11:42 AM, ‘TurboResearch’ questioned the credibility of the buyout offer.

There had been no official statement from Bloomberg, and hence both the sentiment and the stock price kept rising. At around 11:50 AM, a journalist from Bloomberg Tweeted that the news was a hoax and that it was not to be believed. At this point sentiment started declining as people started tweeting negatively. The stock price dropped rapidly. After that, there were mostly negative comments driven by the refuted rumor. The figures below show SMA sentiment factors leading the stock price quite accurately.


Figure 2: TWTR S-Score™ vs. Price


Figure 3: Intraday S-Volume™ Chart for TWTR

Amazon (AMZN) Earnings Announcement 

Twitter sentiment can predict stock changes even after market close, as in the case of Amazon. Amazon reported earnings on July 23, 2015. While the market consensus was that the company would not beat expectations, the conversation on social media was different.

SMA data showed a sharp increase in sentiment metrics around 2:49 PM EDT. By 2:51 PM, the sentiment on Amazon was two standard deviations higher than its typical level. The stock was trading at $480.45 at this point. At market close, it traded at $482.18, higher than the price at the time when sentiment on Amazon became positive.
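The "two standard deviations above its typical level" test can be illustrated with a short sketch that scans intraday readings and reports the first one that crosses the threshold. This is not SMA's S-Score™ calculation; the function name, the baseline values, and the intraday readings are all invented for illustration.

```python
from statistics import mean, stdev
from typing import List, Optional, Tuple


def first_significant_reading(
    readings: List[Tuple[str, float]],
    baseline: List[float],
    threshold: float = 2.0,
) -> Optional[str]:
    """Return the timestamp of the first reading at least `threshold`
    standard deviations above the baseline mean, or None if none qualifies."""
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        return None
    for timestamp, value in readings:
        if (value - mu) / sigma >= threshold:
            return timestamp
    return None


# Invented baseline and intraday sentiment values, for illustration only.
typical_levels = [0.1, -0.1, 0.0, 0.2, -0.2]
intraday = [("14:47", 0.15), ("14:49", 0.25), ("14:51", 0.40)]
crossed_at = first_significant_reading(intraday, typical_levels)
```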

It was interesting to see how the stock traded after-hours once the company reported earnings. Amazon’s stock shot up more than 17% — to $568 — from its price at 3:51 PM EDT after the company reported a surprise quarterly profit. The hidden sentiment value in Twitter data predicted what “conventional” market speculators failed to predict.


Figure 4:  Intraday S-Score™ And S-Volume™ Behavior across Amazon’s Earnings Event.

The progression of intraday S-Score™ and S-Volume™ metrics for Amazon is shown above from 1:00 PM EDT to 4:25 PM EDT. Amazon’s sentiment remained positive throughout the day and became significant around 2:50 PM. Sentiment then rose sharply after the earnings announcement, which came after the market close.

We publish our own research and analysis.  We invite you to check our Research site for new updates and publications.

Thanks,

Joe