Posts

In April 2020, S&P Global Market Intelligence and Social Market Analytics, Inc. (SMA) launched ‘Machine Readable Filings’ (MRF), a sophisticated textual data offering which applies parsing and Natural Language Processing to generate machine readable text extracted from SEC regulatory filings. Machine Readable Filings allows businesses and investors to incorporate more qualitative measures of company performance into their investment strategy by using machine readable text from full filings or individual sections to enhance their analysis of companies. The parsed textual data allows firms to drill down on both historical and new filings in near real-time. My last blog introduced the product and illustrated some basic return characteristics present in filing word counts.

This blog explores the predictive nature of filings using SMA’s patented NLP and machine learning. Our analysis covers all active securities priced above 5 dollars and starts in 2006. Securities are broken into quintiles based on each factor. These factors are samples of the extensive metrics that can be created with this data. Quintiles are re-balanced monthly based on each company’s most recent filing. 10-Qs are compared to prior 10-Qs and 10-Ks are compared to prior 10-Ks. These are not meant to be trading models. They illustrate the predictive power of the data and use as broad a universe as possible. Two interesting distributions are below: the distribution of word counts for 10-Ks (mean 36,000) and the distribution of average sentiment. As you can see, companies try to keep the 10-K as upbeat as possible.
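
As a rough illustration of the quintile step, here is a minimal Python sketch. The function name, tickers, and factor values are all hypothetical, and the equal-count rank split is an assumption, not necessarily how SMA builds its buckets.

```python
def assign_quintiles(factor_by_ticker):
    """Rank tickers by factor value and split them into five equal-count buckets.

    Quintile 1 holds the lowest factor values and quintile 5 the highest.
    """
    ranked = sorted(factor_by_ticker, key=factor_by_ticker.get)
    n = len(ranked)
    return {ticker: (rank * 5) // n + 1 for rank, ticker in enumerate(ranked)}


# Hypothetical factor values taken from each company's most recent filing.
factors = {"AAA": -0.8, "BBB": -0.3, "CCC": 0.0, "DDD": 0.1, "EEE": 0.4,
           "FFF": 0.9, "GGG": -0.5, "HHH": 0.2, "III": 0.6, "JJJ": -0.1}
quintiles = assign_quintiles(factors)
print(quintiles)
```

In a monthly re-balance, this assignment would simply be re-run each month on whatever filing is most recent for each company at that date.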

Our first factor is Change in Sentiment Hits. Sentiment hits are the number of times our NLP was able to identify a word or segment in a sentence: positive hits + negative hits + neutral hits. The green line represents filings with the largest increase in sentiment hits, while the red line represents filings with the largest decrease. Filings with large increases in sentiment hits tend to underperform their peers, and filings with large decreases tend to outperform.
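
To make the factor concrete, here is a small sketch of how a change in sentiment hits could be computed on a like-for-like basis (10-K against prior 10-K, 10-Q against prior 10-Q). The `Filing` record, its field names, and the sample numbers are hypothetical, and using a simple difference rather than a percentage change is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Filing:
    ticker: str
    form_type: str   # "10-K" or "10-Q"
    filed: str       # ISO date string, e.g. "2020-02-14"
    positive_hits: int
    negative_hits: int
    neutral_hits: int

    @property
    def sentiment_hits(self):
        # Total hits = positive + negative + neutral, as defined in the post.
        return self.positive_hits + self.negative_hits + self.neutral_hits


def change_in_sentiment_hits(filings):
    """Return {(ticker, form_type): latest hits minus prior hits}.

    Filings are only compared with earlier filings of the same form type.
    Keys with fewer than two filings are skipped.
    """
    history = {}
    for f in sorted(filings, key=lambda f: f.filed):
        history.setdefault((f.ticker, f.form_type), []).append(f.sentiment_hits)
    return {key: hits[-1] - hits[-2]
            for key, hits in history.items() if len(hits) >= 2}


# Hypothetical sample data.
filings = [
    Filing("AAA", "10-K", "2019-02-15", 400, 250, 350),
    Filing("AAA", "10-K", "2020-02-14", 420, 310, 390),
    Filing("AAA", "10-Q", "2019-11-01", 120, 80, 100),
]
changes = change_in_sentiment_hits(filings)
print(changes)
```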

The quintile performance characteristics are below. Although quintiles 2 and 3 are out of order, the average values for those quintiles are near zero. Quintile 1 outperforms quintile 5 by 3% annualized.

The next factor we analyze is the percentage of the document that the parser hits. Many filings are filled with general information that does not necessarily provide meaningful statements. The green line represents filings with the highest percentage of sentiment hits in the document, while the red line represents filings with the lowest percentage. Filings with a higher percentage of sentiment hits tend to outperform their peers, and filings with a lower percentage tend to underperform. Companies with documents containing more meaningful content outperform companies with documents containing less meaningful content by about 3.5% annualized.

Quintile 5 – Quintile 1 annualized is 3.5%
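
For readers who want to reproduce a similar measure, the hit percentage can be sketched as below. The post does not say what the denominator is, so dividing by total word count is an assumption, and the function name and sample numbers are hypothetical.

```python
def hit_percentage(sentiment_hits, word_count):
    """Sentiment hits as a percentage of the document's word count (assumed denominator)."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return 100.0 * sentiment_hits / word_count


# Hypothetical example: 1,800 hits in a 36,000-word 10-K.
pct = hit_percentage(1800, 36000)
print(pct)
```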

The third factor we explore is the change in negative hits. Companies with increasing negative hits are discussing more negative information than in prior quarters, and they subsequently underperform. The green line represents filings with the largest increase in negative hits, while the red line represents filings with the largest decrease. Filings with a large increase in negative hits tend to underperform their peers, and filings with a large decrease tend to outperform.

The last factor we explore is cumulative document sentiment. Quintiles are based on summations of all sentiment hits in the document. Sentiment is more commonly analyzed by section. We identify parts, sections, and subsections in this product, providing a myriad of ways to analyze the data. Even at the most aggregated level, sentiment is predictive. Document length has a large impact on overall sentiment. Z-Scores of this factor are a good way to compare a filing against prior documents. As you can see in the chart, companies with more positive total document sentiment tend to outperform companies with more negative total sentiment.

Quintile 5 outperforms quintile 1 by 1.7 percent annualized.

There are many ways to analyze the MRF data set. Filings are parsed by Item, Section, and sub-Section back to 2006 for historical back testing. This analysis looked only at 10-Ks and 10-Qs; ‘Machine Readable Filings’ (MRF) covers 20 types of SEC filings. This blog covers a small portion of the research. The U.S. SEC Edgar data is live on S&P Xpressfeed. International reports will be released later in 2020. To learn more or to start a trial, please contact ContactUS@SocialMarketAnalytics.com.


Social Market Analytics (SMA) tracks real-time sentiment on equities, commodities, currencies, ETFs and cryptocurrencies. SMA has the most powerful and customizable Alerting API combining Twitter sentiment and pricing metrics. Users receive custom real-time sentiment alerts on instruments in their watch list. For example, on December 11, 2018, SMA’s alerting system sent an alert on corn at 12:12 pm CT when corn was at $385.25. Below are the email and mobile alerts.

Cornalert

Mobile

Subsequent to the alert, corn moved lower starting at 12:17pm CT. The price continued to move lower the remainder of the day and closed at $383.25. (See chart below)

Corn Alert

The above alert was based on SMA’s rolling 24-hour sentiment. SMA also calculates a long-term sentiment with longer price projection periods. Corn’s long-term S-Factor flipped from positive to negative on November 14th. December 10th was the first day the long-term S-Factor for corn reached a significantly negative level, 1.5 standard deviations below the longer-term baseline conversation. For more information please email contactUs@SocialMarketAnalytics.com.

This year has been tough for most investment strategies. Firms using traditional sources of data are generating the same underwhelming returns. Two years ago, Social Market Analytics, Inc. (SMA) launched the SMLCW index in partnership with the CBOE. This index is re-balanced weekly and comprises the twenty-five securities from the CBOE large cap universe with the highest average S-Score over the prior week. It is a long-only index of super-cap stocks with unusually positive Twitter conversations.
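
The weekly selection rule can be sketched in a few lines of Python. This is only an illustration of "highest average S-Score over the prior week"; the function name, tickers, and scores are hypothetical, and the actual index methodology may include rules not described here.

```python
from statistics import mean


def select_constituents(daily_s_scores, size=25):
    """Pick the `size` tickers with the highest average S-Score over the prior week.

    `daily_s_scores` maps ticker -> list of that week's S-Score readings.
    Tickers with no readings are ignored.
    """
    averages = {t: mean(scores) for t, scores in daily_s_scores.items() if scores}
    return sorted(averages, key=averages.get, reverse=True)[:size]


# Hypothetical prior-week S-Scores; size=2 keeps the example small.
week = {
    "AAA": [1.2, 0.8, 1.5, 0.9, 1.1],
    "BBB": [-0.4, 0.1, 0.0, -0.2, 0.3],
    "CCC": [2.1, 1.9, 2.4, 2.0, 2.2],
    "DDD": [],
}
picks = select_constituents(week, size=2)
print(picks)
```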

SMA publishes a family of metrics providing a full representation of the Twitter conversation across equities (US and LSE), commodities, currencies, ETF’s & Cryptos.

S-Score is a normalized representation of the current Twitter conversation of professional investors as identified by Social Market Analytics patented algorithms.  SMA has access to the full Twitter feed through our licensed partnership with Twitter and listens in real-time for any mention of topics and securities of interest.  These Tweets are scanned in real-time for sentiment and influence of the poster and compared to prior conversations over the look back period.  Securities with higher S-Scores subsequently outperform and securities with negative S-Scores under-perform.

SMA S-Scores are predictive over multiple prediction periods.  With seven years of out-of-sample data we can extend our comparison baselines and predict over longer periods.

Year-to-date the SMLCW index is up over 7.5% while the SP500 is flat. Even after subtracting a couple of percent for commissions and slippage, the index is still significantly positive. This is not a back-test; this index has been live and on your quote screens for nearly two years. The YTD actual performance chart from the CBOE site is below.

SMLCW - YTD

As mentioned, this is a long-only index. During the recent market drawdown this long index has been performing well. SMA negative S-Score stocks have been moving lower at a significant rate, generating positive alpha. Below is a chart of the SMLCW index compared to the SP500. For any questions or to learn more, please contact us at ContactUs@SocialMarketAnalytics.com.

Thanks,

Joe

 

Social Market Analytics, Inc. (SMA) aggregates the intentions of professional investors as expressed on Twitter & StockTwits and publishes a series of metrics that describes the current conversation relative to historical benchmarks.  Our data is a leading indicator of price movement both positive and negative.

There is unique predictive information in unstructured content. Social Market Analytics uses AI and machine learning techniques developed over the last eight years to convert this unstructured content into data suitable for quantitative analysis. This opens a whole new area of big data analysis.

Social Market Analytics (SMA) calculates predictive sentiment on the entire US equity universe, currencies, commodities, cryptocurrencies, ETFs and custom sources. This blog is about the predictive nature of our LSE security universe. We calculate our custom metrics on the top 1000 securities by market cap listed on the LSE. Our LSE data starts on 1/1/2016. Below is a cumulative quintile distribution of returns based on our S-Score metrics. Our S-Score is effectively a Z-Score comparing 24-hour sentiment, based on the Tweets of professional investors, to a 20-day baseline. Prediction periods vary per asset class and baseline. Longer baseline comparisons lead to longer prediction periods.
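
For intuition only, a Z-Score of this kind can be sketched as follows. This is not SMA's patented calculation: the raw sentiment series, the function name, and the use of a plain mean and sample standard deviation over the previous 20 readings are all assumptions.

```python
from statistics import mean, stdev

BASELINE_DAYS = 20  # baseline length stated in the post


def s_score(daily_sentiment):
    """Z-score of the latest 24-hour sentiment against the preceding 20-day baseline.

    `daily_sentiment` is a chronological list of hypothetical raw 24-hour
    sentiment readings; the last element is the current reading.
    Returns None when there is too little history or the baseline is flat.
    """
    if len(daily_sentiment) < BASELINE_DAYS + 1:
        return None
    baseline = daily_sentiment[-(BASELINE_DAYS + 1):-1]
    sigma = stdev(baseline)
    if sigma == 0:
        return None
    return (daily_sentiment[-1] - mean(baseline)) / sigma


# Hypothetical history: 20 quiet days followed by an unusually positive day.
history = [0.0, 1.0] * 10 + [2.5]
score = s_score(history)
print(score)
```

A longer baseline (the post mentions that longer baselines go with longer prediction periods) would only change `BASELINE_DAYS` in this sketch.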

Stocks with abnormally positive conversations typically outperform their peers and stocks with abnormally negative conversations typically underperform their peers. As expected, stocks whose conversations have a normal positive or negative tone perform like the overall market.

Below is a typical quintile chart for the LSE 1000 universe tracked from post-Brexit to 8/31/2018. The spread between the top and bottom quintiles is 10% annualized. Sharpe and Sortino ratios are in the table below that. To learn more or to request a historical data set, contact SMA at ContactUS@SocialMarketAnalytics.com.

LSEQuintiles 1

LSE Quintiles2
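
For readers less familiar with these statistics, here is a minimal sketch of how an annualized quintile spread, a Sharpe ratio, and a Sortino ratio can be computed from daily returns. The post does not state its conventions, so the 252-day annualization, the zero risk-free rate and target, the arithmetic (non-compounded) spread, and the sample returns are all assumptions.

```python
from math import sqrt
from statistics import mean, stdev

TRADING_DAYS = 252  # assumed annualization factor for daily returns


def annualized_spread(top, bottom):
    """Average daily top-minus-bottom return, scaled to a year (no compounding)."""
    return mean(t - b for t, b in zip(top, bottom)) * TRADING_DAYS


def sharpe_ratio(returns):
    """Annualized mean return over total volatility, with a zero risk-free rate."""
    return mean(returns) / stdev(returns) * sqrt(TRADING_DAYS)


def sortino_ratio(returns, target=0.0):
    """Like Sharpe, but only returns below `target` count as risk.

    Raises ZeroDivisionError if no return falls below the target.
    """
    downside = sqrt(mean(min(r - target, 0.0) ** 2 for r in returns))
    return (mean(returns) - target) / downside * sqrt(TRADING_DAYS)


# Hypothetical daily returns for the top and bottom quintile portfolios.
top = [0.002, -0.001, 0.003, 0.000]
bottom = [0.001, -0.002, 0.001, 0.001]
print(annualized_spread(top, bottom), sharpe_ratio(top), sortino_ratio(top))
```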

Social Market Analytics (SMA) publishes real-time Twitter-based sentiment for nearly 300 cryptocurrencies including Bitcoin. To view sentiment values for Bitcoin and 35 other commodities in real time, go to the CME Active Traders website. Twitter-based sentiment has proven to be strongly predictive for Bitcoin and other commodities.

Today we will review a sentiment-based Z-Score strategy to generate profitable trades for Bitcoin.  This is similar to traditional standard deviation band strategies calculated with price.

When Twitter volume from certified investors is abnormally high, use the sentiment of that abnormally large conversation to select entry points. A strategy overview is below:

CMEBitcoin 1

A visualization of the strategy is below. When the Z-Score of Social Market Analytics Indicative Twitter volume is greater than the threshold and the tone of the conversation is significant, enter or modify trades. When sentiment exceeds 2 standard deviations and the volume of the conversation is high, enter a position. Positions are modified based on further extensions of the Z-Score.

CMEBitcoin2
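
The entry rule can be sketched as a simple function. This is a hedged illustration, not the tested strategy itself: the volume threshold of 2.0, the symmetric short side for strongly negative sentiment, and the function name are assumptions, and the position-modification logic is left out because the post does not specify it.

```python
def entry_signal(volume_z, sentiment_z, volume_threshold=2.0, sentiment_threshold=2.0):
    """Return +1 (enter long), -1 (enter short) or 0 (no trade).

    A trade is only considered when the Z-score of Twitter volume is above
    `volume_threshold`; direction then follows the sign of a sentiment
    Z-score that is beyond `sentiment_threshold` standard deviations.
    """
    if volume_z <= volume_threshold:
        return 0  # conversation is not abnormally large
    if sentiment_z > sentiment_threshold:
        return 1
    if sentiment_z < -sentiment_threshold:
        return -1  # assumed mirror-image rule for negative tone
    return 0


# Hypothetical readings.
print(entry_signal(3.1, 2.6))   # heavy volume, strongly positive tone
print(entry_signal(3.1, 1.0))   # heavy volume, tone not significant
print(entry_signal(0.5, 3.0))   # positive tone, but ordinary volume
```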

The test period is from 1/1/2017 to the present. Overall results are below. For more detailed results on this and other strategies, contact ContactUS@SocialMarketAnalytics.com.

CMEBitcoin3

SMA has examples of profitable applications of Twitter based sentiment to many coins.