Earnings calls are among the most closely followed events on the corporate calendar. They give executives the opportunity to comment on earnings and answer questions from people outside the company. Using our patented Natural Language Processing, Social Market Analytics scores Earnings Call Transcripts in real time and creates metrics based on sentiment, word count, and section count. For this research, we look specifically at the question-and-answer section of call transcripts. The theory is that isolating the section of the call where executives aren’t controlling the topic of conversation will give a more accurate assessment of the sentiment surrounding earnings results. We use Sum of Sentiment to quantify the positivity of the call. Sum of Sentiment adds up the sentiment of all the words and phrases tagged with sentiment in the section. The following histogram shows the distribution of the Sum of Sentiment variable.

The Sum of Sentiment is centered around 3.5 and is roughly normal with a heavy right tail. Because executives want to signal good things to come, it makes sense that the sum is predominantly positive. Still, some earnings calls are more positive than others. Based on the distribution of sentiment, we defined an extremely positive earnings call as having a sum greater than 5 and a negative earnings call as having a sum less than 1.5. These thresholds give a roughly equal number of instances over the past 14 years. We took these thresholds and compared returns for different time periods following the Earnings Call. The time periods were subsequent Open-to-Close, subsequent Close-to-Close, subsequent week return, subsequent month return, and subsequent quarter return. Since earnings calls are spaced throughout the year, it is difficult to compound the subsequent returns. Instead, we look at average excess returns for each threshold. The excess return for each security is calculated by subtracting the SPY return over the same time frame from the security’s return. Our hypothesis is that the average excess returns for the extremely positive earnings calls will be higher than those for negative earnings calls. We calculated returns of all instances since the end of 2009.
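The bucketing and excess-return calculation described above can be sketched in a few lines of Python. This is only an illustration, not SMA's production code: the record layout (a tuple of Sum of Sentiment, security return, and SPY return) and the function names are assumptions made for the example, while the 5 and 1.5 thresholds come from the text.

```python
from statistics import mean

HIGH_THRESHOLD = 5.0  # "extremely positive" call, per the text
LOW_THRESHOLD = 1.5   # "negative" call, per the text


def excess_return(security_return, spy_return):
    """Security return minus the SPY return over the same window."""
    return security_return - spy_return


def average_excess_by_bucket(calls):
    """Average excess return for the high- and low-sentiment buckets.

    `calls` is an iterable of (sum_of_sentiment, security_return, spy_return)
    tuples; this record layout is hypothetical.
    """
    buckets = {"high": [], "low": []}
    for sentiment_sum, sec_ret, spy_ret in calls:
        if sentiment_sum > HIGH_THRESHOLD:
            buckets["high"].append(excess_return(sec_ret, spy_ret))
        elif sentiment_sum < LOW_THRESHOLD:
            buckets["low"].append(excess_return(sec_ret, spy_ret))
    # Calls between the two thresholds are ignored; empty buckets give None.
    return {name: mean(vals) if vals else None for name, vals in buckets.items()}
```

The same function can be run once per holding period (Open-to-Close, Close-to-Close, week, month, quarter) by passing in returns measured over that window.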

For long-term holdings, the average excess return for companies with high sentiment earnings calls was strongly positive. In contrast, returns for companies with negative sentiment earnings calls were negative for every time frame. Quarterly returns highlight the importance of a positive earnings call, as the average excess return is close to 0.8% higher than for negative calls. The biggest takeaway from the different time periods was the large difference in returns between the next Open-to-Close and the next Close-to-Close, especially for calls with high sentiment. Entering on the subsequent close rather than the open dropped the excess returns by 0.8% and made them negative. The next Close-to-Close returns were negative regardless of the sentiment threshold. Waiting to enter removed the benefit of high sentiment. We looked at the returns of these two time frames with high sentiment over the past 14 years.

Over the past 14 years, the Open-to-Close excess returns were positive in 12 of 14 years and the Close-to-Close excess returns were positive in 4 of 14 years. The two negative years for the Open-to-Close also came during the abnormal period of the COVID-19 pandemic. The immediate Open-to-Close return benefits from high sentiment far more than the Close-to-Close return. Therefore, there is a premium on knowing the sentiment of an Earnings Call in real time and entering at the next open to maximize short-term returns. Instead of manually reading earnings calls to gain insights, traders can use the sentiment summarized by Social Market Analytics to select positions. Waiting to enter on positive earnings calls generally hurts short-term returns. Social Market Analytics’ scoring of Earnings Calls can give traders the advantage of entering a position as quickly as possible for immediate returns, while also providing a holding option for quarterly returns.

If you are interested in learning more about how SMA’s Earnings data can help your trading strategies, please email us or schedule a demo.

Explore sentiment on earnings calls and all corporate filings on the SMA Unstructured Data Terminal below.

The goal of this research was to find an indicator that helps predict the direction of the overall US equity market for the next week using sentiment data from the previous week. The hypothesis is that when sentiment volatility over the previous week is high, meaning investors hold differing opinions, the overall market will underperform in the subsequent week. When sentiment volatility is low or neutral, the crowd has reached a consensus and the general market will outperform over the next week. The sentiment metric used to represent volatility is Raw-Volatility in SMA’s S-Factor data feed, which captures the volatility of sentiment in Twitter conversations. All Raw-Volatility data points were taken from the 3:40 pm ET timestamp (20 minutes before the market close). We summed Raw-Volatility across companies for each date as a proxy for the volatility of Twitter social sentiment on the entire market. The exact calculation is as follows, where “N” is the number of companies with sentiment on that date and “D” is the date:

We then created a 7-day standardized volatility using a 91-day benchmark:

This Z_Volatility score follows a roughly normal distribution.
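As a rough illustration of the two steps above, the sketch below sums Raw-Volatility across companies for a date and then standardizes a 7-day average against a 91-day benchmark. The exact standardization is not spelled out in the text, so the form used here (7-day mean minus 91-day mean, divided by the 91-day standard deviation) is an assumption, as are the function names.

```python
from statistics import mean, pstdev


def market_volatility(raw_volatility_by_company):
    """Sum Raw-Volatility across the N companies with sentiment on one date."""
    return sum(raw_volatility_by_company)


def z_volatility(daily_sums, short_window=7, long_window=91):
    """Standardize the recent 7-day average against a 91-day benchmark.

    `daily_sums` holds one market-level sum per date, oldest first.
    Returns None when there is not enough history or when the benchmark
    has zero dispersion. The formula itself is an assumed reconstruction.
    """
    if len(daily_sums) < long_window:
        return None
    benchmark = daily_sums[-long_window:]
    sigma = pstdev(benchmark)
    if sigma == 0:
        return None
    return (mean(daily_sums[-short_window:]) - mean(benchmark)) / sigma
```

A sample standard deviation (`statistics.stdev`) would work equally well here; with 91 observations the difference is small.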

Using the S&P 500 ETF Trust (SPY) as a proxy of general market performance, we then look at the relationship between Z_Volatility and SPY’s return series. The daily close-to-close return is calculated as:

Hypothesis: When Z_Volatility for the previous closing Date is high, the subsequent market performance will be lower. When Z_Volatility is low or neutral, the next day’s market performance will be higher.

To test this, our strategy is to open a short position in SPY when Z_Volatility > 1. When Z_Volatility <= 1, the portfolio holds SPY as a long position. This hypothetical portfolio is then compared to SPY over the past 10 years:

Prior to the COVID-19 pandemic, which began in early 2020, SPY outperformed the modified portfolio. However, since then the behavior of this factor has changed drastically. Here is the same graph as above starting in 2020:

Taking a closer look, the separation since the beginning of 2020 is quite significant. Adding a short position in SPY when sentiment volatility is high has enhanced the portfolio’s return. Even though the portfolio remains long on most days, Z_Volatility has been predictive of market downturns since 2020. Traders could use this metric as an indicator to stay out of the market, or at the very least to trade with more caution. The COVID-19 pandemic led to a large amount of uncertainty surrounding the stock market and the direction it is heading. A high Z_Volatility score indicates that the public is more uncertain about the direction of various stocks. This research shows the value of sentiment from Social Market Analytics in predicting macro-level events and price movements.
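For readers who want to reproduce the comparison, the long/short rule tested above can be sketched as follows. This is a simplified illustration under stated assumptions: returns are simple close-to-close returns, the Z_Volatility score from one close is applied to the next day's SPY return, and trading costs are ignored. The function names are made up for the example.

```python
def close_to_close_returns(closes):
    """Simple daily returns: close[t] / close[t-1] - 1 (assumed definition)."""
    return [curr / prev - 1 for prev, curr in zip(closes, closes[1:])]


def strategy_returns(z_scores, next_day_returns, threshold=1.0):
    """Short SPY after days with Z_Volatility > threshold, otherwise long.

    z_scores[i] is the score known at the close of day i and
    next_day_returns[i] is SPY's return over the following day.
    """
    return [-r if z > threshold else r for z, r in zip(z_scores, next_day_returns)]


def cumulative_return(returns):
    """Compound a series of simple returns into one cumulative return."""
    growth = 1.0
    for r in returns:
        growth *= 1 + r
    return growth - 1
```

Comparing `cumulative_return(strategy_returns(...))` with `cumulative_return(next_day_returns)` gives the two lines plotted in the graphs.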

If you are interested in learning more about how SMA’s S-Factor data can help your trading strategies, please email us or schedule a demo.

Social Market Analytics converts textual data into quantitative signals for the investment community. To complement our U.S. Equity feed, we recently launched a Twitter-based sentiment feed covering the largest TSX (Toronto Stock Exchange) equities. The initial universe included 233 TSX equities and was recently expanded to include an additional 200+ equities. The out-of-sample date for this dataset is January 20th, 2022, with history extending back to the beginning of 2020.

Similar to other asset classes in SMA’s database, the TSX asset class publishes Activity, S-Factor, Short Squeeze, and Hard to Borrow data feeds. To test the robustness of this new asset class, we conducted Daily Quintile and Threshold tests on the S-Factor feed.

TSX Equities Quintiles

In this Quintile test we look at SMA’s S-Factor feed at 3:40 pm ET (20 minutes prior to Market Close). Stocks are placed into different Quintile buckets based on the value of their S-Score. The S-Score is one of fifteen factors supplied in SMA’s S-Factor feed and provides a daily view of Social Sentiment from Twitter. S-Scores greater than 2 or less than -2 are considered extreme sentiment, while values closer to 0 indicate neutral sentiment. The lowest 20% of S-Score values are placed in Quintile 1 while the highest 20% of S-Score values are placed in Quintile 5. We look at subsequent Close-to-Close returns with each stock equally weighted within each quintile. Below is the daily cumulative return series of the TSX asset class quintiles.

After placing stocks in their proper Quintiles and taking the average return of each bucket, we find that stocks with higher S-Scores tend to outperform stocks with lower S-Scores. The graph above shows a monotonic factor. It also shows a large spread between Quintiles 1 and 5, which would result in a cumulative return of nearly 70% in 27 months. On average, the TSX asset class publishes 141 securities each market day. This number will only increase as SMA’s TSX universe expands.
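The quintile construction described above can be sketched with plain Python. This is a minimal illustration rather than SMA's implementation: the dictionaries mapping tickers to S-Scores and to subsequent returns are hypothetical inputs, and ties are broken by sort order, which is a simplification.

```python
def assign_quintiles(s_scores):
    """Map each ticker to a quintile (1 = lowest S-Scores, 5 = highest).

    `s_scores` is a dict of ticker -> S-Score. Tickers are ranked and the
    ranks are cut into five groups of (nearly) equal size.
    """
    ranked = sorted(s_scores, key=s_scores.get)
    n = len(ranked)
    return {ticker: (rank * 5) // n + 1 for rank, ticker in enumerate(ranked)}


def quintile_average_returns(quintiles, next_returns):
    """Equal-weighted average subsequent return for each quintile."""
    grouped = {}
    for ticker, q in quintiles.items():
        grouped.setdefault(q, []).append(next_returns[ticker])
    return {q: sum(vals) / len(vals) for q, vals in sorted(grouped.items())}
```

Running this once per market day and compounding each quintile's average return produces a cumulative return series per quintile.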

We further explore the S-Score signal by filtering on securities that had 3 or more Tweets within the previous 24 hours (S-Volume >= 3) and then conducting the same quintile analysis.

This new filter reduces the average number of securities in the analysis from 141 to 95 each day. However, it also increases the Sharpe ratio and the spread between Quintile 1 and Quintile 5.

TSX Thresholds

For this test, we look at all TSX securities at 3:40 pm ET and place securities in the ‘Long’ bucket if their S-Score >= 2 and in the ‘Short’ bucket if their S-Score <= -2. Since 2020, there has been an average of 5 stocks per day in the ‘Short’ bucket and 17 stocks per day in the ‘Long’ bucket. We look at the subsequent Close-to-Close return for each security. Securities are equally weighted within each portfolio. In the Long/Short portfolio, the Long and Short baskets are equally weighted.
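A minimal sketch of this threshold test for a single day is shown below. The input dictionaries are hypothetical, and the treatment of an empty basket (it contributes zero) is an assumption of the example rather than something stated in the research.

```python
def long_short_return(s_scores, next_returns, threshold=2.0):
    """Daily return of an equal-weighted long/short threshold portfolio.

    Longs have S-Score >= threshold and shorts have S-Score <= -threshold.
    Securities are equally weighted within each basket, and the two
    baskets are equally weighted against each other.
    """
    longs = [next_returns[t] for t, s in s_scores.items() if s >= threshold]
    shorts = [next_returns[t] for t, s in s_scores.items() if s <= -threshold]
    long_leg = sum(longs) / len(longs) if longs else 0.0
    # A short position profits when the security falls, so flip the sign.
    short_leg = -sum(shorts) / len(shorts) if shorts else 0.0
    return 0.5 * long_leg + 0.5 * short_leg
```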

When only looking at securities with S-Score >= 2, the annualized return of this portfolio is greater than 45%. This portfolio also has a Sharpe Ratio of 1.57 which is well above the market benchmark (S&P/TSX Composite Index).
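For reference, annualized return and Sharpe ratio can be computed from a daily return series along the following lines. The text does not say which conventions were used, so the 252 trading days per year and the zero risk-free rate below are assumptions.

```python
from math import sqrt
from statistics import mean, stdev

TRADING_DAYS = 252  # assumed number of trading days per year


def annualized_return(daily_returns):
    """Compound daily returns and scale the growth to a one-year horizon."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1 + r
    return growth ** (TRADING_DAYS / len(daily_returns)) - 1


def sharpe_ratio(daily_returns, daily_risk_free=0.0):
    """Annualized Sharpe ratio: mean excess return over its standard deviation."""
    excess = [r - daily_risk_free for r in daily_returns]
    return mean(excess) / stdev(excess) * sqrt(TRADING_DAYS)
```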

Many insights can be extracted using SMA’s TSX Social Media feeds. To find out more on this topic, email us or schedule a meeting using our 1 on 1 Meeting Signup.

The SMA research team has done a tremendous amount of research on Machine Readable Filings (MRF). This blog is taken from a research paper authored by Koby Weisman. SMA partnered with S&P Global Market Intelligence to provide textual data in U.S. SEC EDGAR filings broken down by heading with text underneath (i.e. Parts, Items). The textual data is parsed to create historical baselines for 10-Ks, 10-Qs, 8-Ks, 20-Fs and other filings. This paper focuses on word counts, sentiment factors, and the change in those factors. There are 20 filing types in the MRF product; however, this paper analyzes 10-Ks and 10-Qs, building on existing academic research including Lazy Prices.

The MRF dataset includes seven factors which are described in the table below. These factors are produced at the Item, Part, and Total Document level to provide a comprehensive view of what sections within the document have changed.

Subscribers of the MRF dataset can create derivative metrics stemming from the seven factors provided. For instance, one metric explored in this paper is Sentiment per Word. That factor is calculated by dividing Sentiment Sum by Word Count. Another factor explored is Percentage of Sentiment Hits which is calculated by dividing Sentiment Hits by Word Count. These factors and other derivative factors are calculated to normalize sentiment based on the length of document.
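The two derivative metrics described above are simple ratios. A small sketch, with hypothetical function names and a guard against empty sections, might look like this:

```python
def sentiment_per_word(sentiment_sum, word_count):
    """Sentiment Sum divided by Word Count; None for an empty section."""
    return sentiment_sum / word_count if word_count else None


def pct_sentiment_hits(sentiment_hits, word_count):
    """Sentiment Hits divided by Word Count, returned as a fraction.

    Multiply by 100 to express the result as a percentage.
    """
    return sentiment_hits / word_count if word_count else None
```

Because both metrics divide by Word Count, they can be compared across documents of very different lengths.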


The MRF dataset provides word counts and sentiment factors for the entire document, each part, and each item of the quarterly or annual report. To test our hypothesis that companies with larger changes in their SEC EDGAR filings underperform those with smaller changes, we created metrics that capture ‘changes’ in a report.

The authors of Lazy Prices categorized changes in filings using a variety of similarity metrics (cosine similarity, Jaccard similarity, minimum edit distance, and simple similarity). In our analysis we use raw change in word count as a proxy for similarity scores. Raw change in word count is the difference between the word counts of two filings. This analysis looks at the Quarter-over-Quarter changes in regulatory filings. Each 10-K and 10-Q is compared to the most recent 10-K and 10-Q from the same company.
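As an illustration, raw change in word count can be computed by walking each company's filings in date order. The record layout below is hypothetical, and comparing like with like (a 10-K to the prior 10-K, a 10-Q to the prior 10-Q) is an assumption of this sketch.

```python
def raw_changes(filings):
    """Raw change in word count versus the company's prior filing of the same type.

    `filings` is a list of (company, form_type, filing_date, word_count)
    tuples with ISO-format dates. Returns a dict keyed by
    (company, form_type, filing_date). The first filing in each
    company/form series has no prior filing and is skipped.
    """
    changes = {}
    previous = {}
    for company, form_type, filing_date, word_count in sorted(filings):
        key = (company, form_type)
        if key in previous:
            changes[(company, form_type, filing_date)] = word_count - previous[key]
        previous[key] = word_count
    return changes
```

The resulting values can then be ranked across companies and cut into quintiles for the portfolio tests.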

In addition to word count, this analysis explores other factors included in the MRF dataset which contain sentiment scores, word counts categorized by sentiment, and factors that combine word counts and sentiment.

Lazy Prices makes no mention of their universe, so we used all securities over five US dollars. The benchmark used, called ‘Universe’, is the average return of all stocks in any Quintile portfolio at that point in time. The analysis begins in 2007 and concludes at the end of 2019.

When computing calendar-time portfolio returns, stocks enter buckets depending on the factor or the raw change in that factor. Stocks enter the portfolio in the month the report was released. Portfolios are rebalanced monthly to introduce new filings submitted in the most recent month. Note that average portfolio size can differ due to documents having the same value.


Results below show graphs and metrics related to calendar-time portfolio returns. ‘Q1’, or Quintile 1, contains stocks with the lowest value of the factor while ‘Q5’ encompasses stocks with the highest value of the factor.

We first looked at metrics on the total document level. This contains data embedded at the Item and Part level of a regulatory filing, which is then rolled up to the document level.

The graph and table above exemplify how Raw Change in Word Count can enhance stock selection. The green line represents securities that have the largest increase in Word Count while the red line denotes securities that have the largest decrease in Word Count. The red line, Quintile 1, outperforms all other quintiles while the green line, Quintile 5, underperforms all other quintiles.

As filings become longer or wordier compared to the company’s most recent prior filing, returns tend to drop relative to the universe. Regulatory filings are intended to adequately warn investors or potential investors about the company’s actions and strategies. If a filing contains more warnings and explanations of the company’s actions, this suggests the company is less stable, and it tends to underperform the market.

As filings become shorter or more concise, subsequent stock returns outperform the universe. Companies that have a decrease in word count do not boast of events or products, but rather provide succinct statements. Also, one-off events that were in the company’s previous regulatory filing are taken out of the document meaning that the event was resolved.

The difference in monthly returns between the two lines (Q1 – Q5) has a T-Statistic of 3.64, which is statistically significant at a 95% confidence level. We therefore reject the Null Hypothesis that the Average Monthly Return of the spread equals 0.

The graph above shows how a change in the number of subsections can be indicative of future stock returns. This metric is an integer with a small range, so many stocks have the same value, which is why the average count in each bucket is uneven.

Subsections are counted at the Item level and are included if there is a specific topic to discuss. If there are more subsections included in the document (Quintile 5, green line) compared to the previous document, the stock price underperforms its peers. When there is a decrease in the number of subsections (Quintile 1, red line) the stock outperforms its peers.

Subsections are added to a regulatory filing when there’s a specific topic to discuss. Subsection Count and Word Count are correlated because as there are more topics to discuss, there are more words in the document. The addition of a new subsection means there is an event occurring and the company needs to adequately warn its investors. If there are more subsections, then the company has more events that could risk the future value of the company.

The monthly return difference between the two lines (Q1 – Q5) has a high T-Statistic of 5.32, which is statistically significant at a 95% confidence level, so we reject the Null Hypothesis that the Average Monthly Return of the spread equals 0. The hit rate, which is the percentage of months in which the return of the portfolio is greater than 0.00%, is extremely high at 68.59%.
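The two statistics quoted above can be reproduced from a series of monthly Q1 minus Q5 return differences. The sketch below assumes a standard one-sample t-test against a mean of zero; the text does not state which test was used, so treat this as one reasonable reading.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(monthly_spreads):
    """One-sample t-statistic for H0: the mean monthly spread equals 0."""
    n = len(monthly_spreads)
    return mean(monthly_spreads) / (stdev(monthly_spreads) / sqrt(n))


def hit_rate(monthly_spreads):
    """Share of months in which the spread return was greater than zero."""
    return sum(1 for r in monthly_spreads if r > 0) / len(monthly_spreads)
```

With roughly 13 years of monthly data, a t-statistic above about 1.98 in absolute value is significant at the 95% level for a two-sided test.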

The above graph shows how the Total Document’s average sentiment can be a predictive source. The green line (Quintile 5) has the highest Average Sentiment value and outperforms all other stocks in the universe. Not only does Quintile 5 outperform the rest of the universe, but it also does so with the least amount of risk.

Through SMA’s Natural Language Processing, all words in the document are read and assigned a score based on their sentiment. If more positive language is used throughout the document, the security tends to outperform the market. Conversely, if there is more negative language, the security underperforms its peers.

The red line (Quintile 1) underperforms its peers, but not by a significant amount. Even though the difference between Quintile 5 and Quintile 1 is not statistically significant at a 95% confidence level, this factor provides additional alpha on the Long side.

We next looked at the Management Discussion & Analysis section of regulatory filings. This section is unique because of how unstructured it is compared to all other sections. It encompasses how management views the trajectory of the business and future events.

The chart above shows the Quintiles for Percentage of Sentiment Hits. This metric is calculated by dividing Sentiment Hits by Word Count. This is the percentage of the total document that had financial lexicon pertaining to sentiment (either positive or negative).

Quintile 5 (green line) represents the highest Percentage of Sentiment Hits and outperforms all other portfolios. Quintile 1 (red line) underperforms all other portfolios. Companies that talk more about their performance in financial terms with sentiment are being upfront. This transparency benefits the company because it is forthright with investors. On the other hand, if the MD&A section has a small Percentage of Sentiment Hits, the company is discussing information not related to its financial status. These companies either provide less important information or use additional language that is not required. This lack of transparency devalues the company in the eyes of investors.

The difference between Quintile 5 and Quintile 1 is statistically significant at a 95% confidence level and provides a unique source of alpha.

The factor Sentiment per Word is calculated by dividing Sentiment Sum by Word Count. Longer documents are more likely to have an extreme value in Sentiment Sum. The rationale for this is if a document has more words, it is more likely to have more sentiment hits, thus a more extreme value for Sentiment Sum. The Sentiment per Word factor normalizes the magnitude of sentiment based on the length of the document.

Here we see Quintile 5 (green line) outperform and Quintile 1 (red line) underperform all other portfolios. The difference between the two is not statistically significant; however, this metric still provides insights on the Long side, as Quintile 5 has the highest returns with less risk. If a company has a higher Sentiment per Word, there is a more positive outlook on the future of the company and its events. A low Sentiment per Word means the company is negative when speaking about its own actions, which may reflect a lack of confidence in the company’s future.

We last looked at the Risk Factors section of regulatory filings. This section generally has a negative tone and states what could go wrong in the company while adequately warning investors.

The factor plotted above, Positive and Negative Hits Difference, is the difference between Positive Hits and Negative Hits. In this graph Quintile 5 (green line) represents filings with a larger number of positive hits than negative hits, which underperforms all other portfolios. Quintile 1 (red line) contains filings that have significantly more negative hits than positive hits, which outperforms all portfolios. Filings with positive language in the Risk Factors section lack truth and transparency which leads to an underperformance. If the company is upfront about the risks of investing and doesn’t put a positive spin on the risks, the investors have more confidence in the company.


Machine Readable Filings is the most advanced and thorough product on the market for drilling into the untapped value of textual data in regulatory filings. These filings track how companies evolve and approach strategy in the face of micro and macro trends and the effect of these trends on their short- and long-term goals. While much in these documents does not change over successive quarters and years, the ability to quantify change, and the location of change when it exists, has been shown to be a predictive factor for stock selection in a portfolio.

Using previous academic research as a guide (Lazy Prices), SMA has shown the predictive nature inherent in changes in regulatory filings. The results presented in this paper show how multiple factors tend to predict future returns in securities and can be a factor for stock selection in a portfolio.

The flexibility of the raw data provided allows subscribers to create an infinite number of derivative factors at the Item, Part, and Total Document level. These factors will continue to be explored as an additional source of alpha.

Although this analysis only included factors at the Total Document level, the Management Discussion & Analysis section, and Risk Factors section, other sections within regulatory filings can provide additional insights into a security’s future return. Furthermore, we expect additional insights to be uncovered using natural language processing to quantify the sentiment of the underlying text at the various levels of the document. These analyses and more will be explored by Social Market Analytics and S&P Global in the future.

In April 2020, S&P Global Market Intelligence and Social Market Analytics, Inc. (SMA) launched ‘Machine Readable Filings’ (MRF), a sophisticated textual data offering which applies Parsing and Natural Language Processing to generate machine readable text extracted from SEC Regulatory Filings. Machine Readable Filings allows businesses and investors to incorporate more qualitative measures of company performance into their investment strategy by using machine readable text from full or individual sections of regulatory filings to enhance their analysis of companies. The parsed textual data allows firms to drill down on both historical and new filings in near real-time.  My last blog introduced the product and illustrated some basic return characteristics present in filings word count.

This blog explores the predictive nature of filings using SMA’s patented NLP and machine learning. For our analysis we used all active securities with a price greater than 5 dollars. Our analysis starts in 2006. Securities are broken into quintiles based on each factor. These factors are samples of the extensive metrics that can be created with this data. Quintiles are rebalanced monthly based on each company’s most recent filing. 10-Qs are compared to prior 10-Qs and 10-Ks are compared to prior 10-Ks. These are not meant to be trading models. They illustrate the predictive power of the data and use as broad a universe as possible. Two interesting distributions are below: the distribution of word counts for 10-Ks (mean 36,000) and the distribution of average sentiment. As you can see, companies try to keep the 10-K as upbeat as possible.

Our first factor is Change in Sentiment Hits. Sentiment hits are the number of times our NLP was able to identify a word or segment in a sentence, that is, Positive hits + Negative hits + Neutral hits. The green line represents filings with the largest increase in sentiment hits while the red line represents filings with the largest decrease in sentiment hits. Companies with large increases in sentiment hits tend to underperform their peers, and companies with large decreases tend to outperform.

The quintile performance characteristics are below. Although quintiles 2 and 3 are out of order, the average values for those quintiles are near zero. Quintile 1 outperforms quintile 5 by 3% annualized.

The next factor we analyze is the percentage of the document that the parser hits. Many filings are filled with general information that does not necessarily provide meaningful statements. The green line represents filings with the highest percentage of sentiment hits in the document while the red line represents filings with the lowest percentage of sentiment hits in the document. Filings with a higher percentage of sentiment hits tend to outperform their peers and filings with a lower percentage tend to underperform. Companies with documents containing more meaningful content outperform companies with documents containing less meaningful content by about 3.5% annualized.

Quintile 5 – Quintile 1 annualized is 3.5%

The third factor we explore is the change in negative hits. Companies with increasing negative hits are discussing more negative information than in prior quarters, and they subsequently underperform. The green line represents filings with the largest increase in negative hits while the red line represents filings with the largest decrease in negative hits. Companies with a large increase in negative hits tend to underperform their peers and companies with a large decrease tend to outperform.

The last factor we explore is cumulative document sentiment. Quintiles are based on the sum of all sentiment hits in the document. Sentiment is more commonly analyzed by section. We identify parts, sections, and subsections in this product, providing a myriad of ways to analyze the data. Even at the most aggregated level, sentiment is predictive. Document length has a large impact on overall sentiment. Z-Scores of this factor are a good way to compare a filing against prior documents. As you can see in the chart, companies with more positive total document sentiment tend to outperform companies with more negative total sentiment.

Quintile 5 outperforms quintile 1 by 1.7 percent annualized.

There are many ways to analyze the MRF data set. Filings are parsed by Item, Section, and sub-Section back to 2006 for historical back testing. This analysis looked at only 10-Ks and 10-Qs; ‘Machine Readable Filings’ (MRF) covers 20 types of SEC filings. This blog covers a small portion of the research. The U.S. SEC EDGAR data is live on the S&P Xpressfeed. International Reports will be released later in 2020. To learn more or to start a trial, please visit our website.

Social Market Analytics aggregates the intentions of professional investors as expressed on Twitter. We apply our patented filtering and natural language processing (NLP) to Tweets to proactively select Twitter accounts to use in our predictive metrics. We track several metrics to gauge the predictive nature of our dataset. In this blog I illustrate one of these metrics.

2018 was a rough year for the SP500, which lost about 9% (rolling one year). Given the market loss and the high volatility, we thought it would be an ideal dataset over which to run an experiment. Two questions we get regularly are: How would your data perform in a bear market? And what is the benefit of your NLP and account rating systems? This blog answers both questions from the perspective of 2018 market performance.

The table below illustrates the performance of six theoretical portfolios. These portfolios represent stocks with Social Market Analytics S-Scores of 2 or higher (Long signal) or Social Market Analytics S-Scores of -2 or lower (Short signal). S-Score compares the tone of current Twitter conversations with the average tone of Twitter conversations over the last twenty days. Social Market Analytics maintains multiple baselines for multiple prediction periods.
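To make the idea of comparing current tone with a twenty-day baseline concrete, here is a toy normalization. SMA's actual S-Score calculation is proprietary and is not described in this blog, so the z-score form below, the function names, and the input format are all assumptions for illustration only.

```python
from statistics import mean, stdev


def s_score_sketch(current_sentiment, baseline_sentiment):
    """Z-score of current sentiment against a trailing baseline (e.g. 20 days).

    Illustrative only; this is not SMA's S-Score formula. Returns None when
    the baseline is too short or has no dispersion.
    """
    if len(baseline_sentiment) < 2:
        return None
    sigma = stdev(baseline_sentiment)
    if sigma == 0:
        return None
    return (current_sentiment - mean(baseline_sentiment)) / sigma


def signal(score, threshold=2.0):
    """Translate a score into the Long/Short signal used for the portfolios."""
    if score is None:
        return "none"
    if score >= threshold:
        return "long"
    if score <= -threshold:
        return "short"
    return "none"
```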

Each security in our universe represents a proprietary Topic Model.  Each Topic is a collection of rules used to include or exclude specific Tweets from security buckets.  For example, if you are looking for Tweets about Ethan Allen furniture (ETH) you do not want to include Tweets about Ethereum Crypto Currency (Also symbol ETH) conversations.
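A toy version of such an include/exclude rule set is sketched below. SMA's Topic Models are proprietary, so the simple substring matching and the specific terms used here are hypothetical and only meant to show the idea behind the ETH example.

```python
def matches_topic(tweet_text, include_terms, exclude_terms):
    """Return True if a Tweet belongs in a security's bucket.

    A Tweet is rejected if it contains any exclude term; otherwise it is
    accepted if it contains at least one include term. Matching is a
    case-insensitive substring check, which is a deliberate simplification.
    """
    text = tweet_text.lower()
    if any(term in text for term in exclude_terms):
        return False
    return any(term in text for term in include_terms)


# Hypothetical rules for Ethan Allen (ETH), screening out Ethereum chatter.
ETHAN_ALLEN_INCLUDE = ["$eth", "ethan allen"]
ETHAN_ALLEN_EXCLUDE = ["ethereum", "crypto"]
```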

We created portfolios with our account filtering algorithms and compared them with portfolios built from all Twitter accounts discussing our Equity Topic Models. The purpose of the run was to quantify the ability of our patented account filtering algorithms to identify professional, and hence more accurate, investors. Spoiler alert: our account filtering improved the long/short return by 50% (18.73% for 2018 versus 12.53% with NLP only).

NLP applied only:

The NLP only portfolios illustrate the power of our NLP process to accurately identify Tweets discussing securities and companies and to assign them fine-grained scores. Our patented process reads each Tweet multiple times to identify whether, and how strongly, someone is voicing a view of expected future performance. The NLP only portfolios illustrate the predictive power of our NLP in isolation. When you apply the account filtering, you get a predictive boost.

Account Filtered + NLP applied:

Account Filtered plus NLP portfolios illustrate the benefit of applying our account filtering metrics. Early in the life of Social Market Analytics we learned it’s not just what is being said on Twitter but who is saying it. We developed proprietary metrics to identify investors more likely to be correct about the future direction of a security. When the conversation of these professional investors is significantly more positive than the average conversation over the last 20 days, those securities significantly outperform. When the conversation of these professional investors is significantly more negative than the average conversation over the last 20 days, those securities significantly underperform.

Portfolio Construction

Portfolios are constructed of securities with an S-Score of 2 or higher (long) or -2 or lower (short). All portfolios are equally weighted. A negative value for a short portfolio denotes a positive return to that portfolio, since short portfolios are supposed to move lower. All securities are entered on the Open based on 9:10 am Eastern time S-Scores and exited on the Close. There is no overnight exposure.
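The construction rules above can be sketched as follows. The input format (a list of open and close prices per basket) is hypothetical, and computing the long/short result as the long basket's return minus the short basket's return is an assumption, since the blog does not state how the two legs are combined.

```python
def open_to_close_return(open_price, close_price):
    """Intraday return from the Open to the Close, with no overnight exposure."""
    return close_price / open_price - 1


def portfolio_return(bars):
    """Equal-weighted open-to-close return for a basket of (open, close) pairs."""
    returns = [open_to_close_return(o, c) for o, c in bars]
    return sum(returns) / len(returns) if returns else 0.0


def long_short(long_bars, short_bars):
    """Long basket return minus short basket return (assumed combination).

    The short basket's raw return is negative when the shorts are working,
    so subtracting it adds to the long/short result.
    """
    return portfolio_return(long_bars) - portfolio_return(short_bars)
```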

Result Analysis

We use the SP500 as our performance benchmark. The SP500 return is calculated from open to close in the same manner as the selected securities. Using open to close performance, the SP500 returned -16.89% for comparison. As you can see from the table, securities with an S-Score > 2 outperformed the market and negative S-Score securities significantly underperformed the market (generating positive alpha). The L/S portfolio with NLP only returned +12.54%; NLP plus account filtering improved that performance by 50% to +18.73%. We do not present this as a single factor model, but even after removing 10% a year for slippage and commissions, the portfolio still significantly outperforms.

Please contact us with any questions or to see how SMA’s NLP and filtering capabilities can be used in your investment process.

Social Market Analytics (SMA) tracks real-time sentiment on equities, commodities, currencies, ETFs and cryptocurrencies. SMA has the most powerful and customizable Alerting API combining Twitter sentiment and pricing metrics. Users receive custom real-time sentiment alerts on instruments in their watch list. For example, on December 11, 2018, SMA’s alerting system sent an alert on Corn at 12:12 pm CT when corn was at $385.25. Below is the email and mobile alert.



Subsequent to the alert, corn moved lower starting at 12:17 pm CT. The price continued to move lower for the remainder of the day and closed at $383.25. (See chart below)

Corn Alert

The above alert was based on SMA’s rolling 24-hour sentiment. SMA also calculates a long-term sentiment with longer price projection periods. Corn’s long-term S-Factor flipped from positive to negative on November 14th. December 10th was the first day the long-term S-Factor for corn reached a significantly negative level, 1.5 standard deviations more negative than the longer-term baseline conversation. For more information please contact us.

This year has been tough for most investment strategies. Firms using traditional sources of data are generating the same underwhelming returns. Two years ago, Social Market Analytics, Inc. (SMA) launched the SMLCW index in partnership with the CBOE. This index is re-balanced weekly and comprises the twenty-five securities from the CBOE large cap universe with the highest average S-Score over the prior week. It is a long only index of super-cap stocks with unusually positive Twitter conversations.

SMA publishes a family of metrics providing a full representation of the Twitter conversation across equities (US and LSE), commodities, currencies, ETF’s & Cryptos.

S-Score is a normalized representation of the current Twitter conversation of professional investors as identified by Social Market Analytics’ patented algorithms. SMA has access to the full Twitter feed through our licensed partnership with Twitter and listens in real-time for any mention of topics and securities of interest. These Tweets are scanned in real-time for sentiment and influence of the poster and compared to prior conversations over the look-back period. Securities with higher S-Scores subsequently outperform and securities with negative S-Scores underperform.

SMA S-Scores are predictive over multiple prediction periods.  With seven years of out-of-sample data we can extend our comparison baselines and predict over longer periods.

Year-To-Date the SMLCW index is up over 7.5% while the SP500 is flat. Even after subtracting a couple of percent for commissions and slippage, the index is still significantly positive. This is not a back-test; this index has been live and on your quote screens for nearly two years. The YTD actual performance chart from the CBOE site is below.


As mentioned, this is a long only index. During the recent market drawdown this long index has continued to perform, while SMA negative S-Score stocks have been moving lower at a significant rate, generating positive alpha. Below is a chart of the SMLCW index compared to the SP500. For any questions or to learn more please contact us at:




Social Market Analytics, Inc. (SMA) partnered with the Cboe in January 2017 to release the SMLCW Index, the ‘Cboe – SMA Large Cap Weekly Index’. The SMLCW Index is a Long Only Index that has outperformed since it was released and has continued to outperform in the recent market volatility and sell-off. In the chart below the S&P500 is flat for the year and SMLCW is up nearly 5% YTD.

SMA holds two U.S. patents covering its machine learning and NLP processes, which produce predictive analytics at the security level across U.S. and UK stocks, ETFs, FX, Futures, and Crypto Currencies.

The SMLCW portfolio is an equally-weighted Long Only portfolio of 25 stocks drawn from the CBOE Large-Cap Universe with the highest average 5-period S-Scores. Stocks in this universe (a) are in the top 15% capitalization tranche of stocks that are the underlying for options listed on the CBOE (approximately 3000 stocks) and (b) have a market capitalization greater than or equal to $10 billion.


The CBOE Large-Cap Universe is reconstituted quarterly on the third Friday of the month. The SMLCW portfolio is reconstituted every Friday at 8:30 am CT, based on average 5-period SMA S-Scores at 8:10 am CT. A period is a date on which there is sufficient social media data to derive SMA S-Scores. Stocks are deemed sold and purchased at market-on-open prices. The portfolio is held until 8:30 am CT on the next Friday. If Friday is a business holiday, the portfolio is rebalanced on the preceding Thursday.
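The weekly selection and the holiday rule described above can be sketched as follows. This is an illustration only: the function names and input formats are hypothetical, the sketch assumes the universe has already been filtered to the CBOE Large-Cap Universe, and it does not address the case where the preceding Thursday is also a holiday, which the text leaves unspecified.

```python
from datetime import date, timedelta
from statistics import mean


def select_smlcw_portfolio(score_history, n=25, periods=5):
    """Pick the n tickers with the highest average S-Score over the last
    `periods` periods and weight them equally.

    score_history: ticker -> list of S-Scores, oldest first (hypothetical format).
    Tickers with fewer than `periods` observations are skipped.
    """
    averages = {
        ticker: mean(history[-periods:])
        for ticker, history in score_history.items()
        if len(history) >= periods
    }
    ranked = sorted(averages, key=averages.get, reverse=True)[:n]
    weight = 1.0 / len(ranked) if ranked else 0.0
    return {ticker: weight for ticker in ranked}


def rebalance_date(friday, holidays):
    """Rebalance on Friday, or on the preceding Thursday if Friday is a holiday."""
    if friday in holidays:
        return friday - timedelta(days=1)
    return friday


if __name__ == "__main__":
    history = {
        "AAA": [1, 2, 3, 4, 5],        # made-up tickers and scores
        "BBB": [0, 0, 0, 0, 0],
        "CCC": [5, 5, 5, 5, 5, 5],
        "DDD": [9, 9],                 # too few periods, excluded
    }
    print(select_smlcw_portfolio(history, n=2))
    print(rebalance_date(date(2018, 3, 30), {date(2018, 3, 30)}))
```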

To learn more, visit SMA at or the Cboe website at










Social Market Analytics, Inc. (SMA) aggregates the intentions of professional investors as expressed on Twitter & StockTwits and publishes a series of metrics that describes the current conversation relative to historical benchmarks.  Our data is a leading indicator of price movement both positive and negative.

There is unique predictive information in unstructured content. Social Market Analytics uses AI and Machine Learning techniques developed over the last eight years to convert this unstructured content into data suitable for quantitative analysis. This opens a whole new area of big data analysis.

Social Market Analytics (SMA) calculates predictive sentiment on the entire US equity universe, Currencies, Commodities, Crypto currencies, ETFs and custom sources. This blog is about the predictive nature of our LSE security universe. We calculate our custom metrics on the top 1000 market cap securities listed on the LSE. Our LSE data starts on 1/1/2016. Below is a cumulative quintile distribution of returns based on our S-Score metrics. Our S-Score is effectively a Z-Score that compares 24-hour sentiment, based on the Tweets of professional investors, to a 20-day baseline. Prediction periods vary per asset class and baseline. Longer baseline comparisons lead to longer prediction periods.
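As a rough illustration of the Z-Score idea, the sketch below standardizes a current 24-hour sentiment reading against a list of baseline readings. It is a generic z-score, not SMA's patented calculation; the function name, the use of the sample standard deviation, and the made-up baseline values are assumptions.

```python
from statistics import mean, stdev


def s_score_like(current_sentiment, baseline):
    """Generic z-score of the current reading against baseline readings.

    baseline: prior daily sentiment readings, e.g. 20 of them (hypothetical input).
    """
    if len(baseline) < 2:
        raise ValueError("need at least two baseline readings")
    sigma = stdev(baseline)  # sample standard deviation
    if sigma == 0:
        return 0.0  # flat baseline: treat the reading as not unusual
    return (current_sentiment - mean(baseline)) / sigma


if __name__ == "__main__":
    baseline = [0.0, 2.0] * 10   # 20 made-up daily readings, mean 1.0
    print(s_score_like(1.0, baseline))
    print(s_score_like(3.5, baseline))
```

A reading equal to the baseline mean scores zero; readings further above or below the baseline score proportionally higher or lower in standard deviation units.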

Stocks with abnormally positive conversations typically outperform their peers, and stocks with abnormally negative conversations typically underperform their peers. As expected, stocks whose conversations have a normal positive or negative tone perform like the overall market.

Below is a typical quintile chart for the LSE 1000 universe tracked from post Brexit to 8/31/2018. The spread between the top and bottom quintiles is 10% annualized. Sharpe and Sortino ratios are in the table below that. To learn more or to request a historical data set, contact SMA.

LSEQuintiles 1

LSE Quintiles2
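For readers who want to reproduce a quintile comparison on their own data, here is a minimal sketch. It sorts observations by score, splits them into five equal-sized buckets, and reports the average subsequent return per bucket along with the top-minus-bottom spread. The function name and input format are hypothetical, and the sketch does not annualize or compound returns.

```python
from statistics import mean


def quintile_spread(records):
    """records: list of (score, subsequent_return) pairs (hypothetical format).

    Returns (average return per quintile, lowest scores first;
             top-quintile average minus bottom-quintile average).
    """
    if len(records) < 5:
        raise ValueError("need at least five observations")
    ordered = sorted(records, key=lambda r: r[0])
    n = len(ordered)
    # Integer slicing keeps bucket sizes within one observation of each other.
    buckets = [ordered[i * n // 5:(i + 1) * n // 5] for i in range(5)]
    averages = [mean(ret for _, ret in bucket) for bucket in buckets]
    return averages, averages[-1] - averages[0]


if __name__ == "__main__":
    sample = [(i, i * 0.001) for i in range(10)]  # made-up data
    averages, spread = quintile_spread(sample)
    print(averages, spread)
```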