03-01-2023 · Interview

'Machine learning models can spot interesting interactions'

Buzzwords such as ‘alternative data’, ‘machine learning’ and ‘natural language processing’ are quickly becoming part of the jargon used by asset managers. We uncover what these mean for the Robeco Quant Team in our discussion with Quant Researcher Clint Howard.

    Authors

  • Lusanele Magwa - Investment Specialist

The growing prominence of big data is widening the scope for quant strategies. So, given the multitude of new alternative datasets cropping up, how do you select which ones to use?

“Our research initiatives are premised on ideas that are driven by fundamental economic reasons. As quant investors, we have traditionally used financial statement and market data to conduct such research. Now with the deluge of alternative datasets, we have additional information that we can use and different ways to study our ideas. That said, it is important to be discerning about which datasets can add value.”

“Because we intentionally focus on the economic rationale behind our ideas before selecting data sources (whether alternative or traditional), it allows us to be quite deliberate in picking the datasets that we believe will actually answer the questions we are studying. If you do not start with the economic principles, you face the potential risk of overfitting a model and weakening its predictive power as ill-suited datasets might be chosen.”

“For example, big text data such as broker reports, company announcements and news filings are a rich treasure trove given the large volumes of data available. But these data sources only add value to our process if we can use them to research the economic intuition behind our market observations or hypotheses. Alternative datasets are, therefore, a means to an end, but not the be-all and end-all.”

Data vendors can offer the same datasets to competing asset managers. So how does the Robeco Quant Team gain unique insights?

“This is true: data vendors market and sell their datasets to several asset managers, as that is the nature of their business. So if investors simply plug the data into their models or strategies in the form in which they receive it, they run the risk of falling prey to alpha decay and crowding, as their peers can easily do the same thing.”

“There are a few ways to address this. An approach we favor is sourcing datasets that are as raw as possible, with minimal alterations made by a vendor. This allows us to transform the granular data so that it is suited to the economic problems we are trying to study. This enables us to incorporate our unique insights and domain knowledge, therefore differentiating our use of the data from competitors’.”

“It is important to stress again that we always start any research we do based on economic intuition. This means that we have a sensible idea about why something might work. Only then do we hunt for the datasets that we can use to either validate or refute our intuition. By following this approach, we believe the possibility of using a dataset in exactly the same manner as another asset manager diminishes.”

What can we do with machine learning (ML) that was not easy to do before?

“For decades, standard linear modeling has been the go-to approach in quant models and has laid the foundation for the success achieved by the investment style over the years. These models essentially impose linear relationships between variables, from which patterns can be deduced to establish alpha signals, risk models or portfolio construction algorithms, for example.”

“ML provides quant investors with an extra toolkit to study economic problems (or reveal such patterns). This flexible and powerful framework – through techniques such as neural networks and random forests – can uncover nonlinear relationships between variables as well as how variables interact with each other. This can provide quant investors with additional insight for signal construction.”

“For example, ML models can spot interesting interactions, such as the one between newsflow and stock-price reversals. One of the patterns observed in markets is that when a firm’s share price goes up (or down) by a big margin, it tends to revert back down (or up). Interestingly, we find that this reversal phenomenon is affected by the level of abnormal newsflow related to the stocks in question.”

“Specifically, if there has been more newsflow than average on a stock around the time its share price rallies or sinks, the price does not tend to revert. The intuition behind this is that there is probably a genuine reaction to a change in fundamentals if there has been a lot of news covering a recent event. But in the absence of significant newsflow, we do tend to see the reversal pattern in stocks, suggesting that the initial move was probably based on noise rather than fundamentals. So these kinds of insights are really interesting for us.”
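
The interaction described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Robeco's model: the interview describes a pattern picked up by an ML model, whereas the sketch hard-codes it as a rule. The class name, the function names and both thresholds (`big_move`, `news_cutoff`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StockObservation:
    ticker: str
    past_return: float     # recent return over the lookback window, e.g. 0.12 = +12%
    news_count: float      # news items on the stock in that window
    avg_news_count: float  # the stock's typical news volume for such a window


def abnormal_newsflow(obs: StockObservation) -> float:
    """Ratio of observed to typical news volume; 1.0 means 'normal'."""
    if obs.avg_news_count <= 0:
        return 1.0  # assumption: with no history, treat newsflow as normal
    return obs.news_count / obs.avg_news_count


def reversal_signal(obs: StockObservation,
                    big_move: float = 0.10,
                    news_cutoff: float = 1.5) -> float:
    """Bet against a large move only when it was NOT accompanied by heavy news.

    Both thresholds are illustrative placeholders, not calibrated values.
    """
    if abs(obs.past_return) < big_move:
        return 0.0  # move too small to expect a reversal
    if abnormal_newsflow(obs) >= news_cutoff:
        return 0.0  # heavy newsflow: the move is more likely fundamental
    return -obs.past_return  # quiet newsflow: expect a partial reversal


quiet_rally = StockObservation("AAA", past_return=0.15, news_count=3, avg_news_count=4)
newsy_rally = StockObservation("BBB", past_return=0.15, news_count=12, avg_news_count=4)
print(reversal_signal(quiet_rally))  # -0.15: position against the rally
print(reversal_signal(newsy_rally))  # 0.0: no reversal bet
```

The point of the sketch is that the effect of `past_return` on the signal depends on a second variable, which is the kind of interaction a purely additive linear model would not represent unless the interaction term were specified by hand.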

And why now?

“ML, specifically neural networks, has been around since the 1940s, but there are two main reasons why the concept has only taken off more recently. The first is computational power. To put this in context, it would have taken several months to run the simplest ML model on the fanciest IBM or Bell Labs research computer back in the day. The turning point was in the 2000s when we witnessed exponential growth in computational power, facilitating the rise of applied research in ML to solve real-world problems.”

“The second reason is related to data as ML models require a lot of it for training purposes. The advent of big data and increasing ease of access – largely due to cloud computing – has been helpful. You can find data on just about anything these days and this has propelled research on ML applications given the increased scope for training. Luckily for us in finance, we also get to benefit from the initial work done by computer scientists in terms of applied research in ML.”

What do you think of the notion that ML models are black boxes?

“If you asked me this five to ten years ago, then I would say it is a fair statement because back then there was a lot of hype given the results ML techniques were producing. But there was not a lot of attention given to what lay under the hood. Since then, there have been significant advancements on this front – such as the development of the Explainable AI (XAI) toolkit – that allow users to better understand the predictions made by ML models.”

“For example, the Shapley value approach is an XAI method that allows us to interpret ML models by analyzing the relationship between the model inputs and outputs, how the different variables contribute to predicting outcomes, how the variables interact, etc. This level of understanding is in line with our investment philosophy that all our ideas need to be supported by an economic rationale. These tools allow us to see if ML models make decisions that are in line with our economic intuition.”
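
To make the Shapley idea concrete, here is a minimal sketch that computes exact Shapley values for a toy two-feature model by averaging each feature's marginal contribution over every ordering of the features. The model, the feature names and the all-zero baseline are assumptions made for illustration; they are not taken from the interview. Exact enumeration like this is only practical for a handful of features, which is why dedicated libraries rely on approximations.

```python
from itertools import permutations
from math import factorial


def toy_model(features: dict) -> float:
    """Hypothetical model: the reversal forecast fades as news intensity rises."""
    return -features["past_return"] * (1.0 - features["news_intensity"])


def shapley_values(model, x: dict, baseline: dict) -> dict:
    """Exact Shapley values of model(x) relative to model(baseline).

    For each ordering, features are switched from their baseline value to
    their actual value one at a time, and the change in the model output
    is credited to the feature that was just switched.
    """
    names = list(x)
    totals = {name: 0.0 for name in names}
    for order in permutations(names):
        current = dict(baseline)
        previous_output = model(current)
        for name in order:
            current[name] = x[name]
            new_output = model(current)
            totals[name] += new_output - previous_output
            previous_output = new_output
    n_orderings = factorial(len(names))
    return {name: total / n_orderings for name, total in totals.items()}


x = {"past_return": 0.2, "news_intensity": 0.5}
baseline = {"past_return": 0.0, "news_intensity": 0.0}
phi = shapley_values(toy_model, x, baseline)
print(phi)

# Efficiency property: the contributions add up to the prediction gap.
print(sum(phi.values()), toy_model(x) - toy_model(baseline))
```

In this toy case the rally pushes the forecast down and the news intensity pulls part of it back, so the two contributions have opposite signs, which matches the economic intuition the model was built to mimic.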

“That said, the bar for us to use ML models in our strategies is high given their complex nature. We have to be comfortable that we understand how they work, that they behave in the way that we would expect them to, and that they add value on top of our existing models. Without such XAI tools that transform ML models into ‘glass boxes’, we probably would not be able to explore the possibilities offered by ML.”

Natural language processing (NLP) has attracted a lot of attention in recent years. What are some interesting applications of NLP?

“NLP is a toolkit that can be used to analyze spoken words and text. This is quite exciting for us quant investors as it allows us to go to previously unexplored places. To put this in context, fundamental equity analysts examine broker research notes, analyze company reports, review news releases and meet with management teams, among other things. Using their expertise, they glean insights by reading between the lines. Quant investors can now potentially perform similar tasks with NLP techniques such as sentiment analysis.”

“For example, this allows us to scrutinize how brokers view a company based on how they write about it in their reports, enables us to analyze news sentiment based on the language used in articles pertaining to specific firms, and gives us the tools to assess the mood within a company based on the language used by its executives at press conferences compared to earnings calls. Moreover, this can be done swiftly across thousands of stocks. And this is just one of the many ways in which NLP can be used within quant models.”
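
As a rough illustration of sentiment analysis, the sketch below scores a piece of text by counting words from two small word lists. Everything here is a simplifying assumption: the word lists are made up for the example, and real NLP pipelines typically use trained language models or curated financial dictionaries rather than a dozen hand-picked words.

```python
import re

# Hypothetical mini-lexicons, for illustration only.
POSITIVE_WORDS = {"strong", "growth", "record", "profit", "improved", "confident"}
NEGATIVE_WORDS = {"weak", "loss", "decline", "impairment", "uncertain", "challenging"}


def sentiment_score(text: str) -> float:
    """Return a score in [-1, 1]: (positive - negative) / (positive + negative).

    Texts that contain no lexicon words score 0.0.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    positive = sum(token in POSITIVE_WORDS for token in tokens)
    negative = sum(token in NEGATIVE_WORDS for token in tokens)
    if positive + negative == 0:
        return 0.0
    return (positive - negative) / (positive + negative)


statements = {
    "AAA": "Strong growth and record profit this quarter.",
    "BBB": "Weak demand led to a loss and an impairment.",
    "CCC": "The annual meeting will be held on Tuesday.",
}
for ticker, text in statements.items():
    print(ticker, sentiment_score(text))
```

Because the function is cheap to evaluate, the same scoring can be applied to thousands of documents, which is the scalability advantage mentioned above; the quality of the result depends entirely on how good the underlying language model or lexicon is.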

But what if company executives adapt their use of words to circumvent this?

“This is classic game theory. In this scenario, quant investors start off by building NLP models to analyze the language used by executives. When the executives catch on to this, they change their communication style to disguise their sentiment. But everything comes full circle as quant investors can retrain their NLP models to catch onto the changes, until the executives make further tweaks to how they relay their messaging.”

“This iterative loop speaks to the concept of: if you want to innovate, then you need to innovate constantly. It is not only our competitors that will try to keep up with us or forge ahead, but also the companies that we invest in. It means we need to continuously update and improve the way we conduct our research and implement our strategies.”

Given the promising prospects of alternative data and advanced techniques, many asset managers are investigating and applying these techniques. What distinguishes Robeco’s approach?

“We were very deliberate in how we approached the incorporation of alternative data and advanced techniques into our research and strategies. We focused firstly on laying the foundations by heavily investing in the infrastructure. We wanted to ensure that we would be able to use these datasets and tools in a robust and repeatable manner, while also being able to seamlessly integrate ML or NLP models into new or existing strategies.”

“We were aware of the risk of spending valuable hours on research as well as building ML and NLP models, and then being thwarted by the complexities of the practical implementation of these models. As a result of our initial investment, the production lead time to deploy new ML and NLP research in our strategies is relatively short.”

“I believe this gives us a competitive edge, as setting up state-of-the-art infrastructure requires a lot of resources, technical expertise and time to see it through to completion. After three or so years of hard work on this project, we are proud of the results and can fully focus on our research pipeline and on implementing our best ideas. This has started to happen as of last year with the inclusion of a distress risk ML model in our strategies that forecasts stock price crash risk.”
