01-03-2023 · Interview

'Machine learning models can spot interesting interactions'

Buzzwords such as ‘alternative data’, ‘machine learning’ and ‘natural language processing’ are quickly becoming part of the jargon used by asset managers. We uncover what these mean for the Robeco Quant Team in our discussion with Quant Researcher Clint Howard.

    Author

  • Lusanele Magwa - Investment Specialist

The growing prominence of big data is widening the scope for quant strategies. So, given the multitude of new alternative datasets cropping up, how do you select which ones to use?

“Our research initiatives are premised on ideas that are driven by fundamental economic reasons. As quant investors, we have traditionally used financial statement and market data to conduct such research. Now with the deluge of alternative datasets, we have additional information that we can use and different ways to study our ideas. That said, it is important to be discerning about which datasets can add value.”

“Because we intentionally focus on the economic rationale behind our ideas before selecting data sources (whether alternative or traditional), it allows us to be quite deliberate in picking the datasets that we believe will actually answer the questions we are studying. If you do not start with the economic principles, you face the potential risk of overfitting a model and weakening its predictive power as ill-suited datasets might be chosen.”

“For example, big text data such as broker reports, company announcements and news filings are a rich treasure trove given the large volumes of data available. But these data sources only add value to our process if we can use them to research the economic intuition behind our market observations or hypotheses. Alternative datasets are, therefore, a means to an end, but not the be-all and end-all.”

Data vendors can offer the same datasets to competing asset managers. So how does the Robeco Quant Team gain unique insights?

“This is true: data vendors market and sell their datasets to several asset managers, as that is the nature of their business. So if investors just plug the data into their models or strategies in the same form they receive it, then they run the risk of falling prey to alpha decay and crowding issues, as their peers can easily do the same thing.”

“There are a few ways to address this. An approach we favor is sourcing datasets that are as raw as possible, with minimal alterations made by a vendor. This allows us to transform the granular data so that it is suited to the economic problems we are trying to study. This enables us to incorporate our unique insights and domain knowledge, therefore differentiating our use of the data from competitors’.”

“It is important to stress again that we always start any research we do based on economic intuition. This means that we have a sensible idea about why something might work. Only then do we hunt for the datasets that we can use to either validate or refute our intuition. By following this approach, we believe the possibility of using a dataset in exactly the same manner as another asset manager diminishes.”

What can we do with machine learning (ML) that was not easy to do before?

“For decades, standard linear modeling has been the go-to approach in quant models and has laid the foundation for the success achieved by the investment style over the years. These models essentially impose linear relationships between variables, from which patterns can be deduced to establish alpha signals, risk models or portfolio construction algorithms, for example.”

“ML provides quant investors with an extra toolkit to study economic problems (or reveal such patterns). This flexible and powerful framework – through the use of techniques such as neural networks and random forests – can uncover nonlinear relationships between variables as well as how variables interact with each other. This can provide quant investors with additional insight for signal construction.”

“For example, ML models can spot interesting interactions such as the one between newsflow and stock-price reversals. One of the patterns observed in markets is that when a firm’s share price goes up (or down) by a big margin, it tends to revert back down (or up). Interestingly, we find that this reversal phenomenon is affected by the level of abnormal newsflow related to the stocks in question.”

“Specifically, if there has been more newsflow than average on a stock around a time when its share price rallies or sinks, it does not tend to revert. The intuition behind this is that there is probably a genuine reaction to a change in fundamentals if there has been a lot of news covering a recent event. But in the absence of significant newsflow, we do tend to see the reversal pattern in stocks, suggesting that the initial move was probably based on noise rather than fundamentals. So these kinds of insights are really interesting for us.”
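The interaction described above can be sketched as a hand-written rule. This is only an illustration, not Robeco's model: an ML model would learn such a relationship from data rather than have it coded in. The class `StockObservation`, the function names and both thresholds (`move_threshold`, `news_threshold`) are hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class StockObservation:
    ticker: str            # hypothetical identifier
    past_return: float     # recent return, e.g. 0.12 means +12%
    news_count: int        # articles published in the recent window
    avg_news_count: float  # typical article count for this stock


def abnormal_newsflow(obs: StockObservation) -> float:
    """Ratio of recent news volume to the stock's own average."""
    if obs.avg_news_count <= 0:
        return 0.0
    return obs.news_count / obs.avg_news_count


def reversal_signal(obs: StockObservation,
                    move_threshold: float = 0.10,
                    news_threshold: float = 1.5) -> float:
    """Bet against a large price move only when newsflow was not abnormal.

    Both thresholds are illustrative assumptions, not calibrated values.
    """
    if abs(obs.past_return) < move_threshold:
        return 0.0  # move too small to expect a reversal
    if abnormal_newsflow(obs) >= news_threshold:
        return 0.0  # heavy news: the move may reflect fundamentals
    return -obs.past_return  # quiet news: expect the move to revert


quiet = StockObservation("AAA", past_return=0.15, news_count=2, avg_news_count=4.0)
noisy = StockObservation("BBB", past_return=0.15, news_count=10, avg_news_count=4.0)
print(reversal_signal(quiet), reversal_signal(noisy))
```

In this sketch the same price move yields a reversal bet for the stock with quiet newsflow and no bet for the stock with heavy newsflow, which is the interaction effect the answer describes.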

And why now?

“ML, specifically neural networks, has been around since the 1940s, but there are two main reasons why the concept has only taken off more recently. The first reason is computational power. To put this in context, it would have taken several months to run the simplest ML model on the fanciest IBM or Bell Labs research computer back in the day. The turning point was in the 2000s when we witnessed exponential growth in computational power, facilitating the rise of applied research in ML to solve real-world problems.”

“The second reason is related to data as ML models require a lot of it for training purposes. The advent of big data and increasing ease of access – largely due to cloud computing – has been helpful. You can find data on just about anything these days and this has propelled research on ML applications given the increased scope for training. Luckily for us in finance, we also get to benefit from the initial work done by computer scientists in terms of applied research in ML.”

What do you think of the notion that ML models are black boxes?

“If you had asked me this five to ten years ago, I would have said it is a fair statement because back then there was a lot of hype given the results ML techniques were producing. But there was not a lot of attention given to what lay under the hood. Since then, there have been significant advancements on this front – such as the development of the Explainable AI (XAI) toolkit – that allow users to better understand the predictions made by ML models.”

“For example, Shapley values are an XAI method that allows us to interpret ML models by analyzing the relationship between the model inputs and outputs, how the different variables contribute to predicting outcomes, how the variables interact, etc. This level of understanding is in line with our investment philosophy that all our ideas need to be supported by an economic rationale. These tools allow us to see if ML models make decisions that are in line with our economic intuition.”
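To make the idea of Shapley values concrete, here is a minimal sketch that computes them exactly for a toy two-feature model by averaging each feature's marginal contribution over every ordering of the features. The model `toy_model`, its inputs and the all-zero baseline standing in for an "absent" feature are assumptions made for illustration. Brute-force enumeration is only feasible for a handful of features, and practical XAI tools rely on approximations instead.

```python
from itertools import permutations
from typing import Callable, List, Sequence


def shapley_values(predict: Callable[[Sequence[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values via all feature orderings (small inputs only)."""
    n = len(x)
    contributions = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)      # start with every feature "absent"
        previous = predict(current)
        for i in order:
            current[i] = x[i]         # switch feature i to its actual value
            updated = predict(current)
            contributions[i] += updated - previous
            previous = updated
    return [c / len(orderings) for c in contributions]


def toy_model(features: Sequence[float]) -> float:
    """Hypothetical forecast: a reversal that is dampened by newsflow."""
    past_return, newsflow = features
    return -past_return * (1.0 - newsflow)


x = [0.2, 0.5]          # past return of +20%, newsflow intensity of 0.5
baseline = [0.0, 0.0]   # assumed reference point
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The attributions sum to the difference between the model's prediction at `x` and at the baseline, which is the property that makes Shapley values useful for checking whether each input pushes the forecast in the direction economic intuition would suggest.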

“That said, the bar for us to use ML models in our strategies is high given their complex nature. We have to be comfortable that we understand how they work, that they behave in the way that we would expect them to, and that they add value on top of our existing models. Without such XAI tools that transform ML models into ‘glass boxes’, we probably would not be able to explore the possibilities offered by ML.”

Natural language processing (NLP) has attracted a lot of attention in recent years. What are some interesting applications of NLP?

“NLP is a toolkit that can be used to analyze spoken words and text. This is quite exciting for us quant investors as it allows us to go to previously unexplored places. To put this in context, fundamental equity analysts examine broker research notes, analyze company reports, review news releases and meet with management teams, among other things. Using their expertise, they glean insights by reading between the lines. Quant investors can now potentially perform similar tasks with NLP techniques such as sentiment analysis.”

“For example, this allows us to scrutinize how brokers view a company based on how they write about it in their reports, enables us to analyze news sentiment based on the language used in articles pertaining to specific firms, and gives us the tools to assess the mood within a company based on the language used by its executives at press conferences compared to earnings calls. Moreover, this can be done swiftly across thousands of stocks. And this is just one of the many ways in which NLP can be used within quant models.”
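As a rough illustration of sentiment analysis, the sketch below scores a piece of text by counting words from two small word lists. It is a deliberately simple, lexicon-based stand-in and not the approach described in the interview: the word lists `POSITIVE` and `NEGATIVE` and the function names are hypothetical, and real NLP models are far richer than word counting.

```python
import re

# Tiny illustrative word lists; a real lexicon would be much larger.
POSITIVE = {"growth", "strong", "improved", "confident", "record"}
NEGATIVE = {"decline", "weak", "uncertain", "impairment", "challenging"}


def sentiment_score(text: str) -> float:
    """Return a score in [-1, 1]: (positive - negative) / matched words."""
    words = re.findall(r"[a-z']+", text.lower())
    positive = sum(word in POSITIVE for word in words)
    negative = sum(word in NEGATIVE for word in words)
    matched = positive + negative
    if matched == 0:
        return 0.0  # no sentiment-bearing words found
    return (positive - negative) / matched


def tone_gap(press_conference: str, earnings_call: str) -> float:
    """Difference in tone between two statements by the same executives."""
    return sentiment_score(press_conference) - sentiment_score(earnings_call)


press = "We delivered record growth and remain confident."
call = "Demand was weak and the outlook is uncertain despite strong margins."
print(sentiment_score(press), sentiment_score(call), tone_gap(press, call))
```

Because the scoring is a plain function of the text, it can be applied in the same way across thousands of documents, which is what makes this kind of analysis scalable for a quant process.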

But what if company executives adapt their use of words to circumvent this?

“This is classic game theory. In this scenario, quant investors start off by building NLP models to analyze the language used by executives. When the executives catch on to this, they change their communication style to disguise their sentiment. But everything comes full circle as quant investors can retrain their NLP models to catch on to the changes, until the executives make further tweaks to how they relay their messaging.”

“This iterative loop speaks to a simple principle: if you want to innovate, then you need to innovate constantly. It is not only our competitors that will try to keep up with us or forge ahead, but also the companies that we invest in. It means we need to continuously update and improve the way we conduct our research and implement our strategies.”

Given the promising prospects of alternative data and advanced techniques, many asset managers are investigating and applying these techniques. What distinguishes Robeco’s approach?

“We were very deliberate in how we approached the incorporation of alternative data and advanced techniques into our research and strategies. We focused firstly on laying the foundations by heavily investing in the infrastructure. We wanted to ensure that we would be able to use these datasets and tools in a robust and repeatable manner, while also being able to seamlessly integrate ML or NLP models into new or existing strategies.”

“We were aware of the risk of spending valuable hours on research as well as building ML and NLP models, and then being thwarted by the complexities of the practical implementation of these models. As a result of our initial investment, the production lead time to deploy new ML and NLP research in our strategies is relatively short.”

“I believe this gives us a competitive edge, as setting up state-of-the-art infrastructure requires a lot of resources, technical expertise and time to see it to completion. After three or so years of hard work on this project, we are proud of the results and can fully focus on our research pipeline and on implementing our best ideas. This has started to happen as of last year with the inclusion of a distress risk ML model in our strategies that forecasts stock price crash risk.”

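Disclaimer

This document has been prepared by Robeco Overseas Investment Fund Management (Shanghai) Limited Company (“Robeco Shanghai”). Its contents are for reference only and do not constitute any recommendation, professional opinion, offer, solicitation or invitation by Robeco Shanghai to any person to buy or sell any product. This document should not be regarded as a recommendation to buy or sell any investment product or as advice to adopt any investment strategy. Nothing in this document should be regarded as legal, tax or investment advice, nor does it indicate that any investment or strategy is suitable for your personal circumstances or otherwise constitute a personal recommendation to you. The information and/or analysis contained in this document has been prepared on the basis of information obtained from sources that Robeco Shanghai believes to be reliable. Robeco Shanghai makes no representation as to its accuracy, correctness, usefulness or completeness, and accepts no liability for any loss arising from the use of the information and/or analysis in this document. Neither Robeco Shanghai nor any of its affiliates, nor their directors, officers or employees, shall bear any liability or obligation for any direct or indirect loss or damage, or any other consequence, suffered by any person as a result of relying on the information contained in this document. This document contains forward-looking statements and forecasts regarding future business, objectives, management discipline or other matters. These statements involve assumptions, risks and uncertainties, and are based on information available as of the date this document was prepared. Accordingly, we cannot guarantee that these forward-looking scenarios will materialize, and actual outcomes may differ from the statements in this document. We cannot guarantee that the statistical information in this document is accurate, appropriate and complete under any particular conditions, nor that this statistical information and the assumptions from which it is derived reflect the market conditions or future performance that Robeco Shanghai may encounter. The information in this document is based on current market conditions, which are likely to change as a result of subsequent market events or for other reasons, so the contents of this document may not reflect the latest situation. Robeco Shanghai is not responsible for updating this document or for correcting any inaccurate or omitted information in it.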