04-12-2023 · Market views

Using machine learning for emerging market equity returns

Machine learning algorithms and models have large potential for investing in emerging stock markets, says quant researcher Laurens Swinkels.

    Authors

  • Laurens Swinkels - Head of Quant Strategy

  • Matthias Hanauer - Researcher

Machine learning algorithms have surged in popularity among academics and practitioners seeking to determine whether they can enhance returns. Robeco’s quant team put this to the test by examining what applying such algorithms would mean for investing in emerging market equities [1]. The results were as useful as the machine learning models themselves.

We discovered that they excel at detecting financially material non-linear relationships between company characteristics – a feat that would be challenging for human researchers. We also found that using ensembling, or the ‘wisdom of the crowd’ for machine learning models, could increase expected returns net of trading costs by up to 2% per annum for equity investors.

The results came from analyzing more than 15,000 unique stocks from 32 countries between 1990 and 2021. We used 36 standard characteristics that can apply to both developed and emerging markets for the study, and opted not to introduce any new ones to highlight the added value that machine learning techniques can bring. This ensured that any additional performance gleaned wasn’t just the result of novel data but accrued to well-known factors such as low-risk, valuation, momentum and quality.

Different algorithms were then used to predict each stock’s return relative to its own country market index based on these characteristics. The least complex method assumes that each firm characteristic has a linear relationship with a stock’s outperformance.

Three machine learning methods were used to improve upon straightforward linear regression.

  • Elastic net. This method aims to reduce the number of characteristics (36 in our case) by eliminating those with the lowest or no forecasting ability. It also minimizes the potential noise that may be present in a sample that could impair out-of-sample predictive performance. This method does not detect data-driven non-linear relationships or interaction effects.

  • Tree-based methods. Random forests and gradient-boosted regression trees follow the idea of sequentially partitioning the underlying data into groups based on firm characteristics – ‘growing’ a tree. New branches are created every time the data is separated. At each new branch, the characteristic that generates the biggest separation in the data is selected, with the tree growing as deep as the researcher allows and each final group ending in a leaf.

  • Neural networks. These are flexible models that connect multiple layers. They consist of an input layer of firm characteristics and at least one hidden layer of activation functions. An output layer aggregates the hidden layers’ outcomes into a return prediction. When a model uses more than one hidden layer – ours uses up to five – it is sometimes referred to as a deep learning model.
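To make the tree-growing step above concrete, here is a minimal sketch of how a single branch might be chosen. It assumes that "biggest separation" means the largest reduction in the sum of squared errors of returns, which is a common convention but not confirmed by the article. The function names and toy data are hypothetical.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, returns):
    """Return (feature_index, threshold, sse_reduction) for the best single split.

    rows: list of lists of firm characteristics; returns: list of floats.
    Assumption: separation is measured as the reduction in squared error.
    """
    total = sse(returns)
    best = (None, None, 0.0)
    for j in range(len(rows[0])):
        # Candidate thresholds: every observed value except the largest.
        for threshold in sorted({r[j] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, returns) if r[j] <= threshold]
            right = [y for r, y in zip(rows, returns) if r[j] > threshold]
            reduction = total - sse(left) - sse(right)
            if reduction > best[2]:
                best = (j, threshold, reduction)
    return best


# Toy data: two hypothetical characteristics per stock and next-month relative returns.
rows = [[0.1, 0.9], [0.2, 0.1], [0.8, 0.8], [0.9, 0.2]]
returns = [-0.02, -0.01, 0.03, 0.04]
feature, threshold, reduction = best_split(rows, returns)
```

In this toy example the first characteristic separates low-return from high-return stocks far better than the second, so it is selected for the branch. Random forests and gradient boosting repeat this kind of step many times across many trees.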

With 1990 to 2001 as our initialization period, we used data from the first half for training and the second half for validation. We trained the models on our entire set of emerging market stock returns and refrained from developing country-specific models, because some evidence suggests these may lead to overfitting, which reduces out-of-sample performance.
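The chronological split described above can be sketched as follows. This assumes calendar years are the unit and that "first half" means the first six of the twelve initialization years; the helper name is hypothetical.

```python
def split_initialization(years, train_fraction=0.5):
    """Split an initialization period chronologically into training and validation years."""
    ordered = sorted(years)
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# 1990 to 2001 inclusive, as stated in the article.
train_years, validation_years = split_initialization(range(1990, 2002))
```

Keeping the validation years strictly after the training years avoids tuning model settings on information from the future.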

We can then rank each of the 36 variables in order of their importance by evaluating the negative impact on prediction performance when the variable is left out and the rest of the model remains unchanged. We found that the models make similar choices regarding the most influential characteristics, with price to its 52-week high, idiosyncratic volatility, and turnover being the three most important.
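The importance ranking described above could be sketched roughly as below. It assumes that "leaving a variable out" means replacing it with a neutral value of zero while the fitted model stays fixed, and that prediction performance is measured by R-squared. Both are assumptions, and `toy_model` is a hypothetical stand-in for a fitted model.

```python
def r_squared(actual, predicted):
    """Standard R^2 of predictions against realized values."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def variable_importance(model, rows, actual, neutral=0.0):
    """Drop in R^2 when each characteristic is replaced by a neutral value."""
    base = r_squared(actual, [model(r) for r in rows])
    importance = []
    for j in range(len(rows[0])):
        masked = [r[:j] + [neutral] + r[j + 1:] for r in rows]
        importance.append(base - r_squared(actual, [model(r) for r in masked]))
    return importance


def toy_model(row):
    """Hypothetical fitted model: strong loading on characteristic 0, weak on 1, none on 2."""
    return 0.05 * row[0] + 0.01 * row[1] + 0.0 * row[2]


rows = [[-1.0, 0.5, 0.3], [0.0, -0.5, -0.2], [1.0, 0.2, 0.9], [0.5, -1.0, -0.4]]
actual = [toy_model(r) for r in rows]  # noiseless targets, for illustration only
scores = variable_importance(toy_model, rows, actual)
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```

The characteristic whose removal hurts the fit most ranks first; a characteristic the model ignores gets a score of zero.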

Momentum and short-term reversal are also among the top 15, as are the price/earnings ratio and profitability. This is information that is worth having. Detecting interaction effects among the 36 variables would be incredibly time-consuming and difficult for a human researcher, whereas a machine learning model is able to find these relationships quickly and systematically.

Investment performance

So the models work in theory, but how do these interaction effects actually affect investment performance? For investors, it may be more relevant to back-test the signals coming from these models, allowing us to compare the risk and return of the resulting portfolios.

To test this, we formed five portfolios based on the machine-predicted excess returns of each stock relative to its country index. We then calculated the return in the next month, using market capitalization-based portfolio weights within each portfolio. Starting in our out-of-sample period from January 2002, we repeated this each month until December 2021, when our sample ends. The results can be seen in the chart below.

[Figure: using-machine-learning-for-emerging-market-equity-returns-fig1.jpg]

Source: Robeco, Hanauer and Kalsbach (2023) using data from January 2002 to December 2021
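The monthly portfolio formation step described above might look like the following sketch for a single month. It assumes the long/short return is the top-predicted portfolio minus the bottom-predicted one and that the five portfolios hold equal numbers of stocks. The field names and toy numbers are hypothetical.

```python
def quintile_long_short(stocks):
    """Form five value-weighted portfolios sorted on predicted relative return.

    stocks: list of dicts with 'pred' (predicted relative return),
    'cap' (market capitalization) and 'next_ret' (next-month return).
    Returns (portfolio_returns, top_minus_bottom).
    """
    ranked = sorted(stocks, key=lambda s: s["pred"])
    n = len(ranked)
    portfolio_returns = []
    for q in range(5):
        bucket = ranked[q * n // 5:(q + 1) * n // 5]
        total_cap = sum(s["cap"] for s in bucket)
        # Market capitalization-based weights within each portfolio.
        portfolio_returns.append(
            sum(s["cap"] * s["next_ret"] for s in bucket) / total_cap
        )
    return portfolio_returns, portfolio_returns[-1] - portfolio_returns[0]


toy = [
    (-0.04, 100, -0.02), (-0.03, 300, -0.01), (-0.02, 200, 0.00),
    (-0.01, 200, 0.00), (0.00, 150, 0.01), (0.01, 150, 0.01),
    (0.02, 400, 0.01), (0.03, 100, 0.02), (0.04, 100, 0.03),
    (0.05, 300, 0.01),
]
stocks = [{"pred": p, "cap": c, "next_ret": r} for p, c, r in toy]
portfolio_returns, long_short = quintile_long_short(stocks)
```

Repeating this each month from January 2002 to December 2021 and averaging the long/short returns would give a figure comparable in spirit to the monthly averages reported in the article.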

On average, the long/short portfolios derived from the two linear models, namely regression and elastic net, returned around 0.8% per month. This is substantial and shows that conventional quantitative models are able to generate excess returns in emerging stock markets, a finding consistent with earlier studies on factor investing in emerging markets.

The random forest and gradient-boosted regression tree methods generated higher returns of around 1.0% per month, while the neural networks method and a combination of all machine learning models delivered 1.2% per month. In short, linear models are good, but machine learning models are better.
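One simple way to combine models is to average their predictions for each stock. The sketch below assumes equal weights across models, which the article does not specify. The model names and numbers are purely illustrative.

```python
def ensemble_predictions(model_predictions):
    """Average each stock's prediction across models (equal weights, an assumption)."""
    names = list(model_predictions)
    n_stocks = len(model_predictions[names[0]])
    return [
        sum(model_predictions[m][i] for m in names) / len(names)
        for i in range(n_stocks)
    ]


# Hypothetical predicted relative returns for three stocks from three models.
preds = {
    "elastic_net": [0.010, -0.020, 0.000],
    "random_forest": [0.020, -0.010, 0.010],
    "neural_net": [0.030, -0.030, -0.010],
}
combined = ensemble_predictions(preds)
```

Averaging tends to cancel out model-specific noise, which is the intuition behind the "wisdom of the crowd" description.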

Going back to basics

This does lead to the question of whether this is just a fancy way to pick up the conventional quantitative factors that have been employed in the investment industry for decades. Indeed, as the red bars show, a substantial part of the raw excess returns can be explained by these well-known factors.

On the one hand, this confirms that traditional factor investing can still predict future returns. On the other hand, it also shows that machine learning models give us greater, economically important insight that can bring even higher returns. The linear models show there is about 0.2% per month of alpha left to capture, which increases to 0.5% per month for the tree-based models, and 0.7% per month for the neural network method and the machine learning ensemble.
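The split between factor-explained return and remaining alpha can be illustrated with a regression of strategy returns on factor returns, where the intercept is the alpha. The sketch below uses a single hypothetical factor for brevity, whereas the study controls for several well-known factors. All numbers are made up.

```python
def alpha_beta(strategy_returns, factor_returns):
    """Ordinary least squares of strategy returns on one factor: returns (alpha, beta)."""
    n = len(strategy_returns)
    mean_y = sum(strategy_returns) / n
    mean_x = sum(factor_returns) / n
    cov = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(factor_returns, strategy_returns)
    )
    var = sum((x - mean_x) ** 2 for x in factor_returns)
    beta = cov / var
    # Alpha is the average return left after removing the factor-explained part.
    return mean_y - beta * mean_x, beta


# Hypothetical monthly returns: the strategy earns 0.5% plus half the factor return.
factor = [0.01, -0.02, 0.03, 0.00]
strategy = [0.005 + 0.5 * f for f in factor]
alpha, beta = alpha_beta(strategy, factor)
```

Here the regression recovers an alpha of about 0.5% per month and a factor loading of about 0.5, mirroring how raw returns are divided into a factor-explained part and a remainder.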

Hence, using machine learning signals is more profitable than conventional factor investing alone. Even accounting for transaction costs and short-selling constraints, we see that this type of forecast can lead to significant net outperformance over the market, and can be recommended to investors.

Footnote

[1] See Hanauer and Kalsbach (2023), Machine learning and the cross-section of emerging market stock returns, Emerging Markets Review 55 (2023), 101022.

This article is an excerpt of a special topic in our five-year outlook.
