17-12-2024 · Research

Better by design: Why human choices matter for return predictions via machine learning

Machine learning (ML) models have become increasingly popular for predicting stock returns, both in academic research and in industry practice. However, because the field is still developing, key design choices still vary widely. Recent research systematically explores these choices and shows how directly they affect the performance of ML strategies.

    Author

  • Matthias Hanauer - Researcher

Choices, choices

The paper by Minghui Chen, Matthias Hanauer, and Tobias Kalsbach, titled ‘Design choices, machine learning, and the cross-section of stock returns’, identifies several key design choices researchers have to make when training ML models. For instance, when setting the prediction (target) variable, should the researcher employ the excess return over the risk-free rate or the abnormal return relative to the market? Is it better to use a continuous target variable or are categories, such as outperformers vs. underperformers, preferable? Is it better to train models based on a rolling window that leads to more adaptive models, or are models based on expanding windows superior, thanks to the availability of more training data?
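As a rough illustration of the questions above, the following sketch builds the alternative targets and training windows for a toy cross-section. It is not the paper's code: the function names, the median split used for the categorical target, and the integer month index are all assumptions made for this example.

```python
import numpy as np

def excess_return(stock_ret, risk_free):
    """Target option 1: return in excess of the risk-free rate."""
    return np.asarray(stock_ret, dtype=float) - risk_free

def market_abnormal_return(stock_ret, market_ret):
    """Target option 2: return relative to the market return."""
    return np.asarray(stock_ret, dtype=float) - market_ret

def to_classes(target):
    """Categorical target (assumed rule): 1 if above the cross-sectional median, else 0."""
    target = np.asarray(target, dtype=float)
    return (target > np.median(target)).astype(int)

def training_months(current, first, window=None):
    """Months available for training before `current`.

    window=None gives an expanding window (all history since `first`);
    an integer gives a rolling window of the most recent `window` months.
    """
    start = first if window is None else max(first, current - window)
    return list(range(start, current))

# Toy one-month cross-section of four stocks (made-up numbers).
stock_ret = [0.05, -0.02, 0.01, 0.03]
continuous_target = market_abnormal_return(stock_ret, market_ret=0.02)
categorical_target = to_classes(continuous_target)

expanding = training_months(current=10, first=0)           # months 0..9
rolling = training_months(current=10, first=0, window=3)   # months 7, 8, 9
```

The rolling window discards older months and so adapts faster, while the expanding window keeps every observation, which is exactly the trade-off the question about training windows raises.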

To assess the importance of such choices, the authors identify seven key design choices and examine all possible combinations, resulting in a total of 1,056 ML models. The study trains each model on a common set of signals (features) for the US stock market and evaluates its out-of-sample performance using hypothetical top-minus-bottom decile portfolios.
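Examining every combination of design choices amounts to taking a Cartesian product over the options. The sketch below enumerates only the three choices that this article names explicitly, with hypothetical option labels; it does not reproduce the paper's full set of seven choices or its 1,056 models.

```python
from itertools import product

# Only the choices named in this article; labels are illustrative assumptions.
design_choices = {
    "target_benchmark": ["risk_free", "market", "capm_beta_adjusted"],
    "target_type": ["continuous", "categorical"],
    "training_window": ["rolling", "expanding"],
}

def enumerate_designs(choices):
    """Return one dict per combination of options across all design choices."""
    names = list(choices)
    return [dict(zip(names, combo)) for combo in product(*choices.values())]

designs = enumerate_designs(design_choices)
print(len(designs))  # 12 combinations: 3 * 2 * 2
```

Each additional design choice multiplies the number of models to train, which is how a handful of decisions grows into more than a thousand specifications.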

Figure 1 reveals that portfolio returns vary substantially across different model designs, with monthly mean returns ranging from 0.13% to 1.98% and annualized Sharpe ratios ranging from 0.08 to 1.82 (footnote 1). This variation highlights the substantial impact of human design choices on the performance of ML strategies.
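For readers who want to compute the two statistics quoted above from a series of monthly long-short returns, a minimal sketch follows. It assumes the common convention of scaling the monthly Sharpe ratio by the square root of 12 and treats the long-short return as already self-financing; the article does not state which convention the paper uses.

```python
import numpy as np

def monthly_mean_and_annualized_sharpe(monthly_returns):
    """Mean monthly return and annualized Sharpe ratio (sqrt(12) scaling assumed)."""
    r = np.asarray(monthly_returns, dtype=float)
    mean = r.mean()
    std = r.std(ddof=1)  # sample standard deviation
    return mean, np.sqrt(12) * mean / std

# Made-up monthly long-short returns, for illustration only.
mean, sharpe = monthly_mean_and_annualized_sharpe([0.01, 0.03, -0.01, 0.02])
```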

Figure 1 | Cumulative performance of machine learning strategies

Source: Robeco, Chen et al. (2024). This figure shows the cumulative performance of a USD 1 initial investment in long-short ML portfolios for each possible combination of the research design choices. For each ML model and month, we first cross-sectionally sort all stocks based on their one-month-ahead return predictions. We then construct the value-weighted long-short portfolios by going long the top decile and short the bottom decile stocks. The solid black line represents the strategy with the median cumulative performance for each month, and the dashed black lines represent the 10th and 90th percentiles of each month, respectively. The sample period is from January 1987 to December 2021.
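The portfolio construction described in the caption can be sketched for a single month as follows. This is an assumption-laden illustration rather than the authors' implementation: the decile breakpoints come from `np.quantile` over all stocks, ties at the breakpoints are included in the legs, and the function and argument names are hypothetical.

```python
import numpy as np

def long_short_decile_return(predicted, realized, market_cap):
    """Value-weighted top-decile minus bottom-decile return for one month."""
    predicted = np.asarray(predicted, dtype=float)
    realized = np.asarray(realized, dtype=float)
    market_cap = np.asarray(market_cap, dtype=float)

    # Cross-sectional decile breakpoints of the return predictions.
    low, high = np.quantile(predicted, [0.1, 0.9])
    long_leg = predicted >= high
    short_leg = predicted <= low

    # Value weighting: weights proportional to market capitalization.
    long_ret = np.average(realized[long_leg], weights=market_cap[long_leg])
    short_ret = np.average(realized[short_leg], weights=market_cap[short_leg])
    return long_ret - short_ret

# Toy cross-section of 20 stocks (made-up numbers).
predicted = np.arange(20, dtype=float)
realized = predicted / 100.0
market_cap = np.ones(20)
market_cap[19] = 3.0
spread = long_short_decile_return(predicted, realized, market_cap)
```

Repeating this calculation every month with fresh predictions yields the return series whose cumulative value the figure plots.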

Machine learning models: Separating the wheat from the chaff

Having documented the substantial variation in the performance of ML models, the study also provides actionable guidance for ML model design:

  • Ensembles of ML models typically outperform individual algorithms.

  • The choice of target variable depends on the investment objective:
    o For identifying relative winners and losers among stocks, predicting stock returns over the market rather than the risk-free rate is better.
    o If the goal is to achieve high market-risk-adjusted returns, CAPM beta-adjusted returns are better.

  • Non-linear ML models are more likely to outperform their linear counterparts when:
    o using abnormal returns relative to the market as the target variable,
    o employing continuous target returns, or
    o adopting expanding training windows.
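To make the first recommendation concrete, one simple way to form an ensemble is to average the return predictions of several models stock by stock. The article does not describe how the paper builds its ensembles, so the equal-weighted average below is an assumption chosen for illustration.

```python
import numpy as np

def ensemble_prediction(model_predictions):
    """Equal-weighted average of predictions; rows are models, columns are stocks."""
    preds = np.asarray(model_predictions, dtype=float)
    return preds.mean(axis=0)

# Made-up predictions from three hypothetical models for four stocks.
predictions = [
    [0.010, 0.020, -0.010, 0.000],
    [0.030, 0.000, -0.020, 0.010],
    [0.020, 0.010, -0.030, 0.020],
]
combined = ensemble_prediction(predictions)
```

Averaging tends to dampen the idiosyncratic errors of any single algorithm, which is one common explanation for why ensembles hold up well when the signal-to-noise ratio is low.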


Conclusion

While computational infrastructure, ML algorithms, and data have become significantly more accessible over the past decade or two, model design remains a critical component of success. At first glance, it might seem that an ML investment strategy requires only a few basic elements: cloud computing capacity, generic factor data, some Python packages, and a couple of data scientists. However, this approach often lacks the crucial domain knowledge that Robeco has cultivated over 20 years in quant investing. In financial markets, where the signal-to-noise ratio is low and the risk of overfitting is high, investment experience and economic intuition still play a pivotal role. Robeco’s extensive expertise ensures that ML models focus on meaningful patterns and avoid common pitfalls, bridging the gap between technology and investment insight.


Footnote

1. Please note that these are hypothetical gross returns for long-minus-short strategies that do not consider any transaction costs. We investigated the impact of transaction costs on ML strategies in our study ‘The term structure of machine learning alpha’.

