17-12-2024 · Research

Better by design: Why human choices matter for return predictions via machine learning

Machine learning (ML) models have become increasingly popular for predicting stock returns, both in academic research and industry practice. However, because the field is still developing, key design choices vary widely from one study to the next. Recent research systematically explores these choices and shows how directly they affect the performance of ML strategies.

Choices, choices

The paper by Minghui Chen, Matthias Hanauer, and Tobias Kalsbach, titled ‘Design choices, machine learning, and the cross-section of stock returns’, identifies several key design choices researchers have to make when training ML models. For instance, when setting the prediction (target) variable, should the researcher employ the excess return over the risk-free rate or the abnormal return relative to the market? Is it better to use a continuous target variable or are categories, such as outperformers vs. underperformers, preferable? Is it better to train models based on a rolling window that leads to more adaptive models, or are models based on expanding windows superior, thanks to the availability of more training data?
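To make these alternatives concrete, here is a minimal Python sketch of the two target definitions, a categorical version of the target, and the two training-window schemes. All function names are hypothetical, and the median split used for the categories is an assumption for illustration; the article does not say how the paper defines its categories.

```python
from statistics import median


def excess_return(stock_return, risk_free_rate):
    """Target option 1: return in excess of the risk-free rate."""
    return stock_return - risk_free_rate


def market_abnormal_return(stock_return, market_return):
    """Target option 2: abnormal return relative to the market."""
    return stock_return - market_return


def to_category(targets):
    """Categorical target (assumed rule): 1 if above the cross-sectional median, else 0."""
    cutoff = median(targets)
    return [1 if t > cutoff else 0 for t in targets]


def training_months(months, test_index, scheme, window=None):
    """Months available to train a model that predicts months[test_index].

    'expanding' uses all history before the test month;
    'rolling' uses only the most recent `window` months.
    """
    if scheme == "expanding":
        return months[:test_index]
    if scheme == "rolling":
        if window is None or window <= 0:
            raise ValueError("rolling scheme needs a positive window")
        start = max(0, test_index - window)
        return months[start:test_index]
    raise ValueError(f"unknown scheme: {scheme}")
```

With ten months of history and a three-month window, the rolling scheme trains on the three months just before the test month, while the expanding scheme trains on everything before it. That is the trade-off between adaptiveness and sample size described above.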

To assess the importance of such choices, the authors identify seven key design choices and examine all possible combinations of them, resulting in a total of 1,056 ML models. The study trains each model on a common set of signals (features) for the US stock market and evaluates its out-of-sample performance using hypothetical top-minus-bottom decile portfolios.
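Enumerating every combination of design choices is a plain Cartesian product. The sketch below uses only the three choices this article names, with two options each, so it yields 8 designs rather than the study's 1,056. The dictionary keys and option labels are hypothetical, and the paper's full grid is not reproduced here.

```python
from itertools import product

# Illustrative subset of design choices; labels are assumptions, not the paper's.
design_choices = {
    "target_benchmark": ["risk_free", "market"],
    "target_type": ["continuous", "categorical"],
    "training_window": ["rolling", "expanding"],
}


def enumerate_designs(choices):
    """Return one dict per combination of the given design choices."""
    names = list(choices)
    return [dict(zip(names, combo)) for combo in product(*choices.values())]


designs = enumerate_designs(design_choices)
```

Each element of `designs` describes one model specification that would then be trained and evaluated on the same set of features.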

Figure 1 reveals that portfolio returns vary substantially across different model designs, with monthly mean returns ranging from 0.13% to 1.98% and annualized Sharpe ratios ranging from 0.08 to 1.82.¹ This variation highlights the substantial impact of human design choices on the performance of ML strategies.
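For readers less familiar with the metric, an annualized Sharpe ratio can be computed from monthly long-short returns as sketched below. The square-root-of-12 scaling is the common convention and is assumed here; the article does not state the paper's exact calculation. The function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def annualized_sharpe(monthly_returns):
    """Annualized Sharpe ratio of a monthly return series.

    The input is assumed to be long-short (self-financing) returns,
    so no risk-free rate is subtracted. Uses the sample standard deviation.
    """
    return mean(monthly_returns) / stdev(monthly_returns) * sqrt(12)
```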

Figure 1 | Cumulative performance of machine learning strategies

Source: Robeco, Chen et al. (2024). This figure shows the cumulative performance of a USD 1 initial investment in long-short ML portfolios for each possible combination of the research design choices. For each ML model and month, we first cross-sectionally sort all stocks based on their one-month-ahead return predictions. We then construct the value-weighted long-short portfolios by going long the top decile and short the bottom decile stocks. The solid black line represents the strategy with the median cumulative performance for each month, and the dashed black lines represent the 10th and 90th percentiles of each month, respectively. The sample period is from January 1987 to December 2021.
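The portfolio construction described in the caption can be sketched for a single month as follows. This is a simplified illustration under stated assumptions: inputs are plain dictionaries keyed by a hypothetical ticker, decile breakpoints are taken by equal counts, and any leftover stocks fall into the middle groups. The paper's exact breakpoint rules are not given in this article.

```python
def long_short_return(predictions, realized, market_caps, n_groups=10):
    """One-month return of a value-weighted top-minus-bottom portfolio.

    predictions: ticker -> predicted one-month-ahead return
    realized:    ticker -> realized return over that month
    market_caps: ticker -> market capitalization at formation
    """
    ranked = sorted(predictions, key=predictions.get)  # lowest prediction first
    size = len(ranked) // n_groups
    if size == 0:
        raise ValueError("not enough stocks to form the requested groups")
    bottom, top = ranked[:size], ranked[-size:]

    def value_weighted(tickers):
        total_cap = sum(market_caps[t] for t in tickers)
        return sum(market_caps[t] * realized[t] for t in tickers) / total_cap

    # Long the top group, short the bottom group.
    return value_weighted(top) - value_weighted(bottom)
```

Repeating this calculation every month and compounding the results would give a cumulative performance series for one model design.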

Machine learning models: Separating the wheat from the chaff

Having documented the substantial variation in the performance of ML models, the study also provides actionable guidance for ML model design:

  • Ensembles of ML models typically outperform individual algorithms.

  • The choice of target variable depends on the investment objective:
    o For identifying relative winners and losers among stocks, predicting stock returns over the market rather than the risk-free rate is better.
    o If the goal is to achieve high market-risk-adjusted returns, CAPM beta-adjusted returns are better.

  • Non-linear ML models are more likely to outperform their linear counterparts when:
    o using abnormal returns relative to the market as the target variable,
    o employing continuous target returns, or
    o adopting expanding training windows.
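As an illustration of the first point, one simple way to build an ensemble is to average the return predictions of several models stock by stock. The equal-weight average below is an assumption made for the sketch; the article does not specify how the paper combines its models. The function and model names are hypothetical.

```python
def ensemble_prediction(model_predictions):
    """Equal-weight average of predictions across models.

    model_predictions: model name -> (ticker -> predicted return).
    Only tickers predicted by every model are kept.
    """
    per_model = list(model_predictions.values())
    if not per_model:
        raise ValueError("need at least one model")
    common = set(per_model[0])
    for preds in per_model[1:]:
        common &= set(preds)
    return {
        ticker: sum(preds[ticker] for preds in per_model) / len(per_model)
        for ticker in sorted(common)
    }
```

The averaged predictions can then be used to sort stocks in the same way as the predictions of any single model.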


Conclusion

While computational infrastructure, ML algorithms, and data have become significantly more accessible over the past decade or two, model design remains a critical component of success. At first glance, it might seem that an ML investment strategy only requires a few basic elements: cloud computing space, generic factor data, some Python packages, and a couple of data scientists. However, this approach often lacks the crucial domain knowledge that Robeco has cultivated over 20 years in quant investing. In financial markets, where the signal-to-noise ratio is low and the risk of overfitting is high, investment experience and economic intuition still play a pivotal role. Robeco’s extensive expertise ensures that ML models focus on meaningful patterns and avoid common pitfalls, bridging the gap between technology and investment insight.


Footnote

¹ Please note that these are hypothetical gross returns for long-minus-short strategies that do not consider any transaction costs. We investigated the impact of transaction costs on ML strategies in our study ‘The term structure of machine learning alpha’.


