19-04-2023 · Market views

Quant Chart: Lost in translation

Do you like watching old Asian movies from the 60s and 70s? Perhaps you’re a connoisseur of kung fu or monster movies, both popular genres from that era. If you are, and you live in a Western country, you will most likely have watched them translated from the original Chinese or Japanese into a Western language, usually English.

    Authors

  • Mike Chen - Head of Next Gen Research

  • Matthias Hanauer - Researcher

  • Nick Mutsaers - Researcher

If so, you may have noticed that when the actors speak, their mouths keep moving long after the English line has finished, and you may even have wondered what you were missing. Of course, everyone knows that a lot of context and information (the actors’ performances, accents, nuances and local cultural references) gets lost in translation when a movie is dubbed. But have you ever wondered whether investment information also gets lost in translation?

Natural Language Processing (NLP), an application of artificial intelligence, is a popular tool that is revolutionizing quantitative finance and is being applied to many types of text. However, most NLP tools are developed for English-language text. Since English is not the only language spoken around the world¹, a popular approach to processing non-English text is to translate it into English first and then apply English NLP models to the translation.

In recent research, Robeco found that, just as with those old Asian movies, this translate-first approach causes some information (alpha) to be lost in translation. Applying a local-language NLP model directly to the original local-language text can reveal additional information (alpha), which can then be harvested.

Take, for example, Chinese investment texts. The left-hand chart in Figure 1 shows the performance of factors built from Chinese- and English-based NLP engines. The good news is that both are positive, so not all information is lost in translation. However, the right-hand chart in Figure 1 shows that, of the stocks ranked in the top quintile by the Chinese NLP model, only 50% would be classified in the top two quintiles by the English NLP model.
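
The overlap in the right-hand chart boils down to a simple cross-sectional calculation. The sketch below is one possible version, not Robeco’s implementation: it assumes quintiles are equal-count buckets formed by ranking a single month’s scores (ties broken by sort order), and the function names `quintile_labels` and `overlap_distribution`, as well as the toy scores, are hypothetical.

```python
def quintile_labels(scores):
    """Label each stock 1 (lowest score) to 5 (highest) within one cross-section.

    Assumption: equal-count buckets by rank; ties are broken by sort order.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # stock indices, lowest score first
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * 5 // n + 1
    return labels


def overlap_distribution(scores_a, scores_b):
    """For the stocks in signal A's top quintile, return the share that lands in
    each quintile (1..5) of signal B. Hypothetical helper, not from the article."""
    qa = quintile_labels(scores_a)
    qb = quintile_labels(scores_b)
    top_a = [i for i, q in enumerate(qa) if q == 5]
    return {k: sum(1 for i in top_a if qb[i] == k) / len(top_a) for k in range(1, 6)}


# Toy cross-section of 10 stocks (made-up scores, not the article's data).
signal_a = [0.9, 0.8, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.05]
signal_b = [0.95, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

dist = overlap_distribution(signal_a, signal_b)
top_two_share = dist[4] + dist[5]  # share of A's top quintile in B's top two quintiles
print(dist)           # {1: 0.5, 2: 0.0, 3: 0.0, 4: 0.0, 5: 0.5}
print(top_two_share)  # 0.5
```

Which signal plays the role of A (Chinese or English) is just a matter of argument order, so the same helper covers both directions of the comparison.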

Figure 1: English translation versus Chinese original NLP output

Source: I/B/E/S, Refinitiv, Orbit Financial Technology, Robeco. The left panel displays the return spread between the top and bottom quintile portfolios based on the Chinese-language and English-language NLP sentiment scores. The right panel displays the similarity in stock classification between the two signals; more specifically, it shows the percentage of top English NLP stocks classified in each quintile based on the Chinese-language signal. The investment universe consists of MSCI China A index constituents. The portfolios are equally weighted and rebalanced monthly. Both panels cover the sample period from January 2013 to December 2022.
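
The return spread in the left panel follows a standard quintile-sort construction. Here is a minimal sketch, assuming each observation carries a sentiment score known at the start of the month and the stock’s return over that month; the input layout, the function names and the toy numbers are hypothetical, and the article does not say whether the chart plots average or cumulative spreads. Equal weighting becomes a plain mean within each quintile, and monthly rebalancing becomes re-sorting the stocks every month.

```python
from collections import defaultdict
from statistics import mean


def quintile_labels(scores):
    """Label each stock 1 (lowest score) to 5 (highest); equal-count buckets by rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # stock indices, lowest score first
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * 5 // n + 1
    return labels


def monthly_quintile_spreads(observations):
    """Top-minus-bottom quintile return spread per month (hypothetical helper).

    observations: iterable of (month, stock, score, ret), where `score` is known at
    the start of `month` and `ret` is the stock's return over that month.
    Portfolios are equally weighted (plain mean) and re-formed every month.
    Assumes at least 5 stocks per month so that every quintile is non-empty.
    """
    by_month = defaultdict(list)
    for month, _stock, score, ret in observations:
        by_month[month].append((score, ret))

    spreads = {}
    for month in sorted(by_month):
        rows = by_month[month]
        labels = quintile_labels([score for score, _ in rows])
        top = [ret for (_, ret), q in zip(rows, labels) if q == 5]
        bottom = [ret for (_, ret), q in zip(rows, labels) if q == 1]
        spreads[month] = mean(top) - mean(bottom)
    return spreads


# Toy panel: 5 stocks over 2 months, returns in percent (made-up numbers).
toy = [
    ("2013-01", "S1", 1, -2.0), ("2013-01", "S2", 2, 0.0), ("2013-01", "S3", 3, 1.0),
    ("2013-01", "S4", 4, 0.5), ("2013-01", "S5", 5, 3.0),
    ("2013-02", "S1", 5, 1.0), ("2013-02", "S2", 4, 0.0), ("2013-02", "S3", 3, 0.0),
    ("2013-02", "S4", 2, 0.0), ("2013-02", "S5", 1, 2.0),
]
spreads = monthly_quintile_spreads(toy)
print(spreads)                 # {'2013-01': 5.0, '2013-02': -1.0}
print(mean(spreads.values()))  # 2.0
```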

This shows that the two models select different stocks: the overlap is far from perfect. As with those old Asian movies from the 60s and 70s, information may be lost in translation. To fully grasp the nuances of a movie’s dialogue, it is worth watching the film in its original language, if possible. And to fully understand what is being communicated in an investment text, it may be worth reading it in its original local language.

Footnote

¹ English is spoken natively by only about 400 million people, or roughly 5% of the global population.

Disclaimer

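This document was prepared by Robeco Overseas Investment Fund Management (Shanghai) Limited Company (荷宝海外投资基金管理(上海)有限公司, “Robeco Shanghai”). Its contents are for reference only and do not constitute advice, professional opinion, an offer, a solicitation or an invitation by Robeco Shanghai to anyone to buy or sell any product. This document should not be regarded as a recommendation to buy or sell any investment product or to adopt any investment strategy. Nothing in this document should be construed as legal, tax or investment advice, nor does it represent that any investment or strategy is suitable for your individual circumstances or otherwise constitute a personal recommendation to you. The information and/or analysis contained in this document has been prepared on the basis of information obtained from sources that Robeco Shanghai believes to be reliable. Robeco Shanghai makes no representation as to its accuracy, correctness, usefulness or completeness, and accepts no liability for any loss arising from the use of the information and/or analysis in this document. Neither Robeco Shanghai nor any of its affiliates, nor their directors, officers or employees, shall be responsible or liable for any direct or indirect loss or damage, or any other consequence, suffered by any person as a result of relying on the information contained in this document. This document contains forward-looking statements and forecasts regarding future business, objectives, management discipline or other matters. These statements involve assumptions, risks and uncertainties and are based on information available as of the date this document was prepared. Accordingly, we cannot guarantee that these forward-looking events will occur, and actual outcomes may differ from the statements in this document. We cannot guarantee that the statistical information in this document is accurate, appropriate and complete under any particular conditions, nor that such statistical information, or the assumptions from which it is derived, reflects the market conditions Robeco Shanghai may encounter or future performance. The information in this document is based on current market conditions, which are likely to change as a result of subsequent market events or for other reasons; the contents of this document may therefore not reflect the latest developments. Robeco Shanghai is not responsible for updating this document or for correcting any inaccurate or omitted information in it.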