Does Language Positivity Represent a Country’s Happiness?
Nowadays, the world is increasingly interested in how happy different countries and their citizens are, and in how that happiness relates to economic, psychological, political, and other metrics. As a data scientist padawan in training, I have already encountered many case studies that show how important this measurement can be. Recently, though, a new question came to my mind during text analysis class: how might language positivity influence the happiness of countries? I still remember how, in high school, my literature teacher explained that the Hungarian language classifies words into three categories based on their vowel distribution (deep, mixed, and high sounding words), and how a poet once wrote a cheerful poem that nevertheless sounds quite depressing. So the way a language is used may influence people’s view of life, or at least that is the hypothesis I would like to investigate below.
To begin with, I will use the Harry Potter books again, as I did in my previous article. Instead of checking how the sentiments change through the seven books, however, I will use only the first book in three different languages: German, English, and Hungarian. To compare my findings, I will use the World Happiness Report 2021 (WHR), which reports its results with 95% confidence intervals.
Based on the WHR scores, Germany stands in 13th place, the United Kingdom in 17th, and Hungary in 53rd. I am curious whether this particular order will stay the same after investigating the different translations of Harry Potter.
First, I would like to see how the word occurrences differ between the three versions of the book, to decide whether the languages use the same words in their respective translations. For this, the character names were taken out because they had the highest frequencies in the book. Without them, the outcomes were the following:
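The counting step described above can be sketched roughly as follows. This is only a minimal illustration in Python, not the code behind the original plots: the name list, the sample sentence, and the function name `top_words` are all made up for the example.

```python
import re
from collections import Counter

# Hypothetical list of character names to drop; the real list is not given in the article.
NAMES = {"harry", "ron", "hermione", "hagrid", "dumbledore"}

def top_words(text, names=NAMES, n=10):
    """Return the n most frequent words in text, ignoring the given names."""
    # \w+ is Unicode-aware in Python 3, so accented letters such as ü or ú are kept.
    words = re.findall(r"\w+", text.lower())
    counts = Counter(w for w in words if w not in names)
    return counts.most_common(n)

# Made-up sample sentence, not a quote from the book.
sample = "Harry said hello. Ron said nothing. The door opened, said Hagrid."
print(top_words(sample, n=3))
```

Running the same function on each translation, with a name list adapted to that language, would give comparable frequency tables. A real analysis would probably also remove stop words, which this sketch leaves in.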
As you can see (if you know all three languages), the main words like “say” (sagte/sagen in German, szólt in Hungarian) have a high frequency in all translations. Furthermore, there are overlaps between the languages, such as “can” (konnte in German, tudta in Hungarian), “boy” (fiú in Hungarian), or “door” (Tür in German). But let me admit, this finding is not a big surprise, considering that we are talking about translations of one particular book. Therefore, I also made a brief analysis of how the relationships between words change from language to language, and I managed to find something interesting:
As you can see, far fewer connections are present in Hungarian, and this is due to one fact: Hungarian conjugation is complex. Hungarians use so many different forms of a single word that they rarely need to say the pronouns out loud, which makes our life easier but not others’. This is especially tricky for text analysis, but it is also what makes our investigation interesting.
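To make the conjugation effect concrete, here is a small sketch. It assumes that the "connections" are pairs of adjacent words (bigrams) that occur repeatedly, which is my assumption for illustration and not something the analysis above states. The toy sentences and function names are invented.

```python
from collections import Counter

def bigram_counts(tokens):
    """Count pairs of adjacent tokens."""
    return Counter(zip(tokens, tokens[1:]))

def repeated_pairs(tokens, min_count=2):
    """Keep only the pairs that occur at least min_count times."""
    return {pair: c for pair, c in bigram_counts(tokens).items() if c >= min_count}

# Toy token lists, not taken from the book.
# English keeps one verb form ("said") and changes the pronoun in front of it.
english = "i said yes you said no he said yes".split()
# Hungarian drops the pronoun and changes the verb ending instead:
# mondtam (I said), mondtad (you said), mondta (he/she said).
hungarian = "mondtam igen mondtad nem mondta igen".split()

print(repeated_pairs(english))    # the pair ("said", "yes") occurs twice
print(repeated_pairs(hungarian))  # no pair repeats
```

Because each conjugated form counts as a different word, the same pair of tokens shows up less often in Hungarian, so fewer pairs pass any frequency threshold. Lemmatizing the text first would be one way to reduce this effect.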
Okay, after seeing the differences between the translations, let’s look at the sentiments. But before starting, allow me to challenge you again: which language do you think will be the most positive? And the most negative? What about the order? Allow yourself to think for a minute before going forward! Are you ready? Then let’s go!
In the beginning, I wanted to do my analysis with the “afinn” text scoring, but I could not find a Hungarian version of it. I did not even look for a German one. So, in the end, I stayed with the “bing” scoring, which classifies words into two categories: positive and negative.
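A two-category scoring of this kind can be sketched as below. This is a hedged illustration only: the tiny lexicon, the sample sentence, and the helper names `sentiment_counts` and `positive_share` are invented, whereas a real “bing”-style word list holds thousands of entries and would need a separate list for each language.

```python
import re
from collections import Counter

# Tiny made-up lexicon for illustration only; not the real "bing" word list.
LEXICON = {
    "good": "positive", "happy": "positive", "brave": "positive",
    "bad": "negative", "afraid": "negative", "angry": "negative",
}

def sentiment_counts(text, lexicon=LEXICON):
    """Count how many words of text the lexicon labels positive or negative."""
    words = re.findall(r"\w+", text.lower())
    return Counter(lexicon[w] for w in words if w in lexicon)

def positive_share(counts):
    """Share of positive words among all scored words, or None if nothing was scored."""
    total = counts["positive"] + counts["negative"]
    return counts["positive"] / total if total else None

# Made-up sample sentence, not a quote from the book.
sample = "Harry was brave and happy, but Dudley was angry."
counts = sentiment_counts(sample)
print(counts["positive"], counts["negative"])
print(positive_share(counts))
```

Comparing the positive share rather than raw counts is one possible way to rank translations of different lengths. Words missing from the lexicon are simply ignored, so the result depends heavily on how well each language’s word list covers the text.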
As the plots above show, the English version has a much higher frequency of negative words than the others. Are you surprised? Were you expecting this? I admit that I was amazed by this outcome, considering my first expectations. But that is what data analysis is about: finding out new information based on data!
But, returning to the topic, I think it will not be a big spoiler if I predict that English will be last in our ranking. Actually, even without looking down, you already know the answer. But allow me to show you:
In the WHR, Germany also held the best place, but there it was Hungary that closed the order. Based on the sentiment analysis, however, the United Kingdom seems to be the least positive of the three nations. So what about our question: does language positivity represent country happiness? Based on this analysis, no: it is not the best way to measure it, and a deeper and more precise investigation would be needed. After all, only three versions of one book were included, and one of them is in the book’s original language. What do you think? Should we send a proposal to the analysis team of the World Happiness Report?