A First Approach to the Implicit Measurement of Happiness in Latin America Through the Use of Social Networks

This research paper belongs to the group of empirical studies that try to measure subjective well-being. Its main contributions are the use of a subjective measurement of well-being based on social networks for the Latin American setting and a comparative analysis of that measurement with a traditional method.


I. INTRODUCTION
This research paper belongs to the group of empirical studies that for some years now have attempted to analyze subjective well-being in Latin America. Among the most noteworthy are [15], [16], [17], [18], [28], [29], [30] and [5].
The novelty of this paper with respect to previous studies is that its objective is to verify to what extent the results of measuring the happiness of Latin Americans obtained following two radically different methods are consistent. One is based on the use of surveys from Latinobarómetro and the other on inferring the feelings of social network users from a semantic analysis of the words used in their communications and messages. A scientific method is followed in both cases.
The scientific study of happiness is not based on conjectures or presumptions but instead on research projects. Traditionally, researchers have analyzed factors that influence whether an individual defines herself as happy or satisfied [3] [4]. Psychology, sociology and economics have tried to explain the conditions that allow individuals to develop as happy persons [20], [13], [14] and [30].
Following [9][10], the notion of happiness generally used in economics equates happiness with subjective well-being. In this sense, happiness or subjective well-being is no more than an assessment of life itself, regardless of psychological judgments about momentary pleasure [2], [27]. In other words, happiness refers to how the individual evaluates the overall quality of her life [26], [7]. As such, the happiness of individuals depends entirely on individual perception and is linked to concepts of quality of life and well-being. In any case, what matters is that this individual perception of subjective well-being or happiness is measurable. This is the notion of happiness that we use in the third section of the paper, where we use data from Latinobarómetro to measure the happiness of Latin Americans. In the fourth section, we take a completely different look at happiness, using information contained in messages sent over social networks (in particular, data from Twitter) to infer the feelings of individuals [8], [12].
The paper is organized as follows. In this section, we present a short introduction to the study. The following section gives a brief description of the different methods used to measure happiness. As already mentioned, Sections 3 and 4 present two alternative measurements of Latin Americans' happiness, one based on the information gathered from subjective surveys and the other inferred on the basis of information contained in social networks. The fifth section presents a comparative analysis of the results obtained following the two aforementioned methods. The sixth section presents the main findings and sketches out lines of future research that will be conducted to more deeply explore the subjects presented in this paper.

II. THE MEASUREMENT OF HAPPINESS: ALTERNATIVE METHODS
Happiness is measurable, and this is what enables us to speak of the science of happiness. In the new science of happiness, different methods have been used to measure happiness. Ed Diener and his collaborators presented a method to measure happiness based on the idea that individuals can consistently identify their level of satisfaction with life on a scale, and as such, what must be done is to ask people questions [7]. This way of measuring happiness is the one that justifies conducting surveys like the World Values Survey, and it is the most widely-used method [23].
Another method for measuring happiness is based on the sampling of experiences developed by the psychologist Csikszentmihalyi and several researchers. This method consists of using locators (beepers) and afterwards using computers to contact individuals at random and ask them about their mood [24], [25]. A different approach is followed by a group of researchers led by Nobel Prize winner Daniel Kahneman. They created a method for measuring happiness based on following or reconstructing what people do at each moment of the day and asking them how they feel [19]. The main findings of this research specify that the three basic components of happiness are pleasure, commitment and meaning. Following this method and using messages on mobile telephones as an instrument of communication with those surveyed, Matthew Killingsworth identified happiness associated with a wide range of activities [22]. He points out that, until recently, researchers had to trust the assessments and appraisals that people made about their average emotional states over long periods of time. This inconvenience is avoided when following the method based on reconstructing what people do at different moments every day.
Recently, and amidst the impressive growth of social networks, there has emerged a new method for measuring happiness. This method consists of inferring the feelings of social network users on the basis of a semantic analysis of the words used in their communications and messages. Likewise, a study done by the Vermont Complex Systems Center uses information from Twitter to infer how happy or unhappy people in different states of the United States feel. Specifically, the researchers Dodds and Danforth have developed a method that, by incorporating the direct human evaluation of words, allows us to quantify levels of happiness on a continuous scale from a diverse collection of texts [8] [12], [21]. The method is transparent and able to quickly process texts from the Internet.
In the study carried out by Dodds and Danforth, a code was developed, on the basis of ten million "tweets," for determining to what extent each analyzed message can be catalogued as happy or sad. The study focused on certain key words that were deemed to be indicative. Thus, "beauty" and "hope" are associated with happiness, while "hate" and "smoke" are associated with unhappiness. The researchers analyze the frequency with which these good and bad words are used in different states of the U.S.A. and classify the states as happy or unhappy. It is important to note that this study requires a highly complex preliminary task: obtaining the terms to evaluate, that is, the words that can be captured and measured. This list of words was obtained by directly asking English-speaking people about the words that evoke happiness for them. Once the list of words was obtained, it was then necessary to create a scale that reflected how each word was evaluated with respect to the others. This scale was obtained through a similar method, asking people to order words according to the value in terms of happiness that each word had for them.
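The word-based scoring described above can be illustrated with a minimal sketch. It assumes that a text's happiness is the frequency-weighted average of the scores of the listed words it contains; the function name `average_happiness` and the numeric scores are hypothetical placeholders, not the values published in [8].

```python
import re
from collections import Counter

# Hypothetical word scores on a 1-9 scale (illustrative values only,
# not the ones obtained from human evaluation in the cited study).
WORD_SCORES = {"beauty": 7.8, "hope": 7.4, "hate": 2.3, "smoke": 3.4}


def average_happiness(texts, word_scores=WORD_SCORES):
    """Return the frequency-weighted mean score of the listed words found in texts.

    Returns None when none of the listed words occurs.
    """
    counts = Counter()
    for text in texts:
        # Lowercase and split into simple alphabetic tokens.
        for token in re.findall(r"[a-z']+", text.lower()):
            if token in word_scores:
                counts[token] += 1
    total = sum(counts.values())
    if total == 0:
        return None
    return sum(word_scores[w] * n for w, n in counts.items()) / total


if __name__ == "__main__":
    print(average_happiness(["Hope and beauty", "I hate smoke"]))
```

Words outside the list are simply ignored, so the result depends only on how often each listed word appears.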

III. THE HAPPINESS OF LATIN AMERICANS ACCORDING TO LATINOBARÓMETRO
The measurement of happiness that this part of the study presents is in keeping with the literature that analyzes the answers of individuals to questions about subjective well-being in cross-section or panel surveys, which is the approach most widely used by researchers. The hypothesis on which these studies are based is that the subjective data provided by individuals can be treated ordinally in economic analyses so that greater subjective levels of well-being reflect greater levels of happiness [13]. In other words, it is argued that although everybody has their own ideas about happiness, individual happiness can be captured and analyzed.
Anyone can be asked how satisfied they feel with the life they lead, and behind the answer given in a survey, a conscious evaluation of their subjective well-being can be found. Supposedly, individuals are able to evaluate their subjective level of well-being with respect to certain circumstances. In addition, reliable studies indicate that the subjective well-being demonstrated by individuals is reasonably stable and sensitive to changes in circumstances. In fact, in research about happiness, individuals' answers to questions about their feelings are analyzed and consistent findings are obtained [6].
Specifically, in this section of the paper, there is a synthesis of a paper done in 2012 on life satisfaction in 18 Latin American countries [5]. The countries analyzed are Argentina, Bolivia, Brazil, Colombia, Costa Rica, Chile, Ecuador, El Salvador, Guatemala, Honduras, Mexico, Nicaragua, Panama, Paraguay, Peru, the Dominican Republic, Uruguay and Venezuela. The results obtained in the cited study are in general consistent with those already known for other countries, as well as with those obtained in different papers that refer to the region.
The data used come from annual personal surveys created by the Latinobarómetro Corporation for the period 2000-2009. The sample used includes 191,488 individuals and they are different every year. The distribution of the sample by countries over the period is presented in Graph 1, where the information about Brazil has been omitted, given that in the study based on social networks for this country, "tweets" were not analyzed because they were in a different language.
The key variable is the degree of satisfaction with individuals' current lives, as it is defined in the Latinobarómetro survey. The degree of a person's satisfaction with life falls into one of the following four categories: not at all satisfied, not very satisfied, quite satisfied and very satisfied. Graph 1 presents the percentage of individuals from the Latin American countries mentioned who indicated that they were quite or very satisfied with life during the years 2000-2009. As can be seen, in eight of the 17 countries, more than 70% of the population was quite or very satisfied with their life at the time. Peru, Bolivia and Ecuador are the countries where people were least satisfied with life. In these countries, less than 54% of the people surveyed indicated that they were quite or very satisfied with life.
So, from the descriptive analysis carried out on the basis of the information provided by Latinobarómetro, it can be seen that there are significant differences between countries in the level of satisfaction with life. These results indicate that the happiest individuals are those who live in Costa Rica, Venezuela, Panama and Colombia, while the least happy are those in Peru, Bolivia and Ecuador. It must be noted that these results are consistent with the results obtained in [5] using econometric techniques.
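The descriptive indicator used in this section, the percentage of respondents who are quite or very satisfied with life, can be computed as in the following minimal sketch. The input format (a list of country and answer pairs) and the function names are assumptions made for illustration; the actual Latinobarómetro files are not reproduced here.

```python
from collections import defaultdict

# The two answer categories counted as "satisfied" in the text.
SATISFIED = {"quite satisfied", "very satisfied"}


def satisfied_share(responses):
    """Return, per country, the percentage of answers in SATISFIED.

    responses: iterable of (country, answer) pairs (hypothetical format).
    """
    totals = defaultdict(int)
    satisfied = defaultdict(int)
    for country, answer in responses:
        totals[country] += 1
        if answer in SATISFIED:
            satisfied[country] += 1
    return {c: 100.0 * satisfied[c] / totals[c] for c in totals}


def rank_countries(shares):
    """Order countries from the highest to the lowest satisfied share."""
    return sorted(shares, key=shares.get, reverse=True)


if __name__ == "__main__":
    sample = [
        ("A", "very satisfied"),
        ("A", "not at all satisfied"),
        ("B", "quite satisfied"),
        ("B", "very satisfied"),
    ]
    shares = satisfied_share(sample)
    print(shares, rank_countries(shares))
```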

IV. THE HAPPINESS OF LATIN AMERICANS ACCORDING TO SOCIAL NETWORKS
The boom that social networks are currently experiencing is well-known, and their great reach justifies their use as a medium to measure opinion, interest in a subject or a person or even feelings and moods [31] [32] [11].
Especially relevant is the use of social networks in marketing and advertising, audience measurement, opinion surveys, popularity tracking and even as previews of election results. Resorting to social networks to obtain a barometer of opinion is especially common on Twitter, where, since its creation, it has been possible to know the number of followers or the effect of a specific term or tag [8]. There are also tools that facilitate more rigorous analysis, such as establishing relationships and measuring impact.
To summarize, the reasons that can justify choosing Twitter as a tool for measuring interest, opinion or mood are as follows:
1. Availability of an API (Application Programming Interface): the existence of a public API makes it possible to run queries and retrieve information in a relatively simple way, through simple computer programs that facilitate retrieval, storage and analysis using techniques ranging from basic statistics to machine learning.
2. Simple content based on text: the most usual type of message on Twitter is the short text message, owing to its origin, when messages were sent and received via SMS. This characteristic requires the meaning of the messages to be direct, specific and simple in most cases, which helps in their analysis.
3. Instantaneity and transience: the instantaneity and simplicity of the messages on Twitter make it a good mechanism for measuring what happens in almost real time or during a period of time. They are not reflective, prepared publications but spontaneous, fast communications.
4. Profiling: many Twitter users not only make comments but also have a public profile, which allows them to be segmented according to this data.
5. Geographical segmentation: within any mechanism for measuring opinion, a basic factor is knowing where we are measuring. On Twitter, this is possible through both the user profile and the location of a specific publication.
6. Global use: although Twitter is not the most used social network, it has many users and a very high level of participation [12].
The use of Twitter to measure subjective well-being as presented in this study is not completely new. As has been noted, there is a project called "hedonometer" (http://hedonometer.org) that has taken a measurement of happiness (subjective well-being) in the United States of America [8]. This measurement is especially interesting as it demonstrates the possible use of social networks to measure happiness. In addition, it has other interesting characteristics: the measurement covers a large geographical area with a common language, and it is an area where the use of social networks in general and Twitter in particular is very widespread.
Taking these characteristics as a framework of reference, a similar study has been undertaken in our case, in another relatively homogeneous geographical environment and in a common language. Specifically, the study was carried out for the Spanish-speaking countries of Latin America. Although a study with these characteristics can be valuable in itself, it was interesting to contrast the results with the results obtained when a traditional method of measuring happiness is used.
To produce the present research paper, the following considerations have been taken into account:
1. The retrieval of "tweets" for a national geographical area is very complex and unreliable, since it can only be based on the data in the personal profile of each user, and this information is not usually provided by the users. This is why we have decided to use the "tweets" retrieved from the capital of each country as a representative sample to analyze. This process is simpler than using the personal profile of each user, and more effective, since Twitter allows queries that specify a geographical position and a radius around it.
2. The retrieval of "tweets" has been done for a group of key words obtained by taking as a reference the group of key words that the hedonometer uses [8]. These key words are in English, which is why they have been translated. As we are dealing with key words, the idiomatic and semantic problems of translation can be managed. In any case, we have eliminated those that could present some problem. Obtaining this list (Table 2) has a certain value, since it was created on the basis of a thesaurus, considering the different words according to their meaning and impact as indicators of happiness based on the information provided by [8]. A thorough process of translation was applied to the original list, eliminating those words that make no sense in Spanish.

TABLE 2. LIST OF WORDS AND WEIGHTS
Source: compiled by the authors from the list used in the "hedonometer".
3. Using Twitter to rank the different countries by the happiness inferred from the contents of "tweets" has the problem of depending strongly on the intensity or frequency of Twitter use in each country. A correction coefficient was therefore created, taking as a reference the studies compiled in the source of Table 3.
Once the plan for carrying out the study was established (how the data would be obtained and under what conditions), we designed an algorithm to extract the information. The extraction algorithm was executed on Twitter for the duration of the study in order to obtain a happiness ranking. The extraction and generation of the happiness ranking was done according to the following process:
1. For each country (city) on the list:
   a. For each word on the list:
      i. Retrieve the corresponding "tweets"
   b. They are added up
   c. The happiness factor of the word is applied
2. A total is obtained
3. The correction for Twitter use is applied
4. The list of countries is ordered according to the score obtained
5. The ranking is generated
This process was executed on Twitter for two months to obtain a sample large enough to yield significant results. The number of "tweets" used was 100,000.
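The extraction and ranking process listed above can be sketched as follows. This is only an illustration under stated assumptions: the retrieval of "tweets" is replaced by precomputed counts per city and key word (no Twitter API call is shown), the happiness factor is interpreted as a per-word multiplier on those counts, and the correction for Twitter use is assumed to be a multiplicative coefficient. The function name `happiness_ranking` and all sample values are hypothetical.

```python
def happiness_ranking(tweet_counts, word_weights, usage_correction):
    """Rank cities by weighted key-word counts, corrected for Twitter use.

    tweet_counts: {city: {word: number of retrieved tweets}} (assumed input)
    word_weights: {word: happiness factor of the word}
    usage_correction: {city: multiplicative coefficient}; 1.0 when missing
    """
    scores = {}
    for city, counts in tweet_counts.items():
        # Steps 1-2: add up the tweets per word and apply each word's factor.
        total = sum(counts.get(word, 0) * weight
                    for word, weight in word_weights.items())
        # Step 3: apply the correction for Twitter use (assumed multiplicative).
        scores[city] = total * usage_correction.get(city, 1.0)
    # Steps 4-5: order the cities by score to generate the ranking.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    counts = {"Lima": {"feliz": 10, "amor": 5}, "Quito": {"feliz": 4, "amor": 2}}
    weights = {"feliz": 2.0, "amor": 1.0}
    correction = {"Lima": 0.5, "Quito": 2.0}
    for city, score in happiness_ranking(counts, weights, correction):
        print(city, score)
```

Keeping retrieval separate from scoring means the ranking can be regenerated with other word lists or correction coefficients without collecting the data again.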

V. HAPPINESS IN LATIN AMERICA ACCORDING TO TWITTER
As has been mentioned, to be able to apply the algorithm described in the previous section, the first step was the creation of the list of key words to be used in the study. As already noted, since they are key words, most of them can be translated directly. In some cases, however, problems arise because the direct translation does not work well or because the translated term generates noise on making reference to words in radically different contexts. For these cases, we opted to follow one of the two alternatives below: 1. In those cases where, even if the direct translation is not valid, there is an equivalent word or expression, we treat this equivalence as valid. 2. When the direct translation is not valid and there is no equivalent word or expression in the same context, we eliminate that word from the list.
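The two translation rules above can be expressed as a small filter. In this sketch, which is an assumption about how such a step might be coded rather than the procedure actually used, each English key word maps either to its Spanish translation or equivalent expression, or to `None` when no valid equivalent exists. The words, weights and the function name `build_keyword_list` are hypothetical examples.

```python
def build_keyword_list(english_weights, translations):
    """Return {spanish_term: weight}, dropping words without a valid equivalent.

    english_weights: {english_word: weight}
    translations: {english_word: spanish term, or None if no equivalent exists}
    If two English words share a Spanish term, the later one overwrites the earlier.
    """
    keywords = {}
    for word, weight in english_weights.items():
        spanish = translations.get(word)
        if spanish:  # rule 2: words with no equivalent are eliminated
            keywords[spanish] = weight  # rule 1: equivalents are treated as valid
    return keywords


if __name__ == "__main__":
    weights = {"hope": 7.4, "weekend": 8.0, "lol": 6.8}
    translations = {"hope": "esperanza", "weekend": "fin de semana", "lol": None}
    print(build_keyword_list(weights, translations))
```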
The list of words with their respective weights used in this study is compiled in Table 2. These key words are the ones that were used to retrieve the "tweets." In accordance with the considerations in the previous section, the algorithm described there was applied to the captured data to generate the ranking. Likewise, the correction coefficient based on Twitter use was applied; this information is shown in Table 3. This is how a ranking of feelings of happiness was obtained for Latin American countries according to Twitter data, as appears in Table 4.
It seems as though in those countries where the use of Twitter is greater, there is a strong upward bias, such that they appear in relatively high positions in the happiness ranking presented in Table 4. This might be because social networks have a viral, disseminating effect, so both positive and negative messages are spread, and as a result, the values are much more extreme than the simple proportion of users would suggest. This could also be interpreted to mean that countries with a greater number of users not only have more users, but also more active users. By contrast, some countries with relatively low coefficients of Twitter use like Costa Rica, Panama and the Dominican Republic, precisely because of the absence of the aforementioned viral effect, occupy relatively low positions in the ranking shown in Table 4, while in the ranking made on the basis of Latinobarómetro (Table 1), they are in high positions.
As part of the experiment, and with the aim of finding a correction factor that makes it easier to carry out future evaluations at different points in time and to include other factors, we decided to calculate a weighting or adjustment factor that would allow us to equate the results obtained through the use of social networks with those derived from the Latinobarómetro surveys. One justification for calculating this weighting factor is to offer additional information that helps explain the differences between the two methods of inferring the happiness of Latin Americans. In considering this weighting factor, it is observed that the weighting necessary to adjust the result is greater in smaller countries with lower rates of Internet and social network use, which supports the previously formulated hypothesis for explaining the differences between the rankings of Tables 1 and 4.
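The text does not give the formula for this weighting factor. One simple reading, shown here purely as an assumption, is the ratio between the survey-based value and the Twitter-based score of each country, so that multiplying the Twitter score by the factor reproduces the survey value. The function name `weighting_factors` and the sample numbers are hypothetical.

```python
def weighting_factors(survey_values, twitter_scores):
    """Return {country: survey value / Twitter score} for countries in both inputs.

    Countries with a zero or missing Twitter score are skipped, because the
    ratio is undefined for them.
    """
    factors = {}
    for country, survey in survey_values.items():
        score = twitter_scores.get(country)
        if score:
            factors[country] = survey / score
    return factors


if __name__ == "__main__":
    survey = {"A": 80.0, "B": 50.0}   # e.g. % quite or very satisfied
    twitter = {"A": 40.0, "B": 5.0}   # corrected Twitter scores
    print(weighting_factors(survey, twitter))
```

Under this reading, a large factor signals a country whose Twitter score sits far below its survey result.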
In analyzing the content of Table 5, the case of Bolivia deserves to be highlighted. Although its position in the ranking with data from social networks (18th in Table 4) is not very different from its position in the Latinobarómetro ranking (15th in Table 1), on a quantitative level, it presents a very big lag compared to the other countries. Everything seems to indicate that once again we see a polarization of the results owing to the scant use of social networks in this country.
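The comparison of the two rankings in this section is qualitative. One conventional way to summarize how closely two rankings agree, offered here only as an illustrative sketch and not as a calculation performed in this study, is Spearman's rank correlation. The version below assumes both rankings contain the same countries with no ties; the function name `spearman_rho` is hypothetical.

```python
def spearman_rho(ranking_a, ranking_b):
    """Spearman rank correlation between two orderings of the same items (no ties).

    Returns 1.0 for identical rankings and -1.0 for exactly reversed ones.
    """
    n = len(ranking_a)
    if n < 2 or len(ranking_b) != n:
        raise ValueError("rankings must have the same length, at least 2")
    if len(set(ranking_a)) != n or set(ranking_a) != set(ranking_b):
        raise ValueError("rankings must contain the same distinct items")
    position_b = {item: index for index, item in enumerate(ranking_b)}
    # Sum of squared differences between each item's positions in the two rankings.
    d_squared = sum((index - position_b[item]) ** 2
                    for index, item in enumerate(ranking_a))
    return 1 - 6 * d_squared / (n * (n * n - 1))


if __name__ == "__main__":
    survey_ranking = ["A", "B", "C", "D"]
    twitter_ranking = ["B", "A", "C", "D"]
    print(spearman_rho(survey_ranking, twitter_ranking))
```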

VI. CONCLUSIONS AND FUTURE RESEARCH
In this article, a first approach to measuring happiness in Latin America through the use of social networks is presented. Specifically, the social network used is Twitter, although we do not rule out the possibility of undertaking future studies with Facebook or other social networks. We have used Twitter because of its characteristics (ease of use, availability and popularity, geographical data, etc.). We have developed a process that extracts the data and generates a new ranking quickly and easily, which allows us to repeat the experiment with additional conditions, parameters and searches.
We can extract the following points as our main conclusions:
• The measurement of happiness through the use of social networks seems viable, and it is tremendously simple compared to traditional methods (e.g., surveys).
• The measurement of happiness through social networks like Twitter involves considering several factors in order to obtain reliable results. The most evident factors are the use of Internet and the use of social networks.
• The method used in this work consists of inferring the feelings of social network users on the basis of a semantic analysis of the words used in their communications and messages.
• It is possible to calculate, via objective and empirical means, factors that allow us to correctly interpret data collected through the use of social networks.
• In time, as the use of Internet and social networks increases, the use of these tools will be more precise.
As lines of future research, we propose the possibility of:
• doing new studies which incorporate data gathered over longer time periods
• including only countries with similar socio-economic conditions
• refining the creation of that weighting factor which could be converted into a rating
• including not only positive terms but also negative ones in order to improve reliability
• doing other studies that, instead of key words, are based on iconographic elements like "smiley faces."