Breaking Social Media Bubbles for Information Globalization: A Cross-Cultural and Cross-Language User-Centered Sense-Making Approach

Xiaozhong Liu (1), Daqing He (2), and Dan Wu (3)
  • 1 Indiana University Bloomington, USA
  • 2 School of Computing and Information, University of Pittsburgh, Pittsburgh, USA
  • 3 School of Information Management, Wuhan University, Wuhan, China

Abstract

As data become globalized, online social media play an active role in spreading information and sorting people into groups, so it becomes critical to think about how to break the entrenchment produced by algorithmic curation. Current algorithmic research on social media often focuses on a single service or language, mainly because there is no way to connect the different bubbles. The panel speakers described their research activities, each presenting a different perspective on how to break the bubble. This article summarizes this interactive panel.

1 Introduction

Online social media such as Facebook and Twitter are not merely passive conduits of information; they also play an active role in its dissemination and curation (Xia, Yu, Gao, Gu, & Liu, 2017; Liu & Turtle, 2013). Algorithmic curation, that is, matching information to user characteristics and preferences, can result in a “filter bubble” (Liu, Xia, Yu, Guo, & Sun, 2016). While the world is becoming more and more globalized, users still get trapped in their own personalized bubbles. They are exposed only to opinions and information that confirm their views, which leads to social and political polarization. At the same time, political, geographic, and language differences are separating user communities into online “information silos”. For example, Facebook and Twitter are blocked in China, preventing Chinese social media users from accessing information beyond their culture and language. Moreover, most users prefer to access and consume information in their native languages (Liu, Yu, Gao, Xia, & Bollen, 2016b). The objective of this incubator session is to mobilize the ASIST community around building a sophisticated yet scalable platform that leverages large-scale, crowd-sourced knowledge bases to automatically cross-link concepts and communities across comparable social media platforms, thus transparently exposing users and scholars to a wide variety of relevant information and breaking through filter bubbles and information silos.

Most existing research in social media analysis focuses on a single service or language, primarily because of the lack of methods and data to cross-link users, concepts, and online communities across social media platforms and filter bubbles. The proposed research and development platform addresses this challenge with new approaches that leverage large multicultural and multilingual crowd-sourced knowledge bases such as Wikipedia. Using computational methods from text mining and graph mining, the same concepts or topic categories can be identified across cultures and languages.

The proposed research project has the following objectives:

  1. From an algorithmic perspective, to develop text and graph mining methods that connect concepts and entities across social media platforms, languages, and cultures (for example, Twitter and Weibo) using a global knowledge base, Wikipedia;
  2. From the information science viewpoint, to quantify the empirical extent of user community separation by comparing social media platforms and to enable global comparison studies;
  3. Concerning users, to identify how they interact with information and with each other on social media platforms in different cultures and languages;
  4. Also concerning users, to develop methods for recommending information across multiple social media platforms.

2 Panel Overview

The objective of the panel is to discuss how to develop a sophisticated yet scalable platform that leverages large-scale, crowd-sourced knowledge bases (e.g., Wikipedia) to automatically cross-link concepts and communities across comparable social media platforms, thus transparently exposing users and scholars to a wide variety of relevant information and breaking through filter bubbles and information silos.

Most existing studies of social media analytics focus on single services or languages, mainly because of the lack of methods and data to cross-link users, concepts, and online communities across social media platforms and filter bubbles. The panel presenters proposed new methods that leverage large multicultural and multilingual crowd-sourced knowledge bases, such as Wikipedia, using text mining and graph mining to identify the same concept or topic category in different cultures and languages.

Understanding and comparing how different groups of users from various filter and cultural bubbles consume and interact with similar topics remains an exciting but challenging research question. Building on this premise, the panel presenters explored several such questions in more detail.

Xiaozhong Liu explained the factors that influence breaking social media bubbles: users, theory, technologies, application, and data. Daqing He introduced technologies his team developed for entity-level representation of content. Dan Wu compared cross-cultural and cross-language users within a bubble and studied how users who share similar topics might break the bubble. The presentations of each panelist are detailed below.

3 Presentations

3.1 Xiaozhong Liu: “Using User-centered Sense-making Approach to Break Social Media Bubbles”

Xiaozhong Liu's presentation started with three examples of social comparison. The first concerned policy studies and asked, “Do you support Trump's immigration or economic policy?” Different interest groups may answer differently, and information scientists are very interested in comparing these groups (Liu, Li, Cifor, Liu, Zhang, & Si, 2019). The second concerned social justice, focusing on the use of insulting, violent, and discriminatory language, a critical issue that a number of recent studies have investigated. The last concerned the daily diet and purchasing behavior of users with different health concerns, such as diabetes, cancer, and depression groups (Shuai, Liu, Xia, Wu, & Guo, 2014).

Many information scientists have a strong interest in such comparisons, and there are two different approaches to them. The first is the small data approach, which is hypothesis-driven, yields very high-quality data, and is costly. The second is the big data approach, which is data-driven and can be much noisier but more dynamic; the data are more heterogeneous, but the cost is low, since the data are ideally already available. For example, researchers studying lung cancer and smoking can purposely collect high-quality, hypothesis-driven data to uncover patterns in consumption behavior, whereas big data can help characterize many different groups of users and, hopefully, reveal something interesting. Of the three cases above, the first and second can be addressed with social media such as Facebook and Twitter, while the third requires access to massive e-commerce data to identify potential factors.

Xiaozhong Liu showed a well-known picture from Facebook. Facebook is open to all users, but it hides what is called a bubble. The bubble is defined as a state of intellectual isolation: as users spend more and more time on social media, they become increasingly separated from information that disagrees with their viewpoints. Users may stay in a bubble because of their own cultural or ideological preferences rather than merely passively, and they have little chance to step away from it.

Liu also showed another example. Some countries are isolated from the rest of the world inside bubbles that include a language bubble, a culture bubble, and a network bubble, each with its own counterpart of Facebook or Twitter. This is not just a culture bubble but also a knowledge bubble, which enables some interesting comparative studies. Liu drew two implications:

  1. If we were interested in modeling world-level knowledge, all the research findings based on a single social media system could be biased;
  2. The social networks or knowledge networks generated from a single system, that is, a single community, cannot fully represent people from all over the world.

Three years ago, Liu proposed using Wikipedia to enable global information access and comparison. Wikipedia was chosen because it is multilingual, linked, and collaborative, and it has an ontology for establishing the topic network and organizing topics.

Topics such as the G7 summit and the Kim–Putin meeting belong to different low-level categories, such as 2019 controversies in the United States and 2019 in international relations. These categories are interconnected in a huge topic category tree, which Liu calls the “3 million node category tree”. Once topics are placed in this tree, comparisons can be made not only for individual topics but also for categories at lower or higher levels. Liu presented one example, called the Pseudo Global Social Media Network. Twitter and Weibo data were collected, and both were projected into the Wikipedia space, where Wikipedia provides the concept nodes and category nodes.
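The idea of comparing platforms at the category level rather than the topic level can be sketched in a few lines. This is a minimal illustration, not the Pseudo Global Social Media Network itself: the category labels and the `CATEGORY_PARENTS` mapping are a hypothetical toy fragment, and each post is assumed to have already been linked to Wikipedia concepts by an upstream step that is not shown.

```python
from collections import Counter

# Toy, hypothetical fragment of a Wikipedia-style category graph (child -> parents).
# The real category system has millions of nodes; these labels are illustrative only.
CATEGORY_PARENTS = {
    "G7 summit": ["2019 in international relations"],
    "Kim–Putin meeting": ["2019 in international relations"],
    "2019 in international relations": ["International relations"],
    "International relations": [],
}


def ancestors(node, parents=CATEGORY_PARENTS):
    """Return the node itself plus every category reachable by walking upward."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def category_profile(posts):
    """Count, for each tree node, how many posts touch it after rolling concepts up.

    `posts` is a list of concept sets, one per post (assumed upstream linking).
    """
    counts = Counter()
    for concepts in posts:
        nodes = set()
        for concept in concepts:
            nodes |= ancestors(concept)
        counts.update(nodes)  # a post counts at most once per node
    return counts


twitter = category_profile([{"G7 summit"}, {"Kim–Putin meeting"}, {"G7 summit"}])
weibo = category_profile([{"Kim–Putin meeting"}])
```

Here the two toy platforms share no discussion of the G7 summit (`weibo["G7 summit"]` is 0), yet both contribute posts to “International relations” (3 and 1), which is the kind of higher-level comparison the category tree makes possible.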

Liu's presentation then moved to sentiment comparison. One innovative form is question-answering (QA) sentiment comparison: for the same topic, the responses from two different groups can be quite revealing. A QA sentiment algorithm may help extract dynamic sentiment information across diverse communities.

In 2019, two papers on QA sentiment analysis were published that proposed a deep learning framework for predicting QA sentiment; with the help of such algorithms, some interesting questions can be answered (Imaduddin & Fauziati, 2019; Sánchez-Rada & Iglesias, 2019). By projecting text content onto explicit or implicit Wikipedia concepts, sophisticated features can be extracted via deep NLP and graph mining from different kinds of data, such as QA, comments, dialogs, logs, and sessions, to enable interesting comparisons.

Another example that Liu discussed was sexual harassment data. These data were collected from Safecity, an online platform focused on sexual harassment. His team used an event detection algorithm to extract the location, harasser, trigger, and time of each event from the text, turning it into structured data. With the support of this algorithm, the structured data were used to compare sexual harassment in India and America. Liu was to present this paper at the EMNLP conference in Hong Kong the following week.
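Once event detection has turned free text into structured records, cross-country comparison reduces to comparing slot distributions. The sketch below assumes the extraction step has already run (the event detection model itself is not shown); the `HarassmentEvent` record, its `country` tag, and the example values are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class HarassmentEvent:
    """One structured record with the slots named in the talk, plus a country tag."""
    country: str
    location: Optional[str] = None
    harasser: Optional[str] = None
    trigger: Optional[str] = None
    time: Optional[str] = None


def slot_distribution(events, country, slot):
    """Relative frequency of each non-empty value of `slot` for one country."""
    values = [getattr(e, slot) for e in events
              if e.country == country and getattr(e, slot)]
    if not values:
        return {}
    return {value: count / len(values) for value, count in Counter(values).items()}


# Made-up records standing in for the output of an event detection model.
events = [
    HarassmentEvent("India", location="bus", harasser="stranger"),
    HarassmentEvent("India", location="street", harasser="stranger"),
    HarassmentEvent("India", location="bus"),
    HarassmentEvent("USA", location="street", harasser="coworker"),
]
```

Missing slots are skipped rather than counted, since extraction from short reports often leaves some fields empty.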

The penultimate part of Liu's talk focused on novel data, that is, creative ways of obtaining additional data such as e-commerce, social media, e-book, music, and education data. Reading data, in particular, are becoming increasingly available.

The first example aims to create a safe reading space for children. A machine learning algorithm is being developed to detect pornographic content that is legal but harmful to children: the individual words may look innocuous, yet the content is not safe for children to read. Among the interesting findings is how readers move between screens. Regarding the distribution of readers' attention, the results show that readers spend 20–30 s on each screen during normal reading but much longer on pornographic content. An attention model that identifies such content from reading behavior is therefore promising.
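A very simple baseline for the dwell-time signal described above is to flag screens on which a reader lingers far longer than usual. This is a heuristic sketch, not the attention model from the talk; the median-based rule, the `factor` of 3, and the session values are illustrative assumptions.

```python
from statistics import median


def flag_long_dwell(dwell_seconds, factor=3.0):
    """Return indices of screens read much longer than this reader's typical screen.

    A screen is flagged when its dwell time exceeds `factor` times the median
    dwell time across the session; `factor` is an illustrative choice.
    """
    if not dwell_seconds:
        return []
    typical = median(dwell_seconds)
    return [i for i, t in enumerate(dwell_seconds) if t > factor * typical]


session = [22, 25, 28, 24, 140, 26]  # seconds per screen, made-up values
print(flag_long_dwell(session))  # → [4]
```

Using the reader's own median rather than a fixed threshold keeps the rule robust to readers who are simply slower or faster overall.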

The second example deals with health information using e-commerce data. This dataset, the biggest African dataset, covers 0.4 billion users and 7 billion purchase records tracked over eight years; in all, 400 billion queries and 2.5 billion products plus metadata were collected. These data are used to identify users' health concerns, for example, whether they are healthy or whether they have cancer or depression, so that bubbles can be defined for different groups. Work is in progress on a data-driven approach to find the factors associated with each group's specific concerns. Preliminary findings suggest that users with diabetes may own a 50+ in. TV and a large-screen phone, enjoy white wine and fruit juice, and use essential oils, while users with depression buy many children's diapers (possibly reflecting pregnancy-related depression), use cosmetics, and have higher incomes; their mobile phones tend to be pink, with 256 GB of storage.
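The talk did not specify how group-specific purchase factors are identified. One common, simple statistic for this kind of data-driven screening is lift, shown here as an assumed illustration: it measures association only, not causation, and the `users` structure and items are hypothetical.

```python
def lift(users, group, item):
    """Lift of `item` for `group`: P(item | group) / P(item) across all users.

    `users` maps a user id to (set of group labels, set of purchased items).
    Returns None when the ratio is undefined (no users, empty group, unseen item).
    """
    all_baskets = [items for _, items in users.values()]
    group_baskets = [items for groups, items in users.values() if group in groups]
    if not all_baskets or not group_baskets:
        return None
    p_item = sum(item in b for b in all_baskets) / len(all_baskets)
    if p_item == 0:
        return None
    p_item_in_group = sum(item in b for b in group_baskets) / len(group_baskets)
    return p_item_in_group / p_item


users = {
    "u1": ({"diabetes"}, {"white wine", "tv"}),
    "u2": ({"diabetes"}, {"white wine"}),
    "u3": (set(), {"tv"}),
    "u4": (set(), {"fruit juice"}),
}
```

A lift of 2.0 for “white wine” in this toy data means diabetes-group users buy it twice as often as users overall; a lift near 1.0 (as for “tv” here) means no association.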

The last example concerns job postings. Bubbles exist not only across the world but also on campus. There are many different students, millions of jobs, thousands of courses, and different colleges and degrees, and students are isolated in their bubbles by their education and job interests. Bubbles around students, courses, colleges, and certifications were modeled to predict students' enrollment decisions, and the results show that these bubbles restrict students' access: for instance, students may try to move to other courses but fail because of bubble restrictions.

In the last part of his presentation, Liu summarized five factors for breaking bubbles:

  1. User. Users have their own internal factors and differing willingness to break bubbles in which they feel very comfortable.
  2. Theory. Create a theoretical framework that enables cross-bubble comparison studies across cultures, languages, networks, and any other dimension possible.
  3. Technologies. Technologies provide the algorithmic foundation for bubble comparison studies; finding sophisticated algorithms to extract meaningful features is important.
  4. Application. Everybody has their own application interests, and these should be brought into the discussion.
  5. Data. Be clear about where the data come from.

3.2 Daqing He: “Entity Level Generation and Representation”

Daqing He's presentation focused on natural language processing-based modeling of content to understand what is going on on the web. Traditionally, content is modeled at the word level, which provides a statistical, topic-level representation of the content (Sewalk, Tuli, Hswen, Brownstein, & Hawkins, 2018). But to truly understand what is going on in social bubbles and to identify them, richer representations are needed that capture the similarities and differences between bubbles.

Daqing believes that named entities can play an essential role in understanding web content (Artiles, Amigó, & Gonzalo, 2009). Named entities are real-world objects that can be denoted by a proper noun. By going through a text, one can identify the entities it mentions, such as locations, events, and times, and then build up an understanding of them. Once a named entity is identified, it can be linked to a unique identifier commonly used to refer to it on the web. One can also identify the entities' names, semantic types, associated triggers, and relationships with other entities. In this way, all objects can be compared and linked together to build a rich representation (Hoffart et al., 2011).

Pound, Mika, and Zaragoza (2010) analyzed web search queries and found that the majority of users' online searches were related to entities: about 41% targeted a particular entity, 12% targeted a type of entity, about 5% concerned attributes of entities, and 1% concerned relationships between entities. In total, about 60% of information needs on the web are related to entities and their relationships.
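The “about 60%” figure is the sum of the four entity-related shares quoted above (rounded values as reported):

```latex
41\% + 12\% + 5\% + 1\% = 59\% \approx 60\%
```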

Entities bring both opportunities and challenges. On the opportunity side, entities are language features that are much richer in semantic content than simple keywords. They can be used for event detection and cross-language retrieval (Wu, He, Ji, & Grishman, 2008), and information about a given entity can be collected and aggregated from multiple documents and even multiple data collections. On the challenge side, entities often need to be recognized and disambiguated, because their boundaries are not clearly marked and a mention in the text can refer to many potential entities.
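The disambiguation challenge can be made concrete with a toy example: the mention “Washington” could denote a state, a city, or a person. Entity linking systems such as Hoffart et al. (2011) combine several signals; the minimal sketch below uses only one, word overlap between the mention's context and each candidate's description (a simplified Lesk-style heuristic). The mini knowledge base `KB` and its descriptions are hypothetical.

```python
import re

# Hypothetical mini knowledge base: surface form -> {entity id: short description}.
KB = {
    "Washington": {
        "Washington_(state)": "state in the pacific northwest of the united states seattle",
        "Washington,_D.C.": "capital city of the united states district of columbia",
        "George_Washington": "george washington first president of the united states",
    }
}


def tokens(text):
    """Lowercased alphabetic word set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def disambiguate(mention, context, kb=KB):
    """Pick the candidate whose description shares the most words with the context."""
    candidates = kb.get(mention, {})
    if not candidates:
        return None  # unknown mention: no candidate entities
    ctx = tokens(context)
    return max(candidates, key=lambda eid: len(ctx & tokens(candidates[eid])))
```

For example, `disambiguate("Washington", "The first president led the army")` selects `George_Washington`, while a context mentioning Seattle and the Pacific selects `Washington_(state)`.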

Next, Daqing introduced technologies his team developed that have great potential for building entity-based representations.

The first technology is keyphrase generation (Meng et al., 2017). In natural language text, certain chunks form phrases that refer to either abstract concepts or concrete entities. Sometimes a keyphrase summarizes a text even though it is not mentioned in the text. Keyphrase generation aims to produce such phrases for the concepts and entities a text discusses. In this particular study, academic papers were used for training and testing.

The keyphrase generation work was published at ACL 2017 (Meng et al., 2017). Titles and abstracts of academic papers were used as input. The idea is to simulate the human process: read the text, understand it and gather contextual information, build up a memory that associates the key content with a set of words, summarize these into the most meaningful phrases, and then write those phrases down as the keyphrases of the abstract.

A recurrent neural network (RNN) was used to represent the content and generate the phrases. An RNN-based encoder–decoder structure simulates human memory and the read–write process: the encoder compresses the text into a latent representation, from which the decoder generates each phrase by predicting the most probable word at each step of the decoding sequence. A copy mechanism was used to select words that appear in the context and are recognized as important concepts.

To make the model efficient in handling large text, some words are removed from the vocabulary. In this study, 5,000 words were kept for modeling the context, and the remaining 25,000 long-tail words were removed. However, many long-tail words can still appear in a keyphrase, and once they are removed the model has no knowledge of them. The copy mechanism addresses this by letting the model copy important words directly from the context, even when they are outside the vocabulary.
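The effect of copying on a truncated vocabulary can be shown with a single decoding step. This sketch uses a simplified pointer-generator-style mixture, an assumed stand-in: the copy formulation used by Meng et al. (2017) differs in its details. All probabilities and tokens below are made up; `p_gen` is the weight on generating from the fixed vocabulary.

```python
def copy_augmented_distribution(p_vocab, attention, source_tokens, p_gen):
    """Mix a generation distribution with a copy distribution over source tokens.

    p_vocab: dict word -> probability over the truncated vocabulary (sums to 1).
    attention: attention weights over source positions (sums to 1).
    source_tokens: source words aligned with `attention`.
    p_gen: probability of generating from the vocabulary instead of copying.
    """
    final = {w: p_gen * p for w, p in p_vocab.items()}
    for token, weight in zip(source_tokens, attention):
        # Copying gives probability mass to source words, even out-of-vocabulary ones.
        final[token] = final.get(token, 0.0) + (1.0 - p_gen) * weight
    return final


final = copy_augmented_distribution(
    p_vocab={"neural": 0.5, "network": 0.3, "<unk>": 0.2},
    attention=[0.5, 0.2, 0.3],
    source_tokens=["seq2seq", "neural", "keyphrase"],
    p_gen=0.6,
)
```

Although “seq2seq” and “keyphrase” are not in the toy vocabulary, they receive probability (0.2 and 0.12) through copying, and the result is still a proper distribution.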

Entities and keyphrases are also essential in the education domain, and work there can help us better understand social media. In education, it is very important to pick out the key concepts in instructional material to build a representation of that material (Thaker, Brusilovsky, & He, 2018). Concepts are an important unit for understanding a text and for modeling the knowledge it conveys to students. It is therefore useful to represent the content with dense vector representations (i.e., embeddings) of the entities and concepts, because this makes computation over them possible (Le & Mikolov, 2014). Here, each concept's embedding serves as a way to model the concept's meaning in the material.

Next, Daqing presented work that uses Wikipedia as a huge open knowledge repository for obtaining better concept embeddings (Thaker et al., 2018) in many domains, including information science. Many Wikipedia articles describe concepts, and semantic relationships between them can be built from their titles and associated text. On this basis, Daqing's team modeled the associations between concepts using Wikipedia's structure and used them as background knowledge to build a contextual vector representation. The concept representation should also be consistent with its local context and reflect what is going on there, as expressed in the similarity relationships between the different PDF pages related to the concept. The two sources can be combined to guide concept embedding, capturing both the local context and the global human knowledge in Wikipedia. These two technologies would be important for modeling what is going on, linking different bubbles together, and identifying the similarities and differences between bubbles.
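One simple way to picture combining a local-context vector with a Wikipedia-derived vector is a weighted average of unit-normalized embeddings, compared across bubbles with cosine similarity. This is an assumed illustration, not the method of Thaker et al. (2018); the weight `alpha` and the toy vectors are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def combine(local_vec, wiki_vec, alpha=0.5):
    """Weighted average of unit-normalized local-context and Wikipedia-based vectors."""
    a, b = normalize(local_vec), normalize(wiki_vec)
    return [alpha * x + (1 - alpha) * y for x, y in zip(a, b)]


# Toy vectors for one concept as seen in two bubbles, sharing a Wikipedia vector.
wiki = [0.0, 1.0, 0.0]
bubble_a = combine([1.0, 0.0, 0.0], wiki)
bubble_b = combine([0.0, 0.0, 1.0], wiki)
```

Normalizing before averaging keeps either source from dominating just because its vectors have larger magnitude; the shared Wikipedia component is what makes the two bubbles' vectors comparable (here `cosine(bubble_a, bubble_b)` is 0.5 even though the local vectors are orthogonal).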

Finally, Daqing discussed potential applications to the project. These technologies can provide semantically rich, human-friendly representations of what is going on in different contexts, and models developed and trained on other datasets could be transferred to the social media domain.

3.3 Dan Wu: “Research on Cross-Cultural and Cross-Language Users in Academic Social Network and eHealth Social Media Context”

Dan Wu argued that breaking the barriers between bubbles requires addressing the problem from different angles, not only facing the technical challenges. Taking the user perspective, her presentation drew on existing work in two domains: academic social networks and e-health.

The first case compares users of different cultures and languages within one bubble, while the second compares users across bubbles. Online social media such as Facebook and Twitter are known to play an active role in research dissemination and data curation, and they have many users all over the world. However, for political reasons, some social media are blocked; for example, Facebook, Twitter, and YouTube cannot be accessed in China. At the same time, political, geographic, and language differences separate user communities into online “information silos.”

To communicate with researchers in other countries, many scholars use ResearchGate as an academic social networking site (ASNS). ResearchGate supports the collection and dissemination of studies as well as social interactions among researchers worldwide (Lee, Oh, Dong, Wang, & Burnett, 2019). It also helps researchers find papers and other scholars, keep up with research trends, and communicate with others. More importantly, it is accessible in China. ResearchGate thus crosses language bubbles, which makes it a suitable case for their study: comparing the behavioral characteristics of ResearchGate users from different countries.

Dan then moved to the first case, which concerned cross-cultural users' searching as learning (SAL) on ASNSs. Previous studies have shown that, by searching on ASNSs, researchers look for scholars and literature and keep up with recent research trends (Nández & Borrego, 2013). These activities eventually foster discovery and change researchers' knowledge structures (Gwizdka, Hansen, Hauff, He, & Kando, 2016; Rieh, Collins-Thompson, Hansen, & Lee, 2016). The team therefore set out to study how SAL on ASNSs changes cross-cultural users' knowledge structures and how it affects their academic social networks.

Existing theoretical models of SAL on ASNSs were reviewed thoroughly. Based on these models, the team built a new model and proposed related hypotheses. The new model included five parts: (1) motivations for SAL, (2) identification of needs, (3) the search phase for information needs, (4) the search phase for social needs, and (5) the outcomes of SAL on ASNSs. A survey questionnaire was designed to test and verify the proposed model.

Using ResearchGate as the ASNS, they sent messages and questionnaires to ResearchGate members, and anyone interested in the study could fill in the questionnaire. Of the 5,000 questionnaires sent, 359 valid responses were obtained. The participants came from diverse cultures and languages across Europe, Asia, North America, Oceania, Africa, and South America; most came from Europe and Asia, followed by North America and South America. Data analysis and comparison were then conducted.

Dan's survey results showed the demographic information of the participants. The results also revealed that information need and social need were two types of conditions that triggered searching on ASNSs. The outcomes of searching were also revealed and verified. A final theoretical model was put forward after verifying and modifying their initial proposed model.

The impact of different cultures on users' SAL was also compared. Users from Asia, Africa, Oceania, South America, and Europe had significantly different motives for SAL on ASNSs, and differences were also found in users' information needs and social needs. For users from different countries, the outcomes of SAL were also quite different. This was their first study. ResearchGate was selected because it was a bubble that already had users from different cultures and languages. They believe this study can support both quantitative and qualitative work and can help leverage large-scale social media data to investigate various questions in the area of “culture analytics”.

In the future, they believe this study can help design more meaningful user interfaces that satisfy the different information needs of cross-cultural and cross-language users. This is the case of comparing users within a bubble.

The second case studied social network users' sharing of data from personal wearable devices. The motivation was that many people in China use multiple trackers to record their data and also want to share these data on social networks (Feng & Agosto, 2016), and this is the case not only in China but all over the world (Feng & Agosto, 2019). In this case, they studied the social network platform Weibo and analyzed Weibo data from China, but in the future they think it will be possible to compare the data that app users share on social platforms under different topics (Aletras & Chamberlain, 2018). Based on these social network data, user personas were constructed. The data were collected from SINA Weibo with a crawler by searching the topics “#Xiaomi sports#” and “#Huawei sports health#”; the health data shared by users under these topics and the users' basic profile information were obtained.
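Weibo marks a topic by enclosing it between two `#` characters, so after crawling, posts can be filtered by topic with a short regular expression. This is an illustrative sketch of that filtering step only (the crawler is not shown); the English topic strings are glosses, and the actual Weibo topics would be written in Chinese.

```python
import re

# Weibo topics are enclosed in a pair of '#' characters, e.g. "#topic#".
TOPIC_PATTERN = re.compile(r"#([^#]+)#")
# English glosses of the topics named in the study (the real tags are in Chinese).
TARGET_TOPICS = {"Xiaomi sports", "Huawei sports health"}


def topics(post):
    """Return the set of #...# topics in a post."""
    return {t.strip() for t in TOPIC_PATTERN.findall(post)}


def filter_posts(posts, targets=TARGET_TOPICS):
    """Keep posts tagged with at least one target topic."""
    return [p for p in posts if topics(p) & targets]


posts = [
    "Ran 5 km today #Xiaomi sports#",
    "Nice weather",
    "#Huawei sports health# 10000 steps #daily#",
]
```

`filter_posts(posts)` keeps the first and third posts; the third also carries an unrelated `daily` topic, which is ignored.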

The study identified users who share data from Xiaomi or Huawei trackers on SINA Weibo and used these data to describe the users' characteristics. Dan presented basic statistics on these users, including their regional distribution, behavioral characteristics such as the distribution of microblog posts about health data, and a correlation analysis between user characteristics and health data sharing (Mahmud, Fei, Xu, Pal, & Zhou, 2016).

In the last part of her presentation, Dan explained that user clustering grouped the users into active users, opinion leaders, potential active users, and marginal users, and she described how each group shares wearable tracker data. Although this case studied only data from China, similar cases exist all over the world. In this way, cross-cultural and cross-platform comparisons can be made to study users' data sharing behavior, to break bubbles, and to understand and compare how different groups of users from various filter and cultural bubbles consume and interact with similar topics (Lazaridou, Ntalla, & Novak, 2016). She noted that sharing personal health data from trackers on social networks is just one case; similar topics can be studied to break bubbles and compare users from different cultures and languages.
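The talk did not specify the clustering algorithm or features. As an assumed illustration, plain k-means with k = 4 groups users on two hypothetical features, health-data posts per month and followers (in thousands). The algorithm returns only numeric cluster ids; naming clusters “opinion leaders” or “marginal users” would be a separate step of inspecting the centroids.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain k-means with a deterministic start (the first k points as centroids)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each user joins the nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical features per user: (health-data posts per month, followers in thousands).
users = [(30, 50), (30, 1), (5, 20), (1, 0), (28, 45), (29, 2), (4, 18), (0, 1)]
labels, centroids = kmeans(users, k=4)
```

In this toy data, the cluster with many posts and many followers is a natural candidate for “opinion leaders” and the one with few of either for “marginal users”; real studies would typically standardize features and use a randomized or k-means++ initialization.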

4 Questions and Discussions

After the panel presentations, there was a question-and-discussion session, summarized below.

Question: Fake news and political leanings can limit openness and creativity. If people in a particular bubble can see only one side of the story, there is a lot of potential in enabling them to see the other side. Could you talk about how to help people break the bubble?

Xiaozhong: For fake news detection, if you have paid attention in recent years, NLP conferences such as ACL have many studies focusing on this topic. Their main contribution is to build machine learning models that classify what is fake and what is not. But we are information scientists, and we are more interested in the deeper reasons behind the impact of fake news: what happened, and why there is fake news. If people know that something is just fake news, what motivates them to click on and share it? So I would say this is a very interesting research question.

Question: Could you talk about ways of coping with the variety of information? I think that's an interesting side of the theory as well: the theory of how we interact with it.

Daqing: It is so much easier to get information with the help of recommender systems. Probably for the first time in human history, you don't really have to seek out new information; it just keeps coming to you. Also, recommender systems are designed to make it much easier to recommend similar content based on users' preferences. So we see this happening on the technology side. On the human side, it is also probably the first time that, no matter what opinions you have, you can find many more users like yourself. Therefore, it is the first time in human history that people feel so strongly about their opinions, which encourages them to build these bubbles. At the same time, I think many people are different: they know there is another side to the same story, they actually want to know what is going on there, and they want to build a better understanding of the event. This is our motivation for building this technology and studying this phenomenon, so that we can start to see whether a technology platform can help.

Xiaozhong: I want to know why some users purposely choose to stay inside the bubble while others are willing to step out. For some kinds of topics, I think some people are very satisfied staying inside their bubbles; they have no motivation to step out, at least at that moment. So, as a scientist or a policymaker, how should one motivate them to access more global information? When exposed to outside information, a person may decide it is fake or biased news and choose not to believe it, and this may even reinforce the beliefs they already hold.

Dan: I think both the human side and the technology side are important. To break the bubble from the technology side, the system should provide the news and all the information without any bias. But we can also study users' cognition across different cultures: what they think and what they need, and why they do or do not believe a piece of news. For political news, they may have strong biases, but for other information, such as health or academic information, they may not.

Question: Breaking bubbles is difficult and time-consuming; it may take hours or days. It is hard for me to engage with something I don't like: it upsets me, affects my mood, and I want to get away from it. I look for something enjoyable, like a singer I like, and I may spend half an hour finding something inside my bubble that makes me happy. It is hard for people to be aware that every bubble holds back their emotions, and of why they stay in it. We have full-time jobs, which are already very demanding; people just want some fun.

Daqing: As we discussed, sometimes people want to break the bubble and sometimes they don’t, because it is so comfortable to stay there and communicate with people who share your opinions. There is a lot of emotion involved. At the same time, we also know that in the United States, for example, there are so many divisions over social issues, and because we stay in our bubbles, we believe that whatever we see is the truth. It seems that we need mechanisms to help raise awareness of this. Perhaps everyone will have to endure a little more discomfort than before, but we would get a better society.

Xiaozhong: If you remember the first Matrix movie, there is a very interesting scene with two pills, a blue one and a red one. If you know you are inside a bubble, a kind of fake world, you can take the blue one and stay inside, where you feel very comfortable, or you can take the red one, take a risk, and do something much more daring. This is a personal decision. I agree with you, but now let me come back to my role as a computer scientist. As a system designer or algorithm researcher, I am strongly interested in finding two groups of people: the blue and the red. We all stay inside bubbles, but some want to break the bubble and step out of it, while others want to stay inside. Two things strongly interest me here.

First, whether you are strongly attached to your bubble or want to get outside it, I want to understand the cognitive model behind that choice, the cognitive reasons that all of us are interested in. If we can analyze them, we can use algorithms to classify people as the blue type or the red type and build recommendation results accordingly. If your reading behavior shows that you want to stay inside, that is okay: we simply give you results inside the bubble. It is your choice, and I want to satisfy you with such an algorithm. But for the others, I want to give them more exploratory results, which may satisfy them and open another door. There is no right or wrong here.
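The idea of classifying users by their reading behavior and then tailoring recommendations could be sketched as follows. This is a minimal illustration under stated assumptions, not the panelists’ system: it assumes each candidate item already has a relevance score from some base recommender and a label saying whether it lies inside the user’s bubble, and it uses a hypothetical threshold rule on the share of past out-of-bubble reads to separate the two groups the speaker describes (users who prefer to stay inside and users who want to step out). All names (`Item`, `classify_user`, `recommend`, `threshold`, `explore_share`) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Item:
    item_id: str
    score: float      # relevance from some base recommender (assumed to exist)
    in_bubble: bool   # assumed label: does the item match the user's usual community?


def explore_propensity(history: list[bool]) -> float:
    """Share of past reads that came from outside the bubble (True = outside)."""
    if not history:
        return 0.0
    return sum(history) / len(history)


def classify_user(history: list[bool], threshold: float = 0.2) -> str:
    # Hypothetical rule: users who already read some outside content are explorers.
    return "explore" if explore_propensity(history) >= threshold else "stay"


def recommend(items: list[Item], history: list[bool],
              k: int = 5, explore_share: float = 0.4) -> list[str]:
    ranked = sorted(items, key=lambda it: it.score, reverse=True)
    if classify_user(history) == "stay":
        # Respect the choice to stay inside: return in-bubble results only.
        return [it.item_id for it in ranked if it.in_bubble][:k]
    # Explorers: reserve a share of the k slots for out-of-bubble items.
    n_out = round(k * explore_share)
    outside = [it for it in ranked if not it.in_bubble][:n_out]
    inside = [it for it in ranked if it.in_bubble][: k - len(outside)]
    merged = sorted(inside + outside, key=lambda it: it.score, reverse=True)
    return [it.item_id for it in merged]
```

A learned classifier could replace the fixed threshold; the point of the sketch is only that the same ranked list can be filtered differently depending on which group a user appears to belong to.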

Question: It seems the assumption is that getting outside the bubble changes people’s opinions. Is there any evidence of people who end up stuck in the opposite bubble, where the more they learn, the more hurt they become? And on the other side, is there any evidence of people stuck in a bubble they agree with who become disenchanted with the group they believe in and move farther the other way? I am wondering whether any research or evidence shows that getting out of the bubble, or staying in it, can have the opposite of the intended effect.

Daqing: I don’t know of any specific social studies on this, but what you describe can certainly happen to some people. In the case you describe, you know there is another side and ask what the other side is saying, and that already indicates you are not totally in a bubble. You are aware of the other bubbles, and you consciously evaluate your current bubble to see where it stands. Therefore, you still have your own identity and your own opinions on certain things; the bubble is not really driving you in a particular direction. In this case, you already have your own opinion, and you are comparing it with the bubbles in general. This probably indicates that there are many people like you who are aware of what is going on inside and outside and make their own decisions. We want to enable more people to do this. On the technology side, if we help more people see that there is a wider world outside their bubble and what is going on in the other bubbles, then more and more people will probably become like this.

Question: What is the difference between now and the era when people could only read the same newspaper? Why are we trying to bring people out of the bubble? Is it just the sheer amount of information we are getting?

Daqing: I can identify two things that have really changed. First, no matter which side you were on, the people who wrote newspaper articles were professionally trained journalists with certain standards. They had different opinions, but they still had to maintain general standards. On social media today, everybody can express their opinions, and there are no standards for checking whether what you say is true. That makes things quite different now from before. Second, in the past, you got the news daily. Now the iteration is much faster: within a few seconds or a few minutes, people react on intuition, and that can very quickly amplify lies into something much bigger and worse. I think these two things make the situation much worse now than before.

Xiaozhong: An excellent example: I come from Indiana, which is quite red. But if you come from New York State or California, you live in a different environment. Because this is an election year, and I know many studies are based on presidential elections, one study I would find really interesting would take groups of people who strongly support or strongly oppose Trump and ask them: what are your daily information sources? Then, in an experimental setting, we would purposely give them information from the other side and try to push them into another bubble. We would track them some months later: will their beliefs have been reinforced, or will they have changed? Is it enough to put them in a different bubble where they only access information, or do we need to put them in a bubble where they can communicate with other members and discuss why they like or dislike Trump? After that communication, maybe magical things will happen.

Question: If you put someone in the other side’s bubble, will they say there is no way they can believe or support this? Will they dismiss it all as fake news, or will it just increase their anger that these people dare to say such things when the truth lies on the other side?

Daqing: That is interesting. And don’t forget that we have other features as well. If you think about it mathematically, this can be a regression model: whether people change or do not change is a binary outcome, so it is essentially a logistic regression model. Behind that, we know factors such as gender, age, education level, income, news exposure, internet access, and preferences. We can use all this information as features in a regression model to analyze the human factors behind people’s reactions.
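The logistic regression framing above can be made concrete with a small sketch. It assumes each participant is described by a vector of numeric features scaled to [0, 1] (for example, hypothetical scaled age, education level, and internet-access hours) and a 0/1 label for whether their belief changed; the model is fit by plain batch gradient descent on the mean log-loss. The function names and hyperparameters (`fit_logistic`, `predict_proba`, `lr`, `epochs`) are illustrative assumptions, not a method described by the panelists, and no real study data is used.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Fit weights w and bias b by batch gradient descent on the mean log-loss.

    X: list of feature vectors (e.g. scaled age, education, ...; assumed).
    y: list of 0/1 labels, where 1 means the participant's belief changed.
    """
    m, n = len(X), len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict_proba(w, b, x):
    """Estimated probability that a participant with features x changes belief."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

After fitting, the sign of each weight `w[j]` indicates whether a higher value of feature j is associated with a higher or lower estimated probability of belief change, which is one way to read off the "human factors" the speaker mentions. A statistics package would additionally report standard errors, which this sketch omits.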

References

  • Aletras, N., & Chamberlain, B. P. (2018). Predicting Twitter user socioeconomic attributes with network and language information. Proceedings of the 29th on Hypertext and Social Media, 20–24. doi:

  • Artiles, J., Amigó, E., & Gonzalo, J. (2009). The role of named entities in web people search. Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (Vol. 2), 534–542. doi:

  • Feng, Y., & Agosto, D. E. (2016, May). Long-term management of personal health information generated by activity trackers. Paper presented at the ACM Conference on Human Factors in Computing Systems, San Jose, CA, USA. Retrieved from https://www.researchgate.net/publication/301779910

  • Feng, Y., & Agosto, D. E. (2019). From health to performance: Amateur runners’ personal health information management with activity tracking technology. Aslib Journal of Information Management, 71(2), 217–240. doi:

  • Gwizdka, J., Hansen, P., Hauff, C., He, J., & Kando, N. (2016). Preface. SIGIR 2016 Workshop: Search as Learning. Retrieved from http://ceur-ws.org/Vol-1647/preface.pdf

  • Hoffart, J., Yosef, M. A., Bordino, I., Fürstenau, H., Pinkal, M., Spaniol, M., ... & Weikum, G. (2011). Robust disambiguation of named entities in text. Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, 782–792. Retrieved from https://www.aclweb.org/anthology/D11-1072

  • Imaduddin, H., & Fauziati, S. (2019). Word embedding comparison for Indonesian language sentiment analysis. 2019 International Conference of Artificial Intelligence and Information Technology (ICAIIT), 426–430. doi:

  • Lazaridou, P., Ntalla, A., & Novak, J. (2016). Behavioural role analysis for multi-faceted communication campaigns in Twitter. Proceedings of the 8th ACM Conference on Web Science, 344–345. doi:

  • Le, Q., & Mikolov, T. (2014). Distributed representations of sentences and documents. In E. P. Xing & T. Jebara (Eds.), Proceedings of the 31st International Conference on Machine Learning, 1188–1196.

  • Lee, J., Oh, S., Dong, H., Wang, F., & Burnett, G. (2019). Motivations for self-archiving on an academic social networking site: A study on ResearchGate. Journal of the Association for Information Science and Technology, 70(6), 563–574.

  • Liu, X., & Turtle, H. (2013). Real-time user interest modeling for real-time ranking. Journal of the American Society for Information Science and Technology, 64(8), 1557–1576.

  • Liu, X., Xia, T., Yu, Y., Guo, C., & Sun, Y. (2016). Cross social media recommendation. 10th International AAAI Conference on Web and Social Media (ICWSM), 221–230.

  • Liu, X., Yu, X., Gao, Z., Xia, T., & Bollen, J. (2016). Comparing community-based information adoption and diffusion across different microblogging sites. Proceedings of the 27th ACM Conference on Hypertext and Social Media, 103–112. doi:

  • Liu, Y., Li, Q., Cifor, M., Liu, X., Zhang, Q., & Si, L. (2019). Uncover sexual harassment patterns from personal stories by joint key element extraction and categorization. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, 2328–2337. doi:

  • Mahmud, J., Fei, G., Xu, A., Pal, A., & Zhou, M. (2016). Predicting attitude and actions of Twitter users. Proceedings of the 21st International Conference on Intelligent User Interfaces, 2–6. doi:

  • Meng, R., Zhao, S., Han, S., He, D., Brusilovsky, P., & Chi, Y. (2017). Deep keyphrase generation. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Vol. 1), 582–592. doi:

  • Nández, G., & Borrego, Á. (2013). Use of social networks for academic purposes: A case study. The Electronic Library, 31(6), 781–791.

  • Pound, J., Mika, P., & Zaragoza, H. (2010). Ad-hoc object retrieval in the web of data. Proceedings of the 19th International Conference on World Wide Web, 771–780. doi:

  • Rieh, S. Y., Collins-Thompson, K., Hansen, P., & Lee, H. J. (2016). Towards searching as a learning process: A review of current perspectives and future directions. Journal of Information Science, 42(1), 19–34.

  • Sánchez-Rada, J. F., & Iglesias, C. A. (2019). Social context in sentiment analysis: Formal definition, overview of current trends and framework for comparison. Information Fusion, 52, 344–356.

  • Sewalk, K. C., Tuli, G., Hswen, Y., Brownstein, J. S., & Hawkins, J. B. (2018). Using Twitter to examine Web-based patient experience sentiments in the United States: Longitudinal study. Journal of Medical Internet Research, 20(10), 1–15.

  • Shuai, X., Liu, X., Xia, T., Wu, Y., & Guo, C. (2014). Comparing the pulses of categorical hot events in Twitter and Weibo. Proceedings of the 25th ACM Conference on Hypertext and Social Media, 126–135. doi:

  • Thaker, K. M., Brusilovsky, P., & He, D. (2018). Concept enhanced content representation for linking educational resources. 2018 IEEE/WIC/ACM International Conference on Web Intelligence (WI), 413–420. doi:

  • Thaker, K., Brusilovsky, P., & He, D. (2019). Student modeling with automatic knowledge component extraction for adaptive textbooks. iTextbooks@AIED (pp. 95–102).

  • Wu, D., He, D., Ji, H., & Grishman, R. (2008). The effects of high quality translations of named entities in cross-language information exploration. 2008 International Conference on Natural Language Processing and Knowledge Engineering, 1–8. doi:

  • Xia, T., Yu, X., Gao, Z., Gu, Y., & Liu, X. (2017). Internal/external information access and information diffusion in social media. Proceedings of iConference 2017 (Vol. 2), 1–5. Retrieved from http://hdl.handle.net/2142/98867

If the inline PDF is not rendering correctly, you can download the PDF file here.

  • Aletras, N., & Chamberlain, B. P. (2018). Predicting Twitter user socioeconomic attributes with network and language information. Proceedings of the 29th on Hypertext and Social Media, 20–24. doi:

    • Crossref
    • Export Citation
  • Artiles, J., Amigó, E., & Gonzalo, J. (2009). The role of named entities in web people search. Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (Vol. 2), 534–542. doi:

    • Crossref
    • Export Citation
  • Feng, Y., Agosto, D. E.(2016, May). Long-term management of personal health information generated by activity trackers. Paper presented at the ACM Conference on Human Factors in Computing Systems, San Jose, CA, USA. Retrieved from https://www.researchgate.net/publication/301779910

  • Feng, Y., & Agosto, D. E. (2019). From health to performance: Amateur runners’ personal health information management with activity tracking technology. Aslib Journal of Information Management, 71(2), 217–240. doi:

    • Crossref
    • Export Citation
  • Gwizdka, J., Hansen, P., Hauff, C., He, J. & Kando, N. (2016). Preface. SIGIR 2016 Workshop: Search as Learning. Retrieved from http://ceur-ws.org/Vol-1647/preface.pdf

  • Hoffart, J., Yosef, M. A., Bordino, I., Fürstenau, H., Pinkal, M., Spaniol, M., ... & Weikum, G. (2011). Robust disambiguation of named entities in text. Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, 782–792. Retrieved from https://www.aclweb.org/anthology/D11-1072

  • Imaduddin, H., & Fauziati, S. (2019). Word embedding comparison for Indonesian language sentiment analysis. 2019 International Conference of Artificial Intelligence and Information Technology (ICAIIT), 426–430. doi:

    • Crossref
    • Export Citation
  • Lazaridou, P., Ntalla, A., & Novak, J. (2016). Behavioural role analysis for multi-faceted communication campaigns in Twitter. Proceedings of the 8th ACM Conference on Web Science, 344–345. doi:

    • Crossref
    • Export Citation
  • Le, Q., & Mikolov, T. (2014). Distributed representations of sentences and documents. In E. P. Xing & T. Jebara (Eds.), Proceedings of the 31st International Conference on Machine Learning, 1188–1196.

  • Lee, J., Oh, S., Dong, H., Wang, F., & Burnett, G. (2019). Motivations for self-archiving on an academic social networking site: A study on researchgate. Journal of the Association for Information Science and Technology, 70(6), 563–574.

    • Crossref
    • Export Citation
  • Liu, X., & Turtle, H. (2013). Real-time user interest modeling for real-time ranking. Journal of the American Society for Information Science and Technology, 64(8), 1557–1576.

    • Crossref
    • Export Citation
  • Liu, X., Xia, T., Yu, Y., Guo, C., & Sun, Y. (2016). Cross social media recommendation. 10th International AAAI Conference on Web and Social Media (ICWSM), 221–230.

  • Liu, X., Yu, X., Gao, Z., Xia, T., & Bollen, J. (2016). Comparing community-based information adoption and diffusion across different microblogging sites. Proceedings of the 27th ACM Conference on Hypertext and Social Media, 103–112. doi:

    • Crossref
    • Export Citation
  • Liu, Y., Li, Q., Cifor, M., Liu, X., Zhang, Q., & Si, L. (2019). Uncover sexual harassment patterns from personal stories by joint key element extraction and categorization. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, 2328–2337. doi:

    • Crossref
    • Export Citation
  • Mahmud, J., Fei, G., Xu, A., Pal, A., & Zhou, M. (2016). Predicting attitude and actions of Twitter users. Proceedings of the 21st International Conference on Intelligent User Interfaces, 2–6. doi:

    • Crossref
    • Export Citation
  • Meng, R., Zhao, S., Han, S., He, D., Brusilovsky, P., & Chi, Y. (2017). Deep keyphrase generation. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Vol. 1), 582–592. doi:

    • Crossref
    • Export Citation
  • Nández, G., & Borrego, Á. (2013). Use of social networks for academic purposes: A case study. The Electronic Library, 31(6), 781–791.

    • Crossref
    • Export Citation
  • Pound, J., Mika, P., & Zaragoza, H. (2010). Ad-hoc object retrieval in the web of data. Proceedings of the 19th International Conference on World Wide Web, 771–780. doi:

    • Crossref
    • Export Citation
  • Rieh, S. Y., Collins-Thompson, K., Hansen, P., & Lee, H. J. (2016). Towards searching as a learning process: A review of current perspectives and future directions. Journal of Information Science, 42(1), 19–34.

    • Crossref
    • Export Citation
  • Sewalk, K. C., Tuli, G., Hswen, Y., Brownstein, J. S., & Hawkins, J. B. (2018). Using Twitter to examine Web-based patient experience sentiments in the United States: Longitudinal study. Journal of Medical Internet Research, 20(10), 1–15.

  • Sánchez-Rada, J. F., & Iglesias, C. A. (2019). Social context in sentiment analysis: Formal definition, overview of current trends and framework for comparison. Information Fusion, 52, 344–356.

    • Crossref
    • Export Citation
  • Thaker, K. M., Brusilovsky, P., & He, D. (2018). Concept enhanced content representation for linking educational resources. 2018 IEEE/WIC/ACM International Conference on Web Intelligence (WI), 413–420. doi:

    • Crossref
    • Export Citation
  • Thaker, K., Brusilovsky, P., & He, D. (2019). Student modeling with automatic knowledge component extraction for adaptive textbooks. iTextbooks@ AIED (pp. 95–102).

  • Wu, D., He, D., Ji, H., & Grishman, R. (2008). The effects of high quality translations of named entities in cross-language information exploration. 2008 International Conference on Natural Language Processing and Knowledge Engineering, 1–8. doi:

    • Crossref
    • Export Citation
  • Shuai, X., Liu, X., Xia, T., Wu, Y., & Guo, C. (2014). Comparing the pulses of categorical hot events in Twitter and Weibo. Proceedings of the 25th ACM Conference on Hypertext and Social Media, 126–135. doi:

    • Crossref
    • Export Citation
  • Xia, T., Yu, X., Gao, Z., Gu, Y., & Liu, X. (2017). Internal/external information access and information diffusion in social media. Proceedings of iConference 2017 (Vol.2), 1–5. Retrieved from http://hdl.handle.net/2142/98867