Due to the rapid development of Web 2.0 and social media, the Online Learning Community (OLC) is increasingly being utilized by academic institutions to create a more convenient learning environment (Fariza, 2019; Wu, Hsieh, & Yang, 2017). OLC establishes a virtual social form through teaching, research and other activities, with interactive learning, collaborative learning and independent learning. In summary, OLC consists of three basic elements: technology, teaching, and academic sentiment interaction. The purpose of improving academic sentiment interaction is to cultivate learners’ sense of belonging to the community, so that learners are willing to stay in the community for a long time and maintain learning motivation at a high level (Cho, Kim, & Choi, 2017). Academic sentiments are generally hidden in the text records of learning community activities, such as documents, statements, and sentences. Through the techniques of sentiment analysis, weight calculation, and semantic understanding, the sentiment experience related to learning processes can be observed. In this regard, academic sentiment mining (AEM) can gain insights from the comments in OLC to analyze the factors affecting learning outcomes, which is of great significance for the improvement of teaching theories and methods (Kohoulat et al., 2017).
In realizing AEM, much attention has been paid to discovering expressed sentiments in applications such as academic recommender systems (Kaklauskas, Zavadskas, & Seniut, 2013), opinion leader identification (Li, Ma, & Zhang, 2013), forum topic mining (Cheng et al., 2015; Colace, Santo, & Luca Greco, 2014), and community quality analysis (Ghiasifard et al., 2015). However, none of the methods mentioned above addresses the identification and visualization of topic sentiment based on learning topic mining and sentiment clustering at various granularity levels. In light of these studies, the current paper aims to construct a topic sentiment analysis method for OLC. Our ultimate goal is to obtain a list of terms relevant to a given learning topic and to visualize the association relationships interactively on the basis of sentiment classification.
To access a certain plaint set without checking all the document information, the current paper aims to highlight the identification and visualization of topic sentiment based on learning topic mining and sentiment clustering at various granularity levels. To meet this need, we first propose a topic analytics method with actual data sets from the website (
The main contributions of this paper are as follows.
A topic analytics method is proposed to analyze and extract potential topics in OLC.
On the basis of college students’ feedback, a novel approach is developed to identify the topic sentiment by measuring the sentiment distance.
In addition, the hierarchical and associated relationships as well as the granularity of sentiment information are obtained.
The rest of the paper is organized as follows: Section 2 surveys related work on topic detection, sentiment analysis, and sentiment concept clustering. Section 3 introduces Latent Dirichlet Allocation (LDA) and Formal Concept Analysis (FCA). The topic sentiment analysis method is proposed in Section 4. Section 5 illustrates and implements the proposed method, and Section 6 presents the results and discussion. Section 7 presents the conclusions and future work, followed by the acknowledgment and references.
The OLC contains a large amount of information, which can be divided into two parts: learning resources and student review information (Shea, Li, & Pickett, 2006). How to accurately extract the topic of students’ attention and make corresponding optimization according to the sentiment of students’ comments has become a key link to improve the quality of community service and enhance the learning effect of students. To achieve this goal, many scholars have carried out extensive and in-depth research, which can be summarized into three main steps: topic detection (Lu et al., 2013), sentiment analysis (Nan & Wu, 2010), and sentiment concept clustering (Pappas et al., 2017). In this regard, various investigations and studies have been put forward to optimize the quality of data mining in OLC. In order to explain the proposed method in an understandable way, this section focuses on the detection methods of community topics and analyzes different sentiment distribution models according to the corresponding topics.
A topic detection model is a probabilistic generative model of text content that simulates the human thought process to find the best topic set and its vocabulary. Existing topic models mainly include latent semantic analysis (LSA) (Martínez, 2015), probabilistic latent semantic indexing (PLSI) (Parvathy, 2016), and Latent Dirichlet Allocation (LDA) (Yue, Barnes, & Jia, 2017).
The LSA method represents documents in a low-dimensional implicit semantic space by introducing semantic dimensions. Although the model is capable of constructing text representations without dictionary spaces, the LSA methodology is still derived from linear algebra and generates a large number of negative values in various dimensions. To deal with this problem, Hofmann proposed the probabilistic latent semantic indexing model (PLSI) (Hofmann, 1999). PLSI emphasizes the semantic interpretability of topic text based on implicit semantic indexing, but it is unable to deal with the over-fitting problems caused by massive text. Because the number of PLSI parameters grows linearly with the size of the document set, Blei et al. proposed the LDA (Latent Dirichlet Allocation) model to retrieve potential topics, representing the high-dimensional word space with a low-dimensional topic space (Blei, Ng, & Jordan, 2003). The LDA model is a multi-layer unsupervised Bayesian network that has been widely used to mine document subject knowledge. LDA-based approaches for online communities can be summarized into two categories. The first category identifies similar topics in different time segments and analyzes their evolution trends. Chu and Li (2010) proposed a method to track the evolution of topics, using the original corpus for topic classification. Nagori and Aghila (2012) developed a content-based recommender system to personalize e-learning systems, exploiting the topic model by introducing similarity metrics. Yang, Zhang, and Shi (2014) adjusted the prior parameters of the model to find changing topics in the text. Ge, Chen, and Du (2013) extracted hidden microblog topics to surface the topics that need to be expressed in the community. The second category combines LDA with other models to enhance the deep semantic relationships of the topics.
Santosh, Vardhan, and Ramesh (2016) focused on analyzing the feature attributes of online product reviews. They used the LDA model to obtain the feature keywords of the product and combined it with the feature ontology tree (FOT) to improve the accuracy of subject detection. Cerulo and Distante (2013) obtained a topic-terms matrix by developing a topic recognition model, which was used to form a formal context and to construct a theme concept lattice for topic-driven navigation. Zhong et al. (2018) designed an evaluation framework for the quality of student comments in online communities. They considered the dimensional characteristics of online commentary data quality and constructed a set of topic features. To sum up, current methods focus on the evolution analysis of community topic mining, semantic relationship enhancement, and probabilistic topic modeling, while ignoring the hierarchical relationships between topics and the sentiment analysis of students’ feedback.
In the process of topic detection, adding sentiment analysis makes it possible to identify the sentiment changes of online students implied in a topic. Therefore, it is necessary to identify sentiment distributions according to the corresponding topic. Sentiment analysis, also known as opinion mining, is the process of analyzing, processing, and classifying subjective texts with sentiment techniques. At present, the mainstream sentiment analysis methods can be divided into three categories. The first category analyzes the text by constructing a sentiment dictionary, and mainly relies on the quality of sentiment lexicons with specific semantic rules. Pointwise Mutual Information (PMI) and Latent Dirichlet Allocation (LDA) are often used in constructing sentiment lexicons: PMI can be used to judge the sentiment tendency of words, while LDA is used to extract sentiment words from a corpus (Li, Ba, & Huang, 2015). Turney and Littman (2003) developed the PMI algorithm to extend the sentiment dictionary and then proposed a semantic polarity algorithm to analyze the sentiment tendency of the text, which improves the accuracy of text data classification. Yang, Peng, and Chen (2014) proposed an LDA-based method to construct a domain-specific sentiment dictionary on the basis of an existing public sentiment dictionary, where the extracted topic words are viewed as a priori knowledge from the corpus. The second category mines the sentiment features of the text based on Machine Learning (ML), such as Support Vector Machine (SVM) (Liu, Bi, & Fan, 2017), Naive Bayes (NB) (Shirakawa et al., 2017), and Maximum Entropy (ME) (Ficamos, Yan, & Chen, 2013). Vinodhini (2014) designed a hybrid framework of SVM and principal component analysis (PCA) to improve sentiment classification accuracy by reducing the complexity of the sentiment mining model.
Mertiya and Singh (2016) proposed an unsupervised polarity selection method to determine the polarity of tweets by merging NB and adjective analysis theory. Xie et al. (2017) extracted seed sentiment words from Wikipedia by using probabilistic latent semantic analysis and used them as the input matrix of the ME model. To classify sentiment, they used entropy classification theory to select sentiment features. The third category is the deep-learning based approach, which converts word embeddings into a text vector to extract deep sentiment features and mainly includes Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN). Shin, Lee, and Choi (2017) integrated lexicon embeddings and attention mechanisms into a CNN to analyze sentiment features with fewer noisy words. Ethemet, Aysu and Fazli (2018) proposed a cross-language emotion analysis model, which can realize CNN-based sentiment analysis with a small corpus. Although many researchers have put considerable effort into improving the sentiment classification of online communities for practical work, there is still a lack of evaluation of sentiment unit combination, especially for the OLC. Since the sentiment analysis of students’ learning is closely related to the context in which the topic is located, it is necessary to establish a set of association rules with contextual awareness. To meet this need, we introduce formal concept analysis theory into online sentiment analysis by exploring the sentiment association rules between students.
To sum up, the current paper tries to make improvements in two ways. First, the granularity analysis of learning topics is considered in order to visualize their hierarchical relationships. Second, we focus on finding the negative sentiment in students’ comments on the basis of sentiment score calculation to form the basic association rule sets.
Latent Dirichlet Allocation (LDA) is a three-layer Bayesian probability network, which assumes that documents in the corpus select a topic based on a certain probability, and each topic also selects a term based on a certain probability. Therefore, a document is a mixture of multiple topics, and a topic is also a mixture of multiple terms. Suppose the topic distribution vector in the document is
Formal concept analysis (FCA) is a hierarchical concept construction theory based on Galois connection, which is utilized to describe the domain knowledge in depth on the basis of the mapping relationships between objects and attributes. The FCA theory consists of four basic notions of formal context, formal concept, partial ordering, and concept lattice. To further analyze the collections of documents in OLC by referring to Ren’s paper (Ren, Ling, & Yao, 2018), four definitions are given separately.
(Formal context) (Wei et al., 2019) A formal context is represented as a triple
(Formal concept) (Wei et al., 2019) Let us suppose
The inheritance relationship between different formal concepts can be utilized to construct a complete concept lattice through partial order relations, which is defined as follows.
(Partial ordering) (Zhang, Wei, & Qi, 2005) Let us suppose (
(Concept lattice) (Zhang, Wei, & Qi, 2005) Let us suppose ≺ be the set of partial orderings among the whole formal concepts.
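The four notions above can be illustrated with a minimal Python sketch. It assumes a small binary formal context stored as a dictionary from objects to attribute sets; the names `toy_context`, `extent`, `intent`, and `formal_concepts` are illustrative and do not come from the paper. The sketch enumerates every attribute subset, so it is only suitable for very small contexts and is not the lattice construction algorithm used in the proposed method.

```python
from itertools import combinations

def extent(context, attrs):
    """Objects that possess every attribute in attrs."""
    return frozenset(o for o, has in context.items() if attrs <= has)

def intent(context, objs, all_attrs):
    """Attributes shared by every object in objs."""
    shared = set(all_attrs)
    for o in objs:
        shared &= context[o]
    return frozenset(shared)

def formal_concepts(context):
    """Enumerate all (extent, intent) pairs closed under the Galois connection."""
    all_attrs = frozenset().union(*context.values())
    concepts = set()
    for r in range(len(all_attrs) + 1):
        for subset in combinations(sorted(all_attrs), r):
            objs = extent(context, frozenset(subset))
            concepts.add((objs, intent(context, objs, all_attrs)))
    return concepts

def is_subconcept(c1, c2):
    """Partial ordering: c1 is below c2 when its extent is contained in c2's extent."""
    return c1[0] <= c2[0]

# Hypothetical toy context: three documents described by three topic attributes.
toy_context = {
    "d1": {"t1", "t2"},
    "d2": {"t2", "t3"},
    "d3": {"t1", "t2", "t3"},
}
concepts = formal_concepts(toy_context)
for ext, itt in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(ext), sorted(itt))
```

The set of concepts together with `is_subconcept` forms the concept lattice of the toy context; the top concept contains all documents and the bottom concept contains the documents that carry every attribute.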
Current paper focuses on the following steps to demonstrate the advantages of utilizing LDA for opinion mining: Firstly, when traditional machine learning methods are applied to sentiment classification, the classification effect is unstable, and most of them are supervised methods, which require a certain number of labeled training samples. The manual labeling process is relatively time-consuming and labor-intensive with poor field portability. Therefore, unsupervised learning algorithms have become an important research direction for sentiment analysis of online reviews. However, although the existing thematic sentiment hybrid model can extract both the subject and sentiment information of the document at the same time, the effect of the model’s sentiment classification and the stability are not ideal due to the local negation and the number of subjects in the subjective document. In fact, text sentiment classification is still essentially a text classification problem.
The innovations of this paper can be summarized as the following two points: 1) Modeling the specific domain knowledge of college students’ online learning communities, as well as proposing a framework that supports small-scale knowledge acquisition and modeling, and further refines the granularity of subject knowledge and sentiment. 2) On the basis of LDA theory, a concept hierarchy analysis method is introduced to design and implement a topic-clustered concept lattice generation algorithm for review documents, which is helpful for mining sentiment of sparse short text data.
The aim of the approach is not to demonstrate the advantages of utilizing LDA for opinion mining, but to construct a set of feature categories of students’ comment for the online learning community. The innovative point proposed is to use the relationship between topic features and review text classification to build the formal context of formal concept analysis, thereby constructing a hierarchical topic concept lattice to reveal the implied sentiment.
For the sentiment classification of texts in online learning communities, review topics of students often have characteristics such as limitedness, which can lead to calculation errors in the sentiment similarity of reviews. In order to reduce the interference of topic content on sentiment classification, this article first mines the implicit topics of online reviews based on the LDA topic model, and combines the sentiment dictionary to calculate the sentiment polarity of the topics to obtain the sentiment tendency of the comments. To enact these goals, two approaches are proposed: on the one hand, the topics are detected via the LDA probability topic model; on the other hand, the sentiment scores matrix based on FCA is obtained by calculating sentiment similarities.
The visualization for topic sentiment analysis in online learning community (TSAOLC) consists of four modules: data preprocessing, topic detection, sentiment analysis, and visualization. The architectural overview of TSAOLC is shown in Figure 2. The process of analyzing topic sentiment depends on a sequence of each step, which is depicted as follows.
First, we set the website list of OLC as the seed URLs and use web crawler software to download the web pages and data sets. After the web pages are downloaded to the local disk, the source code is analyzed to extract the useful information, including the web page titles, which is saved in the database. Meanwhile, in order to filter irrelevant webpages, unnecessary symbols are removed on the basis of a common stop word list belonging to a predefined domain vocabulary, which consists of the Baidu stopwords vocabulary and the machine learning stopwords list of Sichuan University. Afterwards, we use the Chinese word segmentation system NLPIR-ICTCLAS to process the text corpus, which can automatically discover new terms and adaptively test the linguistic probability distribution from longer text content. After extracting the candidate terms, TF-IDF (Term Frequency-Inverse Document Frequency) is introduced to assess the importance of a term to a document (Wu et al., 2008).
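The TF-IDF weighting step can be sketched as follows. This is a minimal illustration that assumes the documents are already segmented into token lists and uses one common variant (term frequency normalized by document length, inverse document frequency as the natural logarithm of N/df); the exact variant used in the paper is not stated, and the function name `tf_idf` is illustrative.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dictionary per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents in which each term occurs.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical pre-segmented comments.
docs = [["course", "credit", "course"], ["course", "teacher"]]
w = tf_idf(docs)
print(w[0])
```

Under this variant a term that occurs in every document receives a weight of zero, which is why such terms are poor candidates for topic terms.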
A proposed method algorithm for topic-clustered concept lattice generation.
Input: | A set of topic and comment documentation |
---|---|
Output: | A topic-clustered concept lattice |
1. | for each |
2. | |
3. | for each |
4. | |
5. | end for. |
6. | end for. |
7. | for each |
8. | |
9. | |
10. | end for. |
11. | |
12. | |
13. | Find the subset of topic attributes represented as |
14. | for |
15. | Compute the set of objects by applying the Galois connection. |
16. | |
17. | |
18. | |
19. | end for. |
20. | Return { |
21. | Derive the topic-clustered sets. |
Classification weights for adverb of degree.
Level(weights) | Included adverbs |
---|---|
excessively, completely, extensively, dreadfully, entirely, absolutely | |
fairly, pretty, rather, quite, very, much, greatly, by far, highly, deeply | |
really, almost, nearly, even, just, still | |
slightly, a little, a bit, trifle, somewhat |
Here,
On the basis of the equations above-mentioned, the proposed algorithm in Table 3 for calculating SSM is listed as follows: Step 1 to Step 3 initialize the sentiment score and the probability distribution
A proposed method algorithm for calculating sentiment scores matrix.
Input: | A topic formal context |
---|---|
Output: | A sentiment scores matrix |
1. | for each topic |
2. | |
3. | |
4. | Derive the positive and negative seed terms on the basis of domain experts. |
5. | Compute |
6. | Compute |
7. | for each topic of student |
8. | |
9. | for each topic of |
10. | |
11. | end for. |
12. | |
13. | end for. |
14. | end for. |
15. | Return |
Note: For the selection of positive and negative seed terms, the domain experts are ten participants, including the authors, who use Borda counts to vote on the seed terms. Specifically, each term to be classified is ranked against three alternative sentiment categories (positive, negative, and neutral), the rankings are converted into scores, and the term is finally assigned to the highest-scoring category according to the majority voting principle.
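The Borda count vote described in the note can be sketched as follows. The ballots are invented for illustration, the function name `borda_winner` is hypothetical, and ties are not resolved in any principled way (the first category reaching the top score wins), which the paper does not specify.

```python
from collections import defaultdict

def borda_winner(ballots):
    """Each ballot ranks the categories from most to least appropriate.

    With m categories, the first place earns m - 1 points, the second
    m - 2, and so on. The category with the highest total wins.
    """
    scores = defaultdict(int)
    for ranking in ballots:
        m = len(ranking)
        for position, label in enumerate(ranking):
            scores[label] += m - 1 - position
    best = max(scores, key=scores.get)
    return best, dict(scores)

# Ten hypothetical ballots for a single candidate seed term.
ballots = (
    [("negative", "neutral", "positive")] * 6
    + [("neutral", "negative", "positive")] * 3
    + [("positive", "neutral", "negative")] * 1
)
label, totals = borda_winner(ballots)
print(label, totals)
```

In this example the term is assigned to the negative category with 15 points, against 13 for neutral and 2 for positive.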
The degree adverbs in Table 2 modify the sentiment expressed in a sentence, and taking them into account gives a more accurate understanding of the opinions, thoughts, and experiences expressed in the comment text. Moreover, the stronger the expressed sentiment polarity, the more clearly it reflects students’ demands regarding learning effects. Therefore, it is necessary to quantify the influence of degree adverbs on sentiment intensity. Following the literature (Zhang et al., 2017), we classify common degree adverbs into four levels.
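How degree adverbs can scale sentiment intensity is sketched below. The level weights (2.0, 1.5, 1.2, 0.5) and the small polarity lexicon are hypothetical placeholders, not the values used in the paper; only the adverbs themselves are taken from Table 2. The sketch assumes that an adverb modifies only the sentiment word that immediately follows it.

```python
# Hypothetical weights for the four adverb levels (strongest to weakest).
LEVEL_WEIGHTS = {4: 2.0, 3: 1.5, 2: 1.2, 1: 0.5}

# A few single-word adverbs from Table 2, mapped to their level.
ADVERB_LEVEL = {
    "completely": 4, "entirely": 4,
    "very": 3, "quite": 3,
    "really": 2, "almost": 2,
    "slightly": 1, "somewhat": 1,
}

# Hypothetical seed polarities (+1 positive, -1 negative).
POLARITY = {"good": 1.0, "clear": 1.0, "boring": -1.0, "confusing": -1.0}

def sentence_score(tokens):
    """Sum the polarities, each scaled by the adverb directly before it."""
    score = 0.0
    weight = 1.0
    for tok in tokens:
        if tok in ADVERB_LEVEL:
            weight = LEVEL_WEIGHTS[ADVERB_LEVEL[tok]]
        elif tok in POLARITY:
            score += weight * POLARITY[tok]
            weight = 1.0
        else:
            # Any other word breaks the adverb's scope.
            weight = 1.0
    return score

print(sentence_score(["the", "lecture", "was", "very", "confusing"]))
print(sentence_score(["slightly", "boring", "but", "completely", "clear"]))
```

A fuller treatment would also handle multi-word adverbs such as "a little" and negation, which this sketch leaves out.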
After obtaining the SSM, the topic set of formal concepts generated in the previous section is used as input to ConExp 1.3, which can output two kinds of association rules under different confidence values: weak association rules and strong association rules. The weak association rule set is also called the Luxenburger set of approximate rules, while the strong association rules are called the Duquenne-Guigues set of implication rules, where the support and confidence values are both greater than the minimum support and minimum confidence thresholds (Qodmanan, Nasiri M., & Minaei, 2011). In order to analyze the sentiment state of online learning in more depth, we use both kinds of association rules to map each topic to its relevant sentiment. To this end, the strong association rules are generated by using the “Calculate Duquenne-guigues set of implications” module to create implication rules, while the weak association rules are generated by using the “Calculate Association Rule” module. The generated association rule expression is of the form “
The same student has different information needs in different situations, so the relative identities and needs of students in OLC are dynamic. Therefore, the current method focuses on mining association rules of learning identities to identify the transformational rules of students’ relative roles in different situations, which helps to realize precise community services. In addition, mining the association rules for student behaviors can help to establish the sentiment evolution path of students on specific topics, which can strengthen the basis for the transformation of roles across different learning groups. Therefore, we also discuss the behavioral association rules. A detailed explanation of the two kinds of association rules is given in Section 6.
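The support and confidence values that separate implication rules (confidence of 100%) from approximate association rules can be computed directly from a binary formal context. The sketch below is a minimal illustration and not the ConExp implementation; the context is an invented toy example whose attribute names merely imitate the style of the rule listings.

```python
def support(context, attrs):
    """Number of objects that possess every attribute in attrs."""
    return sum(1 for has in context.values() if attrs <= has)

def confidence(context, premise, conclusion):
    """Share of objects satisfying the premise that also satisfy the conclusion."""
    base = support(context, premise)
    if base == 0:
        return 0.0
    return support(context, premise | conclusion) / base

# Hypothetical toy context: five students and their attributes.
toy_context = {
    "s1": {"Learner", "NT2", "Interaction"},
    "s2": {"Learner", "NT2", "Interaction"},
    "s3": {"Learner", "NT2", "Interaction"},
    "s4": {"Learner", "NT2"},
    "s5": {"Learner", "PT4"},
}

premise = {"Learner", "NT2"}
conclusion = {"Interaction"}
print(support(toy_context, premise))                 # premise support
print(confidence(toy_context, premise, conclusion))  # rule confidence
print(support(toy_context, premise | conclusion))    # support of the whole rule
```

Here the premise holds for four students and the rule for three of them, so the rule is an approximate association rule with 75% confidence rather than an implication.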
In this part, we develop the model of topic sentiment analysis in OLC on several modules to express and monitor opinions. The first module for data preprocessing collects 171,430 comments by crawling the text corpora from
Recognition results of topic terms.
Topic | Term and its probability |
---|---|
T1 | Course selection/0.023, Learning objectives/0.021, Difficulty of knowledge/0.018, Teaching methods/0.017, Guidance methods/0.013 |
T2 | Credits/0.025, Content organization/0.023, Teaching methods/0.021, Learning support/0.021, Homework and assessment methods/0.020 |
T3 | Case presentation/0.032, Procedural evaluation/0.031, Knowledge expansion/0.029, Analysis of difficult points/0.027, Group discussion/0.027 |
T4 | Communication and feedback/0.033, Resource sharing/0.033, Information update/0.032, Response time/0.031, Information acceptance/0.030 |
T1: Instructional design; T2: Course content; T3: Teaching effect; T4: Teaching interaction.
Multi-valued sentiment formal context based on topic association matrix.
Document | T1 | T2 | T3 | T4 |
---|---|---|---|---|
D1 | −3.427 | 2.874 | 4.315 | −1.306 |
D2 | 2.641 | −0.597 | −2.105 | 2.635 |
D3 | 4.715 | 2.132 | 1.624 | 0 |
D4 | 2.334 | 0 | −1.748 | 4.316 |
D5 | −3.619 | −1.857 | 3.624 | −0.391 |
D6 | −2.107 | 2.167 | 2.419 | 2.361 |
D7 | 0 | −0.524 | −0.267 | 2.638 |
D8 | 2.369 | 1.629 | 2.364 | 0 |
D9 | 1.024 | −0.121 | 3.478 | 2.964 |
D10 | 2.361 | 1.493 | −0.328 | −1.267 |
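One simple way to turn a multi-valued sentiment context like the one above into a binary (single-valued) context is to scale each score by its sign. The sketch below is an assumption made for illustration: the paper's actual scaling rule is not shown here, and the attribute names `PTk` and `NTk` (positive or negative sentiment on topic k) are modeled on the labels that appear in the rule listings. The three rows are copied from the table above.

```python
def scale_by_sign(scores):
    """Map each document's topic scores to binary attributes.

    A positive score on topic k yields attribute PTk, a negative score
    yields NTk, and a score of zero yields no attribute.
    """
    binary = {}
    for doc, row in scores.items():
        attrs = set()
        for k, value in enumerate(row, start=1):
            if value > 0:
                attrs.add(f"PT{k}")
            elif value < 0:
                attrs.add(f"NT{k}")
        binary[doc] = attrs
    return binary

# Rows D1, D3, and D7 of the multi-valued context (topics T1 to T4).
scores = {
    "D1": [-3.427, 2.874, 4.315, -1.306],
    "D3": [4.715, 2.132, 1.624, 0],
    "D7": [0, -0.524, -0.267, 2.638],
}
binary_context = scale_by_sign(scores)
for doc, attrs in binary_context.items():
    print(doc, sorted(attrs))
```

A threshold other than zero could be used to ignore weak sentiment; that choice would change which attributes each document receives.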
The binary sentiment of the single-valued formal context.
T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 | T9 | T10 | T11 | T12 | T13 | T14 | T15 | T16 | T17 | T18 | T19 | T20 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
D1 | * | * | * | * | * | * | * | * | ||||||||||||
D2 | * | * | * | * | * | * | * | * | * | |||||||||||
D3 | * | * | * | * | * | * | * | * | ||||||||||||
D4 | * | * | * | * | * | * | * | * | ||||||||||||
D5 | * | * | * | * | * | * | * | * | * | |||||||||||
D6 | * | * | * | * | * | * | * | * | ||||||||||||
D7 | * | * | * | * | * | * | * | * | ||||||||||||
D8 | * | * | * | * | * | * | * | * | * | |||||||||||
D9 | * | * | * | * | * | * | * | * | ||||||||||||
D10 | * | * | * | * | * | * | * | * | * | * |
Note: *represents criterion satisfied, T1 represents
Owing to length limits, topics T21 to T26 are not shown in this table.
Figure 3 represents a screenshot of the proposed method, which enables teachers and supervisors to view each specific topic by adjusting the controls of the browser. Supervisors can select each topic (top-right section) based on LDA with its most relevant topics-terms matrix (middle-right section) and documents-topics matrix (middle-left section). Besides, five documents are randomly selected to be assigned to the topic whose TF-IDF values are greater than the pre-setting threshold (bottom-right section). Afterwards, the hierarchical topic concept lattice is constructed based on FCA (central part). Finally, two sets of the implication rules and association rules are listed on the basis of the sentiment scores matrix (bottom-left part).
The platform represented by Figure 3 is constructed based on two open source tools, namely the interactive visualization library pyLDAvis and the open-source tool Colibri / ML. On the one hand, for the analysis and discovery of topics, the topic model interactive visualization library pyLDAvis is introduced to semi-automatic mining of potential comment topics from unstructured text resources. The above process mainly includes preprocessing the data, generating a document word frequency matrix, generating an LDA model, and mining association rules. First, in the data preprocessing phase, for a large number of html tags, non-Chinese characters are eliminated. Secondly, in the frequency matrix generation stage, Chinese text analysis software is used to analyze the text data to meet the basic requirements of machine learning. At the same time, words that are meaningless are excluded.
Finally, in the LDA topic generation stage, the topic number
In order to verify the effectiveness and accuracy of the proposed model in mining learning behaviors and the hidden sentiment under different learning conditions, we first analyze the implication rules and association rules generated in the third module in Section 5. Afterwards, mean absolute error (MAE), precision, recall, and F-measure are used to measure the overall performance of the proposed method in comparison with other state-of-the-art models.
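The four evaluation measures can be sketched with their standard definitions. The labels and scores below are invented for illustration and are unrelated to the experimental data; the function names are hypothetical, and precision, recall, and F-measure are computed for a single target class.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F-measure for the class named `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical sentiment labels for five comments.
true_labels = ["neg", "neg", "pos", "pos", "neg"]
pred_labels = ["neg", "pos", "pos", "neg", "neg"]
p, r, f = precision_recall_f1(true_labels, pred_labels, positive="neg")
mae = mean_absolute_error([1, 0, -1], [1, -1, -1])
print(p, r, f, mae)
```

Higher precision, recall, and F-measure indicate better classification, whereas a lower MAE indicates better agreement between predicted and true scores.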
The model generates a total of 164 association rules with a confidence level greater than 50%, and 97 implication rules. Because negative sentiment can express a stronger need for information on certain topics than positive sentiment, the rules are filtered with a focus on negative sentiment, yielding a list of 48 association rules and 47 implication rules. In addition, by adjusting the minimum confidence and minimum support, the most effective of the above rules can be highlighted with a smaller number of rule conditions. Assuming that the support is three and the confidence is greater than 50%, ten association rules remain when the preconditions in the rule sets contain “learner” and the conclusions involve student behaviors. Similarly, when the preconditions in the rule sets relate to learning behavior and the conclusions contain student identity information or student behavior, three implication rules are obtained. The selected rules are listed in Table 7. In the sentiment mining of association rules, three valuable basic rules can be summed up. First, students who are learning online are very likely to meet their dynamic learning needs by adjusting their learning behaviors. Rule 1 indicates that students who do not like to provide information will obtain information through information tracking when they are dissatisfied with the content of the course. Second, when students with relatively higher academic levels express negative sentiments, they often improve their learning effects through communication with professors, academic authorities, and others. Rule 6 shows that when learners are psychologically stressed and dissatisfied with the learning effect, there is a 67% chance that the learner is a graduate student who is willing to communicate information needs by interacting with others.
Finally, when students are dissatisfied with a topic, they usually adjust themselves through some inherent learning behavior habits, which reflects the autonomy of students in OLC. Rule 4 shows that when learners are dissatisfied with the learning content, there is a 75% chance that they deal with related issues through information tracking and active sharing. Besides, the basic rules hidden in the implication rule sets can be summarized into two points. On the one hand, the attitude of students toward the teaching effect greatly affects their learning status, and learning behavior under stress changes accordingly. Rule 2 indicates that when students who tend to interact and are willing to share their personal attitudes show anxiety about the teaching effect, they are often reluctant to alleviate their psychological stress through information search. On the other hand, there is a relatively strong correlation between different learning behaviors. Both Rules 1 and 3 indicate that students who are likely to share information (retrieve information) and are willing to cooperate with each other often track relevant information (share information) to conduct more in-depth learning and show more positive learning sentiment. This conclusion indicates that cultivating students’ good learning behavior habits is more important than the course content, which is of great guiding significance for improving students’ online learning efficiency.
The implication rules and association rules.
Association rules | 1<3>Learner Information provider<AVG NT2=[100%]=><3>Information searcher>AVG; |
2<4>Learner Psychological stress PT1=[75%]=><3>Information provider<AVG NT3; | |
3<4>Learner NT2 =[75%]=><3>Interaction; | |
4<4>Learner NT2 =[75%]=><3>Information sharer>AVG Information searcher>AVG; | |
5<3>Learner Information sharer<AVG Psychological stress Cooperation PT1=[67%]=><2> Information provider<AVG NT3; | |
6<3>Learner Information searcher>AVG Psychological stress PT1 NT3 =[67%]=><2> Postgraduate Information searcher<AVG Interaction; | |
7<3>Learner Information provider<AVG PT1 PT4=[67%]=><2>Information searcher<AVG NT2; | |
8<3> Learner Information provider<AVG Information searcher<AVG NT2=[67%]=><2> Information sharer>AVG Psychological stress Interaction; | |
9<3>Learner NT2 PT4 =[67%]=><2>Postgraduate Interaction; | |
10<3>Learner NT2 PT4 =[67%]=><2>Information sharer<AVG; | |
Implication rules | 1<2>Learner Information sharer>AVG Interaction cooperation ==> Information searcher>AVG Psychological stress PT2; |
2<2>Learner Interaction sharer>AVG NT3==> Information searcher<AVG Psychological stress; | |
3<2>Learner Information searcher>AVG Interaction cooperation ==> Information sharer>AVG Psychological stress PT4; |
Note: The pre-setting condition for the association rule is (Preconditions contain learners= [>50%] => Conclusions related to student behavior); The pre-setting condition for the implication rule is (Preconditions related to student behaviors => Conclusions related to student identities or student behaviors). When the frequency of the user behavior in Table 7 is greater than the mean value, it can be considered that under the constraint of the precondition, the student has a relatively high probability to adopt such behavior.
The experimental data of this paper is selected from the topics of “Computer Science”, “Information Science”, “Network Engineering” and “Software Engineering” in
Besides, to evaluate the advantages and effectiveness of the proposed method, we further performed a comparative analysis against a semi-supervised method (the Co-Training algorithm) based on an SVM classifier on high-dimensional datasets. We also compared Naïve Bayes, multilayer perceptron, and random forest classifiers in order to select a suitable classifier for building a predictive model of the quality of topic sentiment analysis.
The calculation results of relevant evaluation indicators are shown in Tables 8–10. The results show that TSAOLC exhibits high classification performance on all datasets, which validates the effectiveness and stability of the proposed method.
Precision contrast between different methods based on SVM.
Method | St1 | St2 | St3 | St4 | St5 | St6 | St7 |
---|---|---|---|---|---|---|---|
RA | 49.32 | 37.51 | 40.67 | 42.52 | 43.77 | 41.26 | 45.33 |
CG | 52.33 | 34.96 | 38.79 | 41.68 | 40.17 | 37.74 | 42.59 |
CoT | 57.73 | 46.28 | 48.85 | 44.84 | 51.39 | 47.77 | 48.25 |
TextBlob | 58.86 | 45.16 | 46.07 | 42.33 | 52.78 | 45.56 | 52.63 |
TSAOLC |
Recall contrast between different methods based on SVM.
Method | St1 | St2 | St3 | St4 | St5 | St6 | St7 |
---|---|---|---|---|---|---|---|
RA | 44.45 | 42.06 | 47.64 | 44.37 | 45.98 | 41.63 | 48.21 |
CG | 42.68 | 40.97 | 48.86 | 42.07 | 43.63 | 42.88 | 47.71 |
CoT | 49.99 | 47.38 | 52.84 | 55.36 | 52.09 | 49.23 | 53.84 |
TextBlob | 54.18 | 45.84 | 51.67 | 58.07 | 62.29 | 53.46 | 60.06 |
TSAOLC |
F-measure contrast between different methods based on SVM.
Method | St1 | St2 | St3 | St4 | St5 | St6 | St7 |
---|---|---|---|---|---|---|---|
RA | 46.67 | 39.65 | 43.88 | 43.43 | 44.85 | 41.44 | 46.73 |
CG | 47.01 | 37.73 | 43.25 | 41.87 | 41.83 | 40.15 | 45.00 |
CoT | 53.58 | 46.82 | 50.77 | 49.55 | 51.74 | 48.49 | 50.89 |
TextBlob | 56.42 | 45.50 | 48.71 | 48.97 | 57.14 | 49.19 | 56.10 |
TSAOLC |
MAE contrast between different methods based on SVM.
Method | St1 | St2 | St3 | St4 | St5 | St6 | St7 |
---|---|---|---|---|---|---|---|
RA | 98.42 | 92.46 | 90.87 | 88.38 | 89.07 | 91.45 | 95.63 |
CG | 82.03 | 85.56 | 87.69 | 89.06 | 92.61 | 94.97 | 86.36 |
CoT | 78.84 | 76.34 | 72.19 | 68.78 | 75.43 | 76.35 | 78.62 |
TextBlob | 72.93 | 67.45 | 69.37 | 64.92 | 70.14 | 68.62 | 62.15 |
TSAOLC |
To select a suitable classifier for the quality prediction model, we further performed a comparative analysis of Naïve Bayes, multilayer perceptron and random forest. The maximum number of decision trees in the random forest is 100. For the multilayer perceptron, the hidden layer parameter is set to 3 and the learning rate to 0.2.
The evaluation performance of partial data with different classifiers is shown in Figures 4–5. The experimental results in Fig 4 show that the precision, recall and F-measure of the topic sentiment prediction model are around 0.5 on all the data, while the MAE stays below 0.7. With the TSAOLC model defined in this paper, the random forest method achieves the best comprehensive performance; that is, its weighted average of the indicators over the three sub-datasets is the best (average MAE is 0.4758).
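The comparison by weighted average can be sketched as follows. This is only an illustration: the paper does not specify the weights, so weighting each sub-dataset by its sample count is an assumption, and the classifier names, MAE values and sizes below are placeholders rather than reported results.

```python
def weighted_average(values, weights):
    """Weighted mean of `values`; weights need not sum to 1."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total

def best_by_mae(mae_by_classifier, sizes):
    """Return the classifier with the lowest size-weighted MAE and all averages."""
    averaged = {
        name: weighted_average(maes, sizes)
        for name, maes in mae_by_classifier.items()
    }
    return min(averaged, key=averaged.get), averaged

# Placeholder MAE values on three sub-datasets (not results from the paper).
mae_by_classifier = {
    "clf_a": [0.5, 0.6, 0.4],
    "clf_b": [0.7, 0.5, 0.6],
}
sizes = [100, 200, 100]  # assumed number of samples per sub-dataset

best, averaged = best_by_mae(mae_by_classifier, sizes)
print(best, averaged)
```

Since a lower MAE is better, the selection takes the minimum; for precision, recall or F-measure the same helper would be used with a maximum instead.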
To better illustrate how the proposed method can help teachers or supervisors manage the teaching process, the construction of the topic-clustered concept lattice is divided into four layers: the description layer, the topic feature layer, the learning sentiment analysis layer and the visualization layer. The description layer uses data preprocessing to mine the document-text matrix from student text, which mainly includes community postings, classroom discussions and students’ online course selections. The topic feature layer and the learning sentiment analysis layer provide statistical information on learning topic content and subject-level clustering, so that teachers and managers can view each specific topic by adjusting the controls in the browser. Specifically, teachers and supervisors can select specific topic words to obtain a vector of topic variables and visualize the hierarchical dependencies between topics.
The visualization layer mainly includes the dynamic display of the hierarchical topic-clustered concept lattice and the visualization of association rules. In the constructed concept lattice, super-concepts have larger extents than sub-concepts, and sub-concepts have richer intents than super-concepts. A white semicircle node indicates that the concept has an attribute, and a black semicircle node indicates that the concept has an object. As the level increases, the attributes of the concepts in a layer gradually increase and the number of objects covered by these concepts gradually decreases, until a specific object is located. Teachers can obtain all formal concepts that contain a topic word by selecting appropriate confidence thresholds. At the same time, they can use the attributes as preconditions in the association rule sets to obtain association rules and implication rules, so as to identify the student group’s sentiment tendency on specific topics.
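The relationship between super-concepts and sub-concepts can be made concrete with a small formal context. The sketch below enumerates formal concepts by brute force (closing every subset of objects), which is only practical for tiny contexts and is not the lattice construction algorithm used in the paper; the posts and attributes are hypothetical.

```python
from itertools import combinations

def intent_of(objects, context, all_attrs):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    shared = set(all_attrs)
    for obj in objects:
        shared &= context[obj]
    return frozenset(shared)

def extent_of(attrs, context):
    """Objects that carry every attribute in `attrs`."""
    return frozenset(obj for obj, owned in context.items() if attrs <= owned)

def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing each object subset."""
    all_attrs = set().union(*context.values())
    objects = list(context)
    concepts = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = intent_of(subset, context, all_attrs)
            concepts.add((extent_of(intent, context), intent))
    return concepts

# Hypothetical context: posts (objects) with topic and sentiment attributes.
context = {
    "post1": {"exam", "negative"},
    "post2": {"exam", "positive"},
    "post3": {"exam", "homework", "negative"},
}

# Print concepts from the most general (largest extent) to the most specific.
for extent, intent in sorted(formal_concepts(context), key=lambda c: -len(c[0])):
    print(sorted(extent), sorted(intent))
```

In the output, the concept covering all three posts carries only the attribute `exam`, while concepts with fewer posts carry more attributes, which is the ordering described above.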
In addition, teachers or teaching managers can dynamically display topic concepts with the same clustering characteristics by clicking on different concept nodes of the concept lattice. If this set of topic concepts is used as the precondition of association rules, and the number of rules is controlled by adjusting the minimum confidence and the minimum support, the negative sentiment evaluations concentrated in a certain type of topic can be identified, thereby providing a reasonable basis for curriculum reform.
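How the minimum support and minimum confidence control the number of rules can be sketched with a small brute-force miner. This is an illustrative assumption rather than the rule extraction used in the paper: the function `mine_rules`, its `max_antecedent` parameter and the transactions are hypothetical, and rules are limited to a single consequent item.

```python
from itertools import combinations

def mine_rules(transactions, min_support, min_confidence, max_antecedent=2):
    """Return (antecedent, consequent, support, confidence) tuples above both thresholds."""
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def support(itemset):
        # Fraction of transactions that contain every item of `itemset`.
        return sum(1 for t in transactions if itemset <= t) / n

    rules = []
    for size in range(1, max_antecedent + 1):
        for antecedent in combinations(items, size):
            premise = frozenset(antecedent)
            premise_support = support(premise)
            if premise_support == 0:
                continue
            for consequent in items:
                if consequent in premise:
                    continue
                rule_support = support(premise | {consequent})
                confidence = rule_support / premise_support
                if rule_support >= min_support and confidence >= min_confidence:
                    rules.append((antecedent, consequent, rule_support, confidence))
    return rules

# Hypothetical transactions built from attribute labels (illustration only).
transactions = [
    {"Learner", "NT2", "Interaction"},
    {"Learner", "NT2", "Interaction"},
    {"Learner", "NT2", "PT4"},
    {"Learner", "PT4"},
]

loose = mine_rules(transactions, min_support=0.5, min_confidence=0.6)
strict = mine_rules(transactions, min_support=0.5, min_confidence=0.7)
print(len(loose), len(strict))
```

Raising either threshold can only remove rules, so a teacher tightening the thresholds sees a smaller, more reliable rule set.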
This paper designs a model for online sentiment analysis of various topics in OLC. Based on the LDA topic detection approach, the model obtains the topic-terminology hybrid matrix and the document-topic hybrid matrix from real users’ comments. Afterwards, a topic-clustered concept lattice based on the FCA model is constructed, in which topic sentiment is identified by measuring sentiment scores. In addition, topic sentiment is visualized based on implication and association rules to refine the granularity of sentiment knowledge. From the experimental results, the following conclusions can be drawn:
The proposed model can effectively perceive students’ sentiment tendencies on different topics, which provides a practical reference for improving the quality of information services in teaching practice. The topic-sentiment visualization framework can clarify the hierarchical dependencies between different topics, which lays the foundation for improving the accuracy of teaching content recommendation and optimizing the knowledge coherence of related courses.
To improve the accuracy of the topic-sentiment analysis model, follow-up research will focus on optimizing the semantic constraints between different topics. In addition, exploring the intensity of students’ sentiments and their evolutionary trends is another promising direction, which would improve the adaptive ability of opinion mining.