Open Access

Identification of Sarcasm in Textual Data: A Comparative Study


Figure 1

Snippet of the news headlines dataset for sarcasm detection.

Figure 2

Graph to show number of sarcastic and non-sarcastic labels.

Figure 3

Word cloud of sarcastic words.

Figure 4

Word cloud of non-sarcastic words.

Figure 5

Snippet of the sarcasm corpus V2.

Figure 6

Word cloud of various ACL dataset comments.

Figure 7

Snippet of the ACL irony dataset.

Figure 8

Representation of words as vectors in space.

Figure 9

Typical CNN applied on textual data.

Figure 10

Structure of LSTM module.

Figure 11

Network before and after applying the dropout.

Figure 12

The system architecture for shallow machine learning algorithms.

Figure 13

Results from shallow machine learning models.

Figure 14

The system architecture with Deep Learning Models.

Figure 15

Plated framework for CNN-LSTM architecture.

Figure 16

Plated framework for LSTM-CNN architecture.

Results obtained from Sarcasm Corpus V2 Dataset. All values are accuracy (%).

Word2Vec
Dropout      Epochs   CNN      LSTM     CNN-LSTM   LSTM-CNN
0.15         2        56.55    58.67    58.57      58.99
0.15         4        56.76    56.55    56.55      57.61
0.15         8        56.93    57.08    56.07      57.87
0.15         16       57.25    57.08    55.69      55.69
Avg (0.15)            56.873   57.345   56.72      57.54
Avg (0.25)            57.185   57.44    56.757     57.565
Avg (0.35)            57.033   57.238   56.17      57.267

GloVe
Avg (0.15)            58.637   59.167   58.74      58.94
Avg (0.25)            59.075   59.082   58.775     59.277
Avg (0.35)            59.127   58.952   58.91      59.225

fastText
Avg (0.15)            58.655   58.86    58.28      59.155
Avg (0.25)            59.205   59.085   58.285     59.497
Avg (0.35)            59.212   59.13    58.70      59.277
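Each Avg row in the table above is the arithmetic mean of the four per-epoch accuracies (epochs 2, 4, 8 and 16) at the given dropout rate. A quick sanity check in Python, using the Word2Vec / dropout 0.15 figures reported above:

```python
# Per-epoch accuracies (%) on Sarcasm Corpus V2 with Word2Vec embeddings
# and dropout 0.15, taken from the rows for epochs 2, 4, 8 and 16.
acc = {
    "CNN":      [56.55, 56.76, 56.93, 57.25],
    "LSTM":     [58.67, 56.55, 57.08, 57.08],
    "CNN-LSTM": [58.57, 56.55, 56.07, 55.69],
    "LSTM-CNN": [58.99, 57.61, 57.87, 55.69],
}

# The Avg (0.15) row is the mean over the four epoch settings.
avg = {model: sum(v) / len(v) for model, v in acc.items()}
for model, a in avg.items():
    print(f"{model}: {a:.3f}")
```

This reproduces the Avg (0.15) row of the table (56.873, 57.345, 56.72, 57.54) to within rounding.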

Comparison table for News Headline Sarcasm Dataset.

Technique                                                                Accuracy
NBOW (Logistic Regression with Neural Words)                             0.724
NLSE (Non-Linear Subspace Embedding)                                     0.72
CNN (Convolutional Neural Network), Kim (2014)                           0.742
Shallow CUE CNN (Context and User Embedding
Convolutional Neural Network), Amir et al. (2016)                        0.793
Our proposed technique                                                   0.816

Comparison of Sarcasm Corpus Version 2.

Technique                              Subset   Recall   Precision
Baseline (SVM), Oraby et al. (2017)    GEN      0.75     0.71
                                       RQ       0.73     0.70
                                       HYP      0.63     0.68
Our proposed technique                 GEN      0.72     0.73
                                       RQ       0.71     0.71
                                       HYP      0.68     0.68

Parameter list for our models under training and testing.

Parameter             Value
Filters               64
Kernel size           3
Embedding dimension   300
Epochs                2, 4, 8, 16
Activation function   Sigmoid
Batch size            128
Word embedding        Word2Vec, GloVe and fastText
Pool size             2
Dropout               0.15, 0.25, 0.35 (ConvNet); 0.25 (Bi-LSTM)
Optimizer             Adam
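Taken together, the parameter table and the results tables imply a fixed grid of training runs per dataset: three embeddings, three ConvNet dropout rates, four epoch settings and four model architectures. A minimal sketch of that grid (the enumeration is our reading of the tables, not the authors' training script):

```python
from itertools import product

# Hyperparameter grid implied by the parameter table above.
embeddings = ["Word2Vec", "GloVe", "fastText"]
dropouts = [0.15, 0.25, 0.35]   # ConvNet dropout; Bi-LSTM dropout fixed at 0.25
epochs = [2, 4, 8, 16]
models = ["CNN", "LSTM", "CNN-LSTM", "LSTM-CNN"]

# Each combination corresponds to one accuracy cell in the results tables.
runs = list(product(embeddings, dropouts, epochs, models))
print(len(runs))  # 3 * 3 * 4 * 4 = 144 runs per dataset
```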

Results obtained from News Headlines Dataset. All values are accuracy (%).

Word2Vec
Dropout      Epochs   CNN      LSTM     CNN-LSTM   LSTM-CNN
0.15         2        80.8     80.6     80.7       80.8
0.15         4        80.5     81       81.1       80.4
0.15         8        79.6     80.6     80.3       80.7
0.15         16       78.1     80.3     78.3       81.23
Avg (0.15)            79.75    80.63    80.1       80.7825
Avg (0.25)            79.9     80.85    80.075     80.865
Avg (0.35)            80.1     80.88    80.125     80.8825

GloVe
Avg (0.15)            81       81.18    81.025     81.275
Avg (0.25)            81.1     81.21    81.2       81.25
Avg (0.35)            81       81.56    81.175     81.6

fastText
Avg (0.15)            80.96    81.38    80.6125    80.65
Avg (0.25)            81.23    81.26    80.975     81.45
Avg (0.35)            81       81.06    81         81.075

Results obtained from ACL 2014 Irony Dataset. All values are accuracy (%).

Word2Vec
Dropout      Epochs   CNN      LSTM     CNN-LSTM   LSTM-CNN
0.15         2        54.34    55.28    56.98      58.07
0.15         4        58.31    58.38    59.93      59.62
0.15         8        59.62    59.32    60.87      60.33
0.15         16       60.12    60.08    61.23      62.23
Avg (0.15)            58.097   58.265   59.752     60.062
Avg (0.25)            58.575   58.927   60.747     60.897
Avg (0.35)            59.292   59.327   60.545     61.132

GloVe
Avg (0.15)            58.73    58.83    58.15      60.53
Avg (0.25)            59.78    59.76    58.202     59.817
Avg (0.35)            59.29    59.43    57.882     59.922

fastText
Avg (0.15)            59.69    60.25    58.647     60.831
Avg (0.25)            59.26    58.87    59.99      60.32
Avg (0.35)            59.66    59.01    59.74      59.87

Comparison of ACL Irony Dataset 2014.

Features                                  Recall   Precision
Baseline (BoW), Wallace et al. (2015)     0.288    0.129
NNP (Noun Phrase)                         0.324    0.129
NNP + subreddit                           0.337    0.131
NNP + subreddit + sentiment               0.373    0.132
Our proposed technique                    0.489    0.472
eISSN: 2543-683X
Language: English
Publication timeframe: 4 times per year
Journal subjects: Computer Sciences, Information Technology, Project Management, Databases and Data Mining