Open Access

Identification of Sarcasm in Textual Data: A Comparative Study


Figure 1

Snippet of the news headlines dataset for sarcasm detection.

Figure 2

Graph to show number of sarcastic and non-sarcastic labels.

Figure 3

Word cloud of sarcastic words.

Figure 4

Word cloud of non-sarcastic words.

Figure 5

Snippet of the sarcasm corpus V2.

Figure 6

Word cloud of various ACL dataset comments.

Figure 7

Snippet of the ACL irony dataset.

Figure 8

Representation of words as vectors in space.

Figure 9

Typical CNN applied on textual data.

Figure 10

Structure of LSTM module.

Figure 11

Network before and after applying the dropout.

Figure 12

The system architecture for shallow machine learning algorithms.

Figure 13

Results from shallow machine learning models.

Figure 14

The system architecture with Deep Learning Models.

Figure 15

Plated framework for CNN-LSTM architecture.

Figure 16

Plated framework for LSTM-CNN architecture.

Results obtained from Sarcasm Corpus V2 Dataset. All values are accuracy (%).

Word2Vec
Dropout      Epochs   CNN      LSTM     CNN-LSTM   LSTM-CNN
0.15         2        56.55    58.67    58.57      58.99
0.15         4        56.76    56.55    56.55      57.61
0.15         8        56.93    57.08    56.07      57.87
0.15         16       57.25    57.08    55.69      55.69
Avg (0.15)            56.873   57.345   56.72      57.54
Avg (0.25)            57.185   57.44    56.757     57.565
Avg (0.35)            57.033   57.238   56.17      57.267

GloVe
Avg (0.15)            58.637   59.167   58.74      58.94
Avg (0.25)            59.075   59.082   58.775     59.277
Avg (0.35)            59.127   58.952   58.91      59.225

fastText
Avg (0.15)            58.655   58.86    58.28      59.155
Avg (0.25)            59.205   59.085   58.285     59.497
Avg (0.35)            59.212   59.13    58.70      59.277
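Each Avg row in the table above is the arithmetic mean of the four per-epoch accuracies (epochs 2, 4, 8 and 16) at the given dropout rate. A quick sanity check in Python, using the Word2Vec / dropout 0.15 figures reported above:

```python
# Per-epoch accuracies (%) on Sarcasm Corpus V2 with Word2Vec embeddings
# and dropout 0.15, taken from the rows for epochs 2, 4, 8 and 16.
acc = {
    "CNN":      [56.55, 56.76, 56.93, 57.25],
    "LSTM":     [58.67, 56.55, 57.08, 57.08],
    "CNN-LSTM": [58.57, 56.55, 56.07, 55.69],
    "LSTM-CNN": [58.99, 57.61, 57.87, 55.69],
}

# The Avg (0.15) row is the mean over the four epoch settings.
avg = {model: sum(v) / len(v) for model, v in acc.items()}
for model, a in avg.items():
    print(f"{model}: {a:.3f}")
```

This reproduces the Avg (0.15) row of the table (56.873, 57.345, 56.72, 57.54) to within rounding.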

Comparison table for News Headline Sarcasm Dataset.

Technique                                                                Accuracy
NBOW (Logistic Regression with Neural Words)                             0.724
NLSE (Non-Linear Subspace Embedding)                                     0.72
CNN (Convolutional Neural Network), Kim (2014)                           0.742
Shallow CUE CNN (Context and User Embedding
Convolutional Neural Network), Amir et al. (2016)                        0.793
Our proposed technique                                                   0.816

Comparison of Sarcasm Corpus Version 2.

Technique                              Subset   Recall   Precision
Baseline (SVM), Oraby et al. (2017)    GEN      0.75     0.71
                                       RQ       0.73     0.70
                                       HYP      0.63     0.68
Our proposed technique                 GEN      0.72     0.73
                                       RQ       0.71     0.71
                                       HYP      0.68     0.68

Parameter list for our models under training and testing.

Parameter             Value
Filters               64
Kernel size           3
Embedding dimension   300
Epochs                2, 4, 8, 16
Activation function   Sigmoid
Batch size            128
Word embedding        Word2Vec, GloVe and fastText
Pool size             2
Dropout               0.15, 0.25, 0.35 (ConvNet); 0.25 (Bi-LSTM)
Optimizer             Adam
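Taken together, the parameter table and the results tables imply a fixed grid of training runs per dataset: three embeddings, three ConvNet dropout rates, four epoch settings and four model architectures. A minimal sketch of that grid (the enumeration is our reading of the tables, not the authors' training script):

```python
from itertools import product

# Hyperparameter grid implied by the parameter table above.
embeddings = ["Word2Vec", "GloVe", "fastText"]
dropouts = [0.15, 0.25, 0.35]   # ConvNet dropout; Bi-LSTM dropout fixed at 0.25
epochs = [2, 4, 8, 16]
models = ["CNN", "LSTM", "CNN-LSTM", "LSTM-CNN"]

# Each combination corresponds to one accuracy cell in the results tables.
runs = list(product(embeddings, dropouts, epochs, models))
print(len(runs))  # 3 * 3 * 4 * 4 = 144 runs per dataset
```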

Results obtained from News Headlines Dataset. All values are accuracy (%).

Word2Vec
Dropout      Epochs   CNN      LSTM     CNN-LSTM   LSTM-CNN
0.15         2        80.8     80.6     80.7       80.8
0.15         4        80.5     81       81.1       80.4
0.15         8        79.6     80.6     80.3       80.7
0.15         16       78.1     80.3     78.3       81.23
Avg (0.15)            79.75    80.63    80.1       80.7825
Avg (0.25)            79.9     80.85    80.075     80.865
Avg (0.35)            80.1     80.88    80.125     80.8825

GloVe
Avg (0.15)            81       81.18    81.025     81.275
Avg (0.25)            81.1     81.21    81.2       81.25
Avg (0.35)            81       81.56    81.175     81.6

fastText
Avg (0.15)            80.96    81.38    80.6125    80.65
Avg (0.25)            81.23    81.26    80.975     81.45
Avg (0.35)            81       81.06    81         81.075

Results obtained from ACL 2014 Irony Dataset. All values are accuracy (%).

Word2Vec
Dropout      Epochs   CNN      LSTM     CNN-LSTM   LSTM-CNN
0.15         2        54.34    55.28    56.98      58.07
0.15         4        58.31    58.38    59.93      59.62
0.15         8        59.62    59.32    60.87      60.33
0.15         16       60.12    60.08    61.23      62.23
Avg (0.15)            58.097   58.265   59.752     60.062
Avg (0.25)            58.575   58.927   60.747     60.897
Avg (0.35)            59.292   59.327   60.545     61.132

GloVe
Avg (0.15)            58.73    58.83    58.15      60.53
Avg (0.25)            59.78    59.76    58.202     59.817
Avg (0.35)            59.29    59.43    57.882     59.922

fastText
Avg (0.15)            59.69    60.25    58.647     60.831
Avg (0.25)            59.26    58.87    59.99      60.32
Avg (0.35)            59.66    59.01    59.74      59.87

Comparison of ACL Irony Dataset 2014.

Features                                  Recall   Precision
Baseline (BoW), Wallace et al. (2015)     0.288    0.129
NNP (Noun Phrase)                         0.324    0.129
NNP + subreddit                           0.337    0.131
NNP + subreddit + sentiment               0.373    0.132
Our proposed technique                    0.489    0.472
eISSN: 2543-683X
Language: English
Publication timeframe: 4 times per year
Journal subjects: Computer Sciences, Information Technology, Project Management, Databases and Data Mining