
Effective Opinion Spam Detection: A Study on Review Metadata Versus Content



Figure 1

Framework of proposed research methodology.

Figure 2

Tripartite network of reviewers (U), reviews (R) and products (P).

Figure 3

Overall process from dataset to model training through feature extraction and balancing for any of the reviewer, review and product-centric settings.

Figure 4

ROC curves of different classifiers trained using behavioral and textual features on YelpZip over (a–c) reviewer-centric, (d–f) review-centric, and (g–i) product-centric setting.

Figure 5

ROC curves of different classifiers trained using behavioral and textual features on YelpNYC over (a–c) reviewer-centric, (d–f) review-centric, and (g–i) product-centric setting.

Figure 6

Performance of behavioral, textual and hybrid features using different classifiers on YelpZip over (a–c) reviewer-centric, (d–f) review-centric, and (g–i) product-centric setting.

Figure 7

Performance of behavioral, textual and hybrid features using different classifiers on YelpNYC over (a–c) reviewer-centric, (d–f) review-centric, and (g–i) product-centric setting.

Figure 8

Performance comparison of different features used in different works on YelpZip over (a–b) reviewer-centric, (c–d) review-centric, and (e–f) product-centric setting.

Figure 9

Performance comparison of different features used in different works on YelpNYC over (a–b) reviewer-centric, (c–d) review-centric, and (e–f) product-centric setting.

Figure 10

Computation time analysis of behavioral and textual feature extraction on (a) YelpZip and (b) YelpNYC dataset.

Classifier performance on the YelpNYC dataset using both behavioral and textual features over all three settings.

              SVM                     LR                      MLP                     NB
              Behavioral  Textual     Behavioral  Textual     Behavioral  Textual     Behavioral  Textual

(a) Reviewer-centric
AP            0.8001      0.6866      0.8022      0.6999      0.8136      0.7029      0.7594      0.6860
Recall        0.6638      0.5945      0.6859      0.7011      0.7045      0.6878      0.5503      0.3103
F1 (Macro)    0.7140      0.6250      0.7148      0.6495      0.7224      0.6472      0.6749      0.5462
F1 (Micro)    0.7143      0.6259      0.7149      0.6517      0.7226      0.6491      0.6791      0.5759

(b) Review-centric
AP            0.7313      0.6566      0.7311      0.6444      0.7461      0.6708      0.6811      0.6640
Recall        0.7189      0.5270      0.7475      0.5073      0.7548      0.6089      0.3462      0.3611
F1 (Macro)    0.6573      0.6052      0.6728      0.5897      0.6822      0.6173      0.5621      0.5655
F1 (Micro)    0.6598      0.6069      0.6758      0.5917      0.6852      0.6179      0.5849      0.5851

(c) Product-centric
AP            0.8839      0.8345      0.8876      0.8367      0.8896      0.8345      0.8909      0.8357
Recall        0.8474      0.3396      0.8282      0.6844      0.8186      0.6770      0.8419      0.7006
F1 (Macro)    0.7865      0.6177      0.8016      0.7369      0.7974      0.7048      0.8066      0.7332
F1 (Micro)    0.7880      0.6601      0.8024      0.7385      0.7983      0.7109      0.8073      0.7344
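For context on how metrics of this kind can be produced, the following is a minimal scikit-learn sketch (not the authors' pipeline; the synthetic data and classifier choice are placeholders) that computes the same four metrics for one classifier on one balanced partition:

# Minimal sketch (placeholder data, not the authors' code): computing AP,
# recall and macro/micro F1 for one classifier on one balanced partition.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, recall_score, f1_score

# Placeholder matrix standing in for a behavioral or textual feature set.
X, y = make_classification(n_samples=2000, n_features=12, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]   # spam-class probabilities for AP
preds = clf.predict(X_te)                # hard labels for recall and F1

print("AP        :", average_precision_score(y_te, scores))
print("Recall    :", recall_score(y_te, preds))
print("F1 (Macro):", f1_score(y_te, preds, average="macro"))
print("F1 (Micro):", f1_score(y_te, preds, average="micro"))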

Brief summary of the features used by the compared methods under the reviewer-centric, review-centric and product-centric settings.

Mukherjee et al. (2013a) features
  Reviewer-centric and Product-centric: CS, MNR, BST, RFR
  Review-centric: DUP, EXT, DEV, ETF
Mukherjee et al. (2013c) features
  Reviewer-centric and Product-centric: MNR, PR, RL, RD, MCS
Rayana & Akoglu (2015) features
  Reviewer-centric and Product-centric: MNR, PR, NR, avgRD, WRD, BST, ERD, ETG, RL, ACS, MCS
  Review-centric: Rank, RD, EXT, DEV, ETF, PCW, PC, L, PP1, RES, SW, OW, DLu, DLb

Dataset statistics after preprocessing (for YelpZip and YelpNYC).

Dataset                   # Reviews (spam %)    # Reviewers (spammer %)    # Products (restaurants)
YelpZip (Preprocessed)    356,766 (4.66%)       49,841 (9.21%)             3,975
YelpNYC (Preprocessed)    90,906 (7.58%)        15,351 (10.67%)            873

Classifier performance on the YelpZip dataset using both behavioral and textual features over all three settings.

              SVM                     LR                      MLP                     NB
              Behavioral  Textual     Behavioral  Textual     Behavioral  Textual     Behavioral  Textual

(a) Reviewer-centric
AP            0.7342      0.6682      0.7377      0.6717      0.7417      0.6783      0.6934      0.6558
Recall        0.5301      0.6063      0.5907      0.6537      0.6395      0.6497      0.5902      0.3140
F1 (Macro)    0.6700      0.6260      0.6841      0.6340      0.6943      0.6343      0.6681      0.5491
F1 (Micro)    0.6767      0.6268      0.6868      0.6343      0.6952      0.6353      0.6701      0.5808

(b) Review-centric
AP            0.6873      0.6461      0.6826      0.6232      0.6994      0.6581      0.6401      0.6478
Recall        0.7821      0.4348      0.7413      0.3775      0.7121      0.5947      0.6907      0.3612
F1 (Macro)    0.6394      0.5888      0.6544      0.5655      0.6637      0.6180      0.6233      0.5663
F1 (Micro)    0.6471      0.5998      0.6574      0.5830      0.6650      0.6187      0.6259      0.5876

(c) Product-centric
AP            0.8692      0.8440      0.8717      0.8421      0.8741      0.8499      0.8691      0.8432
Recall        0.8218      0.7795      0.8101      0.7774      0.7926      0.7731      0.9004      1.0000
F1 (Macro)    0.7488      0.7279      0.7526      0.7293      0.7569      0.7347      0.6758      0.3537
F1 (Micro)    0.7541      0.7321      0.7569      0.7333      0.7598      0.7384      0.7020      0.5472

Algorithm for balancing the feature set.

Algorithm 1: Balancing Algorithm for Feature Set
  Input: Unbalanced feature set F.
  Output: k balanced partitions, each containing a nearly equal number of instances from both classes.
1.  Randomly shuffle the instances in F;
2.  Divide F into two sets S1 and S2, representing the minority class and the majority class, respectively;
3.  S1 ← minority class instances;
4.  S2 ← majority class instances;
5.  p ← count(S1);
6.  q ← count(S2);
7.  k ← ⌈q / p⌉;
8.  S3 ← Divide S2 into k nearly equal-sized bins;
9.  foreach bin z ∈ S3 do
10.   Combine S1 with z to get a balanced partition;
11. end
12. return k balanced partitions for the unbalanced feature set F;
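The sketch below is one possible Python rendering of Algorithm 1; the list-of-(features, label) input format and the function name are illustrative assumptions, not the authors' implementation:

import random
from math import ceil

def balance_feature_set(F, seed=0):
    # F: list of (feature_vector, label) tuples with a binary label.
    instances = list(F)
    random.Random(seed).shuffle(instances)                    # step 1

    labels = [lbl for _, lbl in instances]
    minority_label = min(set(labels), key=labels.count)
    S1 = [x for x in instances if x[1] == minority_label]     # minority class (steps 2-3)
    S2 = [x for x in instances if x[1] != minority_label]     # majority class (step 4)

    p, q = len(S1), len(S2)                                   # steps 5-6
    k = ceil(q / p)                                           # step 7
    bin_size = ceil(q / k)
    S3 = [S2[i:i + bin_size] for i in range(0, q, bin_size)]  # step 8: k nearly equal bins

    return [S1 + z for z in S3]                               # steps 9-12

Each returned partition pairs all minority-class instances with one bin of majority-class instances, so every partition is roughly class-balanced.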

Dataset statistics (for YelpZip and YelpNYC).

Dataset    # Reviews (spam %)    # Reviewers (spammer %)    # Products (restaurants)
YelpZip    608,598 (13.22%)      260,277 (23.91%)           5,044
YelpNYC    359,052 (10.27%)      160,225 (17.79%)           923

Brief description of behavioral and textual features employed under reviewer-centric, review-centric and product-centric settings.

Setting: Reviewer-centric and Product-centric
  Behavioral features
    ARD      Average rating deviation (Fei et al., 2013)
    WRD      Weighted rating deviation (Rayana and Akoglu, 2015)
    MRD*     Maximum rating deviation
    BST      Burstiness (Mukherjee et al., 2013a)
    ERR*     Early review ratio
    MNR      Maximum number of reviews (Mukherjee et al., 2013a)
    RPR      Ratio of positive reviews (Rayana and Akoglu, 2015)
    RNR      Ratio of negative reviews (Rayana and Akoglu, 2015)
    FRR      First review ratio (Mukherjee et al., 2013a)
    EXRR*    Extreme rating ratio
    TRRR*    Top ranked reviews ratio
    BRRR*    Bottom ranked reviews ratio
  Textual features
    MCS      Maximum content similarity (Mukherjee et al., 2013a)
    ACS      Average content similarity (Lim et al., 2010)
    AFPP*    Average first-person pronouns ratio
    ASPP*    Average second-person pronouns ratio
    AFTAPP*  Average first-and-third-person to all-person pronouns ratio
    ASAPP*   Average second-person to all-person pronouns ratio
    ASW*     Average subjective words ratio
    AOW*     Average objective words ratio
    AInW*    Average informative words ratio
    AImW*    Average imaginative words ratio
    ARL      Average review length (Rayana and Akoglu, 2015)

Setting: Review-centric
  Behavioral features
    RD       Rating deviation (Mukherjee et al., 2013a)
    ERD*     Early rating deviation
    ETF      Early time frame (Mukherjee et al., 2013a)
    EXT      Extreme rating (Mukherjee et al., 2013a)
    TRR*     Top ranked review
    BRR*     Bottom ranked review
    RR       Review rank (Rayana and Akoglu, 2015)
    RL       Review length (Mukherjee et al., 2013c)
  Textual features
    RPW      Ratio of positive words (Li et al., 2011)
    RNW      Ratio of negative words (Li et al., 2011)
    RFPP     Ratio of first-person pronouns (Li et al., 2011)
    RSPP     Ratio of second-person pronouns (Li et al., 2011)
    RFTAPP*  Ratio of first-and-third-person to all-person pronouns
    RSAPP*   Ratio of second-person to all-person pronouns
    RSW      Ratio of subjective words (Li et al., 2011)
    ROW      Ratio of objective words (Li et al., 2011)
    RInW     Ratio of informative words (Ott et al., 2011)
    RImW     Ratio of imaginative words (Ott et al., 2011)
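To make the behavioral/textual distinction concrete, the fragment below computes one behavioral feature (a rating-deviation average in the spirit of ARD) and one textual feature (maximum content similarity via TF-IDF cosine similarity, in the spirit of MCS) for a single reviewer; the data layout and exact formulas are illustrative assumptions rather than the definitions used in the cited works:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Assumed layout: each review is (product_id, star_rating, text); product_avg
# maps product_id to that product's average rating over all reviewers.
reviews = [("p1", 5.0, "great food and friendly service"),
           ("p2", 1.0, "great food and friendly service, will come back"),
           ("p3", 4.0, "decent place, average food")]
product_avg = {"p1": 3.8, "p2": 4.1, "p3": 3.9}

# Behavioral: average absolute deviation of the reviewer's ratings from product averages.
ard = sum(abs(r - product_avg[p]) for p, r, _ in reviews) / len(reviews)

# Textual: maximum pairwise cosine similarity between the reviewer's review texts.
tfidf = TfidfVectorizer().fit_transform([t for _, _, t in reviews])
sim = cosine_similarity(tfidf)
mcs = max(sim[i, j] for i in range(len(reviews)) for j in range(i + 1, len(reviews)))

print(f"ARD = {ard:.3f}, MCS = {mcs:.3f}")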

Statistical significance of results obtained on behavioral and textual features using Z-test analysis.

                              Reviewer-centric             Review-centric               Product-centric
                              Z-test statistic  P-value    Z-test statistic  P-value    Z-test statistic  P-value
YelpZip   ROC-AUC             30.03             ~ 0.0      53.40             0.0        3.14              0.0016
          Avg. Precision      27.69             ~ 0.0      37.58             ~ 0.0      3.31              0.0009
          F1-Score (micro)    20.88             ~ 0.0      48.91             0.0        2.07              0.0377
YelpNYC   ROC-AUC             23.02             ~ 0.0      47.44             0.0        4.59              ~ 0.0
          Avg. Precision      23.35             ~ 0.0      33.48             ~ 0.0      3.86              0.0001
          F1-Score (micro)    22.41             ~ 0.0      30.17             ~ 0.0      8.73              ~ 0.0
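The tabulated p-values are consistent with referring each Z statistic to a standard normal null distribution; a minimal sketch (assuming two-sided tests, which is an assumption rather than a detail stated in the table) is:

from scipy.stats import norm

def two_sided_p(z):
    # Two-sided p-value of a Z statistic under the standard normal null.
    return 2 * norm.sf(abs(z))

print(two_sided_p(3.14))   # ~0.0017; the table reports 0.0016 for this statistic
print(two_sided_p(3.86))   # ~0.0001, matching the tabulated value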