
Identification and Analysis of Multi-tasking Product Information Search Sessions with Query Logs



Introduction

Online shopping has gained great popularity among consumers because it is fast, convenient, and unrestricted in terms of time of day and product locale. The Internet has greatly lowered the cost and increased the efficiency of shopping compared to in-person searches, especially for alternative or substitute products, as it enables consumers to quickly collect more information about a wide range of products, brands, and sellers before they make purchasing decisions. Information search, identified by consumer behavior research as the first stage in the buying process (Rowley, 2000), thus becomes more important in online shopping than in traditional retailing. Online shopping is more “information intensive,” meaning that the e-commerce websites intended for transactions become increasingly vital and comprehensive information sources (Fortune, 1998).

According to The Research Report of Online Shopping Market in China, 2014 (CNNIC, 2015), online retail transactions reached a revenue of 2.79 trillion Yuan, a yearly growth of 49.7%. Online shopping is growing rapidly in China, but research on consumers’ online search behavior remains limited. More study is needed to better understand the characteristics of online consumer search behavior in order to improve e-commerce sites, consumer services, and sales.

Identifying the specific patterns in how consumers seek information has always been critical for understanding trends in consumer buying behavior (Bhatnagar & Ghose, 2004), and has important implications for decision-making tasks such as purchasing a product. Research has found that multi-tasking is quite common in Web search (Ye & Wilson, 2014). For example, Spink, Ozmutlu, and Ozmutlu (2002) found that 11.4% of 1,000 randomly extracted sessions involved multi-tasking, and Spink et al. (2006) found that among sessions with more than three queries, more than 90% included multi-tasking. It is likewise common for online shoppers to search multiple product categories simultaneously when making multiple purchases. Very little research, however, has examined product information search to identify and analyze the characteristics of multi-tasking product searches, which differ from standard Web search queries. This research aims to bridge that gap.

The availability of clickstream data has contributed greatly to information seeking research for many tasks, including online shopping. In this paper, we analyze query terms from click-through logs to identify consumers’ shopping tasks, and to discover the characteristics of their multi-tasking product searches.

In this study, the important concepts of session and shopping task are defined as follows:

A session is a series of queries issued by a single user within a small window of time, which is meant to capture a single user’s attempt to satisfy a single information need. In this research, we use the heuristic that queries serving a single information need cluster together in time, with gaps of up to 45 minutes; a longer gap before the user returns to the search engine marks the start of a new session (Moorthy & Talukdar, 1995). A code sketch of this heuristic follows the definitions below.

A shopping task is a set of activities that a consumer conducts in order to purchase a product. A multi-tasking session is a session in which a consumer conducts product searches for multiple shopping tasks.
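As a minimal sketch of the session-splitting heuristic above (assuming one user’s query timestamps are already sorted chronologically; only the 45-minute cutoff comes from the definition, everything else is illustrative):

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=45)

def split_sessions(timestamps):
    """Group one user's sorted query timestamps into sessions: a new session
    starts whenever the gap since the previous query exceeds 45 minutes."""
    sessions, current = [], []
    for t in timestamps:
        if current and t - current[-1] > SESSION_GAP:
            sessions.append(current)
            current = []
        current.append(t)
    if current:
        sessions.append(current)
    return sessions

# Three queries with a long break before the third -> two sessions
times = [datetime(2013, 5, 17, 10, 51), datetime(2013, 5, 17, 10, 53),
         datetime(2013, 5, 17, 12, 0)]
print(len(split_sessions(times)))  # 2
```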

This paper first reviews the related literature, followed by a description of the methodology and findings on the characteristics of multi-tasking product search. We conclude with an analysis of the results and discuss the limitations and implications of the research, as well as suggestions for future study.

Literature Review
Task Identification in Web Search

Previous research has identified two types of approaches for task identification: time splitting and query clustering (Lucchese et al., 2013). Query clustering is based on the content of the queries, while time splitting relies on temporal cues such as the gaps between queries. Content-based methods for identifying search tasks in Web search include comparisons of (1) the similarity of two search queries, (2) the URLs that the Web search engine returns (Glance, 2000), and (3) the documents that the Web search engine returns (Raghavan & Sever, 1995). Similarity scores are calculated on these three indexes to decide whether two queries belong to the same search task.

Two major methods are used to compare the relevance of two search queries. The first identifies lexical similarities between the queries: the sets of search terms are extracted from the two queries and compared using indexes such as the Jaccard index (Järvelin, Järvelin, & Järvelin, 2007), the ratio of the intersection to the union of the two search-term sets, or the Levenshtein distance (Jones & Klinkner, 2008). The second compares the semantic relevance of the search terms using the vector space model (Salton & McGill, 1986). For example, utilizing semantic relations from Wiktionary and Wikipedia, Lucchese et al. (2011) calculated the similarity between each search term and each source in the semantic network, and represented each search term as a vector of its similarities to those sources.
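For concreteness, a minimal sketch of the Jaccard index over two queries’ term sets (segmentation into terms is assumed to have been done already):

```python
def jaccard(terms_a, terms_b):
    """Jaccard index: size of the intersection over size of the union
    of the two search-term sets."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Two queries sharing two of four distinct terms -> 0.5
print(jaccard(["casual", "shirt", "male"], ["casual", "shirt", "green"]))
```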

Usually the cosine similarity (the angle) between two search query vectors is calculated as the index of similarity between the two queries. Lucchese et al. (2011) first preprocessed the search log by removing empty log records and stop words, stemming, and deleting sessions that lasted too long or included too many queries, since such sessions were likely produced by machines. They then calculated lexical and semantic similarities between queries and combined them into a final similarity index in two ways. The first method takes a weighted average of the lexical similarity and the semantic similarity. The second uses a threshold: when the lexical similarity score is above the threshold, the final similarity index equals the lexical similarity; when it is below the threshold, the final similarity index is the greater of the lexical and semantic similarities.
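A sketch of these two combination schemes; the weight and threshold values are illustrative placeholders, not the values used by Lucchese et al. (2011):

```python
def combine_weighted(word_sim, semantic_sim, alpha=0.5):
    """Method 1: weighted average of lexical and semantic similarity."""
    return alpha * word_sim + (1 - alpha) * semantic_sim

def combine_thresholded(word_sim, semantic_sim, t=0.5):
    """Method 2: if lexical similarity clears the threshold, use it alone;
    otherwise take the greater of the two scores."""
    return word_sim if word_sim >= t else max(word_sim, semantic_sim)
```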

Multi-tasking Web Search

Information users often demonstrate multi-tasking behavior in Web search. Spink et al. (2006) suggested that users generally produce multi-tasking sessions for two reasons: a user may have several search topics at the beginning of the search process, or a user may start with a single topic but discover new topics related to their information needs while searching.

Numerous studies have examined the characteristics of multi-tasking search sessions, including the time spent on queries. For example, Spink, Ozmutlu, and Ozmutlu (2002) found that search queries are longer and more time is spent in multi-tasking sessions than in mono-tasking sessions. Lin and Belkin (2005) also confirmed that the average number of search queries in multi-tasking sessions is higher than in mono-tasking sessions. When Lucchese et al. (2011) analyzed search logs of 307 sessions and 1,424 queries from America Online (AOL), they found that the average session lasted about 15 minutes; the shortest session lasted less than one minute and contained only one or two queries, while the longest lasted about two and a half hours. Sessions contained 4.49 queries on average, and half of the sessions had fewer than five queries. The logs were divided into 554 search tasks, with an average of 2.57 queries per task and 1.8 tasks per session. Of the 307 sessions, 162 (52.8%) contained only one search task, while the rest (47.2%) were multi-tasking sessions. The multi-tasking sessions contained 1,046 queries, 74.0% of the total.

In another study, of AltaVista logs (Spink et al., 2006), researchers found that of the 254 sessions that included two queries, 206 (81.1%) were multi-tasking sessions, and of the 483 sessions that included more than two queries, 441 (91.3%) were multi-tasking sessions. Multi-tasking sessions contained 3.2 tasks per session on average.

Wang et al. (2013) analyzed search logs collected from Bing.com, a dataset that includes 7,628 users, 37,547 sessions, and 114,723 queries. On average, a user participated in 4.9 sessions and issued 15.1 queries. There were 8,044 tasks (77.9%) that included only one query, 2,283 tasks (22.1%) that included more than one query, and 1,307 tasks that spanned multiple sessions. The average number of tasks a user performed was 7.2. A multi-session task typically consisted of 2.8 sessions and 6.6 queries, and took 491.1 minutes to finish.

Experiment
Data Collection and Preprocessing

In order to identify and analyze the characteristics of consumers’ multi-tasking product Web searches, we performed a series of experiments on large-scale product search logs from taobao.com. The dataset comprises browser click-through logs of 4,285 users collected during May 2013, totaling 1,410,960 records in 81,759 sessions (Yuan, 2014). Each record contains the following fields:

Uid: a unique user code assigned to identify a user;

IP address: the IP address from which a click is made;

URL: the URL of the Web page a user visited;

Date and time: the starting time a user opened a certain URL in a browser window;

Staytime: the duration in seconds a user stayed active on a Web page;

Query terms: queries as entered by a user (if any);

Sessionid: a unique session identifier marking the session a record belongs to.

Figure 1 shows some sample log records.

Figure 1. Sample log records.

The log data contain the click-through activities of both consumers and shop owners, but we are only interested in the search and browsing activities of consumers. Since shop owners tend to be far more active than average consumers, we removed users with an excessive number of sessions as likely belonging to businesses. Figure 2 shows the distribution of users by the number of sessions.

Figure 2. Distribution of users by number of sessions.

The x-axis in Figure 2 is the number of sessions, and the y-axis shows the number of users with a particular number of sessions; the secondary y-axis shows the cumulative percentage of users. We removed users in the top 2.5% (those with more than 33 sessions), who were likely shop owners rather than regular consumers. The remaining log records cover 2,910 users with 18,102 sessions and 47,387 queries. We used one-fifth of this dataset for our experiment, which contains 582 users with 3,483 sessions and 8,949 queries; Table 1 shows sample query records.
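A minimal pandas sketch of this filtering step; the stand-in data frame is hypothetical, and the field names follow the record layout listed above:

```python
import pandas as pd

# Stand-in for the full click-through log: one row per record,
# with (at least) the Uid and Sessionid fields described above
logs = pd.DataFrame({
    "Uid":       [1, 1, 1, 2, 2, 3, 3, 3, 3, 3],
    "Sessionid": [10, 10, 11, 20, 21, 30, 31, 32, 33, 34],
})

# Drop the most active 2.5% of users, who are likely shop owners
# (in the real dataset this cutoff corresponds to 33 sessions)
sessions_per_user = logs.groupby("Uid")["Sessionid"].nunique()
cutoff = sessions_per_user.quantile(0.975)
consumers = sessions_per_user[sessions_per_user <= cutoff].index
consumer_logs = logs[logs["Uid"].isin(consumers)]
print(consumer_logs["Uid"].unique())
```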

Table 1. Sample query records.

User ID | Sid | Query terms | Query terms (translation)
1028433974716967148 | 1973 | 丰胸仪 | Breast augmentation instrument
1028433974716967148 | 1973 | 优格格丰乳仪 | Yougege breast augmentation instrument
1028433974716967148 | 1975 | 北京茶月饼 | Beijing tea mooncake
1028433974716967148 | 1975 | 金凤呈祥 | Jinfengchengxiang
1028433974716967148 | 1975 | 金凤呈祥200 | Jinfengchengxiang 200
1028433974716967148 | 1976 | 美优食品 | Meiyou food
1028433974716967148 | 1977 | XQB38-83皮带 | XQB38-83 belt
1028433974716967148 | 1978 | 味多美卡 | Meiduomei gift card
1028433974716967148 | 1978 | Laver丰胸精油 | Laver breast augmentation oil
1028433974716967148 | 1978 | AOC 拉莫圣日尔曼干红葡萄酒 750ml | AOC Saint Germain Rameau claret 750ml
1028433974716967148 | 1978 | AOC银奖圣玛杰庄园干红葡萄酒 750ml | AOC silver award Domaine Saint Majan claret 750ml
1028433974716967148 | 1978 | 圣玛杰庄园干红葡萄酒 750ml | Domaine Saint Majan claret 750ml
1028433974716967148 | 1978 | 红绳 | Red rope
1028433974716967148 | 1978 | 红绳批发 | Red rope wholesale
1028433974716967148 | 1978 | 项链挂绳编织 | Necklace rope woven

Task Identification

We use Rwordseg (Li, 2013) with its default dictionary, plus an additional dictionary containing terms from the Product Catalog acquired through the Taobao API (http://open.taobao.com/), for query term segmentation. We then calculate the pairwise Jaccard index (Järvelin et al., 2007) of the queries belonging to the same user and construct a similarity matrix from these values. We employ the following three methods to identify whether queries belong to the same task (a code sketch follows the list):

Rule-based sequential comparison, where for each query qi we calculate its Jaccard similarity score sij with all previously labeled queries qj, j ∊ {1,…,i-1}; if the highest sij exceeds a given threshold t, qi is assigned the task label Tj of that query, and otherwise it starts a new task;

Clustering that uses the average Jaccard value as the Jaccard index between the new cluster and other clusters (clustering-avg); and

Clustering that uses the maximum Jaccard value as the Jaccard index between a new cluster and other clusters (clustering-max).
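Below is a minimal sketch of the rule-based and clustering approaches, using the pairwise Jaccard index as the similarity measure; function and variable names, and the example threshold, are illustrative:

```python
from statistics import mean

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def rule_based(queries, t=0.35):
    """Sequential comparison: give query i the task label of the most similar
    earlier query if that similarity exceeds t; otherwise start a new task."""
    labels = []
    for i, q in enumerate(queries):
        best_j, best_s = None, t
        for j in range(i):
            s = jaccard(queries[j], q)
            if s > best_s:
                best_j, best_s = j, s
        labels.append(labels[best_j] if best_j is not None
                      else (max(labels, default=-1) + 1))
    return labels

def cluster_tasks(queries, t=0.35, linkage=max):
    """Agglomerative clustering over a user's segmented queries.

    linkage=max  -> clustering-max (maximum pairwise Jaccard between clusters)
    linkage=mean -> clustering-avg (average pairwise Jaccard between clusters)
    Merging stops once the most similar pair of clusters falls below t."""
    clusters = [[i] for i in range(len(queries))]
    def sim(c1, c2):
        return linkage(jaccard(queries[i], queries[j]) for i in c1 for j in c2)
    while len(clusters) > 1:
        pairs = [(a, b) for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        a, b = max(pairs, key=lambda p: sim(clusters[p[0]], clusters[p[1]]))
        if sim(clusters[a], clusters[b]) < t:
            break
        clusters[a].extend(clusters.pop(b))
    return clusters  # each cluster of query indexes is one task

# Example: three segmented queries -> two tasks
qs = [["casual", "shirt", "male"], ["casual", "shirt", "green"], ["shorts", "male"]]
print(rule_based(qs))      # [0, 0, 1]
print(cluster_tasks(qs))   # [[0, 1], [2]]
```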

In both clustering methods, hierarchical clustering stops when the Jaccard index between the two most similar clusters falls below a given threshold. For each method (with the two dictionaries, default and default plus Product Catalog), we experiment with threshold values ranging from 0.2 to 0.6 and plot the F-score results, as discussed in the Assessment section. Figure 3 shows the results.

Figure 3. Thresholds and F-scores of the clustering approaches.

As Figure 3 shows, the performances of the clustering methods are most stable between thresholds 0.3 and 0.4. Therefore, we chose the following three thresholds for our later experiments: 0.3, 0.35, and 0.4.

Assessment

To identify which combination of dictionary, method, and threshold works best for task identification, we created a gold standard from 10% of the experiment data (1,015 search log records) chosen at random. Two human coders examined the query terms and identified product search tasks independently. The coders were instructed to assign a task number to each query in sequence, giving the same task number to queries that belong to the same task. Table 2 presents part of the human task identification results.

Table 2. Sample of human task identifications.

# | Sid | Query terms (original) | Query terms (translation) | Coder #1 | Coder #2
1 | 1973 | 丰胸仪 | Breast augmentation instrument | 1 | 1
2 | 1973 | 优格格丰乳仪 | Yougege breast augmentation instrument | 1 | 1
3 | 1975 | 北京茶月饼 | Beijing tea mooncake | 2 | 2
4 | 1975 | 金凤呈祥 | Jinfengchengxiang | 3 | 3
5 | 1975 | 金凤呈祥 200 | Jinfengchengxiang 200 | 3 | 3
6 | 1976 | 美优食品 | Meiyou food | 4 | 4
7 | 1977 | XQB38-83皮带 | XQB38-83 belt | 5 | 5
8 | 1978 | 味多美卡 | Meiduomei gift card | 6 | 3
9 | 1978 | Laver丰胸精油 | Laver breast augmentation oil | 7 | 6
10 | 1978 | AOC 拉莫圣日尔曼干红葡萄酒 750ml | AOC Saint Germain Rameau claret 750ml | 8 | 7
11 | 1978 | AOC银奖圣玛杰庄园干红葡萄酒 750ml | AOC silver award Domaine Saint Majan claret 750ml | 8 | 7
12 | 1978 | 圣玛杰庄园干红葡萄酒 750ml | Domaine Saint Majan claret 750ml | 8 | 7
13 | 1978 | 红绳 | Red rope | 9 | 8
14 | 1978 | 红绳批发 | Red rope wholesale | 9 | 8
15 | 1978 | 项链挂绳编织 | Necklace rope woven | 9 | 8

As Table 2 shows, the two coders agreed on most of the queries, but for record #8 (the gift card), Coder 1 treated it as a task separate from task #3 (the mooncake), whereas Coder 2 assigned it to task #3. The overall agreement between the two identification results was 91.97%. For the records on which the two coders initially disagreed, we asked them to discuss and resolve their interpretations, and we used the agreed-upon identification as the gold standard for assessing the different task identification methods in this paper.

For each identification approach, we calculated standard recall and precision. Recall (R) is the percentage of correctly identified records among all manually identified tasks, and precision (P) is the percentage of correctly identified records among all identified records. We then calculated the F-measure, F = 2PR / (P + R), to assess each task identification approach.
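As a worked check of the formula, the F-measure of the best-performing combination reported in Table 3 (R = 0.8867, P = 0.8778) can be reproduced as follows:

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall: F = 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(round(f_measure(0.8778, 0.8867), 4))  # -> 0.8822
```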

Findings
Task Identification Results

We experimented with several combinations of task identification methods, dictionaries, and thresholds. The results are shown in Table 3.

Table 3. Task identification results.

Method | Dictionary | Threshold | R | P | F
Rule based | Default | 0.3 | 0.8995 | 0.8542 | 0.8763
Rule based | Default | 0.35 | 0.8670 | 0.8946 | 0.8806
Rule based | Default | 0.4 | 0.8414 | 0.9232 | 0.8804
Rule based | Default + Pro-Catalog | 0.3 | 0.8837 | 0.8552 | 0.8692
Rule based | Default + Pro-Catalog | 0.35 | 0.8453 | 0.8926 | 0.8683
Rule based | Default + Pro-Catalog | 0.4 | 0.8266 | 0.9054 | 0.8642
Clustering-avg | Default | 0.3 | 0.8394 | 0.9192 | 0.8775
Clustering-avg | Default | 0.35 | 0.8079 | 0.9379 | 0.8681
Clustering-avg | Default | 0.4 | 0.7724 | 0.9527 | 0.8531
Clustering-avg | Default + Pro-Catalog | 0.3 | 0.8246 | 0.9232 | 0.8711
Clustering-avg | Default + Pro-Catalog | 0.35 | 0.7892 | 0.9379 | 0.8571
Clustering-avg | Default + Pro-Catalog | 0.4 | 0.7557 | 0.9537 | 0.8432
Clustering-max | Default | 0.3 | 0.9123 | 0.8384 | 0.8738
Clustering-max | Default | 0.35 | 0.8867 | 0.8778 | 0.8822 *
Clustering-max | Default | 0.4 | 0.8512 | 0.9123 | 0.8807
Clustering-max | Default + Pro-Catalog | 0.3 | 0.9015 | 0.8433 | 0.8714
Clustering-max | Default + Pro-Catalog | 0.35 | 0.8650 | 0.8788 | 0.8719
Clustering-max | Default + Pro-Catalog | 0.4 | 0.8138 | 0.9143 | 0.8611

* This combination yields the highest F-score and is used to perform task identification for the rest of the dataset.

The results show that the combination of the clustering-max method, the default dictionary, and threshold 0.35 yields the highest F-measure (0.8822). We therefore applied this combination to all 8,949 queries in the dataset and identified 6,189 shopping tasks.

Then we analyzed the task characteristics based on the task identification results. Basic characteristics of the sessions and tasks are shown in Table 4.

Table 4. Basic characteristics of sessions and tasks.

Item | Value
Average number of queries per session | 2.57
Highest number of queries in a session | 21
Average number of tasks per session | 1.78
Highest number of tasks in a session | 41
Average number of queries per task | 1.45
Highest number of queries in a task | 15

On average, users issued 1.45 queries per task, with a maximum of 15 queries in one task, and 1.78 tasks per session, with a maximum of 41 tasks in one session. The distribution of sessions by the number of tasks per session is shown in Table 5.

Table 5. Distribution of sessions by the number of tasks per session.

Number of tasks in a session | Freq. | Percent (%) | Cumulative percent (%)
1 | 2,140 | 61.4 | 61.4
2 | 748 | 21.5 | 82.9
3 | 292 | 8.4 | 91.3
4 | 132 | 3.8 | 95.1
5 | 73 | 2.1 | 97.2
6 | 37 | 1.1 | 98.2
7 | 23 | 0.7 | 98.9
8 | 14 | 0.4 | 99.3
9 | 11 | 0.3 | 99.6
10 | 5 | 0.1 | 99.8
11 and more | 8 | 0.2 | 100.0

Of the 3,483 sessions, 2,140 (61.4%) contain only one task, and the remaining 38.6% are multi-tasking sessions. There are 748 (21.5%) two-task sessions and 292 (8.4%) three-task sessions. Only 98 sessions (2.8%) contain more than five tasks.
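Given per-query task labels from the identification step, a distribution like Table 5 can be derived with pandas; a sketch on a tiny hypothetical stand-in frame:

```python
import pandas as pd

# Stand-in for the labeled log: one row per query, with the session
# it belongs to and its identified task label
labeled = pd.DataFrame({
    "Sessionid": [1973, 1973, 1975, 1975, 1975, 1978, 1978, 1978],
    "task_id":   [1,    1,    2,    3,    3,    4,    5,    5],
})

tasks_per_session = labeled.groupby("Sessionid")["task_id"].nunique()
dist = tasks_per_session.value_counts().sort_index()
pct = 100 * dist / dist.sum()
summary = pd.DataFrame({"Freq.": dist, "Percent (%)": pct.round(1),
                        "Cumulative (%)": pct.cumsum().round(1)})
print(summary)
```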

Search Characteristics in Mono-tasking and Multi-tasking Sessions
Number of Queries

We compared the number of queries per session between mono-tasking and multi-tasking sessions. Table 6 shows the results.

Table 6. Average number of queries per session and per task.

Session type | Queries per session | Queries per task
One task | 1.45 | 1.45
Two tasks | 2.93 | 1.47
Three or more tasks | 6.14 | 1.43

Table 6 shows that users issued more queries in multi-tasking sessions: mono-tasking sessions contain 1.45 queries on average, two-task sessions contain 2.93 queries, and sessions with three or more tasks contain 6.14 queries. The average number of queries issued per task, however, is about the same regardless of the number of tasks in a session. An independent-samples t-test shows no significant difference in the number of queries per task between mono-tasking and multi-tasking sessions; on average, users issue 1.45 queries per task.
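This kind of comparison can be run with a standard independent-samples t-test, sketched here on hypothetical stand-in arrays of per-task query counts:

```python
from scipy import stats

# Hypothetical per-task query counts for mono- and multi-tasking sessions
mono_task_queries = [1, 2, 1, 1, 3, 1, 2, 1, 1, 2]
multi_task_queries = [1, 1, 2, 1, 2, 1, 3, 1, 1, 2]

t_stat, p_value = stats.ttest_ind(mono_task_queries, multi_task_queries)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")  # a large p suggests no significant difference
```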

Query Length

We analyzed the length of the queries (i.e., the number of characters in a query) in one-task sessions, two-task sessions, and three-or-more-task sessions. Table 7 shows the results.

Table 7. Average query length.

Session type | Average query length (characters)
One task | 7.56
Two tasks | 7.28
Three or more tasks | 7.32

The average query length across all sessions is 7.39 characters; queries in one-task sessions are slightly longer (7.56), while queries in two-task and three-or-more-task sessions are slightly shorter (7.28 and 7.32, respectively). The mean query length is thus quite similar regardless of the number of tasks in a session, and an independent-samples t-test shows no significant difference in mean query length between mono-tasking and multi-tasking sessions.

Session Duration

We examined session durations and compared them by session type (one task, two tasks, and three or more tasks). The results are shown in Tables 8 and 9.

Table 8. Session duration.

Item | Duration
Average session duration | 49 minutes 3 seconds
Average task duration | 27 minutes 36 seconds
Longest session | 14 hours 56 minutes 22 seconds

Table 9. Average session duration.

Session type | Average session duration
One task | 36 minutes 9 seconds
Two tasks | 54 minutes 19 seconds
Three or more tasks | 1 hour 22 minutes 22 seconds

A correlation analysis between the number of tasks and session duration yields a coefficient of 0.3458 (p < 0.01): the duration of a session is positively related to the number of tasks a user deals with in that session. The average duration of two-task sessions is 1.5 times that of one-task sessions, and the average duration of three-or-more-task sessions is 2.3 times that of one-task sessions. The average duration of a task in one-task, two-task, and three-or-more-task sessions is shown in Table 10.
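A correlation of this kind can be computed with scipy.stats.pearsonr; a sketch on hypothetical stand-in values:

```python
from scipy import stats

# Hypothetical stand-ins: tasks per session and session duration in minutes
num_tasks = [1, 1, 2, 1, 3, 2, 1, 4, 1, 2]
duration_min = [30, 42, 55, 28, 80, 60, 35, 95, 25, 50]

r, p = stats.pearsonr(num_tasks, duration_min)
print(f"r = {r:.4f}, p = {p:.4f}")
```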

Table 10. Average duration of tasks.

Session type | Average task duration
One task | 37 minutes 10 seconds
Two tasks | 27 minutes 26 seconds
Three or more tasks | 19 minutes 44 seconds

Table 10 shows that as the number of tasks in a session increases, users spend less time on each task on average. The average task duration in mono-tasking sessions is 37 minutes 10 seconds, while in multi-tasking sessions (two-task sessions and sessions with three or more tasks combined) it is 22 minutes 35 seconds. T-test results indicate a significant difference in average task duration between mono-tasking and multi-tasking sessions (F-value = 794.32, p < 0.01); the average duration of a task in mono-tasking sessions is significantly longer than in multi-tasking sessions.

Task Relationship in Multi-tasking Sessions
Two-task Sessions

We examined the relationships between tasks in multi-tasking sessions using exploratory qualitative analysis. For example, Table 11 shows a two-task session in which the two tasks are related. The first two queries belong to Task 1 and the third query belongs to Task 2: the user searched for men’s shirts in Task 1 and men’s shorts in Task 2. The user was likely searching for a men’s summer outfit (short-sleeve shirt and shorts), which resulted in two related sub-tasks.

Table 11. Session with related search tasks.

SID | Time | Query terms | Query terms (translation)
1985 | 2013/5/20 20:21:34 | 休闲衬衫 男 短袖 | Casual shirt male short sleeve
1985 | 2013/5/20 20:22:10 | 休闲衬衫 男 绿 | Casual shirt male green
1985 | 2013/5/20 20:23:43 | 短裤 男 | Shorts male

Table 12 shows a sample two-task session with two unrelated search tasks. The user searched for a 16G memory card (the first two queries) in Task 1 and a water cup (the third query) in Task 2, a multi-tasking session with two seemingly unrelated items.

Table 12. Session with unrelated search tasks.

SID | Time | Query terms | Query terms (translation)
67804 | 2013-05-17 10:51:33 | 内存卡16g正品包邮 | Memory card 16g free delivery
67804 | 2013-05-17 10:53:35 | vip16g正品包邮 | Vip memory card 16g free delivery
67804 | 2013-05-17 11:12:08 | 水杯 | Water cup

Sessions with Three or More Tasks

As with two-task sessions, we observed both related and unrelated tasks in sessions with three or more tasks. For example, Table 13 shows a session with three related tasks, each consisting of a single query searching for a different type of shoes.

Table 13. Three related search tasks.

SID | Time | Query terms | Query terms (translation)
879 | 2013-05-04 12:06:25 | 增高鞋真皮休闲 | Hidden heel shoes leather leisure
879 | 2013-05-04 12:20:25 | 夏季潮男洞洞鞋牛皮 | Summer male leather crocs
879 | 2013-05-04 13:05:58 | 万斯低帮豹纹 | Vans low-top leopard print

While some tasks were closely related, perhaps reflecting purchase intentions for products in the same category, there were also sessions with seemingly unrelated tasks. For example, Table 14 shows a search session with search tasks for sea lion oil, mobile phone recharges, and a lip balm.

Table 14. Three unrelated search tasks.

SID | Time | Query terms | Query terms (translation)
13527 | 2013-05-29 18:27:16 | 海狮油 | Sea lion oil
13527 | 2013-05-29 18:35:24 | 上海移动100元快充 | Shanghai Mobile 100 yuan recharge
13527 | 2013-05-29 18:35:48 | 上海移动10元 | Shanghai Mobile 10 yuan
13527 | 2013-05-29 18:35:58 | 上海移动100元 | Shanghai Mobile 100 yuan
13527 | 2013-05-29 19:05:44 | 澄糖滋润护唇膏玫瑰粉红 | Sugar moist lip balm rose pink

Conclusion and Discussion

Understanding users’ search tasks is a complex challenge: some search tasks span multiple sessions, while some users deal with multiple tasks in one session. After identifying and analyzing multi-tasking online product search sessions, we found that 38.6% of all search sessions are multi-tasking sessions in which users deal with two or more tasks at the same time, 3.4 times the proportion reported for general Web search (11.4%; Spink, Ozmutlu, & Ozmutlu, 2002). This may be due to differences between Web search, where queries generally involve concepts and more heterogeneous data, and product search, where queries generally describe products. Further analysis is needed to better identify the relationships among tasks in the same session and how users cope with and manage different types of multi-tasking sessions.

Comparing mono-tasking and multi-tasking sessions, we found that (1) users issued a similar number of queries per task (1.43 to 1.47) with similar query lengths (7.3 to 7.6 characters) in both kinds of sessions, and (2) users spent more time in sessions with more tasks, as in Web search, but spent less time on average on each task as the number of tasks in a session increased. In Web search, queries in multi-tasking sessions are longer than those in mono-tasking sessions; this is not the case in product search.

The relationships between sessions and tasks are complex due to the many types of online search technology and the variation in consumer behavior and intentions. Research has found that people may engage in off-topic tasks while working on on-topic tasks (Feild & Allan, 2013), and that search is a changing process combining keyword search, browsing, and serendipitous, unintentional discovery (Jiang, He, & Allan, 2014). Product search adds impulse purchasing triggered by the advertisement banners and promotions that are common on shopping sites.

One limitation of this study is that our methods only consider query terms, which may not fully reflect the complex nature of consumer shopping behavior. In future research, the identification of search tasks may take cues from click-through logs, which yield data on the sites and items visited, mouse movement sequences, and so on; identification may also improve if the items viewed are taken into consideration. Measures of the semantic similarity of queries, rather than term similarity alone, could also be used in further study. As understanding consumer behavior is a key concern of many business enterprises, and the Internet and social media have become increasingly powerful consumer tools, this study contributes to the literature on online shopping. Gaining insight into information search activities within the Internet buying process is an essential step toward enhancing industry’s awareness of consumer behavior and providing better product search and recommendation services to consumers.
