Smartphone and Tablet Application (App) Life Cycle Characterization via Apple App Store Rank

Han Jia 1, Chun Guo 2, and Xiaozhong Liu 2
  • 1 Department of Information and Library Science, School of Informatics, Computing and Engineering, Indiana University, 47405, Bloomington
  • 2 Department of Information and Library Science, School of Informatics, Computing and Engineering, Indiana University, 47405, Bloomington
Corresponding author: Han Jia

Abstract

With the rapid growth of the smartphone and tablet market, the mobile application (App) industry, which supplies these devices with a wide variety of functionality, is also growing at a striking speed. Product life cycle (PLC) theory, which has a long history, has been applied to a great number of industries and products and is widely used in the management domain. In this study, we apply classical PLC theory to mobile Apps on Apple smartphone and tablet devices (Apple App Store). Instead of trying to utilize often-unavailable sales or download volume data, we use open-access App daily download rankings as an indicator to characterize the normalized dynamic market popularity of an App. We also use this ranking information to generate an App life cycle model. By using this model, we compare paid and free Apps from 20 different categories. Our results show that Apps across various categories have different kinds of life cycles and exhibit various unique and unpredictable characteristics. Furthermore, as large-scale heterogeneous data (e.g., user App ratings, App hardware/software requirements, or App version updates) become available and are attached to each target App, an important contribution of this paper is an in-depth study of how such data correlate with and affect the App life cycle. Using different regression techniques (i.e., logistic, ordinary least squares, and partial least squares), we built different models to investigate these relationships. The results indicate that some explicit and latent independent variables are more important than others for the characterization of the App life cycle. In addition, we find that life cycle analysis for different App categories requires different tailored regression models, confirming that within-category App life cycles are more predictable and comparable than App life cycles across different categories.

1 Introduction and Motivation

1.1 Background, Motivation, and Research Questions

Smartphone and tablet application software, also known as smartphone or tablet applications (Apps), has become a booming industry on account of the wide acceptance and popularity of smartphone and tablet devices such as Apple’s iPhone and iPad and devices that run Google’s Android system. The smartphone and tablet industry has built a symbiotic relationship with the App industry. According to an article from Bloomberg Businessweek (MacMillan, 2009), the market value of the App industry surpassed $1 billion in 2009, only a year after Apple launched the App Store in 2008. In 2013, smartphone Apps generated approximately $25 billion in revenue. A survey indicated that the App industry now accounts for more than 460,000 jobs in the US, up from almost zero in 2007 (TechNet, 2012). Since the launch of the iPhone in 2007, Apps have created tremendous economic value and jobs at an amazing speed. As a major driver for the App economy, Apple currently provides the largest application platform for both iPhone and iPad. Another report indicated that, in 2010, 30% of newly created Apps were designed for the iPhone and 21% for the iPad (MillennialMedia, 2010). The report also interviewed CEOs in the App industry. The results showed that more than 30% of developers believed that they would double their revenue from the previous year. Undoubtedly, the App industry has become a significant component of the digital products market. To the best of our knowledge, however, academic research on the App industry is quite sparse in comparison with research on other industries; for example, little is known about which factors contribute to the success of a candidate App.

The product life cycle (PLC) theory has a long history. It has been applied to a tremendous number of industries and is widely used in management. PLC analyzes the stages that products and product categories pass through, from initial market introduction to decline (Wikipedia). Different industries, brands, and products can demonstrate a great variety of life cycle patterns. The digital market, in particular, exhibits a very different life cycle pattern than traditional industries such as pharmaceuticals and manufacturing. In many cases, however, lack of product (sales) data makes it difficult for researchers to characterize PLCs at the brand or product model level.

Unfortunately, we cannot directly apply classical PLC definitions and methodologies to study Apps at the product level, for four reasons. First, it is difficult to obtain App sales data (for paid Apps) and download volume data (for free Apps). Because smartphone and tablet Apps are released by a very large number of App providers via Apple and Google platforms, sales data for each App are not openly accessible. Second, compared to other industries, the App life cycle is much more dynamic. While it may take months or years for traditional products to build their public reputation and sales volume, Apps may attract and lose user interest within a short amount of time, sometimes within just a couple of days. Third, since the App industry is growing exponentially, employing “unnormalized” individual App sales data to characterize the App life cycle and App popularity could be biased. In order to accurately estimate App popularity, we need to normalize App sales and/or download volumes by App industry growth (perhaps on a per-category basis). Fourth and finally, in the big data era, much heterogeneous App-specific data are available, including user App ratings, App hardware/software requirements, and App version updates. Such data are likely to have some bearing upon the App life cycle, but in-depth studies are needed to explore and characterize these relationships.

Three major questions are investigated in this study:

  • RQ1: [App PLC Definition and Simulation] What is the smartphone and tablet App life cycle? How can we accurately characterize or simulate the App life cycle by leveraging very dynamic and openly accessible App daily download ranking information?
  • RQ2: [App Category Assumption] For each smartphone and tablet App category (e.g., Game, Education, Life, Book, Finance, etc.), do Apps in the target category share similar or different life cycle patterns when compared with Apps from other categories?
  • RQ3: [App PLC Regression] How can we quantitatively estimate the correlation between App life cycle and other explicit or latent App features, e.g., provider, version, rank, price, rating, and hardware/software requirements?

This project makes four contributions that set it apart from classical life cycle studies. First, we apply the classical PLC theory to novel smartphone and tablet App products, which are much more dynamic than the products from traditional industries and even other digital products, e.g., PC software. To the best of our knowledge, studies of this kind are quite sparse. Second, because most researchers cannot directly access smartphone and tablet App sales data, we propose an innovative alternative approach: we use open-access App daily download ranking information to simulate the PLC of each candidate App. App ranking position is used as a normalized indicator for App daily popularity. Third, because Apps fall into different categories, we compare PLCs for 20 different App categories. Fourth and finally, in addition to download ranking data, we also analyze several other types of (daily) App-related data to identify important correlations that can help improve and better understand App PLC modeling.

In what follows we (1) review the relevant literature; (2) define and characterize the smartphone and tablet App PLC by using daily download ranking data; (3) describe the dataset we used for this study and the basic statistics of the App life cycle; (4) introduce different explicit and latent independent variables along with regression results to analyze the relationship between these variables and the App life cycle; and (5) discuss the limitations of this study and relevant avenues for future research.

1.2 Literature Review

The use of life cycle patterns to analyze marketing behavior can be traced back to the late 1950s. In 1965, Theodore Levitt first conceptualized a managerial theory in the article “Exploit the Product Life Cycle”, published in Harvard Business Review (Levitt, 1965). The theoretical foundation behind the PLC theory is the diffusion of innovations theory (Rink & Swan, 1979) proposed by Rogers (1962) in his influential book The Diffusion of Innovations. Rogers (1962) seeks to analyze the adoption of innovations as a socialization process in which innovations are spread by communication in a social system over time. Inspired by this idea, F. Bass subsequently developed a well-known quantitative marketing model, the Bass model, which can predict sales volume changes over time by considering innovation and imitation factors (Bass, 1969).

PLC and diffusion theory studies are closely interwoven and aim to capture and explain observed market patterns. According to Golder and Tellis (2004), one major stream of PLC studies now attempts to investigate sales patterns by adopting and extending the Bass model. An obvious link between diffusion and PLC theory is that the logistic distribution of market share in the diffusion model can be regarded as the cumulative form of the normal sales curve in the PLC (Rink & Swan, 1979). Even though the two theories are closely related, their major applications are distinct. The diffusion model is suitable for learning interaction characteristics and then predicting future sales volume. The PLC model, however, is more useful for strategic planning and policy formulation. Its forecasting utility has been a controversial issue, mainly due to the model’s poor generalizability.

According to previous studies, several key aspects can be used to define a PLC model: (1) type of product; (2) scope; (3) length; and (4) metric. The review of existing studies from these four perspectives is as follows. A comparison of these studies can be found in Table 1.

Table 1

A List of Studies on the PLC

Author | Year | Type of products | Scope | Duration | Measurement | Major contributions
Cox | 1967 | Consumer non-durables (ethical drugs) | Form | <5 years | Revenue | Examining validity of PLC theory
Anderson and Zeithaml | 1984 | Industrial products (manufacturing industry) | Brand | 4–10 years | Market share | Investigation on various strategic and performance variables at different stages in the PLCs
Polli and Cook | 1969 | Consumer non-durables (food, cigarettes, personal care) | Class | >10 years | Adjusted sales | Testing the validity of the theory and developing an operational model for sales behavior
Simon | 1979 | Consumer durables | Brand | <10 years | Sales unit | Formulating a PLC model for price elasticity
Mercer | 1993 | Consumer non-durables | Brand | 20 years | Relative market position | Showing that the PLC is longer and more stable in the case of brand leaders
Golder and Tellis | 2004 | Consumer durables | Form | Vary | Sales Volume | Defining turning points in PLC takeoff and slowdown; predictive model for the timing of slowdown
Kurawarwala | 1998 | Consumer durables (PC) | Brand | 3 years | Demand volume | Development of a demand forecasting model
Qualls | 1981 | Consumer durables (household appliances) | Form | Changing from 12.5 + 33.8 to 2.0 + 6.8 (introductory + growth) | Production volume | Providing empirical evidence for the assumption that the PLC is shortening (at least during the introductory and growth stages)
Kim | 2003 | Industrial (Internet infrastructure) | Form | <3 years | Number of users | Understanding the factors that determine the technology transition from a managerial perspective

(1) Type of product

In PLC studies, four main types of products are frequently analyzed: consumer durables, consumer non-durables, industrial durables, and industrial non-durables. Because of their unique natures, these four product types exhibit different characteristics in terms of marketing behavior. Product type can be a major determinant for the length of the PLC (Bayus, 1994). Even within a product type, product subcategories can also vary. Golder and Tellis (2004) examined two types of consumer products, timesaving and leisure, finding that their life cycles are not identical and have their own characteristics. For example, timesaving products usually have longer life cycles with a more gradual slope during both growth and decline stages. Since Golder and Tellis (2004) focused only on traditional consumer durables, it will be interesting to see whether their findings hold in other industries. According to Rink and Swan (1979), researchers generally tend to study consumer products and focus less on industrial products, mainly for two reasons: (1) slow responsiveness and (2) lack of data.

(2) Scope (level)

Day (1981) pointed out that successful application of PLC relies heavily on selecting appropriate dimensions. Dhalla and Yuspeh (1976) argued that the validity of PLC diminishes with higher granularity. Therefore, it is necessary to distinguish industry, product, and brand in PLC research (Simon, 1979) because distinct factors come into play at different levels and because results obtained at different levels may be used for different purposes.

Most existing literature clearly defines which level of PLC it concentrates on. For example, Polli and Cook (1969) anatomized the PLC into three distinct levels: product classes, product forms, and brands. Bayus (1994) categorized the PLC into four levels: industry, product category, product technology, and product model. Early studies mainly focused on industry-level or product category-level analysis for two reasons: (1) difficulty of modeling brands (previous studies suggested that individual brands are difficult to model, although product forms can achieve a fair approximation (Tellis & Crawford, 1981)) and (2) lack of access to the data.

While industry-level analyses dominate in the early studies, more attention has recently been paid to brand-level PLCs. Peres et al. (Peres, Muller, & Mahajan, 2010) indicated that diffusion research is shifting from industry-level analysis to brand-level PLCs.

(3) Length

Shorter PLCs are becoming increasingly common in many industries. Bayus (1994) summarized three causes of this phenomenon: (1) the faster implementation cycle of new knowledge; (2) the larger size of newly introduced products; and (3) the shorter time between innovations. Kurawarwala and Matsuo (1998), for example, noted that the demand for a personal computer model lasts only for 1 to 2 years.

Some empirical evidence and explanations for the shrinking of the PLC can be found in diffusion model research. Norton and Bass (1987) improved Bass’s diffusion theory by taking the likelihood of technological substitution into account. Their data were collected from the semiconductor industry and tracked the evolution of random access memory (RAM) and its sales history. As they stated in 1987, the creation of successive generations of high-tech products has continued to accelerate.

Even though, as far as we know, there are no recent studies on PLC models for digital or online products, we can still find some useful implications from the traditional diffusion model. Web 2.0 technologies provide an extraordinary platform for users of certain products to share experiences, express emotions, give feedback, and communicate with other users. In the case of Apple’s App Store, where App purchasing and downloading occurs, users who have purchased a product are allowed to rate and comment on it. This information may immediately affect decisions made by other potential buyers. For traditional products, the influence of imitators on the PLC may have latency because of slower user communication, such as word of mouth. The rapid growth of online social networks, on the other hand, can accelerate the effect of word of mouth communication. Li et al. (Li, Bhowmick, & Sun, 2011) suggested that the delay between high ratings and high social network user interest in a product is less than a week.

In-App advertising has also made it easier for developers to reach potential customers. For example, Apple’s smartphone advertising platform, iAd, allows developers to embed interactive, dynamic ads within other Apple Apps. Apple claims that its platform implements ongoing optimization to deliver ads to the right audience and thus help developers elicit more App downloads.

(4) Metric

A wide range of literature has paid attention to metrics used to measure the growth of a product. Polli and Cook (1969) expressed concern that treating change in absolute sales volume directly as an indicator in the model may be problematic, since a lot of factors affect sales, including population growth and personal consumption levels. Hence, they proposed an adjusted sales indicator to offset various irrelevant factors. Seasonal sales variations also have an impact on consumer demand, which may dramatically distort the PLC (Kurawarwala & Matsuo, 1998). Another issue, one mentioned above, is that accurate sales data about individual corporations or individual product models are usually unavailable to researchers.

In our research, we utilize sales rank on the Apple platform instead of sales itself in part because only rank data were available to us and also because we believe that ranking data can be more accurate and productive than sales data. One advantage of this metric is that it is free of undesirable influence from the aforementioned exogenous factors. Ranking is also a useful indicator to support managerial diagnostics as it provides hints on relative position in a competitive market. The adoption of relative market position as an indicator can be found in previous studies, such as Mercer (1993) and Anderson and Zeithaml (1984).

Researchers of marketing and information systems have long been interested in software adoption. For example, Duan et al. (Duan, Gu, & Whinston, 2009) applied the information cascades theory by Bikhchandani et al. (Bikhchandani, Hirshleifer, & Welch, 1992) to software purchasing behavior through the Internet and highlighted the impact of user reviews. As far as we know, however, very few studies pay attention to App adoption. A study by Carare (2012) attempted to understand the influence of App ranking on future user behavior. Even though both Duan and Carare’s studies and our study have some common features in terms of dataset and research methodology, we address a different issue from a different perspective, namely, framing the life cycle of Apps.

2 Smartphone and Tablet App Life Cycle Definition and Characterization

Classical PLC studies have shown that different products from traditional industries tend to share a similar evolutionary pattern and grow progressively. Unfortunately, we can hardly use traditional methods and data to depict the App life cycle, for the data availability and normalization reasons discussed in Section 1.1. In this section, we propose a new method of characterizing the App life cycle by using App daily ranking data.

The App daily ranking services provided by Apple and Google are “black boxes” because they use confidential ranking algorithms. Evidence shows, however, that the most important ranking indicators come from the normalized App sales volume. Take the Apple App Store as an example: “most of the weight (of ranking factors) is placed on the 4 previous days of sales and nearly none of the weight is placed on days before that. … In order for the weighted average method to be accurate, you have to normalize your sales volume to the total app store sales volume” (footnote 1). As a result, we use App daily ranking data as an accurate indicator of real-world user interest, or dynamic App popularity, toward either all Apps or Apps from a specific category.

2.1 App Life Cycle Definition by Using App Daily Download Rank

As App sales data for a large number of App providers are not directly available from Apple and Google, in this study, we use open-access App daily rank data to characterize the App life cycle. More specifically, based on Cox’s classical study, we define the following five key timestamps in the App life cycle:

(1) App catalog birth: Catalog birth is the time when the product is released and carried in the catalog of a firm. This indicates the start of a product. In our study, App catalog birth is defined as the date when the App provider releases the target App to the Apple or Google network so that it is ready for smartphone and tablet users to access and to download.

(2) App commercial birth: Commercial birth begins when the product starts to attract a relatively large volume of sales and becomes known by more consumers. Based on studies of the ethical drug industry, Cox required that a drug be included in an industry report (Cox, 1967). Because such reports only listed drugs with more than five prescriptions, this was an indirect way to verify product sales and popularity. Similarly, an App’s first appearance in the daily top 200 (footnote 2) download rank list indicates that it has reached a decent number of downloads and has become popular. We utilize this date as the App commercial birth. Note that for each target App, there can be two different daily ranking lists (and two App commercial birth dates): (1) the general ranking for all Apps (across different categories) and (2) the categorical ranking for Apps in the same category, e.g., Game, Book, Social Network, and Finance.

(3) App peak: The peak is the point at which a product reaches its highest historical sales. Since we use the App daily download rank as the metric, the date we use as the App peak is the date when the App reached its highest rank. Even though App ranks can increase and decrease a number of times over the life cycle, in this study, we only capture the highest point in the whole life cycle. Based on the experimental data, one App could achieve two different peaks: (1) its peak ranking relative to all Apps and (2) its categorical peak, i.e., its peak ranking relative to other Apps in the same category.

(4) App commercial death: Commercial death, as Cox defines it, occurs when sales revenue falls to a threshold proportion of its maximum sales revenue. The threshold could, empirically, be 10% or 20%. Since we are using App rank information instead of sales data, we use the 200th rank as the threshold, such that if an App drops out of the top 200 for more than 14 consecutive days (2 weeks), we declare it to have been commercially dead from the dropout date. We decided on a 14-day “grace period” because Apps are more dynamic than other traditional products (e.g., an App update could help reignite user interest). While an App could come back into the daily top 200 after having fallen out, we assume it very unlikely that an App would come back into the top 200 if it remains out for longer than 14 consecutive days.

(5) Catalog death: Catalog death refers to the removal of the product from its company’s catalog. Because neither Apple nor Google provides data about an App’s exit from the market, we ignore this stage (in the study) and focus only on the life cycle from catalog birth to commercial death.
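The rank-based definitions above can be sketched in code. The following is only an illustrative sketch, not the authors' implementation: the function name `key_timestamps`, the dictionary input format, and the `observed_until` parameter (the last day covered by a hypothetical rank crawl) are assumptions introduced here. Catalog birth and catalog death are not derivable from rank data and are therefore left out.

```python
from datetime import date, timedelta

TOP_N = 200       # public ranking boundary used in the text
GRACE_DAYS = 14   # days outside the list tolerated before commercial death


def key_timestamps(ranks, observed_until, top_n=TOP_N, grace_days=GRACE_DAYS):
    """Return (commercial_birth, peak, commercial_death) for one App.

    ranks: dict mapping a date to that day's rank (1 = best); days without
    an entry, or with a rank above top_n, count as outside the list.
    observed_until: last date covered by the (hypothetical) rank crawl.
    """
    listed = sorted(d for d, r in ranks.items() if r <= top_n)
    if not listed:
        return None, None, None

    # Walk forward from the first listing until a gap exceeds the grace period.
    alive = [listed[0]]
    for day in listed[1:]:
        if (day - alive[-1]).days - 1 > grace_days:
            break
        alive.append(day)

    last_in = alive[-1]
    later = [d for d in listed if d > last_in]
    next_seen = later[0] if later else observed_until + timedelta(days=1)
    death = None
    if (next_seen - last_in).days - 1 > grace_days:
        death = last_in + timedelta(days=1)  # the dropout date

    # Peak: numerically smallest rank before death; earliest date on ties.
    peak = min(alive, key=lambda d: (ranks[d], d))
    return listed[0], peak, death
```

Running the same function once on the overall ranking and once on the categorical ranking would give the two sets of key dates the text distinguishes. Appearances after a declared commercial death are ignored here, which mirrors the stated assumption that comebacks are rare.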

Employing daily ranking information for App life cycle characterization has the following merits. First, because sales and actual download data of a very large number of Apps are scattered over many App providers, researchers can hardly access these data for analysis. App daily rank data are openly accessible and accurately mirror dynamic user interest. Second, because the App industry is rapidly growing, we cannot tell from sales quantity data alone whether an increase in an App’s sales/download volume over time is due to growing user interest (relative to other Apps) or just due to the fact that there are more users in total. Daily ranking information can help us normalize the sales data. Because different Apps compete for ranking on a daily basis, ranking position is an indicator of relative App popularity in a dynamic, growing user community. Third, because Apple and Google release two different rankings every day (overall and by category), we can characterize two different life cycles for each App, namely, the App life cycle (ALC) and the App categorical life cycle (ACLC).

Employing daily ranking information does, however, have two major limitations. First, because Google and Apple only release daily top 200 App ranking information to the public, ranking data for sub-200 Apps are not available for App life cycle analysis. This does not mean those Apps are not important—many of them are. Dropping out of the top 200 does not mean that an App is no longer popular. It only indicates that the App was not in the top 200 on that given day. If the data were available, we could, of course, use top 500 or top 1000 data to characterize the App life cycle. This would allow us to generate a more comprehensive analysis. However, unfortunately, we currently have no means of getting around the bottleneck caused by the top 200 data boundary.

A second limitation is that, when defining App commercial death, we assume that Apps that have dropped out of the top 200 for more than 14 consecutive days (2 weeks) will not make a comeback. While this is generally the case, we do find rare exceptions in the dataset. Developers can sometimes reawaken interest in an App by, e.g., lowering the price or announcing a new version. We leave this limitation for future research.

After determining the key points, four stages can be straightforwardly obtained. The time period from catalog birth to commercial birth is the Introduction stage. The Growth stage is defined as the interval between commercial birth and peak. The Maturity stage lasts from peak to commercial death. Finally, there is the Decline or Saturation stage from commercial death to catalog death. The life cycle definition and key points are depicted in Figure 1.

In this study, because of the data boundary, as Figure 1 shows, we investigate App performance between App commercial birth (when the App first appears in the daily top 200) and App commercial death (when the App drops out of the top 200 for more than 14 consecutive days). This covers the first three stages of the App life cycle (indicated by the red line). By using App daily rank data, we can track and characterize normalized App popularity in this period of time.
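Given the key dates, the stage lengths follow by simple date arithmetic. The sketch below is an illustration under assumptions: the function name `stage_lengths`, the dictionary keys, and the choice of days as the unit are all introduced here and are not taken from the study.

```python
from datetime import date


def stage_lengths(catalog_birth, commercial_birth, peak, commercial_death):
    """Return stage lengths in days, following the stage definitions in the text.

    The Decline/Saturation stage is omitted because catalog death is not
    observable from public ranking data.
    """
    return {
        "introduction": (commercial_birth - catalog_birth).days,
        "growth": (peak - commercial_birth).days,
        "maturity": (commercial_death - peak).days,
        # Span that is observable from top-200 rank data alone
        "commercial_span": (commercial_death - commercial_birth).days,
    }
```

Computing the same quantities from the overall ranking and from the categorical ranking would yield the ALC and ACLC durations, respectively.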

It should be noted that Apple’s App Store separates paid and free (a.k.a. Lite) Apps into two different ranking lists for each category, since paid and free Apps are not comparable. Hence, in the study, we report the life cycle and App regression for paid and free Apps separately.

2.2 App Life Cycle (ALC) and ACLC

Based on the life cycle definition in Section 2.1, each App can have two different life cycles with respect to different data resources: (1) the App life cycle (ALC), which characterizes App popularity changes over time in relation to all the other Apps in the market and (2) the ACLC, which characterizes App popularity changes over time in relation to other Apps in the same category. In most cases, because of ALC’s increased range of competition, ALC duration is less than or equal to ACLC duration.

In Apple’s App Store, all Apps are classified into the following categories: Entertainment, Game, Business, Sports, News, Navigation, Utilities, Travel, Social Network, Reference, Productivity, Photography, Music, Medical, Lifestyle, Health Care and Fitness, Finance, Education, and Books. The difference between categories is significant, and not just with respect to the number of Apps in each. We hypothesize that different categories could have significantly different average life cycles. In other words, ACLCs of Apps from the same category are likely to be more homogeneous when compared with Apps from other categories or with all Apps across different categories. For example, an average user rating of 4 (out of 5) may correlate more strongly with a long ACLC for Apps in category A than for Apps in category B.

Figure 1

App life cycle definition and five key points

Citation: Data and Information Management 4, 1; 10.2478/dim-2020-0002

In order to verify this assumption, we use several latent or explicit features (as independent variables) to represent the target App and use ALC and ACLC as dependent variables. By using various regression methods, the correlation coefficients between the independent variables and the App life cycle can be used to estimate the predictability of ALC and ACLC. For instance, if the coefficients between various independent variables and ACLC are consistently higher than the coefficients between the same independent variables and ALC, the hypothesis is confirmed. The detailed independent variables are introduced in Section 2.3.
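The comparison logic described above can be illustrated with a small sketch. Note the substitution: instead of the regression coefficients used in the study, this sketch uses the plain Pearson correlation coefficient as a simpler stand-in. The function names and the input layout (one list of values per feature, aligned with lists of ALC and ACLC durations) are assumptions made for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def compare_predictability(features, alc, aclc):
    """Map each feature name to (|r| with ALC, |r| with ACLC)."""
    return {
        name: (abs(pearson(values, alc)), abs(pearson(values, aclc)))
        for name, values in features.items()
    }


def aclc_consistently_stronger(result):
    """True if every feature correlates more strongly with ACLC than with ALC."""
    return all(r_aclc > r_alc for r_alc, r_aclc in result.values())
```

A check of this kind only mirrors the decision rule stated in the text; it says nothing about statistical significance, which the regression analysis would have to supply.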

2.3 Factor Analysis and Hypothesis Development

Some Apps are undoubtedly much more successful/profitable than others. From a life cycle perspective, the more successful Apps tend to have a longer ALC or ACLC and a higher peak (ranking position) than other Apps. This gives popular Apps greater visibility, which in turn enables users to access and purchase the App more easily. Therefore, in this study, we define a successful App as one with a longer ALC or ACLC and a higher peak rank. A number of important factors may contribute to the success of a specific App. High user ratings, for instance, should intuitively be strongly and positively related to the success of an App, whereas a higher App price should intuitively correlate negatively with ALC (since expensive Apps may not be favored by most users).

In this research, we use various App features as latent/explicit independent variables to characterize and explain the ALC and ACLC by utilizing linear regression and partial least squares (PLS) regression. The features can be classified into the following categories, with each feature category being a latent independent variable. The feature/variables and their corresponding hypothesis can be found in Table 2.

Table 2

Features for ALC and ACLC Characterization

ID | Feature Category Name | Hypothesis | Description
H1 | App rating | App rating has a positive effect on ALC and ACLC | After downloading an App, users can provide feedback by rating it on a 5-point scale. Only users who have purchased/downloaded the App are able to rate it. Although it is unclear how Apple calculates an App’s overall rating based on individual user input, it is assumed that this rating reflects the collective opinion of its user base. In addition, it is important that the overall rating be interpreted together with the total number of ratings, since a sparse opinion pool is more likely to suffer from biased information.
H2 | App price | Price has a negative effect on ALC and ACLC | Price is a very important factor affecting consumer decisions on App purchasing. If a close substitute of an App is sold at a lower price, new users will tend to switch to the cheaper one. It is also a common strategy among App providers to use temporary price cuts to promote new App purchases.
H3 | Provider | Provider (reputation or experience) has a positive effect on ALC and ACLC | Large App providers usually have multiple products on the market. Apps developed by such providers tend to have better sales performance. On the one hand, well-established providers have the necessary resources to develop high-quality products. On the other hand, their good reputation helps make their products more popular among consumers. In addition, providers with multiple products can develop a better “in-App” purchase strategy among the products they own.
H4 | Version | Version (number of versions or version duration) has a negative effect on ALC and ACLC | The App developer usually keeps improving its product by fixing bugs and adding new features. It is natural that an App would go through several versions during its entire life cycle. However, Apps that have too many fixes are likely to have been poorly developed. In addition, if the duration between two version releases is too long, it might indicate that the provider does not have a very responsive maintenance strategy for its product, which could affect sales.
H5 | Hardware requirement | Hardware requirement has a positive effect on ALC and ACLC | Apple produces three lines of smartphone devices, including iPhone, iPad, and iTouch. Each product line has multiple generations with different hardware configurations. In addition, iOS, the default operating system running on Apple’s smartphone products, keeps evolving. Apps that run on diverse devices and OS versions would be able to reach a larger audience and lead to better sales performance.

By using linear regression and PLS regression methods, we will verify these hypotheses in Section 5. Please note that each feature category has more than one explicit independent variable. Take the price category as an example. Since App price may change over time (e.g., due to a temporary price cut or discount), we use average price, price standard deviation, maximum price, minimum price, and number of changes in price as explicit independent variables to represent this category. We use linear regression to investigate correlations between the explicit independent variables and ALC/ACLC, and we use PLS regression to characterize and explain ALC/ACLC by using the latent independent variables associated with the feature categories.
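As a rough illustration of how the five explicit price variables could be derived from a daily price series, consider the sketch below. The function name `price_features` and the example prices are hypothetical, and the use of the population standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, pstdev

def price_features(daily_prices):
    """Summarize one App's daily prices into the five explicit price variables.

    Assumption: one price observation per day, in chronological order.
    """
    # Count the days on which the price differs from the previous day's price.
    changes = sum(
        1 for prev, cur in zip(daily_prices, daily_prices[1:]) if cur != prev
    )
    return {
        "Price_avg": mean(daily_prices),
        "Price_std": pstdev(daily_prices),  # population SD (assumed)
        "Price_max": max(daily_prices),
        "Price_min": min(daily_prices),
        "Price_change": changes,
    }

# Hypothetical App with a temporary price cut from 2.99 to 0.99.
example = price_features([2.99, 2.99, 0.99, 0.99, 2.99])
```

In this hypothetical series, the temporary cut and the return to the original price each count as one price change.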

3 Data Collection

As previously noted, we used Apple’s App Store daily ranking data3 to estimate ALC and ACLC. Our data collection method is introduced in this section.

3.1 Data Collection via Apple’s App Store

Our empirical work is based on observations of the top 200 most popular Apps on Apple’s App Store. The App Store is an application available on Apple’s smartphone devices (iPhone, iPod Touch, and iPad) where consumers can easily discover and download smartphone Apps via daily top rankings. For Apple device users, the discovery of good Apps is not easy, given the large number of Apps available. To facilitate this process, Apple maintains multiple charts to track the most downloaded Apps (i.e., charts for top Apps in specific categories and charts for most downloaded Apps across all categories). These charts are released through the App Store on a daily basis. The same information is also available online via the iTunes website,4 as shown in Figure 2. For this study, we built a Web crawler and automatically collected daily data about the 200 most downloaded Apps (from different charts) for a period of 400 consecutive days, from March 29, 2011, to May 1, 2012.

Figure 2

iTunes App daily rank information.5

Citation: Data and Information Management 4, 1; 10.2478/dim-2020-0002

Each App available in the App Store has an associated web page on the iTunes website called iTunes Preview. As shown in Figure 3, detailed information about the App can be found on that page, including App description, App provider information, properties (e.g., size, version, and language), user reviews, and user rating. In addition to collecting App names from iTunes charts every day, our crawler also went to each App’s iTunes Preview page to gather and record App details. In this way, we were able to capture any changes reflected on an App’s iTunes Preview page over the 400 days (e.g., newer version release, price change, and rating change).

Figure 3

App Preview page on iTunes website6


A total of 55,457 distinct Apps were tracked in our initial data collection stage. However, this was only a small fraction of the total number of Apps competing in the App Store. Although ranking information for Apps that never managed to appear in the iTunes charts is not released by Apple, we were still able to collect other information for those Apps from their iTunes Preview page.

During the second data collection stage, we crawled, in alphabetical order, snapshots of all Apps provided by iTunes that were competing on Apple’s App Store at that point in time (static page). The resulting dataset contained no ranking information, but it did have all information available on each App’s iTunes Preview page. Since we crawled each Preview page only once, we were not able to track changes to these Apps. In order to compare App feature sets and life cycles, all Apps released before the start date of the experiment, March 29, 2011, were removed from our dataset. In addition, since Apps enter and leave the market every day, if we had run the crawler again at a different time, the coverage of the resulting dataset would not have been the same. During the second stage, we collected a total of 506,178 unique Apps.

Table 3 depicts the number of Apps in each category from our 400-day dataset (stage 1 and stage 2). Our study focused mainly on the ten categories with the largest number of Apps. As mentioned earlier, Apple provides multiple charts to track the most downloaded Apps, one for each App category plus one for cross-category ranking. In this paper, we use “all” to refer to the cross-category ranking chart.

Table 3

Number of Apps in Each Category in the Top 200 Dataset

| App category | Number of Apps in the top 200 dataset | Number of Apps in the overall dataset |
| --- | --- | --- |
| Business | 3,551 | 41,854 |
| Game | 4,688 | 68,114 |
| Entertainment | 4,384 | 40,792 |
| Sports | 3,156 | 21,169 |
| News | 1,912 | 16,720 |
| Navigation | 1,807 | 9,478 |
| Utilities | 3,274 | 34,207 |
| Travel | 3,609 | 34,658 |
| Social network | 1,824 | 11,906 |
| Reference | 2,866 | 19,806 |
| Productivity | 2,385 | 19,009 |
| Music | 3,067 | 26,155 |
| Photography | 3,046 | 18,096 |
| Medical | 3,134 | 14,986 |
| Lifestyle | 3,645 | 41,633 |
| Health and fitness | 2,617 | 17,678 |
| Finance | 1,925 | 15,331 |
| Books | 5,058 | 27,306 |
| Education | 4,428 | 55,951 |
| All (cross-category) | 5,560 | - |

It is clear that the proportion of Apps entering the top 200 is small, meaning that a large number of Apps never get a chance to appear in the top ranking chart. Based on our definitions, the ALC and ACLC of these Apps are equal to 0. Based on the dataset, we also observe that some categories, e.g., Game, contributed a larger number of Apps to the top 200 dataset, while others, e.g., News, contributed relatively few. We hypothesize that this is because some categories are more stable than others. In the News category, for instance, we found that some Apps, e.g., CNN and FOX news services, stayed at the top for a very long time, resulting in a long ACLC. In contrast, in the Game and Entertainment categories, user interest moved relatively quickly from one App to another, so a large number of distinct Apps entered the top 200 over the 400 days. Hence, we expect the average ACLC for Game to be much shorter than the average ACLC for News. We will verify this hypothesis in the following sections.

For each category, we collected two different ranking lists from iTunes: paid Apps and free Apps. While paid Apps, obviously, require user payment, free Apps only require users to log in. We report life cycle and regression results for paid and free Apps separately in each category.

3.2 A Closer Look at the Dataset

3.2.1 Number of Days on the Top 200 List

Box plots in Figure 4 provide an overview of Apps’ overall presence in the top 200 dataset by category, without considering life cycle definition. For each App, we counted how many days it managed to enter the top 200 lists between March 29, 2011, and May 1, 2012. In general, the distributions of Apps’ overall presence in different categories follow a similar long tail pattern, but differences can still be found across categories.

Figure 4

Distribution of Apps’ overall presence in the top 200 dataset by category.

(Note: y-axis is plotted in log scale and the width of boxplot is proportional to the square root of category size.)


Similarity. The distribution of Apps’ overall presence on iTunes charts during the 400-day period has a heavy tail on the upper side. The majority of Apps (more than 50%) only appeared in the ranking list for fewer than 20 days. Only a few Apps (less than 25%) were able to stay on the list for more than 100 days. For example, some popular Apps, such as “Angry Birds” and “Twitter”, survived on the list for a long period of time.

Differences. Apps in larger categories tend to stay on the list for a shorter time. For the five largest categories (All, Game, Entertainment, Books, and Education), the middle half of Apps appeared on the list for 2–20 days. In contrast, Apps in smaller categories tend to stay longer: the four smallest categories (News, Navigation, Social Network, and Finance) host Apps whose presence typically ranges between 5 and 100 days. These categories are more stable than the larger ones, which suggests that the nature of Apps in these categories may pose a higher barrier to entry into this segment of the App market.

3.2.2 App Statistics for Life Cycle Characterization

In this study, we only explore the PLC of smartphone Apps with finished life cycles. Based on the definitions given in Section 3.1, if an App does not show up in the top 200 list for 14 consecutive days, the date it drops out will be regarded as the termination of its entire life cycle. As mentioned earlier, we can identify App commercial death but not App catalog death due to limited data access. In our experiment, a completed life cycle refers to cases in which an App’s commercial birth (its entering the top 200), peak (its best ranking), and commercial death (its dropping out of top 200) can be located within our dataset. iTunes’s App preview also provides App catalogue birth (i.e., release) date. In this study, therefore, to qualify as having a “complete” life cycle, an App must satisfy three criteria: (1) its release date must be later than March 29, 2011 (the start of the data collection); (2) its peak can be identified within our dataset; and (3) its last presence in the top 200 should be earlier than April 18, 2012 (14 days before the end of the experiment).
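The criteria above can be sketched in a few lines of code. This is only an illustration under stated assumptions: the function and variable names are hypothetical, each App is represented as a mapping from calendar date to its top 200 rank on that date, and the "date it drops out" is taken to be the last day the App was observed in the top 200 before the 14-day absence.

```python
from datetime import date

STUDY_START = date(2011, 3, 29)   # first day of data collection
STUDY_END = date(2012, 5, 1)      # last day of data collection
GAP_DAYS = 14                     # absence that ends a life cycle
CUTOFF = date(2012, 4, 18)        # last presence must be earlier than this

# The collection window spans 400 days when both endpoints are counted.
window_days = (STUDY_END - STUDY_START).days + 1

def commercial_death(ranks_by_day):
    """Last day in the top 200 before the first absence of GAP_DAYS
    consecutive days, or None if no such absence is observable."""
    days = sorted(ranks_by_day)
    if not days:
        return None
    for current, following in zip(days, days[1:]):
        # A jump of more than GAP_DAYS means at least GAP_DAYS absent days.
        if (following - current).days > GAP_DAYS:
            return current
    last = days[-1]
    return last if (STUDY_END - last).days >= GAP_DAYS else None

def has_complete_life_cycle(release_date, ranks_by_day):
    """Apply the three criteria: release after the study start, an
    identifiable peak, and a last presence before the cutoff date."""
    if not ranks_by_day:              # no ranking data, so no peak
        return False
    last_presence = max(ranks_by_day)
    return release_date > STUDY_START and last_presence < CUTOFF
```

For instance, an App released on April 1, 2011 that appears in the top 200 only on April 10 and 11, 2011 would satisfy all three criteria, whereas an App last seen on April 20, 2012 would not.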

The average lengths of the different stages for each category are presented in Table 4. Based on the results, we find that life cycles of popular Apps are much shorter than the life cycles of products in traditional industries. For the cross-category ranking, the average ALC length is 54.84 days and the average length of the growth plus maturity stages (time in the top 200 list) is 33.98 days; averaged across the individual categories, ACLC length is 50.91 days and the growth plus maturity stages last 42.11 days.

Table 4

Statistics of Complete ALCs by Category (Ordered by Average of “Growth + Maturity” Stages from Smallest to Largest)

| Category | Average of intro | Average of growth | Average of maturity | Average of total | Average of growth + maturity |
| --- | --- | --- | --- | --- | --- |
| Education | 12.785 | 6.209 | 21.04 | 40.034 | 27.249 |
| Entertainment | 5.975 | 7.004 | 21.334 | 34.313 | 28.338 |
| Books | 15.861 | 8.837 | 23.039 | 47.737 | 31.876 |
| Lifestyle | 7.505 | 8.892 | 25.035 | 41.432 | 33.927 |
| All (ALC) | 20.859 | 9.853 | 24.124 | 54.836 | 33.977 |
| Game | 13.756 | 10.766 | 25.847 | 50.369 | 36.613 |
| Business | 8.265 | 9.799 | 27.817 | 45.882 | 37.616 |
| Utilities | 6.841 | 10.432 | 28.248 | 45.521 | 38.680 |
| Photography | 4.368 | 7.842 | 32.175 | 44.385 | 40.017 |
| Music | 4.703 | 8.874 | 33.191 | 46.768 | 42.065 |
| Reference | 10.796 | 11.411 | 31.309 | 53.517 | 42.720 |
| Finance | 7.38 | 10.79 | 33.494 | 51.664 | 44.284 |
| Health and fitness | 7.13 | 7.941 | 36.562 | 51.632 | 44.503 |
| News | 9.793 | 11.908 | 32.873 | 54.573 | 44.781 |
| Productivity | 8.175 | 10.753 | 35.250 | 54.178 | 46.003 |
| Travel | 14.002 | 10.211 | 38.103 | 62.316 | 48.314 |
| Medical | 8.741 | 7.604 | 41.754 | 58.100 | 49.358 |
| Sports | 7.831 | 12.214 | 40.243 | 60.287 | 52.457 |
| Social network | 5.790 | 7.997 | 44.586 | 58.373 | 52.583 |
| Navigation | 7.653 | 11.735 | 46.903 | 66.290 | 58.638 |

It is clear that different App categories can have very different life cycles. Some categories, e.g., Education, Entertainment, and Books, have much shorter life cycles (first three stages) than other categories, e.g., Travel, Sports, and Navigation. We can interpret these statistics in another way: some categories are more dynamic than others. User interest toward Apps in the Entertainment, Game, and Books categories typically changes quickly from one App to another, while user interest toward Apps in the Navigation and Sports categories is more stable. The category with the longest average ACLC, Navigation (66.29 days), has a life cycle 93.2% longer than that of the category with the shortest, Entertainment (34.31 days). For this experiment, we are most interested in the average combined length of the growth and maturity stages (the number of days an App stays in the top 200). Accordingly, in Table 4, we have ordered categories by this statistic (last column).
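The summary figures quoted in the text (50.91 days, 42.11 days, and 93.2%) can be reproduced from the rows of Table 4. The sketch below assumes they are unweighted means over the 19 individual categories, excluding "All"; this reading matches the reported values but is not stated explicitly. The variable names are illustrative.

```python
# (average total, average growth + maturity) in days, copied from Table 4.
table4 = {
    "Education": (40.034, 27.249), "Entertainment": (34.313, 28.338),
    "Books": (47.737, 31.876), "Lifestyle": (41.432, 33.927),
    "Game": (50.369, 36.613), "Business": (45.882, 37.616),
    "Utilities": (45.521, 38.680), "Photography": (44.385, 40.017),
    "Music": (46.768, 42.065), "Reference": (53.517, 42.720),
    "Finance": (51.664, 44.284), "Health and fitness": (51.632, 44.503),
    "News": (54.573, 44.781), "Productivity": (54.178, 46.003),
    "Travel": (62.316, 48.314), "Medical": (58.100, 49.358),
    "Sports": (60.287, 52.457), "Social network": (58.373, 52.583),
    "Navigation": (66.290, 58.638),
}

n = len(table4)
# Unweighted mean over categories (assumed): each category counts once,
# regardless of how many Apps it contains.
mean_total = sum(total for total, _ in table4.values()) / n
mean_growth_maturity = sum(gm for _, gm in table4.values()) / n

longest = max(table4, key=lambda c: table4[c][0])
shortest = min(table4, key=lambda c: table4[c][0])
pct_longer = (table4[longest][0] / table4[shortest][0] - 1) * 100

print(round(mean_total, 2), round(mean_growth_maturity, 2))
print(longest, shortest, round(pct_longer, 1))
```

A mean weighted by the number of Apps per category would generally give different values, so the agreement with the reported figures suggests, but does not confirm, that the unweighted version was used.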

With regard to the introduction stage (second column in Table 4), and setting aside the All category, it is not surprising that Apps in the Books and Education categories take longer to reach their commercial birth after launch, given the general nature of these Apps, i.e., it may take users longer to experience and evaluate Apps in the Books and Education categories. In contrast, it is easier for Social Network, Entertainment, and Music Apps to reach commercial birth, since downloads by one consumer can often lead to further downloads by his/her friends. We did not, however, expect that Apps in the Game category would tend to have a relatively long introduction stage (from App catalogue birth to App commercial birth).

In Table 5, we present the standard deviation of three different stages of ALC and ACLC for different categories. Interestingly, we found standard deviations for these stages to be quite large, indicating that different Apps from the same category can have significantly different life cycles. If we assume that some Apps have much longer life cycles compared with others in the same category, we can view these Apps as outliers (from a statistical perspective). In this study, we used the generalized extreme Studentized deviate (ESD) method (Rosner, 1983) to detect outliers for each App category. One limitation of this method is that ESD assumes that App life cycle lengths are normally distributed, which is not necessarily true.

Table 5

STD of Apps by Category and Life Cycle Stage

| Category | SD of intro | SD of growth | SD of maturity | SD of total | SD-WO of intro | SD-WO of growth | SD-WO of maturity |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Books | 34.985 | 29.849 | 54.320 | 70.502 | 32.427 (10) | 26.576 (10) | 51.590 (10) |
| Business | 21.637 | 34.716 | 63.449 | 75.851 | 18.393 (10) | 29.537 (10) | 75.272 (10) |
| Education | 30.056 | 26.041 | 52.888 | 65.848 | 26.549 (10) | 20.069 (10) | 49.624 (10) |
| Entertainment | 19.794 | 26.400 | 54.486 | 64.144 | 14.147 (10) | 22.180 (10) | 51.359 (10) |
| Finance | 22.126 | 35.627 | 63.705 | 74.178 | | | |
| Game | 32.058 | 33.907 | 55.702 | 71.079 | 27.979 (10) | 29.837 (10) | 53.260 (10) |
| Health and fitness | 21.292 | 27.374 | 62.457 | 70.267 | 15.962 (10) | 20.073 (10) | 58.295 (10) |
| Lifestyle | 21.445 | 30.577 | 55.767 | 67.705 | 16.609 (10) | 25.273 (10) | 52.192 (10) |
| Medical | 26.383 | 28.232 | 70.990 | 78.995 | 21.106 (10) | 21.441 (10) | 67.541 (10) |
| Music | 15.887 | 32.290 | 63.855 | 73.203 | 9.328 (10) | 25.317 (10) | 59.803 (10) |
| Navigation | 20.013 | 36.066 | 73.584 | 82.198 | 12.308 (10) | 26.043 (10) | 72.085 (2) |
| News | 22.481 | 35.463 | 62.943 | 73.887 | 16.810 (10) | 26.320 (10) | 56.311 (10) |
| Photography | 18.474 | 28.673 | 59.711 | 68.463 | 10.206 (10) | 22.462 (10) | 55.682 (10) |
| Productivity | 25.871 | 35.718 | 65.156 | 77.538 | 16.354 (10) | 28.205 (10) | 60.586 (10) |
| Reference | 29.464 | 36.145 | 62.944 | 77.474 | 23.700 (10) | 29.642 (10) | 58.204 (10) |
| Social network | 15.918 | 24.768 | 71.182 | 76.600 | 9.863 (10) | 16.270 (10) | 64.950 (10) |
| Sports | 21.723 | 34.693 | 65.536 | 76.609 | 17.721 (10) | 29.232 (10) | 63.478 (5) |
| Travel | 33.349 | 33.384 | 71.418 | 85.217 | 29.349 (10) | 29.242 (10) | 68.172 (10) |
| Utilities | 20.539 | 34.878 | 58.979 | 72.002 | 15.477 (10) | 29.534 (10) | 54.934 (10) |
| All | 45.148 | 32.569 | 55.619 | 74.901 | 41.738 (10) | 28.501 (10) | 52.761 (10) |

SD, standard deviation; SD-WO, standard deviation without outlier data points.

In the last three columns of Table 5, we present standard deviations for different life cycle stages after removing the outliers from the dataset using the ESD method. The number of outliers is presented in parentheses. It is clear that ESD successfully found a number of outliers (maximum number is 10) in each category. After removing these outliers, the standard deviations for each category decreased (since some Apps with very long life cycles were removed).

For the “All” category and ALC (based on the overall ranking across different categories), the average length of the introduction stage is clearly longer than in the other categories. This means that it takes longer for popular Apps to appear on the general top 200 list across different categories. This is consistent with the intuition that competition for a top 200 spot in the “All” category ranking is much fiercer than in a more specific category. Accordingly, it generally takes more time and effort for an App to break into the overall top 200. Growth and maturity stages for the “All” category, however, are not necessarily longer than for other categories. This may reflect the fact that even though Apps on the general top 200 list are outstanding, they also face wider competition from other Apps to keep that valuable spot.

3.3 Data Limitation

The dataset used for this study clearly has some limitations. We could not, for example, collect dynamic changes (e.g., price and rating changes over time) for all Apps in Apple’s repository. Data access limits us to the top 200 list. In addition, we are limited in what App features we can record for the regression tasks. Apple does not release App advertising information, for example. Hence, we are not able to accurately estimate the importance of App advertising strategies. More detailed limitations of this study can be found in the last chapter.

4 Regression Models

After life cycle characterization, our focus shifts to RQ3: quantitatively estimating the correlation between ALC or categorical ACLC and a number of explicit or latent independent variables by utilizing regression techniques. Earlier, in Table 2, we proposed a number of hypotheses anticipating correlations between each independent feature category (latent variable) and the life cycle. Unlike other regression studies in the social sciences, we used an automatic approach to collect a large number of Apps in our database.

As mentioned earlier, we hypothesize that Apps from the same category are generally more similar in life cycle than Apps from other categories. In this section, we validate this hypothesis using regression methods. If the correlation of independent variables with ALC (across various categories) is weaker than that with ACLC (for a specific category), then our hypothesis is highly likely correct. Otherwise, the hypothesis would be problematic.

To address the effect of price, we fit two separate regression models, one for free Apps and the other for paid Apps.

4.1 Three Regression Methods and Independent Variables

A series of regression techniques, namely, binary logistic regression (logistic), linear regression (a.k.a. ordinary least squares (OLS)), and PLS regression, were used to understand the relationship between each independent variable and dependent variable. Please note that, as mentioned earlier, our 400-day dataset does not contain ranking information for all Apps. We were only able to obtain life cycle information for Apps on the top 200 ranking lists. As a result, OLS and PLS regression models are only valid for popular Apps. We therefore needed a binary regression model to tell whether an App had the potential to reach the top 200. As mentioned in Section 3.1, however, we only collected static information (not dynamic) for Apps that never appeared in the top 200. Consequently, the number of independent variables for binary logistic regression is smaller than that for OLS regression and PLS regression. All independent variables are shown in Table 6.

Table 6

A List of Independent Variables

| Latent variable | Variable | Description | Logistic | OLS | PLS |
| --- | --- | --- | --- | --- | --- |
| Rating | Rating_avg | Average rating | | | |
| | Rating_max | Maximum rating | | | |
| | Rating_min | Minimum rating | | | |
| | Rating_std | Standard deviation of rating | | | |
| | Rating_num | Number of ratings | | | |
| | Rating_current | Current rating | | | |
| Price | Price_avg | Average price | | | |
| | Price_max | Maximum price | | | |
| | Price_min | Minimum price | | | |
| | Price_std | Standard deviation of price | | | |
| | Price_change | Number of price changes in the whole life cycle | | | |
| | Price_current | Current price | | | |
| Provider | Provider_app_num | Total number of Apps developed by this provider | | | |
| | Provider_app_num_cat | Total number of Apps developed by this provider in a specific category | | | |
| | Provider_num_rating | Average length of all Apps produced by this provider | | | |
| | Provider_rating_avg | Average peak rating of all Apps produced by this provider | | | |
| Version | Version_change | Number of major version releases | | | |
| | Version_duration_max | Maximum duration between two consecutive version releases | | | |
| | Version_duration_min | Minimum duration between two consecutive version releases | | | |
| Hardware/software requirement | Req_num_change | Number of requirement changes | | | |
| | Req_num_hardware | Number of supporting hardware | | | |
| | Req_num_network | Number of supporting network | | | |
| | Req_num_OS_change | Number of operating system version changes | | | |

For PLS regression, we use latent independent variables, which are not directly available and observable in the dataset but are inferred from related explicit independent variables. For instance, we use “Provider_app_num”, “Provider_app_num_cat”, “Provider_num_rating”, and “Provider_rating_avg” to infer the latent variable “App Provider”. The latent variables were used for PLS regression.

Similarly, as Figure 5 shows, we use a number of explicit variables to represent the dependent variables ALC and ACLC. The most important variables are life cycle length, life cycle peak, and life cycle sum. Length is the number of days between App catalogue birth (App release date) and App commercial death (when the App drops out of the top 200). Peak is the best ranking position of the App in the life cycle. Sum is the accumulated ranking position between App commercial birth and App commercial death; because daily App rank represents daily App popularity, it can be used to estimate normalized accumulated App revenue. Length, peak, and sum are used as the explicit dependent variables for OLS, and the latent dependent variable (inferred from these three variables) is used for PLS regression.

4.1.1 Logistic Regression

We first examine the factors that can estimate and determine whether an App can enter the top 200 ranking list and thus start its commercial life, according to our definition. Since the observed outcome (dependent variable) will be binomial, the task requires a typical binary logistic regression. Please note that the number of Apps for this regression experiment is much larger than that for the other regression tasks because the Apps for this task are not necessarily restricted to the top 200 list and because a large number of Apps in the long tail are used to distinguish the popular ones.

As suggested earlier, due to the data limitations of Apple iTunes charts, we collected a new dataset based on the alphabetical list of Apps provided by Apple’s App Store. This dataset included all relevant static information on the App preview pages. For all Apps released between March 29, 2011, and April 18, 2012 (14 days before the end of the experiment), we created a simple binary dependent variable to indicate whether the target App ever appeared in the top 200. We then used the information (independent variables) gathered from its current preview page to estimate the binary variable. Given paper space limitations, we sampled the Books, Game, and Sports categories to report the regression results in this paper. Some other categories’ results are available in the online appendix.7

4.1.2 Linear Regression—OLS

We use linear regression, specifically, OLS, to estimate the effects of the various independent variables listed in Table 6 on App life cycles. As mentioned earlier, we are interested in three aspects of the App life cycle: Highest Ranking (Peak), Total Length (Length), and Aggregated Ranking (Sum). These dependent variables are visualized in Figure 5. For free Apps, the price-related variables are not available, and thus were removed from the regression model.

Figure 5

Dependent variables of life cycle.


Sum, in this model, is a sum of all the top 200 rankings in an App’s life cycle. A mathematical expression for this variable is

\[ \mathrm{Sum} = \sum_{t=1}^{n} (200 - \mathrm{Rank}_t)\, D_t \]
where $t$ is the time stamp of an App in its life cycle and $D_t$ is a binary variable (0 or 1) that reflects whether the App is in the top 200 list on the particular date represented by $t$.
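To make the Sum definition concrete, the following minimal sketch computes it, together with the peak, from a list of daily ranks. The representation (one entry per day, with `None` for days outside the top 200) and the function names are assumptions made for illustration.

```python
def life_cycle_sum(daily_ranks):
    """Sum of (200 - Rank_t) * D_t over the days of a life cycle.

    daily_ranks holds one entry per day t: the App's rank (1..200) if it
    was in the top 200 on that day, or None if it was not (D_t = 0).
    """
    total = 0
    for rank in daily_ranks:
        d_t = 0 if rank is None else 1  # binary presence indicator D_t
        if d_t:
            total += (200 - rank) * d_t
    return total

def life_cycle_peak(daily_ranks):
    """Best (numerically smallest) rank reached, or None if never ranked."""
    observed = [rank for rank in daily_ranks if rank is not None]
    return min(observed) if observed else None

# Hypothetical four-day life cycle: ranked 150, absent, ranked 100, ranked 1.
ranks = [150, None, 100, 1]
print(life_cycle_sum(ranks), life_cycle_peak(ranks))  # 349 1
```

Note that, under the formula as written, a day spent at rank 200 contributes 0 to Sum, the same as a day outside the top 200.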

4.1.3 PLS Regression

Finally, PLS regression is used to study the hierarchical structure of the latent variables. The rationale for using PLS regression is that variables used in the OLS regression can be grouped into constructs such as price and provider and can be manifested in different aspects and treated as latent variables in the PLS model. The constructed PLS model is presented in Figure 6. “Life Cycle”, the latent dependent variable in the PLS model, is constructed from the abovementioned variables life cycle length, peak, and sum. The model is constructed separately for paid and free Apps; the latent variable “price” does not exist for free Apps. We use SmartPLS (Ringle, Wende, & Will, 2005) to perform PLS regression.

Figure 6

PLS regression model construction. PLS, partial least squares.


5 Regression Results

We present the results of three different regression methods in this section. For binary regression, as All category data are not available, only Books, Game, and Sports models are reported. For the other two regression methods, linear regression and PLS, separate results for all three categories and All category models are reported in Sections 5.2 and 5.3. As already mentioned, due to space limitations, we only report four sample categories: All, Books, Game, and Sports. Some other category results are available in the online appendix.8

5.1 Binary Regression

The results of logistic regression are shown in Table 7 and Table 8. Both coefficients and standard errors are reported. We also present the classification accuracy for each category because usually logistic regression can be regarded as a classification task. The benchmark is obtained by assigning an instance to the dominant class (the class with more instances).
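The benchmark described above, assigning every instance to the dominant class, can be sketched as follows. The function names and the toy labels are hypothetical; 1 denotes an App that entered the top 200 and 0 one that did not.

```python
from collections import Counter

def majority_class_benchmark(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)

def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Toy example with balanced classes.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]
print(majority_class_benchmark(y_true))  # 0.5
print(accuracy(y_true, y_pred))          # 0.75
```

Under this definition, a benchmark of 50.0% corresponds to two equally sized classes, so any accuracy above 50.0% reflects information carried by the independent variables.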

Table 7

Binary Regression Results for Paid Apps

| Level | Variable names | Books coefficient | Books SE | Games coefficient | Games SE | Sports coefficient | Sports SE |
| --- | --- | --- | --- | --- | --- | --- | --- |
| App level | 1. Price_current | −0.032*** | 0.009 | −0.027 | 0.030 | −0.001 | 0.002 |
| | 2. Rating_current | 0.411*** | 0.029 | 0.819*** | 0.029 | 0.459*** | 0.043 |
| | 3. Rating_num | 0.002 | 0.002 | 0.001** | 0.000 | 0.005 | 0.005 |
| Provider level | 4. Provider_app_num | −0.004* | 0.001 | −0.012*** | 0.004 | 0.006*** | 0.001 |
| | 5. Provider_app_num_cat | −0.001 | 0.002 | 0.011* | 0.004 | −0.007*** | 0.001 |
| | 6. Provider_rating_avg | 0.444*** | 0.041 | 0.148*** | 0.036 | 0.278*** | 0.056 |
| | 7. Provider_rating_num | −0.001*** | 0.000 | 0.000* | 0.000 | −0.000 | 0.001 |

| Model fit | Books | Games | Sports |
| --- | --- | --- | --- |
| Nagelkerke R² | 0.316 | 0.657 | 0.201 |
| Accuracy (benchmark) | 71.9% (50.0%) | 86.7% (50.0%) | 65.5% (50.0%) |

Significance levels: *** p < 0.001, ** p < 0.01, and * p < 0.05. SE, standard error.

Table 8

Binary Regression Results for Free Apps

| Level | Variable names | Books coefficient | Books SE | Games coefficient | Games SE | Sports coefficient | Sports SE |
| --- | --- | --- | --- | --- | --- | --- | --- |
| App level | 1. Rating_current | 0.215*** | 0.022 | 0.491*** | 0.021 | 0.365*** | 0.033 |
| | 2. Rating_num | 0.000 | 0.000 | 0.000** | 0.000 | 0.001 | 0.001 |
| Provider level | 3. Provider_app_num | −0.002 | 0.001 | 0.001 | 0.002 | −0.001 | 0.001 |
| | 4. Provider_app_num_cat | 0.003 | 0.002 | −0.002 | 0.003 | −0.005 | 0.003 |
| | 5. Provider_rating_avg | 0.360*** | 0.036 | 0.309*** | 0.025 | 0.336*** | 0.055 |
| | 6. Provider_rating_num | 0.000* | 0.000 | 0.000* | 0.000 | −0.000 | 0.000 |

| Model fit | Books | Games | Sports |
| --- | --- | --- | --- |
| Nagelkerke R² | 0.104 | 0.427 | 0.184 |
| Accuracy (benchmark) | 62.1% (50.0%) | 75.3% (50.0%) | 65.3% (50.0%) |

Significance levels: *** p < 0.001, ** p < 0.01, and * p < 0.05. SE, standard error.

Overall, the binary regression model works best for the Game category (86.7% and 75.3% accuracy for paid and free Apps, respectively) compared with the other two categories. This means that, with the simple independent variables we chose, it is highly likely that we can estimate which Apps in the Game category have a chance to reach the top 200. This estimation ability is weaker for the Books and Sports categories. Among the seven independent variables, two are significantly correlated with App popularity in all six models: “Rating_current” and “Provider_rating_avg”. App rating and the App provider’s average App rating are therefore important for predicting whether an App will enter the top 200. The other variables are not strongly correlated with App popularity. Even App provider is not very significant, which does not support our initial hypothesis H3.

5.2 Linear Regression

As mentioned earlier, we selected peak, length, and sum as three dependent variables and attempted to use a combination of independent variables to explain the dependent variables. Adjusted R2 and analysis of variance values are shown in Table 9.
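For readers less familiar with the metric, adjusted R² penalizes the ordinary R² for the number of predictors in the model. The sketch below uses the standard textbook definition; the text does not state the exact formula or the sample sizes used, so the function name and the example numbers are purely illustrative.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Standard adjusted R^2: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_obs - n_predictors - 1  # residual degrees of freedom
    if dof <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / dof

# Hypothetical model: R^2 = 0.5 with 101 observations and 10 predictors.
print(adjusted_r2(0.5, 101, 10))
```

With many observations relative to the number of predictors, as is likely for the larger App categories, the adjustment is small and adjusted R² stays close to R².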

Table 9

Linear Regression Results

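| Dependent variable | Metric | All (paid) | All (free) | Books (paid) | Books (free) | Game (paid) | Game (free) | Sports (paid) | Sports (free) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Peak | Adjusted R² | 0.096 | 0.104 | 0.289 | 0.193 | 0.318 | 0.356 | 0.220 | 0.220 |
| | ANOVA | 7.877*** | 9.277*** | 27.455*** | 26.373*** | 24.892*** | 57.086*** | 10.360*** | 14.273*** |
| Length | Adjusted R² | 0.166 | 0.170 | 0.362 | 0.428 | 0.397 | 0.450 | 0.365 | 0.543 |
| | ANOVA | 13.987*** | 15.707*** | 37.865*** | 80.430*** | 34.755*** | 83.983*** | 20.046*** | 56.862*** |
| Sum | Adjusted R² | 0.151 | 0.118 | 0.526 | 0.315 | 0.509 | 0.600 | 0.432 | 0.442 |
| | ANOVA | 12.526*** | 10.581*** | 72.996*** | 49.700*** | 54.267*** | 153.284*** | 26.175*** | 38.148*** |

Significance levels: *** p < 0.001, ** p < 0.01, and * p < 0.05.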

Owing to space limitations, we provide detailed regression results only when Sum is the dependent variable.

Table 10 clearly shows that almost all the independent variables have very weak correlations with the life cycle in the “All” category. This evidence supports our initial hypothesis that App life cycles across different categories (ALC) are not comparable. For the other three individual categories, we did find a number of independent variables significantly related to the ACLC. Interestingly, the results show that different App categories have different independent variable coefficients with respect to ACLC. Of all the independent variable categories, App price and rating are closely related to ACLC, while App requirement and provider are the most weakly related (hypotheses H3 and H5 are both problematic).

For App price, we found that price was less important for the Books category than for the Game and Sports categories. We found a similar pattern for “Version_change”: in the Game and Sports categories, frequently updated Apps are more likely to have a longer ACLC. Another interesting finding was that “Price_avg” was positively related to ACLC in the Game and Books categories (the coefficient for Books is weak, at 0.071) but negatively related in the Sports category. This means that expensive Apps tend to have a better ACLC in the Game and Books categories (hypothesis H2 is wrong for these two categories), whereas cheaper Apps do better in the Sports category (H2 is correct for this category).

We found a similar result for “Rating_avg”. The rating is positively related to ACLC in the Books and Sports categories but negatively related in the Game category. This finding shows that a high rating for Apps in the Game category does not necessarily lead to a better ACLC. That is, hypothesis H1 is wrong for the Game category but correct for the other two categories.

Table 10

Independent Variable Coefficient with Linear Regression

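| Latent variable | Variable name | All (Paid) | All (Free) | Books (Paid) | Books (Free) | Games (Paid) | Games (Free) | Sports (Paid) | Sports (Free) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Rating | 1. Rating_avg | −0.052 | 0.145* | 0.490*** | 0.193* | −0.328** | −0.326** | 0.200 | 0.122 |
| | 2. Rating_std | −0.269 | −0.106* | −0.757*** | −0.979*** | −0.485*** | −0.604*** | −0.685*** | −0.703*** |
| | 3. Rating_max | 0.201*** | 0.012 | 1.089*** | 1.301*** | 0.747*** | 1.132*** | 1.058*** | 1.107*** |
| | 4. Rating_min | −0.250*** | −0.239*** | −1.261*** | −0.923*** | −0.494*** | −0.815*** | −0.745*** | −0.646*** |
| | 5. Rating_num | 0.234*** | 0.139*** | 0.063** | 0.162*** | 0.374*** | 0.245*** | 0.037 | 0.045 |
| Price | 6. Price_avg | −0.051 | - | 0.071 | - | 0.406*** | - | −0.903 | - |
| | 7. Price_std | 0.043 | - | −0.146* | - | −0.409*** | - | 0.736* | - |
| | 8. Price_max | −0.033 | - | 0.159* | - | 0.416*** | - | −1.174 | - |
| | 9. Price_min | 0.065 | - | −0.139 | - | −0.730*** | - | 1.793** | - |
| | 10. Price_change | 0.102*** | - | 0.160*** | - | 0.073* | - | 0.182*** | - |
| Requirement | 11. Req_num_change | −0.004 | −0.021 | −0.027 | 0.057* | −0.042 | 0.050** | −0.029 | 0.153*** |
| | 12. Req_num_hardware | −0.070* | −0.127*** | −0.015 | −0.027 | −0.053* | −0.110*** | −0.044 | −0.058 |
| | 13. Req_num_network | 0.079* | 0.045 | −0.036 | −0.004 | 0.015 | 0.042* | −0.002 | −0.004 |
| | 14. Req_num_OS_change | −0.080** | −0.017 | −0.018 | −0.001 | −0.069** | −0.024 | 0.041 | −0.003 |
| Version | 15. Version_duration_max | 0.046 | −0.019 | 0.035 | 0.069 | −0.151** | −0.000 | −0.168* | −0.126* |
| | 16. Version_duration_min | −0.032 | 0.027 | 0.029 | −0.045 | 0.078* | −0.012 | 0.085 | 0.031 |
| | 17. Version_change | 0.054 | 0.171*** | 0.079** | 0.087** | 0.285*** | 0.305*** | 0.377*** | 0.229*** |
| Provider | 18. Provider_app_num | −0.081 | −0.071 | −0.006 | −0.015 | 0.013 | 0.026 | −0.035 | −0.022 |
| | 19. Provider_app_num_cat | 0.127* | 0.069 | 0.041 | 0.042 | −0.024 | 0.085** | −0.013 | 0.042 |
| | 20. Provider_num_rating | −0.003 | 0.079** | 0.267*** | −0.012 | 0.023 | 0.179*** | −0.045 | 0.044 |
| | 21. Provider_rating_avg | 0.028 | 0.126*** | −0.058* | 0.046* | 0.010 | 0.016 | 0.050 | −0.009 |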

Significance levels: *** p < 0.001, ** p < 0.01, and * p < 0.05. The coefficients have been standardized.

Unlike our previous assumption (H3), App provider—especially variable “Provider_app_num” (the number of Apps from the target App provider)—is not very important for ACLC. This means that App provider experience is not critical for ACLC.

For free and paid Apps, most variables showed similar patterns within the same App category, but some factors differed. For example, in the Books category, “Rating_num” (the number of ratings) is not important (0.063) for paid Apps, but it is moderately important (0.162) for free Apps. Similarly, “Provider_num_rating” shows different coefficients for paid and free Apps in both the Books and Game categories.
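As an illustration of what “standardized” means for the coefficients in Table 10, the following is a minimal sketch rather than the pipeline actually used in this study. Each predictor and the dependent variable are z-scored before an ordinary least squares fit, so that coefficient magnitudes are comparable across variables measured on different scales. The function name `standardized_ols`, the two example predictors, and the synthetic data are assumptions made for illustration only.

```python
import numpy as np


def standardized_ols(X, y):
    """Return standardized (beta) coefficients and R^2 of an OLS fit.

    Hypothetical helper for illustration; not the authors' code.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # z-score every predictor and the response (sample standard deviation)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    # No intercept column is needed: every z-scored variable has mean zero
    beta, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    residuals = yz - Xz @ beta
    r2 = 1.0 - (residuals @ residuals) / (yz @ yz)
    return beta, r2


# Synthetic data standing in for two predictors such as Rating_avg and
# Version_change (assumed names; the values are random, not App Store data)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

beta, r2 = standardized_ols(X, y)
print("standardized coefficients:", beta)
print("R^2:", r2)
```

Because of the z-scoring, rescaling a predictor (for example, expressing a price in cents rather than dollars) leaves the standardized coefficients unchanged, which is what makes the columns of Table 10 comparable across variables.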

5.3 PLS Regression with Latent Variables

For PLS regression, we used the latent independent variables presented in Table 6, with the App life cycle, ALC or ACLC, characterized by three explicit variables: length, peak, and sum. To assess reliability and validity, we calculated Cronbach’s alpha and the average variance extracted (AVE) for each latent variable. The results are shown in Table 11.

Table 11

Reliability and Validity for Latent Variable and ACLC

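| Latent variable | Measure | All (Paid) | All (Free) | Books (Paid) | Books (Free) | Game (Paid) | Game (Free) | Sports (Paid) | Sports (Free) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Life cycle | Alpha | 0.514 | 0.522 | 0.436 | 0.528 | 0.403 | 0.461 | 0.466 | 0.318 |
| | AVE | 0.423 | 0.459 | 0.398 | 0.403 | 0.411 | 0.430 | 0.388 | 0.404 |
| Price | Alpha | 0.676 | - | 0.672 | - | 0.715 | - | 0.778 | - |
| | AVE | 0.225 | - | 0.269 | - | 0.334 | - | 0.226 | - |
| Provider | Alpha | 0.543 | 0.560 | 0.540 | 0.489 | 0.554 | 0.589 | 0.265 | 0.429 |
| | AVE | 0.404 | 0.375 | 0.412 | 0.383 | 0.384 | 0.418 | 0.298 | 0.345 |
| Rating | Alpha | 0.230 | 0.237 | 0.815 | 0.797 | 0.537 | 0.587 | 0.807 | 0.766 |
| | AVE | 0.319 | 0.249 | 0.600 | 0.582 | 0.376 | 0.425 | 0.592 | 0.548 |
| Requirement | Alpha | 0.621 | 0.634 | 0.395 | 0.444 | 0.558 | 0.489 | 0.487 | 0.385 |
| | AVE | 0.428 | 0.384 | 0.313 | 0.333 | 0.350 | 0.297 | 0.403 | 0.348 |
| Version | Alpha | .0550 | 0.556 | 0.811 | 0.800 | 0.763 | 0.729 | 0.757 | 0.744 |
| | AVE | 0.525 | 0.474 | 0.727 | 0.719 | 0.671 | 0.635 | 0.660 | 0.656 |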

Significance levels: p < 0.05 for bold values.
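For readers unfamiliar with the two statistics reported in Table 11, the sketch below computes them from their standard textbook definitions. It is not the output of any particular PLS software package, and the function names and toy inputs are hypothetical. Cronbach's alpha is computed from a matrix of item scores (rows are Apps, columns are the explicit variables belonging to one latent variable), and AVE is computed as the mean of the squared standardized loadings.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for an (n_observations, k_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)      # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of the summed scale
    return k / (k - 1) * (1.0 - item_variances.sum() / total_variance)


def average_variance_extracted(loadings):
    """AVE as the mean squared standardized loading of a latent variable."""
    loadings = np.asarray(loadings, dtype=float)
    return float(np.mean(loadings ** 2))


# Toy inputs (assumed, not taken from the study's data)
scores = [[1, 1], [2, 2], [3, 3]]   # two perfectly consistent items
print(cronbach_alpha(scores))                    # 1.0
print(average_variance_extracted([0.8, 0.6]))    # approximately 0.5
```

Alpha rises as the explicit variables of a latent variable move together, while AVE rises as each explicit variable loads more strongly on its latent variable, so the two rows of Table 11 capture related but distinct properties.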

Table 12 presents coefficients for paths between each independent latent variable and the dependent latent variable (life cycle) for the sampled four categories. Correlations among latent variables, as well as factor loadings, can be found in the online appendix.

Table 12

Coefficients among Latent Variable and ACLC

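| Latent variable | All (Paid) | All (Free) | Books (Paid) | Books (Free) | Game (Paid) | Game (Free) | Sports (Paid) | Sports (Free) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Price | 0.213*** | - | 0.329*** | - | 0.332*** | - | 0.233*** | - |
| Provider | −0.090*** | −0.047 | 0.107** | 0.128*** | 0.026 | 0.179*** | 0.061* | 0.031 |
| Rating | 0.178** | 0.148*** | 0.208*** | 0.315*** | 0.292*** | 0.245*** | 0.276*** | 0.288*** |
| Requirement | 0.014 | 0.067 | 0.069* | 0.133*** | 0.036 | 0.144*** | 0.121** | 0.192*** |
| Version | 0.175*** | 0.211*** | 0.234*** | 0.288*** | 0.285*** | 0.434*** | 0.275*** | 0.446*** |
| R² | 0.174 | 0.123 | 0.444 | 0.372 | 0.466 | 0.489 | 0.407 | 0.519 |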

Significance levels: *** p < 0.001, ** p < 0.01, and * p < 0.05. ACLC, App categorical life cycle; PLS, partial least squares.

Based on the R² values, it is clear that the latent variables cannot characterize the App ALC (in the All category): R² is only 0.174 for paid Apps and 0.123 for free Apps, which again supports our hypothesis that Apps across different categories are not comparable. In the Game and Sports categories, the latent variables characterize the ACLC of free Apps better than that of paid Apps. In contrast, in the Books category, the model fits paid Apps better than free Apps. That different categories are not comparable is the key reason why ALC cannot be estimated well by either linear or PLS regression.

Comparing the latent variable coefficients, App price (for paid Apps), rating, and version are the latent variables that best describe the App ALC or ACLC. We can interpret this finding as follows: if an App has a reasonable price, a good user rating, and regular updates, then it is highly likely to achieve a better life cycle. We found that the system requirement (i.e., whether the target App needed specific hardware or a specific operating system) was not important for the App life cycle. Somewhat surprisingly, the results showed that App provider reputation (i.e., whether the App provider had already released a number of successful Apps) was not important when compared with the other latent variables. This observation is consistent with the results of the linear and binary regressions.

6 Conclusion and Future Work

In this study, we used the Apple iTunes App Store’s daily ranking data to characterize the App life cycle. More specifically, we conceptualized two different kinds of life cycles for each popular App: ALC (based on data from the overall top 200 rankings) and ACLC (based on data from category-specific top 200 rankings). Using daily ranking data, we can depict the first three stages of the ALC or ACLC. As Figure 1 shows, the time period from catalog birth to commercial birth is the introduction stage. The growth stage is defined as the interval between commercial birth and peak. The maturity stage is the interval from peak to commercial death.
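The three stages described above can be sketched as a small computation over daily ranking data. This is a minimal illustration under stated assumptions, not the procedure used in this study: commercial birth is taken to be the first day the App is observed in the top 200, commercial death the last such day, and peak the earliest day with the best (numerically lowest) rank. The function name `life_cycle_stages` and the sample ranks are hypothetical.

```python
from datetime import date


def life_cycle_stages(catalog_birth, daily_ranks):
    """Return the length in days of the introduction, growth, and maturity stages.

    catalog_birth: date the App was released in the store.
    daily_ranks: dict mapping date -> rank (1 is best) for each day the App
                 appeared in the top-200 list. Days outside the list are absent.
    """
    if not daily_ranks:
        raise ValueError("the App never entered the ranking")
    commercial_birth = min(daily_ranks)   # assumed: first day in the top 200
    commercial_death = max(daily_ranks)   # assumed: last day in the top 200
    # Peak: best rank, with ties broken in favor of the earliest day
    peak = min(daily_ranks, key=lambda day: (daily_ranks[day], day))
    return {
        "introduction_days": (commercial_birth - catalog_birth).days,
        "growth_days": (peak - commercial_birth).days,
        "maturity_days": (commercial_death - peak).days,
    }


# Made-up ranks for a single App (not real App Store data)
ranks = {
    date(2012, 1, 5): 180,
    date(2012, 1, 6): 90,
    date(2012, 1, 7): 15,
    date(2012, 1, 8): 40,
    date(2012, 1, 10): 199,
}
print(life_cycle_stages(date(2012, 1, 1), ranks))
# {'introduction_days': 4, 'growth_days': 2, 'maturity_days': 3}
```

Keying the ranks by date rather than by position in a list keeps gaps in the ranking (days on which the App dropped out of the top 200) from shifting the stage boundaries.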

Using data automatically collected by a crawler, we calculated the ALC and ACLC for a large number of Apps. The results show that different App categories have different life cycles. For instance, the Entertainment and Education categories tend to have a short ACLC (i.e., user interest in such Apps tends not to be long-lasting), whereas the Travel and Navigation categories usually have a much longer ACLC. Another interesting finding is that most categories’ ACLC lengths have relatively large standard deviations, meaning that different Apps in the same category often have very different ACLC lengths.

After regression analysis, we found, contrary to our initial hypotheses, that App provider reputation (H3) and App requirements (H5) are not closely related to the App life cycle. In contrast, App rating (H1), App price (H2), and App revision (H4) information are more reliable indicators of the App life cycle. However, our hypotheses do not necessarily hold for all App categories. For example, the linear regression results show that a high rating for a Game App does not necessarily lead to a more successful ACLC, and a high price for a Game App is not necessarily a disadvantage.

The contributions of this paper are fivefold. First, we examined the plausibility of conceptualizing modern digital products, specifically smartphone and tablet Apps, by using traditional PLC theory. Second, we proposed a new open-access life cycle indicator—App download rankings—in place of the sales volume indicator used by classical PLC theory. Third, we compared Apps cross-categorically from a PLC viewpoint to identify salient category differences. Fourth, using regression modeling, we identified correlations between App PLC and other important independent or latent variables. Finally, this study provides a useful framework for further PLC research on digital products. We expect that our methods of collecting and analyzing large datasets of digital product data can be generalized to a wider array of digital products, such as digitalized music, games, and movies.

In the future, we will extend our investigation of the App life cycle in two different ways. First, we will collect Google Android App daily ranking data in the same way as we did for this study and will compare life cycle results between Android and Apple Apps. Second, we will use machine learning algorithms, e.g., Classification and Regression Trees, to “predict” the App life cycle. For instance, we will use the first few days of an App’s performance and related features (price, user rankings, and comments) to predict the App life cycle.

References

  • Anderson, C. R., & Zeithaml, C. P. (1984). Stage of the product life cycle, business strategy, and business performance. Academy of Management Journal, 27(1), 5–24. doi:

    • Crossref
    • Export Citation
  • Bass, F. M. (1969). A new product growth for model consumer durables.. Management Science, 15(5), 215–227.

    • Crossref
    • Export Citation
  • Bayus, B. L. (1994). Are product life cycles really getting shorter? Journal of Product Innovation Management, 11(4), 300–308.

    • Crossref
    • Export Citation
  • Bikhchandani, S., Hirshleifer, D., & Welch, I. (1992). A theory of fads, fashion, custom, and cultural change as informational cascades. Journal of Political Economy, 100(5), 992–1026.

    • Crossref
    • Export Citation
  • Carare, O. (2012). THE IMPACT OF BESTSELLER RANK ON DEMAND: EVIDENCE FROM THE APP MARKET*. International Economic Review, 53(3), 717–742.

    • Crossref
    • Export Citation
  • Cox, W. E., Jr. (1967). Product life cycles as marketing models. The Journal of Business, 40(4), 375–384.

    • Crossref
    • Export Citation
  • Day, G. S. (1981). The product life cycle: Analysis and applications issues. Journal of Marketing, 45(4), 60–67.

    • Crossref
    • Export Citation
  • Dhalla, N. K., & Yuspeh, S. (1976). Forget the product life cycle concept! Harvard Business Review, 01, 102–112.

  • Duan, W., Gu, B., & Whinston, A. B. (2009). Informational cascades and software adoption on the internet: An empirical investigation. Management Information Systems Quarterly, 33(1), 23–48.

    • Crossref
    • Export Citation
  • Golder, P. N., & Tellis, G. J. (2004). Growing, growing, gone: Cascades, diffusion, and turning points in the product life cycle. Marketing Science, 23(2), 207–218.

    • Crossref
    • Export Citation
  • Kurawarwala, A. A., & Matsuo, H. (1998). Product growth models for medium-term forecasting of short life cycle products. Technological Forecasting and Social Change, 57(3), 169–196. doi:

    • Crossref
    • Export Citation
  • Levitt, T. (1965). Exploit the product life cycle. Harvard Business Review, (November): 81–84.

  • Li, H., Bhowmick, S. S., & Sun, A. (2011). AffRank: Affinity-driven ranking of products in online social rating networks. Journal of the American Society for Information Science and Technology, 62(7), 1345–1359. doi:

    • Crossref
    • Export Citation
  • MacMillan, D. B., Peter; Ante, Spencer E. (2009). Inside the App Economy. Retrieved from http://www.businessweek.com/magazine/content/09_44/b4153044881892.htm

  • Mercer, D. (1993). A two-decade test of product life cycle theory. British Journal of Management, 4(4), 269–274. doi:

    • Crossref
    • Export Citation
  • MillennialMedia. (2010). State of the Apps Industry.

  • Norton, J. A., & Bass, F. M. (1987). A diffusion theory model of adoption and substitution for successive generations of high-technology products. Management Science, 33(9), 1069–1086.

    • Crossref
    • Export Citation
  • Peres, R., Muller, E., & Mahajan, V. (2010). Innovation diffusion and new product growth models: A critical review and research directions. International Journal of Research in Marketing, 27(2), 91–106. doi:

    • Crossref
    • Export Citation
  • Polli, R., & Cook, V. (1969). Validity of the product life cycle. The Journal of Business, 42(4), 385–400.

    • Crossref
    • Export Citation
  • Ringle, C. M., Wende, S., & Will, A. (2005). SmartPLS (Version 2.0 (beta)). Retrieved from http://www.smartpls.de

  • Rink, D. R., & Swan, J. E. (1979). Product life cycle research: A literature review. Journal of Business Research, 7(3), 219–242. doi:

    • Crossref
    • Export Citation
  • Rogers, E. (1962). Diffusion of Innovations. New York: The Free Press.

  • Rosner, B. (1983). Percentage points for a generalized ESD many-outlier procedure. Technometrics, 25(2), 165–172. doi:

    • Crossref
    • Export Citation
  • Simon, H. (1979). Dynamics of price elasticity and brand life cycles: An empirical study. Journal of Marketing Research, 16(4), 439–452.

    • Crossref
    • Export Citation
  • TechNet. (2012, February 7). New TechNet sponsored study: Nearly 500,000 “app economy” jobs in United States..

  • Tellis, G. J., & Crawford, C. M. (1981). An evolutionary approach to product growth theory. Journal of Marketing, 45(4), 125–132.

    • Crossref
    • Export Citation
  • Wikipedia. Product Life Cycle Retrieved from http://en.wikipedia.org/wiki/Product_lifecycle

Appendix

Online at http://discern.uits.iu.edu:8820/Apple/Apple_Appendix.htm

Footnotes

2 Apple only releases the top 200 daily ranking on the iTunes web site.

5 App logos are processed for copyright reasons.

6 App logos and App screenshots are processed for copyright reasons.

If the inline PDF is not rendering correctly, you can download the PDF file here.

  • Anderson, C. R., & Zeithaml, C. P. (1984). Stage of the product life cycle, business strategy, and business performance. Academy of Management Journal, 27(1), 5–24. doi:

    • Crossref
    • Export Citation
  • Bass, F. M. (1969). A new product growth for model consumer durables.. Management Science, 15(5), 215–227.

    • Crossref
    • Export Citation
  • Bayus, B. L. (1994). Are product life cycles really getting shorter? Journal of Product Innovation Management, 11(4), 300–308.

    • Crossref
    • Export Citation
  • Bikhchandani, S., Hirshleifer, D., & Welch, I. (1992). A theory of fads, fashion, custom, and cultural change as informational cascades. Journal of Political Economy, 100(5), 992–1026.

    • Crossref
    • Export Citation
  • Carare, O. (2012). THE IMPACT OF BESTSELLER RANK ON DEMAND: EVIDENCE FROM THE APP MARKET*. International Economic Review, 53(3), 717–742.

    • Crossref
    • Export Citation
  • Cox, W. E., Jr. (1967). Product life cycles as marketing models. The Journal of Business, 40(4), 375–384.

    • Crossref
    • Export Citation
  • Day, G. S. (1981). The product life cycle: Analysis and applications issues. Journal of Marketing, 45(4), 60–67.

    • Crossref
    • Export Citation
  • Dhalla, N. K., & Yuspeh, S. (1976). Forget the product life cycle concept! Harvard Business Review, 01, 102–112.

  • Duan, W., Gu, B., & Whinston, A. B. (2009). Informational cascades and software adoption on the internet: An empirical investigation. Management Information Systems Quarterly, 33(1), 23–48.

    • Crossref
    • Export Citation
  • Golder, P. N., & Tellis, G. J. (2004). Growing, growing, gone: Cascades, diffusion, and turning points in the product life cycle. Marketing Science, 23(2), 207–218.

    • Crossref
    • Export Citation
  • Kurawarwala, A. A., & Matsuo, H. (1998). Product growth models for medium-term forecasting of short life cycle products. Technological Forecasting and Social Change, 57(3), 169–196. doi:

    • Crossref
    • Export Citation
  • Levitt, T. (1965). Exploit the product life cycle. Harvard Business Review, (November): 81–84.

  • Li, H., Bhowmick, S. S., & Sun, A. (2011). AffRank: Affinity-driven ranking of products in online social rating networks. Journal of the American Society for Information Science and Technology, 62(7), 1345–1359. doi:

    • Crossref
    • Export Citation
  • MacMillan, D. B., Peter; Ante, Spencer E. (2009). Inside the App Economy. Retrieved from http://www.businessweek.com/magazine/content/09_44/b4153044881892.htm

  • Mercer, D. (1993). A two-decade test of product life cycle theory. British Journal of Management, 4(4), 269–274. doi:

    • Crossref
    • Export Citation
  • MillennialMedia. (2010). State of the Apps Industry.

  • Norton, J. A., & Bass, F. M. (1987). A diffusion theory model of adoption and substitution for successive generations of high-technology products. Management Science, 33(9), 1069–1086.

    • Crossref
    • Export Citation
  • Peres, R., Muller, E., & Mahajan, V. (2010). Innovation diffusion and new product growth models: A critical review and research directions. International Journal of Research in Marketing, 27(2), 91–106. doi:

    • Crossref
    • Export Citation
  • Polli, R., & Cook, V. (1969). Validity of the product life cycle. The Journal of Business, 42(4), 385–400.

    • Crossref
    • Export Citation
  • Ringle, C. M., Wende, S., & Will, A. (2005). SmartPLS (Version 2.0 (beta)). Retrieved from http://www.smartpls.de

  • Rink, D. R., & Swan, J. E. (1979). Product life cycle research: A literature review. Journal of Business Research, 7(3), 219–242. doi:

    • Crossref
    • Export Citation
  • Rogers, E. (1962). Diffusion of Innovations. New York: The Free Press.

  • Rosner, B. (1983). Percentage points for a generalized ESD many-outlier procedure. Technometrics, 25(2), 165–172. doi:

    • Crossref
    • Export Citation
  • Simon, H. (1979). Dynamics of price elasticity and brand life cycles: An empirical study. Journal of Marketing Research, 16(4), 439–452.

    • Crossref
    • Export Citation
  • TechNet. (2012, February 7). New TechNet sponsored study: Nearly 500,000 “app economy” jobs in United States..

  • Tellis, G. J., & Crawford, C. M. (1981). An evolutionary approach to product growth theory. Journal of Marketing, 45(4), 125–132.

    • Crossref
    • Export Citation
  • Wikipedia. Product Life Cycle Retrieved from http://en.wikipedia.org/wiki/Product_lifecycle

OPEN ACCESS

Journal + Issues

Search

  • App life cycle definition and five key points

  • iTunes App daily rank information (see footnote 5)

  • App Preview page on the iTunes website (see footnote 6)

  • Distribution of Apps’ overall presence in the top 200 dataset by category. (Note: y-axis is plotted in log scale and the width of boxplot is proportional to the square root of category size.)

  • Dependent variables of life cycle.

  • PLS regression model construction. PLS, partial least squares.