## Abstract

We suggest second-order h-type indicators for identifying top units in scientometrics. In essence, re-ranking a series of h-type values and computing the h-type indicator of that series yields a second-order h-type indicator. Second-order h-type indicators provide an interesting and natural method for identifying top units, yielding a fixed h-top. Unlike the series of artificially defined highly cited percentile classes, the h-top provides a natural, definite top in the series of highly cited classes. Theoretically, the second-order h-top covers about 3% of the sources, whereas the first-order h-core covers 10%, so that the ratio of the second- to the first-order h-index, h_{T}/h, is about 30%. Empirically, the ratio h_{T}/h is below 30%. The approach of calculating second-order h-type indicators is illustrated with journals from two fields.

## 1 Introduction

Hirsch (2005) introduced the h-index, which has since been compared with other bibliometric indicators (Bornmann et al., 2008, 2011) and whose theoretical aspects have been discussed (Egghe and Rousseau, 2006; Glänzel, 2006; Schubert and Glänzel, 2007; Ye, 2009, 2011). Its applications have been extended from single researchers to various other units (e.g., journals or countries) as well as to networks (Korn et al., 2009; Schubert et al., 2009; Schubert and Soos, 2010; Zhao et al., 2011). One of the most important reasons for the success of the h-index is its ability to delimit the core part (in terms of citation impact) of a publication set in a simple way. Furthermore, it is one of the few indicators that combine output and impact in a single number. When the h-index is used within a single subject category (or related subject categories) and with publications from the same time period, it can be an interesting complement to other bibliometric indicators. Since 2005, many studies on the index have been published, and various reviews of this literature have appeared (see, e.g., Alonso et al., 2009; Egghe, 2010; Norris & Oppenheim, 2010).

Shortly after the introduction of the h-index, Prathap (2006) proposed the concepts of a first-order and a second-order h-index: the first-order h-index h_{1}=h if the unit (e.g., an institution) has published h papers with at least h citations each, and the second-order h-index h_{2}=h if the unit has h individuals, each with an individual h-index of at least h. Furthermore, h-series such as successive h-indices (Schubert, 2007; Ruane & Tol, 2008) have been discussed. However, Prathap’s “second-order” h-index is not really “of order two,” and the h-series consist of h-indices of different objects. The h-index of an h-series has not yet been studied, and a genuine second-order h-index has never been used. Thus, how to define a genuine second-order h-index remains an open question, and our study addresses how a second-order h-index can be defined and identified for the same unit.

Recently, research excellence has received increasing attention in scientometrics, and many different methods have been proposed for identifying excellent papers (Bornmann, 2013, 2014). Glänzel (2012) introduced the concept of “core documents” using a similarity-based methodology, whereas most studies focus on “highly cited papers,” “most frequently cited papers,” or “top cited papers.” According to the review by Bornmann (2014), several different methods have been used in scientometric studies to identify excellent papers. However, most of these methods rely on arbitrarily set proportions, such as the top 1%, top 5%, or top 10%. In this study, we propose a simple and natural method, based on the h-index concept, for identifying top units at a fixed proportion that is determined by the data rather than set in advance.

## 2 Methodology

Suppose S denotes sources (e.g., publications, P) and T denotes items (e.g., citations, C) in a source-item model (Egghe, 2005); R denotes the rank of a source when sources are ranked by their number of items, and T_{R} denotes the number of items of source R. Then there exists the number series

$$T_1 \ge T_2 \ge \dots \ge T_R \ge \dots \tag{1}$$

The h-index is defined as

$$h = \max\{R : T_R \ge R\} \tag{2}$$

If there is an h-index series {h_{r}} (r = 1, 2, …, r, …) and we re-rank it from high to low value as

$$h_1 \ge h_2 \ge \dots \ge h_r \ge \dots \tag{3}$$

then we obtain the second-order h-index of this h-index series as

$$h_T = \max\{r : h_r \ge r\} \tag{4}$$

The procedure is illustrated in Figure 1, in which each P_{i}–C_{i} (i=1, 2, … r…s) plane contributes a single h-index. Linking all h-indices in all P_{i}–C_{i} planes and projecting them onto the h_{r}–r plane, a distributed curve of h-indices emerges (let us call it the h-curve). The second-order h-index is the h-index of the h-curve.

In Figure 1, the h-indices {h_{1}, h_{2}, h_{3}, …} are located on different planes {C_{1}O_{1}P_{1}, C_{2}O_{2}P_{2}, C_{3}O_{3}P_{3}, …}. The projection of all h-indices defines the h-curve, which links the points h_{1}, h_{2}, h_{3}, …, h_{T}, and so on. The h-index h_{T} of the h-curve (the crossing point of the h-curve and the line OA) is the second-order h-index, which provides a simple method for identifying top units. We call the core of the second-order h-index the h-top.

*The second-order h-index of a ranked h-index series {h_{r}} (r = 1, 2, …, r, …) is equal to h_{T} if h_{T} is the largest natural number such that the h-index at rank h_{T} is at least h_{T}. The second-order h-core is defined as the h-top, i.e., the units at ranks r ≤ h_{T}.*

When the first-order h-index is h and the second-order h-index is h_{T}, the quantity h_{T}/h can be defined in any system as the ratio of the second-order to the first-order h-index.
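The definition can be made concrete with a short sketch. The Python below is a minimal illustration, not the authors' implementation: it assumes each unit is given as a plain list of the citation counts of its papers, computes one first-order h-index per unit, re-ranks the resulting h-series into the h-curve, and returns h_{T} together with the h-top units. The function names and the toy data are hypothetical, and ties at the h-top boundary are broken by input order, a choice the definition itself does not specify.

```python
from typing import Dict, Iterable, List, Tuple


def h_index(values: Iterable[int]) -> int:
    """First-order h-index: the largest rank R whose value T_R is at least R."""
    ranked = sorted(values, reverse=True)
    h = 0
    for rank, value in enumerate(ranked, start=1):
        if value < rank:
            break
        h = rank
    return h


def second_order_h(units: Dict[str, List[int]]) -> Tuple[int, List[Tuple[str, int]]]:
    """Return (h_T, h_top) for a mapping unit name -> citation counts.

    1. Compute one first-order h-index per unit (one P_i-C_i plane each).
    2. Re-rank the h-series from high to low: this is the h-curve.
    3. The h-index of the h-curve is h_T; its first h_T units form the h-top.
    Ties at the boundary keep input order (Python's sort is stable).
    """
    h_curve = sorted(
        ((name, h_index(cites)) for name, cites in units.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    h_t = h_index(h for _, h in h_curve)
    return h_t, h_curve[:h_t]


# Hypothetical toy data: five units with the citation counts of their papers.
units = {
    "U1": [10, 8, 5, 4, 3],
    "U2": [6, 6, 6, 6, 6, 6],
    "U3": [3, 2, 1],
    "U4": [1],
    "U5": [9, 7, 7, 2],
}
h_t, h_top = second_order_h(units)
print(h_t, h_top)  # 3 [('U2', 6), ('U1', 4), ('U5', 3)]
```

Here the h-curve is 6, 4, 3, 2, 1, so h_{T} = 3 and the h-top consists of the three highest-ranked units.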

A second-order variant can be defined not only for the h-index itself but also for its variants. For example, one of the most important variants is the g-index, defined as the largest number n of highly cited publications for which the mean number of citations is at least n (Egghe, 2006).

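If there is a cumulative series corresponding to (1) and (2),

$$\sum_{i=1}^{R} T_i, \qquad R = 1, 2, \dots$$

then the g-index is defined as

$$g = \max\Big\{R : \sum_{i=1}^{R} T_i \ge R^2\Big\}$$

The second-order g-index can be defined on a cumulative series following (3) and (4) as

$$\sum_{i=1}^{r} h_i, \qquad r = 1, 2, \dots$$

leading to

$$g_T = \max\Big\{r : \sum_{i=1}^{r} h_i \ge r^2\Big\}$$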
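A corresponding sketch for the g-index and its second-order variant, again a minimal Python illustration with a hypothetical function name and toy data. The cumulative sum of the n highest values is compared with n², which is equivalent to requiring a mean of at least n. Applied to citation counts it gives the first-order g-index; applied to a re-ranked h-series it gives the second-order g-index. The sketch does not pad the list with zero-valued items, so g never exceeds the number of values; definitions that allow such padding would differ.

```python
from typing import Iterable


def g_index(values: Iterable[int]) -> int:
    """Largest n such that the n highest values sum to at least n**2,
    i.e. their mean is at least n (no zero-padding is applied)."""
    ranked = sorted(values, reverse=True)
    g, cumulative = 0, 0
    for n, value in enumerate(ranked, start=1):
        cumulative += value
        # The mean of the top n values never rises as n grows, so once it
        # falls below n it stays below n: we can stop early.
        if cumulative < n * n:
            break
        g = n
    return g


# First-order use: citation counts of one unit's papers (hypothetical).
print(g_index([10, 8, 5, 4, 3]))  # 5

# Second-order use: a re-ranked h-series of six units (hypothetical).
h_series = [9, 5, 4, 2, 1, 1]
print(g_index(h_series))  # 4
```

For the same hypothetical h-series, the second-order h-index is 3, which illustrates that the second-order g-index is at least as large as the second-order h-index.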

In a series of percentile classes focusing on highly cited papers, one can set up a series such as {top 1%, top 2%, top 3%, …}, which is an artificially constructed series. Because the h-top is a naturally fixed number, it provides a natural, definite top in the series of highly cited classes. As long as the set of processed objects is kept the same, the second-order h-type indices and the h-top are uniquely determined.

We will illustrate the proposed method with the following empirical cases.

## 3 Empirical cases

For the empirical examples, we extracted publications from the Web of Science (WoS), including the Science Citation Index Expanded (SCI-E), the Social Sciences Citation Index (SSCI), and the Arts & Humanities Citation Index (A&HCI). We downloaded papers with the document types Article, Letter, and Review published between 2001 and 2011 (data downloaded on May 12, 2016). Two fields were selected as examples: Mathematics (Math), covering 425 journals, and Library and Information Science (LIS), covering 103 journals. The following results are based on these publications and their citations. The journal h-indices are used as first-order h-indices, and the second-order h-cores of these journals are listed in the Appendix.

### 3.1 h-top journals in Mathematics

Ranking the h-indices of math journals, we obtained h_{T}=34 in the h-series. The top 34 journals are shown in Figure 2.

Figure 2. h-top journals in Mathematics.

Citation: Data and Information Management 2, 1; 10.1515/dim-2017-0011

Figure 2 and the data in the Appendix show the h-top journals in Mathematics, for example, the *Journal of Mathematical Analysis and Applications*, the *Annals of Mathematics*, *Communications on Pure and Applied Mathematics*, and the *Journal of the American Mathematical Society*.

With 34/425 = 0.08, the h-top of the math journals comprises the top 8% of journals in the field of Mathematics.

### 3.2 h-top journals in LIS

Ranking the h-indices of LIS journals, we obtained h_{T}=28. The top 28 journals are shown in Figure 3.

Figure 3. h-top journals in Library and Information Science.


With 28/103 ≈ 0.27, the h-top of the LIS journals comprises 27% of the field. Among these journals are *MIS Quarterly*, *Scientometrics*, the *Journal of Informetrics*, the *Journal of Information Science*, and *JASIST*.

As both examples show, h-tops correspond to different top percentages, but each h-top is fixed for its field. The h-top is thus generated naturally from the data.

Here, we see that a larger h-set produces a smaller h-top in proportion (8% in 425 Math journals), while a smaller h-set produces a larger h-top in proportion (27% in 103 LIS journals).
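The h-top figures for both fields can be checked against the Appendix with a few lines of arithmetic. The Python sketch below (hypothetical helper name) takes the h-indices of the h-top journals listed in A1 and A2 of the Appendix and the field sizes from Section 3, confirms that the journal at rank h_{T} has an h-index of at least h_{T}, and computes the h-top share. Because the Appendix lists only the h-top journals, the complementary condition at rank h_{T}+1 cannot be checked from these data.

```python
from typing import List, Tuple

# h-indices of the h-top journals as listed in the Appendix (A1 and A2).
MATH_H = [86, 75, 66, 65, 62, 58, 57, 54, 53, 53, 52, 47, 46, 43, 41,
          40, 40, 40, 40, 40, 40, 39, 39, 37, 37, 37, 36, 36, 36, 36,
          35, 35, 35, 34]
LIS_H = [99, 87, 81, 74, 72, 63, 62, 54, 49, 48, 47, 41, 40, 39, 39,
         37, 37, 36, 36, 34, 33, 33, 33, 33, 32, 29, 29, 29]


def h_top_share(h_top_values: List[int], n_units: int) -> Tuple[int, float, bool]:
    """Return (h_T, share of the field in the h-top, rank condition holds).

    h_T is taken as the length of the listed h-top; the rank condition checks
    that the unit at rank h_T has an h-index of at least h_T.
    """
    h_t = len(h_top_values)
    ranked = sorted(h_top_values, reverse=True)
    condition = ranked[h_t - 1] >= h_t
    return h_t, h_t / n_units, condition


for field, values, n_journals in [("Mathematics", MATH_H, 425), ("LIS", LIS_H, 103)]:
    h_t, share, ok = h_top_share(values, n_journals)
    print(f"{field}: h_T = {h_t}, share = {share:.0%}, rank condition = {ok}")
# Mathematics: h_T = 34, share = 8%, rank condition = True
# LIS: h_T = 28, share = 27%, rank condition = True
```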

## 4 Analysis and Discussion

In this section, we discuss both static and dynamic cases.

### 4.1 Static case

Let h(r) be the second-order h-curve in the continuous case. Its h-core (the h-top) is

which is equal to the integral area of the h-curve in the h-top.

If all units of the h-series are equal to x (number of units), then the second-order h-tail will be

Historically, three theoretical models have been proposed for estimating the h-index.

In Hirsch’s original paper (Hirsch, 2005), the mathematical model for the h-index is given as follows:

$$C = a h^2$$

where C is the total number of citations and a is a constant ranging between 3 and 5.

Egghe and Rousseau derived the Egghe–Rousseau model (Egghe and Rousseau, 2006) in the framework of Lotkaian informetrics; it can be rewritten as

$$h = P^{1/\alpha}$$

where P is the total number of publications and α>1 is Lotka’s exponent.

Glänzel and Schubert proposed the Glänzel–Schubert model (Glänzel, 2006; Schubert and Glänzel, 2007) with the formula

$$h = c\,P^{1/3}\left(\frac{C}{P}\right)^{2/3}$$

in which C/P is associated with the Journal Impact Factor (JIF) and c is a constant near 1.

Under Heaps’ law or Herdan’s law (Egghe, 2007), the three models can be unified (Ye, 2011), whereby the h-index is linked to total items (such as citations, C) and total sources (such as publications, P) by the formula:

where α>1 is Lotka’s exponent and c>0 is a constant.

For a second-order h-curve with H^{2} items and X sources, this results in

where α>1 is Lotka’s exponent and c>0 is a constant.

In the framework of Lotkaian informetrics, Eq. (21) can be simplified by using the Egghe–Rousseau formula, i.e.,

$$h_T = X^{1/\alpha}$$

Using the Egghe–Rousseau formula with α=2 and P=100, we estimate h=10 according to Eq. (14). With α=2 and X=10, we estimate h_{T} ≈ 3.16 according to Eq. (18). This means that the first-order h-core covers 10% and the second-order h-top about 3% of the sources, so that the ratio of the second-order h-top to the first-order h-core is about 3/10 = 30%. Since the Egghe–Rousseau formula is highly simplified and serves only as a reference in this study, these estimates are indicative only.

Let h_{H}, h_{E-R}, and h_{G-S} denote the Hirsch, Egghe–Rousseau, and Glänzel–Schubert estimates of the h-index. Assuming α=2, a=5, and c=1, we obtain the following theoretical reference values of the h-index (Ye, 2011):

$$h_H = \sqrt{C/5}, \qquad h_{E-R} = P^{1/2}, \qquad h_{G-S} = P^{1/3}\left(\frac{C}{P}\right)^{2/3}$$

Using our empirical cases, we computed the theoretical estimates from the original data (*P* and *C*; cf. the Appendix). The results are shown in Figures 4 and 5.
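The three estimates can be computed directly from P and C. The sketch below is a minimal Python illustration, assuming the standard forms of the three models (Hirsch: C = a h²; Egghe–Rousseau: h = P^{1/α}; Glänzel–Schubert: h = c P^{1/3}(C/P)^{2/3}) with the parameter values stated above (α = 2, a = 5, c = 1). It is applied to the first Mathematics journal in the Appendix (C = 91226, P = 8697, observed h = 86) and also reruns the worked Egghe–Rousseau example with P = 100 and X = 10. The function names are hypothetical.

```python
import math


def hirsch_estimate(c: float, a: float = 5.0) -> float:
    """Hirsch model C = a * h**2, solved for h."""
    return math.sqrt(c / a)


def egghe_rousseau_estimate(p: float, alpha: float = 2.0) -> float:
    """Egghe-Rousseau model h = P**(1/alpha)."""
    return p ** (1.0 / alpha)


def glanzel_schubert_estimate(c: float, p: float, const: float = 1.0) -> float:
    """Glanzel-Schubert model h = const * P**(1/3) * (C/P)**(2/3)."""
    return const * p ** (1.0 / 3.0) * (c / p) ** (2.0 / 3.0)


# Journal of Mathematical Analysis and Applications (Appendix, rank 1; observed h = 86).
C, P = 91226, 8697
print(f"h_H   = {hirsch_estimate(C):.1f}")              # h_H   = 135.1
print(f"h_E-R = {egghe_rousseau_estimate(P):.1f}")      # h_E-R = 93.3
print(f"h_G-S = {glanzel_schubert_estimate(C, P):.1f}")  # h_G-S = 98.5

# Worked static example: P = 100 gives h = 10; X = 10 gives h_T = sqrt(10).
h = egghe_rousseau_estimate(100)
h_t = egghe_rousseau_estimate(10)
print(f"h = {h:.2f}, h_T = {h_t:.2f}, h_T/h = {h_t / h:.2f}")
# h = 10.00, h_T = 3.16, h_T/h = 0.32
```

For this single journal all three estimates exceed the observed h = 86, with the Glänzel–Schubert and Egghe–Rousseau values closest.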

Figure 4. Three estimates of the h-index of Math journals.


Figure 5. Three estimates of the h-index of LIS journals.


Visually, the Glänzel–Schubert and Hirsch estimates fit the observed h-indices better than the Egghe–Rousseau estimate. The Egghe–Rousseau formula is strictly limited by fixing α=2 in the fitting. This situation has been discussed by Ye (2011) and can be measured quantitatively with Pearson correlation coefficients. Table 1 shows that the Glänzel–Schubert and Hirsch estimates correlate more strongly with the observed h than the Egghe–Rousseau estimate does.

Table 1. Pearson correlation coefficients (p-values in parentheses). Above the diagonal: Library & Information Science; below the diagonal: Mathematics.

| | h | h_{H} | h_{E-R} | h_{G-S} |
|---|---|---|---|---|
| h | 1 | .937 (.000) | .543 (.003) | .950 (.000) |
| h_{H} | .832 (.000) | 1 | .697 (.000) | .900 (.000) |
| h_{E-R} | .472 (.005) | .872 (.000) | 1 | .324 (.093)^{*} |
| h_{G-S} | .954 (.000) | .739 (.000) | .334 (.054)^{*} | 1 |

The analytical results in the table reveal that both the Glänzel–Schubert estimation and the Hirsch estimation can be applied as a theoretical reference for computing the h-index.
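The correlation analysis can be outlined in code. The sketch below computes Pearson's r between the observed h-indices of the 28 LIS journals listed in the Appendix and the three estimates (α = 2, a = 5, c = 1, same model forms as above). It is a minimal Python illustration with hypothetical names; it does not compute p-values, and we have not verified that it reproduces the LIS coefficients in Table 1 digit for digit.

```python
import math
from typing import Dict, List, Sequence

# (C, P, observed h) for the 28 LIS journals listed in the Appendix (A2).
LIS = [
    (41918, 344, 99), (34483, 1183, 87), (27340, 640, 81), (31891, 1690, 74),
    (20970, 306, 72), (18120, 423, 63), (27938, 1509, 62), (13388, 647, 54),
    (12696, 609, 49), (12246, 767, 48), (9182, 415, 47), (7630, 451, 41),
    (5633, 206, 40), (6675, 404, 39), (5501, 236, 39), (4846, 287, 37),
    (5610, 192, 37), (6679, 334, 36), (6707, 524, 36), (4325, 264, 34),
    (5585, 401, 33), (5280, 526, 33), (5081, 378, 33), (3599, 133, 33),
    (6703, 519, 32), (2695, 123, 29), (3680, 185, 29), (3751, 283, 29),
]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Estimates with alpha = 2, a = 5, c = 1.
observed = [h for _, _, h in LIS]
estimates: Dict[str, List[float]] = {
    "h_H": [math.sqrt(c / 5) for c, _, _ in LIS],
    "h_E-R": [p ** 0.5 for _, p, _ in LIS],
    "h_G-S": [p ** (1 / 3) * (c / p) ** (2 / 3) for c, p, _ in LIS],
}
correlations = {name: pearson(observed, est) for name, est in estimates.items()}
for name, r in correlations.items():
    print(f"r(h, {name}) = {r:.3f}")
```

A library routine such as SciPy's `pearsonr` would additionally return the p-value; the hand-written version keeps the sketch dependency-free.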

However, in the second-order case, only the number of sources X is clearly defined, so it is convenient to apply the Egghe–Rousseau estimate. The corresponding results, with α=2, are shown in Table 2.

Table 2. Egghe–Rousseau estimation of the h-top in two cases (α = 2).

| Field | h_{T} (observed) | X | h_{E-R} (estimated) |
|---|---|---|---|
| Mathematics | 34 | 425 | 20.62 |
| Library & Information Science | 28 | 103 | 10.15 |

We see that the Egghe–Rousseau estimates are not accurate in either case; both are smaller than the observed values. An important reason for this result is the choice of α=2. In general, for 1<α<3, X ≥ 1, and h_{T} ≥ 1, h_{T} is linked with α as

$$h_T = X^{1/\alpha}$$

or, for h_{T} > 1,

$$\alpha = \frac{\ln X}{\ln h_T}$$

These are static results.
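The values in Table 2 and the link between h_{T}, X, and α can be checked with a short sketch. Assuming the second-order Egghe–Rousseau form h_{T} = X^{1/α}, the Python below recomputes the α = 2 estimates and the α at which the formula would reproduce the observed h_{T}; the helper names are hypothetical. Both implied values lie between 1 and 2 (about 1.72 for Mathematics and 1.39 for LIS). Since X^{1/α} decreases as α grows when X > 1, fixing α = 2 underestimates h_{T} in both cases.

```python
import math

# (observed h_T, number of journals X) for the two fields, from Table 2.
FIELDS = {"Mathematics": (34, 425), "LIS": (28, 103)}


def er_second_order(x: int, alpha: float = 2.0) -> float:
    """Egghe-Rousseau estimate of the second-order h-index: X**(1/alpha)."""
    return x ** (1.0 / alpha)


def implied_alpha(h_t: int, x: int) -> float:
    """Lotka exponent for which X**(1/alpha) equals the observed h_T (needs h_T > 1)."""
    return math.log(x) / math.log(h_t)


for field, (h_t, x) in FIELDS.items():
    print(f"{field}: h_E-R = {er_second_order(x):.2f}, "
          f"implied alpha = {implied_alpha(h_t, x):.2f}")
# Mathematics: h_E-R = 20.62, implied alpha = 1.72
# LIS: h_E-R = 10.15, implied alpha = 1.39
```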

### 4.2 Dynamic case

Following Egghe (2007), the dynamic h-index is

where *t* is the time period, *a* is the aging rate, and α>1 is the Lotkaian exponent (in the Lotkaian informetrics).

Since the calculations are made in a single field, the dynamic second-order h-index is

When X is a constant and α>1 is stable in the field, changes over time are

For all *t* ≥ 0, we have h_{T}′(*t*) > 0 and h_{T}″(*t*) < 0, i.e.,

which means that h_{T}(t) is a concave increasing function of t for fixed X, α, and a. This provides a dynamic reference system. Taken together, the analyses reveal that the second-order h-index and the h-top are relatively unique, simple, and robust, like the first-order h-index and h-core. With the h-top, a core can be efficiently extracted from large datasets, so the h-top might be especially useful in big-data analysis. However, as simple concepts, the second-order h-index and the h-top take only limited information into account, and both can provide only ‘core’ information.

## 5 Conclusions

The second-order h-index h_{T} can be differentiated from the h-index of highly cited papers by finding a fixed value in the series of highly cited percentile classes. According to a rough estimation on the basis of the Egghe–Rousseau formula, h_{T} approximately indicates the top 3% if h denotes the 10% core. This means that the second-order h-index assigns 30% of the first-order h-core to h-top.

We studied the journals from two fields empirically. The results show a percentage of 8% for h-top in Mathematics and 27% in LIS. These values are smaller than the theoretically expected values. The exploration of reasons for the differences between expectations and empirical results is a question for future research.

Unlike the series of artificially defined highly cited percentile classes, the h-top is a natural, definite top in the series of highly cited classes. The second-order h-index and the h-top have unique, fixed values, which is an advantage over methods based on arbitrarily set proportions.

Although both the second-order h-index h_{T} and the h-index of highly cited papers can be used as top indicators, they reflect different concepts: the second-order h-index h_{T} measures the h-top, whereas the h-index of highly cited papers represents the h-core of highly cited papers. Both top metrics can be applied to any informetric unit.

Note that the use and comparison of the second-order h-type indicators are only applicable in one and the same field. If one wants to compare citation impact across different fields, field-normalized indicators have to be used.

We acknowledge financial support from the National Natural Science Foundation of China (Grant No. 71673131) and the Jiangsu Key Laboratory Fund, and we thank Mr. Eric P. Qi for the data collection.

## References

Alonso, S., Cabrerizo, F. J., Herrera-Viedma, E., & Herrera, F. (2009). h-Index: a review focused in its variants, computation and standardization for different scientific fields. Journal of Informetrics, 3(4), 273-289.

Bornmann, L. (2013). How to analyse percentile citation impact data meaningfully in bibliometrics: the statistical analysis of distributions, percentile rank classes and top-cited papers. Journal of the American Society for Information Science and Technology, 64, 587–595.

Bornmann, L. (2014). How are excellent (highly cited) papers defined in bibliometrics? A quantitative analysis of the literature. Research Evaluation, 23, 166–173.

Bornmann, L., Mutz, R., & Daniel, H.-D. (2008). Are there better indices for evaluation purposes than the h index? A comparison of nine different variants of the h index using data from biomedicine. Journal of the American Society for Information Science and Technology, 59(5), 830-837.

Bornmann, L., Mutz, R., Hug, S. E. & Daniel, H.-D. (2011). A multi level meta-analysis of studies reporting correlations between the h index and 37 different h index variants. Journal of Informetrics, 5(3), 346-359.

Egghe, L. (2005). Power laws in the information production process: Lotkaian informetrics. Oxford: Elsevier.

Egghe, L. (2006). Theory and practice of the g-index. Scientometrics, 69(1), 131-152.

Egghe, L. (2007). Dynamic h-index: the Hirsch index in function of time. Journal of the American Society for Information Science and Technology, 58(3), 452-454.

Egghe, L. (2008). Examples of simple transformations of the h-index: Qualitative and quantitative conclusions and consequences for other indices. Journal of Informetrics, 2, 136-148.

Egghe, L. (2010). The Hirsch index and related impact measures. Annual Review of Information Science and Technology, 44, 65-114.

Egghe, L., & Rousseau, R. (2006). An informetric model for the Hirsch-index. Scientometrics, 69(1), 121-129.

Egghe, L. & Rousseau, R. (2012). Theory and practice of the shifted Lotka function. Scientometrics, 91(1), 295-301.

Glänzel, W. (2006). On the h-index – A mathematical approach to a new measure of publication activity and citation impact. Scientometrics, 67(2), 315-321.

Glänzel, W. (2012). The role of core documents in bibliometric network analysis and their relation with h-type indices. Scientometrics, 93(1), 113-123.

Hirsch, J. E. (2005). An index to quantify an individual’s scientific research output. Proceedings of the National Academy of Sciences of the USA, 102(46), 16569-16572.

Jin, B. H., Liang, L. M., Egghe, L., & Rousseau, R. (2007). The R- and AR-indices: Complementing the h-index. Chinese Science Bulletin, 52(6), 855-863.

Norris, M., & Oppenheim, C. (2010). The h-index: a broad review of a new bibliometric indicator. Journal of Documentation, 66(5), 681-705.

Prathap, G. (2006). Hirsch-type indices for ranking institutions’ scientific research output. Current Science, 91(11), 1439.

Ruane, F., & Tol, R. (2008). Rational (successive) h-indices: an application to economics in the Republic of Ireland. Scientometrics, 75(2), 395-405.

Schubert, A. (2007). Successive h-indices. Scientometrics, 70(1), 201–205.

Schubert, A., & Glänzel, W. (2007). A systematic analysis of Hirsch-type indices for journals. Journal of Informetrics, 1(2), 179-184.

Schubert, A., Korn, A., & Telcs, A. (2009). Hirsch-type indices for characterizing networks. Scientometrics, 78(2), 375–382.

Ye, F. Y. (2009). An investigation on mathematical models of the h-index. Scientometrics, 81(2), 493-498.

Ye, F. Y. (2011). A unification of three models for the h-index. Journal of the American Society for Information Science and Technology, 62(1), 205–207.

Ye, F. Y., & Rousseau, R. (2008). The power law model and total career h-index sequences. Journal of Informetrics, 2, 288-297.

Ye, F. Y., & Rousseau, R. (2010). Probing the h-core: an investigation of the tail-core ratio for rank distributions. Scientometrics, 84(2), 431-439.

Zhao, S.X., Rousseau, R., & Ye, F.Y. (2011). h-Degree as a basic measure in weighted networks. Journal of Informetrics, 5(4), 668-677.

## Appendix

A1. The top journal set in the field of mathematics

| Rank | Journal | Citations (C) | Publications (P) | C/P | h-index |
|---|---|---|---|---|---|
| 1 | Journal of Mathematical Analysis and Applications | 91226 | 8697 | 10.4894 | 86 |
| 2 | Nonlinear Analysis-Theory Methods & Applications | 64344 | 6608 | 9.73729 | 75 |
| 3 | Journal of Differential Equations | 39490 | 2678 | 14.7461 | 66 |
| 4 | Annals of Mathematics | 19564 | 662 | 29.5529 | 65 |
| 5 | Inventiones Mathematicae | 18824 | 722 | 26.072 | 62 |
| 6 | Communications on Pure and Applied Mathematics | 19001 | 516 | 36.8236 | 58 |
| 7 | Journal of the American Mathematical Society | 12425 | 349 | 35.6017 | 57 |
| 8 | Journal of Functional Analysis | 27975 | 2444 | 11.4464 | 54 |
| 9 | Advances in Mathematics | 24754 | 1914 | 12.9331 | 53 |
| 10 | Linear Algebra and Its Applications | 34208 | 4179 | 8.18569 | 53 |
| 11 | Transactions of the American Mathematical Society | 25522 | 2650 | 9.63094 | 52 |
| 12 | Duke Mathematical Journal | 14700 | 854 | 17.2131 | 47 |
| 13 | Communications in Partial Differential Equations | 11702 | 842 | 13.8979 | 46 |
| 14 | Proceedings of the American Mathematical Society | 28648 | 5153 | 5.55948 | 43 |
| 15 | Journal of Algebra | 29899 | 4917 | 6.08074 | 41 |
| 16 | Acta Mathematica | 4584 | 141 | 32.5106 | 40 |
| 17 | Discrete and Continuous Dynamical Systems | 13904 | 1803 | 7.71159 | 40 |
| 18 | Discrete Mathematics | 24798 | 4823 | 5.14161 | 40 |
| 19 | Indiana University Mathematics Journal | 9145 | 874 | 10.4634 | 40 |
| 20 | Journal für die reine und angewandte Mathematik | 11845 | 1060 | 11.1745 | 40 |
| 21 | Journal of Differential Geometry | 7820 | 474 | 16.4979 | 40 |
| 22 | Journal de Mathématiques Pures et Appliquées | 7641 | 558 | 13.6935 | 39 |
| 23 | Mathematische Annalen | 12994 | 1228 | 10.5814 | 39 |
| 24 | Calculus of Variations and Partial Differential Equations | 8934 | 702 | 12.7265 | 37 |
| 25 | Comptes Rendus Mathématique | 16354 | 3194 | 5.12023 | 37 |
| 26 | Mathematische Zeitschrift | 11463 | 1536 | 7.46289 | 37 |
| 27 | Bulletin of the American Mathematical Society | 5288 | 150 | 35.2533 | 36 |
| 28 | International Mathematics Research Notices | 10850 | 1477 | 7.34597 | 36 |
| 29 | Journal of Pure and Applied Algebra | 12477 | 2093 | 5.9613 | 36 |
| 30 | Geometric and Functional Analysis | 7415 | 517 | 14.3424 | 36 |
| 31 | Annales Scientifiques de l'École Normale Supérieure | 4489 | 288 | 15.5868 | 35 |
| 32 | Journal of the London Mathematical Society-Second Series | 9468 | 1037 | 9.13018 | 35 |
| 33 | Proceedings of the Royal Society of Edinburgh Section A-Mathematics | 6214 | 757 | 8.20872 | 35 |
| 34 | Discrete & Computational Geometry | 7329 | 888 | 8.25338 | 34 |

A2. The top journal set in the field of library and information science

| Rank | Journal | Citations (C) | Publications (P) | C/P | h-index |
|---|---|---|---|---|---|
| 1 | MIS Quarterly | 41918 | 344 | 121.85 | 99 |
| 2 | Journal of the American Medical Informatics Association | 34483 | 1183 | 29.149 | 87 |
| 3 | Information & Management | 27340 | 640 | 42.719 | 81 |
| 4 | Journal of the American Society for Information Science and Technology | 31891 | 1690 | 18.87 | 74 |
| 5 | Information Systems Research | 20970 | 306 | 68.529 | 72 |
| 6 | Journal of Management Information Systems | 18120 | 423 | 42.837 | 63 |
| 7 | Scientometrics | 27938 | 1509 | 18.514 | 62 |
| 8 | International Journal of Geographical Information Science | 13388 | 647 | 20.692 | 54 |
| 9 | Journal of Health Communication | 12696 | 609 | 20.847 | 49 |
| 10 | Information Processing & Management | 12246 | 767 | 15.966 | 48 |
| 11 | European Journal of Information Systems | 9182 | 415 | 22.125 | 47 |
| 12 | International Journal of Information Management | 7630 | 451 | 16.918 | 41 |
| 13 | Information Systems Journal | 5633 | 206 | 27.345 | 40 |
| 14 | Government Information Quarterly | 6675 | 404 | 16.522 | 39 |
| 15 | Journal of Informetrics | 5501 | 236 | 23.309 | 39 |
| 16 | Information Society | 4846 | 287 | 16.885 | 37 |
| 17 | Journal of Strategic Information Systems | 5610 | 192 | 29.219 | 37 |
| 18 | Journal of Computer-Mediated Communication | 6679 | 334 | 19.997 | 36 |
| 19 | Telecommunications Policy | 6707 | 524 | 12.8 | 36 |
| 20 | Journal of Information Technology | 4325 | 264 | 16.383 | 34 |
| 21 | Journal of Documentation | 5585 | 401 | 13.928 | 33 |
| 22 | Journal of the Medical Library Association | 5280 | 526 | 10.038 | 33 |
| 23 | Social Science Computer Review | 5081 | 378 | 13.442 | 33 |
| 24 | Annual Review of Information Science and Technology | 3599 | 133 | 27.06 | 33 |
| 25 | Journal of Information Science | 6703 | 519 | 12.915 | 32 |
| 26 | International Journal of Computer-Supported Collaborative Learning | 2695 | 123 | 21.911 | 29 |
| 27 | Journal of the Association for Information Systems | 3680 | 185 | 19.892 | 29 |
| 28 | Library & Information Science Research | 3751 | 283 | 13.254 | 29 |

## Footnotes

^{*} The coefficient is statistically not significant.
