Fault Diagnosis and Prognosis of Bearing Based on Hidden Markov Model with Multi-Features

A new approach to fault diagnosis and prognosis of bearings based on a hidden Markov model (HMM) with multi-features is proposed. First, time-domain analysis, frequency-domain analysis, and wavelet packet decomposition are used to extract condition features from bearing vibration signals, and principal component analysis (PCA) is applied to the multi-features to reduce their dimensionality. Then the low-dimensional features are processed to obtain the scalar probabilities of each bearing condition, which are multiplied to generate the observed values of the HMM. The results show that the proposed approach can diagnose fault conditions well and can estimate the remaining life of the bearing.


Introduction
The bearing is one of the most important components used in modern engineering machinery. Once a bearing fails, the consequences can be serious, including equipment damage and great economic loss. Fault diagnosis and prognosis for bearings are therefore very important: they can effectively prevent unexpected failures and help engineering technicians carry out targeted equipment maintenance [1][2][3][4]. Fault diagnosis identifies the symptoms and fault conditions of a bearing, while prognosis generally predicts its remaining life from existing information and knowledge. Before fault diagnosis and prognosis can be carried out, the fault features of the bearing signals must be extracted effectively, because they directly affect the precision of diagnosis and prediction. The selected signal features should therefore reflect the condition of the bearing comprehensively and concretely at different levels [5,6]. In general, the time-domain and frequency-domain components can be viewed as important elements of bearing vibration signals [7,8]. In addition, some low and high frequencies are closely related to the operating states of the bearing and can be obtained by wavelet packet decomposition [9,10]. These components reflect the detail features of the signals at different levels [11]. In this study, the time-domain and frequency-domain components, as well as different frequency scales of the vibration signals, are used to extract features from bearing vibration signals.
After the signal features are extracted, an effective diagnostic and prognostic approach needs to be determined. The many fault diagnosis and prognosis approaches in the literature can be roughly divided into three categories: experience-based, model-based, and data-driven methods [12]. The experience-based approach depends only on statistical reliability and past experience, resulting in low accuracy [13]. The model-based approach uses complicated physical models and damage propagation models to analyze the degradation process of machines. Although its precision is high, its cost is also very high [14]. The data-driven method belongs to the artificial intelligence approaches, in which the routine condition data of the equipment is used to train a model. Data-driven methods are viewed as an effective prognostics approach [15]. Data-driven approaches, including the artificial neural network (ANN) [16], relevance vector machine (RVM) [17], Bayesian network [18], hidden Markov model (HMM) [19], and support vector regression (SVR) [20], are increasingly applied in different engineering areas. The HMM represents the individual component states of a dynamic system in a natural way, which makes it useful in fault detection and mechanical system monitoring. Since its emergence, it has been widely applied as a data-driven modeling approach in fields such as pattern recognition [21], image identification [22], and speech recognition [23]. This work needs not only to describe the fault conditions of the bearing precisely but also to reflect the transitions among different fault conditions. Such a doubly stochastic process is consistent with the HMM. Therefore, the HMM can be employed to implement fault diagnosis and remaining life prediction of bearings.
In this study, a comprehensive fault diagnosis and prognosis technique based on the HMM is proposed. In this approach, the time-domain, frequency-domain, and three-layer wavelet packet decomposition components are extracted from the vibration signals of the bearing. PCA is then used to fuse the multi-features and reduce their dimensionality. Next, the scalar probabilities of all the operating conditions of the bearing are obtained by scalar quantization, and the observation values of the HMM are obtained by multiplying these scalar probabilities. From these observed values, fault diagnosis and remaining life prediction can be implemented with the HMM. The experimental results show that the proposed scheme not only identifies the different bearing conditions well but also provides a remaining life estimate.

Time and Frequency Domains
For the vibration signals of a bearing, time-domain analysis and frequency-domain analysis are two feature extraction methods that work in different dimensions. Time-domain analysis represents the dynamic characteristics of vibration signals at the time level, and frequency-domain analysis represents them at the frequency level. The time-domain representation is the more visual of the two, while the frequency-domain representation makes the information contained in the signals easy to observe. Table 1 provides 16 different time-domain statistical characteristics [24], where x(n) is the time-domain signal sequence and n is the number of samples. These time-domain characteristics include the amplitude mean, root mean square, root amplitude, mean root amplitude, degree of skewness, kurtosis, variance, maximum amplitude, minimum amplitude, peak-to-peak value, waveform factor, peak factor, pulse index, margin index, and kurtosis index. Equations $ft_1$-$ft_{10}$ ... the number of spectral lines, and $f_k$ is the frequency value of the $k$th spectral line. These frequency-domain characteristics include the mean of spectrum, root mean square of spectrum, variance of spectrum, skewness index of spectrum, kurtosis index of spectrum, gravity center of spectrum, spectrum dispersion, mean square of spectrum, main frequency variation index, rate of change, skew of frequency, kurtosis of frequency-domain frequency, and ratio of square root, which reflects the time series distribution of time domain signals. Equation $ff_1$ reflects the magnitude of vibration energy in the frequency domain; equations $ff_2$-$ff_4$, $ff_6$-$ff_7$, and $ff_{11}$-$ff_{14}$ reflect the degree of dispersion or concentration of the spectrum; and equations $ff_5$ and $ff_8$-$ff_{10}$ reflect the position change of the main frequency.
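As an illustration of the time-domain characteristics named above, the following sketch computes several of them with NumPy. Table 1 is not reproduced here, so the formulas are the common textbook definitions and are an assumption rather than the exact expressions of the paper; the function name and dictionary keys are hypothetical.

```python
import numpy as np

def time_domain_features(x):
    """Return a dict of common time-domain statistics for a 1-D signal.

    The formulas are standard textbook definitions (an assumption; the
    paper's Table 1 may differ in detail).
    """
    x = np.asarray(x, dtype=float)
    abs_x = np.abs(x)
    mean = x.mean()
    std = x.std()                            # population standard deviation
    rms = np.sqrt(np.mean(x ** 2))           # root mean square
    root_amp = np.mean(np.sqrt(abs_x)) ** 2  # root amplitude
    mean_abs = abs_x.mean()                  # mean absolute amplitude
    peak = abs_x.max()
    return {
        "mean": mean,
        "rms": rms,
        "root_amplitude": root_amp,
        "mean_abs": mean_abs,
        "variance": std ** 2,
        "skewness": np.mean((x - mean) ** 3) / std ** 3,
        "kurtosis": np.mean((x - mean) ** 4) / std ** 4,
        "max": x.max(),
        "min": x.min(),
        "peak_to_peak": x.max() - x.min(),
        "waveform_factor": rms / mean_abs,
        "peak_factor": peak / rms,
        "pulse_index": peak / mean_abs,
        "margin_index": peak / root_amp,
    }

# Example: a unit-amplitude sine sampled over a whole number of periods.
t = np.arange(1024) / 1024.0
features = time_domain_features(np.sin(2 * np.pi * 8 * t))
```

For a pure sine the peak factor is the square root of 2 and the kurtosis is 1.5, which gives a quick sanity check before the function is applied to measured vibration data.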

Wavelet Packet Decomposition
The wavelet packet decomposition method is a further development of wavelet decomposition and provides a richer signal analysis. Wavelet packet decomposition also decomposes each detail coefficient vector into two parts, producing a complete binary tree [25,26]. A wavelet packet is a linear combination of a series of wavelet functions $\varphi^{i}_{j,k}(t)$, where $i$ is the frequency factor, $j$ is the scale factor, and $k$ is the translation factor. Any time-domain signal can be decomposed into components $x^{i}_{j}(t)$, where $x^{i}_{j}(t)$ is the $i$th frequency band signal at the $j$th layer of the wavelet packet decomposition and $\{h(k), g(k)\}$ are the scale sequences representing the orthogonal low-pass and high-pass filters, by which the decomposition signals in the different frequency bands are obtained through filtering.
When analyzing the vibration signals of a bearing using wavelet packet decomposition, it is necessary to determine the type of wavelet packet basis function and the number of decomposition layers. In this work, we adopt a 3-layer one-dimensional wavelet packet decomposition based on the db1 wavelet and the Shannon threshold. The energy of the wavelet packet reconstruction coefficients on the third layer is extracted as the characteristic value of the vibration signals. This decomposition generates eight wavelet packet reconstruction coefficients. Therefore, combined with the 16 time-domain characteristics and 14 frequency-domain characteristics, a total of 38 characteristics of the bearing vibration signals are used for analysis in this experiment.
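The eight third-layer energies can be sketched without a wavelet library, because db1 is the Haar wavelet. The code below is a minimal illustration under two assumptions: the signal length is divisible by 8, and the energy of each node's coefficients is used in place of the energy of its reconstructed signal (for an orthonormal transform such as Haar the two are equal). The function name is hypothetical, and no entropy or threshold criterion is applied.

```python
import numpy as np

def haar_packet_energies(x, levels=3):
    """Energies of the 2**levels terminal nodes of a Haar (db1) packet tree."""
    x = np.asarray(x, dtype=float)
    if len(x) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    nodes = [x]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            even, odd = node[0::2], node[1::2]
            next_nodes.append((even + odd) / np.sqrt(2.0))  # low-pass branch
            next_nodes.append((even - odd) / np.sqrt(2.0))  # high-pass branch
        nodes = next_nodes
    # Nodes are returned in natural (tree) order, not sorted by frequency.
    return np.array([np.sum(node ** 2) for node in nodes])

rng = np.random.default_rng(0)
signal = rng.standard_normal(1024)   # stand-in for one 1024-point record
energies = haar_packet_energies(signal)
```

Because the Haar packet transform is orthonormal, the eight energies sum to the energy of the input signal, which is a convenient check on the implementation.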
PCA is used to reduce the dimensionality and extract useful features from the 38 statistical characteristics; the resulting low-dimensional features serve as the input of the HMM. The dimensionality reduction procedure is as follows.
(1) The time-domain and frequency-domain characteristics are obtained by time-domain analysis and frequency-domain analysis.
(2) A series of bearing vibration signals are decomposed using three-layer wavelet packet decomposition, and 8 wavelet packet coefficients are obtained as statistical characteristics of the model.
(3) PCA is employed to reduce the dimensionality of the statistical characteristics; the new low-dimensional features are used as the input of the model, and the mapping matrix of the PCA is stored.
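Step (3) can be sketched as follows. This is a minimal PCA written with NumPy, assuming that the number of retained components is the smallest one whose cumulative explained-variance ratio reaches a threshold (the paper reports a threshold of 0.96 in its experiments). The function names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_pca(features, thr=0.96):
    """Return (mean, mapping) keeping the fewest components whose
    cumulative explained-variance ratio reaches thr."""
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)
    centered = features - mean
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    ratio = np.cumsum(singular_values ** 2) / np.sum(singular_values ** 2)
    r = min(int(np.searchsorted(ratio, thr)) + 1, len(ratio))
    return mean, vt[:r].T          # mapping has shape (n_features, r)

def apply_pca(features, mean, mapping):
    """Project samples with the stored mean and mapping matrix."""
    return (np.asarray(features, dtype=float) - mean) @ mapping

# Synthetic stand-in for 200 training samples with 38 characteristics.
rng = np.random.default_rng(0)
latent = rng.standard_normal((200, 4))
train = latent @ rng.standard_normal((4, 38)) + 0.01 * rng.standard_normal((200, 38))
mean, mapping = fit_pca(train, thr=0.96)
low_dim = apply_pca(train, mean, mapping)
```

Storing `mean` and `mapping` from the training data and reusing them in `apply_pca` keeps training and test samples in the same low-dimensional space.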

Scalar Quantization of Signal Characteristics
Scalar quantization is a preprocessing technique that prepares data for model training. The original data is partitioned and each partition is assigned a new value, so that continuous values are transformed into discrete values that can be fed into the model for training. The principal characteristics are first normalized and then scalar quantized. The scalar quantization process used in this work maps the original signal x to the quantized signal characteristic y.
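The idea of normalizing and then quantizing can be illustrated with a uniform quantizer. The quantization rule of the paper is not reproduced in this text, so the sketch below is only an assumed stand-in: min-max normalization to [0, 1] followed by uniform bins, with a hypothetical choice of 10 levels.

```python
import numpy as np

def normalize(x):
    """Min-max scale a 1-D feature to the interval [0, 1]."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        return np.zeros_like(x)
    return (x - x.min()) / span

def quantize(x_norm, n_levels=10):
    """Map values in [0, 1] to integer codes 1..n_levels (uniform bins)."""
    codes = np.floor(np.asarray(x_norm) * n_levels).astype(int) + 1
    return np.clip(codes, 1, n_levels)   # the value 1.0 falls in the top bin

y = quantize(normalize([0.2, 1.5, 3.1, 4.0, 2.2]))
```

The resulting integer codes are discrete symbols of the kind a discrete-observation HMM expects.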

Hidden Markov Model
The hidden Markov model was first described in the 1960s and was initially used in speech recognition in the 1970s. By the late 1980s, the hidden Markov model was applied to the analysis of DNA, and it then became an important technology in the field of bioinformatics. As the technique has been explored and applied further, it has found wide application in many fields such as fault diagnosis, machine learning, automatic driving, natural language processing, and target recognition. The HMM is a statistical model that describes a Markov process with hidden states. First, the model needs to be trained according to existing data. The HMM relies on hypotheses including the following.
Immobility hypothesis: the state transition behavior of the system is not related to time.
Output independence hypothesis: the output of the system is only related to the current state of the system.
In this paper, the forward algorithm, backward algorithm, and forward-backward algorithm are adopted to estimate the model parameters and to determine the location and severity of fault signals by the maximum probability. The definition and symbols of each algorithm are as follows.
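Forward algorithm: The forward variable is defined as the probability of observing the first $t$ observations in the sequence and ending up in a particular state at time $t$,

$$\alpha_i(t) = P(O_1, O_2, \cdots, O_t, Q_t = q_i \mid \lambda), \quad 1 \le t \le T,$$

where $T$ is the length of the observation sequence. The forward recursion is expressed as [28]

$$\alpha_j(t+1) = \Big[\sum_{i=1}^{M} \alpha_i(t)\, a_{ij}\Big]\, b_j(O_{t+1}),$$

where $M$ is the number of states, $a_{ij}$ is an element of the state transition probability matrix $A$, and $b_j(O_{t+1})$ is the probability of observing $O_{t+1}$ in state $q_j$.

Backward algorithm: The backward variable is defined as the probability of observing the remaining observations given the state at any starting point $t$,

$$\beta_i(t) = P(O_{t+1}, O_{t+2}, \cdots, O_T \mid Q_t = q_i, \lambda).$$

The backward recursion is expressed as [28]

$$\beta_i(t) = \sum_{j=1}^{M} a_{ij}\, b_j(O_{t+1})\, \beta_j(t+1).$$

Forward-backward algorithm (Baum-Welch algorithm): The forward-backward algorithm obtains a set of forward probabilities and a set of backward probabilities, which are used jointly to acquire the distribution over states at any specific time $t$ [29],

$$\xi_t(i, j) = P(Q_t = q_i, Q_{t+1} = q_j \mid O, \lambda) = \frac{\alpha_i(t)\, a_{ij}\, b_j(O_{t+1})\, \beta_j(t+1)}{P(O \mid \lambda)}.$$

The probability of being in state $q_i$ at time $t$, given the model $\lambda$ and the observation sequence $O$, is defined as

$$\gamma_t(i) = P(Q_t = q_i \mid O, \lambda) = \frac{\alpha_i(t)\, \beta_i(t)}{P(O \mid \lambda)}.$$

When training the parameters, the initial probability distribution matrix $\pi$ and the state transition probability matrix $A$ are assigned for each state, and these matrices are updated according to the forward-backward algorithm until the accuracy requirement is met.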
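The forward recursion can be sketched as below for a discrete-observation HMM. The sketch rescales the forward variable at every step to avoid numerical underflow on long sequences, a common implementation detail that is an assumption here rather than something stated in the paper. The helper `diagnose`, the two toy models, and their parameter values are hypothetical; only the state labels Nor and O1 are taken from the paper.

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """log P(O | lambda) for a discrete HMM via the scaled forward recursion.

    obs: sequence of observation symbol indices (0-based)
    pi:  initial state distribution, shape (M,)
    A:   state transition matrix, shape (M, M)
    B:   observation matrix, shape (M, K), B[j, k] = P(symbol k | state j)
    """
    pi, A, B = (np.asarray(m, dtype=float) for m in (pi, A, B))
    alpha = pi * B[:, obs[0]]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = (alpha @ A) * B[:, obs[t]]  # forward recursion
        scale = alpha.sum()
        log_lik += np.log(scale)
        alpha = alpha / scale                   # rescale to avoid underflow
    return log_lik

def diagnose(obs, models):
    """Return the label of the model with the largest log-likelihood."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))

# Two toy models (hypothetical parameters): each entry is (pi, A, B).
models = {
    "Nor": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.8, 0.2]]),
    "O1":  ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.1, 0.9], [0.2, 0.8]]),
}
label = diagnose([1, 1, 0, 1, 1], models)
```

Scoring one observation sequence against every trained model and keeping the most likely one is the maximum-probability decision rule that the diagnosis step relies on.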

The HMM for Fault Diagnosis
Fault diagnosis of a bearing refers to identifying its current health status from the various signals detected during its operation. A typical diagnostic procedure based on the HMM with multi-features for bearings involves the following steps and is depicted in Fig. 1.
(1) Obtain the vibration signals of bearing with various states as the training samples.
(2) Obtain the time-domain characteristics, frequency-domain characteristics, and wavelet packet reconstruction coefficients according to the original vibration signals.
(3) Reduce the dimensionality of 38 statistical characteristics and obtain the low-dimensional features and the mapping matrix of the training samples.

The HMM for Remaining Life Prediction
The whole-life data of each degradation mode of the bearing is used to train an HMM, and together these models constitute the life prediction model library. The degradation is non-recoverable and gets worse gradually. The HMM is therefore taken to be a left-right model: each degradation stage corresponds to a specific state in the HMM, and the transitions among different states constitute the transition probability matrix. The structure of an M-state left-to-right HMM is described in Fig. 2.
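A left-to-right structure makes a simple remaining-life calculation possible. The sketch below assumes a strict left-to-right chain in which each state either stays where it is or moves to the next state, so that the mean time spent in state i is 1 / (1 - a_ii) steps. This is a standard property of Markov chains and not necessarily the formula used in the paper; the function names and transition values are hypothetical.

```python
import numpy as np

def left_right_transitions(stay_probs):
    """Build an M-state left-to-right transition matrix.

    stay_probs[i] is the self-transition probability of state i; the rest
    of the probability mass moves to state i + 1. The last state absorbs.
    """
    m = len(stay_probs) + 1
    A = np.zeros((m, m))
    for i, stay in enumerate(stay_probs):
        A[i, i] = stay
        A[i, i + 1] = 1.0 - stay
    A[-1, -1] = 1.0
    return A

def expected_remaining_steps(A, state_probs):
    """Expected number of steps until the last state is reached."""
    durations = 1.0 / (1.0 - np.diag(A)[:-1])     # mean sojourn per state
    # Steps left from state i = durations of state i and all later states.
    to_go = np.append(np.cumsum(durations[::-1])[::-1], 0.0)
    return float(np.dot(state_probs, to_go))

A = left_right_transitions([0.99, 0.98, 0.9])          # 4-state example
steps = expected_remaining_steps(A, [0.0, 1.0, 0.0, 0.0])
```

Multiplying the expected number of steps by the sampling interval of the data converts the result into a remaining time.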
The feature extraction process is performed on all bearing vibration data from the beginning of operation to the current time to obtain a feature vector sequence, from which the duration in each state is computed. Supposing that the probability vector of the current state of the system is $(p_1, p_2, \cdots, p_M)$, the remaining life is then estimated.
In the PCA step, r is the dimension of the feature vectors and thr is a threshold. In this paper the threshold is set to 0.96, which yields the projection matrix and r = 4. The 4 characteristic values of each state are converted by scalar quantization according to equation (6), and these scalar values are then multiplied and put into a probability model, which gives the corresponding probabilities from No. 1 to 10 that are used as the inputs of the HMM (Table 3).
An HMM is established separately for each fault state according to the corresponding signals, and the same probability matrix B is adopted for all of them. During training, each probability distribution matrix and each state transition probability matrix are re-estimated according to equations (16) and (17), and training stops when the convergence precision reaches 0.001. Fig. 3 shows the log-likelihood values versus iterations for each fault state. The maximum number of iterations is set to 40. The statistics of the overall training results using the HMM are provided in Table 4.
It can be seen from Fig. 3 that the number of training steps for each fault state of the HMM is less than 30, demonstrating the strong learning ability of the HMM for the various fault states of the bearing. Table 4 shows that the training accuracy is relatively high for each fault state, especially for the states Nor, I1, I3, and O1, for which the training accuracy reaches 100%. Fig. 4 shows the fault identification results of the training samples.
To verify the performance of the established HMM, 250 test samples for each fault state are used for testing. After the characteristics of all the test samples are extracted, their dimensionality is reduced and the characteristics are mapped into a low-dimensional space by the mapping vector obtained in the training process. Finally, the processed data is converted into scalar values and put into the probability model to produce the observation values used as the input of the HMM.
To make use of the Markov nature of the HMM, the following test processes are implemented.
Process 1: For each fault state, every 10 observation samples are tested as a group: the observation samples from 1 to 10, from 2 to 11, and so on, until from 241 to 250, giving a total of 241 tests.
Process 2: For each fault state, every 20 observation samples are tested as a group: the observation samples from 1 to 20, from 2 to 21, and so on, until from 231 to 250, giving a total of 231 tests.
The rest can be done in the same manner.
Process 15: For each fault state, every 150 observation samples are tested as a group: the observation samples from 1 to 150, from 2 to 151, and so on, until from 101 to 250, giving a total of 101 tests.
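The grouping used in Processes 1 to 15 is a sliding window with a step of one sample. A minimal sketch, with a hypothetical function name, that reproduces the group counts quoted above:

```python
def sliding_groups(n_samples, group_len):
    """Return 1-based (start, end) index pairs of all overlapping groups."""
    return [(start, start + group_len - 1)
            for start in range(1, n_samples - group_len + 2)]

groups_10 = sliding_groups(250, 10)    # Process 1
groups_150 = sliding_groups(250, 150)  # Process 15
```

With 250 samples and a group length L, the window yields 250 - L + 1 groups, which matches the totals of 241, 231, and 101 tests.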
Following the above processes, the observation sequences are input into the 10 HMMs, the likelihood of each model is calculated, and the diagnosis result is the state corresponding to the maximum probability. Fig. 5 shows the identification results of the various fault states of the bearing for every 30 observation samples using the established HMMs. Fig. 6 shows the identification results for every 100 observation samples. The statistics of the test results of the various fault states for different sample lengths are given in Table 5. As shown in the table, the identification accuracy of the various fault states improves as the length of the observation samples increases. The identification result of each fault state reaches its best accuracy when the length of the observation samples is more than 120.
It is very difficult to acquire complete lifetime data of a bearing, so the vibration signals are obtained for each degradation phase at intervals. In this simulation, we only investigate the lifetime when the outer race fault appears. We construct 3056 groups of data with a time interval of 2 minutes between them. Each group of data has 1024 sample points. The duration of the normal state is 1450 minutes, the duration of the slight outer race fault is 1368 minutes, and the duration of the medium outer race fault is 238 minutes. Because the performance of the bearing always gets worse, the HMM is initialized as a left-right model with 4 states to reflect this change of state. We assume that the lifetime ends when the serious outer race fault appears. Half of the sample data is used for training, and the remaining half is used for the prediction test. The lifetime curve of the prediction test is shown in Fig. 7. The figure shows that the lifetime prediction improves as time increases.
It is worth mentioning that the prediction results in practice may be worse, because the test data obtained in real situations is imperfect, complex, and uncertain.

Conclusions
In this study, a diagnostics and prognostics method for bearings, called HMM with multi-domain features, is proposed. First, the multi-features, including time-domain features, frequency-domain features, and wavelet packet decomposition features, are extracted from the original vibration signals of the bearing. To remove redundant or irrelevant features, PCA is employed for feature selection and dimensionality reduction. Then the low-dimensional features are converted into scalar probabilities, which are multiplied to generate the observation values that are fed into the HMM to achieve fault diagnostics and prognostics for bearings. The experimental results show that the proposed scheme is efficient and practical for the diagnostics and prognostics of bearings. Integrating multi-features with an optimized HMM for the diagnostics and prognostics of bearings [30,31] is an interesting direction and will be the focus of our future work.