Fault Diagnosis and Prognosis of Bearing Based on Hidden Markov Model with Multi-Features

Bearings are among the most important components of modern engineering machinery. A bearing failure can lead to serious consequences such as equipment damage and heavy economic loss. Fault diagnosis and prognosis of bearings are therefore very important: they can effectively prevent unexpected failures and help engineering technicians carry out targeted equipment maintenance [1,2,3,4]. Fault diagnosis identifies the symptoms and fault conditions of a bearing, whereas prognosis predicts its remaining life from existing information and knowledge. Before a diagnosis or prognosis approach is applied, the fault features of the bearing signals must be extracted effectively, because they directly affect the precision of both diagnosis and prediction. The selected signal features should therefore reflect the condition of the bearing comprehensively and concretely at different levels [5, 6]. In general, the time-domain and frequency-domain components can be viewed as important elements of bearing vibration signals [7,8]. In addition, some low and high frequencies are closely related to the operating states of the bearing and can be obtained by wavelet packet decomposition [9,10]. These components reflect the detail features of the signals at different levels [11]. In this study, the time-domain and frequency-domain components, as well as different frequency scales of the vibration signals, are used to extract features from the bearing vibration signals.

After the signal features are extracted, an effective diagnostic and prognostic approach needs to be determined. The many fault diagnosis and prognosis approaches in the literature can be roughly divided into three categories: experience-based, model-based, and data-driven methods [12]. The experience-based approach depends only on statistical reliability and past experience, resulting in low accuracy [13]. The model-based approach uses complicated physical models and damage propagation models to analyze the degradation process of machines; its precision is high, but so is its cost [14]. The data-driven approach belongs to the family of artificial intelligence methods, in which routine condition data of the equipment are used to train a model, and it is viewed as an effective prognostics approach [15]. Data-driven methods, including the artificial neural network (ANN) [16], relevance vector machine (RVM) [17], Bayesian network [18], hidden Markov model (HMM) [19], and support vector regression (SVR) [20], are increasingly applied in different engineering areas. The HMM represents the individual component states of a dynamic system in a natural way, which makes it useful in fault detection and mechanical system monitoring. Since its emergence, it has been widely applied as a data-driven modeling approach in fields such as pattern recognition [21], image identification [22], and speech recognition [23]. This work needs not only to describe the fault conditions of the bearing precisely, but also to reflect the transitions among the different fault conditions. Such a doubly stochastic process is consistent with the HMM. Therefore, the HMM can be employed for fault diagnosis and remaining life prediction of bearings.

In this study, a comprehensive fault diagnosis and prognosis technique based on the HMM is proposed. In this approach, the time-domain components, frequency-domain components, and three-layer wavelet packet decomposition components are extracted from the vibration signals of the bearing. The PCA method is then used to fuse the multi-features and reduce their dimensionality. Next, the scalar probabilities of all the operating conditions of the bearing are obtained by scalar quantization, and the observation values of the HMM are obtained by multiplying these scalar probabilities. From these observation values, fault diagnosis and remaining life prediction are carried out by the HMM. The experimental results show that the proposed scheme not only identifies the different conditions of the bearing well, but also provides a remaining life estimate.

The Extraction of Multi-features

2.1 Time and Frequency Domains

For the vibration signals of a bearing, time-domain analysis and frequency-domain analysis are two feature extraction methods that view the signal from different dimensions. Time-domain analysis represents the dynamic characteristics of the vibration signal along the time axis, and frequency-domain analysis represents them along the frequency axis. The time-domain representation is more intuitive, while the frequency-domain representation makes the information contained in the signal easier to observe.

Table 1 provides 16 different time-domain statistical characteristics [24], where x(n) is the time-domain signal sequence, written x_i (i = 1, 2, …, N) in the table, N is the number of samples, and X̄ is the sample mean ft₁. These time-domain characteristics include the amplitude mean, root mean square, root amplitude, mean absolute amplitude, skewness, kurtosis, variance, maximum amplitude, minimum amplitude, peak-to-peak value, waveform factor, peak factor, pulse index, margin index, skewness index, and kurtosis index. Equations ft₁–ft₁₀ reflect the energy and amplitude of the time-domain signal, and equations ft₁₁–ft₁₆ reflect its time series distribution.

Table 1

Time-domain statistical characteristics.

$ft_{1} = \frac{1}{N} \sum_{i=1}^{N} x_{i}$	$ft_{2} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} x_{i}^{2}}$	$ft_{3} = \left[\frac{1}{N} \sum_{i=1}^{N} \sqrt{|x_{i}|}\right]^{2}$	$ft_{4} = \frac{1}{N} \sum_{i=1}^{N} |x_{i}|$
$ft_{5} = \frac{1}{N} \sum_{i=1}^{N} x_{i}^{3}$	$ft_{6} = \frac{1}{N} \sum_{i=1}^{N} x_{i}^{4}$	$ft_{7} = \frac{1}{N-1} \sum_{i=1}^{N} (x_{i} - \bar{X})^{2}$	$ft_{8} = \max |x_{i}|$
$ft_{9} = \min x_{i}$	$ft_{10} = \max x_{i} - \min x_{i}$	$ft_{11} = \frac{ft_{2}}{ft_{4}}$	$ft_{12} = \frac{ft_{8}}{ft_{2}}$
$ft_{13} = \frac{ft_{8}}{ft_{4}}$	$ft_{14} = \frac{ft_{8}}{ft_{3}}$	$ft_{15} = \frac{ft_{5}}{ft_{2}^{3}}$	$ft_{16} = \frac{ft_{6}}{ft_{2}^{4}}$
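The characteristics in Table 1 can be computed directly from a sampled signal. The following is a minimal sketch in plain Python; the function name `time_domain_features` and the short test sequence are illustrative assumptions, not part of the original experiment, and the signal is assumed to be non-zero so that the ratios ft₁₁–ft₁₆ are defined.

```python
import math


def time_domain_features(x):
    """Return [ft1, ..., ft16] of Table 1 for a non-zero sequence of samples x."""
    n = len(x)
    abs_x = [abs(v) for v in x]
    ft1 = sum(x) / n                                    # amplitude mean
    ft2 = math.sqrt(sum(v * v for v in x) / n)          # root mean square
    ft3 = (sum(math.sqrt(a) for a in abs_x) / n) ** 2   # root amplitude
    ft4 = sum(abs_x) / n                                # mean absolute amplitude
    ft5 = sum(v ** 3 for v in x) / n                    # third raw moment
    ft6 = sum(v ** 4 for v in x) / n                    # fourth raw moment
    ft7 = sum((v - ft1) ** 2 for v in x) / (n - 1)      # variance
    ft8 = max(abs_x)                                    # maximum amplitude
    ft9 = min(x)                                        # minimum amplitude
    ft10 = max(x) - min(x)                              # peak-to-peak value
    ft11 = ft2 / ft4                                    # waveform factor
    ft12 = ft8 / ft2                                    # peak factor
    ft13 = ft8 / ft4                                    # pulse index
    ft14 = ft8 / ft3                                    # margin index
    ft15 = ft5 / ft2 ** 3                               # skewness index
    ft16 = ft6 / ft2 ** 4                               # kurtosis index
    return [ft1, ft2, ft3, ft4, ft5, ft6, ft7, ft8,
            ft9, ft10, ft11, ft12, ft13, ft14, ft15, ft16]


# Illustrative input only: a short alternating sequence.
features = time_domain_features([1.0, -1.0, 1.0, -1.0])
print(features)
```

In practice each 1024-point sample would be passed to the function, giving the first 16 entries of the 38-dimensional characteristic vector.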

Table 2 provides 14 frequency-domain statistical characteristics, where s(k) is the spectrum of x(n), k = 1, 2, …, K, K is the number of spectral lines, and f_k is the frequency value of the kth spectral line. These frequency-domain characteristics include the mean of spectrum, root mean square of spectrum, variance of spectrum, skewness index of spectrum, kurtosis index of spectrum, gravity center of spectrum, spectrum dispersion, mean square of spectrum, main frequency variation index, rate of change, skew of frequency, kurtosis of frequency-domain frequency, and ratio of square root. Equation ff₁ reflects the magnitude of vibration energy in the frequency domain, equations ff₂–ff₄, ff₆–ff₇ and ff₁₁–ff₁₄ reflect the dispersion or concentration degree of the spectrum, and equations ff₅ and ff₈–ff₁₀ reflect the position change of the main frequency.

Table 2

Frequency-domain statistical characteristics.

$ff_{1} = \frac{\sum_{k=1}^{K} s(k)}{K}$	$ff_{2} = \frac{\sum_{k=1}^{K} (s(k) - ff_{1})^{2}}{K - 1}$	$ff_{3} = \frac{\sum_{k=1}^{K} (s(k) - ff_{1})^{3}}{K (\sqrt{ff_{2}})^{3}}$	$ff_{4} = \frac{\sum_{k=1}^{K} (s(k) - ff_{1})^{4}}{K \cdot ff_{2}^{2}}$
$ff_{5} = \frac{\sum_{k=1}^{K} f_{k} s(k)}{\sum_{k=1}^{K} s(k)}$	$ff_{6} = \sqrt{\frac{\sum_{k=1}^{K} (f_{k} - ff_{5})^{2} s(k)}{K}}$	$ff_{7} = \sqrt{\frac{\sum_{k=1}^{K} f_{k}^{2} s(k)}{\sum_{k=1}^{K} s(k)}}$	$ff_{8} = \sqrt{\frac{\sum_{k=1}^{K} f_{k}^{4} s(k)}{\sum_{k=1}^{K} f_{k}^{2} s(k)}}$
$ff_{9} = \frac{\sum_{k=1}^{K} f_{k}^{2} s(k)}{\sqrt{\sum_{k=1}^{K} s(k) \sum_{k=1}^{K} f_{k}^{4} s(k)}}$	$ff_{10} = \frac{ff_{6}}{ff_{5}}$	$ff_{11} = \frac{\sum_{k=1}^{K} (f_{k} - ff_{5})^{3} s(k)}{K \cdot ff_{6}^{3}}$	$ff_{12} = \frac{\sum_{k=1}^{K} (f_{k} - ff_{5})^{4} s(k)}{K \cdot ff_{6}^{4}}$
$ff_{13} = \frac{\sum_{k=1}^{K} |f_{k} - ff_{5}|^{1/2} s(k)}{K \sqrt{ff_{6}}}$	$ff_{14} = \frac{\sum_{k=1}^{K} (f_{k} - ff_{5})^{2} s(k)}{\sum_{k=1}^{K} s(k)}$
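The frequency-domain characteristics of Table 2 can be sketched in the same way with NumPy. The text does not state how the spectrum s(k) is computed, so the sketch below assumes the one-sided amplitude spectrum returned by `numpy.fft.rfft`; the function name, the sampling rate `fs`, and the two-tone test signal are likewise illustrative assumptions.

```python
import numpy as np


def frequency_domain_features(x, fs):
    """Return ff1..ff14 of Table 2 for a signal x sampled at fs Hz."""
    x = np.asarray(x, dtype=float)
    s = np.abs(np.fft.rfft(x))                # assumed s(k): one-sided amplitude spectrum
    f = np.fft.rfftfreq(x.size, d=1.0 / fs)   # f_k: frequency of the kth spectral line
    K = s.size                                # number of spectral lines
    total = s.sum()
    ff1 = total / K
    ff2 = np.sum((s - ff1) ** 2) / (K - 1)
    ff3 = np.sum((s - ff1) ** 3) / (K * np.sqrt(ff2) ** 3)
    ff4 = np.sum((s - ff1) ** 4) / (K * ff2 ** 2)
    ff5 = np.sum(f * s) / total
    ff6 = np.sqrt(np.sum((f - ff5) ** 2 * s) / K)
    ff7 = np.sqrt(np.sum(f ** 2 * s) / total)
    ff8 = np.sqrt(np.sum(f ** 4 * s) / np.sum(f ** 2 * s))
    ff9 = np.sum(f ** 2 * s) / np.sqrt(total * np.sum(f ** 4 * s))
    ff10 = ff6 / ff5
    ff11 = np.sum((f - ff5) ** 3 * s) / (K * ff6 ** 3)
    ff12 = np.sum((f - ff5) ** 4 * s) / (K * ff6 ** 4)
    ff13 = np.sum(np.sqrt(np.abs(f - ff5)) * s) / (K * np.sqrt(ff6))
    ff14 = np.sum((f - ff5) ** 2 * s) / total
    return np.array([ff1, ff2, ff3, ff4, ff5, ff6, ff7,
                     ff8, ff9, ff10, ff11, ff12, ff13, ff14])


# Illustrative signal only: two tones at 50 Hz and 120 Hz, one second at 1 kHz.
fs = 1000.0
t = np.arange(1000) / fs
x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)
ff = frequency_domain_features(x, fs)
print(ff[4])  # ff5, the amplitude-weighted mean frequency
```

If a power spectrum were used instead of the amplitude spectrum, only the line defining `s` would change.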

2.2 Wavelet Packet Decomposition

Wavelet packet decomposition is a further development of wavelet decomposition that provides a richer signal analysis. Wavelet packet decomposition also splits each detail coefficient vector into two parts, producing a complete binary tree [25, 26]. A wavelet packet is a linear combination of a series of wavelet functions $\phi^{i}(t)$, expressed as

(1) $\phi_{j,k}^{i}(t) = 2^{j/2} \phi^{i}(2^{j} t - k), \quad i = 1, 2, \dots$

where i is the frequency factor, j is the scale factor, and k is the translation factor. Any time-domain signal can be decomposed as

(2) $x(t) = \sum_{i=1}^{2^{j}} x_{j}^{i}(t)$

(3) $x_{j}^{i}(t) = x_{j+1}^{2i-1}(t) + x_{j+1}^{2i}(t)$

(4) $x_{j+1}^{2i-1}(t) = H x_{j}^{i}(t)$

(5) $x_{j+1}^{2i}(t) = G x_{j}^{i}(t)$

where $x_{j}^{i}(t)$ is the ith frequency band signal at the jth layer of the wavelet packet decomposition, and H and G are the filtering operators defined by the scale sequences {h(k), g(k)}, which represent the orthogonal low-pass and high-pass filters. Filtering with them yields the decomposed signals in the different frequency bands.

When the vibration signals of a bearing are analyzed by wavelet packet decomposition, the type of wavelet packet basis function and the number of decomposition layers must be determined. In this work, we adopt a 3-layer one-dimensional wavelet packet decomposition based on the db1 wavelet and the Shannon entropy criterion. The decomposition produces 2³ = 8 frequency bands on the third layer, and the energy of the wavelet packet reconstruction coefficients of each band is extracted as a characteristic value of the vibration signal. Therefore, combined with the 16 time-domain characteristics and 14 frequency-domain characteristics, a total of 38 characteristics of the bearing vibration signals are used for analysis in this experiment.
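The eight band energies can be illustrated without a wavelet library, because db1 is the Haar wavelet and its filters are simple pairwise sums and differences. The sketch below is an assumption-laden illustration rather than the authors' implementation: it computes the energy of the node coefficients, which for an orthogonal wavelet and a signal length divisible by 2³ equals the energy of the reconstructed band signals, and it returns the bands in natural tree order rather than frequency order. The function names are hypothetical.

```python
import math


def haar_split(x):
    """One db1 (Haar) analysis step: return (low-pass, high-pass) coefficients."""
    r = 1.0 / math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) * r for i in range(0, len(x) - 1, 2)]
    high = [(x[i] - x[i + 1]) * r for i in range(0, len(x) - 1, 2)]
    return low, high


def wavelet_packet_energies(x, levels=3):
    """Energy of each of the 2**levels wavelet packet nodes (natural order)."""
    nodes = [list(x)]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            # Unlike the plain wavelet transform, both halves are split again.
            low, high = haar_split(node)
            next_nodes.extend([low, high])
        nodes = next_nodes
    return [sum(c * c for c in node) for node in nodes]


# Illustrative 1024-point signal with a slow and a fast component.
signal = [math.sin(0.05 * i) + 0.3 * math.sin(1.3 * i) for i in range(1024)]
energies = wavelet_packet_energies(signal)
print(len(energies), sum(energies))
```

Because the transform is orthogonal, the eight energies sum to the energy of the input signal, which is a convenient sanity check.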

PCA is used to reduce the dimensionality and extract useful features from the 38 statistical characteristics; the resulting low-dimensional features are the input of the HMM. The dimensionality reduction procedure is as follows.

(1) The time-domain and frequency-domain characteristics are obtained by time-domain analysis and frequency-domain analysis.

(2) The vibration signals of the bearing are decomposed by three-layer wavelet packet decomposition, and 8 wavelet packet coefficients are obtained as statistical characteristics of the model.

(3) PCA is employed to reduce the dimensionality of the statistical characteristics. The new low-dimensional features are the input of the model, and the mapping matrix of the PCA is stored.

Scalar Quantization of Signal Characteristics

Scalar quantization converts continuous feature values into discrete values that can be used to train the model. The range of the original data is partitioned into intervals and each interval is assigned a new value. The principal characteristics are first normalized and then quantized. The scalar quantization used in this work is

(6) $y = \begin{cases} 1, & x < 0 \\ 2, & 0 \le x < 0.1 \\ 3, & 0.1 \le x < 0.2 \\ \dots \\ 11, & 0.9 \le x \le 1 \\ 12, & x > 1 \end{cases}$

where x is the original signal characteristic and y is the signal characteristic after scalar quantization.
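Equation (6) amounts to a lookup over ten equal-width bins plus two overflow levels. A minimal sketch follows; the function name `scalar_quantize` is hypothetical, and assigning a shared boundary value such as 0.1 to the upper interval is an assumption of this sketch.

```python
import bisect

# Lower edges of the ten bins covering [0, 1]: 0.0, 0.1, ..., 0.9.
EDGES = [i / 10 for i in range(10)]


def scalar_quantize(x):
    """Map a normalized characteristic x to a level in 1..12 following equation (6)."""
    if x < 0:
        return 1
    if x > 1:
        return 12
    # bisect_right counts the edges that are <= x, so boundary values go to the upper bin.
    return bisect.bisect_right(EDGES, x) + 1


levels = [scalar_quantize(v) for v in (-0.2, 0.0, 0.05, 0.1, 0.95, 1.0, 1.3)]
print(levels)
```

Levels 1 and 12 only occur when a test characteristic falls outside the range seen during normalization.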

The HMM for Fault Diagnosis and Prognosis

4.1 Hidden Markov Model

The hidden Markov model was first described in the 1960s and was initially applied to speech recognition in the 1970s. By the late 1980s it was applied to the analysis of DNA, and it then became an important technique in the field of biological information. With continued exploration and application, it is now widely used in many fields such as fault diagnosis, machine learning, automatic driving, natural language processing, and target recognition. The HMM is a statistical model that describes a Markov process with hidden states. The model is first trained on existing observation values to determine its parameters, and observation values are then analyzed and identified with the established model.

Assume λ = (N, M, A, B, π) is a hidden Markov model (HMM) [27], where

N: number of states in the model.

M: number of different observations for each state.

A: state transition probability matrix N × N.

B: observation probability matrix for each state N × M.

π: initial state probability matrix 1 × N.

Markov hypothesis: the future state of a process does not depend on the past states, but only on the present state

(7) $q_{i+1} = f(q_{i})$

Stationarity (immobility) hypothesis: the state transition probability does not depend on time, that is, for any i and j

(8) $P(q_{j+1} | q_{j}) = P(q_{i+1} | q_{i})$

Output independence hypothesis: the output of the system is only related to the current state of the system

(9) $P(o_{1}, o_{2}, \dots, o_{t} | q_{1}, q_{2}, \dots, q_{t}) = \prod_{i=1}^{t} P(o_{i} | q_{i})$

In this paper, the forward algorithm, backward algorithm and forward-backward algorithm are adopted to estimate the model parameters and to determine the location and severity of faults according to the maximum probability. The definition and symbols of each algorithm are as follows.

Forward algorithm: The forward variable is defined as the joint probability of the first t observations in the sequence and of being in state $q_{i}$ at time t

(10) $\alpha_{t}(i) = P(o_{1}, o_{2}, \dots, o_{t}, Q_{t} = q_{i} | \lambda), \quad 1 \le t \le T$

where T is the length of the observation sequence. The forward recursion is expressed as [28]

(11) $\alpha_{t+1}(j) = \left[\sum_{i=1}^{N} \alpha_{t}(i) A_{ij}\right] B_{j}(o_{t+1})$

Backward algorithm: The backward variable is defined as the probability of the remaining observations from time t + 1 to T, given state $q_{i}$ at time t

(12) $\beta_{t}(i) = P(o_{t+1}, o_{t+2}, \dots, o_{T} | Q_{t} = q_{i}, \lambda), \quad 1 \le t \le T - 1$

The backward recursion is expressed as [28]

(13) $\beta_{t}(i) = \sum_{j=1}^{N} A_{ij} B_{j}(o_{t+1}) \beta_{t+1}(j)$
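The forward and backward recursions can be sketched in a few lines of plain Python. The sketch uses the standard initial conditions α₁(i) = π_i B_i(o₁) and β_T(i) = 1, which the text does not state explicitly, and a small two-state, three-symbol model whose numbers are invented purely for illustration.

```python
def forward(pi, A, B, obs):
    """Forward variables: alpha[t][i] for t = 0..T-1 (0-based time index)."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
                      for j in range(n)])
    return alpha


def backward(A, B, obs):
    """Backward variables: beta[t][i] for t = 0..T-1, with beta[T-1][i] = 1."""
    n = len(A)
    beta = [[1.0] * n]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta


# Hypothetical model parameters, for illustration only.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
obs = [0, 1, 2]

alpha = forward(pi, A, B, obs)
beta = backward(A, B, obs)
p_forward = sum(alpha[-1])
p_backward = sum(pi[i] * B[i][obs[0]] * beta[0][i] for i in range(len(pi)))
print(p_forward, p_backward)
```

Both passes give the same likelihood P(O|λ), which is a useful check on an implementation. For long observation sequences the unscaled products underflow, so a practical version would rescale at each step or work with logarithms.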

Forward-backward algorithm (Baum-Welch algorithm): The forward-backward algorithm computes a set of forward probabilities and a set of backward probabilities, which are used jointly to obtain the distribution over states at any specific time t [29]

(14) $\xi_{t}(i, j) = P(Q_{t} = q_{i}, Q_{t+1} = q_{j} | O, \lambda) = \frac{\alpha_{t}(i) A_{ij} B_{j}(o_{t+1}) \beta_{t+1}(j)}{P(O | \lambda)} = \frac{\alpha_{t}(i) A_{ij} B_{j}(o_{t+1}) \beta_{t+1}(j)}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{t}(i) A_{ij} B_{j}(o_{t+1}) \beta_{t+1}(j)}, \quad 1 \le i, j \le N, \ 1 \le t \le T - 1$

The probability of being in state $q_{i}$ at time t, given the model λ and the observation sequence O, is defined as

(15) $\gamma_{t}(i) = \sum_{j=1}^{N} \xi_{t}(i, j), \quad 1 \le i \le N$

When training the parameters, the initial probability distribution matrix π and the state transition probability matrix A are assigned initial values for each state, and these matrices are updated according to the forward-backward algorithm until the required accuracy is met

(16) $\pi_{i} = \gamma_{1}(i), \quad 1 \le i \le N$

(17) $A_{ij} = \frac{\sum_{t=1}^{T-1} \xi_{t}(i, j)}{\sum_{t=1}^{T-1} \gamma_{t}(i)}, \quad 1 \le i, j \le N$
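One re-estimation step following equations (14) to (17) can be sketched with NumPy as below. This is a simplified illustration under stated assumptions: it uses a single observation sequence, keeps the observation matrix B fixed, applies no scaling (so it is only suitable for short sequences), and uses invented model parameters. The function name `baum_welch_step` is hypothetical.

```python
import numpy as np


def baum_welch_step(pi, A, B, obs):
    """Re-estimate pi and A once from one observation sequence; B stays fixed."""
    pi, A, B = (np.asarray(m, dtype=float) for m in (pi, A, B))
    obs = np.asarray(obs)
    T, N = obs.size, pi.size
    alpha = np.zeros((T, N))
    beta = np.ones((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):                       # forward recursion
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    for t in range(T - 2, -1, -1):              # backward recursion
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    likelihood = alpha[-1].sum()                # P(O | lambda)
    # xi[t, i, j] as in equation (14), for t = 1..T-1
    emit_beta = B[:, obs[1:]].T * beta[1:]      # B_j(o_{t+1}) * beta_{t+1}(j)
    xi = alpha[:-1, :, None] * A[None, :, :] * emit_beta[:, None, :] / likelihood
    gamma = xi.sum(axis=2)                      # equation (15)
    new_pi = gamma[0]                           # equation (16)
    new_A = xi.sum(axis=0) / gamma.sum(axis=0)[:, None]   # equation (17)
    return new_pi, new_A, likelihood


# Hypothetical starting parameters and observations, for illustration only.
pi0 = [0.6, 0.4]
A0 = [[0.7, 0.3], [0.4, 0.6]]
B0 = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
sequence = [0, 1, 2, 2, 1, 0, 0, 2]

pi1, A1, lik0 = baum_welch_step(pi0, A0, B0, sequence)
_, _, lik1 = baum_welch_step(pi1, A1, B0, sequence)
print(lik0, lik1)
```

Repeating the step until the change in log-likelihood falls below a tolerance gives the usual training loop. Each step does not decrease the likelihood, which the two printed values illustrate.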

4.2 The HMM for Fault Diagnosis

The fault diagnosis of a bearing refers to identifying its current health status from the various signals detected during operation. A typical diagnostic procedure based on the HMM with multi-features involves the following steps and is depicted in Fig. 1.

(1) Obtain the vibration signals of the bearing in various states as the training samples.

(2) Obtain the time-domain characteristics, frequency-domain characteristics, and wavelet packet reconstruction coefficients from the original vibration signals.

(3) Reduce the dimensionality of the 38 statistical characteristics and obtain the low-dimensional features and the mapping matrix of the training samples.

(4) Train the HMM classifiers and obtain a set of parameters for each HMM classifier, each corresponding to a fault state.

(5) Obtain the vibration signals of the bearing in various states as the test samples.

(6) Obtain the time-domain characteristics, frequency-domain characteristics, and wavelet packet reconstruction coefficients from the vibration signals to be identified.

(7) Map the 38 statistical characteristics into low-dimensional features using the mapping matrix obtained from the training process.

(8) Input the low-dimensional features into each HMM classifier and obtain M probability values.

(9) Choose the maximum probability among the M HMM classifiers and take the corresponding fault state as the result.
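Steps (8) and (9) amount to scoring the observation sequence under every trained model and taking the best one. The sketch below shows that decision rule with a scaled forward pass that returns log-likelihoods. It assumes discrete observation symbols, and the two toy models labelled "Nor" and "O1" use invented parameters rather than trained ones.

```python
import math


def log_likelihood(pi, A, B, obs):
    """log P(obs | model) computed with a scaled forward pass."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:                 # sequence impossible under this model
            return float("-inf")
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]   # rescale to avoid underflow
    return loglik


def diagnose(models, obs):
    """Return the label of the model with the highest log-likelihood, plus all scores."""
    scores = {name: log_likelihood(*params, obs) for name, params in models.items()}
    return max(scores, key=scores.get), scores


# Hypothetical two-state models over three observation symbols.
shared_pi = [0.5, 0.5]
shared_A = [[0.9, 0.1], [0.1, 0.9]]
models = {
    "Nor": (shared_pi, shared_A, [[0.8, 0.1, 0.1], [0.6, 0.2, 0.2]]),
    "O1": (shared_pi, shared_A, [[0.1, 0.1, 0.8], [0.2, 0.2, 0.6]]),
}

label, scores = diagnose(models, [0, 0, 1, 0, 0])
print(label, scores)
```

Working in log-likelihoods does not change the winner but keeps the comparison stable for the long observation sequences used in the tests.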

4.3 The HMM for Remaining Life Prediction

The whole-life data of each degradation mode of the bearing are used to train an HMM, and these models constitute the life prediction model library. The degradation is non-recoverable and worsens gradually. The HMM is therefore viewed as a left-right model: each degradation stage corresponds to a specific state, and the transitions among the states constitute the transition probability matrix. The structure of an M-state left-to-right HMM is shown in Fig. 2.

Feature extraction is performed on all the bearing vibration data from the beginning of operation to the current time to obtain a feature vector sequence. The expected duration in a state can be expressed as

(18) $D(S_{i}) = \sum_{d=1}^{\infty} d\, p(d_{i})$

where $p(d_{i})$ is the probability that state $S_{i}$ lasts for d time steps.

Suppose the probability vector of the current state of the system is $(p_{1}, p_{2}, \dots, p_{M})$. The remaining life can then be given as

(19) $R = \sum_{j=1}^{M} \sum_{d=1}^{\infty} d\, p(d_{j})$
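As a rough illustration of equations (18) and (19), the sketch below uses the fact that in a standard HMM the time spent in state i is geometrically distributed, so the sum in equation (18) has the closed form 1/(1 − a_ii). How the current state probabilities enter the remaining life is not spelled out in the text; the sketch assumes a strict left-right chain without state skipping, treats the last state as failure, and weights the expected time to failure from each state by p_j. The function names, the transition matrix, and the `step` parameter are all illustrative assumptions.

```python
def expected_durations(A):
    """Expected duration 1 / (1 - a_ii) of every state except the final (failure) state."""
    return [1.0 / (1.0 - A[i][i]) for i in range(len(A) - 1)]


def remaining_life(state_probs, A, step=1.0):
    """Probability-weighted expected time to reach the last state of a left-right chain."""
    durations = expected_durations(A)
    # Starting in state i, the chain must still pass through states i, i+1, ...
    to_failure = [sum(durations[i:]) for i in range(len(durations))] + [0.0]
    return step * sum(p * r for p, r in zip(state_probs, to_failure))


# Hypothetical 4-state left-right transition matrix (last state is absorbing).
A = [
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.8, 0.2, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.0, 1.0],
]

print(expected_durations(A))
print(remaining_life([0.5, 0.5, 0.0, 0.0], A))
```

Setting `step` to the sampling interval between observations converts the result from time steps to physical time.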

Simulation and Analysis

For each fault state, 100 samples of length 1024 are obtained, so 1000 training samples corresponding to 10 different fault states are used for training. A 38-dimensional characteristic vector is extracted from each training sample, and these vectors are processed by PCA to produce the low-dimensional feature vectors. The dimensionality of the feature vectors is determined by

(20) $\sum_{i=1}^{r} \lambda_{i} \Big/ \sum_{i=1}^{38} \lambda_{i} \ge thr, \quad 0 \le thr \le 1$

where $\lambda_{i}$ is the ith largest eigenvalue of the covariance matrix of the characteristics, r is the dimension of the feature vectors and thr is a threshold. In this paper the threshold is set to 0.96, which gives the projection matrix and r = 4.
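The selection rule in equation (20) can be sketched with NumPy as follows. The function names `fit_pca` and `apply_pca` are hypothetical, the synthetic data stand in for the real 38-dimensional characteristic vectors, and any scaling of the characteristics before PCA is left out because the text does not specify it.

```python
import numpy as np


def fit_pca(X, thr=0.96):
    """Return (mean, mapping matrix W, r), with r the smallest dimension meeting thr."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]            # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()   # left-hand side of equation (20)
    r = min(int(np.searchsorted(ratio, thr)) + 1, eigvals.size)
    return mean, eigvecs[:, :r], r


def apply_pca(X, mean, W):
    """Map samples to the low-dimensional space with a stored mean and mapping matrix."""
    return (np.asarray(X, dtype=float) - mean) @ W


# Synthetic stand-in data: 1000 samples, 5 characteristics with very different variances.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 5)) * np.array([10.0, 5.0, 1.0, 0.1, 0.1])

mean, W, r = fit_pca(X_train, thr=0.96)
Z_train = apply_pca(X_train, mean, W)
print(r, Z_train.shape)
```

Keeping `mean` and `W` from the training data and reusing them in `apply_pca` corresponds to mapping the test samples with the matrix obtained during training.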

The 4 characteristic values of each state are quantized according to equation (6); the resulting scalar values are then multiplied and put into a probability model, which yields the probabilities of observation values No. 1 to 10 used as the inputs of the HMM. Table 3 shows some of the obtained observation values. The fault states are abbreviated as follows. Nor: normal, R1: slight rotor fault, R2: medium rotor fault, R3: serious rotor fault, I1: slight inner ring fault, I2: medium inner ring fault, I3: serious inner ring fault, O1: slight outer ring fault, O2: medium outer ring fault, O3: serious outer ring fault.

Table 3

Some obtained observation values as the input of HMM.

States of faults	No. of observation values

	1	2	3	4	5	6	7	8	9	10
Nor	0.71	0.03	0.1	0.1	0	0.01	0	0	0.05	0
R1	0.07	0.56	0.07	0.1	0	0.17	0	0	0.03	0
R2	0.23	0.08	0.35	0.2	0.01	0.06	0.04	0	0.01	0.02
R3	0.31	0.07	0.06	0.41	0.01	0.03	0.04	0	0.07	0
I1	0	0	0.02	0.01	0.84	0.07	0.01	0	0	0.05
I2	0.04	0.3	0.07	0.06	0.13	0.37	0.02	0.01	0	0
I3	0	0	0.01	0.04	0.11	0.04	0.65	0	0	0.15
O1	0	0	0	0	0	0.02	0	0.97	0	0.01
O2	0.37	0.1	0.02	0.32	0	0	0.01	0	0.18	0
O3	0	0.01	0.03	0.01	0.15	0.03	0.07	0.03	0	0.67

An HMM is established separately for each fault state from the corresponding signals, and the same observation probability matrix B is adopted for all of them. During training, the initial probability distribution matrix and the state transition probability matrix of each model are re-estimated according to equations (16) and (17), and training stops when the convergence precision reaches 0.001. The maximal number of iterations is set to 40. Fig. 3 shows the log-likelihood values versus iterations for each fault state. The statistics of the overall training results using the HMM are provided in Table 4.

Table 4

The statistics of the overall training results using HMM.

States of faults	Nor	R1	R2	R3	I1	I2	I3	O1	O2	O3
Accuracy	100%	99%	95%	90%	100%	99%	100%	100%	96%	99%

Fig. 3. Training curves of various fault states of HMMs.

It can be seen from Fig. 3 that training takes fewer than 30 steps for every fault state, demonstrating the strong learning ability of the HMM for the various fault states of the bearing. Table 4 shows that the training accuracy is relatively high for each fault state; for the states Nor, I1, I3 and O1 it reaches 100%. Fig. 4 shows the fault identification results of the training samples.

Fig. 4. Fault identification results of training samples.

To verify the performance of the established HMMs, 250 test samples for each fault state are used for testing. The characteristics of all the test samples are extracted and mapped into the low-dimensional space with the mapping matrix obtained from the training process. The processed data are then converted into scalar values and put into the probability model to produce the observation values used as the input of the HMMs.

To make use of the Markov nature of HMM, the following test processes are implemented.

Process 1: For each fault state, every 10 consecutive observation samples are tested as a group: samples 1 to 10, samples 2 to 11, and so on up to samples 241 to 250, giving a total of 241 tests.

Process 2: For each fault state, every 20 consecutive observation samples are tested as a group: samples 1 to 20, samples 2 to 21, and so on up to samples 231 to 250, giving a total of 231 tests.

The remaining processes follow in the same manner, with the group length increasing by 10 each time.

Process 15: For each fault state, every 150 consecutive observation samples are tested as a group: samples 1 to 150, samples 2 to 151, and so on up to samples 101 to 250, giving a total of 101 tests.
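The grouping used in these test processes is a sliding window with a stride of one sample. A small sketch, with a hypothetical helper name, reproduces the group counts quoted above.

```python
def sliding_groups(n_samples, group_len):
    """Return the (first, last) 1-based sample indices of every consecutive group."""
    return [(start, start + group_len - 1)
            for start in range(1, n_samples - group_len + 2)]


for process, group_len in ((1, 10), (2, 20), (15, 150)):
    groups = sliding_groups(250, group_len)
    print(process, group_len, len(groups), groups[0], groups[-1])
```

In general, n samples and a group length L give n − L + 1 overlapping groups, so longer groups mean fewer tests per fault state.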

Following the above processes, the observation sequences are input into the 10 HMMs, the likelihood probability value of each model is calculated, and the diagnosis result is the state corresponding to the maximum probability. Fig. 5 shows the identification results of the various fault states of the bearing for every 30 observation samples using the established HMMs. Fig. 6 shows the identification results for every 100 observation samples. The statistics of the test results of the various fault states for different sample lengths are given in Table 5. As shown in the table, the identification accuracy generally improves as the length of the observation sequence increases, and every fault state is identified with 100% accuracy when the length is 120 or more.

Fig. 5. Identification results of various fault states of bearing for every 30 observation samples.

Fig. 6. Identification results of various fault states of bearing for every 100 observation samples.

Table 5

Statistics of test results of various fault states for different sample lengths.

Length of observation samples	States of faults

	Nor	R1	R2	R3	I1	I2	I3	O1	O2	O3
10	64.3%	82.2%	41.1%	53.9%	93.8%	81.3%	97.5%	98.3%	58.9%	85.1%
20	86.1%	88.7%	55.0%	72.3%	93.9%	97.8%	100%	100%	69.7%	92.2%
30	96.8%	95.5%	60.2%	77.8%	99.5%	99.5%	100%	100%	71.9%	98.6%
40	100%	100%	64.0%	77.3%	100%	100%	100%	100%	67.3%	100%
50	100%	100%	63.7%	85.6%	100%	100%	100%	100%	70.6%	100%
60	100%	100%	69.1%	86.9%	100%	100%	100%	100%	74.3%	100%
70	100%	100%	76.8%	90.6%	100%	100%	100%	100%	91.7%	100%
80	100%	100%	83.6%	91.2%	100%	100%	100%	100%	100%	100%
90	100%	100%	83.2%	94.4%	100%	100%	100%	100%	100%	100%
100	100%	100%	90.7%	100%	100%	100%	100%	100%	100%	100%
110	100%	100%	95.7%	100%	100%	100%	100%	100%	100%	100%
120	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
130	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
140	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
150	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%

It is very difficult to acquire the complete lifetime data of a bearing, so the vibration signals are collected at intervals during each degradation phase. In this simulation, we investigate only the lifetime when an outer race fault appears. We construct 3056 groups of data with a time interval of 2 minutes between them, and each group has 1024 sample points. The duration of the normal state is 1450 minutes, that of the slight outer race fault is 1368 minutes, and that of the medium outer race fault is 238 minutes. Because the performance of the bearing keeps getting worse, the HMM is initialized as a left-right model with 4 states to reflect the change in its condition. We assume that the lifetime ends when the serious outer race fault appears. Half of the sample data is used for training and the other half for the prediction test. The lifetime curve of the prediction test is shown in Fig. 7. The figure shows that the predicted lifetime improves as time increases. It is worth mentioning that prediction results in practice may be worse, because the test data obtained in real situations is imperfect, complex and uncertain.

Conclusions

In this study, a diagnostics and prognostics method for bearings, called HMM with multi-domain features, is proposed. First, multi-features, including time-domain, frequency-domain, and wavelet packet decomposition features, are extracted from the original vibration signals of the bearing. To remove redundant or irrelevant features, the PCA method is employed for feature selection and dimensionality reduction. The low-dimensional features are then converted into scalar probabilities, which are multiplied to generate the observation values that are fed into the HMM to achieve fault diagnostics and prognostics for the bearing. The experimental results show that the proposed scheme is effective and practical for the diagnostics and prognostics of bearings. Integrating multi-features with an optimized HMM for the diagnostics and prognostics of bearings [30, 31] is an interesting direction and will be the focus of our future work.

eISSN: 2444-8656
Language: English

Publication timeframe: Volume Open
Journal Subjects: Life Sciences, other, Mathematics, Applied Mathematics, General Mathematics, Physics


Published Online: Mar 30, 2020

Page range: 71 - 84

Received: Dec 05, 2019

Accepted: Jan 14, 2020

DOI: https://doi.org/10.2478/amns.2020.1.00008

Keywords: hidden Markov model, fault diagnosis, prognosis, multi-features, wavelet packet

© 2020 Weiguo Zhao et al., published by Sciendo

This work is licensed under the Creative Commons Attribution 4.0 International License.
