Textile Fiber Identification Using Near-Infrared Spectroscopy and Pattern Recognition

Abstract Fibers are the raw materials used to manufacture yarns and fabrics, and their properties are closely related to the performance of the products derived from them. Fiber identification is therefore indispensable in analyzing textile raw materials. In this paper, seven common fibers were prepared: cotton, tencel, wool, cashmere, polyethylene terephthalate (PET), polylactic acid (PLA), and polypropylene (PP). After weighing the merits and drawbacks of current fiber identification methods, we chose near-infrared (NIR) spectroscopy for its notable advantages. Chief among them is its ability to capture small differences in chemical composition and morphology and to display them as a characteristic spectral curve for each fiber. First, the spectra of the fibers were collected, and the relationships between the vibrations of characteristic chemical groups and the corresponding wavelengths were examined to build a spectral information library for quick identification and classification. Then, to achieve intelligent detection, pattern recognition approaches were applied: principal component analysis (PCA), used to extract the information of interest, together with two classifiers, soft independent modeling of class analogy (SIMCA) and linear discriminant analysis (LDA). The results obtained by combining PCA and SIMCA showed that five of the seven target fibers, namely cotton, tencel, PP, PLA, and PET, were classified with a 100% recognition rate and a 100% rejection rate. Wool and cashmere, however, gave confusing results and a relatively low recognition rate because the two fibers are highly similar.
Therefore, six spectral bands of interest characteristic of wool and cashmere were selected, and the total absorbance intensity in each band was used as input to the LDA classifier, which separated wool and cashmere into two distinct regions with a 100% recognition rate. Consequently, all seven target fibers were distinguished accurately and quickly by the NIR method, which can guide the identification of textile fibers.


Introduction
Fibers are the raw materials used to manufacture yarns and fabrics. As the basic element of fabrics and other textile structures, their varieties and properties are closely related to the performance of yarns and fabrics [1][2]. Fiber identification is therefore an important part of processes such as the analysis and design of textile products and commodity inspection in import and export. The textile field today uses many kinds of fiber materials, including natural fibers such as cotton and wool; regenerated fibers such as lyocell; and synthetic fibers (created by extruding fiber-forming materials through spinnerets into air or water to form a filament) such as polyethylene terephthalate (PET) and polypropylene (PP). This wide variety makes it difficult to identify fibers with a single stable and quick method, and identifying fibers quickly and precisely has become a challenging research topic in fiber testing.
The most common methods of identifying textile fibers are the handle-vision method (identification by touch and sight), microscopic observation, and chemical and physical analyses. The handle-vision method distinguishes fibers mainly by surface parameters such as length, diameter, crimp, and luster. To increase accuracy, it is sometimes combined with a burning test, in which the inspector observes how the fiber burns and smells the fumes. The method nevertheless has limitations that reduce its accuracy: handle and vision are easily affected by the inspector's subjective judgment, and finishing treatments give many synthetic fibers similar surface parameters, which confuses identification. The microscope [1][2][3][4][5], usually combined with image processing [6,7], has been the primary tool for fiber analysis, with applications ranging from simple stereomicroscopy to scanning electron microscopy (SEM), which reveal longitudinal and cross-sectional characteristics. Longitudinally, cotton fibers show natural twist, wool fibers show surface scales, and silk fibers are smooth with an irregular triangular cross section. Such obvious differences allow these fibers to be identified by microscopic analysis. However, the American Association of Textile Chemists and Colorists (AATCC) notes that microscopy should be used with caution on manufactured fibers, because they are produced with a variety of modifications that alter their appearance [1]. For example, many synthetic fibers have a smooth appearance and a round cross section, which makes microscopy unreliable for distinguishing them.
In chemical analysis [1], identification relies on solubility tests, which require controlling solvent concentration and temperature and observing dissolution; the procedure is complicated and time consuming. Melting point measurement, thermal analysis [2], and fluorimetry [8] are common physical analyses, but their efficiency and accuracy are low. Recently, DNA analysis [9][10][11] has become popular for identifying fiber classes qualitatively and determining fiber content quantitatively. Although reliable and objective, DNA analysis is time consuming and destructive, which makes it impractical in many cases. With the development of modern instrumentation, fiber identification has become increasingly convenient and accurate through spectroscopic methods such as Raman spectroscopy [12], terahertz time-domain spectroscopy [13], and near-infrared (NIR) spectroscopy [14][15][16]. In this paper, we apply NIR spectroscopy to the identification of textile fibers.
NIR spectroscopy, a vibrational spectroscopic technique, is a qualitative and quantitative method that has been widely applied in the agriculture, pharmaceutical, textile, wood, food, and petroleum industries because it is reliable, nondestructive, and fast and requires minimal sample preparation [17][18][19][20][21]. The sample does not need to be pretreated; it is simply placed under the NIR spectrometer. The NIR region lies between the visible (about 400-780 nm) and mid-IR (about 2500-20000 nm) bands, normally extending from 780 nm to 2500 nm [22][23][24]. NIR absorbance arises from the interaction of the material with light, mainly through overtone and combination vibrations of chemical bonds such as C-H and O-H, which appear at characteristic wavelengths. Apart from chemical composition, the physical properties of the material (such as surface scattering) and the sample size also influence the spectral curve. The NIR method can therefore monitor key chemical and physical properties of materials for qualitative and quantitative analyses. To classify materials, NIR spectroscopy must be combined with statistical analysis and pattern recognition methods such as principal component analysis (PCA) [25][26], partial least squares (PLS) [27], artificial neural networks (ANN) [28], and support vector machines (SVM) [29]. In this paper, we combine PCA, soft independent modeling of class analogy (SIMCA) [30][31], and linear discriminant analysis (LDA) [32] with NIR spectroscopy to identify different kinds of textile fibers quickly and efficiently. Finally, an NIR spectral library of textile fibers is built to guide fiber identification.

Sample preparation
In this paper, 1 g of each of seven kinds of textile fibers was prepared: natural fibers (cotton, fine wool, and cashmere), a regenerated fiber (tencel), and synthetic fibers [PP, PET, and polylactic acid (PLA)]. All fibers were carded into slivers measuring 20 mm (length) × 15 mm (width) × 5 mm (thickness) and then conditioned in the laboratory at constant temperature and humidity for 24 hours.

Spectral collection and pretreatment
The spectrometer used in this study was the Luminar 5030 (Brimrose Corporation of America), a handheld acousto-optic tunable filter NIR (AOTF-NIR) miniature spectrometer that scans from 1100 nm to 2300 nm. Excluding the ranges 780-1100 nm and 2300-2500 nm largely avoids noise interference near the visible and mid-IR regions. Using the Brimrose analytical software Snap 32!, the NIR spectra were collected with acquire.exe and preprocessed with prospected.exe. During collection, the wavelength increment was set to 2 nm to capture detailed information, and the number of scans to average and the smoothing constant were set to 50 and 5, respectively, to reduce noise and obtain a high signal-to-noise (S/N) ratio.
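The acquisition settings above fix the size of every spectrum: scanning 1100-2300 nm in 2 nm steps yields (2300 - 1100) / 2 + 1 = 601 points, which matches the 601 variables analyzed later. The sketch below reproduces that arithmetic and shows scan averaging plus a simple smoother. It is a minimal sketch under stated assumptions: the function names are hypothetical, and reading the "smoothing constant" of 5 as a 5-point centered moving average is an assumption, since the vendor software's exact smoothing algorithm is not described.

```python
import numpy as np

# Acquisition settings quoted in the text; the smoothing semantics are assumed.
START_NM, STOP_NM, STEP_NM = 1100, 2300, 2
N_SCANS, SMOOTH_WINDOW = 50, 5


def wavelength_grid(start=START_NM, stop=STOP_NM, step=STEP_NM):
    """Inclusive wavelength axis in nm: (stop - start) / step + 1 points."""
    return start + step * np.arange((stop - start) // step + 1)


def average_scans(scans):
    """Average repeated scans, shape (n_scans, n_points) -> (n_points,).
    Uncorrelated noise shrinks roughly by a factor of sqrt(n_scans)."""
    return np.asarray(scans, dtype=float).mean(axis=0)


def moving_average(spectrum, window=SMOOTH_WINDOW):
    """Centered moving average over `window` points (hypothetical reading of
    the 'smoothing constant'). Edge points are damped because mode='same'
    effectively zero-pads the ends."""
    kernel = np.ones(window) / window
    return np.convolve(spectrum, kernel, mode="same")


wl = wavelength_grid()
print(len(wl), wl[0], wl[-1])  # 601 1100 2300
```

Averaging 50 scans reduces random noise by about a factor of seven (the square root of 50), which is why it is a common first line of defense before any smoothing is applied.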
The probe of the NIR analyzer was placed in contact with the sample surface to collect spectra. To further reduce noise, five replicate spectra were collected at each location, and their average was used in the subsequent analysis. Spectra were captured at 75 testing locations for each fiber. However, changing the testing location shifted the baseline of the spectra, as shown for cashmere fiber in Figure 1(a). The main reason for these baseline shifts is the change in the diffuse reflectance measured by the NIR probe, which is affected by the nonhomogeneous distribution of sample mass, size, and morphology (shape and surface roughness). To analyze the spectral information accurately, the spectra must be pretreated to eliminate or minimize baseline variations; here, the baseline offset function in prospected.exe was applied. Compared with Figure 1(a), the pretreated spectra in Figure 1(b) show no obvious baseline shifts, indicating that the unwanted systematic location-to-location variations were corrected.
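The two pretreatment steps in the paragraph above, averaging the five replicates at a location and then removing the location-dependent baseline, can be illustrated as follows. This is a minimal sketch with hypothetical function names; the correction subtracts each spectrum's minimum, a common form of baseline offset, and is an assumption rather than the documented algorithm of prospected.exe.

```python
import numpy as np


def average_replicates(replicates):
    """Mean of the replicate spectra from one location: (n_rep, n_wl) -> (n_wl,)."""
    return np.asarray(replicates, dtype=float).mean(axis=0)


def baseline_offset(spectra):
    """Subtract each spectrum's minimum so every corrected spectrum bottoms out at 0.
    Assumption: a generic baseline-offset correction, not the vendor's algorithm."""
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    return spectra - spectra.min(axis=1, keepdims=True)


# Usage: the same spectral shape measured at two locations with different baselines.
wl = np.arange(1100, 2301, 2)
shape = np.exp(-((wl - 1940) / 50.0) ** 2)
rng = np.random.default_rng(0)
loc1 = average_replicates(shape + 0.30 + 0.002 * rng.standard_normal((5, wl.size)))
loc2 = average_replicates(shape + 0.70 + 0.002 * rng.standard_normal((5, wl.size)))
corrected = baseline_offset(np.vstack([loc1, loc2]))
```

An additive offset is the simplest model of a baseline shift; if the shifts also changed the slope or scale of the spectra, a detrending or normalization step would be needed instead.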

PCA, LDA, and SIMCA
Each spectral curve contained 601 variables, and not all of them are equally important or can be unequivocally assigned to characteristic information. Reducing the data dimensionality was therefore required to extract a smaller set of new variables for subsequent analysis. We used PCA, an unsupervised pattern recognition method, to extract the useful information and applied SIMCA, a supervised classifier, to identify the seven fibers. To improve the identification of wool and cashmere, LDA, a supervised pattern recognition method, was also used for binary classification.
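For the binary LDA step mentioned above, a minimal sketch might look like the following. It first reduces each spectrum to the total absorbance in the six wool/cashmere bands listed in the LDA results section, then fits a two-class Fisher discriminant that projects the data onto one dimension and thresholds at the midpoint of the projected class means. The function names and the synthetic spectra are hypothetical, and equal class priors are assumed; this is not the software used in the study.

```python
import numpy as np

# Characteristic wool/cashmere bands (nm) listed in the LDA results section.
BANDS_NM = [(1440, 1484), (1486, 1540), (1720, 1750),
            (1900, 1972), (2034, 2138), (2140, 2220)]


def band_features(spectra, wavelengths, bands=BANDS_NM):
    """Total absorbance inside each band (inclusive) -> (n_samples, n_bands)."""
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    wavelengths = np.asarray(wavelengths)
    columns = [spectra[:, (wavelengths >= lo) & (wavelengths <= hi)].sum(axis=1)
               for lo, hi in bands]
    return np.column_stack(columns)


def fit_fisher_lda(X0, X1):
    """Two-class Fisher LDA: projection vector w and a midpoint threshold."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter matrix.
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    w = np.linalg.pinv(Sw) @ (m1 - m0)
    threshold = float((m0 + m1) @ w / 2)  # midpoint of projected class means
    return w, threshold


def predict(X, w, threshold):
    """1 = class of X1 (here: wool), 0 = class of X0 (here: cashmere)."""
    return (X @ w > threshold).astype(int)


# Usage with synthetic spectra on the 1100-2300 nm, 2 nm grid.
rng = np.random.default_rng(0)
wl = np.arange(1100, 2301, 2)
base = 0.5 + np.exp(-((wl - 1500) / 60.0) ** 2) + np.exp(-((wl - 1940) / 50.0) ** 2)


def spectra(scale, n):
    """Hypothetical spectra: a scaled base curve plus white noise."""
    return scale * base + 0.005 * rng.standard_normal((n, wl.size))


cashmere = band_features(spectra(1.0, 20), wl)
wool = band_features(spectra(1.1, 20), wl)  # wool absorbs more, as in Figure 4
w, t = fit_fisher_lda(cashmere, wool)
test_pred = predict(band_features(spectra(1.1, 10), wl), w, t)
```

Summing over a band rather than reading a single wavelength mirrors the paper's rationale of avoiding sensitivity to variation at one point; with only six features, the within-class scatter matrix is well conditioned even for 20 calibration spectra per class.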
SIMCA is useful for classifying high-dimensional observations because it uses PCA for dimension reduction and provides additional information on the different groups; it assumes that samples lying close to each other in measurement space are likely to belong to the same category. First, each class in the training set is described by its own PCA model, and the number of principal components is determined; the principal components are obtained by transforming correlated variables into a set of new, linearly uncorrelated variables. Then, the SIMCA model defines a subspace for each class within a given confidence limit, samples are projected onto each subspace, and the distances between the class models are assessed. In general, the larger the interclass distance between two groups, the better the separation. Once the model is built, spectra imported into the SIMCA classifier are classified automatically.
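The SIMCA procedure described above can be sketched in a few lines: one PCA model per class, a residual (orthogonal) distance from each sample to each class subspace, and an acceptance limit per class. Because each class decides independently, a sample can be accepted by one class, by several (an ambiguous result, like the wool/cashmere confusion reported later), or by none (rejection). This is a minimal sketch, not the Unscrambler implementation: the class and function names are hypothetical, and using the empirical 95% quantile of training distances as the limit is a simplification of the classical F-test.

```python
import numpy as np


class SimcaClass:
    """One SIMCA class: a PCA model of that class plus an acceptance limit."""

    def __init__(self, X, n_components=3, quantile=0.95):
        X = np.asarray(X, dtype=float)
        self.mean = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - self.mean, full_matrices=False)
        self.P = Vt[:n_components].T  # loadings, shape (n_variables, k)
        # Simplification (assumption): the limit is an empirical quantile of
        # the training residual distances, not an F-test critical value.
        self.limit = float(np.quantile(self.distance(X), quantile))

    def distance(self, X):
        """Orthogonal (residual) distance of each sample to the class subspace."""
        Xc = np.atleast_2d(np.asarray(X, dtype=float)) - self.mean
        residual = Xc - Xc @ self.P @ self.P.T
        return np.sqrt((residual ** 2).sum(axis=1))

    def accepts(self, X):
        return self.distance(X) <= self.limit


def simca_assign(models, X):
    """For each sample, list every class that accepts it.
    [] means rejected by all classes; two or more names means ambiguous."""
    masks = {name: model.accepts(X) for name, model in models.items()}
    n_samples = np.atleast_2d(X).shape[0]
    return [[name for name, mask in masks.items() if mask[i]] for i in range(n_samples)]


# Usage with synthetic data: two classes, 50 variables, 3 latent factors each.
rng = np.random.default_rng(0)
p = 50
load_a, load_b = rng.standard_normal((3, p)), rng.standard_normal((3, p))


def sample(offset, loads, n):
    return offset + rng.standard_normal((n, 3)) @ loads + 0.05 * rng.standard_normal((n, p))


models = {"A": SimcaClass(sample(0.0, load_a, 40)), "B": SimcaClass(sample(5.0, load_b, 40))}
labels_a = simca_assign(models, sample(0.0, load_a, 35))
```

Per-class modeling is what gives SIMCA its rejection ability: a spectrum from a fiber outside the training library lies far from every subspace and is assigned to no class, rather than being forced into the nearest one.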
LDA is similar to PCA, but it uses the class labels: it seeks projections that maximize the distance between different categories while keeping members of the same category compact. For binary classification, LDA reduces the original data to the single dimension with the best discriminating ability.

Analysis of spectral characteristics
Chemically, the PET fiber is distinguished by a benzene ring (Ar); tencel and cotton are both composed mainly of cellulose, containing C-H, C-C, and O-H groups; and cashmere and wool contain peptide (CO-NH), disulfide (S-S), and C-H groups, among others. Neither morphological nor chemical features alone can therefore classify all seven fibers; classification is possible only when the two are combined. NIR spectroscopy, however, is sensitive to both physical and chemical differences and converts them into distinctive curve features, so the spectral signals can be used to classify the fibers. Figure 4(a) and (b) present the averaged spectra and the second-derivative spectra of the seven fibers, respectively. To show the spectral distributions more clearly, the seven spectra were divided into three groups, shown in Figure 4(c) (averaged original spectra) and Figure 4(d) (second-derivative spectra). Absorption features in a spectrum are characteristic of the functional groups of a molecule, so differences in molecular groups produce distinct weak and strong shoulders in the spectral distribution. Based on [33], Table 1 lists the wavelengths of the absorption shoulders and the corresponding molecular vibrations.
For PP, PET, and PLA, the significant differences in chemical composition produce distinctive shoulders at various wavelengths in the original spectra and distinctive distributions of zero points in the second-derivative spectra. Conversely, because the fibers within the cotton/tencel group and within the wool/cashmere group have similar chemical composition, they show shoulders at the same wavelengths arising from the same molecular vibrations, and their second-derivative spectra nearly coincide. Nevertheless, at about 1480 nm and 1942 nm the absorbance of tencel is higher than that of cotton, whereas at about 2100 nm cotton absorbs more strongly. Moreover, the wool spectrum lies above the cashmere spectrum. These visible spectral differences support the use of pattern recognition methods to classify the fibers.

PCA and SIMCA results
In the PCA and SIMCA analysis, the spectral data were randomly divided into two subsets: a calibration set used to build the chemometric models and a validation set used to test the predicted classification. Of the 75 spectra collected for each fiber, 40 were randomly selected for the calibration set and the remaining 35 formed the validation set. The PCA models and the SIMCA classifier were built in the CAMO Unscrambler software, version 10.4X (CAMO Software Inc., One Woodbridge Center, NJ, USA). Determining the number of significant principal components (PCs) is the pivotal step in the PCA model, because only an appropriate number of PCs guarantees accurate SIMCA predictions. If too few PCs are retained, the class information in the model is distorted, whereas retaining too many PCs adds noise and lowers the signal-to-noise ratio. The calibration set of each class was first screened with a 5% Hotelling's T² ellipse to exclude outliers: three outliers were detected in PET, two in PLA, two in tencel, one in cotton, one in wool, and one in cashmere. Figure 5(a) shows the accumulated explained variance as the number of PCs increases for the calibration subsets; with three PCs, the explained variance reached 90%. In cross validation with three PCs, the residual sum of squared errors was within 0.005. The number of PCs was therefore set to three for each of the seven classes.
In a PCA score plot, a sample that resembles the other samples of its class lies near them. The score plot of the calibration subsets on the first three PCs, shown in Figure 5(b), reveals that the seven fibers form separate clusters, giving a macroscopic view of the fiber classification. PP, PET, PLA, cotton, and tencel samples show excellent cluster separation. As expected, the wool cluster lies closer to the cashmere cluster than to any other group, because the spectra of the two fibers are very similar.
Based on the PCA model of each class, the SIMCA model was built from the calibration subsets. The classification performance of a SIMCA model is generally evaluated by the interclass distance between groups, the rejection rate, and the recognition rate; the larger the interclass distance between two groups, the better the separation. Table 2 lists the interclass distances at the 95% confidence level, and Figure 6(a) visualizes the distribution of the calibration samples relative to the PP and cotton models. The seven clusters are widely dispersed, but the distance between wool and cashmere is relatively short, which can confuse the identification.
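Two quantities used above can be computed directly from class PCA models: the number of PCs needed to reach a cumulative explained variance (90% with three PCs here) and the interclass distance between two SIMCA class models. A minimal sketch follows, assuming one commonly used model-distance definition, d_ij = sqrt((s_ij² + s_ji²) / (s_ii² + s_jj²)), where s_ij² is the mean squared residual of class j's samples fitted to class i's model. Software packages differ in degrees-of-freedom corrections, so the numbers will not exactly match Table 2; a commonly quoted rule of thumb treats distances above about 3 as well separated. All names are hypothetical, and the data are synthetic.

```python
import numpy as np


def fit_class_pca(X, k=3):
    """Mean-centered PCA of one class via SVD.
    Returns the class mean, the first k loadings (as columns), and the
    explained-variance ratio of every component."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    return mean, Vt[:k].T, explained


def n_pcs_for(explained, threshold=0.90):
    """Smallest number of PCs whose cumulative explained variance reaches threshold."""
    return int(np.searchsorted(np.cumsum(explained), threshold) + 1)


def mean_sq_residual(mean, P, X):
    """Mean squared residual of samples X after projection onto a class model."""
    Xc = np.asarray(X, dtype=float) - mean
    R = Xc - Xc @ P @ P.T
    return float(np.mean(R ** 2))


def model_distance(X_i, X_j, k=3):
    """Interclass (model) distance between two classes; 1.0 for identical data."""
    mi, Pi, _ = fit_class_pca(X_i, k)
    mj, Pj, _ = fit_class_pca(X_j, k)
    s_ij = mean_sq_residual(mi, Pi, X_j)  # class j fitted to model i
    s_ji = mean_sq_residual(mj, Pj, X_i)  # class i fitted to model j
    s_ii = mean_sq_residual(mi, Pi, X_i)
    s_jj = mean_sq_residual(mj, Pj, X_j)
    return float(np.sqrt((s_ij + s_ji) / (s_ii + s_jj)))


# Usage with two synthetic classes, 100 variables, 3 latent factors each.
rng = np.random.default_rng(2)
p = 100
loads_a, loads_b = rng.standard_normal((3, p)), rng.standard_normal((3, p))


def make(offset, loads, n=40):
    return offset + rng.standard_normal((n, 3)) @ loads + 0.05 * rng.standard_normal((n, p))


A, B = make(0.0, loads_a), make(1.0, loads_b)
k_a = n_pcs_for(fit_class_pca(A)[2])
d_ab = model_distance(A, B)
```

Classes with nearly identical spectra, such as wool and cashmere, fit each other's models almost as well as their own, so the numerator and denominator approach each other and the distance falls toward 1.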
The rejection rate and the recognition rate reflect an advantage of the SIMCA model: it can determine not only whether a sample belongs to a predefined category but also whether it belongs to none of the classes. The recognition and rejection rates obtained for the calibration subsets are displayed in Table 3, with a 100% recognition rate; however, one wool sample was simultaneously recognized as cashmere, and eight cashmere samples were also classified as wool.
After the SIMCA model was developed, the validation subsets were imported into it to predict their classes. Figure 6(b), like Figure 6(a), shows the dispersal of the seven fibers, and the predicted classifications of the validation subsets are shown in Table 2: the recognition and rejection rates were nearly 100%. However, the recognition rate of cashmere was low: six cashmere samples were simultaneously recognized as both cashmere and wool, and 14 cashmere samples were wrongly classified as wool.

LDA results
To improve the recognition rates of wool and cashmere, their spectral signals were analyzed in a different way. For each of wool and cashmere, 300 spectra were collected, and every 10 spectra were grouped and averaged; the averaged spectra were used in the subsequent analysis. Grouping and averaging reduce the disturbance caused by the overlapping features of the two fibers. This gave 30 averaged spectra per class, of which 20 formed the calibration set and the remaining 10 formed the validation set.
As shown in Figure 4, the absorbance of wool was higher than that of cashmere, and six spectral shoulders were present. According to Table 1, these six shoulders arise from the chemical bonds of protein, which are characteristic of wool and cashmere. Because chemical bonds affect the spectral signal at specific wavelengths, the second-derivative spectra were used to determine six wavelength ranges, each centered on one shoulder: 1440-1484 nm, 1486-1540 nm, 1720-1750 nm, 1900-1972 nm, 2034-2138 nm, and 2140-2220 nm. To avoid sensitivity to variations at a single wavelength, the total absorbance within each characteristic band was used as a variable. Based on these six variables, the calibration sets were analyzed by binary LDA, and the classification result is shown in Figure 7, where wool and cashmere were classified with 100% accuracy. When the validation set was imported into the LDA model, the classes were also predicted with 100% accuracy, raising the recognition rate.

Conclusion
In this paper, NIR spectroscopy combined with three pattern recognition methods, PCA, SIMCA, and LDA, identified seven common textile fibers quickly and accurately. NIR spectroscopy is sensitive enough to capture differences in the physical and chemical properties of samples and turn them into distinctive spectral signals. The spectral features of each fiber were recorded and interpreted in terms of chemical bond vibrations, which is useful for establishing an NIR spectral library for later recognition. For pattern recognition, PCA was combined with SIMCA. In the PCA model, three PCs were selected by cross validation, and the seven fibers were dispersed in the score plot. In the SIMCA analysis, the interclass distance, recognition rate, and rejection rate were used to describe classification performance. The results showed that PP, PET, PLA, cotton, and tencel could be separated efficiently with large interclass distances and nearly 100% recognition and rejection rates. Because wool and cashmere share many chemical and physical features, however, this approach gave confusing results and a low recognition rate for them, with many samples recognized as both wool and cashmere. To improve their recognition, six characteristic variables representing the protein structure were selected for LDA, which classified the two fibers with 100% accuracy. In summary, the NIR spectroscopy method efficiently and accurately recognized all seven textile fibers studied, showing clear advantages over other identification methods.