Construction of tree species composition map of Estonia using multispectral satellite images, soil map and a random forest algorithm

Abstract Landsat-8 OLI and Sentinel-2 MSI images from years 2015 and 2016, a 1:10,000 digital soil map and a large number of reference samples were used with a random forest machine learning implementation in GRASS GIS to construct a tree species map for the entire territory of Estonia (42,755 km2). Class probabilities for seven main tree species, an extra class for other species and probability of the forest cover not conforming to the forest definition were assigned for each pixel. Validation of dominant species distribution by area showed very strong correlation at county level both in state forests (R2 = 0.98) and in private forests (R2 = 0.93). Validation of tree species composition using harvester measurement data from 2,045 regeneration felling areas showed also very strong correlation (R2 = 0.75) with the measured values of the proportion of coniferous trees. There was some tendency to underestimate the proportion of more common species and overestimation was found for the species with smaller proportion in the mixture. The accuracy for the proportion of deciduous species that were present in a smaller number of reference observations was substantially smaller. Validation of the results by using data from 659 large sample plots from the database of the Estonian Network of Forest Research Plots and 3,002 small sample plots from the National Forest Inventory (NFI) data base confirmed the findings based on harvester data. The NFI data revealed also a decrease of estimation error with the increase of forest age. Cohen’s kappa index of agreement for main species for NFI sample plots with main species proportion equal to or greater than 75% decreased from 0.69 to 0.66 when observations with forests younger than 20 years were included in the comparison. Overall, the constructed map provides valuable data about tree species composition for the forests where no up to date inventory data are available or for the projects that require continuous cover of tree species data of known quality over the entire Estonia.


Introduction
Sustainable forest management planning and forest policy making requires forest inventory data at different spatial scales and generalization levels. Individual forest stand data are used for forest management planning (Metsakorralduse, 2017). Forest policy development and general monitoring of trends is usually based on National Forest Inventories (NFI) carried out on a sampling grid of permanent sample plots . Both of the forest inventory methods use remote sensing data. Stands are delineated for the forest management planning using aerial photographs (Spurr, 1948). Maps of wood volume estimates that are constructed with nonparametric estimation methods e.g. k-nearest neighbour (kNN) using sample plot measurements and feature variables from multispectral satellite images are some of the outputs of NFIs (McRoberts & Tomppo, 2007). In Estonia Tamm & Remm (2009) used reference set observations taken from a forest management inventory database (FIDB), Landsat ETM+ images and a 1:10,000 digital soil map data for machine learning-based construction of standing wood volume maps and obtained root mean square error (RMSE) 87.0 m 3 ha -1 (42%) at pixel level and 74.6 m 3 ha -1 (36%) at stand level.
Wood volume data are important for the estimation of carbon storage and estimation of timber, but tree species composition data are required for biodiversity assessment (Laarmann et al., 2009;McRoberts et al., 2012), satellite-based estimation of net primary production (Zhao et al., 2011;Lang et al., 2017), ecosystem models (Duveneck et al., 2015), and also for purposes of monitoring and forest industry planning. Tree species composition of a forest stand is a vector of the relative proportions of individual species stemwood volume from total stemwood volume of the stand. However, maps with up-to-date inventory estimates in the Forest Inventory Data Bases (FIDB) cover usually only the forests where the owner is interested in management and the number of NFI sample plots per unit area is only suffi cient for estimating regional averages. For spatially continuous estimates, a machine learning approach with spatial feature variables (multispectral images, airborne lidar data, soil maps) can be used to construct tree species composition maps at medium spatial resolution (20-30 m) for all forests in the region. Decision trees-based random forest-type (RF) methods have been successfully used for tree species classifi cation and land cover mapping (Yang et al., 2014;Barrett et al., 2016).
In this study we used a RF implementation (r.learn.ml) in GRASS GIS (GRASS Development Team 2017), Landsat-8 Operational Land Imager (OLI) and Sentinel-2 Multi spectral Instrument (MSI) images, a 1:10,000 digital soil map and a large number of reference samples drawn from an FIDB to construct a tree species map for all of Estonia. The map was then validated using county level statistics, harvester measurements from regeneration felling stands, samples from the database of Estonian Network of Forest Research Plots (Kiviste et al., 2015) and also a set of NFI sample plots.

The location
The study area (Figure 1) included the entire terrestrial territory of Estonia (42,755 km 2 ) except for Ruhnu, a small and distant island in the Baltic Sea. About half (53.2%) of the Estonian terrestrial territory is forest land, which is 51% publicly owned by state or municipalities while the remainder belongs to private forest owners; a small part (46,341 ha) was retained by the state after land restitution following the collapse of the Soviet Union (Raudsaar et al., 2017, Valgepea & Maamets, 2017. The most widespread forest trees in Estonia are European aspen (Populus tremula L.), silver birch (Betula pendula Roth), downy birch (B. pubescens Ehrh.), Norway spruce (Picea abies (L.) Karst.), black alder (Alnus glutinosa (L.) Gaertn.), grey alder (A. incana (L.) Moench), Scots pine (Pinus sylvestris L.) and common ash (Fraxinus excelsior L.). These tree species grow in different mixtures in Estonia. On fertile soils Norway spruce is also common in the mid-story and the understory of the forests. Due to the historical background state owned forests stands are dominated by Scots pine, birch spp. and Norway spruce and in private land the forest stands are dominated by birch spp. followed by Scots pine and grey alder (Raudsaar et al., 2017). This difference is caused by the tendency for natural regeneration of fast growing broadleaf deciduous species after regeneration fellings (Raudsaar et al., 2014) and the large share of abandoned agricultural private land where fast growing broadleaf deciduous trees do occur in the fi rst order. Ancillary data and feature variables Satellite imagery from Landsat-8 Operational Land Imager (OLI) and Sentinel-2 Multispectral Instrument (MSI) ( Table 1) were downloaded from the USGS GloVis portal (https://glovis.usgs.gov) and from the Copernicus Open Access Hub (https:// scihub.copernicus.eu). We used top-ofatmo sphere radiances and did not apply atmospheric correction. Using cloud-free sub-regions of the images it was possible to cover the entire country and pairs of images on different dates provided information on phenology as this has been found informative for mapping species composition (Wilson et al., 2012). All the images were transformed into the Estonian base map coordinate system (EPSG:3301) using GDAL tools (www.gdal. org). A pixel size of 25 m was used as a compromise between the original spatial resolution of the images and the large volume of data. Nearest neighbour resampling was used for the 30 m resolution Landsat-8 OLI (USGS, 2016) bands and for the Sentinel-2 MSI bands that have 20 m resolution. For the Sentinel-2 MSI bands with 10 m resolution (SUHET, 2015), the averaging of source pixel values was used for raster image resampling. Cloud and cloud shadow areas were delineated manually. Both of the scanners have a special channel near to the water absorption band in the electromagnetic spectrum that was useful for haze and cirrus detection. The entire territory of Estonia was subdivided into partially overlapping regions ( Figure 1) according to forest growth conditions and to establish data processing units with sufficient counts of reference samples. After cutting the individual Landsat and Sentinel images according to the regional borders for data processing there were 85 image combinations with suffi ciently large area and each containing a single image or a pair of images with phenology effect. In the central region the number of image combinations was the greatest (15), but in the western Estonia and in the islands only 7 image combinations could be constructed. Processing data by geographical sub-regions has been used to decrease the possibility of erroneously predicting a species outside its realistic region of occurrence (Duveneck et al., 2015). We used a similar approach ( Figure 1) but due to the variability of the Estonian forested landscape we included also data from the 1:10,100 national soil map (Mullakaardi, 2001). The soils data base was downloaded from the website (www.maaamet.ee [01.02.2017]) of the Estonian Land Board. Each soil polygon is associated with a data record containing the soil type and morphological information. For the machine learning procedure the soils were grouped according to Table  A1.1 using the pedo-ecological schema of normally developed mineral soils (see Figure 2 in Kõlli et al., 2004). The vector map of soils was rasterized, each pixel was assigned the soil group code, and the soil group code was used as a categorical variable in the RF estimation procedure.

Reference set data
While NFI data are used as reference sets in many studies for machine learning we used sample plots from the Estonian NFI only for validating our results. The NFI sampling grid is designed so that each annual sampling unit corresponds to 1,000 ha in land category-based estimates. Starting from 2014, the grid was modifi ed and the estimated number of yearly measured sample plots is now 5,600. Since the share of forest land is 53%, there will be about 200 sample plots per county with possible tree species composition data. This is not much when considering that in addition to species composition the spectral signatures of forests also depend on stand age, leaf area index, stand density, and ground vegetation (Nilson & Peterson, 1994). By incorporating sample plots from a larger area comprised of regions with different growth conditions and from several sampling years, this increases the number of observations but also increases the risk of mixing samples with similar spectral signature but different species composition. At the same time, reference samples for model fi tting and validation of the results are lost in places where clouds, cloud shadows and haze infl uence pixel values in the satellite images. Since NFI sample plots are small (radius 7 to 10 m), their positioning errors combined with raster image re-sampling errors introduce substantial random errors in the spectral signatures of the sample plots. Finally, the locations of NFI sample plots do not follow forest stands, but are determined by the sampling grid. Hence, sample plots can be located near to stand borders and thereby have a mixed spectral signature.
We used for our machine learning procedure the forest management inventory database from the Estonian Environmental Agency (Forest database, 2016). The database contains a 1:10,000 map of stand borders and mensurational data (forest age, stand height, basal area, stand relative density, site class, site type, wood volume, etc.) used for forest management planning. For each stand element (a tree species growing in particular social layer) its proportion is given according to wood volume. Although total wood volume is known to be underestimated in the database (Lang et al., 2014;Arumäe & Lang, 2016), the distribution of wood among species and thereby the tree species composition is usually reliable. The only exception is a small systematic underestimation of Norway spruce in state owned forests, according to volume measurements made at time of harvest (Tavo Uuetalu, The Estonian State Forest Management Company, personal communication).
A copy of the FIDB was obtained from the Estonian Environmental Agency on 13 February 2017. The FIDB data were preprocessed similar to Lang et al. (2016) to extract suffi ciently large and compact forest stands for a reference set and to exclude outdated FIDB records and the polygons with substantial variability in pixel values. As the RF estimation procedure in GRASS is pixel-based, the within stand variability described by variograms is technically complicated to use. However, training data can be prepared separately from the estimation procedure and we used mean values of pixels located near to stand polygon centroids instead of single, nearest to centroid pixel value.
Firstly, a subset of forest land parcels inventoried since 2014 with area between 1 and 10 hectares was extracted. The extracted polygons were buffered 30 m towards the inside and the areas under large ditches were cut out. Irregular polygons were then deleted from the selection. For each remaining polygon the mean radiance value was calculated for each band in the satellite images and only the polygons for which at least 16 pixels were extracted were kept.
Secondly, the selection was fi ltered using their spectral radiance. Parcels were retained if the ratio of standard deviation to mean in the near infrared radiance (NIR) bands (OLI5, MSI08) was less than the 97.5 th percentile of the population value. This fi lter excluded internally variable polygons. Next, the remaining observations with possible disturbances from 2015 to 2016 were identifi ed according to radiance changes in the blue (OLI2, MSI02) and NIR bands (OLI5, MSI08). For this the reference observations were grouped by main species and those deviating more than four residual standard errors from a linear regression model between radiances from different dates were excluded as disturbed.
Thirdly, the concordance of spectral ra diance on forest age and wood volume of remaining observations were analysed. Since the images were taken over two years and the forest inventory records were also from later dates than the last image, some polygons had small radiance in the short-wave infrared (SWIR) bands (OLI6, MSI11) characteristic of old stands but the forest age was zero. This hints at outdated or confl icting data, since young stands are brighter then old stands (Nilson & Peterson, 1994). All the observations were removed that had zero age and less than average radiance of the 1 to 6 year-old forests observations in the SWIR bands. Finally, all the stands older than 20 years or with wood volume over 50 m 3 ha -1 were grouped according to main species and outliers were identifi ed by their spectral radiance in 10 year age classes and 50 m 3 ha -1 volume classes. The stands with radiance in red, NIR or SWIR bands deviating more than three standard deviations from the class mean were excluded. The procedure was repeated three times. Some outliers in the classes with a small number of observations (very old stands) were identifi ed visually from scatter plots of wood volume and spectral radiance values. About 480 outliers were later detected and excluded when feature variable values from a 3 × 3 pixel window around polygon centroids were calculated for the random forest algorithm. The count of reference samples after all the outliers were removed was 102,291.

The random forest model fi tting and map construction
Random forests is a machine learning algorithm for classification that corrects for overfitting of the training set (Breiman, 2001). The random forest (RF) classification algorithm (r.learn.ml) in GRASS has the following hyper-parameters: number of feature variables during node splitting N feat , maximum tree depth H max , minimum number of samples required for node splitting N split , minimum number of samples for leaf node N leaf , and number of estimators N trees . The values for the hyper-parameters are recommended to fi t according to the user guide. Since the model construction involves random sampling of features during building of the trees, the results are dependent on the random number generator initial state I rand . During some initial tests we found that the algorithm was infl uenced by the distribution of observations between classes similar to kNN as found by Lang et al. (2014. We found that our estimates were more reliable with automatic balancing switched on. We also used the standardisation option of feature variables. The number of permutations of feature variable values during model construction was set to 10 (default value). The number of estimators was fi xed to default value N trees = 100.
The machine learning module includes a cross-validation component. We used non-spatial nested cross-validation with the reference set split to 2 folds for smaller sets of observations and up to 5 folds for larger reference subsets determined by the useful area of image pairs. In this study the crossvalidation of a single image or image pair based results was not of interest, however, cross-validation was important for optimisation of the algorithm parameters and selection of informative feature variables.
The RF classifi cation algorithm predicts a class code and also probabilities of all classes for each pixel in the target set. These probabilities can be used as reliability estimates (Barrett et al., 2016). However, considering that spectral signature of a forest stand is a linear mixture of the refl ectance of the trees, these probabilities can also be interpreted as species composition estimates for forest stands assuming that different species have different optical properties. In this study we used seven classes corresponding to the most widespread tree species in Estonia and one class for other tree species. A separate class was used for the reference observations located in recent regeneration felling areas where the tree canopy did correspond to the forest defi nition in Estonia.
The estimation procedure was performed in four steps for each image or image pair: 1. search for informative feature variables; 2. preparation of reference sample data; 3. search for the model hyper-parameter optimum values; and 4. imputation of target set pixel values.
In the fi rst step the hyper-parameters of the RF algorithm were set to N feat = 0 (automatic), H max = 27, N split = 30, N leaf = 8 and four estimates were imputed using   3,9,41,87  rand I . Feature variable values were extracted from centroid pixels of the reference polygons. The feature importance's for each run were obtained from nested cross-validation and averaged. Soil data were almost always ranked at as one of most important feature variables and followed usually by the NIR bands. The six to ten most informative spectral bands were selected for the RF model training and estimation. In the second step a training data table was created by sampling pixels located closer than 36 m to the reference stand centroid position. Spectral radiance was averaged and mode value of the pixels in soil map codes was used.
In the third step the RF model was fi tted to the training data to find optimal values for the hyper-parameters. We fi xed the maximum number of features for node splitting to two features less than the number of features found during the optimization step (not all features are always required), N leaf was fi xed to a value between 5-10 depending on the sample size, N split was usually set to 3N leaf , random state was I rand = 1 and maximum tree depth value was the free parameter searched from the range of 15-50.
Finally, imputation of the target set pixel values was carried out for For each reference set pixel the probabilities of classes were averaged from available estimates; this produced 9 raster layers that covered all of Estonia. The map layers correspond to the probabilities of tree species in the composition and one layer contained the probability that the pixel does not correspond to forest stand defi nition (tree layer was too sparse or trees were too small). The species composition estimates for each pixel were calculated by scaling the tree class probabilities to sum to 100 excluding the non-forest probability class. Finally, the class code with the highest probability was then indicated in a separate layer of the tree species composition map. Using the Estonian 1:10,000 base map, the pixels with highly improbable occurrence of tree cover were assigned a no data fl ag. The no data fl ag was also assigned to agricultural land where the records of the Estonian Agricultural Registers and Information Board indicated the landowner applied for a subsidy in 2009-2011.

Validation of the species composition map
The validation of the tree species composition map was carried out using four data sources. The fi rst validation dataset was the area distribution of inventoried stands by dominant tree species in counties according to offi cial national statistics (Raudsaar et al., 2017). The pixel distribution of the dominant species layer was calculated from the species composition map. The second validation dataset contained timber volume measurements made by harvesters in regeneration fellings that was made available by the Estonian State Forest Management Company. From this dataset we selected 2,045 records corresponding to the stands that contained at least 16 pixels and where more than 85% of the pixels were assigned a tree species class as the most probable and less than 20% of the harvested timber (fuelwood, etc.) was not assigned to a tree species. The mean proportion of each tree species in the species composition map was calculated for each stand and stand level data were compared with the harvester measurements.
The third validation dataset was extracted from the database of Estonian Network of Forest Research Plots (ENFRP) (Kiviste et al., 2015) from the list of sample plots measured from 2012 to 2015. The sample plots have a radius ranging from 15 m to 30 m depending on forest age, thus a sample plot could cover an area larger than the 25 m pixel. Mean age of the forests was 69 years ranging from 17 to 243 years. The sample plots were established in homogeneous parts of forest stands and therefore are representative also for their near vicinity. After checking for possible stand replacing disturbances using orthophotos and according to the forest age and brightness relationship, the number of suitable observations was 659 observations. A subset from the Estonian National Forest Inventory (Adermann, 2010) database was used for a fourth set of validation tests of the stand map. These sample plots have a radius of 7 m or 10 m and the error in co ordinate values is usually less than 45 m. A subset of 3,002 sample plots was extracted to analyse species composition. The selection criteria were as follows: the plot was not near to roads or ditches, the wood volume was greater than 5 m 3 ha -1 and the probability of the non-forest (NFD) class was less than 10%. Each plot was related to the imputed values of the nearest pixel of the sample plot location. For each sample plot Euclidean distance between the measured and predicted species composition vectors was calculated as , (1) where k i,SMI is the proportion of the species i in the NFI data and k i,map is the predicted proportion of the i th species. Dependence of on the forest age was analysed.
For estimates of categorical variables as land cover types it is common to report Cohen's kappa index of agreement. Tree species proportion, however, is a complex ratio variable. It is reasonable to calculate Cohen's kappa only for the validation sample plots where proportion of dominating species is suffi ciently large to classify the observations as pure stands of particular tree species. From the NFI data we first selected the sample plots where dominating species proportion was equal or greater than 75%. A second validation subsample of pure stands was created by excluding the stands with age less than 20 years. All statistical analyses were carried out in R (R Core Team, 2016).

Results and Discussion
The total area of the map pixels with tree species composition is 2.26 million hectares that is about 8% more than the offi cial estimate (Raudsaar et al., 2017) of forested forest land area of 2.09 million hectares. The difference is related to distinction of forest land from bush and bog land categories, and interpretation of land cover for small wooded land patches. The comparison of the main species distribution in inventoried forest stands at the county level indicated a high correlation between the predicted values and national statistics ( Figure 2). For state forests (862,136 ha) the determination coeffi cient R 2 was 0.98 and for private forests (803,525 ha) R 2 was 0.93. The random forest algorithm-based estimates showed a larger share of grey alder and birch stands in the private forests and a greater share of Scots pine stands in state forests, similar to national inventory statistics. This result indicates that the constructed map ( Figure A3.1) of tree species composition (available from Tartu Observatory web page (www.to.ee) and upon request from the corresponding author) can be used for the rest of the forest land (410,778 ha) for which there are no records in the FIDB.
The aggregated estimates of main species at the county level indicated that there are no substantial shortcomings in the data processing. However, our objective was to obtain a map of tree species composition, not only the main species. The comparison of predicted proportion of tree species in stands with the harvester measurements showed the strongest correlation for Scots pine followed by Norway spruce, birch, and European aspen ( Table 2). The predicted proportion of coniferous trees had very strong correlation (R 2 = 0.75) with the measured value. However, there was also substantial scatter in the relationship (Figure 3) and the gain of the expected linear model was only 0.67. There was a characteristic lack-of-fi t with large values underestimated and small values overestimated. It appeared (Table 2) also that the mean proportion of the Norway spruce, Scots pine and birch was underestimated and the proportion of less common species tended to be overestimated. Estimation errors can cause biased values because the proportion of species cannot be negative or greater than 100%. Additionally, the random forest algorithm only predicted the probability of classes (main species) and the scope of this project did not include reprogramming of the RF implementation to access directly the data vectors of the reference observations. Also, the target set pixel values were calculated as mean values of several estimates based on available image combinations and three selected random state values. The mean proportion of Norway spruce was substantially greater in the harvester dataset than in the constructed map, similar to the apparent underestimate in the forest inventory database (FIDB) compared to the harvester data. Since the FIDB was used to draw reference set observations for the random forest algorithm the imputed target set pixel values in the constructed map were probably infl uenced by the possible bias in the FIDB. However, it was not possible to identify the true causes of the observed systematic difference that exists between the harvester measurements and the constructed map in the proportion of Norway spruce and birch in tree species composition in forest stands.   While the harvester measurement data represented only old forest stands, the sample plots from the ENFRP dataset (Kiviste et al., 2015) included also younger stands. There was a strong correlation (R 2 = 0.79) between the measured and predicted proportion of evergreen coniferous trees, similar to harvester dataset, but the gain of the expected linear relationship was greater (0.76). Nevertheless, there was also rather large scatter at the sample plot level ( Figure  4). The validation dataset confi rms that the proportions of Norway spruce, Scots pine and birch were underestimated and the proportion of less common species was overestimated in the target set pixels ( Table  3).
The validation based on NFI sample plots (Table 4) showed results similar to the other validation data sets. However, the coeffi cient of determination R 2 between the measured and predicted proportion of coniferous trees at the sample plot level was only 0.64. A similar decrease in R 2 also was present at the species level compared to the other validation data sets. The main reason for the decreased correlation likely was the smaller plot size with respect to pixel size and errors in the spatial location of the sample plots. However, the NFI dataset covered the entire age range of forests and this enabled a study of possible dependence of the estimation errors in relation to forest age. There was a decrease (slope -0.21 in linear model, p-value < 0.001) ( Figure 5) in the estimation error depending on forest age (R 2 = 0.1, p-value < 0.001 at 3,000 degrees of freedom). There was no age dependence in the number of tree species in the NFI sample plots as indicated by nonsignifi cant slope (p-value > 0.2) and R 2 = 0 of the relationship.
T able 3. Mean proportion of species in 659 sample plots from ENFRP (Kiviste et al., 2015) database and the constructed map. S e is standard error. Sample plot measurements are used for the independent variable in the linear model y = a + bx. Tabel 3. Puuliikide osakaalude hinnangud 6 59 kasvukäiguproovitükil (KKPRT) (Kiviste et al., 2015)    Overall accuracy of the main species estimates was 75.5% for the 1,529 NFI sample plots where a single tree species proportion was equal or greater than 75%. Mean value of Cohen's kappa index of agreement was 0.66. Scots pine stands were the most accurately discriminated followed by Norway spruce stands. Separation of deciduous species was less accurate (Table  A2.1). This is similar to the results obtained from the analysis of species composition. By excluding stands less than 20 years-old the overall accuracy increased to 78.4% (Table  A2.2) and kappa increased to 0.69 for the subsample of 1,354 NFI sample plots. The increase in accuracy is small but consistent with the previous analysis ( Figure 5) that showed increased estimation accuracy of tree species composition in older stands. We used class membership probabilities of RF machine learning procedure as estimates of tree species proportions in the species composition. The underlying assumption was that the spectral signature of a forest depends linearly on tree species composition. The validation results indicated that the constructed map of tree species composition provided reliable estimates of the main tree species in all counties in Estonia. Discrimination of deciduous tree species proportions in a mixture was less accurate than Norway spruce and Scots pine proportion, which is related to the differences in their spectral signatures. Main species estimation accuracy at the pixel level for NFI sample plots with dominant species proportion of 75% and more was 75.5% and increased when young stands were excluded. However, predictions of species proportions in composition at the stand or pixel level have a lack-of-fi t characterized by an underestimation of larger values and overestimation of smaller values. While our assumption was justified that the class probabilities predicted by the random forest procedure may be used as linear proportions of species, as shown by correlation analyses, there are options to improve precision. For example, there was always about 2 to 3% probability for each class in the imputations. Considering that there were 9 classes in the dataset, the results have always about 18 to 27% noise. This noise could be reduced by direct processing of the data vectors of the reference samples from leaf nodes of the decision trees in the random forest model. However, this requires modifi cations in the software of the GRASS machine learning module. The forest age dependence of the predicted species composition indicates that a twostage approach could be tested in future studies; inclusion of canopy height information from airborne laser scanning (ALS) also may be useful to separate forests by age. It is also possible that calibration of the class membership probabilities can improve the accuracy (Niculescu-Mizil & Caruana, 2005), however, the procedure requires an additional independent set of observations.

Conclusions
In this study we processed freely distributed multispectral satellite images using a Random Forest implementation in free software GRASS and constructed the fi rst high spatial resolution map of tree species composition for Estonia. Validation of the map showed good discrimination between deciduous broadleaf and evergreen coniferous species, but separation and estimation of proportion of deciduous broadleaf species was less accurate and this has to be targeted in further studies. Overall, the constructed map provides valuable data for the forests where no up-to-date inventory data are available or for projects that require continuous cover of tree species data of known quality over all of Estonia.

Acknowledgements. The Estonian State
Forest Management Centre provided financial support for the study. The Estonian Network of Forest Research Plots is supported by the Estonian State Forest Manage ment Centre and the Estonian Environ-mental Investment Centre. The Estonian Land Board released digital soil map for public use. Landsat-8 OLI and Sentinel-2 MSI images were made available by the USGS and Copernicus framework. Discussions with Dr. Kalle Remm helped to improve our understanding about machine learning. Prof. Andres Kiviste suggested statistical tests and commented on early versions of the manuscript. Dr. John Stanturf commented the manuscript and edited the English language. We thank anonymous reviewers for comments that helped us to improve the quality of the manuscript. Authors acknowledge also GRASS GIS developers for their efforts to create free software.
Appendix A2. Confusion matrices of pixel level dominant tree species estimates. Lisa A2.