Process-based vegetation models are crucial tools for understanding biosphere-atmosphere exchanges and ecophysiological responses to climate change. In this contribution, the performance of two global dynamic vegetation models (CARAIB and ISBACC) and one stand-scale forest model (4C) was evaluated against long-term time series of net ecosystem carbon exchange (NEE) observed at eddy covariance monitoring stations in three old-growth European beech (Fagus sylvatica L.) forest stands. Residual analysis, wavelet analysis and singular spectrum analysis were used alongside conventional scalar statistical measures to assess model performance, with the aim of defining future targets for model improvement. We found that, for all three models, the largest errors occurred at the tails of the observed NEE distribution and that model errors were correlated with environmental variables on a daily scale. These observations point to possible projection issues under more extreme future climate conditions. Recurrent patterns in the residuals over the course of the year were linked to the approaches used to simulate phenology and physiological development during leaf development and senescence. Substantial model errors occurred on the multi-annual time scale, possibly because management actions and disturbances are not included in the models. Other crucial processes identified were forest structure and the vertical partitioning of light through the canopy. Furthermore, model errors were shown not to be transmitted from one time scale to another. Our results show that models should be evaluated across multiple sites, preferably using multiple evaluation methods, to identify processes that require reconsideration.