Increasing the capacity of wireless cellular networks is one of the major challenges of the coming years. Extensive research has been conducted to exploit the ultra-wide bandwidth of the millimeter wave (mmWave) band and integrate it into future cellular networks. In this paper, to utilize the mmWave band efficiently while reducing the total deployment cost, we propose deploying mmWave access in the form of ultra-high-capacity mmWave gates distributed within the coverage area of the macro base station (Macro BS). Delayed offloading is also proposed to exploit the gates efficiently and to relax the need to deploy a large number of them. Furthermore, a mobility-aware weighted proportional fair (WPF) user scheduling scheme is proposed to maximize the intra-gate offloading efficiency while maintaining long-term offloading fairness among the users inside a gate. To link the mmWave gates with the Macro BS efficiently in a unified cellular network structure, a cloud cooperated heterogeneous cellular network (CC-HetNet) is proposed, in which the gates and the Macro BS are connected to a centralized radio access network (C-RAN) via high-speed backhaul links. Using control/user (C/U) plane splitting, signaling information is sent to the user equipments (UEs) through the wide-coverage Macro BS, while most of the users' delayed traffic is offloaded through the ultra-high-capacity mmWave gates. An enhanced access network discovery and selection function (eANDSF), based on a network-wide proportional fair criterion, is proposed to discover and select the optimal mmWave gate with which to associate a user with delayed traffic. Results show that a mmWave gate consisting of only 4 mmWave access points (APs) can offload up to 70 GB of delayed traffic within 25 s, which reduces the energy consumption of a UE by 99.6% compared with using only the Macro BS without gate offloading. In addition, the proposed eANDSF more than doubles the total number of bytes offloaded through the gates compared with the conventional ANDSF specified by 3GPP, owing to its optimal selection of the associated gate.
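The intra-gate scheduler above is described only as a mobility-aware WPF scheme. As a rough illustration of the underlying weighted proportional fair rule, the sketch below implements the generic textbook version: in each slot, serve the user that maximizes weight times instantaneous rate divided by its exponentially averaged served throughput. This is not the paper's scheduler. The function name `wpf_schedule`, the smoothing factor `beta`, the initial average, and the per-user `weights` are all hypothetical; a mobility-aware variant might, for example, raise the weight of users expected to leave the gate soon, but how the paper actually derives its weights is not stated here.

```python
def wpf_schedule(inst_rates, weights, beta=0.1, init_avg=1.0):
    """Generic weighted proportional fair (WPF) scheduling sketch.

    inst_rates[t][k]: achievable rate of user k in slot t (hypothetical input).
    weights[k]: per-user weight (hypothetical; a mobility-aware scheme might
        increase it for users with little remaining time inside the gate).
    beta: smoothing factor of the exponentially weighted average throughput.
    Returns the user index served in each slot and the final averages.
    """
    num_users = len(weights)
    avg = [init_avg] * num_users  # positive start avoids division by zero
    schedule = []
    for rates in inst_rates:
        # WPF metric: weight * instantaneous rate / average served throughput
        k_star = max(range(num_users),
                     key=lambda k: weights[k] * rates[k] / avg[k])
        schedule.append(k_star)
        # Only the scheduled user receives service in this slot
        for k in range(num_users):
            served = rates[k] if k == k_star else 0.0
            avg[k] = (1 - beta) * avg[k] + beta * served
    return schedule, avg


if __name__ == "__main__":
    # Two users with identical constant rates and equal weights alternate
    sched, _ = wpf_schedule([[10.0, 10.0]] * 4, [1.0, 1.0])
    print(sched)
```

With equal weights and equal rates the rule alternates between users, while a larger weight shifts a proportionally larger share of slots toward that user; the trade-off between offloaded volume and long-term fairness is set entirely by the choice of weights.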