Energy storage, such as battery and thermal energy storage, is an effective means of shifting building peak loads and alleviating grid stress at the building-cluster level. However, because different storage types differ in performance (e.g., response speed, charge/discharge efficiency and rate, storage capacity) and individual buildings have highly diverse energy use patterns, multi-energy storage must be properly selected and optimally designed for each building to achieve effective load shifting. Optimally deploying multi-energy storage at the cluster level is a challenging optimization problem because of the nonlinear dynamic performance of the storage and the high dimensionality that results from the large number of buildings. To tackle these challenges, this study proposes a data-driven surrogate optimization method that optimally deploys multi-energy storage at the cluster level to minimize the building cluster's energy bill under demand response programs.