Power consumption has become the major bottleneck for modern high-performance architectures, which typically contain large numbers of modules. Sleep transistors are used extensively to suppress leakage power, and wake-up scheduling is needed to determine the order and times at which these sleep transistors are turned on. Most existing works on wake-up scheduling are based on sleep transistors and delay buffers arranged in daisy chains; they work well for gate-level scheduling within a module, where all the gates need to be turned on. In state-of-the-art designs, however, the number and locations of the modules that need to be turned on may vary at runtime depending on the task to be performed. Consequently, the existing gate-level scheduling algorithms cannot be extended to decide the module-level wake-up order. To address this problem, we propose first constructing a multi-conflict graph (MCG) off-line from the noise constraints and then using this graph in an on-line algorithm that decides the wake-up order. Experimental results show that, on average, the wake-up latency of our approach is 46.01% shorter than that of the existing work and, conservatively, only 0.45% longer than that obtained from a Monte Carlo search-based evaluation, which is orders of magnitude slower. To the best of our knowledge, this is the first in-depth study of on-line module-level wake-up scheduling for high-performance architectures.
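To make the two-phase idea concrete, the following is a minimal sketch of an off-line conflict construction followed by an on-line greedy ordering. It is not the MCG construction or the scheduling algorithm of this work, whose details are not given here. It assumes a deliberately simple, hypothetical noise model in which each module contributes a fixed noise value and a group of modules may wake up in the same time slot only if their summed noise stays within a limit. The names `build_conflicts`, `schedule`, `noise`, and `limit` are illustrative only.

```python
from itertools import combinations


def build_conflicts(noise, limit):
    """Off-line step (assumed model): record the minimal sets of modules
    whose summed noise exceeds `limit`. Exhaustive enumeration is
    exponential and is used here only for illustration."""
    modules = sorted(noise)
    conflicts = []
    for size in range(2, len(modules) + 1):
        for group in combinations(modules, size):
            members = frozenset(group)
            # Keep minimal sets only: skip supersets of a known conflict.
            if any(c <= members for c in conflicts):
                continue
            if sum(noise[m] for m in group) > limit:
                conflicts.append(members)
    return conflicts


def schedule(active, conflicts):
    """On-line step: greedily place each requested module in the first
    wake-up slot that does not complete any recorded conflict set."""
    slots = []
    for module in sorted(active):
        for slot in slots:
            candidate = slot | {module}
            if not any(c <= candidate for c in conflicts):
                slot.add(module)
                break
        else:
            # No existing slot can take this module: open a new slot.
            slots.append({module})
    return [sorted(slot) for slot in slots]


# Hypothetical noise values and limit, chosen only for the example.
noise = {"A": 4, "B": 5, "C": 3, "D": 6}
conflicts = build_conflicts(noise, limit=9)

# The set of modules to wake is known only at runtime.
print(schedule({"A", "B", "C", "D"}, conflicts))  # [['A', 'B'], ['C', 'D']]
print(schedule({"A", "D"}, conflicts))            # [['A'], ['D']]
```

In this sketch the conflict sets are computed once, while each runtime request only checks the requested modules against them, which mirrors the off-line/on-line split described above. The greedy slot assignment is a simple heuristic and does not guarantee minimum wake-up latency.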