This paper proposes an ensemble neural network (ENN) framework for robust automatic speech recognition (ASR). The framework operates in two phases: offline and online. In the offline phase, an environment clustering technique first partitions the training data into several subsets, each of which characterizes the local information of a specific region of the overall acoustic space. Each subset is then used to train a separate NN acoustic model. Finally, the entire training set is used to estimate a gating function that determines the most suitable NN acoustic model for a given input utterance. In the online phase, the gating function selects the optimal NN acoustic model for each test utterance, and that model performs speech recognition. Because it incorporates local environment information, ENN can effectively identify the NN acoustic model that best matches the test condition. The proposed framework was evaluated on the Aurora-2 task. Experimental results show that ENN provides a notable relative word error rate (WER) reduction of 5.35% (from 5.05% to 4.78%) compared with the baseline.
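The offline/online structure described above can be illustrated with a minimal, self-contained sketch. It is not the authors' implementation, and every component is an assumption chosen for brevity: plain k-means with farthest-point initialisation stands in for the environment clustering technique, a least-squares linear classifier stands in for each NN acoustic model, and a softmax-regression classifier trained on all training utterances plays the role of the gating function. The function names (`kmeans`, `train_acoustic_model`, `train_gate`, `gate`, `relative_reduction`) and the synthetic data are hypothetical. The last helper checks that the reported 5.35% is the relative reduction from 5.05% to 4.78% WER.

```python
import numpy as np


def kmeans(x, k, n_iter=20):
    """Stand-in for environment clustering (assumed: plain k-means)."""
    # Farthest-point initialisation: start at x[0], then repeatedly add the
    # point farthest from all centroids chosen so far.
    centroids = [x[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(x - c, axis=1) for c in centroids], axis=0)
        centroids.append(x[d.argmax()])
    centroids = np.array(centroids, dtype=float)
    for _ in range(n_iter):
        # Assign each utterance to its nearest centroid.
        dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members (skip empty clusters).
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = x[labels == j].mean(axis=0)
    return centroids, labels


def train_acoustic_model(feats, targets, n_classes):
    """Hypothetical stand-in for an NN acoustic model: least-squares map
    from features (plus a bias column) to one-hot class targets."""
    a = np.hstack([feats, np.ones((len(feats), 1))])
    w, *_ = np.linalg.lstsq(a, np.eye(n_classes)[targets], rcond=None)
    return w


def predict(w, feats):
    a = np.hstack([feats, np.ones((len(feats), 1))])
    return (a @ w).argmax(axis=1)


def train_gate(env, labels, k, lr=0.1, epochs=500):
    """Softmax-regression gating function fit on ALL training utterances,
    using the cluster index of each utterance as its target."""
    a = np.hstack([env, np.ones((len(env), 1))])
    y = np.eye(k)[labels]
    w = np.zeros((a.shape[1], k))
    for _ in range(epochs):
        logits = a @ w
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        w -= lr * a.T @ (p - y) / len(a)  # cross-entropy gradient step
    return w


def gate(w, env_vec):
    """Index of the acoustic model the gate deems most suitable."""
    return int((np.append(env_vec, 1.0) @ w).argmax())


def relative_reduction(baseline, new):
    """Relative error-rate reduction in percent."""
    return 100.0 * (baseline - new) / baseline


# Synthetic toy data (illustration only): 3 well-separated "environments".
rng = np.random.default_rng(0)
k, n_per = 3, 40
env_means = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])
env = np.vstack([m + 0.5 * rng.normal(size=(n_per, 2)) for m in env_means])
# Acoustic features whose label rule flips in environment 1 (a mismatch).
feats = rng.normal(size=(k * n_per, 2))
sign = np.repeat([1.0, -1.0, 1.0], n_per)
targets = (sign * feats[:, 0] > 0).astype(int)

# Offline phase: cluster, train one model per subset, fit the gate.
centroids, labels = kmeans(env, k)
models = [train_acoustic_model(feats[labels == j], targets[labels == j], 2)
          for j in range(k)]
w_gate = train_gate(env, labels, k)

# Online phase: the gate routes a test utterance to one model.
j = gate(w_gate, env_means[1])
print(predict(models[j], np.array([[1.0, 0.0], [-1.0, 0.0]])))
print(round(relative_reduction(5.05, 4.78), 2))  # 5.35
```

A hard (argmax) gate mirrors the abstract's description of selecting a single model per utterance; a soft gate that weights the outputs of all models is a common alternative but is not what the text describes.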