As semiconductor manufacturing technology advances, wafers become larger and critical dimensions become smaller, so a single wafer can produce more chips. However, chip manufacturing with today's technology is costly: any defect on the wafer may cause the final product to fail and lead to a large business loss. To reduce the chance of defects, the parameters of the manufacturing environment must be precisely controlled. To this end, a monitoring system is usually used to collect real-time information, which shortens the time needed to decide on changes to those parameters. Currently, most semiconductor manufacturing machines support the SECS/GEM standard, which defines how to obtain monitoring data from the machines via TCP/IP. The problem is that existing monitoring approaches rarely support failover and require human intervention when the system crashes, which implies a long recovery time. Moreover, such a failure may cause further problems. For example, a manufacturing alarm system could generate a false alarm or overlook an important abnormality during the outage, because the monitoring system feeds it no data. To solve this problem, we introduce a new fault-tolerant monitoring mechanism based on server redundancy and checkpointing. With the proposed approach, the monitoring system achieves very short downtime, which in turn benefits the manufacturing process and the yield rate.