Big data analytics in healthcare can substantially accelerate clinical research. With the adoption of health information systems and electronic health records (EHRs), all relevant clinical data (covering the time before disease onset, during disease progression, and after successful treatment) are recorded in the EHR system. Such enriched EHRs may contain key information about disease progression, and access to this information could support healthcare decision-making. However, characteristics of healthcare big data such as heterogeneity and sparseness make pre-processing and analysis difficult, creating a common bottleneck in healthcare big data analytics. Algorithms and easy-to-use tools are therefore needed to explore the disease-progression data recorded in EHRs and to extract crucial information from them. We propose a concept named “EHR phenotype”, in which useful information is extracted from EHRs using a standardized healthcare big data processing pipeline. Machine learning and statistical techniques can be used to extract, explore, and distinguish EHR phenotypes. We plan to develop tools, released as R packages, that help researchers integrate and process healthcare big data and mine EHR phenotypes. For data integration and processing, we plan to design a pipeline and tool that convert records into reasonable and meaningful groups using health information standards. After basic data cleaning and processing, the tools would provide a supervised EHR phenotype mining mechanism and an unsupervised disease subtyping mechanism based on EHR phenotypes. In addition, a web-based EHR phenotype sharing platform would be developed so that researchers can share and promote the EHR phenotypes they find.
We believe the EHR phenotype could change how healthcare research is conducted, shifting it from clinically driven data analysis to data-driven research.