A practical deep learning face recognition system can be divided into several tasks. Executing each task with the original image as its input is time-consuming, and the feature extractors used by different tasks may duplicate one another's functions. In this paper, a multi-task training method based on a feature pyramid and triplet loss is proposed for training a single-stage deep neural network that performs both face detection and face recognition. Because the network is single-stage, the data for every task passes through the same backbone network, which avoids duplicate computation by sharing weights and intermediate results. The network uses a feature pyramid and anchor boxes to localise faces, triplet loss to train the feature extractor, and a simple mathematical function to match the extracted features. The benefits of this approach are faster computation and lower memory usage. On an Nvidia 2080Ti GPU, the system achieves 212 FPS for a 640 × 640 resolution input and maintains 92.4% accuracy on the LFW data set.
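As an illustration of the triplet-loss and feature-matching steps named above, the following is a minimal sketch in plain Python. The abstract does not state the margin, the distance measure, or the matching function, so the margin of 0.2, the squared Euclidean distance on L2-normalised embeddings, the cosine-similarity match, and the threshold of 0.5 are all assumptions chosen for illustration, as are the function names. An actual training pipeline would compute this loss over batches in a deep learning framework.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss for one (anchor, positive, negative) embedding triple.

    The loss is zero once the anchor is closer to the positive than to the
    negative by at least `margin` (an assumed value, not from the paper).
    """
    a = l2_normalize(anchor)
    p = l2_normalize(positive)
    n = l2_normalize(negative)
    return max(squared_distance(a, p) - squared_distance(a, n) + margin, 0.0)


def cosine_similarity(u, v):
    """Cosine similarity, used here as an assumed 'simple' matching function."""
    return sum(a * b for a, b in zip(l2_normalize(u), l2_normalize(v)))


def is_same_identity(emb_a, emb_b, threshold=0.5):
    """Declare a match when similarity reaches an assumed threshold."""
    return cosine_similarity(emb_a, emb_b) >= threshold
```

With these definitions, a triple whose positive coincides with the anchor and whose negative is orthogonal to it incurs no loss, while swapping the positive and negative yields a loss equal to the squared distance plus the margin.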