In this paper, we consider an energy harvesting (EH) two-way (TW) dual-relay network consisting of one non-EH relay and one EH relay equipped with a finite-sized battery. For this network, a space-time transmission protocol with space-time network coding is designed, and an optimal transmission policy for the EH relay is proposed using a stochastic solar EH model. The optimal policy minimizes the long-term pairwise error probability (PEP) of the system by adapting the EH relay's transmission power to its current battery energy, channel fading status, and causal solar EH information. The design problem is formulated as a Markov decision process, and the conditional capability of the contribution to the PEP by the EH relay is adopted as the reward function. We uncover a monotonic and limited difference structure for the expected total discounted reward. Furthermore, a non-conservative property and a monotonic structure of the optimal policy are revealed. Based on the optimal policy and its special structures, the expectation, lower and upper bounds, and asymptotic approximation of the PEP are computed, and an interesting result on the system diversity performance is revealed: the full diversity order can be achieved only if the EH capability index, a metric that quantifies the EH node's capability of harvesting and storing energy, approaches infinity; otherwise, the EH diversity order is equal to one, and the coding gain of the network increases with the EH capability index. In addition, a full diversity criterion for the EH TW dual-relay network is proposed. Finally, computer simulations confirm our theoretical analysis and show that the proposed optimal policy outperforms the other compared policies.