An optimal relay transmission policy that exploits a stochastic energy harvesting (EH) model is proposed for EH two-way relay (TWR) networks, in which a solar-powered relay with a finite-sized battery adopts an amplify-and-forward protocol to relay signals. The relay transmission power is optimized to minimize the long-term outage probability, taking into account the causal EH information, the battery energy, and the random channel status. The design framework is formulated as a Markov decision process (MDP), for which a monotonic structure of the long-term reward values and a threshold property of the optimal relay transmission policy are revealed. Furthermore, a saturation structure of the outage performance is uncovered: the expected outage probability eventually approaches the probability that the relay's battery is empty. Simulation results verify the theoretical analysis and show that the proposed optimal policy outperforms myopic policies.
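To make the MDP formulation concrete, the following is a minimal sketch of discounted value iteration on a toy battery-limited relay. It is not the model analyzed in this work. Every name and number in it is a hypothetical choice made for illustration: the function `solve_toy_relay_mdp`, the i.i.d. harvesting and channel probabilities, the per-channel energy requirement `need`, and the discount factor. The state is the pair (battery level, channel state), the action is the number of energy units spent, and the per-slot cost is 1 when the spent energy falls below what the current channel requires (an outage) and 0 otherwise.

```python
from itertools import product


def solve_toy_relay_mdp(battery_cap=4, harvest_probs=(0.5, 0.5),
                        need=(1, 2), channel_probs=(0.6, 0.4),
                        discount=0.9, tol=1e-9):
    """Value iteration for a toy relay (all parameters are assumptions).

    harvest_probs[h]: probability of harvesting h energy units in a slot.
    need[c]: energy units required to avoid outage in channel state c.
    channel_probs[c]: i.i.d. probability of channel state c.
    Returns (value, policy), both dicts keyed by (battery, channel).
    """
    states = list(product(range(battery_cap + 1), range(len(need))))
    value = {s: 0.0 for s in states}

    def q(b, c, a, v):
        # Immediate outage cost plus discounted expected future cost.
        cost = 1.0 if a < need[c] else 0.0
        future = 0.0
        for h, ph in enumerate(harvest_probs):
            nb = min(b - a + h, battery_cap)  # battery overflow is lost
            for nc, pc in enumerate(channel_probs):
                future += ph * pc * v[(nb, nc)]
        return cost + discount * future

    while True:
        new_value = {(b, c): min(q(b, c, a, value) for a in range(b + 1))
                     for b, c in states}
        gap = max(abs(new_value[s] - value[s]) for s in states)
        value = new_value
        if gap < tol:
            break

    # Greedy policy with respect to the converged values
    # (ties resolve to the smallest energy expenditure).
    policy = {(b, c): min(range(b + 1), key=lambda a: q(b, c, a, value))
              for b, c in states}
    return value, policy


if __name__ == "__main__":
    value, policy = solve_toy_relay_mdp()
    for state in sorted(policy):
        print(state, policy[state], round(value[state], 4))
```

In this toy setting the expected discounted outage cost does not increase as the battery level grows, which loosely mirrors the monotonic structure stated above. The solar EH model, the amplify-and-forward outage expression, and the two-way channel statistics would all need to replace the placeholder assumptions before any quantitative conclusion could be drawn.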