A new reinforcement learning algorithm with fixed exploration for semi-Markov decision processes