The coexistence of multiple wireless access technologies is a defining feature of next-generation wireless networks, and integrating these heterogeneous networks can meet mobile users' demand for high-performance services. To address the distinct quality of service (QoS) requirements of users with different service types in a heterogeneous environment, this paper proposes a handoff selection algorithm based on a Markov decision model. A heterogeneous wireless network architecture based on software defined networking (SDN) is established to provide transparent control of the heterogeneous networks. The SDN controller maintains the state information of all the heterogeneous wireless networks and dynamically schedules network resources according to the performance characteristics of each network. If the network state information is sampled at equal intervals, the next state of the network depends only on the current state and action, not on the history of states. The handoff selection problem for heterogeneous wireless networks is therefore modeled as a discrete-time, continuous-state Markov process. The Markov process is used to predict the next network state and to obtain a reward: a positive reward represents income, and a negative reward represents cost. Separate immediate reward functions are constructed for real-time and non-real-time service users, according to the network state attributes that matter to each. Five network state attributes are considered: delay, delay jitter, bandwidth, error rate, and network load, and the immediate reward function is their weighted sum. Because users of different service types weight these attributes differently, the attribute weights are determined by the analytic hierarchy process (AHP).
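The weighted-sum immediate reward with AHP-derived weights can be illustrated with a minimal sketch. Everything concrete here is an assumption not taken from the paper: the pairwise comparison matrix `REALTIME`, the attribute ranges in `RANGES`, the min-max normalization, and the use of the row geometric-mean approximation to the AHP principal eigenvector. The random-index values 0.58, 0.90, and 1.12 are Saaty's standard figures for matrices of order 3, 4, and 5.

```python
import math

ATTRIBUTES = ["delay", "jitter", "bandwidth", "error_rate", "load"]

# Hypothetical Saaty-scale judgments for a real-time (e.g. voice) user,
# rows/columns ordered as ATTRIBUTES; A[i][j] = importance of i over j.
REALTIME = [
    [1,   1,   3,   5, 5],
    [1,   1,   3,   5, 5],
    [1/3, 1/3, 1,   3, 3],
    [1/5, 1/5, 1/3, 1, 1],
    [1/5, 1/5, 1/3, 1, 1],
]

# Hypothetical attribute ranges (lo, hi, larger_is_better) used for normalization.
RANGES = {
    "delay": (0.0, 200.0, False),      # ms
    "jitter": (0.0, 50.0, False),      # ms
    "bandwidth": (0.0, 100.0, True),   # Mbit/s
    "error_rate": (0.0, 0.05, False),
    "load": (0.0, 1.0, False),         # utilization fraction
}

def ahp_weights(pairwise):
    """Approximate the AHP priority vector by normalized row geometric means."""
    n = len(pairwise)
    geo = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo)
    return [g / total for g in geo]

def consistency_ratio(pairwise, weights):
    """Saaty consistency ratio CR = CI / RI, with lambda_max estimated from A w."""
    n = len(pairwise)
    aw = [sum(a * w for a, w in zip(row, weights)) for row in pairwise]
    lam_max = sum(x / w for x, w in zip(aw, weights)) / n
    ci = (lam_max - n) / (n - 1)
    ri = {3: 0.58, 4: 0.90, 5: 1.12}[n]  # Saaty random index
    return ci / ri

def normalize(name, value):
    """Map a raw attribute to [0, 1], where 1 is best (cost attributes inverted)."""
    lo, hi, benefit = RANGES[name]
    x = min(max(value, lo), hi)
    score = (x - lo) / (hi - lo)
    return score if benefit else 1.0 - score

def immediate_reward(state, weights):
    """Weighted sum of the normalized attributes of one candidate network."""
    return sum(w * normalize(a, state[a]) for a, w in zip(ATTRIBUTES, weights))

weights = ahp_weights(REALTIME)
cr = consistency_ratio(REALTIME, weights)  # judgments commonly accepted if CR < 0.1
r = immediate_reward(
    {"delay": 40.0, "jitter": 5.0, "bandwidth": 20.0, "error_rate": 0.001, "load": 0.6},
    weights,
)
```

A non-real-time user would use a different pairwise matrix (for example, ranking bandwidth and error rate above delay), which yields a different weight vector and therefore a different reward for the same network state.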
Over the long term, an objective function composed of the sequence of immediate rewards is used to measure the expected future reward. An expected reward function over state-action pairs is then constructed, and the handoff strategy that maximizes the expected return is obtained by iterative successive approximation. The proposed Markov decision model based handoff selection algorithm is evaluated by simulation on the MATLAB platform. The simulation results show that the proposed method selects the optimal handoff strategy for users of different service types and reduces the blocking rate, thereby improving both user QoS and the resource utilization of the wireless networks.
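Successive approximation over state-action pairs corresponds to standard value iteration, sketched below under stated assumptions. The paper's state space is continuous; this sketch assumes it has been discretized into a finite set of states, and the two-network example (`QUALITY`, `SWITCH_COST`, deterministic transitions) is entirely hypothetical. The switching cost is an illustrative way for a reward to become negative, that is, a cost.

```python
from typing import Callable, List, Tuple

Transition = List[Tuple[float, int]]  # (probability, next_state) pairs

def value_iteration(
    n_states: int,
    n_actions: int,
    reward: Callable[[int, int], float],
    transitions: Callable[[int, int], Transition],
    gamma: float = 0.9,
    tol: float = 1e-8,
    max_iter: int = 10_000,
) -> Tuple[List[float], List[int]]:
    """Successive approximation of V; returns (V, greedy policy)."""
    V = [0.0] * n_states
    Q: List[List[float]] = []
    for _ in range(max_iter):
        # Expected reward of each state-action pair:
        # Q(s, a) = r(s, a) + gamma * sum_{s'} P(s' | s, a) * V(s')
        Q = [[reward(s, a) + gamma * sum(p * V[t] for p, t in transitions(s, a))
              for a in range(n_actions)] for s in range(n_states)]
        V_new = [max(row) for row in Q]
        delta = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if delta < tol:
            break
    # Handoff strategy: pick the action with maximum expected return in each state.
    policy = [max(range(n_actions), key=lambda a: Q[s][a]) for s in range(n_states)]
    return V, policy

# Hypothetical example: state = network the user is attached to
# (0 = WLAN, 1 = cellular); action = network to hand off to (or stay on).
QUALITY = [0.8, 0.5]  # hypothetical immediate rewards of each network
SWITCH_COST = 0.1     # hypothetical handoff cost subtracted on a switch

def reward(s: int, a: int) -> float:
    return QUALITY[a] - (SWITCH_COST if a != s else 0.0)

def transitions(s: int, a: int) -> Transition:
    return [(1.0, a)]  # deterministic: the user ends up on the chosen network

V, policy = value_iteration(2, 2, reward, transitions, gamma=0.9)
```

Because each sweep is a contraction with factor `gamma`, the iterates converge to the optimal values; here the greedy policy hands a cellular user over to WLAN, since the one-time switching cost is outweighed by the higher long-term reward.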