As the core of an SDN network, the SDN controller and its robustness are among the key factors affecting the network as a whole. Changes in switch connection state caused by an unreliable management channel are the main threat to the stability of the SDN controller. This paper introduces several new switch states and uses a finite state machine (FSM) to manage the transitions between switch states at a fine granularity, so that the controller can cope with transient switch disconnections and similar situations, which greatly improves the robustness of the SDN controller.

Software Defined Networking (SDN) is a new network architecture proposed by the Clean Slate research group at Stanford University to improve on the traditional network.
Its design idea is to separate the control plane of the network from the data forwarding plane, support centralized control of network state, and make the underlying network infrastructure transparent to upper-layer applications [1-2]. SDN divides the traditional network structure, which is tightly coupled to network equipment, into three layers: application, control, and data forwarding. SDN is flexibly programmable, which greatly improves the automated management and control of the network. It can effectively address problems faced by current networks, such as limited resource scalability, inflexible networking, and difficulty in meeting business needs quickly. The centralized control plane in an SDN network is called the SDN controller. It is a software system running on an independent server.
Logically, it sits between the data forwarding plane and the upper-layer network applications. Fig. 1 shows the typical system architecture of an SDN network, in which the SDN controller is the core of the whole network. It obtains information about the underlying network equipment through the southbound interface (the OpenFlow [3] protocol) and performs unified deployment, centralized management, and flexible control, thus solving the problem of managing and controlling decentralized network equipment. At the same time, the controller provides a programmable and extensible northbound interface. Application software designed for different requirements can run directly on the controller and use it to manage the network equipment of the whole network in a unified way.

Historically, the concept of OpenFlow appeared before SDN. In the SDN architecture, the control plane and the data forwarding plane communicate through a standard interface, namely the southbound interface. OpenFlow is one of the most widely used SDN southbound interfaces. Through this interface, the controller configures and manages the underlying network devices, receives event reports from them, and sends data packets to them. With the OpenFlow protocol, the controller can add, update, and delete entries in a switch's flow table.

An SDN controller communicates with each switch it manages through a TCP connection, also known as the management channel. Since communication between the SDN controller and the switch must be maintained at all times while the network is running, the stability of the management channel greatly affects the stability of the SDN network. In practical deployments of SDN networks, such as data centers, the SDN management channel is generally carried over the management network.
Compared with the business network, the management network is considerably less stable. Switches in SDN networks may become temporarily unavailable for two reasons: first, the TCP connection may be temporarily closed because of network conditions; second, heartbeat packets between the SDN controller and the switch may be lost, which causes the controller to conclude that the switch is unreachable. However, once the SDN controller has installed forwarding rules in the switch, the switch can still forward data packets according to the rules already installed, even if the management channel temporarily fails. Therefore, how to deal with the instability of management channels and improve the robustness of SDN controllers in this situation has become an urgent problem for practitioners in this field.

Generally speaking, once the SDN controller and the switch have established an OpenFlow session, the normal working state of the switch is set to the “connected state” [4]. In this state, the controller and the switch can exchange OpenFlow messages normally. When the long-lived TCP connection of the switch is disconnected or the heartbeat times out, the switch is immediately moved to the “offline state”. When a switch becomes offline, the network topology changes and the resources attached to the switch object are reclaimed by the controller.
Mainstream open source controllers such as OpenDaylight [5] and ONOS [6] adopt similar switch state management mechanisms, which can easily destabilize the controller when the management channel is unstable.

To solve the above problems, this paper defines a variety of switch states in the SDN controller (initialization, new connection establishment, successful version negotiation, normal operation, unreachable, disconnected, power-off, offline, and closed) together with the corresponding switch state transition diagram, and defines several key events that reflect switch state changes. This makes it possible to deal with the uncertainty of the switch management channel, improve the accuracy of switch management in the SDN controller, enhance the reliability and robustness of the controller, reduce the reliability required of the management channel in SDN networks, and support the application of SDN networks in real environments.

Initial state. This is the initial state created for a switch object; the socket connection of the switch has not yet been established. In this state, the controller allocates resources for each switch object, including memory, message buffers, and so on.

New connection state. This state indicates that the TCP connection between the controller and the switch has been successfully established. The IP address, MAC address, and TCP communication port of the switch are stored in the switch object. After that, the switch can communicate with the SDN controller normally.

Version negotiation state. According to the OpenFlow protocol [7], the SDN controller and the SDN switch negotiate the OpenFlow protocol version after successfully establishing the TCP connection. Once they agree on a version and the switch enters the SW_STATE_CONNECTED state, they can exchange OpenFlow messages based on the agreed version.

Normal working state. This is the normal working state of the switch, in which OpenFlow messages can be exchanged normally.
Once the switch and the controller have established the OpenFlow message channel and the controller has received the correct OFPT_BARRIER_REPLY message from the switch, both sides enter the normal working state. In this state, the SDN controller can send messages, install flow tables, and so on. This state corresponds to the “connected state” in references [4-6].

Switch unreachable state.
When sending a message to the switch fails or the heartbeat exchange times out (the echo request is still unanswered after retransmission), the SDN controller considers the switch to be unreachable. In this state, no messages other than heartbeat packets are sent to the switch, and the state does not change until a message is received from the switch. A switch can become unreachable for many reasons, such as a dropped or unstable management channel. In the unreachable state, the controller continuously probes the switch to determine why it is unreachable. If no power-off event is detected, the switch may remain unreachable. In the unreachable state the switch may still forward and process data packets normally, but it cannot send or receive OpenFlow messages.

Switch disconnected state. The switch enters this state when the controller explicitly receives a TCP disconnection notification for the switch. In this state, if the switch is not powered off, it can still forward data packets normally. Therefore, as in the unreachable state, the controller determines whether the switch has been powered off or only the management channel has been closed.

Switch power-off state. A switch always passes through the disconnected or unreachable state before it enters the power-off state.
The SDN controller determines whether the switch has been powered off through power-off detection. The main method of power-off detection is to check whether all switches adjacent to the switch have reported that the port facing it has gone down. If every adjacent switch has reported an OFPPS_LINK_DOWN event for the connected port, the power-off detection of the switch succeeds.

Switch offline state. This state indicates that memory cleanup and other teardown work for the switch is in progress. After power-off detection succeeds in the unreachable or disconnected state, the switch is set to the offline state.
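The power-off detection rule described above can be illustrated with a minimal sketch. The data model here is an assumption made for illustration only: the paper does not specify how adjacency or port reports are stored, so `adjacency`, `link_down`, and `power_off_detected` are hypothetical names.

```python
from typing import Dict, Set, Tuple

# Hypothetical data model (not from the paper):
# adjacency maps a switch ID to the set of (neighbor_id, neighbor_port) pairs
# that face that switch; link_down holds the (switch_id, port) pairs for which
# the controller has received a port status report with OFPPS_LINK_DOWN set.
Adjacency = Dict[str, Set[Tuple[str, int]]]


def power_off_detected(switch_id: str,
                       adjacency: Adjacency,
                       link_down: Set[Tuple[str, int]]) -> bool:
    """Return True only if every neighbor port facing switch_id is reported down."""
    neighbor_ports = adjacency.get(switch_id, set())
    if not neighbor_ports:
        # Assumption: with no known neighbors there is no evidence either way,
        # so this sketch conservatively reports that detection has not succeeded.
        return False
    return all(port in link_down for port in neighbor_ports)


if __name__ == "__main__":
    adjacency = {"s2": {("s1", 3), ("s3", 1)}}
    # Only one of the two neighbors has reported a link-down event.
    print(power_off_detected("s2", adjacency, {("s1", 3)}))
    # Both neighbors have reported a link-down event.
    print(power_off_detected("s2", adjacency, {("s1", 3), ("s3", 1)}))
```

Requiring reports from all neighbors, rather than any one of them, is what distinguishes a powered-off switch from a single failed link.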
Switch closed state. This state indicates that the memory cleanup for the switch has been completed; the SDN controller retains only some basic information recording that the switch was once connected.

Compared with other SDN controllers, the main distinguishing features of the nine switch states above are the unreachable state and the disconnected state. These two states exist because, although the controller and the switch lose normal TCP communication when the management channel fails, the switch can still forward data packets normally. When the management channel is unreliable, TCP connection closures and heartbeat packet loss occur frequently. If the switch were set to offline every time, as is conventional, the network topology would be very unstable, which would in turn affect the stability of communication. For example, in reference [9] we proposed a label-based SDN switching scheme that calculates communication paths for each pair of switches at network initialization and installs flow tables that serve as the key basic flow tables for SDN switching. Non-access switches are only responsible for forwarding data based on the installed flow tables. Even if their management channels are unstable and drop frequently, the controller should not delete these switches from the network topology or adjust the basic forwarding flow tables. It should be pointed out that when a switch enters either of these two states, the controller should notify the relevant network applications of the event so that they can handle it according to their own needs. For example, if a critical communication path passes through a switch with an unreliable management channel, that path should be adjusted.
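The nine states and the transitions described so far can be modeled as an enumeration with a transition table. This is a minimal sketch, not the paper's implementation: the paper names only SW_STATE_CONNECTED, so every state and event name below is hypothetical, and the two cleanup events that lead to the offline and closed states are assumptions about how those steps are triggered.

```python
from enum import Enum, auto


class SwitchState(Enum):
    # Hypothetical names for the nine states listed in the text.
    INIT = auto()
    NEW_CONNECTION = auto()
    VERSION_NEGOTIATED = auto()
    WORKING = auto()
    UNREACHABLE = auto()
    DISCONNECTED = auto()
    POWER_OFF = auto()
    OFFLINE = auto()
    CLOSED = auto()


class Event(Enum):
    # Hypothetical event names derived from the prose descriptions.
    TCP_ESTABLISHED = auto()
    VERSION_AGREED = auto()
    BARRIER_REPLY = auto()
    SEND_FAILED = auto()
    HEARTBEAT_TIMEOUT = auto()
    TCP_DISCONNECTED = auto()
    MESSAGE_RECEIVED = auto()
    POWER_OFF_DETECTED = auto()
    CLEANUP_STARTED = auto()   # assumption: trigger for the offline state
    CLEANUP_DONE = auto()      # assumption: trigger for the closed state


S, E = SwitchState, Event

TRANSITIONS = {
    (S.INIT, E.TCP_ESTABLISHED): S.NEW_CONNECTION,
    (S.NEW_CONNECTION, E.VERSION_AGREED): S.VERSION_NEGOTIATED,
    (S.VERSION_NEGOTIATED, E.BARRIER_REPLY): S.WORKING,
    (S.WORKING, E.SEND_FAILED): S.UNREACHABLE,
    (S.WORKING, E.HEARTBEAT_TIMEOUT): S.UNREACHABLE,
    (S.WORKING, E.TCP_DISCONNECTED): S.DISCONNECTED,
    (S.UNREACHABLE, E.MESSAGE_RECEIVED): S.WORKING,
    (S.UNREACHABLE, E.POWER_OFF_DETECTED): S.POWER_OFF,
    (S.DISCONNECTED, E.POWER_OFF_DETECTED): S.POWER_OFF,
    (S.POWER_OFF, E.CLEANUP_STARTED): S.OFFLINE,
    (S.OFFLINE, E.CLEANUP_DONE): S.CLOSED,
}


def next_state(state: SwitchState, event: Event) -> SwitchState:
    """Return the next state; events with no listed transition leave it unchanged."""
    return TRANSITIONS.get((state, event), state)


def stays_in_topology(state: SwitchState) -> bool:
    """Unreachable and disconnected switches are kept in the topology, like working ones."""
    return state in {S.WORKING, S.UNREACHABLE, S.DISCONNECTED}


if __name__ == "__main__":
    state = S.WORKING
    for event in (E.HEARTBEAT_TIMEOUT, E.MESSAGE_RECEIVED):
        state = next_state(state, event)
        print(event.name, "->", state.name, "in topology:", stays_in_topology(state))
```

Keeping the transitions in a table makes the difference from a two-state scheme easy to see: a heartbeat timeout moves the switch to a state that still counts as part of the topology, and only a successful power-off detection starts the teardown.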
In this paper, the state machine shown in Fig. 2 is used to implement the transitions between switch states and their triggering conditions. Its core is the transitions and conditions related to the unreachable state and the disconnected state. When the switch is in the normal working state, the controller and the switch exchange OpenFlow messages and continuously send and receive heartbeat packets. If an explicit TCP disconnection event is detected, the switch enters the disconnected state; if a message cannot be sent or the heartbeat exchange is abnormal, the switch enters the unreachable state.

When the switch enters the unreachable state, the controller decides its subsequent state through two types of detection. The first is the power-off detection described in the previous section. When this detection succeeds, the switch enters the power-off state and then follows the path toward the closed state. The second is heartbeat detection. If a heartbeat packet can be sent successfully after a certain time interval and a reply is received, the switch returns to the normal working state. If neither detection succeeds, the switch remains unreachable.

When the switch enters the disconnected state, the controller performs power-off detection to determine whether the switch stays in the disconnected state or moves to the power-off state. In these two states, the switch may also try to connect to the controller again. If the controller detects that the IP and MAC addresses of a newly connected switch are the same as those of an existing switch in one of these two states, it considers the switch to have reconnected and proceeds to the subsequent version negotiation process. In response to the above state transition diagram, the following key events are defined for the controller to notify the upper network