To resolve the signal jumps that occurred after incremental downloading and master-slave switching of a DCS controller in a nuclear power unit, the hot-standby redundancy, incremental downloading, and master-slave switching mechanisms of the controller, together with the internal logic configuration of the DCS, were studied in depth. It was found that non-standard logic configuration, combined with the incremental downloading mechanism of the DCS controller, left the controller unable to retain the state of some signals after master-slave switching. After a comprehensive investigation and correction of these latent defects, the risk of signal jumps in the DCS controllers was eliminated.

The DCS of a nuclear power plant is divided into the safety-class 1E DCS and the non-nuclear-class NC DCS. Although the non-nuclear DCS does not involve nuclear safety, it is responsible for plant-wide regulation, operation, and monitoring, involving more than 200 process systems [1]. An abnormal signal jump in a DCS controller therefore directly affects the stable operation of the unit, as well as its reliability, availability, and economy.
If such an event occurs after the unit has entered commercial operation, the required modification will also cause an unplanned shutdown, resulting in additional economic losses [2].

In May 2001, the controller (CPU) of a field control station (FCS) in the non-nuclear DCS of a nuclear power project under construction underwent master-slave switching after an incremental download.
The switching caused some signals of the auxiliary feedwater system (ASG) to jump, which in turn changed the operating status of one pump and five valves in that system. After the event, engineers first examined the software configuration of the controller. Tracing the logic backwards showed that all six devices changed state because the result of a single intermediate logic operation jumped from 1 to 0. Figure 1 shows a drawing of the ASG system that contains self-locking logic; the final output of this logic is the intermediate result in which the jump occurred. Based on the upstream design drawings, the DCS vendor implemented the corresponding self-locking logic configuration in a field control station, as shown in Figure 2. When the signals that trigger self-locking (the result of the AND operation on PY9ASG801TL and PY9ASG160VDSM5, and the state signal of the AND gate) are pulse signals, the self-locking state is held jointly by the OR gate and the AND gate after the pulse disappears, and the self-locking result is written to the BOOL local variable ASGL38D02. A jump in the value of this variable changes the control commands of all six devices.
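The self-locking behaviour described above can be illustrated with a minimal Python sketch. It is not the vendor's configuration: the function names and the `hold` input (standing in for whatever condition feeds the AND gate in the feedback path of Figure 2) are assumptions made for illustration. Only the signal names PY9ASG801TL, PY9ASG160VDSM5, and ASGL38D02 come from the text.

```python
def self_lock_step(trigger: bool, hold: bool, previous: bool) -> bool:
    """One scan cycle of the latch: OR(trigger, AND(previous, hold))."""
    return trigger or (previous and hold)


def run_cycles(py9asg801tl, py9asg160vdsm5, hold, initial=False):
    """Return the value of ASGL38D02 after each scan cycle.

    `hold` is a hypothetical stand-in for the condition on the feedback AND gate.
    """
    asgl38d02 = initial
    history = []
    for a, b, h in zip(py9asg801tl, py9asg160vdsm5, hold):
        trigger = a and b  # AND of the two input signals forms the pulse
        asgl38d02 = self_lock_step(trigger, h, asgl38d02)
        history.append(asgl38d02)
    return history


if __name__ == "__main__":
    # One-cycle pulse in the second cycle; the hold condition stays true.
    print(run_cycles(
        py9asg801tl=[False, True, False, False],
        py9asg160vdsm5=[True, True, True, True],
        hold=[True, True, True, True],
    ))
```

In this sketch the output depends on its own previous value, so the state survives the end of the pulse only as long as that previous value is available to the next scan cycle.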
The above analysis shows that some mechanism involved in the incremental download and master-slave switching of the controller caused the self-locking state to jump from 1 to 0. It is therefore necessary to analyze the relevant controller mechanisms.

The field control stations (FCS) of the non-nuclear DCS, which is based on the HOLLiAS-N platform, use a pair of NM203 modules as the controller (CPU).
The two controllers run in synchronized master/standby mode to achieve bumpless switching: both perform the control computation simultaneously with synchronized computation cycles, so that the event variables and accumulated values of the two controllers remain equal, but only one of them outputs its results, as shown in Fig. 3. Each controller has an independent hardware watchdog circuit that monitors its operating status. If any task becomes abnormal, a master-slave switch is triggered: the original master is demoted to slave and the original standby is promoted to master. The new standby discards its previous program and operating data and synchronizes the program and data from the new master.

While the controller is running, some important data must be preserved when the system loses power. The module therefore contains a power-off protection circuit that stores these data in low-power SRAM supplied by a backup battery. In the control station algorithm, variables declared with the keyword “Retain” are treated as Retain variables. While the controller runs, Retain variables are periodically copied from SDRAM to SRAM to provide power-loss protection.

Incremental downloading means that the configuration tool of the control station loads only the modified part of the program into the master controller, so that the download does not disturb the master. For the slave, however, the download is not bumpless. After the download, the slave discards its original data, synchronizes the new program from the master, and restarts the computation from the initial values of the variables using the new engineering logic. After 20 cycles, the slave starts to synchronize the Retain variables in SRAM.
For values in the non-Retain area, the slave continues the computation with its own current results.

Combining the logic configuration described above with the incremental download and master-slave switching mechanisms of the controller, the cause of the problem becomes clear. The event can be reproduced as follows. At some point during operation of the master controller, the pulse signal from the AND gate triggers the self-locking logic. When the pulse disappears, the self-locking state is held jointly by the OR and AND gates, so the value of ASGL38D02 is 1. An incremental download to the master controller is then performed. The master continues to operate without disturbance, the data in its SDRAM remain unchanged, and the self-locking state is preserved. Once the download to the master is complete, the hot-standby slave controller discards the original program and data in its own memory and synchronizes the new program and the SRAM data from the master. Because the states of the OR and AND gates in the master's logic are held only in the master's working memory (SDRAM), the slave cannot obtain them. When the slave starts computing, the states of its AND, OR, and AND gates are therefore all 0, no self-locking state exists in the slave, and its value of ASGL38D02 is 0. On-site I&C personnel subsequently performed a master-slave switch of the controller. The slave controller was promoted from standby to master, its non-self-locked state took effect, and the signals to the field equipment jumped. In short, the jump occurred because the slave controller could not acquire the self-locking state of the master controller after the incremental download.
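The sequence of events above can be replayed with a small simulation. This is a simplified sketch under stated assumptions, not a model of the NM203 firmware: the `Controller` class, the `hold` input, and the number of cycles run after the download are hypothetical, and the latch state is modelled as a single Boolean held only in working memory.

```python
class Controller:
    """Controller whose latch state lives only in working memory (SDRAM)."""

    def __init__(self):
        self.latch = False  # initial value after a (re)start

    def scan(self, trigger: bool, hold: bool = True) -> bool:
        """One scan cycle; the result is the value written to ASGL38D02."""
        self.latch = trigger or (self.latch and hold)
        return self.latch


def reproduce_event(cycles_after_download: int = 25):
    """Return ASGL38D02 seen by the field before and after the switch."""
    master, slave = Controller(), Controller()

    # 1. A pulse sets the latch; both controllers compute in step.
    for trigger in (True, False, False):
        before = master.scan(trigger)
        slave.scan(trigger)

    # 2. Incremental download: the master keeps running undisturbed,
    #    while the slave discards its data and restarts from initial values.
    slave = Controller()
    for _ in range(cycles_after_download):
        master.scan(False)
        slave.scan(False)  # no pulse arrives, so the slave latch stays 0

    # 3. Master-slave switch: the former slave now drives the outputs.
    master, slave = slave, master
    after = master.scan(False)
    return before, after


if __name__ == "__main__":
    print(reproduce_event())
```

The pair returned by `reproduce_event()` corresponds to the jump of ASGL38D02 from 1 to 0: nothing in the restart path gives the slave access to the latch state held in the master's working memory.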
In searching for a solution, it was noted that the AND and OR blocks in Figure 2 are general-purpose operation functions of the system and cannot be defined as Retain type; their output states can only be stored in the working memory (SDRAM). Even if ASGL38D02 alone is defined as a Retain variable, the slave starts its first computation cycle from the default values, and the non-self-locked result (0) of this new computation overwrites the value of ASGL38D02, so another scheme had to be found. After further study, the logic shown in Figure 4 was adopted as the final feasible self-locking logic. In this logic, ASGL38D02 is defined as a Retain variable, and the original self-locking feedback line is removed and replaced by ASGL38D02. With this modification, the self-locking state is saved in the Retain intermediate variable ASGL38D02. After a master-slave switch, the slave can obtain the value of ASGL38D02 from the master and thus recover the original self-locking state, which resolves the problem.

Given the importance attached to the event, the engineers carried out in-depth studies and dedicated investigations on the affected unit. They found that the same kind of self-locking logic exists on multiple logic pages in multiple stations, and that a large number of variables and algorithm blocks that should have been defined as Retain type had not been. Examples include RS flip-flops, thermostatic element accumulative counting function blocks, and integral and differential algorithms that depend on the results of previous computation cycles. If these are not defined as Retain type, incremental downloading and master-slave switching of the controller will cause discontinuities in control and even signal events. A comprehensive investigation of such latent defects was therefore necessary.
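The difference between the original wiring of Figure 2 (with ASGL38D02 merely marked as Retain) and the modified wiring of Figure 4 can be sketched as follows. This is an illustrative model only: the `Station` class, the `hold` input, and the assumption that Retain values are copied from the master before every slave scan once 20 cycles have elapsed are simplifications, not the platform's documented behaviour.

```python
SYNC_DELAY_CYCLES = 20  # from the text: Retain sync starts after 20 cycles


class Station:
    def __init__(self, feedback_from_retain: bool):
        # True models Figure 4: the feedback line reads the Retain variable.
        # False models Figure 2: the feedback line reads the internal gate state.
        self.feedback_from_retain = feedback_from_retain
        self.or_out = False                 # gate state, SDRAM only
        self.retain = {"ASGL38D02": False}  # Retain area, mirrored to SRAM

    def scan(self, trigger: bool, hold: bool = True) -> bool:
        if self.feedback_from_retain:
            feedback = self.retain["ASGL38D02"]
        else:
            feedback = self.or_out
        self.or_out = trigger or (feedback and hold)
        self.retain["ASGL38D02"] = self.or_out  # logic result overwrites the variable
        return self.or_out


def output_after_switch(feedback_from_retain: bool) -> bool:
    master = Station(feedback_from_retain)
    master.scan(True)   # pulse sets the latch
    master.scan(False)  # pulse gone, latch holds

    slave = Station(feedback_from_retain)  # restarted after incremental download
    for cycle in range(SYNC_DELAY_CYCLES + 5):
        master.scan(False)
        if cycle >= SYNC_DELAY_CYCLES:
            slave.retain.update(master.retain)  # assumed Retain sync from master
        slave.scan(False)

    return slave.scan(False)  # first scan after the slave becomes master


if __name__ == "__main__":
    print("Figure 2 wiring, Retain only:", output_after_switch(False))
    print("Figure 4 wiring:", output_after_switch(True))
```

In this model, marking ASGL38D02 as Retain is not sufficient by itself, because each scan overwrites the synchronized value with the result computed from the zeroed gate state. The state survives only once the feedback path reads the Retain variable.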
Table 1 lists the potential problem types identified for the affected unit and the scope of the investigation, which shows that the investigation workload was very large. After considerable effort to investigate and locate every problem point, the engineers prepared a total of 449 modifications: 197 logic modifications and 252 changes to the Retain attribute of variables.
These modifications were implemented in batches on the affected unit in early September 2013, and subsequently on the remaining 13 CPR1000 units. Since the modifications were implemented, no similar event has occurred on any unit.

The handling and study of this problem show that, although its occurrence is related to the incremental downloading mechanism of the controller, it can be avoided by sound and standardized configuration. To prevent a recurrence, the engineering side requires the DCS manufacturer to add explicit configuration requirements for the relevant logic to its internal configuration design specification. In addition, because such problems are well hidden, no test in the current factory test (FT) program can cover and detect configuration errors of this kind, which allows them to be carried from the factory to the site. Engineers are currently studying how to have the manufacturer add the relevant logic tests to subsequent factory tests to ensure the quality of the equipment leaving the factory.

China Guangdong Nuclear Power Group (CGNPC) was the first in China to adopt the domestically developed DCS platform HOLLiAS-N for the non-nuclear-class NC DCS, with DCS system integration undertaken by a subsidiary of the group's nuclear engineering corporation. The emergence of this problem shows that CGNPC's experience with nuclear power DCS still needs to be accumulated, and that the technology and management of the domestically developed DCS need to be improved step by step. The successful resolution of this problem reflects CGNPC's technical capability and the importance it attaches to the nuclear safety principle of “safety first, quality first”, and it has provided valuable experience.