Unannounced Interim Inspections: Do False Alarms Matter?

Unannounced Interim Inspections (UIIs) in nuclear facilities of the European Union have recently attracted major attention from the International Atomic Energy Agency (IAEA) and from the European Atomic Energy Community (EURATOM) in the context of the IAEA/EURATOM Partnership Approach. Therefore, a research project was organized by the Joint Research Centre in Ispra in collaboration with the Universität der Bundeswehr München, in the framework of which the assumptions necessary for a quantitative analysis were classified and a few variants were studied in detail. In that project only so-called Attribute Sampling Procedures were considered, which means that only errors of the second kind (no detection of the illegal activity), but not those of the first kind (false alarms), were taken into account. The purpose of the work presented here is to investigate the impact on UIIs of errors of the first kind, which may occur if so-called Variable Sampling Procedures are used. Two ways of planning UIIs are considered: in the sequential one both players, the inspector and the operator of the facility, decide step by step when to inspect and whether to start the illegal activity (if there is one), respectively. In the hybrid-sequential one the inspector decides at the beginning of the reference time interval where to place his UIIs, whereas the operator again acts sequentially. For two UIIs during the reference time interval equilibria are determined which generalize the results of the above-mentioned research project. It turns out that in both cases, the sequential and the hybrid-sequential one, the equilibrium strategies of the inspector and the equilibrium payoffs to both players are the same, but not the equilibrium strategies of the operator. We try to present a plausible explanation for this surprising result.


Introduction
Unannounced Interim Inspections (UIIs) in nuclear facilities of the European Union have recently attracted major attention from the International Atomic Energy Agency (IAEA) and from the European Atomic Energy Community (EURATOM) in the context of the IAEA/EURATOM Partnership Approach of both organizations. Therefore, after several investigations of the subject, see, e.g., [1] and [6], a research project was organized by the Joint Research Centre in Ispra in collaboration with the Universität der Bundeswehr München, in the framework of which the assumptions necessary for a quantitative analysis were classified and a few variants were studied in detail. The results of these analyses have been applied to two kinds of nuclear facilities in one State of the European Union, see [2] and [4].
One assumption made in [2] and [4] is that only so-called Attribute Sampling Procedures are considered, which means that only errors of the second kind (no detection of the illegal activity) are taken into account, but not those of the first kind (false alarms), which cannot be avoided if so-called Variable Sampling Procedures are applied by the inspector. The purpose of the work presented here is to investigate the impact of errors of the first kind on UIIs. The limitation to only one model results from the fact that the modelling effort increases significantly, as will be explained and demonstrated subsequently, if the possibility of errors of the first kind is taken into account.
Formal models for inspections using Variable Sampling Procedures have been analyzed on various occasions. In particular, one variant has been considered in detail in [1], where
• UIIs are possible at any time during the reference time interval (continuous time model);
• both the inspection authority and the operator proceed sequentially: the inspector first decides at the beginning only when to perform the first UII, and after it has taken place he decides when to perform the second one, and so on. The operator decides first whether to start the illegal activity immediately or after the first inspection, and so on. In other words, the inspector decides about the inspection time points and the operator only whether to start the illegal activity immediately or later;
• the objectives of both players are expressed by the detection time. The inspection authority aims at as short a time as possible between the start and the detection of the illegal activity (if there is one), whereas the operator aims at making it as long as possible.
For any number of UIIs during the reference time interval, Nash equilibria, i.e., equilibrium strategies and payoffs to both players, have been determined as functions of the parameters of the model: the payoff parameters and the probabilities of errors of the first and second kind. In particular, conditions for legal behavior of the operator have been given.
Here a hybrid-sequential model is analyzed in which only the operator acts sequentially. This model has already been considered in the project mentioned above, see [2] and [4], for Attribute Sampling Procedures. Since it turned out that in this case both models lead to the same result, i.e., the same equilibrium strategies and payoffs, it was of special interest to find out whether or not this also holds for Variable Sampling Procedures. For this purpose only two UIIs in the reference time interval are considered (for only one UII both models are identical), even though, should it be of major interest, the analysis might be generalized to more than two UIIs.
In the following a quantitative hybrid-sequential continuous time model for two UIIs is developed and Nash equilibria of this model are determined. It turns out that the equilibrium strategy of the inspector and equilibrium payoffs to both players are the same both in the hybrid-sequential and the sequential model, but not the equilibrium strategies of the operator. We try to give a plausible explanation for this surprising result.

The Model
In the following we present a game theoretical model for UIIs. We consider a nuclear facility and two UIIs during the reference time interval (e.g., one year). Furthermore, we consider a so-called hybrid-sequential model, i.e., a model in which the inspector fixes the two time points for his UIIs at the beginning of the reference time interval, whereas the operator of the facility decides at the beginning of the reference time interval whether to start the illegal activity immediately or not, in the latter case after the first inspection he decides again in the same way, and so on. The objective of the operator is to achieve as long a time as possible between the start of the illegal activity and its detection, the latest at the end of the reference time interval (playing for time); the objective of the inspector is to get this time interval as short as possible.
Let us summarize the assumptions we have made so far, and some additional technical ones:
• There are two players: operator and inspector.
• The inspector can perform his inspections at any time point within the reference time interval (we ignore the fact that in reality an inspection extends over some finite time interval). The operator can start his illegal activity only right after an inspection, and therefore the illegal activity can be detected only at the occasion of the next inspection(s) or with certainty at the Physical Inventory Verification (PIV) at the end of the reference time interval.
• Depending on the measures taken by him, the inspector will commit an error of the first kind (false alarm) with probability $\alpha$ and an error of the second kind (no detection of the illegal activity) with probability $\beta$ per inspection.
• The number of interim inspections is known to the operator. Two UIIs are permitted in the facility and the reference time interval.
• The inspector decides at the beginning of the reference time interval when to perform his inspections. The operator decides at the beginning of the reference time interval whether to start his illegal activity immediately or only right after the inspection(s), if at all.
• The payoffs to the operator and to the inspector are proportional to the time between the start of the illegal activity and its detection.
• The game ends either after the final PIV or after that interim inspection at which the illegal activity (if there is one) is detected, which means that the operator cannot start another illegal activity during the reference time interval.
If we consider Variable Sampling Procedures, which imply the possibility of errors of the first and second kind, several new aspects have to be taken into account. From a practical point of view, we assume that the "game" continues after an error of the first kind (a false alarm) has been committed, which of course causes costs to both players. Therefore, the zero-sum assumption has to be given up, and more than that, payoff parameters have to be introduced which evaluate the different outcomes of the game. This, however, gives us the possibility to answer a question which was not posed in [2]: under which circumstances will the operator be induced to behave legally?
In Fig. 1 the extensive form of our inspection game is represented graphically.
Let us describe this game in words. At the beginning $t_3$ of the reference time interval $[t_3, t_0]$ the inspector decides at which time points $t_2$ and $t_1$ during the reference time interval to perform his two UIIs. Time is counted backward for formal mathematical reasons.
The operator decides at $t_3$ whether to behave illegally ($\bar{l}_3$) or not ($l_3$). In the latter case he decides again at $t_2$, i.e., after the first inspection, whether to behave illegally ($\bar{l}_2$) or not ($l_2$). His information set for a given $t_2$ contains all possible time points $t_1$ with $t_2 < t_1 < t_0$; thus there are infinitely many information sets, one for each $t_2$. If the operator decides at $t_2$ to behave legally, then he has to decide at $t_1$ whether to behave illegally ($\bar{l}_1$) or not ($l_1$). In the latter case he behaves legally throughout the reference time interval. As already mentioned, an illegal activity will be detected with certainty at $t_0$ at the latest. Here $1 - \beta$ is the detection probability and $\alpha$ the false alarm probability.
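The description above fixes the distribution of the detection time once the operator's starting point is known. A minimal Python sketch of the resulting expected detection time follows; the function and argument names are hypothetical, and it is assumed that successive inspections miss an ongoing illegal activity independently, each with probability $\beta$.

```python
def expected_detection_time(start, t3, t2, t1, t0, beta):
    """Expected time between start of the illegal activity and its detection.

    start : 3, 2 or 1, meaning the activity starts at t3, t2 or t1
            (time is counted backward, so t3 < t2 < t1 < t0).
    beta  : non-detection probability per inspection (assumed independent
            across inspections; this is an assumption of the sketch).
    """
    if not t3 < t2 < t1 < t0:
        raise ValueError("times must satisfy t3 < t2 < t1 < t0")
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")

    # For each starting point: the start time and the interim inspections
    # that can still detect the activity.
    remaining = {3: (t3, [t2, t1]), 2: (t2, [t1]), 1: (t1, [])}
    if start not in remaining:
        raise ValueError("start must be 3, 2 or 1")
    begin, inspections = remaining[start]

    expected = 0.0
    undetected = 1.0  # probability that the activity is still undetected
    for t in inspections:
        expected += undetected * (1.0 - beta) * (t - begin)
        undetected *= beta
    # Detection is certain at the PIV at t0.
    expected += undetected * (t0 - begin)
    return expected
```

For a start at $t_3$ this evaluates $(1-\beta)(t_2 - t_3) + \beta(1-\beta)(t_1 - t_3) + \beta^2 (t_0 - t_3)$, and for a start at $t_1$ it reduces to $t_0 - t_1$.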
It should be mentioned that we also assume that a false alarm is not possible in the course of an inspection if an illegal activity was started prior to that inspection. This is not a trivial assumption; depending on the details of the inspection procedure, alternative assumptions would have to be formulated.
Let $\Delta t$ be the time interval between the start of the illegal activity (if there is one) and its detection, at $t_0$ at the latest, i.e., at the end of the reference time interval. Then the payoffs to the operator are the following:  where $0 < e < a\,(t_0 - t_3)$, $0 < f < d\,(t_0 - t_3)$ and $0 < b$. Furthermore, for the longest possible detection time $\Delta t = t_0 - t_3$ we have to postulate $d\,(t_0 - t_3) - b > 0$; otherwise the operator would not have any incentive to behave illegally at all.
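A reading of the payoff structure that is consistent with the conditions above, with the rule given later for passing from the inspector's to the operator's expected payoff (replace $-a$ by $d$, $e$ by $f$, and set $A := -b/d$), and with the legal-behavior equilibrium payoffs $-2f\alpha$ and $-2e\alpha$ can be sketched as follows. It is an assumption inferred from these relations, not a statement of the authors' own display; in particular, the additive treatment of false alarm costs is assumed.

```latex
\begin{array}{lcc}
\text{outcome} & \text{operator} & \text{inspector} \\ \hline
\text{illegal activity, detected after } \Delta t & d\,\Delta t - b & -a\,\Delta t \\
\text{each false alarm} & -f & -e \\
\text{legal behavior, no false alarm} & 0 & 0
\end{array}
```

Under this reading, $d\,(\Delta t + A)$ with $A = -b/d$ equals $d\,\Delta t - b$, and $-a\,(\Delta t + A)$ with $A = 0$ equals $-a\,\Delta t$.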
Since for a given time point $t_1$ the operator has to decide between $\bar{l}_1$ and $l_1$ according to $d\,(t_0 - t_1) - b \gtrless 0$ for all possible situations, see Fig. 1, we introduce the decision variable $g_1(t_1)$, meaning $g_1(t_1) = 1$ for $\bar{l}_1$ and $g_1(t_1) = 0$ for $l_1$, and then reduce the game tree appropriately. From the mathematical point of view $g_1$ should depend on $t_1$ and $t_2$, see Fig. 1. Due to our special payoff structure, however, $g_1$ does not depend on $t_2$.
Since the decision between $\bar{l}_2$ and $l_2$ is based on the same payoff alternative in both information sets, it is sufficient to introduce the same probability for behaving illegally, $g_2(t_2) = \mathrm{prob}(\bar{l}_2)$, for both information sets. If we finally introduce $g_3 = \mathrm{prob}(\bar{l}_3)$, then, for fixed values of $t_2$ and $t_1$ and of $g_3$, $g_2(t_2)$, $g_1(t_1)$, the expected payoff to the inspector is given by where $A = 0$. The payoff to the operator is obtained from that to the inspector by replacing $(-a)$ by $d$, $e$ by $f$, and setting $A := -b/d$.
Equilibrium strategies and the corresponding payoffs of this non-cooperative two-person game are defined by the Nash conditions, see [7]: Here we assume already, as outlined before, that an equilibrium strategy of the inspector is a pure strategy. We present a Nash equilibrium of our game theoretical model in Theorem 1.
Let us consider the game theoretical model developed here and let the test procedure be unbiased, i.e., $\alpha + \beta < 1$. Then a Nash equilibrium is given as follows: an equilibrium strategy of the operator is legal behavior, i.e., $g_3^* = g_2^*(t_2) = g_1^*(t_1) = 0$ for all $t_3 < t_2 < t_1 < t_0$. That of the inspector is not unique, but given by the set of all $(t_2^*, t_1^*)$ with and the equilibrium payoffs are $\mathit{Op}^* = -2f\alpha$ and $\mathit{In}^* = -2e\alpha$.

Under the assumptions an equilibrium strategy of the operator is for all $(t_2, t_1)$ with $t_3 < t_2 < t_1 < t_0$, an equilibrium strategy of the inspector is given implicitly as and the equilibrium payoffs are
The proof of this theorem is given in [3]. It should also be mentioned that our theorem does not cover all possibilities, namely the case We will come back to this point in the Discussion.
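The legal-behavior payoffs $\mathit{Op}^* = -2f\alpha$ and $\mathit{In}^* = -2e\alpha$ of the first part of the theorem can be checked numerically by enumerating the false alarm outcomes of the two inspections. The following Python sketch does so under two assumptions that are not stated explicitly in the text: false alarms occur independently at each inspection, and their costs add up. The function name is hypothetical.

```python
from itertools import product


def legal_expected_payoffs(alpha, e, f, n_inspections=2):
    """Expected payoffs (operator, inspector) when the operator behaves legally.

    alpha : false alarm probability per inspection
    e, f  : false alarm costs to the inspector and to the operator
    Assumes independent false alarms and additive costs (sketch assumptions).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")

    operator = 0.0
    inspector = 0.0
    # Each outcome is a tuple of 0/1 flags, 1 meaning "false alarm".
    for outcome in product((0, 1), repeat=n_inspections):
        prob = 1.0
        for alarm in outcome:
            prob *= alpha if alarm else (1.0 - alpha)
        n_alarms = sum(outcome)
        operator += prob * (-f * n_alarms)
        inspector += prob * (-e * n_alarms)
    return operator, inspector
```

With two inspections the enumeration returns values equal to $-2f\alpha$ and $-2e\alpha$ up to rounding, since the expected number of false alarms is $2\alpha$.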

Remark 1.
At first sight it looks as if (2.1) and (2.3) depended on the unit in which $t_0 - t_3$ is measured. This is not true, of course, since $d$, as a proportionality factor, changes appropriately. If we measure, for example, $t_0 - t_3$ in months instead of years, then $d$ has to be divided by 12. From this point of view it would be better to always write $d\,(t_0 - t_3)$, but this would lead to more clumsy formulae.
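As a worked instance of this remark, take a reference time interval of one year and write $d_{\text{year}}$ for the proportionality factor per year (a label introduced here only for illustration):

```latex
d\,(t_0 - t_3)
  = d_{\text{year}} \cdot 1\ \text{year}
  = \frac{d_{\text{year}}}{12} \cdot 12\ \text{months}
```

The product $d\,(t_0 - t_3)$, which is what enters the conditions, is therefore the same in either unit.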
Using the technique of proving the Nash equilibrium in which the operator behaves legally (part 1 of the theorem) also for the one in which the operator behaves illegally (part 2), one can show immediately that the equilibrium strategy of the inspector for the illegal behavior equilibrium of the operator is also an equilibrium strategy for the legal behavior equilibrium of the operator. In other words, $(t_2^*, t_1^*)$ as given by (2.4) and (2.5) is an element of the set given by (2.2). In this sense we can consider (2.4) and (2.5) as a robust equilibrium strategy.
Let us illustrate this with the help of a numerical example: According to (2.2) the strategy of the inspector in the legal equilibrium is Furthermore, according to (2.4) the illegal strategy $(t_2^*, t_1^*)$ of the inspector is given by which gives $(t_2^*, t_1^*) = (0.16, 0.37)$. In Fig. 2 this case is represented graphically. We see the rather complicated domain for the legal equilibria (shaded area) and the unique illegal equilibrium in the midst of it. In a similar case M. Kilgour called this area the cone of deterrence, see [5].

Discussion
Whereas we considered in this paper a hybrid-sequential inspection model, Avenhaus and Canty, see [1], studied a sequential model in which the inspector decides at the beginning of the reference time interval only at which time point $t_2$ to inspect, and at $t_2$ at which time point $t_1$ to inspect the second time. It should just be mentioned that in that paper the general case of $k > 1$ inspections during the reference time interval was analyzed for all cases $t_{k+1} < t_k^* < \ldots < t_1^* < t_0$. Surprisingly enough (at least at first sight), the equilibrium of the sequential game is very close to that obtained here: the equilibrium strategy of the inspector as well as the equilibrium payoffs to both players are the same, whereas the equilibrium strategy of the operator in case of illegal behavior is for all $t_2 \in (t_3, t_0)$, see [1], in contrast to (2.4) which is independent of $\alpha$.
One may explain this surprising result as follows. For the inspector there is only one advantage in the sequential variant as compared to the hybrid-sequential one, and it exists only if both types of errors are possible. In both variants without errors of the first kind (but possibly with errors of the second kind), the inspector does not know after a first inspection without detection whether or not an illegal activity has taken place. After a false alarm and its clarification, by contrast, he does know that there was no illegal activity. In the sequential variant he can therefore use this information for planning the second inspection, whereas this is not possible in the hybrid-sequential variant. The operator, for his part, reacts to this difference with an appropriately modified equilibrium strategy such that the advantage of the inspector is neutralized.
A weak point of this argument is that without either error type the inspector also knows after an inspection whether or not an illegal activity took place; yet in that case, as in the case without errors of the first kind, the equilibrium strategies of both players are the same in both variants, see [3]. Maybe these games are too simple to exhibit differences as subtle as those described above.