Robust Solutions for Multi-Defender Stackelberg Security Games

Classical formulations of Multi-defender Stackelberg security games have the unpleasant property that the solutions are highly sensitive to perturbation of the attacker’s valuation. By introducing small uncertainty about the other players valuations renders this artefact can be rendered moot.

Strategic reasoning plays an important role in artificial intelligence, especially when building autonomous agents. In the realm of (cyber-)security, multi-defender Stackelberg security games (SSG) have been used to model the scenario of multiple defenders that want to protect a common set of goods. Each defender has, however, its own valuation of the goods. More precisely:

• An MSSG is played by $d$ defenders that protect $n$ targets against one attacker.
• Defender $i$ has $k_i$ resources, and each can be assigned to one target. The placement can be stochastic.
• A target is protected if at least one defender has assigned (at least) one of its resources to it.
• The attacker gets to know the marginal probability of each target to be protected under the stochastic placements of the defenders.
• The attacker chooses a target $j$ to attack.
• Each defender draws a placement from its probabilistic placement policy.
• The attack fails if the target is protected and is otherwise successful.
• Each defender $i$ earns a reward $\mathrm{rd}_{ij}$ if target $j$ is defended and pays a penalty $\mathrm{pd}_{ij}$ is the target $j$ is lost to the attacker. Note that each defender has its own rewards and penalties. Hence, different defenders might want to protect different targets.
• Similarly, the attacker earns a reward $\mathrm{ra}_{j}$ if an attack on target $j$ is successful and pays a penalty $\mathrm{pa}_{j}$ if his attack on target $j$ was unsuccessful.

In the standard formulation of the game, the utility of the defenders can be analysed by assuming the attacker play their best response, which is the target that maximises the expected utility.

MSSGs have been applied in multiple situations such as allocation of security resources at airports or the protection of biodiversity in conservation areas (see e.g. [Sin18S] for a survey). However, they have some unfavourable properties: The utility functions of the defenders may show sharp discontinuities due the multiple defenders having different valuations and the tie breaking rules that have to be applied if multiple targets have the same expected utility for the attacker. As a result, small perturbations of the utility vector may result in drastic changes in the defenders utility and the game equilibria.

The authors of [Mut22R] argue that these issues arise as artefacts of the unrealistic assumption that the exact player valuations are common knowledge and the deterministic behavior of the attacker to always play the best response even if the differences in utility are minuscule. While the assumption of commonly known valuations is rather common in game theory, it is unreasonable to assume that the knowledge of the other utilities is absolutely precise.

To anticipate that the attackers true utility might be different from the announced one they allow that the attacker play a stochastic strategy, i.e. given the marginal protection probabilities for each target the attacker computes a probability distribution over the targets through some continuous function and draws the target according to this distribution. This formulation is general enough to capture most notions of uncertainty about the true valuations and prevents discontinuities in the defender utility functions.

The paper proceeds by formalising the notion of robust approximate solution concepts such as robust Nash-equilibria and robust (alpha- or gamma-) cores, which guarantee that small perturbations of the attackers utility vectors result only in small changes of the defenders utilities. This ensures that approximate equilibria remain (slightly looser) approximate equilibria under small perturbations.

It is shown that robust Nash-equilibria exist as long as the attacker chooses only distributions that are concentrated on the targets with the highest utilities according to the announced valuation (means that all targets that get much lower utility have assigned very low probability. Softmax applied to the expected utilities under the marginal protection probabilities of the targets is an example of a function that generates such distributions).

In the cooperative setting they show that the alpha-core is always non-empty under the same conditions but the gamma-core might be empty.

I think introducing uncertainty is an interesting and - from a practical point of view - highly justified approach to circumvent problematic artefacts that appear in combinatorial game theory.

# References

• Robust Solutions for Multi-Defender Stackelberg Security Games, Dolev Mutzari, Yonatan Aumann, Sarit Kraus. Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI 2022 (2022)
• Stackelberg Security Games: Looking Beyond a Decade of Success, Arunesh Sinha, Fei Fang, Bo An, Christopher Kiekintveld, Milind Tambe. (2018)