We present a system for interactive examination of learned security policies.
It allows a user to traverse episodes of Markov decision processes in a
controlled manner and to track the actions triggered by security policies.
Similar to a software debugger, a user can continue or halt an episode at
any time step and inspect parameters and probability distributions of interest.
The system provides insight into the structure of a given policy and into its
behavior in edge cases. We demonstrate the system with a network
intrusion use case. We examine the evolution of an IT infrastructure’s state
and the actions prescribed by security policies while an attack occurs. The
policies for the demonstration have been obtained through a reinforcement
learning approach that includes a simulation system where policies are
incrementally learned and an emulation system that produces statistics that
drive the simulation runs.
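
The debugger-style traversal described above can be illustrated with a minimal sketch. It is not the presented system's implementation: the class `EpisodeDebugger`, its methods, and the toy intrusion model (`TRANSITION`, `POLICY`, the state and action names, and all probabilities) are hypothetical and chosen only for illustration. The sketch shows how an episode of a Markov decision process can be advanced one time step at a time, halted when a user-defined condition holds, and inspected for the current state and the policy's action distribution.

```python
import random
from dataclasses import dataclass, field

# Hypothetical toy model: state -> action -> list of (next_state, probability).
TRANSITION = {
    "no_intrusion": {
        "continue": [("no_intrusion", 0.8), ("intrusion", 0.2)],
        "stop": [("stopped", 1.0)],
    },
    "intrusion": {
        "continue": [("intrusion", 1.0)],
        "stop": [("stopped", 1.0)],
    },
}

# Hypothetical stochastic policy: state -> {action: probability}.
POLICY = {
    "no_intrusion": {"continue": 1.0},
    "intrusion": {"continue": 0.1, "stop": 0.9},
}


@dataclass
class EpisodeDebugger:
    """Steps through one episode of a toy MDP, one time step per call."""
    transition: dict
    policy: dict
    state: str
    terminal: set
    rng: random.Random = field(default_factory=lambda: random.Random(0))
    t: int = 0
    history: list = field(default_factory=list)

    def inspect(self):
        """Return the time step, the state, and the policy's action distribution."""
        return {
            "t": self.t,
            "state": self.state,
            "action_distribution": dict(self.policy.get(self.state, {})),
        }

    def done(self):
        return self.state in self.terminal

    def step(self):
        """Advance the episode by exactly one time step."""
        if self.done():
            return None
        # Sample an action from the policy's distribution in the current state.
        actions = self.policy[self.state]
        action = self.rng.choices(list(actions), weights=list(actions.values()))[0]
        # Sample the next state from the transition distribution.
        outcomes = self.transition[self.state][action]
        next_state = self.rng.choices(
            [s for s, _ in outcomes], weights=[p for _, p in outcomes]
        )[0]
        self.history.append((self.t, self.state, action, next_state))
        self.state = next_state
        self.t += 1
        return self.history[-1]

    def continue_until(self, breakpoint):
        """Run until the breakpoint predicate holds or the episode ends."""
        while not self.done() and not breakpoint(self.inspect()):
            self.step()
        return self.inspect()


if __name__ == "__main__":
    dbg = EpisodeDebugger(TRANSITION, POLICY, "no_intrusion", {"stopped"})
    # Halt as soon as the (hypothetical) intrusion state is reached, then inspect.
    print(dbg.continue_until(lambda view: view["state"] == "intrusion"))
    print(dbg.step())
```

The breakpoint is an ordinary predicate over the inspected view, so the same mechanism can halt on a time step, on a state, or on a property of the action distribution.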