
Synthetic Log Generation through Abduction

In recent years, a new discipline commonly referred to as Process Mining, focused on the analysis and management of business processes, has emerged, and two of its tasks have gained increasing attention in research: process discovery and compliance monitoring. In both fields, the pressing demand for event log benchmarks with predefined characteristics, which are useful for evaluating process models and the techniques for discovering them, has led to the design of various methodologies and tools for synthetic log generation.


Both real-life and artificially created logs often contain only positive examples (i.e., process instances deemed compliant with the model), whereas negative process instances (i.e., non-compliant traces) can be crucial for correctly evaluating the performance and robustness of a novel process discovery or conformance checking technique.
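To make the distinction between positive and negative traces concrete, the following minimal Python sketch classifies traces against a toy declarative model. The constraint names `existence` and `response` are borrowed from common Declare-style templates; the helpers, activity names, and model are hypothetical, and the sketch does not use SCIFF at all: it simply checks each trace against plain predicates.

```python
from typing import Callable, Sequence

Trace = Sequence[str]
Constraint = Callable[[Trace], bool]


def existence(a: str) -> Constraint:
    """Hypothetical constraint: activity `a` must occur at least once."""
    return lambda trace: a in trace


def response(a: str, b: str) -> Constraint:
    """Hypothetical constraint: every occurrence of `a` is eventually followed by `b`."""
    def check(trace: Trace) -> bool:
        for i, act in enumerate(trace):
            if act == a and b not in trace[i + 1:]:
                return False
        return True
    return check


def is_positive(trace: Trace, model: list[Constraint]) -> bool:
    """A trace is positive (compliant) iff it satisfies every constraint."""
    return all(c(trace) for c in model)


model = [existence("register"), response("register", "pay")]

print(is_positive(["register", "check", "pay"], model))  # True
print(is_positive(["register", "check"], model))         # False
```

The second trace is negative because the `register` event is never followed by `pay`.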

In this work, we report on our experience in designing a synthetic log generator that takes as input a declarative or structured workflow model, encoded in the abductive declarative language SCIFF, and outputs logs containing both positive and negative traces. Our approach offers both a highly expressive notation for describing the business model and the ability to generate logs with various customizable features: it can generate trace templates as well as grounded traces, take into account user-defined constraints on data and time, and generate traces that start from a user-specified partial trace.
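Some of these features can be illustrated, very loosely, with a brute-force Python sketch. Starting from a user-specified partial trace, it grounds every possible continuation (activity plus integer timestamp) within a bounded horizon and splits the results into positive and negative traces according to a metric time constraint. This is a stand-in under stated assumptions, not the abductive SCIFF procedure: exhaustive enumeration replaces abduction, so it only scales to tiny alphabets and horizons, and it yields grounded traces only (a trace template would instead leave timestamps as variables subject to constraints). The `response_within` constraint and all other names are hypothetical.

```python
from itertools import combinations, product
from typing import Callable, Iterator

Event = tuple[str, int]            # (activity, timestamp)
Trace = tuple[Event, ...]
Constraint = Callable[[Trace], bool]


def response_within(a: str, b: str, deadline: int) -> Constraint:
    """Hypothetical metric constraint: every `a` at time t must be followed
    by some `b` at time t2 with t < t2 <= t + deadline."""
    def check(trace: Trace) -> bool:
        return all(
            any(act2 == b and t < t2 <= t + deadline for act2, t2 in trace)
            for act, t in trace
            if act == a
        )
    return check


def extensions(prefix: Trace, activities: list[str],
               horizon: int, extra: int) -> Iterator[Trace]:
    """Ground every way of appending `extra` events to `prefix`, using
    strictly increasing timestamps after the prefix and <= horizon."""
    last = prefix[-1][1] if prefix else -1
    for acts in product(activities, repeat=extra):
        # combinations() yields strictly increasing timestamp tuples.
        for times in combinations(range(last + 1, horizon + 1), extra):
            yield prefix + tuple(zip(acts, times))


def generate(prefix: Trace, activities: list[str], horizon: int, extra: int,
             model: list[Constraint]) -> tuple[list[Trace], list[Trace]]:
    """Split all grounded continuations into positive and negative traces."""
    positive: list[Trace] = []
    negative: list[Trace] = []
    for trace in extensions(prefix, activities, horizon, extra):
        if all(c(trace) for c in model):
            positive.append(trace)
        else:
            negative.append(trace)
    return positive, negative


# Partial trace supplied by the user: "register" happened at time 0.
prefix: Trace = (("register", 0),)
model = [response_within("register", "pay", deadline=2)]
pos, neg = generate(prefix, ["check", "pay"], horizon=4, extra=1, model=model)
print(len(pos), len(neg))  # 2 6
```

With the partial trace `register` at time 0, only the two continuations with `pay` at time 1 or 2 are positive; the remaining six (any `check`, or a late `pay`) are negative.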

This work presents our abductive approach to log synthesis, together with a study of positive and negative trace generation under open and closed model specifications. The theoretical approach has been put into practice in a first software prototype for synthetic log generation.
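As an assumption for illustration only (the formal definitions used in this work may differ), an open specification can be read as permitting activities the model does not mention, and a closed specification as forbidding them. Under that reading, the same trace may be positive for the open model and negative for the closed one, as in this hypothetical Python sketch; the function, model, and activity names are invented for the example.

```python
from typing import Callable, Sequence

Trace = Sequence[str]
Constraint = Callable[[Trace], bool]


def is_positive_under(trace: Trace, model: list[Constraint],
                      alphabet: set[str], closed: bool) -> bool:
    """Assumed semantics: under a closed specification, any activity outside
    the model's alphabet is a violation; under an open one it is tolerated."""
    if closed and any(act not in alphabet for act in trace):
        return False
    return all(c(trace) for c in model)


# Hypothetical model: "pay" must occur; it mentions only "register" and "pay".
model: list[Constraint] = [lambda t: "pay" in t]
alphabet = {"register", "pay"}
trace = ["register", "audit", "pay"]  # "audit" is not mentioned by the model

print(is_positive_under(trace, model, alphabet, closed=False))  # True
print(is_positive_under(trace, model, alphabet, closed=True))   # False
```

Under this reading, closing a specification shrinks the space of positive traces and correspondingly enlarges the pool from which negative traces can be drawn.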