ActiTraC: Active Trace Clustering
ActiTraC is a novel trace clustering approach based on active learning which significantly outperforms other clustering techniques when applied to complex, large event logs.
Process discovery is the learning task that entails the construction of process models from event logs of information systems. Typically, these event logs are large data sets that contain the process executions by registering what activity has taken place at a certain moment in time.
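To make the notion of an event log concrete, here is a minimal Python sketch. It assumes a flat list of events with the hypothetical field names `case_id`, `activity`, and `timestamp` (these names are illustrative and not taken from the paper or from ProM), and groups the events into traces, i.e. one time-ordered activity sequence per process execution.

```python
from collections import defaultdict
from datetime import datetime
from typing import NamedTuple


class Event(NamedTuple):
    case_id: str         # identifies the process execution (hypothetical field name)
    activity: str        # which activity took place
    timestamp: datetime  # when it took place


def build_traces(events):
    """Group events by case and order each case's activities by time."""
    by_case = defaultdict(list)
    for event in events:
        by_case[event.case_id].append(event)
    return {
        case_id: tuple(
            e.activity for e in sorted(case_events, key=lambda e: e.timestamp)
        )
        for case_id, case_events in by_case.items()
    }


# Illustrative data only, not taken from any of the logs described on this page.
events = [
    Event("c1", "register", datetime(2013, 1, 1, 9, 0)),
    Event("c2", "register", datetime(2013, 1, 1, 9, 5)),
    Event("c1", "resolve", datetime(2013, 1, 1, 10, 0)),
    Event("c2", "resolve", datetime(2013, 1, 1, 11, 0)),
    Event("c2", "escalate", datetime(2013, 1, 1, 9, 30)),
]
traces = build_traces(events)
```

Each value in `traces` is one process execution as a sequence of activities, which is the unit that process discovery and trace clustering techniques operate on.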
By far the most arduous challenge for process discovery algorithms is accurate and comprehensible knowledge discovery from highly flexible environments. Event logs from such flexible systems often contain a large variety of process executions, which is what makes the application of process mining most interesting.
Simply applying existing process discovery techniques will often yield highly incomprehensible process models because of their inaccuracy and complexity. Trace clustering is one very interesting approach to resolving this problem, since it allows an existing event log to be split up so as to facilitate the knowledge discovery process.
With ActiTraC (“Active Trace Clustering”), we propose a novel trace clustering technique that differs significantly from previous approaches. Above all, it starts from the observation that currently available techniques suffer from a large divergence between the clustering bias and the evaluation bias. ActiTraC resolves this bias divergence by employing an approach inspired by active learning.
Please cite the following work if you use ActiTraC:
* De Weerdt, J., vanden Broucke, S., Vanthienen, J., Baesens, B. (2013). Active trace clustering for improved process discovery. IEEE Transactions on Knowledge and Data Engineering, 25 (12), 2708-2720.
ActiTraC is implemented as a ProM 6 plugin and can be installed using the ProM package manager.
Source code can be viewed and downloaded using the TU/Eindhoven SVN repository.
Anonymized versions of the event logs used in the paper can be downloaded below. Please reference our work if you use these data sets.
* [KIM](downloads/KIM.anon.xes.gz): 24770 process instances (1174 distinct), 124217 events, 18 activity types
Helpdesk process of an ICT service at a University
* [MCRM](downloads/MCRM.anon.xes.gz): 956 process instances (212 distinct), 11218 events, 22 activity types
CRM process at a manufacturing company
* [TSL](downloads/TSL.anon.xes.gz): 17812 process instances (1908 distinct), 83286 events, 42 activity types
Second-line CRM process at a telecom company
* [ICP](downloads/ICP.anon.xes.gz): 12391 process instances (1411 distinct), 65653 events, 70 activity types
Incoming document handling at an insurance company
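The figures listed for each log (process instances, distinct traces, events, activity types) can in principle be recomputed from an XES file. The following is a minimal sketch using only the Python standard library. It assumes that each event stores its activity label in a `string` attribute with key `concept:name` (the standard XES concept extension) and that "distinct" refers to distinct activity sequences. The helper names `log_statistics` and `log_statistics_from_file` are hypothetical and not part of ActiTraC or ProM.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip an XML namespace prefix such as '{http://...}trace'."""
    return tag.rsplit("}", 1)[-1]


def log_statistics(source):
    """Compute summary counts for an XES log read from a binary file-like object."""
    variants = set()
    activities = set()
    n_traces = 0
    n_events = 0
    for _, elem in ET.iterparse(source, events=("end",)):
        if _local(elem.tag) != "trace":
            continue
        sequence = []
        for child in elem:
            if _local(child.tag) != "event":
                continue
            # Assumption: the activity label is the concept:name string attribute.
            name = next(
                (
                    attr.get("value")
                    for attr in child
                    if _local(attr.tag) == "string"
                    and attr.get("key") == "concept:name"
                ),
                None,
            )
            sequence.append(name)
            if name is not None:
                activities.add(name)
        n_traces += 1
        n_events += len(sequence)
        variants.add(tuple(sequence))
        elem.clear()  # release the trace's children to limit memory use
    return {
        "process_instances": n_traces,
        "distinct_traces": len(variants),
        "events": n_events,
        "activity_types": len(activities),
    }


def log_statistics_from_file(path):
    """Summarize a gzip-compressed XES file such as a downloaded .xes.gz log."""
    with gzip.open(path, "rb") as handle:
        return log_statistics(handle)


# Tiny illustrative log, not one of the data sets above.
sample = b"""<log>
  <trace>
    <event><string key="concept:name" value="A"/></event>
    <event><string key="concept:name" value="B"/></event>
  </trace>
  <trace>
    <event><string key="concept:name" value="A"/></event>
    <event><string key="concept:name" value="B"/></event>
  </trace>
  <trace>
    <event><string key="concept:name" value="A"/></event>
    <event><string key="concept:name" value="C"/></event>
  </trace>
</log>"""
stats = log_statistics(io.BytesIO(sample))
```

For the sample above, `stats` reports 3 process instances, 2 distinct traces, 6 events, and 3 activity types. After downloading one of the logs, a call such as `log_statistics_from_file("KIM.anon.xes.gz")` produces the same kind of summary, provided the file follows the assumptions stated above.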
Contact the authors at:
* Jochen De Weerdt (corresponding author)
Department of Decision Sciences and Information Management, KU Leuven
Naamsestraat 69, B-3000 Leuven, Belgium
* Seppe vanden Broucke
Department of Decision Sciences and Information Management, KU Leuven
Naamsestraat 69, B-3000 Leuven, Belgium
* Jan Vanthienen
Department of Decision Sciences and Information Management, KU Leuven
Naamsestraat 69, B-3000 Leuven, Belgium
* Bart Baesens
Department of Decision Sciences and Information Management, KU Leuven
Naamsestraat 69, B-3000 Leuven, Belgium