Framsteg Think Tank - Heuristic Algorithm for Process Mining

Written by Jani Podlesny

What is process mining?

Process mining is a technique in the field of process management that lets users analyse business processes based on behavior and event logs. The original idea is to extract knowledge from IT systems and visualize it at the meta level. In most cases this knowledge is recorded in logs. Discovery and conformance metrics allow us to obtain our results and to verify them.

Why is this so important?

Imagine the following situation. Your company has a high response time, or producing a product or delivering a service takes far longer than expected. The reason is not necessarily that the employees lack skill. A more typical reason is a weak point in the value chain of the business process, such as a department that cannot handle its amount of work because of missing or inexperienced personnel.

If you modeled the value chain of the German administration system in a citizen center, you would get a sad result due to the budget cuts of the last years. As a consequence, it takes more than 3 weeks to get an appointment for requesting a passport and another 3 months to receive it. Another example is the production of a car on a factory assembly line. We all know that the weakest point determines the speed of production. Process mining finds the causes of such weak points.

We already discussed a similar topic within the area of security analysis of organized IT crime.

How to mine processes?

There are several approaches to extracting knowledge from event logs. One of the most common is the Alpha Algorithm with its improvements Alpha(+/++):

“The Alpha(+/++) Algorithm aims at reconstructing causality from a set of sequences of events. It was first put forward by van der Aalst, Weijters and Măruşter. Several extensions or modifications of it have since been presented, which will be listed below. Within the concept of the algorithm one takes a workflow log as input and results in a workflow net being constructed. It does so by examining causal relationships observed between tasks. For example, one specific task might always precede another specific task in every execution trace, which would be useful information.”
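To make the quoted idea more concrete, here is a minimal Python sketch of the first step such an algorithm performs: collecting the "directly follows" pairs from the traces and classifying each pair of tasks as causal, parallel, or unrelated. The function names and the example log are assumptions made for illustration only. The sketch stops at this relation table and does not construct the workflow net.

```python
from itertools import product


def directly_follows(traces):
    """Return all pairs (a, b) where b directly follows a in some trace."""
    pairs = set()
    for trace in traces:
        pairs.update(zip(trace, trace[1:]))
    return pairs


def footprint(traces):
    """Classify every ordered pair of tasks by the relation observed in the log."""
    df = directly_follows(traces)
    tasks = sorted({task for trace in traces for task in trace})
    relations = {}
    for a, b in product(tasks, repeat=2):
        forward, backward = (a, b) in df, (b, a) in df
        if forward and backward:
            relations[(a, b)] = "||"   # seen in both orders: parallel
        elif forward:
            relations[(a, b)] = "->"   # a is always before b: causal
        elif backward:
            relations[(a, b)] = "<-"   # reverse causal
        else:
            relations[(a, b)] = "#"    # never adjacent: unrelated
    return relations


# Hypothetical log with three traces, one letter per task.
example_log = [list("ABCD"), list("ACBD"), list("AED")]
relations = footprint(example_log)
print(relations[("A", "B")], relations[("B", "C")], relations[("A", "D")])
```

In this invented log, A directly precedes B but never the other way round, so the pair is treated as causal, while B and C appear in both orders and are treated as parallel.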

So what is the problem?

The heuristic algorithm, which also takes into account how often each relation occurs in the log, has one feature that is both its biggest advantage and its main problem: the threshold. By increasing the threshold we can remove instances with a low frequency. But we have to be careful, because the threshold applies to the entire net and not to single edges within it. Therefore, increasing the threshold always carries the risk of removing process-relevant information, and this is the failure we try to handle.
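The effect of one threshold for the whole net can be sketched in a few lines of Python. This is a simplified illustration, assuming the threshold is a plain frequency count on directly-follows edges. The activity names and the log are invented and are not taken from the example in this article.

```python
from collections import Counter


def edge_frequencies(traces):
    """Count how often each directly-follows edge occurs in the log."""
    counts = Counter()
    for trace in traces:
        counts.update(zip(trace, trace[1:]))
    return counts


def filter_edges(counts, threshold):
    """Keep only edges seen at least `threshold` times (one threshold for the whole net)."""
    return {edge: n for edge, n in counts.items() if n >= threshold}


# Hypothetical log: five regular cases and a single failover case.
log = [["receive", "process", "ship"]] * 5
log.append(["receive", "process", "failover", "ship"])

counts = edge_frequencies(log)
kept = filter_edges(counts, threshold=3)

# The rare failover edges fall below the threshold and are dropped.
print(("process", "failover") in kept)
```

Because the same threshold is applied to every edge, the rare but legitimate failover path is removed together with the actual noise.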

Let me make that clear with a small example. We have the log containing the following entries:

Normally, with the heuristic miner approach, we would increase the threshold to 3 in order to get rid of the assumed noise. To be on the safe side, we would then always have to interview a domain expert. That is not our goal, so we have to think of something else.

The workflow above is the original process represented in our log. If we increased the threshold to 3, our main failover plan would no longer work, because the failover path would be removed from the mined model. This issue affects all processes that involve backup and failover technology, because a backup or failover should appear in only 1 of 1,000,000,000 cases in our event log. It is a failover, after all, and not the average case.

Enclosed is a detailed explanation.

Just imagine we increase the threshold and kick out the failover instances of the process. That would be a state of emergency.

So, how can we handle this disadvantage?

Basically, we came up with a two-step improvement of the heuristic miner.

The Preprocessing Stage is the major part of finding the sibling model of the instance under investigation. Here we compare the log against all logs in our archive, using algorithms close to those of the predictable behavioral analysis group. But instead of comparing behaviors, we look at the activities in order to find related ones. If we have a match, we mark the congruent model.
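One way to picture the activity comparison is the following Python sketch. It is only an illustration under assumptions: the article does not name the similarity measure, so the Jaccard index on activity sets is used as a stand-in, and the archive contents, the function names, and the `min_similarity` cutoff are hypothetical.

```python
def activity_set(traces):
    """Collect the distinct activities that occur in a log."""
    return {activity for trace in traces for activity in trace}


def jaccard(a, b):
    """Jaccard index of two sets: shared elements divided by all elements."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_sibling(traces, archive, min_similarity=0.5):
    """Return (name, score) of the best matching archived model, or (None, score)."""
    activities = activity_set(traces)
    best_name, best_score = None, 0.0
    for name, model_activities in archive.items():
        score = jaccard(activities, model_activities)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= min_similarity:
        return best_name, best_score
    return None, best_score


# Hypothetical archive: model name mapped to the activities of that model.
archive = {
    "order_with_failover": {"receive", "process", "failover", "ship"},
    "hiring": {"apply", "interview", "hire"},
}
new_log = [["receive", "process", "ship"]]
sibling, score = find_sibling(new_log, archive)
print(sibling, score)
```

Here the new log shares three of four activities with the archived order process, so that model would be marked as the congruent one.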

During the Postprocessing Stage we compare the results of our heuristic miner with the results of the preprocessing stage. While increasing the threshold, we can check in real time against the congruent model whether process-relevant activities get kicked out. This lets us increase the threshold without losing process-relevant activities.
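The check against the congruent model could look roughly like the Python sketch below. It assumes a simple frequency threshold on directly-follows edges and a reference model given as a set of activities. All names and the sample log are hypothetical and do not describe the actual SaaS implementation.

```python
from collections import Counter


def mined_activities(traces, threshold):
    """Activities that still have at least one edge after frequency filtering."""
    counts = Counter()
    for trace in traces:
        counts.update(zip(trace, trace[1:]))
    kept = set()
    for (source, target), n in counts.items():
        if n >= threshold:
            kept.update((source, target))
    return kept


def lost_activities(traces, threshold, reference_activities):
    """Reference activities present in the log but missing from the filtered model."""
    in_log = {activity for trace in traces for activity in trace}
    expected = reference_activities & in_log
    return expected - mined_activities(traces, threshold)


def highest_safe_threshold(traces, reference_activities, max_threshold):
    """Largest threshold that keeps every reference activity (0 if none does)."""
    safe = 0
    for threshold in range(1, max_threshold + 1):
        if lost_activities(traces, threshold, reference_activities):
            break
        safe = threshold
    return safe


# Hypothetical log: five regular cases and a single failover case.
log = [["receive", "process", "ship"]] * 5
log.append(["receive", "process", "failover", "ship"])
reference = {"receive", "process", "failover", "ship"}

print(lost_activities(log, 3, reference))
print(highest_safe_threshold(log, reference, max_threshold=10))
```

With the failover activity in the reference model, any threshold above 1 is reported as unsafe for this log. Without it, the threshold could be raised until a regular activity would be lost.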

We developed this technique as SaaS. If you want to know more about it, just step over to our project site:

http://process-mining.framsteg.de

Thanks to our partners who helped us create a comparable archive of sample processes for the comparison part.

Jani Podlesny

Head of Engineering

I focus on Data Architecture and Analytics for Management Consulting across EMEA and the US. Following my passion for Data Profiling & Privacy, I am pursuing PhD research at the Hasso Plattner Institute.