Hadoop Tutorials.CO.IN
Big Data - Hadoop - Hadoop Ecosystem - NoSQL - Spark

Show now !!

Complex Event Processing (CEP) on disparate, high frequency data streams using Apache Flink and Kafka

by Tanmay Deshpande

What is Complex Event Processing (CEP)?

CEP is a technique to analyze stream of disparate events occurring with high frequency and low latency. In Oil & Gas industry we can imagine it to be sensors data coming from drilling equipment or sensors data from upstream assembly sending information about the temperature, pressure etc. It is very important to analyze the variation patterns to get notified in real time about any change in regular assembly. CEP includes understanding the patterns across the stream of events, sub-events and their sequence. CEP helps identifying meaningful patterns, complex relationships among unrelated events and sending notifications in real and near real time to avoid any damage.

CEP includes various challenges as

  • Ability to produce results as soon as the input event stream is available.
  • Ability to provide computations like aggregation over time, timeout between two events of interest
  • Ability to provide real time/near real time alerts & notifications on detection of complex event patterns
  • Ability to connect and correlate heterogeneous sources and analyze patterns in them.
  • Ability to achieve high throughput, low latency processing.


There are various solutions available in the market. With Big Data technology advancements, we have multiple options like Apache Spark, Apache Samza, and Apache Beam etc. Out of which in this article we are going to talk about one upcoming framework called Apache Flink. Apache Flink is an open source platform for distributed stream and batch data processing. Flink's core is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams.

In order to build a Complex Event Processing engine, I am proposing following architecture

In above system, we will have sensors from various systems sending data to our engine in real time and the events would be persisted in Kafka queues. Kafka being reliable persistent distributed messaging framework, guarantees storage and delivery of messages. Apache Flink's stream engine would listen to the topics and then let the streams pass through. We can define the certain patterns and try to match the patterns against the stream of events. If the pattern matches, notification can be generated.

Following are some code snippets

Defining patterns

Matching patterns with input streams

Generating Alerts

Use of CEP in Oil & Gas

There are various places where the above mentioned system can be used

  • Downstream Use cases - Real Time Drill monitoring, Live tool performance monitoring & failure detection, sub-surface temperature, pressure monitoring & alerting etc.
  • 2. Upstream Use Cases - Monitor production platforms, Monitor temperature, pressure at various joints etc.


Follow us on Twitter

Recommended for you