Using Join Enrichment and Fork Enrichment in Apache NiFi

Stevanus Mardiady

Sep 30, 20242 min read

Updated: Oct 23, 2024

Apache NiFi is a powerful data integration tool that allows you to automate the flow of data between systems. Two useful features for enriching data in NiFi are Join Enrichment and Fork Enrichment. In this post, we’ll explain how to use these features, along with sample data and step-by-step instructions.

Fork Enrichment:

Fork Enrichment is used when you want to enrich data by splitting it into multiple streams and performing operations on each stream separately.

Nifi docs: ForkEnrichment

Join Enrichment:

Join Enrichment allows you to combine two datasets based on a common key.

Nifi docs: JoinEnrichment

Set Up The Flow in NiFi

NiFi Flow Explanation:

Generate Customer Data:
We start by generating customers.csv using the GenerateFlowFile processor, which creates flow files based on our sample data
Fork Enrichment:
After generating the flow file for customers.csv, we run the ForkEnrichment processor to split the flow file into two distinct outputs. Each resulting flow file will contain attributes that help identify their role:
a. Original Flow File (example):
1. enrichment.group.id: 8722b85a-0772-4563-9a9a-b197361fc61e (auto-generate)
2. enrichment.role: ORIGINAL
b. Enrichment Flow File (example):
1. enrichment.group.id: 8722b85a-0772-4563-9a9a-b197361fc61e (auto-generate)
2. enrichment.role: ENRICHMENT
Generate Order Data:
In this step, we use the ReplaceText processor to create the orders.csv data, generating the necessary flow files.
Join Data:
Once we have the order data, we will join the original and enrichment flow files based on the customer_id. We accomplish this using a SQL query with a LEFT OUTER JOIN strategy.
Output of the Join:
The output of the join operation will be generated in the joined relationships, providing a comprehensive view that combines customer details with their corresponding orders.
The query used is:

SELECT o.customer_id, o.customer_name, o.customer_city, e.order_id, e.order_amount FROM original o LEFT OUTER JOIN enrichment e ON o.customer_id = e.customer_id

Sample Data

Filename: customers.csv

Filename: orders.csv

Output Data:

Processor Properties

Generate customers.csv

Generate orders.csv

ForkEnrichment

JoinEnrichment

Fork & Join Enrichment Use Cases

In our use case, we utilize the Join Enrichment and Fork Enrichment processors in Apache NiFi when we have an input file and a control file which contain some informations to be validated and enriched in the input file. Combining these files, we can process the enriched data and insert it into the database in a single operation. Additionally, these processors are useful for wait-and-notify scenarios, where we need to ensure specific data is deleted from the database before inserting new entries.

The benefits of using these processors include:

The ability to join content from two files seamlessly.
Eliminating the need to send data to the database just to perform a look up validation, allowing for a more efficient workflow.
Optimizing resource usage within NiFi without relying on external resources.

Step-by-Step Video Guide

Using Join Enrichment and Fork Enrichment in Apache NiFi

Using Join Enrichment and Fork Enrichment in Apache NiFi

Set Up The Flow in NiFi

Sample Data

Processor Properties

Fork & Join Enrichment Use Cases

Step-by-Step Video Guide

Recent Posts

Comentarios