Apache NiFi is a powerful data integration tool that allows you to automate the flow of data between systems. Two useful features for enriching data in NiFi are Join Enrichment and Fork Enrichment. In this post, we’ll explain how to use these features, along with sample data and step-by-step instructions.
Fork Enrichment:
Fork Enrichment is used when you want to enrich data by splitting it into multiple streams and performing operations on each stream separately.
Nifi docs: ForkEnrichment
Join Enrichment:
Join Enrichment allows you to combine two datasets based on a common key.
Nifi docs: JoinEnrichment
Set Up The Flow in NiFi
NiFi Flow Explanation:
Generate Customer Data:
We start by generating customers.csv using the GenerateFlowFile processor, which creates flow files based on our sample data
Fork Enrichment:
After generating the flow file for customers.csv, we run the ForkEnrichment processor to split the flow file into two distinct outputs. Each resulting flow file will contain attributes that help identify their role:
a. Original Flow File (example):
enrichment.group.id: 8722b85a-0772-4563-9a9a-b197361fc61e (auto-generate)
enrichment.role: ORIGINAL
b. Enrichment Flow File (example):
enrichment.group.id: 8722b85a-0772-4563-9a9a-b197361fc61e (auto-generate)
enrichment.role: ENRICHMENT
Generate Order Data:
In this step, we use the ReplaceText processor to create the orders.csv data, generating the necessary flow files.
Join Data:
Once we have the order data, we will join the original and enrichment flow files based on the customer_id. We accomplish this using a SQL query with a LEFT OUTER JOIN strategy.
Output of the Join:
The output of the join operation will be generated in the joined relationships, providing a comprehensive view that combines customer details with their corresponding orders.
The query used is:
SELECT o.customer_id, o.customer_name, o.customer_city, e.order_id, e.order_amount FROM original o LEFT OUTER JOIN enrichment e ON o.customer_id = e.customer_id
Sample Data
Filename: customers.csv
Filename: orders.csv
Output Data:
Processor Properties
Generate customers.csv
Generate orders.csv
ForkEnrichment
JoinEnrichment
Fork & Join Enrichment Use Cases
In our use case, we utilize the Join Enrichment and Fork Enrichment processors in Apache NiFi when we have an input file and a control file which contain some informations to be validated and enriched in the input file. Combining these files, we can process the enriched data and insert it into the database in a single operation. Additionally, these processors are useful for wait-and-notify scenarios, where we need to ensure specific data is deleted from the database before inserting new entries.
The benefits of using these processors include:
The ability to join content from two files seamlessly.
Eliminating the need to send data to the database just to perform a look up validation, allowing for a more efficient workflow.
Optimizing resource usage within NiFi without relying on external resources.
Step-by-Step Video Guide
Using Join Enrichment and Fork Enrichment in Apache NiFi
Comments