top of page
Writer's pictureStevanus Mardiady

Using Join Enrichment and Fork Enrichment in Apache NiFi

Updated: Oct 23

Apache NiFi is a powerful data integration tool that allows you to automate the flow of data between systems. Two useful features for enriching data in NiFi are Join Enrichment and Fork Enrichment. In this post, we’ll explain how to use these features, along with sample data and step-by-step instructions.

Fork Enrichment:

Fork Enrichment is used when you want to enrich data by splitting it into multiple streams and performing operations on each stream separately.

Nifi docs: ForkEnrichment


Join Enrichment:

Join Enrichment allows you to combine two datasets based on a common key.

Nifi docs: JoinEnrichment

 
Set Up The Flow in NiFi

NiFi Flow Explanation:

  1. Generate Customer Data:

    We start by generating customers.csv using the GenerateFlowFile processor, which creates flow files based on our sample data

  2. Fork Enrichment:

    After generating the flow file for customers.csv, we run the ForkEnrichment processor to split the flow file into two distinct outputs. Each resulting flow file will contain attributes that help identify their role:

    a. Original Flow File (example):

    1. enrichment.group.id: 8722b85a-0772-4563-9a9a-b197361fc61e (auto-generate)

    2. enrichment.role: ORIGINAL

    b. Enrichment Flow File (example):

    1. enrichment.group.id: 8722b85a-0772-4563-9a9a-b197361fc61e (auto-generate)

    2. enrichment.role: ENRICHMENT

  3. Generate Order Data:

    In this step, we use the ReplaceText processor to create the orders.csv data, generating the necessary flow files.

  4. Join Data:

    Once we have the order data, we will join the original and enrichment flow files based on the customer_id. We accomplish this using a SQL query with a LEFT OUTER JOIN strategy.

  5. Output of the Join:

    The output of the join operation will be generated in the joined relationships, providing a comprehensive view that combines customer details with their corresponding orders.

    The query used is:

SELECT o.customer_id, o.customer_name, o.customer_city, e.order_id, e.order_amount FROM original o LEFT OUTER JOIN enrichment e ON o.customer_id = e.customer_id
 
Sample Data

Filename: customers.csv

Filename: orders.csv

Output Data:

 
Processor Properties
  • Generate customers.csv

Processor Name: Generate FlowFile
  • Generate orders.csv

Processor Name: ReplaceText
  • ForkEnrichment

Processor Name: ForkEnrichment
  • JoinEnrichment

Processor Name: JoinEnrichment
 
Fork & Join Enrichment Use Cases

In our use case, we utilize the Join Enrichment and Fork Enrichment processors in Apache NiFi when we have an input file and a control file which contain some informations to be validated and enriched in the input file. Combining these files, we can process the enriched data and insert it into the database in a single operation. Additionally, these processors are useful for wait-and-notify scenarios, where we need to ensure specific data is deleted from the database before inserting new entries.


The benefits of using these processors include:

  1. The ability to join content from two files seamlessly.

  2. Eliminating the need to send data to the database just to perform a look up validation, allowing for a more efficient workflow.

  3. Optimizing resource usage within NiFi without relying on external resources.

 
Step-by-Step Video Guide

Using Join Enrichment and Fork Enrichment in Apache NiFi


Recent Posts

See All

Comments


bottom of page