Learning journey to Apache Nifi

Getting hands on Apache Nifi


Here are the steps I walked through while learning Apache Nifi.

Background

Apache Nifi (pronounced "nye fye") is a decent tool. In many cases it lets you solve a task without writing code. It also forced me to think in a flow-based style, which was a totally unknown area for me.

When teams and developers don’t invest enough time in learning it, the tool frequently ends up being used as an orchestrator only. Orchestration is a small fraction of Nifi’s rich feature set, which includes many built-in, ready-to-use data processors. The learning curve is a bit steeper than learning how to print your name with Python. That’s why I decided to start sharing ideas on how to solve various tasks with flow-based programming using Nifi. I hope this article is the beginning of an ongoing series.

Knowledge sources

These sources helped me a lot in understanding what I do all day long.

  1. Wikipedia - read at least the page on flow-based programming to get an idea of this programming paradigm.
  2. Official Apache Nifi documentation
    1. Apache Nifi Overview explains the concepts and the programming perspective.
    2. Apache Nifi Terminology explains briefly the system’s domain model.
  3. Apache Nifi source code is the most genuine source of truth.
  4. Cloudera/Hortonworks Community Forum presents many case studies and implementation directions.
  5. Cloudera Dataflow Platform (CDF)
  6. Best of Nifi presented by Pierre Villard
  7. Stack Overflow questions on Nifi
  8. ExecuteScript Cookbook
  9. ExecuteScript Cookbook - json2json
  10. Trial & Error experience

Environment and tools

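  1. The official Apache Nifi docker image runs the system in a single-node execution mode.
# The 8080 mapping assumes an image that serves the UI over plain HTTP.
# Nifi 1.14.0 and later default to HTTPS on port 8443, so recent images need -p 8443:8443 instead.
$ sudo docker run --name nifi -p 8080:8080 -d apache/nifi:latest
  2. Any OS that supports docker. I use Ubuntu.
  3. Load an existing workflow on the docker container’s startup.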
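
Nifi needs some time to boot after `docker run` returns, so the UI is not reachable immediately. A minimal sketch of a polling helper follows. `wait_for` is a hypothetical name of my own, and the URL in the example assumes the plain-HTTP port mapping shown above.

```shell
# Retry a command once per second until it succeeds or the timeout expires.
# wait_for is a hypothetical helper, not part of Docker or Nifi.
# Usage: wait_for <timeout-seconds> <command> [args...]
wait_for() {
  timeout_s=$1
  shift
  elapsed=0
  until "$@" >/dev/null 2>&1; do
    # Give up once the allowed number of seconds has passed
    if [ "$elapsed" -ge "$timeout_s" ]; then
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}

# Example (assumes the UI is served over plain HTTP on port 8080):
#   wait_for 120 curl -fs http://localhost:8080/nifi/ && echo "Nifi is up"
```

The helper only checks the exit code of the probe command, so the `curl` call can be swapped for any other readiness check.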

Configuration

Docker

Tested with Ubuntu 18.04.

I had to change the docker root directory because /var is a small partition on my system. By default, Docker stores its data in the /var/lib/docker directory. The steps are:

# Stop the docker daemon
$ sudo service docker stop

# Create the new docker root directory
$ sudo mkdir -p /opt/docker
# Copy all existing docker data to the new docker root location
$ sudo rsync -a /var/lib/docker/ /opt/docker/
# Move the old directory out of the way; otherwise ln would create the link inside it
$ sudo mv /var/lib/docker /var/lib/docker.bak
# Many 3rd party tools expect docker to be located in /var/lib/docker
$ sudo ln -s /opt/docker /var/lib/docker

# Start the daemon again and verify the new root directory
$ sudo service docker start
$ sudo service docker status
$ sudo docker info
# Output
...
Docker Root Dir: /opt/docker
...
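
The last verification step can be scripted instead of read by eye. Below is a small sketch that pulls the `Docker Root Dir` line out of the `docker info` output and compares it with the expected path. The function names `docker_root_dir` and `check_docker_root` are my own and hypothetical.

```shell
# Print the value of the "Docker Root Dir" line from `docker info` output on stdin.
docker_root_dir() {
  sed -n 's/^[[:space:]]*Docker Root Dir:[[:space:]]*//p'
}

# Succeed only if the reported root directory equals the expected path.
# Usage: sudo docker info | check_docker_root /opt/docker
check_docker_root() {
  expected=$1
  actual=$(docker_root_dir)
  [ "$actual" = "$expected" ]
}

# Example:
#   sudo docker info 2>/dev/null | check_docker_root /opt/docker && echo "root dir moved"
```

Both functions read from stdin, so they can also be tried on saved `docker info` output without touching the daemon.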

Tasks

Here is the list of tasks I covered with Nifi.

  1. Save large XML files into JSON-formatted documents in ElasticSearch (ES) store.
  2. Send a mobile push notification when an XML file is processed successfully.
  3. Clean invalid cache data when a new XML file is processed.
  4. Save ES data without disturbing current production users. I leverage ES’s alias/index feature.
  5. Create maintenance jobs to clean up old resources.
  6. Invoke CLI tools/scripts.
  7. Consume REST API via HTTP requests.
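
To give an idea of the alias/index approach from task 4: one common pattern is to write new data into a fresh index and then repoint an alias to it in a single `_aliases` request, so readers never see a half-loaded index. The sketch below only builds the request body. `alias_swap_body`, the host, and the index names are hypothetical, and this is not necessarily the exact flow I run in Nifi.

```shell
# Build the JSON body for an atomic alias switch through Elasticsearch's _aliases API.
# alias_swap_body is a hypothetical helper name.
# Usage: alias_swap_body <alias> <old-index> <new-index>
alias_swap_body() {
  alias_name=$1
  old_index=$2
  new_index=$3
  # Both actions are applied together: detach the old index, attach the new one
  printf '{"actions":[{"remove":{"index":"%s","alias":"%s"}},{"add":{"index":"%s","alias":"%s"}}]}\n' \
    "$old_index" "$alias_name" "$new_index" "$alias_name"
}

# Example (hypothetical host and index names):
#   alias_swap_body docs docs_v1 docs_v2 |
#     curl -s -X POST 'http://localhost:9200/_aliases' \
#       -H 'Content-Type: application/json' -d @-
```

Production clients keep querying the alias name only, so the switch needs no change on their side.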

Stanislav Petrov BLOG · TECH
apache nifi flow-based programming dataflow programming big data