Getting hands-on with Apache Nifi
Learning journey to Apache Nifi
Background
Apache Nifi (pronounced "nye fye") is a decent tool. It minimizes the need to write code in many cases. The tool forced me to think in a flow-based style, which was a totally unknown area for me.
The tool is frequently used only as an orchestrator when teams/developers don’t invest enough time in learning it. Orchestration is a small fraction of Nifi’s rich feature set: the tool provides many built-in, ready-to-use data processors. The learning curve is a bit steeper than learning how to print your name with Python. That’s why I decided to start sharing ideas on how to solve various tasks with flow-based programming using Nifi. I hope this article is the beginning of an ongoing series.
Knowledge sources
These sources helped me a lot to start understanding what I’m doing all day long.
- Wikipedia - read at least the flow-based programming page to get an idea of this programming paradigm.
- Official Apache Nifi documentation
- Apache Nifi Overview explains the concepts and the programming perspective.
- Apache Nifi Terminology explains briefly the system’s domain model.
- Apache Nifi source code is the most genuine source of truth.
- Cloudera/Hortonworks Community Forum presents many case studies and implementation directions.
- Cloudera Dataflow Platform (CDF)
- Best of Nifi presented by Pierre Villard
- Stackoverflow questions on Nifi
- ExecuteScript Cookbook
- ExecuteScript Cookbook - json2json
- Trial & Error experience
Environment and tools
- Official Apache Nifi docker image runs the system in a single-node execution mode.
$ sudo docker run --name nifi -p 8080:8080 -d apache/nifi:latest
- Any OS that supports docker. I use Ubuntu.
- Load an existing workflow on the docker container’s startup.
Configuration
Docker
Tested with Ubuntu 18.04.
I had to change the docker root directory because /var is a small partition on my system. By default, Docker stores its data in the /var/lib/docker directory. The steps are:
# Stop docker daemon
$ sudo service docker stop
# Create new docker root directory
$ sudo mkdir /opt/docker
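# Copy all existing docker data to the new docker root location
$ sudo rsync -a /var/lib/docker/ /opt/docker/
# Move the old directory aside; otherwise ln would create the link inside it
$ sudo mv /var/lib/docker /var/lib/docker.old
# Many 3rd party tools expect docker to be located in /var/lib/docker
$ sudo ln -s /opt/docker /var/lib/docker
# Start the daemon again and verify the new root directory
$ sudo service docker start
$ sudo service docker status
$ sudo docker info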
$ sudo docker info
# Output
...
Docker Root Dir: /opt/docker
...
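The final check in the steps above can be scripted instead of read by eye. The snippet below is only a sketch: the function name docker_root_dir is made up for illustration, and it assumes that docker info prints a "Docker Root Dir:" line in the same form as the output shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of docker): reads `docker info` text on stdin
# and prints only the value of the "Docker Root Dir" line.
docker_root_dir() {
  sed -n 's/^[[:space:]]*Docker Root Dir:[[:space:]]*//p'
}

# Demo on a captured sample line, so the snippet runs without a docker daemon.
sample=' Docker Root Dir: /opt/docker'
printf '%s\n' "$sample" | docker_root_dir

# Against a live daemon, the same helper could be used like this:
# [ "$(sudo docker info 2>/dev/null | docker_root_dir)" = "/opt/docker" ] && echo "root dir moved"
```

The helper only parses text, so it can be tried on a saved copy of the docker info output before pointing it at a running daemon.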
Tasks
Here is a list of tasks I have covered with Nifi.
- Save large XML files as JSON-formatted documents in an Elasticsearch (ES) store.
- Send a mobile push notification when an XML file is processed successfully.
- Clean invalid cache data when a new XML file is processed.
- Save ES data without disturbing current production users by leveraging ES’ alias/index feature.
- Create maintenance jobs to clean up old resources.
- Invoke CLI tools/scripts.
- Consume REST API via HTTP requests.
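To give an idea of the alias/index item in the list: a common pattern is to load new data into a fresh index and then repoint an alias to it in one request to the ES _aliases endpoint, so readers never see a half-loaded index. The snippet below is a minimal sketch of the request body only, not the actual flow. The alias name docs, the index names docs_v1 and docs_v2, the function name, and the localhost:9200 address are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds the body of an Elasticsearch _aliases request
# that detaches an alias from one index and attaches it to another.
# Arguments: $1 = alias, $2 = index to detach, $3 = index to attach.
alias_swap_payload() {
  printf '{"actions":[{"remove":{"index":"%s","alias":"%s"}},{"add":{"index":"%s","alias":"%s"}}]}\n' \
    "$2" "$1" "$3" "$1"
}

# Illustrative names only: alias "docs" moves from docs_v1 to docs_v2.
alias_swap_payload docs docs_v1 docs_v2

# To apply it against a cluster (address is an assumption):
# curl -s -X POST 'http://localhost:9200/_aliases' \
#   -H 'Content-Type: application/json' \
#   -d "$(alias_swap_payload docs docs_v1 docs_v2)"
```

Inside Nifi, a body like this could be sent with an HTTP processor such as InvokeHTTP once the new index is fully loaded.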
Stanislav Petrov BLOG · TECH