Let's continue with part 2.

As we know, an Apache Beam pipeline flows like a waterfall from top to bottom and contains no cycles. This is what we call a "DAG", or "Directed Acyclic Graph".

We write Beam code in Python, and we can also render its DAG as a figure in a few steps.


1. Install Graphviz

Graphviz is a common package for generating diagrams from the DOT language. We need to install it first, and the installation method depends on your platform. See the full download list at https://graphviz.org/download/
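To get a feel for what a DOT script looks like, here is a minimal sketch in Python that writes one by hand for a three-step chain. The function name `pipeline_to_dot` and the step names are made up for illustration, and this is not the script that Beam produces.

```python
def pipeline_to_dot(steps):
    """Return DOT source describing a straight chain of steps."""
    lines = ["digraph pipeline {", "  rankdir=TB;"]  # TB = draw top to bottom
    # Connect every step to the one that follows it.
    for upstream, downstream in zip(steps, steps[1:]):
        lines.append(f'  "{upstream}" -> "{downstream}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical step names, only for illustration.
dot_source = pipeline_to_dot(["Create", "Map", "Print"])
print(dot_source)
```

Saving that text to a file and passing it to the `dot` command is how Graphviz turns it into an image.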

Personally, I prefer using brew.

brew install graphviz

Verify that Graphviz has been installed properly with this command.

dot -V # capital `V`

Then we should see its version.
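Because the figure is drawn by the `dot` executable, it can also help to confirm that the same Python environment that runs Beam can find it. The sketch below is only an assumption about how one might check; `graphviz_version` is a hypothetical helper name.

```python
import shutil
import subprocess


def graphviz_version():
    """Return the output of `dot -V`, or None when `dot` is not on PATH."""
    dot_path = shutil.which("dot")
    if dot_path is None:
        return None
    # Merge stderr into stdout so the banner is captured wherever dot writes it.
    completed = subprocess.run(
        [dot_path, "-V"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        check=False,
    )
    return completed.stdout.strip()


version = graphviz_version()
print(version if version else "dot was not found on PATH")
```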

Read more about brew at the link below.

Homebrew - One place for all
Homebrew is a package manager for macOS and Linux. Most of the necessary, popular, or essential packages (and programs) can be found here.

2. Apply RenderRunner in Beam

Now we go back to our Beam code and update it to run with RenderRunner.

We use RenderRunner to generate a DOT script for Graphviz. Read more about this runner in its documentation.

We also pass beam.options.pipeline_options.PipelineOptions() as the options parameter; without it, no figure is generated.
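A minimal sketch of that change, assuming RenderRunner is imported from `apache_beam.runners.render`. The function name `build_pipeline` is hypothetical, and the imports sit inside the function only so that the file can be loaded before Beam is installed.

```python
def build_pipeline(argv=None):
    """Create a Beam pipeline that uses RenderRunner (a sketch, not the only way)."""
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions
    from apache_beam.runners.render import RenderRunner

    # With argv=None, PipelineOptions falls back to the command-line arguments,
    # which is how a flag such as --render_output reaches the runner.
    options = PipelineOptions(argv)
    return beam.Pipeline(runner=RenderRunner(), options=options)
```

The returned pipeline is used the same way as before, for example `with build_pipeline() as p:` followed by the transforms.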


3. Execute

Let's say our complete code is saved as main.py.

Next, we run it with the parameter --render_output="<path>". For example:
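As a stand-in for such a file, here is a small sketch of a whole pipeline with unnamed steps. The fruit data and the names `to_upper` and `run` are assumptions made up for illustration.

```python
def to_upper(word):
    """Transform applied to each element: upper-case a single word."""
    return word.upper()


def run(argv=None):
    """Build the pipeline and hand it to RenderRunner."""
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions
    from apache_beam.runners.render import RenderRunner

    options = PipelineOptions(argv)  # picks up --render_output=<path>
    with beam.Pipeline(runner=RenderRunner(), options=options) as p:
        (
            p
            | beam.Create(["apple", "banana", "cherry"])  # made-up sample data
            | beam.Map(to_upper)
            | beam.Map(print)
        )
```

To use it as a script, call `run()` at the end of the file, typically under an `if __name__ == "__main__":` guard.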

python3 main.py --render_output="dag.png"

This produces the file "dag.png", which contains the rendered DAG.

However, if we give each step a name, the generated figure shows the names we chose.
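Naming a step in Beam is done with the `"label" >> transform` syntax. The sketch below assumes a pipeline object `p` created with RenderRunner; the labels, the sample data, and the names `word_length` and `add_named_steps` are hypothetical.

```python
def word_length(word):
    """Transform applied to each element: count the characters in a word."""
    return len(word)


def add_named_steps(p):
    """Attach three labeled steps to an existing pipeline `p`."""
    import apache_beam as beam

    return (
        p
        | "Create fruits" >> beam.Create(["apple", "banana", "cherry"])
        | "Measure length" >> beam.Map(word_length)
        | "Print result" >> beam.Map(print)
    )
```

Each label has to be unique within a pipeline, so descriptive names also help avoid clashes when the same transform is applied more than once.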