Let's try: Apache Beam part 2 – draw the graph
in this series
- Let's try: Apache Beam part 1 – simple batch
- Let's try: Apache Beam part 2 – draw the graph
- Let's try: Apache Beam part 3 - my own functions
- Let's try: Apache Beam part 4 - live on Google Dataflow
- Let's try: Apache Beam part 5 - transform it with Beam functions
- Let's try: Apache Beam part 6 - instant IO
- Let's try: Apache Beam part 7 - custom IO
- Let's try: Apache Beam part 8 - Tags & Side inputs
Let's continue with part 2.
As we know, an Apache Beam pipeline processes data like a waterfall, from top to bottom, with no cycles. This is what we call a "DAG", or "Directed Acyclic Graph".
We write Beam code in Python, and we can also generate a visual figure of the DAG in a few steps.
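To make the idea concrete before touching Beam, here is a small sketch of what a DAG looks like when written in the DOT language that Graphviz reads. The `to_dot` helper and the step names are made up for illustration; this is not how Beam builds its graph internally.

```python
def to_dot(edges):
    """Build a DOT digraph from (source, target) pairs."""
    lines = ["digraph pipeline {"]
    for source, target in edges:
        # Each edge points one way only, and we never point back upstream,
        # so the graph stays directed and acyclic.
        lines.append(f'  "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical step names, ordered top to bottom like a pipeline.
steps = [("Read", "Transform"), ("Transform", "Write")]
print(to_dot(steps))
```

Saving that output to a file and passing it to the `dot` command is roughly what a rendering tool does for us behind the scenes.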
1. Install Graphviz
graphviz is a common package for generating diagrams using the DOT language. We need to install it first, and the installation method depends on your platform. See the full download list at https://graphviz.org/download/
For me, I prefer using brew.
brew install graphviz
Verify that graphviz has been installed properly with this command.
dot -V # capital `V`
Then we should see its version.
Read more about brew at the link below.
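If we want the same check from Python, for example at the top of a script, a minimal sketch could look like this. The `graphviz_version` helper is a name I made up; it only uses the standard library and assumes the `dot` executable is on `PATH` once graphviz is installed.

```python
import shutil
import subprocess


def graphviz_version():
    """Return the text printed by `dot -V`, or None if `dot` is not on PATH."""
    dot_path = shutil.which("dot")
    if dot_path is None:
        return None
    # Capture both streams, since the version text may arrive on stderr
    # rather than stdout.
    result = subprocess.run(
        [dot_path, "-V"], capture_output=True, text=True, check=False
    )
    return (result.stdout or result.stderr).strip()


if __name__ == "__main__":
    print(graphviz_version() or "Graphviz `dot` was not found on PATH")
```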
2. Apply RenderRunner in Beam
Now we go back to our Beam code and update it to use RenderRunner, which generates a DOT script for graphviz. Read more about this runner in its documentation.
We also pass beam.options.pipeline_options.PipelineOptions() as the options parameter; otherwise it won't generate a figure.
3. Execute
Let's say we have complete code like this.
Next, we run it with the parameter --render_output="<path>". For example:
python3 main.py --render_output="dag.png"
We will then get "dag.png", which contains the figure of our DAG.
However, if we name the steps, the generated figure also shows the names we gave.