DAG integrity - unit test your DAG before deploy
Hi, I guess you are not bored of Airflow stuff yet.
In this blog, we are going to see how we can make sure our DAGs are in good shape. With a DAG integrity check, we can ensure that our DAGs are executable and free of errors at a basic level.
The objective
At the unit test step, we just want to guarantee that our DAGs are imported correctly, with no syntax errors or library import errors. We are not proving that our pipeline is perfect at this point; we do that in integration tests or end-to-end tests.
What should we do now?
DagBag
DagBag is a class in Airflow's models module. It collects the DAGs parsed from a DAG folder and keeps them in a structure together with their metadata, such as import errors. For more info, visit this link.
We can use this class to verify that our DAGs are imported properly. After importing DagBag and instantiating it as dagbag, we can print its .dags and .import_errors attributes to see the list of DAGs and the list of errors, if any.
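As a rough illustration of the idea, here is a minimal sketch. It assumes Airflow 2.x, where DagBag can be imported from airflow.models, and a DAG folder named "dags". The folder name and the helper names load_dagbag and summarize are made up for this example.

```python
def load_dagbag(dag_folder="dags"):
    """Parse every DAG file under dag_folder (hypothetical path) into a DagBag."""
    # Imported lazily so the helper below also works where Airflow is not installed.
    from airflow.models import DagBag

    return DagBag(dag_folder=dag_folder, include_examples=False)


def summarize(dagbag):
    """Return printable lines for the loaded DAG ids and any import errors."""
    # .dags maps dag_id -> DAG object; .import_errors maps file path -> error text.
    lines = [f"DAGs found: {sorted(dagbag.dags)}"]
    for path, error in dagbag.import_errors.items():
        lines.append(f"Import error in {path}: {error}")
    return lines
```

With Airflow installed, something like `print("\n".join(summarize(load_dagbag("dags"))))` should list the DAG ids and, if any file failed to import, the file path with its error.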
This is similar to the commands we used in the last blog. Follow the link below if you want to re-read it.
Combine with unittest
We use unittest together with DagBag to test whether the target DAG exists.
Please follow this link for the complete script.
Now we run this command to validate the DAG and check that it is found in the DagBag.
python tests/dag_integrity.py
If DAGs are good, we shall see this message.
Otherwise, it will show an error like this.
Further applications
This is a great unit test to run before deploying our apps to a server, whether preproduction or production.
We could add the command to our CI/CD stages on whichever platform we prefer: GitHub Actions, Bitbucket Pipelines, Google Cloud Build and others.
Have a great day with no bugs.