Let's try: Airflow 2

Airflow 2 is the new major version of Airflow. You can go back and read the tutorial I wrote about Airflow 1 below.

Try Apache Airflow
Apache Airflow is an open-source program under the Apache Software Foundation. It lets us define steps and run them in arbitrary sequences and under conditions, like a flow.

What's new?

From what I read in the changelog, here are the big changes that we can see and use as end users.

New UI

Of course, it comes with a new, cleaner user interface and a more understandable history page.

source: https://airflow.apache.org/blog/airflow-two-point-oh-is-here/

TaskFlow API

Now we can put the decorator @task on top of a Python function to turn it into a Python operator task. This feature is new to me as well, and it is meant to make the code cleaner and easier to read. I will write about it later.

And so on.

A little gimmick here: a spinning fan.

You can visit the official page here to read all the changes.


Installation in Docker

I did have time back then to build an Airflow 1 container on my MacBook, but I found the official image at that time wasn't good enough. Information on the website was lacking and there was a lot of configuration to go through, so I ended up using Puckel's image instead.

Now the official Docker Compose file for Airflow 2 has been released here, so I no longer need to look for a more reliable alternative.

I have assembled the steps defined on the Airflow documentation page into a single repo below. You can clone it and try it yourself.

GitHub - bluebirz/airflow-docker: Docker-compose for local airflow development

Details in the repo

  • The installation starts from docker-compose.yaml. If you want to get more familiar with it, you can visit my latest blog post below.
  • The original docker-compose.yaml relies on the default image, but I want the ability to add extra Python packages, so I created a simple Dockerfile. You can add yours to requirements.txt and constraints.txt if needed.
  • Disable sample DAGs at line 59 of docker-compose.yaml:
  AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
  • Prepare all the necessary directories: /dags, /logs, and /plugins. We mostly use /dags for our work.
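The directory preparation can be sketched in a few lines of Python. This is only an illustration, assuming the three folders sit next to docker-compose.yaml; the function name prepare_airflow_dirs is hypothetical and not part of the repo.

```python
from pathlib import Path


def prepare_airflow_dirs(project_root="."):
    """Create the dags, logs, and plugins folders if they are missing."""
    root = Path(project_root)
    created = []
    for name in ("dags", "logs", "plugins"):
        folder = root / name
        # exist_ok=True makes the call safe to repeat.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling prepare_airflow_dirs(".") from the project folder returns the three paths, whether they were just created or already existed.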

Let's get hands-on

0. Prepare all dependencies

Make sure all dependencies of our work are listed in requirements.txt and constraints.txt (if any) before moving on.

For example, I put the package pysftp in the file, so when we build the image, the listed packages will be installed in the worker instance and be ready to use.

You can also find available packages at https://pypi.org.
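Once the image is built, one way to double-check that the listed packages really are available is a small script run inside the worker. This is a rough sketch, assuming Python 3.8 or newer; the helper names requirement_name and missing_packages are made up for this post, and the parsing only handles simple lines such as pysftp or pysftp==1.0.

```python
import re
from importlib import metadata


def requirement_name(line):
    """Return the package name from one requirements.txt line, or None."""
    line = line.split("#", 1)[0].strip()
    if not line or line.startswith("-"):
        # Skip blanks, comments, and pip options such as "-r other.txt".
        return None
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
    return match.group(0) if match else None


def missing_packages(lines):
    """List the required package names that are not installed here."""
    missing = []
    for line in lines:
        name = requirement_name(line)
        if name is None:
            continue
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

Passing the lines of requirements.txt to missing_packages returns an empty list when everything is installed.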

1. docker-compose up

Now we are ready to go. Run this to get the ball rolling.

docker-compose up

A few seconds later, the terminal shows the necessary images being downloaded.

Next, it creates the scheduler, which schedules our jobs, and the worker, which executes them.

The last one is the webserver. Once we see it running, we are ready to view the Airflow web page.
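Instead of watching the terminal, we can also poll the webserver's /health endpoint until it answers. The sketch below assumes the default address http://localhost:8080; the function names are my own, and the retry numbers are arbitrary.

```python
import http.client
import json
import time
import urllib.request


def webserver_is_healthy(url="http://localhost:8080/health", timeout=5):
    """Return True if the health endpoint answers 200 with a JSON body."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            json.loads(response.read().decode("utf-8"))
            return response.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error, or a non-JSON body.
        return False


def wait_for_webserver(url="http://localhost:8080/health", attempts=30, delay=2.0):
    """Poll the endpoint until it is healthy or the attempts run out."""
    for _ in range(attempts):
        if webserver_is_healthy(url):
            return True
        time.sleep(delay)
    return False
```

Calling wait_for_webserver() returns True as soon as the webserver responds, and False if it never does within the given attempts.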

2. Logging in

Open a browser and go to http://localhost:8080. Use airflow/airflow as the username/password to log in.

The login should succeed, and now we can see the home page of Airflow.

3. Try adding a DAG

Back to the editor. I took the example DAG from https://airflow.apache.org/docs/apache-airflow/stable/tutorial/fundamentals.html and saved it in the folder /dags like this.

It won't show up in the DAG list instantly. We can trigger a refresh as follows:

1. Access the worker instance with this command.

docker exec -it <airflow-worker-container-name> /bin/bash

The airflow-worker-container-name can be found by running docker ps -a and choosing the one with worker in its name.

2. List the DAGs with this command.

airflow dags list

It will parse the DAG files and display them on the web UI if all is successful.
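The two steps above can also be scripted from the host. This is a sketch under assumptions: Docker is on the PATH, the worker container has worker in its name, and the airflow command is available inside it. The function names are hypothetical.

```python
import subprocess


def pick_worker(container_names):
    """Return the first container name containing 'worker', or None."""
    for name in container_names:
        if "worker" in name:
            return name
    return None


def list_container_names():
    """Ask Docker for the container names (the same set as docker ps -a)."""
    result = subprocess.run(
        ["docker", "ps", "-a", "--format", "{{.Names}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()


def list_dags_in_worker():
    """Run airflow dags list inside the worker container."""
    worker = pick_worker(list_container_names())
    if worker is None:
        raise RuntimeError("no container with 'worker' in its name")
    subprocess.run(["docker", "exec", worker, "airflow", "dags", "list"], check=True)
```

Calling list_dags_in_worker() prints the same table as running the command by hand inside the container.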

4. Break a DAG to see errors

In some cases, the web UI shows errors, for example when our code contains wrong syntax or bad imports.

We can check the failed ones with this command.

airflow dags list-import-errors

It prints a table showing the error messages for every DAG file.
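For a quick check before the files even reach Airflow, plain Python can catch the syntax part. The sketch below only parses the files, so unlike the Airflow command it will not notice a failed import; the function name find_syntax_errors and the default folder name dags are assumptions.

```python
import ast
from pathlib import Path


def find_syntax_errors(dags_dir="dags"):
    """Map each .py file under dags_dir to its syntax error message."""
    errors = {}
    for path in sorted(Path(dags_dir).rglob("*.py")):
        try:
            # Parsing alone does not execute the file or its imports.
            ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except SyntaxError as exc:
            errors[str(path)] = f"line {exc.lineno}: {exc.msg}"
    return errors
```

An empty dictionary means every file parsed cleanly; otherwise each entry points to the file and line to fix.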

5. Let the DAG run

Say every DAG is good. When we click one, we can see that the UI of the DAG history is improved. It's more modern, clearer, and neater.

The DAG history is much easier to read. It shows some basic stats about the DAG on the side.

And the graph is not much different, yet better, right?

6. docker-compose down

Stop the running process in the terminal (control + c on Mac) and take the containers down using one of these commands.

# stop and remove the containers and the network
docker-compose down

# also remove the volumes and all images used by the services
docker-compose down --volumes --rmi all

This is just an introduction to developing with Airflow 2 on Docker. I don't recommend this setup for production, but it is great for local development.

If you are looking for a good way to work with Airflow jobs, you can try this approach. I hope it is useful for you.
