Functions are very common in programming. They are useful for repeated operations and readability. But what if we have to add the same functions into many of our programs?

Imagine we have many projects on our hands, and some of them need the same function. We might end up copying that function into each of them. That is a redundancy problem, which means we probably have to maintain the function multiple times, once per copy.


Let's start with a basic one

Let's say we have files like this.
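For example, two files side by side:

.
├── adder.py
└── main.py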

We have a function to sum all integers in a list here in adder.py.
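A minimal sketch (the function name sum_numbers is my placeholder):

# adder.py
def sum_numbers(numbers: list[int]) -> int:
    # Sum all integers in the given list
    return sum(numbers)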

We can import it into a main program main.py like this.
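Something like:

# main.py
from adder import sum_numbers

numbers = [1, 2, 3, 4, 5]
print(sum_numbers(numbers))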

Run and the output should be like the following.
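With the sample list above:

$ python3 main.py
15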

But how can we share this adder function with the team?


Introduce Google Artifact Registry

Google Artifact Registry is a service from Google for storing artifacts: Docker images, packages for Python and Node.js, and much more (see the full list here).

We will now use it to store our functions. Here is the list of our missions today.

  1. Build a package and upload it to a Google Artifact Registry repository
  2. Prepare settings to access the repo
  3. Install the package
  4. Test if we can import the package successfully

Let's go!

1. Build and upload

(1) Prepare a repository in Google Artifact Registry

  • Make sure the Artifact Registry API is enabled; otherwise, enable it (see the commands after this list).
  • Create a repo. Feel free to use the web console, but this time we'll use the gcloud command:
gcloud artifacts repositories create {REPO-NAME} --repository-format=python --location={LOCATION}
  • Verify that the repo is ready.
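For example (a hedged sketch; artifactregistry.googleapis.com is the Artifact Registry API's service name), we could enable the API and then list repositories to check that ours is there:

gcloud services enable artifactregistry.googleapis.com
gcloud artifacts repositories list --location={LOCATION}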

(2) Prepare a package

Install the packaging libraries.
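Since we build with build and upload with twine in the later steps, those two are what we need here:

python3 -m pip install --upgrade build twine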

Set up the files for packaging.

  • LICENSE
  • README.md
  • pyproject.toml
  • src files. There should be a folder with the same name as the project name in pyproject.toml (the name field) to avoid naming mistakes; see the sketch after this list.
  • test files. These can be empty at this moment.
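A minimal sketch of the structure and pyproject.toml, assuming our project is named adder (all names and values here are placeholders):

.
├── LICENSE
├── README.md
├── pyproject.toml
├── src/
│   └── adder/
│       ├── __init__.py
│       └── adder.py
└── tests/

[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

[project]
name = "adder"
version = "0.0.1"
description = "Sum all integers in a list"
requires-python = ">=3.9"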

(3) Build the package

python3 -m build

As a result, we should see the folder dist in the same directory as src.
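Inside dist we should find the built artifacts; for example, with the placeholder name and version above:

dist/
├── adder-0.0.1-py3-none-any.whl
└── adder-0.0.1.tar.gz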

(4) Upload to Google Artifact Registry

Now it's time to upload our package to the repo on Google Artifact Registry.

twine upload --repository-url https://{LOCATION}-python.pkg.dev/{PROJECT-ID}/{REPO-NAME}/ dist/*

(5) Verify the package

  • Web UI
  • List packages
gcloud artifacts packages list --repository={REPO-NAME} --location={LOCATION}
  • List package versions
gcloud artifacts versions list --package={PACKAGE-NAME} --repository={REPO-NAME} --location={LOCATION}

2. Access the repo

Now we have our first package in the Google Artifact Registry repo. So what should we do next to access and grab it?

We need 3 things

  1. .pypirc
  2. pip.conf
  3. requirements.txt with our index URLs

(1) Print settings from the repo

Run the command

gcloud artifacts print-settings python \
--project={PROJECT-ID} \
--repository={REPO-NAME} \
--location={LOCATION}

And we should get output similar to the following.
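The output should look roughly like this (the exact wording may differ between gcloud versions):

# Insert the following snippet into your .pypirc

[distutils]
index-servers =
    {REPO-NAME}

[{REPO-NAME}]
repository: https://{LOCATION}-python.pkg.dev/{PROJECT-ID}/{REPO-NAME}/

# Insert the following snippet into your pip.conf

[global]
extra-index-url = https://{LOCATION}-python.pkg.dev/{PROJECT-ID}/{REPO-NAME}/simple/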

(2) Copy a part of the output to .pypirc

Using the first snippet from the printed settings, .pypirc would look like this:
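[distutils]
index-servers =
    {REPO-NAME}

[{REPO-NAME}]
repository: https://{LOCATION}-python.pkg.dev/{PROJECT-ID}/{REPO-NAME}/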

(3) Copy another part to pip.conf

Using the second snippet, pip.conf looks like this:
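[global]
extra-index-url = https://{LOCATION}-python.pkg.dev/{PROJECT-ID}/{REPO-NAME}/simple/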

(4) Add the package name to requirements.txt

-i is the short form of the flag --index-url. We need this to tell pip to look for the package name at that URL as well.
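For example, assuming our package is named adder:

-i https://{LOCATION}-python.pkg.dev/{PROJECT-ID}/{REPO-NAME}/simple/
adder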

(5) Final structure
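One possible layout at this point (main.py here is the consumer script used in the test step below):

.
├── .pypirc
├── pip.conf
├── requirements.txt
└── main.py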

3. Install the packages

At this step, we should install the packages we developed. Just use this command.

pip install -r requirements.txt

Now we finally have the package in our environment. Verify with this command.

pip list | grep {PACKAGE-NAME}
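Assuming our adder package, the output would look something like:

adder          0.0.1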

If we look inside the folders of our virtualenv, we will find our files there.

4. Test the package

The last step is to ensure we can import the package properly. Now we can import it via the folder name, like this.
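Here is a sketch of that consumer main.py, assuming the adder package and the placeholder sum_numbers function from earlier:

# main.py
from adder.adder import sum_numbers  # import via the folder (package) name

numbers = [1, 2, 3, 4, 5]
print(sum_numbers(numbers))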

And run it with confidence.
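With the sample list above:

$ python3 main.py
15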

YEAH!! WE DID IT!!


Integrate with a Docker image

Let's move on to the next topic. Docker images are a fundamental tool for development, so let's apply this package to an image as follows.

1. Prepare structure

Let's say we have files in this structure. Don't forget .pypirc, pip.conf, and requirements.txt.
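One possible layout, assuming main.py is the script the container will run:

.
├── .pypirc
├── pip.conf
├── requirements.txt
├── Dockerfile
└── main.py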

2. Understand "OAuth 2.0 token" from GCP

When we work with a Docker image, we need to know that we can't directly access GCP APIs, unlike when running a gcloud command on our laptop. This leaves us one big question: how can we authenticate to access the Google Artifact Registry repo?

The answer is to authenticate through an "OAuth 2.0 token".

In brief, an OAuth 2.0 token is a long string used for authenticating to a system, in this case Google Cloud Platform. Follow the link below to read more.

Using OAuth 2.0 to Access Google APIs | Authorization | Google Developers

3. Apply OAuth 2.0 token

We will generate the OAuth 2.0 token and add it into requirements.txt in order to gain authorized access to read and download the package.

This is what the OAuth 2.0 token version of requirements.txt looks like.
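Here is a sketch, assuming adder as the package and oauth2accesstoken as the basic-auth username that Artifact Registry accepts for token authentication:

-i https://oauth2accesstoken:ya29.abc123@{LOCATION}-python.pkg.dev/{PROJECT-ID}/{REPO-NAME}/simple/
adder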

For the token part, that ya29.abc123 above, we need to generate it with this command.

gcloud auth print-access-token

Learn more about this command here.

One thing to remember: storing credentials in Git is bad practice.

So what should we do? We will create the OAuth 2.0 token version of requirements.txt from the raw version inside the image, and delete that tokenized version as soon as the installation is completed.

4. Define Dockerfile

As mentioned above, now we can create a Dockerfile (a full sketch follows the list below).

  • Get the token as a parameter with ARG TOKEN
  • Normally requirements.txt has -i as https://{LOCATION}..., so we need to substitute in the tokenized URL with awk (I used sed before, yet got many errors)
  • Once the substitution is completed, save the result into another requirements.txt; name it tokenized_requirements.txt
  • pip install from tokenized_requirements.txt
  • Delete tokenized_requirements.txt so as not to leak the credentials
  • Put CMD at the end to run the command when a container of the image is run.
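Here is a minimal sketch of such a Dockerfile, assuming the python:3.11-slim base image and the file names above; the awk program is one possible way to do the substitution:

FROM python:3.11-slim

WORKDIR /app

# Receive the OAuth 2.0 token as a build argument
ARG TOKEN

COPY requirements.txt main.py ./

# Inject the token into the index URL, install, then delete the tokenized file
RUN awk -v token="$TOKEN" '{ gsub(/https:\/\//, "https://oauth2accesstoken:" token "@"); print }' requirements.txt > tokenized_requirements.txt \
    && pip install --no-cache-dir -r tokenized_requirements.txt \
    && rm tokenized_requirements.txt

# Run the main script when a container of this image is run
CMD ["python", "main.py"]

Doing the install and the delete in the same RUN also keeps the token out of every image layer.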

5. Build an image and test run

Now build an image with this command

docker build \
--no-cache \
--progress=plain \
--build-arg TOKEN=$(gcloud auth print-access-token) \
-t entry-point:latest .
  • --no-cache means building this image without any cache from previous builds
  • --progress=plain means printing out the build progress in plain format
  • the variable TOKEN is passed in via the flag --build-arg
  • Name it "entry-point" with the flag -t

Once the image is there, we can run it to see the result.

docker run -it --name testpy entry-point
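If the container runs the main.py sketched above, we should see:

15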

And yes, it's correct.


Bottomline diagram

I drew the diagram below to summarize the whole process above.

All materials in this blog are also available at the GitHub repo below.

GitHub - bluebirz/google-artifact-registry-custom-module: Private repo for custom modules

Bonus track

If using Google Cloud Composer, we can set it up to install the package from Google Artifact Registry by following this link.

Install Python dependencies for Cloud Composer | Google Cloud
