A private repo for our own Python packages
Functions are everywhere in programming. They help with repeated operations and readability. But what happens when many of our programs need the same functions?
Imagine we have many projects on our hands, and some of them need the same function. We might end up copying that function into each project. That is a redundancy problem: we now have to maintain the function in as many places as there are copies.
Let's start with a basic example
Let's say we have files like this.
We have a function to sum all integers in a list, in adder.py.
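The original snippet is not shown here, so this is a minimal sketch of what adder.py could look like (the function name sum_all is an assumption):

```python
# adder.py: a small utility function we will later turn into a package.
def sum_all(numbers):
    """Return the sum of all integers in the list (name is hypothetical)."""
    total = 0
    for n in numbers:
        total += n
    return total
```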
We can import it into a main program, main.py, like this.
Run it and the output should look like the following.
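A sketch of the corresponding main.py. In the real two-file layout the function would come from `from adder import sum_all`; it is inlined here only so the snippet runs on its own:

```python
# main.py: use the shared function.
# In the real project this definition would instead be:
#     from adder import sum_all
def sum_all(numbers):  # inlined stand-in for the import above
    return sum(numbers)

if __name__ == "__main__":
    print(sum_all([1, 2, 3, 4]))  # prints 10
```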
But how could we share this adder function with the team?
Introducing Google Artifact Registry
Google Artifact Registry is a Google Cloud service for storing artifacts: container images, packages for Python, Node.js, and much more (see the full list here).
We will now use it to store our functions. Here is our mission list for today.
- Build a package and upload to Google Artifact Registry repository
- Prepare setting to access the repo
- Install the package
- Test if we can import the package successfully
Let's go!
1. Build and upload
(1) Prepare a repository in Google Artifact Registry
- Make sure the API is enabled, otherwise enable it.
- Create a repo. Feel free to use the web console, but this time we use
gcloud
command
gcloud artifacts repositories create {REPO-NAME} --repository-format=python --location={LOCATION}
- Verify that the repo is ready
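For example, listing the repositories in that location should now show the new one (a sketch; see the gcloud docs for more flags):

```shell
gcloud artifacts repositories list --location={LOCATION}
```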
(2) Prepare a package
Install the packaging libraries (build and twine, which we use in the next steps).
Set up the files for packaging:
- LICENSE
- README.md
- pyproject.toml
- src files. There should be a folder with the same name as the project name in pyproject.toml (line #6) to avoid naming mistakes.
- test files. These can be empty at this moment.
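A minimal pyproject.toml sketch; the package name, version, and description are placeholders. In this layout the name field happens to land on line #6, the line the folder under src must match:

```toml
[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

[project]
name = "my-package"  # must match the folder name under src/
version = "0.0.1"
description = "Shared utility functions for the team"
readme = "README.md"
requires-python = ">=3.8"
```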
(3) Build the package
python3 -m build
As a result, we should see a dist folder in the same directory as src.
(4) Upload to Google Artifact Registry
Now it's time to upload our package to the repo on Google Artifact Registry.
twine upload --repository-url https://{LOCATION}-python.pkg.dev/{PROJECT-ID}/{REPO-NAME}/ dist/*
(5) Verify the package
- Web UI
- List packages
gcloud artifacts packages list --repository={REPO-NAME} --location={LOCATION}
- List package versions
gcloud artifacts versions list --package={PACKAGE-NAME} --repository={REPO-NAME} --location={LOCATION}
2. Access the repo
Now we have our first package in the Google Artifact Registry repo. What should we do next to access and grab it?
We need 3 things:
- .pypirc
- pip.conf
- requirements.txt with our index URLs
(1) Print settings from the repo
Run the command
gcloud artifacts print-settings python \
--project={PROJECT-ID} \
--repository={REPO-NAME} \
--location={LOCATION}
And we should get output similar to this.
(2) Copy one part of the output into .pypirc
The .pypirc would look like this:
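A sketch of the relevant part, based on the shape of the print-settings output (placeholders as in the commands above):

```ini
[distutils]
index-servers =
    {REPO-NAME}

[{REPO-NAME}]
repository: https://{LOCATION}-python.pkg.dev/{PROJECT-ID}/{REPO-NAME}/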
(3) Copy another part into pip.conf
Like this one:
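A sketch of pip.conf, again following the shape of the print-settings output:

```ini
[global]
extra-index-url = https://{LOCATION}-python.pkg.dev/{PROJECT-ID}/{REPO-NAME}/simple/
```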
(4) Add the package name to requirements.txt
-i is short for the --index-url flag. We need it to tell pip to look for this package name at that URL as well.
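A sketch of the requirements.txt (the version pin is a placeholder):

```text
-i https://{LOCATION}-python.pkg.dev/{PROJECT-ID}/{REPO-NAME}/simple/
{PACKAGE-NAME}==0.0.1
```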
(5) Final structure
3. Install the packages
At this step, we install the packages we developed. Just use the command:
pip install -r requirements.txt
Now we finally have the package in our environment. Verify with the command:
pip list | grep {PACKAGE-NAME}
When we look inside the folders of our virtualenv, we will find our files there.
4. Test the package
The last step is to make sure we can import the package properly. Now we can import from the folder name like this.
And run it with confidence.
YEAH!! WE DID IT!!
Integrate with Docker image
Let's move to the next topic. Docker images are a fundamental tool in development, so let's use this package in an image as follows.
1. Prepare structure
Let's say we have files in this structure. Don't forget .pypirc, pip.conf, and requirements.txt.
2. Understand "OAuth 2.0 token" from GCP
When we work with a Docker image, we need to know that we can't directly access GCP APIs, unlike running a gcloud command on our laptop. So the big question is: how can we authenticate to access the Google Artifact Registry repo?
The answer is, to authenticate through "OAuth 2.0 Token".
In brief, an OAuth 2.0 token is a very long string used for authenticating to a system, in this case Google Cloud Platform. Follow the link below to read more.
3. Apply OAuth 2.0 token
We will generate the OAuth 2.0 token and add it to requirements.txt in order to authorize access to read and download the package.
This is what the OAuth 2.0 token version of requirements.txt looks like:
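A sketch of the tokenized file, assuming the oauth2accesstoken username convention from Google's Artifact Registry docs; ya29.abc123 stands in for a real token:

```text
-i https://oauth2accesstoken:ya29.abc123@{LOCATION}-python.pkg.dev/{PROJECT-ID}/{REPO-NAME}/simple/
{PACKAGE-NAME}==0.0.1
```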
For the token part (the ya29.abc123 above), we need to generate it with the command:
gcloud auth print-access-token
Learn more about this command here.
One thing to remember: storing credentials in Git is bad practice.
So what should we do? We will create the OAuth 2.0 token version of requirements.txt from the raw version inside the image, and delete the token version as soon as the installation completes.
4. Define Dockerfile
As mentioned above, now we can create a Dockerfile:
- Get a token as a parameter via ARG TOKEN at line #4
- Normally requirements.txt has -i as https://{LOCATION}..., so we need to substitute the URL using awk (I tried sed before but got many errors)
- Once the substitution is complete, save the result into another requirements.txt, named tokenized_requirements.txt
- pip install from tokenized_requirements.txt
- Delete tokenized_requirements.txt so we don't leak the credentials
- Put CMD at the end to run the command when a container is started from the image
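The steps above can be sketched as a Dockerfile like this. The base image, awk expression, and oauth2accesstoken URL convention are assumptions; the overall flow follows the list above:

```dockerfile
FROM python:3.10-slim
WORKDIR /app
COPY . .
ARG TOKEN

# Rewrite the plain index URL into a tokenized one (awk instead of sed),
# install from the tokenized file, then delete it so the token is not leaked.
RUN awk -v token="$TOKEN" \
      '{ gsub("https://", "https://oauth2accesstoken:" token "@"); print }' \
      requirements.txt > tokenized_requirements.txt \
    && pip install --no-cache-dir -r tokenized_requirements.txt \
    && rm tokenized_requirements.txt

CMD ["python", "main.py"]
```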
5. Build an image and test run
Now build an image with this command
docker build \
--no-cache \
--progress=plain \
--build-arg TOKEN=$(gcloud auth print-access-token) \
-t entry-point:latest .
- --no-cache means building this image without any cache from previous builds
- --progress=plain means printing the build progress in plain format
- the TOKEN variable is passed in via the --build-arg flag
- the -t flag names the image "entry-point"
Once the image is there, we can run it to see the result.
docker run -it --name testpy entry-point
And yes, it's correct.
Bottom-line diagram
I drew a diagram to summarize the whole process above.
All materials in this blog are also available at the GitHub repo below.
Bonus track
If using Google Cloud Composer, we can setup to install the package from Google Artifact Registry by following this link.
References
- Python packaging: https://packaging.python.org/en/latest/tutorials/packaging-projects/
- Publish with twine: https://www.geeksforgeeks.org/how-to-publish-python-package-at-pypi-using-twine-module/
- Google Artifact Registry official docs: https://cloud.google.com/artifact-registry/docs/python/authentication
- Python packages in Google Artifact Registry on Medium: https://lukwam.medium.com/python-packages-in-artifact-registry-d2f63643d2b7
- String substitution without sed: https://unix.stackexchange.com/questions/97582/how-to-find-and-replace-string-without-use-command-sed