Welcome to the fourth lecture.
Last Updated: 2024-01-11
By the end of today's lecture, you should have:
- the docker cli locally (you can use any product suite that works for you, such as docker-desktop, rancher-desktop, orbstack, etc.)
- make on your local machine

Many iterative approaches today originate somewhere around this manifesto: https://agilemanifesto.org
Evolutionary Architecture is the art of constraining the blast radius of an incorrect decision (TODO: find the correct M. Fowler citation). We will now explore methods of communication and process around iterative approaches:
When playing with an idea, we need to start with as little cognitive friction as possible. So, the ability to set up a local development environment where we are independent of various failure modes (e.g. the internet) is crucial.
We want to avoid context switching and rabbit holes as much as possible.
Later in the lectures, we'll work on Kubernetes, which is distributed, prescribes policies, is remote, and is hard to debug. Immediately writing a deployment for k8s would not just be a colossal time-sink, it would also distract us from understanding the core idea we are iterating over; it would steal our focus.
Our goal must be to have a setup in which we can elastically fail and react quickly to the insights gained from each failed attempt, such that we can keep cognitive coherence.
We must proceed systematically and pose hypotheses that we can note down after each attempt. We can use the time when we write into an ADR as a reflection or meditation session between attempts.
It is tempting to go too fast and forget to note down our insights in a lightweight way. This usually leads to regrets later.
https://martinfowler.com/articles/developer-effectiveness.html
Architecture Decision Records (ADRs) are documents that capture important architectural decisions along with their context and consequences. They are beneficial for team reasoning for several reasons:
In summary, ADRs are a valuable tool for improving communication, preserving knowledge, and ensuring transparency and consistency in decision-making within a team
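A lightweight ADR, for example in the widely used Nygard format, only needs a handful of sections; a minimal sketch (the exact headings are up to your team):

ADR-001: <short title of the decision>
Status: proposed / accepted / superseded
Context: the forces and constraints that led to this decision
Decision: what we chose to do (and what we rejected)
Consequences: what becomes easier or harder as a result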
https://martinfowler.com/articles/scaling-architecture-conversationally.html
Let's see what projects are out there and how they differ.
This is the only lecture we won't have a CNCF tile for. But worry not, there are tons of products out there tempting you to buy their solutions.
Some of the tools listed in App-Delivery are relevant for DevX, but it's a bit all over the place
But: we need to make sure that git can really tell it's actually YOU
So, first off, let's set up a best practice: signing your commits:
https://docs.github.com/en/authentication/managing-commit-signature-verification/signing-commits
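A minimal sketch of the GPG variant (GitHub also supports SSH and S/MIME signing; <YOUR_KEY_ID> is a placeholder for the long key ID printed by the second command):

gpg --full-generate-key
gpg --list-secret-keys --keyid-format=long
git config --global user.signingkey <YOUR_KEY_ID>
git config --global commit.gpgsign true   # sign every commit by default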
Now test that your commits show up as 'verified' on GitHub:
export GPG_TTY=$(tty)
git add something
git commit -m "test if this commit is verified"
git push
Rationale:
Signing your Git commits involves attaching a digital signature to each commit you make. This signature is created using a private key that only you should have access to.
The main benefits of signing your commits are:
These benefits help protect you from identity attacks. For example, if someone gains access to your Git account, they can't make signed commits without also having access to your private key. This means that even if they make unauthorized commits, those commits won't be signed and can be easily identified as not being made by you.
To verify the signature of a commit, others need your public key. If the commit's signature matches your public key, they can be confident that the commit was made by you and hasn't been tampered with.
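Locally, anyone who has imported your public key can check a commit like this:

git verify-commit HEAD        # errors if the commit is unsigned or the signature is bad
git log --show-signature -1   # shows the signature status alongside the commit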
Remember, the effectiveness of commit signing as a security measure ultimately depends on how secure your private key is. If your private key is compromised, an attacker could make commits that appear to be signed by you.
If you don't sign your Git commits, someone else can pretend to be you by simply configuring Git with your name and email address. Here's how they can do it:
They configure Git with your name and email address:
git config --global user.name "Your Name"
git config --global user.email "your.email@example.com"
They make a commit:
git commit -m "This commit looks like it was made by you"
Now, when someone views this commit, it will appear to have been made by you, because the commit's author information matches your name and email address. However, if you sign your commits, this kind of impersonation can be easily detected. Even though the impersonator can set the author information to your name and email address, they can't create a valid signature for the commit without your private key. So, if someone checks the signature of the commit, they'll see that it's either missing or doesn't match your public key, indicating that the commit wasn't actually made by you.
Do not believe the internet when it comes to so-called best practices; please stick to official recommendations, especially those from known security experts.
You could start on killercoda (https://killercoda.com/docker); we'll likely not have time during class for this one, but you can do it in your own time.
Docker containers can replace local installations by encapsulating the software and its dependencies into a standardized unit for software development. Here's how it works:
In our Docker Compose file, the aocc-blog service is built from a Dockerfile and replaces a local installation of the software. It's configured to use specific volumes and ports, and it has a health check and restart policy. This setup ensures that the service runs in a consistent and predictable way, regardless of where the Docker container is running.
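A quick way to observe that behaviour, assuming the service name aocc-blog from the compose file (the Makefile may already wrap these commands for you):

docker compose up -d aocc-blog
docker compose ps                                                   # the STATUS column shows the health state
docker inspect --format '{{json .State.Health}}' $(docker compose ps -q aocc-blog)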
Well, yes, that can be true.
Some developers however do so much coding in their primary language that a full traditional installation makes sense.
However, it is definitely recommended for uncommitted people, who are just trying out a certain version of a language or are doing a small thing in a non-primary language.
It comes down to personal preference.
(Your instructor has Go installed, but nothing else. She is very happy not to have a gazillion Python installs anymore.)
During the following exercise, we'll build a static webpage using the hugo framework, for which we need NodeJS. Please take notes on some of the images' characteristics; at a minimum, write down the size of each image (a few helpful commands follow the clone/build step below).
git clone git@github.com:AustrianDataLAB/multi-stage-build.git
cd multi-stage-build
git checkout features/tarball
make generate_certs
make image1 up
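To record the characteristics mentioned above, a few commands are usually enough (assuming the Makefile tags the images image1 … image5, as the inspect command further down suggests):

docker image ls
docker history image1                             # per-layer sizes hint at where the bulk comes from
docker image inspect image1 --format '{{.Size}}'  # total size in bytes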
Let's look at the Dockerfile, it is very straightforward and also very problematic.
FROM node
COPY . .
RUN npm install -g grunt-cli \
&& npm install \
&& grunt lunr-index
RUN apt update -y && apt install -y hugo
Do not run the image as root, but create a proper user and fix the permissions.
We will be switching to Chainguard in the last step (Fix 4), and for nginx (or anything that traditionally needs to open sockets or low ports) we need to check how best to switch away from root.
https://edu.chainguard.dev/chainguard/chainguard-images/reference/nginx/
The official Docker image starts as the root user and forks to a less privileged user. By contrast, the Chainguard nginx Image starts up as a less privileged user and no forking is required. For most users this shouldn't make a difference, but note the "User Directive Warning" outlined previously.
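A quick way to check which user an image will run as; empty output means root (the exact tags are illustrative, pull the images first if you don't have them locally):

docker image inspect nginx --format '{{.Config.User}}'
docker image inspect cgr.dev/chainguard/nginx:latest --format '{{.Config.User}}'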
FROM node:19.4-alpine
Alpine is a common choice for a slim base image if you need an actual distro. If you don't need a distro, we'll see in the last step that we can also go distroless.
make down rmi
make image2 up
Discuss
why we are no longer running 'apt-get update && apt-get upgrade'
WORKDIR /opt/blog-search
make down rmi
make image3 up
Discuss
COPY package.json Gruntfile.js LICENSE README.md Makefile ./
make down rmi
make image4 up
Discuss
Distroless base images are Docker images that contain only your application and its runtime dependencies. They do not contain package managers, shells or any other programs you would find in a standard Linux distribution. Here are some reasons why they are beneficial:
In the below Dockerfile, we're using specific versions of Node.js and Hugo to build the application, and then copying the built application into a distroless NGINX image. This ensures that the final Docker image contains only what's necessary to run the application, reducing its size and attack surface.
FROM node:19.4-alpine AS indexer
WORKDIR /opt/blog-search
COPY package.json Gruntfile.js LICENSE README.md Makefile ./
COPY blog blog
COPY nginxconfig nginxconfig
COPY openssl openssl
RUN npm install -g grunt-cli \
&& npm install \
&& grunt lunr-index
FROM klakegg/hugo:0.101.0-busybox AS builder
WORKDIR /opt/blog-search
COPY --from=indexer /opt/blog-search/ .
WORKDIR /opt/blog-search/blog
RUN hugo
FROM cgr.dev/chainguard/nginx:latest
COPY --from=builder /opt/blog-search/blog/public/*.html /var/www/html/public/
COPY --from=builder /opt/blog-search/blog/public/*.css /var/www/html/public/
COPY --from=builder /opt/blog-search/blog/public/*.js /var/www/html/public/
EXPOSE 443/tcp
make down rmi
make image5 up
docker image inspect image5
This is certainly a production-grade build now, but you should also notice that caching doesn't work anymore, since we are discarding the intermediate layers. This could be annoying if some steps are very slow.
If there are truly static pieces of your build, you can prebuild the builder images ;) to e.g. already contain the right npm or other libs/SDKs.
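One hedged way to get some of the caching back is to prebuild and push the heavy intermediate stage once, then use it as an external cache source. The stage name indexer and the Dockerfile5 file name come from this repo; the registry path is a placeholder:

docker buildx build --target indexer --cache-to type=inline \
  -t ghcr.io/<your-org>/blog-indexer:cache --push -f Dockerfile5 .
docker buildx build --cache-from ghcr.io/<your-org>/blog-indexer:cache -f Dockerfile5 .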
Notice how we went from 2.55 GB to 26 MB!
They are the gold standard (if you can't use scratch) and have become a lot simpler to use in recent years.
By definition, however, they contain only the binary and its immediate dependencies, i.e. you should not use them for early development, especially if you still want to debug.
At least at the time of writing, debugging distroless images takes several times more effort than something small like alpine.
This means you should have a development build and a distinctly separate production build, and make sure the development artefacts never end up in production :)
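If you do need to poke around a running distroless container locally, one common trick is to attach a throwaway container with a shell to its namespaces (the container name is a placeholder):

docker run --rm -it \
  --pid=container:<distroless-container> \
  --network=container:<distroless-container> \
  alpine sh
# this gives you process and network visibility, but not the distroless filesystem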
As mentioned in Lecture 2, containers are tars of tars using a unionFS.
Let's examine what exactly happens when we include secrets:
make down rmi
make imagetar1
Now, if you haven't looked into the Makefile yet, you should, because the two Dockerfiles TAR1 and TAR2 are slightly different.
It is your task to build both of them with the Makefile
Inspect the images
Then complete the Makefile where it says #Exercise
What did you discover?
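If you want to double-check your findings on the command line, exporting an image shows the individual layer tarballs (the image names are whatever your Makefile tagged them as):

docker save <imagetar1-tag> -o imagetar1.tar
tar -tvf imagetar1.tar                        # one tarball (plus metadata) per layer
docker history --no-trunc <imagetar1-tag>     # the commands that created each layer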
How to automate building of images
Building images needs to be time efficient, and of course others need to be able to build them too, so automation is crucial.
We'll now look into a very simple GitHub build-and-scan pipeline for an image.
As you may remember from the Terraform exercise, GitHub also allows you to define how and when pipelines trigger. Take a moment to experiment with how they work.
Exercise: set up the first pipeline in your own repo and run an empty pipeline to understand how it triggers. Make sure that you can trigger it.
name: Build and push lecture5 image
on:
  push:
    branches:
      - main
      - features/lecture5-2
      - features/tarball
  pull_request:
    branches: [ main ]
  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

jobs:
  build:
    name: Build
    runs-on: ubuntu-20.04
    steps:
      - name: Checkout code
        uses: actions/checkout@v3
cat multi-stage-build/.github/workflows/publish-image.yml
Because Intel is not alone anymore, we can add the buildx feature (still not stable) and specify the desired platform.
docker buildx build \
--tag ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }} \
--platform linux/amd64,linux/arm/v7,linux/arm64 -f Dockerfile5 .
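Locally, a multi-platform build usually also needs QEMU emulation and a buildx builder first; a commonly used setup (not part of the repo) looks like this:

docker run --privileged --rm tonistiigi/binfmt --install all   # register emulators for foreign architectures
docker buildx create --name multiarch --use
docker buildx ls                                               # verify which platforms the builder supports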
We can use various standard FOSS tools to check the quality of our Images and auto-upload the results into the GitHub Tab. Here is a very common vulnerability scanning example.
- name: Run Trivy vulnerability scanner
  uses: aquasecurity/trivy-action@0.18.0
  with:
    image-ref: '${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}'
    format: 'sarif'
    output: 'trivy-results.sarif'
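The same scanner can be run locally before pushing, assuming you have the trivy CLI installed (the image reference below is a placeholder in the ghcr.io/OWNER/IMAGE_NAME:TAG format):

trivy image --severity HIGH,CRITICAL ghcr.io/OWNER/IMAGE_NAME:TAG
trivy image --format sarif --output trivy-results.sarif ghcr.io/OWNER/IMAGE_NAME:TAG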
https://github.com/AustrianDataLAB/multi-stage-build/security/code-scanning?query=branch%3Afeatures%2Ftarball+
Check the artefact tab and make sure the image name is correct. The image name should be in the format ghcr.io/OWNER/IMAGE_NAME:TAG.
https://github.com/AustrianDataLAB/multi-stage-build/pkgs/container/multi-stage-step2
For our blog webpage, we were already using Docker Compose in order to load volumes, expose ports, etc. Let's now take it further and go multi-image.
Docker Compose is a tool for defining and managing multi-container Docker applications. It uses YAML files to configure the application's services and performs the creation and start-up of all the containers with a single command.
git clone git@github.com:AustrianDataLAB/pacman.git
make docker-desktop
Now, on localhost:8080 you can play a very sophisticated game.
Or: read the Docker Compose file and understand how you can use .env files, create networks, volumes, etc.
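A few commands that help with that exploration (run from the repository root, next to the compose file and .env):

docker compose config     # the fully resolved file, with .env values interpolated
docker network ls         # networks created by compose carry the project name as a prefix
docker volume ls
docker compose down -v    # tear everything down again, including named volumes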
The entire local + CI process will be set up between lectures 4 and 5.
Now that you have a minimal reference for how to use existing images, build your own or modify other people's, plus how to glue containers together via docker compose -> start iterating over your startup idea.
Every decision must be logged, and the suggested approach is ADRs (https://adr.github.io/).
By the end of the 5th lecture (that's the next one, since your instructor is quite aware that the exam is in 2 days):
- buy vs build, for at least one component
In general:
- many images (one is required)
- your writeup of why which ideas failed -> goes into your final report -> keep your notes (or have a setup in your repo/github/whatever where you don't lose those notes)
Congratulations, you've successfully set up your GPG signing key, created a best-practice image, executed an image build pipeline on GitHub, and tried out Docker Compose.