Multi-stage Dockerfiles vs. the builder pattern: tipping the scales

When it comes to containerization and packaging, Docker has proven itself capable of shortening development time and ensuring consistency across multiple environments. Cross-team communication and collaboration become more efficient, set-up time drops, and issues that arise can no longer be waved away with #ItIsYourMachine.


Consider, for a moment, that we are building a project that has to ship to a platform different from the one we develop on. Better yet, we want it to behave exactly as it does locally.

Sidenote: I have had my fair share of deployments in which some library was perfectly fine in development, only for me to discover that Ubuntu 22.04 or some other version no longer supported it.

For this illustration, we shall use marvinkweyu/minigrep, a command-line tool that lets you search through text files; it mimics the popular UNIX grep utility.

How would you containerize this?

What method would be the most efficient given the resources?

Let us understand this.

(Clone the minigrep command-line program to your local file system if you have not done so already)

A single dockerfile

Create a Dockerfile at the root of the project directory that looks like the one below:

FROM rust:latest
WORKDIR /myawesomeapp
# copy the project source into the working directory (/myawesomeapp)
COPY . .
# compile into an executable (debug build)
RUN cargo build
# run the project
ENTRYPOINT ["./target/debug/minigrep", "to", "poem.txt"]

Above, we tell minigrep to search poem.txt for the character sequence to.

Finally, build the first version of your image by running this in the project root:

docker image build -t minigrep:v1 .

How large is the resulting image?

docker images minigrep
REPOSITORY                                TAG                   IMAGE ID       CREATED              SIZE
minigrep                                  v1                    c4c56d10c4b2   About a minute ago   1.41GB

You can play around with a container of the image to see what minigrep finds:

docker container run minigrep:v1

Observe: minigrep:v1

Using the builder pattern

The v2 to reproducible code

Earlier, we built our image using a single Dockerfile. Along with the executable, it carried non-essential build files, that is, files it did not need at run-time to actually work.

We could change this by creating two images instead. The first would contain everything needed for the build and act as the build image. The second would contain only what is needed for the package to run.

The build image, defined in Dockerfile.build, would look similar to the single-Dockerfile build:

FROM rust:latest
WORKDIR /myawesomeapp
# copy the project source into the working directory (/myawesomeapp)
COPY . .
# compile into an executable (debug build)
RUN cargo build
# run the project
ENTRYPOINT ["./target/debug/minigrep", "to", "poem.txt"]

In comes the runtime image, defined in Dockerfile:

FROM alpine:latest
WORKDIR /myawesomeapp
# copy the compiled binary and the sample text file from the local directory
COPY minigrep poem.txt ./
# run the project
ENTRYPOINT ["./minigrep", "to", "poem.txt"]

We shrink our image even further by basing it on an Alpine image, roughly 5MB in size, rather than the default rust image of roughly 800MB.

Remember, we have already compiled our code and all that it requires.
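One caveat is worth checking before relying on this. The rust:latest image is Debian-based, so cargo build links the binary against glibc by default, while Alpine ships musl instead. A binary built that way may fail to start inside the Alpine image. One possible workaround is to build for the musl target in the build image. The sketch below assumes an x86_64 host and a project with no C dependencies. The target name and the output path are not part of the original setup, so any later copy step would need to point at target/x86_64-unknown-linux-musl/debug/minigrep instead.

```dockerfile
# Sketch of a build image that produces a statically linked binary for Alpine
FROM rust:latest
WORKDIR /myawesomeapp
# Assumption: x86_64 host; on ARM the target would be aarch64-unknown-linux-musl
RUN rustup target add x86_64-unknown-linux-musl
# Copy the project source into the working directory
COPY . .
# The binary is written to target/x86_64-unknown-linux-musl/debug/minigrep
RUN cargo build --target x86_64-unknown-linux-musl
```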

To move assets from one image to the other, we need to copy the compiled code into the runtime image: a transfer of files between two images, if you will.

To achieve this, we would create a container from the build image, copy the compiled code from it into the current working directory on the host, and let the runtime Dockerfile copy the required file(s) into the final image as it is built.

The command would look something like this:

# notice the path of the built binary: it sits under the working directory we set in the build image
docker container cp minigrepv2-build-container:/myawesomeapp/target/debug/minigrep .

Once this is done, we would remove the build container (and, if you like, the build image) as it is no longer required, leaving the runtime image as the one used for spawning containers.

To shorten this cycle of commands, create a develop.sh file with the following contents (this assumes you have bash on your system):

#!/bin/bash
# Stop at the first failing command
set -e
# Build the build Docker image
docker image build -t minigrep-build-image -f Dockerfile.build .
# Create container from the build Docker image
docker container create --name minigrepv2-build-container minigrep-build-image

# Copy build items from build container to the local filesystem
docker container cp minigrepv2-build-container:/myawesomeapp/target/debug/minigrep .
docker container cp minigrepv2-build-container:/myawesomeapp/poem.txt .
# Build the runtime Docker image
docker image build -t minigrep-runtime-image .
# Remove the build Docker container
docker container rm -f minigrepv2-build-container

# Remove the local copy of the binary now that it is inside the runtime image
rm minigrep

Our directory would look as below:

src
target
.gitignore
poem.txt
Cargo.lock
Cargo.toml
README.md
Dockerfile.build
Dockerfile
develop.sh

Make develop.sh executable (chmod +x develop.sh), then run it to build the images:

./develop.sh

This will build the two images, carry the compiled code from the build container into the runtime image along the way, and delete the build container.

Let’s see how our images differ with docker image ls:

REPOSITORY               TAG      IMAGE ID       CREATED          SIZE
minigrep-runtime-image   latest   0ef6f1b87748   14 seconds ago   11.8MB
minigrep-build-image     latest   66d4a2a395c3   16 seconds ago   1.41GB

Voilà! A whopping difference of roughly 1.4GB!


In our use of the builder pattern, we have managed to shave roughly 1.4GB off our final image size. The trade-off was that we had to create two Dockerfiles, build the first, copy the resulting binary to the local directory, and only then move it into the final image. Perhaps there is a shorter way.

Observe: minigrep:v2

Multi-stage dockerfiles

Much like the builder pattern, a multi-stage Dockerfile creates intermediate images and yields a final image of equivalent size. Its advantage, however, is that everything lives in a single Dockerfile. Say goodbye to the bash script and to copying files to the local filesystem before transferring them to the final image.

# Give the initial stage a name
FROM rust:latest AS build-image
# Set the working directory
WORKDIR /myawesomeapp
# Copy the project source from the current directory into the image
COPY . .
# Build the application
RUN cargo build
# Build the final image
# Give this stage a name as well
FROM alpine:latest AS runtime-image
# Set the working directory
WORKDIR /myawesomeapp
# Copy the compiled code from the build stage into the runtime image
COPY --from=build-image /myawesomeapp/target/debug/minigrep .
COPY --from=build-image /myawesomeapp/poem.txt .
# Run the application
ENTRYPOINT ["./minigrep", "to", "poem.txt"]

Go ahead and build the image with docker image build -t minigrep:v3 . and then inspect it with docker images minigrep:

REPOSITORY                                TAG                   IMAGE ID       CREATED              SIZE
minigrep                                  v3                    2c4927420843   19 seconds ago       11.8MB

What a difference.
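The image could probably shrink a little further, since cargo build produces an unoptimized debug binary. The sketch below is a variation of the same two stages that compiles with --release. It also builds for the musl target, an assumption added here so that the binary does not depend on glibc, which Alpine lacks. The target name assumes an x86_64 host and a project without C dependencies. It is one possible arrangement, not the setup measured above, so the resulting size will differ.

```dockerfile
# Stage 1: compile an optimized, statically linked binary
FROM rust:latest AS build-image
WORKDIR /myawesomeapp
# Assumption: x86_64 host; on ARM the target would be aarch64-unknown-linux-musl
RUN rustup target add x86_64-unknown-linux-musl
COPY . .
# --release writes to target/<target>/release instead of target/debug
RUN cargo build --release --target x86_64-unknown-linux-musl

# Stage 2: runtime image holding only the binary and the sample file
FROM alpine:latest AS runtime-image
WORKDIR /myawesomeapp
COPY --from=build-image /myawesomeapp/target/x86_64-unknown-linux-musl/release/minigrep .
COPY --from=build-image /myawesomeapp/poem.txt .
ENTRYPOINT ["./minigrep", "to", "poem.txt"]
```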

What does this look like in other projects? Can we build dependencies separately from the final image in interpreted languages?

Hint: TambuaShamba
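As a rough illustration of the idea for an interpreted language, here is a minimal sketch for a generic Python project. It is not taken from TambuaShamba. The file names requirements.txt and main.py, the /opt/venv path and the python:3.12-slim tag are all assumptions made for the example. The first stage installs dependencies into a virtual environment, and the second stage copies only that environment and the source code.

```dockerfile
# Stage 1: install dependencies into a virtual environment
FROM python:3.12-slim AS build-image
WORKDIR /app
# Assumption: the project lists its dependencies in requirements.txt
COPY requirements.txt .
RUN python -m venv /opt/venv \
    && /opt/venv/bin/pip install --no-cache-dir -r requirements.txt

# Stage 2: copy only the prepared environment and the source code
FROM python:3.12-slim AS runtime-image
WORKDIR /app
COPY --from=build-image /opt/venv /opt/venv
COPY . .
# Put the virtual environment first on PATH so its interpreter is used
ENV PATH="/opt/venv/bin:$PATH"
# Assumption: main.py is the entry point of the application
CMD ["python", "main.py"]
```

Both stages use the same base image so that the interpreter the virtual environment points to exists at the same path in the final image. The saving tends to show when installing dependencies requires compilers or headers that the runtime stage can then leave out.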

Conclusion

We could create a simple Dockerfile for smaller products and applications, or go ahead and use multi-stage Dockerfiles. The choice remains with the engineer that #builds. Along the way, what you intend to #create, your capacity, your development experience and what you are willing to do differently all add up to your ideal. Iterate.