TheGreenCodes

Necessity versus Opportunity

Marvin Kweyu — Tue, 11 Apr 2023 04:58:39 GMT

Over the past few weeks, I have been exploring the world of entrepreneurship, attending various events where accelerators gather, joining breakfasts for entrepreneurial-minded individuals, and participating in debates on the subject. Throughout this journey, I have been focusing on funding opportunities, education, and mindset shifts that ultimately lead to the creation of successful businesses.

An opinion piece that caught my attention was one by Adam Molai entitled "Africa lacks the right kind of entrepreneurs," along with the paper "Jobs, Economic Growth and Capacity Development for Youth in Africa," both of which form the basis of my thoughts today.

Indulge

With the world at 8 billion persons as of 15th November 2022, Africa remains one of the poorest continents on the globe, despite the fact that it will account for 2 out of 5 working-age individuals by the end of the 21st century. The peculiar part of this statement is the quality and purposes with which we build our ventures within the African economic zones.

The crux? Was this business built out of a pure innovative spirit - to create what once was not - or was it a means to an end?

Returning to the population question, with a 245.0% increase in population within the working age group, we expect a rise in the number of graduates and professionals alike. All this while, the market is not primed to absorb this number of individuals. If anything, the tech lay-offs that have happened over the last couple of months have been a testament to what happens when projects fail and companies are bloated.


This raises the question, what does this mean for education?

Having been brought up in an African household, I think about how we raise this generation of dreamers and builders. As long as entrepreneurship is treated as a side hustle for those who cannot be absorbed into the job market, we fail to lay foundations upon which the future can stand.

The next Bill Gates will not build an operating system. The next Larry Page or Sergey Brin won't make a search engine. And the next Mark Zuckerberg won't create a social network. If you are copying these guys, you aren't learning from them. - Peter Thiel (Zero To One)

It is high time we create sustainable solutions that navigate from bringing in the everyday meal to those that impart beyond sustenance. To leave you something to nibble and ponder, here is an article from TechCabal about tech talent navigating its way back to Africa.

Thought:

Perhaps the danger of Artificial Intelligence is not in it taking away employment (unless you champion bullshit jobs) but that, with the increased use of the same technology, AI relearns from itself and other old content rather than from what is made anew. The limit remains: Artificial Intelligence will only learn from what already exists.

Recommendation systems on the web

Marvin Kweyu — Tue, 28 Mar 2023 07:03:39 GMT

Introduction

I have a confession to make; I am a huge fan of sci-fi.

Whether it is another episode of The Minority Report, Upload or Travelers, I am all for it. Perhaps the underlying cause is my fascination with how these societies use technology in their everyday lives. More so, perhaps it is the chance to witness something that is on the brink of discovery in our modern-day information era.

One of the common themes that appear across these films is the ability of machines to predict what we want.

An observer need only take one look at how the show, The Minority Report, uses its predictive algorithm, Hawk-Eye, to predict the occurrence of crime, and compare it to Essex University's KeyCrime, to see how close fiction and reality have come.

What if we could predict what would happen before it does? What if a machine could tell what you would buy before you actually get to it?

What is a Recommendation System?

A recommendation system is an information filtering technique that predicts what a user might prefer based on their historical interactions or preferences.

As an example, and coming back from our high on KeyCrime, if a user frequently watches romantic comedies on a streaming service, a recommendation system may suggest similar movies or TV shows. These systems are based on machine learning algorithms that analyze user behaviour to generate other details that they might be interested in.

They have become commonplace in industries such as e-commerce, social media platforms, security and healthcare. Let's talk about this and the different aspects that come into play around recommendation engines.

Types of Recommendation Systems

For brevity, there are different types of recommendation systems, including content-based, collaborative-based, hybrid, and demographic-based recommendations.

Content-based Recommendation

Say, for instance, you are called Ian, and you have a thing for Apple products. You log into Instagram and double-tap on two or three images of Apple products or advertisements. What happens in the background?

Content-based recommendation systems make recommendations based on the similarity of items and are tied to a particular user. The more you interact with a system, the more accurate the data gets. In essence, a content-based recommendation engine needs one or more base items around which a user's actions (reviews, likes, etc.) can be pegged.

We see you liked an image that has wheat. Here are other wheat-based products you might relish.
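To make the wheat example concrete, here is a minimal sketch of content-based matching. It assumes every item carries a set of descriptive tags and measures similarity with the Jaccard index; the catalogue, the tag names and the function names are all hypothetical, and real engines typically use richer features than this.

```python
# Hypothetical catalogue: item name -> set of descriptive tags.
CATALOGUE = {
    "wheat bread": {"wheat", "bakery"},
    "wheat crackers": {"wheat", "snack"},
    "wheat flour": {"wheat", "baking"},
    "rice cakes": {"rice", "snack"},
}


def jaccard(a, b):
    """Share of tags two tag sets have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend_similar(liked, catalogue, top_n=2):
    """Rank items the user has not liked by tag overlap with their likes."""
    # The user's profile is the union of the tags on everything they liked.
    profile = set().union(*(catalogue[name] for name in liked))
    scores = {
        name: jaccard(profile, tags)
        for name, tags in catalogue.items()
        if name not in liked
    }
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    # Items with no overlap at all are not worth suggesting.
    return [name for name, score in ranked[:top_n] if score > 0]


print(recommend_similar(["wheat bread"], CATALOGUE))
```

The more items end up in the liked list, the richer the profile becomes, which is the "more interaction, more accuracy" effect described above.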

Collaborative-based Recommendation

What if we have a whole community of foodies? Could they ping each other based on what they enjoyed preparing?

Collaborative-based recommendation systems generate recommendations based on the behaviour of similar users. If two users have similar preferences, the system will recommend items that one user has liked to the other.

The Netflix Prize was one such event: builders across the board were invited to a collaborative filtering challenge to improve upon Netflix's own algorithm, Cinematch, with a grand prize of $1,000,000 for any team that came through with the solution.

Broadly speaking, collaborative-based recommendation systems are grouped into memory-based and model-based. For purposes of this guide, we shall focus on memory-based collaborative filtering, which we can further break into item-based and user-based collaborative filtering.

User-based collaborative filtering

At its core, user-based collaborative filtering is based on similar consumption patterns.

Example: Let's say Rita finally set up her restaurant. You pay a visit and give it a 5-star rating. Now, your friend, Peter, does the same. Already, we have two users with a similar rating on the same location. Because you and Peter both gave a 5-star rating to a specific item, it is very likely that you will have other items in common. Thus, we look for people with a rating of between 3.5 and 5 for the restaurant and suggest other restaurants they like to you.

This envelope we just created in our example is called a threshold. So, a user with a rating of 1 on the said restaurant will not be a source of suggestions for our next restaurant hopping experience.
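The restaurant example can be sketched in a few lines. This is only an illustration under assumptions: the ratings, the restaurant names other than Rita's and the min_stars cut-off for what counts as "a restaurant they like" are made up, while the 3.5 to 5 threshold comes from the example above.

```python
# Hypothetical ratings: user -> {restaurant: stars}.
RATINGS = {
    "you": {"Rita's": 5.0},
    "peter": {"Rita's": 5.0, "Ocean Grill": 4.5, "Noodle Bar": 2.0},
    "amina": {"Rita's": 4.0, "Spice Route": 5.0},
    "brian": {"Rita's": 1.0, "Burger Hut": 5.0},
}


def similar_users(target, anchor_item, ratings, low=3.5, high=5.0):
    """Users whose rating of anchor_item falls inside the threshold."""
    return [
        user
        for user, rated in ratings.items()
        if user != target and low <= rated.get(anchor_item, 0.0) <= high
    ]


def recommend(target, anchor_item, ratings, min_stars=4.0):
    """Suggest places similar users rated highly that target has not rated."""
    seen = set(ratings[target])
    suggestions = []
    for user in similar_users(target, anchor_item, ratings):
        for place, stars in ratings[user].items():
            if place in seen or place in suggestions:
                continue
            if stars >= min_stars:
                suggestions.append(place)
    return suggestions


print(similar_users("you", "Rita's", RATINGS))
print(recommend("you", "Rita's", RATINGS))
```

Brian, with his 1-star rating, falls outside the threshold, so his beloved Burger Hut never reaches your list.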

Item-based collaborative filtering

Example: Most users that buy bread also buy butter. Therefore, these items must be similar. Or if user X likes films with Iron Man, then they will like those with Spider-Man. You get the gist.

Introduced by Amazon in 1998, item-based filtering bases recommendations on similarities between items. This decision will affect our pre-sales option during checkout. The suggestion is to get this customer to buy one more item, that is, spend more. This is the reason why everything in retail stores is where it is. Nothing by chance, nothing random.
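A rough way to picture the bread-and-butter idea is to count how often items land in the same basket. The sketch below does exactly that with made-up baskets; it is a simple co-occurrence count, not a description of how Amazon's system works, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets: each set is one customer's purchase.
BASKETS = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "jam"},
]


def co_occurrence(baskets):
    """Count how often each pair of items is bought together."""
    pairs = Counter()
    for basket in baskets:
        # Sorting gives every pair one canonical (a, b) ordering.
        for a, b in combinations(sorted(basket), 2):
            pairs[(a, b)] += 1
    return pairs


def also_bought(item, baskets, top_n=2):
    """Items most often bought alongside `item`."""
    scores = Counter()
    for (a, b), count in co_occurrence(baskets).items():
        if a == item:
            scores[b] += count
        elif b == item:
            scores[a] += count
    return [name for name, _ in scores.most_common(top_n)]


print(also_bought("bread", BASKETS))
```

At checkout, the top entry for whatever sits in the cart becomes the "one more item" suggestion.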

Are there challenges to collaborative filtering? Well, certainly. Depending on which of the aforementioned you use you might encounter the following:

The early rater problem

So, you successfully have a user log in to your platform with 20 000 books. They rate 1 / 20 000. We literally have nothing to recommend.

Sparsity

You have a number of users, all of whom have rated a significant count of products. However, these items are too far spread apart. Say, for instance, back to books, only two users have rated the same books. All the others have rated items with no overlap.
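Sparsity is easy to put a number on. The sketch below, built on hypothetical users and book identifiers, measures how empty the user-item matrix is and lists the pairs of users who share at least one rated book.

```python
def sparsity(ratings, n_items):
    """Fraction of the user-item matrix that has no rating."""
    n_users = len(ratings)
    filled = sum(len(rated) for rated in ratings.values())
    return 1 - filled / (n_users * n_items)


def overlapping_pairs(ratings):
    """Pairs of users who have rated at least one item in common."""
    users = sorted(ratings)
    return [
        (a, b)
        for i, a in enumerate(users)
        for b in users[i + 1:]
        if set(ratings[a]) & set(ratings[b])
    ]


# Hypothetical ratings on a platform with 20 000 books.
BOOK_RATINGS = {
    "ann": {"book_1": 5, "book_2": 4},
    "ben": {"book_2": 3, "book_3": 5},
    "cat": {"book_4": 4, "book_5": 2},
    "dan": {"book_6": 5, "book_7": 1},
}

print(sparsity(BOOK_RATINGS, n_items=20_000))
print(overlapping_pairs(BOOK_RATINGS))
```

With only one overlapping pair, a collaborative filter has almost nothing to compare, no matter how many ratings each user has left.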

A gray sheep

You know yourself. You like to remain unpredictable, you open browsers in incognito all the time, like random things every time and leave the platforms you visit. Just like Elle (laughs like Mojo Jojo).

The shilling attack

This happens when a single user or a group of people create multiple accounts on the same platform and rate, say, a product in a certain way in order to sway other users' interest towards the items they are rating highly. It can also happen in reverse, swaying users away from a product by giving it bad ratings.

Hybrid Recommendation

Certainly, we must have come up with a workaround for the limits of content-based and collaborative-based recommendation engines. In come hybrid recommendations.

Hybrid engines combine both content-based and collaborative-based approaches to generate recommendations.
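One simple way to combine the two, and by no means the only one, is a weighted blend of their scores. The sketch assumes both engines already return a score between 0 and 1 per item; the scores, item names and the weight parameter are hypothetical.

```python
def hybrid_scores(content, collaborative, weight=0.5):
    """Blend two score dictionaries; an item missing from one side scores 0 there."""
    items = set(content) | set(collaborative)
    return {
        item: weight * content.get(item, 0.0)
        + (1 - weight) * collaborative.get(item, 0.0)
        for item in items
    }


# Hypothetical outputs of a content-based and a collaborative engine.
content = {"wheat crackers": 0.8, "rice cakes": 0.1}
collab = {"rice cakes": 0.9, "croissant": 0.6}

blended = hybrid_scores(content, collab, weight=0.5)
best = max(blended, key=blended.get)
print(best)
```

Shifting weight towards 1.0 favours the content-based side, which can help while a new user has too few ratings for the collaborative side to say anything useful.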

Demographic-based Recommendation

Do you remember the piece, Up and Away with Scalability - specifically on caching and content delivery servers?

Demographic-based recommendation systems make recommendations based on demographic information such as age, gender, or location.

For one, we could have a location that is known for a type of music. Thus, instead of having our data centre closest to them caching everything and all other genres, why not cache what people actually listen to?

Pro tip: Bongo, a music genre, is popular in the coastal regions of Kenya and Tanzania. My music service would therefore hold a cache for this genre on servers closest to these regions as opposed to, say, Alaska.

Demographic-based recommendation systems are often used for marketing purposes and can be particularly useful for identifying and targeting specific groups of users.

Applications of Recommendation Systems

A favourite example I like to reference when it comes to recommendation engines is one highlighted in the book, The Power of Habit, by Charles Duhigg, of the American retail store, Target.

Target's data analytics team got so good at building these systems that it landed in the public limelight when one of its marketing campaigns suggested diapers to the household of a father who was sure his daughter was not pregnant. It recommended products users might want to purchase based on their patterns and on the data of other people, collected using their gift cards.

Other examples you might find interesting include how, for instance, LinkedIn uses "You may also know" or "You may also like" for companies or profiles to follow.

Thought: Did you know, Netflix has gone through phases in its personalization using recommendation systems?

What does this mean for the products in the build process?

For a preview, you can take a sneak peek into marvinkweyu/marastore - an e-commerce platform for travellers and backpackers. It includes an item-based collaborative filtering engine. We dare answer:

Does user A need product Y given user B bought X and Y?

Back to you, the reader: How would you rank the aforementioned results?

Another example of a similar implementation is the library project, "Urbanlibrary" - the Afrocentric literature bookshelf. Within it, I implemented a mix-and-match, if you will, of recommendation algorithms. Is it based on history? Is it based on rating? You'll never know unless you find out.

Conclusion

Recommendation systems have evolved to suit different needs and algorithms have been built on top of each other to meet this demand. You can combine, break away from, build your own or adopt an existing solution. Either way, it offers a path to understanding your customers and business. Take advantage. Personalize this experience.

Indulge: Perhaps all you need to be a mind reader is your browser history.

Docker image optimization: Tips and Tricks for Faster Builds and Smaller Sizes

Marvin Kweyu — Tue, 07 Mar 2023 09:00:08 GMT

I have been making a couple of changes in the build and deployment process of The Urbanlibrary. Notably, I opted to switch to docker altogether because going to the /etc directory to change some configurations was getting tedious.

I needed a central place for all this. I needed a localized address where I could get all my code and the requirements to, well, just work.

Scenario:

I deploy projects A and B to the same server. We might have two domain names, or each might use a different subdomain. To be sure, or if I wanted to change them, I would have to modify the nginx file(s) on the server.

Would it not be easier to look at these locally such that I deployed with confidence each time?

Come Ye Docker

Consequently, I packaged my projects into their respective images and deployed them. All the while, I was keen on performance and memory consumption.

Questions:

How long does it take to build?

How big are the respective images?

What does CPU usage look like?

Are they truly that much safer than bare metal?

In this journey of discovery, I built upon the following recommended practices of containerization - modifying each to suit what it needed to accomplish.

Minimizing the number of layers

At its core, docker sits on the premise of layers. Case in point; creating a software engineering project will require an operating system, the core components that will let your programming library/language work, the programming language itself and the files that you write.

Similarly, when you go the docker way, you create a system where you are in control of these layers. Do you need library X or would Y work better? What about cron tasks? Does my system require cron or curl to be installed somewhere within?

Across the commands that we have in docker files, the following modify the image size:

COPY

RUN

ADD

What is common amongst the above is that they involve the movement of files either from the internet (RUN or ADD) or within your directory (COPY).

Example:

Take the following docker file.

# many layers
FROM python:3
# requirements.txt has to exist inside the image before pip can read it
COPY requirements.txt .
RUN python -m pip install --upgrade pip
RUN python -m pip install --upgrade setuptools
RUN pip install -r requirements.txt

To optimize this, we can merge the multiple RUN commands into one.

Image 2:

FROM python:3
COPY requirements.txt .
# notice the merge of RUN commands to update the system and install any packages needed
RUN python -m pip install --upgrade pip \
    && python -m pip install --upgrade setuptools \
    && pip install -r requirements.txt

You notice that every time we run either of the commands highlighted above, the image size changes. The same does not apply to other docker file commands (CMD, ENTRYPOINT) as they are limited to creating intermediate layers (0 bytes).
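If you would rather count than eyeball, a few lines of Python can tally the instructions that add to the image size. This is a rough sketch, not a real Dockerfile parser: it only looks at the first word of each instruction, treats a trailing backslash as a continuation, and the function name is my own.

```python
LAYER_INSTRUCTIONS = {"RUN", "COPY", "ADD"}


def count_layer_instructions(dockerfile_text):
    """Count RUN, COPY and ADD instructions; continuation lines are not recounted."""
    count = 0
    continuing = False
    for raw in dockerfile_text.splitlines():
        line = raw.strip()
        # Blank lines and comments never start an instruction.
        if not line or line.startswith("#"):
            continue
        if not continuing:
            instruction = line.split(maxsplit=1)[0].upper()
            if instruction in LAYER_INSTRUCTIONS:
                count += 1
        continuing = line.endswith("\\")
    return count


MANY = """
FROM python:3
RUN python -m pip install --upgrade pip
RUN python -m pip install --upgrade setuptools
RUN pip install -r requirements.txt
"""

MERGED = r"""
FROM python:3
RUN python -m pip install --upgrade pip \
    && python -m pip install --upgrade setuptools \
    && pip install -r requirements.txt
"""

print(count_layer_instructions(MANY), count_layer_instructions(MERGED))
```

Running docker history on a built image gives the authoritative per-layer picture; the count above is just a quick check before you build.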

Choosing the right image

When it comes to choosing a base image for your containers, it is advised to use an image that matches your environment as closely as possible.

Example:

If deploying a Java-based application, you would use the JDK image upfront rather than using an Ubuntu-based image and installing the JDK on top of it. In this case, your Dockerfile would look as below:

FROM ubuntu:latest vs FROM openjdk:latest

This way, you free the image from having to install some apt dependency W just because some recommended background package happened to need it.

Using the exact image tag rather than choosing the latest tag for the image base

Correct. The previous Dockerfile is wildly flawed: it leans on the latest tag.

Example:

Consider, for a moment, that we are running a Django 1.11 project. However, our Dockerfile insists on building from the latest version of Python out there, so we add the following image.

FROM python:latest

What are the project dependencies that were deprecated? What module was moved or merged? What package had to change because the libraries of the latest Python version no longer behave the same?

Instead, you could run your product with a specified version:

FROM rust:1.64.0

(The above snippet assumes you are a rustacean, but the same principle applies to any code base you might have).
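A small check like the one below can flag floating tags before they bite. It is a sketch with a hypothetical function name: it reads FROM lines only, skips images pinned by digest and references to earlier build stages, and does not resolve ARG variables.

```python
def unpinned_base_images(dockerfile_text):
    """Return base images that carry no tag or the floating 'latest' tag."""
    flagged = []
    stages = set()
    for raw in dockerfile_text.splitlines():
        parts = raw.strip().split()
        if len(parts) < 2 or parts[0].upper() != "FROM":
            continue
        # Drop flags such as --platform=linux/amd64.
        args = [p for p in parts[1:] if not p.startswith("--")]
        if not args:
            continue
        image = args[0]
        pinned_or_internal = image in stages or image == "scratch" or "@" in image
        if not pinned_or_internal:
            # Only the last path segment can hold the tag (registries may have ports).
            last_segment = image.rsplit("/", 1)[-1]
            tag = last_segment.split(":", 1)[1] if ":" in last_segment else ""
            if tag in ("", "latest"):
                flagged.append(image)
        # Remember stage names so a later "FROM <stage>" is not flagged.
        if len(args) >= 3 and args[1].upper() == "AS":
            stages.add(args[2])
    return flagged


SAMPLE = """
FROM python:latest
FROM rust:1.64.0 AS build-image
FROM ubuntu
FROM build-image
"""

print(unpinned_base_images(SAMPLE))
```

Wiring this into a pre-commit hook or CI step is one way to keep a latest tag from sneaking back in.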

Using a minimal-sized base image

Getting back to layers, an image can be built on top of another. To this end, your stack of choice, say Python 3, might come as is or be based on a specific Debian or Alpine image. These image versions can appear as below:

python:3.9.16

python:3.9.16-slim

python:3.9.16-alpine

python:3.9.16-slim-bullseye

To identify the base of your images, observe the tags that follow the version. In the above case, images tagged with the keyword alpine are based on Alpine Linux, while those with the tags jessie, stretch, buster and bullseye rely on Debian 8, 9, 10 and 11 respectively. The tag slim marks the trimmed version of the default image (python:3.9.16) and may appear on its own or combined with a Debian release name.

Note: The names jessie, stretch, buster and bullseye are not random but rather the actual names of the Debian Linux versions released at that time. Thus, if we had a new Debian Linux version named strange-happenings, we would consequently get a python image tagged python:3.9.16-strange-happenings.
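The tag-reading rules above can be written down as a tiny helper. It is a sketch that only knows the codenames mentioned in this piece and assumes that an untagged variant of the official python image is Debian-based; the function name and the shape of the returned dictionary are hypothetical.

```python
DEBIAN_CODENAMES = {"jessie": 8, "stretch": 9, "buster": 10, "bullseye": 11}


def describe_tag(image):
    """Split a reference like 'python:3.9.16-slim-bullseye' into readable parts."""
    name, _, tag = image.partition(":")
    version, *variants = tag.split("-")
    info = {
        "name": name,
        "version": version,
        "slim": "slim" in variants,
        # Assumption: with no variant given, the image is Debian-based.
        "base": "debian (default)",
    }
    for variant in variants:
        if variant.startswith("alpine"):
            info["base"] = "alpine"
        elif variant in DEBIAN_CODENAMES:
            info["base"] = f"debian {DEBIAN_CODENAMES[variant]} ({variant})"
    return info


for ref in ("python:3.9.16", "python:3.9.16-alpine", "python:3.9.16-slim-bullseye"):
    print(ref, describe_tag(ref))
```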

Of the mentioned, the safest bet is on using the default image. In this case, python:3.9.16. We use alpine when we want to have the bare minimum setup and are willing to install only the required packages manually.

Remember, however, that alpine-based images can run into libc compatibility issues, since Alpine ships musl rather than glibc. For reference, these are implementations of the C standard library, which the compiled parts of data science packages like pandas, scipy and so forth depend on.

Let's take a look at how these images might compare by pulling and doing an image size comparison.

From your terminal, pull the images, replacing the tags where needed.

docker pull python:3.9.16
docker pull python:3.9.16-slim
docker pull python:3.9.16-alpine
# add the --quiet flag to keep the logs off your screen or at a minimum
docker pull --quiet python:3.9.16-slim-bullseye

To compare their sizes:

docker images

A great thread that you might be interested in is this: Super small images based on alpine linux. To cap a comment that stood out for me:

As another example, a co-worker recently was working with some (out-of-tree) gstreamer plugins, and the most convenient way to do so was with a docker image in which all the major gstreamer dependencies, the latest version of gstreamer, and the out-of-tree plugins were built from source. The offered image was over 10GB and 30 layers, took quite a while to download, and a surprising number of seconds to run. With just a few tweaks it was reduced to 1.1GB and a handful of layers which runs in less than a second. It was just a total lack of care for efficiency that made it 10x less efficient in every way, enough to actually reduce developer productivity.

- ploxiln on Dec 23, 2015

Remember; choose the right image based on your needs.

Use multi-stage docker files

The concept of multi-stage docker files entails breaking your project build process into more than one. In this case, if building a project with say golang, the build files and the dependencies needed for them would be on one image while the compiled file would be on another, ready to be run.

Likewise, if building a Python project, the installation of the dependencies would be on one image while the actual production-ready application would be on another.

You can read more about this in the piece: Multi-stage docker files vs builder patterns

Thought Digest: Did you know that when a Docker image is first built or used, Docker retains components that haven't changed, so it doesn't have to rebuild everything from scratch again? That's called caching.

Only install the required dependencies

Alluding to installing the image that closely resembles the stack you are using, having only the required dependencies is key. For instance, using an alpine-based image and installing git, Node.js, and the Java JDK for a program that needs none of them is baggage brought forward.

Have a .dockerignore file

Your .gitignore file knows to keep node_modules out of version control. Your docker build, however, does not. In the .dockerignore file, we list any items that the image will generate during installation later on, or items that are not needed for the docker image to function, so that they stay out of the build.

Running your docker container commands as a non-root user

Every time we create a Dockerfile and build its image, the base image runs your commands as the root user by default. That is, it assumes you know what you are doing. This leaves you vulnerable to attackers (users or processes) that can gain access to your host system. They can, consequently, access the project files, copy or add more scripts and so forth.

In the same way, you are not expected to use a VPS as root, nor do I expect to see the almighty # on your terminal (an indication that you are navigating as root).

To avert this, you create a non-root user in your docker image by adding a USER directive to your Dockerfile.

For example, to create a non-root user called appuser101:

FROM python:3.9.16
# Create user
RUN useradd -m appuser101
# Set user as default
USER appuser101

To run commands in the container as this new user, you can use the docker exec command as below:

docker exec -u appuser101 -it <container> <command>

Where we stand

So far, I have significantly reduced the sizes of my images and their build times as well as improved their maintainability. My billing has also changed. I get the room to have one or more projects for the price of one.

Is there more? I think so. I am pushing this to see how it goes but I will come back with more along this trail.

A word to the wise:

Caching is your friend. Caching is here to stay.

Multi-stage docker files Vs Builder patterns - tipping the scales

Marvin Kweyu — Tue, 14 Feb 2023 05:24:28 GMT

When it comes to containerization and packaging, docker has proven itself capable of shortening the development time and ensuring consistency across multiple environments. In this way, cross-team communication and collaboration have been made more efficient, with decreased set-up time, while the issues that arise have ceased to be a matter of #ItIsYourMachine.

Consider, for one moment, that you were building a project that required shipping to a different platform from the one you were using. Better yet, you want it to act exactly the same way as it does locally.

Sidenote: I have had my fair share of deployments during which some library was perfectly fine in development only to discover Ubuntu 22.04 or some other version did not support it anymore.

For this illustration, we shall take the program, marvinkweyu/minigrep, a command-line tool that lets you search through text files; a mimic of the popular UNIX grep command-line utility.

How would you containerize this?

What method would be the most efficient given the resources?

Let us understand this.

(Clone the minigrep command-line program to your local file system if you have not done so already)

A single dockerfile

Create a dockerfile at the root of the project directory that looks like below:

FROM rust:latest
WORKDIR /myawesomeapp
# copy the source files into the /myawesomeapp directory
COPY . .
# compile into executable
RUN cargo build
# run the project
ENTRYPOINT ["./target/debug/minigrep", "to", "poem.txt"]

Above, we tell minigrep to look for lines in poem.txt that contain the string "to".

Finally, build your image as the first of its kind by running this in the same root directory:

docker image build -t minigrep:v1 .

What is the possible size of this build?

docker images minigrep

REPOSITORY   TAG   IMAGE ID       CREATED              SIZE
minigrep     v1    c4c56d10c4b2   About a minute ago   1.41GB

You can play around with a container of the image to see what minigrep finds:

docker container run minigrep:v1

Observe: minigrep:v1

Using the builder pattern

The v2 to reproducible code

Earlier, we built our image using a single dockerfile. Along with the executable, it had non-essential build files. That is, files it did not need at run-time to actually work.

We could change this by creating two images instead. The first would contain all the necessary files we needed before the build and act as the build image. The second would contain only what was needed for the package to run.

The builder image would look similar to the normal build:

FROM rust:latest
WORKDIR /myawesomeapp
# copy the source files into the /myawesomeapp directory
COPY . .
# compile into executable
RUN cargo build
# run the project
ENTRYPOINT ["./target/debug/minigrep", "to", "poem.txt"]

In comes the runtime image:

FROM alpine:latest
WORKDIR /myawesomeapp
# copy the compiled binary and the sample file from the local directory
COPY minigrep .
COPY poem.txt .
# run the project
ENTRYPOINT ["./minigrep", "to", "poem.txt"]

We minimize our image even further by using an alpine image, 5MB in size, rather than the 800MB default rust image.

Remember, we have already compiled our code and all that it requires.

To manage assets from one image to the other, we need to copy the compiled code onto the runtime image. A transfer of files between two images if you may.

To achieve this, we would navigate into a container of the build image, copy the compiled code onto the current working directory and let the runtime image dockerfile copy the required file(s) into itself as it creates the final image.

The command would look something like this:

# notice the path of the main built app. It is the same as where we copied our file in our working directory
docker container cp minigrepv:/myawesomeapp/main .

Once this is done, we would then delete the initial build image as it is no longer required, giving us that one single docker image for spawning containers.

To shorten this and the cycle of commands you would have to run, create a develop.sh file with the following contents: (This assumes you have bash on your system)

#!/bin/bash
docker image build -t minigrep-build-image -f Dockerfile.build .

# Create container from the build Docker image
docker container create --name minigrepv2-build-container minigrep-build-image

# Copy build items from build container to the local filesystem
docker container cp minigrepv2-build-container:/myawesomeapp/target/debug/minigrep .
docker container cp minigrepv2-build-container:/myawesomeapp/poem.txt .

# Build the runtime Docker image
docker image build -t minigrep-runtime-image .

# Remove the build Docker container
docker container rm -f minigrepv2-build-container
rm minigrep

Our directory would look as below:

src
target
.gitignore
poem.txt
Cargo.lock
Cargo.toml
README.md
Dockerfile.build
Dockerfile
develop.sh

Run the develop.sh file to build the images:

./develop.sh

This will build the two images, copy the compiled code from one container to the other and delete the build container.

Let's see how our images differ: docker image ls

REPOSITORY               TAG      IMAGE ID       CREATED          SIZE
minigrep-runtime-image   latest   0ef6f1b87748   14 seconds ago   11.8MB
minigrep-build-image     latest   66d4a2a395c3   16 seconds ago   1.41GB

Voilà! A whopping difference of over 1000MB!

In our use of the builder pattern, we have managed to save over 1GB of file storage in our final image size. The trade-off was that we had to create two docker files, build the first, copy the final file to the local directory, and only then move this to the final image. Perhaps there is a shorter way.
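For the curious, the saving can be worked out from the two sizes reported above. The helper below is a quick sketch that assumes the sizes are in decimal units (1GB = 1000MB); the function name is hypothetical.

```python
UNITS = {"KB": 1e3, "MB": 1e6, "GB": 1e9}


def to_bytes(size):
    """Convert a size string such as '1.41GB' to bytes, assuming decimal units."""
    number, unit = size[:-2], size[-2:].upper()
    return float(number) * UNITS[unit]


build_size = to_bytes("1.41GB")
runtime_size = to_bytes("11.8MB")
saved = build_size - runtime_size

print(f"{saved / 1e6:.1f} MB saved")
print(f"runtime image is {runtime_size / build_size:.2%} of the build image")
```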

Observe: minigrep:v2

Multi-stage dockerfiles

Much like the builder pattern, a multi-stage dockerfile will create intermediate images and have our final image of equivalent size. The pro to this methodology, however, is the use of a single Dockerfile. Behold, say goodbye to the bash script and the need to have files copied to the local filesystem before transferring to the final image.

# give the initial image a name
FROM rust:latest as build-image
# Set the working directory
WORKDIR /myawesomeapp
# Copy source file from current directory to container
COPY . .
# Build the application
RUN cargo build

# build the final image
# give this a name
FROM alpine:latest as runtime-image
# Set the working directory
WORKDIR /myawesomeapp
# copy the compiled code from the initial image onto the runtime image
COPY --from=build-image /myawesomeapp/target/debug/minigrep .
COPY --from=build-image /myawesomeapp/poem.txt .
# Run the application
ENTRYPOINT ["./minigrep", "to", "poem.txt"]

Go ahead and inspect your image.

REPOSITORY   TAG   IMAGE ID       CREATED          SIZE
minigrep     v3    2c4927420843   19 seconds ago   11.8MB

What a difference.

What does this look like in other projects? Can we build dependencies separately from the final image in interpreted languages?

Hint: TambuaShamba

Conclusion

We could create a simple dockerfile for smaller products and applications, or go ahead and use multi-stage docker files. The choice remains with the engineer that #builds. Along the way, what you intend to #create, your capacity, your development experience and what you are willing to do differently all sum up to your ideal. Iterate.

Tambua Shamba - Highlighting soil organic carbon content across Kenyan farms

Marvin Kweyu — Wed, 08 Feb 2023 07:40:43 GMT

Introduction

The Covid-19 pandemic revealed the large gaps in our social systems, infrastructure, and outlook on life. We lost jobs, we lost loved ones, and we had to prioritize basic needs over impulse purchases, understanding that that extra broccoli is not so bad after all.

Soil organic carbon is the measure of carbon stored in soil organic matter. It plays an essential role in determining the quality of soil and the productivity of a farm. Poor soil organic content can lead to soil erosion, reduced crop yields, and decreased water infiltration. Therefore, it is important for farmers to track the soil organic content of their farms and identify areas for improvement.

What if there was a way to map this data out? What if you could get a glimpse of which farms were performing better based on their soil organic carbon content?

Tambua Shamba

I developed Tambua Shamba, an analytical dashboard view of farms based on their soil organic content. This dashboard will enable users to view the best and worst farms based on their content measures, as well as to identify trends in soil measurements over time.

The Build

I had to decide which tools to use to build this project. I could have used Phoenix LiveView, an Elixir framework whose concurrency would have been beneficial if the project had multiple users at once. On the other hand, I could have gone with a REST API approach and opened the product up to different clients around the globe. I chose the latter.

To maintain this project, I needed to answer a few questions:

1. How would farms be added?

2. Who would this product speak to?

3. Would it be as easy for a farmer to understand as it would be for a soil analyst?

4. How would anyone across the project trace their steps?

5. How would the data be rendered?

From a system design perspective, I wanted to record the actual files of the sources of data. I also needed a mechanism to track which farms belonged to which instance of a data update, as well as which farms existed in that specific update. Additionally, I needed to know who did the update and when it was done. All this would eventually need to be rendered on a map, plotted as a multi-polygon.

It's never what it seems

The Challenges

The main challenge I encountered was ensuring data integrity. We could not have random file formats uploaded. For example, a video file recording should not be treated as a source of truth for a quantifiable entity. Additionally, if there are specific identifiers the farms require, then all farms must meet this spec.

Moreover, since data can be collected in multiple formats, we need to account for when a scientist uses polygons to demarcate farms on one occasion and a multipolygon on the next.

To render this information, we need to consider what happens when we have two farms to record data on. Would we get the same result if we queried for the best and worst farms? What would happen if we had two files with the exact same farms? What if we forgot to add a field on one of them?

Navigating product edge-cases

To address these challenges, I overrode the ModelViewSet create method to read and check the file before saving. I let the API accept CSV files as a data source, as they are much easier to work with when loading data into a database table. To avoid the same farms showing up in both the best and worst lists, I filtered against the already returned list of best farms. Warning: by no means is this scalable.
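The de-duplication idea can be sketched outside the framework. This is a minimal illustration in Java, not the project's actual code (which lives in a ModelViewSet): the class name, method names and the map of farm name to soil organic carbon are all assumptions made for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: farms are represented as a map of farm name -> soil organic carbon.
final class FarmRanking {
    // Top `count` farm names by soil organic carbon, highest first.
    static List<String> best(Map<String, Double> carbonByFarm, int count) {
        return carbonByFarm.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(count)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    // Bottom `count` farm names, lowest first, skipping farms already listed as best.
    static List<String> worst(Map<String, Double> carbonByFarm, List<String> alreadyBest, int count) {
        return carbonByFarm.entrySet().stream()
                // Linear scan of the best list for every farm: fine for a handful of farms, not at scale.
                .filter(entry -> !alreadyBest.contains(entry.getKey()))
                .sorted(Map.Entry.comparingByValue())
                .limit(count)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

With only two farms recorded, asking for the two best and the two worst returns both farms as best and an empty worst list, rather than the same pair twice.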

I also color-coded the farms to show what kind of farms we had and added popups to give more detail about specific farms. Conventionally, the project is split into modules and lazy loaded to reduce the bundle size and increase its speed.

The Horizon

For the next release, I plan to add features such as asynchronous file management and a standard user management system to the dashboard. Additionally, I will look into providing more user-friendly features such as searching for farms by name, pinpointing their location on the map and updating information on farms.

This should let users have a comprehensive understanding of the soil organic content in their farms, as well as allow them to compare their farms to others in the same region.

Indulge: Can we include a data analytics engine that will enable users to track trends in their soil measurements over time?

For reference, the project can be found on marvinkweyu/tambua-shamba

Conclusion

As I continue to build Tambua Shamba and explore the relationship between soil and food security, I invite you to join in the conversation. #build

Originally published on marvinkweyu/projects/tambua-shamba

Semantic Versioning vs Git commit hashing

Marvin Kweyu — Mon, 30 Jan 2023 05:05:39 GMT

As with all great software, we build our products and releases in the spirit of continuous delivery. It is akin to creating a picture memory of where your product was at that point in time - the SDLC snapshot.

In the packaging of one of my projects, I came across a scenario that caught me off guard: versioning. Specifically, my challenge came with the choice of what versioning system I would use or how I would combine multiple versioning techniques if the need arose.

Working with multiple teams and in different demographics, packaging has always been different. For one, we have the SemVer evangelists, the CalVer antagonists (e.g. Ubuntu packaging), the build version engineers and my just-discovered git commit hashing brooders. Lo! How the choices go on.

I want to share with you the two that have piqued my interest so far and what options I use to curate experiences for the web.

A Case for Semantic Versioning

Semantic Versioning, or simply SemVer, is a system of versioning software in which major changes, minor changes and bug fixes or enhancements are each noted by their own number:

MAJOR.MINOR.FIXES

Examples:

Django 4.1.5

ColorDetect v1.6.0

fancy-git v7.1.9

Here, we bump each number according to the purpose the release serves. With this in mind, one of the major advantages of SemVer is human readability. Engineers and users alike can clearly tell which release came after which. User A will see that your product, Legendary Lemons, is at v3.0.0 while they have v1.2.0. They can then tell that what is on your site is newer, much as they would when counting from 1 to 10. The same goes for your team members, albeit with a little more technicality.
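That readability carries over to code: comparing two versions is just comparing three numbers from left to right. The sketch below is a minimal illustration with hypothetical names (SemVer, parse, compare); it assumes plain MAJOR.MINOR.FIXES strings with an optional leading "v" and ignores pre-release or build suffixes.

```java
// Hypothetical helper for comparing plain MAJOR.MINOR.FIXES version strings.
final class SemVer {
    // Parses "v3.0.0" or "3.0.0" into {major, minor, fixes}.
    static int[] parse(String version) {
        String cleaned = version.startsWith("v") ? version.substring(1) : version;
        String[] parts = cleaned.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected MAJOR.MINOR.FIXES but got: " + version);
        }
        return new int[] {
            Integer.parseInt(parts[0]), Integer.parseInt(parts[1]), Integer.parseInt(parts[2])
        };
    }

    // Negative if a is older than b, zero if they are equal, positive if a is newer.
    static int compare(String a, String b) {
        int[] left = parse(a);
        int[] right = parse(b);
        for (int i = 0; i < 3; i++) {
            if (left[i] != right[i]) {
                return Integer.compare(left[i], right[i]);
            }
        }
        return 0;
    }
}
```

Comparing numbers rather than text matters: as plain strings, "1.10.0" would sort before "1.9.0", while numerically it is the newer release.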

Git commit Hashing strikes back

When it comes to using git commit hashing for software versioning, Fred Simon's 7 Deadly Sins of Versioning comes to mind.

Scenario

What if I needed to consider a component that had no impact on either functionality, performance or quality? Would I release v1.4.5 due to a spelling mistake in the README? Clearly, this is a developer-centred update in which a config file on the main branch needs to be updated for the project to run on the local environment. Should this be transferred to the user? What is the extent of items that should be? Ahh... the conundrum. Release management can be messy.

However, the challenge comes with interpretation. Surely 5517829933722 should mean something more than an array of digits. Is it higher than the others? Is it lower than the package I have?

The pill

I packaged The Urbanlibrary as a Docker container. It has dev-centred versioning alongside the version that is made known to the public. A mix and match, if you will: taking what works, modifying what I needed and sharing the much-needed value.

Example:

Consider the Windows operating system packaging. We have Windows 11, Windows 10 and so forth. In between these, we constantly have updates for the software. Patch upon patch each modifying the previous build.

I might as well merge these two schemas. Ultimately, what matters is that:

The product has clear documentation
I can see what happens across multiple builds
Automation sits at the center of deployment and changelogs happen on the fly as much as they can (I am not going to SSH in every single time).
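One possible way to merge the two schemas is to append a shortened commit hash to the public version as build metadata, in the form MAJOR.MINOR.FIXES+hash. The sketch below is only an assumption about how that could look, not how The Urbanlibrary is actually labelled; the class and method names are made up for illustration.

```java
// Hypothetical helper that pairs a public version with a developer-facing commit hash.
final class BuildLabel {
    // Joins the public version and the first 7 characters of the commit hash with a '+'.
    static String of(String publicVersion, String commitHash) {
        String shortHash = commitHash.length() > 7 ? commitHash.substring(0, 7) : commitHash;
        return publicVersion + "+" + shortHash;
    }

    // Strips the hash again so users only ever see the public version.
    static String publicVersion(String label) {
        int plus = label.indexOf('+');
        return plus < 0 ? label : label.substring(0, plus);
    }
}
```

The team gets a label that points at an exact commit, while the user still reads a plain version number. One caveat: Docker tags do not accept the '+' character, so a container tag would need a different separator.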

Conclusion

What I have come to understand is that just because something is well-known does not always mean it is the right way to do it. There is a plethora of choices on the table and I welcome thoughts and conversations on the same. I am keen to discover what your team or you personally use or would use to version your project. Reach out, touch base and let's #build.

PS: For an anecdote, I stumbled on Julie's video on how she chose to version her projects in general.

Image credits:

People illustrations by Storyset
Monkeyuser

The Value Chain Factory hackathon

Marvin Kweyu — Fri, 18 Nov 2022 04:27:42 GMT

Built by Africans and for Africans

It's been a week since we had our first hackathon here at Value Chain Factory. A week since we had a buzz of activity with young engineers teeming with ideas on how to create the next solution that would address Africa's most daunting challenges.

Over the weekend, my team had the pleasure of hosting one of the most exciting hackathons we have had this quarter. Our aim was to call upon innovators in the technical space to share and build upon their ideas on agriculture, health and transport; matters that are paramount in a mineral and nutrient-rich continent.

Of the possible startups brought forth, I thought I would mention a few notable ones: those that, in the view of my team and the participants, would provide more value and be impactful in their own right.

Lifeline medicare

To assist emergency responders, team #lifelinemedicare created a solution that stores key details of their patients. Succinctly, it helps determine whether a particular treatment would be viable for an incapacitated patient.

SmartBins

With an ever-growing population, currently at 8 billion, our primary approach to waste disposal and management has been to bury it. That is practically equivalent to burying our heads in the sand and letting the tides of time do their thing. To address this, the SmartBins startup would provide waste management services, taking up this role from collection and recycling through to selling compost manure back to the local farmer.

The Farmer-market bridge

To complete the cycle, the market bridge team would open up the marketplace to expose farmers to their immediate sellers. In a way, making the farmer aware of their produce's movement through to the consumer.

What stood out was how these builders were willing to take a leap in validating their enterprises. An emerging trend amongst this lot of African engineers was their embrace of new technology: from the use of NFC tags in medicine and the leverage of blockchain technology for transparent transactions to the use of geo-sensors in garbage collection. They took a step beyond ideation to provide the next possible solution.

Remember:

More important than starting a particular startup, is getting to meet a number of potential cofounders. From there, instead of working on what you intend to make and then finding the audience, work in the reverse by thinking of the public. Ideate.

You should only start a startup if you feel compelled by a certain problem and you think starting a startup is the only way to solve it. The passion should come first, and the startup comes second. - Sam Altman

Thus, to you the reader: If you knew success was a certainty, what would you do?

PS: Yes. My team came second. Because winning is what we do. Hi Wachira.

TechStartup weekend: The place of technology in Africa

Marvin Kweyu — Sat, 12 Nov 2022 17:34:45 GMT

I had the opportunity to participate in The #TechStartUpWeekend - a 3-day event aimed at bringing innovators from all fields and ages to participate in a challenge that would build solutions for Africa.

During this time, I got to interact with the best in their fields: lawyers, psychologists, mechanical engineers, security analysts, fellow startup founders and so forth. Each of us was brought together to spark sustainable solutions that would impact our communities in one way or another.

Truth be told, it was a nerve-wracking 54-hour brain-storming session.

Suffice it to say, from this, we brought to life a number of products:

ABCs of mental Health (A place to get mental to the corporate workforce)
Knock knock (Accessible emergency services for those with hearing disabilities)
Iko Network (A solution that provides easy registration to visitors within a premises)
Vamva (A #fintech solution solving the barrier-to-growth problems faced by ride-hailing drivers and other mobility )
Caes International (An organisation that gives the local boda-boda rider rechargeable batteries.)
Instruct Kenya - An easy-to-access legal advisory platform
ShopOkoa - money management services to university students

One point that resonated with me, however, was the fact that technology should not be the #1 go-to for solving the challenges that Africa faces.

Rather than shoving technology down the throat of our future product users, we should come back to one single question:

How will this product affect the local citizen (mama mboga)?

Along with the network built and the products created, I got first-hand information as to the success points and pitfalls of pushing a successful product through the African market. How do you identify the different types of users, and tell whether that specific cluster needs what you have to offer?

Do the people want this? Talk with the people. Who are your users? How are they different? What are their needs?

The best feedback you're going to get on your product is in the 3 seconds after you tell them the price.

Ultimately, the baseline sits with where you intend to take your product and what solution it intends to tackle. Removing the technical jargon and complexity, we ask:

'Is this the most suitable way to solve this problem?'

Originally published at: https://www.marvinkweyu.net/indulge/the_place_of_technology_in_africa

Binary vs Interpolation Search

Marvin Kweyu — Tue, 11 Oct 2022 03:55:44 GMT

This guide acts as a follow-up to the talk Binary vs Interpolation Search.

Time complexity and search algorithms: a walk through their definition, purpose, use cases and constraints.

Introduction

A search algorithm is a technique used to locate an item in a certain data structure.

Before we begin, you should have an understanding of the below:

Arrays
Iteration and recursion

Let's give an overview of time complexity and what it entails.

Time Complexity

A way to show how the runtime of a function increases as the size of its input grows.

Broadly speaking, the time complexities covered in this guide fall into three groups:

Linear time
Constant time
Quadratic time

Linear Time O(n)

The processing time of a program will increase as the size of the input increases.

Example: We can create a program that processes a list of items stored in an array. A case in point is a program that processes locally stored files.

Constant Time O(1)

Here, no matter what input we pass to our program, we get the results in the same time frame. Hence, a process taking 10_000_000 items returns in the same time as one taking in 7 items.

Quadratic time O(n^2)

As the name suggests, a program will return results in a time period that resembles a quadratic curve. This time is common with multi-dimensional arrays/lists of lists, since:

We loop over the list (say A) to get individual lists
We loop over a specific list (say B) in this list to get the item and perform an operation on it.

For reference, the quadratic equation:

ax^2 + bx + c
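As a minimal sketch of the list-of-lists case above (the class and method names are made up for illustration), the nested loops below visit every cell of a grid. For a grid of n rows with n items each, the inner statement runs n * n times, which is where O(n^2) comes from.

```java
// Hypothetical example of quadratic work: summing every cell of a grid.
final class GridSum {
    static long sum(int[][] grid) {
        long total = 0;
        for (int[] row : grid) {      // list A: pick each inner list in turn
            for (int value : row) {   // list B: visit each item in that inner list
                total += value;       // O(1) work, repeated once per cell
            }
        }
        return total;
    }
}
```

Doubling n here quadruples the number of additions, unlike the linear case where it would only double.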

Example: A program to calculate the product of integers in an array.

int array_products(int[] a) {
    int product = 1; // O(1); start at 1, since starting at 0 would make every product 0
    for (int number : a) {
        product = product * number; // O(1), repeated once per item
    }
    return product; // O(1)
}

We can get an overview of how long a function will take, based on how large of an array we have in this particular function/ method.

In our case, we can get the time complexity of this function as below:

O(1) + n * O(1) + O(1)

Since we multiply a value into the product n times, we have O(1) happening for each of those occurrences. Thus n * O(1).

Since a constant + another constant is still a constant.

O(1) + n * O(1)

Get the fastest-growing term from the equation, i.e:

n * O(1)

Then remove the coefficient to get O(n) (linear time i.e, the function takes longer to complete the larger the array)

Binary Search

Given a sorted array, binary search locates an item in question using the 'divide and conquer' approach.

Time complexity: O(log n)

Binary Search pseudocode

Get the indexes of the first and last items of the sorted array (low and high).
Add the two indexes and divide by 2 to get the middle index.
Get the element at the middle index.
If the value at the middle index is equal to the searched item, return the index.
If the value at the middle index is greater than the searched item, set high to middle - 1 and discard the right half.
If the value at the middle index is less than the searched item, set low to middle + 1 and discard the left half.
Repeat the process until the item is found or low passes high.
Return -1 if the item is absent.

Visual representation

Implementation of binary search

int search(int[] a, int target) {
    int left = 0;
    int right = a.length - 1; // last valid index
    while (left <= right) {
        int middle = left + (right - left) / 2; // avoids overflow of left + right
        if (a[middle] == target) {
            return middle;
        }
        if (target < a[middle]) {
            right = middle - 1; // discard the right half
        } else {
            left = middle + 1; // discard the left half
        }
    }
    return -1; // target is absent
}

Applications of Binary Search

Database indexing

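Within your database of choice, a tree-based index (commonly a B-tree, a generalisation of the binary search tree) is used, which in effect precomputes the middle point at each step until the single item is reached. More details on database indexing can be found in Manolopoulos et al. (2012), listed in the references.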

3D games and applications.

Binary space partitioning occurs as space is divided into a tree structure and a binary search is used to retrieve which subdivisions to display according to a 3D position and camera.

Git debugging with git bisect
Autocomplete search

Consider a user typing in a URL on a browser. Given a history of searches, instead of going through each one and comparing, a binary search is used to perform a faster lookup across the history.

Searching for prefixes

Instead of an array of integers, you can have an array of strings where you intend to search for an item that starts with a certain string or character sequence.

Going through a dictionary to find a word
Git cherry picking

A clever example of where this is used is by Sarv Shakti, who used it to identify what commit broke the application
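The prefix search mentioned above can be sketched with a small variation on binary search. Instead of stopping at an exact match, it narrows down to the first word that is not smaller than the prefix and then checks whether that word starts with it. The names are hypothetical, and the array is assumed to be sorted in Java's natural String order.

```java
// Hypothetical helper: binary search for the first word with a given prefix.
final class PrefixSearch {
    // Returns the index of the first word starting with `prefix`, or -1 if no word does.
    static int firstWithPrefix(String[] sortedWords, String prefix) {
        int left = 0;
        int right = sortedWords.length; // exclusive upper bound
        while (left < right) {
            int middle = left + (right - left) / 2;
            if (sortedWords[middle].compareTo(prefix) < 0) {
                left = middle + 1; // middle sorts before the prefix: discard the left part
            } else {
                right = middle;    // middle may be the first match: keep it in range
            }
        }
        boolean found = left < sortedWords.length && sortedWords[left].startsWith(prefix);
        return found ? left : -1;
    }
}
```

Since all words sharing a prefix sit next to each other in a sorted array, an autocomplete feature could read forward from the returned index until the prefix stops matching.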

Example: Consider a scenario where you have a list of maps storing student records in your classroom. Given a name, or phone number, which algorithm would you use to search through this? What if the records got to 10_000 or more? How would this grow?

Interpolation Search

This is an improvement over binary search for uniformly distributed data. It probes the position where the element is more likely to exist.

Time complexity: O(log log N) on average, for uniformly distributed data

Constraints of interpolation search

The array should be sorted
The array should be uniformly distributed.

Note: Uniform distribution in an array implies that the difference between subsequent elements should be relatively equal. In this case, an array [10, 30, 300000, 23_000_000] would fail to meet this condition.

Example: Searching for the word 'zelda' in a dictionary of words from letters 'a' to 'z'.

Instead of starting from the middle item in the array, we go closer to the end of the array.

Time complexity and its relation to interpolation search

A walk through the equations

y = m*x + c

arr[index] = m*index + c

where:
index: the position of the element in the array
arr[index]: the value at this index in the array

arr[low] = m*low + c (1)

arr[high] = m*high + c (2)

Subtract equation (1) from (2), and we get

arr[high] - arr[low] = m * (high - low)

m = (arr[high] - arr[low]) / (high - low) (3)

Say we are searching for an element, target, in this array.

arr[index] = m*index + c

Then replacing arr[index] with target in the above equation, we get

target = m*index + c (4)

Subtract equation (1) from (4), and we get

target - arr[low] = m * (index - low)

index - low = (target - arr[low]) / m

index = low + (target - arr[low]) / m

Substituting m from equation (3) gives

index = low + (target - arr[low])*(high - low) / (arr[high] - arr[low])

Example

Array to search through:

[32, 35, 37, 39, 42, 44, 46, 48]

We will search for target = 42 in the above array.

The size of the array is 8.

low = 0,

high = 7,

arr[low] = 32,

arr[high] = 48

index = 0 + (42 - 32) * (7 - 0) / (48 - 32) = 70 / 16 = 4.375 ~ 4

arr[4] = 42, which matches the target.

Remember to floor the value of the index

Interpolation search pseudocode

Calculate the index using the interpolation index formula.
If the target is a match for the item at that index, return the index and exit.
If the target is smaller than the item at that index, search in the lower sub-list by assigning high = index - 1.
If the target is greater than the item at that index, search in the higher sub-list by assigning low = index + 1.
Probe the new position by recalculating the index for the remaining sub-list.
Repeat until a match is found or the sub-list is empty.

Implementation of Interpolation search

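int search(int[] array, int target) {
    int low = 0;
    int high = array.length - 1;
    // Stop once the target falls outside the remaining range; this also keeps the index in bounds.
    while (low <= high && target >= array[low] && target <= array[high]) {
        if (array[high] == array[low]) {
            return low; // every remaining value equals the target; also avoids dividing by zero
        }
        // long arithmetic avoids overflow; integer division floors the index
        int index = low + (int) (((long) target - array[low]) * (high - low)
                / ((long) array[high] - array[low]));
        if (array[index] < target) {
            low = index + 1;
        } else if (array[index] > target) {
            high = index - 1;
        } else {
            return index;
        }
    }
    return -1;
}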

Applications of interpolation search

Interpolation search can be used in the same areas where binary search is used. The only caveat is that this data must be uniformly distributed. A failure to meet this condition will give you a time complexity of O(n).

Benchmark

Comparatively, given uniformly distributed data, this difference might occur as below:
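One way to see the difference without timing anything is to count how many elements each algorithm inspects before it finds the target. The sketch below does exactly that; the class, the method names and the evenly spaced test array are assumptions made for illustration, and probe counts are only a rough stand-in for a real benchmark.

```java
// Hypothetical probe counter for comparing the two searches on the same data.
final class ProbeCount {
    // Builds a perfectly uniform sorted array: 0, step, 2*step, ...
    static int[] evenlySpaced(int size, int step) {
        int[] values = new int[size];
        for (int i = 0; i < size; i++) {
            values[i] = i * step;
        }
        return values;
    }

    // Number of elements binary search inspects before finding the target or giving up.
    static int binary(int[] sorted, int target) {
        int probes = 0;
        int left = 0;
        int right = sorted.length - 1;
        while (left <= right) {
            int middle = left + (right - left) / 2;
            probes++;
            if (sorted[middle] == target) {
                return probes;
            }
            if (target < sorted[middle]) {
                right = middle - 1;
            } else {
                left = middle + 1;
            }
        }
        return probes;
    }

    // Number of elements interpolation search inspects before finding the target or giving up.
    static int interpolation(int[] sorted, int target) {
        int probes = 0;
        int low = 0;
        int high = sorted.length - 1;
        while (low <= high && target >= sorted[low] && target <= sorted[high]) {
            int index = low;
            if (sorted[high] != sorted[low]) { // guard against dividing by zero
                index = low + (int) (((long) target - sorted[low]) * (high - low)
                        / ((long) sorted[high] - sorted[low]));
            }
            probes++;
            if (sorted[index] == target) {
                return probes;
            }
            if (sorted[index] < target) {
                low = index + 1;
            } else {
                high = index - 1;
            }
        }
        return probes;
    }
}
```

On a perfectly even array such as evenlySpaced(1000, 3), the interpolation formula lands on a present target at the first probe, while binary search needs several halvings. On skewed data the gap narrows and can reverse.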

Resources and questions

These are locations to practice your data structures along with links to various challenges that may hold the above mentioned concepts or pivot them accordingly.

Other types of search algorithms

Linear search / sequential search
Breadth-first search
Depth-first search
Jump search
Exponential search
Fibonacci Search

Conclusion

There are more search algorithms out there. This is by no means a comprehensive list, and by no means are you expected to know all of them. A single algorithm can be used as the basis for other searches. With that said, there is no silver bullet. You choose what works based on the constraints at hand.

References:

Mohammed, A. S., Amrahov, Ş. E., & Çelebi, F. V. (2021). Interpolated binary search: An efficient hybrid search algorithm on ordered datasets. Engineering Science and Technology, an International Journal, 24(5), 1072-1079. https://doi.org/10.1016/j.jestch.2021.02.009

Manolopoulos, Y., Theodoridis, Y., & Tsotras, V. (2012). Advanced Database Indexing. In Google Books. Springer Science & Business Media. https://books.google.com/books?hl=en&lr=&id=pD3tBwAAQBAJ&oi=fnd&pg=PP13&dq=+database+indexing+using+binary+search+&ots=HecPStLWUK&sig=9A5wLiIHxg1ygaiZ-UxiSob7-Qs

Originally published at https://www.marvinkweyu.net/talks/binary_vs_interpolation_search

Going Enterprise and its Aftermath

Marvin Kweyu — Mon, 18 Jul 2022 13:08:03 GMT

I thought I knew what scalability was. Just the same way most of us thought we knew what distributed systems were.

I realized I had to completely relearn these systems because of the cloud.
~Satya Nadella

There are a number of considerations that you as a developer, architect or engineering team have to make as your user base grows. Going from a product with 10 users to, say, 30 or a million has its considerations. This, of course, depends on the user base we are talking about.

Remember astute reader, a customer can be an organisation with a plethora of other members inside it, or just as well be a single user called Jack.

Explore with me

Imagine thousands, or half a million, people performing a search query against a server. If those queries landed on your end, would it break? Would it be able to hold?

We should be able to handle more data, more concurrent connections and higher interaction rates.

A note on interaction rate:

How often a user changes the interface on a blog application vs a game.

Before we begin, it is important to note that a product set up for scalability should be as easy and cheap to grow to 10x as it is to shrink to 0.2x, as is common with startups whose needs change rapidly over a short period of time.

Foremost, there are two types of scaling we need to observe.

a) Vertical scalability

b) Horizontal scalability

Both of the aforementioned have their pros and cons - each with a use case which would be deemed fit. Let's narrow this down.

Example :

For solo and relatively small ventures, you might opt for a VPS (virtual private server), which is essentially a physical machine holding multiple 'servers', yours included, each with its own OS and so forth. Once at a certain cap, you might see the need to increase resources and hence, the incremental hunt: more RAM, more storage, more CPU and so forth. Almost everything gets more.

The example above is one implementing vertical scaling, where we upgrade the hardware and/or the network throughput. In short, you do not need to modify your application architecture.

Vertical Scalability

Broken down, vertical scaling comes down to a number of techniques.

Adding more drives in RAID arrays and thus distributing our reads and writes during database interaction.
Switching to solid-state drives from hard drives
Increasing the RAM
Adding more cores to the server to reduce the context switching.
Upgrading network adapters, especially for systems reliant on media file processing.

There are a number of drawbacks, however. For instance, the change to a solid-state drive might not show a significant difference if working with MySQL or Cassandra, which use their own data structures for sequential disk operations. At a certain point, lock contention is bound to catch up and here, more CPU does little to no good. Vertical scaling also comes with a steep financial cost as you add more and more, given the cash input doubles or triples as a unit of resource is added.

Example:

A RAM chip will double in cost as it doubles in size. That is, x GB is $z, 2x GB is $2z and so forth. A reality check hits us once we get to a stage, say 128GB, where the rules are not quite the same.

Horizontal Scalability

The alternative and preferred method is horizontal scaling. While it may come at a steep cost initially, it does get significantly cheaper as your enterprise grows to reach more users. Let's see how.

Caching

Use of content delivery networks (CDN)

To manage our assets, we save the hassle of having to perform the read and write operations on our own servers. Ever wondered why we have CDNs for most packages(Bootstrap, Material etc)?

Example:

We have our servers for, say, UrbanDesk in South Africa. For access, an audacious Sergei (hint - thy knowest thyself), an enthusiastic early adopter from Germany, will hit the CDN (whose nodes are just data centres on their own). The CDN, acknowledging it lacks this data - images in our case - will hit our server for the resources it needs (CSS, images, JavaScript) and serve them to him while storing the same data and resources for further calls to the site. In this way, the next time, the call does not have to go a whole continent away.
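The flow in this example is a pull-through cache: the edge serves what it already holds and only goes back to the origin on a miss. Below is a minimal sketch of that behaviour; EdgeCache and its methods are hypothetical names, the origin is modelled as a plain function, and real CDNs add expiry, invalidation and much more.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model of a single CDN edge node in front of our origin server.
final class EdgeCache {
    private final Map<String, String> stored = new HashMap<>();
    private final Function<String, String> origin; // stands in for a request to our own server
    private int originHits = 0;

    EdgeCache(Function<String, String> origin) {
        this.origin = origin;
    }

    // Serves from the edge when possible; otherwise fetches from the origin once and keeps a copy.
    String get(String path) {
        String cached = stored.get(path);
        if (cached != null) {
            return cached;
        }
        originHits++;
        String fresh = origin.apply(path);
        stored.put(path, fresh);
        return fresh;
    }

    // How many requests actually travelled to the origin server.
    int originHits() {
        return originHits;
    }
}
```

Sergei's first request for an image travels to the origin; every later request for the same path is answered from the edge, so the origin is hit only once.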

Services like Cloudflare, Cloudinary or GoogleCloud CDN should ring a bell.

We have our static assets sorted out, but what about our data?

Edge Cache Servers

I've expounded on the data server to include functional partitioning, yet another enhancement to our scaling needs. The concept behind it is that our project should be decomposed by the individual actions its components need to perform - to group components of our application based on what they do.

For instance, we have sections that address this page you are currently reading (thegreencodes.com), then others that speak to different types of users, say, those interested in merch (merch.thegreencodes.com), or the admin (admin.thegreencodes.com). Each of these sections has been compartmentalized.

We might also move the database to its own server and update our configs to point to it, hence to each its own, that is, the server the application runs and the server the database runs are different but still communicating.

You will notice that I also added the GeoDNS. It works very similarly to an ordinary DNS, the only difference being that it is 'geo-tagged'.

The DNS

The Domain Name System (DNS) is a way to identify your server's IP address on the web. In this way, if your server has an IP of, say, 203.0.113.10, it is displayed as example.com to whoever else is looking at it.

Back to GeoDNS

Essentially, when you attempt to connect to thegreencodes.com, the server IP will be resolved to the data centre closest to you.

To elaborate, once your product reaches a certain number of users spread all across the globe, having a single server in a single location might degrade the experience these very users get.

Example

Google has multiple servers. While you might connect to its root domain from Australia and person B from Germany, these might not be served from the same server. The GeoDNS works to resolve the IP addresses of these servers/ data centers based on the location of the end user.

You will access the same data, only without knowledge of what IP the domain you are visiting is currently resolving to.
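In code, the idea boils down to a lookup keyed on where the visitor is. The sketch below is a deliberately simplified assumption: the region codes, the addresses (taken from ranges reserved for documentation) and the GeoDns class are all made up, and a real GeoDNS service infers the location from the resolver or client address rather than being handed a region.

```java
import java.util.Map;

// Hypothetical GeoDNS table: one domain, several data centres, one answer per region.
final class GeoDns {
    // Region code -> address of the closest data centre (documentation-only addresses).
    private static final Map<String, String> DATA_CENTRES = Map.of(
            "AU", "203.0.113.10",
            "DE", "198.51.100.20");

    // Fallback data centre for regions without a dedicated entry.
    private static final String DEFAULT_ADDRESS = "192.0.2.30";

    // Returns the address the domain resolves to for a visitor in the given region.
    static String resolve(String visitorRegion) {
        return DATA_CENTRES.getOrDefault(visitorRegion, DEFAULT_ADDRESS);
    }
}
```

A visitor in Australia and one in Germany type the same domain but receive different addresses, while anyone else falls back to the default data centre.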

What makes horizontal scaling stand out is the basis on which it stands. At its core, horizontal scaling lets you deploy and maintain your project on several medium or small servers. A single server does not necessarily have to be extensive with vast amounts of resources - we split the load between different servers.

Of note, is that we need to consider whether our project is stateful or stateless. What this means is to identify whether the split needs data and whether that data needs to be synchronised.

Managing data in stateful horizontally scaled products

Behold, the CAP theorem and the ability to identify whether we need more reading or more writing of our database.

If we indeed have data that needs management, we might opt for three techniques.

a) Sharding by data
b) Sharding by hash
c) Data replication

On the one hand, we have sharding by data. In a case example similar to the one shown, we can have a database that holds data from those in Europe and another that holds data from those in America.

On the other hand, we can opt to use a consistent technique of hashing specific keys on our model and use those hashes to store data in different databases. For instance, hashing the profile_id and stating that data for users whose hash is ABC will sit on this server and not the other.

The above are optimizations meant to prevent hitting the same database for all requests.
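Both sharding styles can be sketched as a function that answers "which database holds this record?". The names below (ShardRouter, byRegion, byHash, the database labels) are hypothetical, and the hash variant is plain modulo hashing, which is simple but reshuffles most keys whenever a database is added.

```java
import java.util.List;

// Hypothetical router that decides which database a record belongs to.
final class ShardRouter {
    // Sharding by data: pick the database from an attribute of the record itself.
    static String byRegion(String region) {
        return region.equals("EU") ? "db-europe" : "db-america";
    }

    // Sharding by hash: the same profile id always maps to the same database.
    static String byHash(String profileId, List<String> databases) {
        int shard = Math.floorMod(profileId.hashCode(), databases.size());
        return databases.get(shard);
    }
}
```

Math.floorMod is used instead of % because hashCode can be negative, and a negative remainder would not be a valid list index.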

While not in-depth, I do intend to come back to this in future. So keep an eye out for this.

More than the type of scaling, we have to consider a number of techniques and practices to maintain.

As these platforms grow, the following have to be considered.

Testing

There is more to testing than unit tests. Larger systems eventually comprise larger subsystems. We need to check how these systems work together (integration testing), whether the application behaves according to the specification (functional testing), whether it can handle different loads (performance testing) and so forth.

Logging

The more the components, the more integration is needed and the more the system is prone to fail. The more servers we have, the higher the chances. We hope for the best, but prepare for the worst. Your team needs to know when a module went down, whether the data maintained its integrity, which modules are still up and so on.

Choose to go for the crash-only approach: let the system always be ready for a crash and, whenever it reboots, it should be able to continue without human interaction. A tool built around this idea is Netflix's Chaos Monkey, which kills random components of the system within working hours to test its reliability.

Conclusion

The practices above are intertwined. Functional partitioning would lead to the decomposition of the project, decomposition would lead to a 'maybe' use of microservices, this would bring a restructure of your team to smaller domain-driven teams, a change in how data is cached and managed and so forth. Build upon this. Go agile and iterate. This path is not linear, nor should it be treated as such.

Treat these principles as your northern star, but feel free to morph them based on the value they create for your business.

Understand

In the creation of sustainable software applications, we place the business logic at the centre. The inverse, you'll find, will get you great code rather than a solution to the need being addressed.

By placing technology first, we may get a great rails application, but it may not be a great pharmaceutical application ~ Robert C. Martin

Onto Architecture - the stuff we wish we'd gotten right at the start of our project.

Vue Unit Testing: The Breakdown

Marvin Kweyu — Thu, 26 May 2022 17:13:56 GMT

So far so good. We have our application boilerplate and have successfully written our first test for the home page of our todo application.

Next, we have to get through fetching, displaying this information and manipulating it.

Let's use beloved axios to grab and interact with our API.

npm install --save axios

For our application, we'll use an already created database - TheGreenCodes data source. In the same manner of thinking as before, we create tests for the non-existent feature: listing to-do items.

We shall display uncategorised items on one page, all the incomplete items on another, and then follow and display those that have actually been completed on the last page.

As we are starting off, we ensure, just like before, that since we have nothing in our database, yet, we display an appropriate message. Among others, here are the questions we need to ask:

Does it show an appropriate message if no item exists in the database yet?
Does our application fetch to-do items if any?
Does the app show any error message in case it cannot get to the server?
Can I even create a task?

Bear with me on this one. In the spirit of TDD, we are going to write our application based on our tests. So from here on out, no UI until we have something complete. That is, after all, the whole point of testing; pushing code with the confidence it is not going to break.

Modify the test suite to watch for changes in our test files. That way, we do not have to run npm run test:unit after every modification.

Load your package.json and add the test:unit:watch line below to the scripts section:

    ...
    "test:unit": "vue-cli-service test:unit",
    "test:unit:watch": "vue-cli-service test:unit --watchAll",
    ...

From here, we open our console and run:

npm run test:unit:watch

Leave it open for now. You can always switch back to it to see any changes that may occur. Let's fulfil our queries as we go.

We want our home page to display the 'empty database' message for an empty database. Let's test that.

To start, we do some modifications to our test suite, making it easier to work with as it grows.

    // ...previous imports (shallowMount and Home) stay as they were
    import { BootstrapVue, BootstrapVueIcons } from 'bootstrap-vue'
    import Vue from 'vue'

    describe('Home.vue', () => {
      // before each test case in this suite, register the bootstrap UI components
      beforeEach(() => {
        Vue.use(BootstrapVue)
        Vue.use(BootstrapVueIcons)
      })

      it('Displays welcome message', () => {
        const wrapper = shallowMount(Home)
        const welcomeMessage = 'Welcome to TheGreenCodes awesome to-do list'
        expect(wrapper.text()).toMatch(welcomeMessage)
      })
    })

Along with the other previous imports, we import bootstrap UI components.

Remember:

Our test case is assuming a black box. We are testing this specific component. Nothing outside. So it does not know what we have. Without the use of the UI import, the test would surely work but will show a warning as to what b-container or b-card means. It doesn't know.

Let's test for a message in case there are no to-do items. We expect this to fail at the first instance as we have not implemented any feature to display to-do items.

Append the below code snippet to your suite:

    it('Displays a message when there are no todo items', () => {
      const wrapper = shallowMount(Home, {
        data () {
          return {
            todoItems: null
          }
        }
      })
      const titleFound = wrapper.find('h4')
      const emptyTodoListMessage = 'You have no existing todo!'
      expect(titleFound.text()).toMatch(emptyTodoListMessage)
    })

Reasoning

We mount the Home component and set a data property to hold the data from the API. We also grab the h4 tag, an element we expect to hold the message when there are no items on the to-do list.

We should also check that this 'You have no existing todo!' message does not display if there are indeed items on our to-do list.

    it('Does not display message when there are todo items', () => {
      const wrapper = shallowMount(Home, {
        data () {
          return {
            todoItems: [{ id: '1', title: 'new item', content: 'make awesome content' }]
          }
        }
      })
      expect(wrapper.find('h4').exists()).toBe(false)
    })

Run the tests, in case you have not done so yet, and take a look at the spectacular failure.

PS to self:

Stop logging 'madness' on Sentry servers.

Let's fix this.

    <script>
    import axios from 'axios'

    export default {
      name: 'Home',
      data () {
        return {
          todoItems: null
        }
      }
    }
    </script>

    <style scoped>
    .list-container {
      max-width: 170%;
    }
    </style>

As you may have noticed, we have redundant code: we create a new wrapper in every test. We can centralize this. Modify your test to look as below. We explain, as always, what each part means.

For brevity, we ignore other components of the test file.

    ...
    describe("Home", () => {
      const build = () => {
        const wrapper = shallowMount(Home, {});
        return { wrapper };
      };

      beforeEach(() => {
        Vue.use(BootstrapVue);
        Vue.use(BootstrapVueIcons);
      });
    ...

At the very top of the describe block, we define a build helper that shallowMounts the Home component. Further tests can then be written as below:

    it("Displays application title", () => {
      const { wrapper } = build();
      const welcomeMessage = "Welcome to TheGreenCodes awesome to-do list";
      expect(wrapper.text()).toMatch(welcomeMessage);
    });

    it("Displays a message when there are no todo items", async () => {
      const { wrapper } = build({});
      // setData returns a promise; wait for the re-render before asserting
      await wrapper.setData({
        todoItems: []
      });
      const titleFound = wrapper.find("h4");
      const emptyTodoListMessage = "You have no existing todo!";
      expect(titleFound.text()).toMatch(emptyTodoListMessage);
    });

With this refactor, we declare once, at the very start, that Home is the component under test. The code is cleaner, and we no longer repeat the mount in every test, since only the component's properties change from one test to the next.

To set these individual properties, we can call the methods: setData, setMethods, setProps and so forth directly.

Because our component has no children, we reached for shallowMount. If it did, and we wanted to render whatever those children hold as well, we could call mount instead, imported from the same path as shallowMount.

What about the listing of all items?

    // add imports
    ...
    import axios from "axios";
    import flushPromises from "flush-promises";

    // more tests go here
    ...
    it("Gets all todo items regardless of status", async () => {
      const { wrapper } = build();
      const expectedItems = {
        // ToDo: our items contain articles to read
        data: [
          {
            id: "1",
            title: "TheGreenCodes: Unit testing in Vue",
            content: "A guide to better predictable code.",
            complete: true
          },
          {
            id: "2",
            title: "TheGreenCodes: Tests must fail",
            content: "Building blocks to test driven development",
            complete: false
          },
          {
            id: "3",
            title: "TheGreenCodes: Tests must also pass",
            content: "This share",
            complete: false
          }
        ]
      };
      // on axios call, use expectedItems object
      jest.spyOn(axios, "get").mockResolvedValue(expectedItems);
      await wrapper.vm.getToDoItems();
      // make sure API request promises are complete before looking for articles to read
      await flushPromises();
      expect(axios.get).toHaveBeenCalledTimes(1);
      expect(axios.get).toHaveBeenCalledWith("todos/");
      expect(wrapper.vm.todoItems).toEqual(expectedItems.data);
      // Finally, we make sure we've rendered the content from the API.
      const todos = wrapper.findAll('[data-test="todo"]');
      expect(todos).toHaveLength(3);
      // we have articles now
      expect(wrapper.html().includes("You have no existing todo!")).toBe(false);
      expect(wrapper.html().includes("TheGreenCodes: Unit testing in Vue")).toBe(
        true
      );
      expect(
        wrapper.html().includes("Building blocks to test driven development")
      ).toBe(true);
    });

Through this case, we build an object we expect to get as a response. It returns a list of the series you are reading right now.

We say, 'Whenever you call axios with a get request, return data that looks like this.'

Since our component calls the getToDoItems method, we can call it directly using await wrapper.vm.getToDoItems(). This is similar to how we set our data before.

Because the call is asynchronous, i.e., a request is made to the server and data is sent back, we have to make sure the pending promises have resolved before we assert. For this, we call flushPromises after adding it to the list of imports.
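To see why the flush matters, here is a minimal sketch in plain JavaScript, with no Vue or Jest involved. Everything in it is an assumption made for illustration: fakeGet stands in for the mocked axios.get, flushPromises is a hand-rolled stand-in for the flush-promises package, and makeComponent only mimics the data and method of our Home component.

```javascript
// Hypothetical stand-in for flush-promises: resolve on a later macrotask,
// after all already-queued promise callbacks have had a chance to run.
const flushPromises = () => new Promise((resolve) => setTimeout(resolve, 0));

// Hypothetical stand-in for the mocked axios.get: resolves immediately.
const fakeGet = () => Promise.resolve({ data: [{ id: "1", title: "first" }] });

// Minimal object mimicking the component's data and method.
function makeComponent() {
  return {
    todoItems: [],
    getToDoItems() {
      // The promise is not returned, so callers cannot await it directly.
      fakeGet().then((res) => {
        this.todoItems = res.data;
      });
    },
  };
}

async function demo() {
  const component = makeComponent();
  component.getToDoItems();
  const before = component.todoItems.length; // the .then callback has not run yet
  await flushPromises();
  const after = component.todoItems.length; // the callback has run by now
  return { before, after };
}

demo().then(({ before, after }) => console.log(before, after));
```

The method never hands its promise back, so the only way for a test to know the data has landed is to wait until the queue of pending promise callbacks is empty.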

We also ascertain that the get request is made only once, and to the specific URL we expect.

As axios returns the information wrapped in an inner data object, we unwrap it so that it can be compared directly to the array we intend to store. Thus the line:

expect(wrapper.vm.todoItems).toEqual(expectedItems.data);

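The last part checks what is rendered: three elements marked data-test="todo", no 'You have no existing todo!' message, and the title and content from the mocked response.

If you have been running the tests as we go, you will have seen the errors, all snarly, and may already have changed your Home component. For reference: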

    <script>
    import axios from "axios";

    export default {
      name: "Home",
      data() {
        return {
          todoItems: [],
          errorFound: null,
        };
      },
      created() {
        this.getToDoItems();
      },
      methods: {
        getToDoItems() {
          axios
            .get("todos/")
            .then((res) => {
              this.todoItems = res.data;
            })
            .catch((error) => {
              this.errorFound = error;
            });
        },
      },
    };
    </script>

    <style scoped>
    .list-container {
      max-width: 170%;
    }
    </style>

This change should make our current tests pass.

Zoom in on this line:

    ...
      v-for="(item, index) in todoItems"
      :key="index"
      data-test="todo"
    >
    ...

To be specific, data-test="todo".

This is an attribute added to each item in the list of articles yet to be read. It lets us call:

    const todos = wrapper.findAll('[data-test="todo"]');

You can equate this to a CSS class that exists only for writing unit tests. We grab all the items in one swoop and check their length.

We can go on with the tests and features, but this is the baseline:

Write your failing tests
Make the tests pass
Refactor the code.

Working iteratively through this not only enables error-free code, but also prevents us from writing extra redundant code.

Over the lifetime of the repository, awesome-to-do, we shall iterate through this process and come up with a complete application. I invite both you and your partner in crime to join in and build better software together.

Alas! Let the OpenSource contributions begin.

Regards,

Marvin K.

PS: Picture creds: monkeyuser.com

The era of the builder

Marvin Kweyu — Mon, 31 Jan 2022 13:43:11 GMT

I start this piece at the big bang of it all; TheGreenCodes and all it entails.

This here goes out to all those who have reached out or are yet to, with regards to articles shared, both new and old.
To you, the reader that you are and builder that keeps at it.

I took a moment, a beat, to reflect on why I got started. The reason as to why the wheels turned to this particular alley, consequently bringing with it an array of an audience.

This here outlines an engineer's path to a targeted mastery. A nudge towards the paths, journeys and challenges he encounters on his quest; a story by all means.

For the frequents, 'welcome back', and to the visitors, 'we are glad you made it'.

Over the last three years or so, I have journeyed with engineers across the board towards writing pages, useful to both us and those around us. It has been a tremendous journey, down to its hours of reading and research to beckon what I thought best for the audience at the time. Three years in, and here we are.

Series that have sparked conversation or passionate debates and those once-in-a-while articles that open up doors to collaboration and open source contribution. Hurrah!

A heads up to all those unexpecting eyes and ears, and in the words of those before me:

I refuse to be nothing but a full-time blogger.

As I take the time to pass this along, I open up the box of what is to come through this next phase; the builder's era.

With this, comes a voyage through which I shall not only share the nuggets of my findings but build around those same principles and guidelines.

For the start of this, however, I share with you Project Urbanlibrary to lead you into the world of content aggregation.

Along the way, I share routines and practices that have helped and still help to keep me sane in both mind and body. Keep healthy at all times and make time for your wellness, lest you are forced to make time for your illness - balanced, as all things should be.

Note to a future self and anyone else reading this:

Good work isn't created in a vacuum and the experience of art is a two-way street; incomplete without feedback. - Austin Kleon

I keep it short. I keep it sweet. Just like Thanksgiving.

So who are we? We are TheGreenCodes and at its heart, is a drive to build and talk about what we have built and continue to build. I urge both you and the friend you share this with, to join us and discover a place that could be.

Regards,

Marvin Kweyu

The Age of Monolithic systems

Marvin Kweyu — Mon, 10 Jan 2022 12:43:51 GMT

Coming in first in the series; the monoliths of web applications - monolithic architectural patterns.

Broken down, a monolithic system is one that is deployed as a single unit of interconnected components.

Let's give an example.

We are creating a hotel management system in Django for our favourite local spot. Our application will hold, amongst other things, account management, rooms, ratings and so forth. Break it into as many modular applications as you can come up with.

Essentially, a monolithic system is one in which multiple smaller applications live within the same server and share a file system and the same database. Sound familiar? To easily identify whether a project is monolithic, ask: can you scale an individual modular part of the application (memory or CPU) while leaving the other apps untouched? If not, then this is a monolithic application.

Projects built like this are known to be simple to develop, easy to scale up to a certain cap, easy to deploy and fast during the development phase. Fast, because all the components live together and are easy to find, and because the project is written largely in one language, so there is no need to learn another language or framework.

Have in mind, for instance, a Java application deployed as a single JAR file that contains all the app's logic.

A common architectural pattern that follows this, and the one we aim to discuss, is the layered architectural pattern.

The Layered Architectural Pattern (n-tier architecture)

Governed by the Single-responsibility Principle - a class or module has one and only one function - the layered approach leverages the separation of concerns. Broadly speaking, it splits software products into the presentation layer, business logic layer, persistence layer and database layer.

These are modular sections of the application.

Each section/layer outlined has a specific function without needing to know what the other layer entails.

The presentation layer, for instance, deals with the display of information to the user. The business layer gets data from the persistence layer and processes it against application logic (say, merging it with another table or filtering it against a third party) without needing to know anything about the user interface. Hence, you can change the presentation without changing the logic of what your project aims to do, or change from an SQLite database to MySQL without your business logic needing to know.
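A minimal sketch of that separation, in plain JavaScript rather than Django. All names here (MapRoomStore, ArrayRoomStore, RoomService, renderRooms) are hypothetical and not part of the hotel project. The business layer only relies on the persistence layer having a findAll method, so the storage can be swapped without touching the logic.

```javascript
// Persistence layer: two interchangeable stores exposing the same findAll().
class MapRoomStore {
  constructor() {
    this.rooms = new Map([
      [101, { number: 101, price: 80, booked: false }],
      [102, { number: 102, price: 120, booked: true }],
    ]);
  }
  findAll() {
    return [...this.rooms.values()];
  }
}

class ArrayRoomStore {
  constructor() {
    this.rooms = [{ number: 201, price: 95, booked: false }];
  }
  findAll() {
    return this.rooms.slice();
  }
}

// Business layer: filtering logic only, no knowledge of storage or UI.
class RoomService {
  constructor(store) {
    this.store = store;
  }
  availableRooms(maxPrice) {
    return this.store
      .findAll()
      .filter((room) => !room.booked && room.price <= maxPrice);
  }
}

// Presentation layer: formats whatever the business layer returns.
function renderRooms(rooms) {
  return rooms.map((room) => `Room ${room.number}: $${room.price}`).join("\n");
}

const fromMap = renderRooms(new RoomService(new MapRoomStore()).availableRooms(100));
const fromArray = renderRooms(new RoomService(new ArrayRoomStore()).availableRooms(100));
console.log(fromMap);
console.log(fromArray);
```

RoomService and renderRooms are identical in both calls; only the store passed in changes.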

On a number of smaller applications, the business and persistence layers may be merged into one business layer. As observed, we get three layers in one case, four in another and so forth, hence the name n-tier, where n is the number of layers.

Requests flow from one layer to the next, say from business logic to the persistence layer. This is a classic example of a closed layered system, that is, one in which a lower-level layer cannot be accessed directly.

Example:

Not letting the user access the database without walking through the business logic. What is handled in this 'logic' you might ask?

For one, authentication and authorization. No one should just access our local hotel database. At the same time, an ordinary user should not be able to perform managerial actions within our platform. That is left to the manager and only the manager.
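As a rough sketch of a closed layer, assume a hypothetical persistence object and a business object whose changeRoomPrice method is the only route to it. The roles, names and error messages are made up for illustration. In plain JavaScript the closure is a convention rather than something the language enforces.

```javascript
// Persistence layer (hypothetical): the only code that touches stored data.
const roomTable = [{ number: 101, price: 80 }];
const persistence = {
  updatePrice(number, price) {
    const room = roomTable.find((r) => r.number === number);
    room.price = price;
    return room;
  },
};

// Business layer: every request passes through these checks first.
const business = {
  changeRoomPrice(user, number, price) {
    if (!user) {
      throw new Error("authentication required");
    }
    if (user.role !== "manager") {
      throw new Error("only a manager may change prices");
    }
    return persistence.updatePrice(number, price);
  },
};

const manager = { name: "Ann", role: "manager" };
const guest = { name: "Bob", role: "guest" };

// The manager gets through; the guest is stopped before the data is touched.
console.log(business.changeRoomPrice(manager, 101, 90).price);

let guestError = null;
try {
  business.changeRoomPrice(guest, 101, 1);
} catch (error) {
  guestError = error.message;
}
console.log(guestError);
```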

In its basic form, a layered architectural system works in a closed circle.

We may, however, need to have access to lower-level layers without the intermediaries - an open layered system.

Example:

Given the premise of the n-tier architecture, we add an extra layer, the services layer. Our project is now a 5-tier system. Despite having this layer, we might not use it often. It does not serve our purpose for every request to pass through it when it changes nothing about the request. All this layer would add to such a request is time, and time is of the essence. We have the layer, yes, but only use it when needed.

We bypass all that and keep this layer open. This allows a request to reach the layer below without having to go through it.
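One way to picture an open layer, as a sketch with hypothetical names (business, services, persistence, and an audit option invented for the example): the business layer calls the services layer only when a request needs what it adds, and otherwise goes straight to the layer below. The trace array exists purely to show which layers a request touched.

```javascript
const trace = [];

// Persistence layer: the lowest layer in this sketch.
const persistence = {
  fetchRooms() {
    trace.push("persistence");
    return [{ number: 101 }, { number: 102 }];
  },
};

// Services layer (marked open): adds extras only some requests need.
const services = {
  open: true,
  fetchRoomsWithAudit() {
    trace.push("services");
    const rooms = persistence.fetchRooms();
    return rooms.map((room) => ({ ...room, audited: true }));
  },
};

// Business layer: goes through the services layer only when it adds value.
const business = {
  listRooms({ audit = false } = {}) {
    trace.push("business");
    if (audit || !services.open) {
      return services.fetchRoomsWithAudit();
    }
    return persistence.fetchRooms(); // bypass the open layer
  },
};

business.listRooms();
const plainPath = trace.splice(0); // empties trace and keeps what was recorded
business.listRooms({ audit: true });
const auditedPath = trace.splice(0);

console.log(plainPath.join(" -> "));
console.log(auditedPath.join(" -> "));
```

Setting services.open to false in this sketch turns the layer back into a closed one: every request then passes through it.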

Note:

Within your project, aim to have at most one open layer. Having more will create interdependencies between these layers, and you lose the whole point of having the layered system in the first place.

Another variation of the n-tier architecture is the use of cached layers. Here, responses may be cached in between layers so that repeated requests do not have to go all the way down. Plainly put, not every request made must hit the database (save the user some time).
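A small sketch of the idea, assuming a hypothetical queryRooms function that stands in for an expensive database query and a Map used as the cache between the layers. There is no expiry or invalidation here, which a real cache would need.

```javascript
let databaseHits = 0;

// Persistence layer (hypothetical): pretend each call is an expensive query.
function queryRooms(date) {
  databaseHits += 1;
  return [{ number: 101, date }, { number: 102, date }];
}

// Cache layer sitting between the business logic and persistence.
const cache = new Map();
function cachedRooms(date) {
  if (!cache.has(date)) {
    cache.set(date, queryRooms(date));
  }
  return cache.get(date);
}

cachedRooms("2022-01-10");
cachedRooms("2022-01-10"); // same key, served from the cache
cachedRooms("2022-01-11"); // different key, goes down to the database
console.log(databaseHits);
```

Three requests came in, but only two reached the database.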

Let's take our app idea from the top

A guest wants to make a reservation. They log onto our page and see that we have, on top of rooms, fun activities and games for our guests (the presentation layer). They go ahead and make a reservation. Our screen gets their request and sends it to our module within the business logic layer. Here, we want all available rooms and, as a bonus, the activities available within the time frame of the booking (notice the filtering by time, price and so forth). Now, call the room and activities data access objects (DAOs) within the persistence layer, our ORM. These will, in turn, execute the necessary queries against the room and activity database tables. Push this data back to the user and let them know they can have sushi on the house!

A little perspective

So, we've given a primer on monolith architectures using the n-tier approach. We have seen a case application that uses the monolith system and used it to get our new client a room. Our system works and our local hotel pays in kind. What happens when they get new outlets? More activities and more traffic? Will it scale?

Monoliths are good, and monoliths are great. How you use them depends on your needs. Use a monolith when you want simplicity in development, deployment and scaling, work in a relatively small team or are creating a prototype. That works 100%.

A monolith will, however, bite you as the system gets larger, for a number of reasons.

Framework/Language lock

The whole codebase is in one framework that might not be suited for future feature additions. For example: we wrote our application in one language that is good for speed but we now need a section of it to use image processing libraries. How suited is it for image processing?

Inefficient resource allocation

If one endpoint needs more time either due to IO operations or processing, other functions may be locked.

Scaling becomes difficult

Say 1000 or so requests have been made to the search functionality. In a monolithic architecture, a new instance of the running application might be created during load balancing to handle the load. Here, a whole new instance of the same application is created, when all we needed was for search to grow.

Slowed development

Take it that we now have a chain of hotels across the town. Heck, across the country. We have added feature 1 and gotten to feature 300. We now want to change a variable name within an application in our project. We have not touched anywhere else but that section. The whole codebase, including the untouched parts, has to be compiled once more. We are still developing, remember that! That's 10 minutes for build and compilation that could have been better spent elsewhere, not to mention an IDE load time that puts us all to sleep.

Conclusion

So far, we have a bird's-eye view of what we have been creating. A capture of what monoliths involve and what they do not - layer upon layer. Great start. Is there more? Stick around a while longer.

Software Engineering Architectural Patterns

Marvin Kweyu — Tue, 04 Jan 2022 15:29:20 GMT

And when you close your eyes, what do you see? Do you see duct tape? Do you see robust infrastructure? Do you see fragile systems? Do you see elegant architecture - software engineering architectural patterns?

Part of development includes a thought process - the software development life cycle (SDLC). A point in time during which we remember that software development is more than just code; it involves more thinking.

The software development life cycle involves a series of steps engineering teams undergo to create, enhance and maintain sustainable software solutions.

As an overview:

Each step outlined above acts as an input to the next and so forth.

We focus, at least in this series, on Architecture, which gives room for development (the part all engineers love).

Take note that each of the steps outlined in the development of sustainable systems (SDLC) may have more detailed steps within it. Do reach out, either in the comment section or directly, if you would like an elaboration of the same.

So what is software architecture and what do architectural patterns involve?

Software architecture is the definition of how components of a software system are organized and assembled and how these components communicate.

The output of the architectural design stage includes, but is not limited to, prototypes, pseudocode, architecture reports and diagrams for technical details. This is a key step. Miss this or mess it up, and development becomes a financial and technical nightmare.

Sidenote:

Understand that design patterns and architectural patterns are not the same.

When determining the software architecture you intend to use, you ask a number of questions: does the system need high performance? How adaptable should it be? How secure? How modular should it be? Do we start with large components that have smaller sections, much like layered systems, or do we start off with small sections that form larger components - similar to the use of microservices?

Herein, and in the shares that follow, we shall discuss the most known patterns. We shall aim to understand:

Monolithic patterns
Service-based patterns
Distributed systems

We shall break these down to know when and where to use layered systems, microservices, service-oriented patterns, space-based architecture, event-driven and microkernel architecture.

Through this, we discuss the advantages of each against its cons: whiteboard software development.

Walk with me through this series, and let us talk about architecture.

The State of Software engineering

Marvin Kweyu — Thu, 11 Nov 2021 12:03:37 GMT

It has been more than a year since I jotted down The Developer's mental Day, and this is but an iteration, a reminiscence, if you may, of the same.

This here is dedicated to Kelvin (who, unfortunately, I did not get to go ziplining with), and all those like him. To the fallen engineers. May they rest in peace, wherever they are.

Why though, do I push for the developer, the human behind the code? Well, for a great number of reasons, quite honestly.

For one, it's the era of information. It is the era of information. Let that sink in.

A period in time where the 'yatza' moment is largely attributed to how well you adapt to new data from the 7 billion people on earth. An era where a month or so away from the internet, as a developer, will have you look like you came off a bandwagon from the past. A timelapse of the universe, like a glitch in the matrix. Woah!

Sidetrack:

[Spills coffee. Frantically tries to wipe it off the keyboard with a cloth]

You: "Oh, great, I just deleted 500 rows from the DB."

[ Beep! Boop! You have a new notification! ]

Message preview: ' To the stake!'

You: [ Breathing heavily. Sweaty palms. ] Oh no!

Spare a minute with me before you open that Slack notification and let us talk about you.

The importance of conversation

How are you? Like really?

Have you had your breakfast? What about a walk in the park? I'm not talking about the cold pizza you had leftover or the energy drink you keep staring at. I'm talking about real food. Have you worked on any part of your body other than your fingers' typing skills? Is your back okay? Stop slouching!

Move fast and break things. Like privacy and trust.

No.

Move fast. Break things. Break a lot of things. Go forth and break! Exclude privacy and trust from this list. Definitely exclude yourself.

Take a beat and reflect on the journey you have had so far. Understand that this is a marathon and not a sprint. Check up on your colleague who keeps staring into the empty space. They might be calling out. They might need your help.

Spare a moment and talk about anything other than code. Talk about the fact that you're alive and get to breathe. You can actually breathe.

Share with me; about the moment you realized your belt buckle was worn out.

Mundane?

No. Not really.

Let's talk about those migraines you get whenever you miss your morning coffee. How cranky do you feel?

"I've been taking three cups of coffee every day for the last 2 years. I think I'd know if I was addicted." - a random developer.

Indulge me on the struggle you have with substance abuse. I'll listen, Kelvin. I will listen.

Take deep breaths. Get out of the chair and touch your toes (I honestly still can't do this).

Break from cancel culture and the bid to keep up all the time, every minute of your day. Do not gain it at the cost of you. You are more important. You are the driving force for it all. I dare you not to code when on holiday.

The keyword in this narrative, is conversation; to talk (with a skipping rope in tow, have to keep blood flowing).

So here we go, an open invitation to talk about our well-being as developers. Be gone and away from the stack of algorithms and endless lines of code, and join us. Let us talk about our health. Let us chime in on our human interaction. Let us be human. Be a superstar at your health.

To any developer out there, the gate, Twitter or mail, is open.

To a friend I should have had a closer look at, to those who might need context and those who might have a friend in need.

Kelvin was an engineer in his own right struggling with substance abuse and depression, unknown to some or most. With his loss, I have truly come to understand that depression isn't dark rooms and endless crying. Sometimes, it's getting up, going to work, smiling and getting home to feel empty inside.

And to dearest El, for keeping me sane at times that were dark. We all need a helping hand.

For this is war, and win we must.

Win.

To Opensource and Beyond.

Marvin Kweyu — Wed, 13 Oct 2021 19:07:32 GMT

It is yet another October, or better yet, another period and timeframe set aside for opensource contribution.

We have had an incredible year here at TheGreenCodes, and with all this, we still come back to some of the projects that might spark interest and offer some learning along the way. We mention a few whose source code is out for the naked eye to peruse, learn from and contribute to (Find a bug? Make a pull request).

TheAssessMe Project

https://github.com/MarvinKweyu/AssessMe

Built as a quiz application, The AssessMe Project has two kinds of users: a teacher and a student. The instructor, in our case the teacher, creates multiple-choice questions, to which the students enrolled in a certain topic can give their answers and get their results right away. No wait times anymore! The instructor then gets the results in a neatly formatted CSV file, together with the average score. I'm sure we can build more off of this. Keep an eye on the project. There's more to come here.

So what seems to be the issue? Simple. We intend to split the settings into those for production and development. That's about it!

We, as well, want to show a timer on our student portal. How else would users know the time left on the quiz?

FeedCreator

https://github.com/MarvinKweyu/FeedCreator

Created as a Django SSR, FeedCreator is a blog application serving to publish blog articles of any kind, with tags, an RSS feed, comments and article filtering. A great entry point for Django, eyy? We all know that tutorial all too well.

Here, however, we aim to improve upon it and build it together. To work through the changes we deem fit for an application of its kind.

We currently have two issues attached to the project.

ColorDetect

ColorDetect repo

We know this one all too well by now. So we shall just give an overview of the tasks at hand.

We want to not only get the colours present in a video, but also get the colours present at a specific time cap in the file parsed (Sneaky one here).
We also still need tests! Have I mentioned it? Tests. Any test really. Find a feature you think has not been tested well enough or could be done better and make a task of that.

Tickle that curiosity of yours and see what you come up with as you work through the issues or walk through the code you find. Ask any questions you may have in the project pages and share what you discover. The community, you and I included, is online.

Software Engineering Principles To Live By

Marvin Kweyu — Tue, 15 Jun 2021 17:06:23 GMT

Back again, with another off-the-top shelf; Software principles to recite before bed - literally.

Introduction

Throughout the software engineering lifecycle, practices have come and gone. Those that have stuck and those that have withstood the test of time - the building blocks of what we have today. They've pushed teams and developers alike to better codebases and practices. Here, we discuss a few of the most notable.

YAGNI (You Ain't Gonna Need It)

An echo off the last article, where we discussed the endless stream of libraries and our constant want to know them all. Are you really going to use it? Another dead end, eyy? The libraries you are actually going to use are those that you stumble upon while looking for a specific need and problem to solve, not those that you blindly go online and scroll for.

SOLID

Not a state of matter. Coined by Robert C. Martin, SOLID is a software engineering acronym, broken down to:

S - Single-responsibility Principle

O - Open-closed Principle

L - Liskov Substitution Principle

I - Interface Segregation Principle

D - Dependency Inversion Principle

The goal of this is to make whatever project you, as the engineer, are working on maintainable and extendable. Let's elaborate.

Single-responsibility principle

A class or module should have one and only one function.

For instance, we have a class that renders a GUI application, say a PyQt program. Lots of things happen within it. Our program, in this case, plays our favourite music streams from source ABC, a video player of sorts. What about creating just one class for this? Seems simple enough, right? Well, no.

Beyond that corner of 'the program worked!' lies a wet mop willing and ready to slap you in the face.

As with any other program, we expect downtime, a 'stream not found' error, and additional functionality added to make it stand out. You get the gist. So break it down. Our single class cannot hold all this at once. Make modules for error logging, modules specific to the video player, a class specific to listing our feed and another to show history. If it can be small, make it smaller. Just do not overdo it and defeat the whole point.
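A rough sketch of that breakdown, written in JavaScript instead of PyQt and with hypothetical class names (ErrorLog, PlayHistory, Feed, Player). Each class has one job, and the player only coordinates the others.

```javascript
// Each class has one reason to change (all names are hypothetical).
class ErrorLog {
  constructor() {
    this.entries = [];
  }
  record(message) {
    this.entries.push(message);
  }
}

class PlayHistory {
  constructor() {
    this.played = [];
  }
  add(title) {
    this.played.push(title);
  }
}

class Feed {
  constructor(streams) {
    this.streams = streams;
  }
  find(title) {
    return this.streams.find((stream) => stream.title === title) || null;
  }
}

// The player coordinates the others but only owns playback.
class Player {
  constructor(feed, playHistory, errorLog) {
    this.feed = feed;
    this.playHistory = playHistory;
    this.errorLog = errorLog;
  }
  play(title) {
    const stream = this.feed.find(title);
    if (!stream) {
      this.errorLog.record(`stream not found: ${title}`);
      return false;
    }
    this.playHistory.add(stream.title);
    return true;
  }
}

const errorLog = new ErrorLog();
const playHistory = new PlayHistory();
const player = new Player(new Feed([{ title: "Morning mix" }]), playHistory, errorLog);

const played = player.play("Morning mix");
const missing = player.play("Missing show");
console.log(played, missing, errorLog.entries);
```

Changing how errors are stored now means touching ErrorLog alone; the player and the feed stay as they are.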

Open-closed Principle

Objects or entities should be open for extension but closed for modification.

A function, class or module can be extended, but not modified, by an external entity. Within this function, method or module, include only what is mandatory for it to do its job, and none of the optional methods that would limit the flexibility of its implementations.

"Good Architecture maximizes the number of decisions not made" - Robert C. Martin

All notification services show a, well, notification, but do all of them show an error notification? Break it off. Utilize polymorphism. All my error service needs to do is pass ABC to XYZ. The rest should be none of its concern. Pass it relevant info and let the rest be handled. This same principle would be extended for the success or information notification services.
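A minimal sketch of the notification idea using polymorphism. The class names (Notifier, ErrorNotifier, SuccessNotifier) and the broadcast helper are assumptions made for illustration. The base class is left alone, and each new kind of notification extends it.

```javascript
// Base class: closed for modification, open for extension via format().
class Notifier {
  format(message) {
    return message;
  }
  notify(message) {
    return this.format(message);
  }
}

class ErrorNotifier extends Notifier {
  format(message) {
    return `[error] ${message}`;
  }
}

class SuccessNotifier extends Notifier {
  format(message) {
    return `[success] ${message}`;
  }
}

// Callers depend only on notify(); new notifier types need no changes here.
function broadcast(notifiers, message) {
  return notifiers.map((notifier) => notifier.notify(message));
}

const results = broadcast([new ErrorNotifier(), new SuccessNotifier()], "disk check");
console.log(results);
```

Adding an information notifier would mean one more subclass, with broadcast and Notifier left untouched.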

Liskov Substitution Principle

Derived classes must be substitutable for their base classes.

Objects in a program should be replaceable with instances of their subtypes, without changing the correctness of the program.

Let's simplify this: if S is a subtype of T, then objects of type T can be replaced with objects of type S.

Bear with me: a school application.

We have a class Staff that houses all the staff of an institution. As well, we have two subclasses, that is, Instructor and Support staff, both of whom are still staff members.

As usual, we have our assumptions as to what properties and methods belong to what object. Following the same principle, we state the below:

Do not enforce stricter rules in the subclass

Using the example, we break the principle by stating, as a property, that class Instructor must not only be an instructor but also be an instructor of a specific institution. Pause a little and reason this out.

We are modifying the parent class from within the child. This is similar to saying a dog breed class, say Greyhound, modifies a parent class, Dog, to have the color red. We break everything else that depends on that single class Dog. We have, essentially, stated that all dogs are red, even though it is just this one breed.
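Sketched in Python, with the method badge and the helper print_badges invented for illustration, a substitutable hierarchy for the school example could look like this:

```python
class Staff:
    def __init__(self, name):
        self.name = name

    def badge(self):
        return f"Staff: {self.name}"


class Instructor(Staff):
    def __init__(self, name, subject):
        super().__init__(name)
        self.subject = subject

    def badge(self):
        # Same contract as Staff.badge(): no extra arguments, returns a string.
        return f"Instructor: {self.name} ({self.subject})"


class SupportStaff(Staff):
    def badge(self):
        return f"Support: {self.name}"


def print_badges(staff_members):
    # Written against Staff only; any subtype must work here unchanged.
    return [member.badge() for member in staff_members]
```

An Instructor.badge() that raised an error unless the instructor belonged to one particular institution would be the stricter rule warned against above: print_badges would suddenly break for a list it used to handle.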

Interface Segregation Principle

Make fine-grained interfaces that are client-specific.

We have a program, a class, Library, that helps manage our books. Creating an instance of this class is the same as interfacing with it.

Take, as another example, an e-commerce platform. We lean towards microservices at this point. We have a database that houses our stall, an admin panel for the store owner and the user navigation section, where our buyers get to see and hopefully purchase our products. As microservices, the top level of this becomes:

Admin panel microservice
Client microservice (we assume our buyers are the users)

Why should the buyer interface with the admin panel when they are not using it in the first place? Why would all that code be with them at that specific instance?
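Inside a single codebase, the same question can be sketched with two narrow interfaces. This is an illustration only: Storefront, AdminPanel, Store and buyer_session are hypothetical names, and typing.Protocol is just one way to express client-specific interfaces in Python.

```python
from typing import Protocol


class Storefront(Protocol):
    """What a buyer needs: browse and purchase, nothing else."""

    def list_products(self) -> list: ...
    def purchase(self, product: str) -> str: ...


class AdminPanel(Protocol):
    """What the store owner needs: manage stock."""

    def add_product(self, product: str) -> None: ...
    def remove_product(self, product: str) -> None: ...


class Store:
    """One concrete store can satisfy both narrow interfaces."""

    def __init__(self):
        self._products = []

    def list_products(self) -> list:
        return list(self._products)

    def purchase(self, product: str) -> str:
        if product not in self._products:
            raise LookupError(product)
        return f"Purchased {product}"

    def add_product(self, product: str) -> None:
        self._products.append(product)

    def remove_product(self, product: str) -> None:
        self._products.remove(product)


def buyer_session(shop: Storefront) -> str:
    # The buyer's code is typed against Storefront only.
    return shop.purchase(shop.list_products()[0])
```

The buyer-facing function never sees add_product or remove_product, so a change to the admin side cannot ripple into it.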

Dependency Inversion Principle

Depend upon abstractions, [not] concretions

The last of the SOLID principles, which, if you have been following those before, should fall right into place. A higher-level class should always depend upon the abstraction of a class rather than its details. A good example of this is the implementation of abstract classes in Django models: the AbstractUser model.

In our application models, we would, upon migration, have tables for the classes that inherited from it, but no table within the database to represent the abstract model itself, because it's, well, abstract. You may note how:

High-level modules do not depend on low-level modules. Both should depend on the abstraction.

Abstractions should not depend on details. Details should depend on abstractions.

A high-level module in any program is one that depends on others. We specify, I repeat, abstraction; an interface upon which we build.
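As a rough sketch of both statements, assume a hypothetical MessageSender abstraction (this is not Django code, and every name here is made up). The high-level SignupService and the low-level senders each depend on the abstraction, never on each other.

```python
from abc import ABC, abstractmethod


class MessageSender(ABC):
    """The abstraction both sides depend on."""

    @abstractmethod
    def send(self, recipient: str, body: str) -> str:
        ...


class EmailSender(MessageSender):
    """Low-level detail: depends on the abstraction by implementing it."""

    def send(self, recipient: str, body: str) -> str:
        return f"email to {recipient}: {body}"


class SmsSender(MessageSender):
    def send(self, recipient: str, body: str) -> str:
        return f"sms to {recipient}: {body}"


class SignupService:
    """High-level module: knows only about MessageSender."""

    def __init__(self, sender: MessageSender):
        self.sender = sender

    def register(self, user: str) -> str:
        return self.sender.send(user, "Welcome aboard")
```

Swapping SignupService(EmailSender()) for SignupService(SmsSender()) changes the delivery channel without a single edit to SignupService.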

DRY (Don't Repeat Yourself)

Oh, duplicate code, where have you been? You have a piece of code in A that is exactly the same as the one in C. You have a div (a little web development for a while) across multiple pages, repeating the same CSS styles and methods across those pages. Why not make it a component in itself? Then tag it, write one CSS file, and change the data based on props.
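The same idea in a few lines of Python, standing in for a frontend component. The function render_card and its parameters are invented for this sketch; the arguments play the role of props.

```python
def render_card(title, body, css_class="card"):
    """One reusable 'component': the markup lives here and nowhere else."""
    return (
        f'<div class="{css_class}">'
        f"<h2>{title}</h2><p>{body}</p>"
        f"</div>"
    )


# Each page passes its own data (the 'props') instead of copying the markup.
home_page = render_card("Welcome", "Glad you are here")
about_page = render_card("About", "We keep it green", css_class="card card--wide")
```

When the card's markup needs to change, it changes in one place and every page picks it up.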

KISS (Keep It Stupid Simple)

More often than not, engineers find themselves getting lost in algorithms and data flow rather than the value the application is going to bring to the table. We focus more on 'features' as opposed to what the user will actually want our program to do.

Conclusion

So go forth engineer, KISS it, keep it SOLID, keep it DRY and remember to install libraries you actually need. No bloatware! Even on your current device. Do you need that app or is it there for that one time you thought about it?

Yours,

TheGreenCodes

Vue unit testing: Tests must fail

Marvin Kweyu — Mon, 03 May 2021 17:06:33 GMT

Kicking off from A guide to better predictable code, we create our project boilerplate; assuming you have the vue-cli already installed. Right on, our awesome-todo

vue create awesome-todo

We manually select our project setup: Vue 2, with the router, Vuex and unit testing enabled alongside the default selections.

Select Jest as our unit testing solution and store the configuration in your package.json file. Are you ready? Good, let's get on to the next step.

For purposes of this guide, we'll use bootstrap; particularly, bootstrap-vue. Let's shorten those CSS classes.

npm install bootstrap-vue

In your main.js file, add the necessary configurations.

import { BootstrapVue, BootstrapVueIcons } from 'bootstrap-vue'
import 'bootstrap/dist/css/bootstrap.css'
import 'bootstrap-vue/dist/bootstrap-vue.css'

Vue.use(BootstrapVue)
Vue.use(BootstrapVueIcons)

Spot on!

On the home or landing page of our application, we want a welcome message displayed. In our case:

'Welcome to TheGreenCodes awesome to-do list.'

To begin, though, we test. As I said, we think before we implement: we think of the feature we want before actually implementing it. We then write tests for this feature, to avoid the trap where we say we'll write a test afterwards and then never actually get to it.

Create a file home.spec.js under the unit directory in the tests folder.

Notice how we name the file. Our test runner will look for JavaScript files with the spec keyword in their name under this directory. Now copy the code snippet below.

import { shallowMount } from '@vue/test-utils'
import Home from '@/views/Home.vue'

// what component are the tests referring to
describe('Home.vue', () => {
  // what feature or spec are we targeting specifically
  it('Displays a welcome message', () => {
    const welcomeMessage = 'Welcome to TheGreenCodes awesome to-do list'
    const wrapper = shallowMount(Home)
    expect(wrapper.text()).toMatch(welcomeMessage)
  })
})

To run this as well as consecutive tests:

npm run test:unit

Our test fails; rather horribly. Reading the shell, you see:

  expect(received).toMatch(expected)

    Expected substring: "Welcome to TheGreenCodes awesome to-do list"
    Received string:    ""

We gave it a variable with the message to expect in the component Home. What we know for certain, however, is that we haven't even touched that component; hence the failure. Head over to the Home component under views, remove the HelloWorld import and its usage, and add an h2 tag with our welcome message. Re-run this test and see the difference.

Before we get any further, we should explain what the elements in our Home test mean.

As we have made use of comments, we shall describe target areas:

    const wrapper = shallowMount(Home)

We create a variable, wrapper, that holds our component. How do we do this, you ask? We import shallowMount from Vue Test Utils. Just like with the default component life-cycle hooks, our component is initialized, only this time, since we specified we wanted a shallow mount, child components within this parent component are stubbed out rather than rendered.

We then ask the question: 'Hey! Is there any mention of our title within this component?' To which the suite replies with a yes or no depending on what we have. We expect this component to have text; not just any text, but text that matches our welcome message.

Behold! We have laid the foundational building block of a test: tests must fail, tests must pass and the code must be refactored.

We break this statement down:

Tests must fail

Our feature has not been implemented yet, so why on earth would we expect a test to pass?

Tests must pass

Hey Marv, I wrote down that cool little feature. What next? Simple; the test must pass. The test whose feature we just wrote should pass.

Code must be refactored

When the same piece of code is edited later, does the test still pass? Can this component or code be edited and still let the test pass gracefully?

Do we get the 'It broke everything else!' exclamation?

Tests must fail, tests must pass and the code must be refactored.

We could go further with this test and specify what element we wanted to have the title:

    const titleFound = wrapper.find('h2')
    expect(titleFound.text()).toMatch(welcomeMessage)

You get the same result. If you see a warning about UI element registration, in case you are already using such elements within your code, ignore it for now; we'll fix this in a while.

Let's not just make our application, but let's make a great UI. Adjust your code to look as below:

Home component

<script>
export default {
  name: 'Home',
  components: {
  }
}
</script>

<style scoped>
.list-container{
    max-width: 170%;
}
</style>

Refactor the routes: (This would mean another component under views with the name Completed)

  {
    path: '/completed',
    name: 'Completed',
    component: () => import(/* webpackChunkName: "about" */ '../views/Completed.vue')
  }

Our application entry component would also have a link to the completed section, either as a replacement for the about page or as an add-on:

<router-link to="/completed">Completed</router-link>

In the end, we should have something similar to this:

We have a basic layout up, and a first taste of what testing involves. To dive or not to dive is the question I pose to you.

Hush for a moment and let it settle. We will proceed with our application in an upcoming article.

Be sure to check the code, if need be, from TheGreenCodes repo. Specifically, the project tag awesome-todo v0.1.0.

Stick around for a while as we delve more into the internals; and yes, we can continue this conversation on tech twitter, where Potato, oh, my bad, Larry and I, Marvin, talk everything between code and smelling flowers over the weekends.

Peace out.

A small laugh, in case Lewis finds himself here again:

'Kuingia kimahuru'

Unit testing in Vue

Marvin Kweyu — Tue, 27 Apr 2021 14:37:07 GMT

Hold up! I got a confession to make.

I pushed my code to the master branch without tests.

Until I stumbled upon TDD, I had never really understood the purpose or relevance of tests. I remember writing a whole project without tests and publishing it. Can you believe it? Was I out of my mind blind?

Thinking about this still gives me heartache. Did the project work? Of course, it did. Until it didn't and I had to spend 3 whole weeks staring into the screen wondering what on earth had gone wrong. I mean, I just added a mini-feature, it should work! Sound familiar?

This could be us ... but we are writing tests now.

We are going to split this up into the fundamental building blocks of all tests, which are: tests must fail, tests must pass, and the code must be refactored. Let's dive in, shall we?

First of all, we have outlined the pain that comes from not testing, but not really the importance of testing in itself. Off the top of the head, we would summarize it in the pointers below.

Giving the developer an understanding of the project requirements from the client's point of view.

In this sense, a developer who writes tests following the principles outlined earlier in this piece is made to think in terms of the project requirements early on. In short: programming is more thinking than coding. Visualizing what you want to achieve before doing it significantly reduces the chances of veering off the project itself. You know what this part of the page should display, where this data comes from, and the expected results should the fetch not happen. You get the project's aerial view and, in the end, write smaller, modular code.

The confidence in shipping without fear of breaking other features of your code.

Small project? Sure, fine. Can this scale? How certain are you? What about that last little piece the project manager requested? Will it mean my service or call or method will not work? What about when the project grows larger and we change something? Will we go page by page checking whether each part still works and shows the expected message? Do we have this time?

Shorter feedback loop

Shorten the 'x y and x pieces need w z k' feedback loops. When you can tell right from the onset that this piece of code will not work as it is, given how the backend team has refactored the API, you shorten the time it takes for the QA team to notice it, the time it takes to get to you and the time it takes to figure out where in the code you need to fix. You have, at this point, already identified it.

These are some of the reasons we write tests.

So what are some of the ways we can use to get this done? Where do we get started here? Glad you asked.

To kick us off, we highlight the tools and options. PS: We are not installing the internet of node_modules in our project, you can breathe.

Jest

Created by Facebook, Jest is an out-of-the-box package that comes bundled with an assertion library. Plainly, it shows not only that the test failed, but also where it failed; whether variable Y was not equal to X, and so forth. This matters all the more when you find a test comparing, for instance, an arithmetic sum of 5 to test data of 5, failing, only to realize later that you passed the string '5' instead of an integer.

Mocha

As an elder brother to its counterpart, Mocha works just the same way, but with a little more configuration. To be precise, you have to include an assertion library separately. Most commonly used as a partner here would be Chai.

Whichever of the two most used packages you use is up to preference. It all depends on how customizable your tests need to be and or what you are more comfortable with.

The two mentioned are common across JavaScript frameworks. What about the specific framework of choice in our case (Vue.js)? We have libraries that make it much easier to test in Vue.js. Specifically, we are talking about Vue Test Utils, which is the official unit testing utility library for Vue.js, and the Vue Testing Library, an abstraction over the former.

So, first, we decide whether we want to use Mocha + Chai or Jest, then we go ahead with what works for us between Vue Test Utils and the Vue Testing Library, or perhaps both.

To engrave this knowledge on testing, we intend to build a simple web application; a ToDo list. With this, we can track items, check them off as done, see what was done, edit items, delete these items, and so forth. Along the way, we use different approaches of testing and ping what one approach has to offer vs the other. Every step of the way will be guided by a systematic approach, to give a clear outline of the intent beforehand. So pause here, for a while at least, as we get our tools ready for the next section of this series.

ColorDetect: Python Image processing algorithms

Marvin Kweyu — Mon, 29 Mar 2021 09:17:29 GMT

Pssst! You can find us on ProductHunt!

It's been a while since we touched on ColorDetect. In the past, we took an overview of what exactly we could achieve, from getting colors from both images and video to doing so in different formats and counts. We went ahead and described some of the use-cases of such a package, just to mention but a few.

In this piece, we highlight the improvements we have so far made, along with some notable contributors since the piece rolled out. Most notable is Clifford, whose algorithms have helped push from v1.1 ... to v1.4. Hurray Cliff! Let's get down to it.

In case you still haven't done so:

pip install ColorDetect

We'll take it from the new features and enhancements: specifically, text customization and color segmentation (which, I realized, came in at just the right time).

In the use of ColorDetect, we faced two to three challenges.

We now have the ability to get the colors off media files, but we want this text in a customizable format. The ability to have our own font and or styling to this text. This comes in handy especially in instances where we have a dark image. We cannot write black text on this and still maintain readability now, can we?
Hey there, what does an RGB code 5.0, 211.0, 212.0 even mean? Can the algorithm give me a more friendly color to decipher?
Okay great, I can get the percentage colors and dominance off the files I have, but can I tell what this color code or percentage refers to on the image? Can I mark the image to make it visible without guessing?

These, among others, are what we got to tackle. Let's do a quick overview of each.

To start with, in our virtual environment, we create a file custom_styling.py, and write our colors and text differently from the default configuration.

For this, we will use Virginia Lackinger's photo on Unsplash.

from colordetect import ColorDetect

# parse the image.
flowers = ColorDetect('./images/flowers.jpg')
flowers.get_color_count(color_format="human_readable")
flowers.write_color_count(top_margin=40)
flowers.save_image(location='./images', file_name='processed-human-image.jpg')

The output you might ask?

A human-readable color display. We gave some space between the text and the top section of the image, and explicitly specified that we want a human-friendly color format for the output. We may go further and specify the font color to write in by passing font_color=(0,255, 0) (for green), for instance, to the method write_color_count. Along with this come font_size, font_thickness, line_type and left_margin, among other configurable options.

Let's take a look at how we separate our colors from the image. With this, we refer to color segmentation. How do we tell which sections, no matter how small, from the given image, are color XYZ?

We pass in the color ranges we want to grab.

import cv2
from colordetect import ColorDetect

# parse the image.
flowers = ColorDetect('./images/flowers.jpg')

# provide a lower and upper range for our target colors
monochromatic, gray, segmented, mask = flowers.get_segmented_image(lower_bound=(20,50,50), upper_bound=(40,255,255))

cv2.imshow('Segmented', segmented)
cv2.imshow('monochromatic', monochromatic)
cv2.imshow('mask image', mask)
cv2.imshow('grey image', gray)
cv2.waitKey(0)

You get four results. For brevity, we show three, assuming you know what gray looks like in an image; more of black and white.

Segmented image

masked image

monochromatic

Just highlight the color I need. Turn the rest of the image into black and white.

Yatza! It works!

As of now, we have color codes of the images, in the format we want, and a return of images with our target color-highlighted. The same could be applied to videos, without the segmentation and one or two features, for now.
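For a feel of what picking out a color range means, here is a pure-Python sketch of range masking on a made-up row of pixels. It is not ColorDetect's implementation and makes no claim about the color space the library uses; in_range and segment are hypothetical helpers, and the bounds are simply the ones from the example earlier.

```python
def in_range(pixel, lower_bound, upper_bound):
    """True when every channel sits within its lower and upper bound."""
    return all(low <= value <= high
               for value, low, high in zip(pixel, lower_bound, upper_bound))


def segment(pixels, lower_bound, upper_bound, background=(0, 0, 0)):
    """Return (mask, segmented): the mask is 255 or 0 per pixel, and the
    segmented image keeps matching pixels and blanks out the rest."""
    mask = [255 if in_range(p, lower_bound, upper_bound) else 0 for p in pixels]
    segmented = [p if keep else background for p, keep in zip(pixels, mask)]
    return mask, segmented


# A made-up row of three pixels, checked against the bounds used earlier.
row = [(30, 200, 200), (90, 200, 200), (25, 40, 255)]
mask, segmented = segment(row, (20, 50, 50), (40, 255, 255))
```

Only the first pixel has all three channels inside the bounds, so it is the only one kept; the other two are replaced by the background color.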

A call to action

We, however, know that much more could be achieved; and that is why we have a call to action. What features would you want to be implemented? What bug did you find?

For one, we need more tests. Do you feel an aspect of the codebase has not undergone sufficient testing? Let it be known. Get the ColorDetect repo, take a look at the contribution guidelines and make a pull request.

We can leave it here for a while. Do feel free to go through the documentation as we update it with one or two features down the road.

As for this article's code, we have it on TheGreenCodes' page.

The Full-stack dilemma

Marvin Kweyu — Sun, 28 Feb 2021 13:26:40 GMT

We are back here one more time. Talking about, among other things, full-stack software engineering and tech stacks. An elaboration into what technologies we need to know and an overview as to why we indeed call this a dilemma; a crash waiting to happen.

Stacks: The talk of the town

Now, a stack is a set of tools and technologies used to build an application; what we combine in various proportions for a product. Well known in the software engineering environment are:

MERN (MongoDB, Express, React and Node)
MEVN (MongoDB, Express, Vue and Node)
MEAN (MongoDB, Express, Angular and Node)
LAMP (Linux, Apache, MySQL, Perl/PHP/Python)
Serverless stack

These may be shifted accordingly to suit what works best. For instance, using PostgreSQL instead, or using a frontend framework combined with, say, a non-JavaScript-based language or framework for your backend. As well, there may be more tools depending on how strongly your organization or enterprise works with, say, Java (Spring) or the .NET family. Heck, you might even have containerization (what would you possibly be doing without a container or two for your apps right now?).

Now, back to the core business: 'I am a full-stack developer'

What this term has been warped to mean, is this:

"I can develop an application in the backend with any tool, or technology out there, create stunning designs and user experiences, implement these interfaces both on the web and mobile, create cross-platform desktop applications, run machine learning models and neural networks, image processing, and artificial intelligence ...."

I'll stop here before I find myself describing the internet.

While a developer might master one or two tools in combination, and this is highly encouraged, the grounded developer has more chances of sustainability. One might master a set of tools associated with say e-commerce development or a particular field. By all means, buckle them up, but have a reason as to why you have them under your belt.

We can learn absolutely anything we want, but we cannot learn everything.

Are you Google?

Consider the masters of old, I love classical music, my favourite, Wolfgang Amadeus, composing masterpieces in the classical realm. So full-stack, would assume, afro beats, pop culture, rock, folk, jazz, and so on, just to mention but a few. Spreading over the place like butter. Would he have truly mastered composition in all these genres and still retained the accord?

So yes, you say you are full-stack; but can you design and build web interfaces and pleasant experiences that will make the user shudder with awe and still work with optimal database queries and or Kubernetes while analyzing user interaction with the site and build statistical models around this at once all with standard top-notch level performance in each with deadlines?

So where does data science fit in all of this? Is it a tech stack?

Nope. Not at all. Don't confuse it with the mix and match we talked about. This is a totally different career, separate from web development. In the same vein, so are web development and web design: they are in the same area but refer to different things.

Meet a developer, ask the right questions.

Are you a web developer? What is your tech stack? What do you have experience with?

Often we love defining ourselves as full-stack developers because we assume the other alternative would be a half-stack developer, and no one wants those, because it gives a half-baked rhyme. Use a tech stack, be specific with the tools you are familiar with in your description. In one minute, you'll make a really great API. An incredible masterpiece. Then somewhere along the line, start creating game engines. All learning is good, all learning is great, but it is not as focused.

Photo by Immo Wegmann on Unsplash

When looking at your description, one would wonder,

'What was he/she trying to achieve? Were they just reading stuff?'

Get the gist?

Conclusion

Parting shot dear subscriber, choose your weapon or weapons, choose them wisely, choose them with purpose. So what say you? Do you choose a tech stack (s) or do you want to be a full-stack engineer?

Preparing Django for deployment

Marvin Kweyu — Sun, 21 Feb 2021 14:33:26 GMT

Deployment day, and it's Django. Where to start? We've got a server we can SSH into, and we've got our code and endpoints, if any, ready to be consumed.

Right here, we are going to look into the best practices in terms of developing our backend applications, or SSRs (if we were using Django templating). Through the process, we elaborate on the whys of having a production-ready application as soon as we create our project as opposed to waiting on the completion of the whole development process.

Why do it now?

For one, a good product is never really finished. There is always that one UI change, that one bug or logic discrepancy, or better yet, user feedback prompting a feature addition; our application must be getting some traction. Look at it this way: would we wait till we got all our user feedback before optimizing our application for production? What about how we serve our media files? Would we still be serving them directly all through? Would we have passwords to our databases hardcoded? What if I add a feature or adjust a UI on my local setup? Would it break the main production app right off the bat?

A pointer we are getting to is this: launch your application and make small incremental changes as you go. There is no 'Tadaaa! It works and will never break or be changed!' moment. If you were simply coding as though it was on your machine all through and waiting for the 'perfect' opportunity, you have to STOP. Stop it. Get your application and pipelines up as soon as you start the project. Be production-ready with 'hello world'.

Enough talk. Let us have something we can work with here. We create a virtual environment, install our dependencies, and have our boilerplate project.

GreenCodes  mkdir launch-ready
GreenCodes  cd launch-ready
launch-ready  python -m venv .venv
launch-ready  source .venv/bin/activate
(.venv) launch-ready

We've made the directory called launch-ready, our intended application name. We've also created and activated the virtual environment. Go ahead and install Django and create the project while at it.

pip install django
django-admin startproject LaunchReady .

Our directory should now look like this:

(.venv) launch-ready  ls
LaunchReady  manage.py
(.venv) launch-ready  ls LaunchReady
asgi.py  __init__.py  settings.py  urls.py  wsgi.py
(.venv) launch-ready

Running python manage.py runserver will give the default launch page for Django.

Our application is up and running. To prepare for production, however, we have a couple of things to configure.

Create production and development settings.

The LaunchReady directory has one setting file. Within this directory, we create yet another directory named settings and move our settings file into it, renaming it in the process to base.py.

mkdir settings
mv -i settings.py settings/base.py

Create two additional files within the settings directory and name them dev.py - to contain development settings - and prod.py - to have production settings. Our directory ends up looking as below:

(.venv) settings  ls
base.py  dev.py  prod.py

Locally, on our development machine, we will opt for our local installation of MySQL. Go ahead and create a database and user then head over back for the next step.

As with best practice in terms of version control, never store your passwords in code. Opt at all times, for the environment variables. To manage this, we create a .env file that will never make it past our local machine. Just to be sure, we add it to the .gitignore file.

echo .env >> .gitignore

We then go ahead and install python-dotenv (to read the variables from the .env file) and mysqlclient (a wrapper to interact with MySQL from Django).

pip install python-dotenv mysqlclient

Our development settings will look as below.

dev.py

import os

from dotenv import load_dotenv

from LaunchReady.settings.base import *

load_dotenv()

# since it's running on my machine, show me the errors
DEBUG = True

SECRET_KEY = os.getenv("SECRET_KEY")

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": os.getenv("DATABASE_NAME"),
        "USER": os.getenv("DATABASE_USER"),
        "PASSWORD": os.getenv("DATABASE_PASSWORD"),
        "HOST": os.getenv("DATABASE_HOST")
    }
}

# show mail messages on the terminal
EMAIL_BACKEND = "django.core.mail.backends.console.EmailBackend"

# run on every host.
ALLOWED_HOSTS = ["*"]

So we have our environment variables being read from within the development settings, but where do we set these items? Simple. We create a .env file next to our dev settings.

(.venv) settings  ls  -a
base.py  dev.py  prod.py .env

The file content?

SECRET_KEY = 'django-generated-secret-key'
DATABASE_NAME=database_name
DATABASE_USER=database_user
DATABASE_PASSWORD=database_password
DATABASE_HOST=localhost

Note: We can have multiple ways with the database and secret key configuration. It is not linear.

For our production application, we take the example off of Heroku. This works the same as with any other server we might get, with one or two tweaks.

We install dj-database-url, a Python package that builds Django's database settings from a single DATABASE_URL environment variable, which is handy when the database lives on a separate platform from our main application.

pip install dj-database-url

This, we use in our production-ready settings as below:

prod.py

import os

import dj_database_url

from LaunchReady.settings.base import *

ADMINS = (("Developer name", "Developer email"),)

# always set this to false in production
DEBUG = False

SECRET_KEY = os.environ["SECRET_KEY"]

ALLOWED_HOSTS = ["launch-ready-domain.com", "server-ip-address"]

DATABASES = {}
DATABASES["default"] = dj_database_url.config(conn_max_age=600, ssl_require=True)

# ToDo: get an email host provider
EMAIL_BACKEND = "django.core.mail.backends.smtp.EmailBackend"
EMAIL_HOST = "smtp.gmail.com"
EMAIL_HOST_USER = os.environ["EMAIL_HOST_USER"]
EMAIL_HOST_PASSWORD = os.environ["EMAIL_PASSWORD"]
EMAIL_PORT = 587
EMAIL_USE_TLS = True

As our DEBUG is set to False, we no longer get the detailed error page; instead, we want to get a notification via mail whenever one of our views raises an exception. For this, we set the developer(s) responsible.

ADMINS = (("Developer name", "Developer email"),)

With our Linux server up, we create persistent environment variables, this time read from the system configuration files, setting them in either /etc/environment (for system-wide environment configuration) or ~/.bashrc (for per-user configuration, in case we have multiple users).

/etc/environment

SECRET_KEY=''
DATABASE_URL=''
# this is common with heroku and can be set via the console or dashboard interface
DJANGO_SETTINGS_MODULE=''
EMAIL_HOST_USER=''
EMAIL_PASSWORD=''
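One detail worth seeing in isolation: dev.py reads with os.getenv(...), while prod.py reads with os.environ[...]. A short sketch of the difference, using a made-up variable name GREENCODES_DEMO_KEY and two hypothetical helper functions:

```python
import os


def optional_setting(name, default=None):
    """dev.py style: os.getenv returns the default when the variable is unset."""
    return os.getenv(name, default)


def required_setting(name):
    """prod.py style: os.environ[...] raises KeyError when the variable is unset."""
    return os.environ[name]


# GREENCODES_DEMO_KEY is a made-up variable name used only for this sketch.
os.environ.pop("GREENCODES_DEMO_KEY", None)
missing_in_dev = optional_setting("GREENCODES_DEMO_KEY")

try:
    required_setting("GREENCODES_DEMO_KEY")
    failed_fast = False
except KeyError:
    failed_fast = True
```

In development, a forgotten variable quietly becomes None; in production, the settings module refuses to load, which is usually what we want when a secret is missing.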

The only major difference from the previous development setting would be how we read our database configuration.

DATABASES = {}
DATABASES["default"] = dj_database_url.config(conn_max_age=600, ssl_require=True)

This would be how we configure our application to read from an external database from our application server location.
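To see what a database URL carries, here is a rough standard-library sketch that splits one into the keys a Django DATABASES entry uses. It is an illustration only, not how dj-database-url is implemented; the function name and the example URL are made up.

```python
from urllib.parse import urlparse


def database_settings_from_url(url):
    """Split a database URL into the keys Django's DATABASES entry expects.

    Illustrative only: dj-database-url handles engines, options and edge
    cases that this sketch ignores.
    """
    parts = urlparse(url)
    return {
        "NAME": parts.path.lstrip("/"),
        "USER": parts.username,
        "PASSWORD": parts.password,
        "HOST": parts.hostname,
        "PORT": parts.port,
    }


# A made-up URL in the scheme://user:password@host:port/name shape.
settings = database_settings_from_url(
    "postgres://launch_user:s3cret@db.example.com:5432/launch_ready"
)
```

One environment variable ends up carrying everything the five separate DATABASE_* variables carried in development.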

Let's try this run. Go to your shell and run your development server. Remember, we nested and split our settings. So to run anything while in development, we have to remind Django which settings we want to use.

python manage.py runserver --settings=LaunchReady.settings.dev

Our application default template is back up. For production, we would have it run as below.

python manage.py runserver 0.0.0.0:8000 --settings=LaunchReady.settings.prod

Launch your browser and head to http://launch-ready-domain.com:8000. You should get the very same page as with your development environment.

What next?

Of course, there's more in terms of deployment server settings and configuration, especially in terms of serving static files. This, however, can be left for the next hour. We wrap up this rather lengthy piece here as we fuel up, and take it further in the next one. And yes, the code can be accessed right off TheGreenCodes repository.

Till next time,

TheGreenCodes

Keeping it green.

Marvin Kweyu — Sat, 30 Jan 2021 15:37:40 GMT

Hurrah and a wonderful time to be back from where we left off! What a break it is we've taken. Having taken the time to plug and play - to recharge and reconnect with those around us. Well worth it! I hope the same has applied to you too! To the subscriptions that have been knocking at the doorstep, yes, you can open your mail more frequently now.

Looking back, we've had quite a ride. From the getting started as a junior guide, where we talked about what projects we could possibly do to keep the concepts locked in, to the 'not so ugly terminal after all'. The possibilities for the miniature tools could be endless if scaled correctly.

So what about this year here at The GreenCodes?

Well, we intend to up our game a little. Scale it and see just how far we can get. Along the way, sure hope we can build tools, helpful tools, contribute to more opensource projects (pssst ... the ColorDetect package has really come along since we talked about it) and interact more around our mental health and well-being as developers, which I know is a cornerstone for us all. Where would we be without our health?

During this daunting journey, at the core, we hope to grow, not only in our ability to code efficiently but also in our BDD, the behavioural driven development. Relax for once and remember the ugly code review you are about to write was written by a fellow human being. So be kind and help grow. Be strict only where necessary. Oops! I pushed to master without tests! My bad!

From scripts (who doesn't love automation?) to applications, from A to C or K to M, let's have it. Planting a seed of patience and working through the challenges. Our every once in a while code challenge or shared project. Looking at Django or the everlasting javascript (or is it typescript?), or even back to the root of it all, the mother of algorithms. Oh, hey David! Glad you're on Linux now. Cannot wait for our collaborations this year!

All in all. One thing certainly stands out. We'll have an exciting ride keeping it green.

Non-primitive non-linear data structures: Trees

Marvin Kweyu — Tue, 15 Dec 2020 11:53:09 GMT

When it comes to data storage using linear structures, be it linked lists and arrays or even stacks and queues, we have to account for time complexity as the size of our data increases. That is, the larger the data being stored, the more time it takes to extract this data or manipulate it.

'Ohh well great! You've shown me how to store data only to give me the cons afterward?! What on earth!'

Hold on for one moment, will you! That's why this piece is here. The non-primitive non-linear data structures.

As with family lines, from grandpa to great grandpa and yourself, so is the same applied to data structures.

A little theory before we get into the why. We'll start off with the parent; the root. It's from here that everything emanates. Just like a Linux directory structure. The / directory.

Our tree root has two 'children', A and B, defined by engineers as nodes. From here, we see B with its own pair of children, 'C' and 'D' of whom C proceeds to get a single child node 'E'. We might as well call C a parent at this point.

The little lines marking the links between parent and child are named edges. Following a specific path down to the final child, for instance through to E, brings us to a leaf node or external node: the last child in this specific path, with no children of its own. The same applies to A and D.

Depth of nodes

The number of edges from the root to the node. In our case, E would have a depth of 3.

Height of a node

How far is the furthest leaf below it? The height of a node is the number of edges on the longest path from that node down to a leaf. B, for instance, will have a height of 2, given there are two edges before we get to the furthest descendant in its path (B to C, then C to E).

Height of Tree

The height of a tree's root node.

Degree of Node

The number of children of a node. Above, C has a degree of one while B has that of two.
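The definitions above can be sketched in a few lines of Python. This is a minimal illustration on the example tree (root with children A and B, B with children C and D, C with child E); the Node class and its method names are made up for this sketch.

```python
class Node:
    """A tree node that knows its parent and its children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def depth(self):
        """Number of edges from the root down to this node."""
        return 0 if self.parent is None else 1 + self.parent.depth()

    def height(self):
        """Number of edges on the longest path from this node to a leaf."""
        if not self.children:
            return 0
        return 1 + max(child.height() for child in self.children)

    def degree(self):
        """Number of children of this node."""
        return len(self.children)


# The tree described above: root -> A, B; B -> C, D; C -> E.
root = Node("root")
a = Node("A", root)
b = Node("B", root)
c = Node("C", b)
d = Node("D", b)
e = Node("E", c)

print(e.depth())      # 3
print(b.height())     # 2
print(root.height())  # 3, the height of the tree
print(c.degree(), b.degree())  # 1 2
```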

A practical train of thought. We have data to be queried from an API REST endpoint. Our system has users, profiles, and articles. Thinking logically, a user profile is a child node of the user's endpoint. We will have to get a user to log in, get the token, and query the profile based on the token. Something like this:

So in your thought process, seeing as it is, software development is more of thinking than writing code, it would make sense to create a user model before a profile. Our User model is the parent, the root.

Compare this to having a linear data structure.

We could go on to break trees into more individual types:

a) General Tree

b) Binary Tree

c) Binary Search Tree

d) AVL Tree

Binary Tree

A tree, as displayed above, in which each parent has at most two children. Due to this, the children are referred to as the left and right child. Think of how we might have this in decision trees, where we perform an action based on a condition. To elaborate further, we have the below groups.

Full Binary tree

Every node of the tree has two or zero children.

Perfect Binary Tree

All of the tree's internal nodes have two children, and all leaves have the same depth. Perfectly uniform.

Balanced Tree

If the heights of the left and right subtrees at any node differ by at most 1, then the tree is called a balanced tree. So in the case of the perfect binary tree, if one of the last child nodes had one more child, one and only one, the tree would no longer be perfect but would still be balanced. It's a strange way of calling something balanced, I know!

General Tree

Consider a binary tree. Any type of binary tree, but without limitations on the number of children a node can have. Do you want 4? Do you want 17? Go ahead and fill your database or whatever it is you're structuring.

Binary Search Tree

A different way of looking at binary trees, if you may. Here, the left nodes are always smaller than their parents in terms of the value they possess. The right nodes, however, are a little cocky, always having more value than their parents or being equal.

'Well, that's not right. I've never seen this man in my life!'

Used an ordered map before, say std::map in C++ or TreeMap in Java? Yeap. Balanced binary search trees, you deserter! (Python dictionaries, for the record, are hash tables.)

Searching for a specific item in an array of 100 or 200 might not be a problem. Increase this to 10 000 or 1 000 000 items, however, and no one wants to be the user.
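A minimal sketch of the idea, assuming plain integers as values: every comparison discards one whole subtree, so a lookup takes about as many steps as the tree is tall. The names BSTNode, insert and contains are made up for this example, and nothing here rebalances the tree, so the speed-up only holds while the tree stays reasonably balanced.

```python
class BSTNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


def insert(node, value):
    """Insert value below node and return the (possibly new) subtree root."""
    if node is None:
        return BSTNode(value)
    if value < node.value:
        node.left = insert(node.left, value)
    else:
        # Equal values go right, matching the description above.
        node.right = insert(node.right, value)
    return node


def contains(node, value):
    """Walk down the tree, discarding one subtree at every step."""
    while node is not None:
        if value == node.value:
            return True
        node = node.left if value < node.value else node.right
    return False


tree = None
for number in [50, 30, 70, 20, 40, 60, 80]:
    tree = insert(tree, number)

print(contains(tree, 60))  # True
print(contains(tree, 65))  # False
```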

AVL Tree

Named after its inventors, Adelson-Velsky and Landis, this is a self-balancing binary search tree in which each node has extra information, the balance factor, whose value is either -1, 0, or 1.

You remember heights (see 'Height of a node' above), right? Well, the balance factor comes in as below:

BalanceFactor = height_of_left_subtree - height_of_right_subtree

Whenever a tree is out of balance, AVL trees work around this by rotating nodes: a left rotation, a right rotation, a left-right rotation or perhaps a right-left rotation.

What this means is that, in a left rotation for instance, the right node which seemed heavy compared to the left will become the parent, the old parent becomes its left child, and the left subtree it previously had becomes the old parent's new right child, and so on. Think of it in terms of the hands of a clock. The tree will rotate itself in either direction until it is balanced. More detail on this can be elaborated on in a later conversation. Keep an eye out!
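As a small taste before that later conversation, here is a sketch of the balance factor and a single left rotation. The names are made up for illustration, and height() is recomputed recursively to keep things short, whereas a real AVL tree would typically store heights in the nodes.

```python
class AVLNode:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def height(node):
    """Height in edges; an empty subtree counts as -1."""
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))


def balance_factor(node):
    return height(node.left) - height(node.right)


def rotate_left(node):
    """Promote the right child; the old parent becomes its left child."""
    new_parent = node.right
    node.right = new_parent.left  # the promoted node's left subtree moves over
    new_parent.left = node
    return new_parent


# A right-heavy chain: 1 -> 2 -> 3.
chain = AVLNode(1, right=AVLNode(2, right=AVLNode(3)))
print(balance_factor(chain))  # -2, out of balance

balanced = rotate_left(chain)
print(balanced.value, balance_factor(balanced))  # 2 0
```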

Conclusion

We've taken a bucket load of information. Trees and searching, children, and siblings. What is all this knowledge without implementation? Fire up that laptop one last time this year and think of how you could have consumed that endpoint differently. Take a sabbatical, if you may, and have the rest you need, then come back and break the time complexity barrier. Write more efficient code, more productive code, and let's make the user experience smoother.

Here,

TheGreenCodes

Data Structures: Stacking and Queueing

Marvin Kweyu — Sun, 29 Nov 2020 16:25:49 GMT

Ohh...I know. Trust me I know. All this talk about non-primitive data structures without stacks and queues? Impossible. How would we ever?

We've already covered lists and gotten linked lists under our sleeves. Still unfamiliar? Go ahead with this primer. Handling these two at once pairs beautifully to remove some of the confusion that lingers. For this piece, we'll detail the next set.

Stacks

Consider, for a moment, a literal haystack. Our horses are hungry, and food needs to be stored. We place these square bits of nicely packaged treats, one on top of the other - a stack. Getting them out for delivery, we'll take the very first one we reach from the whole pile: the one on top, the last one we placed (last in, first out). Just like that.

Now for the relevance of this in engineering. Items in a stack are of fixed size, that is, their sizes are known beforehand rather than the compiler trying to figure this out and finding the next available space. Since we have one point of entry and exit, this saves time, making their use fast.

Hence, local function variables are stored in a stack and popped off the stack when the function is done with its task. The differences between this and a heap have been discussed before. Like a pile of clothes, we'll push new items onto it and, like any cap, pop the top item off.

Note:

A stack is dynamic. It grows or shrinks in size. Items being placed in a stack, however, already have a known size. We do not intend to have our compiler second guess.
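A minimal sketch of push and pop, using a Python list as a stand-in for a stack. It only shows the last in, first out order; it says nothing about how a call stack is laid out in memory.

```python
stack = []

# push: new items go on top of the pile
stack.append("bale 1")
stack.append("bale 2")
stack.append("bale 3")

# pop: the last item in is the first one out
top = stack.pop()
print(top)    # bale 3
print(stack)  # ['bale 1', 'bale 2']
```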

Queues

As opposed to stacks, here the first item in will be the first one out (FIFO). Think of it as 'first come, first served'. Adding an item to the eternal queue is, in its basic form, enqueuing, while getting an item off the queue would be dequeuing.

So my computer is running a lot of processes right now. We debunked the myth of doing many things all at once in a word on multi-threading, where we showed how the CPU will do context switching to give the feel of running multiple things at once. All these actions need to be tracked. They would be placed in a queue, where the first to request resources gets access.

A little systems programming here. You've got your program that does 123 things. It spawns one or two processes from within itself. Remember that your program is also a process when you run it! Now one of the child processes within it says:

'Hey! A software event occurred. You messed up with the type of parameter you gave!'

(Or something like that).

Based on how many interrupts your program gets (hardware, a terminal special character such as Ctrl + C, and so on), the system has to place them in a queue and show the messages appropriately based on what came first.

We can, if we wanted to, see what the first item is with peek(), check if we are done with items in the queue, hence an empty queue with isEmpty or check if we are swamped and probably not going out soon since we have a full queue with isFull.

A simple queue implementation built on an array (a collection of fixed size) would serve as an example.

As you might have noticed, we have re-used the array data structure to show how all these structures are used side by side to form more complex structures. Layers upon layers with different implementations. Building blocks.
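One way this could look in Python is sketched below: a preallocated list stands in for the fixed-size array, and the head index wraps around when it reaches the end. ArrayQueue is a made-up name, and is_empty and is_full are the snake_case cousins of isEmpty and isFull.

```python
class ArrayQueue:
    """A fixed-capacity FIFO queue backed by a preallocated list."""

    def __init__(self, capacity):
        self._items = [None] * capacity
        self._capacity = capacity
        self._head = 0  # index of the oldest item
        self._size = 0

    def is_empty(self):
        return self._size == 0

    def is_full(self):
        return self._size == self._capacity

    def enqueue(self, item):
        if self.is_full():
            raise OverflowError("queue is full")
        tail = (self._head + self._size) % self._capacity
        self._items[tail] = item
        self._size += 1

    def dequeue(self):
        if self.is_empty():
            raise IndexError("queue is empty")
        item = self._items[self._head]
        self._items[self._head] = None
        self._head = (self._head + 1) % self._capacity
        self._size -= 1
        return item

    def peek(self):
        if self.is_empty():
            raise IndexError("queue is empty")
        return self._items[self._head]


queue = ArrayQueue(3)
for process in ["editor", "browser", "terminal"]:
    queue.enqueue(process)

print(queue.is_full())   # True
print(queue.peek())      # editor
print(queue.dequeue())   # editor
print(queue.dequeue())   # browser
print(queue.is_empty())  # False
```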

Let's do more of this in the next session. Going to the non-linear, where everything seems to be falling off trees with weird graphs.

Data structures: The linear non-primitives

Marvin Kweyu — Sun, 22 Nov 2020 16:08:11 GMT

In our first set of data structures, we get into the definition and scope of non-primitive structures. Have a look at the previous read on The Power of Data structures in case you feel a little lost. Right off the bat, we define what it means to be a non-primitive set, and how this can be further broken down.

Let's get right into it. A non-primitive structure is what happens when you combine two or more primitives. What we mean is simple. A char and int, for instance, are primitives. The simplest representations of data.

Broadly speaking, linear non-primitives can be grouped further as below:

Linear non-primitives

Stacks
Queues
Linked lists
Arrays

Here, as stated before, data is sequential - one after another. Some of these structures might ring a bell, stackify, is that you?

Arrays

These are what we call stores of homogeneous (meaning similar) data. So one might have an array like this:

my_array = [1,2,3,4,5,6,7,8,9];
vowels = ['a', 'e', 'i', 'o', 'u'];

Note how my_array has only integers while vowels has char. In both instances, our array has a collection of primitives - one type of primitive per array.

Arrays are of a fixed size. Once declared, the computer knows the amount of memory reserved for items to be stored there. You might ask:

"But what if the data I have grows beyond the size of the array?"

Well, you might need to consider a different type of structure to store this data. Use arrays, for instance, to store months of the year. We will not be changing this count anytime.

In the declarations above, the sizes and types of the arrays are inferred. Since they are not defined per se, they are 'figured out' based on the values and how we use them.

We might as well have this:

Python:

import array
count_down = array.array('i', [5,4,3,2,1])
# where *i* stands for int (an array of integers).

Types may go on as:

b for signed char
f for float
d for double
u for a unicode character (deprecated in recent Python versions) and so forth.

Have a look at the documentation for more insight into this. As well, take note that the array module has different functions to enable array manipulation.

Rust:

let count_down: [i32; 5] = [5,4,3,2,1];
// where *i32* is signed integers, and *5* in *[i32; 5]* is the size of the array, i.e. have five integers

Important pointer: Lists and arrays are not the same!

Operations on an array will be based on the index of the array element, with indexing starting from 0. Hence, the first item, in either case, would be:

count_down[0]

The result from the above is 5 (the semi-colon has been left out for brevity but should be used based on the language you are using).
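A short sketch of indexing with Python's array module, and of one way an array differs from a list: the element type is fixed. Do note that a Python array can still grow with append, so it is the type, not the length, that is pinned down here.

```python
import array

count_down = array.array('i', [5, 4, 3, 2, 1])

print(count_down[0])   # 5
print(count_down[-1])  # 1

# Unlike a list, an array refuses values of another type.
try:
    count_down.append(2.5)
except TypeError as error:
    print(f"Rejected: {error}")

mixed_list = [5, "four", 3.0]  # a list happily mixes types
print(len(mixed_list))  # 3
```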

Let's take a look at another structure that looks similar, but is not the same - Linked lists.

Linked lists

Unlike arrays, linked lists store data in a non-contiguous form. This implies one piece of data is not placed side by side with the other as [1,2,3,4,5]; rather, it is stored in the form of nodes, each with a pointer to the next, like so:

[data1, pointer_to_data2] ...  [data2, pointer_to_data3] ... and so on

The last item in our linked list, since there is no pointer to the next, would have null.

As well, linked lists are dynamic - their size is not fixed at initialization.
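A minimal sketch of such a list in Python, where each node holds its data and a reference to the next node, and None plays the role of null. The class and method names are made up for this example.

```python
class ListNode:
    def __init__(self, data, next_node=None):
        self.data = data
        self.next = next_node  # None marks the end of the list


class LinkedList:
    def __init__(self):
        self.head = None

    def append(self, data):
        """Walk to the last node and hang a new node off it."""
        new_node = ListNode(data)
        if self.head is None:
            self.head = new_node
            return
        current = self.head
        while current.next is not None:
            current = current.next
        current.next = new_node

    def __iter__(self):
        current = self.head
        while current is not None:
            yield current.data
            current = current.next


numbers = LinkedList()
for value in [1, 2, 3]:
    numbers.append(value)

print(list(numbers))           # [1, 2, 3]
print(numbers.head.next.data)  # 2
```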

Linked lists will be further broken to:

a) Singly linked lists (used in the elaboration of linked lists above)

b) Circular linked lists

c) Doubly linked lists

Doubly linked lists

These have three items in a node: a pointer to the previous node, the data, and finally, a pointer to the next node in the sequence.

[null, data1, pointer_to_data2] ...  [pointer_to_data1, data2, pointer_to_data3]

The first item (the head) has a null as the previous pointer while the last node has a null on the next pointer.

Circular linked lists

Similar to doubly linked lists, these can carry two pointers as well. The only difference would be that the last node, instead of a null, has a pointer back to the first item in the list. 1 + 1 = circular.

In most of what we implement through code, we have implementations of linked lists. You will hardly find yourself doing this manually, but understand that some of the things we call 'mutable arrays' are actually implementations of, or wrappers around, more complex structures.

Example:

If navigating through my directory, my computer needs to know where I am, where I'm from, and possibly have the correct link/structure to the nested directory I'm navigating to. So to the top-level directory, I cannot go any higher, no previous node. Likewise to the last directory or file, there is no 'next' option.

PSST: Ring ring ... we can do some File Management too.

Till now, we have come to understand the importance of data structures, the groups they have been placed into, and we have an idea of what some of the linear data structures involve. We could go on and on about the details of each, but that is a tale for another day. A breather for us both at this point. Go read something non-algorithm-like for a moment. We'll still meet here, same time, the same drive.

The Power of Data Structures

Marvin Kweyu — Fri, 13 Nov 2020 10:47:29 GMT

A trip down memory lane, avid reader. Let's take a walk through the core of it all: data structures. What are they and why are they so important? A 'hello' to a reader that might have missed our talk on Memory management, where we delved into what happens to our code in variable assignment. Do take a look, even if it's a refresher you're looking for.

Alright then. What is a data structure?

It's a way of organizing data. Say we have a couple of friends coming over for our fortnightly game night (ah...the things on our to-do list). We need snacks, a number of people expected to be there, a list of activities to partake in. I can go on and on. So how do we organize this in, say, a program we want to implement to fix the hassle of planning? How do we store the things to be done or the items to be purchased? It's all data in the end. We need structure. We need data structures. The how of getting our data and manipulating it from these structures is our algorithms.

Right back to our game, who will do what?

How do we decide what comes before what in terms of structuring this data? Does our simple program make it easy to retrieve this information? Is it fast enough? What about efficiency?

Whatever kind of application we are talking about, we need a data structure(s) that will allow traversing, insertion, searching, deletion (in case the last casserole was outright nasty), sorting, or merging of bits of data. Call it CRUD with extra steps.

As in the Memory management article, we have data, then collections of data. It's all data. While memory allocation might not be something we think of with javascript or even Python, it holds water when doing C or C++. So keep this in mind.

Broadly speaking, we group data structures, much like data types themselves, as:

a) Primitive

b) Non-primitive

Primitive structures would hold primitive data types; that is, the chars, the ints, the floats. Basically, data types that hold a single value.

Non-primitive structures, on the other hand, break down to linear (which holds data in a sequence) and of course non-linear. Hold your breath in case you lost it a moment there. We'll go into the details soon enough.

As of now, you have a general idea, a scope if you may, of what structures involve. They are quite a number.

Over the course of the next couple of weeks, we'll be covering some of these structures. Unraveling them to bits and pieces. Finding what makes them tick as so and getting why many a time, developers new and old alike, have challenges addressing them. Stick a while, I cannot wait for our next piece in this!

A Developer's mental day

Marvin Kweyu — Sun, 08 Nov 2020 17:53:30 GMT

A moment of reflection dear dev, dear engineer, and dear LANister. A brief reminder that yes, we are here, and yes we are alive. We made it to this day! What a joy! Ohh, hey Kelvin! Let's talk about our mental health today. A little sanity in all the chaos of whatever bugs, errors, and painful hours we might have had to this moment.

More often than not, the life of a developer revolves around getting up, throwing laundry on the bed, coding, letting the laundry hug the seat once more, and sleeping. We've been here once or twice.

Do this often, and we know what happens. It's exhausting, draining, and demotivating.

In as much as we strive to perfect the skill, or do one more ' if fail print success! ', the natural order of things is to let things settle in.

To allow your body to absorb that new way of doing what you've just got a Eureka for. So code, perfect and learn, but get out as well.

The trick? Learn one thing then move to the next. One step at a time. One side project (to completion damn it! ) at a time, one breaking change at a time and one small commit at a time. Beginner or advanced, give it time but make it quality time.

Shred some or leave some, may the force be with you.

So if I am taking on three projects for a weekend, does this work? Or does this end up with multiple shitty lines of code on the last one and most probably a burnout early Monday morning when I am actually supposed to get something done? What works for you? Will this motivate you or leave you dreading code for a couple of days?

At the end of this road (up for interpretation), will you say,

'Ahhh... I remember the beautiful algorithm back when I was xx years' (that you might have thrown away and moved on to the next unfinished side project)

or will you say

' I made this one project that was quite challenging. I met this random person that helped me while out on a trip that weekend'.

One thing at a time! Got a bug? Get out of the chair and take a walk, have a breather instead of glaring into the bottomless pit of red - error lines. Builds fail all the time. You looking at the error message printing 'failed to build' for three hours will not make it magically disappear. It might, however, go away while taking an impromptu coffee break with a friend or two - and you'll go 'ahh... why didn't I think of it?! It was right there!'. We know it was right there. It was taking coffee with you. A thought just came by when your mind was not applying pressure to one part of the brain.

Avoid, if you can, long eternal coding sessions followed by an even longer break (it can take days, weeks or even months) of body unrest resulting from pushing yourself too hard, during which you forget 90% of whatever it is you worked so hard on. Prioritize consistency over intensity. No bug will get solved if the person looking for it is not okay, either mentally or physically.

So let's do more. And what oh what do I mean? I mean more living, more workouts, more balance, getting more outside of our constrained world of pixels, of conversations ordained by algorithms, destined to make us meet people thinking exactly like us, ignoring the opposites. We do have two brain hemispheres by the way - one head (think about that). So dear dev, go meet a friend and talk about something other than code.

Yours,

Marvin

PS:

By the way, Kelvin, are we going out this weekend or what? I need a fresh brew of brain cells. Ziplining will work just fine.

ColorDetection - Python ColorDetect package

Marvin Kweyu — Mon, 28 Sep 2020 06:28:38 GMT

To you, dear reader. As ColorDetect is always evolving, some of the sections you encounter here might not be up to date with the current version. This is not to mean that this ceases to be a resource. It is still invaluable! So go through it and update your information with the official docs.

Images. That's it. Images. As a point of practicality, take a fashion designer (as a forum member vividly described to me at one point). You are given an image or have an image at your disposal that simply tickles your curiosity and you want to incorporate it in one of your new lines. Let's swerve a little into the genetics section. Say you are given a petri dish image, with pigmented bacteria or similar organisms, and you would like to find the abundance of that organism or organisms in this specific image. Get the gist?

That's why we have ColorDetect. Letting you nab those colors right from the image, or if you were a tad bit crazy, a video. For an overview, we'll kick it off with installation.

As all Pythonistas have it, create a virtual environment, and install.

pip install ColorDetect

For this demonstration, we'll borrow Greyson Joralemon's photo.

A program that gets the colors in this specific image.

from colordetect import ColorDetect
user_image = ColorDetect('media/random_balls.jpg')
colors = user_image.get_color_count()
print(colors)

ColorDetect will go, do its thing and return this:

$ python get_colors.py
{'[59.0, 70.0, 198.0]': 7.63, '[245.0, 155.0, 186.0]': 9.0, '[232.0, 22.0, 103.0]': 11.98, '[207.0, 143.0, 3.0]': 35.54, '[88.0, 70.0, 34.0]': 35.85}

A color description of the image, breaking down the relevant most abundant colors to say, hey, this image has 7.63 % of it occupied by this RGB color: '[59.0, 70.0, 198.0]'.

If we wanted the hex for this instead, we'd pass that as the parameter to get_color_count().

colors = user_image.get_color_count(color_format='hex')
# colors
# {'#3b46c6': 7.63, '#f59bba': 9.0, '#e81667': 11.99, '#cf8f03': 35.56, '#584622': 35.83}

We could, of course, look for more colors than just the five most dominant.
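For the curious, the hex keys are the same RGB values written in base 16. A small sketch of that conversion follows; rgb_to_hex is a made-up helper for illustration and not ColorDetect's own code.

```python
def rgb_to_hex(rgb):
    """Convert an [r, g, b] triple of 0-255 values to a '#rrggbb' string."""
    red, green, blue = (int(round(channel)) for channel in rgb)
    return f"#{red:02x}{green:02x}{blue:02x}"


print(rgb_to_hex([59.0, 70.0, 198.0]))    # #3b46c6
print(rgb_to_hex([245.0, 155.0, 186.0]))  # #f59bba
```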

user_image.get_color_count(color_format='hex', color_count=8)

Depending on the size or ratio of the image, it may take a moment. A case in point, a low graphic image, say 2MP vs a high definition top of the line camera image. These two images, despite pointing to the same object or scene, have very different pixel ratios.

Let's go ahead, and instead of getting the color as a variable, write this color onto the image.

user_image.write_color_count()
# save the image after writing the color count to it
user_image.save_image('media', 'processed.jpg')

We save this image in the media directory with a name processed.jpg:

Perfecto!

We did have something about the crazy people with videos, didn't we? Now, where is that video...oh, here it is. Our earth.mp4 file.

from colordetect import VideoColor
my_video = VideoColor('media/earth.mp4')
video_colors = my_video.get_video_frames(progress=True)
print(f"{video_colors}")

{'[137.0, 165.0, 182.0]': 0.92, '[71.0, 84.0, 95.0]': 2.16, '[24.0, 30.0, 50.0]': 11.17, '[7.0, 10.0, 26.0]': 17.59, '[0.0, 0.0, 0.0]': 68.83, '[143.0, 170.0, 187.0]': 0.84, '[76.0, 89.0, 101.0]': 2.09, '[26.0, 32.0, 52.0]': 11.18, '[8.0, 11.0, 28.0]': 16.69, '[135.0, 163.0, 181.0]': 0.95, '[76.0, 88.0, 98.0]': 2.05, '[8.0, 11.0, 27.0]': 15.43, '[127.0, 160.0, 179.0]': 0.94, '[71.0, 83.0, 94.0]': 2.38, '[7.0, 11.0, 27.0]': 15.72, '[124.0, 160.0, 181.0]': 0.9, '[69.0, 84.0, 96.0]': 2.26, '[26.0, 32.0, 53.0]': 13.12, '[125.0, 160.0, 182.0]': 0.89, '[68.0, 82.0, 95.0]': 2.27, '[132.0, 166.0, 186.0]': 0.79, '[71.0, 87.0, 100.0]': 2.1, '[25.0, 32.0, 52.0]': 14.18, '[132.0, 164.0, 183.0]': 0.89, '[70.0, 85.0, 97.0]': 2.08, '[132.0, 165.0, 183.0]': 0.9, '[73.0, 88.0, 99.0]': 2.06, '[26.0, 33.0, 53.0]': 12.11, '[8.0, 10.0, 27.0]': 16.76, '[134.0, 166.0, 184.0]': 0.88, '[132.0, 165.0, 185.0]': 0.86, '[74.0, 88.0, 100.0]': 2.0, '[26.0, 33.0, 52.0]': 10.65, '[7.0, 10.0, 27.0]': 16.93, '[124.0, 157.0, 178.0]': 0.99, '[68.0, 82.0, 93.0]': 2.14, '[25.0, 31.0, 50.0]': 10.66, '[124.0, 160.0, 182.0]': 0.88, '[67.0, 82.0, 94.0]': 2.19, '[25.0, 31.0, 49.0]': 10.68, '[124.0, 160.0, 183.0]': 0.85, '[67.0, 83.0, 95.0]': 2.0, '[25.0, 30.0, 49.0]': 11.04, '[123.0, 160.0, 182.0]': 0.87, '[24.0, 29.0, 47.0]': 11.15, '[23.0, 29.0, 47.0]': 10.6, '[6.0, 9.0, 26.0]': 19.34, '[67.0, 83.0, 97.0]': 2.0, '[24.0, 29.0, 48.0]': 9.31, '[125.0, 161.0, 184.0]': 0.85, '[67.0, 84.0, 97.0]': 1.98, '[127.0, 162.0, 183.0]': 0.87, '[67.0, 83.0, 96.0]': 1.96, '[23.0, 29.0, 46.0]': 8.58, '[5.0, 8.0, 25.0]': 17.77, '[125.0, 161.0, 183.0]': 0.88, '[68.0, 84.0, 98.0]': 1.9, '[24.0, 29.0, 46.0]': 6.95, '[67.0, 84.0, 99.0]': 1.89, '[133.0, 166.0, 186.0]': 0.81, '[67.0, 86.0, 99.0]': 1.85, '[23.0, 28.0, 45.0]': 6.83, '[5.0, 8.0, 24.0]': 22.22, '[135.0, 165.0, 186.0]': 0.85, '[69.0, 86.0, 100.0]': 1.79, '[22.0, 27.0, 43.0]': 7.22, '[5.0, 7.0, 24.0]': 22.48, '[73.0, 91.0, 105.0]': 1.69, '[129.0, 163.0, 185.0]': 0.85, 
'[69.0, 85.0, 98.0]': 1.9, '[21.0, 27.0, 44.0]': 7.25, '[4.0, 7.0, 24.0]': 21.7, '[68.0, 86.0, 101.0]': 1.9, '[22.0, 27.0, 45.0]': 7.91, '[126.0, 160.0, 181.0]': 0.94, '[66.0, 83.0, 96.0]': 1.91, '[22.0, 27.0, 46.0]': 9.19, '[129.0, 164.0, 185.0]': 0.84, '[69.0, 86.0, 99.0]': 1.96, '[21.0, 27.0, 46.0]': 10.65, '[133.0, 165.0, 185.0]': 0.85, '[23.0, 29.0, 48.0]': 10.61, '[7.0, 9.0, 26.0]': 17.7, '[135.0, 165.0, 185.0]': 0.85, '[73.0, 88.0, 100.0]': 1.96, '[24.0, 29.0, 50.0]': 11.34, '[139.0, 164.0, 177.0]': 0.92}

We may find the colors are too much for our use case. So let's shorten this:

print(my_video.color_sort(color_count=6))

{'[0.0, 0.0, 0.0]': 68.83, '[5.0, 7.0, 24.0]': 22.48, '[5.0, 8.0, 24.0]': 22.22, '[4.0, 7.0, 24.0]': 21.7, '[6.0, 9.0, 26.0]': 19.34, '[5.0, 8.0, 25.0]': 17.77}

This will return the top 6 most dominant colors from the whole video, having taken a frame for every second. Looks much better! Unless you want to use all the colors, that is.
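Conceptually, this shortening amounts to ranking the colors by their percentage and keeping the first few. A rough sketch of that idea in plain Python is below; top_colors is a made-up function and an assumption about the behaviour, not ColorDetect's implementation.

```python
def top_colors(color_percentages, color_count=5):
    """Return the color_count entries with the highest percentage."""
    ranked = sorted(
        color_percentages.items(), key=lambda item: item[1], reverse=True
    )
    return dict(ranked[:color_count])


# A few entries taken from the video output above.
sample = {
    '[137.0, 165.0, 182.0]': 0.92,
    '[0.0, 0.0, 0.0]': 68.83,
    '[5.0, 7.0, 24.0]': 22.48,
    '[24.0, 30.0, 50.0]': 11.17,
}
print(top_colors(sample, color_count=2))
# {'[0.0, 0.0, 0.0]': 68.83, '[5.0, 7.0, 24.0]': 22.48}
```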

I'll reiterate this. It all depends on the quality of the input media file, and its length if it is a video, as in this case. Take it this way: a video 5 minutes long, showing a wide variety of colors from all sorts of crayons, vs a short video as just shown. Remember, the process takes one frame for every second. I'm certain this will be addressed in a future release.

We can hold off here and let the steam cool off. Do keep up to date with the package as more features and performance improvements come to light.

Yours,

TheGreenCodes

Memory management - Deep and Shallow Copying

Marvin Kweyu — Sat, 22 Aug 2020 16:58:00 GMT

Let's go back one moment. A little further down to our data structures. The dear heaps and stacks of them.

What happens when I assign a variable?

What about when I pass it as a parameter?

How does the program know this is what I'm talking about?

Before we go to a higher level of abstraction, we should look at how the machine 'thinks' about it.

Assume you have declared a variable:

my_variable = "random";

Every time you do this, you add to the list of things the program has to remember - one on top of the other - a stack.

As it stands, your program will store your string, "random", as below:

The first diagram with the pointer (location of the variable in memory), len (for the length of the string) and capacity (the amount of memory, in bytes, the machine gives to this variable for use) is a stack and the other is a heap.

Whatever we do with this variable, be it a mutation, concatenation, making a substring or whatever there is, refers to the stack. Here, the pointer holds the location of where the actual data is stored. So we are just given a reference to it.

Why do we do this? Why do we store it in different data structures?

It's about the speed.

Stacks are faster compared to heaps. So instead of moving around a whole chunk of data (the heap) while mutating it, just carry the reference to it. I mean, the program is already doing its tasks (whether heavy or not), so there is no need to add overhead here.

However, not all data is assigned as such. Variables of static data types, that is, booleans, integers, floats, and chars, are added directly onto the stack. So we would have no heap to store a simple 456.98 because the program already knows the sizes of these types, except in the rare case it is user input.

How these types, more so integers, are laid out also depends on whether they can be negative (signed) or exclusively positive (unsigned). This should remind you of how you declare your variables in math. You would say that any number in your paper is positive unless stated otherwise, or as we call it here, unless signed.

So this assignment would work with compound data types - the result of combining two or more static types.

Example:

string (a combination of chars)
arrays
tuples ... and so forth, depending on how your language of choice calls it, for instance, dictionary vs. javascript object.

Back to copying.

You want two variables to refer to the same thing and you want to edit one of the variables without affecting the other.

You might assume that all you had to do was a simple re-assignment.

my_variable = [0,1,2,3,4,5,6,7,8,9]
my_other_variable = my_variable

A declaration like the above will lead to two variables showing the same result, an array from 0 to 9. The caveat? They will both reference the same heap.

So what happens if I mutate one variable?

my_other_variable.append(45)
print(f"My second variable: {my_other_variable}")
print(f"My variable: {my_variable}")

In both cases, the output is a list:

[0,1,2,3,4,5,6,7,8,9,45]

Strange. Huh?
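One way to see what is going on, sketched in Python: the `is` operator and `id()` tell you whether two names point at the very same object. The variable names are the ones used in this post; nothing else is assumed.

```python
# Two names, one list: plain assignment copies the reference, not the data.
my_variable = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
my_other_variable = my_variable

# Both names report the same identity, so they share one underlying object.
same_object = my_other_variable is my_variable
same_id = id(my_variable) == id(my_other_variable)
print(same_object, same_id)

# Mutating through one name is therefore visible through the other.
my_other_variable.append(45)
print(my_variable)
```

If `is` reports that the two names are the same object, any mutation through one of them will show up through the other.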

What if we wanted to mutate each of these variables independently? For example, have `my_variable` change to [0,1,2,3,4,5,6,7,8,9,17] and my_other_variable to [0,1,2,3,4,5,6,7,8,9, 45, 129]?

To get two completely separate items holding the same data, so that each can be mutated independently, you have to take a different approach: a deep copy.

A warning

As far as memory is concerned, deep copying is memory consuming as it has to get the pointer and follow it to where the data is stored then duplicate this heap.

Depending on what language you are using, the tool differs: the inbuilt copy module in Python, library helpers in JavaScript, a clone or copy of some sort in lower-level languages, and so on and so forth. (We cannot simply list all the ways to deep copy across the multiverse.)

import copy
my_variable = [x for x in range(10)]
my_other_variable = copy.deepcopy(my_variable)
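The difference between a deep copy and a shallow one only shows up when the data is nested. Here is a small sketch, assuming a made-up nested list called `matrix`, that compares `copy.copy` with `copy.deepcopy`:

```python
import copy

# A nested list: the outer list holds references to the inner lists.
matrix = [[1, 2], [3, 4]]

shallow = copy.copy(matrix)    # new outer list, same inner lists
deep = copy.deepcopy(matrix)   # new outer list and new inner lists

# Change an inner list through the original.
matrix[0].append(99)

print(shallow[0])  # shares the inner list with matrix
print(deep[0])     # untouched by the change
```

For a flat list of numbers both would behave the same; the deep copy earns its extra memory once lists, dictionaries, or objects sit inside one another.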

Love JavaScript much?

let my_variable = [0,1,2,3,4,5,6,7,8,9]
// spread into a new array; enough for a flat array of numbers (nested objects would still be shared)
let my_second_variable = [...my_variable]
my_second_variable.push(100)
console.log('My first variable', my_variable)
console.log('My second variable', my_second_variable)

There are, of course, multiple other ways of doing this. It is, after all, JavaScript. A point to mark, especially with objects: [lodash](https://lodash.com/), dearest ramda, or rfdc work perfectly. Custom method for your implementation? Go ahead, just not JSON.stringify().

The mad rustacean?

let my_variable = String::from("random");
let my_other_variable = my_variable.clone();

Having done this, you can manipulate your new variables in any way you want. Go to the moon if need be. Just need a couple of dollars more.

It is this same principle that governs the passing of variables across functions and objects. Passing a pointer to the original data and not the whole heap. Comprende? I sure hope so. So go forth and choose wisely.
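A rough Python sketch of that last point follows. The function names `add_item` and `add_item_safely` are made up for illustration. The first works on the caller's list through the reference it was handed, the second takes a deep copy first.

```python
import copy

def add_item(items):
    """Append to the list it was handed; the caller sees the change."""
    items.append("new")
    return items

def add_item_safely(items):
    """Work on a private deep copy so the caller's list is left alone."""
    local_items = copy.deepcopy(items)
    local_items.append("new")
    return local_items

original = ["a", "b"]

result_safe = add_item_safely(original)
length_after_safe = len(original)     # the original is unchanged

result_shared = add_item(original)
length_after_shared = len(original)   # the original grew by one

print(length_after_safe, length_after_shared)
print(result_shared is original)
```

Which one you want depends on whether the caller expects its data to change; the copy costs memory, the shared reference costs surprises.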

Let's leave this piece at that, and chat in the comments if need be.

And yes, we can chat tech on Twitter too. marvinus_j

A word on multi-threading.

Marvin Kweyu — Mon, 20 Jul 2020 10:16:19 GMT

A post or so ago, as a follow-up to A Dive into Python Implementation, we had a discussion on the differences between multi-threading and multi-processing, this being a result of IronPython's feature to allow multi-threaded code to use multi-core processors. So in this article, I thought we'd open up the boxes for a while just to see what these two mean. Sit back, take a coffee, and let's fire up those neurons.

Background Check

Now, for an understanding of the concepts, we should differentiate between a process and a thread.

Process

Say I write a program: sample_prog.py. Once I run this and let it do what it does best, I have created an instance of the program. By definition, this is a process. To break this down even further, if my program sample_prog.py was, say, taking multiple files, editing the title of these files, and moving them into a different directory once processed, I could break the process down into threads. In this case, I have two threads:

Read the file and change the title within it.
Copy this file into a different directory of processed files.

Don't get it wrong, this is the same program instance(also called process) doing different things.
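To check this for yourself, here is a small sketch in Python. The thread names "rename" and "move" are stand-ins for the two tasks above, and the barrier is only there so that both threads are alive at the same moment. Each thread reports the process it lives in and its own thread id.

```python
import os
import threading

results = {}
barrier = threading.Barrier(2)  # keeps both threads alive at the same moment

def report(name):
    barrier.wait()
    # Every thread records the process it lives in and its own thread id.
    results[name] = (os.getpid(), threading.get_ident())

workers = [threading.Thread(target=report, args=(name,)) for name in ("rename", "move")]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

same_process = results["rename"][0] == results["move"][0] == os.getpid()
different_threads = results["rename"][1] != results["move"][1]
print(same_process, different_threads)
```

Same process id, different thread ids: one program instance doing two things.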

If we were to come back to something closer to home, you have opened a tab on TheGreenCodes right now, while at the same time, I assume, you have another tab open, locked onto your favourite site. In this case, you have one program running, your Firefox browser (of course, it's not Internet Explorer open, is it?); this is the instance, that is, a process. Within this process, you have two tabs, in essence, two threads of the same process.

So what is it with threads?

Now threads;

You have decided to download a file from the page while you continue reading the rest of the installation instructions. You have two threads running here, based on the same browser instance. Do they execute at the same time? Not at all. At any given time, a single thread is running. By now you may be wondering what this means as you can clearly see the download progress bar. What is actually happening is context switching. They do not happen at the same time, but rather keep switching based on time allocated per thread or the thread being over(download in this case) altogether. To you, it will look like it is happening all at once.

Back to cores for a moment. Your underlying hardware, unless from a time-traveling adventure, definitely has more than one core. Take a dual-core machine, for instance: multithreading will now allow you to run single threads on dedicated cores. That's why hardware with more cores seems faster, simply because it is. It is taking advantage of the multiple cores to run each thread separately. It's a win-win: no core is burdened as they share the load, while you get a faster experience.

Photo by Roman Kraft

Now, to see it in action, let's do something simple to illustrate just how threading works:

import threading


def my_amazing_function(user_input):
    print("Duplicating user input for process 1: ")
    print(user_input * 10)
    print("Do many other things for my_first_process: ...")


def my_other_function(user_input):
    new_output = user_input + "*"
    print(new_output * 10)
    print("Do many other things for my_second_process:...")


if __name__ == "__main__":
    my_input = "7 "
    # create threads for each function
    thread1 = threading.Thread(target=my_amazing_function, args=(my_input,))
    thread2 = threading.Thread(target=my_other_function, args=(my_input,))
    # start both threads. Go forth and do many things
    thread1.start()
    thread2.start()
    # wait for each thread to finish before the parent process carries on
    thread1.join()
    thread2.join()

Consider, in this instance, we have a program that has two functions. Feel free to add more and adjust accordingly. These two functions each take a parameter. Leaving the function definitions aside, let's take a look at the threads.

thread1 = threading.Thread(target=my_amazing_function, args=(my_input,))
thread2 = threading.Thread(target=my_other_function, args=(my_input,))

For thread1, we said 'create a thread that addresses the function my_amazing_function'. Now from our definition of my_amazing_function, it should take an argument, and hence the extra param: args=(my_input,). This same principle applies to thread2. Place a class or function, the power is in your hands.

We then set off to start each of these threads. You can relate this to your OOP principles of creating an object, then calling a method of the object. Once these things are done, they need to be joined back to the parent process (our program instance) so that they are not left hanging in the system. Simple as that. We created two threads from one process.

The result of this, in most programs, is simple:

Shorter execution time.
Application responsiveness.
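A rough way to see the shorter execution time, under the assumption that the work is mostly waiting (a download, a disk read), is sketched below. `fake_download` is a made-up function, with `time.sleep` standing in for the network.

```python
import threading
import time

def fake_download(seconds):
    # time.sleep stands in for waiting on the network.
    time.sleep(seconds)

# One after the other.
start = time.perf_counter()
fake_download(0.2)
fake_download(0.2)
sequential = time.perf_counter() - start

# Side by side in two threads.
start = time.perf_counter()
threads = [threading.Thread(target=fake_download, args=(0.2,)) for _ in range(2)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```

While one thread waits, the other gets its turn, so the two waits overlap instead of adding up. Work that keeps the CPU busy the whole time would not see the same gain on the standard Python interpreter.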

Take, for instance, not having to wait for your browser to complete the download before scrolling down through the page; responsiveness. This is the fundamental of threading.

For safety reasons, we cannot have threads modify the same point of memory (RAM) at the same time. Hence we have the GIL, the Global Interpreter Lock. Because the interpreter will work with one thread at a time, without proper management we might have a scenario where, during context switching, one thread needs a resource that the other has in its possession; a deadlock.

If both threads will use a particular resource, they are locked to limit the amount of that resource they can access. This means that no one particular thread will take 100% (lock everything) of the resource, leaving the others starved during the context switch.

Other interpreters using the same principle would be PHP and Perl. The concept behind this is memory management or, as another might call it, memory safety. In particular, garbage collection, which in this case takes the form of reference counting.
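The GIL itself lives inside the interpreter, so you never write it by hand. The same idea at the level of your own program is sketched below with `threading.Lock`. The shared `counter` and the `add_many` function are made up for illustration; this is a user-level lock, not the GIL.

```python
import threading

counter = 0
counter_lock = threading.Lock()

def add_many(times):
    global counter
    for _ in range(times):
        # Only one thread at a time may run the block guarded by the lock.
        with counter_lock:
            counter += 1

threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(counter)
```

Using the lock as a context manager (`with counter_lock:`) releases it even if the block raises, which is one simple way to avoid a thread holding on to a resource forever.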

How do other languages handle this?

For one, we have our very own C and C++, which require manual memory management. These trust the developer with the power to allocate and free memory themselves, hence the functions: malloc, realloc, calloc, and free. Do not get started with the pointers right now, you'll need another brew of coffee. Among others, we also have automatic reference counting, or ARC, for Swift and Objective-C.

Conclusion

You can have as many threads as you want! It just gets a little tricky handling all of them at once if you are not careful about what exactly is started, what is stopped, what is working as a daemon, and so on. Get to know which programs stand to gain from threads and use them accordingly.

Yours,

TheGreenCodes.

Python Command-line arguments: Part 4

Marvin Kweyu — Wed, 10 Jun 2020 13:16:24 GMT

Before we catapult command-line arguments to infinity and beyond, we should go via one more item as suggested by an avid reader, Paddy. The docopt. In case you've just popped in, here's what we have done to this moment:

So far so good, but here's the pick-up line; a bonus section.

docopt has a lot of similarities to argparse. Like its counterpart, it comes with helper functionality, so it's easier to know if you did something wrong in your use, or to find your way around if the program is being used by a new user for the first time.

The difference, however, comes with how docopt parses. It relies on the docstring you give it as a description.

The basic idea is that a good help message has all the necessary information in it to make a parser.

In my walks through the internet, I happened to come across this one comment, and I quote:

I feel you, man. It's so simple and unorthodox it's confusing.

And I cracked up reading it. I laughed so hard, partially because it is true, to the point that you might need a background in command-line argument parsing prior to using it.

This said, docopt is an external module, meaning it has to be installed. Hence, in a virtual environment:

pip install docopt

In this particular instance, we are going to make a calculator. Nothing fancy, just something to advance from the previous programs we wrote so that you can see the subtle differences.

Our program will do the below,

Addition
Multiplication
Squares
Root

So go ahead and create a file, my_calculator.py, and paste the below in. We will go through this a step at a time.

"""My Advanced Calculator v1.0Usage:  my_calculator.py add ()...  my_calculator.py mult ()...  my_calculator.py square [--verbose]   my_calculator.py root   my_calculator.py (-h | --help)  my_calculator.py --versionExamples:  my_calculator.py add 9 4 67 101  my_calculator.py mult 88 43 20458 1 134   my_calculator.py square --verbose 9 Options:  -h --help        Show this screen.  -v --version     Show version.     --verbose         Show details verbosely."""import mathfrom docopt import docoptclass MyCalculator:    def get_options(self):        self.args = docopt(__doc__)# loop via commands getting what is needed        if self.args["add"]:            self.addition()        elif self.args["mult"]:            self.multiply()        elif self.args["square"]:            self.get_square()        else:            self.get_root()    def addition(self):        """        Get the summ of all numbers passed        """        summation = sum([int(number) for number in self.args[""]])        print(f"{summation}")    def multiply(self):        """Get the product of the list of numbers"""        product = math.prod([int(number) for number in self.args[""]])        print(f"{product}")    def get_square(self):        number = int(self.args[""])        if self.args["--verbose"]:            print(f"{number} * {number} = {number*number}")        else:            print(f"{number*number}")    def get_root(self):        """      Get the square root      """        number = self.args[""]        print(f"{math.sqrt(int(number))}")if __name__ == "__main__":    arguments = docopt(__doc__, version="MyCalculator 1.0")    calculator = MyCalculator()    calculator.get_options()

Now, docopt accepts the below :

docopt(doc, argv=None, help=True, version=None, options_first=False)

The first argument, doc, is essentially this: the program description.

"""My Advanced Calculator v1.0Usage:  my_calculator.py add ()...  my_calculator.py mult ()...  my_calculator.py square [--verbose]   my_calculator.py root   my_calculator.py (-h | --help)  my_calculator.py --versionExamples:  my_calculator.py add 9 4 67 101  my_calculator.py mult 88 43 20458 1 134   my_calculator.py square --verbose 9 Options:  -h --help        Show this screen.  -v --version     Show version.     --verbose         Show details verbosely."""

It is here that we define the logic of our parser, following the rule that a good help message has all the necessary information in it to make a parser.

In the docstring, we give a Usage description, followed by various examples, and the Options available in our program for the values passed.

Breaking it down a bit further,

  my_calculator.py add ()...

Do take note: add, mult, square, and root are commands, not arguments, nor are they options. They tell the user what is going to happen to the arguments passed, if any.

For our addition, we specify that if a user wants to add, they should pass two or more values to the program. To tell the program to create such a parser, we use (). Wrapping in () is just a specification that lets the program know that it should accept two numbers. The ... after tells the program, Hey, create the parser accepting two arguments, but remember, there may be two or more! , hence the ellipses. The same goes for multiplication, which we have shortened as mult.

For the squares and roots, however, our program accepts only one number. By default, arguments passed to docopt are required, therefore, we can safely ignore the ( ). If we wanted the user to give the argument as an option (i.e., not mandatory), we would wrap our argument(s) in [ ].

Something you may have seen as well is this:

  my_calculator.py square [--verbose]

This particular description, along with having a command square, has the option --verbose in square brackets. The great part about this is that it does not matter where you place it, you still get the square of your number. So running my_calculator.py square --verbose 9 would give the same exact value as running my_calculator.py square 9 --verbose. You can tell the difference from how the other options are used (they are used independently of arguments):

my_calculator.py (-h | --help)
my_calculator.py --version

A quick run of our program, without positional arguments, would give something like this:

 python my_calculator.py

Output:

Usage:
  my_calculator.py add ()...
  my_calculator.py mult ()...
  my_calculator.py square [--verbose]
  my_calculator.py root
  my_calculator.py (-h | --help)
  my_calculator.py --version

Out of the box, we would get the usage detail, from where we would be able to get the commands and options available. Running with the -h option for help would give the whole docstring.

Back to the main call:

    arguments = docopt(__doc__, version="MyCalculator 1.0")  # give the program version number/name
    calculator = MyCalculator()
    calculator.get_options()

We create an instance of MyCalculator and call its get_options, from where we can get all its arguments and options. So we can call python my_calculator.py add 1 23 4556

# get the arguments and options parsed
self.args = docopt(__doc__)
self.args == {
    '--help': False,
    '--verbose': False,
    '--version': False,
    '': None,
    '': ['1', '23', '4556'],  # can accept two or more numbers because of the ellipsis
    'add': True,
    'mult': False,
    'root': False,
    'square': False
}

You can see that this is a dictionary or, as web developers may call it, a JSON object. The list of numbers contains the numbers as expected, but as strings. So to sum them, we would need to convert each into an integer first.

# convert each item in the list to an integer and sum the resultant list
sum([int(number) for number in self.args[""]])

So go ahead and run python my_calculator.py add 1 23 4556, see the results for yourself.
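If you would like to try the conversion step without installing anything, here is a small sketch that uses a hand-written dictionary in place of what docopt returns. The key "<numbers>" is a placeholder name I am assuming for illustration, and math.prod needs Python 3.8 or newer.

```python
import math

# A hand-written stand-in for the dictionary docopt returns.
# The key "<numbers>" is a hypothetical placeholder name.
parsed_args = {"add": True, "mult": False, "<numbers>": ["1", "23", "4556"]}

# Everything arrives from the command line as text, so convert first.
numbers = [int(number) for number in parsed_args["<numbers>"]]

total = sum(numbers)
product = math.prod(numbers)
print(total, product)
```

Forgetting the `int()` call is a common slip: multiplying the raw strings raises a TypeError, and `sum` on strings fails as well.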

As we are not getting into the details of list comprehension in this piece, I will leave the calculations for you to ponder upon. We are getting back to our package:

docopt(doc, argv=None, help=True, version=None, options_first=False)

We already gave it the doc option (the document description). For argv, docopt will take everything passed to it excluding the program name, that is, sys.argv[1:]. Do take a look at argv if you need some clarification with this.

The help option is set to True by default for the sake of the help message when needed, while options_first is disabled, and hence we can position our options either before, after, or between our arguments while parsing.
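As a quick illustration of what "everything excluding the program name" means, here is a sketch in plain Python. `split_argv` is a made-up helper and the sample command line is invented; the slicing is the same one applied to `sys.argv`.

```python
import sys

def split_argv(argv):
    """Separate the program name from the arguments that follow it."""
    return argv[0], argv[1:]

# A made-up command line, as the shell would hand it to Python.
sample = ["my_calculator.py", "add", "1", "23", "4556"]
program, arguments = split_argv(sample)
print(program)
print(arguments)

# The real thing for the current run (often empty when run with no arguments).
current_arguments = sys.argv[1:]
print(current_arguments)
```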

It is not just using commands that does all the talking. We may well just use options and parameters. For example:

"""Usage: arguments_example.py [-vrh] [FILE] ..."""

This is an example program taking in multiple optional options, with a FILE that is also optionally passed. You can default it to a file in the local directory, much like our file organizer in the File management series.

We can combine this as far as we want (optional arguments inside optional, required arguments inside optional or even mutually exclusive).

# have an optional command taking optional arguments
my_program [command --option ]
# arguments are optional, but if  is passed,  should be present.
my_program [( )]

On and on it could go, but it is up to you to do more digging to find the particular use-case you need.

Paddy McCarthy, I hope I got the nitty-gritty details. I know it took a while.

For any lover of the command-line out there, you can get docopt working for your language of preference be it

C++
C
PHP
Haskell

The full list of supported languages can be found here. Till next time, @codes_green. Code, as usual, can be found from TheGreenCodes.

A Dive into Python Implementation

Marvin Kweyu — Sun, 26 Apr 2020 15:37:34 GMT

You read it right. It's all about implementation. Today, we will talk about the different implementations of Python. A heads up on the different kinds, be it CPython, Brython, you name it. This in conjunction with their main advantages, in essence, option A vs option B. If you have been in the game for a while, I'm sure you've heard a name or two pop up. Let's dive in a little deeper.

First thing's first, though, a note

A language is one thing, implementation is another.

Basically what this means is that an egg and a cake are not the same things. One comes from the other.

Python as a language is interpreted, the implementation, however, is why we have this article.
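If you are curious which implementation is sitting on your own machine, the standard library can tell you. A minimal sketch:

```python
import platform
import sys

# Name of the implementation running this very script, e.g. "CPython" or "PyPy".
implementation = platform.python_implementation()
version = platform.python_version()
print(f"{implementation} {version}")

# The same information, as the interpreter describes itself.
internal_name = sys.implementation.name
print(internal_name)
```

The language version and the implementation are reported separately, which is the egg and the cake all over again.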

CPython

The C implementation of python. It is the de facto implementation, the one I'm certain you have, and it's CPython and not Cython, which is something totally different. You see, when you run a simple python script to print, say, Hello world, this is what happens in the background: the program is compiled to bytecode, and that bytecode is interpreted. For a script you run directly, the bytecode lives in memory and is gone when the program exits, so you will not see a file for it. Modules that get imported are another story: their bytecode is saved as *.pyc files, short for 'python compiled' you might put it, which is why you notice *.pyc files here and there in larger projects (inside __pycache__ directories on Python 3). Once the project is run again, it will use the pyc files, which is faster to start as there is no need to compile those modules first.
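You can poke at both steps from within Python itself. The sketch below uses the built-in compile and exec along with importlib.util.cache_from_source; the file name sample_prog.py is only illustrative and nothing is written to disk.

```python
import importlib.util
import types

source = "print('Hello world')"

# Step 1: compile the source text into a code object holding bytecode.
code_object = compile(source, "<example>", "exec")
is_code = isinstance(code_object, types.CodeType)
bytecode_length = len(code_object.co_code)
print(is_code, bytecode_length)

# Step 2: the interpreter executes that bytecode.
exec(code_object)

# Where the bytecode of an imported module would be cached.
# "sample_prog.py" is only an illustrative name; nothing is written here.
cache_path = importlib.util.cache_from_source("sample_prog.py")
print(cache_path)
```

The exact cache file name depends on the interpreter and its version, which is why it is printed rather than spelled out here.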

As a somewhat different take on CPython, we have CrossTwine Linker. It would be a disservice if I did not mention this under CPython. By default, Python gets slower the larger the project. This is attributed to dynamic typing (not having to define the type of a variable upon initialization). Here is where CrossTwine comes in, giving Python a little juice. How it does this, exactly, is by packaging CPython with their own add-on libraries, these, of course, being hidden from public view.

Note the comparison on the speeds in runtime. Lower, of course, being better. Ohh, what power!

Wow! So why have I not heard about it? Ummm... Perhaps because it's commercial? You have to pay for usage. You can, of course, absolutely go for its full details in CrossTwine's page. If you thought it was all fine with speed, then woosh, are you in for a disruption.

Brython

Ever wanted to skip writing JavaScript on the client-side? Full disclosure, I have. Context switching sometimes catches up. Sneaky little thing. Brython allows you to write python code that runs directly in the browser. As a hello world example:

# create a pop up with information fed into the HTML5 input field
from browser import document, alert

def echo(ev):
    alert("Hello {} !".format(document["zone"].value))

document["test"].bind("click", echo)

For a list of some talks, events, and so on or even jumpstarts on Brython, you might take a look at their wiki.

IronPython

The same python, with a C# implementation, letting you run your code on the Microsoft Common Language Runtime.

So why use IronPython? My take would be, among other things, avoiding the Global Interpreter Lock. In short, the GIL prevents your normal python installation from running threads on several cores at once. Forget about the under the hood, bury your head in the sand multithreading. In this case, we are talking about taking full advantage of different processors working on different things at the same time. It allows multi-threaded code to use multi-core processors. Hold on a minute before you go here though, you might want to take a look at the previous talk, If you don't know about it, you don't need it. If you are a .NET kind of person, this would be your go-to. Integrate the two together and make some good worthwhile soup. Definitely worth the effort. Where to start? Why not their official page?

Jython

The fuss of the month, Jython. The Java implementation of python, meant to be complementary to, you guessed it, Java. Repeat after me and say complementary! Do not go there one-sided. Just like IronPython, it allows you to embed scripting, in this case into Java systems, making it possible to play with your java program in almost the same way as the interactive python shell. So you can use Java in python code or vice versa. Your python code will produce Java bytecode, which will, in turn, run on a JVM (Java Virtual Machine). I assume we all know how Java bytecode, once compiled, is portable across a multitude of devices already having this support.

PyPy

I thought I'd just start off with a mild comparison of PyPy with CPython.

Now, here is where we talk about machine code; the reason why C has remained the big man in the playground. It all comes down to this. You see, unlike Java's bytecode, for instance, C will compile to machine code, the direct language of the CPU. So there are no intermediaries. It's a boom and clap.

A simple analogy would be saying machine code is your operating system, while bytecode is an operating system installed on a virtual machine. On one side, you have something that is directly in contact with your PC, while on the other, you have a slight layer before the actual machine hardware.

This speed is the reason why a lot of data libraries, for instance, SciPy or NumPy, are written in C.

Call PyPy a cheat sheet. So, if, for instance, you have two python modules and one of them gets imported a lot to be used somewhere else, PyPy will take note of it, turn it to machine code, and cache this, so that next time you import it, you are no longer dealing with the code on top-level per se, rather telling the CPU: "remember that thing we did?"

Just how fast are we talking about? You can see on the PyPy site how, in their carefully crafted example of a similar program written in C and run on PyPy, PyPy beat C by 50%! A whole 50% faster than C for this use-case!

If only wishes were horses though. Despite its low memory usage, speed, and the micro-threading, PyPy has its weaknesses. For one, it's implemented in Python(weird, I know). This means that while CPython enjoys the ocean of modules written in C, PyPy sits on the sidelines. An example, PyGame. So while you might find Django to be lightning-fast in some instances when using PyPy, database management might be the opposite, as most of the database libraries were written in C. A little bit of patience is needed here. We simply cannot have it all!

There are numerous other python implementations, but these top the list. So go forth and choose wisely!

That's it for now. Reach out in the comment section as to which you'd go for. Till next time; TheGreenCodes.

Oh, and you can get us on our twitter handle: TheGreenCodes.

If you don't know about it, you most likely don't need it

Marvin Kweyu — Thu, 26 Mar 2020 15:15:32 GMT

If you don't know about it, then you don't need it. Seriously, you don't.

In the last share about The Future of the terminal, I had a rather interesting conversation with one of the readers. It sparked a thought or two.
The baseline was:

Most of the differences are things people don't use that regularly.

Absolutely right.

With the rapid diversification of technology, languages, and frameworks alike, too often we get lost in the complexity and get sucked into the love of the challenge. We forget our initial reason; to solve a problem.

Let's sit down, shall we? See exactly what this means.

Say, for example, you had deployed a Django application and for some reason, you wanted to make some changes. That's where fabric comes in. You see, it allows you to SSH into your server, upload a file, activate a virtual environment - hell, it even lets you restart your application, all from right within a simple script. All you need to do is import it:

from fabric.api import cd, env, prefix, run, task

You are basically running a script as if you were getting into your server directly.

# access my server and create a new user.
>>> from fabric import Connection
>>> c = Connection('db1')
>>> c.run('sudo useradd mydbuser', pty=True)
[sudo] password:
<Result cmd='sudo useradd mydbuser' exited=0>
>>> c.run('id -u mydbuser')
1001
<Result cmd='id -u mydbuser' exited=0>

Speaking of server management, we got Nagios, and for this, we have Shinken. If at one point you are like, ' Who is going to do all these things again ! ' , then, by all means, use it.

From re-using Nagios config files to deployment, it handles it with a slight layer of abstraction so that it does not seem that much of a task. After all, we have to be DRY.

Now if in reading about the last two packages you are like, 'Wow! I mean!', 'That looks cool !', 'This is incredible ...' and so forth, dare to ponder; "Do you really want to use them? Or are you just amazed?"

If you were to go to their homepages right now, would you import them in a file or just install and keep them in stock in some random virtual environment? These are the questions you need to ask yourself.

Do you really want to use them? Or are you just amazed?

Consider another scenario. When will you read that bookmark you made a couple of days, weeks, or even months ago? Was it another impulse bookmarking? I have to admit, I am guilty as charged of this as well.

The well of frameworks, packages, and libraries is deep and extensive, but if you do not know what you are going to use a particular item for, chances are you do not need it in the first place.

A rather short page this time, trust me, I know. This is just to get that thought going in.

For now, this is it. And remember, it's time to go through your bookmarks.

Till next time,

The Green Codes

The Future of the Terminal

Marvin Kweyu — Thu, 12 Mar 2020 12:20:55 GMT

The Unix shell; the power of text-driven commands. Simply out of this world. You can do anything with it; literally (Okay okay, you can't cook pancakes). From getting a list of last modified files, to viewing files without opening them with our friends cat, more or even less. Oh and by the way, less is more.

One moment you hear bash, the next it's zsh, and before you know it, it's oh-my-zsh, and they just keep coming.

For those in a slight mess of things, or those seeking to find just how deep this rabbit hole goes, this is definitely for you. There are different types of shells in this beautiful text-driven command world and as a heads up, I'll list a couple of them here.

sh
csh
tcsh
ash
korn
bash
zsh

Let us get into each of these to find out what they involve.

Sh

Hush for the godfather of them all; the template, the one and only sh; the Bourne Shell, named after its creator, Stephen Bourne. This is one of the earliest shells to get into the Unix world. Its path? /bin/sh. It's the boilerplate that allows the listing of directories, file management, navigation, and most of the things you are aware of now if using a shell. Is this still used? Yes, absolutely. Despite its age and limited functionality in some ways, sh is still being used, and as such many of the newer shells strive to be compatible with it, seeing as every Unix distro I have seen so far has it pre-installed. This is known to bring some confusion in the form of the 'it's not working' problem with Bash, whose scripts are sometimes run with $ sh some_random_script by accident when it should be $ bash some_random_script.

So, yes. As the godfather, it earns the right to be valid to this date and will still send satellites to Mars.

Csh

The lost sheep: the csh. Its syntax is very similar to the C programming language, hence the c prefix. It comes complete with tab completion. To show you just what this means, try getting into the sh shell and head over to your home directory. Something simple, perhaps getting to your documents (or any other folder name) with tab completion.

thegreencodes@thegreencodes:~$ sh
$ cd Doc  #hit tab key

Nothing happens. The equivalent of being thrown into the deep end.

Despite this bonus point against sh, csh has lots of bugs, left for the end-user to sort out, and hence it is hardly used for scripting.

I doubt you use it even in your day-to-day work, but if you do, take a look at the reasons why not to. The conversation between the novice and the expert cracks me up every time.

Tcsh

Finally, an improvement of the Csh shell. You'll still need a little background with C, but you would have far less struggle than you normally would with csh. The support for this, however, is limited in comparison to the Bourne Again Shell, which is, as you may have noticed, the default shell for most Linux distributions. So your question on StackOverflow might not get the traffic you need.

Ash

Yes, that's right. As in burnt wood ashes. It's just Another Shell. Get it? Ash.

Hold your horses one minute, I'm just messing with you. It's the Almquist shell, sometimes confusingly called sh for short. (Remember this is a fork).

With open source, a fork of the popular Bourne shell was made and called Ash. In Debian-based systems, this has been implemented as dash, so it lives at /bin/dash. As expected, in most distributions based on Ubuntu, /bin/sh is in the end just a link to /bin/dash, so you will not actually be using ash itself.
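If you want to see what /bin/sh points to on your own machine, here is a small sketch using Python's os module. It assumes a Unix-like system where /bin/sh exists, and simply reports otherwise.

```python
import os

SH_PATH = "/bin/sh"  # conventional location on Unix-like systems

if os.path.exists(SH_PATH):
    # realpath follows any symbolic links down to the actual file.
    resolved = os.path.realpath(SH_PATH)
    is_link = os.path.islink(SH_PATH)
    print(f"{SH_PATH} -> {resolved} (symlink: {is_link})")
else:
    resolved = None
    print(f"{SH_PATH} not found on this system")
```

What gets printed depends on your distribution, so the result may well differ from one machine to the next.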

Unlike its counterpart parent, this is simple and light. To invoke it, go to your terminal and simply type sh.

thegreencodes@thegreencodes:~$ sh
$

Just like that! You'll see the little dollar sign replacing your usual prompt. Because of its rather small size, it's used in systems that do not use a lot of resources. For instance, if you have a rooted smartphone and simply love the terminal too much to let go, so much so you had Busybox installed, then there you have it; Ash. Simple, precise and light.

This, however, comes with the disadvantage of not having all the tips and tricks up its sleeve. For instance, there is no completion of commands, nor a progress bar to show how far along a process is before it completes (say you are installing a program). Consider it a portable shell.

Korn

Corn? I mean, what? No... it's Korn.

See, David Korn (creator of the Korn shell) went into the lab at one point in 1986 and came out with the best of csh. What makes it stand out is the use of coprocesses, regular expressions, speed, command-line editing (using vi mode or emacs style right from the prompt) and debugging.

You might need to install it on your system. More information can be found from their official site.

Bash

The Bourne Again Shell

You got it right, the Bourne Shell, sh came back bigger and better. Just head up and take a look at all the advantages of the previous shells, then bring them here. Of course, don't come along with the pains to the party.

This is the most common of shells, with many a script beginning with #!/bin/bash. From the extensive support, as you can tell with this graph in comparison to tcsh, to the flexibility, this will be your best choice of those mentioned so far. The source code for this can be found here, just in case you need to manually enter the details for comparison. Rob made a beautiful repo.

Zsh

Then I stumbled upon zsh, and things got real. I'll say it up front: this is, in my opinion, the best of them. True to its name, it holds much the same functionality as the bash shell, but with a lot more.

Now we both know that you can create bash scripts, add them to your profile and call a single command to do who knows what. For example:

#say a script that makes child processes do several things in the
#background
thegreencodes@thegreencodes:~$ make_babies

This comes with one disadvantage; load time. You see, because your script is external, it needs to be loaded first. So you'll append something like this to your .bashrc

source ./path/to_some/random/goodie/script

If you need the slight kick, that slight jolt when executing, you would rather have a shell that already knows what you are saying. As an example, zsh supports floating points which, you know what, bash does not. A simple calculation with the bash calculator:

3.5/2
1
# zsh calculator
3.5/2
1.75

Now combine zsh with oh my zsh and you get cranberry pie. You know how you get the terminal with a prompt showing you whether you are in a git repo, without having to write the all too well-known git status? Something like:

 ~ RandomRepository git:(master)

Yeah... that little guy that makes it much more relaxing. Coupled with its 101 plugins, you are bound to get something worthwhile suited just to your needs. As an example, take the autojump plugin, which lets you get to a directory without typing out its full or relative path. Just key in its directory name and voilà! I'll leave this bookmark here for you to try it out yourself.

By the way, for bash users, the same can be implemented via fancy-git. A neat prompt changer especially if you are a regular git user. Diogo has your back.

Before we close this for loop of shells, I should mention we also have fish, and (I know, I know, I should not make a fish joke) do not get me started on the autocompletion. This is something I have not seen in the other shells, out of the box. I would place zsh and fish at loggerheads, but seeing as I am yet to dive into fish head to toe, I'll be back once my finger muscle memory works better.

In the end, with all these shells in play, the choice comes down to your preference and set up. What makes you feel productive? What are you comfortable using on the fly? This is the million-dollar question even I cannot answer. Well, fam, it's been fun and as I promised, I'm going fishing. Till next time: TheGreenCodes

PS: I set up a Twitter account for all the regular post updates and out-of-the-oven content. You should check the TheGreenCodes account out. As always, here to learn, share and grow.

Python Command-Line Arguments: Part 3: getopt

Marvin Kweyu — Sun, 16 Feb 2020 15:53:09 GMT

As we get into more detail with Python command-line arguments, we have to remember the generic household name in the command-line family: getopt.

See, before we had the fancy argparse, we had this Swiss army knife of a package. If you are a 'regular-ish' C programmer, I bet this rings a bell. As a notable difference between the two, there is definitely more code involved with getopt. There are a few more differences that will come to light as we explore the same program from before. Here goes. As a little challenge, we are going to leave the project section out just to see how far we can get without the extra nudge. Remember the square program?

import getopt
import sys
from math import sqrt

"""
get the square but get the square root in case the argument 'root' is provided
"""


def usage():
    """
    Show help for the CLI program
    """
    print("python advanced_square.py --number  \n OR\n")
    print("python advanced_square.py -n \n")
    print("To get the square root: python advanced_square.py -n  -r")
    print("Example: get the square\n\tpython advanced_square.py -n 5")


def main():
    try:
        option, arguments = getopt.getopt(sys.argv[1:],"hn:r",["help","number=","root"])
    except getopt.GetoptError as error:
        print(error)
        sys.exit()
    # initialize variables for for loop
    number = None
    root_number = False
    for opt, variable in option:
        if opt in ("-h", "--help"):
            usage()
        elif opt in ("-n", "--number"):
            number = int(variable)
        elif opt in ('-r','--root'):
            root_number = True
        else:
            usage()
            sys.exit()
    if root_number:
        print(f"The square root of {number} = {sqrt(number)}")
    else:
        print(f"The square of {number} = {number* number} ")


if __name__ == '__main__':
    main()

What this program does is simple. It gets the square of a number passed on the command line, or its square root if --root or -r is passed as well.

To run it.

# get the square of 5
python advanced_square.py -n 5
# get the square root of 5
python advanced_square.py -n 5 --root

Let's break this down. Unlike grandson argparse, which just knows how to display helpful messages, with getopt we do not get this right out of the box. So running python advanced_square.py with no arguments will simply give the user an error screen like so:

Traceback (most recent call last):
  File "advanced_square.py", line 48, in <module>
    main()
  File "advanced_square.py", line 43, in main
    print(f"The square of {number} = {number* number} ")
TypeError: unsupported operand type(s) for *: 'NoneType' and 'NoneType'

We have to specify how we want this run. That's where our custom function usage comes in, telling the user, 'Hey, know what? You messed up.'

Take a look at this line here:

option, arguments = getopt.getopt(sys.argv[1:],"hn:r",["help","number=","root"])

What getopt.getopt() does is accept three arguments,

#Accept a list of command-line arguments excluding the program
#name itself.
sys.argv[1:]

Accept short options. These are the reason why, in your Unix-like terminal, you can say ls -a or ls --all to list the contents of a directory including hidden files. -a would be the short option in this instance.

"hn:r"

Note how we show short options. Our program, advanced_square.py accepts three short options, namely, h for help, n for the number whose square or square root we want, and r to specify whether it is indeed the number's root we need.

See this n:? That little colon after the letter n specifies that after we write -n, we need the actual number passed. Hence -n 5 or -n 29 and so on. The long option alternative, as you might have noticed, would be number=. We add an equals sign to it to show that this option needs an argument to follow it, unlike its counterparts root and help.
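To see what the colon does in practice, here is a small sketch that hands getopt a plain list instead of sys.argv[1:]. The helper name parse is made up for this illustration; the option strings are the ones from advanced_square.py.

```python
import getopt

SHORT_OPTIONS = "hn:r"
LONG_OPTIONS = ["help", "number=", "root"]


def parse(argv):
    """Hypothetical helper: parse a list the way the article parses sys.argv[1:]."""
    return getopt.getopt(argv, SHORT_OPTIONS, LONG_OPTIONS)


# -n is declared as "n:", so the value that follows is attached to it
options, extras = parse(["-n", "5"])
print(options)

# -r has no colon, so it takes no value and comes back with an empty string
options, extras = parse(["-r"])
print(options)

# leaving out the value for -n is reported as a GetoptError
try:
    parse(["-n"])
except getopt.GetoptError as error:
    print(error)
```

Passing a list like this is also a handy way to poke at getopt from the interpreter without running the whole script.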

We follow it up with r to specify that the user can just write -n 5 -r which would mean get me the root of 5. The long alternatives to this program would be :

python advanced_square.py --number 5 --root

The order does not really matter. Whether --number comes after --root or not is up to you. getopt will know what to do. Isn't that cool?

Just note, however, that if you specify n to be capable of receiving a number, then that is what must follow it. So you cannot do this

# wrong move
python advanced_square.py --number  --root 5
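Here is a quick check of that wrong move, feeding the same arguments to getopt as a list. As far as I can tell, getopt itself does not complain here; it takes --root as the value for --number, which is why the program then falls over when it tries to convert that value to an integer.

```python
import getopt

# same specification as in advanced_square.py
options, extras = getopt.getopt(
    ["--number", "--root", "5"], "hn:r", ["help", "number=", "root"]
)

print(options)  # --root has been swallowed as the value of --number
print(extras)   # the 5 is left over as an extra argument

# converting that value the way the program does fails
try:
    int(options[0][1])
except ValueError as error:
    print(f"conversion failed: {error}")
```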

As a whole;

getopt.getopt(sys.argv[1:],"hn:r",["help","number=","root"])

From the call above we get back two items. The first, which we name option, looks like this [('-n', '5')], while the second, named arguments, is simply this [], an empty list! That's odd, one might say.

The arguments variable holds whatever extra arguments have been passed to the program. So if a user does:

python advanced_square.py -n 7  9

option will look like [('-n', '7')] while arguments will look like ['9']. It holds all those weird extras you might pass in. So if we randomly decide to use:

python advanced_square.py -n 7  -h

option would look like this [('-n', '7'), ('-h', '')], a list of tuples. See some light at the end of the tunnel?
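The same calls can be tried out with plain lists standing in for the command line. This is only a sketch; SPEC is a name I made up to avoid repeating the option strings.

```python
import getopt

SPEC = ("hn:r", ["help", "number=", "root"])

# one option with its value, plus a stray extra
print(getopt.getopt(["-n", "7", "9"], *SPEC))

# two options, nothing extra
print(getopt.getopt(["-n", "7", "-h"], *SPEC))

# swapping the options around only changes the order of the tuples
print(getopt.getopt(["--root", "--number", "5"], *SPEC))

# scanning stops at the first non-option, so the -h here ends up among the extras
print(getopt.getopt(["-n", "7", "9", "-h"], *SPEC))
```

The last call is worth remembering: anything after the first stray value is treated as an extra, even if it looks like an option.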

Let's move to the next line.

except getopt.GetoptError as error:

Notice how we get the error. In this way, anything that is not in the required command-line argument specification is caught as an error. For instance, say you run python advanced_square.py --animal 5. In such a case, we want to display the error message and gracefully exit the program. In short, the option animal is nowhere defined in our program!

Because our main focus is on the arguments the program actually needs, we abandon the arguments (extras) variable and loop through the list of options. We are saying: if a pair like ('-n', '7') is present, take its value and work with it; else, if ('-h', '') is present, call the usage function - help me!

We convert the value of the passed item, our '5', to an integer, as getopt treats everything coming in as a string. So we change the input to an int and assign it to a variable called number: number = int(variable)

Getting the number and the value of root as either True or False, we can safely get our square and root for use through the program.
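One way to avoid the TypeError from earlier is to check that a number actually arrived before doing any math. The sketch below is a variation on the program, not the code from the repository: parse_square_args and compute are names I made up, and -h is simply ignored to keep it short.

```python
import getopt
from math import sqrt


def parse_square_args(argv):
    """Hypothetical variant of the parsing in main(): returns (number, root_number).

    Raises ValueError with a readable message instead of letting a
    TypeError surface later when no number was given.
    """
    try:
        options, _extras = getopt.getopt(argv, "hn:r", ["help", "number=", "root"])
    except getopt.GetoptError as error:
        # unknown option or missing value, e.g. --animal 5
        raise ValueError(str(error)) from error

    number = None
    root_number = False
    for opt, value in options:
        if opt in ("-n", "--number"):
            number = int(value)
        elif opt in ("-r", "--root"):
            root_number = True

    if number is None:
        raise ValueError("a number is required, e.g. -n 5")
    return number, root_number


def compute(number, root_number):
    """Square the number, or take its square root when root_number is True."""
    return sqrt(number) if root_number else number * number


print(compute(*parse_square_args(["-n", "5"])))
print(compute(*parse_square_args(["-n", "9", "--root"])))
for bad in ([], ["--animal", "5"]):
    try:
        parse_square_args(bad)
    except ValueError as error:
        print(f"error: {error}")
```

In a real script, the place that catches the ValueError would be a natural spot to call usage() and exit.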

This walk through's code, as usual, is at TheGreenCodes. Till next time though, adios!

Command-line arguments: Part 2: argparse

Marvin Kweyu — Mon, 11 Nov 2019 18:41:57 GMT

And... we're back, once more with our CLI programming. We're taking a deeper dive into the waters of command-line arguments, this time looking at a friend to argv: argparse. That's right, it's a whole family.

Last time we talked, we had a whole array of possibilities with using argv, but what if I told you there is much more that could be done while using its big brother module? Let's see what this just means.

Before, we used argv to get the square of a number, passing in the number we wanted the square of and getting the result. What if we had a new user who did not know how to work with the program and you weren't there to explain your new-found love for CLI programs? What if you wanted your program to do more than just print the result right to your face?

Okay, I had to laugh a little at that last bit. Let's give it a shot on how we would get this to work seamlessly.

# advanced_square.py
# get the square of numbers using command_line_arguments
import argparse

parser = argparse.ArgumentParser(description="get the square of the input value")
parser.add_argument("square", type=int)
arguments = parser.parse_args()
print(arguments.square**2)

If you haven't noticed yet, we have used the very same example we used in the last section only this time, we called in some help. Time to open the cookie jar.

parser = argparse.ArgumentParser(description="get the square of the input value")

Above, we are simply giving the program description (what does this program do?). Unlike the previous section, where we just pass in arguments and use a list to get the item we want, here we specify each argument using the line below:

parser.add_argument("square", type=int)

Above, apart from giving the program description, we have added an argument of type int, a number. We state beforehand that the program should expect an integer, and argparse does the conversion for us. We also tell the program to name this argument square so that we can retrieve it under that same name.

Take a look at this line:

arguments = parser.parse_args()

All the arguments we added are contained in our variable called parser. Calling parse_args is how we get the actual values for the whole lot we might have added, and it hands them back in a Namespace object:

Namespace(square=8)
# note that the number after the equal sign depends on what you have
#passed to the program as an argument

Adding more arguments would increase its size, each value mapped to its respective variable name. For simplicity's sake, we place all this into a variable called arguments, which holds whatever arguments we may have passed to the program. We then pick the one we want by name using arguments.square, do the math and print the result to the terminal. Hence,
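The Namespace can be inspected without touching the command line, because parse_args also accepts an explicit list. A minimal sketch, with the list ["8"] standing in for what a user would type:

```python
import argparse

parser = argparse.ArgumentParser(description="get the square of the input value")
parser.add_argument("square", type=int)

# parse_args accepts an explicit list, which stands in for the command line here
arguments = parser.parse_args(["8"])

print(arguments)             # the Namespace object
print(arguments.square)      # already an int thanks to type=int
print(vars(arguments))       # the same data as a plain dictionary
print(arguments.square ** 2)
```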

python advanced_square.py 8
64

Note how using argparse gives us a little helping push. For instance, with the current file, if we use the common -h or --help for assistance when we have amnesia, we get something like this:

python advanced_square.py --help
usage: advanced_square.py [-h] square

get the square of the input value

positional arguments:
  square      the input value whose square is needed

optional arguments:
  -h, --help  show this help message and exit

Our program description is nicely printed for anyone who would want to know what we do within the program, while our arguments are printed with their details. We pass in the number we want to square because we simply cannot square an empty space! Wait, can we?

Now, as much as we would love to, we cannot keep squaring numbers all day! Just to show you how powerful argparse can be, we are going to modify one of the programs we did in the second piece of File management with python. Do go through that, if you haven't already, and get back once done.
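The text next to square in that help output comes from a help= argument on add_argument, which the short listing earlier does not show. Here is a sketch of how it could be supplied; the wording is copied from the help output, and prog= is set only so the usage line reads the same wherever this snippet is run.

```python
import argparse

parser = argparse.ArgumentParser(
    prog="advanced_square.py",
    description="get the square of the input value",
)
parser.add_argument("square", type=int, help="the input value whose square is needed")

# format_help() returns the text that --help would print
help_text = parser.format_help()
print(help_text)
```

Depending on your Python version, the heading for -h may read "options:" rather than "optional arguments:".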

We modify the program as below:

import os
import shutil
import glob
import argparse


def organize(folder):
    """Organize files according to extension """
    # get into the directory passed
    os.chdir(folder)
    all_files = [x for x in os.listdir('.')]
    file_types = set((os.path.splitext(f)[1] for f in all_files))
    for ftype in file_types:
        new_directory = ftype.replace(".", '')
        os.mkdir(new_directory)
        for fname in glob.glob(f'*.{ftype[1:]}'):
            shutil.move(fname, new_directory)


def main():
    parser = argparse.ArgumentParser(description="sort files according to file extension")
    parser.add_argument("path", help="Path of directory with unsorted files")
    parser.add_argument("-v","--verbose", help="Descriptive message of directory after sort", action="store_true")
    arguments = parser.parse_args()
    organize(arguments.path)
    if arguments.verbose:
        print("\n The following file extensions were found and organized successfully...")
        print(os.listdir('.'))
    else:
        print("Done!")


main()

Now that is much better! Things are finally starting to take shape. You have the option to either:

Pass the relative/absolute path of the messy directory to clean
Pass the path to the directory with an optional argument -v or --verbose to list the folders in the cleaned up directory.
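The two ways of calling the program can be checked without sorting any real files by building the same parser and giving parse_args a list. This is just a sketch: build_parser is a made-up helper name and ./RandomFiles is a placeholder path.

```python
import argparse


def build_parser():
    """Same arguments as the file sorter above, without touching any files."""
    parser = argparse.ArgumentParser(description="sort files according to file extension")
    parser.add_argument("path", help="Path of directory with unsorted files")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Descriptive message of directory after sort")
    return parser


parser = build_parser()

# only the path: verbose defaults to False because of action="store_true"
quiet = parser.parse_args(["./RandomFiles"])
print(quiet.path, quiet.verbose)

# path plus the flag; the flag may come before or after the path
loud = parser.parse_args(["-v", "./RandomFiles"])
print(loud.path, loud.verbose)
```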

We intend to advance on our knowledge on command-line arguments in next week's article, but for now, take a break, relax and let it sink in.

You can go through the code section at TheGreenCodes and/or share your thoughts and ideas on just what you think is possible with what we have done so far!

Cheers!

Python Command Line Arguments: Part 1: argv

Marvin Kweyu — Fri, 01 Nov 2019 16:40:04 GMT

Command-line arguments. What are they, really? Why would you ever use them?

Command-line arguments are a way of reducing hard-coded variables in a program. If your program takes in user input every now and then and does something with it, you can avoid the need for a prompt such as:

Enter your first name:
Enter your second name:

Instead, you can simply run the program like this;

python user_name.py John Doe

Here, the program identifies the first name as John and the second as Doe without you having to explicitly write two input statements in your code, assign them to variables and then print them back out. You've made the work much simpler. Easy peasy, right? In python, command-line arguments are handled in the following ways.

argv
argparse
getopt

To kick off, let's get a start with using argv. For this illustration, we will be writing a simple program that takes a number and returns the square. Now is that not something cool?

We'll start off by writing the program as one that you would normally have:

# square_numbers.py
#program that allows the user to enter a number and print the square.
def main(number):
    """Return the square of the number """
    return number* number


number = int(input("Enter a number to square:"))
print(main(number))

Now to get the result, you would normally have something like this on the command-line

python square_numbers.py

You would then get a prompt:

Enter a number to square:

Where you would be allowed to input a number and get back the square of that number. Compare this with a python program that uses command-line arguments.

# square.py
# get the square of numbers using command_line_arguments
import sys


def main(number):
    """Return the square of the number """
    return number* number


#convert the argument to an integer since it is read as a string.
number = int(sys.argv[1])
print(main(number))

To get the result from this:

python square.py 5

Of course the result would be 25!
Take note, however, of how we get the value 5 from the command line. In the program, we import sys, a module whose name is short for system-specific parameters and functions. We then use it in conjunction with argv. Why? Simply because, as a module, sys has a lot of tools under its belt. We are simply specifying that from this well of tools, we want access to the command-line arguments, which are named argv (argument vector), quite literally.

Here's the catch: sys.argv holds the items as a list. What does this mean? It means everything after the word python is an item in the list. So our list looks like this:

[ 'square.py' , '5' ]

That would shed some light into why we have sys.argv[1] to get the second item on the list. Remember, unless you are an R programmer, who gets right on with 1, counting starts from 0 (zero). I know, right? R programmers are so weird. Harrison if you're reading this don't delete my twitter account.
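To make the indexing concrete, here is a small sketch that works on a list shaped like sys.argv. The helper square_from_argv is a made-up name, and example_argv stands in for what python square.py 5 would produce.

```python
import sys


def square_from_argv(argv):
    """Hypothetical helper: argv is a list shaped like sys.argv."""
    # argv[0] is the script name, argv[1] is the first real argument
    if len(argv) < 2:
        raise SystemExit(f"usage: python {argv[0]} NUMBER")
    number = int(argv[1])  # everything in argv arrives as a string
    return number * number


# what sys.argv would hold for: python square.py 5
example_argv = ["square.py", "5"]
print(example_argv[0], example_argv[1])
print(square_from_argv(example_argv))

# the real list for the current process
print(sys.argv)
```

The length check is an addition of mine; without it, running the script with no number ends in an IndexError.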

Photo by Charles on Unsplash (It could be Harrison but no one will ever know)

Anyway, where were we? Oh, right, getting values from the terminal.

So if we were to pass in more values, it would increase the length of our list. Let's take an example, say, a program that takes two numbers and adds them.

# add.py
# get the sum of two numbers
import sys


def main(first_number, second_number):
    """Return the SUM of the number """
    return first_number + second_number


first_number = int(sys.argv[1])
second_number = int(sys.argv[2])
print(main(first_number, second_number))

This time, if we write python add.py 5 6, we would get a result of 11, the first_number being 5 while the second, 6. We did it! The simplest program in the world other than, well, Hello world but with a twist.

This brief article should give you a great foundation for building a ton of great programs. Just pass in a variable, absolutely anything; a URL, a path to a file, you name it. Provided your program gets to read and use it.
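Since argv is just a list, the adder does not have to stop at two numbers. A sketch of that idea, with add_from_argv as a made-up helper and the lists standing in for sys.argv:

```python
def add_from_argv(argv):
    """Hypothetical helper: sum every argument after the script name."""
    return sum(int(value) for value in argv[1:])


# python add.py 5 6
print(add_from_argv(["add.py", "5", "6"]))

# python add.py 1 2 3 4
print(add_from_argv(["add.py", "1", "2", "3", "4"]))
```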

We'll take a step deeper into the use of command-line arguments in the next article, comparing yet another way of doing it and its differences with what we have just gone through. Need to go through the files? Take a look at the repository from TheGreenCodes.

File management with python: Part 2

Marvin Kweyu — Thu, 26 Sep 2019 09:19:53 GMT

This is the second part of the series File management with python. We pick up from where we left off in Part 1, where we organized files according to their extension. So, let's get started.

Sometimes, organizing files might need just a bit more than knowing their extensions. For instance, take a directory where all the files are of the same type, whether .pdf, .doc, .mp4 and so on. In this piece, we take our organization a little further. Say you have a folder with slides (.ppt). In this case, you've just received a whole lot of lecture files, but they are not exactly easy to go through. Instead of one file holding everything for a lecture session, each session was broken down into several separate slide files. Our folder, in this case, is assumed to look as below.

DataStructures/
|_Datastructuressession1Slide1.ppt
|_Datastructuressession1Slide2.ppt
|_Datastructuressession1Slide3.ppt
|_Datastructuressession2Slide3.ppt
|_Datastructuressession7Slide8.ppt
|_Datastructuressession9Slide2.ppt
... and so on

What's happening? We got the slides alright, but they are a mess. You would have to look through the folder for a specific slide that follows from where you've just read. We should make this easier. Let's have all the slides organized according to the session. Remember how we generated random files in the previous article? We'll do the same thing, only this time, all the files will be of the same type. Have a look at that here for a quick refresher. Our file looks much like the create_random_files.py.

#!/bin/python3
# create_lectures.py
import os
from pathlib import Path

sessions = [str(x) for x in range(1,21)]  # create 20 sessions
sessions = [str(0)+item if int(item) < 10 else item for item in sessions]
# Datastructuressession01Slide1.ppt

# get into the DataStructures directory
os.chdir('./DataStructures')
for item in sessions:
    # create 20 slides for each session
    for num in range(1, 21):
        file_to_create = f"Datastructuressession{item}Slide{num}.ppt"
        Path(file_to_create).touch()

Okay, okay. I'll admit I went a little overboard with the number of files this time. That's quite the number.

Let's draw our attention to this line:

sessions = [str(0)+item if int(item) < 10 else item for item in sessions]

The line just before this makes a list of twenty numbers, but here's the catch: we convert each of these numbers to strings. Why? We are prepending the number 0 (zero) as a string to each number if indeed it is below 10. That makes each number below 10 look like this: 01, 02, 03,... and so on.
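For comparison, here is a sketch of two other ways to get the same zero-padded strings, str.zfill and an f-string format specification. Neither is what create_lectures.py uses; they are shown only as alternatives that give the same list.

```python
# the article's approach: prepend "0" when the number is below 10
sessions = [str(x) for x in range(1, 21)]
padded = [str(0) + item if int(item) < 10 else item for item in sessions]

# the same result with str.zfill, which pads with zeros up to a width of 2
with_zfill = [str(x).zfill(2) for x in range(1, 21)]

# and with a format specification inside an f-string
with_format = [f"{x:02d}" for x in range(1, 21)]

print(padded[:3], padded[-1])
print(padded == with_zfill == with_format)
```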

Above, we created a number of files for each session in a range of 20 sessions.

What we do next is simple, group these files according to the session.

#!/bin/python3
# clean_reading.py
# move files to directories according to the file name pattern
import os
import shutil

# get into the Datastructures directory
os.chdir('./DataStructures')
# Datastructuressession01Slide1.ppt
for f in os.listdir("."):
    folder_name = f[14:23]
    # print(folder_name)
    if not os.path.exists(folder_name):
        os.mkdir(folder_name)
        shutil.move(f, folder_name)
    else:
        shutil.move(f, folder_name)

The only line that might need some explaining would be:

folder_name = f[14:23]

We have counted the characters to find the part of the file name we want our folder to be named after. Take the file name Datastructuressession01Slide1.ppt: counting from zero, the first s of session is character number 14, and the slice stops just before character number 23, so f[14:23] gives us session01. For our generated files, the last folder made this way would be session20.

Running this gets all our slides into their respective sessions, quick and clean.

There is a lot more that one would want to do, say, let the program work out which sessions are included without manually keying in the character positions, but more advanced tools exist for this, especially under the UNIX environment. Feel free to do some scouting and find what works best for you. As a heads up, here's a cool GUI sorter made with python. Paul did an awesome job on this. Questions? Do reach out in the comment section. As usual, all this code can be found at TheGreenCodes.
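On finding the session without counting characters by hand, one option is a regular expression. This is only a sketch and not part of clean_reading.py: session_folder is a made-up helper, and the pattern assumes the file names contain the word session followed by digits.

```python
import re

# assumed pattern: the word "session" followed by one or more digits
SESSION_PATTERN = re.compile(r"session\d+")


def session_folder(file_name):
    """Hypothetical helper: return e.g. 'session01', or None if nothing matches."""
    match = SESSION_PATTERN.search(file_name)
    return match.group(0) if match else None


name = "Datastructuressession01Slide1.ppt"
print(name[14:23])           # the fixed slice used in clean_reading.py
print(session_folder(name))  # the same folder name, found by pattern

# a different prefix length no longer matters
print(session_folder("Algorithmssession12Slide4.ppt"))

# files that do not follow the naming scheme can be skipped
print(session_folder("notes.txt"))
```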

File management with Python

Marvin Kweyu — Mon, 02 Sep 2019 11:59:39 GMT

Most of us have had that one experience where we had a ton of disorganized files on our machines. It happens. One minute, you're opening a large zip file, the next thing you know, the files are everywhere in the directory, mixing with all your important files and leaving you with the task of manually sorting what needs to go where. It's a real pain. To ease this process, we're going to delve into file management with python the smart way.

Work smart, not hard.

Let's begin. We'll be using python 3.6 or greater, since the code relies on f-strings. Assuming you've got python up and running already, we're going to take a walk with the os module and a few others we will introduce along the way. All of these come with python, so there's no need to install anything else to follow along.

Creating random files

Create a directory to work with. Call it ManageFiles. Inside this folder create another folder RandomFiles. Your directory structure should now look like this:

ManageFiles/
 |
 |_RandomFiles/

We're going to create random files to play with in the RandomFiles directory. Create a file create_random_files.py inside the ManageFiles directory. You now have this:

ManageFiles/
 |
 |_ create_random_files.py
 |_RandomFiles/

Done? Now get in the following code, we'll get into its details in a moment.

import os
from pathlib import Path
import random

list_of_extensions = ['.rst','.txt','.md','.docx','.odt','.html','.ppt','.doc']

# get into the RandomFiles directory
os.chdir('./RandomFiles')
for item in list_of_extensions:
    # create 20 random files for each file extension
    for num in range(20):
        # let the file begin with a random number between 1 to 50
        file_name = random.randint(1,50)
        file_to_create = str(file_name) + item
        Path(file_to_create).touch()

As of python 3.4, we've got pathlib, our little magic box. We also import python's random module for creating random numbers. Hold on to that thought; we're going to cover it as we get to the line that uses it.

First off, we create a list of file extensions from where we will get our random files. Feel free to add to it. Next up, we change to the RandomFiles directory, then comes our loop, so here goes. We are simply saying, take each item in this list_of_extensions and do the following to it. Let's take the .txt for instance. We get into another loop where, for this .txt, we do something 20 times.

Remember our import of random? We use it to select a random number between 1 and 50 for our file. In short, what this little loop does is save us, the less creative lot(don't worry, I'm part of this crew), the time of naming random files. We will simply create a file say 23.txt or 14.txt, provided it falls within our range of 50, twenty times. This is just so as to create a mess large enough to give pain when moving manually. The same process will be done with the other extensions. Next? Run this in your terminal.

python create_random_files.py

Congratulations! We now have a mess of a directory. Now to clean it up. In the same location where our create_random_files.py is, create a file clean_up.py and get the below in.

Method 1:

import os
import shutil
import glob

# get into the RandomFiles directory
os.chdir('./RandomFiles')

# get the list of files in the directory RandomFiles
files_to_group = []
for random_file in os.listdir('.'):
    files_to_group.append(random_file)

# get all the file extensions present
file_extensions = []
for our_file in files_to_group:
    file_extensions.append(os.path.splitext(our_file)[1])

print(set(file_extensions))
file_types = set(file_extensions)

for type in file_types:
    new_directory = type.replace(".", "")
    os.mkdir(new_directory)  # create directory with given name
    for fname in glob.glob(f'*.{type[1:]}'):
        shutil.move(fname, new_directory)

For this, we import two new libraries; shutil and glob. The shutil will help us move our files while the glob will help find the files to classify. Just like before, this will all become clear as we get to the line.

First off, we get a list of all the files in the directory.

Here, we assume that we do not have a clue of what files are in the directory. This means that, rather than listing all the extensions present by hand and using if statements or a switch, we want the program to look through the directory and do this for us. What if the folder had dozens of extensions or log files? Would you do this manually?

Once we get a list of all the files in the folder, we get into another loop to get the file extensions of these files. Notice how we use:

os.path.splitext(our_file)[1]

Currently, the our_file variable looks something like this 5.docx (for instance). When we split it, we get this:

('5', '.docx')

We then take index [1] from it, which gives us .docx since 5 is at index [0]. So we now have the list of all file extensions present in the folder, whether repeated or not.

To make it non-repetitive, we make a set. This takes all the items from the list and keeps only the unique ones. In our case, if we had a list where an extension, say .docx, repeated itself over and over, the set would ensure we had only one of it.
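The two steps, splitting off the extension and removing repeats, can be tried on a plain list of names. A small sketch, where the file names are made up and stand in for what os.listdir('.') would return:

```python
import os

# made-up file names standing in for os.listdir('.')
files_to_group = ["5.docx", "34.docx", "12.txt", "7.md", "12.md"]

# splitext returns a (name, extension) pair; index 1 is the extension
print(os.path.splitext("5.docx"))

file_extensions = [os.path.splitext(name)[1] for name in files_to_group]
print(file_extensions)       # repeats included

file_types = set(file_extensions)
print(sorted(file_types))    # sorted only to get a stable order for printing
```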

# create a set and assign it to a variable
file_types = set(file_extensions)

Remember our list of file types still has the . for every file extension. This would mean if we were to create a folder named exactly the same way, we would end up creating hidden folders and that is something we do not want.

So, as we loop over this set, we create a directory with the same extension name, only this time, we replace the . in the name with an empty string.

new_directory = type.replace(".", "")
# our directory would now be called  'docx'

We still need the .docx extension to move the files.

for fname in glob.glob(f'*.{type[1:]}')

This simply means: take any file that ends with the .docx file extension. (Notice the spacing used in f'*.{type[1:]}'. There is no space.)

The wild card * means a file can be named anything, provided it ends in .docx. Since we have already placed the period . in the pattern, we only need what comes after it in our extension, and that's why we use [1:], which just means take everything after the first character, hence docx.
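The pattern itself can be tested without any files on disk. glob relies on the same shell-style wildcard rules that the fnmatch module exposes, so this sketch uses fnmatch.filter on a made-up list of names (and ftype rather than type as the variable name, to avoid shadowing the built-in).

```python
import fnmatch

# made-up names standing in for the contents of RandomFiles
names = ["5.docx", "34.docx", "12.doc", "7.md", "docx"]

ftype = ".docx"
pattern = f"*.{ftype[1:]}"   # ftype[1:] drops the leading period
print(pattern)

# fnmatch.filter applies the wildcard pattern to a plain list of names
print(fnmatch.filter(names, pattern))
```

Note that neither 12.doc nor the bare name docx matches, since the pattern insists on a period followed by exactly docx at the end.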

What next? Move any file with this extension into the directory named as so.

shutil.move(fname, new_directory)

In this way, because we loop over a set of unique extensions, each directory is created only once and no duplicates can be made. In short, we will not have one folder to store 5.docx and many others to store 34.docx and so on. Once we have a directory made, all other files with that extension will move there. That's it!

Method 2

Alternatively, you can use a list comprehension and a generator expression. These are fancy ways of building a list or a set with a one-liner.

import os
import shutil
import glob

# get into the RandomFiles directory
os.chdir('./RandomFiles')

#take every file from the directory and add to a list for all files
all_files = [x for x in os.listdir('.') ]

# make a set for the extensions present in the directory
file_types = set((os.path.splitext(f)[1] for f in all_files))

for ftype in file_types:
    new_directory = ftype.replace(".", '')
    os.mkdir(new_directory)
    for fname in glob.glob(f'*.{ftype[1:]}'):
        shutil.move(fname, new_directory)

Both of these will work. You've now got all your files sorted according to extension.

ManageFiles/
 |
 |_create_random_files.py
 |_RandomFiles/
    |_doc
    |_docx
    |_html
    |_md
    |_odt
    |_ppt
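If you plan to run the sorter more than once, or on a folder that holds files without an extension, a slightly more defensive variant may be worth a look. This is only a sketch, not the code from this post: the function name `sort_by_extension` is my own, and it assumes you pass in the folder path rather than calling os.chdir first.

```python
import os
import shutil


def sort_by_extension(directory):
    """Hypothetical helper: move each file into a subfolder named after its extension."""
    for name in os.listdir(directory):
        source = os.path.join(directory, name)
        if not os.path.isfile(source):
            continue  # skip folders, including ones created on an earlier run

        # '.docx' becomes 'docx'; a file with no extension gives ''
        folder = os.path.splitext(name)[1].replace(".", "")
        if not folder:
            continue  # nothing sensible to name the folder, so leave the file alone

        target = os.path.join(directory, folder)
        os.makedirs(target, exist_ok=True)  # no error if the folder already exists
        shutil.move(source, os.path.join(target, name))
```

Calling `sort_by_extension('./RandomFiles')` should give the same layout as above. The difference is that a second run does nothing instead of failing on a folder that already exists.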

Woosh! That was a lot. We did save some time though. Any questions? Feel free to reach out. That's it for now. Stick around as we take it up a notch next week. For the code on this, check TheGreenCode.

Getting Started as a Junior software developer

Marvin Kweyu — Thu, 27 Jun 2019 14:08:41 GMT

Before we begin, I'd like to congratulate you on this beautiful path you've chosen: software development. It is indeed a long journey that has its rewards but also comes with its pitfalls, as with anything around you. So, what is software development?

"Software development is the process of conceiving, specifying, designing, programming, documenting, testing, and bug fixing involved in creating and maintaining applications, frameworks, or other software components."

Quite a mouthful, isn't it? In simple terms, software development may be described as the process of writing computer programs with the purpose of solving real-world problems. Having chosen to take a step into this so-called 'realm of weirdos' as some would describe it, here are a few pointers that can be of help along the way.

Choose one language as you start and stick with it.

I cannot insist enough on the importance of this one key pointer for beginner developers. I know you've probably looked around and seen the amazing things a certain programming language can do, only to look back and see another cool feature from this other language. Stop. You won't get any better by ogling the vast library of programming languages in existence. Trust me, they are quite a number.

Rather than looking around at other people's projects and doing a "Hello world" in many languages, choose one, just one. You start pushing your limits and the limits of your tools when you focus on one thing: one framework, one language, one project, etc. And you will gain deeper knowledge along the way. Resist the urge to become a jack of all trades and master of none, at least at the start.

Get the basics down.

Rather than building a house on quicksand, get the basics down. This would likely mean going through the documentation of your language of choice. Get the nitty-gritty details at your fingertips. Know the basic syntax, how to create basic functions, how to declare variables, how to work with OOP, and the like. Learn the underlying concepts and algorithms. Remember, languages and frameworks change, but the algorithms will always remain. It is bound to save you a lot of time as you develop your project, which brings me to the next point.

Create a real-world project.

It doesn't have to be the 'next Google' or the next big thing. This is a marathon, not a sprint. Start small. Create something that directly impacts your day-to-day activities. For instance, as a first project, I created a simple house chores timetable with Python. It was an Excel sheet that would change every week it was run to suit our needs at the time. Having this on my local computer, the next step was to mail the participants of the table the week's chores. Simple as that. With this, I learned some OOP concepts, interacted with Excel, and finally got to send an email with Python!

In essence, learning never stops. There is never a point where a software developer can say "I've just finished learning C!" and with pride stash it somewhere in a closet as a finished product. It never happens. To show just how true this is, Dennis Ritchie, the creator of C, when asked to rate himself as a C programmer on a scale of ten, reportedly gave himself a seven. I think you can get what this means.

So get out there and create a to-do list in JavaScript (if this is your language of choice) for your club, or make a simple website to display the events your club has in store for the week, month, and so on. Make it personal, something you would want to have in the actual setting you are in.

Get in touch with the development team around you.

You should have a few tricks up your sleeve by this point. So what next? Collaboration. From GitHub to Bitbucket, find a team you can work with and do some peer-to-peer coding. Build things together, give each other challenges, review each other's code. Get to know what more you can do to improve while helping a friend write better code.

Learn.

To close out the post: learn. Learn, learn, and learn. As a constant reminder:

The only constant in software development is change.

There is always that deprecated function, that new library to make your work all the easier, that new way of declaring functions, variables, you name it. It's as they say, "Staying offline voluntarily as a web developer for a month or so would be a mistake." Why? Because the needs in web applications and user preferences are always changing. A feature that is popular today is likely to be shunned tomorrow. So learn. Always be on the lookout and don't be afraid of change.

PS: For those still using Python 2.x, security updates stop in 2020.