Memory management - Deep and Shallow Copying

Let's go back one moment. A little further down to our data structures. The dear heaps and stacks of them.


What happens when I assign a variable?

What about when I pass it as a parameter?

How does the program know this is what I'm talking about?


Before we go to a higher level of abstraction, we should look at how the machine 'thinks' about it.

Assume you have declared a variable:

my_variable = "random";

Every time you do this, you add to the list of things the program has to remember - one on top of the other - a stack.

basic_stack.png

As it stands, your program will store your string, "random", as below:

memory_management.png

The first part of the diagram, with the pointer (the location of the data in memory), len (the length of the string) and capacity (the amount of memory, in bytes, the machine gives this variable to use), lives on the stack. The other part, holding the actual characters, lives on the heap.

Whatever we do with this variable, be it a mutation, a concatenation, taking a substring or anything else, goes through the entry on the stack. There, the pointer holds the location of where the actual data is stored. So we are just given a reference to it.
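To make the picture concrete, here is a toy model in Python of a stack entry and the heap data it points to. It is only a sketch: the names `StackEntry`, `heap` and `read`, and the address `0x1000`, are made up for illustration, and a real runtime does not expose memory like this.

```python
from dataclasses import dataclass

# Toy heap: a made-up address mapped to the bytes stored there.
heap = {0x1000: bytearray(b"random")}


@dataclass
class StackEntry:
    ptr: int       # where the data lives in the toy heap
    length: int    # how many bytes are in use ("len" in the diagram)
    capacity: int  # how many bytes are reserved for this variable


my_variable = StackEntry(ptr=0x1000, length=6, capacity=6)


def read(entry: StackEntry) -> str:
    """Follow the pointer and decode the bytes that are in use."""
    return heap[entry.ptr][:entry.length].decode()


print(read(my_variable))  # random
```

The stack entry itself stays small and fixed in size, no matter how long the string on the heap gets.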

Why do we do this? Why do we store it in different data structures?

It's about the speed.

Working with the stack is faster than working with the heap. So instead of moving around a whole chunk of data (the part on the heap) while mutating it, the program just carries the reference to it. I mean, the program is already doing its tasks (whether heavy or not), so there is no need to add overhead here.

However, not all data is stored this way. Variables of fixed-size types, that is, booleans, integers, floats and chars, are added directly onto the stack. So we would have no heap to store a simple 456.98, because the program already knows the sizes of these types ahead of time. The exception is data whose size is not known up front, such as user input.
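As a rough illustration of sizes that are known ahead of time, Python's `struct` module can report the standard sizes of C-style fixed-width types. Treat this as a sketch of the idea only: Python keeps its own `int` and `float` values as objects, so these numbers describe the C-style layouts rather than Python variables.

```python
import struct

# "<" selects the standard sizes, independent of the platform.
sizes = {
    "bool": struct.calcsize("<?"),
    "32-bit integer": struct.calcsize("<i"),
    "64-bit integer": struct.calcsize("<q"),
    "32-bit float": struct.calcsize("<f"),
    "64-bit float": struct.calcsize("<d"),
}

for name, size in sizes.items():
    print(f"{name}: {size} byte(s)")
```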

The size of these types depends on how many bits they are given. For integers, the range of values they can hold also depends on whether they can be negative (signed) or are exclusively non-negative (unsigned). This should remind you of how you declare your variables in math. You would say that any number in your paper is positive unless stated otherwise, or as we call it here, unless signed.
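A small sketch of how width and signedness decide the range of an integer type. The helper `int_range` is a hypothetical name, and the signed case assumes the common two's complement representation.

```python
def int_range(bits, signed):
    """Return the (min, max) values for an integer of the given width."""
    if signed:
        # Two's complement: half of the values are negative.
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


print(int_range(8, signed=True))   # (-128, 127)
print(int_range(8, signed=False))  # (0, 255)
```

Both 8-bit types take up the same single byte; only the range they cover differs.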

So the stack-and-heap arrangement described earlier applies to compound data types - the result of combining two or more of these simple types.

Example:

  • string (a combination of chars)

  • arrays

  • tuples ... and so forth, depending on what your language of choice calls them, for instance, dictionary vs. JavaScript object.

Back to copying.

You want two variables to hold the same data, and you want to edit one of them without affecting the other.

You might assume that all you had to do was a simple assignment.

my_variable = [0,1,2,3,4,5,6,7,8,9]

my_other_variable = my_variable

An assignment like the above will lead to two variables showing the same result, a list from 0 to 9. The caveat? They will both reference the same data on the heap.
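A quick way to see this in Python is to ask whether the two names refer to the same object. This is a minimal check using the built-in `is` operator and `id()` function:

```python
my_variable = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
my_other_variable = my_variable

# `is` checks whether two names refer to the very same object.
print(my_other_variable is my_variable)          # True

# id() returns an identifier for the object; it matches for both names.
print(id(my_other_variable) == id(my_variable))  # True
```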

So what happens if I mutate one variable?

my_other_variable.append(45)

print(f"My second variable: {my_other_variable}")
print(f"My variable: {my_variable}")

In both cases, the output is a list:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 45]

Strange. Huh?

What if we wanted to mutate each of these variables independently? For example, have `my_variable` change to [0,1,2,3,4,5,6,7,8,9,17] and `my_other_variable` to [0,1,2,3,4,5,6,7,8,9,45,129]?

To get two completely separate items with the same data, so that each can be mutated independently, you have to take a different approach: a deep copy.

A warning

As far as memory is concerned, deep copying is expensive: it has to get the pointer, follow it to where the data is stored, and then duplicate that data on the heap.

How you do it depends on the language you are using: Python has the built-in copy module, and JavaScript and lower-level languages have their own tools for it, and so on and so forth. (We cannot simply list all the ways to deep copy across the multiverse.)

import copy

my_variable = [x for x in range(10)]

my_other_variable = copy.deepcopy(my_variable)
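With a flat list of numbers, a shallow copy and a deep copy behave the same. The difference shows up once the list contains other lists. The sketch below uses the `copy` module's `copy.copy` and `copy.deepcopy`; the variable names are made up for illustration.

```python
import copy

nested = [[1, 2], [3, 4]]

shallow = copy.copy(nested)   # new outer list, same inner lists
deep = copy.deepcopy(nested)  # new outer list and new inner lists

# Mutate an inner list through the original.
nested[0].append(99)

print(shallow[0])  # [1, 2, 99]
print(deep[0])     # [1, 2]
```

The shallow copy still shares the inner lists with the original, so the change leaks through. The deep copy followed every reference and duplicated what it found.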

Love JavaScript much?


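// The original array
let my_variable = [0,1,2,3,4,5,6,7,8,9]

// structuredClone makes a deep copy (modern browsers, Node.js 17+)
let my_second_variable = structuredClone(my_variable)

// push mutates in place and returns the new length, so do not re-assign
my_second_variable.push(100)

console.log('My first variable',my_variable)
console.log('My second variable',my_second_variable)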

There are, of course, multiple other ways of doing this. It is, after all, JavaScript. A point to note: especially with objects, [lodash](https://lodash.com/), dearest ramda or rfdc work perfectly. Custom method for your implementation? Go ahead, just not JSON.stringify().
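The text above does not spell out the reason for that last warning. One likely reason, stated here as an assumption, is that a round trip through JSON does not preserve every type. The same pitfall can be sketched in Python with the `json` module:

```python
import json

original = {"point": (1, 2), 1: "one"}

# Serialize to JSON text and parse it back.
round_tripped = json.loads(json.dumps(original))

print(round_tripped)  # {'point': [1, 2], '1': 'one'}
```

The tuple came back as a list and the integer key came back as a string, so the result is not a faithful copy of the original.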

The mad rustacean?

let my_variable = String::from("random");

let my_other_variable = my_variable.clone();

Having done this, you can manipulate your new variables in any way you want. Go to the moon if need be. Just need a couple of dollars more.

memory_management_deep_copy.png

It is this same principle that governs the passing of variables across functions and objects: what gets passed is a pointer to the original data and not the whole heap. Comprende? I sure hope so. So go forth and choose wisely.
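Here is a minimal Python sketch of that principle. The function `add_item` is a hypothetical helper: when it receives the list itself, the caller sees the change, and when it receives a deep copy, the caller's list is left alone.

```python
import copy


def add_item(items):
    """Append to whatever list the caller handed over."""
    items.append(45)


my_variable = [0, 1, 2]

add_item(my_variable)                 # the function receives a reference
print(my_variable)                    # [0, 1, 2, 45]

add_item(copy.deepcopy(my_variable))  # the function receives a separate copy
print(my_variable)                    # [0, 1, 2, 45]
```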

Let's leave this piece at that, and chat in the comments if need be.

And yes, we can chat tech on Twitter too. marvinus_j