PROLOGUE
Welcome to this series, in which we will move together from simple data structures such as lists to more complex ones like B-trees. I will be using the C language for the code, but feel free to use any language of your choice. Let's get right into it.
Definition
In general, a data structure is a way of organising data in computer memory so that the algorithms we develop can process it efficiently. This is why data structures are usually taught alongside algorithms. (Source: Wikipedia)
LIST
A list is a structure in which each piece of data is aware of the next one, hence forming a chain. To allow for such linking, the actual data is stored in a container that holds information on how to get to its neighbouring containers. Such a container is known as a node. With this in mind, we can redefine a list as a collection of linked nodes. We are going to cover 2 types of lists in this series:
- Single Linked List
- Double Linked List
THIS ARTICLE IS ABOUT SINGLE LINKED LIST
Structure of a Node
A single node has 2 properties: the data it stores and a reference to the next node in the list.
A struct in C plays a role similar to a class in object-oriented languages, except that it only groups data and has no methods of its own.
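As a rough sketch, a node for a list of integers could be declared as follows. The names `Node`, `data` and `next`, as well as the choice of `int` as the element type, are assumptions made for illustration.

```c
#include <stddef.h>  /* NULL */

/* One link in the chain: a value plus a pointer to the node after it.
   The names and the int element type are illustrative assumptions. */
typedef struct Node {
    int data;           /* the value stored in this node */
    struct Node *next;  /* the following node, or NULL if there is none */
} Node;
```

Two nodes are chained simply by storing the address of one in the `next` field of the other.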
Structure of a List
From our definition, a list is a collection of linked nodes. Our list object should therefore give us access to this collection of nodes and to the number of nodes in it. The size property holds the number of nodes, while the head and tail properties point to the beginning and end of the collection respectively.
A pointer in C, declared with an asterisk as in *head, is just a number that translates to a location in memory where our data is stored.
Due to techniques employed by operating systems and hardware, such as virtual memory, this number is a virtual address and does not translate directly to a physical memory location.
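A possible declaration of the list object described above is sketched here. The property names head, tail and size come from the description; the type names `Node` and `List` and the use of `int` for the size are assumptions.

```c
#include <stddef.h>  /* NULL */

/* Node type repeated so this snippet compiles on its own. */
typedef struct Node {
    int data;
    struct Node *next;
} Node;

/* The list only tracks where the chain starts and ends
   and how many nodes it contains. */
typedef struct List {
    Node *head;  /* first node, or NULL when the list is empty */
    Node *tail;  /* last node, or NULL when the list is empty */
    int size;    /* number of nodes currently in the list */
} List;
```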
Creating a new Node
We need a way to initialise a new node for every piece of data inserted into the list. To initialise a node we need the data we will store in it and a reference to the next node. In C, we first have to allocate memory on the heap, where our node will live.
TODO: write an article on memory handling in C.
This is the equivalent of a class constructor if your language is object-oriented.
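A minimal sketch of such a constructor is shown below. The name `createNode` and its signature are assumptions; the only library call used is `malloc` from `<stdlib.h>`.

```c
#include <stdlib.h>  /* malloc */

typedef struct Node {
    int data;
    struct Node *next;
} Node;

/* Allocates a node on the heap, stores `data` in it and links it to `next`.
   Returns NULL if the allocation fails. (Hypothetical name and signature.) */
Node *createNode(int data, Node *next) {
    Node *node = malloc(sizeof *node);
    if (node == NULL) {
        return NULL;
    }
    node->data = data;
    node->next = next;
    return node;
}
```

Checking the result of `malloc` before writing to the node avoids dereferencing a null pointer when the system is out of memory.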
Freeing list Node
If you use a language other than C/C++, chances are that memory allocation and deallocation are handled by the language itself, so this routine is not necessary. If you use C like me, then we need a way to free the memory we allocated when creating the node. The free function takes as its argument a pointer to a memory location allocated on the heap.
In modern languages with garbage collection, an object in memory that is no longer referenced anywhere in the code is automatically deallocated. You can make an object eligible for collection by assigning null to the variables that reference it.
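For a node that owns no other heap memory, freeing it can be as small as the sketch below. The name `freeNode` is an assumption.

```c
#include <stdlib.h>  /* malloc, free */

typedef struct Node {
    int data;
    struct Node *next;
} Node;

/* Releases the memory of a single node. It does not follow `next`,
   so the rest of the chain is left untouched. Passing NULL is harmless
   because free(NULL) does nothing. (Hypothetical name.) */
void freeNode(Node *node) {
    free(node);
}
```

After the call the pointer must not be used again, since the memory it refers to no longer belongs to the program.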
Creating a new list
Since we will probably need several lists in our code, we need a way of creating a new list whenever we need one. The head and tail initially point to NULL (not a valid location in memory) and the size is zero, since our list is empty.
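One way this could look, assuming the list lives on the heap and that the constructor is called `createList` (a hypothetical name):

```c
#include <stdlib.h>  /* malloc */

typedef struct Node {
    int data;
    struct Node *next;
} Node;

typedef struct List {
    Node *head;
    Node *tail;
    int size;
} List;

/* Allocates an empty list: no head, no tail, size zero.
   Returns NULL if the allocation fails. (Hypothetical name.) */
List *createList(void) {
    List *list = malloc(sizeof *list);
    if (list == NULL) {
        return NULL;
    }
    list->head = NULL;
    list->tail = NULL;
    list->size = 0;
    return list;
}
```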
Adding values to the list
Assuming we now have our list, we then need a way of adding items to it. Our implementation must hide the fact that the list is internally represented by nodes rather than by the actual data being stored in it. Our list implementation allows us to add items in two different ways: at the beginning or at the end of the list.
Adding elements at the beginning | Prepending
Since our list has a reference to the start of the node collection, prepending a new element onto the collection is trivial. Since our node constructor handles the linking of nodes during creation, we pass it the value to be stored in the newly created node and also the current head of the list. Once we get our new node, we just reassign the list head to point to this node. We also increment the size of the list by 1.
When adding the first element into the list we must ensure that the list's tail is updated to point to the newly created node because this node is both the start and end of the list.
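The prepending steps can be sketched as follows. To keep the snippet short the node is allocated inline rather than through a separate constructor; the name `prepend` and the 1/0 return value are assumptions.

```c
#include <stdlib.h>  /* malloc */

typedef struct Node {
    int data;
    struct Node *next;
} Node;

typedef struct List {
    Node *head;
    Node *tail;
    int size;
} List;

/* Adds `value` at the front of the list.
   Returns 1 on success, 0 if memory could not be allocated. */
int prepend(List *list, int value) {
    Node *node = malloc(sizeof *node);
    if (node == NULL) {
        return 0;
    }
    node->data = value;
    node->next = list->head;  /* the new node links to the old head */
    list->head = node;
    if (list->size == 0) {
        list->tail = node;    /* the first element is both head and tail */
    }
    list->size++;
    return 1;
}
```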
Adding elements at the end | Appending
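This routine is very similar to the prepending routine described above. The only significant difference is that we use the tail instead of the head: the current tail's next is made to point to the new node, and the tail is then reassigned to the new node. As with prepending, adding the first element requires updating both the head and the tail.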
Note: Make sure NOT to pass the current list's tail to the node constructor as we did with the head when adding items to the beginning of the list.
Try figuring out the effect of passing the list's tail to the node constructor.
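A sketch of appending under the same assumptions as before (an `int` list, a hypothetical `append` function returning 1 on success and 0 on allocation failure, and the node allocated inline):

```c
#include <stdlib.h>  /* malloc */

typedef struct Node {
    int data;
    struct Node *next;
} Node;

typedef struct List {
    Node *head;
    Node *tail;
    int size;
} List;

/* Adds `value` at the end of the list. */
int append(List *list, int value) {
    Node *node = malloc(sizeof *node);
    if (node == NULL) {
        return 0;
    }
    node->data = value;
    node->next = NULL;            /* the new node will be the last one */
    if (list->size == 0) {
        list->head = node;        /* the first element is also the head */
    } else {
        list->tail->next = node;  /* the old tail now links to the new node */
    }
    list->tail = node;
    list->size++;
    return 1;
}
```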
Index of an Element in the List
After adding a couple of elements to our list, we need a way to check whether a particular element is present in the list and, if so, at what index. Since our nodes are chained, we can navigate from the head to the tail of the list and visit every node along the way. While navigating the chain of nodes, we maintain a counter which we increment by one with each transition to the next node. When we come across the node with the value we are looking for, we return the value of the counter at that point; this is the 'index' of the element. If we ever reach the end of the chain (i.e. go past the tail of the list), we return -1 to indicate that the element is not present in the list.
The index of an absent element doesn't necessarily have to be -1. The value you choose must only meet the condition that no element in the list can return that value when queried for its index.
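The traversal described above can be sketched as below. The function name indexOf is the one used in this article; its exact signature is an assumption.

```c
#include <stddef.h>  /* NULL */

typedef struct Node {
    int data;
    struct Node *next;
} Node;

typedef struct List {
    Node *head;
    Node *tail;
    int size;
} List;

/* Returns the index of the first node holding `value`,
   or -1 if no node holds it. */
int indexOf(const List *list, int value) {
    int index = 0;
    for (const Node *node = list->head; node != NULL; node = node->next) {
        if (node->data == value) {
            return index;
        }
        index++;  /* one more step along the chain */
    }
    return -1;    /* walked past the tail without a match */
}
```

If the same value is stored more than once, this version reports the position of its first occurrence.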
Check if an Element is present in the list
You may also want to check whether an element is in the list and get back either true or false. Since we already have a way of finding the index of an element in the list, we can use it to implement this routine. This is trivial: we only have to compare the result of the indexOf function against the index that represents the absence of an element in the list (-1 in our case).
Again, it doesn't always have to be -1.
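Built on top of indexOf, the check could look like this. The name `contains` and the `int` return type (1 for true, 0 for false) are assumptions; indexOf is repeated so the snippet compiles on its own.

```c
#include <stddef.h>  /* NULL */

typedef struct Node {
    int data;
    struct Node *next;
} Node;

typedef struct List {
    Node *head;
    Node *tail;
    int size;
} List;

/* Index of the first node holding `value`, or -1 if there is none. */
int indexOf(const List *list, int value) {
    int index = 0;
    for (const Node *node = list->head; node != NULL; node = node->next) {
        if (node->data == value) {
            return index;
        }
        index++;
    }
    return -1;
}

/* Returns 1 if `value` is in the list, 0 otherwise. (Hypothetical name.) */
int contains(const List *list, int value) {
    return indexOf(list, value) != -1;
}
```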
Removing Element From the List
Once we are done using an element in the list, we may want a way of removing it. We need to handle 3 different cases: the element is at the beginning of the list, somewhere in the middle of the list, or at the end of the list. In every case an element is removed from the list, thus reducing the size of the list by one.
Element is at the beginning of the list
This is the most trivial of the 3 cases. Since the head of the list has a pointer to the next node, say node X, we can make X our head and, if need be, free the memory used by the previous head we just removed from the list.
Depending on how we implemented our insert functions we may have to reassign the value of tail if the removed element was the only element in the list.
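A sketch of this case is shown below. The routine name removeFront appears later in this article; its signature and the 1/0 return value are assumptions, and the nodes are assumed to have been allocated with `malloc`.

```c
#include <stdlib.h>  /* free */

typedef struct Node {
    int data;
    struct Node *next;
} Node;

typedef struct List {
    Node *head;
    Node *tail;
    int size;
} List;

/* Removes the first node. Returns 1 if a node was removed,
   0 if the list was already empty. */
int removeFront(List *list) {
    if (list->head == NULL) {
        return 0;
    }
    Node *old = list->head;
    list->head = old->next;    /* X becomes the new head */
    if (list->head == NULL) {
        list->tail = NULL;     /* the only element was removed */
    }
    free(old);
    list->size--;
    return 1;
}
```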
Element is at the end of the list
Even though we have a reference to the tail node of the list, its removal is not as trivial as that of the head. Can you guess why?
The answer is that we need to reassign the tail after removing the node it points to. Since our nodes don't hold any information about the node before them in the chain, we need a way to obtain that previous node before removing the tail.
To do this, we must walk through the list and find the node whose next node is the tail, say node X. Once we have this node, we can remove the tail node from the list and reassign tail to point to X.
Depending on how your implementation detects the end of the list, you may have to reassign the value of X's next accordingly. What I mean by this is that you may have a specific node that marks the end of the list instead of NULL, which is what we used in our implementation.
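Assuming NULL marks the end of the list and the nodes were allocated with `malloc`, removing the tail could be sketched as follows. The routine name removeBack appears later in this article; its signature and the 1/0 return value are assumptions.

```c
#include <stdlib.h>  /* free */

typedef struct Node {
    int data;
    struct Node *next;
} Node;

typedef struct List {
    Node *head;
    Node *tail;
    int size;
} List;

/* Removes the last node. Returns 1 if a node was removed,
   0 if the list was already empty. */
int removeBack(List *list) {
    if (list->tail == NULL) {
        return 0;
    }
    Node *old = list->tail;
    if (list->head == old) {
        /* Single element: the list becomes empty. */
        list->head = NULL;
        list->tail = NULL;
    } else {
        /* Walk to node X, the one whose next is the tail. */
        Node *prev = list->head;
        while (prev->next != old) {
            prev = prev->next;
        }
        prev->next = NULL;  /* X is now the end of the chain */
        list->tail = prev;
    }
    free(old);
    list->size--;
    return 1;
}
```

The walk to X visits every node before the tail, so this removal takes time proportional to the length of the list, unlike removal at the head.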
Element somewhere in the middle of the list
With our understanding of how to remove the element at the tail, this becomes trivial. We find the node at (index - 1), say X, then reassign its next to point to the next of the node we are about to remove.
If X->next == Y(node being removed) then X->next = Y->next.
To prevent duplication of code, an index that translates to either the head or the tail node is handled by the removeFront and removeBack routines respectively.
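The middle case on its own can be sketched as below. The helper name `removeMiddle` is hypothetical; it deliberately rejects the first and last index, on the assumption that the caller hands those to removeFront and removeBack.

```c
#include <stdlib.h>  /* free */

typedef struct Node {
    int data;
    struct Node *next;
} Node;

typedef struct List {
    Node *head;
    Node *tail;
    int size;
} List;

/* Removes the node at `index`, where 0 < index < size - 1.
   Returns 1 on success, 0 if the index is not a middle position. */
int removeMiddle(List *list, int index) {
    if (index <= 0 || index >= list->size - 1) {
        return 0;
    }
    Node *prev = list->head;           /* will become X, the node at index - 1 */
    for (int i = 0; i < index - 1; i++) {
        prev = prev->next;
    }
    Node *target = prev->next;         /* Y, the node being removed */
    prev->next = target->next;         /* X->next = Y->next */
    free(target);
    list->size--;
    return 1;
}
```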
Looping through the list
This routine achieves the same purpose as the for-each style helpers found in languages that support higher-order functions. The foreach function takes as a parameter a function that will be called with 2 parameters for every node in the list:
- Value of current node
- Boolean flag as to whether this is the last element in the list
An example of such a function would be:
#include <stdio.h>

/* Prints each value followed by ", ", except the last one. */
void printer(const int i, const int isLast) {
    if (!isLast) {
        printf("%d, ", i);
    } else {
        printf("%d", i);
    }
}
In C, any non-zero number represents true, while 0 represents false.
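In C the callback is passed as a function pointer. The sketch below shows one possible foreach; its signature is an assumption, and `accumulate`, `total` and `lastValue` are hypothetical names for a second callback that sums the values instead of printing them.

```c
#include <stddef.h>  /* NULL */

typedef struct Node {
    int data;
    struct Node *next;
} Node;

typedef struct List {
    Node *head;
    Node *tail;
    int size;
} List;

/* Calls `fn` once per node with the node's value and a flag that is
   1 for the last node and 0 for every other node. */
void foreach(const List *list, void (*fn)(const int, const int)) {
    for (const Node *node = list->head; node != NULL; node = node->next) {
        fn(node->data, node->next == NULL);
    }
}

/* Example callback: adds up the values and remembers the last one. */
int total = 0;
int lastValue = 0;

void accumulate(const int value, const int isLast) {
    total += value;
    if (isLast) {
        lastValue = value;
    }
}
```

A callback with the same two `int` parameters, such as the printer function above, can be passed in exactly the same way as `accumulate`.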
Freeing A list
Once we are done using a list, we need a way to free the resources it is utilising. To free the list, we first free all the nodes in the list to prevent a memory leak, and then free the list itself.
If your language has garbage collection then this routine is not necessary.
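One way to do this is to split the work into two steps, as sketched below: `clearList` frees every node and `freeList` then frees the list object. Both names are assumptions, and the list and its nodes are assumed to have been allocated with `malloc`.

```c
#include <stdlib.h>  /* free */

typedef struct Node {
    int data;
    struct Node *next;
} Node;

typedef struct List {
    Node *head;
    Node *tail;
    int size;
} List;

/* Frees every node and leaves the list empty but still usable. */
void clearList(List *list) {
    Node *node = list->head;
    while (node != NULL) {
        Node *next = node->next;  /* save the link before the node is freed */
        free(node);
        node = next;
    }
    list->head = NULL;
    list->tail = NULL;
    list->size = 0;
}

/* Frees all the nodes and then the list itself. */
void freeList(List *list) {
    if (list == NULL) {
        return;
    }
    clearList(list);
    free(list);
}
```

Reading `node->next` before calling `free` matters: once a node has been freed, its fields must not be accessed.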
Tests
To make sure that your code works as expected you must write tests for it. The tests shown were used to evaluate and ascertain that the code worked as expected.
TODO: write an article about Test Driven Development (TDD).
With that, we are finished with our first data structure. All the source code is available on my GitHub.
Feel free to leave a comment on what you thought of the article.
If the article helped you in any way then you could share it for others to find it with ease.
As always you can reach me on Twitter with any comment or suggestion.