A Short Guide to Data Structures

From Wikiid
Revision as of 09:50, 10 June 2009 by SteveBaker (Talk | contribs) (New page: This document makes a quick trip through the most common data structures encountered in day-to-day programming. I'll use C++ syntax here - but the same principles apply in other C-like la...)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

This document makes a quick trip through the most common data structures encountered in day-to-day programming. I'll use C++ syntax here - but the same principles apply in other C-like languages.

== Built-in data structures: I'll skim quickly through these since you should already know them:

POD

Plain Old Data. This is a term that covers the basic, built-in data types in C-like languages, this pretty much boils down to:

 char, unsigned char
 short, unsigned short
 int, unsigned int
 long, unsigned long
 bool
 float
 double

There are variations on this basic set in some languages - but these are the essentials. POD also includes pointers to these basic types.

Pointers

A pointer is a small variable (typically 4 or 8 bytes depending on whether you're using a 32 or 64 bit computer) that points to some other variable. The pointer contains the address of whatever it's pointing at:

 int *x ;
 int y ;
 x = & y ;   // X equals the address of Y.

Diagrammatically:

 X --------> Y 

Structure

Generally refers to the idea (in C) that you can group together a bunch of objects (which could be POD, pointers, other data structures, etc) and wrap them in a 'structure'. eg:

 struct MyStruct
 {
   int A ;
   bool B ;
   double C ;
 } ;

Class

A structure that also contains some functions that operate on the data elements of the class objects. In C++, 'structures' and 'classes' are really the same thing - except that the contents of a 'struct' are public by default and those of a 'class' are private by default. eg:

 class MyClass
 {
   public:
     int A ;
     void DoSomeWork () { A = A + 1 ; }
 }

Most of the more complex data structures will be implemented as some kind of a class object.

Arrays

A simple linear collection of variables, all of the same type, stored consecutively in memory so that they can be indexed with a simple integer:

  float X [ 10 ] ;

Notice that you can have arrays of more complex types too:

  MyStruct X [ 10 ] ;

...declares an array of ten 'struct XYZ' objects.

  X [ 6 ]

...picks the SEVENTH object out of the array - because we start counting at zero.

Stack

Sometimes known as a LIFO (Last-In, First-Out), a stack is like a pile of objects which you can pile more objects on top of - or remove objects, but ONLY from the top. It generally comprises an array to hold the objects and a pointer (called the 'stack pointer') that shows where the top of the pile is.

                      +-------------+ Top of stack
                      |             |
  StackPointer ------>|             |
                      |/////////////|
                      |/////////////|
                      |/////////////|
                      |/////////////|
                      |/////////////|
                      |/////////////|
                      +-------------+ Bottom of stack

It is conventional to have the stack pointer point to the first EMPTY space in the array because that allows you to initialise it to point to the first slot in the array.

   StackPointer = & Array [ 0 ] ;

The two most common operations you can do to a stack are called 'push' and 'pop':

  • Push adds an object to the stack: It writes the new item of data where the stack pointer is pointing and increments the stack pointer to the next free slot.
   void Push ( int x )   // For a stack of integers
   {
      *StackPointer = x ;
      StackPointer++ ;
   }
  • Pop removes the topmost objects from the stack: It decrements the stack pointer (so it's now pointing at the most recently pushed object) and returns whatever it points at. That location is now 'free' again.
   int Pop ()
   {
     StackPointer-- ;
     return *StackPointer ;
   }

Additionally, there needs to be a way to check how many items there are in the stack and to handle the error conditions when too many items are pushed and you run out of space in the array - or when someone tries to pop something when the stack is empty.

Queue (aka Ring buffer, aka Circular buffer, aka FIFO)

A Queue (or FIFO...First-In, First-Out) is similar in concept to a stack - except that you when you remove something from the pile, you always take from the bottom (not from the top). You can think of it as a queue of people waiting in line at a bank counter. As the teller becomes free, a person can be popped off the 'head' of the queue - when more people arrive, they politely add themselves to the tail of the queue. If you use a structure similar to a stack - you need two pointers - one where the data will be removed (called the 'head' pointer) and one where new data will be added (the 'tail' pointer).

                      +-------------+ Top of stack
                      |             |
  TailPointer ------->|             |
                      |/////////////|
                      |/////////////|
                      |/////////////|
  HeadPointer ------->|/////////////|
                      |             |
                      +-------------+ Bottom of stack

As with a stack, the TailPointer points to the next empty space. The HeadPointer points to the next data item to remove. When the queue is empty, the head and tail pointer point to the same address.

  HeadPointer = TailPointer = & Array [ 0 ] ;

We also have operations called push and pop that add and remove data. The 'push' operation is almost identical to the stack example - but 'pop' has to take data from where the HeadPointer points and increment it to point at the next data item.

However, there is a problem. Because we keep adding data at one end and removing it from the other - the section of the array with data in it creeps upwards - and sooner or later, the TailPointer will be pointing off the end of the array. When this happens with a stack - the stack is full and that's probably an error condition. But in a queue, there may well be a ton of space at the bottom of the array where we'd previously popped a bunch of stuff from. So what we have to do is to pretend that the top of the array is wrapped around and glued to the bottom end. That's why this data structure is sometimes called a 'ring buffer' or a 'circular buffer'. Hence, you might at some time see this:

                      +-------------+ Top of stack
                      |/////////////|
                      |/////////////|
  HeadPointer ------->|/////////////|
                      |             |
                      |             |
  TailPointer ------->|             |
                      |/////////////|
                      |/////////////|
                      +-------------+ Bottom of stack

This is easy enough to implement - after you increment either HeadPointer and TailPointer when you pop and push - you check to see if they are pointing off the end of the array - and if so, wrap them back to the beginning. That, in turn leads to some additional complexity with error checking because if the queue fills up completely, the head and tail pointers will wind up pointing to the same location...which is exactly the same as they are when the queue is completely empty. There are several ways to avoid this problem - the simplest is to have another variable that contains the number of items in the queue - increment it on every 'push' and decrement it on every 'pop'.