Introducing data structures

To start with our journey, we first need to agree on a common language to describe and evaluate algorithms.

Describing them is pretty much a standard process: algorithms are described in terms of the input they take and the output they provide. Their details can be illus­trated with pseudo-code (ignoring implementation details of programming lan­guages) or actual code.

Data structures (DS) follow the same conventions, but they also go slightly beyond. We also have to describe the actions you can perform on a data structure. Usually each action is described in term of an algorithm, with an input and an output, but in addi­tion to those, for DSs we also need to describe side effects, the changes an action might cause to the data structure itself.

To fully understand what this means, we first need to properly define data structures.

1. Defining a data structure

A data structure is a specific solution for organizing data that provides storage for items and capabilities for storing and retrieving them.

The simplest example of a DS is an array. For instance, an array of characters pro­vides storage for a finite number of characters and methods to retrieve each character in the array based on its position. Figure 1.1 shows how array = [‘C’,  ‘A’,  ‘R’ ] is stored: an array of characters storing the characters C, A, and R, such that, for instance, array[ 1] corresponds to the value ‘A’.

Figure 1.1 The (simplified) internal representation of an array. Each element of the array in the picture corresponds to a byte of memory, whose address is shown below it. Each element’s index is shown above it. An array is stored as a contiguous block of memory, and each element’s address can be obtained by adding its index within the array to the offset of the first element. For instance, the fourth character of the array (array[3], empty in the figure), has address 0xFF00 + 3 = 0xFF03.

Data structures can be abstract or concrete:

  • An abstract data type (ADT) specifies the operations that can be performed on some data and the computational complexity of those operations. No details are provided on how data is stored or how physical memory is used.
  • A data structure (DS) is a concrete implementation of the specification pro­vided by an ADT.

Back to our array example, a possible ADT for a static array is, for example, a con­tainer that can store a fixed number of elements, each associated with an index (the position of the element within the array), and access any element by its position (ran­dom access).

Its implementation, however, needs to take care of details such as

  • Will the array size be fixed at creation or can it be modified?
  • Will the array be allocated statically or dynamically?
  • Will the array host only elements of a single type or of any type?
  • Is it going to be implemented as a raw chunk of memory or as an object? And what attributes will it hold?

Even for such a basic data structure as arrays, different programming languages make different choices with respect to the previous questions. But all of them make sure their version of arrays abides by the array’s ADT we described earlier.

Another good example to help understand the difference might be a stack. We will describe stacks in appendices C and D, but I assume you have likely heard of stacks before.

A possible description of a stack as an ADT is the following: a container that can store an indefinite number of elements, and can remove elements one at a time, start­ing from the most recent, according to the inverse order of insertion.

An alternative description could break down the actions that can be performed on the container. A stack is a container that supports two main methods:

  • Insertion of an element.
  • Removal of an element. If the stack is not empty, the element that was added most recently will be removed from the stack and returned.

It’s still a high-level description, but it’s clearer and more modular than the previous one.

Either description is abstract enough to make it easily generalizable, allowing you to implement a stack in a wide range of programming languages, paradigms, and systems.

At some point, however, we will have to move to a concrete implementation and will need to discuss details such as

  • Where are the elements stored?
    • An array?
    • A linked list?
    • A B-tree on disk?
  • How do we keep track of the order of insertion? (Connected to the previous question.)
  • Will the max size of the stack be known and fixed in advance?
  • Can the stack contain elements of any type or must all be the same type?
  • What happens if removal is called on an empty stack? (For instance, returning null versus throwing an error.)

And we could keep on going with questions, but hopefully you get the idea.

2. Describing a data structure

The crucial part of an ADT definition is to list the set operations that it allows. This is equivalent to defining an API,[4] a contract with its clients.

Every time you need to describe a data structure, you can follow a few simple steps to provide a comprehensive and unambiguous specification:

  • Specifying its API first, with a focus on the methods’ input and output
  • Describing its high-level behavior
  • Describing in detail the behavior of its concrete implementation
  • Analyzing the performance of its methods

We will use the same workflow for the data structures presented in this book after describing a concrete scenario in which each data structure is actually used.

Starting in chapter 2, with the description of the first data structure presented, we will also explain in further detail the conventions we use for the API description.

3. Algorithms and data structures: is there a difference?

No, they are not exactly the same thing; technically they are not equivalent. Neverthe­less, we might sometimes use the two terms interchangeably and, for the sake of brev­ity, use the term data structure to mean a DS and all its relevant methods.

There are many ways to point out the difference between the two terms, but I par­ticularly like this metaphor: data structures are like nouns, while algorithms are like verbs.

I like this angle because, besides hinting at their different behavior, it implicitly reveals the mutual dependency between them. For instance, in English, to build a meaningful phrase, we need both nouns and verbs, a subject (or object), and an action performed (or endured).

Data structures and algorithms are interconnected; they need each other:

  • Data structures are the substrate, a way to organize an area of memory to repre­sent data.
  • Algorithms are procedures, a sequence of instructions aimed to transform data.

Without algorithms to transform them, data structures would just be bits stored on a memory chip; without data structures to operate on, most algorithms wouldn’t even exist.

Every data structure, moreover, implicitly defines algorithms that can be per­formed on it—for instance, methods to add, retrieve, and remove elements to/from the data structure.

Some data structures are actually defined exactly with the purpose of enabling one or more algorithms to run efficiently on them. Think of hash tables and search by key.

So, when we use algorithm and data structure as synonyms, it’s just because in that particular context one implies the other. For instance, when we describe a DS, for that description to be meaningful and accurate, we necessarily need to describe its meth­ods (that is, algorithms).

Source: Rocca Marcello La (2021), Advanced Algorithms and Data Structures, Manning Publications (2021)

Leave a Reply

Your email address will not be published. Required fields are marked *