Dynamic Arrays

More often than not, when you are programming some routine that requires an array, you don't know how many elements you want in that array. It may be ten, one hundred, or a thousand, but certainly it's only at run time that you can come up with the answer. Furthermore, because you don't know, it's hard to declare the array as a local variable declaring the maximum size you may encounter might put strains on the stack, especially in Delphi 1 it certainly makes sense to allocate it on the heap....

Deletion from a RedBlack Tree

Compared with insertion, deletion from a red-black tree is filled with special cases, and can be difficult to follow. As usual with binary search trees, we start off by searching for the node to be deleted. Like before, we'll have three initial cases the node has no children or, in red-black tree terms, both of its children are external nodes the node has one actual child and one external child and, finally, the node has two real children. We delete the node in exactly the same way as we did...

Binary Search Tree Rearrangements

When discussing the binary search tree, I stated that adding items to a binary tree could make it woefully unbalanced, sometimes even degenerating into a long spindly tree like a linked list. The problem with this degeneration is not that the binary search tree stops functioning properly items are still being stored in sorted order , it's that the tree's good efficiency takes a fatal knock if it happens. For a perfectly balanced tree on average, all parent nodes have two children, and all...

Pseudorandom Probing

The next alternative is pseudorandom probing. This algorithm needs a random number generator that we can reset at a particular point. Of the ones we introduced in Chapter 6, the best one for this algorithm would be the minimal standard random number generator since its state is completely defined by one longint value, the seed. The algorithm uses the following steps. Hash the key to give a hash value, but do not take the modulus with the table size. Set the generator's seed value to this hash...

Creating a Binary Tree

Creating a binary tree is trivial. At its most simple, the root node in a binary tree defines the binary tree. If MyBinaryTree is nil, there is no binary tree, so this value serves as the initial value of the binary tree. initialize the binary tree MyBinaryTree nil However, in practice, we tend to use a dummy node analogous to the dummy head node in a singly linked list, so that every real node in the tree has a parent, including the root. The root node can be either the left or right child...

Using State Machines Parsing

An example will help us understand the process. Let us suppose we wish to devise an algorithm that would extract the individual words in a string of text. We'll put the words we extract into a string list. Moreover, there's a wrinkle we would like quoted text inside the string to be considered as a single word. So, if we had the string He said, State machines the routine would ignore the punctuation and white space and return Notice that the white space and punctuation inside the quoted text...

Deterministic and Nondeterministic State Machines

Now that we have seen a couple of fairly complex state machines and are more familiar with them, I will introduce a couple of new terms. The first is automaton plural automata . This is nothing more than another name for a state machine, but it is used extensively in computer science classes and textbooks. A finite state machine or finite automaton is merely a state machine whose number of states is not infinite. Both of our examples are finite automata the first has three states and the second...

Sort Basics

Sort algorithms separate themselves into two types stable sorts and unstable sorts. A stable sort is one where, if there are one or more sets of items that compare equally, the order of those items will be the same in the sorted sequence as they were in the original sequence. For example, suppose that there are three items A, B, and C, all with sort value 42 so that they compare equal , and they originally appeared in the list with A in position 12, B in position 234, and C in position 3456. In...

Insertion with a Binary Search Tree

For a user of a binary search tree, we can make the insert operation a lot easier to use all that has to be provided is the item itself. No longer does the user have to worry about which node becomes the parent and as which child the new node is added. Instead, the binary search tree can do it all, hiding all the details, by using the ordering of the items within the tree as a guide. In fact, it is relatively easy to insert a new item into a binary search tree, and we've seen the majority of...

The CPU Cache

Indeed, the hardware on which we all program and run applications uses a memory cache. The machine on which I'm writing this chapter uses a 512 KB high-speed cache between the CPU and its registers and main memory of which this machine has 192 MB . This high-speed cache acts as a buffer when the CPU wants to read some memory, the cache will check to see if it has the memory already present and, if not, will go ahead and read it. Memory that is frequently accessed that is, has temporal locality...

Collision Resolution through Bucketing

There is a variant of the chaining method for collision resolution called bucketing. Instead of having a linked list at each slot, there is a bucket, essentially a fixed size array of items. When the hash table is created, we have to allocate the bucket for each slot and mark all the elements in each bucket as empty. To insert an item, we hash the key for the item to find the slot number. We then look at each element in the bucket until we find one that is marked as empty and set it to the item...

Navigating through a Binary Tree

Now we've seen how to build a binary tree we can discuss how to visit all the nodes in such a structure. By visit, I mean process the item in the node in some fashion. This could be something as simple as writing out the data in the node, or it could be more complex. Unlike a linked list where the navigation of the structure is pretty well defined follow all the Next pointers until we reach the end , in a binary tree there are two ways we could take at each node and so the process becomes a...

Quadratic Probing

The first one is the quadratic probe. With this algorithm we try and avoid creating clusters by not always checking the next slot in sequence. Instead, we check slots that are farther and farther away. If the first probe is unsuccessful, we check the next slot. If that probe is unsuccessful, we check the slot four slots along. If that probe is unsuccessful, we check the slot nine slots along and so on, with subsequent probes jumping 16, 25, 36, etc., slots along. This would help break up the...

Simple Hash Function for Strings

The argument with the previous idea would seem to indicate that we should weight each character according to its position in the string to avoid collisions when using anagrams as keys. This results in the following implementation the source code can be found in TDHshBse.pas on the CD . Listing 7.1 Simple hash function for string keys function TDSimpleHash const aKey string i integer Hash longint begin Hash 0 Hash Hash 17 ord aKey i mod aTableSize Result Hash if Result lt 0 then inc Result,...

Queues Using Arrays

Now, let's consider how to implement a queue with an array. Again, to simplify things, we'll use a TList at least we won't have to worry about memory allocation or array growing issues. Having seen the linked list version, your first impulse might be to call Add to append items to the end of a TList instance for the enqueue operation, and to call Delete to remove the first item in the TList for the dequeue operation or vice versa insert at the front of the list, delete from the end . However,...

Assertions

Since the first rule indicates that we will always have to do some debugging, and the corollary states that we don't want to be embarrassed by inadequately tested code, we need to learn to program defensively. The first tool in our defensive arsenal is the assertion. An assertion is a programmatic check in the code to test whether a particular condition is true. If the condition is false, contrary to your expectation, an exception is raised and you get a nice dialog box explaining the problem....

Searching through a Skip List

If you look again at Figure 6.3, you'll notice that you can characterize the list as being several singly and doubly linked lists merged together. There is a doubly linked list at level 0, a singly linked list at level 1 that skips over single nodes i.e., it links every second node , another singly linked list at level 2 that skips over three nodes i.e., it links every fourth node , and another singly linked list at level 3 that skips over seven nodes i.e., it links every eighth node . So, to...

The Singly Linked List Class

Before delving into the design of the singly linked list class TtdSingleLinkList, a couple of notes are in order. First things, first. As I said earlier, it would be nice to be able to use the linked list without having to worry about nodes. Like the TList, we would like to have the class accept untyped pointers. To be able to access items in the linked list, we'd certainly like to use an index although, as I pointed out this can be slow , but better still would be to borrow some database...

Class Implementation of a Splay Tree

The TtdSplayTree class is a simple decendant of the TtdBinarySearchTree class, overriding the Delete, Find and Insert methods and declaring new internal methods to splay and promote a node. Listing 8.18 shows the interface to this class. Listing 8.18 The interface to TtdSplayTree Listing 8.18 The interface to TtdSplayTree TtdSplayTree class TtdBinarySearchTree function stPromote aNode PtdBinTreeNode procedure stSplay aNode PtdBinTreeNode procedure Delete aItem pointer override function Find...

Bubble Sort

The first sort that programmers come across when they're learning the tricks of the trade is the bubble sort. This is a shame, since, of all the sorts, it tends to be the slowest. It is perhaps the easiest to code though not by much . Bubble sort goes a bit like this. Fan your cards out. Look at the twelfth and thirteenth cards on the right side. If the twelfth is larger than the thirteenth, swap them. Now look at the eleventh and the twelfth cards. If the eleventh is bigger than the twelfth,...

Completing Heapsort

Having ordered an array into a heap, what then Removing the items one by one still means we need somewhere to put them in sorted order, presumably some auxiliary array. Or does it Think about it for a moment. If we peel off the largest item, the heap size reduces by one, leaving space at the end of the array for the item we just removed. In fact, the algorithm to remove an item from a heap requires the lowest, rightmost node to be copied to the root before being trickled down, so all we need to...

Insertion into a Skip List

Having seen how to search for an item in a pre-existing skip list, we should consider how to build one by inserting items. Looking back at Figure 6.3, it looks like an impossible task to build this extremely regular structure through a series of unknown item insertions and deletions. The cleverness of Pugh's insertion algorithm is that he realized that it was impossible or rather, much too long-winded and time-consuming to build the completely regular structure, so he proposed building a skip...

Traversing a Linked List

Traversing a linked list is pretty simple as well. We essentially walk the list, going from node to node following the Next pointers, until we reach the nil node that signifies the end of the list. TempNode FirstNode while TempNode lt gt nil do begin Process TempNode .Data TempNode TempNode .Next end In this simple loop the Process procedure is defined elsewhere and presumably will do something with the Data field it is passed. Emptying a linked list uses a slight variation of this technique in...

Heapsort

Now that we've implemented the heap version of a priority queue, we can observe that it can be used as a sorting algorithm add a bunch of items all at once to the heap and then pick them off one by one in the correct order. Note that, as written, the items are picked off in reverse order, that is, largest first, but using a reversed comparison method, we can get them in the correct ascending order instead. The algorithm to sort with a heap, unsurprisingly, is known as heapsort. If you recollect...

Unit Testing

Unit testing is the process of testing parts of your program divorced from the program itself. One of the new development methodologies being discussed at the time of this writing is extreme programming 3 . This methodology espouses a number of recommendations, some of which are fairly contentious, but at least one of them makes excellent sense. The recommendation is to write a test at the time you write a method of a class. If the method seems to require more than one test, then you do so....

Advantages and Disadvantages of Chaining

The advantages of chaining are fairly obvious. Firstly, we never run out of space in a hash table that uses chaining we can continue adding items ad nauseam to the hash table and all that happens is that the linked lists just keep on growing. Insertion and deletion are extremely simple to implement indeed, we've done most of the work in Chapter 3. Despite its simple nature, chaining has one main disadvantage. This is that we never run out of space The problem here is that the linked lists grow...

Data Alignment

Another aspect of the hardware that we must take into account is that of data alignment. Current CPU hardware is built to always access data from the cache in 32-bit chunks. Not only that, but the chunks it requests are always aligned on a 32-bit boundary. This means that the memory addresses passed to the cache from the CPU are always evenly divisible by four 4 bytes being 32 bits . It's equivalent to the lower two bits of the address being clear. When 64-bit or larger CPUs become more...

Efficiency Considerations

If that were all there was to say about linked lists, this would be a very short chapter. We'd just present a class encapsulation of a singly linked list and move on. However, there is more to be said before writing a linked list class, especially with regard to efficiency. Look again at the code for insertion and deletion. Doesn't it strike you as messy to have separate cases for both operations It does me. We have to have this special code in order to cope with processing the first node in...

Virtual Memory and Paging

The first performance bottleneck we should understand is virtual memory paging. This is easier to understand with 32-bit applications, and, although 16-bit applications suffer from the same problems, the mechanics are slightly different. Note that I will only be talking in layman's terms in this section my intent is not to provide a complete discussion of the paging system used by your operating system, but just to provide enough information so that you conceptually understand what's going on....

Splay Trees

Anyway, having seen these rotations and zig-zag and zig-zig operations, we can use them in a data structure known as a splay tree. A splay tree is a binary search tree constructed such that any access to a node results in the node being splayed to the root. Splaying is the application of zig-zag or zig-zig operations until the node being splayed is at the root of the tree, or is one level down from the root, in which case a single rotation can promote it to the root. Splay trees were invented...

Inserting and Deleting from a Doubly Linked List

Removing Node Doubly Linked List

How do we insert a new node into a doubly linked list For a singly linked list we had to break a single link and then form two new links to insert a node, so for a doubly linked list we have to break two links and form four new ones. We can insert either before or after a node in the list because the Prior pointers make it easy to traverse the list in either direction. Indeed, an insert before operation can be coded as a move back one node, insert after operation, so we'll just consider the...

Compare Routines

The very act of searching for an item in a set of items requires us to be able to differentiate one item from another. If we cannot tell the difference between any two items in general, there's no point in trying to look for one in particular. So the first hurdle we must overcome is how to tell the difference, how to compare two items that we may come across. There are two types of comparison we could make the first is for unsorted lists of items, where we only need to know whether one item is...

The PJW Hash Functions

During the discussion on hash tables, the Dragon Book Compilers Principles, Techniques, and Tools by Aho, et al 2 shows a hash function by PJ. Weinberger this routine is also known as the Executable and Linking Format ELF hash . This follows a similar algorithm to the routine in Listing 7.1, except it throws in a randomizing effect where the topmost nibble of the longint hash work variable the nibble that is going to disappear due to the overflow from the next multiplication , if it is...

Stacks Using Linked Lists

With a linked list implementation of a stack, the push operation is coded as inserting a new node at the front of the list. The pop operation is coded as deleting the node at the front of the list and returning the data. Neither operation depends on the number of items in the list so we can categorize both as O 1 operations. That's it, we're done with the design. Of course, implementing this design involves a little more decision-making. We could code a stack class either to descend from the...

Full Skip List Class Implementation

Having shown the three complex operations for the skip list, we can now reveal the interface to the class itself. Unlike its simpler linked list brethren, the skip list class does not provide any array-like functionality. It's not that we couldn't add an access by index it's just that the skip list is the first of the data structures in this book where it no longer makes sense to do so others being the hash table and the binary tree . The reason for this is that providing the correct index for...

The Linear Probe Hash Table Class

Listing 7.3 shows the interface for our linear probe hash table the entire source code for this class can be found in TDHshLnPpas on the CD . There are a couple of things to note about this implementation. First, we make the convention that the key for an item is a string, separate from the item itself. This makes the underlying code a lot easier to understand, and also makes designing and using the hash table easier. In the vast majority of cases, keys will be strings anyway, and converting...

Benefits and Drawbacks of Linked Lists

Linked lists have one large benefit insertion and deletion are O 1 operations. It doesn't matter where you are in the linked list or how many items exist in the list, it takes the same amount of time to insert a new item or to delete an existing item. The one main drawback to linked lists is that accessing an item by index is a O n operation. In this case, how many items are in the list matters to find the nth item, we have to start at some point in the list and follow links, counting as we go....

Extendible Hashing

The algorithm we need to use is called extendible hashing, and to use it we need to go back to square one with our hash function. Originally, we knew the size of our hash table and so, when we hashed a key, we would then immediately mod it with the table size and use the result as an index into our hash table. With extendible hashing, on the other hand, we do not know how big our hash table is, since it will grow whenever required to avoid a bucket overflow. In previous versions of our hash...

The Poker Test

The third test is known as the poker test. The random numbers are grouped into sets or hands of five, and the numbers are converted into cards, each card actually being a digit from 0 to 9. The number of different cards in each hand is then counted it'll be from one to five , and this result is bucketed. Because the probability of only one digit repeated five times is so low, it is generally grouped into the two different digits category. Apply the chi-squared test to the four buckets there...

Implementation of the Extended Priority Queue

As far as the user of the priority queue is concerned, this new interface is only slightly more complicated than before. Listing 9.9 shows the class interface to TtdPriorityQueueEx, the extended priority queue. Listing 9.9 The TtdPriorityQueueEx class interface Listing 9.9 The TtdPriorityQueueEx class interface procedure pqTrickleDown aHandle TtdPQHandle procedure ChangePriority aHandle TtdPQHandle function Enqueue aItem pointer TtdPQHandle function Remove aHandle TtdPQHandle pointer property...

Example of Using a Stack

Stacks are used wherever you have to calculate things in reverse order but then return them in the correct one. A simple exercise that I sometimes use when conducting an interview is to ask the candidate to devise some code to reverse a string. With a character stack, the exercise is trivial push the characters from the string onto the stack and then pop them off in reverse order. There are other ways of completing the exercise, of course. An interesting variation on this theme is the problem...

Class Implementation of a Binary Search Tree

As usual, we shall encapsulate a binary search tree as a class, although I would warn you again that you should only use it if you can be sure that the items being inserted are sufficiently random or small in number that the tree doesn't degenerate into a spindly affair. What we are trying to achieve with a binary search tree class is hiding the mechanics of the tree from the user. This means that the user should be able to use the class to keep a set of items in sorted order, and traverse...

Efficiency Considerations 1

Like the singly linked list, we can be a little more proactive with regard to efficiency. With a singly linked list we saw that having a head node improved matters nicely with regard to insertion and deletion. The corresponding case for doubly linked lists is to have two dummy nodes the head node and the tail node. With these two placeholder nodes we can easily walk the list from the first node to the last, and also backward from the last node to the first. There are no longer special cases for...

Hashing and Hash Tables

In Chapter 4, we looked at searching algorithms for finding an item in an array for example, a TList or in a linked list. The fastest general method we discussed was binary search, which required a sorted container. Binary search is a O log n algorithm. Thus, for 1,000 items, we'd need approximately 10 comparisons to either find a given item in a list, or to discover that it wasn't actually there since 210 1024 . Can we do better If we had to rely on a comparison function to help us identify an...

Levelorder Traversals

The one traversal method we have not yet looked at is level-order, where we visit the root, visit the two possible nodes at level 1 from left to right, visit the four possible nodes at level 2 from left to right, and so on. This traversal method looks to be difficult to code, but in fact is very simple once you know the trick. The trick is to use a queue in the following manner. Enqueue the root node and enter a loop until the queue is empty. Dequeue the top node. Visit it. If its left child...

The Doubly Linked List Class

The interface to the TtdDoubleLinkList class is as follows Listing 3.13 The TtdDoubleLinkList class Listing 3.13 The TtdDoubleLinkList class function dllGetItem aIndex longint pointer procedure dllSetItem aIndex longint altem pointer procedure dllError aErrorCode integer const aMethodName TtdNameString class procedure dllGetNodeManager procedure dllPositionAtNth aIndex longint public constructor Create aDispose TtdDisposeProc destructor Destroy override procedure dllSetItem aIndex longint altem...

Array Types in Delphi

There are three main types of language-supported arrays in Delphi. The first type is the standard array, as defined by the array keyword. The second type was introduced in Delphi 4 to mimic something that has been available in Visual Basic for a long time, the dynamic array an array that can be re-dimensioned at run time. The final array type is not usually thought of as an array, although Object Pascal provides several variants of it. I am speaking, of course, of strings length-byte strings or...

Shuffling Generators

The final generator we'll discuss that produces a more random sequence of random numbers is the shuffling algorithm. In this book, we'll discuss the variant that uses just one internal PRNG, although there is another that uses two in a similar manner. Like the additive generator, we make use of a separate table of floating-point random numbers. The number of elements in this table is not overly important Knuth suggests a number in the neighborhood of 100, and we'll use 97, a close prime 11 ....

Deleting Items from a Linear Probe Hash Table

Before we look at some code, let us discuss deleting items from our hash table. It seems easy enough hash the key for the item to delete, find it using as many linear probes as required , and then mark the slot as empty. Unfortunately, this simplistic method causes some problems. Suppose that our hash function hashes the keys Smith, Jones, and Brown to 42, 42, and 43 respectively. We add them in that order to an empty hash table, to result in the following situation In other words, Smith...

Sorting

Sorting is commonplace in regular day-to-day programming. When we display a list box on a form, it looks better and is easier to use if the items in the list are in alphabetical order. We as human beings prefer order when presented with data patterns help us visualize the distribution of the data. Imagine how difficult a phone book would be to use if it weren't in alphabetical order but some other sequence. The chapters in this and every book are sorted by chapter number. In our development...