Further Programming in Java

Chapter 5: Hash Tables

Linear Probing Hash Tables

The linear probing hash table is a fairly simple structure where data items are stored directly inside the hash element array. It gets its name by the way it handles the unique problems that exist when storing the data items in the table directly.

linear probing table

Source Code

class LinearTable
{

   public final int SIZE = 7;
private HashItem[] table = new HashItem[SIZE];

declaring the table size and element array


   public int code(String key)
{
return (Math.abs(key.hashCode()) % SIZE);
}


.. more methods will follow from here ..
}

a method to calculate a hash code for a given string


Adding Data

simple addition

Data can be added to the table by calculating the hash code from the key of the data that is being stored, and then placing the data in the corresponding hash element.

collision when adding

However, there are many more possible search keys than there are hash elements. It is inevitable that some keys will resolve to the same hash code. This is known as a hash collision, and it must be handled.

Why doesn't it make sense to have a hash element for each possible piece of data?

Hash collisions will always occur, but they can be avoided. This is the reason why a good hash function is so important: by spreading the data evenly across the entire table, on the whole there is a smaller chance of collisions occuring.

Resolving Collisions

If faced with a collision situation, the linear probing table will look onto to subsequent hash elements until the first free space is found.

This traversal is known as probing the table; and as it goes by one element at a time, it is linear probing.

There are other kinds of probing; for example quadratic probing is where the traversal skips one element, then two, then four, etc. until a free space is found.

Consider the situation mentioned above where data 'F' has the same hash code as data 'D'. In order to resolve the collision, the add algorithm will need to probe the table in order to find the first free space (after 'C').

probing to resolve collision

Consider the situation mentioned above where data 'F' has the same hash code as data 'D'. In order to resolve the collision, the add algorithm will need to probe the table in order to find the first free space (after 'C').

looping back while probing

If the probe loops back, and finally reaches the same element that it started at, it means that the hash table is full, and can no longer hold any more data. The addition operation will fail.

trying to add to a full table

Algorithm

We can summarise the addition algorithm as follows:

Source Code

This method is part of LinearTable.java.

public boolean add(String key, String data)
{

   int probe;
   

variable to store probing location


   int code = code(key);
   

calculating the hash code


   if ( (table[code] == null) ||
table[code].isDeleted() )
{
table[code] = new HashItem(key, data);
probe = -1;
}

if the hash item is empty, add the data straight away...


   else
{
if (code == (table.length - 1))
probe = 0;
else
probe = code + 1;
}

...otherwise, probe through the list for a free spot, looping back to zero if necessary


   while ( (probe != -1) &&
(probe != code) )
{

keep probing data is stored or entire table has been visited


      if ( (table[probe] == null) ||
table[probe].isDeleted() )
{
table[probe] = new HashItem(key, data);
probe = -1;
}

if the probed element is free, add the data and mark as being stored


      else
{
if (probe == (table.length -1) )
probe = 0;
else
probe++;
}

}

otherwise probe to the next item


   if (probe != -1)
return false;
else
return true;

}

if data was stored, return true, otherwise false


Retrieving Data

In order to retrieve data from the hash table, we need a key to search for. From this key, we can calculate the hash code. This tells us where in the data array we need to start searching.

finding position to probe from

Because of the collision resolution of the add operation, the target data might reside at a location other than the element referred to by the hash code.

Therefore, it is necessary to probe the hash table until an empty hash element is found, and for an exact match between each data item and the given key. (The probing stops at an empty element, since it signals the end of where potential data might have been stored.)

searching and probing

Consider a situation where 'G' maps to the same hash code as 'B', and a search is undertaken. The retrieval algorithm will start looking at data items starting at that hash code, and continue comparing each hash item's contents for a match with 'G', until either the blank element is found, or (if the array is full) the probing loops back and ends up where the traversal started.

a failed search

Algorithm

We can summarise the retrieval algorithm as follows:

Source Code

This method is part of LinearTable.java.

public String retrieve(String key)
{

   int probe;
   

variable to store probing location


   int code = code(key);
   

calculating the hash code


   if (table[code] == null)
return null;

if the position is empty, immediately return failure...


   else if (table[code].getKey().equals(key))
return table[code].getData();

...but if it's a match, return the data straight away...


   else
{
if (code == (table.length - 1) )
probe = 0;
else
probe = code + 1;
}

...otherwise, probe to the next item, looping to zero if necessary


   while ((probe != -1) && (probe != code))
{

keep probing until data is found or entire table has been visited


      if (table[probe] == null)
return null;

if the probed element is completely empty, return failure


      else if (table[probe].getKey().equals(key))
{
return table[probe].getData();
}

if the probed element is a match, return the data...


      else
{
if (probe == (table.length - 1) )
probe = 0;
else
probe++;
}

}

...otherwise, keep probing for the next item, looping back to zero if necessary


   return null;
}

if nothing has been returned by now, data is not present


Deleting Data

The basic deletion operation is merely a retrieval followed by data removal (clearing the hash element, once the target has been found.)

simple deletion

Unfortunately, this has a negative side-effect on the way the retrieval works. Since the data retrieval operation relies on blank hash elements as the signal to stop probing, there is the possibility that a deletion operation will render some data items unfindable. Consider where a search for 'R' (which has the same hash code as 'A') is attempted, after 'D' has been deleted:

corruption due to simple deletion

The data 'R' will never be found, as the probing had terminated too early; this is due to the hash element that stored 'D' (and kept the probing going) being deleted.

The solution to this problem is to define two different kinds of blank hash elements:

These can be used to differentiate the situations in how a clear hash element came to exist; something that will be necessary to make the hash retrieval work again.

When a data item is deleted, it is not completely cleared, but instead has the "empty but deleted" mark. The retrieval function must then be modified so that it will terminate probing only on a purely empty element, and continue probing if an "empty but deleted" element is encountered.

An add operation can store data in the "empty but deleted" element.

As the deleted flag is only necessary to continue searching, adding data to one of these elements makes it work like just another normal element again (as far as the probing algorithm is concerned.)

using a deleted flag to restore search capability

Table Pollution

The 'deleted flag' approach is a relatively convenient way to resolve the problem of bridging together clusters of data for allowing data retrieval, however it has the side effect of causing those retrievals to take longer.

a slow table due to pollution

The reason for this is because over time, as the data structure is being manipulated, the number of deleted flags increase; as data is added and then deleted, what was once a purely empty element, becomes an empty but deleted element. Recall that for the 'deleted flag' technique to work, the retrieval can only terminate on a purely empty hash element.

These deleted flags may bridge together otherwise unrelated data (in terms of the search algorithm), and will gradually deteriorate the structure's efficiency. In the worst case, the hash table may prove no better than a basic linear search.

regenerating a table

The only solution to the table pollution issue is to create a brand new hash table, add the existing data into the new table, and discard the old one.

Simply deleting the deleted flags won't work; they are an integral part of the probing process. It's just sheer coincidence that in the above diagram, none of the data items happened to move.

In a linear probing hash table, the place where data is actually stored depends a lot on what has happened in the table; past data collisions, earlier deletions and re-additions, etc.

Algorithm

A summary of the deletion algorithm is below. Note that it is very similar to that of the retrieve -- this is because the data must be found before it can be deleted.

Source Code

This method is part of LinearTable.java.

public boolean delete(String key)
{

   int probe;
   

variable to store probing location


   int code = code(key);
   

calculating the hash code


   if (table[code] == null)
return null;

if the position is empty, immediately return failure...


   else if (table[code].getKey().equals(key))
{
table[code].setDeleted();
probe = -1;
return true;
}

...but if it's a match, delete straight away...


   else
{
if (code == (table.length - 1) )
probe = 0;
else
probe = code + 1;
}

...otherwise, probe to the next item, looping to zero if necessary


   while ((probe != -1) && (probe != code))
{

keep probing until data is found or entire table has been visited

      if (table[probe] == null)
return null;

if the probed element is completely empty, return failure


      else if (table[probe].getKey().equals(key))
{
table[code].setDeleted();
probe = -1;
return true;
}

if the probed element is a match, delete the data...


      else
{
if (probe == (table.length - 1) )
probe = 0;
else
probe++;
}

}

...otherwise, keep probing for the next item, looping back to zero if necessary


   return null;
}

if nothing has been returned by now, data is not presen