Jaegool_'s log

IT 279: Algorithms and Data Structures, Week 5 Hash Tables 본문

Courses

IT 279: Algorithms and Data Structures, Week 5 Hash Tables

Jaegool 2023. 10. 5. 11:33

Progress Check 11 on Hash Tables


1. Insertion and searching in a search tree dictionary has a best case of O(log(n)) and we sometimes aren't even that lucky. What is the best search time that we can reasonably achieve in a hash table?

    O(1), constant time search

 

2. Which of the following hash functions is correct for an integer key being stored in a dictionary of size tableSize?
    hash(key) = key % tableSize

 

3. Which of the following concepts is closest to a hash table that employs separate chaining? 
    An array of linked lists

 

Perfect Hash Characteristics

Must be able to efficiently determine the index from the key value.

Must have no collisions among key values

 

Imperfect Hashing Challenges

1. Determining the hash bucket from the key

2. Handling collisions

 

Separate Chaining Evaluation

Advantages? simple implementation, no worry about clustering

Disadvantages? extra memory overhead, potential for poor average performance

 

While Linear Probing is simple and effective, it has several issues:

  1. Primary Clustering: One of the primary problems with linear probing is the formation of clusters. As more collisions occur, they tend to form groups of occupied slots. As these clusters grow, the chance of a newly inserted key colliding with a slot in that cluster increases, making the cluster even larger. This clustering effect can severely degrade the performance of the hash table.
  2. Performance Degradation: As the load factor (ratio of occupied slots to total slots) of the hash table increases, the average probe length for insertion increases significantly. This can make operations on the table slower.
  3. Poor Cache Performance: Due to the linear nature of probing, when a collision occurs, the cache may need to fetch different parts of the hash table from memory, leading to cache misses and performance degradation, especially if the table is large.

 

 

Quadratic Probing:
The general formula to find the ith slot after the original hash position is:

Here's a step-by-step procedure:

  1. Compute the hash value .
  2. If the slot at is unoccupied, insert the key there.
  3. If the slot is occupied (a collision has occurred), compute the next slot as h(k) + 1^2. Check this slot.
  4. If this next slot is also occupied, compute the next slot as h(k) + 2^2. Continue this process until an unoccupied slot is found or until we've checked the entire table.

 

더블 해싱(double hashing)은 충돌 해결을 위한 또 다른 오픈 어드레싱(open addressing) 기법입니다. 더블 해싱은 두 개의 해시 함수를 사용하여 충돌이 발생했을 때 새로운 위치를 찾습니다.

기본적인 아이디어는 첫 번째 해시 함수 로 초기 위치를 찾고, 두 번째 해시 함수 를 사용하여 연속적인 충돌 해결을 위한 간격(즉, 점프 크기)를 결정하는 것입니다.

더블 해싱을 사용하여 i번째 슬롯 위치를 찾는 일반적인 공식은 다음과 같습니다:

여기서:

  • 는 첫 번째 해시 함수입니다.
  • 는 두 번째 해시 함수입니다.
  • 은 해시 테이블의 크기입니다.
  • 는 0에서 시작하여 1씩 증가하는 카운터입니다.

prime n < table-size

 

Progress Check 12 on Hash Tables continued

1. Which of the following is a major disadvantage of linear probing? c

a. It takes up more space in the array.

b. It is mathematically compex.

c. The table has a tendency to develop clumps of data that can hurt performance.

d. You have to use % to wrap around to 0 when you come to the end of the table.

 

2. When is the most appropriate time to rehash a hash table?

Right before inserting the item that will make the table more than half full.

 

3. Given the following:

curLoc is the current index into the hash table
is the number of times you have had to compute a new index
hashVal is the result of the hash function (the original location for this item)
offset is the offset to use and was initialized to 1

Which of the options below is the best way to compute a new index with quadratic probing?

 

curLoc = curLoc + offset;
offset = offset + 2;

 

Progress Check 13 on Even More Hashing

 

 

quadra offset: i^2

double hashing offset(this case):  key%7i+1

location(if it is occupied) = (original hash add + offset) % m