- A bucket is a unit of storage containing one or more records (a bucket is typically a disk block).
- The file blocks are divided into M equal-sized buckets, numbered bucket0, bucket1... bucketM-1.Typically, a bucket corresponds to one (or a fixed number of) disk block.
- In a hash file organization we obtain the bucket of a record directly from its search-key value using a hash
Function, h (K).
- The record with hash key value K is stored in bucket, where i=h(K).
- Hash function is used to locate records for access, insertion as well as deletion.
- Records with different search-key values may be mapped to the same bucket; thus entire bucket has to be searched sequentially to locate a record.
- Primary pages fixed, allocated sequentially, never de-allocated; overflow pages if needed.
- h(K) mod M = bucket to which data entry with key k belongs. (M = # of buckets)
- Good for database that grows and shrinks in size
- Allows the hash function to be modified dynamically
- Extendable hashing – one form of dynamic hashing
- Hash function generates values over a large range —typically b-bit integers, with b = 32.
- At any time use only a prefix of the hash function to index into a table of bucket addresses.
- Let the length of the prefix be i bits, 0 _ i _ 32.
Bucket address table size = 2i. Initially i = 0
Value of i grows and shrinks as the size of the database grows and shrinks.
- Multiple entries in the bucket address table may point to a bucket.
- Thus, actual number of buckets is < 2i
- The number of buckets also changes dynamically due to
Coalescing and splitting of buckets.