Could anyone explain why an array will always be faster than a hash object described in SAS Prog3 course note?
Remember to vote up questions/answers you find interesting or helpful (requires 15 reputation points)
|
3
|
|
|
|
|
3
|
If you have the memory for an array you don't need a hash object. The point of a hash table is to provide access to a particular keyed item almost as fast as you'd get with a simple array. Consider a file with 10 million SSNs as keys. Now, there are 1 billion possible SSNs (minus those covered by special rules); but you probably don't have the memory for slots representing all 1 billion. If you did, you'd put the info in memory and access it as lastname(ssn) or address(ssn) or whatever. And that sort of access is easy, because calculating the memory location is simple: start + index*length. But you do have memory for the 10 million in your database. A hash table trades computation time for compactness: it costs more to find an item in the hash table but--you can do it without investing in a new motherboard. |
||||||
|
|
2
|
My understanding is that there are additional overheads when working with a hash object. These can be broken down into 3 main areas:
The hash object takes its name from the hashing algorithm that is used to calculate where an observation will be stored in the hash table (ie. in memory). The hashing algorithm is extremely fast though and the result of applying the algorithm to a key value is the exact location in memory it is stored. This means that the hashing algorithm needs to be applied to a key value any time you want to store or retrieve information from the hash table. When dealing with an array the hash algorithm is not required because you explicitly specify the index from the array that you want to store or retrieve information from. The location in memory can be almost instantly determined once the index is supplied. This assumes you already know the array index that contains the data you require. Both hash tables and arrays are loaded into memory. If you don't have sufficient memory the performance of a hash table can severely suffer. (http://support.sas.com/kb/34/193.html) If you wanted to create hash tables in earlier versions of SAS prior to the existence of the hash object then you had to create it manually using arrays. |
||
|
|