About Me

Sunday, 13 November 2011

Hashing in Oracle


What is Hashing?
A hash function returns a value for an input, and the output is generally shorter or more compact than the input. For a relatively simple hash function, there is no guarantee that inputs map one to one for outputs, that is, two different inputs can have the same hashed output value.
Before getting more into where hashing takes place in Oracle, let’s look at some examples of how hashing is used.The table of names takes the sum of the ASCII values of each letter, sums them, and then lists the mod 13 remainder. The table is reproduced below with an additional column to help illustrate what is taking place.


create table hash_example
(lastname varchar2(12),
  ascii_sum number,
  hashed_value number);
insert into hash_example (lastname) values ('Brecker');
insert into hash_example (lastname) values ('Corea');
insert into hash_example (lastname) values ('Davis');
insert into hash_example (lastname) values ('Hancock');
insert into hash_example (lastname) values ('Harris');
insert into hash_example (lastname) values ('Marsalis');
insert into hash_example (lastname) values ('Parker');
commit;

declare
  cursor c is
  select lower(lastname) lastname 
  from hash_example 
  for update;
  l_name hash_example.lastname%type;
  l_sum hash_example.ascii_sum%type := 0;
  l_tmp number := 0;
begin
  for r in c loop
    for i in 1..length(r.lastname) loop
      l_tmp := ascii(substr(r.lastname,i,1));
      l_sum := l_sum + l_tmp;
    end loop;
    update hash_example
    set ascii_sum = l_sum,
        hashed_value = mod(l_sum,13)
    where current of c;
    l_tmp := 0;
    l_sum := 0;
  end loop;
end;
/

PL/SQL procedure successfully completed.

SQL> select * from hash_example;

LASTNAME      ASCII_SUM HASHED_VALUE
------------ ---------- ------------
Brecker             734            6
Corea               522            2
Davis               535            2
Hancock             727           12
Harris              649           12
Marsalis            860            2
Parker              645            8



Overall, this was a pretty simple example. Given that you can pass in a value and get an output based on the hashing algorithm, and that the output is like a fingerprint of the original data.

Hashing in a Database

How does Oracle know if a SQL statement is new or already parsed? A parsed statement has a hash value that is by and large unique enough (collisions, or same output for different inputs can happen, but are rare). Shown below is an extract from a trace file, and the lines show each statement’s identifying information.
1 PARSING IN CURSOR #1 len=46 dep=0 uid=5 hv=810835083 ad='1d9cd494'
select ename from scott.emp where empno = 7369
2 PARSING IN CURSOR #2 len=46 dep=0 uid=5 hv=1290101743 ad='1d9cca00'
select ename from scott.emp where empno = 7499
3 PARSING IN CURSOR #1 len=46 dep=0 uid=5 hv=1589217292 ad='1d9cc7c4'
SELECT ENAME FROM SCOTT.EMP WHERE 7499 = EMPNO
4 PARSING IN CURSOR #2 len=46 dep=0 uid=5 hv=810835083 ad='1d9cd494'
select ename from scott.emp where empno = 7369
Statements 1, 2 and 3 are different, so each has its own hashed value (the hv value). Statements 1 and 4 are the same, and note both have the same hv value. One of the more desirable characteristics of a hashing function is that it should be quite fast and inexpensive to perform. If your statement has already been parsed (has a hash ID value), then a comparison is all that is needed. As pointed out in numerous other sources, hard parsing is undesirable. Re-use of statements can be accomplished two ways. One is via the use of bind variables, and the other is to embed commonly used statements in functions or procedures.

Where else is hashing used?

Hash partitioning is useful for data that doesn’t cleanly fall into categories such as range or list. Composite partitioning such as range-hash can be used to further categorized and distribute data. In these examples, the indexing feature or benefit is being used.


No comments:

Post a Comment