### Back from a time out...

I just got back from a short week in the woods:

We were very much bandwidth challenged - I was using my Verizon wireless card but connectivity was spotty to say the least. The best place to receive a signal was not a very convenient one:

We used the second floor balcony and perched the laptop on the railing as far away from the cottage as possible. There we achieve blazing speeds of up to 80/90 kbps! (yes, that is about 10 to 11 Kilobytes per second). So, once I realized that - I just checked out. I would check my email in the morning and the evening - attend to anything that needed immediate attention and either said "sorry, on vacation, but lots of forums out there to try!" or filed it away for later (now being later - I have a scroll bar on my inbox and that drives me nuts). My goal is to achieve this "nirvana" again.

Anyway, it was a very secluded place

and for some of the time the only cottage with anyone in it (there are only three to begin with) was ours. That is my idea of down time. The best part of this vacation - I did not have to pull my wallet out of my back pocket all week - the nearest store was far far away and I wasn't leaving the grounds.

Anyway - was just digging through the email and got a question from someone about how to do something (imagine that). It piqued my curiosity (meaning, it was interesting enough that I didn't just reply "sorry, please don't email me" :) In short, they needed a function that would accept two strings and return Y if the second string was an anagram of the first string, N otherwise. They sent their code with the comment "I think this can be done better".

So I gave it a try. Wow, did it feel good to write a piece of code after a week of not doing much at all computerwise. It woke me right up. Here is my solution:

Now, one up it - any easier approaches you can think of?

ops$tkyte%ORA10GR2> create or replace package anagram_pkg

2 as

3 type anagram_array is table of varchar2(1);

4 function compare

5 ( p_src in varchar2, p_tgt in varchar2 )

6 return varchar2

7 deterministic;

8 end;

9 /

Package created.

ops$tkyte%ORA10GR2> create or replace package body anagram_pkg

2 as

3

4 function compare

5 ( p_src in varchar2, p_tgt in varchar2 )

6 return varchar2

7 deterministic

8 is

9 l_data1 anagram_array default anagram_array();

10 l_data2 anagram_array default anagram_array();

11 l_len number default length(p_src);

12 l_return varchar2(1);

13 begin

14 if ( l_len <> length(p_tgt)

15 or p_src is null

16 or p_tgt is null )

17 then

18 l_return := 'N';

19 else

20 l_data1.extend(l_len);

21 l_data2.extend(l_len);

22 for i in 1 .. l_len

23 loop

24 l_data1(i) := upper(substr( p_src,i,1 ));

25 l_data2(i) := upper(substr( p_tgt,i,1 ));

26 end loop;

27

28 if ( l_data1 = l_data2 )

29 then

30 l_return := 'Y';

31 else

32 l_return := 'N';

33 end if;

34 end if;

35 return l_return;

36 end compare;

37

38 end;

39 /

Package body created.

ops$tkyte%ORA10GR2> select a, b, substr(anagram_pkg.compare(a,b),1,1)

2 from t;

A B S

------------ ------------ -

a N

N

b N

Richard Rihcard Y

Richard RihcaDR Y

Richard Rihhard N

Thomas Kyte They AskTom Y

abba baab Y

abba baaa N

9 rows selected.

In Oracle 9i and before, you can use this query:

select decode( count(*), l_len, 'Y', 'N' )

into l_return

from (select column_value a,

row_number()

over (order by column_value) rn

from TABLE(cast(l_data1 as anagram_array) )

) src,

(select column_value a,

row_number()

over (order by column_value) rn

from TABLE(cast(l_data2 as anagram_array) )

) tgt

where src.rn = tgt.rn

and src.a = tgt.a;

to compare the two collections (lines 28..33). In order to use the query, you will use "create or replace type anagram_array as ..." and create a SQL type. That is, the anagram_type will be a SQL object type - not a PL/SQL type.

## 18 Comments:

"Thomas Kyte" "They AskTom " are anagrams: I found this one wonderful !

Pierre Forstmann

Wow, I didn't know you could do "if ( l_data1 = l_data2 )"...That certainly made your code simple.

I can't imagine a neater way, but if you're going for performance, this algorithm might be the way to go - http://www.gatago.com/comp/programming/17888209.html

This algorithm gives you linear complexity, whereas the sort Oracle must have to do to compare the two collections is worse (the best sorting algorithms are n log n).

But I'd love to see a benchmark. ;)

Fun!

I learned something new with your way... here's my stab at it using basic functions. I have no idea on the performance:

CREATE OR REPLACE FUNCTION is_anagram(

p_src IN VARCHAR2

, p_tgt IN VARCHAR2)

RETURN VARCHAR2 DETERMINISTIC IS

l_src VARCHAR2(32767) DEFAULT UPPER(REPLACE(p_src, ' '));

l_tgt VARCHAR2(32767) DEFAULT UPPER(REPLACE(p_tgt, ' '));

l_return VARCHAR2(1) DEFAULT 'N';

BEGIN

WHILE l_src IS NOT NULL

AND l_tgt IS NOT NULL LOOP

EXIT WHEN LENGTH(l_src) <> LENGTH(l_tgt);

l_src := REPLACE(l_src, SUBSTR(l_tgt, 1, 1));

l_tgt := REPLACE(l_tgt, SUBSTR(l_tgt, 1, 1));

IF l_src IS NULL

AND l_tgt IS NULL THEN

l_return := 'Y';

END IF;

END LOOP;

RETURN l_return;

END is_anagram;

var a varchar2(30);

var b varchar2(30);

exec :a := 'Thomas Kyte'; :b := 'They AskTom';

select decode(count(*), 0,'YES', 'NO') is_anagram

from (

select c

from (

select upper(substr(:a,rownum,1)) c, 1 a, null b

from dual connect by level <= length(:a)

union all

select upper(substr(:b,rownum,1)) c, null, 2

from dual connect by level <= length(:b)

)

where c <> ' '

group by c

having count(a) <> count(b)

);

Wouldn't it be easier to simply convert each letter in the source and target strings using ASCII(UPPER()) and add up the numbers? If the numbers match, the 2 strings are anagrams of each other.

Anonymous - Adding up the ascii values of letters only works for two letter words. For example, 1+2+5=1+3+4, but 1,2 and 5 is not an anagram of 1,3 and 4. The sum of x numbers does not uniquely determine the collection of x numbers. However, if you mapped letters to prime numbers (A-2, B-3, C-5, ...) and multiplied the numbers together, that would work becaus of the fundamental theorem of arithmetic (http://en.wikipedia.org/wiki/Fundamental_theorem_of_arithmetic). I wouldn't recommend this, because the numbers are going to get way too big!

Bryan, product of primes is too big, but the log of the product isn't too bad:

create or replace function anagram_num (s varchar2)

return float

as

type P_type is table of integer index by binary_integer;

P P_type;

n float default 0.0;

begin

if s is null then return 0; end if;

P(1) := 2; P(2) := 3; P(3) := 5; P(4) := 7; P(5) := 11; P(6) := 13; P(7) := 17;

P(8) := 19; P(9) := 23; P(10) := 29; P(11) := 31; P(12) := 37; P(13) := 41; P(14) := 43;

P(15) := 47; P(16) := 53; P(17) := 59; P(18) := 61; P(19) := 67; P(20) := 71; P(21) := 73;

P(22) := 79; P(23) := 83; P(24) := 89; P(25) := 97; P(26) := 101; P(27) := 103;

P(28) := 107; P(29) := 109; P(30) := 113; P(31) := 127; P(32) := 131; P(33) := 137;

P(34) := 139; P(35) := 149; P(36) := 151; P(37) := 157; P(38) := 163; P(39) := 167;

P(40) := 173; P(41) := 179; P(42) := 181; P(43) := 191; P(44) := 193; P(45) := 197;

P(46) := 199; P(47) := 211; P(48) := 223; P(49) := 227; P(50) := 229; P(51) := 233;

P(52) := 239; P(53) := 241; P(54) := 251; P(55) := 257; P(56) := 263;

P(57) := 269; P(58) := 271; P(59) := 277; P(60) := 281; P(61) := 283; P(62) := 293;

for i in 1..length(s) loop

if substr(s, i, 1) <> ' ' then

n := n + log( 10, P(ascii(upper(substr(s, i, 1))) - 33 + 1) );

end if;

end loop;

return n;

end;

/

select decode(anagram_num('ORACLE WORLD'), anagram_num('CLEAR OLD ROW'),'YES', 'NO') is_anagram from dual;

quick test shows that

anagram_pkg used about 20% more cpu than is_anagram (50,000 calls took about 6 cpu seconds for anagram_pkg and about 5 cpu seconds for is_anagram using ALL_OBJECTS and comparing object_name to itself)

anagram_num however used over 80 cpu seconds to do the same.

although for anagram_num, we can get the cpu down to about 10 cpu seconds simply by:

type P_type is table of binary_double index by binary_integer;

P P_type;

n binary_double default 0.0;

using native floating point numbers for the table type instead of integers...

and remember with floats - we might need to use

abs( a-b ) <= some_tolerance

instead of just equals.... Using the original anagram_num (with integers)

ops$tkyte%ORA10GR2> variable letters varchar2(52)

ops$tkyte%ORA10GR2> column l format a20 word_wrap

ops$tkyte%ORA10GR2> column rl format a20 word_wrap

ops$tkyte%ORA10GR2> exec :letters := 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> select :letters l,

2 reverse(:letters) rl,

3 anagram_num( :letters ) - anagram_num( reverse(:letters) ) diff,

4 decode( anagram_num( :letters ) ,

5 anagram_num( reverse(:letters) ),

6 'Y', 'N' )

7 from dual;

L RL DIFF D

-------------------- -------------------- ---------- -

abcdefghijklmnopqrst ZYXWVUTSRQPONMLKJIHG 1.0000E-36 N

uvwxyzABCDEFGHIJKLMN FEDCBAzyxwvutsrqponm

OPQRSTUVWXYZ lkjihgfedcba

ops$tkyte%ORA10GR2>

ops$tkyte%ORA10GR2> select anagram_pkg.compare(:letters,reverse(:letters)),

2 is_anagram(:letters,reverse(:letters))

3 from dual;

ANAGRAM_PKG.COMPARE(:LETTERS,REVERSE(:LETTERS))

-------------------------------------------------------------------------------

IS_ANAGRAM(:LETTERS,REVERSE(:LETTERS))

-------------------------------------------------------------------------------

Y

Y

those strings appear different.

CREATE OR REPLACE FUNCTION ARE_ANAGRAMS

(p_str1 VARCHAR2, p_str2 VARCHAR2)

RETURN VARCHAR2 AS

l_len number := length(p_str1);

l_str varchar2(32767) := upper(p_str2);

BEGIN

if length(p_str2) <> l_len or p_str1 is null or p_str2 is null then

return 'N';

end if;

for i in 1..l_len loop

l_str := translate (l_str,upper(substr(p_str1,i,1)),'x');

end loop;

if rpad ('x',l_len,'x') = l_str then

return 'Y';

else

return 'N';

end if;

END;

A variation:

CREATE OR REPLACE FUNCTION are_anagrams2(p_str1 VARCHAR2, p_str2 VARCHAR2) RETURN VARCHAR2 AS

l_len NUMBER := LENGTH(p_str1);

BEGIN

IF l_len <> LENGTH(p_str2) OR p_str1 IS NULL OR p_str2 IS NULL THEN

RETURN 'N';

END IF;

IF rpad(TRANSLATE(upper(p_str1), upper(p_str2), '+'), l_len, '+') = rpad('+', l_len, '+') THEN

RETURN 'Y';

ELSE

RETURN 'N';

END IF;

END;

Albert Nelson A

Translate won't be extremely useful here - neither are_anagrams* work correctly:

SQL> select anagram_pkg.compare( 'baa', 'aaa' ) a,

2 are_anagrams( 'baa', 'aaa' ) b

3 from dual;

A B

- -

N Y

SQL> select anagram_pkg.compare( 'bbab', 'abba' ) a,

2 are_anagrams2( 'bbab', 'abba' ) b

3 from dual;

A B

- -

N Y

nice discussion.

Nobody thought of a recursive

solution,

which should be simple enough like the following

(am on holidays without any Oracle

and cannot test it though):

create function is_rec_anagram

(s1 in varchar2, s2 in varchar2) return varchar2 deterministic is

n integer;

begin

if s1 is null then

return

case when s2 is null

then 'Y' -- '' and '' are anagrams I would say

else 'N'

end;

end if;

n := instr(upper(s2),

upper(substr(s1,1,1)));

return

case n when 0 then 'N'

else

is_rec_anagram(

substr(s1,2),

concat(substr(s2,1,n-1) /* probably false if n=1, to test */,

substr(s2,n+1)))

end;

end is_rec_anagram;

Are those

<pre>tags?yes, they are pre tags :)

but only because live writer didn't change the spaces into

technically did not *need* the pre tag, but I agree it makes it easier.

Another approach ... not sure about the efficiency, though.

CREATE OR REPLACE FUNCTION anagram_test( p_str1 IN VARCHAR2, p_str2 IN VARCHAR2 )

RETURN VARCHAR2

IS

l_count NUMBER;

l_retval CHAR(1) := 'N';

BEGIN

IF (p_str1 IS NOT NULL) AND

(p_str2 IS NOT NULL) AND

(LENGTH(p_str1) = LENGTH(p_str2))

THEN

l_count := 0;

FOR i IN 1..LENGTH(p_str1) LOOP

l_count := l_count + NVL(LENGTH(REPLACE(p_str2,SUBSTR(p_str1,i,1),'')),0)

- NVL(LENGTH(REPLACE(p_str1,SUBSTR(p_str1,i,1),'')),0);

END LOOP;

IF l_count = 0 THEN

l_retval := 'Y';

END IF;

END IF;

RETURN l_retval;

END;

This is both interesting and fun.

One of my interests is in fuzzy address matching technology. You can use soundex, double metaphone, Levenshtein distance, etc. I really like using the Levenshtein distance, but one limitation is that I have rarely seen one that also counts adjacent character swaps as a distance of one instead of two. I wonder if some of the code used for this anagram function could be adapted and incorporated into some of the code used for LD to create a better LD function?

I just stumbled upon this old thread and saw Cameron's comment about,

"I really like using the Levenshtein distance, but one limitation is that I have rarely seen one that also counts adjacent character swaps as a distance of one instead of two."

There is a Damerau-Levenshtein distance algorithm that does that:

http://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance

I took the pseudo-code from the link above and applied it to my existing ld function, changing most of the variable names for easy comparison, and here is the result:

SCOTT@orcl_11g> CREATE OR REPLACE FUNCTION dld -- Damerauâ€“Levenshtein distance

2 (str1 VARCHAR2 DEFAULT NULL,

3 str2 VARCHAR2 DEFAULT NULL)

4 RETURN NUMBER DETERMINISTIC

5 AS

6 lenStr1 NUMBER := NVL (LENGTH (str1), 0);

7 lenStr2 NUMBER := NVL (LENGTH (str2), 0);

8 TYPE mytabtype IS TABLE OF NUMBER INDEX BY BINARY_INTEGER;

9 TYPE myarray IS TABLE OF mytabtype INDEX BY BINARY_INTEGER;

10 d myarray;

11 cost NUMBER := 0;

12 BEGIN

13 IF str1 = str2 THEN

14 RETURN 0;

15 ELSIF lenStr1 = 0 OR lenStr2 = 0 THEN

16 RETURN GREATEST (lenStr1, lenStr2);

17 ELSIF lenStr1 = 1 AND lenStr2 = 1 AND str1 <> str2 THEN

18 RETURN 1;

19 ELSE

20 FOR j in 0 .. lenStr2 LOOP

21 d (0) (j) := j;

22 END LOOP;

23 FOR i in 1 .. lenStr1 LOOP

24 d (i) (0) := i;

25 FOR j in 1 .. lenStr2 LOOP

26 IF SUBSTR (str1, i, 1) = SUBSTR (str2, j, 1) THEN

27 cost := 0;

28 ELSE

29 cost := 1;

30 END IF;

31 d (i) (j) := LEAST (

32 d (i-1) (j) + 1, -- deletion

33 d (i) (j-1) + 1, -- insertion

34 d (i-1) (j-1) + cost -- substitution

35 );

36 IF i > 1 AND j > 1

37 AND SUBSTR (str1, i, 1) = SUBSTR (str2, j-1, 1)

38 AND SUBSTR (str1, i-1, 1) = SUBSTR (str2, j, 1)

39 THEN

40 d (i) (j) := LEAST (

41 d (i) (j),

42 d (i-2) (J-2) + cost -- transposition

43 );

44 END IF;

45 END LOOP;

46 END LOOP;

47 RETURN d (lenStr1) (lenStr2);

48 END IF;

49 END dld;

50 /

Function created.

SCOTT@orcl_11g> SELECT ld ('Kyte', 'Ktye'),

2 dld ('Kyte', 'Ktye')

3 FROM DUAL

4 /

LD('KYTE','KTYE') DLD('KYTE','KTYE')

----------------- ------------------

2 1

SCOTT@orcl_11g>

POST A COMMENT

<< Home