Stringing them up...
Suppose someone asks you:
I need to take the results of a query and pivot a value. That is, I would like output from the EMP table to look like this:
DEPTNO ENAMES
------- ------------
10 clark king miller
20 adams ford …
…
can this be done in just SQL?
With the addition of analytic functions in Oracle 8i Release 2 and the SYS_CONNECT_BY_PATH() function in Oracle 9i Release 1 – this became something we can in fact do rather easily in SQL. The approach we will take is:
- Partition the data by DEPTNO and within each DEPTNO sort the data by ENAME and assign a sequential number using the ROW_NUMBER() analytic function
- So, we end up with a record eventually that is the result of connecting 1 to 2 to 3 to 4 and so on for each DEPTNO
- The SYS_CONNECT_BY_PATH() function will return the list of ENAMES concatenated together for us.
The query would look like this:
SQL> select deptno,
2 max(sys_connect_by_path(ename, ' ' )) scbp
3 from (select deptno, ename,
row_number() over
(partition by deptno order by ename) rn
4 from emp
5 )
6 start with rn = 1
7 connect by prior rn = rn-1 and prior deptno = deptno
8 group by deptno
9 order by deptno
10 /
DEPTNO SCBP
---------- ----------------------------------------
10 CLARK KING MILLER
20 ADAMS FORD JONES SCOTT SMITH
30 ALLEN BLAKE JAMES MARTIN TURNER WARD
I used to use STRAGG for this (prior to sys_connect_by_path and analytics) - but now find this approach preferable.


43 Comments:
why? I like your stragg
why? I like your stragg
no need to create the type and type body to implement the user defined aggregate.
This is "cleaner"
indeed, that looks nicer than stragg
is
select deptno, scbp from
(
select deptno, sys_connect_by_path(ename, ' ' ) scbp, co, rn
from (
select deptno, ename,
row_number() over (partition by deptno order by ename) rn,
count(*) over (partition by deptno) co
from scott.emp
)
start with rn=1
connect by prior rn = rn-1 and prior deptno = deptno
)
where co=rn
order by deptno
an improvement or a worsening ?
Firstly, I find that it is more difficult to understand what the code is supposed to do, and it is much more easy (for most of us) to put an error in the connect by as by reusing YOUR stragg function.
Secondly, I think connect by will not scale well
A little test (well, it is just a test, it is not supposed to prove anything)
set timi on lin 9999 trims on newp none pages 0 termout off
spool /tmp/stragg.txt
select trunc(object_id/100),stragg(object_name)
from all_objects
group by trunc(object_id/100)
-- having stragg(object_name)='XYZ';
spool off
spool /tmp/connect.txt
select o,
max(sys_connect_by_path(object_name, ' ' )) scbp
from (select trunc(object_id/100) o, object_name,
row_number() over
(partition by trunc(object_id/100) order by object_name) rn
from all_objects
)
start with rn = 1
connect by prior rn = rn-1 and prior o = o
group by o
-- having max(sys_connect_by_path(ename, ' ' ))='XYZ'
/
spool off
the first query finished in 5 secs. After 3 min, the second one ended with ORA-30004: when using SYS_CONNECT_BY_PATH function, cannot have seperator as part of column value (why?)
If I replace all_objects by user_objects the first try takes 0.03 sec and the second one takes 0.35.
So when used with such a complex view as ALL_OBJECTS, connect by artifice is not very attractive to me.
I hope you write a book on complex sql u sing oracle features such as the model clause(very little documentation on this) and analytic functions.
these are often difficult to grasp for complex queries.
Yikes. ORA-30004 seems like enough of a reason to stay clear of this. Although you could ofcourse "fix" it with TRANSLATE(sys_connect_by_path(ename, CHR(1)), CHR(1), ' ').
The performance problems would definitely push me over the edge and stick with stragg, though. If I ever had a need for either...
I get the same ORA-30004 using the analytics approach if any of the strings in which I'm concatenating contain the character that is the same as the delimiter (space in this case). So, if I have a table called "my table" and the delimiter is ' ', then the ORA-30004 occurs. Try using sys_connect_by_path(object_name, ',' ).
How about a way to pivot but include duplicates? Picture a 1:M relationship, where the M may not necessarily be unique. Can that be done using sys_connect_by_path?
Funny this topic should come up. I just recently wrote a whole bunch of tutorials on a lot of the things discussed here.
Neil on wheels, I've got examples that show how to include or exclude duplicates with the sys_connect_by_path approach at
SQL Snippets: Delimited Strings - Rows to String - Hierarchical Method.
Those of your interested in comparing the sys_connect_by_path, STRAGG, and other approaches may want to check out the feature comparison chart at SQL Snippets: Delimited Strings - Rows to String and the related tutorials whose links are listed in that page.
For the anonymous user who asked about the MODEL clause, check out SQL Snippets: Delimited Strings - Rows to String - MODEL Method and SQL Snippets: SQL Features Tutorials - MODEL Clause.
Finally, there's a performance comparison chart of all the methods covered in the Delimited String tutorials at SQL Snippets: Delimited Strings - Rows to String - Performance Comparison Charts. The sys_connect_by_path approach turned out to use way more latches than the STRAGG approach so Laurent might be right about sys_connect_by_path not scaling as well as STRAGG.
Hope you find the info. useful.
BTW, I failed to make something clear in my last comment. The sys_connect_by_path solution I talk about in my tutorials differs slightly from Tom's solution in the original post. The performance comparison chart I mentioned compares STRAGG with my version of sys_connect_by_path, not Tom's.
Sorry for any confusion.
Laurent,
In your test, you must also consider that the analytic/sys_connect_by_path solution is giving you an ordered string aggregate, which is not true for the original stragg solution.
regards
Freek
One approach is to create and use a simple dynamic-SQL function called "listify":
SELECT DISTINCT e.deptno,
listify('select ename from emp e2 where e2.deptno='||e.deptno,' ')
FROM emp e
CREATE OR REPLACE FUNCTION listify
(sql_stmt_in IN VARCHAR2,
separator_in VARCHAR2)
RETURN VARCHAR2
/*
* Created 3/22/07 J. Martin to return a single value from a list of arbitrary items returned by
* an input SQL statement.
* The input SQL statement should select exactly one character column, less than 4000 characters.
* The returned value is up to 4000 characters, if truncation occurs then an ellipsis (...) is
* appended after the last value that can be fit into 4000 characters.
*/
IS
return_string VARCHAR2(4000):='';
list_item VARCHAR2(4000):='';
TYPE cur_typ IS REF CURSOR;
get_list cur_typ;
BEGIN
OPEN get_list FOR sql_stmt_in;
LOOP
FETCH get_list INTO list_item;
EXIT WHEN get_list%NOTFOUND;
IF list_item IS NOT NULL THEN
IF LENGTH(list_item) + NVL(LENGTH(return_string),0) + NVL(LENGTH(separator_in),0) + 4 < 4000 THEN
IF return_string IS NULL
THEN return_string := list_item;
ELSE return_string := return_string || separator_in || list_item;
END IF;
ELSE return_string := return_string || ' ...';
EXIT;
END IF;
END IF;
END LOOP;
CLOSE get_list;
RETURN(return_string);
END;
/
This requires a parse for each record, but is simple and has linear performance.
Thoughts?
I find it neat to use SQLX for pivoting as below. What do you think of it?
select deptno,
substr(replace(replace(xmlagg(xmlelement("x",ename) order by ename),'</x>'),'<x>',' '),2) scbp
from emp
group by deptno
order by deptno;
SnippetyJoe:
thanx for pointing out
connect_by_isleaf
Hi Tom the same stringing rows I posted in my blog some time back on topic Concatenating Rows in a Table...
After seeing your post I was happy that I have not explained it wrong..
But I wonder - has anyone ever submitted an Enhancement Request asking to implement a string concatenation grouping/analytic function ?
Something like xmlagg() which is the neatest solution in my opinion (but watch out for special characters such as "greater than" and "ampersand", that gets converted to their HTML/XML s escape representation ;) )
I think using SQL model clause ( Parallel processing using "Partition by (dept ) ") can combines the best of both worlds.
Frank
SELECT fin_str
FROM (
SELECT fin_str, dept, name FROM t
MODEL
PARTITION BY (dept)
DIMENSION BY (row_number() over (PARTITION BY dept ORDER BY name) rn )
RULES (
name_Str[ANY] ORDER BY rn =
CASE WHEN name[cv() - 1 ] IS NULL
THEN name[cv()]
ELSE name_Str[cv()-1]||','|| name[cv()]
END,
fin_str[ANY] ORDER BY rn = CASE WHEN name[cv()+1] IS NULL
THEN name_Str[cv()]
END
)
WHERE fin_str is not null
Or
http://www.jlcomp.demon.co.uk/faq/Stragg_Queries.html
From http://vadimtropashko.files.wordpress.com/2007/02/ch3.pdf
with concat_enames as (
select deptno, sys_connect_by_path(ename,',') aggr, level depth
from emp e
start with ename=(select min(ename) from emp ee
where e.deptno=ee.deptno)
connect by ename > prior ename and deptno = prior deptno
) select deptno, aggr from concat_enames e
where depth=(select max(depth) from concat_enames ee
where ee.deptno = e.deptno);
alberto:
it looks to me nobody here needs an
enhancement due to the abundance of solutions.
have you submitted an enhancement request?
Anonymous:
it looks to me that every solution here has its issues (or complications): "stragg" is neat but requires pre-install and context switches to the pl/sql engine; "connect by level" syntax is not intuitive plus the other gotchas; "model" clause syntax is a tad ugly; the closest to ideal is xmlagg, but has its gotchas as well with special characters.
Nothing would beat a natively implemented function modeled after xmlagg(); say:
select concat (ename order by ename) from emp;
and the analogous analytic function.
I have submitted no ER.
Very interesting!
A tip for people interested in the xmlagg() trick - xmlelement() converts the 5 XML special characters (ampersand, less than, greater than, single-quote and double-quote) in their alternative entity rapresentation.
To get them back after they are aggregated by xmlagg(), there's the handy dbms_xmlgen.convert() utility - just pass 1 as the second parameter.
Got the dbms_xmlgen trick from Paweł Barut.
Very Good article , this article make some interesting points.
Tactical Flashlights
r c helicopter
video game
Tactical Flashlight
it is very good.I love Oracle.Thanks
情趣用品,情趣用品,情趣用品,情趣用品,情趣用品,情趣用品,情趣,情趣,情趣,情趣,情趣,情趣,情趣用品,情趣用品,情趣,情趣,A片,A片,情色,A片,A片,情色,A片,A片,情趣用品,A片,情趣用品,A片,情趣用品,a片,情趣用品
A片,A片,AV女優,色情,成人,做愛,情色,AIO,視訊聊天室,SEX,聊天室,自拍,AV,情色,成人,情色,aio,sex,成人,情色
免費A片,美女視訊,情色交友,免費AV,色情網站,辣妹視訊,美女交友,色情影片,成人影片,成人網站,H漫,18成人,成人圖片,成人漫畫,情色網,日本A片,免費A片下載,性愛
情色文學,色情A片,A片下載,色情遊戲,色情影片,色情聊天室,情色電影,免費視訊,免費視訊聊天,免費視訊聊天室,一葉情貼圖片區,情色視訊,免費成人影片,視訊交友,視訊聊天,言情小說,愛情小說,AV片,A漫,AVDVD,情色論壇,視訊美女,AV成人網,成人交友,成人電影,成人貼圖,成人小說,成人文章,成人圖片區,成人遊戲,愛情公寓,情色貼圖,色情小說,情色小說,成人論壇
a片下載,線上a片,av女優,av,成人電影,成人,成人貼圖,成人交友,成人圖片,18成人,成人小說,成人圖片區,成人文章,成人影城,成人網站,自拍,尋夢園聊天室
A片,A片,A片下載,做愛,成人電影,.18成人,日本A片,情色小說,情色電影,成人影城,自拍,情色論壇,成人論壇,情色貼圖,情色,免費A片,成人,成人網站,成人圖片,AV女優,成人光碟,色情,色情影片,免費A片下載,SEX,AV,色情網站,本土自拍,性愛,成人影片,情色文學,成人文章,成人圖片區,成人貼圖
Hi,
I am trying to get the user objects by procedure,tables and others( turning rows in to columns for each object_type) using below query. But all objects are displaying along with NULL's. How can I get rid of the NULL values between the objects.
SELECT TABLE_NAMES,PROCEDURE_NAMES,VIEW_NAMES,INDEX_NAMES
FROM (
SELECT
MAX(CASE WHEN OBJECT_TYPE='TABLE' THEN OBJECT_NAME END) TABLE_NAMES,
MAX(CASE WHEN OBJECT_TYPE='PROCEDURE' THEN OBJECT_NAME END) PROCEDURE_NAMES,
MAX(CASE WHEN OBJECT_TYPE='VIEW' THEN OBJECT_NAME END) VIEW_NAMES,
MAX(CASE WHEN OBJECT_TYPE='INDEX' THEN OBJECT_NAME END) INDEX_NAMES
FROM USER_OBJECTS
GROUP BY OBJECT_NAME);
Thanks,
Venkat
@Venkat
please specify your question in the form of a specification, similar to one you would give a programmmer, pretend I'm your "mother", explain it in enough detail with examples/whatever so it is understandable.
I'm pretty sure I know what you mean, I think - I THINK - you want four columns - each with a list of as many objects of that type as there are - sort of like you might have in a spreadsheet if you typed this stuff in - but I want you to enter a specification that one can generate code from.
I like this example and I have a case where my strings I need to concatenate are 4000 in size, therefore the resulting string becomes a clob. Also in the mix is that the table starts with 18 million records, and the resulting table once fields are concatenated will be 6+ million records. Is there a way to do this that will not take hours and hours and hours? I've been trying the function route, but cannot get it to finish after 7+ hours of running. I'd like to find a solution that is at least under 4 hours. Any ideas are appreciated.
@Deb,
you would on average turn 3 records into 1.
why bother? Why string these up? What would that accomplish?
What is the goal here.
I have to concatenate them because they are going in Matrix One as 1 single attribute on an object. This is a data migration from an oracle home-grown system to a full install of Matrix One (oracle-based) and we are loading the MatrixOne database via a tool that generates sql loader files. It's a long story but concatenating into a single column is the only solution for this piece of data.
@deb
if you
select max(sum_len)
from(
select [columns you would group by], sum(length(column-to-concat)) sum_len
...
)
what is the max length?
Hi Tom
The max length on some of the records exceeds 32767. Since my last post to you, I have converted code to use DBMS_SQL package and I can successfully insert into the LONG column anything < 32767. However, I am still stuck with the stuff > 32767. I tried piece-wise inserting by concatenation where my dynamical sql statement with bind variables looked like:
"...,:b1||:b2||:b3...)" where that is the column that is the LONG and each of the variables going into b1, b2, b3...were 32767 sized chunks of the original text. This didn't work either.
Seems like I am at the point were I need to consider SQLLDR or something else, not sure. Seems like this part cannot be done in PL/SQL. Is that correct? If so, what are my options? I cannot change the target schema because MatrixOne does not use CLOBs, only LONGs.
Any help is appreciated.
@deb,
your options are very (extremely) limited.
PLSQL - not a chance, it can piecewise READ a long, but that is it.
SQL - not directly, not just with SQL, we'll need an API
You will have to write code in some 3gl (java, C, VB, whatever) to do this. You could either
a) produce a file for sqlldr from this 3gl - plsql cannot produce it because plsql cannot generate a line exceeding 32k (plsql will inject a linebreak whether you want it or not)
b) have the program read the rows, construct the "long" and then "stream" it into the other table.
Very usefule example.
I have a scenario in which i have to do multiple aggregation. i.e. Something which displays
Name----------Likes---------Dislikes
Tom.......Apple, Orange....Mango, Banana
Bob........Blue Berry.........Apple, Banana
As you can see there are multiple columns with aggregated stuff. Any idea how to extend this approach to achieve it.
@Anonymous
simple, just use sys_connect_by_path more than once
They have finally implemented a version of STRAGG natively in 11GR2 as LISTAGG. Very nice, I wonder whether this blog discussion has contributed to the design ;)
Hi,
I'm having a hard time trying to work this out when trying to do multiple joins. Any tips? I am new to Oracle, so any optimization red flags appreciated as well.
SELECT DISTINCT d.ppuid ,
d.product_type ,
d.provider ,
d.title ,
SUBSTR(MAX(REPLACE(SYS_CONNECT_BY_PATH(metadata_value, '/'),'/','; ')),2) Concatenated_Keywords
FROM qpcin_rr_cat.cat_prod_metadata_pub a,
qpcin_rr_cat.cat_product_pub b ,
qpcin_rr_repo.inv_ingestion_event d
WHERE a.product_pub_id = b.id
AND b.uuid = d.product_uuid
AND d.status = 'SUCCESS'
AND a.metadata_name = 'keywords'
(SELECT cpmp.*,
row_number() OVER (order by seq_num) ROW#
FROM QPCIN_RR_CAT.CAT_PROD_METADATA_PUB cpmp
WHERE CPMP.PRODUCT_PUB_ID = 132214423
AND metadata_name = 'keywords'
) START
WITH ROW#=1 CONNECT BY PRIOR row# = row# -1
I'm using oracle developer.
@last anonymous
No one can answer your question, because ....
we don't know what you are really trying to do.
But, let's say you have a query
select c1, c2, c3, c4,..., cn, STRING
from .....
where .....
And you wanted to get a row for each distinct c1, c2, c3, c4, ... cn and havae STRING be "concatenated"
that would be
select c1, c2, ... , cn,
max(sys_connect_by_path(STRING, ' ' )) scbp
from (
select c1, ... , cn, string,
row_number() over ( partition by c1, ..., cn order by string )rn
from ....
where ....
)
start with rn = 1
connect by prior rn = rn-1 and prior c1 = c1 and prior c2 = c2 and .... prior cn = cn
group by c1, ... cn
order by c1, ... cn
10 /
Hi -- (Hamsandwhich here,aka most recent anonymous guy). Thanks for the tips. I'll explain a bit more what I am trying to do. I have a table that has a field named 'metadata_name', and another field called 'metadata_value'. For each product, I want to only pull metadata_name fields that are called 'keywords', and concatenate any of the metadata_value fields into a single string. So instead of this:
keywords cat
keywords hat
keywords bat
I want keywords cat|hat|bat
A couple things that are tripping me up though. The first is that I have an unknown number of keywords that need to be concatenated, and a large number of products ( ppuids) that would have their own corresponding concatenated strings. To make things extra fun, I need to access some other tables to get contextual info like product title, etc. So I think I want to do a select of selects, and also join a few tables. Not sure how to integrate the your example when I want to output more fields than just the field that has the concatenated values?
To use your example, it would be like adding address and phone numbers to your ename data... if you had to do your example while accessing other tables to get address/phone numbers, how would you do that?
Hope that made sense. :)
@Hamsandwhich said...
I can only generalize so far, you draw a simple example:
keywords cat
keywords hat
keywords bat
I want keywords cat|hat|bat
*and then* start throwing in things like ppuid (what?) and "access some other tables to get contextual info like product title, etc. "
I tried to generalize - I'll do it again, only verbally this time.
Get your query (the one that returns all of the data)
segregate your columns into one of two categories
category a: your "key", things to group by
category b: your columns to aggregate - the things to string up.
now, add to that list row_number() over (partition by CATEGORYA order by WHATEVER YOU WANT)
then
select categorya,
max(sys_connect_by_path())
max(sys_connect_by_path())
....
/* max as many times as you have elements in categoryb */
from (YOUR_QUERY_WITH_ROW_NUMBER)
start with rn =1
connect by rn = prior rn+1
and categorya_c1 = prior categorya_c1 ....
group by categorya;
Unless you give a CONCRETE example with "YOUR_QUERY", this is as far as well get.
Okay, thanks. Sorry for the vagueness. I appreciate the help.
I found that this query works when looking at a single item:
SELECT metadata_name,
MAX(sys_connect_by_path(metadata_value, '|' )) concatenated_keywords
FROM
(SELECT cpmp.*,
row_number() over (partition BY metadata_name order by metadata_value) rn
FROM qpcin_rr_cat.cat_prod_metadata_pub cpmp
WHERE cpmp.metadata_name = 'keywords'
AND cpmp.product_pub_id = '132214423'
) START
WITH rn = 1 CONNECT BY prior rn = rn-1
AND prior metadata_name = metadata_name
GROUP BY metadata_name
ORDER BY metadata_name
The output looks like this:
keywords |HOG|Harley|Harley-Davidson|american|biker|bikes|legend|logo|motorcyles|power|tough
This query/output represents one item of my table though. What if I want to remove the part of the query restricting the results to id '132214423', and want the output to include all ids? In other words, I want to group all the keyword data together, but I want to do that for every id. A grouping of groupings. I'm trying to figure out how to select the results from the first select. Does that make sense?
@Hamsandwhich said...
just change
metadata_name
to
metadata_name, product_pub_id
in the query (your categoryA is metadata_name, id)
and change the connect by to include product_pub_id
SELECT metadata_name, product_pub_id,
MAX(sys_connect_by_path(metadata_value, '|' )) concatenated_keywords
FROM
(SELECT cpmp.*,
row_number() over (partition BY metadata_name, product_pub_id order by metadata_value) rn
FROM qpcin_rr_cat.cat_prod_metadata_pub cpmp
WHERE cpmp.metadata_name = 'keywords'
) START
WITH rn = 1 CONNECT BY prior rn = rn-1
AND prior metadata_name = metadata_name and prior product_pub_id = product_pub_id
GROUP BY metadata_name, product_pub_id
ORDER BY metadata_name, product_pub_id
AH... thank you very much sir! I think I get it now.
POST A COMMENT
<< Home