Just wanted to answer a question on StackOverflow, where someone had a data model like this:
CREATE TABLE tvalues (cid NUMBER, cvalue NUMBER); INSERT INTO tvalues VALUES(1, 200); INSERT INTO tvalues VALUES(2, 1500); INSERT INTO tvalues VALUES(3, 3200); INSERT INTO tvalues VALUES(4, 4500); COMMIT; SELECT * FROM tvalues;
CID CVALUE ------------ 1 200 2 1500 3 3200 4 4500
The question was how to get the cid of the row so that a given value is between the value of the previous row and the value of the row for which to return the id (e.g. given value = 4000 should return id=4). Obviously, since in a relational database system, rows do not have an inherent order (there is no "next" or "previous" row), we need a sort criteria which I simply assumed to be the CID column.
The solution I had in mind was to use an analytical function like LEAD() or LAG(). These functions allow to access the next or the previous row (see how the analytical function requires an ORDER definition, otherwise it would not know what the next or previous row is):
SELECT cid, LAG(cvalue, 1, 0) OVER(ORDER BY cid) cmin, cvalue cmax FROM tvalues;
CID CMIN CMAX -------------------------- 1 0 200 2 200 1500 3 1500 3200 4 3200 4500
This returns a nice range-wise view of the data. Now the final solution should be easy: simply restrict the result set so that the given value is between the minimum and the maximum value:
SELECT cid, LAG(cvalue, 1, 0) OVER(ORDER BY cid) cmin, cvalue cmax FROM tvalues WHERE cmin < 4000 AND cmax > 4000;
However, this query returns an error message:
ORA-00904: "CMAX": invalid identifierThe reason is that it is generally not possible to use column aliases in the WHERE clause (or in any other part of the SELECT statement, with a few exceptions like ORDER BY). This also has nothing to do with the analytical functions - the following simple query does not work either:
SELECT cvalue avalue FROM tvalues WHERE avalue > 1000;
ORA-00904: "AVALUE": invalid identifier 00904. 00000 - "%s: invalid identifier"
The reason is that the select list (the projection) is the last part which is processed in an SQL statement, hence the SQL parser does not yet know the column aliases when it processes other parts of the query, like the WHERE statement (btw, while investigating this, someone else posted a very similar answer to the original question).
The final solution is easy: the query just needs to be wrapped into another query as a sub query:
SELECT * FROM ( SELECT cid, LAG(cvalue, 1, 0) OVER(ORDER BY cid) cmin, cvalue cmax FROM tvalues ) WHERE cmin < 4000 AND cmax > 4000;
CID CMIN CMAX -------------------------- 4 3200 4500
In Hierarchical queries using the Oracle RDBMS - Determine ordering within each level, I showed how to sort the rows of a hierarchical SQL query within each tree level by using ORDER SIBLINGS BY instead of ORDER BY.
There is one detail to keep in mind, though: even the ORDER SIBLINGS BY is done on the overall result of the query, which can cause problems when using UNION ALL to take additional rows into the result:
SELECT 0, 0, 'xyz' FROM DUAL UNION ALL SELECT LEVEL, Id, substr( LPAD (' ', 3 * (LEVEL - 1)) || Label, 0, 30) Node FROM TreeNode START WITH Parent = 0 CONNECT BY PRIOR Id = Parent ORDER SIBLINGS BY Label;
ERROR at line 7: ORA-30929: ORDER SIBLINGS BY clause not allowed here
Using parantheses to explicitly group the second select within the UNION ALL does not help, but there is a simple solution: just wrap the hierarchical query into a sub select, and then do a SELECT * from the sub select:
SELECT 0, 0, 'xyz' FROM DUAL UNION ALL SELECT * FROM ( SELECT LEVEL, Id, substr( LPAD (' ', 3 * (LEVEL - 1)) || Label, 0, 30) Node FROM TreeNode START WITH Parent = 0 CONNECT BY PRIOR Id = Parent ORDER SIBLINGS BY Label );
0 0 'XYZ' ------------------ ------------------ --------- 0 0 xyz 1 1 Root Node 2 2 Node 1 3 3 Node 1.1 3 4 Node 1.2 ...
This article shows how to execute hierarchical queries using the Oracle database: Hierarchical queries using the Oracle RDBMS
The GROUP BY clause is used to group selected rows by a certain expression, means it groups all rows for which the given expression returns the same value. Then, for each group, a single result is calculated by using aggregate functions (the number of result rows is the same as the number of groups). See Oracle® Database SQL Language Reference: group_by_clause for more information.
As an example, lets assume we have some revenue information for each month:
Description Month Revenue ----------- --------- ------- Deal 1 January 100 Deal 2 January 150 Deal 3 February 200 Deal 4 February 50 Deal 5 March 120
Using GROUP BY, we can group the data by Month and calculate a single result for each group, for example the summary of all revenues for each month:
Description Month Revenue ----------- --------- ------- Deal 1 January 100 Deal 2 January 150 =========================== Summary: 250 Deal 3 February 200 Deal 4 February 50 =========================== Summary: 250 Deal 5 March 120 =========================== Summary: 120
The result will then only contain the three rows with the summarized revenue values.
Lets see how this works in SQL - first, we create a table with some data, according to the example from above:
CREATE TABLE t_revenue ( c_id NUMBER(18) NOT NULL, c_description VARCHAR2(100) NOT NULL, c_month VARCHAR2(20) NOT NULL, c_amount NUMBER(18) NOT NULL ); INSERT INTO t_revenue VALUES(0, 'Deal 1', 'January', 100); INSERT INTO t_revenue VALUES(1, 'Deal 2', 'January', 150); INSERT INTO t_revenue VALUES(2, 'Deal 3', 'February', 200); INSERT INTO t_revenue VALUES(3, 'Deal 4', 'February', 50); INSERT INTO t_revenue VALUES(4, 'Deal 5', 'March', 120); COMMIT;
The most simple example calculates the summary of all revenues for each month, as above:
SELECT SUM(c_amount) -- The expression (aggregate function) which is applied to each group of rows FROM t_revenue GROUP BY c_month; -- The expression by which to group the rows
SUM(C_AMOUNT) ------------- 120 250 250
While this query works as expected, we can not see which month the results refer to - but we can simply add the month column to the result:
SELECT c_month, SUM(c_amount) FROM t_revenue GROUP BY c_month;
C_MONTH SUM(C_AMOUNT) -------------------- ------------- March 120 January 250 February 250
(Note that, with this data model, it is not easy to sort by the month - we can not simply ORDER BY c_month because that would cause "February" to appear before "January").
One error which often occurs with GROUP BY is ORA-00979: not a GROUP BY expression. That means what we have added an expression to the projection of the SQL statement (the columns which we want to select) which would not result in a single value. For example, we could try to add the c_description column as follows:
SELECT c_description, c_month, SUM(c_amount) FROM t_revenue GROUP BY c_month;
But since the c_description column can not be condensed to a single value for each group (the values neither have the same value within the group as specified in the GROUP BY clause, nor are they condensed to a single value through an aggregate function) we get:
SELECT c_description, c_month, SUM(c_amount) * ERROR at line 1: ORA-00979: not a GROUP BY expression
However, we could use another aggregate function, for example LISTAGG which returns a single result by concatenating the values of a given column for each row within the group. The LISTAGG function also requires some additional syntax to define the ordering of the values within the list. See the LISTAGG documentation for more information.
SELECT LISTAGG(c_description, ';') WITHIN GROUP (ORDER BY c_description) Deals, c_month, SUM(c_amount) FROM t_revenue GROUP BY c_month;
DEALS C_MONTH SUM(C_AMOUNT) ---------------- ------------ ------------- Deal 3;Deal 4 February 250 Deal 1;Deal 2 January 250 Deal 5 March 120