Adsense Ad

Monday 30 September 2019

Oracle SQl/PLSQL - String Aggregation Techniques

String Aggregation Techniques


On occasion it is necessary to aggregate data from a number of rows into a single row, giving a list of data associated with a specific value. Using the SCOTT.EMP table as an example, we might want to retrieve a list of employees for each department. Below is a list of the base data and the type of output we would like to return from an aggregate query.

Base Data:

    DEPTNO ENAME
---------- ----------
        20 SMITH
        30 ALLEN
        30 WARD
        20 JONES
        30 MARTIN
        30 BLAKE
        10 CLARK
        20 SCOTT
        10 KING
        30 TURNER
        20 ADAMS
        30 JAMES
        20 FORD
        10 MILLER

Desired Output:

    DEPTNO EMPLOYEES
---------- --------------------------------------------------
        10 CLARK,KING,MILLER
        20 SMITH,FORD,ADAMS,SCOTT,JONES
        30 ALLEN,BLAKE,MARTIN,TURNER,JAMES,WARD


LISTAGG Analytic Function in 11g Release 2

The LISTAGG analytic function was introduced in Oracle 11g Release 2, making it very easy to aggregate strings. The nice thing about this function is it also allows us to order the elements in the concatenated list. If you are using 11g Release 2 you should use this function for string aggregation.

COLUMN employees FORMAT A50

SELECT deptno, LISTAGG(ename, ',') WITHIN GROUP (ORDER BY ename) AS employees
FROM   emp
GROUP BY deptno;

    DEPTNO EMPLOYEES
---------- --------------------------------------------------
        10 CLARK,KING,MILLER
        20 ADAMS,FORD,JONES,SCOTT,SMITH
        30 ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD

3 rows selected.


WM_CONCAT Built-in Function (Not Supported)

If you are not running 11g Release 2 or above, but are running a version of the database where the WM_CONCAT function is present, then it is a zero effort solution as it performs the aggregation for you. It is actually an example of a user defined aggregate function described below, but Oracle have done all the work for you.
COLUMN employees FORMAT A50

SELECT deptno, wm_concat(ename) AS employees
FROM   emp
GROUP BY deptno;

    DEPTNO EMPLOYEES
---------- --------------------------------------------------
        10 CLARK,KING,MILLER
        20 SMITH,FORD,ADAMS,SCOTT,JONES
        30 ALLEN,BLAKE,MARTIN,TURNER,JAMES,WARD

3 rows selected.


WM_CONCAT is an undocumented function and as such is not supported by Oracle for user applications (MOS Note ID 1336219.1). If this concerns you, use a User-Defined Aggregate Function described below.

Also, WM_CONCAT has been removed from 12c onward, so you can't pick this option.

User-Defined Aggregate Function

The WM_CONCAT function described above is an example of a user-defined aggregate function that Oracle have already created for you. If you don't want to use WM_CONCAT, you can create your own user-defined aggregate function as described at asktom.oracle.com. Thanks to Kim Berg Hansen for some corrections in comments.
CREATE OR REPLACE TYPE t_string_agg AS OBJECT
(
  g_string  VARCHAR2(32767),

  STATIC FUNCTION ODCIAggregateInitialize(sctx  IN OUT  t_string_agg)
    RETURN NUMBER,

  MEMBER FUNCTION ODCIAggregateIterate(self   IN OUT  t_string_agg,
                                       value  IN      VARCHAR2 )
     RETURN NUMBER,

  MEMBER FUNCTION ODCIAggregateTerminate(self         IN   t_string_agg,
                                         returnValue  OUT  VARCHAR2,
                                         flags        IN   NUMBER)
    RETURN NUMBER,

  MEMBER FUNCTION ODCIAggregateMerge(self  IN OUT  t_string_agg,
                                     ctx2  IN      t_string_agg)
    RETURN NUMBER
);
/
SHOW ERRORS


CREATE OR REPLACE TYPE BODY t_string_agg IS
  STATIC FUNCTION ODCIAggregateInitialize(sctx  IN OUT  t_string_agg)
    RETURN NUMBER IS
  BEGIN
    sctx := t_string_agg(NULL);
    RETURN ODCIConst.Success;
  END;

  MEMBER FUNCTION ODCIAggregateIterate(self   IN OUT  t_string_agg,
                                       value  IN      VARCHAR2 )
    RETURN NUMBER IS
  BEGIN
    SELF.g_string := self.g_string || ',' || value;
    RETURN ODCIConst.Success;
  END;

  MEMBER FUNCTION ODCIAggregateTerminate(self         IN   t_string_agg,
                                         returnValue  OUT  VARCHAR2,
                                         flags        IN   NUMBER)
    RETURN NUMBER IS
  BEGIN
    returnValue := SUBSTR(SELF.g_string, 2);
    RETURN ODCIConst.Success;
  END;

  MEMBER FUNCTION ODCIAggregateMerge(self  IN OUT  t_string_agg,
                                     ctx2  IN      t_string_agg)
    RETURN NUMBER IS
  BEGIN
    SELF.g_string := SELF.g_string || ctx2.g_string;
    RETURN ODCIConst.Success;
  END;
END;
/
SHOW ERRORS


CREATE OR REPLACE FUNCTION string_agg (p_input VARCHAR2)
RETURN VARCHAR2
PARALLEL_ENABLE AGGREGATE USING t_string_agg;
/
SHOW ERRORS
The aggregate function is implemented using a type and type body, and is used within a query.
COLUMN employees FORMAT A50

SELECT /*+ PARALLEL(2) */ deptno, string_agg(ename) AS employees
FROM   emp
GROUP BY deptno;

    DEPTNO EMPLOYEES
---------- --------------------------------------------------
        10 CLARK,KING,MILLER
        20 SMITH,FORD,ADAMS,SCOTT,JONES
        30 ALLEN,BLAKE,MARTIN,TURNER,JAMES,WARD

3 rows selected.

Specific Function

One approach is to write a specific function to solve the problems. The get_employees function listed below returns a list of employees for the specified department.
CREATE OR REPLACE FUNCTION get_employees (p_deptno  in  emp.deptno%TYPE)
  RETURN VARCHAR2
IS
  l_text  VARCHAR2(32767) := NULL;
BEGIN
  FOR cur_rec IN (SELECT ename FROM emp WHERE deptno = p_deptno) LOOP
    l_text := l_text || ',' || cur_rec.ename;
  END LOOP;
  RETURN LTRIM(l_text, ',');
END;
/
SHOW ERRORS
The function can then be incorporated into a query as follows.
COLUMN employees FORMAT A50

SELECT deptno,
       get_employees(deptno) AS employees
FROM   emp
GROUP by deptno;

    DEPTNO EMPLOYEES
---------- --------------------------------------------------
        10 CLARK,KING,MILLER
        20 SMITH,JONES,SCOTT,ADAMS,FORD
        30 ALLEN,WARD,MARTIN,BLAKE,TURNER,JAMES

3 rows selected.
To reduce the number of calls to the function, and thereby improve performance, we might want to filter the rows in advance.
COLUMN employees FORMAT A50

SELECT e.deptno,
       get_employees(e.deptno) AS employees
FROM   (SELECT DISTINCT deptno
        FROM   emp) e;

    DEPTNO EMPLOYEES
---------- --------------------------------------------------
        10 CLARK,KING,MILLER
        20 SMITH,JONES,SCOTT,ADAMS,FORD
        30 ALLEN,WARD,MARTIN,BLAKE,TURNER,JAMES
        
3 rows selected.

Generic Function using Ref Cursor

An alternative approach is to write a function to concatenate values passed using a ref cursor. This is essentially the same as the previous example, except that the cursor is passed in making it generic, as shown below.
CREATE OR REPLACE FUNCTION concatenate_list (p_cursor IN  SYS_REFCURSOR)
  RETURN  VARCHAR2
IS
  l_return  VARCHAR2(32767); 
  l_temp    VARCHAR2(32767);
BEGIN
  LOOP
    FETCH p_cursor
    INTO  l_temp;
    EXIT WHEN p_cursor%NOTFOUND;
    l_return := l_return || ',' || l_temp;
  END LOOP;
  RETURN LTRIM(l_return, ',');
END;
/
SHOW ERRORS
The CURSOR expression is used to allow a query to be passed to the function as a ref cursor, as shown below.
COLUMN employees FORMAT A50

SELECT e1.deptno,
       concatenate_list(CURSOR(SELECT e2.ename FROM emp e2 WHERE e2.deptno = e1.deptno)) employees
FROM   emp e1
GROUP BY e1.deptno;

    DEPTNO EMPLOYEES
---------- --------------------------------------------------
        10 CLARK,KING,MILLER
        20 SMITH,JONES,SCOTT,ADAMS,FORD
        30 ALLEN,WARD,MARTIN,BLAKE,TURNER,JAMES

3 rows selected.
Once again, the total number of function calls can be reduced by filtering the distinct values, rather than calling the function for each row.
COLUMN employees FORMAT A50

SELECT deptno,
       concatenate_list(CURSOR(SELECT e2.ename FROM emp e2 WHERE e2.deptno = e1.deptno)) employees
FROM   (SELECT DISTINCT deptno
        FROM emp) e1;

    DEPTNO EMPLOYEES
---------- --------------------------------------------------
        10 CLARK,KING,MILLER
        20 SMITH,JONES,SCOTT,ADAMS,FORD
        30 ALLEN,WARD,MARTIN,BLAKE,TURNER,JAMES

3 rows selected.

ROW_NUMBER() and SYS_CONNECT_BY_PATH functions in Oracle 9i

An example on williamrobertson.net uses the ROW_NUMBER() and SYS_CONNECT_BY_PATH functions to achieve the same result without the use of PL/SQL or additional type definitions.
SELECT deptno,
       LTRIM(MAX(SYS_CONNECT_BY_PATH(ename,','))
       KEEP (DENSE_RANK LAST ORDER BY curr),',') AS employees
FROM   (SELECT deptno,
               ename,
               ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY ename) AS curr,
               ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY ename) -1 AS prev
        FROM   emp)
GROUP BY deptno
CONNECT BY prev = PRIOR curr AND deptno = PRIOR deptno
START WITH curr = 1;

    DEPTNO EMPLOYEES
---------- --------------------------------------------------
        10 CLARK,KING,MILLER
        20 ADAMS,FORD,JONES,SCOTT,SMITH
        30 ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD

3 rows selected.

COLLECT function in Oracle 10g

An example on oracle-developer.net uses the COLLECT function in Oracle 10g to get the same result. This method requires a table type and a function to convert the contents of the table type to a string. I've altered his method slightly to bring it in line with this article.
CREATE OR REPLACE TYPE t_varchar2_tab AS TABLE OF VARCHAR2(4000);
/

CREATE OR REPLACE FUNCTION tab_to_string (p_varchar2_tab  IN  t_varchar2_tab,
                                          p_delimiter     IN  VARCHAR2 DEFAULT ',') RETURN VARCHAR2 IS
  l_string     VARCHAR2(32767);
BEGIN
  FOR i IN p_varchar2_tab.FIRST .. p_varchar2_tab.LAST LOOP
    IF i != p_varchar2_tab.FIRST THEN
      l_string := l_string || p_delimiter;
    END IF;
    l_string := l_string || p_varchar2_tab(i);
  END LOOP;
  RETURN l_string;
END tab_to_string;
/
The query below shows the COLLECT function in action.

COLUMN employees FORMAT A50

SELECT deptno,
       tab_to_string(CAST(COLLECT(ename) AS t_varchar2_tab)) AS employees
FROM   emp
GROUP BY deptno;
       
    DEPTNO EMPLOYEES
---------- --------------------------------------------------
        10 CLARK,KING,MILLER
        20 SMITH,JONES,SCOTT,ADAMS,FORD
        30 ALLEN,WARD,MARTIN,BLAKE,TURNER,JAMES
        
3 rows selected.












Wednesday 18 September 2019

Oracle SQL: Optimizer Hint

It is often easy to forget this, but in many ways it is after we hit the execute button that the really exciting stuff starts with our code. A number of engines silently spring into action; including the optimizer. The optimizer analyses your SQL statement and decides the most efficient way to execute it based on the objects involved in the statement and the conditions you’re subjecting them to. Your database automatically gathers stats about your objects – stuff like the number of rows, number of distinct values, of nulls, data distribution – and the optimizer uses this information in its decision-making. (You can study the explain plan to see what decisions the optimizer has taken.)  The optimizer arrives at its conclusions, often in barely a whisper of time.
And when the SQL statement is executed, you sit back and you feel like a genius.
And that’s it, right? THE END.
Optimizer Hints
Well, not necessarily. The optimizer is the hero of our story; let me introduce the potential villains, Optimizer Hints. An optimizer hint is a code snippet within an SQL statement controlling the decisions of the optimizer. Hints give us the opportunity, in cases where we have superior knowledge about the database, to influence the optimizer. In fact, the very name is a misnomer – they are not hints; they are commands that override the optimizer (as long as the hint is valid and the _OPTIMIZER_IGNORE_HINTS initialization parameter is not TRUE).
Hints are injected into DML statements within the bounds of a comment. The syntax is as follows:
{DELETE|INSERT|SELECT|UPDATE} /*+ hint [text] [hint [text]] */
Also valid is the less fashionable
{DELETE|INSERT|SELECT|UPDATE} --+ hint [text] [hint [text]]
The + tells Oracle that this isn’t an ordinary comment, that it is in fact a hint. No spaces are allowed between the comment delimiter and the plus sign.
Here’s an example instructing that a full table scan should be carried out on the emp table:
Select /*+ FULL(emp) */ name
From emp
Where job = 'SALESMAN';
That’s pretty straightforward. So why am I painting optimizer hints as the baddies here? 

The rules
  1. Don’t.
  2. If you must use hints, assume you’ve used them incorrectly. So don’t.
  3. With every Oracle patch or upgrade, assume every hint is going to go wrong. So don’t.
  4. With every DDL, assume every hint applied to that object is going to go wrong. So don’t.
The reason to be wary of hinting is that by embedding hints in your SQL, you are overriding the optimizer and saying that you know more than it does – not just now, but every time in the future that your SQL will be run, irrespective of any other changes that may happen to your database. The likely consequence of this is that your SQL will possibly run sub-optimally now and almost certainly in the future.
Hints in Detail
Well, not necessarily. Cos telling you about these enchanting things called hints and then telling you to immediately forget everything you’ve just heard would be like telling you there’s a tree of knowledge in the middle of the garden but that you must never, ever eat its apple. And we know how that story ends.
So, more details. There are many dozens of different hints (have a look in the v$sql_hint view), however close to half of them are undocumented. So The First Rule of Hinting must really be borne in mind if you decide to use them. Here are a select few.
FIRST_ROWS(n): This hint instructs the optimizer to select a plan that returns the first n rows most efficiently.
SELECT /*+ FIRST_ROWS(10) */ empno, ename
FROM emp
WHERE deptno = 10;
You may also want to read up about FIRST_ROWS_1, FIRST_ROWS_10 and FIRST_ROWS_100. Of interest, also, is ALL_ROWS which details the optimizer to choose the plan that most effectively returns the resultset at the minimum cost.
NO_INDEX(<table_name> < index_name>): Instructs the optimizer to specifically not use the named index in determining a plan.
SELECT /*+ NO_INDEX(emp emp_ix) */ empno, ename
FROM emp, dept
WHERE emp.deptno = dept.deptno;
See also: the INDEX hint. You may also want to investigate INDEX_COMBINE, INDEX_JOIN, INDEX_ASC and INDEX_FFS.
LEADING(table_name): This hint tells Oracle to use the parameterised table as the first in the join order. The optimizer will consequently select a join chain that starts with this table.
SELECT /*+ LEADING (dept) */ empno, ename
FROM emp, dept
WHERE emp.deptno = dept.deptno;
Related to the LEADING hint is the ORDERED hint.  This hint instructs Oracle to join tables in the exact order in which they are listed in the FROM clause.
CACHE(table): This hint tells Oracle to add the blocks retrieved for the table to the head of the most recently used list. This might be useful with regularly-used lookup tables, for example.
SELECT /*+ CACHE (d) */ deptno, dname
FROM dept d;
Oracle caches small tables by default, making this hint redundant in many cases. Also often redundant is the NOCACHE hint, since this places blocks at the tail of the LRU list, which is also Oracle’s default behaviour with the majority of blocks.
CARDINALITY(table n): This hint instructs Oracle to use n as the table, rather than rely on its own stats. You may need to use this hint with a global temporary table, for instance.
SELECT /*+ CARDINALITY (gtt, 1000) */ gtt.gtt_id, dname
FROM dept d, global_temp_tab gtt
WHERE d.deptno = gtt.deptno;
REWRITE: This hint instructs Oracle to rewrite the query using a materialized view, irrespective of cost. 
PARALLEL (table n): This hint tells the optimizer to use n concurrent servers for a parallel operation.
APPEND: This hint instructs the optimizer to carry out a direct-path insert.  This may make INSERT … SELECT statements faster because inserted data is simply appended to the end of the table and any referential constraints are ignored.
RULE: This hint basically turns off the optimizer.  This hint has been deprecated and should not be used. Never ever. 
Conclusion

Well, not necessarily the end. I was holidaying in Serbia recently and, when you visit a foreign country, it is always advisable to arm yourself with a few handy local words: Good day (dobar dan), thank you (hvala) – and, of course, beer (pivo). Take the examples above as your first handy words as you discover optimizer hints. There is more to learn, much more (indeed Jonathan Lewis has written dozens of articles on the subject, as have others). Hopefully, you are now ready to tackle those articles.


How to use hints in Oracle sql for performance

With hints one can influence the optimizer. The usage of hints (with exception of the RULE-hint) causes Oracle to use the Cost Based optimizer.

The following syntax is used for hints:

select /*+ HINT */ name
from emp
where id =1;

Where HINT is replaced by the hint text.
When the syntax of the hint text is incorrect, the hint text is ignored and will not be used.

Hints for Optimization Approaches and Goals
ALL_ROWS The ALL_ROWS hint explicitly chooses the cost-based approach to optimize a statement block with a goal of best throughput (that is, minimum total resource consumption).
FIRST_ROWS The FIRST_ROWS hint explicitly chooses the cost-based approach to optimize a statement block with a goal of best response time (minimum resource usage to return first row). In newer Oracle version you should give a parameter with this hint: FIRST_ROWS(n) means that the optimizer will determine an execution plan to give a fast response for returning the first n rows.
CHOOSE The CHOOSE hint causes the optimizer to choose between the rule-based approach and the cost-based approach for a SQL statement based on the presence of statistics for the tables accessed by the statement
RULE The RULE hint explicitly chooses rule-based optimization for a statement block. This hint also causes the optimizer to ignore any other hints specified for the statement block. The RULE hint does not work any more in Oracle 10g.
Hints for Access Paths
FULL The FULL hint explicitly chooses a full table scan for the specified table. The syntax of the FULL hint is FULL(table) where table specifies the alias of the table (or table name if alias does not exist) on which the full table scan is to be performed.
ROWID The ROWID hint explicitly chooses a table scan by ROWID for the specified table. The syntax of the ROWID hint is ROWID(table) where table specifies the name or alias of the table on which the table access by ROWID is to be performed. (This hint deprecated in Oracle 10g)
CLUSTER The CLUSTER hint explicitly chooses a cluster scan to access the specified table. The syntax of the CLUSTER hint is CLUSTER(table) where table specifies the name or alias of the table to be accessed by a cluster scan.
HASH The HASH hint explicitly chooses a hash scan to access the specified table. The syntax of the HASH hint is HASH(table) where table specifies the name or alias of the table to be accessed by a hash scan.
HASH_AJ The HASH_AJ hint transforms a NOT IN subquery into a hash anti-join to access the specified table. The syntax of the HASH_AJ hint is HASH_AJ(table) where table specifies the name or alias of the table to be accessed.(deprecated in Oracle 10g)
INDEX The INDEX hint explicitly chooses an index scan for the specified table. The syntax of the INDEX hint is INDEX(table index) where:table specifies the name or alias of the table associated with the index to be scanned and index specifies an index on which an index scan is to be performed. This hint may optionally specify one or more indexes:
NO_INDEX The NO_INDEX hint explicitly disallows a set of indexes for the specified table. The syntax of the NO_INDEX hint is NO_INDEX(table index)
INDEX_ASC The INDEX_ASC hint explicitly chooses an index scan for the specified table. If the statement uses an index range scan, Oracle scans the index entries in ascending order of their indexed values.
INDEX_COMBINE If no indexes are given as arguments for the INDEX_COMBINE hint, the optimizer will use on the table whatever Boolean combination of bitmap indexes has the best cost estimate. If certain indexes are given as arguments, the optimizer will try to use some Boolean combination of those particular bitmap indexes. The syntax of INDEX_COMBINE is INDEX_COMBINE(table index).
INDEX_JOIN Explicitly instructs the optimizer to use an index join as an access path. For the hint to have a positive effect, a sufficiently small number of indexes must exist that contain all the columns required to resolve the query.
INDEX_DESC The INDEX_DESC hint explicitly chooses an index scan for the specified table. If the statement uses an index range scan, Oracle scans the index entries in descending order of their indexed values.
INDEX_FFS This hint causes a fast full index scan to be performed rather than a full table.
NO_INDEX_FFS Do not use fast full index scan (from Oracle 10g)
INDEX_SS Exclude range scan from query plan (from Oracle 10g)
INDEX_SS_ASC Exclude range scan from query plan (from Oracle 10g)
INDEX_SS_DESC Exclude range scan from query plan (from Oracle 10g)
NO_INDEX_SS The NO_INDEX_SS hint causes the optimizer to exclude a skip scan of the specified indexes on the specified table. (from Oracle 10g)
Hints for Query Transformations
NO_QUERY_TRANSFORMATION Prevents the optimizer performing query transformations. (from Oracle 10g)
USE_CONCAT The USE_CONCAT hint forces combined OR conditions in the WHERE clause of a query to be transformed into a compound query using the UNION ALL set operator. Normally, this transformation occurs only if the cost of the query using the concatenations is cheaper than the cost without them.
NO_EXPAND The NO_EXPAND hint prevents the optimizer from considering OR-expansion for queries having OR conditions or IN-lists in the WHERE clause. Usually, the optimizer considers using OR expansion and uses this method if it decides that the cost is lower than not using it.
REWRITE The REWRITE hint forces the optimizer to rewrite a query in terms of materialized views, when possible, without cost consideration. Use the REWRITE hint with or without a view list. If you use REWRITE with a view list and the list contains an eligible materialized view, then Oracle uses that view regardless of its cost.
NOREWRITE / NO_REWRITE In Oracle 10g renamed to NO_REWRITE. The NOREWRITE/NO_REWRITE hint disables query rewrite for the query block, overriding the setting of the parameter QUERY_REWRITE_ENABLED.
MERGE The MERGE hint lets you merge views in a query.
NO_MERGE The NO_MERGE hint causes Oracle not to merge merge-able views. This hint is most often used to reduce the number of possible permutations for a query and make optimization faster.
FACT The FACT hint indicated that the table should be considered as a fact table. This is used in the context of the star transformation.
NO_FACT The NO_FACT hint is used in the context of the star transformation to indicate to the transformation that the hinted table should not be considered as a fact table.
STAR_TRANSFORMATION The STAR_TRANSFORMATION hint makes the optimizer use the best plan in which the transformation has been used. Without the hint, the optimizer could make a query optimization decision to use the best plan generated without the transformation, instead of the best plan for the transformed query.
NO_STAR_TRANSFORMATION Do not use star transformation (from Oracle 10g)
UNNEST The UNNEST hint specifies subquery unnesting.
NO_UNNEST Use of the NO_UNNEST hint turns off unnesting for specific subquery blocks.
Hints for Join Orders
LEADING Give this hint to indicate the leading table in a join. This will indicate only 1 table. If you want to specify the whole order of tables, you can use the ORDERED hint. Syntax: LEADING(table)
ORDERED The ORDERED hint causes Oracle to join tables in the order in which they appear in the FROM clause. If you omit the ORDERED hint from a SQL statement performing a join , the optimizer chooses the order in which to join the tables. You may want to use the ORDERED hint to specify a join order if you know something about the number of rows selected from each table that the optimizer does not. Such information would allow you to choose an inner and outer table better than the optimizer could.
Hints for Join Operations
USE_NL The USE_NL hint causes Oracle to join each specified table to another row source with a nested loops join using the specified table as the inner table. The syntax of the USE_NL hint is USE_NL(table table) where table is the name or alias of a table to be used as the inner table of a nested loops join.
NO_USE_NL Do not use nested loop (from Oracle 10g)
USE_NL_WITH_INDEX Specifies a nested loops join. (from Oracle 10g)
USE_MERGE The USE_MERGE hint causes Oracle to join each specified table with another row source with a sort-merge join. The syntax of the USE_MERGE hint is USE_MERGE(table table) where table is a table to be joined to the row source resulting from joining the previous tables in the join order using a sort-merge join.
NO_USE_MERGE Do not use merge (from Oracle 10g)
USE_HASH The USE_HASH hint causes Oracle to join each specified table with another row source with a hash join. The syntax of the USE_HASH hint is USE_HASH(table table) where table is a table to be joined to the row source resulting from joining the previous tables in the join order using a hash join.
NO_USE_HASH Do not use hash (from Oracle 10g)
Hints for Parallel Execution
PARALLEL The PARALLEL hint allows you to specify the desired number of concurrent query servers that can be used for the query. The syntax is PARALLEL(table number number). The PARALLEL hint must use the table alias if an alias is specified in the query. The PARALLEL hint can then take two values separated by commas after the table name. The first value specifies the degree of parallelism for the given table, the second value specifies how the table is to be split among the instances of a parallel server. Specifying DEFAULT or no value signifies the query coordinator should examine the settings of the initialization parameters (described in a later section) to determine the default degree of parallelism.
NOPARALLEL / NO_PARALLEL The NOPARALLEL hint allows you to disable parallel scanning of a table, even if the table was created with a PARALLEL clause. In Oracle 10g this hint was renamed to NO_PARALLEL.
PQ_DISTRIBUTE The PQ_DISTRIBUTE hint improves the performance of parallel join operations. Do this by specifying how rows of joined tables should be distributed among producer and consumer query servers. Using this hint overrides decisions the optimizer would normally make.
NO_PARALLEL_INDEX The NO_PARALLEL_INDEX hint overrides a PARALLEL attribute setting on an index to avoid a parallel index scan operation.
Additional Hints
APPEND When the APPEND hint is used with the INSERT statement, data is appended to the table. Existing free space in the block is not used. If a table or an index is specified with nologging, this hint applied with an insert statement produces a direct path insert which reduces generation of redo.
NOAPPEND Overrides the append mode.
CACHE The CACHE hint specifies that the blocks retrieved for the table in the hint are placed at the most recently used end of the LRU list in the buffer cache when a full table scan is performed. This option is useful for small lookup tables. In the following example, the CACHE hint overrides the table default caching specification.
NOCACHE The NOCACHE hint specifies that the blocks retrieved for this table are placed at the least recently used end of the LRU list in the buffer cache when a full table scan is performed. This is the normal behavior of blocks in the buffer cache.
PUSH_PRED The PUSH_PRED hint forces pushing of a join predicate into the view.
NO_PUSH_PRED The NO_PUSH_PRED hint prevents pushing of a join predicate into the view.
PUSH_SUBQ The PUSH_SUBQ hint causes non-merged subqueries to be evaluated at the earliest possible place in the execution plan.
NO_PUSH_SUBQ The NO_PUSH_SUBQ hint causes non-merged subqueries to be evaluated as the last step in the execution plan.
QB_NAME Specifies a name for a query block. (from Oracle 10g)
CURSOR_SHARING_EXACT Oracle can replace literals in SQL statements with bind variables, if it is safe to do so. This is controlled with the CURSOR_SHARING startup parameter. The CURSOR_SHARING_EXACT hint causes this behavior to be switched off. In other words, Oracle executes the SQL statement without any attempt to replace literals by bind variables.
DRIVING_SITE The DRIVING_SITE hint forces query execution to be done for the table at a different site than that selected by Oracle
DYNAMIC_SAMPLING The DYNAMIC_SAMPLING hint lets you control dynamic sampling to improve server performance by determining more accurate predicate selectivity and statistics for tables and indexes. You can set the value of DYNAMIC_SAMPLING to a value from 0 to 10. The higher the level, the more effort the compiler puts into dynamic sampling and the more broadly it is applied. Sampling defaults to cursor level unless you specify a table.
SPREAD_MIN_ANALYSIS This hint omits some of the compile time optimizations of the rules, mainly detailed dependency graph analysis, on spreadsheets. Some optimizations such as creating filters to selectively populate spreadsheet access structures and limited rule pruning are still used. (from Oracle 10g)
Hints with unknown status
MERGE_AJ The MERGE_AJ hint transforms a NOT IN subquery into a merge anti-join to access the specified table. The syntax of the MERGE_AJ hint is MERGE_AJ(table) where table specifies the name or alias of the table to be accessed.(deprecated in Oracle 10g)
AND_EQUAL The AND_EQUAL hint explicitly chooses an execution plan that uses an access path that merges the scans on several single-column indexes. The syntax of the AND_EQUAL hint is AND_EQUAL(table index index) where table specifies the name or alias of the table associated with the indexes to be merged. and index specifies an index on which an index scan is to be performed. You must specify at least two indexes. You cannot specify more than five. (depricated in Oracle 10g)
STAR The STAR hint forces the large table to be joined last using a nested loops join on the index. The optimizer will consider different permutations of the small tables. (deprecated in Oracle 10g)
BITMAP Usage: BITMAP(table_name index_name) Uses a bitmap index to access the table. (deprecated ?)
HASH_SJ Use a Hash Anti-Join to evaluate a NOT IN sub-query. Use this hint in the sub-query, not in the main query. Use this when your high volume NOT IN sub-query is using a FILTER or NESTED LOOPS join. Try MERGE_AJ if HASH_AJ refuses to work.(deprecated in Oracle 10g)
NL_SJ Use a Nested Loop in a sub-query. (deprecated in Oracle 10g)
NL_AJ Use an anti-join in a sub-query. (deprecated in Oracle 10g)
ORDERED_PREDICATES (deprecated in Oracle 10g)
EXPAND_GSET_TO_UNION (deprecated in Oracle 10g)

Oracle SQL: Current Date and Time

Getting the Current Date and Time

In any language, it’s important to know how to get the current date and time. How to do that is often one of the first questions to come up, especially in applications that involve dates in any way, as most applications do.

Up through Oracle8i Database, you had one choice for getting the date and time in PL/SQL: you used the SYSDATE function, and that was it. Beginning with Oracle9i Database, you have all the functions in Table at your disposal, and you need to understand how they work and what your choices are.

Table. Comparison of functions that return current date and time

FunctionTime zoneDatatype returned
CURRENT_DATESessionDATE
CURRENT_TIMESTAMPSessionTIMESTAMP WITH TIME ZONE
LOCALTIMESTAMPSessionTIMESTAMP
SYSDATEDatabase serverDATE
SYSTIMESTAMPDatabase serverTIMESTAMP WITH TIME ZONE


The Oracle CURRENT_DATE function returns the current date in the session time zone, in a value in the Gregorian calendar of datatype DATE. The format in which the date is displayed depends on NLS_DATE_FORMAT parameter. The default setting of NLS_DATE_FORMAT is DD-MON-YY. This returns a 2-digit day, a three-character month abbreviation, and a 2-digit year.

Example Usage:


CURRENT_DATE

SYSDATE

SELECT TO_CHAR(CURRENT_DATE, 'DD-MON-YYYY HH:MI:SS') FROM dual;


SELECT TO_CHAR(SYSDATE, 'DD-MON-YYYY HH:MI:SS') FROM dual;


To see the current system date and time with the time zone, use just the CURRENT_DATE function. 

To see the current system date and time with fractional seconds and the time zone, use the following statement: 


SELECT SYSTIMESTAMP FROM dual;

For example to see the current date in the format "Wednesday, 18th September, 2019", use the following syntax:


SELECT TO_CHAR(sysdate,'Day, ddth Month, yyyy')"Today" FROM dual;

This will return the following (assuming today is 9/18/2019):


TODAY

------------------------

Wednesday, 18th September, 2019









Oracle / PLSQL: SYS_CONTEXT Function

Description

The Oracle/PLSQL SYS_CONTEXT function can be used to retrieve information about the Oracle environment.

Syntax

The syntax for the SYS_CONTEXT function in Oracle/PLSQL is:
SYS_CONTEXT( namespace, parameter [, length] )

Parameters or Arguments

namespace
An Oracle namespace that has already been created. If the namespace of 'USERENV' is used, attributes describing the current Oracle session can be returned.
parameter
A valid attribute that has been set using the DBMS_SESSION.set_context procedure.
length
Optional. It is the length of the return value in bytes. If this parameter is omitted or if an invalid entry is provided, the sys_context function will default to 256 bytes.

Returns

The SYS_CONTEXT function returns a string value.

Note

The valid parameters for the namespace called 'USERENV' are as follows: (Note that not all parameters are valid in all versions of Oracle)
Parameter
ACTION
AUDITED_CURSORID
AUTHENTICATED_IDENTITY
AUTHENTICATION_DATA
AUTHENTICATION_METHOD
AUTHENTICATION_TYPE
BG_JOB_ID
CLIENT_IDENTIFIER
CLIENT_INFO
CURRENT_BIND
CURRENT_SCHEMA
CURRENT_SCHEMAID
CURRENT_SQL
CURRENT_SQL_LENGTH
CURRENT_USER
CURRENT_USERID
DB_DOMAIN
DB_NAME
DB_UNIQUE_NAME
ENTRYID
ENTERPRISE_IDENTITY
EXTERNAL_NAME
FG_JOB_ID
GLOBAL_CONTEXT_MEMORY
GLOBAL_UID
HOST
IDENTIFICATION_TYPE
INSTANCE
INSTANCE_NAME
IP_ADDRESS
ISDBA
LANG
LANGUAGE
MODULE
NETWORK_PROTOCOL
NLS_CALENDAR
NLS_CURRENCY
NLS_DATE_FORMAT
NLS_DATE_LANGUAGE
NLS_SORT
NLS_TERRITORY
OS_USER
POLICY_INVOKER
PROXY_ENTERPRISE_IDENTITY
PROXY_GLOBAL_UID
PROXY_USER
PROXY_USERID
SERVER_HOST
SERVICE_NAME
SESSION_USER
SESSION_USERID
SESSIONID
SID
STATEMENTID
TERMINAL

Applies To

The SYS_CONTEXT function can be used in the following versions of Oracle/PLSQL:
  • Oracle 12c, Oracle 11g, Oracle 10g, Oracle 9i, Oracle 8i

Example

Let's look at some Oracle SYS_CONTEXT function examples and explore how to use the SYS_CONTEXT function in Oracle/PLSQL.
For example:
SYS_CONTEXT('USERENV', 'NLS_DATE_FORMAT')
Result: 'RR-MM-DD'

SYS_CONTEXT('USERENV', 'NLS_SORT')
Result: 'BINARY'

Oracle: CHAR / VARCHAR2

Difference between CHAR and VARCHAR2 In Oracle

Oracle database contains different types of Character data types.  Among those data types, CHAR and VARCHAR2 are widely utilized compared to other types such as VARCHAR, CLOB etc. in PL/SQL programming. Due to this it becomes evident to understand the difference between CHAR and VARCHAR2. Additionally, this is the most frequently asked interview question of Oracle database.

Difference between CHAR and VARCHAR2

VARCHAR2
CHAR
VARCHAR2 has maximum length of 32767 bytesCHAR has maximum length of 2000 bytes
Space is not padded to the values for unused lengthA space is padded to the right if length of variable is more than value passed to the variable
VARCHAR2 is used mainly for variable length character stringCHAR is generally used for fixed length character string, for example, Pin code
It is mandatory to specify length in VARCHAR2By default the length of CHAR is 1 if it is not specified

Difference between CHAR and VARCHAR2 with example

Let us declare two variables with same string and different data types,

For example,

DECLARE
  lv_char      CHAR(8) := 'Hello';
  lv_varchar2  VARCHAR2(8) := 'Hello';
  len_char     NUMBER(2);
  len_varchar2 NUMBER(2);
BEGIN
  len_char     := LENGTH(lv_char);
  len_varchar2 := LENGTH(lv_varchar2);
  DBMS_OUTPUT.PUT_LINE('Length of CHAR -> ' || len_char);
  DBMS_OUTPUT.PUT_LINE('Length of VARCHAR2 -> ' || len_varchar2);
END;

Output
Length of CHAR -> 8
Length of VARCHAR2 -> 5

Which data type is better?

It is a long going myth among some programmers that performance of CHAR is better than VARCHAR2. There is no impact on performance if you have used VARCHAR2 for fixed length character string.

I personally would recommend using VARCHAR2 instead of CHAR because it will help you save waste of unnecessary disk space in case you are incorrect about fixed length character string.

Hope you like this article and find it useful. We highly appreciate comments and feedback.

Oracle Forms: KEY-NEXT-ITEM / POST-TEXT-ITEM

Difference Between KEY-NEXT-ITEM and POST-TEXT-ITEM Triggers in Oracle Forms

Several times you must have faced this tricky interview question about the difference between KEY-NEXT-ITEM and POST-TEXT-ITEM triggers in Oracle Forms. This question is also prevalent in Oracle Apps Technical interviews. Although the functions of both the triggers are same, why has Oracle given them separately? Let us understand the differences between them in this article.

Difference Between KEY-NEXT-ITEM and POST-TEXT-ITEM Triggers


KEY-NEXT-ITEM
POST-TEXT-ITEM
The key-next-item trigger fires only when pressing TAB on the keyboard.Post-Text-Item fires once you leave the current item to the next item by a keyboard or a mouse.
It fires before the cursor is focused on the next item.It fires after the cursor is focused on the next item.
Without a key event, this trigger will not be fired.This trigger fires without a key event.



Please feel free to comment, if you find mistakes or errors in any of the topics. Your suggestions to improve the article helps us improve.

Oracle PL/SQL: Raise_Application_Error


What is raise_application_error?

The raise_application_error is a built-in procedure that allows the developer to associate his/her own error message to an oracle error number. It helps you in returning a custom error to your application so that you can avoid returning unhandled exceptions. Thus, the message given by a developer acts as an oracle error message here (Interesting!).
When raise_application_error raises an exception all uncommitted transactions are rolled back and an error message with an error number (ORA-) is returned to the user. This built-in procedure can raise an exception but cannot handle it.
raise_application_error procedure is a part of DBMS_STANDARD and STANDARD packages. You don’t have to reference them while using it.

Syntax of raise_application_error

raise_application_error(error_number, error_message [, {TRUE | FALSE}]);

error_number: Range of negative integers between -20,000 and -20,999
error_message: User-defined error message of datatype varchar2(2000)
TRUE/FALSE: Optional parameter that tells the procedure to add an error to error stack.
If TRUE then error is added into the stack of previous errors, if FALSE then error replaces all previous errors.

Example

SQL> create or replace procedure calc_absense (v_absense IN number) as
 2
 3 begin
 4 if v_absence > 10 then
 5 raise_application_error(-20001, 'Employee absence cannot be more than 10');
 6 end if;
 7
 8 end;

Procedure created.

SQL> declare
 2 v_abs number := &1;
 2 begin
 3 calc_absense (v_abs);
 4 end;
 6 /

Enter value for number: 20
 old 2: v_abs number := &1;
 new 2: v_abs number := 20;
 declare
 *
 ERROR at line 1:
 ORA-20001: Employee absence cannot be more than 10
 ORA-06512: at "PUBS.calc_absense ", line 5
 ORA-06512: at line 3

Note: Error numbers other than between -20000 and -20999 are reserved by Oracle to display its own standard error messages.

If you liked the above post, please leave your comments below.

Analytic functions by Example

This article provides a clear, thorough concept of analytic functions and its various options by a series of simple yet concept building examples. The article is intended for SQL coders, who for might be not be using analytic functions due to unfamiliarity with its cryptic syntax or uncertainty about its logic of operation. Often I see that people tend to reinvent the feature provided by analytic functions by native join and sub-query SQL. This article assumes familiarity with basic Oracle SQL, sub-query, join and group function from the reader. Based on that familiarity, it builds the concept of analytic functions through a series of examples.

It is true that whatever an analytic function does can be done by native SQL, with join and sub-queries. But the same routine done by analytic function is always faster, or at least as fast, when compared to native SQL. Moreover, I am not considering here the amount of time that is spent in coding the native SQLs, testing, debugging and tuning them.
The general syntax of analytic function is:

Function(arg1,..., argn) OVER ( [PARTITION BY <...>] [ORDER BY <....>] [<window_clause>] )
<window_clause> is like "ROW <?>" or "RANK <?>"
All the keywords will be dealt in details as we walk through the examples. The script for creating the schema (SCOTT) on which the example queries of this article are run can be obtained in ORACLE_HOME/sqlplus/demo/demobld.sql of any standard Oracle installation.

How are analytic functions different from group or aggregate functions?


SELECT deptno,
COUNT(*) DEPT_COUNT
FROM emp
WHERE deptno IN (20, 30)
GROUP BY deptno;

DEPTNO                 DEPT_COUNT             
---------------------- ---------------------- 
20                     5                      
30                     6                      

2 rows selected
Query-1

Consider the Query-1 and its result. Query-1 returns departments and their employee count. Most importantly it groups the records into departments in accordance with the GROUP BY clause. As such any non-"group by" column is not allowed in the select clause.

SELECT empno, deptno, 
COUNT(*) OVER (PARTITION BY 
deptno) DEPT_COUNT
FROM emp
WHERE deptno IN (20, 30);

     EMPNO     DEPTNO DEPT_COUNT
---------- ---------- ----------
      7369         20          5
      7566         20          5
      7788         20          5
      7902         20          5
      7876         20          5
      7499         30          6
      7900         30          6
      7844         30          6
      7698         30          6
      7654         30          6
      7521         30          6

11 rows selected.
Query-2

Now consider the analytic function query (Query-2) and its result. Note the repeating values of DEPT_COUNT column.
This brings out the main difference between aggregate and analytic functions. Though analytic functions give aggregate result they do not group the result set. They return the group value multiple times with each record. As such any other non-"group by" column or expression can be present in the select clause, for example, the column EMPNO in Query-2.
Analytic functions are computed after all joins, WHERE clause, GROUP BY and HAVING are computed on the query. The main ORDER BY clause of the query operates after the analytic functions. So analytic functions can only appear in the select list and in the main ORDER BY clause of the query.
In absence of any PARTITION or <window_clause> inside the OVER( ) portion, the function acts on entire record set returned by the where clause. Note the results of Query-3 and compare it with the result of aggregate function query Query-4.

SELECT empno, deptno, 
COUNT(*) OVER ( ) CNT
FROM emp
WHERE deptno IN (10, 20)
ORDER BY 2, 1;

     EMPNO     DEPTNO        CNT
---------- ---------- ----------
      7782         10          8
      7839         10          8
      7934         10          8
      7369         20          8
      7566         20          8
      7788         20          8
      7876         20          8
      7902         20          8
Query-3


SELECT COUNT(*) FROM emp
WHERE deptno IN (10, 20);

  COUNT(*)
----------
         8
Query-4

How to break the result set in groups or partitions?

It might be obvious from the previous example that the clause PARTITION BY is used to break the result set into groups. PARTITION BY can take any non-analytic SQL expression.
Some functions support the <window_clause> inside the partition to further limit the records they act on. In the absence of any <window_clause> analytic functions are computed on all the records of the partition clause.
The functions SUM, COUNT, AVG, MIN, MAX are the common analytic functions the result of which does not depend on the order of the records.
Functions like LEAD, LAG, RANK, DENSE_RANK, ROW_NUMBER, FIRST, FIRST VALUE, LAST, LAST VALUE depends on order of records. In the next example we will see how to specify that.

How to specify the order of the records in the partition?

The answer is simple, by the "ORDER BY" clause inside the OVER( ) clause. This is different from the ORDER BY clause of the main query which comes after WHERE. In this section we go ahead and introduce each of the very useful functions LEAD, LAG, RANK, DENSE_RANK, ROW_NUMBER, FIRST, FIRST VALUE, LAST, LAST VALUE and show how each depend on the order of the record.
The general syntax of specifying the ORDER BY clause in analytic function is:
ORDER BY <sql_expr> [ASC or DESC] NULLS [FIRST or LAST]
The syntax is self-explanatory.

ROW_NUMBER, RANK and DENSE_RANK

All the above three functions assign integer values to the rows depending on their order. That is the reason of clubbing them together.
ROW_NUMBER( ) gives a running serial number to a partition of records. It is very useful in reporting, especially in places where different partitions have their own serial numbers. In Query-5, the function ROW_NUMBER( ) is used to give separate sets of running serial to employees of departments 10 and 20 based on their HIREDATE.

SELECT empno, deptno, hiredate,
ROW_NUMBER( ) OVER (PARTITION BY
deptno ORDER BY hiredate
NULLS LAST) SRLNO
FROM emp
WHERE deptno IN (10, 20)
ORDER BY deptno, SRLNO;

EMPNO  DEPTNO HIREDATE       SRLNO
------ ------- --------- ----------
  7782      10 09-JUN-81          1
  7839      10 17-NOV-81          2
  7934      10 23-JAN-82          3
  7369      20 17-DEC-80          1
  7566      20 02-APR-81          2
  7902      20 03-DEC-81          3
  7788      20 09-DEC-82          4
  7876      20 12-JAN-83          5

8 rows selected.
Query-5 (ROW_NUMBER example)

RANK and DENSE_RANK both provide rank to the records based on some column value or expression. In case of a tie of 2 records at position N, RANK declares 2 positions N and skips position N+1 and gives position N+2 to the next record. While DENSE_RANK declares 2 positions N but does not skip position N+1.
Query-6 shows the usage of both RANK and DENSE_RANK. For DEPTNO 20 there are two contenders for the first position (EMPNO 7788 and 7902). Both RANK and DENSE_RANK declares them as joint toppers. RANK skips the next value that is 2 and next employee EMPNO 7566 is given the position 3. For DENSE_RANK there are no such gaps.

SELECT empno, deptno, sal,
RANK() OVER (PARTITION BY deptno
ORDER BY sal DESC NULLS LAST) RANK,
DENSE_RANK() OVER (PARTITION BY
deptno ORDER BY sal DESC NULLS
LAST) DENSE_RANK
FROM emp
WHERE deptno IN (10, 20)
ORDER BY 2, RANK;

EMPNO  DEPTNO   SAL  RANK DENSE_RANK
------ ------- ----- ----- ----------
  7839      10  5000     1          1
  7782      10  2450     2          2
  7934      10  1300     3          3
  7788      20  3000     1          1
  7902      20  3000     1          1
  7566      20  2975     3          2
  7876      20  1100     4          3
  7369      20   800     5          4

8 rows selected.
Query-6 (RANK and DENSE_RANK example)

LEAD and LAG

LEAD has the ability to compute an expression on the next rows (rows which are going to come after the current row) and return the value to the current row. The general syntax of LEAD is shown below:
LEAD (<sql_expr>, <offset>, <default>) OVER (<analytic_clause>)
<sql_expr> is the expression to compute from the leading row.
<offset> is the index of the leading row relative to the current row.
<offset> is a positive integer with default 1.
<default> is the value to return if the <offset> points to a row outside the partition range.
The syntax of LAG is similar except that the offset for LAG goes into the previous rows.
Query-7 and its result show simple usage of LAG and LEAD function.

SELECT deptno, empno, sal,
LEAD(sal, 1, 0) OVER (PARTITION BY dept ORDER BY sal DESC NULLS LAST) NEXT_LOWER_SAL,
LAG(sal, 1, 0) OVER (PARTITION BY dept ORDER BY sal DESC NULLS LAST) PREV_HIGHER_SAL
FROM emp
WHERE deptno IN (10, 20)
ORDER BY deptno, sal DESC;

 DEPTNO  EMPNO   SAL NEXT_LOWER_SAL PREV_HIGHER_SAL
------- ------ ----- -------------- ---------------
     10   7839  5000           2450               0
     10   7782  2450           1300            5000
     10   7934  1300              0            2450
     20   7788  3000           3000               0
     20   7902  3000           2975            3000
     20   7566  2975           1100            3000
     20   7876  1100            800            2975
     20   7369   800              0            1100

8 rows selected.
Query-7 (LEAD and LAG)

FIRST VALUE and LAST VALUE function

The general syntax is:
FIRST_VALUE(<sql_expr>) OVER (<analytic_clause>)
The FIRST_VALUE analytic function picks the first record from the partition after doing the ORDER BY. The <sql_expr> is computed on the columns of this first record and results are returned. The LAST_VALUE function is used in similar context except that it acts on the last record of the partition.

-- How many days after the first hire of each department were the next
-- employees hired?

SELECT empno, deptno, hiredate ? FIRST_VALUE(hiredate)
OVER (PARTITION BY deptno ORDER BY hiredate) DAY_GAP
FROM emp
WHERE deptno IN (20, 30)
ORDER BY deptno, DAY_GAP;

     EMPNO     DEPTNO    DAY_GAP
---------- ---------- ----------
      7369         20          0
      7566         20        106
      7902         20        351
      7788         20        722
      7876         20        756
      7499         30          0
      7521         30          2
      7698         30         70
      7844         30        200
      7654         30        220
      7900         30        286

11 rows selected.
Query-8 (FIRST_VALUE)

FIRST and LAST function

The FIRST function (or more properly KEEP FIRST function) is used in a very special situation. Suppose we rank a group of record and found several records in the first rank. Now we want to apply an aggregate function on the records of the first rank. KEEP FIRST enables that.
The general syntax is:
Function( ) KEEP (DENSE_RANK FIRST ORDER BY <expr>) OVER (<partitioning_clause>)
Please note that FIRST and LAST are the only functions that deviate from the general syntax of analytic functions. They do not have the ORDER BY inside the OVER clause. Neither do they support any <window> clause. The ranking done in FIRST and LAST is always DENSE_RANK. The query below shows the usage of FIRST function. The LAST function is used in similar context to perform computations on last ranked records.

-- How each employee's salary compare with the average salary of the first
-- year hires of their department?

SELECT empno, deptno, TO_CHAR(hiredate,'YYYY') HIRE_YR, sal,
TRUNC(
AVG(sal) KEEP (DENSE_RANK FIRST
ORDER BY TO_CHAR(hiredate,'YYYY') )
OVER (PARTITION BY deptno)
     ) AVG_SAL_YR1_HIRE
FROM emp
WHERE deptno IN (20, 10)
ORDER BY deptno, empno, HIRE_YR;

     EMPNO     DEPTNO HIRE        SAL AVG_SAL_YR1_HIRE
---------- ---------- ---- ---------- ----------------
      7782         10 1981       2450             3725
      7839         10 1981       5000             3725
      7934         10 1982       1300             3725
      7369         20 1980        800              800
      7566         20 1981       2975              800
      7788         20 1982       3000              800
      7876         20 1983       1100              800
      7902         20 1981       3000              800

8 rows selected.
Query-9 (KEEP FIRST)

How to specify the Window clause (ROW type or RANGE type windows)?

Some analytic functions (AVG, COUNT, FIRST_VALUE, LAST_VALUE, MAX, MIN and SUM among the ones we discussed) can take a window clause to further sub-partition the result and apply the analytic function. An important feature of the windowing clause is that it is dynamic in nature.
The general syntax of the <window_clause> is

[ROW or RANGE] BETWEEN <start_expr> AND <end_expr>
<start_expr> can be any one of the following

  1. UNBOUNDED PECEDING
  2. CURRENT ROW
  3. <sql_expr> PRECEDING or FOLLOWING.

    <end_expr> can be any one of the following
  1. UNBOUNDED FOLLOWING or
  2. CURRENT ROW or
  3. <sql_expr> PRECEDING or FOLLOWING.
For ROW type windows the definition is in terms of row numbers before or after the current row. So for ROW type windows <sql_expr> must evaluate to a positive integer.
For RANGE type windows the definition is in terms of values before or after the current ORDER. We will take this up in details latter.
The ROW or RANGE window cannot appear together in one OVER clause. The window clause is defined in terms of the current row. But may or may not include the current row. The start point of the window and the end point of the window can finish before the current row or after the current row. Only start point cannot come after the end point of the window. In case any point of the window is undefined the default is UNBOUNDED PRECEDING for <start_expr> and UNBOUNDED FOLLOWING for <end_expr>.
If the end point is the current row, syntax only in terms of the start point can be can be
[ROW or RANGE] [<start_expr> PRECEDING or UNBOUNDED PRECEDING ]
[ROW or RANGE] CURRENT ROW is also allowed but this is redundant. In this case the function behaves as a single-row function and acts only on the current row.

ROW Type Windows

For analytic functions with ROW type windows, the general syntax is:
Function( ) OVER (PARTITIN BY <expr1> ORDER BY <expr2,..> ROWS BETWEEN <start_expr> AND <end_expr>)
or
Function( ) OVER (PARTITON BY <expr1> ORDER BY <expr2,..> ROWS [<start_expr> PRECEDING or UNBOUNDED PRECEDING]
For ROW type windows the windowing clause is in terms of record numbers.
The query Query-10 has no apparent real life description (except column FROM_PU_C) but the various windowing clause are illustrated by a COUNT(*) function. The count simply shows the number of rows inside the window definition. Note the build up of the count for each column for the YEAR 1981.
The column FROM_P3_TO_F1 shows an example where start point of the window is before the current row and end point of the window is after current row. This is a 5 row window; it shows values less than 5 during the beginning and end.

-- The query below has no apparent real life description (except 
-- column FROM_PU_C) but is remarkable in illustrating the various windowing
-- clause by a COUNT(*) function.
 
SELECT empno, deptno, TO_CHAR(hiredate, 'YYYY') YEAR,
COUNT(*) OVER (PARTITION BY TO_CHAR(hiredate, 'YYYY')
ORDER BY hiredate ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) FROM_P3_TO_F1,
COUNT(*) OVER (PARTITION BY TO_CHAR(hiredate, 'YYYY')
ORDER BY hiredate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) FROM_PU_TO_C,
COUNT(*) OVER (PARTITION BY TO_CHAR(hiredate, 'YYYY')
ORDER BY hiredate ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING) FROM_P2_TO_P1,
COUNT(*) OVER (PARTITION BY TO_CHAR(hiredate, 'YYYY')
ORDER BY hiredate ROWS BETWEEN 1 FOLLOWING AND 3 FOLLOWING) FROM_F1_TO_F3
FROM emp
ORDEDR BY hiredate

 EMPNO  DEPTNO YEAR FROM_P3_TO_F1 FROM_PU_TO_C FROM_P2_TO_P1 FROM_F1_TO_F3
------ ------- ---- ------------- ------------ ------------- -------------
  7369      20 1980             1            1             0             0
  <font bgcolor=yellow>7499      30 1981             2            1             0             3
  7521      30 1981             3            2             1             3
  7566      20 1981             4            3             2             3
  7698      30 1981             5            4             3             3
  7782      10 1981             5            5             3             3
  7844      30 1981             5            6             3             3
  7654      30 1981             5            7             3             3
  7839      10 1981             5            8             3             2
  7900      30 1981             5            9             3             1
  7902      20 1981             4           10             3             0</font>
  7934      10 1982             2            1             0             1
  7788      20 1982             2            2             1             0
  7876      20 1983             1            1             0             0

14 rows selected.
Query-10 (ROW type windowing example)

The column FROM_PU_TO_CURR shows an example where start point of the window is before the current row and end point of the window is the current row. This column only has some real world significance. It can be thought of as the yearly employee build-up of the organization as each employee is getting hired.
The column FROM_P2_TO_P1 shows an example where start point of the window is before the current row and end point of the window is before the current row. This is a 3 row window and the count remains constant after it has got 3 previous rows.
The column FROM_F1_TO_F3 shows an example where start point of the window is after the current row and end point of the window is after the current row. This is a reverse of the previous column. Note how the count declines during the end.

RANGE Windows

For RANGE windows the general syntax is same as that of ROW:
Function( ) OVER (PARTITION BY <expr1> ORDER BY <expr2> RANGE BETWEEN <start_expr> AND <end_expr>)
or
Function( ) OVER (PARTITION BY <expr1> ORDER BY <expr2> RANGE [<start_expr> PRECEDING or UNBOUNDED PRECEDING]

For <start_expr> or <end_expr> we can use UNBOUNDED PECEDING, CURRENT ROW or <sql_expr> PRECEDING or FOLLOWING. However for RANGE type windows <sql_expr> must evaluate to value compatible with ORDER BY expression <expr1>.
<sql_expr> is a logical offset. It must be a constant or expression that evaluates to a positive numeric value or an interval literal. Only one ORDER BY expression is allowed.
If <sql_expr> evaluates to a numeric value, then the ORDER BY expr must be a NUMBER or DATE datatype. If <sql_expr> evaluates to an interval value, then the ORDER BY expr must be a DATE datatype.
Note the example (Query-11) below which uses RANGE windowing. The important thing here is that the size of the window in terms of the number of records can vary.

-- For each employee give the count of employees getting half more that their 
-- salary and also the count of employees in the departments 20 and 30 getting half 
-- less than their salary.
 
SELECT deptno, empno, sal,
Count(*) OVER (PARTITION BY deptno ORDER BY sal RANGE
BETWEEN UNBOUNDED PRECEDING AND (sal/2) PRECEDING) CNT_LT_HALF,
COUNT(*) OVER (PARTITION BY deptno ORDER BY sal RANGE
BETWEEN (sal/2) FOLLOWING AND UNBOUNDED FOLLOWING) CNT_MT_HALF
FROM emp
WHERE deptno IN (20, 30)
ORDER BY deptno, sal

 DEPTNO  EMPNO   SAL CNT_LT_HALF CNT_MT_HALF
------- ------ ----- ----------- -----------
     20   7369   800           0           3
     20   7876  1100           0           3
     20   7566  2975           2           0
     20   7788  3000           2           0
     20   7902  3000           2           0
     30   7900   950           0           3
     30   7521  1250           0           1
     30   7654  1250           0           1
     30   7844  1500           0           1
     30   7499  1600           0           1
     30   7698  2850           3           0

11 rows selected.
Query-11 (RANGE type windowing example)

Order of computation and performance tips

Defining the PARTITOIN BY and ORDER BY clauses on indexed columns (ordered in accordance with the PARTITION CLAUSE and then the ORDER BY clause in analytic function) will provide optimum performance. For Query-5, for example, a composite index on (deptno, hiredate) columns will prove effective.
It is advisable to always use CBO for queries using analytic functions. The tables and indexes should be analyzed and optimizer mode should be CHOOSE.
Even in absence of indexes analytic functions provide acceptable performance but need to do sorting for computing partition and order by clause. If the query contains multiple analytic functions, sorting and partitioning on two different columns should be avoided if they are both not indexed.

Conclusion


The aim of this article is not to make the reader try analytic functions forcibly in every other complex SQL. It is meant for a SQL coder, who has been avoiding analytic functions till now, even in complex analytic queries and reinventing the same feature much painstakingly by native SQL and join query. Its job is done if such a person finds analytic functions clear, understandable and usable after going through the article, and starts using them.