Spot the ‘error’ in the blog title? Do you expect a DBMS to take longer to return a result set ordered or unordered? Ask yourself the following question – what would take you longer if you were doing this by hand?
or
This question has nothing to do with databases – this is a practical question, one of common sense if you prefer. The answer is clearly it should take you longer to do two tasks, identify the words on the page with the vowel as the second letter, then sorting them. It’s a no brainer isn’t it?
In terms of DBMS sorting implementations, many DBMS’s don’t perform a formal sort to return an ordered result set. For example, as each row hit during execution of a piece of SQL is identified (that is the row satisfies the predicates, defined in SQL in the WHERE clause), selected row content can non-expensively be internally represented in the correct sorted order for retrieval from the outset. This is often achieved by inserting the result set ORDER BY column values into a variant of a B-tree structure, with the corresponding row or object ID’s, and subsequently traversing the tree returning the content in the expected ordered relational form. This approach clearly obviates any second stage sorting, that may be very costly in terms of time and system resource.
I am not havering. Everything written above is written to set out my stall – the argument being that there is likely an additional cost/time for returning an order result set over an unordered one. Sometimes the additional cost is small, and sometimes it is not.
Unless there is a technical case for sorting SQL result sets in the database, I generally shy away from sorting within the database, especially in big-data systems. Furthermore, if application database drivers don’t preserve result-set ordering, there is little point sorting within the database as the result set will need to be sorted again in the application middleware or client-side anyway. Furthermore (again), a property of any relation in 1NF is that the relation is unordered, so arguably the result set should be sorted as returned to the invoking process.
With a sound understanding of this basic underlying relational theory, I recently opted to remove sorting/ordering in a piece of production code SQL in an attempt to reduce database server overhead. To my surprise, removing the SQL ORDER BY clause in a very simple piece of SQL degraded response query times. How can this be?
Here is a test case. Firstly, below I will create some dummy data and a mock function, ftest, that represents a function that executed slowly.
SQL> SQL> SQL> CREATE OR REPLACE FUNCTION ftest(a NUMBER) RETURN NUMBER DETERMINISTIC IS 2 BEGIN 3 -- This is a mock function for a function that in a production 4 -- system executes very slowly. 5 dbms_output.put_line('a: ' || TO_CHAR(a)); 6 RETURN a/2; 7 END; 8 / Function created. SQL> SQL> CREATE TABLE tblTest AS 2 SELECT CASE 3 WHEN ROWNUM < 3 THEN 1 4 ELSE ROWNUM -2 5 END rn 6 FROM dual 7 CONNECT BY ROWNUM<=8; Table created. SQL> SQL> exec dbms_stats.gather_table_stats(ownname=>user, tabname=>'TBLTEST'); PL/SQL procedure successfully completed. SQL> SQL> SELECT * 2 FROM tblTest; RN ---------- 1 1 1 2 3 4 5 6 8 rows selected. SQL>
Note:
Note too that the value ‘1’ in table tblTest appears in triplicate.
Upon executing a small piece of test code in SQL/Plus, I see the following:
SQL> SQL> SQL> set serveroutput on size 1000000 SQL> SQL> SQL> SQL> SELECT rn, ftest(rn) f1, ftest(rn)*2 f2 2 FROM tblTest 3 ORDER BY 1 DESC; RN F1 F2 ---------- ---------- ---------- 6 3 6 5 2.5 5 4 2 4 3 1.5 3 2 1 2 1 .5 1 1 .5 1 1 .5 1 8 rows selected. a: 1 a: 1 a: 2 a: 2 a: 3 a: 3 a: 4 a: 4 a: 5 a: 5 a: 6 a: 6 SQL> SQL>
First of all, although the function is DETERMINISTIC, it is evident that it is executed 12 times – there are 12 lines output prefixed by a:. I would have liked the function to have been executed only 6 times (there are 6 unique values for rn in table tblTest and the function is DETERMINISTIC, that doesn’t mean in itself that the function should be invoked 6 times, but it does mean that it need not be invoked more than 6 times). This is far from impressive. If Oracle had chosen to execute this slow function only 6 times, performance of the whole system would have been dramatically improved.
But, this misdemeanour aside, there is surely a cost with performing the sort, as discussed above, especially as in the production system, query execution produces more than a mere handful of rows as shown above.
Modifying the same SQL query as shown above to remove the ORDER BY clause yields the following result set.
SQL> SQL> SQL> SELECT rn, ftest(rn) f1, ftest(rn)*2 f2 2 FROM tblTest; RN F1 F2 ---------- ---------- ---------- 1 .5 1 1 .5 1 1 .5 1 2 1 2 3 1.5 3 4 2 4 5 2.5 5 6 3 6 8 rows selected. a: 1 a: 1 a: 1 a: 1 a: 2 a: 2 a: 3 a: 3 a: 4 a: 4 a: 5 a: 5 a: 6 a: 6 SQL> SQL>
The function ftest is now invoked 14 times (a: is output 14 times). That is worse than before. Why? I don’t know, and although I could probably poke around in a 10046 or 10053 trace and contrive some unsound reason for this behaviour, I believe it can just be summed up as an unforeseeable oddity of Oracle. It really is so unexpected – removing an ORDER BY query decreases performance as the function is repeated more times unnecessarily.
Executing the function ftest an additional two times on my production system increased SQL execution time by around 3 minutes.
Of course the elephant in room is the RESULT_CACHE, and executing a modified version of the function with ORDER BY not unsurprisingly results in ftest only being invoked 6 times as shown below. Six, great, the efficient scenario.
SQL> SQL> SQL> CREATE OR REPLACE FUNCTION ftest(a NUMBER) RETURN NUMBER RESULT_CACHE IS 2 BEGIN 3 dbms_output.put_line('a: ' || TO_CHAR(a)); 4 RETURN a/2; 5 END; 6 / Function created. SQL> SQL> SQL> SELECT rn, ftest(rn) f1, ftest(rn)*2 f2 2 FROM tblTest; RN F1 F2 ---------- ---------- ---------- 1 .5 1 1 .5 1 1 .5 1 2 1 2 3 1.5 3 4 2 4 5 2.5 5 6 3 6 8 rows selected. a: 1 a: 2 a: 3 a: 4 a: 5 a: 6 SQL> SQL> SQL>
As the RESULT_CACHE caches result sets in the SGA, not PGA, this would likely have a significant impact on other processes in the database, in other words I’d be trading one problem for another. The option was not explored.
Regardless however, I still cannot fathom why introducing an ORDER BY reduces the number of times function ftest is invoked in the original query. It really is a mystery to me.
The code was developed and tested on an instance of Oracle as below:
SQL> SQL> set linesize 132 wrap off trunc off SQL> SQL> SELECT * 2 FROM v$version; BANNER CON_ID -------------------------------------------------------------------------------- ---------- Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production 0 PL/SQL Release 12.1.0.2.0 - Production 0 CORE 12.1.0.2.0 Production 0 TNS for Linux: Version 12.1.0.2.0 - Production 0 NLSRTL Version 12.1.0.2.0 - Production 0 SQL> SQL>
— Published by Mike, 12:26:39 26 January 2017
By Month: November 2022, October 2022, August 2022, February 2021, January 2021, December 2020, November 2020, March 2019, September 2018, June 2018, May 2018, April 2018
Apple, C#, Databases, Faircom, General IT Rant, German, Informatics, LINQ, MongoDB, Oracle, Perl, PostgreSQL, SQL, SQL Server, Unit Testing, XML/XSLT
Leave a Reply