Ask the Informix Pro 10-Minute Solution

Using Informix's "First N" Construct in SQL Statements
By Joe Lumbley

This 10-Minute-Solution covers a couple of areas. First, I look at a new construct in SQL that simplifies many programming tasks. Secondly, I look at how to step outside of SQL and accomplish your desired results by using a combination of SQL statements and UNIX scripts.

Until recently, programmers and SQL users have had to resort to Rube Goldberg–like machinations in order to accomplish tasks that initially seemed simple. Many applications need to be able to select a subset of a table based upon some sort of ordering. For example, consider the following types of requests:

  • Give me the 50 largest purchases on a certain day
  • Show me the top 150 customers at a particular store
  • List the top five countries in the world in silver production
These tasks initially seem simple. However, the problem is made much harder because, until recently, SQL didn't have the capability to limit the number of rows returned for a query.

For example, consider the following SQL statement:

SELECT * 
FROM sales
WHERE sale_date = "CURRENT"
ORDER by purchase_amount DESC
This statement comes close to what I want, but it does not list the top 50 customers. If there are a million customers shopping on the current date, the query will returns all million rows.

Until recently, users have had to resort to such tactics as writing an ESQL/C program that creates a cursor for the SELECT and then only takes the top 50 entries. Other kludges have included outputting the results to a file and then using UNIX commands such as "head" to select the desired number of rows. Certainly not an elegant solution.

Beginning with IDS version 7.3, Informix has given us a partial solution to the problem with the "FIRST N rows" construct. Here's the syntax:

SELECT FIRST N <column_list> 
FROM <tablename>
WHERE <wherestatement>
ORDER by <ordercolumn> <DESC/ASC>;
I found it incredibly difficult to find documentation on using the FIRST N construct in any of the SQL documentation. This is often the case with new features: the documentation lags behind the implementation.

This example now does what I want:

SELECT FIRST 50 * 
FROM sales
WHERE sale_date = "CURRENT"
ORDER by purchase_amount DESC
The only difference is the presence of the FIRST N clause immediately after the word "SELECT." Otherwise, it's just like any other SQL statement. This statement selects only certain columns:
SELECT FIRST 50 customer_name, sale_date, purchase_amount
FROM sales
WHERE sale_date = "CURRENT"
ORDER by purchase_amount DESC
There are a few limitations to the "FIRST N" construct, which force you to resort to kludges in some cases. Perhaps the most egregious is the fact that (at least in my copy of 7.30 UC.10) the SELECT statement cannot be used in an INSERT statement. Thus, this code won't work (it gets a "Cannot use first in this context" error):
INSERT INTO my_table <note:my_table already exists >
SELECT FIRST 50 customer_name, sale_date, purchase_amount
FROM sales
WHERE sale_date = "CURRENT"
ORDER by purchase_amount DESC
It also does not work if you try to insert the rows into a temp table:
SELECT FIRST 50 customer_name, sale_date, purchase_amount
FROM sales
WHERE sale_date = "CURRENT"
ORDER by purchase_amount DESC
INTO TEMP my_table
Another limitation is that the FIRST N statement cannot be used as a subquery. Thus, this will not work:
SELECT * from address 
WHERE customer_name in
(SELECT FIRST 50 customer_name
FROM sales
WHERE sale_date = "CURRENT"
ORDER by purchase_amount DESC)
Again, it generates the context error.

So, what good is FIRST N in the real world if the rows can't be inserted into another table and they can't be part of a subquery? Simple tasks such as picking the top rows from a table with a single set of parameters can work, but what happens when you have a more complex task? Consider, for example, the following real-world task that arose in a CRM application.

Prepare a list of the top 150 customers at each store of an 800-store chain for a promotional mailing. You need the output in an ASCII file.

Here you have to create a Rube Goldberg script to do the job. The saving factor is that the following types of statements do work correctly:

UNLOAD TO myfile.txt
SELECT FIRST 50 customer_name, sale_date, purchase_amount
FROM sales
WHERE sale_date = "CURRENT"
ORDER by purchase_amount DESC;

OUTPUT to myfile.txt
SELECT FIRST 50 customer_name, sale_date, purchase_amount
FROM sales
WHERE sale_date = "CURRENT"
ORDER by purchase_amount DESC;
Each of these commands works correctly, but due to the inability to use FIRST N in a subquery, you cannot do the assignment in one script. Actually, you have to create many output files, one for each store, and then manipulate them.

The solution is first to create a flat file with just the needed store numbers:

$ cat store_file
 12345
 67890
 42343
 . . . (all 800 store numbers)
Then create a shell script (Korn shell is easiest):
$cat top_150

dbaccess crm_database <<EOF
UNLOAD TO store_$1
SELECT FIRST 150 customer_name, accounting_period, purchase_amount
FROM sales
WHERE accounting_period = "week25" AND
 Store_code = $1
ORDER by purchase_amount DESC;
EOF
This script is a UNIX "here document" that runs dbaccess against the database "crm_database" and passes a parameter (the store_code) as the variable $1 into the script, generating one ASCII file with the data for one store.

Now create a driver script that runs the top_150 script 800 times, thus sending the output to 800 separate flat files:

$ cat run_top_150
for store in `cat store_file`
do
top_150 $store
done
These two scripts together generate 800 ASCII files called store_xxxx, where xxxx represents the store_code.

This entire process is an example of using a combination of UNIX scripts and SQL statements to get around the limitations of either UNIX or SQL alone. In many instances, the tasks can be accomplished better in UNIX than in SQL. On the other hand, there are many things that can be done simply in SQL that are difficult or impossible to do in UNIX. Common examples include grouping, summing rows, and complex sorting tasks. I've been known to approach a task that has no connection to the database by putting the data into ASCII files, loading it into a database table, doing the data manipulation in SQL, and finally outputting the files in ASCII.

If there's a moral to this little tale, it's to keep your mind open to alternative ways of doing something. Just because something does not work in SQL doesn't mean it can't be done.

 
Other 10-Minute Solutions
 Writing Procedural Code in Non-Procedural SQL
 Web Access to Informix via ODBC
 Accessing Informix from Microsoft Access
 Database Administration Over the Web: Installing Informix Server Administrator
 Using Informix Server Administrator
 Customizing Informix Server Administrator
 WinSQL: Non-Informix Tools to Access Informix Databases
 Installing and Testing JDBC for Informix Systems
 Using Informix's "First N" Construct in SQL Statements
 Date and Time Manipulation in Informix
 Parsing ASCII Files Elegantly Using SQL
 Hardening Your Informix Database System
 Keeping Your Informix rootdbs Lean and Clean
 Physical Design Issues for Large, Growing Database Systems
 Install and Set Up Server Studio JE for Informix




Sponsored Links


Advertising Info  |   Member Services  |   Contact Us  |   Help  |   Feedback  |   Site Map
Jupiterweb networks

internet.comearthweb.comDevx.comClickZ

Search Jupiterweb:

Jupitermedia Corporation has four divisions:
JupiterWeb, JupiterResearch, JupiterEvents, and JupiterImages

Copyright 2004 Jupitermedia Corporation All Rights Reserved.
Legal Notices, Licensing, Reprints, & Permissions, Privacy Policy.

Jupitermedia Corporate Info | Newsletters | Tech Jobs | E-mail Offers