Next: Outer Joins Up: Combining SELECTs Previous: UNION, EXCEPT, and INTERSECT

Subsections

Subqueries

Subqueries are similar to SELECT chaining. While SELECT chaining combines SELECTs on the same level in a query, however, subqueries allow SELECTs to be embedded inside other queries. They can perform several functions:

They can take the place of a constant.
They can take the place of a constant yet vary based on the row being processed.
They can return a list of values for use in a comparison.

Subqueries can be quite complicated. If you have trouble understanding this section, skip over it and return to it later.

Subqueries as Constants

A subquery, also called a subselect, can replace a constant in a query. While a constant never changes, a subquery's value is computed every time the query is executed.

As an example, let's use the friend table from the previous chapters. Suppose we want to find friends who are not in the same state as Dick Gleason. We could place his state in the query using the constant string 'NJ'. If he moves to another state, however, we would have to change the query. Using the state column is more reliable.

One query uses a self-join to do the comparison to Dick Gleason's state. (Self-joins were covered in Section .) The last query uses a subquery that returns the state as 'NJ'; this value is used by the upper query. The subquery has taken the place of a constant. Unlike a constant, however, the value is computed every time the query is executed.

Although we have used table aliases in the subquery for clarity, they are not required. A column name with no table specification is automatically paired with a table in the current subquery. If no matching table is found in the current subquery, higher parts of the query are searched for a match. The state, firstname, and lastname in the subquery refer to the instance of the friend table in the subquery. The same column names in the upper query automatically refer to the friend instance in that query. If a column name matches two tables in the same subquery, an error is returned, indicating the column is ambiguous.

Subqueries can also eliminate table joins . For example, consider the mail-order parts company used in Figures and on page . To find the customer name for order number 14673, we join the salesorder and customer tables, as shown in the first query in Figure . test=> SELECT name test-> FROM customer, salesorder test-> WHERE customer.customer_id = salesorder.customer_id AND test-> salesorder.order_id = 14673; name -------------------------------- Fleer Gearworks, Inc. (1 row) test=> SELECT name test-> FROM customer test-> WHERE customer.customer_id = ( test(> SELECT salesorder.customer_id test(> FROM salesorder test(> WHERE order_id = 14673 test(> ); name -------------------------------- Fleer Gearworks, Inc. (1 row)

The second query in the figure does not have a join , but instead gets the customer_id from a subquery. In general, if a table is involved in only one join, and no columns from the table appear in the query result, the join can be eliminated and the table moved to a subquery.

In this example, we have specified salesorder.customer_id and customer.customer_id to clearly indicate the tables being referenced. However, this specification is not required. We could have used only customer_id in both places. POSTGRESQL finds the first table in the same subquery or higher that contains a matching column name.

Subqueries can be used anywhere a computed value is needed. Each has its own FROM and WHERE clauses. It can also have its own aggregate, GROUP BY , and HAVING clauses. A subquery's only interaction with the upper query is the value it returns. This approach allows sophisticated comparisons that would be difficult if the subquery's clauses had to be combined with those of the upper query.

Subqueries as Correlated Values

In addition to acting as constants in queries, subqueries can act as correlated values. Correlated values vary based on the row being processed. A normal subquery is evaluated once and its value used by the upper query. In a correlated subquery, the subquery is evaluated repeatedly for every row processed.

For example, suppose you want to find the name of your oldest friend in each state. You can accomplish this task with HAVING and table aliases, as shown in the first query of Figure . test=> SELECT f1.firstname, f1.lastname, f1.age test-> FROM friend f1, friend f2 test-> WHERE f1.state = f2.state test-> GROUP BY f2.state, f1.firstname, f1.lastname, f1.age test-> HAVING f1.age = max(f2.age) test-> ORDER BY firstname, lastname; firstname | lastname | age -----------------+----------------------+----- Ned | Millstone | 27 Sandy | Gleason | 25 Sandy | Weber | 33 Victor | Tabor | 22 (4 rows) test=> SELECT f1.firstname, f1.lastname, f1.age test-> FROM friend f1 test-> WHERE age = ( test(> SELECT MAX(f2.age) test(> FROM friend f2 test(> WHERE f1.state = f2.state test(> ) test-> ORDER BY firstname, lastname; firstname | lastname | age -----------------+----------------------+----- Ned | Millstone | 27 Sandy | Gleason | 25 Sandy | Weber | 33 Victor | Tabor | 22 (4 rows)

Alternatively, you can execute a subquery for each row that finds the maximum age for that state. If the maximum age equals the age of the current row, the row is output, as shown in the second query. This query references the friend table two times, using the aliases f1 and f2. The upper query uses f1. The subquery uses f2. The correlating specification is WHERE f1.state = f2.state, which creates a correlated subquery because the subquery references a column from the upper query. Such a subquery cannot be evaluated once and the same result used for all rows. Instead, it must be evaluated for every row because the upper column value can change.

Subqueries as Lists of Values

The previous subqueries returned one row of data to the upper query. If any of the previous subqueries returned more than one row, an error would be generated: ERROR: More than one tuple returned by a subselect used as an expression. It is possible, however, to have subqueries return multiple rows.

Normal comparison operators like equal and less-than expect a single value on the left and on the right. For example, equality expects one value on the left of the equals sign (=) and one on the right--for example, col = 3. Two special comparisons, IN and NOT IN, allow multiple values to appear on the right side. For example, the test col IN (1,2,3,4) compares col against four values. If col equals any of the four values, the comparison will return true and output the row. The test col NOT IN (1,2,3,4) will return true if col does not equal any of the four values.

You can specify an unlimited number of values on the right side of an IN or NOT IN comparison. More importantly, a subquery (instead of a constant) can be placed on the right side. It can then return multiple rows. The subquery is evaluated, and its output used like a list of constant values.

Suppose we want all employees who took sales orders on a certain date. We could perform this query in two ways. We could join the employee and salesorder tables, as shown in the first query of Figure . test=> SELECT DISTINCT employee.name test-> FROM employee, salesorder test-> WHERE employee.employee_id = salesorder.employee_id AND test-> salesorder.order_date = '7/19/1994'; name -------------------------------- Lee Meyers (1 row) test=> SELECT name test-> FROM employee test-> WHERE employee_id IN ( test(> SELECT employee_id test(> FROM salesorder test(> WHERE order_date = '7/19/1994' test(> ); name -------------------------------- Lee Meyers (1 row)

Alternatively, we could use a subquery, as shown in the second query. In this case, the subquery is evaluated and generates a list of values used by IN to perform the comparison. The subquery is possible because the salesorder table is involved in a single join, and the query does not return any columns from the salesorder table.

A NOT IN comparison returns true if a column's value is not found. For example, suppose we want to see all customers who have never ordered a product. That is, we need to find the customers who have no sales orders. This task cannot be accomplished with a join. We need an anti-join , because we want to find all customer rows that do not join to any salesorder row. Figure shows the relevant query. test=> SELECT name test-> FROM customer test-> WHERE customer_id NOT IN ( test(> SELECT customer_id test(> FROM salesorder test(> ); name ------ (0 rows)

The subquery returns a list of customer_ids representing all customers who have placed orders. The upper query returns all customer names where the customer_id does not appear in the subquery output.

NOT IN and Subqueries with NULL Values

If a NOT IN subquery returns a NULL value, the NOT IN comparison always returns false. NOT IN requires the upper column to be not equal to every value returned by the subquery. Because all comparisons with NULL return false--even inequality comparisons--NOT IN returns false. NULL comparisons were covered in Section .

We can prevent NULL values from reaching the upper query by adding IS NOT NULL to the subquery. As an example, in Figure , if any NULL customer_id values existed, the query would return no rows. We can prevent this situation by adding WHERE customer_id IS NOT NULL to the subquery. An IN subquery does not have this problem with NULLs.

Subqueries Returning Multiple Columns

Although most subqueries return a single column to the upper query, it is possible to handle subqueries returning more than one column. For example, the test WHERE (7, 3) IN (SELECT col1, col2 FROM subtable) returns true if the subquery returns a row with 7 in the first column and 3 in the second column. The test WHERE (uppercol1, uppercol2) IN (SELECT col1, col2 FROM subtable) performs equality comparisons between the upper query's two columns and the subquery's two columns. Multiple columns in the upper query can then be compared with multiple columns in the subquery. Of course, the number of values specified on the left of IN or NOT IN must be the same as the number of columns returned by the subquery.

ANY, ALL, and EXISTS Clauses

IN and NOT IN are special cases of the more generic subquery clauses ANY, ALL, and EXISTS. ANY will return true if the comparison operator is true for any value in the subquery. For example, the test col = ANY(5,7,9) returns true if col equals any of the three values. ALL requires all subquery values to compare as true, so col != ALL(5,7,9) returns true if col is not equal to all three values. IN() is the same as = ANY(), and NOT IN() is the same as <> ALL().

Normally, you can use operators like equal and greater-than only with subqueries returning one row. With ANY and ALL, however, comparisons can be made with subqueries returning multiple rows. They allow you to specify whether any or all of the subquery values, respectively, must compare as true.

EXISTS returns true if the subquery returns any rows, and NOT EXISTS returns true if the subquery returns no rows. By using a correlated subquery , EXISTS permits complex comparisons of upper-query values inside the subquery. For example, two upper-query variables can be compared in the subquery's WHERE clause. EXISTS and NOT EXISTS do not specify anything in the upper query, so it does not matter which columns are returned by the subquery.

Figure shows the IN subquery from Figure , with the query rewritten using ANY and EXISTS. SELECT name FROM employee WHERE employee_id IN ( SELECT employee_id FROM salesorder WHERE order_date = '7/19/1994' ); SELECT name FROM employee WHERE employee_id = ANY ( SELECT employee_id FROM salesorder WHERE order_date = '7/19/1994' ); SELECT name FROM employee WHERE EXISTS ( SELECT employee_id FROM salesorder WHERE salesorder.employee_id = employee.employee_id AND order_date = '7/19/1994' );

Notice that the EXISTS subquery uses a correlated subquery to join the employee_id columns of the two tables. Figure shows the NOT IN query from Figure , with the query rewritten using ALL and NOT EXISTS. SELECT name FROM customer WHERE customer_id NOT IN ( SELECT customer_id FROM salesorder ); SELECT name FROM customer WHERE customer_id <> ALL ( SELECT customer_id FROM salesorder ); SELECT name FROM customer WHERE NOT EXISTS ( SELECT customer_id FROM salesorder WHERE salesorder.customer_id = customer.customer_id );

Summary

A subquery can represent a fixed value, a correlated value, or a list of values. You can use any number of subqueries. You can also nest subqueries inside other subqueries.

In some cases, subqueries simply provide an alternative way to phrase a query. In others, a subquery is the only way to produce the desired result.

Next: Outer Joins Up: Combining SELECTs Previous: UNION, EXCEPT, and INTERSECT

Bruce Momjian
2001-05-09