next up previous contents index
Next: Unique Indexes Up: Performance Previous: Performance

  
Indexes

  When accessing a table, POSTGRESQL normally reads from the beginning of the table to the end, looking for relevant rows. With an index, it can quickly find specific values in the index, then go directly to matching rows. In this way, indexes allow fast retrieval of specific rows from a table.

For example, consider the query SELECT * FROM customer WHERE col = 43. Without an index, POSTGRESQL must scan the entire table looking for rows where col equals 43. With an index on col, POSTGRESQL can go directly to rows where col equals 43, bypassing all other rows.

For a large table, it can take minutes to check every row. Using an index, finding a specific row takes fractions of a second.

Internally, POSTGRESQL stores data in operating system files. Each table has its own file, and data rows are stored one after another in the file. An index is a separate file that is sorted by one or more columns. It contains pointers into the table file, allowing rapid access to specific values in the table.

POSTGRESQL does not create indexes automatically. Instead, users should create them for columns frequently used in WHERE clauses.

To create an index, use the CREATE INDEX command, as shown in Figure [*].  

        test=> CREATE INDEX customer_custid_idx ON customer (customer_id);
        CREATE
 

In this example, customer_custid_idx is the name of the index, customer is the table being indexed, and customer_id is the column being indexed. Although you can use any name for the index, it is good practice to use the table and column names as part of the index name--for example, customer_customer_id_idx or i_customer_custid. This index is useful only for finding rows in customer for specific customer_ids. It cannot help when you are accessing other columns, because indexes are sorted by a specific column.

You can create as many indexes as you wish. Of course, an index on a seldom-used column is a waste of disk space. Also, performance can suffer if too many indexes exist, because row changes require an update to each index.

It is possible to create an index spanning multiple columns. Multicolumn indexes are sorted by the first indexed column. When the first column contains several equal values, sorting continues using the second indexed column. Multicolumn indexes are useful only on columns with many duplicate values.

The command CREATE INDEX customer_age_gender_idx ON customer (age, gender) creates an index that is sorted by age and, when several age rows have the same value, then sorted on gender. This index can be used by the query SELECT * FROM customer WHERE age = 36 AND gender = 'F' and the query SELECT * FROM customer WHERE age = 36.

The index customer_age_gender_idx is useless if you wish to find rows based only on gender, however. The gender component of the index can be used only after the age value has been specified. Thus, the query SELECT * FROM customer WHERE gender = 'F' cannot use the index because it does not place a restriction on age, which is the first part of the index.

Indexes can be useful for columns involved in joins, too. They can even be employed to speed up some ORDER BY  clauses.

To remove an index, use the DROP INDEX command. See the CREATE_INDEX and DROP_INDEX manual pages for more information.


next up previous contents index
Next: Unique Indexes Up: Performance Previous: Performance
Bruce Momjian
2001-05-09