This article explains indexes, unique keys and primary keys. It explains what is their purpose, how to create these types of objects, and what are the differences between them.
The table below summarizes their properties:
|type||faster search?||rejects duplicate values?||rejects null values?||can be only one per table?|
What is an index in SQL:
The purpose of an index is to retrieve information from a table faster.
When creating an index, you give:
- the name of the index.
- the name of a table.
- the name of column(s) in this table that is (are) part of the index.
When executing a query on a table with many rows, the where condition should use the column(s) part of the index. The query will then find the result faster.
Example of index:
The table named “clients” has the columns first_name, last_name, phone_number, state,email_address,salesperson_id.
The administrator creates an index on columns first_name and last_name.
You can make the following query faster thanks to this index:
select * from clients
where first_name = 'aamir' and last_name = 'matao'
Why do you say ‘can’ be faster, is it not always faster?
It is not always faster. In the following cases, the index is not useful:
- if most rows have the same value in the column chosen as index.
- if there are very few rows.
Why not putting an index on every column?:
You should use an index only when necessary, because:
- each index takes place on the disk, just like a table does.
- when you add an index on a table, it takes more time to update information in this table. Indeed, when you update the table, you need to update the index as well.
What is a unique key in SQL?
Unique keys also help to find information faster, but also make sure that:
- if the unique key has one column, you cannot have two rows with the same value in this column.
- if the unique key has several columns, you cannot have two rows with the same combination of values in these columns.
Example with one column:
If you have a row in this table with email_address equal to ‘email@example.com’, you cannot insert a new row with email_address also equal to ‘firstname.lastname@example.org’. If you try, then you receive an error message like this one:
Duplicate entry ‘email@example.com’ for key ‘ux_clients_email_address’’
Example with two columns:
What is a primary key in SQL?
A primary key can also help finding information faster and prevents duplicate values. Additionally:
- there can only be one primary key per table.
- some DBMSes like mySQL order the rows in the table using the column(s) related to the primary key. As a result, we will retrieve a range of value in this(these column(s) faster.
- the primary key will prevent insertions of records for which the value of the column(s) related to the primary key is null.
Note that a primary key can also consist of more than one column.
In this statement:
- clients is the name of the table on which we add the primary key.
- id is the column used as primary key
What do you mean by ordering rows in the table?
When you do not specify a primary key, rows appear in the order you inserted them.
Example: you executed 3 inserts in this order (you inserted Spain first, then India, then Thailand):
insert into countries(id,name) values (2,’Spain’)
insert into countries(id,name) values (1,’India’)
insert into countries(id,name) values (3,’Thailand’)
When using the query “select * from countries”, you will see the following table:
If you create a primary index on the column “id” (alter table countries add primary key(id)), then the same query will display this result:
You can see that the result is ordered by id: row with id 1 (India) is first (appears at the top), row with id 2 is second, row with id 3 is third and last (appear at the bottom of the results).
What do you mean by retrieving a range of values faster?
Example: you have a primary key in column id of the table salespersons. The following query will be faster:
select * from salesperson
where id between 10 and 100
As results are ordered by id (as explained in the answer to the last question):
- id 11 is on the line just under 10
- 12 is on the line just under 13
- …and so on until row 100.
As a result, once the DBMS retrieves row with id 10, it is easy for him to find the rest of the rows: they are just underneath.