2. Anatomy of a SQL Query - SELECT and WHERE
Understanding the structure of a SQL query is essential for data analysis, and in this section, we’ll dive into the fundamental components of a query, focusing on the SELECT and WHERE clauses. SQL (Structured Query Language) allows users to interact with databases, extracting specific data points through queries. Let’s break down the full anatomy of a query, step-by-step.
1. SELECT
The SELECT
clause is at the core of every query. It specifies the columns you want to retrieve from one or more tables. For example:
SELECT users.email, categories.name
This query retrieves the email
from the users table and the name
from the categories table. The format is always table.column
to clearly define where the data comes from.
- You can also include functions like
COUNT(*)
in theSELECT
clause for aggregation, which will be discussed further.
2. FROM
The FROM
clause determines which table or tables you are querying. It defines the source of the data:
FROM users
In this example, we are selecting data from the users table. The FROM
clause is typically followed by one or more tables that serve as the root for the query.
3. JOIN
If you need to retrieve data from multiple tables, SQL allows you to join them together using a JOIN
clause. The JOIN
clause links tables based on related columns, such as primary and foreign keys:
JOIN requests ON users.user_id = requests.user_id
This example joins the users table with the requests table based on the user_id. You can join as many tables as needed to combine relevant data.
4. WHERE
The WHERE
clause filters the data returned by the query. It specifies conditions that the data must meet to be included in the result:
WHERE users.name = 'Anderson'
This query will return only the rows where the name is "Anderson." You can use multiple conditions with AND
or OR
, depending on the logic required:
WHERE users.name = 'Anderson' AND users.user_id < 100
Here, the query retrieves users whose name is "Anderson" and whose user ID is less than 100.
5. GROUP BY
The GROUP BY
clause allows you to aggregate data based on one or more columns. For example, if you want to group data by email and category name, you would use:
GROUP BY users.email, categories.name
This ensures that each unique combination of email and category is grouped together in the results.
6. HAVING
The HAVING
clause works similarly to WHERE
, but it applies to aggregated data (data grouped by GROUP BY
). For instance, after grouping data, you can filter the results further:
HAVING COUNT(requests.id) > 5
This will return only groups where the count of requests exceeds five.
7. ORDER BY
To sort the results, use the ORDER BY
clause. You can specify ascending or descending order:
ORDER BY users.user_id DESC
This query orders the results by user_id
in descending order.
8. LIMIT
The LIMIT
clause restricts the number of rows returned, which is helpful when dealing with large datasets:
LIMIT 10
This query will return only the first 10 results.
Practical Examples with SELECT and WHERE
Let’s explore simpler queries that use the SELECT
and WHERE
clauses to filter and retrieve data.
1. Retrieve All Data from the Users Table
To return all data from the users table, you can use SELECT *
to retrieve every column:
SELECT * FROM users;
The *
wildcard tells SQL to return all columns.
2. Retrieve Specific Columns (Name and Email)
To return only the name and email columns:
SELECT name, email FROM users;
This will only return the name and email columns, ignoring other data in the table.
3. Distinct Values
To return unique (distinct) values, use the DISTINCT
keyword:
SELECT DISTINCT name FROM users;
This will return each unique name from the users table. Similarly, if you want unique combinations of name and email:
SELECT DISTINCT name, email FROM users;
4. Count the Total Number of Users
You can use the COUNT()
function to count the number of rows (users) in the users table:
SELECT COUNT(*) FROM users;
5. Filtering with WHERE
The WHERE
clause allows you to filter data. For example, to return all users named “Bernardo”:
SELECT * FROM users WHERE name = 'Bernardo';
You can also filter using numerical or date comparisons:
SELECT * FROM users WHERE created_at > '2023-01-01';
You can filter data in a range:
SELECT * FROM users WHERE user_id BETWEEN 10 AND 100;
6. Using LIKE for Pattern Matching
If you want to match partial strings, use the LIKE
operator. For example, to find all names that start with “Bern”:
SELECT * FROM users WHERE name LIKE 'Bern%';
The %
symbol is a wildcard that matches any characters.
7. Combining Multiple Conditions
To filter by more than one condition, use AND
or OR
operators:
SELECT * FROM users WHERE name = 'Bernardo' AND user_id < 100;
Exercises
Here are some practical exercises to reinforce these concepts. Try to answer the following queries using SQL:
- How many users have the name "Lucas"?
- How many users were created after April 15, 2018?
- How many users have signed in between 5 and 10 times?
- How many users have confirmed accounts (i.e., non-null confirmation date)?
- How many URLs contain the phrase "reformas e reparos"?
The answers can be written in SQL queries. For example, to find how many users are named "Lucas":
SELECT COUNT(*) FROM users WHERE name = 'Lucas';
By learning to craft SQL queries, you gain control over data analysis, allowing you to extract insights efficiently and make informed decisions faster as a Product Manager.