# SQL Intermediate Cheatsheet — Aggregate Functions

The intermediate SQL tutorial will go over 2 main areas: aggregate functions (aggregate data using summary statistics) and joins (combine 2 or more tables). I will cover various aggregate functions in this section, ranging from COUNT and SUM to GROUP BY and CASE WHEN. In the next part, I will also go over the various types of joins, such as INNER JOIN and LEFT JOIN.

These cheatsheets focus on first presenting the theories and then applying them to real-world examples to illustrate how the concepts work in practice.

# Aggregate Functions

SQL is excellent at aggregating data similar to how you might in an Excel pivot table. Aggregate functions are used all the time, so it’s definitely important to get comfortable with them.

The specific functions I will cover are COUNT, SUM, DISTINCT, GROUP BY, HAVING, and CASE. There are also other basic aggregate functions such as MIN/MAX and AVG. Specifically, MIN(price), MAX(price), and AVG(price) will output the minimum, maximum, and average price from a given table.

# COUNT

COUNT is the easiest aggregate function, which counts the number of rows in a particular column. The COUNT function can be used on both numerical and non-numerical (i.e. text) values.

## COUNT all rows

`SELECT COUNT(*)`

FROM table_name

COUNT(*) will output a single number which equals the **total number of rows in the entire table**. (Note: using COUNT(1) has the same effect as COUNT(*) )

## COUNT individual columns

`SELECT COUNT(user_ID)`

FROM table_name

The query above returns the number of rows in which user_ID **is not null**.

**COUNT DISTINCT values**

`SELECT COUNT(DISTINCT user_ID)`

FROM table_name

This returns the total **unique values **of the user_ID column.

# SUM

SUM totals the values in a given column, Unlike COUNT, **SUM can only be applied to numerical columns**. Another key difference between the two is the presence of null values. As **SUM treats nulls as 0**, you don’t need to worry as much about null values here as you would with COUNT.

The query below will calculate the sum of the volume column from a table with Apple stock prices.

`SELECT SUM(volume)`

FROM apple_stock_price

Note that **aggregators like SUM only aggregate vertically**. Calculation across rows can be done using simple arithmetic.

# GROUP BY

GROUP BY comes in handy when you want to aggregate only a subset of the table by **separating the data into groups, **which can be aggregated independently. On the other hand, functions like COUNT and SUM have the commonality of aggregating across entire tables.

GROUP BY is often used with the aforementioned aggregate functions: COUNT, SUM, AVG, MIN, MAX.

`SELECT year, month, SUM(volume)`

FROM apple_stock_price

GROUP BY year, month

ORDER BY year, month

The above query will output the **total number of Apple shares traded (grouped by) each month**, sorted in (ascending) chronological order.

`SELECT product, COUNT(*)`

FROM transactions

GROUP BY product

The query above will calculate the **number of transactions for (grouped by) each product.**

# HAVING

HAVING is used to **filter aggregate columns, **which is similar to the WHERE clause. Note that HAVING always comes after GROUP BY (filter aggregate functions), whereas WHERE comes before GROUP BY.

`SELECT year, month, MAX(price)`

FROM apple_stock_price

GROUP BY year, month

HAVING MAX(price) > 400

Here we’re finding **every month in which Apple stock worked its way over $400**.

`SELECT product, AVG(price)`

FROM transactions

GROUP BY product

HAVING AVG(price) <= 200

This query above will output **which product had an average price of $200 or less**.

# CASE WHEN

The CASE WHEN statement is SQL’s version of **if/then** logic. Let’s take a closer look at the syntax:

`SELECT name, school_year, CASE WHEN school_year IN ('senior', 'junior') THEN 'yes' ELSE NULL END AS is_senior `

FROM ncaa_players

The query above can be interpreted as the following:

- CASE WHEN checks each row to see if school_year is ‘junior’ or ‘senior.
- For a given row: if the conditional statement is true, ‘yes’ gets printed in the newly created is_senior column. If false, there will be a null value in the is_senior column.
- Meanwhile, SQL will output all values in the name and school_year columns.

Here are two specific examples that illustrate the power of CASE WHEN.

`SELECT CASE WHEN state IN ('CA', 'WA', 'OR') THEN 'west coast' WHEN state = 'TX' THEN 'texas' ELSE 'other' END AS region, COUNT(1) AS player_count`

FROM ncaa_players

WHERE weight > 200

GROUP BY 1

**Count the number of players weighing more than 200 pounds for West Coast, Texas, and the other regions**. (Note: COUNT(1) and GROUP BY 1 both refer to aggregation on the first column, which is the region classification).

`SELECT state, COUNT(CASE WHEN year = 'junior' THEN 1 ELSE NULL END) AS junior_count, COUNT(CASE WHEN year = 'senior' THEN 1 ELSE NULL END) AS senior_count, COUNT(1) AS total_players_by_state`

FROM ncaa_players

GROUP BY state

ORDER BY total_players_by_state DESC

**Display the number of juniors and seniors by state, as well as the total number of players in each state.** Order the output results such that states with the most players come first.

References:

mode.com