By now you are already a master of
WHERE and know all about various types of filtering. In the previous lesson we have introduced aggregations which are used very often. We have also mixed filtering and aggregations in some examples.
In SQL there are two types of filtering, the one in
WHERE which we covered extensively and one in
HAVING which is what we will introduce today. Most simply put
HAVING works per group, keeping only groups which pass a condition which is an aggregation by itself. Ok it is not easy to explain it with a single definition but as always our barrage of examples will guide us through the murky waters of SQL.
You can use either or both types of filtering your queries.
These examples are not linked to any specific SQL table but to general idea of
count(*) < 1000are not part of the output.
The general idea is to to look at the aggregate property for a group and then filter away groups which do not satisfy that aggregate property.
Here is an example of an SQL query using the
The question is: “What are the schools where the average verbal score is less than 495?”
SELECT school, avg(sat_verbal) FROM datasets.sat_scores GROUP BY school HAVING avg(sat_verbal) < 495
HAVING comes after
GROUP BY. Using
HAVING without group by does not make sense and will give you an error.
NB: Until you master
HAVING you will often see the dreaded message
The query returned no data. To solve this write your query without
HAVING but with the aggregation you care about in the
SELECT part so you can analyze the numbers.
“Which passenger classes have more than 100 survivors?”
This question is a bit tricky because it does not appear to be a group by question, but it is. The only trick is that we don’t calculate any aggregate value in
SELECT but use aggregations to filter away groups using
SELECT pclass FROM datasets.titanic GROUP BY pclass HAVING sum(survived) > 100
“Which passenger classes have more than 10 survivors while the average age is less than 30”
SELECT pclass FROM datasets.titanic GROUP BY pclass HAVING sum(survived) > 10 AND avg(age) <= 30
Just like regular
WHERE you can use
NOT. You can use the standard suite of operators with the exception of
“Which passenger classes have more than 10 female survivors with the average age being less than 30”
SELECT pclass FROM datasets.titanic WHERE sex = 'female' GROUP BY pclass HAVING sum(survived) > 10 AND avg(age) <= 30
As you see,
HAVING can coexist in the same query.
When wondering what should go to
WHERE and what should go to
HAVING ask the question:
If the answer is ‘per entity’ then it goes to
WHERE, otherwise it goes to
This lesson is rather short because it covers only a single concept.