Here are some notes I’ve been making from the book
SQL Cookbook Query Solutions and Techniques for All SQL Users(Anthony Molinaro, Robert de Graaf), specifically from chapter 3.
Chapter 3
To return data stored from more than one table, when they have common data types, use UNION ALL. It combines rows from multiple row sources into one result set. As with all set operations, the items in all select lists must match in number and data type. Example: Suppose we want to superimpose a list of employees in dept 10 with their department number, and the names of departments with their deptno, we’d do something like this: SELECT ename as ename_and_dname, deptno FROM employee WHERE deptno = 10 UNION ALL SELECT ‘——————-’, NULL, FROM t1 UNION ALL SELECT dname, deptno FROM dept;
Note that duplicates will get copied over with UNION ALL. To avoid duplicates, we use UNION, which acts as DISTINCT used over a UNION ALL query.
When you use DISTINCT, the database compares the values of the selected columns across all rows and returns only unique combinations. Example: Select all distince combinatinos of job and department from the table employee. SELECT DISTINCT job, department FROM employee;
We now come to joins. Suppose you wanted to return rows from multiple tables by joining on a known common column or joining on columns that share common values. For instance, given an employee table that has the employee name and id, and a department table which has an employee id and location, we would use an Equi-join, a type of inner join, to get the result. The query would look like this: SELECT e.name, d.location FROM employee e, department d WHERE e.id = d.id;
Suppose we wanted to get all possible combinations of the rows of two tables, we would use the Cross-join, which gets the cartesian product of rows in each table. The cross join requires that each table have at least one row to function.
Inner joins and subsequently equi-joins (which are based on an equality condition) is produced from a cross join, and then the condition in the WHERE clause restricts the result set. We can also do this using an explicit join. Note that inner is optional as it is the default join method. Considering the same example as before: SELECT e.name, d.location FROM employee e INNER JOIN department d ON (e.id = d.id);
What if we wanted to find common rows between two tables? We could join on all the columns necessary to get the correct result, or we could use the INTERSECT set operation. Example: Join table employee to view V using multiple join conditions:
SELECT e.id, e.name, e.job, e.salary, e.department FROM emp e, V WHERE e.name = v.name AND e.job = v.job AND e.salary = v.salary;
SELECT e.id, e.name, e.job, e.salary, e.department FROM emp e JOIN V ON (e.name = v.name AND e.job = v.job AND e.salary = v.salary );
Suppose we don’t want to return columns from view V, you can use the INTERSECT operation with an IN predicate: SELECT id, name, job, salary, department FROM employee WHERE (name, job, salary) in ( SELECT name, job, salary FROM emp INTERSECT SELECT name, job, salary FROM V);
Note that INTERSECT requires the same number of items, of the same data types from the two tables. By default, duplicate rows will not be returned.
Suppose you want to get values in one table that do not exist in a target table. for instance, finding which departments (if any) in table department do not exist in table employee. Postgres has a shortcut for this compared to mysql using the EXCEPT set operation. Note that duplicate rows are eliminated using EXCEPT. NULLs are treated as values:
If department.deptno contains NULL, and no matching NULL exists in employee.deptno, then NULL will appear in the result.
If both tables have NULL in deptno, then NULL is considered “matched” and excluded.
postgres: SELECT deptno from department EXCEPT SELECT deptno FROM employee;
For mysql, we use a subquery to get all department numbers from table employee into an outer query that searches for rows that are not in the rows from the subquery. SELECT deptno FROM department WHERE deptno NOT IN (SELECT deptno FROM employee);
⚠️ Here’s the catch with NULL:
If the subquery (SELECT deptno FROM employee) returns even one NULL, then the whole NOT IN check returns no rows at all.
Why? Because comparisons like deptno <> NULL are unknown in SQL’s 3-valued logic.
So if employee.deptno has a NULL, the query will return an empty set, even if there are unmatched department numbers. Worth noting in SQL that TRUE OR NULL is TRUE, but FALSE OR NULL is NULL.
To avoid the null issue, we can use the following: SELECT d.deptno FROM dept d WHERE NOT EXISTS ( SELECT 1 FROM employee e WHERE d.deptno - e.deptno );
Conceptually, this is what happens:
The subquery is executed to see whether the department number exists in the EMP table. Note the condition D.DEPTNO = E.DEPTNO, which brings together the department numbers from the two tables.
If the subquery returns results, then EXISTS (…) evaluates to true and NOT EXISTS (…) thus evaluates to FALSE, and the row being considered by the outer query is discarded.
If the subquery returns no results, then NOT EXISTS (…) evaluates to TRUE, and the row being considered by the outer query is returned (because it is for a department not represented in the EMP table).
Consider the issue of getting rows from a table that do not match another table. Here we have to return all rows from one table along with rows from another that may or may not have a match on the common column, and then keep the rows with no match. Here we use the LEFT OUTER JOIN, or LEFT JOIN for short. SELECT d.* FROM department d LEFT OUTER JOIN employee e ON (d.deptno = e.deptno) WHERE e.deptno IS NULL;
Here we keep only the rows that have no match after the outer join. This is often called an anti-join.
Suppose you want to add information to a join from another table without losing any rows. This can be done using an OUTER JOIN. Suppose you want to get the list of employees with location, and add the latest dates when they received a bonus, stored in a different table emp_bonus.
SELECT e.name, d.location, eb.received FROM employee e JOIN department d ON (e.deptno = d.deptno) LEFT OUTER JOIN emp_bonus eb ON (e.empno = eb.empno) ORDER BY 2;
We can also use a scalar subquery to mimic an outer join: SELECT e.name, d.location, (SELECT eb.received FROM emp_bonus eb WHERE eb.empno = e.empno) AS received FROM employee e, department d WHERE e.deptno = d.deptno ORDER BY 2;
Suppose we wanted to figure out whether two tables have the same data (cardinality and values). Say you created a view V, and want to know whether it is equivalent to a table T. For this we can use set difference function EXCEPT for postgres, or a subquery for mysql.
We do this by finding the difference between view V and table employee, and getting UNION ALL with the difference between table EMP and view V:
postgres:
( SELECT empno, id, job, salary, commission, deptno, count() as cnt FROM V GROUP BY empno, id, job, salary, commission, deptno EXCEPT SELECT empno, id, job, salary, commission, deptno, count() as cnt FROM employee GROUP BY empno, id, job, salary, commission, deptno )
UNION ALL
( SELECT empno, id, job, salary, commission, deptno, count() as cnt FROM employee GROUP BY empno, id, job, salary, commission, deptno EXCEPT SELECT empno, id, job, salary, commission, deptno, count() as cnt FROM V GROUP BY empno, id, job, salary, commission, deptno )
This looks complicated, but it comes down to the following: Given a set A and set B, they are the same if ( A - B ) UNION ( B - A ) is NULL.
Since there is no EXCEPT function in mysql, we use a subquery to get A - B and B - A. This is syntactically dense.
mysql:
SELECT * FROM ( SELECT e.empno, e.name, e.job, e.salary, e.commission, e.deptno, count() as cnt FROM employee e GROUP BY empno, name, job, salary, commission, deptno ) AS e WHERE NOT EXISTS ( SELECT NULL FROM ( SELECT v.empno, v.name, v.job, v.salary, v.commission, v.deptno, count() as cnt FROM v GROUP BY empno, name, job, salary, commission, deptno ) AS v WHERE v. empno = e.empno AND v.name = e.name AND v.salary = e.salary AND v.deptno = e.deptno AND v.cnt = e.cnt AND COALESCE(v.commission,0) = COALESCE(e.commission,0)
UNION ALL
SELECT * FROM ( SELECT v.empno, v.name, v.job, v.salary, v.commission, v.deptno, count() as cnt FROM v GROUP BY empno, name, job, salary, commission, deptno ) AS e WHERE NOT EXISTS ( SELECT NULL FROM ( SELECT e.empno, e.name, e.job, e.salary, e.commission, e.deptno, count() as cnt FROM employee e GROUP BY empno, name, job, salary, commission, deptno ) AS e WHERE v. empno = e.empno AND v.name = e.name AND v.salary = e.salary AND v.deptno = e.deptno AND v.cnt = e.cnt AND COALESCE(v.commission,0) = COALESCE(e.commission,0)
Note that we’re using COALESCE because commission can be NULL, and COALESCE(null, 0) is 0.
Coming to aggregations across joins. Joins can cause duplicate entries, and aggregations can thus count values multiple times when applied on joins.
Consider the following Query: SELECT e.id, e.name, e.sal, e.deptno, e.sal * CASE WHEN eb.type = 1 THEN .1 WHEN eb.type = 2 THEN .2 END AS bonus FROM emp e, emp_bonus eb WHERE e.empno = eb.empno;
This is functional. Suppose we want to calculate total salary.
SELECT deptno, SUM(salary) AS total_sal, SUM(bonus) AS total_bonus FROM ( SELECT e.empno, e.name,. e.salary, e.deptno, e.sal * CASE WHEN eb.type =1 THEN .1 WHEN eb.type = 2 THEN .2 ELSE .3 END AS bonus FROM employee e, emp_bonus eb WHERE e.empno = eb.empno) AS temp GROUP BY deptno;
This gives rise to errors in total_sal, as there are duplicate entries created through the join.
This is resolved by using DISTINCT, or by calculating the aggregate in advance using a view. Lets look at the use of DISTINCT first:
SELECT deptno, SUM(DISTINCT salary) AS total_sal, SUM(bonus) AS total_bonus FROM (SELECT e.empno, e.name, e.sal, e.deptno, e.sal*CASE WHEN eb.type =1 THEN .1 WHEN eb.type = 2 THEN .2 ELSE .3 END AS bonus FROM employee e, emp_bonus eb WHERE e.empno = eb.empno) AS temp GROUP BY deptno;
Here, DISTINCT ignores the duplicate entries and thus gets us the accurate amount.