PlanetScale's "You don't* need JOINs" video"
The video claims that using a JOIN & DISTINCT will cause mysql to fetch extra rows & then discard them, whereas using a WHERE EXISTS (subquery)
will not do this & performance will be improved. He did not provide EXPLAIN query plans, so I'd like to test his claims.
The video is at: https://www.youtube.com/watch?v=5hsl47I3svw
He does perform benchmarks & shows exists is faster than IN & that JOIN is about 2 or 2.5 times slower than both of the subquery solutions.
He did not share his schema, so idk about foreign keys, and he did not share his database size.
I tried online tool dbfill.info (or filldb.info ?), but it didn't work. I used library at https://github.com/kedarvj/mysql-random-data-generator/tree/master to generate 10,000 users & 100,000 posts.
SCHEMA:
- posts
- id
- user_id
- title
- body
- views (populated, int, many over 90k)
- created_at (populated)
- updated_at
- users
- id
- name
- created_at (all null)
- updated_at (all null)
- flags (all zeros, presumably binary or int)
QUERIES:
First Successful query:
SELECT DISTINCT users.* FROM users
LEFT JOIN posts on users.id = posts.user_id
WHERE views > 90000
Second successful query:
SELECT * FROM users where id in (
select user_id from posts where views > 90000
)
Final successful query:
SELECT * FROM users where exists (
select user_id from posts where views > 90000 and users.id = user_id
)
CREATE TABLE
DROP TABLE IF EXISTS posts;
DROP TABLE IF EXISTS users;
CREATE TABLE posts (
id int PRIMARY KEY,
user_id int,
title varchar (256),
body TEXT,
views int,
created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
updated_at DATETIME DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);
CREATE TABLE users (
id int PRIMARY KEY,
name VARCHAR(256),
email VARCHAR(256),
flags int,
created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
updated_at DATETIME DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);
## EXPLAINS withOUT a foreign key constraint
With about 20,000 post rows & 10,000 user rows:
For `SELECT DISTINCT users.* FROM users LEFT JOIN posts on users.id = posts.user_id WHERE views > 90000;`
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------------------+-------+----------+------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+--------+---------------+---------+---------+--------------------------------+-------+----------+------------------------------+ | 1 | SIMPLE | posts | NULL | ALL | NULL | NULL | NULL | NULL | 18272 | 33.33 | Using where; Using temporary | | 1 | SIMPLE | users | NULL | eq_ref | PRIMARY | PRIMARY | 4 | test_planetscale.posts.user_id | 1 | 100.00 | NULL | +----+-------------+-------+------------+--------+---------------+---------+---------+--------------------------------+-------+----------+------------------------------+
For `SELECT * FROM users where id in ( select user_id from posts where views > 90000 );`
+----+--------------+-------------+------------+--------+---------------------+---------------------+---------+---------------------------+-------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+-------------+------------+--------+---------------------+---------------------+---------+---------------------------+-------+----------+-------------+
| 1 | SIMPLE | users | NULL | ALL | PRIMARY | NULL | NULL | NULL | 9671 | 100.00 | NULL |
| 1 | SIMPLE |
For `SELECT * FROM users where exists ( select user_id from posts where views > 90000 and users.id = user_id );`
+----+--------------+-------------+------------+--------+---------------------+---------------------+---------+---------------------------+-------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+-------------+------------+--------+---------------------+---------------------+---------+---------------------------+-------+----------+-------------+
| 1 | SIMPLE | users | NULL | ALL | PRIMARY | NULL | NULL | NULL | 9671 | 100.00 | NULL |
| 1 | SIMPLE |
## EXPLAINs WITH foreign key constraint
Added constarint via `alter table posts add foreign key (user_id) references users(id);`
For `SELECT DISTINCT users.*` ...
`possible_keys` for `posts` is now `user_id` instead of NULL.
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------------------+-------+----------+------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+--------+---------------+---------+---------+--------------------------------+-------+----------+------------------------------+ | 1 | SIMPLE | posts | NULL | ALL | user_id | NULL | NULL | NULL | 18260 | 33.33 | Using where; Using temporary | | 1 | SIMPLE | users | NULL | eq_ref | PRIMARY | PRIMARY | 4 | test_planetscale.posts.user_id | 1 | 100.00 | NULL | +----+-------------+-------+------------+--------+---------------+---------+---------+--------------------------------+-------+----------+------------------------------+
For `SELECT * FROM users where id in ( select user_id from posts where views > 90000 );`
+----+-------------+-------+------------+------+---------------+---------+---------+---------------------------+------+----------+--------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+---------------+---------+---------+---------------------------+------+----------+--------------------------------+ | 1 | SIMPLE | users | NULL | ALL | PRIMARY | NULL | NULL | NULL | 9375 | 100.00 | NULL | | 1 | SIMPLE | posts | NULL | ref | user_id | user_id | 5 | test_planetscale.users.id | 1 | 33.33 | Using where; FirstMatch(users) | +----+-------------+-------+------------+------+---------------+---------+---------+---------------------------+------+----------+--------------------------------+
For `SELECT * FROM users where exists ( select user_id from posts where views > 90000 and users.id = user_id );`
+----+-------------+-------+------------+------+---------------+---------+---------+---------------------------+------+----------+--------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+---------------+---------+---------+---------------------------+------+----------+--------------------------------+ | 1 | SIMPLE | users | NULL | ALL | PRIMARY | NULL | NULL | NULL | 9375 | 100.00 | NULL | | 1 | SIMPLE | posts | NULL | ref | user_id | user_id | 5 | test_planetscale.users.id | 1 | 33.33 | Using where; FirstMatch(users) | +----+-------------+-------+------------+------+---------------+---------+---------+---------------------------+------+----------+--------------------------------+