PlanetScale's "You don't* need JOINs" video"

The video claims that using a JOIN & DISTINCT will cause mysql to fetch extra rows & then discard them, whereas using a WHERE EXISTS (subquery) will not do this & performance will be improved. He did not provide EXPLAIN query plans, so I'd like to test his claims.

The video is at: https://www.youtube.com/watch?v=5hsl47I3svw

He does perform benchmarks & shows exists is faster than IN & that JOIN is about 2 or 2.5 times slower than both of the subquery solutions.

He did not share his schema, so idk about foreign keys, and he did not share his database size.

I tried online tool dbfill.info (or filldb.info ?), but it didn't work. I used library at https://github.com/kedarvj/mysql-random-data-generator/tree/master to generate 10,000 users & 100,000 posts.

SCHEMA:

  • posts
    • id
    • user_id
    • title
    • body
    • views (populated, int, many over 90k)
    • created_at (populated)
    • updated_at
  • users
    • id
    • name
    • email
    • created_at (all null)
    • updated_at (all null)
    • flags (all zeros, presumably binary or int)

QUERIES:

First Successful query:

SELECT DISTINCT users.* FROM users
    LEFT JOIN posts on users.id = posts.user_id
    WHERE views > 90000

Second successful query:

SELECT * FROM users where id in (
    select user_id from posts where views > 90000
)

Final successful query:

SELECT * FROM users where exists (
    select user_id from posts where views > 90000 and users.id = user_id
)

CREATE TABLE

DROP TABLE IF EXISTS posts;
DROP TABLE IF EXISTS users;

CREATE TABLE posts (
    id int PRIMARY KEY,
    user_id int,
    title varchar (256),
    body TEXT,
    views int,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    updated_at DATETIME DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP

);

CREATE TABLE users (
    id int PRIMARY KEY,
    name VARCHAR(256),
    email VARCHAR(256),
    flags int,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    updated_at DATETIME DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP

);


## EXPLAINS withOUT a foreign key constraint
With about 20,000 post rows & 10,000 user rows:
For `SELECT DISTINCT users.* FROM users     LEFT JOIN posts on users.id = posts.user_id     WHERE views > 90000;`

+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------------------+-------+----------+------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+--------+---------------+---------+---------+--------------------------------+-------+----------+------------------------------+ | 1 | SIMPLE | posts | NULL | ALL | NULL | NULL | NULL | NULL | 18272 | 33.33 | Using where; Using temporary | | 1 | SIMPLE | users | NULL | eq_ref | PRIMARY | PRIMARY | 4 | test_planetscale.posts.user_id | 1 | 100.00 | NULL | +----+-------------+-------+------------+--------+---------------+---------+---------+--------------------------------+-------+----------+------------------------------+



For `SELECT * FROM users where id in (     select user_id from posts where views > 90000 );`

+----+--------------+-------------+------------+--------+---------------------+---------------------+---------+---------------------------+-------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+--------------+-------------+------------+--------+---------------------+---------------------+---------+---------------------------+-------+----------+-------------+ | 1 | SIMPLE | users | NULL | ALL | PRIMARY | NULL | NULL | NULL | 9671 | 100.00 | NULL | | 1 | SIMPLE | | NULL | eq_ref | <auto_distinct_key> | <auto_distinct_key> | 5 | test_planetscale.users.id | 1 | 100.00 | NULL | | 2 | MATERIALIZED | posts | NULL | ALL | NULL | NULL | NULL | NULL | 18272 | 33.33 | Using where | +----+--------------+-------------+------------+--------+---------------------+---------------------+---------+---------------------------+-------+----------+-------------+


For `SELECT * FROM users where exists (     select user_id from posts where views > 90000 and users.id = user_id );`

+----+--------------+-------------+------------+--------+---------------------+---------------------+---------+---------------------------+-------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+--------------+-------------+------------+--------+---------------------+---------------------+---------+---------------------------+-------+----------+-------------+ | 1 | SIMPLE | users | NULL | ALL | PRIMARY | NULL | NULL | NULL | 9671 | 100.00 | NULL | | 1 | SIMPLE | | NULL | eq_ref | <auto_distinct_key> | <auto_distinct_key> | 5 | test_planetscale.users.id | 1 | 100.00 | NULL | | 2 | MATERIALIZED | posts | NULL | ALL | NULL | NULL | NULL | NULL | 18272 | 33.33 | Using where | +----+--------------+-------------+------------+--------+---------------------+---------------------+---------+---------------------------+-------+----------+-------------+


## EXPLAINs WITH foreign key constraint
Added constarint via `alter table posts add foreign key (user_id) references users(id);`

For `SELECT DISTINCT users.*` ...
`possible_keys` for `posts` is now `user_id` instead of NULL.

+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------------------+-------+----------+------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+--------+---------------+---------+---------+--------------------------------+-------+----------+------------------------------+ | 1 | SIMPLE | posts | NULL | ALL | user_id | NULL | NULL | NULL | 18260 | 33.33 | Using where; Using temporary | | 1 | SIMPLE | users | NULL | eq_ref | PRIMARY | PRIMARY | 4 | test_planetscale.posts.user_id | 1 | 100.00 | NULL | +----+-------------+-------+------------+--------+---------------+---------+---------+--------------------------------+-------+----------+------------------------------+


For `SELECT * FROM users where id in (     select user_id from posts where views > 90000 );`

+----+-------------+-------+------------+------+---------------+---------+---------+---------------------------+------+----------+--------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+---------------+---------+---------+---------------------------+------+----------+--------------------------------+ | 1 | SIMPLE | users | NULL | ALL | PRIMARY | NULL | NULL | NULL | 9375 | 100.00 | NULL | | 1 | SIMPLE | posts | NULL | ref | user_id | user_id | 5 | test_planetscale.users.id | 1 | 33.33 | Using where; FirstMatch(users) | +----+-------------+-------+------------+------+---------------+---------+---------+---------------------------+------+----------+--------------------------------+


For `SELECT * FROM users where exists (     select user_id from posts where views > 90000 and users.id = user_id );`

+----+-------------+-------+------------+------+---------------+---------+---------+---------------------------+------+----------+--------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+---------------+---------+---------+---------------------------+------+----------+--------------------------------+ | 1 | SIMPLE | users | NULL | ALL | PRIMARY | NULL | NULL | NULL | 9375 | 100.00 | NULL | | 1 | SIMPLE | posts | NULL | ref | user_id | user_id | 5 | test_planetscale.users.id | 1 | 33.33 | Using where; FirstMatch(users) | +----+-------------+-------+------------+------+---------------+---------+---------+---------------------------+------+----------+--------------------------------+