How do I SELECT elements using WHERE with another table's record values in WHERE statement?

All we need is an easy explanation of the problem, so here it is.

Problem summary

Today part of my app crashed, but nothing else stopped with it. I need to find the first and last tracked message ids for each channel during the downtime. I log unix timestamps alongside each message record, so I will be using those to determine which messages were not tracked during this period.

Let’s say the downtime started at timestamp x and ended at timestamp y. I want to return the last tracked message in channel z before timestamp x, and the first tracked message after timestamp y. I hope that makes sense.

Here are my tables

mysql> select * from messages limit 5;
+---------+---------+-----------+
| message | channel | timestamp |
+---------+---------+-----------+
| 123     | 456     | 165789921 |
| 124     | 457     | 165789921 |
| 125     | 458     | 165789922 |
| 126     | 459     | 165789922 |
| 127     | 460     | 165789924 |
+---------+---------+-----------+

mysql> explain select * from messages;
+----+-------------+----------+------------+------+---------------+------+---------+------+--------+----------+-------+
| id | select_type | table    | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra |
+----+-------------+----------+------------+------+---------------+------+---------+------+--------+----------+-------+
|  1 | SIMPLE      | messages | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 211250 |   100.00 | NULL  |
+----+-------------+----------+------------+------+---------------+------+---------+------+--------+----------+-------+

CREATE TABLE messages (
    message VARCHAR(18) NOT NULL UNIQUE,
    `channel` VARCHAR(18) NOT NULL,
    `timestamp` TINYTEXT NOT NULL,
    ...,
    PRIMARY KEY (message),
    FOREIGN KEY (`channel`) REFERENCES channels (`channel`)
);
mysql> select * from channels limit 5;
+---------+-----+
| channel | ... |
+---------+-----+
| 456     | ... |
| 457     | ... |
| 458     | ... |
| 459     | ... |
| 460     | ... |
+---------+-----+

mysql> explain select * from channels;
+----+-------------+----------+------------+-------+---------------+-------+---------+------+------+----------+-------------+
| id | select_type | table    | partitions | type  | possible_keys | key   | key_len | ref  | rows | filtered | Extra       |
+----+-------------+----------+------------+-------+---------------+-------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | channels | NULL       | index | NULL          | guild | 74      | NULL | 1434 |   100.00 | Using index |
+----+-------------+----------+------------+-------+---------------+-------+---------+------+------+----------+-------------+

CREATE TABLE channels (
    `channel` VARCHAR(18) NOT NULL UNIQUE,
    ...,
    PRIMARY KEY(`channel`),
    FOREIGN KEY (guild) REFERENCES guilds (guild)
);

What I’ve tried

I’ve tried using a LEFT JOIN with WHERE clauses, but I’m no MySQL expert to say the least. There might be a way to do it this way, I just can’t with my current knowledge. The issue I was facing with this approach is, I’m not sure how to access the original channel id for matching in the SELECT query.

DOES NOT WORK – Obviously, why else would I be here…

SET @min := 1657956885428;
SET @max := 1657966145675;
SELECT * FROM (SELECT channel FROM channels) as channels
LEFT JOIN (SELECT channel, message as downtime_start_after FROM messages WHERE timestamp < @min ORDER BY timestamp ASC LIMIT 1) AS after USING (`channel`)
LEFT JOIN (SELECT channel, message as downtime_end_before FROM messages WHERE timestamp > @max ORDER BY timestamp DESC LIMIT 1) AS before USING (`channel`);

+---------+----------+-----------+
| channel | ...after | ...before |
+---------+----------+-----------+
| 456     | NULL     | NULL      |
| 457     | 124      | NULL      |
| 458     | NULL     | 130       |
| 459     | NULL     | NULL      |
| 460     | NULL     | NULL      |
+---------+----------+-----------+

Ideal result

Both parameters after and before should be the message IDs which were tracked last and first respectively. I intend to use these to determine whether a message in a channel may have been within this period.

+---------+----------+-----------+
| channel | ...after | ...before |
+---------+----------+-----------+
| 456     | 123      | 128       |
| 457     | 124      | 129       |
| 458     | 125      | 130       |
| 459     | 126      | 131       |
| 460     | 127      | 132       |
+---------+----------+-----------+

of course, the data here is just filler

Thanks in advance. Lmk if I should include any additional information!

EDIT:

I spent some more time working on a query, and managed to find something which works. I’d be more than glad to hear any advice people have for optimisations.

SET @down := 1657956885428;
SELECT * FROM (SELECT channel FROM channels) as channels
LEFT JOIN (SELECT * FROM (SELECT channel, message as `after` FROM messages WHERE `timestamp` < @down ORDER BY `timestamp` DESC ) as after_messages GROUP BY (`channel`)) as `after` USING (channel)
LEFT JOIN (SELECT * FROM (SELECT channel, message as `before` FROM messages WHERE `timestamp` > @down ORDER BY `timestamp` ASC ) as before_messages GROUP BY (`channel`)) as `before` USING (channel);

+---------+-------+--------+
| channel | after | before |
+---------+-------+--------+
| 456     | 123   | 128    |
| 457     | 124   | 129    |
| 458     | 125   | 130    |
| 459     | 126   | 131    |
| 460     | 127   | 132    |
+---------+-------+--------+

How to solve :

I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.

Method 1

the downtime started at timestamp x and ended at timestamp y. I want to return the last tracked message in channel z before timestamp x

Does this achieve that?

SELECT ...
    FROM t
    WHERE channel = z
      AND timestamp <= x
    ORDER BY timestamp DESC
    LIMIT 1

Change three things to get the first item after z, then put them together thus:

( SELECT ... x ... )
UNION ALL
( SELECT ... y ... )
ORDER BY timestamp

Timestamp looks like a timestamp, why use TINYTEXT? Consider BIGINT`. And add these indexes:

INDEX(timestamp)
INDEX(channel, timestamp)

will give you two rows.

More

A rewrite of the query you presented; this should run a lot faster:

[code]
SELECT  channel, 
    (
        SELECT  message
            FROM  messages
            WHERE  channel = c.channel
              AND  timestamp < @down
            ORDER BY  timestamp DESC
            LIMIT  1 
    ) AS 'after', 
    (
        SELECT  message
            FROM  messages
            WHERE  channel = c.channel
              AND  timestamp > @down
            ORDER BY  timestamp ASC
            LIMIT  1 
    ) AS 'before',
    FROM  channels AS c
    ORDER BY channel  -- you may want this
    ;

And, again, messages needs INDEX(channel, timestamp).

If there is a message at exactly @down, it will be missed. Consider changing > to >= or < to <= (but not both).

Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply