Get base URLs with different “endings” after the last slash


Okay, so I have a script that checks the “base” URL up to a certain number of slashes, but I realized that what I really need is to check the “ending”, going backwards to the last unescaped slash in the URL, and, provided the prefixes are identical, display those rows.

For example, let’s say this is my data:

CREATE TABLE `testtable` (
  `field1` int(11) NOT NULL,
  `field2` datetime NOT NULL,
  `urlTest` varchar(255) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

INSERT INTO `testtable` (`field1`, `field2`, `urlTest`) VALUES
(1, '2010-01-01', 'http://test1.com/identicalprefix/anotherprefix/somethingelse.php?id=1'),
(2, '2010-01-01', 'http://test1.com/identicalprefix/anotherprefix/'),
(3, '2010-01-01', 'http://test1.com/identicalprefix/anotherprefix/randomdata'),
(4, '2010-01-01', 'http://test1.com/identicalprefix/anotherprefix/randomdata/dada'),
(5, '2010-01-01', 'http://test1.com/singleprefix/randomdata'),
(6, '2012-02-02', 'http://test1.com/newscript/something'),
(7, '2013-02-02', 'http://test2.com/newscript/something'),
(8, '2014-02-02', 'http://test3.com/newscript/something'),
(9, '2014-02-02', 'http://test3.com/');

ALTER TABLE `testtable`
  ADD PRIMARY KEY (`field1`);

So the script should find identical “base” URLs with different endings, based on the last slash in the URL.

So in this case,

http://test1.com/identicalprefix/anotherprefix/somethingelse.php?id=1
http://test1.com/identicalprefix/anotherprefix/
http://test1.com/identicalprefix/anotherprefix/randomdata

would all be listed, because they have the same “base” URL (http://test1.com/identicalprefix/anotherprefix/), just with different suffixes, and the same date.

And obviously, the “test3.com” rows would not be displayed: even though they share a common domain and a common date, the entire base URL is not the same (i.e., “http://test3.com/” versus “http://test3.com/newscript/”).
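
To make the notion of a “base” URL concrete, here is a minimal sketch that prints the base next to each row of the sample data. It assumes MySQL; the column alias `baseUrl` is made up for illustration and is not part of the original table.

```sql
-- SUBSTRING_INDEX(urlTest, '/', -1) is whatever follows the last slash
-- (an empty string when the URL already ends in a slash).
-- Cutting that many characters off the end leaves the base URL,
-- trailing slash included.
SELECT field1,
       urlTest,
       LEFT(urlTest,
            CHAR_LENGTH(urlTest)
            - CHAR_LENGTH(SUBSTRING_INDEX(urlTest, '/', -1))) AS baseUrl
FROM testtable
ORDER BY field1;
```

For rows 1, 2 and 3 this should give http://test1.com/identicalprefix/anotherprefix/, while row 8 gives http://test3.com/newscript/ and row 9 gives http://test3.com/, which is why the two test3.com rows do not match each other.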

How to solve:


Method 1

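SELECT DISTINCT t1.*
FROM testtable t1, testtable t2
WHERE t1.field2 = t2.field2        -- same date
  AND t1.urlTest != t2.urlTest     -- but a different full URL
  -- Reversing the URL turns its last slash into the first one, so the
  -- substring from that position is the reversed base URL (slash included).
  AND SUBSTRING(REVERSE(t1.urlTest) FROM LOCATE('/', REVERSE(t1.urlTest)))
    = SUBSTRING(REVERSE(t2.urlTest) FROM LOCATE('/', REVERSE(t2.urlTest)));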
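
The self-join above compares every row with every other row. As a hedged alternative (not part of the original answer), the same idea can be sketched by grouping on the date and the base URL, then keeping only the groups that contain more than one distinct URL. This assumes MySQL; the alias `baseUrl` and the derived-table name `g` are hypothetical.

```sql
SELECT t.*
FROM testtable t
JOIN (
    -- One row per (date, base URL) pair that occurs with 2+ different URLs.
    SELECT field2,
           LEFT(urlTest,
                CHAR_LENGTH(urlTest)
                - CHAR_LENGTH(SUBSTRING_INDEX(urlTest, '/', -1))) AS baseUrl
    FROM testtable
    GROUP BY field2, baseUrl
    HAVING COUNT(DISTINCT urlTest) > 1
) g
  ON g.field2 = t.field2
 -- Recompute the base URL for each row and match it to the group.
 AND g.baseUrl = LEFT(t.urlTest,
                      CHAR_LENGTH(t.urlTest)
                      - CHAR_LENGTH(SUBSTRING_INDEX(t.urlTest, '/', -1)))
ORDER BY g.baseUrl, t.field1;
```

With the sample data, both queries should return rows 1, 2 and 3. The grouped form also makes it easy to order or count the results per base URL, at the cost of writing the base-URL expression twice.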



All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
