# Why does FROM multiple tables default to cartesian product?


What was the idea behind producing a Cartesian product (cross join) when I run a query like this:

```sql
SELECT * FROM agents, orders
```

I would think that they would concatenate (like pandas). It feels more natural to add tables instead of multiplying them.

I’m just curious, and I could not find the rationale for defaulting to a Cartesian product anywhere online. I assume that, according to SQL, treating the syntax `FROM table1, table2` as a cross join is the more correct reading, but why?

## How to solve :


### Method 1

The syntax of comma-style joins (SQL-89) versus using the `JOIN` keyword (SQL-92) is not the point. You should use the more modern syntax, but it doesn’t address the question of why the default would be a Cartesian product if you do not specify a condition.

The answer to that is that it makes relational algebra work.

A Cartesian product is the set of all combinations of elements of two sets. Each element of the first set is paired with every element of the second set.

A relation is a subset of a Cartesian product. It’s a Cartesian product, plus a condition which tests if any given pair of elements belongs to the relation.

But the default condition is a fixed `true` value, so every pairing passes the test. So the default relation ends up being the Cartesian product.

The alternative, I suppose, would have been to make the default condition a fixed `false` value, so the default relation would be an empty relation.

That would have made some scenarios safer. For example, if you run a `DELETE` but accidentally forget the `WHERE` clause, it would save you from deleting your whole table.

But then we would be getting questions on Stack Overflow from different people who ask why the default join is an empty set, because that makes their heart skip a beat when they forget their `WHERE` clause for a `SELECT` query and it appears that their database is empty.
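The argument above can be sketched in a few lines of Python. This is only an illustration under assumptions: the `agents` and `orders` values are made-up sample data, and `relation` is a hypothetical helper, not anything a database actually exposes. It builds the Cartesian product and keeps the pairs that pass a condition, with the condition defaulting to a fixed `True`.

```python
from itertools import product

# Hypothetical sample data standing in for two tables.
agents = ["Alice", "Bob"]
orders = [101, 102, 103]


def relation(left, right, condition=lambda a, o: True):
    """Return the pairs of the Cartesian product that satisfy `condition`.

    The default condition is a fixed True, so by default every pair passes.
    """
    return [(a, o) for a, o in product(left, right) if condition(a, o)]


# No condition given: the relation is the whole Cartesian product (2 * 3 pairs).
everything = relation(agents, orders)

# A fixed False condition: the relation is empty.
nothing = relation(agents, orders, condition=lambda a, o: False)

# A real condition: the relation is a proper subset of the product.
alice_only = relation(agents, orders, condition=lambda a, o: a == "Alice")

print(len(everything), len(nothing), len(alice_only))  # 6 0 3
```

Seen this way, a `WHERE` or `ON` clause only ever removes pairs from the product, so leaving it out naturally leaves the product untouched.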

### Method 2

I’m not familiar with how concatenate works in Pandas, but I think `CROSS JOIN` is the only clause that makes sense with nothing else specified in this case.

You certainly couldn’t vertically concatenate the tables (`UNION` in SQL), because they could differ in the number of columns and in their data types. And how would one horizontally concatenate them? On what condition could the rows from each table be aligned into a single row when nothing is specified? I think the simplest answer in the context of relational logic is `CROSS JOIN`.
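A small experiment makes the point concrete. The sketch below uses Python's built-in `sqlite3` module with made-up `agents` and `orders` tables (their columns and rows are assumptions for illustration). SQLite is loosely typed, so only the column-count mismatch is demonstrated here, not the data-type mismatch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables with different numbers of columns.
    CREATE TABLE agents (agent_id INTEGER, name TEXT);
    CREATE TABLE orders (order_id INTEGER, agent_id INTEGER, amount REAL);
    INSERT INTO agents VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 9.5), (11, 1, 20.0), (12, 2, 7.25);
""")

# Vertical concatenation is rejected: 2 columns cannot be stacked on 3.
try:
    conn.execute("SELECT * FROM agents UNION ALL SELECT * FROM orders").fetchall()
    union_error = None
except sqlite3.Error as exc:
    union_error = str(exc)

# The comma form needs no alignment rule: every agent row meets every order row.
cross_rows = conn.execute("SELECT * FROM agents, orders").fetchall()

print("UNION ALL failed:", union_error is not None)
print("cross join rows:", len(cross_rows))  # 2 agents * 3 orders
```

The cross join is the one combination that is always well defined, whatever the shapes of the two tables.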

Furthermore, concatenation in Pandas seems to be meant to operate on a different type of object than a relational database table. While those objects can technically be considered a set of values, the criteria that describe them and the constraints those values live by are not the same as those of a relational table, whose columns have declared data types and may be bound by database constraints.

To achieve an outcome in SQL that looks similar to what the single concatenate operation in Pandas does, you would need to apply a series of operations, potentially including `PIVOT`, `UNION`, and `CAST`.
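As a rough sketch of what that extra work can look like, the example below approximates a vertical, pandas-style concatenation (where columns missing from one input are filled with missing values) by padding each `SELECT` with `NULL` columns before a `UNION ALL`. The tables, columns and rows are assumptions made up for illustration, and this is one possible approach rather than an exact equivalent of the Pandas behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables that share only the agent_id column.
    CREATE TABLE agents (agent_id INTEGER, name TEXT);
    CREATE TABLE orders (order_id INTEGER, agent_id INTEGER, amount REAL);
    INSERT INTO agents VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 9.5), (11, 1, 20.0), (12, 2, 7.25);
""")

# Each SELECT is padded with NULLs so both sides expose the same four columns.
stacked = conn.execute("""
    SELECT agent_id, name, NULL AS order_id, NULL AS amount FROM agents
    UNION ALL
    SELECT agent_id, NULL, order_id, amount FROM orders
""").fetchall()

for row in stacked:
    print(row)
```

The column list has to be spelled out by hand, which is exactly the information the bare `FROM agents, orders` does not give the database.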

### Method 3

Answer left as a comment by Lennart:

SQL was an attempt to implement the relational model defined in E. F. Codd’s paper "A Relational Model of Data for Large Shared Data Banks".

I don’t think there is a good answer to why the symbol "," was chosen to denote one of the operations and cartesian product in particular.

### Method 4

You should NOT use the `,` (comma) as a proxy for a `CROSS JOIN` clause to join your tables. It provides `CROSS JOIN` functionality, but at a cost: that of readability, clarity and explicitness (the quality of being clear and exact; I wasn’t sure the word existed)!

As for your question about why the comma was chosen (it has since fallen out of favour), see the discussion below!

Software spends way more time in maintenance mode than in development mode, so it’s important that your software be easily read and maintained!

A `CROSS JOIN` can be logically thought of as pairing every row of the first table with every row of the second.

[Images from the linked source: two diagrams illustrating a `CROSS JOIN`]

So, obviously `CROSS JOIN`s have useful, every-day, practical applications!

Consider the following three queries (see the fiddle here):

#### Query 1:

```sql
SELECT * FROM meal, drink;
```

Result (same for all queries):

```
mname      dname
Omlette    Coffee
Omlette    Tea
Omlette    Orange Juice
Fried Egg  Coffee
Fried Egg  Tea
Fried Egg  Orange Juice
Sausage    Coffee
Sausage    Tea
Sausage    Orange Juice
9 rows
```

So, for a dev looking at that query, they might say: "How many fields are there?", "What do these fields do?" or "What is the performance impact of this query?". We are precisely 0% of the way to being explicit!

Now, this:

#### Query 2:

```sql
SELECT *
FROM meal
CROSS JOIN drink;
```

Slightly clearer – at least we can see from 10,000m that this is a `CROSS JOIN` because the term is there in black and white! So, we’re ~ 33% of the way to being explicit!

Finally, consider this one (best):

#### Query 3:

```sql
SELECT m.mname, d.dname
FROM meal m
CROSS JOIN drink d;
```

So now we’re as explicit as we can be. There’s an alternative to this one, which slightly reduces clarity but is still clearer than the comma:

#### Query 4 (also ran…):

```sql
SELECT m.mname, d.dname
FROM meal m
JOIN drink d ON TRUE;  -- or: ON 1 = 1
```
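To check that the four forms really are interchangeable, here is a minimal sketch using Python's built-in `sqlite3` module. The table definitions are assumptions reconstructed from the result listing (one text column each), since the fiddle's schema is not shown, and `ON 1 = 1` is used instead of `ON TRUE` so that it also runs on older SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed schema: one text column per table, values taken from the results.
    CREATE TABLE meal  (mname TEXT);
    CREATE TABLE drink (dname TEXT);
    INSERT INTO meal  VALUES ('Omlette'), ('Fried Egg'), ('Sausage');
    INSERT INTO drink VALUES ('Coffee'), ('Tea'), ('Orange Juice');
""")

queries = {
    "comma":      "SELECT * FROM meal, drink",
    "cross join": "SELECT * FROM meal CROSS JOIN drink",
    "explicit":   "SELECT m.mname, d.dname FROM meal m CROSS JOIN drink d",
    "join on":    "SELECT m.mname, d.dname FROM meal m JOIN drink d ON 1 = 1",
}

# Sort each result so the comparison does not depend on row order.
results = {name: sorted(conn.execute(sql).fetchall()) for name, sql in queries.items()}

all_equal = all(rows == results["comma"] for rows in results.values())
print(all_equal, len(results["comma"]))  # True 9
```

The queries differ only in how clearly they state their intent, not in what they return.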

I would urge you to take a look at this excellent article (Explicit Coding Discipline – the BIG ONE is no. 3):

1. Why explicit matters
1. Explicit naming
1. Avoid tricks in favor of explicit code
1. Conclusion (quote below):

The author wraps it up with:

The explicit coding discipline favors clear and explicit expression of
intent in the code.

It suggests writing meaningful names for variables, functions, classes
and other constructions. It suggests avoiding tricky solutions in
favor of readings and clear intent.

#### Why the comma?

As for the comma – it’s just a syntax, notational thing – probably, if one were designing SQL now, it wouldn’t be the first choice (or any choice) as a way to `CROSS JOIN` tables. The term `CONCAT JOIN` (à la Python syntax…) might be a potential candidate (for an ab initio design) – but it just sounds weird now that we (SQL devs/DBAs) are used to `CROSS JOIN`.

Also, the separator for various fields in a query is a comma:

```sql
SELECT t1.f1, t2.f2
FROM t1
JOIN t2
ON t1.k = t2.fk
```

So, it possibly (before my time…) appeared logical to use it as a table separator also.

The most important thing is clarity of expression. Don’t forget that SQL is considerably (~ 20 years) older than Python, and that many languages carry legacies from their past that would be better expunged. But consider the upheaval of going from Python 2 to Python 3: it has caused (and continues to cause) major difficulties. For the record, I admire Guido van Rossum for his courage in making that change despite the broken code.

The problem for SQL is that it is designed by committee and, at least in my experience, "courage" is not a trait I would associate with committees! 🙂 The comma has fallen out of favour and is discouraged by many serious practitioners, but unfortunately it was not eliminated with the arrival of the ANSI `JOIN`s, which are clearer.

p.s. welcome to the forum!


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.