This question is very close to what I wanted to ask, but both the question and its answers focus more on selecting random rows.
Since there is a general rule that "SELECT COLUMN_LIST" is always recommended over "SELECT *", I wanted to know whether the recommendation changes in the scenario below.
From a table with about 5 columns, I need the information in all 5 columns, but at different steps.
For example, in a Java function, Step 1 uses the first 2 columns, Step 4 uses column 3 and column 4, and Step 10 uses the 5th column.
There are 2 ways to get this information:
- Make 1 DB call, with "SELECT *" and extract required information at respective steps.
- Make multiple DB calls, with "SELECT COLUMN_LIST" at Step 1, Step 4 and Step 10, fetching only the required columns’ data in each call.
Which of the 2 ways is recommended for the above scenario?
How to solve :
There’s a third option you forgot to mention: make one DB call with SELECT COLUMN_LIST. The reason I mention this is that the choice between SELECT * and SELECT COLUMN_LIST isn’t really about how many database calls one makes; rather, it’s about schema consistency.
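As a rough sketch of that third option, the snippet below makes a single call with an explicit column list and then hands the values to the later steps. It uses Python's built-in sqlite3 module purely so that it runs on its own; the `orders` table, its columns, and the `load_order` helper are hypothetical names, not anything from the question.

```python
import sqlite3

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL, status TEXT, note TEXT)"
)
conn.execute("INSERT INTO orders VALUES (1, 'alice', 9.5, 'open', 'gift')")


def load_order(conn, order_id):
    """Fetch every column the function needs in one round trip, by name."""
    cur = conn.execute(
        "SELECT id, customer, amount, status, note FROM orders WHERE id = ?",
        (order_id,),
    )
    return cur.fetchone()


# One database call up front.
order_id, customer, amount, status, note = load_order(conn, 1)

# Step 1 uses the first two columns.
step1 = (order_id, customer)
# Step 4 uses columns 3 and 4.
step4 = (amount, status)
# Step 10 uses the fifth column.
step10 = note
```

The query text itself documents which columns the later steps depend on, and only one round trip is made.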
SELECT * is recommended against because the schema of the dataset you’re selecting from is liable to change over time, which could result in unexpected outcomes and errors, especially in your application as the consumer, which always expects Column1 to be the first column, Column3 to be the third column, etc.
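To make the schema-drift risk concrete, here is a small sketch, assuming a hypothetical `person` table whose schema later gains a leading `id` column. It compares what a consumer reading by position sees with SELECT * against an explicit column list. SQLite is used only because it ships with Python.

```python
import sqlite3


def first_row(columns_ddl, query):
    """Create a throwaway 'person' table with the given columns and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE person ({columns_ddl})")
    conn.execute("INSERT INTO person (name, email) VALUES ('alice', 'a@example.com')")
    row = conn.execute(query).fetchone()
    conn.close()
    return row


# Hypothetical schemas: the newer one has a column added in front.
old_schema = "name TEXT, email TEXT"
new_schema = "id INTEGER PRIMARY KEY, name TEXT, email TEXT"

# With SELECT *, position 0 means something different after the change.
star_old = first_row(old_schema, "SELECT * FROM person")
star_new = first_row(new_schema, "SELECT * FROM person")

# With an explicit list, the result shape stays the same.
explicit_old = first_row(old_schema, "SELECT name, email FROM person")
explicit_new = first_row(new_schema, "SELECT name, email FROM person")
```

Code that reads `row[0]` expecting the name keeps working with the explicit list, but silently receives the new `id` value with SELECT *.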
And even if no future schema change ever breaks the application, you can still run into performance issues with SELECT *, for two reasons. One is that additional columns may have been added to the end of the dataset, so you’re now needlessly bringing back extra data. The second is that you’ll potentially cause a less than optimal query plan to be generated (e.g. an index that could normally be used for a seek may no longer be applicable).
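One way to see the query-plan effect is to compare plans for the two query shapes. The sketch below does this in SQLite with a hypothetical `person` table and a hypothetical index `idx_person_name_email`. It only illustrates SQLite's planner; other engines report plans differently and may choose differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and index names, for illustration only.
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, email TEXT, bio TEXT)"
)
conn.execute("CREATE INDEX idx_person_name_email ON person (name, email)")


def plan(query):
    """Return SQLite's query plan description for a one-parameter query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + query, ("alice",)).fetchall()
    # The fourth column of each plan row holds the human-readable detail.
    return " | ".join(row[3] for row in rows)


# Only indexed columns are requested, so the index alone can answer the query.
explicit_plan = plan("SELECT name, email FROM person WHERE name = ?")

# SELECT * also needs 'bio', so the table itself has to be read as well.
star_plan = plan("SELECT * FROM person WHERE name = ?")

print(explicit_plan)
print(star_plan)
```

In this setup the explicit query can be served from a covering index, while the SELECT * version cannot, because it asks for a column the index does not contain.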
On top of all of that, as mentioned earlier in the comments, it’s also cleaner code to specify the column list as opposed to using SELECT *. This is because it explicitly communicates which fields from the database are being consumed and helps establish the intent of the code, especially for developers who may not have access to the database itself.
There are a multitude of other reasons as well, but those are the few important ones that come to mind. Conversely, there are a few edge cases where it is OK to use SELECT * as opposed to an explicit column list, and you can find some of those examples in a similar question I asked here.
As you need all columns, write SELECT Col1, col3, … so that it is clear that you need all of them.
In the future, when you review your queries and need to figure out what you did 5 years ago, it helps to know which columns you selected. Structures and requirements change, and as new columns are added to the table, SELECT * would use more and more unnecessary resources.
As for fetching them bit by bit or all at once, that depends on the hardware you are using. Even a Raspberry Pi has enough resources to gather and hold all the data, but a device with only a small amount of resources may not be able to store all the information, so a step-by-step approach becomes necessary.
The same goes for the database: when it is under stress and can’t handle all the data at once, you must also choose the step-by-step approach.
In short, there is no best way, and the circumstances dictate the approach.
If your interface allows you to get the results in a hash (associative array), then SELECT * should be just fine when you need all the columns.
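As a minimal sketch of the hash-style access mentioned here, Python's sqlite3 module can return rows that are addressable by column name via `sqlite3.Row`. The `person` table is a hypothetical example, and other client libraries expose the same idea under different names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Ask the driver for rows that can be indexed by column name.
conn.row_factory = sqlite3.Row

# Hypothetical table, for illustration only.
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("INSERT INTO person (name, email) VALUES ('alice', 'a@example.com')")

row = conn.execute("SELECT * FROM person").fetchone()

# Access by name does not depend on column order in the table.
name = row["name"]
email = row["email"]

# The row can also be converted to a plain dictionary.
record = dict(row)
```

Because values are looked up by name, a reordered or extended table does not shift which value the code reads, although a renamed or dropped column would still fail.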
The other answers point out why SELECT * is likely to bite you in the future. I agree with them.
The Optimizer turns SELECT * into SELECT col1, ... before performing the query, so there is essentially no performance difference if you need all the columns.
If, by specifying the columns explicitly, you can avoid fetching some big BLOB columns, there could be a noticeable performance benefit. This difference comes from extra disk hits, slowdown in temp tables, and/or network bandwidth.
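The sketch below gives a rough feel for the data-volume side of this, assuming a hypothetical `document` table with a 1 MB BLOB column. It only counts the bytes and characters handed back to the client in SQLite; it does not measure disk hits, temp tables, or network time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with one large BLOB column, for illustration only.
conn.execute("CREATE TABLE document (id INTEGER PRIMARY KEY, title TEXT, body BLOB)")
conn.execute(
    "INSERT INTO document (title, body) VALUES (?, ?)",
    ("report", b"x" * 1_000_000),
)


def payload_size(query):
    """Sum the lengths of all text and binary values returned by the query."""
    total = 0
    for row in conn.execute(query):
        for value in row:
            if isinstance(value, (bytes, str)):
                total += len(value)
    return total


# SELECT * drags the BLOB along even if the caller never looks at it.
star_bytes = payload_size("SELECT * FROM document")

# Naming only the needed columns leaves the BLOB in the database.
explicit_bytes = payload_size("SELECT id, title FROM document")
```

Here the explicit query returns only the short title, while SELECT * also transfers the full megabyte of `body` for every row.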
Two SELECTs will be about twice as slow as having one. But the extra query might take only a millisecond (in simple cases).