I have read many articles on the internet where people suggest that using SELECT * in SQL query is a bad practice and you should always avoid it, but they never care to explain why.
Some of them will say you should always use an explicit list of columns in your SQL query, which is a good suggestion and one of the SQL best practices I teach to junior programmers. However, many of them don't explain the reason behind it.
Unless you explain why one should not use SELECT * in queries, it's difficult to convince SQL developers — many of whom have started learning SQL by using SELECT * from EMP in the Oracle database.
In this article, I will try to bridge that gap by giving some practical reasons why using SELECT * in an SQL query is not a good idea.
1. Unnecessary IO
By using SELECT *, you can return unnecessary data that will just be ignored. But fetching that data is not free of cost. This results in some wasteful IO cycles on the DB end since you will be reading all of that data off the pages. Perhaps you could have read the data from index pages. This can make your query a little bit slow as well.
2. Increased Network Traffic
SELECT * returns more data than required to the client, which in turn will use more network bandwidth. This increase in network bandwidth also means that data will take a longer time to reach the client application, which could be SSMS or your Java application server.
3. More Application Memory
Due to this increase in data, your application may require more memory just to hold unnecessary data that will not be used but is coming from a Microsoft SQL server or any other database you are connecting to.
4. Dependency on Order of Columns on ResultSet
When you use the SELECT * query in your application and have any dependency on the order of the columns — which you should not — the ordering of the result set will change if you add a new column or change the order of the columns.
5. Breaking Views While Adding New Columns to a Table
When you use SELECT * in views, you create subtle bugs if a new column has been added and the old one is removed from the table. Why? Because your view will not break but start returning an incorrect result.
To avoid that, you should always use WITHSCHEMABINDING it with views. This will also prevent you from using SELECT * in views.
6. Conflict in JOIN Query
When you use SELECT * in the JOIN query, you can introduce complications when multiple tables have columns with the same name (e.g. status, active, name, etc.).
For a simple query, this might be fine. But when you try to order by one of these columns or use the query in a CTE or derived table, you will need to make some adjustments.
7. Copying Data From One Table to Another
When you use SELECT * into an INSERT .. SELECT statement, which is a common way to copy data from one table to another, you could potentially copy incorrect data into the incorrect column if the order of the columns is not the same between two tables.
Some programmers think that using SELECT * vs. SELECT 1 in your EXISTS code is faster because the query parser has to do extra work to validate the static value.
That might have been true long ago, but nowadays the parser has become smart enough to know that the SELECT list is completely irrelevant within an EXISTS clause.
Conclusion
That is why you should not use SELECT * in an SQL query anymore. It's always better to use the explicit column list in a SELECT query than a * wild card. It not only improves performance but also makes your code more explicit.
It also helps you to create maintainable code that will not break when you add new columns to your table — especially if you have views that refer to the original table.
Some of them will say you should always use an explicit list of columns in your SQL query, which is a good suggestion and one of the SQL best practices I teach to junior programmers. However, many of them don't explain the reason behind it.
Unless you explain why one should not use SELECT * in queries, it's difficult to convince SQL developers — many of whom have started learning SQL by using SELECT * from EMP in the Oracle database.
In this article, I will try to bridge that gap by giving some practical reasons why using SELECT * in an SQL query is not a good idea.
1. Unnecessary IO
By using SELECT *, you can return unnecessary data that will just be ignored. But fetching that data is not free of cost. This results in some wasteful IO cycles on the DB end since you will be reading all of that data off the pages. Perhaps you could have read the data from index pages. This can make your query a little bit slow as well.
2. Increased Network Traffic
SELECT * returns more data than required to the client, which in turn will use more network bandwidth. This increase in network bandwidth also means that data will take a longer time to reach the client application, which could be SSMS or your Java application server.
3. More Application Memory
Due to this increase in data, your application may require more memory just to hold unnecessary data that will not be used but is coming from a Microsoft SQL server or any other database you are connecting to.
4. Dependency on Order of Columns on ResultSet
When you use the SELECT * query in your application and have any dependency on the order of the columns — which you should not — the ordering of the result set will change if you add a new column or change the order of the columns.
5. Breaking Views While Adding New Columns to a Table
When you use SELECT * in views, you create subtle bugs if a new column has been added and the old one is removed from the table. Why? Because your view will not break but start returning an incorrect result.
To avoid that, you should always use WITHSCHEMABINDING it with views. This will also prevent you from using SELECT * in views.
6. Conflict in JOIN Query
When you use SELECT * in the JOIN query, you can introduce complications when multiple tables have columns with the same name (e.g. status, active, name, etc.).
For a simple query, this might be fine. But when you try to order by one of these columns or use the query in a CTE or derived table, you will need to make some adjustments.
7. Copying Data From One Table to Another
When you use SELECT * into an INSERT .. SELECT statement, which is a common way to copy data from one table to another, you could potentially copy incorrect data into the incorrect column if the order of the columns is not the same between two tables.
Some programmers think that using SELECT * vs. SELECT 1 in your EXISTS code is faster because the query parser has to do extra work to validate the static value.
That might have been true long ago, but nowadays the parser has become smart enough to know that the SELECT list is completely irrelevant within an EXISTS clause.
Conclusion
That is why you should not use SELECT * in an SQL query anymore. It's always better to use the explicit column list in a SELECT query than a * wild card. It not only improves performance but also makes your code more explicit.
It also helps you to create maintainable code that will not break when you add new columns to your table — especially if you have views that refer to the original table.
No comments:
Post a Comment