For a long time now, MySQL has contained a notorious nonstandard extension to GROUP BY
, which allows oddball behavior in the name of efficiency. This extension has allowed countless developers around the world to use GROUP BY
in production code without completely understanding what they were doing.
In particular, it's a bad idea to use SELECT *
in a GROUP BY
query, because a standard GROUP BY
clause requires enumerating the columns. Many developers have, unfortunately, done that.
Read this. https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html
The MySQL team has been trying to fix this misfeature without messing up production code. They added a sql_mode
flag in 5.7.5 named ONLY_FULL_GROUP_BY
to compel standard behavior. In a recent release, they turned on that flag by default. When you upgraded your local MySQL to 5.7.14, the flag got switched on and your production code, dependent on the old extension, stopped working.
If you've recently started getting 1055 errors, what are your choices?
sql_mode
to get rid of the newly set ONLY_FULL_GROUP_BY
mode.You can change the mode by doing a SET
command.
SET sql_mode = 'STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION'
should do the trick if you do it right after your application connects to MySQL.
Or, you can find the init file in your MySQL installation, locate the sql_mode=
line, and change it to omit ONLY_FULL_GROUP_BY
, and restart your server.
SELECT item.item_id, item.name, /* not SQL-92 */
COUNT(*) number_of_uses
FROM item
JOIN uses ON item.item_id, uses.item_id
GROUP BY item.item_id
will show the rows in a table called item
, and show the count of related rows in a table called uses
. This works well, but unfortunately it's not standard SQL-92.
Why not? because the SELECT
clause (and the ORDER BY
clause) in GROUP BY
queries must contain columns that are
GROUP BY
clause, orCOUNT()
, MIN()
, and the like.This example's SELECT
clause mentions item.name
, a column that does not meet either of those criteria. MySQL 5.6 and earlier will reject this query if the SQL mode contains ONLY_FULL_GROUP_BY
.
This example query can be made to comply with the SQL-92 standard by changing the GROUP BY
clause, like this.
SELECT item.item_id, item.name,
COUNT(*) number_of_uses
FROM item
JOIN uses ON item.item_id, uses.item_id
GROUP BY item.item_id, item.name
The later SQL-99 standard allows a SELECT
statement to omit unaggregated columns from the group key if the DBMS can prove a functional dependence between them and the group key columns. Because item.name
is functionally dependent on item.item_id
, the initial example is valid SQL-99. MySQL gained a functional dependence prover in version 5.7. The original example works under ONLY_FULL_GROUP_BY
.
SELECT item.item_id, uses.category, /* nonstandard */
COUNT(*) number_of_uses
FROM item
JOIN uses ON item.item_id, uses.item_id
GROUP BY item.item_id
will show the rows in a table called item, and show the count of related rows in a table called uses. It will also show the value of a column called uses.category
.
This query works in MySQL (before the ONLY_FULL_GROUP_BY
flag appeared). It uses MySQL's nonstandard extension to GROUP BY
.
But the query has a problem: if several rows in the uses
table match the ON
condition in the JOIN
clause, MySQL returns the category
column from just one of those rows. Which row? The writer of the query, and the user of the application, doesn't get to know that in advance. Formally speaking, it's unpredictable: MySQL can return any value it wants.
Unpredictable is like random, with one significant difference. One might expect a random choice to change from time to time. Therefore, if a choice were random, you might detect it during debugging or testing. The unpredictable result is worse: MySQL returns the same result each time you use the query, until it doesn't. Sometimes it's a new version of the MySQL server that causes a different result. Sometimes it's a growing table causing the problem. What can go wrong, will go wrong, and when you don't expect it. That's called Murphy's Law.
The MySQL team has been working to make it harder for developers to make this mistake. Newer versions of MySQL in the 5.7 sequence have a sql_mode
flag called ONLY_FULL_GROUP_BY
. When that flag is set, the MySQL server returns the 1055 error and refuses to run this kind of query.
Sometimes a query looks like this, with a *
in the SELECT
clause.
SELECT item.*, /* nonstandard */
COUNT(*) number_of_uses
FROM item
JOIN uses ON item.item_id, uses.item_id
GROUP BY item.item_id
Such a query needs to be refactored to comply with the ONLY_FULL_GROUP_BY
standard.
To do this, we need a subquery that uses GROUP BY
correctly to return the number_of_uses
value for each item_id
. This subquery is short and sweet, because it only needs to look at the uses
table.
SELECT item_id, COUNT(*) number_of_uses
FROM uses
GROUP BY item_id
Then, we can join that subquery with the item
table.
SELECT item.*, usecount.number_of_uses
FROM item
JOIN (
SELECT item_id, COUNT(*) number_of_uses
FROM uses
GROUP BY item_id
) usecount ON item.item_id = usecount.item_id
This allows the GROUP BY
clause to be simple and correct, and also allows us to use the *
specifier.
Note: nevertheless, wise developers avoid using the *
specifier in any case. It's usually better to list the columns you want in a query.
SELECT item.item_id, ANY_VALUE(uses.tag) tag,
COUNT(*) number_of_uses
FROM item
JOIN uses ON item.item_id, uses.item_id
GROUP BY item.item_id
shows the rows in a table called item
, the count of related rows, and one of the values in the related table called uses
.
You can think of this ANY_VALUE()
function as a strange a kind of aggregate function. Instead of returning a count, sum, or maximum, it instructs the MySQL server to choose, arbitrarily, one value from the group in question. It's a way of working around Error 1055.
Be careful when using ANY_VALUE()
in queries in production applications.
It really should be called SURPRISE_ME()
. It returns the value of some row in the GROUP BY group. Which row it returns is indeterminate. That means it's entirely up to the MySQL server. Formally, it returns an unpredictable value.
The server doesn't choose a random value, it's worse than that. It returns the same value every time you run the query, until it doesn't. It can change, or not, when a table grows or shrinks, or when the server has more or less RAM, or when the server version changes, or when Mars is in retrograde (whatever that means), or for no reason at all.
You have been warned.