This topic was inspired by our last community meetup, in which we talked about some formulas in Google Sheets and some people want to know more about QUERY, which was branded “The most powerful function in Google Sheets” by famous GSheets teacher Ben Collins.
So, what is QUERY, and why is it so powerful. Here’s a writeup on it, which hopefully would help many of you here in this community. Trust me, it would change so much how many of you “clean” your data in the future from raw data, in a very easily achievable way.
1/Introduction
QUERY was introduced alongside the Google Visualization API Query Language, and it was a lifesaver for many people, but more so for those who are familiar with SQL (Structured Query Language).
If you are not familiar with SQL, no problem, you can grab most of the basics thing about its structure in this W3School link, which includes many examples for you to work with.
So, what does the general syntax look like?
QUERY(data, query, [headers])
- Data: The range of data that you want to perform the query function on.
- Query: The query to perform.
- Headers (Optional): The number of header rows at the top of the data. You can just choose the appropriate data range to not have headers, or include them in and provide the right number here. Official Google documentation says if this value “is omitted or set to -1 , the value is guessed based on the content of data”. Specifically they want to mention the cases where we have aggregation functions like SUM, AVG, …
2/Select all
So what does it do exactly? In short, it helps you combine many functions into one, and I will provide some specific cases for this. Starting from the easiest one - Select all.
This is the easiest of them all, in case you want to move the data from a sheet to another one in the same file (you can also use FILTER, ARRAYFORMULA or a simple array setup, whatever makes you feel comfortable for this case).
=QUERY(Sheet1!A1:A,"SELECT *",1)
The formula above select all values from Sheet1’s column A and put them in wherever you have this formula. Remember to put double quotation marks before and after your query sentence, or else it won’t work. I also told the Sheets that there’s a header row at the start by putting the argument 1 at the end.
3/Select specific columns
Same as the above, but you can choose specific columns, and in the order you want it to be, with just a very short query. How cool is that?
=QUERY(Sheet1!A1:D,"SELECT D, A, B, C",1)
The D, A, B, C part represents the name of the column you have in the data range you specified. You can also have the same outcome for SELECT Col4, Col1, Col2, Col3. The 4, 1, 2, 3 is assigned based on the data range. For example if you have a W:Z range it will be Col1 - W, Col2 - X, Col3 - Y, Col4 - Z.
4/Select with specific conditions
Now we come to the fun part, the conditions, which is offered by a WHERE clause.
Normally, before you use QUERY, this can be done with FILTER, INDEX - MATCH combination, and many things more. But QUERY make it much shorter.
An example is shown here where I want to show all people who have a salary more than 300.
Or going to a further level, I want to show Facebook employees with salary more than 300.
To make it funnier, let’s say I’m bored and want to find all information about people whose names start with D. It can be achieved with the LIKE clause, and a wildcard which I’m sure is familiar to those who know the REGEX function family.
Here the ‘D%’ stands for ‘names start with D’ (% is the wildcard)
Or this example where someone has “an” in their name. Notice how it didn’t cover Andrew, as the query is case-sensitive.
How do we overcome that? Further magic is applied, with the LOWER function.
5/Aggregation functions and labeling column names
This applies to Min, Max, Sum, Average,…
Assuming I want to know the max, min and average of salary column in the same dataset, here’s the formula I used to achieve it.
What ever comes after the LABEL function will be used to name your new columns. If we don’t have that, it will be “guessed” by the system and looks like the image below, but I guess you’d rather have beautified column names based on your own choice.
That brings us to the end of part 1. In the part 2 I will cover the GROUP BY, ORDER, LIMIT and some other useful cases of QUERY for you.
Thank you for reading until this point, and stay safe!