The QUERY function: using SQL in Google Sheets

My blog is now on Medium!

If you're familiar with SQL, one of the most powerful functions that you can use in Google Sheets is QUERY. This function runs a Google Visualization API Query Language, which is a very simple SQL language, but enough for the data manipulation typically done in spreadsheets.

And I say simple because it admits many typical SQL clauses (select, where, group by, etc.) but only a few aggregation and scalar functions. You can find all the supported functions in the official documentation.

In summary, you can finally forget all the annoying functions of spreadsheets and just use SQL!

The syntax of the QUERY function is:  =QUERY(data, query, [headers]) 
where data is the range of cells to perform the query on, query is the query to perform and headers is an optional parameter to specify the number of header rows at the top of data.

In this post, I want to show you two applications of this function: summarizing data (see example 1) and manipulating data from several sheets (see example 2).

Example 1

Suppose you have the following table in Sheet1:


var1 var2 var3
a1 b1 1
a1 b1 2
a1 b1 3
a1 b1 4

And that you want to obtain the sum of var3 grouping by var1 and var2, but only for those records whose value of var3 is less than 4. So, in this case:


var1 var2 sum var3
a1 b1 6 (=1+2+3)

Using the QUERY function, the formula you need is as simple as:

=QUERY(Sheet1!A1:C5, "select A, B, sum(C) where C<4 group by A, B")

Note that column identifiers are the column letters.


Example 2

Now, suppose that you have three tables in Sheet1, Sheet2 and Sheet3, respectively:


var1 var2 var3
a1 b1 1
a1 b1 2
a1 b1 3
a1 b1 4


var1 var2 var3
a2 b2 5
a2 b2 6
a2 b2 7
a2 b2 8


var1 var2 var3
a3 b3 9
a3 b3 10
a3 b3 11
a3 b3 12

With the QUERY function, you could combine the three tables by rows and manipulate the resulting table simultaneously. For example, the formula:

=QUERY({Sheet1!A1:C5;Sheet2!A2:C5;Sheet3!A2:C5},"select Col1, Col2, sum(Col3) group by Col1, Col2")

Would sum the values of var3 grouping by the other two columns, resulting in:


var1 var2 var3
a1 b1 10
a2 b2 26
a3 b3 42

Note that, in this case, column identifiers are Col1, Col2...

Comments

Popular posts from this blog

Understanding how LIME explains predictions

Interpretable Machine Learning: An Overview

Understanding how IME (Shapley Values) explains predictions