Google BigQuery Optimization Strategies

I am querying data from Google Analytics Premium using Google BigQuery. At the moment, I have one single query which I use to calculate some metrics (like total visits or conversion rate). This query contains several nested JOIN clauses and nested SELECTs. While querying just one table I am getting the error:

Error: Resources exceeded during query execution.

Using GROUP EACH BY and JOIN EACH does not seem to solve this issue.

One solution to be adopted in the future involves extracting only the relevant data needed for this query and exporting it into a separate table (which will then be queried). This strategy works in principle, I have already a working prototype for it.

However, I would like to explore additional optimization strategies for this query that work on the original table.

In this presentation You might be paying too much for BigQuery some of them are suggested, namely:

  • Narrowing the scan (already doing it)
  • Using query cache (does not apply)

The book "Google BigQuery Analytics" mentions also adjusting query features, namely:

  • GROUP BY clauses generating large number of distinct groups (already did this)
  • Aggregation functions requiring memory proportional to the number of input values (probably does not apply)
  • Join operations generating a greater number of outputs than inputs (does not seem to apply)

Another alternative is just splitting this query into its composing sub-queries, but at this moment I cannot opt for this strategy.

What else can I do to optimize this query?

Category:google analytics Views:2 Time:2018-11-08

Related post

Copyright (C), All Rights Reserved.

processed in 0.107 (s). 11 q(s)