Migrating to M-Lab Fast Tables

In March 2016, M-Lab launched new M-Lab BigQuery per-project tables (“fast tables”). These tables offer faster performance and a simpler data schema.

This guide walks users through the process of converting their existing BigQuery SQL queries to take advantage of these new tables.

1. Change the FROM clause to refer to the per-project tables

Users refer to M-Lab’s existing tables with a clause similar to the following:

FROM
  plx.google:m_lab.2016_01.all

To use the per-project tables, replace the YYYY_MM date in the table name with the M-Lab project of interest. For a query over NDT data, this becomes:

FROM
  plx.google:m_lab.ndt.all

The full list of available per-project tables is:

  • plx.google:m_lab.ndt.all
  • plx.google:m_lab.npad.all
  • plx.google:m_lab.sidestream.all
  • plx.google:m_lab.paris_traceroute.all

2. Add a WHERE clause to restrict based on test time

If your queries were using the table names as a means of limiting a query to a particular time range, it will be necessary to add WHERE clauses to continue enforcing this time range.

For NDT, NPAD, and SideStream, add a query of the following form (where the numeric values are UNIX timestamps in seconds):

WHERE
  (web100_log_entry.log_time >= 1420070400)      -- 2015-01-01T00:00:00Z
   AND (web100_log_entry.log_time < 1427846400)) -- 2015-04-01T00:00:00Z

For Paris Traceroute, add a query of the following form (where the numeric values are UNIX timestamps in seconds):

WHERE
  (log_time >= 1420070400)                       -- 2015-01-01T00:00:00Z
   AND (log_time < 1427846400))                  -- 2015-04-01T00:00:00Z

Optional time formatting functions

BigQuery also provides date/time formatting functions to convert human-readable time values into UNIX timestamps.

For example, for NDT, NPAD, or SideStream queries:

WHERE
  ((web100_log_entry.log_time >=
     PARSE_UTC_USEC("2015-01-01 00:00:00") / POW(10, 6))
   AND (web100_log_entry.log_time <
          PARSE_UTC_USEC("2015-04-01 00:00:00") / POW(10, 6)))

And for Paris Traceroute queries:

WHERE
  ((log_time >= PARSE_UTC_USEC("2015-01-01 00:00:00") / POW(10, 6))
   AND (log_time < PARSE_UTC_USEC("2015-04-01 00:00:00") / POW(10, 6)))

3. (optional) Remove WHERE clauses for project

M-Lab’s existing tables combine data for several different M-Lab projects (NDT, NPAD, SideStream, and Paris Traceroute) into the same table. As such, queries for a particular project’s data required the query author to add a WHERE project=XX clause to restrict the query to a particular project. The new tables are grouped on a per-project basis, so project filters are not necessary.

Within each of the new tables, all rows will contain the same value for project and web100_log_entry.is_last_entry. Therefore these WHERE clauses can be removed.

FROM
  plx.google:m_lab.ndt.all
WHERE
  project = 0 -- << Extraneous, all rows in m_lab.ndt have project = 0. Clause
              --    should be deleted.

It is not strictly necessary to remove project clauses when using the new tables, but doing so will improve query performance.

4. (optional) Remove WHERE clauses for is_last_entry

M-Lab’s existing tables include every web100 snapshot collected during the run of each test. The new, per-project tables include only the test’s final web100 snapshot. As such, clauses to restrict the query to the final snapshot of the test are no longer necessary and can be removed.

FROM
  plx.google:m_lab.ndt.all
WHERE
  web100_log_entry.is_last_entry = TRUE -- << Extraneous, all rows in m_lab.ndt
                                        --    have is_last_entry = TRUE. Clause
                                        --    should be deleted.

It is not strictly necessary to remove is_last_entry clauses when using the new tables, but doing so will improve query performance.

Note: this does not apply to Paris Traceroute data, as Paris Traceroute is not web100-based, so Paris Traceroute queries never included this clause.

Complete Example

To tie it all together, we will look at a complete example where we convert an existing query to take advantage of the new, per-project tables.

The query below calculates the total number of NDT tests performed against M-Lab servers for each day in the last quarter of 2015.

Original query

SELECT
  STRFTIME_UTC_USEC(web100_log_entry.log_time * 1000000,
                    '%Y-%m-%d') AS day,
  COUNT(*) AS num_tests
FROM
  [plx.google:m_lab.2015_10.all],
  [plx.google:m_lab.2015_11.all],
  [plx.google:m_lab.2015_12.all]
WHERE
  project = 0
  AND web100_log_entry.is_last_entry = TRUE
  AND web100_log_entry.log_time IS NOT NULL
GROUP BY
  day
ORDER BY
  day ASC

Converted to use per-project tables

SELECT
  STRFTIME_UTC_USEC(web100_log_entry.log_time * 1000000, '%Y-%m-%d') AS day,
  COUNT(*) AS num_tests
FROM
  [plx.google:m_lab.ndt.all]
WHERE
  ((web100_log_entry.log_time >= 1443657600)     -- 2015-10-01T00:00:00Z
   AND (web100_log_entry.log_time < 1451606400)) -- 2016-01-01T00:00:00Z
GROUP BY
  day
ORDER BY
  day ASC

The converted query performs significantly faster than the original:

Query Type Observed Execution Time
Original query 129.1 seconds
Converted to use per-project tables 5.8 seconds

Questions / Feedback

If you have questions or feedback about using the new, per-project tables, please send an email to support@measurementlab.net.

Back to Top