BigQuery Schema

Background

  • Each M-Lab tool consists of a client and a server.
  • Whenever an M-Lab user starts a test, the client and server interact to measure different aspects of that user’s connection.
  • A single user request triggers one or more tests (e.g., client-to-server test, server-to-client test).
  • For each test, a server collects a log, and the test can be uniquely identified by its log filename.

BigQuery Tables

M-Lab publishes data for each M-Lab project in its own BigQuery table:

  • plx.google:m_lab.ndt.all
  • plx.google:m_lab.npad.all
  • plx.google:m_lab.sidestream.all
  • plx.google:m_lab.paris_traceroute.all

Legacy Tables

M-Lab also publishes tables in its legacy, deprecated per-month format:

plx.google:m_lab.YYYY_MM.all (deprecated)

For example, data for March 2016 is published at:

plx.google:m_lab.2016_03.all (deprecated)

M-Lab continues to support legacy tables but urges clients to move to the new per-project tables, which yield substantially improved performance. See the Legacy Migration Guide for details.

Schema Fields

All M-Lab data share the same data schema in BigQuery. The fields are described in the table below:

Field name Type Description
test_id string ID of the test. It represents the filename of the log that contains the data generated during the test (e.g. 20090819T02:01:04.507508000Z_189.6.232.77:3859.c2s_snaplog.gz).
log_time integer Timestamp of when test log was created (in seconds since Unix epoch).

For NDT and NPAD, this is derived from the “Date/Time” field in the .meta file (for NDT and NPAD, prefer the web100_log_entry.log_time field, as it is more reliable).

For SideStream and Paris Traceroute, this is the timestamp as represented in the test log file’s filename.
connection_spec.data_direction integer Direction of the data sent during the test:
CLIENT_TO_SERVER = 0
SERVER_TO_CLIENT = 1
connection_spec.server_ip string Server’s IP address. (This field is optional. It’s preferable to use web100_log_entry.connection_spec.local_ip.)
connection_spec.server_af integer Address family of the server’s IP address. (This field is optional. It’s preferable to use web100_log_entry.connection_spec.local_af.)
connection_spec.server_hostname string Server’s hostname. (This field is optional.)
connection_spec.server_kernel_version string Server’s kernel version. (This field is optional.)
connection_spec.client_ip string IP address of the user’s client. (This field is optional. It’s preferable to use web100_log_entry.connection_spec.remote_ip.)
connection_spec.client_af integer Address family of the client’s IP address. (This field is optional.)
connection_spec.client_hostname string Client’s hostname. (This field is optional.)
connection_spec.client_application string Client application that ran the test. (This field is optional.)
connection_spec.client_browser string Client’s browser. (This field is optional.)
connection_spec.client_os string Client’s operating system. (This field is optional.)
connection_spec.client_kernel_version string Client’s kernel version. (This field is optional.)
connection_spec.client_version string Client’s version. (This field is optional.)
connection_spec.client_geolocation.continent_code string Geolocation fields extracted from open dataset created by MaxMind and available at www.maxmind.com. (These fields are optional.)
connection_spec.client_geolocation.country_code string  
connection_spec.client_geolocation.country_code3 string  
connection_spec.client_geolocation.country_name string  
connection_spec.client_geolocation.region string  
connection_spec.client_geolocation.metro_code integer  
connection_spec.client_geolocation.city string  
connection_spec.client_geolocation.area_code integer  
connection_spec.client_geolocation.postal_code string  
connection_spec.client_geolocation.latitude float  
connection_spec.client_geolocation.longitude float  
connection_spec.server_geolocation.continent_code string  
connection_spec.server_geolocation.country_code string  
connection_spec.server_geolocation.country_code3 string  
connection_spec.server_geolocation.country_name string  
connection_spec.server_geolocation.region string  
connection_spec.server_geolocation.metro_code integer  
connection_spec.server_geolocation.city string  
connection_spec.server_geolocation.area_code integer  
connection_spec.server_geolocation.postal_code string  
connection_spec.server_geolocation.latitude float  
connection_spec.server_geolocation.longitude float  
web100_log_entry.version string Web100 kernel patch version running on the server (as defined in /proc/web100/header).
web100_log_entry.log_time integer Timestamp of when the Web100 log was created (in seconds since Unix epoch).

For NDT and NPAD, this is derived by calling the web100_get_log_time() function on the web100 log file.

For SideStream, this is the value of the PollTime field in web100 ASCII log.
web100_log_entry.connection_spec.local_ip string IP address of the M-Lab server, as logged in the Web100 log.
web100_log_entry.connection_spec.local_af integer Address family of the server’s IP address, as logged in the Web100 log.
web100_log_entry.connection_spec.local_port integer Port of the M-Lab server (in host-byte-order), as logged in the Web100 log.
web100_log_entry.connection_spec.remote_ip string IP address of the user’s client, as logged in the Web100 log.
web100_log_entry.connection_spec.remote_port integer Port of the user’s client (in host-byte-order), as logged in the Web100 log.
web100_log_entry.snap.[web100_var_name], where web100_var_name is the name of a Web100 variable, as defined in [tcp-kis.txt][1] (field VariableName). See Web100 types [tcp-kis.txt][1] defines 150 Web100 variables. For example, web100_log_entry.snap.MinRTT represents the minimum sampled Round Trip Time.
web100_log_entry.snap.StartTimeStamp integer Time at which the test’s TCP connection was established, in microseconds since UNIX epoch. This variable is a special case, as it contradicts [tcp-kis.txt][1]. tcp-kis.txt defines the field as a 32-bit integer, but remaps two distinct 32-bit integers into this single name, which is not possible for a 32-bit value. To work around this bug in tcp-kis and provide microsecond precision, this field is a 64-bit integer in the BigQuery dataset.
paris_traceroute_hop.protocol integer Protocol used to generate the paris-traceroute trace.
UDP = 0
TCP = 1
ICMP = 2
paris_traceroute_hop.src_ip string The IP address of the start of the hop.
paris_traceroute_hop.src_af integer The address family used to connect to src_ip.
AF_INET = 2
AF_INET6 = 10
paris_traceroute_hop.src_hostname string The hostname of the start of the hop. This may be the same as src_ip if the hostname could not be resolved.
paris_traceroute_hop.dest_ip string The IP address of the end of the hop.
paris_traceroute_hop.dest_af integer The address family used to connect to dest_ip.
AF_INET = 2
AF_INET6 = 10.
paris_traceroute_hop.dest_hostname string The hostname of the end of the hop. This may be the same as dest_ip if the hostname could not be resolved.
paris_traceroute_hop.rtt float The RTT measured from connection_spec.server_ip to paris_traceroute_hop.dest_ip.

Deprecated Fields

The following fields are deprecated and no longer have meaning in the data set.

  • type
  • project
  • web100_log_entry.is_last_entry
  • web100_log_entry.group_name

Equivalent BigQuery and Web100 Field Types

tcp-kis.txt defines each Web100 variable with a specific SNMP type. This table shows how to map each SNMP type to a BigQuery type.

BigQuery Type Corresponding SNMP Type
integer Integer32, Integer, INTEGER, Gauge32, ZeroBasedCounter32, Unsigned32, Unsigned16, Counter32, ZeroBasedCounter64
string Ip_Address
bool TruthValue

Query Examples

See BigQuery Examples for examples of queries against this schema.

Back to Top