Google Cloud Storage

M-Lab publishes all data it collects in raw form as archives on Google Cloud Storage (GCS) at the following location:

https://console.developers.google.com/storage/browser/m-lab/

File Layout

All M-Lab files are packaged and compressed in .tar format. They are placed in folders and named according to the following schema:

[tool]/[YYYY]/[MM]/[DD]/[YYYYMMDD]T[HHMMSS]-[server]-[tool]-[file index].tgz

  • tool: The measurement tool that generated the data
  • YYYYMMDDTHHMMSS: Start of the time window in which the data were collected
  • server: M-Lab server that collected the data
  • file index: Index of the file

This means that each compressed .tgz file contains all the data collected during a single day, by a single tool running on a single M-Lab server.

If the data collected during one day by one tool on one server are more than 1 GB (uncompressed), the files are split into multiple compressed .tgz files of up to 1 GB in size.

For example, the compressed .tgz file 20090218T000000Z-mlab1-lga01-ndt-0000.tgz contains the first 1 GB of data collected by all the NDT tests that were served by the M-Lab server mlab1-lga01 on Feb 18, 2009.

Project Data

Direct links to each M-Lab project’s raw data are available below:

  • Glasnost
    • Glasnost detects prioritization or censorship of network traffic.
    • More information is available at MPI SWS and Github.
  • NDT
    • Network Diagnostic Tool (NDT) measures characteristics of a TCP connection under heavy load.
    • More information is available at Internet2 and Github.
  • Neubot
    • Neubot measures the Internet in order to gather data useful to study broadband performance, network neutrality, and Internet censorship.
    • More information is available at Nexa Center and Github.
  • NPAD
    • Network Path and Application Diagnosis (NPAD) diagnoses issues in a network path that can degrade network performance.
    • More information is available at UCAR and Github.
  • OONI
    • OONI measures censorship, surveillance, and traffic manipulation on the Internet.
    • More information is available at OONI
  • Paris Traceroute
    • Paris Traceroute maps network topology between two points on the Internet.
    • More information is available at Paris Traceroute
  • pathload2 (deprecated)
    • M-Lab no longer supports this tool, but its archived data are available on GCS. For similar measurements with a current and supported tool, see NDT.
    • Pathload2 measures the available bandwidth of an Internet connection.
    • More information is available at https://code.google.com/p/pathload2-gatech/.
  • ShaperProbe (deprecated)
    • M-Lab no longer supports this tool, but its archived data are available on GCS.
    • ShaperProbe detects prioritization of network traffic.
    • More information is available at ShaperProbe.
  • SideStream
    • SideStream collects TCP state information about completed TCP connections on a system.
    • More information is available on Github.
  • mlab-collectd
    • mlab-collectd is a monitoring tool for M-Lab slices, which collects resource utilization information about all M-Lab servers.
    • More information is available on Github.

Accessing Data Programmatically

Accessing Data with gsutil

The easiest way to access M-Lab data on GCS programmatically is by using the gsutil command-line utility.

# List the contents of the M-Lab NDT data in GCS.
$ gsutil ls -l gsutil ls -l gs://m-lab/

# Copy a file from GCS locally.
$ gsutil cp gs://m-lab/ndt/2009/02/18/20090218T000000Z-mlab1-lga01-ndt-0000.tgz .

Accessing Data With Common HTTP Tools

The URLs shown in M-Lab’s GCS web interface require the user to be logged in, which can present challenges when attempting to access the data with common HTTP utilities like curl or wget.

You can access M-Lab files programmatically by replacing:

storage.cloud.google.com/m/cloudstorage/b

with

storage.googleapis.com

in any GCS URL.

For example, if the URL of a raw NDT archive on the GCS web application is:

https://storage.cloud.google.com/m/cloudstorage/b/m-lab/o/ndt/2015/12/28/20151228T000000Z-mlab1-lga04-ndt-0001.tgz

You can access it without authentication via this URL:

https://storage.googleapis.com/m-lab/ndt/2015/12/28/20151228T000000Z-mlab1-lga04-ndt-0001.tgz

GCS File Index

A list of all M-Lab files in GCS is available at:

https://storage.googleapis.com/m-lab/list/all_mlab_tarfiles.txt.gz

This file provides gs:// URLs to M-Lab data.

To change these URLs to https:// URLs (compatible with common HTTP tools), you can convert the file using the following bash script:

$ curl https://storage.googleapis.com/m-lab/list/all_mlab_tarfiles.txt.gz | gunzip | \
while read; do echo ${REPLY/gs:\/\//https://storage.googleapis.com/}; done
Back to Top