Google Cloud Storage
M-Lab publishes all data it collects in raw form as archives on Google Cloud Storage (GCS) at the following location:
https://console.developers.google.com/storage/browser/m-lab/
File Layout
All M-Lab files are packaged as tar archives and compressed with gzip (.tgz). They are placed in folders and named according to the following schema:
[tool]/[YYYY]/[MM]/[DD]/[YYYYMMDD]T[HHMMSS]Z-[server]-[tool]-[file index].tgz
- tool: The measurement tool that generated the data
- YYYYMMDDTHHMMSS: Start of the time window in which the data were collected
- server: M-Lab server that collected the data
- file index: Index of the file
This means that each compressed .tgz file contains all the data collected during a single day, by a single tool running on a single M-Lab server.
If the data collected during one day by one tool on one server are more than 1 GB (uncompressed), the files are split into multiple compressed .tgz files of up to 1 GB in size.
For example, the compressed .tgz file 20090218T000000Z-mlab1-lga01-ndt-0000.tgz contains the first 1 GB of data collected by all the NDT tests that were served by the M-Lab server mlab1-lga01 on Feb 18, 2009.
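The naming schema above can be turned into a small helper that builds the GCS path of an archive. This is only a sketch: the function name mlab_archive_path is hypothetical, and it assumes the timestamp is always T000000Z (the start of the day, as in the example above), which may not hold for every archive.

```shell
# Hypothetical helper: build the gs:// path of an M-Lab archive.
# $1 = tool, $2 = date as YYYYMMDD, $3 = server, $4 = file index
# Assumption: the time window starts at 00:00:00 UTC (T000000Z).
mlab_archive_path() {
  yyyy=$(printf '%s' "$2" | cut -c1-4)
  mm=$(printf '%s' "$2" | cut -c5-6)
  dd=$(printf '%s' "$2" | cut -c7-8)
  printf 'gs://m-lab/%s/%s/%s/%s/%sT000000Z-%s-%s-%s.tgz\n' \
    "$1" "$yyyy" "$mm" "$dd" "$2" "$3" "$1" "$4"
}

# Usage:
#   mlab_archive_path ndt 20090218 mlab1-lga01 0000
```

Called with the values from the example above, the helper should print the path of that same archive under gs://m-lab/ndt/2009/02/18/.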
Project Data
Direct links to each M-Lab project’s raw data are available below:
- Glasnost
- NDT
- Neubot
  - Neubot measures the Internet in order to gather data useful to study broadband performance, network neutrality, and Internet censorship.
  - More information is available at Nexa Center and GitHub.
- NPAD
- OONI
  - OONI measures censorship, surveillance, and traffic manipulation on the Internet.
  - More information is available at OONI.
- Paris Traceroute
  - Paris Traceroute maps network topology between two points on the Internet.
  - More information is available at Paris Traceroute.
- pathload2 (deprecated)
  - M-Lab no longer supports this tool, but its archived data are available on GCS. For similar measurements with a current and supported tool, see NDT.
  - Pathload2 measures the available bandwidth of an Internet connection.
  - More information is available at https://code.google.com/p/pathload2-gatech/.
- ShaperProbe (deprecated)
  - M-Lab no longer supports this tool, but its archived data are available on GCS.
  - ShaperProbe detects prioritization of network traffic.
  - More information is available at ShaperProbe.
- SideStream
  - SideStream collects TCP state information about completed TCP connections on a system.
  - More information is available on GitHub.
- mlab-collectd
  - mlab-collectd is a monitoring tool for M-Lab slices, which collects resource utilization information about all M-Lab servers.
  - More information is available on GitHub.
Accessing Data Programmatically
Accessing Data with gsutil
The easiest way to access M-Lab data on GCS programmatically is by using the gsutil command-line utility.
# List the top-level contents of the M-Lab bucket in GCS.
$ gsutil ls -l gs://m-lab/
# Copy a file from GCS locally.
$ gsutil cp gs://m-lab/ndt/2009/02/18/20090218T000000Z-mlab1-lga01-ndt-0000.tgz .
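Because archives are grouped by tool and day, a whole day of data can be fetched with a single wildcard copy. The following is a minimal sketch rather than an official M-Lab script: the function name mlab_fetch_day and its argument order are hypothetical, and it assumes gsutil is installed and on the PATH.

```shell
# Hypothetical helper: download every archive for one tool and one day.
# $1 = tool, $2 = date as YYYY/MM/DD, $3 = destination directory
mlab_fetch_day() {
  mkdir -p "$3" || return 1
  # -m runs the copies in parallel; the quoted wildcard is expanded
  # by gsutil itself, not by the local shell.
  gsutil -m cp "gs://m-lab/$1/$2/*.tgz" "$3/"
}

# Usage:
#   mlab_fetch_day ndt 2009/02/18 ./ndt-20090218
```

A day of data for a busy tool can be large, so it may be worth listing the folder with gsutil ls -l first to check the total size.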
Accessing Data With Common HTTP Tools
The URLs shown in M-Lab’s GCS web interface require the user to be logged in, which can present challenges when attempting to access the data with common HTTP utilities like curl or wget.
You can access M-Lab files programmatically by replacing:
storage.cloud.google.com/m/cloudstorage/b
with
storage.googleapis.com
in any GCS URL.
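For example, if the URL of a raw NDT archive on the GCS web application is:
https://storage.cloud.google.com/m/cloudstorage/b/m-lab/ndt/2015/12/28/20151228T000000Z-mlab1-lga04-ndt-0001.tgz
You can access it without authentication via this URL: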
https://storage.googleapis.com/m-lab/ndt/2015/12/28/20151228T000000Z-mlab1-lga04-ndt-0001.tgz
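The host substitution described above can be scripted with sed. This is a sketch only: the function name to_public_url is hypothetical, and it assumes the web-interface URL contains exactly the storage.cloud.google.com/m/cloudstorage/b prefix given above.

```shell
# Hypothetical helper: rewrite a GCS web-interface URL into a URL that
# common HTTP tools can fetch without logging in.
# $1 = URL copied from the GCS web interface
to_public_url() {
  printf '%s\n' "$1" |
    sed 's#storage\.cloud\.google\.com/m/cloudstorage/b#storage.googleapis.com#'
}

# Usage:
#   curl -O "$(to_public_url "$web_url")"
```

URLs that do not contain the web-interface prefix pass through unchanged, so the helper is safe to apply to a mixed list.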
GCS File Index
A list of all M-Lab files in GCS is available at:
https://storage.googleapis.com/m-lab/list/all_mlab_tarfiles.txt.gz
This file provides gs:// URLs to M-Lab data.
To change these URLs to https:// URLs (compatible with common HTTP tools), you can convert the file using the following bash script:
$ curl https://storage.googleapis.com/m-lab/list/all_mlab_tarfiles.txt.gz | gunzip | \
    while read -r; do echo "${REPLY/gs:\/\//https://storage.googleapis.com/}"; done
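The full index is long, so it is often useful to keep only the entries for one tool and one day while converting them. The sketch below is one way to do that: the function name mlab_filter_index is hypothetical, and it assumes each index line is a gs:// URL that follows the folder layout described under File Layout.

```shell
# Hypothetical helper: read gs:// URLs on stdin and print the matching
# entries as https:// URLs.
# $1 = tool, $2 = date as YYYY/MM/DD
mlab_filter_index() {
  # -F matches the prefix as a fixed string rather than a regular expression.
  grep -F "gs://m-lab/$1/$2/" |
    sed 's#^gs://#https://storage.googleapis.com/#'
}

# Usage:
#   curl -s https://storage.googleapis.com/m-lab/list/all_mlab_tarfiles.txt.gz |
#     gunzip | mlab_filter_index ndt 2015/12/28
```

The function reads from standard input, so it can also be run against a locally saved, decompressed copy of the index.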