Google Cloud Storage
M-Lab publishes all data it collects in raw form as archives on Google Cloud Storage (GCS) at the following location:
https://console.developers.google.com/storage/browser/m-lab/
File Layout
All M-Lab files are packaged as gzip-compressed tar archives (.tgz). They are placed in folders and named according to the following schema:
[tool]/[YYYY]/[MM]/[DD]/[YYYYMMDD]T[HHMMSS]Z-[server]-[tool]-[file index].tgz
tool
: The measurement tool that generated the data
YYYYMMDDTHHMMSS
: Start of the time window in which the data were collected
server
: M-Lab server that collected the data
file index
: Four-digit index of the file, starting at 0000, used when one day's data are split across several archives
This means that each compressed .tgz file contains all the data collected during a single day, by a single tool running on a single M-Lab server.
If the data collected during one day by one tool on one server exceed 1 GB (uncompressed), they are split into multiple compressed .tgz files, each holding up to 1 GB of data.
For example, the compressed .tgz file 20090218T000000Z-mlab1-lga01-ndt-0000.tgz contains the first 1 GB of data collected by all the NDT tests that were served by the M-Lab server mlab1-lga01 on Feb 18, 2009.
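The naming schema above can be taken apart mechanically. The following is a minimal sketch in POSIX shell, not an official M-Lab tool: parse_mlab_archive is a hypothetical helper name, and it assumes that the server field always consists of two hyphen-separated parts (as in mlab1-lga01), which is inferred from the examples on this page. It does not validate its input.

```shell
# Split an M-Lab archive name into the fields of the naming schema.
# Assumption: the server field is always two hyphen-separated parts
# (host-site, as in mlab1-lga01); the tool field may itself contain hyphens.
# The function body runs in a subshell, so its variables do not leak.
parse_mlab_archive() (
  name=${1##*/}            # drop any directory or URL prefix
  base=${name%.tgz}        # drop the extension
  stamp=${base%%-*}        # e.g. 20090218T000000Z
  rest=${base#*-}          # e.g. mlab1-lga01-ndt-0000
  index=${rest##*-}        # e.g. 0000
  rest=${rest%-*}          # e.g. mlab1-lga01-ndt
  host=${rest%%-*}         # e.g. mlab1
  rest=${rest#*-}          # e.g. lga01-ndt
  site=${rest%%-*}         # e.g. lga01
  tool=${rest#*-}          # e.g. ndt
  printf 'start=%s server=%s-%s tool=%s index=%s\n' \
    "$stamp" "$host" "$site" "$tool" "$index"
)

# Usage:
#   parse_mlab_archive 20090218T000000Z-mlab1-lga01-ndt-0000.tgz
#   prints: start=20090218T000000Z server=mlab1-lga01 tool=ndt index=0000
```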
Project Data
Direct links to each M-Lab project’s raw data are available below:
- Glasnost
- NDT
- Neubot
- Neubot measures the Internet in order to gather data useful to study broadband performance, network neutrality, and Internet censorship.
- More information is available at Nexa Center and GitHub.
- NPAD
- OONI
- OONI measures censorship, surveillance, and traffic manipulation on the Internet.
- More information is available at OONI.
- Paris Traceroute
- Paris Traceroute maps network topology between two points on the Internet.
- More information is available at Paris Traceroute.
- pathload2 (deprecated)
- M-Lab no longer supports this tool, but its archived data are available on GCS. For similar measurements with a current and supported tool, see NDT.
- Pathload2 measures the available bandwidth of an Internet connection.
- More information is available at https://code.google.com/p/pathload2-gatech/.
- ShaperProbe (deprecated)
- M-Lab no longer supports this tool, but its archived data are available on GCS.
- ShaperProbe detects prioritization of network traffic.
- More information is available at ShaperProbe.
- SideStream
- SideStream collects TCP state information about completed TCP connections on a system.
- More information is available on GitHub.
- mlab-collectd
- mlab-collectd is a monitoring tool for M-Lab slices, which collects resource utilization information about all M-Lab servers.
- More information is available on GitHub.
Accessing Data Programmatically
Accessing Data with gsutil
The easiest way to access M-Lab data on GCS programmatically is by using the gsutil command-line utility.
# List the top-level contents of the M-Lab bucket in GCS.
$ gsutil ls -l gs://m-lab/
# Copy a file from GCS locally.
$ gsutil cp gs://m-lab/ndt/2009/02/18/20090218T000000Z-mlab1-lga01-ndt-0000.tgz .
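Because the folder layout is fixed, the gs:// prefix for one tool and one day can be built from the schema rather than typed by hand. The sketch below is illustrative only: mlab_day_prefix is a hypothetical helper, and it assumes the date is passed as YYYY-MM-DD.

```shell
# Build the gs:// prefix for one tool and one day, following the folder
# schema [tool]/[YYYY]/[MM]/[DD]/ described on this page.
# Arguments (hypothetical interface): $1 = tool, $2 = date as YYYY-MM-DD.
mlab_day_prefix() (
  tool=$1
  year=${2%%-*}        # e.g. 2009
  rest=${2#*-}         # e.g. 02-18
  month=${rest%%-*}    # e.g. 02
  day=${rest#*-}       # e.g. 18
  printf 'gs://m-lab/%s/%s/%s/%s/\n' "$tool" "$year" "$month" "$day"
)

# Usage (requires gsutil and network access):
#   gsutil ls -l "$(mlab_day_prefix ndt 2009-02-18)"
# After copying an archive locally, its contents can be listed with tar:
#   tar -tzf 20090218T000000Z-mlab1-lga01-ndt-0000.tgz
```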
Accessing Data with Common HTTP Tools
The URLs shown in M-Lab’s GCS web interface require the user to be logged in, which can present challenges when attempting to access the data with common HTTP utilities like curl or wget.
You can access M-Lab files programmatically by replacing:
storage.cloud.google.com/m/cloudstorage/b
with
storage.googleapis.com
in any GCS URL.
For example, after this replacement, a raw NDT archive shown in the GCS web application can be accessed without authentication via this URL:
https://storage.googleapis.com/m-lab/ndt/2015/12/28/20151228T000000Z-mlab1-lga04-ndt-0001.tgz
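The host replacement described above can be scripted. This is a minimal sketch, assuming the logged-in URL has exactly the storage.cloud.google.com/m/cloudstorage/b form named in this section; to_public_gcs_url is a hypothetical helper name. URLs that do not contain that string pass through unchanged.

```shell
# Rewrite a logged-in GCS web URL to its public storage.googleapis.com form.
# Assumption: the input contains "storage.cloud.google.com/m/cloudstorage/b",
# the string this page says to replace; anything else is printed unchanged.
to_public_gcs_url() {
  printf '%s\n' "$1" |
    sed 's|storage\.cloud\.google\.com/m/cloudstorage/b|storage.googleapis.com|'
}

# Usage (the download step requires network access):
#   curl -O "$(to_public_gcs_url "$web_url")"
```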
GCS File Index
A list of all M-Lab files in GCS is available at:
https://storage.googleapis.com/m-lab/list/all_mlab_tarfiles.txt.gz
This file provides gs:// URLs to M-Lab data.
To change these URLs to https:// URLs (compatible with common HTTP tools), you can convert the file using the following bash script:
$ curl https://storage.googleapis.com/m-lab/list/all_mlab_tarfiles.txt.gz | gunzip | \
while read -r; do echo "${REPLY/gs:\/\//https://storage.googleapis.com/}"; done
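The full index is large, so it is often more practical to keep only the entries for one tool and one day before converting them. The sketch below assumes each line of the index is a gs:// URL that follows the folder schema described earlier (gs://m-lab/[tool]/[YYYY]/[MM]/[DD]/...); mlab_filter_index is a hypothetical helper name.

```shell
# Read gs:// URLs on stdin, keep those for one tool and one day, and print
# them as https:// URLs usable with curl or wget.
# Arguments (hypothetical interface): $1 = tool, $2 = date as YYYY/MM/DD.
# Assumption: index lines follow the gs://m-lab/[tool]/[YYYY]/[MM]/[DD]/ layout.
mlab_filter_index() {
  grep "^gs://m-lab/$1/$2/" |
    sed 's|^gs://|https://storage.googleapis.com/|'
}

# Usage (requires network access):
#   curl https://storage.googleapis.com/m-lab/list/all_mlab_tarfiles.txt.gz |
#     gunzip | mlab_filter_index ndt 2015/12/28
```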