Datasets at a glance

In short, there are two types of datasets: those that only collect objective metrics, and those that additionally gather subjective user feedback.

Information!

In December 2019, we released the WWW'19 dataset... Go grab it!

Wikipedia subjective metrics dataset (User satisfaction)

In collaboration with the Wikimedia Foundation, we collected more than 5 months' worth of Real User Monitoring (RUM) data from Wikipedia users during their normal browsing activity, asking whether they felt that the page loading process was fast enough. The study was published at [WWW-19], and we additionally prepared an extended technical report containing further details [TECHREP-19].

  • 6.21MB The Wikimedia legal team has cleared the datasets for publication, after ensuring that user deanonymization and content linkability are prevented. The WWW'19 dataset comprises over 60,000 user survey answers, each associated with 18 browser performance metrics (a minimal exploration sketch follows below).
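
To give an idea of how the dataset can be explored once downloaded, here is a minimal Python sketch; the file name and column names (e.g., a binary "satisfied" answer column) are illustrative assumptions, as the actual schema is documented alongside the download.

    # Minimal exploration sketch for the WWW'19 dataset.
    # NOTE: the file name and column names are assumptions for illustration only.
    import pandas as pd

    df = pd.read_csv("www19_wikipedia_rum.csv")  # hypothetical file name

    # Fraction of sessions in which users felt the page load was fast enough
    print(df["satisfied"].mean())  # hypothetical binary answer column

    # How the (numeric) performance metrics correlate with the subjective answer
    metrics = df.drop(columns=["satisfied"]).select_dtypes("number")
    print(metrics.corrwith(df["satisfied"]).sort_values())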

Subjective metrics datasets (5-grades ACR scale)

These award-winning datasets were collected for our [PAM-17] and [PAM-18] papers. Subjective metrics were collected with (hundreds of) real humans browsing (tens of) real websites under controlled lab conditions. Details of the testbed are in [PAM-17] and details of the set of candidate pages in [DIRECTORSCUT-16]. To reduce human error and increase repeatability, we also provide our code (see the sketch after the list below).

  • 486KB compressed, 1.9MB raw The sanitized PAM-18 WebMOS dataset comprises over 3,000 user grades, which we describe and use in [PAM-18] and [QoMEX-18]. Details of the sanitization process are in [PAM-18], and (a simplified version of) the Jupyter Notebook used in the paper can be found in the code section.

  • 466KB compressed, 24MB raw The original WebMOS dataset (link disabled) used in [PAM-17] is still available, but it was significantly extended in [PAM-18], so there is no reason you should pick this one!

  • 1.5MB compressed, 5MB raw The complete PAM-18 WebMOS dataset (link disabled) comprises over 9,000 user grades and is also still available (you can find the link on this page if you're determined enough). However, for repeatability we would prefer you to use the sanitized version above!
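
As a flavor of the analysis, the following Python sketch computes a per-page Mean Opinion Score from the 1-5 ACR grades; the file and column names ("page", "grade") are assumptions, and the Jupyter Notebook in the code section remains the reference for the exact processing used in [PAM-18].

    # Per-page MOS sketch on the sanitized PAM-18 WebMOS dataset.
    # NOTE: file and column names are illustrative assumptions.
    import pandas as pd

    grades = pd.read_csv("pam18_webmos_sanitized.csv")  # hypothetical file name

    # Mean Opinion Score per page: average of the 1-5 ACR grades,
    # with a 95% confidence interval half-width (normal approximation).
    mos = grades.groupby("page")["grade"].agg(["mean", "sem", "count"])
    mos["ci95"] = 1.96 * mos["sem"]
    print(mos.sort_values("mean"))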

Objective metrics datasets

These datasets were collected for our award-winning [SIGCOMM-QoE-16] paper. Objective metrics are collected with an automated process and do not require user intervention. This makes it possible to collect fairly large datasets, with enough repetitions to make statistical analysis accurate (a minimal computation sketch follows the list below).

  •  24 MB The Alexa Top-100 Chrome dataset contains objective metrics such as ByteIndex, ObjectIndex, DOM, onLoad, etc. (but not SpeedIndex, as its computation slows down the page rendering process itself, see [SIGCOMM-QoE-16]).

  •  7.2 GB The Alexa Top-100 WebPagetest dataset contains objective metrics such as ByteIndex, ObjectIndex, DOM, onLoad, etc., as well as SpeedIndex (computed by WebPagetest from histograms of the page rendering process).

  •  100+ GB We have collected much larger datasets (several Top-1000 lists: FR, EU, World) with 100+ repetitions using WebPagetest. If interested, contact us.
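
For context, ByteIndex, ObjectIndex and SpeedIndex all belong to the family of integral metrics discussed in [SIGCOMM-QoE-16], i.e., the integral over time of the page's remaining completion ratio. The Python sketch below shows how such an index can be computed from progress samples; the sample values are made up for illustration.

    # Sketch of an integral metric: integrate (1 - progress(t)) over time.
    # The sample values below are purely illustrative.
    import numpy as np

    def integral_index(times_ms, progress):
        """Trapezoidal integration of (1 - progress) over time, in ms."""
        return np.trapz(1.0 - np.asarray(progress), np.asarray(times_ms))

    # Byte-level download progress sampled during a hypothetical page load
    times_ms = [0, 200, 500, 1000, 1800]      # ms since navigation start
    bytes_ratio = [0.0, 0.3, 0.6, 0.9, 1.0]   # fraction of total bytes received

    print("ByteIndex ~", integral_index(times_ms, bytes_ratio), "ms")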