Computing Reviews

A long way to the top: significance, structure, and stability of Internet top lists
Scheitle Q., Hohlfeld O., Gamba J., Jelten J., Zimmermann T., Strowes S., Vallina-Rodriguez N. IMC 2018 (Proceedings of the 2018 Internet Measurement Conference, Boston, MA, Oct 31-Nov 2, 2018), 478-493, 2018. Type: Proceedings
Date Reviewed: 02/05/19

Research communities working on Internet measurement, privacy, and network security routinely rely on Internet top lists. Because the choice of list affects the results, it is important to know how these lists are compiled and how stable and reliable they are. How much do the Alexa, Umbrella, and Majestic lists differ, and would using one list over another change a study's findings? These are the broad questions addressed in this well-organized and lucidly written paper.

There has been no notable prior research on top lists or on how the choice of list can affect research results. Internet top lists are compiled in different ways: for example, by collecting the domains users visit through browser plugins, by counting domain name system (DNS) lookups at popular DNS servers (such as OpenDNS), or by counting backlinks. The paper discusses how these top lists differ in structure and stability and shows how those differences affect the research built on them. It then uses the top lists in actual Internet measurements and compares the results with those for the general population of domains.

The authors present their findings and recommendations in a simple, elegant, and comprehensive way. The paper starts by classifying research that uses Internet top lists into dependent, verification, and independent categories, which are affected by the choice of top list to varying degrees. It then delves into the structure and stability of the Alexa, Majestic, and Umbrella lists, presenting charts of daily changes, weekday/weekend patterns, list intersection, and the lifetime of a domain in these lists. The authors also report their success in gaming the Umbrella ranking by directing multiple probes at a domain under their control. Next, the paper shows how top lists skew research by using them to evaluate domains on DNS characteristics (NXDOMAIN, IPv6, CAA, and so on), transport layer security (TLS), and hypertext transfer protocol version 2 (HTTP/2) adoption. The paper wraps up by recommending qualities of a reliable top list, particularly for “dependent” research, and by sharing the authors' GitHub resources.
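To make the stability measures mentioned above concrete, here is a minimal Python sketch of two of them: the share of one list that also appears in another, and the day-to-day churn of a single list. The function names and the sample domains are hypothetical and chosen only for illustration; this is not the authors' code, and the paper may define these metrics differently.

```python
def intersection_share(list_a, list_b):
    """Fraction of the domains in list_a that also appear in list_b."""
    set_a, set_b = set(list_a), set(list_b)
    if not set_a:
        return 0.0
    return len(set_a & set_b) / len(set_a)


def daily_churn(yesterday, today):
    """Fraction of today's domains that were absent from yesterday's list."""
    today_set = set(today)
    if not today_set:
        return 0.0
    return len(today_set - set(yesterday)) / len(today_set)


# Made-up snapshots (assumption: real lists hold up to a million entries).
alexa_mon = ["example.com", "example.org", "example.net", "example.edu"]
alexa_tue = ["example.com", "example.org", "example.info", "example.edu"]
umbrella_tue = ["example.com", "cdn.example.net", "example.info", "api.example.org"]

print(daily_churn(alexa_mon, alexa_tue))            # 0.25
print(intersection_share(alexa_tue, umbrella_tue))  # 0.5
```

Both functions treat a list as an unordered set, so they ignore rank; a rank-aware comparison would need a different measure.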

The study reveals that Internet top lists distort measurement results significantly, often by a few orders of magnitude compared to the general population of domains, and that results vary with the day of the week. The paper recommends using top lists that are consistent, transparent, and stable, especially for research whose results depend heavily on the choice of list. It concludes with references to the Git repository, where the authors share further insights and code that continuously fetches the Alexa, Umbrella, and Majestic top lists and draws a live plot.
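The kind of distortion described here can be illustrated with a small Python sketch that compares the adoption rate of some feature (for example, HTTP/2 support) in a top list against the rate in a broader population. All names and the toy data are assumptions made for illustration, not the authors' method or results.

```python
def adoption_rate(domains, adopters):
    """Share of `domains` that appear in `adopters` (domains with the feature)."""
    domains = list(domains)
    if not domains:
        return 0.0
    return sum(1 for d in domains if d in adopters) / len(domains)


def overestimation_factor(top_list, population, adopters):
    """How many times higher the adoption rate is in the top list
    than in the general population."""
    top_rate = adoption_rate(top_list, adopters)
    pop_rate = adoption_rate(population, adopters)
    if pop_rate == 0:
        # No adopters in the population: the ratio is undefined, so
        # report infinity if the top list has any adopters at all.
        return float("inf") if top_rate > 0 else 1.0
    return top_rate / pop_rate


# Toy data (assumption): 16 domains in total, 2 of which support the feature.
population = [f"site{i}.example" for i in range(16)]
adopters = {"site0.example", "site1.example"}
top_list = population[:4]  # the "popular" subset contains both adopters

print(overestimation_factor(top_list, population, adopters))  # 4.0
```

In this toy case the top list shows 50 percent adoption against 12.5 percent in the population, so a study based only on the top list would overstate adoption fourfold.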

Reviewer:  Subash Tirupachur Comerica Review #: CR146415 (1905-0168)

Reproduction in whole or in part without permission is prohibited.   Copyright 2024 ComputingReviews.com™