Adrien Hitz

University of Oxford


Modeling Website Visits

In this talk, I will analyze a data set consisting of the number of hits to the 99 most visited websites in the United States. Modeling the random vector of visits seems challenging because its marginals are very heavy-tailed and exhibit peaks at zero, and its components are strongly dependent. It turns out that a simple model based on a censored multivariate normal distribution, with marginals transformed to be discrete Pareto IV, accurately describes the observations. Following the ideas of Gaussian graphical models, we will see how to reduce dimensionality and visualize the dependence structure as a graph.
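To make the model description concrete, here is a minimal simulation sketch in Python. It is one possible reading of the abstract, not the speaker's actual construction: a latent Gaussian vector is pushed through the normal CDF, mapped by the Pareto IV quantile function (location fixed at 0), and rounded down, so that small latent values all collapse to a count of zero. The function names, the flooring step, and every parameter value are assumptions made for illustration. The last helper uses the standard fact that, for a Gaussian vector, a zero entry of the inverse covariance matrix corresponds to conditional independence, which is what a Gaussian graphical model draws as a missing edge.

```python
import math
import numpy as np

def pareto4_quantile(u, sigma, gamma, alpha):
    """Quantile function of a Pareto IV law with location 0, i.e. with
    survival function S(x) = (1 + (x / sigma) ** (1 / gamma)) ** (-alpha)."""
    u = np.asarray(u, dtype=float)
    return sigma * ((1.0 - u) ** (-1.0 / alpha) - 1.0) ** gamma

# Standard normal CDF built from math.erf, applied elementwise.
_std_normal_cdf = np.vectorize(
    lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
)

def simulate_visits(n, corr, sigma, gamma, alpha, seed=0):
    """Draw n hypothetical visit-count vectors (illustrative assumption).

    corr is the correlation matrix of the latent Gaussian vector;
    sigma, gamma, alpha are scalars or arrays with one entry per website.
    """
    rng = np.random.default_rng(seed)
    d = corr.shape[0]
    z = rng.multivariate_normal(np.zeros(d), corr, size=n)
    # Map to (0, 1); clip so the quantile function stays finite.
    u = np.clip(_std_normal_cdf(z), 0.0, 1.0 - 1e-12)
    # Flooring discretizes the marginal and creates a point mass at zero.
    return np.floor(pareto4_quantile(u, sigma, gamma, alpha)).astype(np.int64)

def partial_correlations(corr):
    """Partial correlations from the precision matrix K = corr^{-1}:
    rho_ij = -K_ij / sqrt(K_ii * K_jj). Near-zero entries suggest
    missing edges in the Gaussian graphical model."""
    k = np.linalg.inv(corr)
    scale = np.sqrt(np.diag(k))
    pc = -k / np.outer(scale, scale)
    np.fill_diagonal(pc, 1.0)
    return pc

if __name__ == "__main__":
    # Toy 3-site example with a chain-like dependence (made-up values).
    corr = np.array([[1.0, 0.5, 0.25],
                     [0.5, 1.0, 0.5],
                     [0.25, 0.5, 1.0]])
    x = simulate_visits(10_000, corr, sigma=2.0, gamma=1.0, alpha=1.5)
    print("share of zeros per site:", (x == 0).mean(axis=0))
    print("largest simulated counts:", x.max(axis=0))
    print("partial correlations:\n", partial_correlations(corr).round(3))
```

In this toy setup the first and third sites are correlated only through the second, so their partial correlation is essentially zero and the corresponding graph would have no edge between them.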