To better understand and characterize current uncertainties in aerosol optical depth (AOD), an important observational constraint on climate models, we evaluate and intercompare 14 satellite products representing nine different retrieval algorithm families, using observations from five different sensors on six different platforms. The satellite products (super-observations consisting of 1° × 1° daily aggregated retrievals drawn from the years 2006, 2008 and 2010) are evaluated with AErosol RObotic NETwork (AERONET) and Maritime Aerosol Network (MAN) data. Results show that different products exhibit different regionally varying biases (both under- and overestimates) that may reach ±50 %, although a typical bias would be 15 %–25 % (depending on the product). In addition to these biases, the products exhibit random errors that can be 1.6 to 3 times as large. Most products show similar performance, although there are a few exceptions with either larger biases or larger random errors. The intercomparison of satellite products extends this analysis and provides spatial context to it. In particular, we show that aggregated satellite AOD agrees much better among products than the spatial coverage (often driven by cloud masks) within the 1° × 1° grid cells does. Up to ∼50 % of the difference between satellite AOD products is attributed to cloud contamination. The diversity in AOD products shows clear spatial patterns and varies from 10 % (parts of the ocean) to 100 % (central Asia and Australia). More importantly, we show that the diversity may be used as an indication of AOD uncertainty, at least for the better performing products. This provides modellers with a global map of expected AOD uncertainty in satellite products, allows assessment of products away from AERONET sites, can provide guidance for future AERONET locations and offers suggestions for product improvements. We account for statistical and sampling noise in our analyses.
Sampling noise, that is, variation in the results that arises from evaluating different subsets of the data, causes important changes in error metrics. The consequences of this noise term for product evaluation are discussed.
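
As a rough illustration of how sampling noise can affect an error metric, the sketch below resamples a set of collocated AOD pairs with replacement and reports the spread of the resulting mean bias. It is a minimal sketch, not the procedure used in this study: the bootstrap is an assumed stand-in technique, the function names `mean_bias` and `bootstrap_spread` are hypothetical, and the data are synthetic values generated only for the example.

```python
import random
import statistics


def mean_bias(pairs):
    """Mean difference (satellite minus reference) over collocated AOD pairs."""
    return statistics.fmean(sat - ref for sat, ref in pairs)


def bootstrap_spread(pairs, metric, n_resamples=1000, seed=0):
    """Standard deviation of `metric` over resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(pairs)
    values = [metric(rng.choices(pairs, k=n)) for _ in range(n_resamples)]
    return statistics.stdev(values)


# Synthetic, purely illustrative collocations: (satellite AOD, reference AOD).
# The offset (0.02) and scatter (0.05) are arbitrary assumptions.
rng = random.Random(42)
reference = [rng.uniform(0.05, 0.6) for _ in range(200)]
pairs = [(ref + rng.gauss(0.02, 0.05), ref) for ref in reference]

bias = mean_bias(pairs)
spread = bootstrap_spread(pairs, mean_bias)
print(f"mean bias = {bias:.3f}, bootstrap spread = {spread:.3f}")
```

If the spread is comparable to the difference in bias between two products, the ranking of those products may depend on which subset of the data was evaluated.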