Number of pages, distribution of top-level domains, crawl overlaps, etc. - basic metrics about Common Crawl Monthly Crawl Archives
Latest crawl: CC-MAIN-2024-51
Crawler-related metrics are extracted from the crawler log files, cf. ../stats/crawler/ and include
The first plot shows the absolute numbers for these metrics.
The second plot shows the relative proportion of each fetch status.
The next figure shows the relative usage of the http and https URL protocols (schemes). It reflects the increasing use of HTTPS on the web. However, crawler properties (such as sampling, deduplication and URL canonicalization) may also influence the actual share of HTTPS URLs in a single monthly crawl.
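
As a rough illustration of what "relative usage" of URL schemes means, the following minimal sketch computes the share of each scheme in a list of URLs. The function name `scheme_shares` and the sample URLs are hypothetical and made up for this example; this is not the code used to produce the plots, and the numbers are not taken from any crawl.

```python
from collections import Counter
from urllib.parse import urlsplit


def scheme_shares(urls):
    """Return the fraction of URLs per scheme, e.g. {'https': 0.75, 'http': 0.25}."""
    # Count the lower-cased scheme of every URL.
    counts = Counter(urlsplit(url).scheme.lower() for url in urls)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {scheme: n / total for scheme, n in counts.items()}


# Made-up sample URLs (assumption: illustrative only).
sample = [
    "https://example.com/",
    "https://example.org/page",
    "http://example.net/",
    "https://example.com/about",
]
shares = scheme_shares(sample)
for scheme, share in sorted(shares.items()):
    print(f"{scheme}: {share:.1%}")
```

Shares computed this way depend on which URLs end up in the sample, which is why sampling, deduplication and canonicalization can shift the result for a single crawl.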