Number of pages, distribution of top-level domains, crawl overlaps, etc. - basic metrics about Common Crawl Monthly Crawl Archives
Latest crawl: CC-MAIN-2023-40
Size of crawls
Statistics of Common Crawl’s web archives, released on a monthly basis.
All metrics presented here are generated from Common Crawl’s URL index data using the code of the cc-crawl-statistics project. Inspired by Sebastian Spiegler’s Statistics of the Common Crawl Corpus 2012.
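As a rough illustration of how a metric such as the distribution of top-level domains could be derived from URL index data, here is a minimal sketch. It is not the cc-crawl-statistics code. It assumes index records in the CDXJ line layout (a SURT key, a timestamp, and a JSON object with a `url` field), and the function name `count_tlds` and the sample records are hypothetical. Taking the last dot-separated label of the host name is a simplification that ignores IP-address hosts.

```python
import json
from collections import Counter
from urllib.parse import urlsplit


def count_tlds(cdxj_lines):
    """Count top-level domains over an iterable of CDXJ-style index lines."""
    counts = Counter()
    for line in cdxj_lines:
        line = line.strip()
        if not line:
            continue
        # Assumed layout: "<SURT key> <timestamp> <JSON object>"
        parts = line.split(" ", 2)
        if len(parts) != 3:
            continue  # skip lines that do not match the assumed layout
        record = json.loads(parts[2])
        host = urlsplit(record.get("url", "")).hostname
        if not host:
            continue
        # Simplification: treat the last dot-separated label as the TLD.
        counts[host.rsplit(".", 1)[-1]] += 1
    return counts


# Hypothetical sample records, invented for illustration only.
SAMPLE = [
    'com,example)/ 20230921000000 {"url": "https://example.com/", "status": "200"}',
    'com,example)/about 20230921000100 {"url": "https://example.com/about", "status": "200"}',
    'org,example)/ 20230921000200 {"url": "https://example.org/", "status": "200"}',
]

if __name__ == "__main__":
    for tld, n in count_tlds(SAMPLE).most_common():
        print(tld, n)
```

Because the function accepts any iterable of lines, the same sketch could be fed from an open file handle instead of the in-memory sample.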