Statistics of Common Crawl Monthly Archives

Number of pages, distribution of top-level domains, crawl overlaps, etc.: basic metrics about Common Crawl's monthly crawl archives
Latest crawl: CC-MAIN-2024-46

Statistics of Common Crawl’s web archives released on a monthly basis:

All metrics presented here are generated from Common Crawl’s URL index data using the code of the cc-crawl-statistics project. Inspired by Sebastian Spiegler’s Statistics of the Common Crawl Corpus 2012.
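The following is a rough illustration of how metrics of this kind can be derived from URL index data. It is a minimal sketch, not the cc-crawl-statistics code. It assumes index records in the CDXJ layout used by Common Crawl's URL index: a SURT key, a timestamp and a JSON object with a `url` field. From these records it counts captures, unique URLs and captures per top-level domain. The helper names (`parse_cdxj_line`, `tld_of`, `crawl_stats`, `url_overlap`) and the sample records are hypothetical. Measuring crawl overlap as the Jaccard similarity of two URL sets is also an assumption.

```python
import json
from collections import Counter

# Hypothetical helpers; NOT the API of cc-crawl-statistics.

def parse_cdxj_line(line):
    """Split a CDXJ index line into (SURT key, timestamp, JSON fields)."""
    surt_key, timestamp, fields = line.split(" ", 2)
    return surt_key, timestamp, json.loads(fields)

def tld_of(surt_key):
    # A SURT key starts with the reversed host name, e.g. "org,example)/about",
    # so the top-level domain is the first comma-separated label of the host.
    host = surt_key.split(")", 1)[0]
    return host.split(",", 1)[0].split(":", 1)[0]

def crawl_stats(lines):
    """Count captures, unique URLs and captures per top-level domain."""
    captures = 0
    urls = set()
    tlds = Counter()
    for line in lines:
        surt_key, _timestamp, fields = parse_cdxj_line(line)
        captures += 1
        urls.add(fields["url"])
        tlds[tld_of(surt_key)] += 1
    return captures, len(urls), tlds

def url_overlap(urls_a, urls_b):
    """Jaccard similarity of two crawls' URL sets (assumed overlap measure)."""
    union = urls_a | urls_b
    return len(urls_a & urls_b) / len(union) if union else 0.0

# Made-up sample records for illustration only.
SAMPLE = [
    'org,example)/ 20241101000000 {"url": "https://example.org/", "status": "200"}',
    'org,example)/about 20241101000100 {"url": "https://example.org/about", "status": "200"}',
    'de,beispiel)/ 20241101000200 {"url": "https://beispiel.de/", "status": "200"}',
]

if __name__ == "__main__":
    captures, unique_urls, tlds = crawl_stats(SAMPLE)
    print(captures, unique_urls, tlds.most_common())
```

In practice the full index of a monthly crawl holds billions of records. Real statistics would need a distributed job or probabilistic counters in place of an in-memory set.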