Number of pages, distribution of top-level domains, crawl overlaps, etc. - basic metrics about Common Crawl Monthly Crawl Archives
Latest crawl: CC-MAIN-2024-46
Top-level domains (abbrev. “TLD”/“TLDs”) are a significant indicator of how representative the data is: they show whether the data set or a particular crawl is biased towards certain countries, regions or languages.
Metrics about top-level domains are shown on the following pages:
(e.g. .com vs. .jp)
Note that a top-level domain is defined here as the right-most element of a host name (com in www.example.com). Country-code second-level domains (“ccSLD”) and public suffixes are not covered by these metrics.
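The definition above can be illustrated with a minimal Python sketch. It is not the code used to produce the crawl statistics; the function names `top_level_domain` and `tld_counts` and the sample host names are assumptions made for illustration. It takes the right-most label of each host name and counts how often each one occurs, without consulting the public suffix list and without special handling for IP addresses.

```python
from collections import Counter


def top_level_domain(host: str) -> str:
    """Return the right-most label of a host name, lowercased.

    Hypothetical helper: it does not consult the public suffix list,
    so "example.co.uk" yields "uk", not "co.uk".
    """
    # Drop surrounding whitespace and a trailing root dot ("example.com.").
    labels = host.strip().rstrip(".").lower().split(".")
    return labels[-1]


def tld_counts(hosts):
    """Count how many host names fall under each top-level domain."""
    return Counter(top_level_domain(h) for h in hosts)


# Made-up sample host names, for illustration only.
hosts = ["www.example.com", "example.co.uk", "news.example.jp", "example.com."]
counts = tld_counts(hosts)
```

In this sketch `example.co.uk` is counted under `uk`, which matches the note that country-code second-level domains are not treated separately.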