How we map the Fediverse
FediSea's crawler is an open-source bot that respectfully discovers and indexes Fediverse instances. It gathers publicly available metadata to build a comprehensive map of the decentralized social web.
Before crawling any instance, we check its robots.txt file. If an instance disallows our user agent or the paths we need, we skip it completely. Instance administrators can opt out at any time by updating their robots.txt.
The crawling pipeline runs in six stages, continuously discovering and updating instance data.
The crawler starts from a major instance and expands by following peer lists from known instances. New domains are queued for crawling.
Before making any request, the crawler fetches and parses robots.txt. If crawling is disallowed, we only save the domain and the robots.txt status.
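This check can be done with Python's standard-library robots parser. A minimal sketch (the function name is illustrative; "FediSeaBot" is our crawler's user agent):

```python
from urllib.robotparser import RobotFileParser

def may_crawl(robots_txt: str, path: str, user_agent: str = "FediSeaBot") -> bool:
    """Parse an instance's robots.txt and check whether our
    user agent may fetch the given path."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, path)
```

In production the file is fetched once per instance and the parsed result is checked before every path the crawler needs.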
The crawler first fetches the /.well-known/nodeinfo endpoint, which lists the instance's actual nodeinfo document URLs. It then fetches the nodeinfo endpoint returned there.
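A minimal sketch of this two-step lookup using only the standard library; the helper names and the preference for the 2.1 schema are assumptions, not FediSea's exact code:

```python
import json
import urllib.request

def pick_nodeinfo_href(well_known: dict) -> str:
    """From the parsed /.well-known/nodeinfo document, choose the
    nodeinfo link to follow (prefer schema 2.1 when advertised)."""
    links = well_known["links"]
    for link in links:
        if link["rel"].endswith("/2.1"):
            return link["href"]
    return links[0]["href"]

def fetch_nodeinfo(domain: str) -> dict:
    # Step 1: fetch the well-known index of nodeinfo documents.
    with urllib.request.urlopen(f"https://{domain}/.well-known/nodeinfo", timeout=10) as r:
        well_known = json.load(r)
    # Step 2: follow the chosen link to the actual nodeinfo document.
    with urllib.request.urlopen(pick_nodeinfo_href(well_known), timeout=10) as r:
        return json.load(r)
```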
Based on the detected software, the crawler queries the appropriate software-specific API endpoint (Mastodon, Lemmy, Misskey, etc.) for additional data, such as the instance thumbnail.
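This routing can be sketched as a simple lookup table, using the endpoints listed further down this page (the function name and fallback behavior are illustrative):

```python
SOFTWARE_DETAIL_ENDPOINTS = {
    "mastodon": "/api/v2/instance",
    "pixelfed": "/api/v2/instance",
    "pleroma": "/api/v2/instance",
    "lemmy": "/api/v3/site",
    "peertube": "/api/v1/config/about",
    "misskey": "/api/meta",
}

def detail_url(domain: str, software: str):
    """Return the software-specific details URL, or None when the
    software is unknown (we then rely on nodeinfo data alone)."""
    path = SOFTWARE_DETAIL_ENDPOINTS.get(software.lower())
    return f"https://{domain}{path}" if path else None
```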
Connected instances are extracted from peer lists, feeding new domains back into the discovery queue.
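The discovery step amounts to deduplicating peer domains into a crawl queue. A hypothetical sketch:

```python
from collections import deque

def enqueue_new_peers(peers, seen: set, queue: deque) -> int:
    """Add unseen peer domains to the crawl queue and mark them
    as seen; return how many new domains were queued."""
    added = 0
    for domain in peers:
        if domain not in seen:
            seen.add(domain)
            queue.append(domain)
            added += 1
    return added
```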
Instances are re-crawled on a rolling schedule: active instances are re-crawled more often than dead ones.
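One way to implement such a schedule is exponential backoff on consecutive failures. The specific intervals below are assumptions for illustration, not FediSea's actual values:

```python
from datetime import timedelta

def recrawl_interval(consecutive_failures: int) -> timedelta:
    """Responsive instances get short re-crawl intervals; instances
    that keep failing are backed off, up to a maximum interval."""
    if consecutive_failures == 0:
        return timedelta(hours=6)  # active: re-crawl frequently
    # Double the interval for each consecutive failure, capped at 30 days.
    return min(timedelta(hours=12) * (2 ** (consecutive_failures - 1)),
               timedelta(days=30))
```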
We only collect publicly available metadata from standard API endpoints. No private user data, no posts, no DMs — ever.
/.well-known/nodeinfo — All Fediverse software
/api/v2/instance — Mastodon, Pixelfed, Pleroma
/api/v3/site — Lemmy
/api/v1/config/about — PeerTube
/api/meta — Misskey
Add the following to your instance's robots.txt to prevent FediSea from crawling your server:
User-agent: FediSeaBot
Disallow: /