How we map the Fediverse

FediSea's crawler is an open-source bot that respectfully discovers and indexes Fediverse instances. It gathers publicly available metadata to build a comprehensive map of the decentralized social web.


We respect robots.txt

Before crawling any instance, we check its robots.txt file. If an instance disallows our user agent or the paths we need, we skip it completely. Instance administrators can opt out at any time by updating their robots.txt.

How the crawler works

The crawling pipeline runs in six stages, continuously discovering and updating instance data.

Discovery

The crawler starts from a major instance and expands by following peer lists from known instances. New domains are queued for crawling.
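Discovery from a single seed is essentially a breadth-first walk over the peer graph. Below is a minimal sketch of that idea, not FediSea's actual code: discover and fetch_peers are hypothetical names, and fetch_peers stands in for whatever network call returns an instance's advertised peers.

```python
from collections import deque
from typing import Callable


def discover(seed: str, fetch_peers: Callable[[str], list], limit: int = 1000) -> list[str]:
    """Breadth-first walk over peer lists, starting from one seed domain.

    fetch_peers is a hypothetical caller-supplied function that returns the
    peer domains an instance advertises (or an empty list on failure).
    """
    queue = deque([seed])
    seen = {seed}          # every domain ever queued, so nothing is queued twice
    order = []             # domains in the order they were crawled
    while queue and len(order) < limit:
        domain = queue.popleft()
        order.append(domain)
        for peer in fetch_peers(domain):
            peer = peer.strip().lower()
            if peer and peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return order
```

With an in-memory graph such as {"a.example": ["b.example", "c.example"], "b.example": ["d.example"]}, calling discover("a.example", lambda d: graph.get(d, [])) visits a, b, c, then d. The limit parameter is an assumption added so a sketch run cannot walk the whole Fediverse.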

Robots.txt Check

Before making any other request, the crawler fetches and parses the instance's robots.txt. If crawling is disallowed, we store only the domain and its robots.txt status.
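The decision can be illustrated with Python's standard urllib.robotparser. This is a sketch under assumptions, not the crawler's real implementation: check_robots is a hypothetical helper, the robots.txt text is assumed to be already downloaded, and the paths passed in are whichever endpoints the crawler intends to hit.

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "FediSeaBot"


def check_robots(domain: str, robots_txt: str, paths: list[str]) -> dict:
    """Return the record kept for a domain after its robots.txt check."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the file as read
    # The instance counts as crawlable only if every path we need is allowed.
    allowed = all(
        parser.can_fetch(USER_AGENT, f"https://{domain}{path}") for path in paths
    )
    # For a disallowed instance this is the entire record: domain plus status.
    return {"domain": domain, "robots_allowed": allowed}
```

A caller would continue to the NodeInfo stage only when robots_allowed is true.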

NodeInfo Fetch

The crawler first fetches the /.well-known/nodeinfo endpoint, which links to the instance's actual NodeInfo document, and then fetches that linked document.
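The /.well-known/nodeinfo response is a small JSON object whose "links" array maps schema URIs (rel) to document URLs (href). A minimal sketch of choosing which link to follow, assuming we prefer the newest schema version listed; pick_nodeinfo_href is a hypothetical name and the preference rule is our assumption, not something stated above.

```python
from typing import Optional


def pick_nodeinfo_href(well_known: dict) -> Optional[str]:
    """Return the href of the newest NodeInfo schema listed in /.well-known/nodeinfo."""
    best_href: Optional[str] = None
    best_version: tuple = ()
    for link in well_known.get("links", []):
        rel = link.get("rel", "")
        href = link.get("href")
        # Schema rels look like ".../ns/schema/2.1"; skip anything else.
        if not href or "/ns/schema/" not in rel:
            continue
        try:
            version = tuple(int(part) for part in rel.rsplit("/ns/schema/", 1)[1].split("."))
        except ValueError:
            continue  # unparseable version string
        # Tuples compare element-wise, so (2, 1) beats (2, 0).
        if version > best_version:
            best_href, best_version = href, version
    return best_href
```

The returned URL is then fetched as the second request of this stage.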

Software-Specific Data

Based on the detected software, the crawler queries the matching API endpoint (Mastodon, Lemmy, Misskey, etc.) for software-specific data such as the instance thumbnail.
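The dispatch from NodeInfo software name to API endpoint can be a plain lookup table. The paths below come from the endpoint list in "What data we collect"; the HTTP methods are our assumption (Misskey's API is called with POST requests), and software_endpoint is a hypothetical helper.

```python
from typing import Optional

# Path per software, from the "What data we collect" list; methods are assumed.
SOFTWARE_ENDPOINTS = {
    "mastodon": ("GET", "/api/v2/instance"),
    "pixelfed": ("GET", "/api/v2/instance"),
    "pleroma": ("GET", "/api/v2/instance"),
    "lemmy": ("GET", "/api/v3/site"),
    "peertube": ("GET", "/api/v1/config/about"),
    "misskey": ("POST", "/api/meta"),  # Misskey endpoints take a JSON body via POST
}


def software_endpoint(domain: str, software_name: str) -> Optional[tuple[str, str]]:
    """Return (method, url) for the software-specific call, or None if unsupported."""
    entry = SOFTWARE_ENDPOINTS.get(software_name.strip().lower())
    if entry is None:
        return None  # unknown software: keep the NodeInfo data only
    method, path = entry
    return method, f"https://{domain}{path}"
```

Returning None for unknown software lets the pipeline keep the NodeInfo data and move on instead of guessing an endpoint.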

Peer Discovery

Connected instances are extracted from peer lists, feeding new domains back into the discovery queue.
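Peer lists are usually JSON arrays of domain names; Mastodon, for example, serves one at /api/v1/instance/peers, though this page does not say which endpoints the crawler reads. A minimal sketch of cleaning such a list before it re-enters the queue; new_peers is hypothetical, and treating dot-less names as invalid is our assumption.

```python
def new_peers(peers: list, known: set) -> list[str]:
    """Normalize a raw peer list and return only domains not seen before.

    `known` is updated in place so later calls skip these domains too.
    """
    fresh = []
    for peer in peers:
        if not isinstance(peer, str):
            continue  # ignore malformed entries
        domain = peer.strip().lower().rstrip(".")
        # Assumed sanity check: a public domain contains at least one dot.
        if domain and "." in domain and domain not in known:
            known.add(domain)
            fresh.append(domain)
    return fresh
```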

Scheduled Re-crawl

Instances are re-crawled on a rolling schedule. Active instances are re-crawled more often than dead ones.
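One way to crawl active instances more often than dead ones is exponential backoff on repeated failures. This is a sketch only: the page does not state the real intervals, so every constant below is hypothetical, and "reachable on the last crawl" is used as an assumed proxy for "active".

```python
from datetime import datetime, timedelta

# Hypothetical intervals; the real schedule is not documented here.
ACTIVE_INTERVAL = timedelta(hours=12)
DEAD_BASE_INTERVAL = timedelta(days=1)
DEAD_MAX_INTERVAL = timedelta(days=30)


def next_crawl(last_crawl: datetime, reachable: bool, consecutive_failures: int) -> datetime:
    """Pick when an instance should be crawled again."""
    if reachable:
        return last_crawl + ACTIVE_INTERVAL
    # Double the wait for every further failure, but never exceed the cap.
    backoff = DEAD_BASE_INTERVAL * (2 ** max(consecutive_failures - 1, 0))
    return last_crawl + min(backoff, DEAD_MAX_INTERVAL)
```

The cap keeps long-dead instances in rotation, so one that comes back online is eventually noticed.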

What data we collect

We only collect publicly available metadata from standard API endpoints. No private user data, no posts, no DMs — ever.

NodeInfo

/.well-known/nodeinfo

All Fediverse software

Software name & version

Total users, active users (month/half-year)

Local post and comment counts

Open registration status
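The NodeInfo fields listed above map onto the NodeInfo 2.x document structure (software, usage, openRegistrations). A minimal extraction sketch; summarize_nodeinfo and the output keys are hypothetical, and missing fields simply become None since many servers omit optional counts.

```python
def summarize_nodeinfo(doc: dict) -> dict:
    """Pull the collected NodeInfo fields out of a NodeInfo 2.x document."""
    software = doc.get("software") or {}
    usage = doc.get("usage") or {}
    users = usage.get("users") or {}
    return {
        "software_name": software.get("name"),
        "software_version": software.get("version"),
        "users_total": users.get("total"),
        "users_active_month": users.get("activeMonth"),
        "users_active_halfyear": users.get("activeHalfyear"),
        "local_posts": usage.get("localPosts"),
        "local_comments": usage.get("localComments"),
        "open_registrations": doc.get("openRegistrations"),
    }
```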

Mastodon API

/api/v2/instance

Mastodon, Pixelfed, Pleroma

Instance title & description

Admin contact info

Thumbnail / banner image

Source URL
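For Mastodon-compatible servers, these fields sit in the /api/v2/instance response as title, description, contact.email, contact.account, thumbnail.url and source_url. A minimal sketch, assuming Mastodon's response shape; Pixelfed and Pleroma may fill these fields differently, and summarize_mastodon_instance is a hypothetical name.

```python
def summarize_mastodon_instance(doc: dict) -> dict:
    """Extract the collected fields from a Mastodon /api/v2/instance response."""
    contact = doc.get("contact") or {}
    account = contact.get("account") or {}   # admin account object, may be null
    thumbnail = doc.get("thumbnail") or {}
    return {
        "title": doc.get("title"),
        "description": doc.get("description"),
        "contact_email": contact.get("email"),
        "contact_account": account.get("acct"),
        "thumbnail_url": thumbnail.get("url"),
        "source_url": doc.get("source_url"),
    }
```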

Lemmy API

/api/v3/site

Lemmy

Instance title & description

PeerTube API

/api/v1/config/about

PeerTube

Instance title & description

Misskey API

/api/meta

Misskey

Instance title & description

Thumbnail / banner image

Source URL
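The Lemmy, PeerTube and Misskey responses each nest these fields differently, so a small normalizer per software keeps the stored record uniform. The field names below (site_view.site for Lemmy, instance.shortDescription for PeerTube, bannerUrl and repositoryUrl for Misskey) reflect those projects' API responses as we understand them and should be treated as assumptions to verify; summarize_other is a hypothetical helper.

```python
def summarize_other(software: str, doc: dict) -> dict:
    """Normalize Lemmy, PeerTube and Misskey instance metadata (field names assumed)."""
    software = software.strip().lower()
    if software == "lemmy":
        site = (doc.get("site_view") or {}).get("site") or {}
        return {"title": site.get("name"), "description": site.get("description")}
    if software == "peertube":
        instance = doc.get("instance") or {}
        # PeerTube also has a long "description"; the short one is assumed here.
        return {"title": instance.get("name"), "description": instance.get("shortDescription")}
    if software == "misskey":
        return {
            "title": doc.get("name"),
            "description": doc.get("description"),
            "banner_url": doc.get("bannerUrl"),
            "source_url": doc.get("repositoryUrl"),
        }
    raise ValueError(f"unsupported software: {software}")
```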

Want to opt out?

Add the following to your instance's robots.txt to prevent FediSea from crawling your server:

User-agent: FediSeaBot
Disallow: /
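To check that the rule above blocks FediSeaBot without affecting other crawlers, you can run it through Python's standard urllib.robotparser. This is only an approximation: the crawler's own robots.txt parser may treat edge cases differently, and is_blocked is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: FediSeaBot
Disallow: /
"""


def is_blocked(robots_txt: str, agent: str, url: str) -> bool:
    """True if robots_txt forbids `agent` from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


url = "https://example.social/api/v2/instance"
print(is_blocked(ROBOTS_TXT, "FediSeaBot", url))    # True
print(is_blocked(ROBOTS_TXT, "SomeOtherBot", url))  # False
```

To test your live file instead, call parser.set_url("https://your.domain/robots.txt") followed by parser.read() in place of parse().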
