⚠️ This project is still in development, so don't expect a finished product ⚠️

How we map the Fediverse

FediSea's crawler is an open-source bot that respectfully discovers and indexes Fediverse instances. It gathers publicly available metadata to build a comprehensive map of the decentralized social web.

View on GitHub

We respect robots.txt

Before crawling any instance, we check its robots.txt file. If an instance disallows our user agent or the paths we need, we skip it completely. Instance administrators can opt out at any time by updating their robots.txt.

How the crawler works

The crawling pipeline runs in six stages, continuously discovering and updating instance data.

1. Discovery

The crawler starts from a major instance and expands by following peer lists from known instances. New domains are queued for crawling.
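The expansion described here behaves like a breadth-first traversal of the peer graph. The following is a minimal sketch under that assumption, not FediSea's actual code: `discover`, `get_peers`, and the toy `PEERS` graph are hypothetical names introduced for illustration.

```python
from collections import deque
from typing import Callable, Iterable, List


def discover(seed: str, get_peers: Callable[[str], Iterable[str]], limit: int = 1000) -> List[str]:
    """Breadth-first expansion from a seed domain.

    get_peers is a hypothetical callable that returns the peer domains
    reported by one instance.
    """
    seen = {seed}           # domains already queued, so none is crawled twice
    queue = deque([seed])   # domains waiting to be crawled
    order = []
    while queue and len(order) < limit:
        domain = queue.popleft()
        order.append(domain)
        for peer in get_peers(domain):
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return order


# Toy peer graph standing in for real peer lists (illustrative data only).
PEERS = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["a.example", "d.example"],
    "c.example": [],
    "d.example": ["a.example"],
}

crawl_order = discover("a.example", lambda d: PEERS.get(d, []))
print(crawl_order)  # ['a.example', 'b.example', 'c.example', 'd.example']
```

The `seen` set is what keeps the traversal from looping when two instances list each other as peers.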

2. Robots.txt Check

Before making any other request, the crawler fetches and parses the instance's robots.txt. If crawling is disallowed, we only save the domain and the robots.txt status.
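A check like this can be sketched with Python's standard `urllib.robotparser`. This is an illustration, not the crawler's real implementation: the `may_crawl` helper and the `REQUIRED_PATHS` list are assumptions, with the paths borrowed from the endpoints listed on this page.

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "FediSeaBot"  # user agent named in the opt-out section
# Assumed set of paths the crawler needs; taken from the endpoints on this page.
REQUIRED_PATHS = ["/.well-known/nodeinfo", "/api/v2/instance"]


def may_crawl(robots_txt: str, paths=REQUIRED_PATHS) -> bool:
    """Return True only if every required path is allowed for our user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return all(parser.can_fetch(USER_AGENT, path) for path in paths)


blocked = "User-agent: FediSeaBot\nDisallow: /\n"
open_to_all = "User-agent: *\nDisallow: /private/\n"

print(may_crawl(blocked))      # False
print(may_crawl(open_to_all))  # True
```

Parsing the text with `parse()` rather than letting the parser fetch the file keeps the network request and the policy decision separate, which makes the decision easy to test.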

3. NodeInfo Fetch

The crawler first fetches the /.well-known/nodeinfo endpoint, which returns the URL of the instance's actual NodeInfo document. It then fetches that NodeInfo endpoint.
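The two-step lookup can be sketched as follows. The well-known document contains a `links` array whose `rel` values name a NodeInfo schema version and whose `href` values point at the actual document. Choosing the highest advertised version is an assumption made for this sketch, and `pick_nodeinfo_url`, `fetch_json`, and `fetch_nodeinfo` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Optional

SCHEMA_PREFIX = "http://nodeinfo.diaspora.software/ns/schema/"


def pick_nodeinfo_url(well_known: dict) -> Optional[str]:
    """Return the href for the highest NodeInfo schema version advertised."""
    best = None
    for link in well_known.get("links", []):
        rel, href = link.get("rel", ""), link.get("href")
        if not (rel.startswith(SCHEMA_PREFIX) and href):
            continue
        try:
            version = tuple(int(part) for part in rel[len(SCHEMA_PREFIX):].split("."))
        except ValueError:
            continue  # skip rel values that do not end in a numeric version
        if best is None or version > best[0]:
            best = (version, href)
    return best[1] if best else None


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """GET a URL and decode the JSON body."""
    request = urllib.request.Request(url, headers={"User-Agent": "FediSeaBot"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def fetch_nodeinfo(domain: str) -> Optional[dict]:
    """Step 1: well-known document. Step 2: the NodeInfo document it points to."""
    url = pick_nodeinfo_url(fetch_json(f"https://{domain}/.well-known/nodeinfo"))
    return fetch_json(url) if url else None


# Offline example of step 1, using an illustrative well-known document.
sample = {"links": [
    {"rel": SCHEMA_PREFIX + "2.0", "href": "https://social.example/nodeinfo/2.0"},
    {"rel": SCHEMA_PREFIX + "2.1", "href": "https://social.example/nodeinfo/2.1"},
]}
print(pick_nodeinfo_url(sample))  # https://social.example/nodeinfo/2.1
```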

4. Software-Specific Data

Based on the software detected via NodeInfo, the crawler queries the matching API endpoint (Mastodon, Lemmy, Misskey, etc.) for data that NodeInfo does not provide, such as the instance thumbnail.
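One simple way to express this dispatch is a lookup table keyed by the software name from NodeInfo. The paths below are the ones listed in the data section of this page; the `SOFTWARE_ENDPOINTS` table and `endpoint_for` helper are hypothetical names, and the real crawler may organize this differently.

```python
from typing import Optional

# Software name (as reported by NodeInfo, lowercased) -> API path on this page.
SOFTWARE_ENDPOINTS = {
    "mastodon": "/api/v2/instance",
    "pixelfed": "/api/v2/instance",
    "pleroma": "/api/v2/instance",
    "lemmy": "/api/v3/site",
    "peertube": "/api/v1/config/about",
    "misskey": "/api/meta",
}


def endpoint_for(domain: str, software_name: str) -> Optional[str]:
    """Return the software-specific URL, or None for unsupported software."""
    path = SOFTWARE_ENDPOINTS.get(software_name.strip().lower())
    return f"https://{domain}{path}" if path else None


print(endpoint_for("lemmy.example", "Lemmy"))    # https://lemmy.example/api/v3/site
print(endpoint_for("other.example", "unknown"))  # None
```

Returning `None` for unknown software lets the pipeline fall back to the NodeInfo data alone.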

5. Peer Discovery

Connected instances are extracted from peer lists, feeding new domains back into the discovery queue.
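Peer lists are typically plain arrays of domain strings (Mastodon, for example, serves one at /api/v1/instance/peers; which endpoints FediSea reads is not stated here). Before queuing, entries usually need cleaning and de-duplication. A minimal sketch, with `normalize_peers` and the loose hostname pattern as assumptions:

```python
import re
from typing import Iterable, List, Set

# Loose hostname check: dot-separated labels of letters, digits, and hyphens.
_HOSTNAME = re.compile(
    r"^(?=.{1,253}$)(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z0-9-]{2,63}$"
)


def normalize_peers(raw_peers: Iterable[object], known: Set[str]) -> List[str]:
    """Return new, well-formed domains and record them in `known`."""
    fresh = []
    for item in raw_peers:
        if not isinstance(item, str):
            continue  # peer lists sometimes contain nulls or other junk
        domain = item.strip().lower().rstrip(".")
        if _HOSTNAME.match(domain) and domain not in known:
            known.add(domain)
            fresh.append(domain)
    return fresh


known_domains = {"a.example"}
new_domains = normalize_peers(
    ["B.Example ", "a.example", None, "not a domain", "b.example"], known_domains
)
print(new_domains)  # ['b.example']
```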

6. Scheduled Re-crawl

Instances are re-crawled on a rolling schedule. Active instances are re-crawled more often than dead ones.
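The page does not say how the schedule is computed. One common approach, shown here purely as an assumption, is exponential backoff: the wait doubles with each consecutive failed crawl, up to a cap. The interval values and the `next_crawl` helper are illustrative, not FediSea's actual settings.

```python
from datetime import datetime, timedelta, timezone

ACTIVE_INTERVAL = timedelta(days=1)   # assumed interval for responsive instances
MAX_INTERVAL = timedelta(days=30)     # assumed cap for instances that look dead
MAX_EXPONENT = 5                      # 2**5 days already exceeds the cap


def next_crawl(last_crawl: datetime, consecutive_failures: int) -> datetime:
    """Double the wait for each consecutive failure, up to MAX_INTERVAL."""
    exponent = min(max(consecutive_failures, 0), MAX_EXPONENT)
    interval = min(ACTIVE_INTERVAL * (2 ** exponent), MAX_INTERVAL)
    return last_crawl + interval


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(next_crawl(now, 0))   # one day later: the instance answered last time
print(next_crawl(now, 3))   # eight days later: three failures in a row
print(next_crawl(now, 50))  # thirty days later: capped
```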

What data we collect

We only collect publicly available metadata from standard API endpoints. No private user data, no posts, no DMs — ever.

NodeInfo

/.well-known/nodeinfo

All Fediverse software

Software name & version
Total users, active users (month/half-year)
Local posts, comments count
Open registrations status
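The fields above correspond to keys in the NodeInfo 2.x schema (`software`, `usage`, `openRegistrations`). A minimal sketch of pulling them out of a fetched document; `summarize_nodeinfo` and its output key names are hypothetical, and the sample values are made up.

```python
from typing import Any, Dict


def summarize_nodeinfo(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the listed fields, tolerating missing sections."""
    software = doc.get("software") or {}
    usage = doc.get("usage") or {}
    users = usage.get("users") or {}
    return {
        "software_name": software.get("name"),
        "software_version": software.get("version"),
        "users_total": users.get("total"),
        "users_active_month": users.get("activeMonth"),
        "users_active_halfyear": users.get("activeHalfyear"),
        "local_posts": usage.get("localPosts"),
        "local_comments": usage.get("localComments"),
        "open_registrations": doc.get("openRegistrations"),
    }


# Illustrative NodeInfo document (values are invented).
sample_nodeinfo = {
    "version": "2.0",
    "software": {"name": "mastodon", "version": "4.2.0"},
    "openRegistrations": True,
    "usage": {
        "users": {"total": 120, "activeMonth": 40, "activeHalfyear": 75},
        "localPosts": 9000,
    },
}
summary = summarize_nodeinfo(sample_nodeinfo)
print(summary["software_name"], summary["users_total"])  # mastodon 120
```

Missing keys come back as `None` instead of raising, since not every server fills in every optional field.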

Mastodon API

/api/v2/instance

Mastodon, Pixelfed, Pleroma

Instance title & description
Admin contact info
Thumbnail / banner image
Source URL
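In Mastodon's v2 instance response, these fields appear as `title`, `description`, `contact`, `thumbnail`, and `source_url`. The sketch below assumes that shape; Pixelfed and Pleroma may differ in detail, so it reads defensively. `summarize_mastodon_instance` is a hypothetical helper and the sample data is invented.

```python
from typing import Any, Dict


def summarize_mastodon_instance(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the listed fields from a /api/v2/instance style response."""
    contact = doc.get("contact") or {}
    thumb = doc.get("thumbnail")
    # Some servers return the thumbnail as a bare URL string, not an object.
    thumbnail_url = thumb.get("url") if isinstance(thumb, dict) else thumb
    return {
        "title": doc.get("title"),
        "description": doc.get("description"),
        "contact_email": contact.get("email"),
        "thumbnail_url": thumbnail_url,
        "source_url": doc.get("source_url"),
    }


sample_instance = {
    "title": "Example Social",
    "description": "A small example server.",
    "contact": {"email": "admin@social.example"},
    "thumbnail": {"url": "https://social.example/thumb.png"},
    "source_url": "https://github.com/mastodon/mastodon",
}
instance_summary = summarize_mastodon_instance(sample_instance)
print(instance_summary["title"], instance_summary["thumbnail_url"])
```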

Lemmy API

/api/v3/site

Lemmy

Instance title & description

PeerTube API

/api/v1/config/about

PeerTube

Instance title & description

Misskey API

/api/meta

Misskey

Instance title & description
Thumbnail / banner image
Source URL
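Unlike the other endpoints on this page, Misskey's API is conventionally called with POST and a JSON body. The sketch below assumes that convention and assumes the response exposes `name`, `description`, `bannerUrl`, and `repositoryUrl`; check both against the Misskey version you target. `build_meta_request` and `summarize_meta` are hypothetical helpers.

```python
import json
import urllib.request
from typing import Any, Dict


def build_meta_request(domain: str) -> urllib.request.Request:
    """Prepare a POST to /api/meta with an empty JSON object as the body."""
    return urllib.request.Request(
        f"https://{domain}/api/meta",
        data=json.dumps({}).encode("utf-8"),
        headers={"Content-Type": "application/json", "User-Agent": "FediSeaBot"},
        method="POST",
    )


def summarize_meta(meta: Dict[str, Any]) -> Dict[str, Any]:
    """Map the assumed Misskey field names onto the fields listed above."""
    return {
        "title": meta.get("name"),
        "description": meta.get("description"),
        "banner_url": meta.get("bannerUrl"),
        "source_url": meta.get("repositoryUrl"),
    }


meta_request = build_meta_request("misskey.example")
print(meta_request.get_method(), meta_request.full_url)
```

The request is only constructed here, not sent; passing it to `urllib.request.urlopen` would perform the call.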

Want to opt out?

Add the following to your instance's robots.txt to prevent FediSea from crawling your server:

User-agent: FediSeaBot
Disallow: /
A Ghostbyte Production