Bluesky Post Archiving 2026: 5 Ways to Save Posts, Threads & Feeds
Bluesky has grown rapidly through 2025 and 2026, now hosting millions of active users and tens of millions of public posts every month. Whether you are a journalist tracking a breaking story, a researcher building a social media dataset, or a content creator repurposing your own Bluesky threads for long-form articles, you need a reliable way to archive Bluesky content.
The AT Protocol, which powers Bluesky, is open and decentralized by design. This means there are more archiving options than with any other major social platform. This guide covers five proven methods, from built-in bookmarks to real-time firehose scraping, so you can pick the one that fits your technical comfort level and workflow.
TL;DR. Use Bluesky's built-in bookmarks for casual saves. Use the AT Protocol HTTP API for programmatic access without authentication. Use ThreadGrab for cross-platform archiving (Bluesky + X in one interface). Use Jetstream for real-time firehose data. Use the atproto Python SDK for fully custom archiving scripts.
Why Bluesky Archiving Matters in 2026
Three trends make Bluesky archiving particularly relevant this year. First, Bluesky's user base has crossed 30 million, making it the third-largest public conversation platform after X and Threads. Second, the platform has become a primary source for tech, journalism, and academic discourse -- communities that left X in 2024-2025 gravitated to Bluesky in large numbers. Third, the AT Protocol's open design means that archiving is not just possible but encouraged: every public post is accessible through documented APIs with no authentication required for read access.
Unlike X, which has restricted API access to paid tiers, Bluesky's AT Protocol remains completely open. You can fetch any user's timeline, any post's thread, or any feed's content with simple HTTP requests -- no API key, no developer account, no monthly subscription. This makes Bluesky the most archivable major social platform in 2026.
Method 1: Bluesky Built-in Bookmarks -- The Zero-Effort Save
Bluesky Native Bookmarks
Available in the Bluesky app (web, iOS, Android) since mid-2025.
Pros: Zero setup, private, searchable, works on every device, no technical skills needed.
Cons: No export capability, no programmatic access from within the app, limited to your own account, no batch operations.
Bluesky introduced native bookmarks in 2025, and they work exactly as you would expect. Click the bookmark icon on any post, and it is saved to a private bookmarks collection visible only to you. The bookmarks are searchable within the app, and you can organize them into folders or use the default flat list.
Bookmarks are great for casual use -- you see an interesting thread on your explore feed, bookmark it, and come back to read it later. But they have a hard limit: you cannot export your bookmarks as structured data. If you want to compile a research dataset, migrate to another platform, or run analysis on saved content, you need one of the methods below.
When to use built-in bookmarks
- You save posts occasionally and only need to revisit them within Bluesky.
- You share a Bluesky account with a team and want a shared reading list.
- You do not need the saved content outside of Bluesky.
Method 2: AT Protocol HTTP API -- The Open, No-Auth Approach
AT Protocol Public Endpoints
Base URL: https://public.api.bsky.app/xrpc/ -- no authentication required for public read endpoints.
Pros: Completely free, no API key needed, well-documented, works with any HTTP client.
Cons: Rate-limited (approximately 5,000 requests per hour per IP), requires familiarity with AT Protocol data structures (CIDs, strong references).
The AT Protocol exposes a full set of public APIs for reading data. You can fetch any user's timeline, any individual post, entire conversation threads, and curated feeds -- all with simple HTTP GET requests.
# public.api.bsky.app is Bluesky's public host for unauthenticated reads
# Fetch a user's recent posts (the endpoint returns up to 50 by default; jq keeps the first 3)
curl -s "https://public.api.bsky.app/xrpc/app.bsky.feed.getAuthorFeed?actor=bsky.app" \
| jq '.feed[:3] | .[] | {author: .post.author.handle, text: .post.record.text}'
# Fetch a specific post and its thread by AT-URI
curl -s "https://public.api.bsky.app/xrpc/app.bsky.feed.getPostThread?uri=at://did:plc:.../app.bsky.feed.post/3lmp6q7q2hs2s" \
| jq '.thread.post.record.text'
# Search posts by keyword (pass the returned .cursor value as &cursor=... to get the next page;
# search may require an authenticated session on some hosts)
curl -s "https://public.api.bsky.app/xrpc/app.bsky.feed.searchPosts?q=Bluesky+archiving+2026&limit=25" \
| jq '.posts[] | {author: .author.handle, text: .record.text}'
The key concept in the AT Protocol is the AT-URI -- a URI of the form at://<authority>/<collection>/<record-key> that points to a single record on the network. Every post, like, follow, and feed has a unique AT-URI. The authority part is the account's DID (Decentralized Identifier) or handle; once you have either one, you can fetch that account's public content without any authentication.
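To make the AT-URI structure concrete, here is a minimal Python sketch that splits an AT-URI into its three parts. The helper name `parse_at_uri` and the DID `did:plc:example123` are illustrative placeholders, not identifiers from any library or real account.

```python
from typing import NamedTuple


class AtUri(NamedTuple):
    authority: str   # DID or handle of the account that owns the record
    collection: str  # record type, e.g. app.bsky.feed.post
    rkey: str        # record key within that collection


def parse_at_uri(uri: str) -> AtUri:
    """Split a record-level AT-URI (at://authority/collection/rkey)."""
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an AT-URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected authority/collection/rkey: {uri!r}")
    return AtUri(*parts)


# Hypothetical DID used purely for illustration.
example = parse_at_uri("at://did:plc:example123/app.bsky.feed.post/3lmp6q7q2hs2s")
print(example.authority, example.collection, example.rkey)
```

This sketch only covers record-level URIs; shorter forms that name just an account or a collection would need extra handling.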
Rate limits are generous (about 5,000 requests per hour per IP), which is enough for personal archiving and small research projects. For large-scale collection, you need Jetstream (Method 4) or the firehose.
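One way to stay under an hourly budget like this is to space requests evenly. The sketch below is a simple client-side throttle, assuming the approximate 5,000 requests per hour figure quoted above; the `Throttle` class is a hypothetical helper, and the clock and sleep functions are injectable so it can be tried without real waiting.

```python
import time


def min_interval_seconds(requests_per_hour: int) -> float:
    """Smallest gap between requests that keeps you within the hourly budget."""
    return 3600.0 / requests_per_hour


class Throttle:
    """Evenly spaces calls; call wait() before each HTTP request."""

    def __init__(self, requests_per_hour, clock=time.monotonic, sleep=time.sleep):
        self.interval = min_interval_seconds(requests_per_hour)
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> float:
        """Sleep until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        self._next_allowed = max(now, self._next_allowed) + self.interval
        return delay


print(min_interval_seconds(5000))
```

In an archiving loop you would create one `Throttle(5000)` and call `throttle.wait()` before every request. Servers can still answer with HTTP 429, so treat this as a courtesy limit rather than a guarantee.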
When to use the AT Protocol API
- You are comfortable with curl or a scripting language like Python or Node.js.
- You need to archive specific users, posts, or feeds on a recurring schedule.
- You want to build a custom dashboard or analysis pipeline for Bluesky content.
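For recurring archiving you usually need more than one page of results. The sketch below follows the `cursor` value that feed responses return until it runs out. `iter_feed` and `fetch_author_feed_page` are hypothetical helper names, and the `public.api.bsky.app` host is an assumption about which public endpoint you use; the page fetcher is passed in as a function so the loop can be exercised without network access.

```python
import json
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode
from urllib.request import urlopen


def iter_feed(fetch_page: Callable[[Optional[str]], dict],
              max_pages: int = 10) -> Iterator[dict]:
    """Yield feed items page by page, following the response cursor."""
    cursor = None
    for _ in range(max_pages):
        page = fetch_page(cursor)
        yield from page.get("feed", [])
        cursor = page.get("cursor")
        if not cursor:  # no cursor means there are no further pages
            break


def fetch_author_feed_page(actor: str, cursor: Optional[str] = None,
                           limit: int = 50) -> dict:
    """Fetch one page of app.bsky.feed.getAuthorFeed (host is an assumption)."""
    params = {"actor": actor, "limit": limit}
    if cursor:
        params["cursor"] = cursor
    url = ("https://public.api.bsky.app/xrpc/app.bsky.feed.getAuthorFeed?"
           + urlencode(params))
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


# Usage (performs real HTTP requests, so it is left commented out):
# for item in iter_feed(lambda c: fetch_author_feed_page("bsky.app", c)):
#     print(item["post"]["record"]["text"][:80])
```

The `max_pages` cap is a deliberate safety valve so a scheduled job cannot loop indefinitely if a server keeps returning cursors.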
Method 3: ThreadGrab -- Cross-Platform Bluesky + X Archiving
ThreadGrab
Website: threadgrab.com -- free, no account needed.
Pros: Single interface for Bluesky AND X archiving, free API, no authentication required, returns structured JSON or Markdown.
Cons: Requires command-line comfort, does not offer real-time firehose streaming (use Jetstream for that).
ThreadGrab was built to solve exactly this problem: you should not need a different tool for every social platform. ThreadGrab's public API supports Bluesky posts through the AT Protocol's open endpoints, giving you the same structured data format whether you are archiving an X thread or a Bluesky feed.
# Fetch a Bluesky user's recent posts via ThreadGrab
curl -s "https://threadgrab.com/api/bluesky/profile/bsky.app" \
| jq '.[:3] | .[] | {author: .author, text: .text[0:120]}'
# Save Bluesky posts as Markdown for LLM input
# (jq string interpolation uses a single backslash: \(.field))
curl -s "https://threadgrab.com/api/bluesky/profile/bsky.app" \
| jq -r '.[] | "## \(.author)\n\n\(.text)\n---"' \
> bluesky-archive-$(date +%Y-%m-%d).md
# Search Bluesky posts by keyword
curl -s "https://threadgrab.com/api/bluesky/search?q=AT+Protocol+archiving&limit=10" \
| jq '.posts[] | {author: .author.handle, text: .text}'
Unlike the raw AT Protocol API, ThreadGrab normalizes the response format so that Bluesky data looks the same as X data. This makes it easy to build a single archiving pipeline that pulls from both platforms. The output is clean JSON that you can convert to Markdown, CSV, or any other format.
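As an illustration of the CSV conversion mentioned above, here is a minimal Python sketch. It assumes each normalized record is a JSON object with `author` and `text` keys, mirroring the jq filters in this section; the real response may carry different or additional fields, and `posts_to_csv` is a hypothetical helper name.

```python
import csv
import io


def posts_to_csv(posts, fields=("author", "text")) -> str:
    """Render a list of post dicts as CSV text, keeping only the named fields."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(fields), extrasaction="ignore")
    writer.writeheader()
    for post in posts:
        # Missing keys become empty cells instead of raising an error.
        writer.writerow({name: post.get(name, "") for name in fields})
    return buf.getvalue()


# Sample records invented for illustration only.
sample = [
    {"author": "bsky.app", "text": "Hello, world", "extra": 1},
    {"author": "example.bsky.social", "text": "Line one\nLine two"},
]
csv_text = posts_to_csv(sample)
print(csv_text)
```

The `csv` module quotes commas and embedded newlines, so multi-line posts survive a round trip through spreadsheet tools.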
ThreadGrab also handles the AT-URI resolution for you -- you do not need to understand DIDs or CIDs. Just provide a handle (e.g., bsky.app) and ThreadGrab resolves it to the correct DID and fetches the posts.
When to use ThreadGrab
- You archive content from both Bluesky and X and want a single tool.
- You need structured data (JSON) but do not want to learn the AT Protocol's record schema.
- You want to save Bluesky posts as clean Markdown for LLM training or context injection.
Method 4: Jetstream -- Real-Time Bluesky Firehose
Jetstream
Maintained by Bluesky Social PBC. WebSocket endpoint that serves the AT Protocol firehose as lightweight JSON events.
Pros: Real-time, covers ALL public posts, ideal for research datasets and trend analysis, no rate limits.
Cons: Requires WebSocket client and significant storage, overkill for personal archiving, complex setup for filtering.
Jetstream is the real-time firehose service for the AT Protocol. It gives you a WebSocket stream of every public event on the Bluesky network -- posts, likes, reposts, follows, and more. If you need a complete dataset of Bluesky activity (for research, trend detection, or large-scale analysis), Jetstream is the right tool.
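# Install a WebSocket client and connect to Jetstream
pip install websocket-client
# Connect and print the first 20 new posts to stdout
python3 -c "
import json, websocket
# Public Jetstream instance; wantedCollections limits the stream to post records
url = 'wss://jetstream2.us-east.bsky.network/subscribe?wantedCollections=app.bsky.feed.post'
ws = websocket.create_connection(url)
seen = 0
while seen < 20:  # stop after 20 post events
    msg = json.loads(ws.recv())
    commit = msg.get('commit') or {}
    if msg.get('kind') == 'commit' and commit.get('collection') == 'app.bsky.feed.post':
        print(json.dumps(msg, indent=2)[:300])
        seen += 1
ws.close()
"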
Jetstream is maintained by Bluesky Social PBC and is the most reliable way to get comprehensive data. The stream typically delivers 50-200 events per second during peak hours, so you need a robust storage backend (database or streaming pipeline) to consume it meaningfully. For the average user who wants to save a few interesting threads, Jetstream is overkill -- but for researchers and data journalists, it is invaluable.
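As a rough idea of what a storage backend can look like at small scale, the sketch below writes post events into SQLite. It assumes the Jetstream commit event shape (`did`, `kind`, and a `commit` object with `operation`, `collection`, `rkey`, and `record`); the table layout and the function names are illustrative choices, not part of Jetstream.

```python
import json
import sqlite3


def open_archive(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a SQLite archive with one row per post."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        "uri TEXT PRIMARY KEY, did TEXT, text TEXT, raw TEXT)"
    )
    return db


def store_event(db: sqlite3.Connection, event: dict) -> bool:
    """Store a newly created post; return True only if a row was added."""
    commit = event.get("commit") or {}
    if (event.get("kind") != "commit"
            or commit.get("operation") != "create"
            or commit.get("collection") != "app.bsky.feed.post"):
        return False
    uri = f"at://{event['did']}/{commit['collection']}/{commit['rkey']}"
    text = (commit.get("record") or {}).get("text", "")
    with db:  # commits the transaction on success
        cur = db.execute(
            "INSERT OR IGNORE INTO posts VALUES (?, ?, ?, ?)",
            (uri, event["did"], text, json.dumps(event)),
        )
    return cur.rowcount == 1


# Sample event invented for illustration only.
db = open_archive()
sample_event = {
    "did": "did:plc:example123",
    "kind": "commit",
    "commit": {"operation": "create", "collection": "app.bsky.feed.post",
               "rkey": "3abc", "record": {"text": "hello"}},
}
print(store_event(db, sample_event))
```

Using the AT-URI as the primary key makes replays after a reconnect harmless, because duplicates are ignored. For sustained full-stream volume a dedicated database or streaming pipeline is a better fit than a single SQLite file.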
Jetstream also supports filtered subscriptions: you can subscribe to specific DIDs or record types, reducing the data volume to only what you care about.
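Filters are passed as query parameters on the subscribe URL. The sketch below builds such a URL using the `wantedCollections` and `wantedDids` parameters; the helper name `jetstream_url`, the host, and the DID in the example are assumptions for illustration, so substitute the instance and accounts you actually use.

```python
from urllib.parse import urlencode


def jetstream_url(host: str, collections=(), dids=()) -> str:
    """Build a Jetstream subscribe URL with optional collection and DID filters."""
    params = [("wantedCollections", c) for c in collections]
    params += [("wantedDids", d) for d in dids]
    query = urlencode(params)  # repeats the key once per value
    return f"wss://{host}/subscribe" + (f"?{query}" if query else "")


# Host and DID below are placeholders for illustration.
url = jetstream_url(
    "jetstream2.us-east.bsky.network",
    collections=["app.bsky.feed.post"],
    dids=["did:plc:example123"],
)
print(url)
```

Pass the resulting URL to your WebSocket client in place of the unfiltered endpoint. Filtering on the server side keeps bandwidth and storage needs far lower than filtering after the fact.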
When to use Jetstream
- You are a researcher building a longitudinal social media dataset.
- You need to detect trends or monitor specific keywords in real-time.
- You have the infrastructure (storage, compute) to consume a high-volume event stream.
Method 5: atproto Python SDK -- Fully Custom Archiving Scripts
atproto SDK
pip install atproto -- community-maintained Python SDK for the AT Protocol.
Pros: Full API coverage (auth, fetch, post, delete), type-safe, well-documented, handles session management for you.
Cons: Requires Python 3.9+, adds a dependency to your project, authentication flow can be confusing for beginners.
The atproto Python SDK is the most complete way to interact with Bluesky programmatically from Python. It supports everything from simple read operations (fetching timelines, searching posts) to write operations (posting, bookmarking, following) and admin tasks (moderation).
from atproto import Client, models
# Fetch posts without logging in (public read only).
# Note: an unauthenticated Client() may be rejected by the default host;
# log in with an app password (shown further below) if these calls fail.
client = Client()
# The high-level helper takes the actor and limit directly
feed = client.get_author_feed(actor='bsky.app', limit=10)
for item in feed.feed:
    record = item.post.record
    print(f"@{item.post.author.handle}: {record.text[:100]}")
# Search posts by keyword
results = client.app.bsky.feed.search_posts(
    models.AppBskyFeedSearchPosts.Params(
        q='Bluesky archiving 2026',
        limit=20,
    )
)
for post in results.posts:
    print(f"[{post.author.handle}] {post.record.text[:120]}")
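The SDK wraps AT Protocol calls in typed models and manages login sessions, which removes much of the boilerplate of raw HTTP calls. Pagination still works through the cursor value that each response returns, and you should plan your own backoff for rate-limit errors unless the SDK version you install documents built-in handling. If you are building a custom archiving script that needs to be reliable over long periods (e.g., a daily cron job), the atproto SDK is a good choice over raw HTTP calls.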
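A daily cron job will usually re-fetch posts it has already saved, so the archive step should be idempotent. The sketch below appends only unseen posts to a JSON Lines file, keyed by a `uri` field. The function name and the record layout are assumptions for illustration; adapt the key to whatever your fetch step returns.

```python
import json
import tempfile
from pathlib import Path


def append_new_posts(archive: Path, posts) -> int:
    """Append posts whose 'uri' is not yet in the archive; return how many were added."""
    seen = set()
    if archive.exists():
        with archive.open(encoding="utf-8") as fh:
            seen = {json.loads(line)["uri"] for line in fh if line.strip()}
    added = 0
    with archive.open("a", encoding="utf-8") as fh:
        for post in posts:
            if post["uri"] in seen:
                continue  # already archived on an earlier run
            fh.write(json.dumps(post, ensure_ascii=False) + "\n")
            seen.add(post["uri"])
            added += 1
    return added


# Two simulated runs with overlapping results (sample data invented for illustration).
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "bsky.app.jsonl"
    first = append_new_posts(path, [{"uri": "at://a/p/1", "text": "one"},
                                    {"uri": "at://a/p/2", "text": "two"}])
    second = append_new_posts(path, [{"uri": "at://a/p/2", "text": "two"},
                                     {"uri": "at://a/p/3", "text": "three"}])
    line_count = len(path.read_text(encoding="utf-8").splitlines())
print(first, second, line_count)
```

Reading the whole file to rebuild the seen set is fine for personal archives; for very large archives a database index would scale better.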
For authenticated operations (like fetching your own bookmarks), you need an app password from Bluesky's settings page:
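from atproto import Client, models
client = Client()
client.login('your-handle.bsky.social', 'your-app-password')
# Authenticated read: fetch your home timeline.
# Note: this call returns the timeline, not bookmarks. Bookmark endpoints live under the
# app.bsky.bookmark.* namespace rather than app.bsky.feed.*; check whether your installed
# SDK version exposes them before relying on it.
feed = client.app.bsky.feed.get_timeline(
    models.AppBskyFeedGetTimeline.Params(limit=30)
)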
When to use the atproto SDK
- You need a reliable, long-running archiving script (cron job, background daemon).
- You are familiar with Python and want type-safe access to AT Protocol data.
- You need authenticated operations (bookmarks, private feeds, moderation).
Side-by-Side Comparison
| Feature | Bluesky Bookmarks | AT Protocol API | ThreadGrab | Jetstream | atproto SDK |
|---|---|---|---|---|---|
| Setup time | 0 seconds | 2 minutes (curl) | 2 minutes (curl) | 10 minutes (websocket) | 5 minutes (pip install) |
| Technical skill | None | Low | Low | Medium | Medium |
| Auth required | Yes (logged in) | No | No | No | Optional |
| Export capability | No | Yes (JSON) | Yes (JSON/MD) | Yes (JSON) | Yes (any format) |
| Real-time data | No | No | No | Yes (firehose) | No |
| Supports X too | No | No | Yes | No | No |
| Bulk / batch | Manual only | Scriptable | Scriptable | Automatic stream | Scriptable |
| Markdown output | No | Via jq conversion | Native support | Via processing | Via code |
| Best for | Casual readers | Scripting enthusiasts | Cross-platform users | Researchers | Python developers |
Building a Complete Archiving Pipeline
Here is how a journalist might combine these methods into a daily Bluesky archiving workflow:
#!/bin/bash
# Daily Bluesky archiving pipeline (schedule at 7 AM via cron)
# Combines ThreadGrab API for profile archiving + file-based storage
SOURCES=("bsky.app" "nytopinion.bsky.social" "techmeme.bsky.social")
TODAY="$(date +%Y-%m-%d)"
OUTPUT_DIR="$HOME/bluesky-archive/$(date +%Y/%m)"
mkdir -p "$OUTPUT_DIR"
for handle in "${SOURCES[@]}"; do
  # One file per handle per day, so later runs in the same month do not overwrite earlier ones
  OUT="$OUTPUT_DIR/$handle-$TODAY.md"
  # -f makes curl fail on HTTP errors; jq interpolation uses a single backslash: \(.field)
  curl -sf "https://threadgrab.com/api/bluesky/profile/$handle" \
  | jq -r '.[] | "### \(.author)\n\(.text)\n---"' \
  > "$OUT"
  echo "Saved $handle: $(wc -l < "$OUT") lines"
done
echo "Archive complete for $TODAY"
This pipeline runs daily via cron, saves Markdown files organized by year/month, and uses ThreadGrab for the actual API calls because it normalizes Bluesky and X data into the same format. The journalist can then search, analyze, or feed the archive into an LLM for summarization.
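If you would rather keep the formatting step in Python than in jq, the same layout can be sketched as follows. It assumes records with `author` and `text` keys, as in the jq filter above, and a year/month directory scheme with one dated file per handle; `posts_to_markdown` and `archive_path` are hypothetical helper names.

```python
from datetime import date
from pathlib import Path


def posts_to_markdown(posts, heading: str = "###") -> str:
    """Render posts as Markdown blocks: heading with author, text, then a rule."""
    blocks = [
        f"{heading} {p.get('author', 'unknown')}\n{p.get('text', '')}\n---"
        for p in posts
    ]
    return "\n".join(blocks) + ("\n" if blocks else "")


def archive_path(root: Path, handle: str, day: date) -> Path:
    """Return root/YYYY/MM/<handle>-YYYY-MM-DD.md for the given day."""
    return root / f"{day:%Y}" / f"{day:%m}" / f"{handle}-{day:%Y-%m-%d}.md"


# Sample data invented for illustration only.
markdown = posts_to_markdown([{"author": "bsky.app", "text": "Hello"}])
target = archive_path(Path("bluesky-archive"), "bsky.app", date(2026, 1, 15))
print(target)
print(markdown)
```

Writing the result is then a matter of `target.parent.mkdir(parents=True, exist_ok=True)` followed by `target.write_text(markdown, encoding="utf-8")`.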
Pro tip. For maximum flexibility, combine ThreadGrab for profile-level archiving with the atproto SDK for authenticated operations (like fetching your bookmarks). ThreadGrab handles the cross-platform normalization; the SDK gives you full control when you need it.
How ThreadGrab Fits Into the Bluesky Ecosystem
ThreadGrab was designed to bridge the gap between social platforms. While Bluesky's AT Protocol is beautifully open, it is also different from every other platform's API. ThreadGrab abstracts away those differences: the same API call that fetches an X thread also fetches a Bluesky feed, returning the same structured format.
This matters because few people consume content on only one platform. A typical journalist today reads X for breaking news, Bluesky for tech discourse, and LinkedIn for industry analysis. ThreadGrab gives you a single archiving entry point for the two most important public conversation platforms.
Archive Bluesky threads and X articles side by side -- no account, no API key, no setup.
Try ThreadGrab -- Free Social Media Archiver
FAQ
Does Bluesky have built-in bookmarks?
Yes. Bluesky added native bookmarks in mid-2025. You can bookmark any post by clicking the bookmark icon. Bookmarks are private and searchable within your account, but you cannot export them as structured data.
Can I archive Bluesky posts without an API key?
Yes. The AT Protocol has an open, rate-limited API that requires no API key. You can fetch posts, user timelines, and feeds using simple HTTP GET requests to public endpoints like bsky.social/xrpc/com.atproto.repo.getRecord. Rate limits are approximately 5,000 requests per hour.
Does ThreadGrab support Bluesky?
Yes. ThreadGrab supports Bluesky posts via the AT Protocol open API. You can use the ThreadGrab API to fetch Bluesky threads and profiles alongside X content through a single interface, with no account or API key required.
What is Jetstream?
Jetstream is a real-time firehose service for the AT Protocol. It gives you a WebSocket stream of every public event across the entire Bluesky network, making it ideal for researchers who need comprehensive datasets. It delivers 50-200 events per second during peak hours.
Is there a Python SDK for Bluesky?
Yes. The most widely used Python SDK is atproto (pip install atproto), a community-maintained package. It supports authentication, fetching timelines, searching posts, and uploading media, and it manages login sessions for you. Pagination works through the cursor value returned with each response.
Which method is best for saving Bluesky posts for LLM workflows?
ThreadGrab is the best option for LLM workflows because it outputs clean Markdown or JSON directly. The raw AT Protocol API and atproto SDK both require additional processing to convert records to LLM-friendly formats. Jetstream provides too much volume for LLM context windows without significant filtering.
Choose Your Method and Start Archiving
Bluesky's open architecture makes it the most archivable social platform in 2026. Whether you use built-in bookmarks for casual reading, the AT Protocol API for lightweight scripting, ThreadGrab for cross-platform archiving, Jetstream for comprehensive research, or the atproto SDK for fully custom pipelines, there is a method that fits your workflow.
The key insight is that you do not have to pick just one. Bookmark interesting posts during the day, run ThreadGrab nightly for profile archives, and keep Jetstream running in the background if you need real-time data. The tools are free, open, and designed to work together. Start with ThreadGrab for the fastest path to a working archiving pipeline.