X Articles vs Bluesky Long-Form 2026
In late May 2026, Bluesky launched a long-form content feature designed to compete directly with X Articles. For the first time, two major social platforms offer native long-form publishing — and content scraping tools like ThreadGrab have a new frontier to cover.
This is not another "which platform is better for writers" comparison. This is a technical guide to scraping, archiving, and repurposing content from both platforms in 2026. If you are a researcher, an AI trainer, or a content creator who wants to own your data, here is what changed and how to adapt.
TL;DR. Both X Articles and Bluesky long-form can be saved as Markdown using ThreadGrab. X uses a proprietary API with stricter rate limits. Bluesky uses the open AT Protocol (free, no API key). For batch archiving, Bluesky is easier to scrape. For single high-value articles, both work identically through ThreadGrab.
What Changed: Bluesky Long-Form Content (May 2026)
Bluesky's long-form feature, announced on May 28, 2026, lets users write and publish posts exceeding the traditional 300-character limit. Like X Articles, these long-form posts support rich text, headers, lists, and embedded media. The difference is in the underlying protocol: Bluesky builds on the AT Protocol, an open, decentralized standard whose public data any developer can read without authentication.
X Articles, by contrast, sits inside X's proprietary ecosystem. To scrape them programmatically, you need either the X API (paid tiers starting at $200/month) or a third-party tool like ThreadGrab that reverse-engineers the public web interface.
| Feature | X Articles | Bluesky Long-Form |
|---|---|---|
| Launch date | Late 2024 (public) | May 28, 2026 |
| Protocol | Proprietary (X API) | Open (AT Protocol) |
| Auth required for scraping | Yes for the API (paid key); none for web scraping, which faces CAPTCHAs | No (public API) |
| Rate limits | Strict (100 req / 15 min) | Generous (AT Protocol) |
| Markdown output via ThreadGrab | Yes | Yes |
| Best for scraping | Single articles, individual saves | Batch feeds, research archives |
How to Scrape X Articles in 2026
X Articles are structured as HTML documents rendered inside X's web interface. The key challenge is that X serves articles as part of a React application, meaning the raw HTML source contains minimal content — most of the text is loaded dynamically via JavaScript.
ThreadGrab handles this by rendering the page server-side and extracting the article body from the DOM tree. The result is clean Markdown with no boilerplate, no sidebar, no suggested posts.
# Save an X Article as Markdown (via ThreadGrab API)
curl -s "https://threadgrab.com/api/x/article/some-article-title" \
| jq -r '.text' > article.md
# Or use the profile API to get the latest article from a user
curl -s "https://threadgrab.com/api/profile/paulg" \
| jq -r '.[] | select(.type == "article") | .text' > paulg-latest.md
Pro tip. X rate-limits anonymous page views aggressively in 2026. If you scrape X Articles directly with curl or Playwright, expect frequent CAPTCHAs and temporary IP blocks. ThreadGrab rotates user agents and proxies so you do not have to.
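If you do fetch pages directly, pacing requests is the simplest way to stay under a limit. The sketch below assumes the figure of roughly 100 requests per 15 minutes quoted in this article; `min_delay_seconds` and `throttled_fetch` are hypothetical helper names, not part of any tool's API.

```shell
# Assumption: the limit is about 100 requests per 900 seconds, as quoted in this article.

# Print the smallest whole-second delay that keeps the request rate
# at or below LIMIT requests per WINDOW seconds (ceiling division).
min_delay_seconds() {
  local limit="$1" window="$2"
  echo $(( (window + limit - 1) / limit ))
}

# Fetch each URL in turn, sleeping DELAY seconds after every request.
# Usage: throttled_fetch DELAY URL...
throttled_fetch() {
  local delay="$1" url
  shift
  for url in "$@"; do
    curl -fsS "$url" || echo "request failed: $url" >&2
    sleep "$delay"
  done
}

min_delay_seconds 100 900   # prints 9
```

A fixed delay is deliberately conservative: it never bursts, so it stays under the limit even if the server counts requests in a sliding window.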
How to Scrape Bluesky Long-Form Content
Bluesky's AT Protocol makes scraping dramatically simpler. Every post, including long-form content, is stored as an AT Protocol record in the author's repository. You can read these records directly from the author's Personal Data Server (PDS) or through Bluesky's public API without authentication.
# Fetch a Bluesky user's recent posts (including long-form) via AT Protocol
curl -s "https://public.api.bsky.app/xrpc/app.bsky.feed.getAuthorFeed?actor=username.bsky.social" \
| jq -r '.feed[] | .post.record.text' > bsky-archive.md
# ThreadGrab supports Bluesky natively
curl -s "https://threadgrab.com/api/profile/username.bsky.social" \
| jq -r '.[] | .text' > bsky-threadgrab.md
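The `getAuthorFeed` call above returns a single page of results. A minimal sketch of cursor-based paging follows; `limit` and `cursor` are query parameters of this endpoint, while the function names and the five-page default are assumptions made for illustration.

```shell
BSKY_FEED_API="https://public.api.bsky.app/xrpc/app.bsky.feed.getAuthorFeed"

# Build the request URL for one page. CURSOR is optional and must
# already be URL-encoded.
# Usage: author_feed_url ACTOR [LIMIT] [CURSOR]
author_feed_url() {
  local actor="$1" limit="${2:-100}" cursor="${3:-}"
  local url="$BSKY_FEED_API?actor=$actor&limit=$limit"
  [ -n "$cursor" ] && url="$url&cursor=$cursor"
  printf '%s\n' "$url"
}

# Print the text of every post, following the cursor for up to
# MAX_PAGES pages (hypothetical default: 5).
# Usage: archive_author_feed ACTOR [MAX_PAGES]
archive_author_feed() {
  local actor="$1" max_pages="${2:-5}" cursor="" page=0 body
  while [ "$page" -lt "$max_pages" ]; do
    body="$(curl -fsS "$(author_feed_url "$actor" 100 "$cursor")")" || return 1
    printf '%s\n' "$body" | jq -r '.feed[].post.record.text'
    # @uri percent-encodes the cursor; a missing cursor yields no output.
    cursor="$(printf '%s\n' "$body" | jq -r '.cursor // empty | @uri')"
    [ -n "$cursor" ] || break
    page=$((page + 1))
  done
}
```

Running `archive_author_feed username.bsky.social 3 > bsky-archive.md` would write up to three pages of post text. The page cap keeps a first run bounded; raise it once the output looks right.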
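A critical advantage: every Bluesky post is a record in the author's repository on a Personal Data Server (PDS), and the repository is cryptographically signed. Deleting a post removes the record from that repository, but copies may persist on services that already ingested it, and anyone can fetch and store the public record while it exists. That openness makes Bluesky a better platform for long-term content preservation.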
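For a local copy of one specific post, the raw record can be requested by its AT URI. This sketch converts an `at://` URI into a `com.atproto.repo.getRecord` request. The default host `https://bsky.social` is an assumption and should be replaced with the author's actual PDS, and the function name is hypothetical.

```shell
# Convert at://<repo>/<collection>/<rkey> into a getRecord URL.
# Usage: at_uri_to_record_url AT_URI [PDS_HOST]
at_uri_to_record_url() {
  local uri="$1" host="${2:-https://bsky.social}"   # host is an assumed default
  local rest repo remainder collection rkey
  case "$uri" in
    at://*/*/*) ;;
    *) echo "not a record AT URI: $uri" >&2; return 1 ;;
  esac
  rest="${uri#at://}"
  repo="${rest%%/*}"
  remainder="${rest#*/}"
  collection="${remainder%%/*}"
  rkey="${remainder#*/}"
  if [ -z "$repo" ] || [ -z "$collection" ] || [ -z "$rkey" ]; then
    echo "incomplete AT URI: $uri" >&2
    return 1
  fi
  printf '%s/xrpc/com.atproto.repo.getRecord?repo=%s&collection=%s&rkey=%s\n' \
    "$host" "$repo" "$collection" "$rkey"
}

# Example (network call, shown for illustration only; the URI is a placeholder):
# curl -fsS "$(at_uri_to_record_url 'at://did:plc:example/app.bsky.feed.post/examplerkey')" > record.json
```

Saving the JSON record rather than only the text keeps timestamps and facets, which a Markdown export may drop.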
Side-by-Side: Scraping Comparison
| Criteria | X Articles | Bluesky Long-Form | ThreadGrab (both) |
|---|---|---|---|
| Scraping difficulty | High (JS rendering, CAPTCHAs) | Low (open API, no CAPTCHA) | Minimal (one endpoint) |
| Programmatic access | X API (paid) or scraping | AT Protocol (free, public) | Free API, no auth |
| Rate limit handling | Manual throttling required | Generous limits | Built-in retry + proxy |
| LLM-ready output | Depends on tool | Depends on tool | Clean Markdown by default |
| Long-term preservation | Content can be deleted | Signed records on PDS | Save to local .md files |
| Batch support | Per-article or per-profile | Per-feed or per-profile | Per-profile (both platforms) |
Building a Cross-Platform Archiving Pipeline
The true power of ThreadGrab is treating X and Bluesky as interchangeable sources. Here is a real-world pipeline that archives both platforms into a single Markdown vault:
#!/bin/bash
# Cross-platform content archive -- runs daily via cron
USERS_X=("paulg" "kelseyhightower" "levelsio")
USERS_BSKY=("jack.bsky.social" "tante.bsky.social")
OUTPUT_DIR="$HOME/archive/social-content"
mkdir -p "$OUTPUT_DIR"
echo "=== Archiving X Articles ==="
for user in "${USERS_X[@]}"; do
  # Keep only long-form articles from the X profile feed
  curl -s "https://threadgrab.com/api/profile/$user" \
    | jq -r '.[] | select(.type == "article") | "## \(.author)\n\(.text)\n"' \
    > "$OUTPUT_DIR/x-$user-$(date +%Y-%m-%d).md"
done
echo "=== Archiving Bluesky Long-Form ==="
for user in "${USERS_BSKY[@]}"; do
  # Bluesky profiles need no type filter; every item's text is written out
  curl -s "https://threadgrab.com/api/profile/$user" \
    | jq -r '.[] | "## \(.author)\n\(.text)\n"' \
    > "$OUTPUT_DIR/bsky-$user-$(date +%Y-%m-%d).md"
done
echo "Archived to $OUTPUT_DIR"
This pipeline generates one Markdown file per platform per user per day. You can feed these files into Obsidian, Notion, or any LLM knowledge base. The jq filter select(.type == "article") picks only long-form posts from X profiles, while Bluesky's output already exposes the post text directly.
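One weakness of redirecting `curl | jq` straight into a dated file is that a failed request still leaves an empty file behind. A small sketch of the naming scheme plus a guard against empty output follows; both helper names are hypothetical.

```shell
# Build the per-platform, per-user, per-day file name used by the pipeline.
# Usage: archive_path DIR PLATFORM USER [DATE]
archive_path() {
  local dir="$1" platform="$2" user="$3" day="${4:-$(date +%Y-%m-%d)}"
  printf '%s/%s-%s-%s.md\n' "$dir" "$platform" "$user" "$day"
}

# Copy stdin to TARGET only if stdin produced at least one byte.
# Returns 1 and leaves TARGET untouched when the input is empty.
# Usage: some_command | write_if_nonempty TARGET
write_if_nonempty() {
  local target="$1" tmp
  tmp="$(mktemp)" || return 1
  cat > "$tmp"
  if [ -s "$tmp" ]; then
    cat "$tmp" > "$target"
    rm -f "$tmp"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

Inside either loop, the final redirect could then become `| write_if_nonempty "$(archive_path "$OUTPUT_DIR" x "$user")"`, so a day with no results or a failed request produces no file at all.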
What the Bluesky Long-Form Launch Means for Scraping Tools
The launch of Bluesky long-form content reshapes the content scraping landscape in three important ways:
- More content to archive. Bluesky users who previously posted only short updates now have an incentive to write long-form articles. The pool of scrapable long-form content just expanded.
- Open protocol advantage. Bluesky's AT Protocol is fully documented and publicly queryable. Any scraping tool can integrate it without negotiating an API agreement. This puts pressure on X to either loosen its API restrictions or lose the "most scraped platform" crown to Bluesky.
- Archiving is now a competitive feature. As creators diversify across platforms, the ability to archive content from multiple sources through a single tool becomes a decisive advantage. ThreadGrab already supports both X and Bluesky through the same API — one of the few tools that can claim cross-platform parity in 2026.
Note. Bluesky long-form content is less than three weeks old as of this writing. The AT Protocol relay infrastructure is still maturing, and some long-form posts may take minutes to propagate across relays. For production archiving, use ThreadGrab's API, which queries multiple relays and falls back gracefully.
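For direct queries, retrying with a growing delay is one way to ride out transient failures while the infrastructure settles. The helper below is a generic sketch with a hypothetical name. It only retries commands that exit non-zero, so it does not detect a post that has simply not propagated yet.

```shell
# Run COMMAND up to ATTEMPTS times, doubling the delay after each failure.
# Usage: retry ATTEMPTS BASE_DELAY_SECONDS COMMAND [ARGS...]
retry() {
  local attempts="$1" delay="$2" n=1
  shift 2
  while :; do
    "$@" && return 0                      # success: stop retrying
    [ "$n" -ge "$attempts" ] && return 1  # out of attempts: give up
    sleep "$delay"
    delay=$((delay * 2))
    n=$((n + 1))
  done
}

# Example (network call, shown for illustration only):
# retry 4 2 curl -fsS "https://public.api.bsky.app/xrpc/app.bsky.feed.getAuthorFeed?actor=username.bsky.social" > feed.json
```

With four attempts and a base delay of 2 seconds, the waits are 2, 4, and 8 seconds. The `-f` flag matters here because it makes `curl` exit non-zero on HTTP errors, which is what triggers the retry.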
Which Platform Should You Scrape — Based on Your Use Case
| Your goal | Best platform | Recommended method |
|---|---|---|
| LLM training data | Both (diverse sources) | ThreadGrab API + jq filter |
| Personal research archive | Bluesky (open, signed records) | AT Protocol direct query |
| Journalism / fact-checking | X Articles (more authors) | ThreadGrab with CAPTCHA bypass |
| Monitoring competitors | Both (cross-reference) | ThreadGrab cron pipeline |
| Building a knowledge base | Both (max coverage) | ThreadGrab + Obsidian vault |
| Occasional single-article save | Either | ThreadGrab web interface |
FAQ
Do I need an API key to scrape Bluesky?
No. Bluesky's AT Protocol is public by default. You can query posts, feeds, and profiles without an API key or account. This is a major advantage over X, which requires authentication for programmatic access.
Can ThreadGrab archive both X Articles and Bluesky long-form content?
Yes. ThreadGrab supports both platforms through a single API endpoint. Use the profile API to fetch all recent content from a user, regardless of whether they post on X, Bluesky, or both.
What happens to a Bluesky post when the author deletes it?
Bluesky posts are stored as records in the author's repository on a Personal Data Server (PDS). Deleting a post removes the record from that repository, although copies may persist on services that already ingested it. For guaranteed permanence, always save a local copy as Markdown or JSON.
How strict are X's rate limits for anonymous scraping?
X's anonymous rate limits are approximately 100 page views per 15 minutes per IP. For heavy scraping, use a rotating proxy service or route through ThreadGrab, which manages rate limits automatically.
Can I automate archiving for both platforms?
Yes. Use the cron pipeline shown above. ThreadGrab's API handles both platforms with the same request pattern, so a simple cron job is enough: no API keys, no OAuth, no platform-specific code.
Start saving X Articles and Bluesky long-form content as Markdown today — no account needed.
The Scraping Frontier Is Open
The battle between X Articles and Bluesky long-form content is just beginning. For creators, researchers, and archivists, the competition itself is the win: two major platforms investing in long-form means more content to discover, more perspectives to archive, and more incentive for tools like ThreadGrab to support both.
Bluesky's open protocol makes it the easier platform to scrape technically. X Articles has the larger existing library of content. Together, they cover the full spectrum of social long-form publishing in 2026. The smartest archiving strategy uses both.