X v11.90 Articles Scraper: threadgrab Long-Form 2026
If you scrape X Articles with a script that worked in March 2026, that script probably stopped working in late May. The X mobile app's v11.90 release moved the See All Articles entry from a buried profile-page tab to a top-level segmented control, and behind the scenes it swapped the public REST endpoint for a new GraphQL route. Scrapers that hard-coded the old URL now get 200 with an empty body, the article id format changed from 18-digit numeric to 19-character base64, and the article body is now a JSON blob with named-entity spans instead of plain text. This post walks through what changed, why your old scraper broke, and how to use the threadgrab API to fetch the new v11.90 layout as clean Markdown in one call.
threadgrab has been updated to handle the v11.90 changes natively. Below is the new field reference, two ready-to-run code examples, and a side-by-side with the two other approaches developers have tried (raw X GraphQL and the tweet.md project). If you maintain a public X-archive tool or build LLM training data from long-form creators, this is the shortest path to working output in 2026.
TL;DR. X v11.90 deprecated /Articles/search.json, switched the article id to a 19-character base64 string, and made the article body a named-entity JSON list. The threadgrab API resolves all three changes: pass a public article URL and it returns a clean Markdown body with **bold**, [links], @mentions, and #hashtags preserved. Single-article fetch: one GET, no auth, ~1.2s. Batch profile scrape: paginate with ?cursor= at 20 articles per page.
What X v11.90 Changed in the Articles Layout
Three things moved, and they are independent — a scraper can survive one or two of them but not all three at once.
1. The See All Articles entry moved
Before v11.90, the "See All Articles" link was a tab further down the creator's profile page, below Posts, Replies, Media, and Likes. In v11.90 it became the first item in a new segmented control at the top of the profile header, next to Posts / Replies / Media. The visible change is small, but the underlying request changed: the old tab called /Articles/search.json?include_replies=false, while the new entry calls /i/api/graphql/<hash>/ArticleTimeline. Anything hard-coded to the old path now gets a 200 response with an empty array.
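Because the legacy route fails with a 200 rather than an error status, a status-code check alone will not catch it. The guard below is a minimal sketch that treats "HTTP 200 plus an empty body or a bare empty array" as a failure. The function name is hypothetical, and it assumes you already have the status code and body in hand. A creator who genuinely has no Articles will also trip it, so treat a hit as a signal to investigate rather than proof that the endpoint broke.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exits 0 (true) when a request "succeeded" but carried no data,
# i.e. HTTP 200 with an empty body or a bare empty JSON array.
is_silent_empty() {
  local status="$1" body="$2"
  # Drop all whitespace so "[ ]" or "[]\n" compare equal to "[]".
  body="${body//[[:space:]]/}"
  [[ "$status" == "200" && ( -z "$body" || "$body" == "[]" ) ]]
}
```

In a scraper, capture the status with curl -s -o body.json -w '%{http_code}' and then call is_silent_empty "$status" "$(cat body.json)" before parsing, so the silent empty response is logged instead of being written out as "no articles".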
2. The article id format changed
The legacy article id was an 18-digit numeric string (for example 182345678901234567). The new id is a 19-character base64 string mixing upper-case letters, lower-case letters, and digits (for example a1B2c3D4e5F6g7H8i9J). The new id is also opaque: unlike the old one, it does not encode the publication timestamp, so a scraper cannot infer the publish date from the id alone. URLs that worked in March now return 404 because the routing layer rejects the old id fragment.
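Since the two formats differ in length and alphabet, a cache or URL builder can tell a stale legacy id from a v11.90 id before it makes a request. The helper below is a hypothetical sketch. It assumes the new id draws only from the standard or URL-safe base64 alphabet, which the description above implies but does not spell out.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "legacy", "v11.90", or "unknown" for an article id.
# Assumption: v11.90 ids use only the standard or URL-safe base64 alphabet.
classify_article_id() {
  local id="$1"
  local legacy_re='^[0-9]{18}$'           # pre-v11.90: 18-digit numeric
  local new_re='^[A-Za-z0-9+/_-]{19}$'    # v11.90: opaque 19-character base64
  if [[ "$id" =~ $legacy_re ]]; then
    echo "legacy"
  elif [[ "$id" =~ $new_re ]]; then
    echo "v11.90"
  else
    echo "unknown"
  fi
}
```

One caveat of this design: digits are valid base64 characters, so a 19-digit all-numeric string would also be classified as v11.90. Anything classified as legacy should be re-resolved through a fresh fetch rather than used to build a URL.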
3. The article body is now a named-entity JSON list
The old body was plain text with markdown-like asterisks for bold (**text**) and angle-bracket URL patterns for links. The new body is a JSON array of entity spans, where each span has a type field (text, bold, link, mention, hashtag, media_ref) and a payload. Naive scrapers lose the formatting because they concatenate each span's text and ignore the type field, so bold, link, and mention spans flatten into plain text. A bold phrase like Freedom of Reach arrives as three separate spans: a leading text span containing the empty string, a bold span containing Freedom of Reach, and a trailing text span. Reassemble them without checking each span's type and the bold is gone.
How threadgrab Handles the New Layout
threadgrab's profile API handles all three changes. You pass a public article URL or a username, and threadgrab returns a flat Markdown body plus metadata. The conversion logic walks the named-entity spans in document order and produces standard Markdown: text becomes plain text, bold becomes **text**, link becomes [text](url), mention becomes @username, hashtag becomes #tag, and media_ref becomes ![alt](url), with the alt text pulled from the span's alt_text field when present.
The result is a Markdown string that round-trips cleanly through any renderer (md2rich, GitHub, Obsidian, VS Code preview, md2pdf). threadgrab does not invent formatting the original author did not write — if the author did not bold a phrase, the output does not bold it.
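The span walk described above can be sketched as a jq filter, since the curl examples in this post already rely on jq. This is not threadgrab's implementation. The JSON shape is an assumption for illustration: each span is {"type": ..., "payload": {...}}, and the payload keys text, url, username, tag, and alt_text are hypothetical names, because the post only names the type values and the alt_text field.

```shell
#!/usr/bin/env bash
# Read a JSON array of entity spans on stdin and print Markdown.
# Assumed (hypothetical) span shape: {"type": "...", "payload": {...}}.
# Spans are mapped in document order and joined with no separator;
# unknown span types map to "" so no formatting is invented.
spans_to_markdown() {
  jq -r '
    map(
      .payload as $p
      | if   .type == "text"      then $p.text
        elif .type == "bold"      then "**" + $p.text + "**"
        elif .type == "link"      then "[" + $p.text + "](" + $p.url + ")"
        elif .type == "mention"   then "@" + $p.username
        elif .type == "hashtag"   then "#" + $p.tag
        elif .type == "media_ref" then "![" + ($p.alt_text // "") + "](" + $p.url + ")"
        else "" end
    )
    | join("")'
}
```

Fed the three-span Freedom of Reach encoding from the previous section (empty text, bold, trailing text), this prints **Freedom of Reach** followed by the trailing text, with the bold intact. Dropping unknown span types rather than guessing keeps the output faithful to what the author wrote.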
Single article fetch (curl)
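```shell
# -G with --data-urlencode percent-encodes the nested article URL into ?url=
curl -s -G "https://threadgrab.com/api/profile" \
  --data-urlencode "url=https://x.com/jack/article/a1B2c3D4e5F6g7H8i9J" \
  | jq '{title, author, published_at, body_markdown}'
```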
The response is a JSON object. The fields most callers need are title (string), author (object with username, display_name, and verified), published_at (ISO 8601 string), body_markdown (the formatted Markdown), and media (array of {type, url, alt_text} objects for images, videos, and polls); the field reference below lists the rest. Average response time on a US-east Cloudflare edge is 1.1 to 1.4 seconds for a 2,000-word article and 0.6 to 0.9 seconds for a 500-word article.
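To keep a fetched Article as a standalone Markdown file, a small jq filter can stitch the metadata and body_markdown together. This sketch assumes the top-level response shape listed in the field reference (title, author.username, author.display_name, published_at, body_markdown); the function name and the header layout are illustrative choices, not threadgrab output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a single-article JSON response (stdin) into a Markdown document.
# Titles can be empty for very short Articles, so fall back to a placeholder heading.
article_json_to_markdown() {
  jq -r '
    (if (.title // "") == "" then "(untitled)" else .title end) as $title
    | "# \($title)\n\nBy @\(.author.username) (\(.author.display_name)), \(.published_at)\n\n\(.body_markdown)"'
}
```

Usage: curl -s -G "https://threadgrab.com/api/profile" --data-urlencode "url=https://x.com/<user>/article/<id>" | article_json_to_markdown > article.md produces a file you can drop straight into Obsidian or a GitHub gist.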
Batch profile scrape (curl with cursor pagination)
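```shell
#!/usr/bin/env bash
set -euo pipefail

USERNAME="naval"
CURSOR=""
LIMIT=20

while true; do
  # -f fails on HTTP errors; --data-urlencode keeps an opaque cursor URL-safe.
  RESP=$(curl -sf -G "https://threadgrab.com/api/profile/${USERNAME}/articles" \
    --data-urlencode "cursor=${CURSOR}" --data-urlencode "limit=${LIMIT}")
  echo "$RESP" | jq '.articles[] | {title, published_at, id: .article_id}'
  # "// empty" maps a null next_cursor to an empty string.
  CURSOR=$(echo "$RESP" | jq -r '.next_cursor // empty')
  if [ -z "$CURSOR" ]; then
    break  # end of the public feed
  fi
  sleep 1  # stay under the 1 req/s/IP soft limit
done
```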
This loop walks the creator's full Articles feed page by page. Each call returns up to 20 articles plus a next_cursor string; when next_cursor is empty or null, you have reached the end of the public feed. The one-second sleep between calls keeps the request rate under the 1 req/s/IP soft limit. A creator with 200 articles takes about 11 seconds end-to-end: 10 pages and 9 one-second sleeps, plus request latency.
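The 11-second figure follows from simple arithmetic: pages = ceil(articles / 20), and the loop sleeps once between consecutive pages, never after the last one. The hypothetical helper below makes that explicit so you can budget larger scrapes. Its default per-page latency of 200 ms is an assumption backed out of the 11-second figure (11 s minus 9 s of sleep, spread over 10 requests), not a measured threadgrab number.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rough wall-clock seconds for the cursor loop.
# Args: article count, page size (default 20), per-page latency in ms (default 200, assumed).
estimate_scrape_seconds() {
  local articles="$1" per_page="${2:-20}" latency_ms="${3:-200}"
  # Ceiling division: 200 articles at 20 per page -> 10 pages.
  local pages=$(( (articles + per_page - 1) / per_page ))
  # The loop breaks on the last page before sleeping, so it sleeps pages - 1 times.
  local sleeps=$(( pages > 0 ? pages - 1 : 0 ))
  # Integer seconds, rounded down.
  echo $(( (sleeps * 1000 + pages * latency_ms) / 1000 ))
}
```

With the defaults, estimate_scrape_seconds 200 prints 11; raising the latency argument for media-heavy feeds shows how quickly request time overtakes the sleep budget.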
Field Reference: threadgrab v11.90 Response Shape
The full JSON object returned by the single-article endpoint has the structure below. The fields are stable as of threadgrab v1.18 (June 2026).
| Field | Type | Description |
|---|---|---|
| article_id | string | New 19-char base64 id (v11.90 format). Use this for caching or for re-fetching later. |
| title | string | Article title as set by the author. May be empty for very short Articles. |
| author.username | string | The author's X handle without the leading @. |
| author.display_name | string | The author's display name; may include emoji and unusual whitespace. |
| author.verified | boolean | True if the author had any X verification badge at the time of fetch. |
| published_at | string (ISO 8601) | UTC timestamp of the original publication. Not the edit timestamp. |
| body_markdown | string | The full article body as Markdown, with bold, links, mentions, and hashtags preserved. |
| media[] | array | List of media objects, each with type (image / video / poll), url, and alt_text. |
| engagement | object | Snapshot of likes, reposts, replies, quotes, and views at fetch time. Numbers may be 24h-delayed. |
| canonical_url | string | The public x.com/<user>/article/<id> URL of the article. |
The body_markdown field is the one most callers care about. It is a single string with newlines preserved from the source, and the bold / link / mention / hashtag / image syntax is rendered as standard Markdown that any renderer will interpret correctly. If you need plain text only, pass the result through sed or your favorite Markdown-to-text tool to strip the syntax — the original text content is recoverable in one pass.
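A sed pass along those lines is sketched below. It assumes body_markdown uses only the constructs listed above (bold, inline links, images, plain @mentions and #hashtags) and that URLs contain no closing parenthesis; the function name is hypothetical. Mentions and hashtags are left untouched because they are already plain text.

```shell
#!/usr/bin/env bash
# Hypothetical helper: strip the Markdown syntax body_markdown uses, keeping the visible text.
# Rule order matters: images first, so ![alt](url) becomes "alt" rather than "!alt";
# then inline links [text](url) -> "text"; then **bold** -> "bold".
markdown_to_text() {
  sed -E \
    -e 's/!\[([^]]*)]\([^)]*\)/\1/g' \
    -e 's/\[([^]]*)]\([^)]*\)/\1/g' \
    -e 's/\*\*([^*]+)\*\*/\1/g'
}
```

For example, a line such as **Bold** see [docs](https://example.com) @naval #tag comes out as Bold see docs @naval #tag. Anything richer than this subset (nested brackets, escaped asterisks) is better handled by a real Markdown parser.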
Comparison: threadgrab vs Raw GraphQL vs tweet.md
Three approaches are in use today for scraping X Articles after the v11.90 change. Here is how they compare on the three things that broke.
1. threadgrab API (recommended)
Pros: handles the new article id format, walks the named-entity body, returns clean Markdown with formatting preserved, no auth required, no token to manage, rate-limited at a safe 1 req/s/IP.
Cons: only public Articles (no protected accounts) and no edit history (only the latest version of an Article is returned).
2. Raw X GraphQL /ArticleTimeline
Pros: no intermediary, you control the headers and the request shape, you can pull raw data for unusual use cases (engagement analytics, quote-post graphs).
Cons: requires maintaining a session cookie or app-only bearer token, the GraphQL query hash rotates every few weeks, the named-entity body needs custom walking code, and a single bad request can get your IP rate-limited for 15 minutes.
3. tweet.md (open-source local proxy)
Pros: open source, runs locally, no third-party server, you can fork it for custom Markdown output.
Cons: tweet.md is optimized for thread-style short posts, not for the v11.90 long-form Article body; it does not preserve the named-entity formatting; and it requires a working X session cookie on first run.
For a use case that is "I want to read this Article offline as Markdown" or "I want to build a corpus of long-form posts for LLM training," threadgrab is the shortest path. For a use case that is "I am building a real-time X analytics product and need raw engagement data," the GraphQL route is unavoidable. tweet.md is a good fit for short posts and threads, less so for the new Article body.
Limitations and What threadgrab Does Not Do
Being honest about the edges of the tool matters more than overselling. The list below is what threadgrab v1.18 does not do, as of June 2026.
- Protected accounts. If the author's Articles are behind a login wall, threadgrab returns a 403. There is no workaround — the content is not on the public web.
- Edit history. An Article that the author has edited five times is returned in its current form. Earlier versions are not exposed by X, so threadgrab cannot reconstruct them.
- Quote-Article composite posts. A Quote-Article is a normal post with an Article card attached. threadgrab returns the post text and the card metadata, but not the embedded Article's body unless you make a second call to the article endpoint with the card's article_id.
- Drafts and scheduled Articles. Drafts and scheduled posts are not on the public web, so they are not scrapeable through any method, including threadgrab.
- Aggregate reach and per-audience numbers. The reach black box we covered in the Freedom of Reach post applies to threadgrab as well — the engagement snapshot in the response is what X shows to the public, not the internal reach number.
FAQ
What did X v11.90 change in the Articles layout?
X v11.90 moved the See All Articles entry from a buried profile-page tab to a top-level segmented control on a creator's profile header. The new entry surfaces every long-form post (Articles, Threads over 1,000 characters, and Quote-Articles) as a unified list, paginated 20 at a time. Behind the scenes, the entry now calls a new GraphQL endpoint (/ArticleTimeline) instead of the legacy /Articles/search.json REST route, so scrapers that hit the old endpoint get back an empty array instead of article data.
Why did my X Articles scraper stop working?
Three reasons. First, the /Articles/search.json endpoint was deprecated in late May 2026 and now returns 200 with an empty body for most accounts. Second, X added a new article_id field in v11.90 that is a 19-character base64 string, replacing the older 18-digit numeric id; scrapers that try to construct URLs from the old id pattern get 404. Third, the article body is now served as a JSON blob with named-entity spans (bold, link, mention, hashtag) instead of plain text with markdown-like asterisks, so naive text scrapers lose the formatting.
How do I scrape an X Article after the v11.90 change?
Pass the article URL or the new article_id to the threadgrab API. threadgrab resolves the new GraphQL /ArticleTimeline endpoint, fetches the named-entity JSON body, flattens it to Markdown (preserving bold, links, mentions, and hashtags), and returns the clean text plus metadata. A typical request is GET https://threadgrab.com/api/profile?url=https://x.com/<user>/article/<id> with no auth, and the response is a JSON object with title, body_markdown, author, published_at, and a media array.
Can I scrape every Article on a creator's profile?
Yes. Use threadgrab's profile batch endpoint: GET https://threadgrab.com/api/profile/<username>/articles?cursor=<next_cursor>&limit=20. Each response includes a next_cursor field; pass it to the next call until you get an empty or null cursor. The endpoint returns 20 articles per page to mirror X's own pagination, and it works with the new v11.90 layout. For a full scrape of 100+ articles, expect the paginated run to take 8 to 15 seconds, depending on media attachments.
Does threadgrab preserve bold, links, mentions, and hashtags?
Yes. The new X v11.90 article body is a JSON list of entity spans (each span has a type field: text, bold, link, mention, hashtag, or media_ref). threadgrab walks the list and produces standard Markdown: bold spans become **text**, link spans become [text](url), mention spans become @username, hashtag spans become #tag, and media_ref spans become ![alt](url). The output round-trips cleanly through any Markdown renderer (md2rich, GitHub, Obsidian) without losing the original formatting.
Is scraping X Articles with threadgrab allowed?
threadgrab only fetches public Articles, which X publishes to the open web for indexing and sharing. The threadgrab API enforces a 1 request per second per IP rate limit and a soft cap of 5,000 article fetches per day, well under X's anti-abuse thresholds. Note that X's Terms of Service have, since 2023, prohibited crawling or scraping without X's prior written consent, so review the current terms before scraping at scale; republishing or commercializing the fetched text without the author's consent is off-limits regardless.
Try It on a Public Article
The fastest way to see the new v11.90 layout round-trip through threadgrab is to pick any public Article on x.com, copy the URL, and run the single-article curl example above. You should get a clean JSON response within 1.5 seconds on a US or EU edge, with the body in Markdown that you can paste into Obsidian, a GitHub gist, or the md2rich converter for a side-by-side preview of how the formatting renders.
Scrape an X Article now — no login, no install.
threadgrab handles the v11.90 layout, the new article id, and the named-entity body. Paste any public x.com/<user>/article/<id> URL.
Related Reading
- X to Markdown 2026: 3 Ways to Save X Threads — covers the tweet.md, threadgrab, and browser-reader approaches for short posts and threads.
- 2026 X Thread Archiving: 5 Tools Compared — a wider look at Nitter, Thread Reader, archive.today, curl scripts, and threadgrab for full thread archival.
- X Articles Freedom of Reach 2026: The Algorithm Black Box — what the reach number does and does not tell you about who saw an Article.
- X Articles vs Bluesky vs LinkedIn Newsletter — the 2026 long-form showdown, with a decision tree for which platform to write on.