Fetch
Retrieve a single URL and return its parsed title, description, markdown body, lightly-cleaned HTML, and outbound links. Two-tier fetcher: plain HTTP first, escalating to a headless-browser worker when the page needs JS, returns a soft block, or rate-limits.
Use this when
- You need clean markdown for a single article, doc, or filing — not a whole site.
- Your scraper keeps tripping JS shells or soft blocks (Cloudflare, Akamai) and you want auto-escalation to a headless browser.
- You're feeding RAG and need both `markdown` (LLM-ready) and `html` (your own pipeline) from the same call.
- You're indexing outbound links from a page (`links[]`) without crawling.
| Method | POST |
| Path | /api/v1/fetch |
| Auth | Bearer |
| Credits | 1 (http) or 3 (browser) |
Request
| Parameter | Type | Description |
|---|---|---|
| `url` (required) | string | Absolute http(s) URL. Private, loopback, and link-local hosts are rejected (SSRF). |
| `forceBrowser` | boolean (default: `false`) | Skip the HTTP tier and render via the browser worker. Also bypasses the SEC.gov fast path. Always costs 3 credits. |
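Because invalid or private-host URLs are rejected server-side, a client may want to catch the obvious cases before spending a round trip. The following is a minimal sketch of such a pre-flight check, not the service's own SSRF guard: the function name `looks_fetchable` is hypothetical, it does no DNS resolution, and it only inspects IP literals plus `localhost` and `.local` hostnames.

```python
import ipaddress
from urllib.parse import urlsplit


def looks_fetchable(url: str) -> bool:
    """Local pre-flight check; the server-side guard remains authoritative."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    # Only absolute http(s) URLs are accepted by the endpoint.
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    host = parts.hostname.lower()
    # Hostname-based rejections: "localhost" and the .local suffix.
    if host == "localhost" or host.endswith(".local"):
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal. This sketch does not resolve DNS, so it lets
        # ordinary hostnames through and leaves the decision to the server.
        return True
    # is_private covers RFC 1918 and other reserved non-public ranges.
    return not (ip.is_private or ip.is_loopback or ip.is_link_local)
```

A hostname that resolves to a private address will pass this check and still be rejected by the API with `forbidden_url`, so treat a `True` result as "worth trying", not as a guarantee.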
Example body
{
"url": "https://example.com/article",
"forceBrowser": false
}
Response
| Field | Type | Description |
|---|---|---|
| `url` | string | Echo of the requested URL (not the post-redirect URL). |
| `title` | string | Contents of `<title>`. Empty string when absent. |
| `description` | string | `<meta name="description">`, falling back to `og:description`. Empty string when absent. |
| `markdown` | string | Main-content markdown. See Notes for the extraction + cleaning pipeline. |
| `html` | string | Lightly-cleaned HTML: only `<script>`, `<style>`, and `<noscript>` are removed. nav/header/footer/form/iframe/svg are preserved here (they're stripped only inside the markdown pipeline). |
| `links[]` | string[] | Absolute http(s) URLs found in `<a href>` within the cleaned HTML. Deduped, fragments stripped. |
| `statusCode` | number | Origin HTTP status. `0` if the underlying fetch threw (network error). |
| `fetchedAt` | string | ISO timestamp when the response was assembled. |
| `via` | `"http" \| "browser"` | Tier that produced this result. Determines credit cost (1 vs 3). |
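To make the field list above concrete, here is one possible way to load a success envelope into a typed Python object. This is a sketch under assumptions: the names `FetchResult`, `parse_fetch_result`, and `fetch_threw` are illustrative and not part of the API, and the envelope shape (`ok` plus `data`) is taken from the example response in this section.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class FetchResult:
    url: str
    title: str
    description: str
    markdown: str
    html: str
    links: list[str]
    status_code: int
    fetched_at: str
    via: str  # "http" or "browser"

    @property
    def fetch_threw(self) -> bool:
        # statusCode 0 is documented as "the underlying fetch threw".
        return self.status_code == 0


def parse_fetch_result(raw: str) -> FetchResult:
    """Parse a success envelope ({"ok": true, "data": {...}})."""
    envelope = json.loads(raw)
    if not envelope.get("ok"):
        raise ValueError(f"not a success envelope: {envelope.get('code')}")
    data = envelope["data"]
    return FetchResult(
        url=data["url"],
        title=data["title"],
        description=data["description"],
        markdown=data["markdown"],
        html=data["html"],
        links=list(data["links"]),
        status_code=data["statusCode"],
        fetched_at=data["fetchedAt"],
        via=data["via"],
    )
```

Keeping `via` and `status_code` on the parsed object makes it easy to attribute credit cost and to spot pass-through origin statuses later.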
Example response
{
"ok": true,
"data": {
"url": "https://example.com/article",
"title": "Example Article",
"description": "A short summary.",
"markdown": "# Example Article\n\n...",
"html": "<html>...</html>",
"links": ["https://example.com/related"],
"statusCode": 200,
"fetchedAt": "2026-05-29T20:45:00.000Z",
"via": "http"
}
}
Example
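For callers working in Python, the curl command in this section could be mirrored roughly as follows. This is a minimal sketch using only the standard library; the helper names `build_fetch_request` and `fetch` are hypothetical, and the 30-second client timeout is an assumption chosen to sit just above the documented 25-second server deadline.

```python
import json
import urllib.request

FETCH_ENDPOINT = "https://www.gyrence.com/api/v1/fetch"


def build_fetch_request(
    url: str, api_key: str, force_browser: bool = False
) -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = {"url": url, "forceBrowser": force_browser}
    return urllib.request.Request(
        FETCH_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch(url: str, api_key: str, force_browser: bool = False,
          timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON envelope.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so the
    caller needs to catch that to read an error envelope.
    """
    request = build_fetch_request(url, api_key, force_browser)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Splitting request construction from sending keeps the first half testable without network access.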
curl -X POST https://www.gyrence.com/api/v1/fetch \
-H "Authorization: Bearer $GYRENCE_API_KEY" \
-H "Content-Type: application/json" \
-d '{"url":"https://example.com/article"}'
Errors
| Code | HTTP | Meaning |
|---|---|---|
| `bad_request` | 400 | `url` missing/invalid, or any field fails validation. |
| `unauthorized` | 401 | Missing, malformed, or revoked Authorization header. |
| `credits_exhausted` | 402 | Workspace balance below request cost. |
| `forbidden_url` | 403 | SSRF guard rejected a private, loopback, or link-local host. |
| `not_found` | 404 | Origin returned 404. Origin 404s are typed: you do not receive `{ok:true, statusCode:404, markdown:"<404 page>"}`. |
| `timeout` | 408 | Request exceeded the 25-second hard deadline. |
| `rate_limited` | 429 | Per-workspace rate limit. |
| `upstream_error` | 502 | Origin returned 5xx. |
| `unavailable` | 503 | Block-page detector tripped (see Notes), or any other unmapped error. |
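On the client side, the codes in the table above can be turned into a single exception type. The sketch below is one way to do it, under assumptions: `FetchError` and `unwrap` are hypothetical names, and the choice of which codes count as retryable is a client-side judgment call that this documentation does not prescribe.

```python
import json

# Codes and HTTP statuses copied from the table above.
ERROR_STATUS = {
    "bad_request": 400,
    "unauthorized": 401,
    "credits_exhausted": 402,
    "forbidden_url": 403,
    "not_found": 404,
    "timeout": 408,
    "rate_limited": 429,
    "upstream_error": 502,
    "unavailable": 503,
}

# Assumption: these are the codes a caller might reasonably retry later.
RETRYABLE = {"timeout", "rate_limited", "upstream_error", "unavailable"}


class FetchError(Exception):
    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message
        self.retryable = code in RETRYABLE


def unwrap(raw: str) -> dict:
    """Return `data` from a success envelope, or raise FetchError."""
    envelope = json.loads(raw)
    if envelope.get("ok"):
        return envelope["data"]
    # "unknown" is a local placeholder for envelopes with no code field.
    raise FetchError(envelope.get("code", "unknown"), envelope.get("error", ""))
```

Since errors are not charged, retrying a retryable code costs nothing beyond the rate limit, but a backoff between attempts is still advisable for `rate_limited`.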
{ "ok": false, "error": "origin returned 404", "code": "not_found" }
Credits
1 credit per HTTP-tier success, 3 credits per browser-tier success. Read `via` on the response to attribute cost. Errors are not charged. `forceBrowser: true` always costs 3.
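As a small illustration of attributing cost from `via`, the following sketch tallies credits over a batch of successful results. The function names are hypothetical, and the input is assumed to be the `data` objects from success envelopes (error responses are excluded because they are not charged).

```python
# Credit cost per tier, as stated in this section.
CREDITS_BY_TIER = {"http": 1, "browser": 3}


def credits_charged(result: dict) -> int:
    """Credits for one successful result, keyed on its `via` field."""
    return CREDITS_BY_TIER[result["via"]]


def total_credits(results: list[dict]) -> int:
    """Sum credits across a batch of successful results."""
    return sum(credits_charged(result) for result in results)
```

For example, a batch with two HTTP-tier results and one browser-tier result totals 1 + 1 + 3 = 5 credits.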
Coverage & known limits
- Block pages fail closed. Akamai / Cloudflare / PerimeterX / DataDome / generic "access denied" responses return `code: "unavailable"`; you do not get the block page as markdown.
- Markdown chrome-strip is regex-based and can eat legitimate `<header>` blocks inside articles. Use the `html` field if you need to run your own conversion.
- `links[]` is DOM-time only. Links injected by JS after `page.content()` (infinite scroll, modal-loaded) won't appear.
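If you post-process the `html` field yourself, or render a page on your own to capture late-injected links, you may want to derive a link list with the same documented properties (http(s) only, deduped, fragments stripped). The sketch below is an approximation, not the service's extractor: its actual regex is not documented here, and resolving relative `href` values against a base URL is an assumption of this example.

```python
import re
from urllib.parse import urldefrag, urljoin, urlsplit

# Approximate pattern for quoted href values on <a> tags (assumption).
HREF_RE = re.compile(r"""<a\b[^>]*?\bhref\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def extract_links(html: str, base_url: str) -> list[str]:
    """Return absolute http(s) links, deduped, with fragments removed."""
    seen: set[str] = set()
    links: list[str] = []
    for href in HREF_RE.findall(html):
        # Resolve relative links, then drop any #fragment.
        absolute, _fragment = urldefrag(urljoin(base_url, href.strip()))
        if urlsplit(absolute).scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, tel:, and similar
        if absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links
```

A regex like this shares the fragility noted above for the chrome strip: unquoted attributes and malformed markup will be missed, so a real HTML parser is the safer choice when accuracy matters.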
Notes
- Credit accounting. 1 credit for HTTP-tier success, 3 credits for browser-tier success. Read `via` on the response to attribute cost. Errors are not charged.
- Escalation triggers. HTTP tier escalates to the browser worker on origin `403`, `429`, `503`, on network error, when the response body is `< 500` chars, when the body contains a JS-shell marker (`id="root"></div>`, `id="__next">`, `id="app">`, `You need to enable JavaScript`, `<noscript>`), or, post-conversion, when HTML `≥ 1000` chars produced `< 100` chars of markdown (content-loss escalation). Browser tier is never retried.
- `forceBrowser`. Skips both the HTTP tier and the SEC fast path; always costs 3 credits even if the page would have succeeded over HTTP.
- SEC.gov fast path. `*.sec.gov` URLs (without `forceBrowser`) go HTTP-only with an identifying `Gyrence (<contact>) - financial-data retrieval` UA per SEC's fair-access policy. They skip JS-shell, browser, and content-loss escalation.
- Block-page detection. When a response contains markers for Akamai, Cloudflare, PerimeterX, DataDome, or a generic `access denied … you don't have permission` page (scanned in the first 4 KB), the request fails with `code: "unavailable"` and a message like `Blocked by Akamai (status 403)`. You do not receive a 200 with the block page as markdown.
- `markdown` extraction. Pipeline: pick the first of `<main>` / `<article>` / `<body>`, strip `<script>` / `<style>` / `<noscript>` / `<iframe>` / `<svg>` / `<nav>` / `<footer>` / `<header>` / `<form>`, then convert via `node-html-markdown`. The chrome strip is regex-based and known to be fragile on malformed HTML; it may eat legitimate `<header>` blocks inside articles.
- `html` vs `markdown`. `html` is the lightly-cleaned version (scripts/styles/noscript only); use this if you want to run your own conversion. The heavy chrome strip is applied only to the input of the markdown converter and never leaks into the `html` field.
- `links[]` is DOM-time. Extracted via regex from `html` as-of conversion. Links added by JS after render (infinite scroll, modal-loaded content) appear only if they were in the DOM when the browser-tier worker called `page.content()`. HTTP-tier responses never see post-render links.
- `statusCode` semantics. `200–399` and most non-2xx codes (e.g. `403`, `451`) pass through as `{ok:true, statusCode}` with whatever body the origin returned. Only `404` (→ `not_found`) and `5xx` (→ `upstream_error`) are mapped to error envelopes. `statusCode: 0` means the underlying fetch threw before getting a response.
- Graceful worker fallback. If escalation is triggered but the browser worker is unreachable, the request falls back to the HTTP-tier result with `via: "http"` and an `error` field. The envelope still succeeds; inspect `error` if present.
- SSRF. Private (RFC1918), loopback, link-local, and `.local` hosts are rejected pre-fetch with `forbidden_url`.
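The escalation rules in these notes can be restated as two small predicates, which may help when predicting whether a given page is likely to be billed at the browser tier. This is an illustrative re-implementation of the documented rules, not the service's code: the function names are hypothetical, a network error is modeled as status `0`, lengths are measured in Python string characters, and the SEC.gov fast path and `forceBrowser` are deliberately left out.

```python
# Markers and thresholds copied from the "Escalation triggers" note.
JS_SHELL_MARKERS = (
    'id="root"></div>',
    'id="__next">',
    'id="app">',
    "You need to enable JavaScript",
    "<noscript>",
)
ESCALATE_STATUSES = {403, 429, 503}


def should_escalate(status_code: int, body: str) -> bool:
    """Pre-conversion triggers for moving from the HTTP tier to the browser."""
    if status_code == 0:  # assumption: 0 stands in for a network error
        return True
    if status_code in ESCALATE_STATUSES:
        return True
    if len(body) < 500:
        return True
    return any(marker in body for marker in JS_SHELL_MARKERS)


def lost_content(html: str, markdown: str) -> bool:
    """Post-conversion trigger: large HTML that yielded almost no markdown."""
    return len(html) >= 1000 and len(markdown) < 100
```

Note that the `<noscript>` marker alone is enough to trigger escalation under these rules, so otherwise static pages that include a `<noscript>` fallback may still be served via the browser tier.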
Try it
Hit Fetch from the console at /app/fetch with one click — no curl required.
