Fetch

Retrieve a single URL and return its parsed title, description, markdown body, lightly cleaned HTML, and outbound links. Fetch is a two-tier fetcher: it tries plain HTTP first and escalates to a headless-browser worker when the page needs JavaScript, the origin returns a soft block, or the origin rate-limits the request.

Use this when

  • You need clean markdown for a single article, doc, or filing — not a whole site.
  • Your scraper keeps tripping JS shells or soft blocks (Cloudflare, Akamai) and you want auto-escalation to a headless browser.
  • You're feeding RAG and need both markdown (LLM-ready) and html (your own pipeline) from the same call.
  • You're indexing outbound links from a page (links[]) without crawling.

Method: POST
Path: /api/v1/fetch
Auth: Bearer
Credits: 1 (http) or 3 (browser)

Request

  • url (string, required). Absolute http(s) URL. Private, loopback, and link-local hosts are rejected (SSRF).
  • forceBrowser (boolean, default: false). Skip the HTTP tier and render via the browser worker. Also bypasses the SEC.gov fast path. Always costs 3 credits.
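
The server enforces the SSRF rules itself. A client-side pre-check is therefore optional and only saves a round trip. Here is a minimal TypeScript sketch of one, under stated assumptions. It inspects literal hostnames only: localhost, .local, and literal IPv4/IPv6 addresses in the private, loopback, and link-local ranges. It does no DNS resolution, so it cannot tell whether a public-looking name resolves to a private address. The server-side guard remains authoritative. The names looksFetchable and isPrivateIPv4 are hypothetical.

```typescript
// Hypothetical client-side pre-check mirroring the documented SSRF rules.
// Literal hostnames only: no DNS lookups, so this is not a security boundary.
function isPrivateIPv4(host: string): boolean {
  const m = host.match(/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/);
  if (!m) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 10 ||                          // RFC 1918 10.0.0.0/8
    (a === 172 && b >= 16 && b <= 31) || // RFC 1918 172.16.0.0/12
    (a === 192 && b === 168) ||          // RFC 1918 192.168.0.0/16
    a === 127 ||                         // loopback 127.0.0.0/8
    (a === 169 && b === 254)             // link-local 169.254.0.0/16
  );
}

function looksFetchable(raw: string): boolean {
  let u: URL;
  try {
    u = new URL(raw); // throws on relative or malformed URLs
  } catch {
    return false;
  }
  if (u.protocol !== "http:" && u.protocol !== "https:") return false;
  // The WHATWG URL parser lowercases hosts, normalizes IPv4 forms, and keeps
  // IPv6 literals in brackets, e.g. "[::1]".
  const host = u.hostname;
  if (host === "localhost" || host.endsWith(".local")) return false;
  if (host === "[::1]") return false;                 // IPv6 loopback
  if (/^\[fe[89ab][0-9a-f]:/.test(host)) return false; // IPv6 link-local fe80::/10
  return !isPrivateIPv4(host);
}
```

Because errors are not charged, the pre-check only saves latency. Skip it if you prefer to rely on the forbidden_url response.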

Example body

{
  "url": "https://example.com/article",
  "forceBrowser": false
}

Response

Successful responses wrap these fields in a data object ({ "ok": true, "data": {...} }), as in the example response.

  • url (string). Echo of the requested URL (not the post-redirect URL).
  • title (string). Contents of <title>. Empty string when absent.
  • description (string). <meta name="description">, falling back to og:description. Empty string when absent.
  • markdown (string). Main-content markdown. See Notes for the extraction and cleaning pipeline.
  • html (string). Lightly cleaned HTML: only <script>, <style>, and <noscript> are removed. nav/header/footer/form/iframe/svg are preserved here (they're stripped only inside the markdown pipeline).
  • links[] (string[]). Absolute http(s) URLs found in <a href> within the cleaned HTML. Deduplicated, with fragments stripped.
  • statusCode (number). Origin HTTP status. 0 if the underlying fetch threw (network error).
  • fetchedAt (string). ISO 8601 timestamp of when the response was assembled.
  • via ("http" | "browser"). Tier that produced this result. Determines credit cost (1 vs 3).
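
The links[] contract (absolute http(s) URLs from <a href>, deduplicated, fragments stripped) can be reproduced client-side. That is useful, for example, when comparing it against your own extraction from the html field. This is a minimal TypeScript sketch, not Gyrence's implementation. It assumes a simple regex over href attributes and assumes relative hrefs are resolved against the page URL (the page does not say whether the service resolves or drops them). The name extractLinks is hypothetical.

```typescript
// Hypothetical re-implementation of the documented links[] rules:
// <a href> values, absolute http(s) only, fragments stripped, deduped.
function extractLinks(html: string, pageUrl: string): string[] {
  const seen = new Set<string>(); // insertion-ordered, so first occurrence wins
  // Naive regex for href="..." or href='...' inside <a ...> tags.
  // It does not decode HTML entities such as &amp;.
  const re = /<a\b[^>]*?\bhref\s*=\s*(["'])(.*?)\1/gi;
  let m: RegExpExecArray | null;
  while ((m = re.exec(html)) !== null) {
    let u: URL;
    try {
      u = new URL(m[2], pageUrl); // assumption: relative hrefs are resolved
    } catch {
      continue; // unparseable href
    }
    if (u.protocol !== "http:" && u.protocol !== "https:") continue; // drops mailto:, javascript:
    u.hash = ""; // strip the #fragment
    seen.add(u.href);
  }
  return Array.from(seen);
}
```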

Example response

{
  "ok": true,
  "data": {
    "url": "https://example.com/article",
    "title": "Example Article",
    "description": "A short summary.",
    "markdown": "# Example Article\n\n...",
    "html": "<html>...</html>",
    "links": ["https://example.com/related"],
    "statusCode": 200,
    "fetchedAt": "2026-05-29T20:45:00.000Z",
    "via": "http"
  }
}
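
For typed clients, the envelope can be transcribed into TypeScript. The sketch below is hypothetical and derived only from the field list, error codes, and examples on this page. One assumption is flagged: the optional error string set on graceful worker fallback is placed under data, but the page does not say where that field lives.

```typescript
// Hypothetical client-side types for the Fetch envelope.
type FetchErrorCode =
  | "bad_request" | "unauthorized" | "credits_exhausted" | "forbidden_url"
  | "not_found" | "timeout" | "rate_limited" | "upstream_error" | "unavailable";

interface FetchData {
  url: string;          // requested URL, not the post-redirect URL
  title: string;        // "" when <title> is absent
  description: string;  // "" when absent
  markdown: string;
  html: string;
  links: string[];
  statusCode: number;   // 0 when the underlying fetch threw
  fetchedAt: string;    // ISO 8601 timestamp
  via: "http" | "browser";
  error?: string;       // assumption: graceful-fallback error placed here
}

type FetchEnvelope =
  | { ok: true; data: FetchData }
  | { ok: false; error: string; code: FetchErrorCode };

// Narrowing on `ok` gives type-safe access to either branch.
function summarize(env: FetchEnvelope): string {
  if (!env.ok) return `error ${env.code}: ${env.error}`;
  const { via, statusCode, title } = env.data;
  return `${via} ${statusCode} ${title}`;
}
```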

Example

curl -X POST https://www.gyrence.com/api/v1/fetch \
  -H "Authorization: Bearer $GYRENCE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com/article"}'
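
The same request from TypeScript, as a minimal sketch. It assumes Node 18+ (global fetch) and that error responses also carry the JSON envelope in their body. buildRequest and fetchPage are hypothetical helper names. Only the endpoint, headers, and body fields come from this page.

```typescript
// Minimal client sketch; the helper names are hypothetical.
type Envelope =
  | { ok: true; data: { markdown: string; via: "http" | "browser"; statusCode: number } }
  | { ok: false; error: string; code: string };

const ENDPOINT = "https://www.gyrence.com/api/v1/fetch";

// Pure helper: builds the same request as the curl example.
function buildRequest(apiKey: string, url: string, forceBrowser = false) {
  return {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ url, forceBrowser }),
  };
}

async function fetchPage(apiKey: string, url: string, forceBrowser = false): Promise<Envelope> {
  const res = await fetch(ENDPOINT, buildRequest(apiKey, url, forceBrowser));
  // Assumption: error statuses (4xx/5xx) still return the JSON envelope.
  return (await res.json()) as Envelope;
}
```

For example, call const env = await fetchPage(key, "https://example.com/article"), then check env.ok before reading env.data.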

Errors

Each error carries a code (listed with its HTTP status) and a human-readable error message.

  • bad_request (400). url missing or invalid, or any field fails validation.
  • unauthorized (401). Missing, malformed, or revoked Authorization header.
  • credits_exhausted (402). Workspace balance is below the request cost.
  • forbidden_url (403). SSRF guard rejected a private, loopback, or link-local host.
  • not_found (404). Origin returned 404. Origin 404s are typed: you do not receive {ok:true, statusCode:404, markdown:"<404 page>"}.
  • timeout (408). Request exceeded the 25-second hard deadline.
  • rate_limited (429). Per-workspace rate limit exceeded.
  • upstream_error (502). Origin returned 5xx.
  • unavailable (503). Block-page detector tripped (see Notes), or any other unmapped error.

Example error envelope:

{ "ok": false, "error": "origin returned 404", "code": "not_found" }

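Clients usually need to decide which of these codes are worth retrying. Below is a TypeScript sketch of one possible policy. The retry choices are assumptions, since the page does not prescribe any. classify and blockVendor are hypothetical names. blockVendor assumes block-page messages follow the "Blocked by <vendor> (status <n>)" shape quoted in the Notes.

```typescript
// Hypothetical client policy over the documented error codes.
type Action = "retry_later" | "fix_request" | "give_up";

function classify(code: string): Action {
  switch (code) {
    case "rate_limited":
    case "timeout":
    case "upstream_error":
      return "retry_later"; // assumed transient: back off, then retry
    case "bad_request":
    case "unauthorized":
    case "credits_exhausted":
    case "forbidden_url":
      return "fix_request"; // caller-side: fix the body, key, balance, or URL first
    default:
      return "give_up"; // not_found, unavailable (block page), or unknown codes
  }
}

// Extracts the vendor from messages like "Blocked by Akamai (status 403)".
// Returns null for any other message.
function blockVendor(message: string): string | null {
  const m = /^Blocked by (.+?) \(status \d+\)$/.exec(message);
  return m ? m[1] : null;
}
```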
Credits

1 credit per HTTP-tier success, 3 credits per browser-tier success. Read the via field on the response to attribute cost. Errors are not charged. forceBrowser: true always costs 3 credits.
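
Cost depends only on ok and via, so per-batch spend can be tallied client-side. Here is a minimal TypeScript sketch. The Result type and the function names are hypothetical, and the type keeps only the fields the rule needs.

```typescript
// Tally credits using the documented rules:
// 1 for via "http", 3 for via "browser", 0 for error envelopes.
type Result =
  | { ok: true; data: { via: "http" | "browser" } }
  | { ok: false; code: string };

function creditsFor(r: Result): number {
  if (!r.ok) return 0; // errors are not charged
  return r.data.via === "browser" ? 3 : 1;
}

function totalCredits(results: Result[]): number {
  return results.reduce((sum, r) => sum + creditsFor(r), 0);
}
```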

Coverage & known limits

  • Block pages fail closed. Akamai / Cloudflare / PerimeterX / DataDome / generic "access denied" responses return code: "unavailable"; you do not get the block page as markdown.
  • Markdown chrome-strip is regex-based and can eat legitimate <header> blocks inside articles. Use the html field if you need to run your own conversion.
  • links[] is DOM-time only. Links injected by JS after page.content() (infinite scroll, modal-loaded) won't appear.
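
The chrome-strip limitation is easy to reproduce. The deliberately naive TypeScript sketch below is hypothetical: the tag list and regex are illustrative, not Gyrence's actual code. It shows how a non-greedy <header> pattern removes an article's own heading along with the site header.

```typescript
// Hypothetical, deliberately naive chrome strip: removes each listed element
// with a non-greedy regex, regardless of where it sits in the document.
const CHROME_TAGS = ["nav", "footer", "header", "form"];

function naiveChromeStrip(html: string): string {
  let out = html;
  for (const tag of CHROME_TAGS) {
    const re = new RegExp(`<${tag}\\b[\\s\\S]*?</${tag}>`, "gi");
    out = out.replace(re, "");
  }
  return out;
}

const page =
  "<body><header>Site nav</header>" +
  "<article><header><h1>Quarterly results</h1></header><p>Revenue rose.</p></article></body>";

console.log(naiveChromeStrip(page));
// The site header and the article's <h1> are both gone:
// <body><article><p>Revenue rose.</p></article></body>
```

Converting the html field yourself, where <header> is preserved, sidesteps this.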

Notes

  • Escalation triggers. The HTTP tier escalates to the browser worker when the origin returns 403, 429, or 503; on a network error; when the response body is shorter than 500 characters; when the body contains a JS-shell marker (id="root"></div>, id="__next">, id="app">, You need to enable JavaScript, <noscript>); or, post-conversion, when HTML of at least 1,000 characters produces fewer than 100 characters of markdown (content-loss escalation). The browser tier is never retried.
  • forceBrowser. Skips both the HTTP tier and the SEC fast path; always costs 3 credits even if the page would have succeeded over HTTP.
  • SEC.gov fast path. *.sec.gov URLs (without forceBrowser) are fetched over HTTP only, with the identifying User-Agent "Gyrence (<contact>) - financial-data retrieval", per SEC's fair-access policy. They skip JS-shell, browser, and content-loss escalation.
  • Block-page detection. When the first 4 KB of a response contains markers for Akamai, Cloudflare, PerimeterX, DataDome, or a generic "access denied … you don't have permission" page, the request fails with code: "unavailable" and a message such as Blocked by Akamai (status 403). You do not receive a 200 with the block page as markdown.
  • markdown extraction. Pipeline: pick the first of <main> / <article> / <body>, strip <script>/<style>/<noscript>/<iframe>/<svg>/<nav>/<footer>/<header>/<form>, then convert via node-html-markdown. The chrome strip is regex-based and known to be fragile on malformed HTML; it may eat legitimate <header> blocks inside articles.
  • html vs markdown. html is the lightly cleaned version (only scripts, styles, and noscript removed); use it if you want to run your own conversion. The heavy chrome strip is applied only to the input of the markdown converter and never leaks into the html field.
  • links[] is DOM-time. Extracted via regex from html as of conversion. Links added by JS after render (infinite scroll, modal-loaded content) appear only if they were in the DOM when the browser-tier worker called page.content(). HTTP-tier responses never see post-render links.
  • statusCode semantics. Statuses 200–399 and most other codes (e.g. 403, 451) pass through as {ok:true, statusCode} with whatever body the origin returned. Only 404 (→ not_found) and 5xx (→ upstream_error) are mapped to error envelopes. statusCode: 0 means the underlying fetch threw before getting a response.
  • Graceful worker fallback. If escalation is triggered but the browser worker is unreachable, the request falls back to the HTTP-tier result with via: "http" and an error field. The envelope still succeeds, so inspect error if present.
  • SSRF. Private (RFC1918), loopback, link-local, and .local hosts are rejected pre-fetch with forbidden_url.
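
The escalation triggers in these notes can be read as a single decision function. Below is a minimal TypeScript sketch with hypothetical names (HttpAttempt, escalationReason). It rests on two assumptions. First, forceBrowser and the SEC.gov fast path are handled before the function is called. Second, the content-loss check compares the raw body length with the converted markdown length, which the notes do not pin down.

```typescript
// Hypothetical sketch of the documented HTTP-tier escalation rules.
const JS_SHELL_MARKERS = [
  'id="root"></div>',
  'id="__next">',
  'id="app">',
  "You need to enable JavaScript",
  "<noscript>",
];

interface HttpAttempt {
  networkError: boolean; // the underlying fetch threw
  status: number;        // origin status (0 on network error)
  body: string;          // raw response body
  markdown?: string;     // set once conversion has run
}

// Returns why the attempt escalates to the browser tier, or null to keep it.
// The browser tier itself is never retried, so this runs once per request.
function escalationReason(a: HttpAttempt): string | null {
  if (a.networkError) return "network error";
  if ([403, 429, 503].includes(a.status)) return `origin ${a.status}`;
  if (a.body.length < 500) return "short body";
  if (JS_SHELL_MARKERS.some((m) => a.body.includes(m))) return "js shell";
  // Post-conversion content-loss check (assumption: measured on the raw body).
  if (a.markdown !== undefined && a.body.length >= 1000 && a.markdown.length < 100) {
    return "content loss";
  }
  return null;
}
```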

Try it

Run Fetch from the console at /app/fetch with one click; no curl required.