. Empty string when absent. |
| `description` | `string` | | | `<meta name="description">`, falling back to `og:description`. Empty string when absent. |
| `markdown` | `string` | | | Main-content markdown. See Notes for the extraction + cleaning pipeline. |
| `html` | `string` | | | Lightly-cleaned HTML: only `<script>`, `<style>`, and `<noscript>` are removed. nav/header/footer/form/iframe/svg are preserved here (they're stripped only inside the markdown pipeline). |
| `links[]` | `string[]` | | | Absolute http(s) URLs found in `<a href>` within the cleaned HTML. Deduped, fragments stripped. |
| `statusCode` | `number` | | | Origin HTTP status. 0 if the underlying fetch threw (network error). |
| `fetchedAt` | `string` | | | ISO timestamp when the response was assembled. |
| `via` | `"http" \| "browser"` | | | Tier that produced this result. Determines credit cost (1 vs 3). |

### Example response

```json
{
  "ok": true,
  "data": {
    "url": "https://example.com/article",
    "title": "Example Article",
    "description": "A short summary.",
    "markdown": "# Example Article\n\n...",
    "html": "<html>...</html>",
    "links": ["https://example.com/related"],
    "statusCode": 200,
    "fetchedAt": "2026-05-29T20:45:00.000Z",
    "via": "http"
  }
}
```

## Example

```bash
curl -X POST https://www.gyrence.com/api/v1/fetch \
  -H "Authorization: Bearer $GYRENCE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com/article"}'
```

## Errors

| Code | HTTP | Meaning |
| --- | --- | --- |
| `bad_request` | 400 | `url` missing/invalid, or any field fails validation. |
| `unauthorized` | 401 | Missing, malformed, or revoked `Authorization` header. |
| `credits_exhausted` | 402 | Workspace balance below request cost. |
| `forbidden_url` | 403 | SSRF guard rejected a private, loopback, or link-local host. |
| `not_found` | 404 | Origin returned 404. Origin 404s are typed — you do **not** receive `{ok:true, statusCode:404, markdown:"<404 page>"}`. |
| `timeout` | 408 | Request exceeded the 25-second hard deadline. |
| `rate_limited` | 429 | Per-workspace rate limit. |
| `upstream_error` | 502 | Origin returned 5xx. |
| `unavailable` | 503 | Block-page detector tripped (see Notes), or any other unmapped error. |

```json
{ "ok": false, "error": "origin returned 404", "code": "not_found" }
```

> **Credits** — **1 credit** per HTTP-tier success, **3 credits** per browser-tier success. Read `via` on the response to attribute cost. Errors are not charged. `forceBrowser: true` always costs 3.

> **Coverage & known limits** —
> - **Block pages fail closed.** Akamai / Cloudflare / PerimeterX / DataDome / generic "access denied" responses return `code: "unavailable"` — you do **not** get the block page as markdown.
> - **Markdown chrome-strip is regex-based** and can eat legitimate `<header>` blocks inside articles. Use the `html` field if you need to run your own conversion.
> - **`links[]` is DOM-time only.** Links injected by JS after `page.content()` (infinite scroll, modal-loaded) won't appear.

## Notes

- **Credit accounting.** 1 credit for HTTP-tier success, 3 credits for browser-tier success. Read `via` on the response to attribute cost. Errors are not charged.
- **Escalation triggers.** HTTP tier escalates to the browser worker on origin `403`, `429`, `503`, on network error, when the response body is `< 500` chars, when the body contains a JS-shell marker (`id="root"></div>`, `id="__next">`, `id="app">`, `You need to enable JavaScript`, `<noscript>`), or — post-conversion — when HTML `≥ 1000` chars produced `< 100` chars of markdown (content-loss escalation). Browser tier is never retried.
- **`forceBrowser`.** Skips both the HTTP tier and the SEC fast path; always costs 3 credits even if the page would have succeeded over HTTP.
- **SEC.gov fast path.** `*.sec.gov` URLs (without `forceBrowser`) go HTTP-only with an identifying `Gyrence (<contact>) - financial-data retrieval` UA per SEC's fair-access policy. They skip JS-shell, browser, and content-loss escalation.
- **Block-page detection.** When a response contains markers for Akamai, Cloudflare, PerimeterX, DataDome, or a generic `access denied … you don't have permission` page (scanned in the first 4 KB), the request fails with `code: "unavailable"` and a message like `Blocked by Akamai (status 403)`. You do not receive a 200 with the block page as markdown.
- **`markdown` extraction.** Pipeline: pick the first of `<main>` / `<article>` / `<body>`, strip `<script>/<style>/<noscript>/<iframe>/<svg>/<nav>/<footer>/<header>/<form>`, then convert via `node-html-markdown`. The chrome strip is regex-based and known to be fragile on malformed HTML; it may eat legitimate `<header>` blocks inside articles.
- **`html` vs `markdown`.** `html` is the lightly-cleaned version (scripts/styles/noscript only) — use this if you want to run your own conversion. The heavy chrome strip is applied **only** to the input of the markdown converter and never leaks into the `html` field.
- **`links[]` is DOM-time.** Extracted via regex from `html` as-of conversion. Links added by JS after render (infinite scroll, modal-loaded content) appear only if they were in the DOM when the browser-tier worker called `page.content()`. HTTP-tier responses never see post-render links.
- **`statusCode` semantics.** `200`–`399` and most non-2xx codes (e.g. `403`, `451`) pass through as `{ok:true, statusCode}` with whatever body the origin returned. Only `404` (→ `not_found`) and `5xx` (→ `upstream_error`) are mapped to error envelopes. `statusCode: 0` means the underlying fetch threw before getting a response.
- **Graceful worker fallback.** If escalation is triggered but the browser worker is unreachable, the request falls back to the HTTP-tier result with `via: "http"` and an `error` field. The envelope still succeeds — inspect `error` if present.
- **SSRF.** Private (RFC1918), loopback, link-local, and `.local` hosts are rejected pre-fetch with `forbidden_url`.

> **Try it** — Hit Fetch from the console at [/app/fetch](/app/fetch) with one click — no curl required.

---

# Gyre

Source: https://www.gyrence.com/docs/api/gyre.md

Walk a site from a seed URL within a page budget, following outbound links breadth-first. Returns parsed markdown for every page successfully visited, plus per-URL errors. Same-domain by default. The link-traversal primitive in the Gyrence pipeline — Search finds, Fetch reads one, **Gyre** walks many.

## Use this when

- You need a bounded slice of a site (1–50 pages) without standing up a full crawler.
- You're feeding RAG and want a seed page plus everything it links to, in one call.
- You want partial-success semantics — per-URL failures land in `errors[]` and the walk continues.
- You're prototyping coverage before committing to a scheduled crawl.

**Endpoint:** `POST /api/v1/gyre` **Auth:** Bearer **Credits:** 1 per HTTP page, 3 per browser page (min 1)

## Request

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `url` | `string` | yes | | Absolute http(s) seed URL. Private, loopback, and link-local hosts are rejected (SSRF). |
| `maxPages` | `number` | | `20` | Hard cap on pages visited. Range 1–50. The walk stops as soon as this many pages have been fetched (successes only). |
| `sameDomain` | `boolean` | | `true` | When true, only links whose hostname exactly matches the seed's hostname are followed. Subdomains are NOT considered same-domain. |

### Example body

```json
{
  "url": "https://example.com",
  "maxPages": 10,
  "sameDomain": true
}
```

## Response

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `startUrl` | `string` | | | Echo of the seed URL. |
| `totalPages` | `number` | | | Number of pages successfully fetched (length of `pages[]`). |
| `pages[]` | `object[]` | | | One entry per successfully fetched page (see fields below). |
| `pages[].url` | `string` | | | Absolute URL of the visited page. |
| `pages[].title` | `string` | | | Contents of `<title>`. Empty when absent. |
| `pages[].description` | `string` | | | Meta description or og:description. Empty when absent. |
| `pages[].markdown` | `string` | | | Main-content markdown (same extraction pipeline as /fetch). |
| `pages[].statusCode` | `number` | | | Origin HTTP status. |
| `pages[].via` | `"http" \| "browser"` | | | Tier that produced this page. Drives per-page credit cost. |
| `errors[]` | `object[]` | | | Per-URL errors encountered during the walk. `{ url, error }`. The walk continues past individual failures. |

### Example response

```json
{
  "ok": true,
  "data": {
    "startUrl": "https://example.com",
    "totalPages": 1,
    "pages": [
      {
        "url": "https://example.com",
        "title": "Example",
        "description": "Home page",
        "markdown": "# Example\n\n...",
        "statusCode": 200,
        "via": "http"
      }
    ],
    "errors": []
  }
}
```

## Example

```bash
curl -X POST https://www.gyrence.com/api/v1/gyre \
  -H "Authorization: Bearer $GYRENCE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com","maxPages":10}'
```

## Errors

| Code | HTTP | Meaning |
| --- | --- | --- |
| `bad_request` | 400 | `url` missing/invalid, or `maxPages` out of range. |
| `unauthorized` | 401 | Missing, malformed, or revoked `Authorization` header. |
| `credits_exhausted` | 402 | Workspace balance below request cost. |
| `forbidden_url` | 403 | SSRF guard rejected the seed (private, loopback, link-local host). |
| `not_found` | 404 | The seed (and every queued URL) returned 404, or no pages were successfully fetched. |
| `timeout` | 408 | Request exceeded the 25-second hard deadline. |
| `rate_limited` | 429 | Per-workspace rate limit. |
| `upstream_error` | 502 | First page returned 5xx. |
| `unavailable` | 503 | Block-page detector tripped on the seed, or any other unmapped error. |

> **Credits** — Each fetched page is billed individually: **1 credit** per HTTP-tier page, **3 credits** per browser-tier page. The total is the sum across `pages[]`, with a **minimum of 1** even if nothing was billable. Errored URLs are not charged. Inspect each `pages[].via` to attribute cost.

> **Coverage & known limits** —
> - **`sameDomain` is exact-host.** `https://blog.example.com` is not considered same-domain as `https://example.com`. Use **Map** when you need subdomain coverage.
> - **No `maxDepth` parameter.** Depth is bounded indirectly by `maxPages` and breadth-first queue order.
> - **Query-string variants count as distinct pages.** `?utm_source=…` URLs each consume budget.
> - **Per-page block detection** lands in `errors[]` (not the top-level envelope). The walk continues past blocked URLs.

## Notes

- **Walk semantics.** Breadth-first from the seed. Each page's outbound `links[]` (as extracted by the underlying fetcher) is enqueued. Visited URLs are tracked verbatim.
- **Page budget vs queue.** `maxPages` caps successful fetches, not URLs considered. The walk stops as soon as the budget is reached, even if the queue still has unvisited URLs.
- **Per-page fetcher.** Each page goes through the same two-tier pipeline as `/fetch` (HTTP → browser escalation, SEC.gov fast path, block-page detection). See the Fetch docs for tier mechanics.
- **Failure tolerance.** Per-URL failures are pushed to `errors[]` and the walk continues. The request only fails when nothing was fetched.

> **Try it** — Run Gyre from the console at [/app/gyre](/app/gyre) — pick a seed, set a budget, and watch pages stream in.

---

# Extract

Source: https://www.gyrence.com/docs/api/extract.md

Pull structured JSON out of a single page using an LLM. Gyrence fetches the URL (HTTP first, browser escalation if needed), converts it to Markdown, and uses an LLM to return strict JSON matching your prompt or schema.
## Use this when

- You need a few specific fields from a page (price, title, author, contact email) without writing selectors.
- The site's HTML changes often and selector-based scraping is too brittle.
- You want JSON shaped to your own schema, not raw page content.
- You're prototyping enrichment before committing to a per-domain parser.

**Endpoint:** `POST /api/v1/extract` **Auth:** Bearer **Credits:** 5 (7 if browser)

## Request

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `url` | `string` | yes | | Page URL to extract from. Must be a public http(s) origin. SSRF-blocked hosts are rejected. |
| `prompt` | `string` | yes | | Natural-language instruction describing what to extract (e.g. "Return the product title, price in USD, and in-stock status"). |
| `schema` | `string` | | | Optional JSON string describing the desired shape. Sent to the model as a target structure. Must be valid JSON — invalid JSON returns `bad_request`. |
| `forceBrowser` | `boolean` | | `false` | Skip the HTTP-first attempt and render via the headless worker immediately. Use for known JS-heavy SPAs. |

### Example body

```json
{
  "url": "https://example.com/products/widget",
  "prompt": "Extract the product name, price, currency, and availability.",
  "schema": "{\"name\":\"string\",\"price\":\"number\",\"currency\":\"string\",\"inStock\":\"boolean\"}"
}
```

## Response

```json
{
  "ok": true,
  "data": {
    "url": "https://example.com/products/widget",
    "title": "Widget — Example",
    "extractedJson": "{\"name\":\"Widget\",\"price\":29.99,\"currency\":\"USD\",\"inStock\":true}",
    "via": "http"
  }
}
```

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `url` | `string` | | | The final URL fetched (after redirects). |
| `title` | `string` | | | Page title parsed from the fetched HTML. |
| `extractedJson` | `string` | | | Stringified JSON returned by the model. Always a string — parse it client-side. On malformed model output, falls back to `{"_raw": "<original text>"}`. |
| `via` | `"http" \| "browser"` | | | How the page was fetched. `http` = direct fetch succeeded. `browser` = escalated to the headless worker (costs 2 extra credits). |

## Example

```bash
curl -X POST https://www.gyrence.com/api/v1/extract \
  -H "Authorization: Bearer $GYRENCE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/products/widget",
    "prompt": "Extract the product name, price, currency, and availability.",
    "schema": "{\"name\":\"string\",\"price\":\"number\",\"currency\":\"string\",\"inStock\":\"boolean\"}"
  }'
```

## Errors

| Code | HTTP | Meaning |
| --- | --- | --- |
| `bad_request` | 400 | `url` missing/invalid, `prompt` empty, or `schema` is not valid JSON. |
| `unauthorized` | 401 | Missing, malformed, or revoked `Authorization` header. |
| `credits_exhausted` | 402 | Workspace balance below request cost, or the LLM provider returned 402. |
| `forbidden_url` | 403 | SSRF guard rejected a private, loopback, or link-local host. |
| `timeout` | 408 | Request exceeded the 25-second hard deadline. |
| `rate_limited` | 429 | Per-workspace rate limit, or the LLM provider rate-limited the call. |
| `upstream_error` | 502 | Page fetch or LLM call returned a 5xx. |
| `unavailable` | 503 | Any other unmapped error. |

```json
{ "ok": false, "error": "Schema must be valid JSON", "code": "bad_request" }
```

> **Credits** — **5 credits** when fetched over HTTP, **7 credits** when Gyrence escalates to the headless browser. Charged once per call regardless of model response length. See Notes for the full list of escalation triggers.

> **Limits & known behavior** —
> - **Content is truncated.** Only the first **12,000 characters** of converted Markdown are sent to the model. Long pages have their tail dropped — narrow the URL (e.g. a product page, not a catalog) for best results.
> - **Prompt + page content compete for context.** Long prompts reduce the page content the model sees. Aim for prompts under **500 characters**; let the schema do the structural work and the prompt focus on *what* to extract, not *how*.
> - **JSON guarantee is best-effort.** The model is instructed to return strict JSON, but malformed output falls back to `{ "_raw": "<text>" }`. Always handle that shape.
> - **`schema` is a hint, not a validator.** Gyrence does not enforce types on the model's output. Validate with your own Zod/JSON-schema layer if it matters.
> - **One page per call.** To extract across many URLs, fan out client-side (use [`/map`](/docs/api/map) to enumerate) and call `/extract` per URL.
> - **Markdown fidelity.** Tables and deeply nested lists may lose structure in HTML→Markdown conversion before the model sees them.

## Notes

- **Model selection and prompting are handled by Gyrence.** The system enforces strict JSON output — no prose, no markdown fences. Gyrence may update the underlying model over time to improve quality or reduce cost; the API contract (request/response shape) remains stable.
- **Fetch pipeline**: identical to [`/fetch`](/docs/api/fetch). HTTP first, with automatic escalation to the headless browser on any of: origin 403/429/503, network error (status 0), body < 500 chars, JS-shell markers (`id="root">`, `id="__next">`, `<noscript>`, etc.), or post-conversion content loss (HTML ≥ 1000 chars producing < 100 markdown chars). `forceBrowser: true` skips the HTTP attempt.
- **`via` drives pricing**, not the input. Even without `forceBrowser`, an automatic escalation bills as `browser` (7 credits).
- **SSRF re-validation** runs on the final URL after redirects, not just the input.
- **No retries on model failure.** Garbled output is returned as `_raw` for your inspection rather than re-prompted server-side.
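
Since `extractedJson` is always a string and may carry the `_raw` fallback, client code has to parse it and branch on that shape. The following is a minimal sketch of that handling, not Gyrence's own client code. It assumes the response body has already been decoded into a Python dict, and the helper names `parse_extracted` and `extract_cost` are hypothetical.

```python
import json
from typing import Any, Optional, Tuple

# Credit schedule as documented for /extract: 5 over HTTP, 7 via the browser tier.
EXTRACT_CREDITS = {"http": 5, "browser": 7}


def parse_extracted(data: dict) -> Tuple[Optional[Any], Optional[str]]:
    """Split the `data` object of an /extract response into (parsed, raw_text).

    `raw_text` is set only when the model output was not usable JSON,
    i.e. the documented {"_raw": "..."} fallback shape.
    """
    text = data["extractedJson"]
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # Defensive: the docs say the field is always valid JSON, but treat
        # anything unparseable like the fallback rather than crashing.
        return None, text
    if isinstance(parsed, dict) and set(parsed) == {"_raw"}:
        return None, str(parsed["_raw"])
    return parsed, None


def extract_cost(data: dict) -> int:
    """Credits billed for a successful call, driven by `via` (hypothetical helper)."""
    return EXTRACT_CREDITS[data["via"]]
```

Because `schema` is only a hint, the `parsed` value still needs validating against your own schema layer before use.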
> **Try it** — Test prompts and schemas live in the console at [/app/extract](/app/extract) — paste a URL, iterate on the prompt, copy the resulting JSON.

---

# Map

Source: https://www.gyrence.com/docs/api/map.md

Enumerate URLs for a site. Prefers sitemaps (`robots.txt` + common locations, recurses one level into sitemap indexes); falls back to a depth-1 anchor extraction from the homepage when no sitemap is found.

## Use this when

- You need a fast, broad picture of what URLs a site exposes — before deciding what to fetch.
- You're seeding a crawl or sitemap-aware indexer and want a deduped link list.
- You want to filter (`search`) to a section like `/blog/` or `/docs/` without downloading pages.
- You're checking whether a site even publishes a sitemap.

**Endpoint:** `POST /api/v1/map` **Auth:** Bearer **Credits:** 1

## Request

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `url` | `string` | yes | | Site URL to map. Must be a public http(s) origin. SSRF-blocked hosts are rejected. |
| `limit` | `number` | | `5000` | Max links returned. 1–5000. |
| `includeSubdomains` | `boolean` | | `false` | Include links on subdomains of the root host (e.g. blog.example.com when mapping example.com). |
| `search` | `string` | | | Case-insensitive substring filter applied to discovered URLs (e.g. "/blog/"). |

### Example body

```json
{
  "url": "https://example.com",
  "limit": 500,
  "search": "/blog/"
}
```

## Response

```json
{
  "ok": true,
  "data": {
    "url": "https://example.com",
    "source": "sitemap",
    "totalDiscovered": 1284,
    "totalReturned": 500,
    "links": ["https://example.com/", "https://example.com/about", "..."]
  }
}
```

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `url` | `string` | | | The input URL, echoed back. |
| `source` | `"sitemap" \| "discovered"` | | | How links were discovered. `sitemap` = parsed from one or more sitemap files. `discovered` = homepage anchor extraction (fallback when no sitemap exists or all sitemaps are empty). |
| `totalDiscovered` | `number` | | | Unique links after host filtering, substring filtering, and de-duplication — before the limit is applied. |
| `totalReturned` | `number` | | | Links actually included in `links` (≤ `limit`). |
| `links` | `string[]` | | | Flat list of absolute URLs. Fragments stripped. |

## Example

```bash
curl -X POST https://www.gyrence.com/api/v1/map \
  -H "Authorization: Bearer $GYRENCE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com","limit":500,"search":"/blog/"}'
```

## Errors

| Code | HTTP | Meaning |
| --- | --- | --- |
| `bad_request` | 400 | `url` missing/invalid, `limit` out of range, or any field fails validation. |
| `unauthorized` | 401 | Missing, malformed, or revoked `Authorization` header. |
| `credits_exhausted` | 402 | Workspace balance below request cost. |
| `forbidden_url` | 403 | SSRF guard rejected a private, loopback, or link-local host. |
| `timeout` | 408 | Request exceeded the 25-second hard deadline. |
| `rate_limited` | 429 | Per-workspace rate limit. |
| `upstream_error` | 502 | Upstream returned 5xx during sitemap or homepage fetch. |
| `unavailable` | 503 | Any other unmapped error. |

```json
{ "ok": false, "error": "url must be a public http(s) origin", "code": "forbidden_url" }
```

> **Credits** — **1 credit** per call regardless of `limit` or how many sitemaps are walked. The cost is the same whether you get 5 links or 5,000.

> **Coverage & known limits** —
> - **Speed over completeness.** Map prioritizes a fast, sitemap-first answer. It does not crawl, does not render JS, and does not visit every URL it returns.
> - **Sitemap-trusting.** When `source: "sitemap"`, results reflect what the site publishes — stale or partial sitemaps stay stale.
> - **Fallback is shallow.** `source: "discovered"` reads only the homepage's anchor (`<a href>`) tags. `<link rel>`, `data-*`, and SPA-style navigation injected by JS are missed. Use **Fetch** on a specific page instead.
> - **Subdomain default is OFF.** Set `includeSubdomains: true` to capture `blog.*`, `docs.*`, etc.

## Notes

- **Sitemap discovery order**: `robots.txt` `Sitemap:` directives, then `/sitemap.xml`, `/sitemap_index.xml`, `/wp-sitemap.xml`. Sitemap indexes are followed **exactly one level** deep — `<sitemap><loc>` entries inside an index are fetched once and not recursed further.
- **Anchor extraction (fallback)**: parses `<a href="...">` only via regex against the homepage HTML. Skips fragments (`#…`), `javascript:`, and `mailto:` schemes. Only `http(s)` URLs are kept. The seed URL itself is always included in the candidate set.
- **Host filtering** is exact-match against the input URL's hostname. With `includeSubdomains: true`, a host matches if it equals the root host OR ends with `"." + rootHost`.
- **SSRF re-validation**: each sitemap URL (including those from attacker-controllable `robots.txt` and from `<sitemap><loc>` entries in indexes) is re-validated against the SSRF allowlist before being fetched. Failures are skipped silently.
- **Per-request timeouts**: `robots.txt` and sitemap fetches use a 10s cap; the homepage-fallback fetch uses 15s. The whole request still respects the 25s hard deadline.
- **No page content.** Pair with [`/fetch`](/docs/api/fetch) or [`/extract`](/docs/api/extract) to retrieve bodies.

> **Try it** — Map a site from the console at [/app/map](/app/map) — paste a URL, filter by path, export the link list.

---

# Health

Source: https://www.gyrence.com/docs/api/health.md

Lightweight liveness probe scoped to the authenticated workspace. Confirms that the API is reachable and your key is valid, and returns your workspace identity and plan.

## Use this when

- Smoke-testing a new API key before wiring it into your app.
- Confirming which workspace and plan a key belongs to.
- Powering an external uptime monitor or readiness check in CI.

**Endpoint:** `GET /api/v1/health` **Auth:** Bearer **Credits:** 0

## Request

No body. No query parameters.

## Response

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `status` | `string` | | | Always `"healthy"` when the endpoint returns 200. |
| `workspace_id` | `uuid` | | | ID of the workspace owning the API key used on this request. |
| `plan_name` | `string` | | | Plan identifier for the workspace (e.g. `"free"`). |

### Example response

```json
{
  "ok": true,
  "data": {
    "status": "healthy",
    "workspace_id": "8f3a1c20-9b4e-4d77-bc1e-3e5a7d2c9f10",
    "plan_name": "free"
  }
}
```

## Example

```bash
curl https://www.gyrence.com/api/v1/health \
  -H "Authorization: Bearer $GYRENCE_API_KEY"
```

## Errors

| Code | HTTP | Meaning |
| --- | --- | --- |
| `unauthorized` | 401 | Missing or invalid `Authorization: Bearer <key>` header. |
| `unavailable` | 503 | "health check unavailable" — the workspace plan lookup (`get_plan_name` RPC) failed. Retry with backoff. |

> **Credits** — **0 credits** per call regardless of frequency. Calls are still recorded in `usage_events` for audit.

> **What this does NOT check** — A 200 here only confirms the Gyrence API, your API key, and the workspace plan lookup are alive. It does **not** probe Brave Search, the headless-browser worker, or the AI Gateway. Search, Fetch (browser tier), and Extract can be degraded independently.

## Unauthenticated probe

For a public liveness check (no API key, no workspace context), call `GET /api/public/health`. Unlike the authenticated endpoint, the response is **not** wrapped in the standard envelope:

```json
{ "status": "ok", "timestamp": "2026-05-30T18:42:00.000Z", "uptime": 1284.31 }
```

Use this when you can't safely embed an API key (e.g. status-page pingers).
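
The two probes return different shapes: the authenticated one wraps `status: "healthy"` in the standard envelope, while the public one returns a bare object with `status: "ok"`. A monitor that may hit either endpoint could normalize them roughly as below. This is a sketch under the assumption that the body is already decoded into a dict; `is_alive` is a hypothetical helper name.

```python
def is_alive(body: dict) -> bool:
    """Return True when a decoded health response reports a live API.

    Handles both documented shapes; anything else counts as not alive.
    """
    if "ok" in body:
        # Authenticated probe: standard envelope, status lives under `data`.
        data = body.get("data") or {}
        return body["ok"] is True and data.get("status") == "healthy"
    # Public probe: bare object without the envelope.
    return body.get("status") == "ok"
```

Keep in mind that a passing check says nothing about the browser worker or the LLM path, per the callout above.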

---

# MCP

Source: https://www.gyrence.com/docs/api/mcp.md

Hosted **Model Context Protocol** endpoint exposing all five Gyrence primitives as MCP tools. JSON-RPC 2.0 in, JSON or Server-Sent Events out, bearer-token auth. Use this when your client is an MCP-aware agent runtime (Claude Desktop, Cursor, your own MCP SDK) and you want to skip writing per-primitive HTTP wrappers.

This page is the wire-level reference. For the conceptual model — why MCP, billing parity, client configuration — start with [Concepts → MCP integration](/docs/concepts/mcp).

## Use this when

- Your runtime speaks MCP and you want the agent itself to pick which Gyrence primitive to call.
- You want one transport for both discovery (Search, Map) and acquisition (Fetch, Gyre, Extract).
- You want zero divergence between MCP and HTTP — same credits, same activity log, same error model.

**Endpoint:** `POST /api/mcp` **Auth:** Bearer **Credits:** Per-tool — identical to the underlying HTTP primitive

## Auth

```
Authorization: Bearer mc_<your-workspace-key>
```

The same key that authenticates `/api/v1/*` authenticates MCP. Header-only — keys are never accepted in the URL path or body. Missing/invalid keys return JSON-RPC error code `-32001` with HTTP `401`. Failed auth is **never billed**. `GET` and `DELETE` return HTTP `405` (no standalone SSE channel).

## The tools

Five tools, named `gyrence_<primitive>`. Each tool's input schema is the **same Zod object** that validates the corresponding `/api/v1/*` request — so request shapes match the HTTP docs verbatim.

| Tool | Underlying handler | Credit cost |
| --- | --- | --- |
| `gyrence_search` | [`POST /api/v1/search`](/docs/api/search) | 1 base; +1 per HTTP sub-fetch, +3 per browser sub-fetch when `fetch: true` |
| `gyrence_fetch` | [`POST /api/v1/fetch`](/docs/api/fetch) | 1 (HTTP) or 3 (browser) |
| `gyrence_gyre` | [`POST /api/v1/gyre`](/docs/api/gyre) | Sum of per-page costs across `pages[]` (1 / 3 per page), minimum 1 |
| `gyrence_extract` | [`POST /api/v1/extract`](/docs/api/extract) | 5 (HTTP) or 7 (browser) |
| `gyrence_map` | [`POST /api/v1/map`](/docs/api/map) | 1 flat |

For full request/response shapes per tool, follow the link to the corresponding HTTP reference page — the schemas are identical.

## JSON-RPC methods

The server speaks the standard MCP method set. Your client library handles this for you; the calls below are useful for sanity-checking with curl.

### `initialize`

Negotiation handshake. Returns server name, version, and protocol capabilities.

```bash
curl -X POST https://www.gyrence.com/api/mcp \
  -H "Authorization: Bearer $GYRENCE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
      "protocolVersion": "2024-11-05",
      "capabilities": {},
      "clientInfo": { "name": "my-client", "version": "0.0.1" }
    }
  }'
```

### `tools/list`

Enumerate the five tools and their JSON Schemas. The schema for each tool is generated from the same Zod object as the HTTP route, so the input shape is canonical.

```bash
curl -X POST https://www.gyrence.com/api/mcp \
  -H "Authorization: Bearer $GYRENCE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "jsonrpc": "2.0", "id": 2, "method": "tools/list" }'
```

Response (abbreviated):

```json
{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    "tools": [
      {
        "name": "gyrence_search",
        "description": "Resolve a query into ranked, deduplicated URLs…",
        "inputSchema": {
          "type": "object",
          "properties": {
            "query": { "type": "string" },
            "limit": { "type": "number" },
            "fetch": { "type": "boolean" },
            "fetchFormats": { "type": "array" },
            "tbs": { "type": "string" }
          },
          "required": ["query"]
        }
      },
      { "name": "gyrence_fetch", "description": "…", "inputSchema": { "...": "..." } },
      { "name": "gyrence_gyre", "description": "…", "inputSchema": { "...": "..." } },
      { "name": "gyrence_extract", "description": "…", "inputSchema": { "...": "..." } },
      { "name": "gyrence_map", "description": "…", "inputSchema": { "...": "..." } }
    ]
  }
}
```

### `tools/call`

Invoke a tool. The `arguments` object matches the underlying HTTP primitive's request body.

```bash
curl -X POST https://www.gyrence.com/api/mcp \
  -H "Authorization: Bearer $GYRENCE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 3,
    "method": "tools/call",
    "params": {
      "name": "gyrence_gyre",
      "arguments": { "url": "https://example.com", "maxPages": 5 }
    }
  }'
```

Response (success):

```json
{
  "jsonrpc": "2.0",
  "id": 3,
  "result": {
    "content": [
      {
        "type": "text",
        "text": "{\"startUrl\":\"https://example.com\",\"totalPages\":1,\"pages\":[{\"url\":\"https://example.com\",\"title\":\"Example\",\"markdown\":\"...\",\"statusCode\":200,\"via\":\"http\"}],\"errors\":[]}"
      }
    ]
  }
}
```

The `text` field is a **stringified** copy of the underlying primitive's `data` payload. Parse it client-side; the schema is documented per primitive (see the table above).

## Errors

Two error surfaces, depending on where the failure happens.

### Transport-level errors

Returned as JSON-RPC error envelopes with HTTP status mirroring the failure:

| Code | HTTP | Meaning |
| --- | --- | --- |
| `-32001` | 401 | Unauthorized: missing or invalid API key. Failed auth is never billed. |
| `-32000` | 405 | Method not allowed: GET or DELETE was used. Only POST is supported on this endpoint. |

```json
{
  "jsonrpc": "2.0",
  "error": { "code": -32001, "message": "Unauthorized: invalid or missing API key." },
  "id": null
}
```

### Tool-level errors

Primitive failures (bad URL, credits exhausted, upstream error, etc.) come back inside the `tools/call` result with `isError: true`. The text content is the redacted error string in the form `code: message`.

```json
{
  "jsonrpc": "2.0",
  "id": 3,
  "result": {
    "isError": true,
    "content": [
      { "type": "text", "text": "credits_exhausted: workspace balance below request cost" }
    ]
  }
}
```

The `code` prefix is one of the standard envelope codes documented in [Errors](/docs/errors): `bad_request`, `unauthorized`, `credits_exhausted`, `forbidden_url`, `not_found`, `timeout`, `rate_limited`, `upstream_error`, `unavailable`.

> **Credits** — Every MCP `tools/call` bills exactly the same as the equivalent HTTP call to the underlying primitive. The recording happens in the shared `runEndpoint` path — there is no MCP-specific credit code. See [Credits](/docs/concepts/credits) for the live schedule.

> **Coverage & known limits** —
> - **Header auth only.** Keys in the URL path or request body are rejected. This is deliberate — URL paths leak into proxy access logs.
> - **No standalone SSE.** `GET /api/mcp` returns 405. SSE is only used as the response transport when the client negotiates it during `tools/call`.
> - **Per-tool credit cost is fixed by the underlying primitive.** Changing transports does not change cost.
> - **Tool errors are not retried server-side.** A <code>upstream_error</code> from the underlying primitive surfaces as <code>isError: true</code>; your client decides whether to retry. The same backoff rules from <a href="/docs/rate-limits">Rate limits</a> apply. ## Notes - **Per-request server.** A fresh `McpServer` instance is constructed per HTTP request, with the parsed API key captured in each tool's closure. Two concurrent requests authenticated with different keys cannot share server state. - **Schemas are the single source of truth.** `searchHandler.schema`, `fetchHandler.schema`, etc. are imported by both the HTTP route and the MCP tool — schema drift between transports is impossible. - **Billing parity by construction.** Each MCP tool's `execute` synthesizes an internal `Request` and calls the same `handle()` the HTTP route uses. The `recordUsage` write happens in shared code; MCP and HTTP rows are indistinguishable except for the `url` field's provenance. - **Activity log.** MCP calls show up in **Console → Activity** with the primitive endpoint (`/gyre`, `/search`, etc.) — not as `/mcp`. This is the correct attribution: the cost was the primitive's cost. > **Sanity-check from your terminal** — The fastest way to verify your key works over MCP is a single <code>tools/list</code> call (see above). It returns the tool definitions, costs zero credits (auth-only path), and confirms the endpoint is reachable from your network.
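
As a rough illustration of the two `tools/call` result shapes described on this page, the sketch below unwraps a result: on success it parses the stringified `text` payload, and on `isError: true` it splits the `code: message` string. The names `McpToolResult`, `Unwrapped`, and `unwrapToolResult` are hypothetical (they are not part of any Gyrence SDK), and the sketch assumes the first `text` entry in `content` carries the payload.

```ts
// Hypothetical helper types for illustration; not part of any Gyrence SDK.
interface McpToolResult {
  isError?: boolean;
  content: { type: string; text: string }[];
}

type Unwrapped =
  | { ok: true; data: unknown }
  | { ok: false; code: string; message: string };

function unwrapToolResult(result: McpToolResult): Unwrapped {
  // Assumption: the first text entry carries the payload.
  const text = result.content.find((c) => c.type === "text")?.text ?? "";

  if (result.isError) {
    // Tool-level errors arrive as "code: message".
    const sep = text.indexOf(": ");
    return sep === -1
      ? { ok: false, code: "unknown", message: text }
      : { ok: false, code: text.slice(0, sep), message: text.slice(sep + 2) };
  }

  // Success payloads are a stringified copy of the primitive's `data`.
  return { ok: true, data: JSON.parse(text) };
}
```

Returning a discriminated union rather than throwing keeps the decision about retries with the caller, which matches the note that tool errors are not retried server-side.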

# Gyrence — full documentation

> Concatenation of every page at https://www.gyrence.com/docs. Generated on request from the .mdx sources.

---

# Quickstart

Source: https://www.gyrence.com/docs/quickstart.md

Make your first Gyrence API call in under five minutes. By the end of this page you'll have a workspace, an API key, and a working `/fetch` request returning clean markdown.

## 1. Create a workspace

Sign in at [gyrence.com](/login). Your first workspace is created automatically — a workspace is the unit that owns API keys, usage, and billing. You can create more later (one per project, customer, or environment) from the [console](/app/dashboard).

> **What** — A workspace isolates a set of API keys, request history, and credit balance. Most teams start with one and add more only when they need separate billing or hard usage walls between projects. See [Concepts → Workspaces](/docs/concepts/workspaces).

## 2. Generate an API key

Open [API Keys](/app/api-keys) in the console and click **Create key**. Copy the value immediately — Gyrence shows it once, then stores only a hash. If you lose a key, revoke it and create a new one.

Keys start with `mc_` and authenticate every request via the `Authorization: Bearer` header.

## 3. Make your first request

Fetch any public URL as clean markdown:

```bash
curl -X POST https://www.gyrence.com/api/v1/fetch \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'
```

Response:

```json
{
  "ok": true,
  "data": {
    "url": "https://example.com/",
    "title": "Example Domain",
    "description": "",
    "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples...",
    "html": "...",
    "links": ["https://www.iana.org/domains/example"],
    "statusCode": 200,
    "fetchedAt": "2026-05-29T18:42:00.000Z",
    "via": "http"
  }
}
```

The `via` field tells you which tier handled the request: `"http"` (1 credit) or `"browser"` (3 credits).
Gyrence picks automatically based on what the origin returns.

## 4. Use it from JavaScript

```javascript
const res = await fetch("https://www.gyrence.com/api/v1/fetch", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.GYRENCE_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ url: "https://example.com" }),
});

const { ok, data, error } = await res.json();
if (!ok) throw new Error(error);
console.log(data.markdown);
```

Every Gyrence response uses the same envelope: `{ ok: true, data }` on success, `{ ok: false, error, code }` on failure. One shape of error handling covers every endpoint.

## What's next

The other four primitives compose on top of `/fetch`:

- **[Search](/docs/api/search)** — natural-language query → ranked URLs (optionally fetched in the same call)
- **[Gyre](/docs/api/gyre)** — walk a site within a page budget, returning markdown for every page
- **[Extract](/docs/api/extract)** — fetch a URL and use an LLM to pull structured fields
- **[Map](/docs/api/map)** — enumerate every URL on a site via its sitemaps

Want the same five primitives inside Claude Desktop, Cursor, or your own MCP-aware agent? Skip the HTTP wiring entirely:

- **[MCP integration](/docs/concepts/mcp)** — point your MCP client at `https://www.gyrence.com/api/mcp` with the same `Authorization: Bearer mc_…` header. All five primitives appear as `gyrence_*` tools, billed identically to HTTP.

Before going to production, read:

- **[Authentication](/docs/authentication)** — header format, key scopes, rotation
- **[Errors](/docs/errors)** — full error envelope and code reference
- **[Rate limits](/docs/rate-limits)** — per-plan limits and backoff strategy
- **[Credits](/docs/concepts/credits)** — exactly what each endpoint costs

> **For coding agents** — Point your agent at [`/llms-full.txt`](/llms-full.txt) for a single-file plaintext copy of this entire documentation site, or [`/llms.txt`](/llms.txt) for just the index.


---

# Authentication

Source: https://www.gyrence.com/docs/authentication.md

Every Gyrence API call is authenticated with a workspace **API key** sent as a bearer token. There is no OAuth, no session cookie, no signed-request scheme — one header, one secret, one workspace.

```bash
Authorization: Bearer mc_VHJ5dGhpc2lzbm90YXJlYWxrZXk
```

The key tells Gyrence which [workspace](/docs/concepts/workspaces) the call belongs to, which credit pool to draw from, and which Usage and Activity logs to write. The caller's human identity is not part of the request.

## The header

All endpoints under `/api/v1/*` expect the same header:

```
Authorization: Bearer mc_…
```

- The scheme is `Bearer` (case-sensitive in spirit, case-insensitive in practice).
- The key must start with `mc_`. Anything else is rejected before hash lookup.
- The header is the only place credentials are read. Query-string keys, request-body keys, and basic auth are not supported.

A missing header, malformed key, or revoked key returns:

```json
{ "ok": false, "error": "invalid api key", "code": "unauthorized" }
```

with HTTP `401`. Failed auth is **never billed** — the request is rejected before any primitive runs.

## Getting a key

Keys are minted in the console at **API keys**, scoped to the currently-selected workspace.

1. Pick the workspace from the switcher in the top-right of the console.
2. Open **API keys → New key**.
3. Give it a human name (`prod-backend`, `ci-runner`, `dev-laptop-alice`).
4. Copy the plaintext value from the one-time reveal — Gyrence stores only a SHA-256 hash, so the full value can never be recovered.

See [Concepts → API keys](/docs/concepts/api-keys) for the full lifecycle (rotation, revocation, prefixes).

## Always call `www.gyrence.com`

The canonical host is **`www.gyrence.com`**.
The apex (`gyrence.com`) issues a 307 redirect to `www`, and most HTTP clients (curl, `fetch`, the majority of SDKs) **drop the `Authorization` header on cross-host redirects** — so calls to the apex will surface as `401 unauthorized`. Always hard-code `https://www.gyrence.com` in your client, SDK config, and webhook destinations.

## A canonical request

```bash
curl https://www.gyrence.com/api/v1/fetch \
  -H "Authorization: Bearer $GYRENCE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "url": "https://example.com" }'
```

```ts
const res = await fetch("https://www.gyrence.com/api/v1/fetch", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.GYRENCE_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ url: "https://example.com" }),
});

const json = await res.json();
if (!json.ok) throw new Error(`${json.code}: ${json.error}`);
```

The same key authenticates the [MCP endpoint](/docs/concepts/mcp) at `POST https://www.gyrence.com/api/mcp` — header-only, never in the URL path.

## Verifying a key without spending credits

`GET /api/v1/health` is authenticated, costs **0 credits**, and never returns `402`. Use it as a liveness probe and as a cheap way to confirm a key is valid before promoting it to production.

```bash
curl https://www.gyrence.com/api/v1/health \
  -H "Authorization: Bearer $GYRENCE_API_KEY"
```

```json
{ "ok": true, "data": { "status": "ok" } }
```

## Handling secrets

Keys grant full API access to a workspace and draw down its credit balance — treat them like a database password.

- **Server-side only.** Never ship a key to a browser, mobile app, or any client your users control.
- **Environment variables or a secret manager.** Not source code, not config files committed to git.
- **One key per deployed surface.** Separate keys for `prod`, `staging`, CI, and each developer's laptop make revocation surgical.
- **Rotate after exposure.** A key in a log, screenshot, or shared terminal is compromised.
  Mint a new one, roll it out, revoke the old one.

> **If a key leaks** — Revoke it immediately in the console. Revocation takes effect on the next request — there is no propagation delay. Then audit **Usage** filtered by the revoked key's prefix to see what was called before you caught it.

## What auth does **not** do

- **Endpoint scoping.** Any valid key can call any `/api/v1/*` endpoint on its workspace today. Per-key scopes are on the roadmap.
- **Cross-workspace access.** A key is bound to exactly one workspace and cannot be promoted, transferred, or pooled. Mint a new key on the target workspace instead.
- **Console sign-in.** Keys don't authenticate humans into the dashboard — that's a separate account-level login.

---

# Errors

Source: https://www.gyrence.com/docs/errors.md

Gyrence uses a single, predictable error envelope across every `/api/v1/*` endpoint. One shape of error handling covers the whole API surface.

## The envelope

Every response — success or failure — is JSON with an `ok` boolean.

**Success:**

```json
{ "ok": true, "data": { /* endpoint-specific payload */ } }
```

**Failure:**

```json
{ "ok": false, "error": "human-readable message", "code": "machine_code" }
```

- `ok` is the only field guaranteed on every response. Always branch on it first.
- `error` is meant for humans — log it, surface it in dashboards. The wording is not stable across releases.
- `code` is meant for machines — branch on it. Codes are stable within `/api/v1`; renames require `/api/v2`.

```ts
const res = await fetch(url, opts);
const json = await res.json();

if (!json.ok) {
  // Branch on json.code, not on res.status — they match, but code is the source of truth.
  throw new Error(`${json.code}: ${json.error}`);
}

return json.data;
```

## Code reference

| HTTP | `code` | What it means | Billed? |
| --- | --- | --- | --- |
| `200` | — | Success. `ok: true`, `data` populated. | yes (per the [credit schedule](/docs/concepts/credits)) |
| `400` | `bad_request` | Malformed JSON, missing required field, or input failed schema validation. | no |
| `401` | `unauthorized` | Missing, malformed, or revoked API key. | no |
| `402` | `credits_exhausted` | Workspace has zero credits remaining. Top up or wait for plan renewal. Never returned for internal-plan workspaces. | no |
| `403` | `forbidden_url` | Target URL is blocked by SSRF rules (private IPs, link-local, localhost, non-HTTP schemes). | no |
| `404` | `not_found` | The origin returned 404 for the requested URL. The request itself succeeded — the page just isn't there. | no |
| `408` | `timeout` | The 25-second hard deadline tripped before useful work completed. | no |
| `429` | `rate_limited` | You exceeded the per-workspace request rate. Back off and retry. | no |
| `502` | `upstream_error` | The origin returned a 5xx, or the headless-browser worker failed. Gyrence didn't get the content you asked for. | no |
| `502` | `provider_quota_exceeded` | A Gyrence-side provider subscription (e.g. Brave Search) returned a billing/quota signal (HTTP 402). Platform-side condition — not caller fault, not a per-workspace credit issue. Do not retry; surface to status. `details.provider` names the upstream. | no |
| `503` | `unavailable` | Gyrence itself is degraded — the platform is having a bad minute. Retry with backoff. | no |

The rule is consistent: **if the response doesn't include the data you asked for, you weren't billed**. The one exception is `/gyre`, which has a 1-credit floor so even a zero-page success records as a metered call.

## Branching by category

In practice, error handling collapses to five cases:

```ts
// PermanentError, BillingError, and RetryableError are application-defined error classes.
const { ok, data, error, code } = await res.json();
if (ok) return data;

switch (code) {
  case "unauthorized":
  case "forbidden_url":
  case "bad_request":
    // Caller error — do not retry. Fix the code or the input.
    throw new PermanentError(code, error);

  case "credits_exhausted":
    // Operational — alert a human, stop sending traffic on this workspace.
    throw new BillingError(error);

  case "rate_limited":
  case "timeout":
  case "upstream_error":
  case "unavailable":
    // Transient — retry with exponential backoff (see /docs/rate-limits).
    throw new RetryableError(code, error);

  case "provider_quota_exceeded":
    // Platform-side: a Gyrence upstream provider is out of quota.
    // Not your fault, not your credits. Don't retry — alert and wait.
    throw new PermanentError(code, error);

  case "not_found":
    // Logical — the URL doesn't exist. Often you want to record and move on.
    return null;

  default:
    throw new Error(`unknown code: ${code}`);
}
```

## Per-primitive notes

Most error codes are universal. A few have endpoint-specific nuances:

- **`/fetch`, `/extract`.** A `404` from the origin surfaces as `404 not_found`. A `403` from the origin (bot block) is escalated to the browser tier first; if the block persists, you get `502 upstream_error`, not `403`.
- **`/gyre`.** Per-page failures are **not** top-level errors — they land in the response's `errors[]` array while the overall call returns `200`. A `200` gyre with zero successful pages still bills 1 credit (the floor).
- **`/search`.** If `fetch: true` is set, individual result fetches that fail land in each result's `fetchError` field. The search call itself returns `200` as long as Brave returned results.
- **`/map`.** A site with no discoverable sitemap returns `200` with an empty `urls[]`, not a `404`. Use the result, don't catch.

## Partial success vs. hard failure

The pattern across the API is:

- **Hard failure** (`ok: false`) means the request as a whole produced nothing usable.
- **Partial success** (`ok: true` with errors *inside* `data`) means some sub-units failed but the overall call returned something. Gyre `errors[]`, search per-result `fetchError`, and extract field-level confidence are all in this category.

Both are normal.
Code defensively on the inside-data errors, but don't conflate them with envelope-level failures.

## Stability guarantees

- HTTP status codes and `code` values are **frozen for `/api/v1`**. New codes may be added; existing codes will not be renamed or repurposed.
- `error` message wording **is not stable** — it's tuned for human readability and may change without notice. Never branch on the string.
- New endpoints may add endpoint-specific codes; the universal codes in the table above will continue to mean the same thing everywhere.

## MCP error shape

Calls to the [`/api/mcp`](/docs/api/mcp) endpoint use the same code set, but wrapped in JSON-RPC instead of the HTTP envelope. Transport-level failures (bad auth, wrong method) come back as `{ jsonrpc, error: { code, message }, id }` with codes `-32001` (unauthorized) or `-32000` (method not allowed). Primitive failures (`credits_exhausted`, `upstream_error`, etc.) surface inside the `tools/call` result with `isError: true` and a `text` content payload of the form `"code: message"` — same code, different envelope. See [MCP](/docs/api/mcp#errors) for examples.

> **Status page** — Persistent `503 unavailable` or `502 upstream_error` across many requests usually means a platform-side incident. Check status before adding retry logic for what's actually an outage.
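
For code that needs the category without throwing (metrics, log tagging), the mapping from "Branching by category" can also be kept as data. This is a minimal sketch: `ErrorCategory`, `CATEGORY_BY_CODE`, and `categoryOf` are hypothetical names, and the category labels are illustrative rather than anything the API returns.

```ts
// Hypothetical names; the code-to-category mapping mirrors the switch in
// "Branching by category". The labels themselves are illustrative.
type ErrorCategory = "permanent" | "billing" | "retryable" | "not_found" | "unknown";

const CATEGORY_BY_CODE = new Map<string, ErrorCategory>([
  ["bad_request", "permanent"],
  ["unauthorized", "permanent"],
  ["forbidden_url", "permanent"],
  ["provider_quota_exceeded", "permanent"], // platform-side; do not retry
  ["credits_exhausted", "billing"],
  ["rate_limited", "retryable"],
  ["timeout", "retryable"],
  ["upstream_error", "retryable"],
  ["unavailable", "retryable"],
  ["not_found", "not_found"],
]);

function categoryOf(code: string): ErrorCategory {
  // Codes added in later releases fall through to "unknown".
  return CATEGORY_BY_CODE.get(code) ?? "unknown";
}
```

Treating unrecognised codes as `"unknown"` rather than retryable is a deliberate choice here, since new codes may be added within `/api/v1`.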


---

# Rate limits

Source: https://www.gyrence.com/docs/rate-limits.md

Gyrence rate-limits per **workspace**, not per key or per IP. Every key on a workspace draws from the same bucket, the same way they draw from the same [credit pool](/docs/concepts/credits).

The goals are:

- Keep one workspace's traffic from degrading another's.
- Surface runaway loops early (a misconfigured agent making thousands of calls per minute).
- Stay out of the way of normal application traffic.

## The hard deadline

Independent of any rate limit, **every request has a 25-second hard deadline**. If the work doesn't complete in 25 seconds (slow origin, stuck headless browser, a gyre fanning out further than expected), the request returns:

```json
{ "ok": false, "error": "request exceeded 25s deadline", "code": "timeout" }
```

with HTTP `408`. Timeouts are **not billed**. The deadline applies end-to-end across all tiers and retries Gyrence does internally.

For gyres, the deadline is the most common reason a call returns fewer pages than `maxPages` — the budget was generous, but the clock ran out first.

## When you're rate-limited

Exceeding the per-workspace rate returns:

```json
{ "ok": false, "error": "rate limit exceeded", "code": "rate_limited" }
```

with HTTP `429`. Rate-limited requests are **not billed**.

The current production limits are deliberately quiet — the platform is built around credit metering, not request-count throttling — but they exist as a circuit breaker. If you're seeing `429` in normal operation, you're almost certainly in a runaway loop; check **Console → Activity** to see what's firing.

## Backoff strategy

When you see `429`, `502 upstream_error`, `503 unavailable`, or `408 timeout`, back off and retry. The right shape is **exponential backoff with full jitter**:

```ts
async function withRetry<T>(fn: () => Promise<T>, max = 4): Promise<T> {
  for (let attempt = 0; attempt < max; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (!isRetryable(err) || attempt === max - 1) throw err;
      const base = Math.min(1000 * 2 ** attempt, 8000); // 1s, 2s, 4s, 8s cap
      const delay = Math.random() * base;                // full jitter
      await new Promise((r) => setTimeout(r, delay));
    }
  }
  throw new Error("unreachable");
}

function isRetryable(err: unknown): boolean {
  const code = (err as { code?: string }).code;
  return code === "rate_limited" || code === "upstream_error"
      || code === "unavailable" || code === "timeout";
}
```

Three guidelines:

- **Cap the base.** Don't let the delay grow unboundedly — 8 seconds is plenty for a 25-second-deadline API.
- **Jitter, always.** Without jitter, a fleet of clients backing off in lockstep will pile back on the platform together and trigger the same `429`.
- **Limit total attempts.** Three to four retries is the sweet spot. Beyond that you're typically masking a real problem.

Do **not** retry `400 bad_request`, `401 unauthorized`, `402 credits_exhausted`, `403 forbidden_url`, or `404 not_found`. These are caller-side or logical errors — retrying changes nothing and just delays the failure surface.
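
A retry helper that inspects `err.code` only works if the thrown error actually carries the envelope's `code`. One way to arrange that, sketched under assumptions: `GyrenceError`, `Envelope`, and `unwrapEnvelope` are hypothetical names, not part of any Gyrence SDK, and the envelope shape is the documented `{ ok, data }` / `{ ok, error, code }` pair.

```ts
// Hypothetical error type: it carries the envelope's `code` so a retry
// predicate can read `err.code`.
class GyrenceError extends Error {
  readonly code: string;

  constructor(code: string, message: string) {
    super(`${code}: ${message}`);
    this.name = "GyrenceError";
    this.code = code;
  }
}

// The documented response envelope, typed as a discriminated union.
type Envelope<T> =
  | { ok: true; data: T }
  | { ok: false; error: string; code: string };

// Returns `data` on success; throws a GyrenceError carrying `code` otherwise.
function unwrapEnvelope<T>(json: Envelope<T>): T {
  if (json.ok) return json.data;
  throw new GyrenceError(json.code, json.error);
}
```

Calling `unwrapEnvelope` on the parsed response inside the function passed to the retry wrapper means transient codes are retried while `bad_request`, `unauthorized`, and the other caller-side codes propagate on the first attempt.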

## Concurrency, not just throughput

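A 25-second deadline means that, in the worst case where every request runs to the deadline, each concurrency slot sustains only ~2.4 requests/minute (60 ÷ 25), so a workspace running 10 concurrent fetches tops out at ~24 requests/minute. The throughput ceiling is usually concurrency divided by turnaround time, not the rate-limit number.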
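
The worst-case arithmetic can be written down directly. This is a sketch: `worstCaseRequestsPerMinute` is a hypothetical helper, and it assumes every request takes the full deadline, so real throughput will normally be higher.

```ts
// Worst-case sustained throughput when every request runs to the deadline.
// `deadlineSeconds` defaults to the documented 25-second hard deadline.
function worstCaseRequestsPerMinute(concurrency: number, deadlineSeconds = 25): number {
  // One slot completes 60 / deadlineSeconds requests per minute (2.4 at 25 s).
  return (concurrency * 60) / deadlineSeconds;
}
```

With 10 slots this gives 24 requests per minute in total, or 2.4 per slot.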

For high-volume workloads:

- **Bound your own concurrency.** A worker pool with a hard cap (e.g. 20 in-flight requests) is more predictable than firing everything at once and reacting to `429`.
- **Prefer batching primitives.** A single `/gyre` with `maxPages: 50` is one HTTP connection and one billed call (per the gyre schedule); 50 individual `/fetch` calls is 50 round-trips and 50 chances to hit the deadline.
- **Use `/map` when you only need URLs.** It's a single 1-credit call and almost never times out.
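
A bounded worker pool, as suggested in the first bullet, can be sketched in a few lines. `mapWithConcurrency` is a hypothetical helper rather than a Gyrence API; it assumes the caller supplies the per-item request function and that a single rejected task should fail the whole batch.

```ts
// Runs `task` over `items` with at most `limit` calls in flight at once.
// Results keep input order; a rejected task rejects the whole run.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  task: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;

  // Each worker claims the next unprocessed index until none remain.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await task(items[i] as T, i);
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

The cap is enforced by the number of workers rather than by a semaphore, so at most `limit` tasks are ever in flight. Wrapping each task in a retry helper keeps per-item backoff inside the same cap.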

## Idempotency

All `/api/v1/*` endpoints are **idempotent** — retrying the same request produces the same effect on Gyrence's side (no state change beyond the usage event). Safe to retry on transient failures without extra deduplication.

The one thing retries don't reverse is **what was billed**. If a request succeeded on attempt 1 but the response was lost in the network, the retry on attempt 2 will succeed (and bill) again. In practice this is rare — server-side timeouts are very different from client-side connection failures — but worth knowing.

MCP traffic at [`/api/mcp`](/docs/api/mcp) shares the same per-workspace rate-limit bucket and the same 25-second deadline as HTTP. Picking MCP doesn't relax (or tighten) any limit.

## Raising your limits

If your workload genuinely needs more than the default rate (and the credit pool to back it), contact us. Limit increases are tied to plan, not pay-as-you-go.

> **Rate limit vs. credits** — Running out of credits returns `402 credits_exhausted` — your workspace can't make any billable call, regardless of rate. Running into the rate limit returns `429 rate_limited` — your workspace is making calls faster than allowed, even if the credit balance is healthy. Different problems, different remedies.


---

# Workspaces

Source: https://www.gyrence.com/docs/concepts/workspaces.md

A **workspace** is the unit of ownership in Gyrence. Every API key, every credit balance, and every usage event belongs to exactly one workspace. When you call the API, the key you present is what tells us which workspace to bill and which activity log to write to.

Workspaces are how teams share access without sharing credentials, and how a single person separates unrelated projects — staging vs. production, client A vs. client B — into clean, independently-metered ledgers.

## What a workspace owns

- **API keys.** Minted in the console, scoped to the workspace, revocable at any time. A key never moves between workspaces.
- **Credit balance.** Fetches, searches, gyres, maps, and extracts draw down the workspace's credits. Top-ups apply to one workspace.
- **Usage events.** Every billable call is recorded against the workspace, viewable on the Activity and Usage pages.
- **Members.** Humans who can sign in to the console and see the workspace's keys, usage, and settings.

What a workspace does **not** own: the data you fetch. Gyrence does not retain fetched page content beyond the response.

## Members and roles

Roles control what a member can do **inside the console**. API keys are not role-scoped — any valid key gives full API access to the workspace it belongs to.

| Role | Can do |
| --- | --- |
| `owner` | Everything `admin` can, plus delete the workspace and transfer ownership. Exactly one per workspace. |
| `admin` | Mint and revoke keys, invite members, change settings, view usage. |
| `member` | View usage and activity. Cannot mint keys or invite. |

## Switching workspaces

The workspace switcher in the top-right of the console scopes everything below it — keys, usage, activity, settings — to the selected workspace. The switcher is the only place workspace context is set; there is no per-page override.

A signed-in user always has at least one workspace. Personal workspaces are created on first sign-in; team workspaces are created from **Settings → Workspaces → New**.

## Lifecycle

| State | Meaning |
| --- | --- |
| Active | Keys work, usage accrues, members can sign in. |
| Decommissioned | All keys revoked. History (usage events, member list) is retained. The workspace cannot be re-activated — create a new one. |

Workspaces are **never deleted** in production. To wind one down, revoke every key and rename it (e.g. `acme-prod` → `acme-prod-decommissioned-2026-05`). The historical ledger stays intact for audit and reconciliation.

> **Why no delete?** — Usage events are financial records. Deleting a workspace would orphan the events tied to it and break reconciliation against invoices. Revoke-and-rename gives you the same operational outcome without breaking the ledger.

## Common patterns

- **One workspace per environment.** `acme-dev`, `acme-staging`, `acme-prod` — separate keys, separate credit pools, separate dashboards. The cleanest way to keep staging traffic out of production billing.
- **One workspace per client.** Agencies and consultancies give each client their own workspace so usage and costs map 1:1 to invoices.
- **A shared "ops" workspace.** For internal monitoring, scraping, or research that isn't tied to a customer-facing product.


---

# API keys

Source: https://www.gyrence.com/docs/concepts/api-keys.md

An **API key** is what tells Gyrence which [workspace](/docs/concepts/workspaces) a request belongs to. Every call to `/api/v1/*` carries a key in the `Authorization` header, and that key — not the caller's identity — is what we bill, log, and rate-limit against.

Keys are minted in the console, scoped to one workspace, and revocable at any time.

## Anatomy of a key

```
mc_VHJ5dGhpc2lzbm90YXJlYWxrZXk
└┬┘ └─────────────┬─────────────┘
 │                │
 │                └── 32 chars of url-safe random (24 bytes, base64url)
 └─── fixed prefix
```

Every key starts with `mc_` followed by 32 random URL-safe characters. The plaintext is shown **exactly once**, at the moment of minting. We store only a SHA-256 hash plus the first 8 characters of the random part (the **key prefix**, e.g. `VHJ5dGhp…`) so you can identify a key in the console without ever exposing the full value.

If you lose a key, you can't recover it — revoke it and mint a new one.
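
The anatomy above lends itself to a couple of small client-side helpers, for example to catch an obviously wrong value before sending a request or to match a key against the prefix shown in the console. This is only a sketch: the function names are hypothetical, and the character-set check is an assumption, since the server-side validation is documented only as far as the `mc_` prefix.

```ts
// Sketch only: server-side validation is documented only for the `mc_` prefix,
// so the character-set check below is an assumption.
const KEY_PREFIX = "mc_";

function looksLikeGyrenceKey(key: string): boolean {
  // `mc_` followed by URL-safe base64 characters (A-Z, a-z, 0-9, "-", "_").
  return /^mc_[A-Za-z0-9_-]+$/.test(key);
}

// First 8 characters of the random part: the "key prefix" shown in the console.
function displayPrefix(key: string): string {
  return key.slice(KEY_PREFIX.length, KEY_PREFIX.length + 8);
}

// Unpadded base64url length for a given number of random bytes (24 bytes -> 32 chars).
function base64UrlLength(bytes: number): number {
  return Math.ceil((bytes * 4) / 3);
}
```

Only the display prefix should ever be logged or shown; the full key stays in an environment variable or secret manager.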

## Sending a key

All endpoints take the key as a bearer token in the `Authorization` header:

```bash
curl https://www.gyrence.com/api/v1/fetch \
  -H "Authorization: Bearer mc_VHJ5dGhpc2lzbm90YXJlYWxrZXk" \
  -H "Content-Type: application/json" \
  -d '{ "url": "https://example.com" }'
```

A missing, malformed, or revoked key returns `401 unauthorized`. The `mc_` prefix is required — keys without it are rejected before the hash lookup.

The same key authenticates the [MCP endpoint](/docs/concepts/mcp) at `POST /api/mcp` — header-only, never in the URL path. No separate "MCP key" type exists; an HTTP key works over MCP and vice versa.

> **Never put keys in client code** — Keys grant full API access to a workspace and draw down its credit balance. Treat them like a database password: server-side only, in environment variables or a secret manager, never shipped to a browser or mobile app.

## Minting and revoking

Keys are managed at **Console → API keys**, scoped to the currently-selected workspace.

- **Mint.** Give the key a human name (`prod-backend`, `ci-runner`, `intern-laptop`). The plaintext appears in a one-time reveal — copy it before closing the dialog.
- **Revoke.** Revocation is immediate. Subsequent requests with that key get `401 unauthorized`. Revoked keys stay in the list (greyed out) so the historical audit trail in Usage and Activity remains intelligible.

A workspace can hold as many keys as you like; there is no per-key credit cap (all keys on a workspace share the workspace's credit pool).

## Rotation

There's no built-in expiry, but rotating periodically is good hygiene — especially after a teammate leaves, a deploy environment is decommissioned, or a key may have been exposed in a log or screenshot.

The zero-downtime rotation pattern:

1. Mint a new key alongside the old one.
2. Roll the new value out to whatever uses the old key (deploy, CI secret update, etc.).
3. Once traffic on the old key drops to zero (check **Usage** filtered by key prefix), revoke it.

## One key per use, not one key per person

Keys identify a **caller**, not a human. The clean pattern is:

- One key per deployed environment (`prod`, `staging`, `dev`).
- One key per automation (`ci`, `nightly-batch`, `webhook-handler`).
- A short-lived key per developer laptop during local work — revoked when no longer needed.

This makes the Usage page legible (you can see *which* surface is burning credits), and it limits blast radius when a single key is exposed.

## What a key does **not** grant

- **Console access.** Signing in is via your account — keys don't authenticate humans into the dashboard.
- **Cross-workspace access.** A key is bound to exactly one workspace and cannot be promoted, transferred, or scoped to a different one. Mint a new key on the target workspace instead.
- **Selective endpoint access.** Today, any valid key can call any `/api/v1/*` endpoint on its workspace. Per-key scopes are on the roadmap, not shipped.

## Common patterns

- **Per-environment keys with matching workspaces.** Pair `acme-prod` workspace + `prod-backend` key, `acme-staging` workspace + `staging-backend` key. Credits, usage, and blast radius all stay separate.
- **Short-lived dev keys.** Mint a `dev-laptop-` key for local exploration; revoke it the same day. Cheaper than building a key-vault around a long-lived shared key.
- **One key per integration.** If you call Gyrence from three services, give each its own key. When one starts misbehaving, you can revoke without disrupting the others.


---

# Credits

Source: https://www.gyrence.com/docs/concepts/credits.md

**Credits** are how Gyrence meters usage. Every successful API call draws down the calling [workspace](/docs/concepts/workspaces)'s credit balance; every failure that didn't do real work is free. The schedule is flat, transparent, and tied to the actual cost of executing the call — not to which plan you're on or how recently you signed up.

## The schedule

| Endpoint | Cost |
| --- | --- |
| `POST /api/v1/fetch` | **1** (HTTP tier) or **3** (browser tier) |
| `POST /api/v1/search` | **2** base; plus per-result fetch cost if `fetch: true` |
| `POST /api/v1/gyre` | sum of per-page costs across `pages[]`, minimum **1** |
| `POST /api/v1/map` | **1** flat (regardless of `limit`) |
| `POST /api/v1/extract` | **5** (HTTP tier) or **7** (browser tier) |
| `GET /api/v1/health` | **0** |
| `POST /api/mcp` (any tool) | identical to the underlying primitive — see [MCP](/docs/concepts/mcp) |

There is no per-request surcharge, no monthly minimum, and no rounding-up of partial work.

## HTTP tier vs. browser tier

Fetch, gyre, and extract all use the same two-tier fetcher:

- **HTTP tier** is a plain HTTP GET. Cheap and fast — works for the majority of pages on the open web.
- **Browser tier** runs the page in a real headless browser with JavaScript executed. Used when the HTTP tier returns a soft block (403/429/503), when the response is suspiciously small, or when the page is a JS shell with no rendered content. The endpoint escalates automatically; you don't pick a tier per call.

You can force the browser tier by passing `forceBrowser: true` on `/fetch` and `/extract` — useful when you already know the target needs JS and want to skip the cheap-tier attempt.

## What costs what, exactly

**Fetch.** One credit if the HTTP tier returns a usable response; three credits if the request escalated (or was forced) to the browser tier. Forced browser-tier always bills 3.

**Search.** Two credits for the search itself (Brave Search is our dominant variable cost at ~$0.005/call, so the base is 2 to keep unit economics intact). If you set `fetch: true`, each fetched result bills at its own tier (so a 5-result search where 2 results needed the browser is `2 + 1 + 1 + 1 + 3 + 3 = 11`).

**Gyre.** Per-page billing across the response's `pages[]` array: 1 per HTTP-tier page, 3 per browser-tier page, with a 1-credit minimum even if zero pages succeed. Pages that errored or were blocked land in `errors[]` and are **free** — they don't count against the budget or the bill.

**Map.** A flat 1 credit regardless of how many URLs come back. Map prefers the sitemap and doesn't fetch page content, so it stays cheap.

**Extract.** Higher base cost than fetch because every successful call also runs a model pass against the page. 5 for HTTP-tier, 7 for browser-tier.

**Health.** Free, always. Use it for liveness probes and key validation without burning credits.
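
The schedule above can be written down as a handful of lookups. This is a minimal sketch for client-side bookkeeping, not Gyrence's metering code; the function and constant names are illustrative assumptions, and only the credit values come from this page.

```python
# Credit values are taken from the schedule above; all names are illustrative.
TIER_COST = {"http": 1, "browser": 3}
EXTRACT_COST = {"http": 5, "browser": 7}
SEARCH_BASE = 2


def fetch_cost(via: str) -> int:
    """Cost of one successful /fetch, keyed by the response's `via`."""
    return TIER_COST[via]


def search_cost(sub_fetch_vias: list[str]) -> int:
    """2 base credits plus each successful inline fetch at its own tier."""
    return SEARCH_BASE + sum(TIER_COST[v] for v in sub_fetch_vias)


def gyre_cost(page_vias: list[str]) -> int:
    """Sum over pages[], with the 1-credit floor when nothing succeeded."""
    return max(1, sum(TIER_COST[v] for v in page_vias))


def extract_cost(via: str) -> int:
    """Cost of one successful /extract."""
    return EXTRACT_COST[via]


# The 5-result search from the text: three HTTP results, two browser results.
print(search_cost(["http", "http", "http", "browser", "browser"]))  # 11
```

Passing an empty list to `search_cost` gives the flat 2 credits of a search without `fetch: true`.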

## What's never charged

- **Auth failures** (`401`) — the request never reached the primitive.
- **SSRF blocks** (`403 forbidden_url`) — refused before any fetch.
- **Bad input** (`400 bad_request`) — schema validation rejection.
- **Timeouts** (`408`) — the 25-second hard deadline tripped before useful work completed.
- **Upstream errors** (`502 upstream_error`) — origin returned a 5xx; we didn't get the content you asked for.
- **Rate limits** (`429`) — Gyrence-side throttling; you didn't get anything, so you don't pay.

The rule is: **if the response doesn't include the data you asked for, you weren't billed**. The one exception is the gyre 1-credit floor, which exists so a request that returns zero pages still records as a metered call.

## Where credits live

Credits are a single pool on each workspace. Every key on the workspace draws from the same pool; there is no per-key allocation. Top-ups, plan grants, and refunds all apply at the workspace level.

You can watch the balance — and the events drawing it down — at **Console → Usage** (totals and trends) and **Console → Activity** (event-by-event log with cost and tier).

> **402 vs. 0 credits** — Running out of credits returns `402 credits_exhausted`. Internal-plan workspaces never return `402` — they're still metered for visibility but not gated on balance.

## Estimating spend

A few sanity-check numbers for back-of-the-napkin estimates:

- **A docs gyre with 50 pages, all HTTP tier:** 50 credits.
- **A search with 10 results and `fetch: true`, mixed tiers (~30% browser):** roughly `2 + (7×1) + (3×3) = 18` credits.
- **An extract over a JS-heavy product page (browser tier):** 7 credits.
- **A liveness check from a cron job, once a minute:** 0 credits (use `/health`).
- **The same calls over MCP:** identical to the rows above. Transport doesn't change cost.
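
For planning purposes, the same arithmetic can be parameterized by an expected browser-tier share. The sketch below is a rough estimator under that assumption; the share is a guess you supply, the function names are hypothetical, and the actual bill depends on which pages really escalate.

```python
def estimate_search(results: int, browser_share: float) -> int:
    """Estimated credits for a search with fetch: true."""
    browser = round(results * browser_share)  # results expected to need the browser
    http = results - browser
    return 2 + http * 1 + browser * 3


def estimate_gyre(pages: int, browser_share: float = 0.0) -> int:
    """Estimated credits for a gyre that returns `pages` successful pages."""
    browser = round(pages * browser_share)
    return max(1, (pages - browser) * 1 + browser * 3)


print(estimate_search(10, 0.3))  # 18
print(estimate_gyre(50))         # 50
```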

## Common patterns

- **Use `/health` for warm-up and validation.** Free. Confirms the key is alive and the platform is reachable before you start spending.
- **Map first, then fetch selectively.** When you only need a handful of pages from a large site, run `/map` (1 credit) to find the URLs, then `/fetch` the specific ones you want. Cheaper than a wide gyre.
- **Inline `fetch: true` on search when you'll fetch every result anyway.** One round-trip, same total credits, half the latency.
- **Skip `forceBrowser` until you know you need it.** The auto-escalation only charges browser-tier when it actually runs the browser. Forcing it gives up the cheap-tier savings.


---

# Gyring

Source: https://www.gyrence.com/docs/concepts/gyring.md

**Gyre** (verb): to follow a site's link graph outward from a seed URL, capturing each page's content as you go. Named after the spiraling outward motion — you start at one point and wind through the connected web around it.

In the Gyrence pipeline, gyring is the **multi-page traversal primitive**, distinct from fetching a single URL or mapping a site's link surface.

## When to gyre

Use a gyre when you need the **content** of multiple connected pages, in one call, without orchestrating queue + fetch yourself.

Reach for gyre when:

- You have a seed URL and want the surrounding cluster of pages (e.g. a documentation root, a product category page, a press-release index).
- You want each visited page returned with its title, description, and markdown — not just a list of URLs.
- The site doesn't expose a clean sitemap, or its sitemap is too coarse for what you need.

Reach for a **different primitive** when:

| You want… | Use |
| --- | --- |
| One specific page's content | [Fetch](/docs/api/fetch) |
| The full URL surface of a site, no content | [Map](/docs/api/map) |
| Pages matching a query across the web | [Search](/docs/api/search) |
| Structured JSON pulled from page content | [Extract](/docs/api/extract) |

## How a gyre walks

A gyre is a breadth-first walk bounded by a page budget:

1. The seed URL is fetched.
2. Its outbound links are enqueued (filtered by `sameDomain` if set).
3. The next URL in the queue is fetched; its links are enqueued.
4. Steps 2–3 repeat until `maxPages` successful fetches have happened, or the queue is exhausted.

Each page goes through the same two-tier fetcher as the `/fetch` endpoint: plain HTTP first, escalating to the headless-browser worker when the page needs JS, returns a soft block, or rate-limits.
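
The walk above can be sketched as a queue-driven loop. This is an illustration of the described behavior, not the service's implementation: `fetch_links` is a hypothetical callback standing in for the two-tier fetcher (it returns a page's outbound links, or `None` on failure), and the `sameDomain` filter is left out for brevity.

```python
from collections import deque


def gyre_walk(seed, fetch_links, max_pages):
    """Breadth-first walk that stops after `max_pages` successful fetches."""
    queue = deque([seed])
    seen = {seed}            # URLs are tracked verbatim, as the docs describe
    pages, errors = [], []
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        links = fetch_links(url)
        if links is None:    # failed fetch: recorded, but does not use budget
            errors.append(url)
            continue
        pages.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages, errors


# Toy link graph; "c" is missing, so fetching it fails.
graph = {"a": ["b", "c"], "b": ["d", "a"], "d": []}
print(gyre_walk("a", graph.get, 3))  # (['a', 'b', 'd'], ['c'])
```

Because the queue is first-in, first-out, every link found on the seed is tried before any link found one hop further out.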

## Budget and billing

`maxPages` caps **successes**, not attempts. A page that 404s or trips the block-page detector lands in `errors[]` and does not count against the budget.

Billing is per-page: 1 credit for HTTP-tier pages, 3 for browser-tier. The total cost is the sum across the response's `pages[]`, with a minimum of 1.

## Scope: `sameDomain`

By default a gyre stays on the seed's exact hostname. `blog.example.com` is **not** treated as same-domain as `example.com` — subdomains are out of scope. Set `sameDomain: false` to follow links anywhere on the public web (still subject to SSRF rules).
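
A minimal sketch of what an exact-hostname check looks like, using Python's standard URL parser. The helper name is hypothetical, and details such as case handling or ports are assumptions; the docs only state that the hostname must match exactly.

```python
from urllib.parse import urlsplit


def same_host(seed: str, link: str) -> bool:
    """True only when both URLs have exactly the same hostname."""
    return urlsplit(seed).hostname == urlsplit(link).hostname


print(same_host("https://example.com/docs", "https://example.com/about"))  # True
print(same_host("https://example.com", "https://blog.example.com/post"))   # False
```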

If your goal is "find every URL on this site, including subdomains," gyre is the wrong tool — use [Map](/docs/api/map), which prefers sitemaps and is built for surface discovery rather than content capture.

## Common patterns

- **Mirror a small docs site.** Seed the docs root, set `maxPages: 50`, `sameDomain: true`. You get a markdown corpus you can feed straight to an LLM or your own RAG index.
- **Capture a press-release cluster.** Seed a `/newsroom` or `/press` page; the gyre fans out to the linked releases.
- **Snapshot a product category.** Seed a category index; capture the linked product detail pages in one call.

## Limits

- `maxPages` range: 1–50 per request.
- 25-second hard deadline on the whole request.
- No `maxDepth` parameter today — depth is bounded indirectly by `maxPages` and BFS order.
- Visited URLs are tracked verbatim. `?ref=x` and `?ref=y` count as distinct pages.
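
Since query-string variants are visited as distinct pages, a caller may want to collapse them after the fact. The sketch below is one client-side approach, assuming the query string does not change the content you care about; the helper names are hypothetical, and the credits for each variant have already been spent by the time you deduplicate.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Drop the query string and fragment, keeping scheme, host, and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def collapse_variants(pages: list[dict]) -> list[dict]:
    """Keep the first page seen for each query-less URL."""
    kept = {}
    for page in pages:
        kept.setdefault(strip_query(page["url"]), page)
    return list(kept.values())


pages = [
    {"url": "https://example.com/post?ref=x", "title": "Post"},
    {"url": "https://example.com/post?ref=y", "title": "Post"},
    {"url": "https://example.com/about", "title": "About"},
]
print(len(collapse_variants(pages)))  # 2
```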

> **Try it** — Run a gyre from the console at [/app/gyre](/app/gyre) — pick a seed, set a budget, watch pages stream in.


---

# MCP integration

Source: https://www.gyrence.com/docs/concepts/mcp.md

Gyrence exposes all five primitives over the **Model Context Protocol** (MCP) at a single hosted endpoint. If your agent runtime speaks MCP — Claude Desktop, Cursor, Continue, an OpenAI Responses-API client, your own SDK — you can drop Gyrence in without writing HTTP plumbing.

Everything you read here is *additive* to the HTTP API. Same workspace, same API key, same credits, same activity log. Picking MCP vs. HTTP is a transport choice, not an account choice.

## What is MCP?

The **Model Context Protocol** is an open spec for connecting LLM-driven agents to external tools and data sources. An MCP server advertises a set of *tools* (typed JSON-RPC methods); a *client* (your agent runtime) calls them on the agent's behalf.

Gyrence's MCP server advertises five tools — one per primitive — and the client picks which to call based on the agent's reasoning. The wire protocol is JSON-RPC 2.0, transported over HTTP with optional Server-Sent Events for streaming responses.

You do not need to know any of this to use Gyrence via MCP. You configure your client once with the endpoint and an API key; the rest is the agent doing what agents do.

## The endpoint

```
POST https://www.gyrence.com/api/mcp
Authorization: Bearer mc_
Content-Type: application/json
```

That's the entire surface — one URL, one bearer token, JSON-RPC bodies in and JSON or SSE out. There are no per-tool subpaths, no session cookies, no separate handshake.

`GET` and `DELETE` are rejected with HTTP `405`. The MCP standalone-SSE channel is **not** offered today.
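
As a sketch of that surface, the snippet below builds (but does not send) a JSON-RPC `tools/list` request with Python's standard library. The helper name and the placeholder key are assumptions; the URL, method, and headers are the ones listed above.

```python
import json
import urllib.request

MCP_URL = "https://www.gyrence.com/api/mcp"


def build_mcp_request(api_key, method, params=None, request_id=1):
    """Build a POST request carrying one JSON-RPC 2.0 call."""
    payload = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        payload["params"] = params
    return urllib.request.Request(
        MCP_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_mcp_request("mc_your_key_here", "tools/list")
print(req.get_method(), req.full_url)
# Passing `req` to urllib.request.urlopen would send it; that step is omitted here.
```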

> **Why header auth, not URL-path auth?** — Putting the key in the path (e.g. /api/mcp/mc_…) leaks it into platform access logs, browser history, and any proxy in front of Gyrence. Header auth keeps the secret out of every URL-shaped log surface. If you're migrating from an earlier MCP integration that used path-based keys, just move the key into the Authorization header.

## The tools

The MCP server exposes exactly the five primitives, named with the `gyrence_` prefix so they don't collide with other servers your client might be wired to:

| MCP tool | Underlying endpoint | When the agent should pick it |
| --- | --- | --- |
| `gyrence_search` | `POST /api/v1/search` | The agent doesn't have a URL yet — needs to *find* pages by query. |
| `gyrence_fetch` | `POST /api/v1/fetch` | The agent has one known URL and needs the content. |
| `gyrence_gyre` | `POST /api/v1/gyre` | The agent wants the cluster around a seed URL — docs, releases, a category. |
| `gyrence_extract` | `POST /api/v1/extract` | The agent needs typed fields out of a page, not raw markdown. |
| `gyrence_map` | `POST /api/v1/map` | The agent needs to know *what URLs exist* on a site without fetching them. |

Tool descriptions (visible to the model during selection) call out credit cost and link back to the live credit schedule at [`/docs/concepts/credits`](/docs/concepts/credits). The schemas advertised over MCP are the **same Zod objects** that validate the HTTP requests — `searchHandler.schema`, `fetchHandler.schema`, and so on. There is no possibility of drift between the two transports.
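
Because the MCP schemas mirror the HTTP ones, a `tools/call` payload is the HTTP request body wrapped in a JSON-RPC envelope. A minimal sketch follows; the helper name is hypothetical, and the tool-to-endpoint mapping is copied from the table above.

```python
import json

# Tool names and endpoints from the table above.
TOOL_ENDPOINTS = {
    "gyrence_search": "/api/v1/search",
    "gyrence_fetch": "/api/v1/fetch",
    "gyrence_gyre": "/api/v1/gyre",
    "gyrence_extract": "/api/v1/extract",
    "gyrence_map": "/api/v1/map",
}


def tool_call(name: str, arguments: dict, request_id: int = 1) -> dict:
    """Wrap an HTTP-style request body in an MCP tools/call message."""
    if name not in TOOL_ENDPOINTS:
        raise ValueError(f"unknown tool: {name}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


payload = tool_call("gyrence_fetch", {"url": "https://example.com/article"})
print(json.dumps(payload, indent=2))
```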

## Billing parity

This is the central guarantee, and it's worth stating plainly: **a call via MCP costs exactly the same as the equivalent HTTP call, and produces an identical row in your `usage_events`**.

This guarantee doesn't rest on a careful audit; it follows from how the code is structured. Each MCP tool's `execute` synthesizes an internal `Request` and calls the **same `handle()` function** the HTTP route delegates to. That function is the only place credit metering, status mapping, and the `recordUsage` write happen. There is no separate MCP code path that could diverge.

Concretely, after an MCP `tools/call` for `gyrence_gyre`, you'll see a `usage_events` row with:

- `endpoint = "gyre"` (the primitive, not "mcp")
- `credits_used`, `via`, `status_code`, `status`, `latency_ms` — all identical in shape to an HTTP call
- `url` — the seed URL passed in the tool arguments

The Activity page in the console shows MCP and HTTP traffic in the same table. There is no MCP vs. HTTP filter because, at the billing/audit layer, there's nothing to filter — it's the same event.

## Authentication and concurrency

Two implementation choices worth knowing about:

- **Header auth, parsed up-front.** Every request starts with `parseBearer(request)` + `authenticateMcpKey(key)`. A missing or invalid key returns JSON-RPC error `-32001` ("Unauthorized") with HTTP `401`. Failed auth is never billed.
- **Per-request MCP server.** Gyrence constructs a *fresh* `McpServer` instance per request, with the parsed key captured in each tool's closure. Two concurrent calls authenticated with different keys cannot share server state. This was verified end-to-end: six interleaved requests across two workspaces produced zero cross-attribution.

In practice this means MCP behaves exactly like HTTP for concurrency: per-workspace rate limits and the 25-second hard deadline apply identically.
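
The per-request pattern can be illustrated in a few lines. This is a language-neutral sketch written in Python, not Gyrence's server code: `make_tools`, `handle_request`, and the in-memory `usage_log` are all hypothetical stand-ins that show why a key captured in a closure cannot leak between requests.

```python
def make_tools(api_key: str, usage_log: list) -> dict:
    """Build a fresh tool table whose handlers close over one key."""
    def fetch_tool(url: str) -> dict:
        # The key comes from the enclosing scope, not from shared state.
        usage_log.append((api_key, "fetch", url))
        return {"ok": True, "url": url}

    return {"gyrence_fetch": fetch_tool}


def handle_request(api_key: str, tool: str, url: str, usage_log: list) -> dict:
    tools = make_tools(api_key, usage_log)  # new table for every request
    return tools[tool](url)


log = []
handle_request("mc_key_a", "gyrence_fetch", "https://a.example", log)
handle_request("mc_key_b", "gyrence_fetch", "https://b.example", log)
print(log)
```

Each usage record carries the key of the request that produced it, regardless of how the calls interleave.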

## Configuring a client

The exact configuration shape varies by client, but the inputs are always the same: the URL and the bearer token.

### Claude Desktop

Add an entry to `claude_desktop_config.json` under `mcpServers`:

```json
{
  "mcpServers": {
    "gyrence": {
      "transport": {
        "type": "http",
        "url": "https://www.gyrence.com/api/mcp",
        "headers": {
          "Authorization": "Bearer mc_your_key_here"
        }
      }
    }
  }
}
```

Restart Claude Desktop. The five `gyrence_*` tools appear in the tool picker; Claude will reach for them when a prompt implies web data work.

### Cursor

Cursor's MCP config (`~/.cursor/mcp.json` or the in-app settings) takes the same shape:

```json
{
  "mcpServers": {
    "gyrence": {
      "url": "https://www.gyrence.com/api/mcp",
      "headers": {
        "Authorization": "Bearer mc_your_key_here"
      }
    }
  }
}
```

### Anything else

If your client speaks MCP over HTTP, you need exactly:

- **URL:** `https://www.gyrence.com/api/mcp`
- **Method:** `POST`
- **Headers:** `Authorization: Bearer mc_…`, `Content-Type: application/json`

The body is whatever JSON-RPC payload your client emits. The server speaks the standard MCP methods (`initialize`, `tools/list`, `tools/call`, `notifications/initialized`).

For a minimal raw-curl `tools/call` example, see [API → MCP](/docs/api/mcp).

## When to pick MCP vs. HTTP

Use MCP when:

- The consuming surface is an agent runtime (Claude Desktop, Cursor, an MCP-aware framework). The integration is one config block instead of a custom tool wrapper per primitive.
- You want the agent to pick *which* primitive to call. The tool descriptions are written to guide selection: `gyrence_search` when there's no URL, `gyrence_fetch` for one known URL, etc.
- You want zero code on your side to map primitives to tools.

Use HTTP when:

- You're building a non-agent pipeline (a batch job, a webhook handler, a cron). Plain HTTP is simpler than wiring an MCP client.
- You need to drive the call deterministically from your own code (e.g. "always fetch this URL when X happens"). MCP adds a layer of model-driven dispatch you don't want.
- You're shipping an SDK — the HTTP API is the stable surface SDKs are generated from (`/openapi.json`).

Most teams end up using both: HTTP for the deterministic backend pipelines, MCP for the agent surfaces.

## What MCP does not add

A short list of "it's HTTP under the hood, so this is the same":

- **No separate plan, no separate pricing.** MCP traffic counts against the same monthly credit pool as HTTP.
- **No separate rate limits.** Per-workspace limits and the 25-second deadline are shared across transports.
- **No richer auth model.** A key that works over HTTP works over MCP, and vice versa. Per-key scopes are not shipped today.
- **No new primitives.** If a primitive isn't on the HTTP API, it isn't on MCP either. The two are mirror images.

> **Try it without an agent** — The fastest sanity check is a single curl against tools/list with your key in the header — you'll get back the five tool definitions. See the example at [API → MCP](/docs/api/mcp).


---

# Search

Source: https://www.gyrence.com/docs/api/search.md

Run a web search via Brave and optionally fetch each result page as markdown in the same call. The discovery primitive in the Gyrence pipeline — **Search** finds, Fetch reads one, Gyre walks many.

## Use this when

- You need discovery → cleaned markdown in a single call (RAG, agent tool, research pipeline).
- You're running site-scoped queries (`site:sec.gov 8-K`, `site:fda.gov recall`) and want the bodies, not just the SERP.
- You want recency control beyond Brave's defaults (`tbs: "y2"`, `"y3"`, `"y5"`).
- You need per-URL credit attribution — `results[].fetch.via` tells you which result paid HTTP vs browser tier.

**Endpoint:** `POST /api/v1/search`  
**Auth:** Bearer  
**Credits:** 2 base + (1 per http sub-fetch, 3 per browser sub-fetch) when fetch: true

## Request

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `query` | `string` | yes |  | Search query. Min length 1. |
| `limit` | `number` |  | `10` | Max results. Range 1..20. |
| `fetch` | `boolean` |  | `false` | Fetch each result URL inline and attach a fetch field to each result. |
| `fetchFormats` | `("markdown" | "links" | "html")[]` |  | `["markdown"]` | Which fields to populate on each inline fetch. html is accepted but not currently emitted. |
| `tbs` | `"d" | "w" | "m" | "y" | "y2" | "y3" | "y5"` |  |  | Time window: past day / week / month / 1–5 years. y2/y3/y5 post-filter Brave's 1-year cap. |

### Example body

```json
{
  "query": "site:sec.gov 8-K corporate action",
  "limit": 10,
  "fetch": true,
  "fetchFormats": ["markdown"],
  "tbs": "m"
}
```
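
A client can check the documented constraints before spending a round-trip. The sketch below builds a request body with the ranges and enum values from the table above; the function name is hypothetical, and the server's own validation remains authoritative.

```python
ALLOWED_TBS = {"d", "w", "m", "y", "y2", "y3", "y5"}
ALLOWED_FORMATS = {"markdown", "links", "html"}


def build_search_body(query, limit=10, fetch=False,
                      fetch_formats=("markdown",), tbs=None) -> dict:
    """Return a /search body, raising ValueError on out-of-range input."""
    if len(query) < 1:
        raise ValueError("query must be at least 1 character")
    if not 1 <= limit <= 20:
        raise ValueError("limit must be in 1..20")
    unknown = set(fetch_formats) - ALLOWED_FORMATS
    if unknown:
        raise ValueError(f"unknown fetchFormats: {sorted(unknown)}")
    if tbs is not None and tbs not in ALLOWED_TBS:
        raise ValueError(f"unknown tbs: {tbs}")
    body = {
        "query": query,
        "limit": limit,
        "fetch": fetch,
        "fetchFormats": list(fetch_formats),
    }
    if tbs is not None:
        body["tbs"] = tbs  # omitted entirely when no time window is wanted
    return body


print(build_search_body("site:sec.gov 8-K corporate action", fetch=True, tbs="m"))
```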

## Response

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `query` | `string` |  |  | Echo of the input query. |
| `totalResults` | `number` |  |  | Count of returned results. |
| `via` | `"http" | "browser" | "n/a"` |  |  | Dominant sub-fetch tier across results, or the literal "n/a" when fetch: false. |
| `results[]` | `SearchResult[]` |  |  | Ordered results, rank 1..N. |
| `results[].rank` | `number` |  |  | 1-based rank after filtering. |
| `results[].title` | `string` |  |  | Page title from Brave. |
| `results[].url` | `string` |  |  | Result URL. |
| `results[].snippet` | `string` |  |  | Brave-provided snippet. |
| `results[].description` | `string` |  |  | Alias of snippet. |
| `results[].age` | `string?` |  |  | Relative age (e.g. "3 months ago") when Brave provides it. |
| `results[].page_age` | `string?` |  |  | ISO timestamp when Brave provides it. |
| `results[].fetch` | `object?` |  |  | Present only when fetch: true. See below. |

### `results[].fetch` on success

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `ok` | `true` |  |  |  |
| `markdown` | `string` |  |  | Empty string when "markdown" not in fetchFormats. |
| `links` | `string[]` |  |  | Empty array when "links" not in fetchFormats. |
| `statusCode` | `number` |  |  | Origin HTTP status. |
| `via` | `"http" | "browser"` |  |  | Tier that produced this sub-fetch. Drives credit cost (1 vs 3). |

### `results[].fetch` on failure

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `ok` | `false` |  |  |  |
| `error` | `string` |  |  | Failure reason (timeout, SSRF block, upstream error). |
| `via` | `"http" | "browser"` |  |  | Always "http" on failure. |

### Example response

```json
{
  "ok": true,
  "data": {
    "query": "site:sec.gov 8-K corporate action",
    "totalResults": 1,
    "via": "http",
    "results": [
      {
        "rank": 1,
        "title": "Form 8-K — Example Corp",
        "url": "https://www.sec.gov/...",
        "snippet": "Item 8.01 Other Events...",
        "description": "Item 8.01 Other Events...",
        "page_age": "2026-05-12T14:00:00Z",
        "fetch": {
          "ok": true,
          "markdown": "# Form 8-K\n\n...",
          "links": [],
          "statusCode": 200,
          "via": "http"
        }
      }
    ]
  }
}
```

## Example

```bash
curl -X POST https://www.gyrence.com/api/v1/search \
  -H "Authorization: Bearer $GYRENCE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query":"site:sec.gov 8-K","limit":10,"fetch":true,"tbs":"m"}'
```

## Errors

| Code | HTTP | Meaning |
| --- | --- | --- |
| `bad_request` | 400 | `query` missing/empty, or any field fails validation. |
| `unauthorized` | 401 | Missing, malformed, or revoked `Authorization` header. |
| `credits_exhausted` | 402 | Workspace balance below request cost. |
| `timeout` | 408 | Request exceeded the 25-second hard deadline. |
| `rate_limited` | 429 | Per-workspace rate limit, or Brave returned 429. |
| `upstream_error` | 502 | Brave returned 5xx or unparseable response. |

```json
{ "ok": false, "error": "brave returned 503", "code": "upstream_error" }
```

> **Credits** — Total = **2 base + Σ sub-fetches**, where each sub-fetch is **1** (HTTP tier) or **3** (browser tier). With `fetch: false`, it's a flat **2**. Read `results[].fetch.via` to attribute cost per URL. Failed sub-fetches are not charged.
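
To attribute cost from an actual response, walk `results[]` and count only the sub-fetches that succeeded. A minimal sketch, assuming `data` is the parsed `data` object of a search response; the function name is hypothetical.

```python
def search_credits(data: dict) -> int:
    """2 base credits plus 1 or 3 for each successful inline fetch."""
    total = 2
    for result in data.get("results", []):
        sub = result.get("fetch")      # absent when fetch: false
        if sub and sub.get("ok"):      # failed sub-fetches are free
            total += {"http": 1, "browser": 3}[sub["via"]]
    return total


data = {
    "results": [
        {"url": "https://a.example", "fetch": {"ok": True, "via": "http"}},
        {"url": "https://b.example", "fetch": {"ok": True, "via": "browser"}},
        {"url": "https://c.example", "fetch": {"ok": False, "error": "timeout", "via": "http"}},
    ]
}
print(search_credits(data))  # 6
```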

> **Coverage & known limits**
>
> - **Max 20 results per call.** For deeper coverage, paginate by refining `query` or `tbs`.
> - **Sub-fetches are capped at 10s each** (independent of the 25s request deadline). Slow origins surface as `{ ok: false, error: "timeout" }` on that result; the envelope still succeeds.
> - **SSRF sub-fetches** to private / loopback / link-local hosts return `{ ok: false }` per result — they don't fail the whole call.
> - **`tbs ≥ y2`** uses Brave's `freshness=py` and post-filters by parsed date. Results without a parseable `page_age` / `age` are kept (not dropped).
> - **`html` in `fetchFormats`** is accepted today but not emitted. Use `/fetch` directly when you need the cleaned HTML field.

## Notes

- **Inline-fetch concurrency.** Up to 5 sub-fetches run in parallel; each capped at 10s independent of the 25s request deadline.
- **Result ordering.** Results are returned in Brave's relevance order, then filtered (e.g. by `tbs`), then re-numbered with `rank: 1..N`.
- **Per-URL fetcher.** Each sub-fetch uses the same two-tier pipeline as `/fetch` (HTTP → browser escalation, SEC.gov fast path, block-page detection).
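
The concurrency rule in the first note (at most 5 in flight, 10 seconds each) maps onto a semaphore plus a per-task timeout. The sketch below shows that pattern with `asyncio`; it is not the service's code, and `fetch_one` is a hypothetical coroutine standing in for the real sub-fetch.

```python
import asyncio


async def fetch_all(urls, fetch_one, max_parallel=5, per_fetch_timeout=10.0):
    """Run fetch_one over urls with bounded parallelism and a per-URL timeout."""
    gate = asyncio.Semaphore(max_parallel)

    async def guarded(url):
        async with gate:  # the timeout clock starts only once a slot is free
            try:
                return await asyncio.wait_for(fetch_one(url), per_fetch_timeout)
            except asyncio.TimeoutError:
                # A slow origin fails its own result, not the whole batch.
                return {"ok": False, "error": "timeout"}

    # gather preserves input order, so results line up with urls.
    return await asyncio.gather(*(guarded(u) for u in urls))


async def demo_fetch(url):
    await asyncio.sleep(0.2 if "slow" in url else 0)
    return {"ok": True, "url": url}


print(asyncio.run(fetch_all(["a", "slow", "b"], demo_fetch, per_fetch_timeout=0.05)))
```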

> **Try it** — Run Search from the console at [/app/search](/app/search) — toggle `fetch`, set a `tbs` window, and inspect per-result tier.


---

# Fetch

Source: https://www.gyrence.com/docs/api/fetch.md

Retrieve a single URL and return its parsed title, description, markdown body, lightly-cleaned HTML, and outbound links. Two-tier fetcher: plain HTTP first, escalating to a headless-browser worker when the page needs JS, returns a soft block, or rate-limits.

## Use this when

- You need clean markdown for a single article, doc, or filing — not a whole site.
- Your scraper keeps tripping JS shells or soft blocks (Cloudflare, Akamai) and you want auto-escalation to a headless browser.
- You're feeding RAG and need both `markdown` (LLM-ready) and `html` (your own pipeline) from the same call.
- You're indexing outbound links from a page (`links[]`) without crawling.

**Endpoint:** `POST /api/v1/fetch`  
**Auth:** Bearer  
**Credits:** 1 (http) or 3 (browser)

## Request

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `url` | `string` | yes |  | Absolute http(s) URL. Private, loopback, and link-local hosts are rejected (SSRF). |
| `forceBrowser` | `boolean` |  | `false` | Skip the HTTP tier and render via the browser worker. Also bypasses the SEC.gov fast path. Always costs 3 credits. |

### Example body

```json
{
  "url": "https://example.com/article",
  "forceBrowser": false
}
```

## Response

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `url` | `string` |  |  | Echo of the requested URL (not the post-redirect URL). |
| `title` | `string` |  |  | Contents of <title>. Empty string when absent. |
| `description` | `string` |  |  | <meta name="description">, falling back to og:description. Empty string when absent. |
| `markdown` | `string` |  |  | Main-content markdown. See Notes for the extraction + cleaning pipeline. |
| `html` | `string` |  |  | Lightly-cleaned HTML: only <script>, <style>, and <noscript> are removed. nav/header/footer/form/iframe/svg are preserved here (they're stripped only inside the markdown pipeline). |
| `links[]` | `string[]` |  |  | Absolute http(s) URLs found in <a href> within the cleaned HTML. Deduped, fragments stripped. |
| `statusCode` | `number` |  |  | Origin HTTP status. 0 if the underlying fetch threw (network error). |
| `fetchedAt` | `string` |  |  | ISO timestamp when the response was assembled. |
| `via` | `"http" | "browser"` |  |  | Tier that produced this result. Determines credit cost (1 vs 3). |

### Example response

```json
{
  "ok": true,
  "data": {
    "url": "https://example.com/article",
    "title": "Example Article",
    "description": "A short summary.",
    "markdown": "# Example Article\n\n...",
    "html": "<html>...</html>",
    "links": ["https://example.com/related"],
    "statusCode": 200,
    "fetchedAt": "2026-05-29T20:45:00.000Z",
    "via": "http"
  }
}
```

## Example

```bash
curl -X POST https://www.gyrence.com/api/v1/fetch \
  -H "Authorization: Bearer $GYRENCE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com/article"}'
```

## Errors

| Code | HTTP | Meaning |
| --- | --- | --- |
| `bad_request` | 400 | `url` missing/invalid, or any field fails validation. |
| `unauthorized` | 401 | Missing, malformed, or revoked `Authorization` header. |
| `credits_exhausted` | 402 | Workspace balance below request cost. |
| `forbidden_url` | 403 | SSRF guard rejected a private, loopback, or link-local host. |
| `not_found` | 404 | Origin returned 404. Origin 404s are typed — you do **not** receive `{ok:true, statusCode:404, markdown:"<404 page>"}`. |
| `timeout` | 408 | Request exceeded the 25-second hard deadline. |
| `rate_limited` | 429 | Per-workspace rate limit. |
| `upstream_error` | 502 | Origin returned 5xx. |
| `unavailable` | 503 | Block-page detector tripped (see Notes), or any other unmapped error. |

```json
{ "ok": false, "error": "origin returned 404", "code": "not_found" }
```
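
Both envelopes share the `ok` discriminator, so a client can unwrap them in one place. A minimal sketch, assuming the response body has already been read as text; `GyrenceError` and `unwrap` are hypothetical names, not part of any published SDK.

```python
import json


class GyrenceError(Exception):
    """Raised for `{ "ok": false, ... }` envelopes."""

    def __init__(self, code: str, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def unwrap(body: str) -> dict:
    """Return `data` on success, raise GyrenceError on an error envelope."""
    envelope = json.loads(body)
    if envelope.get("ok"):
        return envelope["data"]
    raise GyrenceError(envelope.get("code", "unknown"), envelope.get("error", ""))


try:
    unwrap('{ "ok": false, "error": "origin returned 404", "code": "not_found" }')
except GyrenceError as exc:
    print(exc.code)  # not_found
```

Branching on `exc.code` rather than on the message keeps callers stable if the wording of `error` changes.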

> **Credits** — **1 credit** per HTTP-tier success, **3 credits** per browser-tier success. Read `via` on the response to attribute cost. Errors are not charged. `forceBrowser: true` always costs 3.

> **Coverage & known limits**
>
> - **Block pages fail closed.** Akamai / Cloudflare / PerimeterX / DataDome / generic "access denied" responses return `code: "unavailable"` — you do **not** get the block page as markdown.
> - **Markdown chrome-strip is regex-based** and can eat legitimate `<header>` blocks inside articles. Use the `html` field if you need to run your own conversion.
> - **`links[]` is DOM-time only.** Links injected by JS after `page.content()` (infinite scroll, modal-loaded) won't appear.
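
To see why a regex-based strip can remove an article's own `<header>`, consider the sketch below. The pattern is an assumption written for illustration, not the expression Gyrence uses; it only demonstrates that a tag-name match cannot tell site chrome from in-article markup.

```python
import re

# Illustrative pattern: remove any <nav>, <footer>, <header>, or <form> element.
CHROME = re.compile(r"<(nav|footer|header|form)\b[^>]*>.*?</\1>", re.S | re.I)


def strip_chrome(html: str) -> str:
    """Drop chrome-like elements by tag name, wherever they appear."""
    return CHROME.sub("", html)


article = "<article><header><h1>Q3 results</h1></header><p>Revenue rose.</p></article>"
print(strip_chrome(article))  # <article><p>Revenue rose.</p></article>
```

The headline disappears along with the `<header>` wrapper, which is the case where falling back to the `html` field and running your own conversion helps.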

## Notes

- **Credit accounting.** 1 credit for HTTP-tier success, 3 credits for browser-tier success. Read `via` on the response to attribute cost. Errors are not charged.
- **Escalation triggers.** HTTP tier escalates to the browser worker on origin `403`, `429`, `503`, on network error, when the response body is `< 500` chars, when the body contains a JS-shell marker (`id="root"></div>`, `id="__next">`, `id="app">`, `You need to enable JavaScript`, `<noscript>`), or — post-conversion — when HTML `≥ 1000` chars produced `< 100` chars of markdown (content-loss escalation). Browser tier is never retried.
- **`forceBrowser`.** Skips both the HTTP tier and the SEC fast path; always costs 3 credits even if the page would have succeeded over HTTP.
- **SEC.gov fast path.** `*.sec.gov` URLs (without `forceBrowser`) go HTTP-only with an identifying `Gyrence (<contact>) - financial-data retrieval` UA per SEC's fair-access policy. They skip JS-shell, browser, and content-loss escalation.
- **Block-page detection.** When a response contains markers for Akamai, Cloudflare, PerimeterX, DataDome, or a generic `access denied … you don't have permission` page (scanned in the first 4 KB), the request fails with `code: "unavailable"` and a message like `Blocked by Akamai (status 403)`. You do not receive a 200 with the block page as markdown.
- **`markdown` extraction.** Pipeline: pick the first of `<main>` / `<article>` / `<body>`, strip `<script>/<style>/<noscript>/<iframe>/<svg>/<nav>/<footer>/<header>/<form>`, then convert via `node-html-markdown`. The chrome strip is regex-based and known to be fragile on malformed HTML; it may eat legitimate `<header>` blocks inside articles.
- **`html` vs `markdown`.** `html` is the lightly-cleaned version (scripts/styles/noscript only) — use this if you want to run your own conversion. The heavy chrome strip is applied **only** to the input of the markdown converter and never leaks into the `html` field.
- **`links[]` is DOM-time.** Extracted via regex from `html` as-of conversion. Links added by JS after render (infinite scroll, modal-loaded content) appear only if they were in the DOM when the browser-tier worker called `page.content()`. HTTP-tier responses never see post-render links.
- **`statusCode` semantics.** Codes `200`–`399` and most `4xx` codes (e.g. `403`, `451`) pass through as `{ok:true, statusCode}` with whatever body the origin returned. Only `404` (→ `not_found`) and `5xx` (→ `upstream_error`) are mapped to error envelopes. `statusCode: 0` means the underlying fetch threw before getting a response.
- **Graceful worker fallback.** If escalation is triggered but the browser worker is unreachable, the request falls back to the HTTP-tier result with `via: "http"` and an `error` field. The envelope still succeeds — inspect `error` if present.
- **SSRF.** Private (RFC1918), loopback, link-local, and `.local` hosts are rejected pre-fetch with `forbidden_url`.
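
The escalation triggers in the notes above reduce to a pair of predicates. This sketch restates them for illustration only: the thresholds and markers are the documented ones, while the function names, the plain substring matching, and the omission of the SEC.gov fast path are assumptions.

```python
JS_SHELL_MARKERS = (
    'id="root"></div>',
    'id="__next">',
    'id="app">',
    "You need to enable JavaScript",
    "<noscript>",
)


def should_escalate(status: int, body: str, network_error: bool = False) -> bool:
    """Pre-conversion triggers: soft block, network error, tiny body, JS shell."""
    if network_error or status in (403, 429, 503):
        return True
    if len(body) < 500:
        return True
    return any(marker in body for marker in JS_SHELL_MARKERS)


def content_loss(html: str, markdown: str) -> bool:
    """Post-conversion trigger: plenty of HTML, almost no markdown."""
    return len(html) >= 1000 and len(markdown) < 100


print(should_escalate(200, "x" * 600))                        # False
print(should_escalate(200, '<div id="root"></div>' * 40))     # True
print(content_loss("x" * 1000, "y" * 99))                     # True
```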

> **Try it** — Hit Fetch from the console at [/app/fetch](/app/fetch) with one click — no curl required.


---

# Gyre

Source: https://www.gyrence.com/docs/api/gyre.md

Walk a site from a seed URL within a page budget, following outbound links breadth-first. Returns parsed markdown for every page successfully visited, plus per-URL errors. Same-domain by default. The link-traversal primitive in the Gyrence pipeline — Search finds, Fetch reads one, **Gyre** walks many.

## Use this when

- You need a bounded slice of a site (1–50 pages) without standing up a full crawler.
- You're feeding RAG and want a seed page plus everything it links to, in one call.
- You want partial-success semantics — per-URL failures land in `errors[]` and the walk continues.
- You're prototyping coverage before committing to a scheduled crawl.

**Endpoint:** `POST /api/v1/gyre`  
**Auth:** Bearer  
**Credits:** 1 per HTTP page, 3 per browser page (min 1)

## Request

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `url` | `string` | yes |  | Absolute http(s) seed URL. Private, loopback, and link-local hosts are rejected (SSRF). |
| `maxPages` | `number` |  | `20` | Hard cap on successfully fetched pages. Range 1–50. The walk stops as soon as this many pages have been fetched; failed URLs don't count toward the cap. |
| `sameDomain` | `boolean` |  | `true` | When true, only links whose hostname exactly matches the seed's hostname are followed. Subdomains are NOT considered same-domain. |

### Example body

```json
{
  "url": "https://example.com",
  "maxPages": 10,
  "sameDomain": true
}
```

## Response

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `startUrl` | `string` |  |  | Echo of the seed URL. |
| `totalPages` | `number` |  |  | Number of pages successfully fetched (length of `pages[]`). |
| `pages[]` | `object[]` |  |  | One entry per successfully fetched page (see fields below). |
| `pages[].url` | `string` |  |  | Absolute URL of the visited page. |
| `pages[].title` | `string` |  |  | Contents of <title>. Empty when absent. |
| `pages[].description` | `string` |  |  | Meta description or og:description. Empty when absent. |
| `pages[].markdown` | `string` |  |  | Main-content markdown (same extraction pipeline as /fetch). |
| `pages[].statusCode` | `number` |  |  | Origin HTTP status. |
| `pages[].via` | `"http" | "browser"` |  |  | Tier that produced this page. Drives per-page credit cost. |
| `errors[]` | `object[]` |  |  | Per-URL errors encountered during the walk. `{ url, error }`. The walk continues past individual failures. |

### Example response

```json
{
  "ok": true,
  "data": {
    "startUrl": "https://example.com",
    "totalPages": 1,
    "pages": [
      {
        "url": "https://example.com",
        "title": "Example",
        "description": "Home page",
        "markdown": "# Example\n\n...",
        "statusCode": 200,
        "via": "http"
      }
    ],
    "errors": []
  }
}
```

## Example

```bash
curl -X POST https://www.gyrence.com/api/v1/gyre \
  -H "Authorization: Bearer $GYRENCE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com","maxPages":10}'
```

## Errors

| Code | HTTP | Meaning |
| --- | --- | --- |
| `bad_request` | 400 | `url` missing/invalid, or `maxPages` out of range. |
| `unauthorized` | 401 | Missing, malformed, or revoked `Authorization` header. |
| `credits_exhausted` | 402 | Workspace balance below request cost. |
| `forbidden_url` | 403 | SSRF guard rejected the seed (private, loopback, link-local host). |
| `not_found` | 404 | The seed (and every queued URL) returned 404, or no pages were successfully fetched. |
| `timeout` | 408 | Request exceeded the 25-second hard deadline. |
| `rate_limited` | 429 | Per-workspace rate limit. |
| `upstream_error` | 502 | First page returned 5xx. |
| `unavailable` | 503 | Block-page detector tripped on the seed, or any other unmapped error. |

> **Credits** — Each fetched page is billed individually: **1 credit** per HTTP-tier page, **3 credits** per browser-tier page. The total is the sum across `pages[]`, with a **minimum of 1** even if nothing was billable. Errored URLs are not charged. Inspect each `pages[].via` to attribute cost.
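
To make the billing rule concrete, here is a minimal sketch of how a client might total the cost of a Gyre response from `pages[].via`. The helper name `gyre_credits` and the `TIER_COST` table are illustrative and not part of the API; the per-tier prices and the minimum of 1 come from the Credits note above.

```python
# Per-page credit cost by tier, as stated in the Credits note.
TIER_COST = {"http": 1, "browser": 3}


def gyre_credits(pages):
    """Sum per-page costs across pages[], applying the documented minimum of 1."""
    total = sum(TIER_COST[page["via"]] for page in pages)
    return max(total, 1)


# Hypothetical pages[] with two HTTP-tier pages and one browser-tier page.
pages = [{"via": "http"}, {"via": "browser"}, {"via": "http"}]
print(gyre_credits(pages))  # 1 + 3 + 1 = 5
```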

> **Coverage & known limits**
>
> - **`sameDomain` is exact-host.** `https://blog.example.com` is not considered same-domain as `https://example.com`. Use **Map** when you need subdomain coverage.
> - **No `maxDepth` parameter.** Depth is bounded indirectly by `maxPages` and breadth-first queue order.
> - **Query-string variants count as distinct pages.** `?utm_source=…` URLs each consume budget.
> - **Per-page block detection** lands in `errors[]` (not the top-level envelope). The walk continues past blocked URLs.

## Notes

- **Walk semantics.** Breadth-first from the seed. Each page's outbound `links[]` (as extracted by the underlying fetcher) is enqueued. Visited URLs are tracked verbatim.
- **Page budget vs queue.** `maxPages` caps successful fetches, not URLs considered. The walk stops as soon as the budget is reached, even if the queue still has unvisited URLs.
- **Per-page fetcher.** Each page goes through the same two-tier pipeline as `/fetch` (HTTP → browser escalation, SEC.gov fast path, block-page detection). See the Fetch docs for tier mechanics.
- **Failure tolerance.** Per-URL failures are pushed to `errors[]` and the walk continues. The request only fails when nothing was fetched.
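
The walk described in these notes can be sketched as a plain breadth-first loop. This is an illustration of the documented semantics, not Gyrence's implementation: `walk` and `fetch_page` are hypothetical names, the fetcher is injected so the example runs offline, and the `sameDomain` filter is left out for brevity.

```python
from collections import deque


def walk(seed, fetch_page, max_pages):
    """Breadth-first walk from seed.

    fetch_page(url) is a hypothetical callable returning a dict with
    'url' and 'links', or raising an exception on failure.
    """
    queue = deque([seed])
    visited = {seed}  # URLs are tracked verbatim, with no normalization
    pages, errors = [], []
    # maxPages caps successful fetches, so failures do not consume budget.
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            page = fetch_page(url)
        except Exception as exc:
            errors.append({"url": url, "error": str(exc)})
            continue  # the walk continues past individual failures
        pages.append(page)
        for link in page["links"]:
            if link not in visited:
                visited.add(link)
                queue.append(link)
    return {"startUrl": seed, "totalPages": len(pages),
            "pages": pages, "errors": errors}


# A tiny in-memory "site" standing in for real fetches; /b is missing on purpose.
SITE = {
    "https://example.com": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/c"],
    "https://example.com/c": [],
}


def fake_fetch(url):
    if url not in SITE:
        raise RuntimeError("origin returned 404")
    return {"url": url, "links": SITE[url]}


result = walk("https://example.com", fake_fetch, max_pages=3)
print(result["totalPages"], [e["url"] for e in result["errors"]])
```

In this run the failed `/b` fetch lands in `errors` while the budget of three is still filled by the seed, `/a`, and `/c`.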

> **Try it** — Run Gyre from the console at [/app/gyre](/app/gyre) — pick a seed, set a budget, and watch pages stream in.


---

# Extract

Source: https://www.gyrence.com/docs/api/extract.md

Pull structured JSON out of a single page using an LLM. Gyrence fetches the URL (HTTP first, browser escalation if needed), converts it to Markdown, and uses an LLM to return strict JSON matching your prompt or schema.

## Use this when

- You need a few specific fields from a page (price, title, author, contact email) without writing selectors.
- The site's HTML changes often and selector-based scraping is too brittle.
- You want JSON shaped to your own schema, not raw page content.
- You're prototyping enrichment before committing to a per-domain parser.

**Endpoint:** `POST /api/v1/extract`  
**Auth:** Bearer  
**Credits:** 5 (7 if browser)

## Request

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `url` | `string` | yes |  | Page URL to extract from. Must be a public http(s) origin. SSRF-blocked hosts are rejected. |
| `prompt` | `string` | yes |  | Natural-language instruction describing what to extract (e.g. "Return the product title, price in USD, and in-stock status"). |
| `schema` | `string` |  |  | Optional JSON string describing the desired shape. Sent to the model as a target structure. Must be valid JSON — invalid JSON returns `bad_request`. |
| `forceBrowser` | `boolean` |  | `false` | Skip the HTTP-first attempt and render via the headless worker immediately. Use for known JS-heavy SPAs. |

### Example body

```json
{
  "url": "https://example.com/products/widget",
  "prompt": "Extract the product name, price, currency, and availability.",
  "schema": "{\"name\":\"string\",\"price\":\"number\",\"currency\":\"string\",\"inStock\":\"boolean\"}"
}
```
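
Because `schema` is a string field rather than a nested object, the shape has to be serialized on its own before the body is serialized. A small sketch of that double encoding, assuming a hypothetical helper named `build_extract_body`:

```python
import json


def build_extract_body(url, prompt, schema=None, force_browser=False):
    """Build the JSON body for POST /api/v1/extract (hypothetical helper)."""
    body = {"url": url, "prompt": prompt}
    if schema is not None:
        # The API expects a JSON *string* here, so encode the shape separately.
        body["schema"] = json.dumps(schema)
    if force_browser:
        body["forceBrowser"] = True
    return json.dumps(body)


shape = {"name": "string", "price": "number", "currency": "string", "inStock": "boolean"}
payload = build_extract_body(
    "https://example.com/products/widget",
    "Extract the product name, price, currency, and availability.",
    schema=shape,
)
print(payload)
```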

## Response

```json
{
  "ok": true,
  "data": {
    "url": "https://example.com/products/widget",
    "title": "Widget — Example",
    "extractedJson": "{\"name\":\"Widget\",\"price\":29.99,\"currency\":\"USD\",\"inStock\":true}",
    "via": "http"
  }
}
```

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `url` | `string` |  |  | The final URL fetched (after redirects). |
| `title` | `string` |  |  | Page title parsed from the fetched HTML. |
| `extractedJson` | `string` |  |  | Stringified JSON returned by the model. Always a string — parse it client-side. On malformed model output, falls back to `{"_raw": "<original text>"}`. |
| `via` | `"http" | "browser"` |  |  | How the page was fetched. `http` = direct fetch succeeded. `browser` = escalated to the headless worker (costs 2 extra credits). |
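
Since `extractedJson` is always a string and may carry the `_raw` fallback, client code usually needs a small decoding step. The following is one possible sketch; `parse_extracted` is a hypothetical name, and treating an object whose only key is `_raw` as the fallback is an assumption based on the field description above.

```python
import json


def parse_extracted(data):
    """Return (value, raw_text) from an Extract `data` object.

    raw_text is non-None when the model output was malformed and the
    API fell back to {"_raw": "<original text>"}.
    """
    value = json.loads(data["extractedJson"])
    if isinstance(value, dict) and set(value) == {"_raw"}:
        return None, value["_raw"]
    return value, None


good = {"extractedJson": "{\"name\":\"Widget\",\"price\":29.99}"}
bad = {"extractedJson": "{\"_raw\":\"Sorry, I could not find a price.\"}"}
print(parse_extracted(good))
print(parse_extracted(bad))
```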

## Example

```bash
curl -X POST https://www.gyrence.com/api/v1/extract \
  -H "Authorization: Bearer $GYRENCE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/products/widget",
    "prompt": "Extract the product name, price, currency, and availability.",
    "schema": "{\"name\":\"string\",\"price\":\"number\",\"currency\":\"string\",\"inStock\":\"boolean\"}"
  }'
```

## Errors

| Code | HTTP | Meaning |
| --- | --- | --- |
| `bad_request` | 400 | `url` missing/invalid, `prompt` empty, or `schema` is not valid JSON. |
| `unauthorized` | 401 | Missing, malformed, or revoked `Authorization` header. |
| `credits_exhausted` | 402 | Workspace balance below request cost, or the LLM provider returned 402. |
| `forbidden_url` | 403 | SSRF guard rejected a private, loopback, or link-local host. |
| `timeout` | 408 | Request exceeded the 25-second hard deadline. |
| `rate_limited` | 429 | Per-workspace rate limit, or the LLM provider rate-limited the call. |
| `upstream_error` | 502 | Page fetch or LLM call returned a 5xx. |
| `unavailable` | 503 | Any other unmapped error. |

```json
{ "ok": false, "error": "Schema must be valid JSON", "code": "bad_request" }
```

> **Credits** — **5 credits** when fetched over HTTP, **7 credits** when Gyrence escalates to the headless browser. Charged once per call regardless of model response length. See Notes for the full list of escalation triggers.

> **Limits & known behavior**
>
> - **Content is truncated.** Only the first **12,000 characters** of converted Markdown are sent to the model. Long pages have their tail dropped, so narrow the URL (e.g. a product page, not a catalog) for best results.
> - **Prompt + page content compete for context.** Long prompts reduce the page content the model sees. Aim for prompts under **500 characters**; let the schema do the structural work and the prompt focus on *what* to extract, not *how*.
> - **JSON guarantee is best-effort.** The model is instructed to return strict JSON, but malformed output falls back to `{ "_raw": "<text>" }`. Always handle that shape.
> - **`schema` is a hint, not a validator.** Gyrence does not enforce types on the model's output. Validate with your own Zod/JSON-schema layer if it matters.
> - **One page per call.** To extract across many URLs, fan out client-side (use [`/map`](/docs/api/map) to enumerate) and call `/extract` per URL.
> - **Markdown fidelity.** Tables and deeply nested lists may lose structure in HTML→Markdown conversion before the model sees them.
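
Because the schema is only a hint, a client-side check is worth having. The sketch below validates a parsed result against the flat `field: type-name` shape used in the example body. That shape is an assumption drawn from the example rather than a documented schema language, and `validate` and `TYPE_CHECKS` are hypothetical names.

```python
# Only the three type names that appear in the example body are supported here.
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so exclude it explicitly.
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
}


def validate(value, shape):
    """Return a list of problems; an empty list means value matches shape."""
    problems = []
    for field, type_name in shape.items():
        if field not in value:
            problems.append(f"missing field: {field}")
        elif not TYPE_CHECKS[type_name](value[field]):
            problems.append(f"wrong type for {field}: expected {type_name}")
    return problems


shape = {"name": "string", "price": "number", "currency": "string", "inStock": "boolean"}
print(validate({"name": "Widget", "price": "29.99", "inStock": True}, shape))
```

A fuller validator (Zod, JSON Schema) is the better choice once the shape has nesting or optional fields.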

## Notes

- **Model selection and prompting are handled by Gyrence.** The system enforces strict JSON output — no prose, no markdown fences. Gyrence may update the underlying model over time to improve quality or reduce cost; the API contract (request/response shape) remains stable.
- **Fetch pipeline**: identical to [`/fetch`](/docs/api/fetch). HTTP first, with automatic escalation to the headless browser on any of: origin 403/429/503, network error (status 0), body < 500 chars, JS-shell markers (`id="root">`, `id="__next">`, `<noscript>`, etc.), or post-conversion content loss (HTML ≥ 1000 chars producing < 100 markdown chars). `forceBrowser: true` skips the HTTP attempt.
- **`via` drives pricing**, not the input. Even without `forceBrowser`, an automatic escalation bills as `browser` (7 credits).
- **SSRF re-validation** runs on the final URL after redirects, not just the input.
- **No retries on model failure.** Garbled output is returned as `_raw` for your inspection rather than re-prompted server-side.
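
The escalation triggers listed under **Fetch pipeline** can be read as a single predicate. The sketch below is an interpretation for reasoning about which requests are likely to bill at the browser rate; it is not Gyrence's code. The order of checks and the name `escalation_reason` are assumptions, and the marker tuple holds only the markers named above (the documented list ends in "etc.").

```python
ESCALATION_STATUSES = {0, 403, 429, 503}  # 0 means the fetch hit a network error
# Only the markers named in the note; the real list is longer.
JS_SHELL_MARKERS = ('id="root">', 'id="__next">', "<noscript>")


def escalation_reason(status, html, markdown=None):
    """Return a short label for the first trigger that fires, or None."""
    if status in ESCALATION_STATUSES:
        return "status"
    if len(html) < 500:
        return "short_body"
    if any(marker in html for marker in JS_SHELL_MARKERS):
        return "js_shell"
    # Post-conversion check: a large HTML body that yields almost no markdown.
    if markdown is not None and len(html) >= 1000 and len(markdown) < 100:
        return "content_loss"
    return None


print(escalation_reason(200, '<div id="root"></div>' + "x" * 600))
```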

> **Try it** — Test prompts and schemas live in the console at [/app/extract](/app/extract) — paste a URL, iterate on the prompt, copy the resulting JSON.


---

# Map

Source: https://www.gyrence.com/docs/api/map.md

Enumerate URLs for a site. Prefers sitemaps (`robots.txt` + common locations, recurses one level into sitemap indexes); falls back to a depth-1 anchor extraction from the homepage when no sitemap is found.

## Use this when

- You need a fast, broad picture of what URLs a site exposes — before deciding what to fetch.
- You're seeding a crawl or sitemap-aware indexer and want a deduped link list.
- You want to filter (`search`) to a section like `/blog/` or `/docs/` without downloading pages.
- You're checking whether a site even publishes a sitemap.

**Endpoint:** `POST /api/v1/map`  
**Auth:** Bearer  
**Credits:** 1

## Request

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `url` | `string` | yes |  | Site URL to map. Must be a public http(s) origin. SSRF-blocked hosts are rejected. |
| `limit` | `number` |  | `5000` | Max links returned. 1–5000. |
| `includeSubdomains` | `boolean` |  | `false` | Include links on subdomains of the root host (e.g. blog.example.com when mapping example.com). |
| `search` | `string` |  |  | Case-insensitive substring filter applied to discovered URLs (e.g. "/blog/"). |

### Example body

```json
{
  "url": "https://example.com",
  "limit": 500,
  "search": "/blog/"
}
```

## Response

```json
{
  "ok": true,
  "data": {
    "url": "https://example.com",
    "source": "sitemap",
    "totalDiscovered": 1284,
    "totalReturned": 500,
    "links": ["https://example.com/", "https://example.com/about", "..."]
  }
}
```

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `url` | `string` |  |  | The input URL, echoed back. |
| `source` | `"sitemap" | "discovered"` |  |  | How links were discovered. `sitemap` = parsed from one or more sitemap files. `discovered` = homepage anchor extraction (fallback when no sitemap exists or all sitemaps are empty). |
| `totalDiscovered` | `number` |  |  | Unique links after host filtering, substring filtering, and de-duplication — before the limit is applied. |
| `totalReturned` | `number` |  |  | Links actually included in `links` (≤ `limit`). |
| `links` | `string[]` |  |  | Flat list of absolute URLs. Fragments stripped. |
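
The relationship between `totalDiscovered` and `totalReturned` is easier to see in code. This is a rough sketch of the documented ordering (fragment stripping, substring filter, de-duplication, then `limit`). Host filtering is omitted to keep it short, and `apply_filters` is a hypothetical name.

```python
def apply_filters(candidates, search=None, limit=5000):
    """Filter and de-duplicate candidate URLs, then apply the limit last."""
    needle = search.lower() if search else None
    seen, unique = set(), []
    for url in candidates:
        url = url.split("#", 1)[0]  # fragments are stripped
        if needle and needle not in url.lower():
            continue  # case-insensitive substring filter
        if url not in seen:
            seen.add(url)
            unique.append(url)
    links = unique[:limit]
    return {"totalDiscovered": len(unique),
            "totalReturned": len(links),
            "links": links}


candidates = [
    "https://example.com/blog/a",
    "https://example.com/blog/a#top",  # same page once the fragment is removed
    "https://example.com/about",
    "https://example.com/blog/b",
]
print(apply_filters(candidates, search="/BLOG/", limit=1))
```

Here two unique blog URLs are discovered but only one is returned, which is the case where `totalDiscovered` exceeds `totalReturned`.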

## Example

```bash
curl -X POST https://www.gyrence.com/api/v1/map \
  -H "Authorization: Bearer $GYRENCE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com","limit":500,"search":"/blog/"}'
```

## Errors

| Code | HTTP | Meaning |
| --- | --- | --- |
| `bad_request` | 400 | `url` missing/invalid, `limit` out of range, or any field fails validation. |
| `unauthorized` | 401 | Missing, malformed, or revoked `Authorization` header. |
| `credits_exhausted` | 402 | Workspace balance below request cost. |
| `forbidden_url` | 403 | SSRF guard rejected a private, loopback, or link-local host. |
| `timeout` | 408 | Request exceeded the 25-second hard deadline. |
| `rate_limited` | 429 | Per-workspace rate limit. |
| `upstream_error` | 502 | Upstream returned 5xx during sitemap or homepage fetch. |
| `unavailable` | 503 | Any other unmapped error. |

```json
{ "ok": false, "error": "url must be a public http(s) origin", "code": "forbidden_url" }
```

> **Credits** — **1 credit** per call regardless of `limit` or how many sitemaps are walked. The cost is the same whether you get 5 links or 5,000.

> **Coverage & known limits**
>
> - **Speed over completeness.** Map prioritizes a fast, sitemap-first answer. It does not crawl, does not render JS, and does not visit every URL it returns.
> - **Sitemap-trusting.** When `source: "sitemap"`, results reflect what the site publishes: stale or partial sitemaps stay stale.
> - **Fallback is shallow.** `source: "discovered"` reads only the homepage's anchor (`<a href>`) tags. `<link rel>`, `data-*`, and SPA-style navigation injected by JS are missed. Use **Fetch** on a specific page instead.
> - **Subdomain default is OFF.** Set `includeSubdomains: true` to capture `blog.*`, `docs.*`, etc.

## Notes

- **Sitemap discovery order**: `robots.txt` `Sitemap:` directives, then `/sitemap.xml`, `/sitemap_index.xml`, `/wp-sitemap.xml`. Sitemap indexes are followed **exactly one level** deep — `<sitemap><loc>` entries inside an index are fetched once and not recursed further.
- **Anchor extraction (fallback)**: parses `<a href="...">` only via regex against the homepage HTML. Skips fragments (`#…`), `javascript:`, and `mailto:` schemes. Only `http(s)` URLs are kept. The seed URL itself is always included in the candidate set.
- **Host filtering** is exact-match against the input URL's hostname. With `includeSubdomains: true`, a host matches if it equals the root host OR ends with `"." + rootHost`.
- **SSRF re-validation**: each sitemap URL (including those from attacker-controllable `robots.txt` and from `<sitemap><loc>` entries in indexes) is re-validated against the SSRF allowlist before being fetched. Failures are skipped silently.
- **Per-request timeouts**: `robots.txt` and sitemap fetches use a 10s cap; the homepage-fallback fetch uses 15s. The whole request still respects the 25s hard deadline.
- **No page content.** Pair with [`/fetch`](/docs/api/fetch) or [`/extract`](/docs/api/extract) to retrieve bodies.
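
The host-filtering rule in these notes is short enough to express directly. A minimal sketch, assuming a hypothetical helper `host_matches` and using the standard library's `urlsplit` to read hostnames:

```python
from urllib.parse import urlsplit


def host_matches(candidate_url, root_url, include_subdomains=False):
    """Exact-host match, optionally widened to subdomains of the root host."""
    host = (urlsplit(candidate_url).hostname or "").lower()
    root = (urlsplit(root_url).hostname or "").lower()
    if host == root:
        return True
    # The leading dot keeps look-alike hosts such as notexample.com out.
    return include_subdomains and host.endswith("." + root)


print(host_matches("https://blog.example.com/post", "https://example.com"))
print(host_matches("https://blog.example.com/post", "https://example.com", True))
```

The `"." + root` suffix check is what prevents `notexample.com` from matching `example.com` when subdomains are enabled.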

> **Try it** — Map a site from the console at [/app/map](/app/map) — paste a URL, filter by path, export the link list.


---

# Health

Source: https://www.gyrence.com/docs/api/health.md

Lightweight liveness probe scoped to the authenticated workspace. Confirms the API is reachable, your key is valid, and returns your workspace identity and plan.

## Use this when

- Smoke-testing a new API key before wiring it into your app.
- Confirming which workspace and plan a key belongs to.
- Powering an external uptime monitor or readiness check in CI.

**Endpoint:** `GET /api/v1/health`  
**Auth:** Bearer  
**Credits:** 0

## Request

No body. No query parameters.

## Response

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `status` | `string` |  |  | Always `"healthy"` when the endpoint returns 200. |
| `workspace_id` | `uuid` |  |  | ID of the workspace owning the API key used on this request. |
| `plan_name` | `string` |  |  | Plan identifier for the workspace (e.g. `"free"`). |

### Example response

```json
{
  "ok": true,
  "data": {
    "status": "healthy",
    "workspace_id": "8f3a1c20-9b4e-4d77-bc1e-3e5a7d2c9f10",
    "plan_name": "free"
  }
}
```

## Example

```bash
curl https://www.gyrence.com/api/v1/health \
  -H "Authorization: Bearer $GYRENCE_API_KEY"
```

## Errors

| Code | HTTP | Meaning |
| --- | --- | --- |
| `unauthorized` | 401 | Missing or invalid `Authorization: Bearer <key>` header. |
| `unavailable` | 503 | The workspace plan lookup (`get_plan_name` RPC) failed; the error message is "health check unavailable". Retry with backoff. |

> **Credits** — **0 credits** per call regardless of frequency. Calls are still recorded in `usage_events` for audit.

> **What this does NOT check** — A 200 here only confirms the Gyrence API, your API key, and the workspace plan lookup are alive. It does **not** probe Brave Search, the headless-browser worker, or the AI Gateway. Search, Fetch (browser tier), and Extract can be degraded independently.

## Unauthenticated probe

For a public liveness check (no API key, no workspace context), call `GET /api/public/health`. Unlike the authenticated endpoint, the response is **not** wrapped in the standard envelope:

```json
{
  "status": "ok",
  "timestamp": "2026-05-30T18:42:00.000Z",
  "uptime": 1284.31
}
```

Use this when you can't safely embed an API key (e.g. status-page pingers).
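
Because the two probes return different shapes, a monitor that may hit either one needs to tell them apart. One possible sketch, with `is_alive` as a hypothetical helper operating on the already-decoded JSON body:

```python
def is_alive(body):
    """Accept either health payload shape and return True when it reports healthy."""
    if "ok" in body:
        # Authenticated endpoint: standard envelope with data.status == "healthy".
        return body["ok"] is True and body.get("data", {}).get("status") == "healthy"
    # Public probe: bare object with status == "ok".
    return body.get("status") == "ok"


authenticated = {"ok": True, "data": {"status": "healthy", "plan_name": "free"}}
public = {"status": "ok", "timestamp": "2026-05-30T18:42:00.000Z", "uptime": 1284.31}
print(is_alive(authenticated), is_alive(public))
```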


---

# MCP

Source: https://www.gyrence.com/docs/api/mcp.md

Hosted **Model Context Protocol** endpoint exposing all five Gyrence primitives as MCP tools. JSON-RPC 2.0 in, JSON or Server-Sent Events out, bearer-token auth. Use this when your client is an MCP-aware agent runtime (Claude Desktop, Cursor, your own MCP SDK) and you want to skip writing per-primitive HTTP wrappers.

This page is the wire-level reference. For the conceptual model — why MCP, billing parity, client configuration — start with [Concepts → MCP integration](/docs/concepts/mcp).

## Use this when

- Your runtime speaks MCP and you want the agent itself to pick which Gyrence primitive to call.
- You want one transport for both discovery (Search, Map) and acquisition (Fetch, Gyre, Extract).
- You want zero divergence between MCP and HTTP — same credits, same activity log, same error model.

**Endpoint:** `POST /api/mcp`  
**Auth:** Bearer  
**Credits:** Per-tool — identical to the underlying HTTP primitive

## Auth

```
Authorization: Bearer mc_<your-workspace-key>
```

The same key that authenticates `/api/v1/*` authenticates MCP. Header-only — keys are never accepted in the URL path or body. Missing/invalid keys return JSON-RPC error code `-32001` with HTTP `401`. Failed auth is **never billed**.

`GET` and `DELETE` return HTTP `405` (no standalone SSE channel).

## The tools

Five tools, named `gyrence_<primitive>`. Each tool's input schema is the **same Zod object** that validates the corresponding `/api/v1/*` request — so request shapes match the HTTP docs verbatim.

| Tool | Underlying handler | Credit cost |
| --- | --- | --- |
| `gyrence_search` | [`POST /api/v1/search`](/docs/api/search) | 1 base; +1 per HTTP sub-fetch, +3 per browser sub-fetch when `fetch: true` |
| `gyrence_fetch` | [`POST /api/v1/fetch`](/docs/api/fetch) | 1 (HTTP) or 3 (browser) |
| `gyrence_gyre` | [`POST /api/v1/gyre`](/docs/api/gyre) | Sum of per-page costs across `pages[]` (1 / 3 per page), minimum 1 |
| `gyrence_extract` | [`POST /api/v1/extract`](/docs/api/extract) | 5 (HTTP) or 7 (browser) |
| `gyrence_map` | [`POST /api/v1/map`](/docs/api/map) | 1 flat |

For full request/response shapes per tool, follow the link to the corresponding HTTP reference page — the schemas are identical.

## JSON-RPC methods

The server speaks the standard MCP method set. Your client library handles this for you; the calls below are useful for sanity-checking with curl.

### `initialize`

Negotiation handshake. Returns server name, version, and protocol capabilities.

```bash
curl -X POST https://www.gyrence.com/api/mcp \
  -H "Authorization: Bearer $GYRENCE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
      "protocolVersion": "2024-11-05",
      "capabilities": {},
      "clientInfo": { "name": "my-client", "version": "0.0.1" }
    }
  }'
```

### `tools/list`

Enumerate the five tools and their JSON Schemas. The schema for each tool is generated from the same Zod object as the HTTP route, so the input shape is canonical.

```bash
curl -X POST https://www.gyrence.com/api/mcp \
  -H "Authorization: Bearer $GYRENCE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "jsonrpc": "2.0", "id": 2, "method": "tools/list" }'
```

Response (abbreviated):

```json
{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    "tools": [
      { "name": "gyrence_search", "description": "Resolve a query into ranked, deduplicated URLs…", "inputSchema": { "type": "object", "properties": { "query": { "type": "string" }, "limit": { "type": "number" }, "fetch": { "type": "boolean" }, "fetchFormats": { "type": "array" }, "tbs": { "type": "string" } }, "required": ["query"] } },
      { "name": "gyrence_fetch",   "description": "…", "inputSchema": { "...": "..." } },
      { "name": "gyrence_gyre",    "description": "…", "inputSchema": { "...": "..." } },
      { "name": "gyrence_extract", "description": "…", "inputSchema": { "...": "..." } },
      { "name": "gyrence_map",     "description": "…", "inputSchema": { "...": "..." } }
    ]
  }
}
```

### `tools/call`

Invoke a tool. The `arguments` object matches the underlying HTTP primitive's request body.

```bash
curl -X POST https://www.gyrence.com/api/mcp \
  -H "Authorization: Bearer $GYRENCE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 3,
    "method": "tools/call",
    "params": {
      "name": "gyrence_gyre",
      "arguments": { "url": "https://example.com", "maxPages": 5 }
    }
  }'
```
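
If you are not using an MCP client library, the same request can be assembled by hand. The sketch below builds the JSON-RPC envelope only and does not send it; `tools_call` and its module-level id counter are hypothetical conveniences.

```python
import itertools
import json

_ids = itertools.count(1)  # JSON-RPC ids only need to be unique per request


def tools_call(name, arguments):
    """Build a JSON-RPC 2.0 `tools/call` request for the given tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = tools_call("gyrence_gyre", {"url": "https://example.com", "maxPages": 5})
print(json.dumps(request, indent=2))
```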

Response (success):

```json
{
  "jsonrpc": "2.0",
  "id": 3,
  "result": {
    "content": [
      {
        "type": "text",
        "text": "{\"startUrl\":\"https://example.com\",\"totalPages\":1,\"pages\":[{\"url\":\"https://example.com\",\"title\":\"Example\",\"markdown\":\"...\",\"statusCode\":200,\"via\":\"http\"}],\"errors\":[]}"
      }
    ]
  }
}
```

The `text` field is a **stringified** copy of the underlying primitive's `data` payload. Parse it client-side; the schema is documented per primitive (see the table above).
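
A short sketch of that client-side parsing step, assuming the payload is always the first item in `content` (as in the example above) and using a hypothetical helper name `tool_payload`:

```python
import json


def tool_payload(response):
    """Decode the stringified `data` payload from a successful tools/call response."""
    text = response["result"]["content"][0]["text"]
    return json.loads(text)


response = {
    "jsonrpc": "2.0",
    "id": 3,
    "result": {
        "content": [
            {
                "type": "text",
                "text": "{\"startUrl\":\"https://example.com\",\"totalPages\":1,\"pages\":[],\"errors\":[]}",
            }
        ]
    },
}
data = tool_payload(response)
print(data["startUrl"], data["totalPages"])
```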

## Errors

Two error surfaces, depending on where the failure happens.

### Transport-level errors

Returned as JSON-RPC error envelopes with HTTP status mirroring the failure:

| JSON-RPC code | HTTP | Meaning |
| --- | --- | --- |
| `-32001` | 401 | Unauthorized: missing or invalid API key. Failed auth is never billed. |
| `-32000` | 405 | Method not allowed: GET or DELETE was used. Only POST is supported on this endpoint. |

```json
{
  "jsonrpc": "2.0",
  "error": { "code": -32001, "message": "Unauthorized: invalid or missing API key." },
  "id": null
}
```

### Tool-level errors

Primitive failures (bad URL, credits exhausted, upstream error, etc.) come back inside the `tools/call` result with `isError: true`. The text content is the redacted error string in the form `code: message`.

```json
{
  "jsonrpc": "2.0",
  "id": 3,
  "result": {
    "isError": true,
    "content": [
      {
        "type": "text",
        "text": "credits_exhausted: workspace balance below request cost"
      }
    ]
  }
}
```

The `code` prefix is one of the standard envelope codes documented in [Errors](/docs/errors): `bad_request`, `unauthorized`, `credits_exhausted`, `forbidden_url`, `not_found`, `timeout`, `rate_limited`, `upstream_error`, `unavailable`.
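
Splitting the `code: message` string lets a client branch on the same codes it already handles over HTTP. A minimal sketch, with `tool_error` as a hypothetical helper and the assumption that the code and message are separated by the first `": "`:

```python
def tool_error(response):
    """Return (code, message) for a tool-level error, or None on success."""
    result = response.get("result", {})
    if not result.get("isError"):
        return None
    text = result["content"][0]["text"]
    # Split on the first ": " only, so colons inside the message survive.
    code, _, message = text.partition(": ")
    return code, message


failed = {
    "jsonrpc": "2.0",
    "id": 3,
    "result": {
        "isError": True,
        "content": [
            {"type": "text", "text": "credits_exhausted: workspace balance below request cost"}
        ],
    },
}
print(tool_error(failed))
```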

> **Credits** — Every MCP `tools/call` bills exactly the same as the equivalent HTTP call to the underlying primitive. The recording happens in the shared `runEndpoint` path; there is no MCP-specific credit code. See [Credits](/docs/concepts/credits) for the live schedule.

> **Coverage & known limits**
>
> - **Header auth only.** Keys in the URL path or request body are rejected. This is deliberate: URL paths leak into proxy access logs.
> - **No standalone SSE.** `GET /api/mcp` returns 405. SSE is only used as the response transport when the client negotiates it during `tools/call`.
> - **Per-tool credit cost is fixed by the underlying primitive.** Changing transports does not change cost.
> - **Tool errors are not retried server-side.** An `upstream_error` from the underlying primitive surfaces as `isError: true`; your client decides whether to retry. The same backoff rules from [Rate limits](/docs/rate-limits) apply.

## Notes

- **Per-request server.** A fresh `McpServer` instance is constructed per HTTP request, with the parsed API key captured in each tool's closure. Two concurrent requests authenticated with different keys cannot share server state.
- **Schemas are the single source of truth.** `searchHandler.schema`, `fetchHandler.schema`, etc. are imported by both the HTTP route and the MCP tool — schema drift between transports is impossible.
- **Billing parity by construction.** Each MCP tool's `execute` synthesizes an internal `Request` and calls the same `handle()` the HTTP route uses. The `recordUsage` write happens in shared code; MCP and HTTP rows are indistinguishable except for the `url` field's provenance.
- **Activity log.** MCP calls show up in **Console → Activity** with the primitive endpoint (`/gyre`, `/search`, etc.) — not as `/mcp`. This is the correct attribution: the cost was the primitive's cost.

> **Sanity-check from your terminal** — The fastest way to verify your key works over MCP is a single `tools/list` call (see above). It returns the tool definitions, costs zero credits (auth-only path), and confirms the endpoint is reachable from your network.