Gyre

Walk a site from a seed URL within a page budget, following outbound links breadth-first. Returns parsed markdown for every page successfully visited, plus per-URL errors. Same-domain by default. The link-traversal primitive in the Gyrence pipeline — Search finds, Fetch reads one, Gyre walks many.

Use this when

  • You need a bounded slice of a site (1–50 pages) without standing up a full crawler.
  • You're feeding RAG and want a seed page plus everything it links to, in one call.
  • You want partial-success semantics — per-URL failures land in errors[] and the walk continues.
  • You're prototyping coverage before committing to a scheduled crawl.

Method: POST
Path: /api/v1/gyre
Auth: Bearer
Credits: 1 per HTTP page, 3 per browser page (min 1)

Request

Parameter | Type | Required / default | Description
url | string | required | Absolute http(s) seed URL. Private, loopback, and link-local hosts are rejected by the SSRF guard.
maxPages | number | default: 20 | Hard cap on pages successfully fetched. Range 1–50. The walk stops as soon as this many pages have been fetched; failed URLs do not count toward the cap.
sameDomain | boolean | default: true | When true, only links whose hostname exactly matches the seed's hostname are followed. Subdomains are NOT considered same-domain.
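
The parameter constraints above can be checked client-side before spending a request. The following is a minimal sketch, not the service's own validation: `validate_gyre_request` is a hypothetical helper, it assumes `maxPages` must be an integer, and it can only catch private hosts written as IP literals or `localhost`. A DNS name that resolves to a private address can only be caught server-side.

```python
import ipaddress
from urllib.parse import urlsplit


def validate_gyre_request(url: str, max_pages: int = 20) -> list[str]:
    """Return a list of problems; an empty list means the request looks valid."""
    problems = []
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        problems.append("url must be an absolute http(s) URL")
    else:
        try:
            ip = ipaddress.ip_address(parts.hostname)
        except ValueError:
            ip = None  # a DNS name; only the server can check where it resolves
        if parts.hostname == "localhost" or (
            ip is not None and (ip.is_private or ip.is_loopback or ip.is_link_local)
        ):
            problems.append("url host is private, loopback, or link-local")
    # Assumption: maxPages is an integer (the docs only say "number", range 1-50).
    if isinstance(max_pages, bool) or not isinstance(max_pages, int) or not 1 <= max_pages <= 50:
        problems.append("maxPages must be an integer from 1 to 50")
    return problems
```

A request that passes this check can still be rejected by the API with `bad_request` or `forbidden_url`; the sketch only filters out the obvious cases early.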

Example body

{
  "url": "https://example.com",
  "maxPages": 10,
  "sameDomain": true
}

Response

Field | Type | Description
startUrl | string | Echo of the seed URL.
totalPages | number | Number of pages successfully fetched (length of `pages[]`).
pages[] | object[] | One entry per successfully fetched page (see fields below).
pages[].url | string | Absolute URL of the visited page.
pages[].title | string | Contents of <title>. Empty when absent.
pages[].description | string | Meta description or og:description. Empty when absent.
pages[].markdown | string | Main-content markdown (same extraction pipeline as /fetch).
pages[].statusCode | number | Origin HTTP status.
pages[].via | "http" or "browser" | Tier that produced this page. Drives per-page credit cost.
errors[] | object[] | Per-URL errors encountered during the walk, each shaped `{ url, error }`. The walk continues past individual failures.

Example response

{
  "ok": true,
  "data": {
    "startUrl": "https://example.com",
    "totalPages": 1,
    "pages": [
      {
        "url": "https://example.com",
        "title": "Example",
        "description": "Home page",
        "markdown": "# Example\n\n...",
        "statusCode": 200,
        "via": "http"
      }
    ],
    "errors": []
  }
}
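
A response in this shape can be split into fetched pages and per-URL failures with a few lines of Python. This is a sketch under assumptions: `summarize_walk` is a hypothetical helper, and it assumes each `errors[]` entry carries a string `error` field, which the field table implies but does not spell out.

```python
import json


def summarize_walk(body: str) -> dict:
    """Split a Gyre response body into fetched URLs and per-URL failures."""
    envelope = json.loads(body)
    if not envelope.get("ok"):
        raise ValueError("request failed; no data to summarize")
    data = envelope["data"]
    return {
        "fetched": [page["url"] for page in data["pages"]],
        "failed": {entry["url"]: entry["error"] for entry in data["errors"]},
        # totalPages is documented as the length of pages[].
        "count_matches": data["totalPages"] == len(data["pages"]),
    }
```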

Example

curl -X POST https://www.gyrence.com/api/v1/gyre \
  -H "Authorization: Bearer $GYRENCE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com","maxPages":10}'
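
The same call can be made from Python with only the standard library. This is a sketch rather than an official client: `build_gyre_request` and `run_gyre` are hypothetical names, and the 30-second client timeout is an assumption chosen to sit just above the documented 25-second server deadline.

```python
import json
import urllib.request

GYRE_ENDPOINT = "https://www.gyrence.com/api/v1/gyre"


def build_gyre_request(api_key, url, max_pages=20, same_domain=True):
    """Build the POST request without sending it."""
    payload = {"url": url, "maxPages": max_pages, "sameDomain": same_domain}
    return urllib.request.Request(
        GYRE_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_gyre(api_key, url, max_pages=20, same_domain=True, timeout=30.0):
    """Send the request and return the decoded JSON envelope."""
    request = build_gyre_request(api_key, url, max_pages, same_domain)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)

# Usage (performs a real, billable request):
#   import os
#   result = run_gyre(os.environ["GYRENCE_API_KEY"], "https://example.com", max_pages=10)
```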

Errors

Code | HTTP | Meaning
bad_request | 400 | url missing/invalid, or maxPages out of range.
unauthorized | 401 | Authorization header missing or malformed, or the API key has been revoked.
credits_exhausted | 402 | Workspace balance below request cost.
forbidden_url | 403 | SSRF guard rejected the seed (private, loopback, link-local host).
not_found | 404 | The seed (and every queued URL) returned 404, or no pages were successfully fetched.
timeout | 408 | Request exceeded the 25-second hard deadline.
rate_limited | 429 | Per-workspace rate limit.
upstream_error | 502 | First page returned 5xx.
unavailable | 503 | Block-page detector tripped on the seed, or any other unmapped error.
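
A client can map the HTTP status back to the codes in this table and decide whether a retry is worthwhile. In the sketch below, the status-to-code mapping comes from the table, but the `RETRYABLE` set is an assumption of this example, not documented behavior, and `classify_error` is a hypothetical helper.

```python
ERROR_CODES = {
    400: "bad_request",
    401: "unauthorized",
    402: "credits_exhausted",
    403: "forbidden_url",
    404: "not_found",
    408: "timeout",
    429: "rate_limited",
    502: "upstream_error",
    503: "unavailable",
}

# Assumption: only transient-looking failures are worth retrying.
# The API docs do not state which errors are safe to retry.
RETRYABLE = {408, 429, 502, 503}


def classify_error(status: int) -> tuple[str, bool]:
    """Return (error code, whether this sketch would retry) for an HTTP status."""
    return ERROR_CODES.get(status, "unknown"), status in RETRYABLE
```

For a `timeout`, retrying with a smaller `maxPages` may be more useful than repeating the same request, since the deadline applies to the whole walk.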

Credits

Each fetched page is billed individually: 1 credit per HTTP-tier page, 3 credits per browser-tier page. The total is the sum across pages[], with a minimum of 1 even if nothing was billable. Errored URLs are not charged. Inspect each pages[].via to attribute cost.
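
The billing rule can be reproduced from a response as a quick check. This is a minimal sketch of the arithmetic described above; `walk_cost` is a hypothetical helper, not part of the API.

```python
CREDITS_PER_TIER = {"http": 1, "browser": 3}


def walk_cost(pages: list[dict]) -> int:
    """Sum per-page credits from pages[].via, applying the minimum charge of 1."""
    total = sum(CREDITS_PER_TIER[page["via"]] for page in pages)
    return max(total, 1)
```

For example, a walk that returned two HTTP-tier pages and one browser-tier page comes to 1 + 1 + 3 = 5 credits.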

Coverage & known limits

  • sameDomain is exact-host. https://blog.example.com is not considered same-domain as https://example.com. Use Map when you need subdomain coverage.
  • No maxDepth parameter. Depth is bounded indirectly by maxPages and breadth-first queue order.
  • Query-string variants count as distinct pages. ?utm_source=… URLs each consume budget.
  • Per-page block detection lands in errors[] (not the top-level envelope). The walk continues past blocked URLs.
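
Two of these limits are easy to reason about in code. The sketch below approximates the exact-host rule and groups returned URLs that differ only by query string, which shows where budget went to variants. Both helpers are hypothetical, and how the service treats ports or hostname case is not documented, so `is_same_domain` is only an approximation.

```python
from urllib.parse import urlsplit, urlunsplit


def is_same_domain(seed: str, link: str) -> bool:
    """Exact hostname comparison; subdomains do not match."""
    return urlsplit(seed).hostname == urlsplit(link).hostname


def group_query_variants(urls: list[str]) -> dict[str, list[str]]:
    """Group URLs that share scheme, host, and path but differ in query string."""
    groups: dict[str, list[str]] = {}
    for url in urls:
        parts = urlsplit(url)
        key = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
        groups.setdefault(key, []).append(url)
    # Keep only paths that were visited more than once.
    return {key: found for key, found in groups.items() if len(found) > 1}
```

Running `group_query_variants` over `pages[].url` after a walk gives a rough picture of how many pages were spent on tracking-parameter duplicates.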

Notes

  • Walk semantics. Breadth-first from the seed. Each page's outbound links[] (as extracted by the underlying fetcher) are enqueued. Visited URLs are tracked verbatim, so URLs that differ in any character count as separate pages.
  • Page budget vs queue. maxPages caps successful fetches, not URLs considered. The walk stops as soon as the budget is reached, even if the queue still has unvisited URLs.
  • Per-page fetcher. Each page goes through the same two-tier pipeline as /fetch (HTTP → browser escalation, SEC.gov fast path, block-page detection). See the Fetch docs for tier mechanics.
  • Failure tolerance. Per-URL failures are pushed to errors[] and the walk continues. The request only fails when nothing was fetched.
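
The walk described in these notes can be modeled locally. The following is an illustrative sketch of a budgeted breadth-first walk with failure tolerance, not the service's implementation: `walk` and its `fetch` callback are hypothetical, and the same-domain filter and tier escalation are left out.

```python
from collections import deque
from typing import Callable


def walk(seed: str, fetch: Callable[[str], list[str]], max_pages: int = 20):
    """Breadth-first walk. fetch(url) returns outbound links or raises on failure."""
    queue = deque([seed])
    seen = {seed}  # URLs are tracked verbatim, with no normalization
    pages, errors = [], []
    # Stop when the budget of successful fetches is met or the queue runs dry.
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            links = fetch(url)
        except Exception as exc:
            # A failed URL is recorded and does not consume budget.
            errors.append({"url": url, "error": str(exc)})
            continue
        pages.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages, errors
```

Because only successes count toward `max_pages`, a site with many failing links can cause more fetch attempts than the budget suggests, while the number of returned pages never exceeds it.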

Try it

Run Gyre from the console at /app/gyre — pick a seed, set a budget, and watch pages stream in.