Gyre

Walk a site from a seed URL within a page budget, following outbound links breadth-first. Returns parsed markdown for every page successfully visited, plus per-URL errors. Same-domain by default. The link-traversal primitive in the Gyrence pipeline — Search finds, Fetch reads one, Gyre walks many. Gyre includes built-in pagination loop pruning, simhash near-duplicate content deduplication, cross-domain hop control, and per-host concurrency limiting — purpose-built for AI agent graph traversal workloads.

Graph traversal capabilities

Gyre discovers and follows links across publicly reachable web pages. Traversal (deciding which links to follow) is kept separate from extraction (turning each page into markdown), which runs through the same pipeline as /fetch.

Deterministic graph pruning: candidate links are checked against a default prune list of roughly 25 URL patterns (pagination traps, taxonomy loops, auth walls). Extend the list via prunePaths, or disable pruning with allowPrune: false.
Cross-domain hop control: maxHops limits how many domain boundaries a branch may cross, for multi-hop research workflows. Combine with sameDomain: false for controlled cross-domain traversal.
Near-duplicate suppression: simhash fingerprinting stops the walk from following links out of mirrored content (≥85% similarity to an already-fetched page), with no LLM cost. The duplicate page itself is still returned in pages[].
Two-tier dynamic routing: automatic escalation from lightweight HTTP to headless browser rendering on JS-shell detection, so pages that render without JavaScript (the easy 80%) incur no browser cost.
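
The near-duplicate check above can be pictured with a small simhash sketch. This is an illustration only, not Gyre's implementation: the 3-word shingles, the 64-bit fingerprint, the BLAKE2b hash, and the reading of "85%" as the fraction of matching fingerprint bits are all assumptions made for the example.

```python
import hashlib
import re

BITS = 64  # assumed fingerprint width; Gyre's actual width is not documented


def simhash(text: str) -> int:
    """Return a 64-bit simhash built from lowercase 3-word shingles."""
    words = re.findall(r"\w+", text.lower())
    shingles = [" ".join(words[i:i + 3]) for i in range(max(1, len(words) - 2))]
    weights = [0] * BITS
    for shingle in shingles:
        digest = hashlib.blake2b(shingle.encode("utf-8"), digest_size=8).digest()
        value = int.from_bytes(digest, "big")
        for bit in range(BITS):
            # Each shingle votes +1 or -1 on every bit position.
            weights[bit] += 1 if (value >> bit) & 1 else -1
    return sum(1 << bit for bit in range(BITS) if weights[bit] > 0)


def similarity(a: int, b: int) -> float:
    """Fraction of fingerprint bits on which two simhashes agree."""
    return 1.0 - bin(a ^ b).count("1") / BITS


def is_near_duplicate(a: str, b: str, threshold: float = 0.85) -> bool:
    """True when two documents' fingerprints agree on at least `threshold` of bits."""
    return similarity(simhash(a), simhash(b)) >= threshold
```

In a walker built this way, a page flagged as a near-duplicate would still be kept in the results; only its outbound links would be skipped.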

Use this when

You need a bounded slice of a site (1–50 pages) without standing up a full crawler.
You're feeding RAG and want a seed page plus everything it links to, in one call.
You want partial-success semantics — per-URL failures land in errors[] and the walk continues.
You're prototyping coverage before committing to a scheduled crawl.

Method	POST
Path	`/api/v1/gyre`
Auth	Bearer
Credits	1 per HTTP page, 3 per browser page (min 1)

Request

Parameter	Type	Description
`url` required	`string`	Absolute http(s) seed URL. Private, loopback, and link-local hosts are rejected (SSRF).
`maxPages`	`number` default: `20`	Hard cap on successfully fetched pages. Range 1–100. The walk stops as soon as this many pages have been fetched; failed URLs do not count toward the cap.
`sameDomain`	`boolean` default: `true`	When true, only links whose hostname exactly matches the seed's hostname are followed. Subdomains are NOT considered same-domain.
`maxHops`	`number`	Max cross-domain hops from seed. 0 = seed domain only. 1 = follow one external link but stay on that domain. Undefined = unlimited (subject to sameDomain).
`prunePaths`	`string[]`	Additional URL substrings to never queue, merged with Gyrence defaults. Example: ["/blog/", "?ref="].
`allowPrune`	`boolean` default: `true`	Set false to disable all pruning including defaults. Use when you need to crawl /page/ or /category/ paths intentionally.
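
The parameter rules in the table can be mirrored client-side before spending a request. The following is a minimal sketch, not part of any Gyrence SDK: `build_gyre_body` is a hypothetical helper, and its host check only catches literal IP addresses and `localhost`. It does not resolve DNS, so the server's SSRF guard remains the authority.

```python
import ipaddress
from urllib.parse import urlsplit


def build_gyre_body(url, max_pages=20, same_domain=True, max_hops=None,
                    prune_paths=None, allow_prune=True):
    """Validate arguments against the documented rules and return a JSON-ready body."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError("url must be an absolute http(s) URL")
    if not 1 <= max_pages <= 100:
        raise ValueError("maxPages must be between 1 and 100")

    host = parts.hostname
    if host == "localhost":
        raise ValueError("loopback hosts are rejected")
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None  # a DNS name; only the server can judge what it resolves to
    if ip is not None and (ip.is_private or ip.is_loopback or ip.is_link_local):
        raise ValueError("private, loopback, and link-local hosts are rejected")

    body = {
        "url": url,
        "maxPages": max_pages,
        "sameDomain": same_domain,
        "allowPrune": allow_prune,
    }
    # Optional fields are omitted entirely when unset.
    if max_hops is not None:
        body["maxHops"] = max_hops
    if prune_paths:
        body["prunePaths"] = list(prune_paths)
    return body
```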

Example body

{
  "url": "https://example.com",
  "maxPages": 10,
  "sameDomain": true
}

Response

The fields below are returned under data in the response envelope, alongside a top-level ok flag (see the example response).

Field	Type	Description
`startUrl`	`string`	Echo of the seed URL.
`totalPages`	`number`	Number of pages successfully fetched (length of `pages[]`).
`pages[]`	`object[]`	One entry per successfully fetched page (see fields below).
`pages[].url`	`string`	Absolute URL of the visited page.
`pages[].title`	`string`	Contents of <title>. Empty when absent.
`pages[].description`	`string`	Meta description or og:description. Empty when absent.
`pages[].markdown`	`string`	Main-content markdown (same extraction pipeline as /fetch).
`pages[].statusCode`	`number`	Origin HTTP status.
`pages[].via`	`"http" \| "browser"`	Tier that produced this page. Drives per-page credit cost.
`errors[]`	`object[]`	Per-URL errors encountered during the walk. `{ url, error }`. The walk continues past individual failures.

Example response

{
  "ok": true,
  "data": {
    "startUrl": "https://example.com",
    "totalPages": 1,
    "pages": [
      {
        "url": "https://example.com",
        "title": "Example",
        "description": "Home page",
        "markdown": "# Example\n\n...",
        "statusCode": 200,
        "via": "http"
      }
    ],
    "errors": []
  }
}
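
Because a walk can partially succeed, a caller usually wants both the fetched pages and the failures. The sketch below summarizes an envelope shaped like the example response. `summarize_walk` is a hypothetical helper, and treating a falsy `ok` as failure is an assumption, since the shape of the error envelope is not documented here.

```python
from collections import Counter


def summarize_walk(envelope: dict) -> dict:
    """Split a Gyre response into fetched URLs, failed URLs, and a per-tier count."""
    if not envelope.get("ok"):
        # Assumption: failed requests carry a falsy "ok" flag.
        raise RuntimeError(f"Gyre request failed: {envelope!r}")
    data = envelope["data"]
    pages = data["pages"]
    errors = data["errors"]
    return {
        "fetched": [page["url"] for page in pages],
        "failed": {err["url"]: err["error"] for err in errors},
        "by_tier": dict(Counter(page["via"] for page in pages)),
    }
```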

Example

curl -X POST https://www.gyrence.com/api/v1/gyre \
  -H "Authorization: Bearer $GYRENCE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com","maxPages":10}'
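
The same call can be made from Python with only the standard library. This is a sketch equivalent to the curl command above; `make_gyre_request` and `run_gyre` are hypothetical helper names, and the 30-second client timeout is an assumption chosen to sit just above the documented 25-second server deadline.

```python
import json
import urllib.request

GYRE_ENDPOINT = "https://www.gyrence.com/api/v1/gyre"


def make_gyre_request(body: dict, api_key: str) -> urllib.request.Request:
    """Build the POST request without sending it."""
    return urllib.request.Request(
        GYRE_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_gyre(body: dict, api_key: str, timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON envelope."""
    request = make_gyre_request(body, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Usage (performs a real, billable request):
#   import os
#   result = run_gyre({"url": "https://example.com", "maxPages": 10},
#                     os.environ["GYRENCE_API_KEY"])
#   print(result["data"]["totalPages"])
```

Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` on 4xx and 5xx statuses, so the error codes listed below would surface as exceptions rather than return values.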

Errors

Code	HTTP	Meaning
`bad_request`	400	`url` missing/invalid, or `maxPages` out of range.
`unauthorized`	401	Missing or malformed `Authorization` header, or a revoked API key.
`credits_exhausted`	402	Workspace balance below request cost.
`forbidden_url`	403	SSRF guard rejected the seed (private, loopback, link-local host).
`not_found`	404	The seed (and every queued URL) returned 404, or no pages were successfully fetched.
`timeout`	408	Request exceeded the 25-second hard deadline.
`rate_limited`	429	Per-workspace rate limit.
`upstream_error`	502	First page returned 5xx.
`unavailable`	503	Block-page detector tripped on the seed, or any other unmapped error.
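
One way to act on these codes is to separate errors worth retrying from errors that need a changed request. The code-to-status table below is copied from the documentation. The retry set, the attempt limit, and the backoff schedule are assumptions of this sketch, not Gyrence guidance, and how the code string is delivered in an error response is not documented here.

```python
# Error code -> HTTP status, as listed in the table above.
GYRE_ERRORS = {
    "bad_request": 400,
    "unauthorized": 401,
    "credits_exhausted": 402,
    "forbidden_url": 403,
    "not_found": 404,
    "timeout": 408,
    "rate_limited": 429,
    "upstream_error": 502,
    "unavailable": 503,
}

# Assumption: only transient-looking failures are retried.
RETRYABLE = {"timeout", "rate_limited", "upstream_error", "unavailable"}


def should_retry(code: str, attempt: int, max_attempts: int = 3) -> bool:
    """True when `code` looks transient and attempts remain (attempt is 1-based)."""
    return code in RETRYABLE and attempt < max_attempts


def backoff_seconds(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff: base, 2*base, 4*base, ... capped at `cap`."""
    return min(cap, base * 2 ** (attempt - 1))
```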

Credits

Each fetched page is billed individually: 1 credit per HTTP-tier page, 3 credits per browser-tier page. The total is the sum across pages[], with a minimum of 1 even if nothing was billable. Errored URLs are not charged. Inspect each pages[].via to attribute cost.
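
The billing rule can be reproduced from a response to attribute cost after the fact. The sketch below applies the documented rates; `walk_credits` and `worst_case_credits` are hypothetical helper names, and the worst case simply assumes every page escalates to the browser tier.

```python
CREDITS_PER_TIER = {"http": 1, "browser": 3}


def walk_credits(pages: list) -> int:
    """Sum per-page credits from pages[].via, with the documented minimum of 1."""
    total = sum(CREDITS_PER_TIER[page["via"]] for page in pages)
    return max(1, total)


def worst_case_credits(max_pages: int) -> int:
    """Upper bound for a walk: every page fetched on the browser tier."""
    return max_pages * CREDITS_PER_TIER["browser"]
```

For example, a walk that returns two HTTP-tier pages and one browser-tier page costs 1 + 1 + 3 = 5 credits, and a request with maxPages: 10 can cost at most 30.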

Coverage & known limits

sameDomain is exact-host. https://blog.example.com is not considered same-domain as https://example.com. Use Map when you need subdomain coverage.
No maxDepth parameter. Depth is bounded indirectly by maxPages and breadth-first queue order.
Query-string variants count as distinct pages. ?utm_source=… URLs each consume budget.
Per-page block detection lands in errors[] (not the top-level envelope). The walk continues past blocked URLs.
Default link pruning is on. ~25 URL patterns are pruned by default: pagination (?page=, /page/), taxonomy loops (/tag/, /category/, /author/), legal pages (/privacy, /terms), auth flows (/login, /cart), and ad/tracking paths. Pass allowPrune: false to disable. Extend via prunePaths.
Near-duplicate suppression. Pages whose markdown content is ≥85% similar to an already-fetched page (simhash) have their outbound links pruned — stopping mirrored press releases or syndicated content from multiplying crawl cost. The page itself is still returned in pages[].
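Cross-domain hops. maxHops counts domain boundaries crossed from the seed. It is a separate setting from sameDomain, but with the default sameDomain: true no external links are followed, so set sameDomain: false for maxHops to matter. For example, maxHops: 1 with sameDomain: false follows one external link per branch and then stays on that destination.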
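
The link-filtering rules in this list can be sketched as a single decision function. This is an illustration under stated assumptions, not Gyre's code: `should_queue` is a hypothetical name, "domain" is taken to mean exact hostname for hop counting, and the prune list contains only the patterns named above rather than the full default list.

```python
from typing import Optional, Sequence
from urllib.parse import urlsplit

# Only the patterns named in the documentation; the full default list is longer.
DOCUMENTED_PRUNE_SUBSTRINGS = (
    "?page=", "/page/", "/tag/", "/category/", "/author/",
    "/privacy", "/terms", "/login", "/cart",
)


def should_queue(link: str, parent_url: str, seed_url: str, parent_hops: int = 0,
                 same_domain: bool = True, max_hops: Optional[int] = None,
                 prune_paths: Sequence[str] = (), allow_prune: bool = True) -> Optional[int]:
    """Return the hop count for `link` if it should be queued, else None."""
    host = urlsplit(link).hostname
    if host is None:
        return None
    # sameDomain is an exact hostname match, so subdomains are excluded.
    if same_domain and host != urlsplit(seed_url).hostname:
        return None
    # A hop is counted whenever the hostname changes relative to the parent page.
    hops = parent_hops + (1 if host != urlsplit(parent_url).hostname else 0)
    if max_hops is not None and hops > max_hops:
        return None
    # Pruning is plain substring matching; allowPrune: false disables all of it.
    if allow_prune:
        patterns = (*DOCUMENTED_PRUNE_SUBSTRINGS, *prune_paths)
        if any(pattern in link for pattern in patterns):
            return None
    return hops
```

Because a queued link on the seed host returns 0, callers should test the result with `is not None` rather than truthiness.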

Notes

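Walk semantics. Breadth-first from the seed. Each page's outbound links[] (as extracted by the underlying fetcher) are enqueued, subject to the sameDomain, maxHops, and prune filters. Visited URLs are tracked verbatim, so two URLs that differ in any way, such as a query string, are treated as different pages.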
Page budget vs queue. maxPages caps successful fetches, not URLs considered. The walk stops as soon as the budget is reached, even if the queue still has unvisited URLs.
Per-page fetcher. Each page goes through the same two-tier pipeline as /fetch (HTTP → browser escalation, SEC.gov fast path, block-page detection). See the Fetch docs for tier mechanics.
Failure tolerance. Per-URL failures are pushed to errors[] and the walk continues. The request only fails when nothing was fetched.
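
The walk semantics described in these notes can be sketched as a short breadth-first loop. This is a simplified model, not Gyre's implementation: `walk` and its `fetch` callback are hypothetical, link filtering is omitted, and the sketch does not model the request-level failure that occurs when nothing is fetched.

```python
from collections import deque
from typing import Callable


def walk(seed: str, fetch: Callable[[str], dict], max_pages: int = 20) -> dict:
    """Breadth-first walk. `fetch(url)` returns a page dict with a "links" list or raises."""
    queue = deque([seed])
    seen = {seed}  # URLs are tracked verbatim, with no normalization
    pages, errors = [], []
    # The budget counts successful fetches only; failures do not consume it.
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            page = fetch(url)
        except Exception as exc:
            # Per-URL failures are recorded and the walk continues.
            errors.append({"url": url, "error": str(exc)})
            continue
        pages.append(page)
        for link in page.get("links", []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return {"startUrl": seed, "totalPages": len(pages), "pages": pages, "errors": errors}
```

The loop exits as soon as the budget is met, so URLs still in the queue at that point are never fetched and never appear in errors[].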

Try it

Run Gyre from the console at /app/gyre — pick a seed, set a budget, and watch pages stream in.