Map

Enumerate URLs for a site. Prefers sitemaps (robots.txt + common locations, recurses one level into sitemap indexes); falls back to a depth-1 anchor extraction from the homepage when no sitemap is found.

Use this when

You need a fast, broad picture of what URLs a site exposes — before deciding what to fetch.
You're seeding a crawl or sitemap-aware indexer and want a deduped link list.
You want to filter to a section like /blog/ or /docs/ (via `search`) without downloading pages.
You're checking whether a site even publishes a sitemap.

Method	POST
Path	`/api/v1/map`
Auth	Bearer
Credits	1

Request

Parameter	Type	Description
`url` required	`string`	Site URL to map. Must be a public http(s) origin. SSRF-blocked hosts are rejected.
`limit`	`number` default: `5000`	Max links returned. 1–5000.
`includeSubdomains`	`boolean` default: `false`	Include links on subdomains of the root host (e.g. blog.example.com when mapping example.com).
`search`	`string`	Case-insensitive substring filter applied to discovered URLs (e.g. "/blog/").

Example body

{
  "url": "https://example.com",
  "limit": 500,
  "search": "/blog/"
}

Response

{
  "ok": true,
  "data": {
    "url": "https://example.com",
    "source": "sitemap",
    "totalDiscovered": 1284,
    "totalReturned": 500,
    "links": ["https://example.com/blog/", "https://example.com/blog/hello-world", "..."]
  }
}

Field	Type	Description
`url`	`string`	The input URL, echoed back.
`source`	`"sitemap" | "discovered"`	How links were discovered. `sitemap` = parsed from one or more sitemap files. `discovered` = homepage anchor extraction (fallback when no sitemap exists or all sitemaps are empty).
`totalDiscovered`	`number`	Unique links after host filtering, substring filtering, and de-duplication — before the limit is applied.
`totalReturned`	`number`	Links actually included in `links` (≤ `limit`).
`links`	`string[]`	Flat list of absolute URLs. Fragments stripped.
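A minimal sketch of reading these fields on the client side. The helper name `summarize_map_response` and the `truncated` / `used_fallback` keys are illustrative assumptions, not part of the API. The sample envelope mirrors the response above with the link list shortened.

```python
def summarize_map_response(envelope):
    """Return the link list plus two convenience flags derived from the response."""
    if not envelope.get("ok"):
        raise ValueError(f"map failed: {envelope.get('code')}: {envelope.get('error')}")
    data = envelope["data"]
    return {
        "source": data["source"],
        "links": data["links"],
        # More links were found than returned, so `limit` cut the list.
        "truncated": data["totalDiscovered"] > data["totalReturned"],
        # The homepage fallback was used, so coverage is shallow.
        "used_fallback": data["source"] == "discovered",
    }


# Sample envelope shaped like the documented response (links shortened).
sample = {
    "ok": True,
    "data": {
        "url": "https://example.com",
        "source": "sitemap",
        "totalDiscovered": 1284,
        "totalReturned": 500,
        "links": ["https://example.com/", "https://example.com/about"],
    },
}
summary = summarize_map_response(sample)
```

When `truncated` is true, narrowing the result with `search` is one way to bring the section you care about under the limit.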

Example

curl -X POST https://www.gyrence.com/api/v1/map \
  -H "Authorization: Bearer $GYRENCE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com","limit":500,"search":"/blog/"}'
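The same call can be sketched in Python using only the standard library. The function names `build_map_request` and `map_site` are hypothetical, and the 30-second client timeout is an assumption chosen to sit above the documented 25-second server deadline.

```python
import json
import urllib.request

MAP_ENDPOINT = "https://www.gyrence.com/api/v1/map"  # taken from the curl example


def build_map_request(url, api_key, limit=None, include_subdomains=None, search=None):
    """Build (but do not send) a POST request for the Map endpoint."""
    if limit is not None and not 1 <= limit <= 5000:
        raise ValueError("limit must be between 1 and 5000")
    body = {"url": url}
    # Only send optional fields that were set, so server defaults apply.
    if limit is not None:
        body["limit"] = limit
    if include_subdomains is not None:
        body["includeSubdomains"] = include_subdomains
    if search is not None:
        body["search"] = search
    return urllib.request.Request(
        MAP_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def map_site(url, api_key, **options):
    """Send the request and return the decoded JSON envelope."""
    request = build_map_request(url, api_key, **options)
    # Assumption: 30s client timeout, slightly above the 25s server deadline.
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

A call would look like `map_site("https://example.com", os.environ["GYRENCE_API_KEY"], limit=500, search="/blog/")`. Note that `urlopen` raises `urllib.error.HTTPError` for 4xx/5xx responses, so error envelopes have to be read from the exception rather than the return value.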

Errors

Code	HTTP	Meaning
`bad_request`	400	`url` missing/invalid, `limit` out of range, or any field fails validation.
`unauthorized`	401	`Authorization` header missing or malformed, or the API key has been revoked.
`credits_exhausted`	402	Workspace balance below request cost.
`forbidden_url`	403	SSRF guard rejected a private, loopback, or link-local host.
`timeout`	408	Request exceeded the 25-second hard deadline.
`rate_limited`	429	Per-workspace rate limit.
`upstream_error`	502	Upstream returned 5xx during sitemap or homepage fetch.
`unavailable`	503	Any other unmapped error.

{ "ok": false, "error": "url must be a public http(s) origin", "code": "forbidden_url" }
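One way a client might handle the error envelope is sketched below. The split into retryable and non-retryable codes is a client-side assumption, not something the API specifies, and `classify_error` is a hypothetical helper name.

```python
import json

# Assumption: these codes describe transient conditions worth retrying.
# The API does not document retry semantics; adjust to your own policy.
RETRYABLE_CODES = {"timeout", "rate_limited", "upstream_error", "unavailable"}


def classify_error(body_text):
    """Parse an error envelope and return (code, message, retryable)."""
    envelope = json.loads(body_text)
    if envelope.get("ok"):
        raise ValueError("not an error envelope")
    code = envelope.get("code", "unavailable")
    message = envelope.get("error", "")
    return code, message, code in RETRYABLE_CODES


# The error body shown above.
example_body = (
    '{ "ok": false, "error": "url must be a public http(s) origin", '
    '"code": "forbidden_url" }'
)
result = classify_error(example_body)
```

Codes such as `bad_request`, `unauthorized`, `credits_exhausted`, and `forbidden_url` are treated as permanent here because resending the same request would not change the outcome.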

Credits

1 credit per call regardless of limit or how many sitemaps are walked. The cost is the same whether you get 5 links or 5,000.

Coverage & known limits

Speed over completeness. Map prioritizes a fast, sitemap-first answer. It does not crawl, does not render JS, and does not visit every URL it returns.
Sitemap-trusting. When source: "sitemap", results reflect what the site publishes — stale or partial sitemaps stay stale.
Fallback is shallow. source: "discovered" reads only the homepage's anchor (<a href>) tags. <link rel>, data-*, and SPA-style navigation injected by JS are missed. Use Fetch on a specific page instead.
Subdomain default is OFF. Set includeSubdomains: true to capture blog.*, docs.*, etc.

Notes

Sitemap discovery order: robots.txt Sitemap: directives, then /sitemap.xml, /sitemap_index.xml, /wp-sitemap.xml. Sitemap indexes are followed exactly one level deep — <sitemap><loc> entries inside an index are fetched once and not recursed further.
Anchor extraction (fallback): parses only <a href="..."> attributes, using a regex against the homepage HTML. Skips fragment-only links (#…) and javascript: and mailto: schemes. Only http(s) URLs are kept. The seed URL itself is always included in the candidate set.
Host filtering is exact-match against the input URL's hostname. With includeSubdomains: true, a host matches if it equals the root host OR ends with "." + rootHost.
SSRF re-validation: each sitemap URL (including those from attacker-controllable robots.txt and from <sitemap><loc> entries in indexes) is re-validated against the SSRF allowlist before being fetched. Failures are skipped silently.
Per-fetch timeouts: each robots.txt and sitemap fetch is capped at 10s; the homepage-fallback fetch is capped at 15s. The request as a whole is still bound by the 25s hard deadline.
No page content. Pair with /fetch or /extract to retrieve bodies.
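The fallback and filtering rules in these notes can be approximated locally, which helps when predicting what a `discovered` result will contain. This is a sketch under assumptions: the service's actual regex and normalization are not published, so `HREF_RE`, `host_matches`, and `discover_links` are illustrative names and behavior may differ in edge cases.

```python
import re
from urllib.parse import urldefrag, urljoin, urlsplit

# Assumption: illustrative pattern; matches href on <a> tags only.
HREF_RE = re.compile(r'<a\s+(?:[^>]*?\s)?href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def host_matches(host, root_host, include_subdomains=False):
    """Exact hostname match, or a dot-suffix match when subdomains are included."""
    host, root_host = host.lower(), root_host.lower()
    if host == root_host:
        return True
    return include_subdomains and host.endswith("." + root_host)


def discover_links(seed_url, html, limit=5000, include_subdomains=False, search=None):
    """Extract anchors, then filter by scheme, host, and substring before capping."""
    root_host = urlsplit(seed_url).hostname or ""
    candidates = [seed_url]  # the seed URL is always a candidate
    for href in HREF_RE.findall(html):
        href = href.strip()
        if href.startswith("#"):
            continue  # fragment-only links are skipped
        absolute, _fragment = urldefrag(urljoin(seed_url, href))
        candidates.append(absolute)

    seen, links = set(), []
    for link in candidates:
        parts = urlsplit(link)
        if parts.scheme not in ("http", "https"):
            continue  # drops javascript:, mailto:, and other schemes
        if not host_matches(parts.hostname or "", root_host, include_subdomains):
            continue
        if search and search.lower() not in link.lower():
            continue  # case-insensitive substring filter
        if link not in seen:
            seen.add(link)
            links.append(link)

    returned = links[:limit]
    return {"totalDiscovered": len(links), "totalReturned": len(returned), "links": returned}
```

Filtering and de-duplication run before the cap, so `totalDiscovered` counts every surviving link while `totalReturned` reflects `limit`, matching the field descriptions in the response table.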

Try it

Map a site from the console at /app/map — paste a URL, filter by path, export the link list.