Map
Enumerate URLs for a site. Prefers sitemaps (robots.txt + common locations, recurses one level into sitemap indexes); falls back to a depth-1 anchor extraction from the homepage when no sitemap is found.
Use this when
- You need a fast, broad picture of what URLs a site exposes — before deciding what to fetch.
- You're seeding a crawl or sitemap-aware indexer and want a deduped link list.
- You want to filter (`search`) to a section like `/blog/` or `/docs/` without downloading pages.
- You're checking whether a site even publishes a sitemap.
| Property | Value |
|---|---|
| Method | POST |
| Path | /api/v1/map |
| Auth | Bearer |
| Credits | 1 |
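If you are calling the endpoint from code rather than the shell, here is a minimal Python sketch using only the standard library. It builds the `POST` request with a Bearer token but does not send it, so you can inspect the method, URL, and headers. `build_map_request` is a hypothetical helper name. The base URL comes from the curl example. The 30-second client timeout in the comment is an assumption, chosen to sit just above the documented 25-second server deadline.

```python
import json
import os
import urllib.request

BASE_URL = "https://www.gyrence.com"  # host used in the curl example
MAP_PATH = "/api/v1/map"


def build_map_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST /api/v1/map request. Hypothetical helper."""
    return urllib.request.Request(
        BASE_URL + MAP_PATH,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    request = build_map_request(
        {"url": "https://example.com", "limit": 500, "search": "/blog/"},
        os.environ.get("GYRENCE_API_KEY", ""),
    )
    print(request.get_method(), request.full_url)
    # To actually send it (assumed client timeout, slightly above the 25 s server deadline):
    # with urllib.request.urlopen(request, timeout=30) as response:
    #     print(json.load(response))
```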
Request
| Parameter | Type | Description |
|---|---|---|
| `url` (required) | string | Site URL to map. Must be a public http(s) origin; SSRF-blocked hosts are rejected. |
| `limit` | number (default: 5000) | Maximum number of links returned, from 1 to 5000. |
| `includeSubdomains` | boolean (default: false) | Include links on subdomains of the root host (e.g. `blog.example.com` when mapping `example.com`). |
| `search` | string | Case-insensitive substring filter applied to discovered URLs (e.g. `"/blog/"`). |
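Values outside these constraints come back as `bad_request`, so a client can catch the obvious mistakes before making the call. The Python sketch below mirrors the documented constraints. `validate_map_params` is a hypothetical name, and treating `limit` as an integer is an assumption. The sketch does not attempt the server's SSRF checks.

```python
from urllib.parse import urlsplit

MAX_LIMIT = 5000  # documented range for `limit` is 1–5000


def validate_map_params(params: dict) -> list[str]:
    """Return human-readable problems; an empty list means the body looks valid.

    Hypothetical client-side pre-check. The server stays authoritative and also
    applies SSRF checks that this sketch does not attempt.
    """
    problems: list[str] = []

    url = params.get("url")
    if not isinstance(url, str) or not url:
        problems.append("url is required")
    else:
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.hostname:
            problems.append("url must be an absolute http(s) URL")

    limit = params.get("limit", MAX_LIMIT)
    # Assumption: limit must be an integer. bool is a subclass of int, so reject it explicitly.
    if isinstance(limit, bool) or not isinstance(limit, int) or not 1 <= limit <= MAX_LIMIT:
        problems.append(f"limit must be an integer from 1 to {MAX_LIMIT}")

    if not isinstance(params.get("includeSubdomains", False), bool):
        problems.append("includeSubdomains must be a boolean")

    search = params.get("search")
    if search is not None and not isinstance(search, str):
        problems.append("search must be a string")

    return problems
```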
Example body
{
"url": "https://example.com",
"limit": 500,
"search": "/blog/"
}
Response
{
"ok": true,
"data": {
"url": "https://example.com",
"source": "sitemap",
"totalDiscovered": 1284,
"totalReturned": 500,
"links": ["https://example.com/", "https://example.com/about", "..."]
}
}
| Field | Type | Description |
|---|---|---|
| `url` | string | The input URL, echoed back. |
| `source` | `"sitemap"` or `"discovered"` | How links were discovered. `sitemap` = parsed from one or more sitemap files. `discovered` = homepage anchor extraction (fallback when no sitemap exists or all sitemaps are empty). |
| `totalDiscovered` | number | Unique links after host filtering, substring filtering, and de-duplication, counted before `limit` is applied. |
| `totalReturned` | number | Links actually included in `links` (≤ `limit`). |
| `links` | string[] | Flat list of absolute URLs, with fragments stripped. |
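The following filtering pipeline sketches how `totalDiscovered`, `totalReturned`, and `limit` relate. This Python version is illustrative only and is not the service's code. It assumes the following order of steps:

- fragment stripping;
- the host rule from the Notes (exact match, or a `"." + rootHost` suffix when subdomains are on);
- the case-insensitive substring filter;
- de-duplication;
- the limit.

The function names are hypothetical.

```python
from urllib.parse import urldefrag, urlsplit


def host_matches(host: str, root_host: str, include_subdomains: bool) -> bool:
    """Exact hostname match, or a "." + root_host suffix match when subdomains are on."""
    if host == root_host:
        return True
    return include_subdomains and host.endswith("." + root_host)


def select_links(seed_url, candidates, limit=5000, include_subdomains=False, search=None):
    """Hypothetical re-creation of how the totals relate; not the service's code."""
    root_host = urlsplit(seed_url).hostname or ""
    needle = search.lower() if search else None
    seen: set[str] = set()
    kept: list[str] = []
    for raw in candidates:
        url, _fragment = urldefrag(raw)  # fragments are stripped from returned links
        host = urlsplit(url).hostname or ""
        if not host_matches(host, root_host, include_subdomains):
            continue  # host filter
        if needle is not None and needle not in url.lower():
            continue  # case-insensitive substring filter
        if url in seen:
            continue  # de-duplication; first occurrence wins
        seen.add(url)
        kept.append(url)
    return {
        "totalDiscovered": len(kept),  # counted before the limit
        "totalReturned": min(len(kept), limit),
        "links": kept[:limit],
    }
```

Note that `notexample.com` does not match `example.com` even with subdomains on, because the suffix check requires the leading dot.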
Example
curl -X POST https://www.gyrence.com/api/v1/map \
-H "Authorization: Bearer $GYRENCE_API_KEY" \
-H "Content-Type: application/json" \
-d '{"url":"https://example.com","limit":500,"search":"/blog/"}'
Errors
| Code | HTTP | Meaning |
|---|---|---|
| `bad_request` | 400 | `url` is missing or invalid, `limit` is out of range, or another field fails validation. |
| `unauthorized` | 401 | Missing, malformed, or revoked `Authorization` header. |
| `credits_exhausted` | 402 | Workspace balance is below the request cost. |
| `forbidden_url` | 403 | The SSRF guard rejected a private, loopback, or link-local host. |
| `timeout` | 408 | The request exceeded the 25-second hard deadline. |
| `rate_limited` | 429 | The per-workspace rate limit was exceeded. |
| `upstream_error` | 502 | Upstream returned a 5xx during the sitemap or homepage fetch. |
| `unavailable` | 503 | Any other unmapped error. |
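Every failure carries one of these codes in its `code` field, so a client can branch on the code instead of on the HTTP status. The Python sketch below unwraps a response body. On success it returns `data` after checking the invariants from the response field table. On failure it raises a hypothetical `MapError`. The split between retryable and non-retryable codes is an assumption, because the API does not specify a retry policy.

```python
import json

# Assumption: the API does not say which codes are safe to retry. These four look
# transient (deadline, throttling, upstream 5xx, generic unavailability); the rest
# (bad_request, unauthorized, credits_exhausted, forbidden_url) will fail again as-is.
RETRYABLE_CODES = frozenset({"timeout", "rate_limited", "upstream_error", "unavailable"})


class MapError(Exception):
    """Raised for an `ok: false` envelope. Hypothetical client-side type."""

    def __init__(self, code: str, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message

    @property
    def retryable(self) -> bool:
        return self.code in RETRYABLE_CODES


def parse_map_response(body: str) -> dict:
    """Return the `data` object from a success envelope or raise MapError."""
    envelope = json.loads(body)
    if not envelope.get("ok"):
        # Falling back to "unavailable" when `code` is absent is an assumption.
        raise MapError(envelope.get("code", "unavailable"), envelope.get("error", ""))
    data = envelope["data"]
    # Invariants implied by the response field table.
    if data["totalReturned"] != len(data["links"]):
        raise ValueError("totalReturned does not match len(links)")
    if data["totalReturned"] > data["totalDiscovered"]:
        raise ValueError("totalReturned exceeds totalDiscovered")
    return data
```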
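Error responses use the same envelope with `ok: false`, a human-readable `error` message, and a `code` from the table above:
{ "ok": false, "error": "url must be a public http(s) origin", "code": "forbidden_url" }
Credits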
1 credit per call regardless of limit or how many sitemaps are walked. The cost is the same whether you get 5 links or 5,000.
Coverage & known limits
- Speed over completeness. Map prioritizes a fast, sitemap-first answer. It does not crawl, does not render JS, and does not visit every URL it returns.
- Sitemap-trusting. When `source: "sitemap"`, results reflect what the site publishes; stale or partial sitemaps stay stale.
- Fallback is shallow. `source: "discovered"` reads only the homepage's anchor (`<a href>`) tags. `<link rel>`, `data-*`, and SPA-style navigation injected by JS are missed. Use Fetch on a specific page instead.
- Subdomain default is OFF. Set `includeSubdomains: true` to capture `blog.*`, `docs.*`, etc.
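The fallback's blind spots follow from how little it reads. The Python approximation below applies the homepage anchor-extraction rules from the Notes:

- only `<a href>` tags are read;
- fragment-only, `javascript:`, and `mailto:` hrefs are skipped;
- only `http(s)` results are kept;
- the seed URL is always included.

The regex is an assumption, because the service's actual pattern isn't published. The function name is hypothetical.

```python
import re
from urllib.parse import urldefrag, urljoin, urlsplit

# Assumed pattern (the service's real regex is not published): an <a> tag with a
# single- or double-quoted href. Unquoted hrefs and JS-injected links are missed.
ANCHOR_HREF = re.compile(r"""<a\b[^>]*?\shref\s*=\s*(["'])(.*?)\1""", re.IGNORECASE | re.DOTALL)
SKIPPED_PREFIXES = ("#", "javascript:", "mailto:")


def extract_anchor_links(seed_url: str, html: str) -> list[str]:
    """Collect absolute http(s) URLs from <a href> tags; the seed is always included."""
    links = [seed_url]
    for match in ANCHOR_HREF.finditer(html):
        href = match.group(2).strip()
        if not href or href.lower().startswith(SKIPPED_PREFIXES):
            continue
        # Resolve relative hrefs against the seed, then drop any #fragment.
        absolute, _fragment = urldefrag(urljoin(seed_url, href))
        if urlsplit(absolute).scheme in ("http", "https"):
            links.append(absolute)
    return links
```

Given a page containing `<link rel="next" href="/page2">`, this sketch ignores that link, just as the fallback does. Anything a script adds after load never appears in the raw HTML at all.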
Notes
- Sitemap discovery order: `robots.txt` `Sitemap:` directives, then `/sitemap.xml`, `/sitemap_index.xml`, `/wp-sitemap.xml`. Sitemap indexes are followed exactly one level deep: `<sitemap><loc>` entries inside an index are fetched once and not recursed further.
- Anchor extraction (fallback): parses only `<a href="...">` tags, via regex, against the homepage HTML. Skips fragments (`#…`), `javascript:`, and `mailto:` schemes. Only `http(s)` URLs are kept. The seed URL itself is always included in the candidate set.
- Host filtering is exact-match against the input URL's hostname. With `includeSubdomains: true`, a host matches if it equals the root host OR ends with `"." + rootHost`.
- SSRF re-validation: each sitemap URL (including those from attacker-controllable `robots.txt` and from `<sitemap><loc>` entries in indexes) is re-validated against the SSRF allowlist before being fetched. Failures are skipped silently.
- Per-request timeouts: `robots.txt` and sitemap fetches use a 10s cap; the homepage-fallback fetch uses 15s. The whole request still respects the 25s hard deadline.
- No page content. Pair with `/fetch` or `/extract` to retrieve bodies.
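Sitemap discovery and one-level index following can be sketched in Python with the standard library. The sketch lists `robots.txt` `Sitemap:` directives first and the three common paths after them. It then walks each candidate and expands a sitemap index once, without recursing into nested indexes.

- `fetch` is a hypothetical callable that returns the body text or `None`. It stands in for the real fetcher, so SSRF re-validation and timeouts are deliberately left out.
- The docs don't say whether the service stops at the first sitemap that succeeds. This sketch walks every candidate.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional
from urllib.parse import urljoin

COMMON_SITEMAP_PATHS = ("/sitemap.xml", "/sitemap_index.xml", "/wp-sitemap.xml")


def sitemap_candidates(origin: str, robots_txt: str) -> list[str]:
    """robots.txt `Sitemap:` directives first, then the common paths, without repeats."""
    found = []
    for line in robots_txt.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    found.extend(urljoin(origin, path) for path in COMMON_SITEMAP_PATHS)
    return list(dict.fromkeys(found))  # de-duplicate, keeping first-seen order


def _local_name(tag: str) -> str:
    return tag.rsplit("}", 1)[-1]  # drop the "{namespace}" prefix, if any


def parse_sitemap(xml_text: str) -> tuple[str, list[str]]:
    """Return ("index", child sitemap URLs) or ("urlset", page URLs)."""
    root = ET.fromstring(xml_text)
    kind = "index" if _local_name(root.tag) == "sitemapindex" else "urlset"
    locs = [
        el.text.strip()
        for el in root.iter()
        if _local_name(el.tag) == "loc" and el.text and el.text.strip()
    ]
    return kind, locs


def collect_sitemap_urls(
    fetch: Callable[[str], Optional[str]], origin: str, robots_txt: str
) -> list[str]:
    """Walk candidates in order, following sitemap indexes exactly one level deep."""
    urls: list[str] = []
    for sitemap_url in sitemap_candidates(origin, robots_txt):
        text = fetch(sitemap_url)
        if text is None:
            continue
        try:
            kind, locs = parse_sitemap(text)
        except ET.ParseError:
            continue  # malformed sitemap: skip it
        if kind == "urlset":
            urls.extend(locs)
            continue
        for child_url in locs:  # one level into the index, no further
            child_text = fetch(child_url)
            if child_text is None:
                continue
            try:
                child_kind, child_locs = parse_sitemap(child_text)
            except ET.ParseError:
                continue
            if child_kind == "urlset":  # a nested index is not recursed
                urls.extend(child_locs)
    return urls
```

Duplicates across sitemaps are left for the later de-duplication step. The Python documentation warns that `xml.etree.ElementTree` is not secure against maliciously constructed data. For untrusted sitemaps, a hardened parser such as `defusedxml` is the safer choice.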
Try it
Map a site from the console at /app/map — paste a URL, filter by path, export the link list.
