Map

Enumerate URLs for a site. Prefers sitemaps (robots.txt + common locations, recurses one level into sitemap indexes); falls back to a depth-1 anchor extraction from the homepage when no sitemap is found.

Use this when

  • You need a fast, broad picture of what URLs a site exposes — before deciding what to fetch.
  • You're seeding a crawl or sitemap-aware indexer and want a deduped link list.
  • You want to narrow results with search to a section like /blog/ or /docs/ without downloading pages.
  • You're checking whether a site even publishes a sitemap.

Method: POST
Path: /api/v1/map
Auth: Bearer
Credits: 1

Request

| Parameter | Type | Description |
|---|---|---|
| url (required) | string | Site URL to map. Must be a public http(s) origin. SSRF-blocked hosts are rejected. |
| limit (default: 5000) | number | Max links returned. 1–5000. |
| includeSubdomains (default: false) | boolean | Include links on subdomains of the root host (e.g. blog.example.com when mapping example.com). |
| search | string | Case-insensitive substring filter applied to discovered URLs (e.g. "/blog/"). |

Example body

{
  "url": "https://example.com",
  "limit": 500,
  "search": "/blog/"
}
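
The parameter rules above can be checked client-side before a credit is spent. The following is a minimal Python sketch, not part of the API: `build_map_body` is a hypothetical helper, it assumes `limit` is an integer, and it mirrors only the documented constraints (an http(s) URL, `limit` between 1 and 5000). It does not reproduce the server's SSRF checks.

```python
from urllib.parse import urlsplit


def build_map_body(url, limit=5000, include_subdomains=False, search=None):
    """Build a JSON-serializable body for POST /api/v1/map.

    Hypothetical helper: enforces only the constraints stated in the
    parameter table, not the server-side SSRF guard.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError("url must be an absolute http(s) URL")
    # Assumption: limit is a whole number; bool is rejected explicitly
    # because Python treats True/False as integers.
    if isinstance(limit, bool) or not isinstance(limit, int) or not 1 <= limit <= 5000:
        raise ValueError("limit must be an integer between 1 and 5000")

    body = {"url": url, "limit": limit, "includeSubdomains": include_subdomains}
    if search is not None:
        body["search"] = search  # left out entirely when no filter is wanted
    return body
```

Calling `build_map_body("https://example.com", limit=500, search="/blog/")` yields the example body above plus an explicit `includeSubdomains: false`.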

Response

{
  "ok": true,
  "data": {
    "url": "https://example.com",
    "source": "sitemap",
    "totalDiscovered": 1284,
    "totalReturned": 500,
    "links": ["https://example.com/blog/", "https://example.com/blog/hello-world", "..."]
  }
}

| Field | Type | Description |
|---|---|---|
| url | string | The input URL, echoed back. |
| source | "sitemap" or "discovered" | How links were discovered. `sitemap` = parsed from one or more sitemap files. `discovered` = homepage anchor extraction (fallback when no sitemap exists or all sitemaps are empty). |
| totalDiscovered | number | Unique links after host filtering, substring filtering, and de-duplication, counted before the limit is applied. |
| totalReturned | number | Links actually included in `links` (≤ `limit`). |
| links | string[] | Flat list of absolute URLs. Fragments stripped. |
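
Because `totalDiscovered` is counted before the limit and `totalReturned` after it, the difference tells you how many links the limit cut off. A small Python sketch of that check follows; `summarize_map` is a hypothetical helper name, and it expects the `data` object from a successful response.

```python
def summarize_map(data):
    """Report how much of the discovered set made it into `links`."""
    discovered = data["totalDiscovered"]
    returned = data["totalReturned"]
    return {
        "source": data["source"],
        "returned": returned,
        "dropped_by_limit": discovered - returned,  # links cut off by `limit`
        "truncated": returned < discovered,
        "from_fallback": data["source"] == "discovered",  # homepage anchors only
    }
```

For the sample response above (1284 discovered, 500 returned), `dropped_by_limit` is 784 and `truncated` is true, which suggests raising `limit` or tightening `search`.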

Example

curl -X POST https://www.gyrence.com/api/v1/map \
  -H "Authorization: Bearer $GYRENCE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com","limit":500,"search":"/blog/"}'
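
The same call can be made from Python with only the standard library. This is a sketch rather than an official client: `make_map_request` and `map_site` are hypothetical names, and the 30-second client timeout is an assumption chosen to sit just above the documented 25-second server deadline.

```python
import json
import urllib.error
import urllib.request

MAP_ENDPOINT = "https://www.gyrence.com/api/v1/map"


def make_map_request(body, api_key):
    """Build (but do not send) the POST request shown in the curl example."""
    return urllib.request.Request(
        MAP_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def map_site(body, api_key, timeout=30):
    """Send the request and return the `data` object from a successful reply."""
    req = make_map_request(body, api_key)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            payload = json.load(resp)
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; the error envelope is in the response body.
        payload = json.load(err)
    if not payload.get("ok"):
        raise RuntimeError(f"{payload.get('code')}: {payload.get('error')}")
    return payload["data"]
```

A call would pass the request body and the key from the GYRENCE_API_KEY environment variable, for example `map_site({"url": "https://example.com", "limit": 500, "search": "/blog/"}, os.environ["GYRENCE_API_KEY"])` after `import os`.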

Errors

| Code | HTTP | Meaning |
|---|---|---|
| bad_request | 400 | url missing/invalid, limit out of range, or any field fails validation. |
| unauthorized | 401 | Authorization header missing or malformed, or API key revoked. |
| credits_exhausted | 402 | Workspace balance below request cost. |
| forbidden_url | 403 | SSRF guard rejected a private, loopback, or link-local host. |
| timeout | 408 | Request exceeded the 25-second hard deadline. |
| rate_limited | 429 | Per-workspace rate limit. |
| upstream_error | 502 | Upstream returned 5xx during sitemap or homepage fetch. |
| unavailable | 503 | Any other unmapped error. |

{ "ok": false, "error": "url must be a public http(s) origin", "code": "forbidden_url" }
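
A client can branch on the `code` field of the error envelope. The Python sketch below is one possible policy, not documented behavior: treating `timeout`, `rate_limited`, `upstream_error`, and `unavailable` as retryable is an assumption, and `classify_error` and `backoff_delays` are hypothetical helper names.

```python
# Assumption: these codes describe transient conditions worth retrying.
# The API reference itself does not prescribe a retry policy.
RETRYABLE_CODES = {"timeout", "rate_limited", "upstream_error", "unavailable"}


def classify_error(payload):
    """Return 'ok', 'retry', or 'fail' for a parsed response envelope."""
    if payload.get("ok"):
        return "ok"
    return "retry" if payload.get("code") in RETRYABLE_CODES else "fail"


def backoff_delays(attempts, base=1.0, cap=30.0):
    """Exponential backoff schedule in seconds: base, 2*base, 4*base, ... up to cap."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]
```

Under this policy the `forbidden_url` example above is classified as `fail`, since resending the same URL would be rejected again.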

Credits

1 credit per call regardless of limit or how many sitemaps are walked. The cost is the same whether you get 5 links or 5,000.

Coverage & known limits

  • Speed over completeness. Map prioritizes a fast, sitemap-first answer. It does not crawl, does not render JS, and does not visit every URL it returns.
  • Sitemap-trusting. When source: "sitemap", results reflect what the site publishes — stale or partial sitemaps stay stale.
  • Fallback is shallow. source: "discovered" reads only the homepage's anchor (<a href>) tags. <link rel>, data-*, and SPA-style navigation injected by JS are missed. Use Fetch on a specific page instead.
  • Subdomain default is OFF. Set includeSubdomains: true to capture blog.*, docs.*, etc.
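
To make the shallow fallback concrete, here is a Python sketch of regex-based anchor extraction. It approximates the behavior described on this page and is not the service's actual code: the pattern, the function name `extract_anchor_links`, and the handling of relative URLs via `urljoin` are all assumptions.

```python
import re
from urllib.parse import urldefrag, urljoin, urlsplit

# Simplified pattern: quoted href attributes on <a> tags only.
# <link href>, data-href, and anything injected by JS will not match.
A_HREF = re.compile(
    r"""<a\s(?:[^>]*?\s)?href\s*=\s*["']([^"']*)["']""", re.IGNORECASE
)


def extract_anchor_links(html, base_url):
    """Return absolute, fragment-free http(s) links from <a href> tags, in order."""
    links = [urldefrag(base_url).url]  # the seed URL is always a candidate
    for href in A_HREF.findall(html):
        href = href.strip()
        if not href or href.startswith("#"):
            continue  # empty or fragment-only link
        absolute = urldefrag(urljoin(base_url, href)).url
        # mailto:, javascript:, and other non-http(s) schemes are dropped here.
        if urlsplit(absolute).scheme in ("http", "https") and absolute not in links:
            links.append(absolute)
    return links
```

Run against a page whose navigation lives in `<link>` tags or `data-*` attributes, this returns little more than the seed URL, which is the gap the fallback bullet above describes.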

Notes

  • Sitemap discovery order: robots.txt Sitemap: directives, then /sitemap.xml, /sitemap_index.xml, /wp-sitemap.xml. Sitemap indexes are followed exactly one level deep — <sitemap><loc> entries inside an index are fetched once and not recursed further.
  • Anchor extraction (fallback): a regex over the homepage HTML extracts <a href="..."> values only. Fragment-only links (#…) and javascript: and mailto: URLs are skipped; only http(s) URLs are kept. The seed URL itself is always included in the candidate set.
  • Host filtering is exact-match against the input URL's hostname. With includeSubdomains: true, a host matches if it equals the root host OR ends with "." + rootHost.
  • SSRF re-validation: each sitemap URL (including those from attacker-controllable robots.txt and from <sitemap><loc> entries in indexes) is re-validated against the SSRF allowlist before being fetched. Failures are skipped silently.
  • Per-fetch timeouts: each robots.txt or sitemap fetch is capped at 10s; the homepage-fallback fetch is capped at 15s. The request as a whole still respects the 25s hard deadline.
  • No page content. Pair with /fetch or /extract to retrieve bodies.
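
Two of the notes above, the sitemap discovery order and the host-matching rule, can be sketched in Python as follows. This is an illustration under stated assumptions, not the service's implementation: the function names are hypothetical, and the filter-then-limit order follows the `totalDiscovered` and `totalReturned` field descriptions.

```python
from urllib.parse import urlsplit

COMMON_SITEMAP_PATHS = ("/sitemap.xml", "/sitemap_index.xml", "/wp-sitemap.xml")


def sitemap_candidates(site_url, robots_txt=""):
    """List sitemap URLs to try: robots.txt Sitemap: lines first, then common paths."""
    parts = urlsplit(site_url)
    origin = f"{parts.scheme}://{parts.netloc}"
    candidates = []
    for line in robots_txt.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "sitemap" and value.strip():
            candidates.append(value.strip())
    candidates += [origin + path for path in COMMON_SITEMAP_PATHS]
    return list(dict.fromkeys(candidates))  # de-duplicate, keep order


def host_matches(candidate_url, root_url, include_subdomains=False):
    """Exact hostname match, or a "." + root suffix match when subdomains are on."""
    host = (urlsplit(candidate_url).hostname or "").lower()
    root = (urlsplit(root_url).hostname or "").lower()
    if host == root:
        return True
    return include_subdomains and host.endswith("." + root)


def filter_links(links, root_url, include_subdomains=False, search=None, limit=5000):
    """Apply host filter, substring filter, and de-duplication, then the limit."""
    needle = search.lower() if search else None
    seen, kept = set(), []
    for link in links:
        if link in seen or not host_matches(link, root_url, include_subdomains):
            continue
        if needle is not None and needle not in link.lower():
            continue  # case-insensitive substring filter
        seen.add(link)
        kept.append(link)
    return {
        "totalDiscovered": len(kept),
        "totalReturned": min(len(kept), limit),
        "links": kept[:limit],
    }
```

Under this reading of the exact-match rule, mapping https://example.com with includeSubdomains off would also exclude www.example.com links, so it may be worth passing the hostname the site actually serves from.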

Try it

Map a site from the console at /app/map — paste a URL, filter by path, export the link list.