For AI Agents
Plug Gyrence into LangChain, CrewAI, AutoGen, Claude Desktop, and Cursor as your retrieval and web-data layer. Governed sources, format-aware extractors, and MCP↔REST billing parity — built for autonomous agents that need trustworthy, token-efficient web data.
No bespoke SDK required. Call Gyrence as a REST tool from any agent framework, or connect via MCP for zero-glue integration. A native typed JS/TS SDK is on the roadmap.
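Wrap Gyrence Fetch as a LangChain Tool.

```python
import os

import requests
from langchain.tools import Tool


def gyrence_fetch(url: str) -> str:
    """Fetch a URL via Gyrence and return LLM-ready markdown."""
    r = requests.post(
        "https://www.gyrence.com/api/v1/fetch",
        # Read the key from the environment: "$GYRENCE_API_KEY" is not expanded inside a Python string.
        headers={"Authorization": f"Bearer {os.environ['GYRENCE_API_KEY']}"},
        json={"url": url},
        timeout=30,
    )
    r.raise_for_status()  # surface HTTP errors instead of failing on a missing key later
    return r.json()["data"]["markdown"]


gyrence_tool = Tool(
    name="gyrence_fetch",
    func=gyrence_fetch,
    description="Fetch a URL and return LLM-ready markdown.",
)
```

Expose Fetch as a CrewAI @tool.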
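
```python
import os

import requests
from crewai.tools import tool


@tool("gyrence_fetch")
def gyrence_fetch(url: str) -> str:
    """Fetch a URL via Gyrence and return markdown."""
    r = requests.post(
        "https://www.gyrence.com/api/v1/fetch",
        # Read the key from the environment: "$GYRENCE_API_KEY" is not expanded inside a Python string.
        headers={"Authorization": f"Bearer {os.environ['GYRENCE_API_KEY']}"},
        json={"url": url},
        timeout=30,
    )
    r.raise_for_status()  # surface HTTP errors before reading the body
    return r.json()["data"]["markdown"]
```

Register Fetch as an AutoGen function tool on any agent.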
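
```python
import os

import httpx
from autogen_core.tools import FunctionTool


async def gyrence_fetch(url: str) -> str:
    """Fetch a URL via Gyrence and return markdown."""
    async with httpx.AsyncClient(timeout=30) as c:
        r = await c.post(
            "https://www.gyrence.com/api/v1/fetch",
            # Read the key from the environment: "$GYRENCE_API_KEY" is not expanded inside a Python string.
            headers={"Authorization": f"Bearer {os.environ['GYRENCE_API_KEY']}"},
            json={"url": url},
        )
    r.raise_for_status()  # surface HTTP errors before reading the body
    return r.json()["data"]["markdown"]


fetch_tool = FunctionTool(gyrence_fetch, description="Fetch URL via Gyrence.")
```

All Gyrence primitives are exposed as MCP tools, with the same handlers and the same billing as REST.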

```json
{
  "mcpServers": {
    "gyrence": {
      "url": "https://www.gyrence.com/api/mcp/YOUR_API_KEY"
    }
  }
}
```

Works with Claude Desktop, Cursor, LangChain MCP adapters, AutoGen MCP, and any MCP-aware runtime.
| | Gyrence | Firecrawl | Crawl4AI |
|---|---|---|---|
| Distribution | Managed API | Managed API | Open-source library |
| Native MCP server | Yes — billing parity with REST | Limited | Self-build |
| Governed Source Registry | Yes — trust + readability scores | No | No |
| Format-aware extractors | iXBRL, MediaWiki, Parquet, RSS, Office | Markdown / JSON | Markdown / JSON |
| Adaptive browser escalation | HTTP-first, escalate on bot-wall signals | Browser by default | Configurable |
Every fetch anchored to a cataloged source with trust score, machine-readability score, jurisdiction, and freshness.
iXBRL → structured facts, MediaWiki → ~2.2x faster than generic HTML extraction, Parquet/Arrow, Atom/RSS, Office docs (docABL): structured output, not just raw HTML.
HTTP-first, escalates to a Playwright worker only on 403/429/503, JS-shell markers, or thin bodies. No browser cost on the easy 80%.
Cloudflare robots.txt AI directives parsed and returned in the response — respect publisher consent without a second fetch.
Same handlers power /api/v1/* and /api/mcp/:key. Identical extraction quality and metered cost on either transport.
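The HTTP-first escalation described above can be pictured with a small decision function. This is only a sketch of the idea, not Gyrence's implementation: the status codes come from the text, while the marker strings, the `THIN_BODY_CHARS` cutoff, and the function names are hypothetical choices made for illustration.

```python
import re

# Status codes the page names as escalation triggers.
BOT_WALL_STATUSES = {403, 429, 503}

# Hypothetical heuristics: the real JS-shell markers and thin-body cutoff are not published.
JS_SHELL_MARKERS = ("please enable javascript", "enable javascript to run this app")
THIN_BODY_CHARS = 500


def visible_text(html: str) -> str:
    """Crudely strip scripts, styles, and tags to estimate the visible text."""
    html = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", html)
    return " ".join(text.split())


def needs_browser(status: int, html: str) -> bool:
    """Return True when a plain HTTP response looks like it needs a browser render."""
    if status in BOT_WALL_STATUSES:
        return True
    if len(visible_text(html)) < THIN_BODY_CHARS:
        return True  # thin body: probably an empty app shell
    lowered = html.lower()
    return any(marker in lowered for marker in JS_SHELL_MARKERS)
```

A caller would try the cheap HTTP fetch first and hand the URL to a browser worker only when `needs_browser` returns True, which is how the easy pages avoid browser cost.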
Structured outputs replace markdown blobs on supported formats — typically 5–20x fewer tokens per page.
| Source type | Markdown-only crawler | Gyrence (structured) | Reduction |
|---|---|---|---|
| SEC iXBRL filing | ~80–150k tokens | ~4–8k tokens (facts digest) | ~15–20x |
| Wikipedia list page | ~40k tokens | ~8k tokens (table-aware) | ~5x |
| RSS / Atom feed | ~10k tokens | ~1.5k tokens (entries digest) | ~6x |
Indicative ranges from internal parity audits; actual reductions vary by document.
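As a quick check on the table, the reduction factors can be recomputed from the token ranges. Taking the midpoint of each range is an assumption made here for illustration; the page does not say how the published factors were derived.

```python
# (low, high) token counts in thousands, copied from the table above.
ROWS = {
    "SEC iXBRL filing": ((80, 150), (4, 8)),
    "Wikipedia list page": ((40, 40), (8, 8)),
    "RSS / Atom feed": ((10, 10), (1.5, 1.5)),
}


def reduction(markdown: tuple, structured: tuple) -> float:
    """Return the reduction factor at the midpoint of each token range."""
    markdown_mid = sum(markdown) / 2
    structured_mid = sum(structured) / 2
    return markdown_mid / structured_mid


for source, (markdown, structured) in ROWS.items():
    print(f"{source}: ~{reduction(markdown, structured):.1f}x fewer tokens")
```

At midpoints this gives roughly 19x, 5x, and 7x, in line with the indicative ranges quoted in the table.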
Gyrence does not ship native typed SDKs for LangChain, CrewAI, or AutoGen yet. It exposes two integration surfaces those frameworks can consume today: a stable REST API at /api/v1/* (Search, Traverse, Fetch, Extract, Map) with a frozen { ok, data } envelope, and a native MCP server at /api/mcp/:key that exposes the same primitives as MCP tools. The MCP and REST handlers share the same code path and billing, so any MCP-aware agent runtime — Claude Desktop, Cursor, LangChain MCP adapters, AutoGen MCP — plugs in with zero glue code. A typed JS/TS SDK is on the roadmap.
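Because the REST envelope is described as a frozen `{ ok, data }` shape, a thin unwrapping helper can keep tool code small. The sketch below is an assumption-laden illustration: `GyrenceError` and `unwrap` are hypothetical names, and since the error payload is not documented here, the helper simply surfaces the whole body on failure.

```python
from typing import Any


class GyrenceError(Exception):
    """Raised when a response does not carry ok=True (hypothetical helper)."""


def unwrap(envelope: Any) -> Any:
    """Return the `data` field of a { ok, data } envelope, or raise GyrenceError."""
    if not isinstance(envelope, dict) or "ok" not in envelope:
        raise GyrenceError("response is not a { ok, data } envelope")
    if envelope["ok"] is not True:
        # The error payload shape is not documented here, so include the full body.
        raise GyrenceError(f"request failed: {envelope!r}")
    return envelope.get("data")
```

In the framework wrappers shown earlier, `unwrap(r.json())["markdown"]` could stand in for `r.json()["data"]["markdown"]`, so a failed call raises a clear error instead of a `KeyError`.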
Gyrence is a managed API, hosted at gyrence.com, with workspace-scoped API keys, usage metering, and credit accounting. Like Firecrawl, and unlike the open-source Crawl4AI library, it is a hosted service. There is no self-host distribution today.
Gyrence is a Source Intelligence Platform, not a generic crawler. Every fetch is anchored to a governed Source Registry entry with trust score, machine-readability score, jurisdiction, and freshness, so agents retrieve from vetted sources instead of arbitrary URLs. Format-aware extractors return structured data for iXBRL (SEC filings), MediaWiki (~2.2x faster than generic HTML on Wikipedia), Parquet/Arrow, Atom/RSS, and Office documents — typically 5–20x fewer tokens than markdown-only competitors on these formats. A two-tier fetch escalates to a Playwright worker only on bot-wall signals, so the 80%+ of pages that do not need a browser render do not pay browser cost. Cloudflare Content-Signals directives from robots.txt are surfaced in the response envelope, and MCP↔REST billing parity means agents get identical extraction quality and metered cost on either transport.
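For readers unfamiliar with the Content-Signals directives mentioned above, the sketch below parses `Content-Signal:` lines out of a raw robots.txt body. It assumes the comma-separated `key=yes|no` syntax of Cloudflare's Content Signals Policy and is a local illustration only: it does not show how Gyrence represents these directives in its response envelope, and `parse_content_signals` is a hypothetical name.

```python
def parse_content_signals(robots_txt: str) -> dict[str, dict[str, bool]]:
    """Map each user-agent group to its Content-Signal flags (True for "yes")."""
    signals: dict[str, dict[str, bool]] = {}
    agents: list[str] = []
    in_rules = False  # True once the current group has seen a non-User-agent line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value.lower())
            continue
        in_rules = True
        if field == "content-signal":
            for pair in value.split(","):
                key, sep, flag = pair.partition("=")
                if not sep:
                    continue  # skip malformed entries
                for agent in agents:
                    signals.setdefault(agent, {})[key.strip().lower()] = (
                        flag.strip().lower() == "yes"
                    )
    return signals
```

An agent could consult the resulting flags, for example `ai-train` or `ai-input`, before deciding how fetched content may be used.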
Gyrence is a managed web signal acquisition and source intelligence platform for autonomous-agent web crawling workloads. It delivers LLM-friendly JSON extraction across HTML, iXBRL, MediaWiki, Parquet, Atom/RSS, and Office documents, and exposes an MCP server so any MCP-aware runtime can consume Search, Traverse, Fetch, Extract, and Map as native tools. If you are evaluating a Firecrawl alternative or a Crawl4AI alternative for a LangChain, CrewAI, or AutoGen pipeline, Gyrence is the managed, governed option, built around a trust-scored Source Registry rather than ad-hoc URL fetching.