Infratex API
Programmatic access to document parsing, indexing, search, and AI-powered responses.
Authentication
All requests require an API key in the Authorization header. Create and manage keys from your dashboard.
Authorization: Bearer infratex_sk_your_key_here
Keys are scoped to your account. All usage is billed to your credit balance. Revoke a key at any time — it takes effect immediately.
Installation
Install the official SDK for your language. The Python and Node.js clients both wrap the REST API with typed methods, automatic retries, and streaming support.
pip install infratex
Then initialize the client with your API key:
from infratex import Infratex
client = Infratex(api_key="infratex_sk_...")
Source code: infratex-python and infratex-node.
Base URL
https://api.infratex.io
All paths in this documentation are relative to this base URL. The SDKs handle this automatically.
Errors
All errors return a consistent JSON envelope:
{
"error": {
"code": "insufficient_credit",
"message": "Insufficient credit for this operation",
"request_id": "a1b2c3d4..."
}
}
| Status | Code | Meaning |
|---|---|---|
| 401 | unauthorized | Missing or invalid API key |
| 402 | insufficient_credit | Not enough balance |
| 403 | account_frozen | Account suspended |
| 404 | not_found | Resource does not exist |
| 409 | document_not_ready | Document not in required status |
| 409 | index_not_ready | Requested scope is missing a ready index for the selected method |
| 409 | no_indexed_documents | No documents are indexed with the selected retrieval method for that scope |
| 413 | payload_too_large | File exceeds size limit |
| 413 | context_too_large | Input context exceeds 250K token limit |
| 422 | validation_error | Invalid request body |
| 500 | internal_error | Server error |
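A client can map this envelope onto a typed exception before retrying or surfacing the failure. The sketch below is illustrative and not part of the official SDK; `InfratexError` and `raise_for_error` are hypothetical names, while the envelope fields (`code`, `message`, `request_id`) match the shape shown above.

```python
# Illustrative error handling (hypothetical helper, not the official SDK).

class InfratexError(Exception):
    def __init__(self, status: int, code: str, message: str, request_id: str):
        super().__init__(f"{status} {code}: {message} (request_id={request_id})")
        self.status = status
        self.code = code
        self.request_id = request_id

def raise_for_error(status: int, body: dict) -> None:
    """Raise InfratexError for any 4xx/5xx response body; no-op otherwise."""
    if status < 400:
        return
    err = body.get("error", {})
    raise InfratexError(
        status,
        err.get("code", "unknown"),
        err.get("message", ""),
        err.get("request_id", ""),
    )
```

Callers can then branch on `e.code` (for example, top up on insufficient_credit) and log `request_id` when contacting support.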
Documents
Upload PDFs or ordered page-image batches, extract structured markdown, and manage your document library.
POST Upload & Parse PDF
Maximum file size: 50 MB. This route accepts PDF files only.
POST /api/v1/documents
Content-Type: multipart/form-data
file: <binary PDF> (required, max 50 MB)
method: "standard" | "max" | "experimental" | "legacy" | "cost-efficient" (default: "standard")
collection_id: "col-uuid" (optional: assign to a collection on upload)
Raw HTTP response 202:
{
"id": "8f2a9a0c-...",
"status": "pending",
"method": "standard",
"filename": "report.pdf",
"page_count": 42,
"collection_id": "col-uuid"
}
The REST API is async-first for uploads: POST /api/v1/documents creates the document resource and queues parsing on the backend. Poll GET /api/v1/documents/{id} until the status becomes parsed or indexed, then fetch the extracted content with GET /api/v1/documents/{id}/markdown.
The official Python and Node.js SDKs keep documents.upload(...) ergonomic by waiting for parsing by default, while also exposing wait=False / { wait: false } for queue-first control and documents.get(..., wait=True) when you want to resume later.
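For raw-HTTP clients that skip the SDKs, the poll loop can be hand-rolled. In this sketch, `fetch` stands in for any callable that GETs /api/v1/documents/{id} and returns the parsed JSON resource; the interval and timeout defaults are illustrative, not official guidance.

```python
import time

# Hand-rolled polling sketch (names and defaults are illustrative).

def wait_until_parsed(fetch, document_id, interval=2.0, timeout=600.0):
    """Poll until the document reaches parsed/indexed, or fail on error/timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        doc = fetch(document_id)
        if doc["status"] in ("parsed", "indexed"):
            return doc
        if doc["status"] == "error":
            raise RuntimeError(f"parsing failed: {doc.get('error_message')}")
        time.sleep(interval)
    raise TimeoutError(f"document {document_id} not parsed within {timeout}s")
```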
POST Upload & Parse Images
Use /api/v1/documents/images when the input is an ordered batch of page images instead of a PDF. Supported formats are PNG, JPEG, and WebP. Image uploads support only the standard and max parsing methods.
POST /api/v1/documents/images
Content-Type: multipart/form-data
files: <binary image> (required, repeat once per page image)
method: "standard" | "max" (default: "standard")
collection_id: "col-uuid" (optional: assign to a collection on upload)
The image route is also async-first and returns the same queued Document resource shape as PDF uploads. The dashboard uses it for image-based uploads, and the backend preserves image order as page order.
The official SDKs expose the same ergonomic default here too: documents.upload_images(...) / documents.uploadImages(...) wait by default, and support queue-first control through wait=False / { wait: false }.
Parsing Methods & Pricing
| Method | Price | Description |
|---|---|---|
| standard | $0.005/page | Best quality. Recommended for most documents. |
| max | $0.008/page | Gemini-based high-fidelity parsing plus brief visual interpretation notes for charts, figures, screenshots, and photos. |
| cost-efficient | $0.002/page | Faster, lower cost. Good for simple layouts. |
| experimental | $0.005/page | OCR-based pipeline for scanned documents. |
| legacy | $0.005/page | Previous generation parser. |
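Since parsing is billed per page, an upload's cost is simply pages times the per-page rate above. A small pre-upload estimator, with the caveat that the prices are hard-coded from this table and will go stale if pricing changes:

```python
# Per-page parse prices in USD, copied from the table above.
PARSE_PRICE_PER_PAGE = {
    "standard": 0.005,
    "max": 0.008,
    "cost-efficient": 0.002,
    "experimental": 0.005,
    "legacy": 0.005,
}

def parse_cost(method: str, pages: int) -> float:
    """Estimated parse cost in USD for a document of `pages` pages."""
    return round(PARSE_PRICE_PER_PAGE[method] * pages, 6)
```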
GET List Documents
GET /api/v1/documents?limit=50&offset=0&status=parsed&collection_id=col-uuid
Returns {"documents": [...], "total": 142}. All query parameters are optional. Filter by status: pending, processing, parsed, indexed, or error.
GET Get Document
GET /api/v1/documents/{id}
Returns document metadata, processing status, and per-method index resources in the indexes array. Markdown is retrieved separately from /api/v1/documents/{id}/markdown.
GET Download Markdown
GET /api/v1/documents/{id}/markdown
Returns the raw extracted markdown as text/markdown.
PATCH Update Document
PATCH /api/v1/documents/{id}
Content-Type: application/json
{
"filename": "new-name.pdf", // optional
"collection_id": "col-uuid", // optional: move to collection
"remove_collection": true // optional: remove from collection
}
DELETE Delete Document
DELETE /api/v1/documents/{id}
Returns 204. Removes the document and all associated index data.
Indexes
Build retrieval artifacts on a parsed document. Required before /api/v1/searches or /api/v1/responses can use that document with the selected retrieval method. Billed per 1M tokens processed.
POST Create Index
POST /api/v1/documents/{document_id}/indexes
Content-Type: application/json
{
"method": "vector" | "hybrid" // default: "vector"
}
Raw HTTP response 202:
{
"id": "idx-uuid",
"document_id": "8f2a9a0c-...",
"filename": "report.pdf",
"method": "vector",
"status": "pending",
"chunk_count": null,
"node_count": null,
"has_ast": false,
"has_description": false,
"processing_time_ms": null,
"error_message": null,
"created_at": "2026-04-02T12:00:00Z",
"updated_at": "2026-04-02T12:00:00Z"
}
The REST API is async-first for indexing as well: POST /api/v1/documents/{document_id}/indexes creates or resets the method-specific index resource and queues background processing. Poll GET /api/v1/documents/{document_id}/indexes or GET /api/v1/documents/{document_id}/indexes/{method} until status becomes indexed or error.
The official SDKs keep documents.index(...) ergonomic by waiting for the queued index by default, while still exposing wait=False / { wait: false } if you want queue-first behavior.
When using the hybrid method, node_count reflects the number of AST nodes created, and has_ast / has_description indicate whether structural analysis and document summaries were generated.
GET List Indexes
GET /api/v1/documents/{document_id}/indexes
GET Get Index
GET /api/v1/documents/{document_id}/indexes/{method}
Index Methods & Pricing
| Method | Price | Description |
|---|---|---|
| vector | $0.10/1M tokens | Standard vector embeddings. Fast and cost-effective. |
| hybrid | $0.50/1M tokens | Vector + structural analysis with AST and LLM summaries. Deeper understanding of complex documents. |
Document Lifecycle
Upload (PDF) → Parse (markdown, status: parsed) → Index (vector/hybrid, status: indexed) → Search / Respond
A document must be in parsed or indexed status before indexing. You can re-index a document with a different method at any time.
Search
Semantic search across your indexed documents. Returns the most relevant sections. Billed per query.
POST Request
POST /api/v1/searches
Content-Type: application/json
{
"method": "vector" | "hybrid", // default: "vector"
"query": "What was the Q3 revenue?", // required, max 4000 chars
"limit": 5, // 1-50, default 5
"collection_id": "col-uuid" // optional: choose collection_id or document_ids
}
Response 200
{
"method": "vector",
"query": "What was the Q3 revenue?",
"results": [
{
"document_id": "8f2a9a0c-...",
"document_name": "report.pdf",
"score": 0.92,
"content": "Total revenue for Q3 was $45.2M...",
"title": "",
"summary": "",
"node_id": null,
"chunk_index": 12,
"metadata": null,
"source": "vector"
}
]
}
With hybrid search, results include title and summary from the document's AST, and node_id identifies the exact section. With vector search, these fields are empty/null.
Use exactly one of document_ids or collection_id to scope the search. If neither is provided, all documents with a ready index for the selected method are searched. If you scope the request explicitly, every requested document must already have a ready index for that same method or the API returns index_not_ready. Billing is captured only after the search succeeds.
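The exactly-one-scope rule is easy to enforce client-side before spending a billed query. A sketch of a payload builder; this helper is purely illustrative, since the API just accepts plain JSON:

```python
# Illustrative request-body builder for POST /api/v1/searches.

def build_search_payload(query, method="vector", limit=5,
                         document_ids=None, collection_id=None):
    """Build a search body, enforcing the scope rule: at most one of
    document_ids / collection_id (omit both to search every document
    with a ready index for the selected method)."""
    if document_ids and collection_id:
        raise ValueError("use exactly one of document_ids or collection_id")
    payload = {"method": method, "query": query, "limit": limit}
    if document_ids:
        payload["document_ids"] = document_ids
    if collection_id:
        payload["collection_id"] = collection_id
    return payload
```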
Search Pricing
| Method | Price |
|---|---|
| vector | $0.01/query |
| hybrid | $0.05/query |
AI Responses
Ask a question and get an AI-generated answer grounded in your documents. The system retrieves relevant context, then generates a cited response streamed as Server-Sent Events. Billed per token based on the selected model. Retrieval method affects retrieval behavior, not response pricing.
POST Request
POST /api/v1/responses
Content-Type: application/json
{
"method": "vector" | "hybrid", // default: "vector"
"model": "fast" | "pro", // default: "fast"
"message": "Summarize the key risk factors", // required, max 8000 chars
"limit": 5, // results to retrieve, 1-20
"collection_id": "col-uuid", // optional: use exactly one scope selector
"reasoning": true | false // default: false — enable extended reasoning
}
For single-turn requests, pass exactly one scope selector: document_ids or collection_id. If neither is provided, the response is generated from all documents with a ready index for the selected retrieval method.
For multi-turn threads, first create a conversation with its document scope, then call /api/v1/responses with conversation_id. When conversation_id is present, do not send document_ids or collection_id in the same request.
Responses enforce the same method-specific readiness invariant as /api/v1/searches: the chosen scope must already have ready vector or hybrid indexes that match the request method.
Multi-turn Example
POST /api/v1/conversations
Content-Type: application/json
{
"title": "Q3 Analysis",
"collection_id": "col-uuid"
}
POST /api/v1/responses
Content-Type: application/json
{
"conversation_id": "conv-uuid",
"method": "hybrid",
"model": "pro",
"message": "How does that compare with the previous quarter?",
"limit": 8,
"reasoning": true
}
Response — Server-Sent Events
The response streams as SSE. Both models support reasoning — thinking events stream before the answer:
data: {"type": "sources", "content": [{"id": 1, "document_id": "...", "snippet": "..."}]}
data: {"type": "thinking", "content": "Let me analyze the risk factors..."}
data: {"type": "thinking", "content": " across the documents..."}
data: {"type": "text", "content": "Based on the documents, "}
data: {"type": "text", "content": "the key risk factors are..."}
data: {"type": "done"}
Both fast and pro models produce reasoning when reasoning is set to true. The fast model reasons briefly before answering. The pro model reasons extensively for complex queries. Reasoning is false by default in the API. Pass a conversation_id for multi-turn conversations after creating the thread separately.
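A minimal consumer can fold the event types above into one result by splitting the stream on data: lines. This sketch assumes you already have an iterable of decoded lines (for example from an HTTP client's line iterator); it is not the SDK's streaming interface:

```python
import json

# Minimal SSE accumulator for the event shapes documented above.

def collect_response(lines):
    """Fold sources/thinking/text events into a single result dict."""
    sources, thinking, text = [], [], []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank lines and comments between events
        event = json.loads(line[len("data: "):])
        if event["type"] == "sources":
            sources = event["content"]
        elif event["type"] == "thinking":
            thinking.append(event["content"])
        elif event["type"] == "text":
            text.append(event["content"])
        elif event["type"] == "done":
            break
    return {"sources": sources,
            "thinking": "".join(thinking),
            "text": "".join(text)}
```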
Models
| Model | Description |
|---|---|
| fast | Quick and efficient. Best for most queries. |
| pro | More intelligent model for complex tasks. Higher accuracy on nuanced questions. |
Response Pricing
| Model | Input Tokens | Output Tokens |
|---|---|---|
| fast | $1.00/1M tokens | $5.00/1M tokens |
| pro | $3.00/1M tokens | $16.00/1M tokens |
Reasoning tokens are included in output token billing for both models. The maximum input context is 250,000 tokens. If exceeded, the API returns a context_too_large error — start a new conversation or reduce scope.
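Token billing makes per-request costs easy to estimate from reported usage. A sketch with the rates above hard-coded (so it goes stale if pricing changes); remember that reasoning tokens count toward output_tokens:

```python
# (input_rate, output_rate) in USD per 1M tokens, from the table above.
RESPONSE_PRICE_PER_M = {"fast": (1.00, 5.00), "pro": (3.00, 16.00)}

def response_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD; output_tokens should include reasoning tokens."""
    in_rate, out_rate = RESPONSE_PRICE_PER_M[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
```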
Field Extraction
Pull typed fields out of any parsed document. You define the fields you want (name, type, description), and the API returns a strict JSON object that matches your schema, plus per-field source citations. Works on documents in parsed / indexed / done status.
The pipeline routes each document to one of three execution tiers automatically based on size and index availability:
| Tier | When | How it works |
|---|---|---|
| S | markdown ≤ 30k tokens | Single-pass extraction over the full document. |
| M | 30k – 150k tokens, hybrid index built | AST-guided: a router picks relevant sections, the extractor sees only those. |
| L | > 150k tokens, or any size with vector-only index | Per-field hybrid retrieval; array fields run as parallel map-reduce with a deterministic Python-side numeric filter. |
Tier L requires at least one indexed vector or hybrid index — otherwise the request returns 409 index_required_for_tier_l.
Field types
Each field declares a name (snake_case), a type, a description, an optional instructions hint, and a required flag. Optional fields can come back as null.
| type | JSON shape returned | Notes |
|---|---|---|
| string | string or null | Free text. |
| number | number or null | No currency symbols or commas — pure numeric. |
| integer | integer or null | |
| boolean | true / false / null | |
| date | ISO-8601 string (YYYY-MM-DD) or null | |
| enum | one of enum_values | Provide enum_values: [...]. |
| object | nested object | Provide a typed properties: [...] list. |
| array | array | Provide a typed items: {...} shape. Use array<object> for table-like fields. |
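Because inline_fields is plain JSON, a tiny builder can cut the repetition when declaring many fields. This helper is purely illustrative; the API simply takes dicts in the shape above:

```python
# Illustrative convenience builder for inline_fields entries.

def field(name, type_, description, required=False, **extra):
    """Return one field spec dict; extra kwargs pass through unchanged
    (e.g. enum_values, items, properties, instructions)."""
    spec = {"name": name, "type": type_,
            "description": description, "required": required}
    spec.update(extra)
    return spec

inline_fields = [
    field("company_name", "string", "Issuing company", required=True),
    field("agreement_date", "date", "Signature date"),
    field("status", "enum", "Deal status", enum_values=["open", "closed"]),
]
```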
POST Create Extraction Run
Submits an asynchronous extraction. Returns 202 Accepted immediately with a run_id; the actual extraction runs on a background worker. Poll the run to get the result.
POST /api/v1/documents/{document_id}/extractions
Content-Type: application/json
{
"inline_fields": [
{"name": "company_name", "type": "string", "description": "Issuing company", "required": true},
{"name": "agreement_date", "type": "date", "description": "Signature date", "required": false},
{"name": "total_amount", "type": "number", "description": "Total contract amount USD", "required": false},
{"name": "parties", "type": "array", "description": "All parties to the agreement",
"required": true,
"items": {
"name": "row", "type": "object", "description": "row",
"properties": [
{"name": "name", "type": "string", "description": "party name", "required": true},
{"name": "role", "type": "string", "description": "party role", "required": false}
]
}}
],
"inline_system_prompt": "This is a commercial subscription agreement.",
"model": "fast", // "fast" or "pro" (default: "fast")
"include_evidence": true // include per-field source citations in GETs (default: false)
}
Two ways to define fields: send inline_fields directly (one-off), or save a named template via the dashboard and pass template_id instead. Templates support edit/delete and version-pin the snapshot used by each run. Provide exactly one of inline_fields or template_id.
Response 202: {"id": "run-uuid", "status": "pending", "tier": null, ...}
tier is populated once the worker picks up the job.
GET Get Extraction Run
Polls a single run. Repeat until status is done or error.
GET /api/v1/extractions/{run_id}?include_evidence=true
Response shape:
{
"id": "run-uuid",
"document_id": "doc-uuid",
"template_id": null,
"status": "done",
"tier": "S" | "M" | "L",
"model": "fast" | "pro",
"result": {
"company_name": "Verse AB",
"agreement_date": "2021-05-03",
"total_amount": 1000000
},
"evidence": {
"company_name": { "source_quote": "## VERSE AB" },
"agreement_date": { "source_quote": "Latest update: 3 May 2021" },
"total_amount": { "source_quote": "Total subscription monies (SEK) 1,000,000" }
},
"processing_time_ms": 5215,
"input_tokens": 7284,
"output_tokens": 916,
"error_message": null,
"created_at": "...",
"updated_at": "..."
}
Evidence is always captured server-side. Pass ?include_evidence=true to receive it. For Tier M / L runs, evidence also carries an internal _tier_m / _tier_l audit object showing which sections the router or retriever picked.
GET List Runs for a Document
GET /api/v1/documents/{document_id}/extractions?limit=50&offset=0&include_evidence=false
GET Export a Run (CSV / XLSX)
Downloads the tabular fields (any array<object>) of a completed run. CSV mode: one array per run becomes a single CSV; multiple arrays become a ZIP. XLSX mode: one sheet per array field.
GET /api/v1/extractions/{run_id}/export?format=csv
GET /api/v1/extractions/{run_id}/export?format=xlsx
Requires the run to be done and the snapshot to contain at least one array<object> field with rows; otherwise the endpoint returns 409 extraction_not_done or 422 no_tabular_fields.
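If you prefer to build the CSV yourself from a run's result, the flattening is straightforward: each object in an array<object> field becomes one row keyed by property name. A client-side sketch approximating what format=csv returns for a single array field:

```python
import csv
import io

# Client-side approximation of the single-array CSV export:
# header from the first row's keys, one CSV row per object.

def rows_to_csv(rows):
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```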
Array fields with filters (Tier L map-reduce)
For documents large enough to route to Tier L, an array<object> field can carry an instructions hint that the engine parses as a deterministic post-filter:
{
"name": "transactions",
"type": "array",
"description": "All transactions on the statement",
"instructions": "only transactions where amount > 100,000",
"required": true,
"items": {
"name": "row", "type": "object", "description": "row",
"properties": [
{"name": "date", "type": "date", "description": "tx date", "required": true},
{"name": "amount", "type": "number", "description": "signed", "required": true},
{"name": "description", "type": "string", "description": "memo", "required": true}
]
}
}
The LLM extracts every row it sees. The filter is then applied in Python — never by the model — so it is reproducible and auditable. Supported operators: >, >=, <, <=, between … and …, with K / M / B suffixes and $, €, £, ¥ currency prefixes. Both the filtered set (in result) and the unfiltered set (in evidence.<field>.unfiltered_rows) are returned.
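To illustrate the filter semantics, here is a client-side reimplementation of the documented grammar (comparison operators, between, K/M/B suffixes, currency prefixes). It is an approximation for understanding and pre-validating instructions; the server-side parser is authoritative and may differ in detail:

```python
import re

# Illustrative reimplementation of the documented post-filter grammar.

_SUFFIX = {"K": 1e3, "M": 1e6, "B": 1e9}

def parse_amount(token: str) -> float:
    """Parse '100,000', '$1.5M', '€2B', etc. into a plain float."""
    token = token.strip().lstrip("$€£¥").replace(",", "")
    if token and token[-1].upper() in _SUFFIX:
        return float(token[:-1]) * _SUFFIX[token[-1].upper()]
    return float(token)

def filter_rows(rows, field, instruction):
    """Apply a deterministic numeric filter to extracted rows."""
    m = re.search(r"between\s+(\S+)\s+and\s+(\S+)", instruction, re.I)
    if m:
        lo, hi = sorted((parse_amount(m.group(1)), parse_amount(m.group(2))))
        return [r for r in rows if lo <= r[field] <= hi]
    m = re.search(r"(>=|<=|>|<)\s*(\S+)", instruction)
    if not m:
        return rows  # no recognizable filter: pass everything through
    op, value = m.group(1), parse_amount(m.group(2))
    ops = {">": lambda a: a > value, ">=": lambda a: a >= value,
           "<": lambda a: a < value, "<=": lambda a: a <= value}
    return [r for r in rows if ops[op](r[field])]
```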
Extraction Pricing
| Model | Input Tokens | Output Tokens |
|---|---|---|
| fast | $1.00/1M tokens | $5.00/1M tokens |
| pro | $3.00/1M tokens | $16.00/1M tokens |
Identical re-runs of the same (document, content_hash, model) tuple are served from a server-side cache at no cost.
Collections
Organize documents into collections. Use collections to scope search and retrieval.
POST Create Collection
POST /api/v1/collections
Content-Type: application/json
{"name": "Q3 Reports"} // required, max 255 chars
Response 201: {"id": "col-uuid", "name": "Q3 Reports", "created_at": "..."}
GET List Collections
GET /api/v1/collections
GET Get Collection
GET /api/v1/collections/{id}
PATCH Rename Collection
PATCH /api/v1/collections/{id}
{"name": "New Name"}
DELETE Delete Collection
DELETE /api/v1/collections/{id}
Returns 204. Documents inside are not deleted — they become unassigned.
Conversations
Conversations are an optional managed-session layer on top of /api/v1/responses. Each conversation stores message history and the document scope used for the thread, so follow-up turns stay grounded in the same corpus.
POST Create Conversation
POST /api/v1/conversations
Content-Type: application/json
{
"title": "Q3 Analysis",
"collection_id": "col-uuid"
}
A conversation can be scoped to exactly one collection or an explicit list of documents. That scope is locked for the life of the thread and becomes the retrieval scope for all follow-up responses.
Response 201: {"id": "conv-uuid", "title": "Q3 Analysis", "collection_id": "col-uuid", "document_ids": [], ...}
GET List Conversations
GET /api/v1/conversations?limit=50&offset=0
GET Get Conversation with Messages
GET /api/v1/conversations/{id}
Returns the conversation with its full message history, persisted thread scope, and per-message metadata such as retrieval method, model, reasoning, sources, and token usage when available.
DELETE Delete Conversation
DELETE /api/v1/conversations/{id}
Account & Billing
Check your balance and monitor usage programmatically.
GET Get Account
GET /api/v1/account
Response 200:
{
"tenant": {
"id": "a1b2c3d4-...",
"name": "Acme Corp",
"email": "dev@acme.com",
"credit_balance_micros": 5000000,
"is_admin": false,
"is_frozen": false,
"frozen_reason": null,
"deleted_at": null,
"created_at": "2025-01-15T10:30:00Z"
}
}
Balances are in micros: 1,000,000 = $1.00. The example above shows a $5.00 balance.
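Integer micros avoid floating-point drift in billing arithmetic, so keep calculations in micros and convert only for display. A small formatting sketch:

```python
# Display-side conversion only: do arithmetic in integer micros,
# format at the edge (1,000,000 micros = $1.00).

def micros_to_usd(micros: int) -> str:
    sign = "-" if micros < 0 else ""
    return f"{sign}${abs(micros) / 1_000_000:.2f}"
```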
GET Get Billing Summary
GET /api/v1/billing
Response 200:
{
"balance_micros": 5000000,
"recent_transactions": [
{
"id": "tx-uuid",
"transaction_type": "usage_charge",
"status": "posted",
"amount_micros": -25000,
"balance_after_micros": 4975000,
"currency": "USD",
"source_type": "documents.parse.standard",
"description": "Parsing Standard • 5 pages from report.pdf",
"created_at": "2025-01-20T14:30:00Z"
}
],
"recent_credit_transactions": [],
"recent_usage": [
{
"id": "evt-uuid",
"service_key": "documents.parse.standard",
"status": "posted",
"unit_count": 5,
"unit_label": "pages",
"unit_price_micros": 5000,
"total_cost_micros": 25000,
"document_id": "doc-uuid",
"created_at": "2025-01-20T14:30:00Z"
}
],
"spend_by_service": [
{ "service_key": "documents.parse.standard", "total_cost_micros": 250000, "total_units": 50 }
],
"daily_spend": [
{ "day": "2025-01-20", "total_cost_micros": 25000, "total_units": 5, "by_service": {} }
],
"totals": {
"lifetime_spent_micros": 1500000,
"lifetime_parse_pages": 300,
"lifetime_searches": 42,
"thirty_day_spend_micros": 500000,
"thirty_day_transactions": 15
}
}
All monetary values are in micros (1,000,000 = $1.00). Covers the last 30 days of activity, with lifetime totals in the totals object.
Quick Start
Upload a document, let the parser finish, index it, search, and get an AI response.
# 1. Create a document and queue parsing
curl -X POST https://api.infratex.io/api/v1/documents \
-H "Authorization: Bearer $INFRATEX_KEY" \
-F "file=@report.pdf"
# 2. Poll until status becomes parsed or indexed
curl -X GET https://api.infratex.io/api/v1/documents/{id} \
-H "Authorization: Bearer $INFRATEX_KEY"
# 3. Download the extracted markdown
curl -X GET https://api.infratex.io/api/v1/documents/{id}/markdown \
-H "Authorization: Bearer $INFRATEX_KEY"
# 4. Queue the vector index
curl -X POST https://api.infratex.io/api/v1/documents/{id}/indexes \
-H "Authorization: Bearer $INFRATEX_KEY" \
-H "Content-Type: application/json" \
-d '{"method": "vector"}'
# 5. Poll the method-specific index until status becomes indexed
curl -X GET https://api.infratex.io/api/v1/documents/{id}/indexes/vector \
-H "Authorization: Bearer $INFRATEX_KEY"
# 6. Search that document
curl -X POST https://api.infratex.io/api/v1/searches \
-H "Authorization: Bearer $INFRATEX_KEY" \
-H "Content-Type: application/json" \
-d '{"query": "What is the revenue?", "method": "vector", "limit": 5, "document_ids": ["{id}"]}'
# 7. Get an AI-generated response from that same document
curl -X POST https://api.infratex.io/api/v1/responses \
-H "Authorization: Bearer $INFRATEX_KEY" \
-H "Content-Type: application/json" \
-d '{"message": "Summarize the findings", "method": "vector", "limit": 5, "document_ids": ["{id}"]}'
Rate Limits
The core API endpoints (documents, indexes, searches, responses) are not rate-limited. Usage is metered and billed per operation. Ensure your account has sufficient credit balance before making requests.
SDKs
Official client libraries are available for Python and Node.js. Both provide typed interfaces, automatic retries, and first-class streaming support.
pip install infratex
Built on httpx. Source and docs: github.com/beltromatti/infratex-python
npm install infratex
TypeScript, uses native fetch. Source and docs: github.com/beltromatti/infratex-node
In both SDKs, documents.upload(...), documents.upload_images(...) / documents.uploadImages(...), and documents.index(...) share the same contract: they wait by default, but each also exposes queue-first control when you pass wait=False / { wait: false }.
The matching getters also line up: documents.get(..., wait=True) resumes a queued PDF upload or image-batch upload until parsed, and documents.get_index(..., wait=True) / documents.getIndex(..., { wait: true }) resumes a queued index until ready.
The REST API follows standard conventions and works with any HTTP client. All curl examples in this documentation translate directly to any language.
MCP
Infratex exposes a remote Model Context Protocol server for AI agents that need direct access to the same parsing, indexing, retrieval, and grounded answering pipeline as the REST API. It is retrieval-first and uses the same tenant, billing, and readiness invariants as the core backend.
The MCP server is stateless and remote. Use your existing API keys and connect to https://api.infratex.io/mcp. The dashboard also includes an operational overview at /dashboard/mcp.
Authentication & Transport
{
"mcpServers": {
"infratex": {
"url": "https://api.infratex.io/mcp",
"headers": {
"Authorization": "Bearer infratex_sk_..."
}
}
}
}
Use the same infratex_sk_... keys you use for the REST API. Every MCP tool call is tenant-scoped and billed exactly like the equivalent backend operation.
Core Tools
| Tool | Purpose | Default Behavior |
|---|---|---|
| list_collections | List available collections | Read-only |
| list_documents | List tenant documents with optional filters | Read-only |
| create_document | Queue PDF parsing from a base64 payload | wait=false |
| create_document_images | Queue parsing for an ordered image batch | wait=false |
| get_document | Read document metadata and status | Read-only |
| get_document_markdown | Fetch extracted markdown | Read-only |
| list_indexes | List vector and hybrid indexes for a document | Read-only |
| create_index | Queue vector or hybrid indexing | wait=false |
| get_index | Read method-specific index status | Read-only |
| search_documents | Run vector or hybrid retrieval | Sync |
| answer_documents | Generate a cited answer over indexed documents | Sync |
Recommended Agent Workflow
# 1. Queue parsing
create_document(wait=False) # or create_document_images(wait=False)
# 2. Poll until parsed
get_document(document_id=...)
# 3. Queue indexing
create_index(document_id=..., method="vector", wait=False)
# 4. Poll until indexed
get_index(document_id=..., method="vector")
# 5. Retrieve or answer
search_documents(...)
answer_documents(...)
create_document, create_document_images, and create_index are async-first by default because parsing and indexing can be long-running. Set wait=true only when an agent should block until the resource is ready. This keeps MCP aligned with the production backend, where documents and indexes are queued resources.
Scope & Behavior
Retrieval tools accept the same scope rules as REST: pass exactly one of document_ids or collection_id, or omit both to search across all ready documents for the selected retrieval method.
search_documents and answer_documents enforce the same method-specific readiness check as /api/v1/searches and /api/v1/responses. If you ask for hybrid, the scope must already have ready hybrid indexes; vector indexes are not reused implicitly.
MCP is meant for agents and orchestration. For end-user applications or direct integrations, the REST API and official SDKs remain the primary surface.