Collections API

Collection Management

Collection management shares its base URL with the Management API: https://management-api.x.ai/. Authenticate with an xAI Management API key, sent in the header `Authorization: Bearer <your xAI Management API key>`.

[!NOTE]

For details on provisioning an xAI Management API key and using the Management API, see Using Management API.
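As a minimal sketch of the authentication scheme described above, the following Python helper builds a request that carries the bearer header. The helper names `management_request` and `send`, and the environment variable `XAI_MANAGEMENT_API_KEY`, are illustrative assumptions rather than part of the API.

```python
import json
import os
import urllib.request

BASE_URL = "https://management-api.x.ai"


def management_request(method, path, api_key, body=None):
    """Build, but do not send, an authenticated Management API request."""
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if body is not None:
        # JSON bodies are used by the POST and PUT endpoints in this reference.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def send(request):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Hypothetical environment variable name; store the key wherever suits you.
api_key = os.environ.get("XAI_MANAGEMENT_API_KEY", "<your xAI Management API key>")
request = management_request("GET", "/v1/collections", api_key)
```

Calling `send(request)` performs the actual HTTP call; keeping request construction separate makes it easy to inspect headers and bodies before anything goes over the network.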

POST /v1/collections

Create a collection.

Request Body

team_id (string) — The ID of the team that will own this new collection. If not provided, the team ID will be derived from your request credentials.
collection_name (string, required) — Name to use for the new collection.
index_configuration (object)
- model_name (string) — Embedding model used to embed the collection's documents.
chunk_configuration (object)
- chars_configuration (object)
  - max_chunk_size_chars (integer) — Max length per chunk.
  - chunk_overlap_chars (integer) — Overlap between chunks, both sides.
- tokens_configuration (object)
  - max_chunk_size_tokens (integer) — Max length per chunk.
  - chunk_overlap_tokens (integer) — Overlap between chunks, both sides.
  - encoding_name (string) — Name of the encoding to use for the tokenizer.
- ast_configuration (object) — Deprecated: use code_tokens_configuration or code_chars_configuration instead.
  - max_chunk_size_tokens (integer) — Max length per chunk.
  - encoding_name (string) — Name of the encoding to use for the tokenizer.
- table_configuration (object)
  - max_chunk_size_tokens (integer) — Max length per chunk.
  - encoding_name (string) — Name of the encoding to use for the tokenizer.
- markdown_tokens_configuration (object)
  - max_chunk_size_tokens (integer) — Max length per chunk.
  - chunk_overlap_tokens (integer) — Overlap between chunks, both sides.
  - encoding_name (string) — Name of the encoding to use for the tokenizer.
- markdown_chars_configuration (object)
  - max_chunk_size_chars (integer) — Max length per chunk.
  - chunk_overlap_chars (integer) — Overlap between chunks, both sides.
- code_tokens_configuration (object)
  - max_chunk_size_tokens (integer) — Max length per chunk.
  - chunk_overlap_tokens (integer) — Overlap between chunks, both sides.
  - encoding_name (string) — Name of the encoding to use for the tokenizer.
- code_chars_configuration (object)
  - max_chunk_size_chars (integer) — Max length per chunk.
  - chunk_overlap_chars (integer) — Overlap between chunks, both sides.
- bytes_configuration (object)
  - max_chunk_size_bytes (integer) — Max length per chunk in bytes.
  - chunk_overlap_bytes (integer) — Overlap between chunks in bytes.
- strip_whitespace (boolean) — Remove leading/trailing whitespace.
- inject_name_into_chunks (boolean) — Inject name into produced chunks.
metric_space ("HNSW_METRIC_UNKNOWN" | "HNSW_METRIC_COSINE" | "HNSW_METRIC_EUCLIDEAN" | "HNSW_METRIC_INNER_PRODUCT") — Distance space for the HNSW index.
version (integer) — Internal only. Version number of the Collection API used under the hood. Because the setting is internal, its values are intentionally not enumerated.
field_definitions (array<object>)
- key (string, required) — The key/name of the field (e.g., "title", "author", "isbn").
- required (boolean) — If true, this field must be provided for every document added to the collection. Documents missing required fields will be rejected at upload time.
- inject_into_chunk (boolean) — If true, this field's value will be injected at the start of each chunk generated from documents (for contextual retrieval). Improves retrieval accuracy by providing context about the document.
- unique (boolean) — If true, this field's value must be unique across all documents within this collection. Duplicate values will be rejected.
- description (string) — Optional description of what this field represents.
collection_description (string) — Human-friendly description displayed to users and agents.

Response Body

collection_id (string) — UUIDv4 that represents an ID of the collection.
collection_name (string) — Name of the collection.
created_at (string) — Timestamp for when the collection was created, e.g. "2025-09-16T18:36:09.790629Z".
index_configuration (object)
- model_name (string) — Embedding model used to embed the collection's documents.
chunk_configuration (object)
- chars_configuration (object)
  - max_chunk_size_chars (integer) — Max length per chunk.
  - chunk_overlap_chars (integer) — Overlap between chunks, both sides.
- tokens_configuration (object)
  - max_chunk_size_tokens (integer) — Max length per chunk.
  - chunk_overlap_tokens (integer) — Overlap between chunks, both sides.
  - encoding_name (string) — Name of the encoding to use for the tokenizer.
- ast_configuration (object) — Deprecated: use code_tokens_configuration or code_chars_configuration instead.
  - max_chunk_size_tokens (integer) — Max length per chunk.
  - encoding_name (string) — Name of the encoding to use for the tokenizer.
- table_configuration (object)
  - max_chunk_size_tokens (integer) — Max length per chunk.
  - encoding_name (string) — Name of the encoding to use for the tokenizer.
- markdown_tokens_configuration (object)
  - max_chunk_size_tokens (integer) — Max length per chunk.
  - chunk_overlap_tokens (integer) — Overlap between chunks, both sides.
  - encoding_name (string) — Name of the encoding to use for the tokenizer.
- markdown_chars_configuration (object)
  - max_chunk_size_chars (integer) — Max length per chunk.
  - chunk_overlap_chars (integer) — Overlap between chunks, both sides.
- code_tokens_configuration (object)
  - max_chunk_size_tokens (integer) — Max length per chunk.
  - chunk_overlap_tokens (integer) — Overlap between chunks, both sides.
  - encoding_name (string) — Name of the encoding to use for the tokenizer.
- code_chars_configuration (object)
  - max_chunk_size_chars (integer) — Max length per chunk.
  - chunk_overlap_chars (integer) — Overlap between chunks, both sides.
- bytes_configuration (object)
  - max_chunk_size_bytes (integer) — Max length per chunk in bytes.
  - chunk_overlap_bytes (integer) — Overlap between chunks in bytes.
- strip_whitespace (boolean) — Remove leading/trailing whitespace.
- inject_name_into_chunks (boolean) — Inject name into produced chunks.
documents_count (integer) — How many files the collection contains.
field_definitions (array<object>) — Field definitions for documents in this collection. Defines what fields documents can have and their constraints.
- key (string, required) — The key/name of the field (e.g., "title", "author", "isbn").
- required (boolean) — If true, this field must be provided for every document added to the collection. Documents missing required fields will be rejected at upload time.
- inject_into_chunk (boolean) — If true, this field's value will be injected at the start of each chunk generated from documents (for contextual retrieval). Improves retrieval accuracy by providing context about the document.
- unique (boolean) — If true, this field's value must be unique across all documents within this collection. Duplicate values will be rejected.
- description (string) — Optional description of what this field represents.
collection_description (string) — Optional description of the collection.

**Request example:**

{
  "collection_name": "SEC Filings",
  "index_configuration": {
    "model_name": "grok-embedding-small"
  },
  "chunk_configuration": {
    "tokens_configuration": {
      "max_chunk_size_tokens": 1024,
      "chunk_overlap_tokens": 200,
      "encoding_name": "o200k_base"
    },
    "strip_whitespace": true
  },
  "collection_description": "Filings from the SEC for financial analysis"
}

**Response example:**

{
  "collection_id": "collection_80100614-300c-4609-959b-a138fa90f542",
  "collection_name": "SEC Filings",
  "created_at": "2025-09-16T18:36:09.790629Z",
  "index_configuration": {
    "model_name": "grok-embedding-small"
  },
  "chunk_configuration": {
    "tokens_configuration": {
      "max_chunk_size_tokens": 1024,
      "chunk_overlap_tokens": 200,
      "encoding_name": "o200k_base"
    },
    "strip_whitespace": true,
    "inject_name_into_chunks": false
  },
  "documents_count": 0,
  "collection_description": "Filings from the SEC for financial analysis"
}
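The request above could be assembled and sent from Python roughly as follows. This is a sketch under stated assumptions: the function names are hypothetical, only fields listed in the request schema are used, and optional fields are simply omitted when unset.

```python
import json
import urllib.request

CREATE_URL = "https://management-api.x.ai/v1/collections"


def build_create_collection_body(collection_name, *, model_name=None,
                                 chunk_configuration=None, description=None,
                                 field_definitions=None, team_id=None):
    """Assemble the POST /v1/collections body, omitting unset optional fields."""
    if not collection_name:
        raise ValueError("collection_name is required")
    body = {"collection_name": collection_name}
    if team_id is not None:
        body["team_id"] = team_id
    if model_name is not None:
        body["index_configuration"] = {"model_name": model_name}
    if chunk_configuration is not None:
        body["chunk_configuration"] = chunk_configuration
    if field_definitions:
        body["field_definitions"] = field_definitions
    if description is not None:
        body["collection_description"] = description
    return body


def create_collection(api_key, body):
    """Send the request and return the decoded collection object."""
    request = urllib.request.Request(
        CREATE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Reproduces the request example shown above.
body = build_create_collection_body(
    "SEC Filings",
    model_name="grok-embedding-small",
    chunk_configuration={
        "tokens_configuration": {
            "max_chunk_size_tokens": 1024,
            "chunk_overlap_tokens": 200,
            "encoding_name": "o200k_base",
        },
        "strip_whitespace": True,
    },
    description="Filings from the SEC for financial analysis",
)
```

`create_collection(api_key, body)` would then return a dictionary shaped like the response example, including the new `collection_id`.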

GET /v1/collections

List all the collections a team has.

Query Parameters

team_id (string) — The ID of the team that owns the collections being listed. If not provided, the team ID will be derived from your request credentials.
limit (integer) — Maximum number of objects to return, up to 100 per request. Defaults to 100 if not provided.
order ("ORDERING_UNKNOWN" | "ORDERING_ASCENDING" | "ORDERING_DESCENDING") — The ordering to sort the returned collections. If not provided, the default order is Descending.
sort_by ("COLLECTIONS_SORT_BY_NAME" | "COLLECTIONS_SORT_BY_AGE") — The parameter that the collections will be sorted by. If not provided, the default is to sort by `collection_name`.
pagination_token (string) — Optional token to retrieve the next page. Provided by `pagination_token` in a previous `ListCollectionsResponse`.
filter (string) — Filter expression to narrow down results. Supports filtering on collection_id, collection_name (partial string matching), created_at, and documents_count. Examples:
- 'collection_id = "collection_123"'
- 'collection_name:"SEC" AND documents_count:>10'
- 'collection_name = "report"' (partial match)
- 'created_at:>2025-01-01T00:00:00Z'

Response Body

collections (array<object>) — List of collections.
- collection_id (string) — UUIDv4 that represents an ID of the collection.
- collection_name (string) — Name of the collection.
- created_at (string) — Timestamp for when the collection was created, e.g. "2025-09-16T18:36:09.790629Z".
- index_configuration (object)
  - model_name (string) — Embedding model used to embed the collection's documents.
- chunk_configuration (object)
  - chars_configuration (object)
    - max_chunk_size_chars (integer) — Max length per chunk.
    - chunk_overlap_chars (integer) — Overlap between chunks, both sides.
  - tokens_configuration (object)
    - max_chunk_size_tokens (integer) — Max length per chunk.
    - chunk_overlap_tokens (integer) — Overlap between chunks, both sides.
    - encoding_name (string) — Name of the encoding to use for the tokenizer.
  - ast_configuration (object) — Deprecated: use code_tokens_configuration or code_chars_configuration instead.
    - max_chunk_size_tokens (integer) — Max length per chunk.
    - encoding_name (string) — Name of the encoding to use for the tokenizer.
  - table_configuration (object)
    - max_chunk_size_tokens (integer) — Max length per chunk.
    - encoding_name (string) — Name of the encoding to use for the tokenizer.
  - markdown_tokens_configuration (object)
    - max_chunk_size_tokens (integer) — Max length per chunk.
    - chunk_overlap_tokens (integer) — Overlap between chunks, both sides.
    - encoding_name (string) — Name of the encoding to use for the tokenizer.
  - markdown_chars_configuration (object)
    - max_chunk_size_chars (integer) — Max length per chunk.
    - chunk_overlap_chars (integer) — Overlap between chunks, both sides.
  - code_tokens_configuration (object)
    - max_chunk_size_tokens (integer) — Max length per chunk.
    - chunk_overlap_tokens (integer) — Overlap between chunks, both sides.
    - encoding_name (string) — Name of the encoding to use for the tokenizer.
  - code_chars_configuration (object)
    - max_chunk_size_chars (integer) — Max length per chunk.
    - chunk_overlap_chars (integer) — Overlap between chunks, both sides.
  - bytes_configuration (object)
    - max_chunk_size_bytes (integer) — Max length per chunk in bytes.
    - chunk_overlap_bytes (integer) — Overlap between chunks in bytes.
  - strip_whitespace (boolean) — Remove leading/trailing whitespace.
  - inject_name_into_chunks (boolean) — Inject name into produced chunks.
- documents_count (integer) — How many files the collection contains.
- field_definitions (array<object>) — Field definitions for documents in this collection. Defines what fields documents can have and their constraints.
  - key (string, required) — The key/name of the field (e.g., "title", "author", "isbn").
  - required (boolean) — If true, this field must be provided for every document added to the collection. Documents missing required fields will be rejected at upload time.
  - inject_into_chunk (boolean) — If true, this field's value will be injected at the start of each chunk generated from documents (for contextual retrieval). Improves retrieval accuracy by providing context about the document.
  - unique (boolean) — If true, this field's value must be unique across all documents within this collection. Duplicate values will be rejected.
  - description (string) — Optional description of what this field represents.
- collection_description (string) — Optional description of the collection.
pagination_token (string) — Token to be sent in the next `ListCollectionsRequest`'s `pagination_token` for retrieving the next page.

**Response example:**

{
  "collections": [
    {
      "collection_id": "collection_80100614-300c-4609-959b-a138fa90f542",
      "collection_name": "SEC Filings",
      "created_at": "2025-09-16T18:36:09.790629Z",
      "index_configuration": {
        "model_name": "grok-embedding-small"
      },
      "chunk_configuration": {
        "tokens_configuration": {
          "max_chunk_size_tokens": 1024,
          "chunk_overlap_tokens": 200,
          "encoding_name": "o200k_base"
        },
        "strip_whitespace": true,
        "inject_name_into_chunks": false
      },
      "documents_count": 0,
      "collection_type": "text",
      "collection_description": "Filings from the SEC for financial analysis"
    }
  ]
}
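Listing more than one page means feeding each response's `pagination_token` back into the next request. The sketch below shows one way to do that in Python. It assumes that a missing or empty `pagination_token` marks the last page, which the reference does not state explicitly; `iter_collections` and `make_fetcher` are hypothetical names.

```python
import json
import urllib.parse
import urllib.request


def iter_collections(fetch_page, limit=100):
    """Yield every collection, following pagination_token between pages."""
    token = None
    while True:
        params = {"limit": limit}
        if token:
            params["pagination_token"] = token
        page = fetch_page(params)
        yield from page.get("collections", [])
        # Assumption: a missing or empty token means there are no more pages.
        token = page.get("pagination_token")
        if not token:
            return


def make_fetcher(api_key):
    """Return a fetch_page callable that performs the real HTTP GET."""
    def fetch_page(params):
        url = ("https://management-api.x.ai/v1/collections?"
               + urllib.parse.urlencode(params))
        request = urllib.request.Request(
            url, headers={"Authorization": f"Bearer {api_key}"}
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    return fetch_page
```

A caller would write `for collection in iter_collections(make_fetcher(api_key)): ...`. Passing the fetcher in as an argument keeps the paging logic testable without network access.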

GET /v1/collections/{collection_id}

Get a collection's metadata.

Path Parameters

collection_id (string, required) — The ID of the collection to request.

Query Parameters

team_id (string) — The ID of the team that owns the collection. If not provided, the team ID will be derived from your request credentials.

Response Body

collection_id (string) — UUIDv4 that represents an ID of the collection.
collection_name (string) — Name of the collection.
created_at (string) — Timestamp for when the collection was created, e.g. "2025-09-16T18:36:09.790629Z".
index_configuration (object)
- model_name (string) — Embedding model used to embed the collection's documents.
chunk_configuration (object)
- chars_configuration (object)
  - max_chunk_size_chars (integer) — Max length per chunk.
  - chunk_overlap_chars (integer) — Overlap between chunks, both sides.
- tokens_configuration (object)
  - max_chunk_size_tokens (integer) — Max length per chunk.
  - chunk_overlap_tokens (integer) — Overlap between chunks, both sides.
  - encoding_name (string) — Name of the encoding to use for the tokenizer.
- ast_configuration (object) — Deprecated: use code_tokens_configuration or code_chars_configuration instead.
  - max_chunk_size_tokens (integer) — Max length per chunk.
  - encoding_name (string) — Name of the encoding to use for the tokenizer.
- table_configuration (object)
  - max_chunk_size_tokens (integer) — Max length per chunk.
  - encoding_name (string) — Name of the encoding to use for the tokenizer.
- markdown_tokens_configuration (object)
  - max_chunk_size_tokens (integer) — Max length per chunk.
  - chunk_overlap_tokens (integer) — Overlap between chunks, both sides.
  - encoding_name (string) — Name of the encoding to use for the tokenizer.
- markdown_chars_configuration (object)
  - max_chunk_size_chars (integer) — Max length per chunk.
  - chunk_overlap_chars (integer) — Overlap between chunks, both sides.
- code_tokens_configuration (object)
  - max_chunk_size_tokens (integer) — Max length per chunk.
  - chunk_overlap_tokens (integer) — Overlap between chunks, both sides.
  - encoding_name (string) — Name of the encoding to use for the tokenizer.
- code_chars_configuration (object)
  - max_chunk_size_chars (integer) — Max length per chunk.
  - chunk_overlap_chars (integer) — Overlap between chunks, both sides.
- bytes_configuration (object)
  - max_chunk_size_bytes (integer) — Max length per chunk in bytes.
  - chunk_overlap_bytes (integer) — Overlap between chunks in bytes.
- strip_whitespace (boolean) — Remove leading/trailing whitespace.
- inject_name_into_chunks (boolean) — Inject name into produced chunks.
documents_count (integer) — How many files the collection contains.
field_definitions (array<object>) — Field definitions for documents in this collection. Defines what fields documents can have and their constraints.
- key (string, required) — The key/name of the field (e.g., "title", "author", "isbn").
- required (boolean) — If true, this field must be provided for every document added to the collection. Documents missing required fields will be rejected at upload time.
- inject_into_chunk (boolean) — If true, this field's value will be injected at the start of each chunk generated from documents (for contextual retrieval). Improves retrieval accuracy by providing context about the document.
- unique (boolean) — If true, this field's value must be unique across all documents within this collection. Duplicate values will be rejected.
- description (string) — Optional description of what this field represents.
collection_description (string) — Optional description of the collection.

**Response example:**

{
  "collection_id": "collection_80100614-300c-4609-959b-a138fa90f542",
  "collection_name": "SEC Filings",
  "created_at": "2025-09-16T18:36:09.790629Z",
  "index_configuration": {
    "model_name": "grok-embedding-small"
  },
  "chunk_configuration": {
    "tokens_configuration": {
      "max_chunk_size_tokens": 1024,
      "chunk_overlap_tokens": 200,
      "encoding_name": "o200k_base"
    },
    "strip_whitespace": true,
    "inject_name_into_chunks": false
  },
  "documents_count": 0,
  "collection_description": "Filings from the SEC for financial analysis"
}
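When consuming this response from Python, two small helpers are often useful: one to build the URL with the optional `team_id` query parameter, and one to turn `created_at` into a `datetime`. The sketch below assumes `created_at` always has the shape shown in the response example; both function names are hypothetical.

```python
from datetime import datetime, timezone
from urllib.parse import quote, urlencode


def collection_url(collection_id, team_id=None):
    """URL for GET /v1/collections/{collection_id}, with optional team_id."""
    url = ("https://management-api.x.ai/v1/collections/"
           + quote(collection_id, safe=""))
    if team_id is not None:
        url += "?" + urlencode({"team_id": team_id})
    return url


def parse_created_at(value):
    """Parse a created_at string such as the one in the response example."""
    # Swapping the trailing "Z" for an explicit offset keeps this working
    # on Python versions older than 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


created = parse_created_at("2025-09-16T18:36:09.790629Z")
# Convert to a Unix timestamp only if your own code needs one.
unix_seconds = created.astimezone(timezone.utc).timestamp()
```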

DELETE /v1/collections/{collection_id}

Delete a specific collection.

Path Parameters

collection_id (string, required) — The ID of the collection to delete.

Query Parameters

team_id (string) — The ID of the team that owns the collection. If not provided, the team ID will be derived from your request credentials.

**Response example:**

{}

PUT /v1/collections/{collection_id}

Update a collection's configuration.

Path Parameters

collection_id (string, required) — The ID of the collection to update.

Request Body

team_id (string) — The ID of the team that owns the collection. If not provided, the team ID will be derived from your request credentials.
collection_name (string) — Name of the collection.
chunk_configuration (object)
- chars_configuration (object)
  - max_chunk_size_chars (integer) — Max length per chunk.
  - chunk_overlap_chars (integer) — Overlap between chunks, both sides.
- tokens_configuration (object)
  - max_chunk_size_tokens (integer) — Max length per chunk.
  - chunk_overlap_tokens (integer) — Overlap between chunks, both sides.
  - encoding_name (string) — Name of the encoding to use for the tokenizer.
- ast_configuration (object) — Deprecated: use code_tokens_configuration or code_chars_configuration instead.
  - max_chunk_size_tokens (integer) — Max length per chunk.
  - encoding_name (string) — Name of the encoding to use for the tokenizer.
- table_configuration (object)
  - max_chunk_size_tokens (integer) — Max length per chunk.
  - encoding_name (string) — Name of the encoding to use for the tokenizer.
- markdown_tokens_configuration (object)
  - max_chunk_size_tokens (integer) — Max length per chunk.
  - chunk_overlap_tokens (integer) — Overlap between chunks, both sides.
  - encoding_name (string) — Name of the encoding to use for the tokenizer.
- markdown_chars_configuration (object)
  - max_chunk_size_chars (integer) — Max length per chunk.
  - chunk_overlap_chars (integer) — Overlap between chunks, both sides.
- code_tokens_configuration (object)
  - max_chunk_size_tokens (integer) — Max length per chunk.
  - chunk_overlap_tokens (integer) — Overlap between chunks, both sides.
  - encoding_name (string) — Name of the encoding to use for the tokenizer.
- code_chars_configuration (object)
  - max_chunk_size_chars (integer) — Max length per chunk.
  - chunk_overlap_chars (integer) — Overlap between chunks, both sides.
- bytes_configuration (object)
  - max_chunk_size_bytes (integer) — Max length per chunk in bytes.
  - chunk_overlap_bytes (integer) — Overlap between chunks in bytes.
- strip_whitespace (boolean) — Remove leading/trailing whitespace.
- inject_name_into_chunks (boolean) — Inject name into produced chunks.
field_definition_updates (array<object>) — Field definition updates to apply to this collection (ADD or DELETE).
- field_definition (object, required) — Definition of a field that can be attached to documents in a collection. Field definitions specify constraints and behaviors for document metadata within a collection.
  - key (string, required) — The key/name of the field (e.g., "title", "author", "isbn").
  - required (boolean) — If true, this field must be provided for every document added to the collection. Documents missing required fields will be rejected at upload time.
  - inject_into_chunk (boolean) — If true, this field's value will be injected at the start of each chunk generated from documents (for contextual retrieval). Improves retrieval accuracy by providing context about the document.
  - unique (boolean) — If true, this field's value must be unique across all documents within this collection. Duplicate values will be rejected.
  - description (string) — Optional description of what this field represents.
- operation ("FIELD_DEFINITION_ADD" | "FIELD_DEFINITION_DELETE") — Operation to perform on a collection's field definition.
  - FIELD_DEFINITION_ADD: Add a new field definition or update an existing one. If the field key already exists, the definition will be updated. Note: New fields with `required=true` are not allowed (existing documents would fail validation).
  - FIELD_DEFINITION_DELETE: Delete an existing field definition. CASCADE behavior: Also removes the field value from all documents in the collection.
collection_description (string) — Optional description of the collection.

Response Body

collection_id (string) — UUIDv4 that represents an ID of the collection.
collection_name (string) — Name of the collection.
created_at (string) — Timestamp for when the collection was created, e.g. "2025-09-16T18:36:09.790629Z".
index_configuration (object)
- model_name (string) — Embedding model used to embed the collection's documents.
chunk_configuration (object)
- chars_configuration (object)
  - max_chunk_size_chars (integer) — Max length per chunk.
  - chunk_overlap_chars (integer) — Overlap between chunks, both sides.
- tokens_configuration (object)
  - max_chunk_size_tokens (integer) — Max length per chunk.
  - chunk_overlap_tokens (integer) — Overlap between chunks, both sides.
  - encoding_name (string) — Name of the encoding to use for the tokenizer.
- ast_configuration (object) — Deprecated: use code_tokens_configuration or code_chars_configuration instead.
  - max_chunk_size_tokens (integer) — Max length per chunk.
  - encoding_name (string) — Name of the encoding to use for the tokenizer.
- table_configuration (object)
  - max_chunk_size_tokens (integer) — Max length per chunk.
  - encoding_name (string) — Name of the encoding to use for the tokenizer.
- markdown_tokens_configuration (object)
  - max_chunk_size_tokens (integer) — Max length per chunk.
  - chunk_overlap_tokens (integer) — Overlap between chunks, both sides.
  - encoding_name (string) — Name of the encoding to use for the tokenizer.
- markdown_chars_configuration (object)
  - max_chunk_size_chars (integer) — Max length per chunk.
  - chunk_overlap_chars (integer) — Overlap between chunks, both sides.
- code_tokens_configuration (object)
  - max_chunk_size_tokens (integer) — Max length per chunk.
  - chunk_overlap_tokens (integer) — Overlap between chunks, both sides.
  - encoding_name (string) — Name of the encoding to use for the tokenizer.
- code_chars_configuration (object)
  - max_chunk_size_chars (integer) — Max length per chunk.
  - chunk_overlap_chars (integer) — Overlap between chunks, both sides.
- bytes_configuration (object)
  - max_chunk_size_bytes (integer) — Max length per chunk in bytes.
  - chunk_overlap_bytes (integer) — Overlap between chunks in bytes.
- strip_whitespace (boolean) — Remove leading/trailing whitespace.
- inject_name_into_chunks (boolean) — Inject name into produced chunks.
documents_count (integer) — How many files the collection contains.
field_definitions (array<object>) — Field definitions for documents in this collection. Defines what fields documents can have and their constraints.
- key (string, required) — The key/name of the field (e.g., "title", "author", "isbn").
- required (boolean) — If true, this field must be provided for every document added to the collection. Documents missing required fields will be rejected at upload time.
- inject_into_chunk (boolean) — If true, this field's value will be injected at the start of each chunk generated from documents (for contextual retrieval). Improves retrieval accuracy by providing context about the document.
- unique (boolean) — If true, this field's value must be unique across all documents within this collection. Duplicate values will be rejected.
- description (string) — Optional description of what this field represents.
collection_description (string) — Optional description of the collection.

**Request example:**

{
  "collection_name": "SEC Filings (New)",
  "chunk_configuration": {
    "tokens_configuration": {
      "max_chunk_size_tokens": 1024,
      "chunk_overlap_tokens": 200,
      "encoding_name": "o200k_base"
    },
    "strip_whitespace": true,
    "inject_name_into_chunks": false
  },
  "collection_description": "Updated description of the collection"
}

**Response example:**

{
  "collection_id": "collection_80100614-300c-4609-959b-a138fa90f542",
  "collection_name": "SEC Filings (New)",
  "created_at": "2025-09-16T18:36:09.790629Z",
  "index_configuration": {
    "model_name": "grok-embedding-small"
  },
  "chunk_configuration": {
    "tokens_configuration": {
      "max_chunk_size_tokens": 1024,
      "chunk_overlap_tokens": 200,
      "encoding_name": "o200k_base"
    },
    "strip_whitespace": true,
    "inject_name_into_chunks": false
  },
  "documents_count": 0,
  "collection_description": "Updated description of the collection"
}
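The `field_definition_updates` array in the request schema is not covered by the example above, so here is a hedged Python sketch of how such updates might be built. The helpers `add_field` and `delete_field` are hypothetical. The client-side check mirrors the documented rule that new fields cannot be required, and relies on the caller supplying the keys that already exist.

```python
def add_field(key, *, existing_keys=(), required=False, unique=False,
              inject_into_chunk=False, description=None):
    """Build a FIELD_DEFINITION_ADD update for one field."""
    if required and key not in existing_keys:
        # Per the documented note, brand-new fields may not be required.
        raise ValueError(f"new field {key!r} cannot be required")
    definition = {
        "key": key,
        "required": required,
        "unique": unique,
        "inject_into_chunk": inject_into_chunk,
    }
    if description is not None:
        definition["description"] = description
    return {"field_definition": definition, "operation": "FIELD_DEFINITION_ADD"}


def delete_field(key):
    """Build a FIELD_DEFINITION_DELETE update; values on documents go too."""
    return {
        "field_definition": {"key": key},
        "operation": "FIELD_DEFINITION_DELETE",
    }


# Body for PUT /v1/collections/{collection_id}; the field keys are examples.
update_body = {
    "collection_description": "Updated description of the collection",
    "field_definition_updates": [
        add_field("type", description="SEC form type, e.g. 10-Q"),
        delete_field("isbn"),
    ],
}
```

Because deleting a field definition also removes its values from every document in the collection, it may be worth confirming the key with a GET on the collection before sending a delete.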

POST /v1/collections/{collection_id}/documents/{file_id}

Add a document to a collection.

Path Parameters

collection_id (string, required) — The ID of the collection this document will be added to.
file_id (string, required) — The ID of the document to use for this request.

Request Body

team_id (string) — The ID of the team the document belongs to. If not provided, the team ID will be derived from your request credentials.
fields (object) — User-defined fields to attach to this document in the collection.

**Request example:**

{
  "fields": {
    "type": "10-Q"
  }
}

**Response example:**

{}
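Since documents missing required fields are rejected, it can help to check `fields` against the collection's `field_definitions` before calling this endpoint. The following Python sketch does that and builds the POST request; both function names are hypothetical, and the check is a client-side convenience rather than a replacement for server validation.

```python
import json
import urllib.request
from urllib.parse import quote


def missing_required_fields(field_definitions, fields):
    """Return keys marked required in the collection but absent from fields."""
    return [
        definition["key"]
        for definition in field_definitions
        if definition.get("required") and definition["key"] not in fields
    ]


def add_document_request(api_key, collection_id, file_id, fields=None):
    """Build POST /v1/collections/{collection_id}/documents/{file_id}."""
    url = (
        "https://management-api.x.ai/v1/collections/"
        f"{quote(collection_id, safe='')}/documents/{quote(file_id, safe='')}"
    )
    body = {"fields": fields} if fields else {}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Field definitions as they might come back from GET /v1/collections/{id}.
definitions = [{"key": "type", "required": True}, {"key": "author"}]
problems = missing_required_fields(definitions, {"type": "10-Q"})
```

An empty `problems` list means every required key is present; the request can then be sent with `urllib.request.urlopen`.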

GET /v1/collections/{collection_id}/documents

List documents in a collection.

Path Parameters

collection_id (string, required) — The ID of the collection to list documents from.

Query Parameters

team_id (string) — The ID of the team owning the documents. If not provided, the team ID will be derived from your request credentials.
limit (integer) — Maximum number of objects to return, up to 100 per request. Defaults to 100 if not provided.
order ("ORDERING_UNKNOWN" | "ORDERING_ASCENDING" | "ORDERING_DESCENDING") — The ordering to sort the returned documents. If not provided, the default order is Descending.
sort_by ("DOCUMENTS_SORT_BY_NAME" | "DOCUMENTS_SORT_BY_SIZE" | "DOCUMENTS_SORT_BY_AGE") — The parameter that the documents will be sorted by. If not provided, the default is to sort by `name`.
pagination_token (string) — Optional token to retrieve the next page. Provided by `pagination_token` in a previous `ListDocumentsResponse`.
name (string) — The name of the documents to get. Deprecated: use the `filter` parameter instead with "name:value".
filter (string) — Filter expression to narrow down results. Supports filtering on file metadata (name, content_type, size_bytes, created_at) and document fields (status, fields.{key}). Examples:
- 'status:DOCUMENT_STATUS_PROCESSED'
- 'name:"quarterly" AND status:!DOCUMENT_STATUS_FAILED'
- 'fields.isbn:"978-1-234567-89-0"'
- 'size_bytes:>5000000 AND content_type:application/pdf'

Response Body

documents (array<object>) — List of documents.
- file_metadata (object) — Metadata of an uploaded file.
  - file_id (string) — The document ID.
  - name (string) — The name of the document.
  - size_bytes (string) — The size of the document, in bytes.
  - content_type (string) — MIME type.
  - created_at (string) — Timestamp for when the document was created, e.g. "2025-09-16T19:06:53.472088Z".
  - expires_at (string) — The Unix timestamp for when the document will expire.
  - hash (string)
  - upload_status (string)
  - upload_error_message (string) — Error message if upload failed.
  - processing_status (string) — Processing status of the file (pending, processing, complete, failed, skipped).
  - file_path (string) — Optional: hierarchical path for the file (e.g., "folder1/subfolder"). This is relative to the team root and does not include the filename.
- fields (object)
- status ("DOCUMENT_STATUS_UNKNOWN" | "DOCUMENT_STATUS_PROCESSING" | "DOCUMENT_STATUS_PROCESSED" | "DOCUMENT_STATUS_FAILED")
- error_message (string) — Any error that occurred while processing.
- last_indexed_at (string) — Timestamp of when this document was last indexed. Empty if it has not been indexed.
pagination_token (string) — Token to be sent in the next `ListDocumentsRequest`'s `pagination_token` for retrieving the next page.

**Response example:**

{
  "documents": [
    {
      "file_metadata": {
        "file_id": "file_94847856-a56f-4b1e-82dd-7fe0b3af43d9",
        "name": "tsla-20250630.txt",
        "size_bytes": "119237",
        "content_type": "text/plain",
        "created_at": "2025-09-16T19:06:53.472088Z",
        "expires_at": null,
        "hash": "a15b2225695f242af60e5d99a7455b0a2e371dac88283401ebc013dba1dfbc84"
      },
      "fields": {
        "type": "10-Q"
      },
      "status": "DOCUMENT_STATUS_PROCESSED",
      "error_message": ""
    }
  ]
}
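Filter expressions contain characters such as quotes, colons, and spaces, so they need URL encoding when passed as a query parameter. The Python sketch below builds the listing URL and also shows that `size_bytes` arrives as a string and must be converted before arithmetic. The function names are hypothetical.

```python
import urllib.parse


def list_documents_url(collection_id, *, filter_expr=None, limit=None,
                       order=None, sort_by=None, pagination_token=None):
    """Build the GET /v1/collections/{collection_id}/documents URL."""
    candidates = {
        "limit": limit,
        "order": order,
        "sort_by": sort_by,
        "pagination_token": pagination_token,
        "filter": filter_expr,
    }
    # Only send parameters the caller actually set.
    params = {key: value for key, value in candidates.items() if value is not None}
    url = (
        "https://management-api.x.ai/v1/collections/"
        + urllib.parse.quote(collection_id, safe="")
        + "/documents"
    )
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def total_size_bytes(documents):
    """Sum document sizes; size_bytes is a string in the response."""
    return sum(int(doc["file_metadata"]["size_bytes"]) for doc in documents)


url = list_documents_url(
    "collection_80100614-300c-4609-959b-a138fa90f542",
    filter_expr='name:"quarterly" AND status:!DOCUMENT_STATUS_FAILED',
    limit=50,
)
```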

GET /v1/collections/{collection_id}/documents/{file_id}

Retrieve document metadata in a collection.

Path Parameters

collection_id (string, required) — The ID of the collection this document belongs to.
file_id (string, required) — The ID of the document to use for this request.

Query Parameters

team_id (string) — The ID of the team the document belongs to. If not provided, the team ID will be derived from your request credentials.

Response Body

file_metadata (object) — Metadata of an uploaded file.
- file_id (string) — The document ID.
- name (string) — The name of the document.
- size_bytes (string) — The size of the document, in bytes.
- content_type (string) — MIME type.
- created_at (string) — Timestamp for when the document was created, e.g. "2025-09-16T19:06:53.472088Z".
- expires_at (string) — The Unix timestamp for when the document will expire.
- hash (string)
- upload_status (string)
- upload_error_message (string) — Error message if upload failed.
- processing_status (string) — Processing status of the file (pending, processing, complete, failed, skipped).
- file_path (string) — Optional: hierarchical path for the file (e.g., "folder1/subfolder"). This is relative to the team root and does not include the filename.
fields (object)
status ("DOCUMENT_STATUS_UNKNOWN" | "DOCUMENT_STATUS_PROCESSING" | "DOCUMENT_STATUS_PROCESSED" | "DOCUMENT_STATUS_FAILED")
error_message (string) — Any error that occurred while processing.
last_indexed_at (string) — Timestamp of when this document was last indexed. Empty if it has not been indexed.

**Response example:**

{
  "file_metadata": {
    "file_id": "file_94847856-a56f-4b1e-82dd-7fe0b3af43d9",
    "name": "tsla-20250630.txt",
    "size_bytes": "119237",
    "content_type": "text/plain",
    "created_at": "2025-09-16T19:06:53.472088Z",
    "expires_at": null,
    "hash": "a15b2225695f242af60e5d99a7455b0a2e371dac88283401ebc013dba1dfbc84"
  },
  "fields": {
    "type": "10-Q"
  },
  "status": "DOCUMENT_STATUS_PROCESSED",
  "error_message": ""
}
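The request described above can be sketched in Python with only the standard library. This is a minimal illustration, not an official client: the base URL and the Authorization header come from the top of this reference, while the function names build_get_document_request, fetch_document, and document_size_bytes are hypothetical helpers introduced here.

```python
import json
import urllib.parse
import urllib.request

# Base URL as stated at the top of this reference.
BASE_URL = "https://management-api.x.ai"


def build_get_document_request(collection_id, file_id, api_key, team_id=None):
    """Build (without sending) the GET request for one document's metadata."""
    path = "/v1/collections/{}/documents/{}".format(
        urllib.parse.quote(collection_id, safe=""),
        urllib.parse.quote(file_id, safe=""),
    )
    # team_id is optional; omit the query string entirely when it is not given.
    query = urllib.parse.urlencode({"team_id": team_id}) if team_id else ""
    url = BASE_URL + path + ("?" + query if query else "")
    return urllib.request.Request(
        url,
        method="GET",
        headers={"Authorization": "Bearer " + api_key},
    )


def fetch_document(collection_id, file_id, api_key, team_id=None):
    """Send the request and decode the JSON response body into a dict."""
    request = build_get_document_request(collection_id, file_id, api_key, team_id)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def document_size_bytes(document):
    """size_bytes is documented as a string, so convert it before doing arithmetic."""
    return int(document["file_metadata"]["size_bytes"])
```

Calling fetch_document("<collection_id>", "<file_id>", "<your xAI Management API key>") would return a dictionary shaped like the response example, from which document_size_bytes reads the size as an integer.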

PATCH /v1/collections/{collection_id}/documents/{file_id}

Regenerate indices for the given document.

Path Parameters

collection_id (string, required) — The ID of the collection that includes the document.
file_id (string, required) — The ID of the file to update.

Query Parameters

team_id (string) — The ID of the team that owns the document. If not provided, the team ID will be derived from your request credentials.

**Response example:**

{}
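A possible way to trigger re-indexing and then wait for it to finish is sketched below in Python. It assumes, without confirmation from this reference, that re-indexing completes asynchronously and that the document's status field eventually reaches DOCUMENT_STATUS_PROCESSED or DOCUMENT_STATUS_FAILED. The names build_reindex_request and wait_for_terminal_status are hypothetical, and fetch_document is a caller-supplied callable that returns the document dictionary from the GET endpoint.

```python
import time
import urllib.parse
import urllib.request

# Base URL as stated at the top of this reference.
BASE_URL = "https://management-api.x.ai"

# Status values taken from the documented enum; treating these two as
# "finished" is an assumption of this sketch.
TERMINAL_STATUSES = {"DOCUMENT_STATUS_PROCESSED", "DOCUMENT_STATUS_FAILED"}


def build_reindex_request(collection_id, file_id, api_key, team_id=None):
    """Build (without sending) the PATCH request that regenerates indices."""
    path = "/v1/collections/{}/documents/{}".format(
        urllib.parse.quote(collection_id, safe=""),
        urllib.parse.quote(file_id, safe=""),
    )
    query = urllib.parse.urlencode({"team_id": team_id}) if team_id else ""
    url = BASE_URL + path + ("?" + query if query else "")
    return urllib.request.Request(
        url,
        method="PATCH",
        headers={"Authorization": "Bearer " + api_key},
    )


def wait_for_terminal_status(fetch_document, max_attempts=10, delay_seconds=2.0):
    """Call fetch_document() repeatedly until its status is terminal."""
    for attempt in range(max_attempts):
        document = fetch_document()
        if document.get("status") in TERMINAL_STATUSES:
            return document
        # Sleep between attempts, but not after the final one.
        if attempt < max_attempts - 1:
            time.sleep(delay_seconds)
    raise TimeoutError("document did not reach a terminal status")
```

Passing the fetch step in as a callable keeps the polling loop independent of any particular HTTP client, so it can be exercised with a stub before it is pointed at the real endpoint.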

DELETE /v1/collections/{collection_id}/documents/{file_id}

Remove a document from a collection.

Path Parameters

collection_id (string, required) — The ID of the collection the document will be removed from.
file_id (string, required) — The file ID of the document to use for this request.

Query Parameters

team_id (string) — The ID of the team that owns the collection. If not provided, the team ID will be derived from your request credentials.

**Response example:**

{}
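For completeness, the removal request could be built as follows. This is only a sketch using the Python standard library; build_delete_document_request is a hypothetical helper name, and the base URL and Authorization header are those given at the top of this reference.

```python
import urllib.parse
import urllib.request

# Base URL as stated at the top of this reference.
BASE_URL = "https://management-api.x.ai"


def build_delete_document_request(collection_id, file_id, api_key, team_id=None):
    """Build (without sending) the DELETE request that removes a document."""
    path = "/v1/collections/{}/documents/{}".format(
        urllib.parse.quote(collection_id, safe=""),
        urllib.parse.quote(file_id, safe=""),
    )
    query = urllib.parse.urlencode({"team_id": team_id}) if team_id else ""
    url = BASE_URL + path + ("?" + query if query else "")
    return urllib.request.Request(
        url,
        method="DELETE",
        headers={"Authorization": "Bearer " + api_key},
    )
```

The request can be sent with urllib.request.urlopen. Because the response example is an empty object, a successful call carries no document data to inspect.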

GET /v1/collections/{collection_id}/documents:batchGet

Get the metadata of multiple documents in a single batch request.

Path Parameters

collection_id (string, required) — The ID of the collection that includes the documents.

Query Parameters

team_id (string) — The ID of the team that owns the documents. If not provided, the team ID will be derived from your request credentials.
file_ids (array<string>, required) — The IDs of the files whose document metadata should be retrieved.

Response Body

documents (array<object>) — Metadata of the requested documents.
- file_metadata (object) — Metadata of an uploaded file.
  - file_id (string) — The document ID.
  - name (string) — The name of the document.
  - size_bytes (string) — The size of the document, in bytes.
  - content_type (string) — MIME type.
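  - created_at (string) — Timestamp for when the document was created. The response example shows an RFC 3339 string such as "2025-09-16T19:06:53.472088Z".
  - expires_at (string) — Timestamp for when the document will expire. The response example shows null for this field.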
  - hash (string)
  - upload_status (string)
  - upload_error_message (string) — Error message if upload failed.
  - processing_status (string) — Processing status of the file (pending, processing, complete, failed, skipped).
  - file_path (string) — Optional: hierarchical path for the file (e.g., "folder1/subfolder"). This is relative to the team root and does not include the filename.
- fields (object)
- status ("DOCUMENT_STATUS_UNKNOWN" | "DOCUMENT_STATUS_PROCESSING" | "DOCUMENT_STATUS_PROCESSED" | "DOCUMENT_STATUS_FAILED")
- error_message (string) — Any error that occurred while processing.
- last_indexed_at (string) — Timestamp of when this document was last indexed. Empty if the document has not been indexed yet.

**Response example:**

{
  "documents": [
    {
      "file_metadata": {
        "file_id": "file_94847856-a56f-4b1e-82dd-7fe0b3af43d9",
        "name": "tsla-20250630.txt",
        "size_bytes": "119237",
        "content_type": "text/plain",
        "created_at": "2025-09-16T19:06:53.472088Z",
        "expires_at": null,
        "hash": "a15b2225695f242af60e5d99a7455b0a2e371dac88283401ebc013dba1dfbc84"
      },
      "fields": {},
      "status": "DOCUMENT_STATUS_PROCESSED",
      "error_message": ""
    }
  ]
}
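The batch request above might be assembled as in the following Python sketch. One detail is an assumption rather than something this reference states: the file_ids array is encoded here as a repeated query parameter (file_ids=a&file_ids=b), so the encoding should be verified against the live API. The helper names build_batch_get_request and index_by_file_id are hypothetical.

```python
import urllib.parse
import urllib.request

# Base URL as stated at the top of this reference.
BASE_URL = "https://management-api.x.ai"


def build_batch_get_request(collection_id, file_ids, api_key, team_id=None):
    """Build (without sending) the GET request for documents:batchGet."""
    # Assumption: each file ID is sent as its own "file_ids" query parameter.
    params = [("file_ids", file_id) for file_id in file_ids]
    if team_id:
        params.append(("team_id", team_id))
    url = "{}/v1/collections/{}/documents:batchGet?{}".format(
        BASE_URL,
        urllib.parse.quote(collection_id, safe=""),
        urllib.parse.urlencode(params),
    )
    return urllib.request.Request(
        url,
        method="GET",
        headers={"Authorization": "Bearer " + api_key},
    )


def index_by_file_id(response_body):
    """Map each returned document to its file_id for direct lookup."""
    return {
        document["file_metadata"]["file_id"]: document
        for document in response_body.get("documents", [])
    }
```

Indexing the decoded response by file_id makes it straightforward to check which of the requested IDs came back and what status each one has.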

rest-api-reference/collections/collection.md 2026-06-11 10:57 UTC to 2026-06-14 22:02 UTC
