
rest-api-reference/collections/collection.md 2026-06-11 10:57 UTC to 2026-06-14 22:02 UTC


Collections API

Collection Management

Collection management shares its base URL with the Management API: https://management-api.x.ai/. Authenticate every request with an xAI Management API key, sent in the header Authorization: Bearer <your xAI Management API key>.

[!NOTE]

For more details on provisioning an xAI Management API key and using the Management API, see Using Management API.
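As a minimal sketch of the authentication scheme described above, the following Python snippet builds (but does not send) a request that carries the Bearer header. The helper name `build_request`, the placeholder key, and the `Content-Type: application/json` header are assumptions made for illustration; only the base URL and the `Authorization` header format come from this page.

```python
import json
import urllib.request

BASE_URL = "https://management-api.x.ai"  # base URL stated in this section


def build_request(method, path, api_key, body=None):
    """Build (but do not send) a request with the Bearer header attached."""
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if body is not None:
        # Assumption: JSON bodies are sent with a JSON content type.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


# Placeholder key for illustration only. A real call would pass the
# returned object to urllib.request.urlopen().
request = build_request("GET", "/v1/collections", "YOUR_MANAGEMENT_API_KEY")
print(request.get_method(), request.full_url)
```

Keeping request construction separate from sending makes it easy to inspect the URL and headers before any network call is made.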


POST /v1/collections

Create a collection.

Request Body

  • team_id (string) — The ID of the team that will own this new collection. If not provided, the team ID will be derived from your request credentials.

  • collection_name (string, required) — Name to use for the new collection.

  • index_configuration (object)

    • model_name (string) — Embedding model used to embed the collection's documents.
  • chunk_configuration (object)

    • chars_configuration (object)

      • max_chunk_size_chars (integer) — Max length per chunk.

      • chunk_overlap_chars (integer) — Overlap between chunks, both sides.

    • tokens_configuration (object)

      • max_chunk_size_tokens (integer) — Max length per chunk.

      • chunk_overlap_tokens (integer) — Overlap between chunks, both sides.

      • encoding_name (string) — Name of the encoding to use for the tokenizer.

    • ast_configuration (object) — Deprecated: Use CodeTokensConfiguration or CodeCharsConfiguration instead.

      • max_chunk_size_tokens (integer) — Max length per chunk.

      • encoding_name (string) — Name of the encoding to use for the tokenizer.

    • table_configuration (object)

      • max_chunk_size_tokens (integer) — Max length per chunk.

      • encoding_name (string) — Name of the encoding to use for the tokenizer.

    • markdown_tokens_configuration (object)

      • max_chunk_size_tokens (integer) — Max length per chunk.

      • chunk_overlap_tokens (integer) — Overlap between chunks, both sides.

      • encoding_name (string) — Name of the encoding to use for the tokenizer.

    • markdown_chars_configuration (object)

      • max_chunk_size_chars (integer) — Max length per chunk.

      • chunk_overlap_chars (integer) — Overlap between chunks, both sides.

    • code_tokens_configuration (object)

      • max_chunk_size_tokens (integer) — Max length per chunk.

      • chunk_overlap_tokens (integer) — Overlap between chunks, both sides.

      • encoding_name (string) — Name of the encoding to use for the tokenizer.

    • code_chars_configuration (object)

      • max_chunk_size_chars (integer) — Max length per chunk.

      • chunk_overlap_chars (integer) — Overlap between chunks, both sides.

    • bytes_configuration (object)

      • max_chunk_size_bytes (integer) — Max length per chunk in bytes.

      • chunk_overlap_bytes (integer) — Overlap between chunks in bytes.

    • strip_whitespace (boolean) — Remove leading/trailing whitespace.

    • inject_name_into_chunks (boolean) — Inject name into produced chunks.

  • metric_space ("HNSW_METRIC_UNKNOWN" | "HNSW_METRIC_COSINE" | "HNSW_METRIC_EUCLIDEAN" | "HNSW_METRIC_INNER_PRODUCT") — Distance space for the HNSW index.

  • version (integer) — Internal only. Version number of the Collections API used under the hood. Because this setting is internal, its values are intentionally left unspecified (no enum).

  • field_definitions (array<object>)

    • key (string, required) — The key/name of the field (e.g., "title", "author", "isbn").

    • required (boolean) — If true, this field must be provided for every document added to the collection. Documents missing required fields will be rejected at upload time.

    • inject_into_chunk (boolean) — If true, this field's value will be injected at the start of each chunk generated from documents (for contextual retrieval). Improves retrieval accuracy by providing context about the document.

    • unique (boolean) — If true, this field's value must be unique across all documents within this collection. Duplicate values will be rejected.

    • description (string) — Optional description of what this field represents.

  • collection_description (string) — Human-friendly description displayed to users and agents.

Response Body

  • collection_id (string) — UUIDv4 that represents an ID of the collection.

  • collection_name (string) — Name of the collection.

  • created_at (string) — Timestamp for when the collection was created. The response examples show an RFC 3339 string such as "2025-09-16T18:36:09.790629Z".

  • index_configuration (object)

    • model_name (string) — Embedding model used to embed the collection's documents.
  • chunk_configuration (object)

    • chars_configuration (object)

      • max_chunk_size_chars (integer) — Max length per chunk.

      • chunk_overlap_chars (integer) — Overlap between chunks, both sides.

    • tokens_configuration (object)

      • max_chunk_size_tokens (integer) — Max length per chunk.

      • chunk_overlap_tokens (integer) — Overlap between chunks, both sides.

      • encoding_name (string) — Name of the encoding to use for the tokenizer.

    • ast_configuration (object) — Deprecated: Use CodeTokensConfiguration or CodeCharsConfiguration instead.

      • max_chunk_size_tokens (integer) — Max length per chunk.

      • encoding_name (string) — Name of the encoding to use for the tokenizer.

    • table_configuration (object)

      • max_chunk_size_tokens (integer) — Max length per chunk.

      • encoding_name (string) — Name of the encoding to use for the tokenizer.

    • markdown_tokens_configuration (object)

      • max_chunk_size_tokens (integer) — Max length per chunk.

      • chunk_overlap_tokens (integer) — Overlap between chunks, both sides.

      • encoding_name (string) — Name of the encoding to use for the tokenizer.

    • markdown_chars_configuration (object)

      • max_chunk_size_chars (integer) — Max length per chunk.

      • chunk_overlap_chars (integer) — Overlap between chunks, both sides.

    • code_tokens_configuration (object)

      • max_chunk_size_tokens (integer) — Max length per chunk.

      • chunk_overlap_tokens (integer) — Overlap between chunks, both sides.

      • encoding_name (string) — Name of the encoding to use for the tokenizer.

    • code_chars_configuration (object)

      • max_chunk_size_chars (integer) — Max length per chunk.

      • chunk_overlap_chars (integer) — Overlap between chunks, both sides.

    • bytes_configuration (object)

      • max_chunk_size_bytes (integer) — Max length per chunk in bytes.

      • chunk_overlap_bytes (integer) — Overlap between chunks in bytes.

    • strip_whitespace (boolean) — Remove leading/trailing whitespace.

    • inject_name_into_chunks (boolean) — Inject name into produced chunks.

  • documents_count (integer) — How many files the collection contains.

  • field_definitions (array<object>) — Field definitions for documents in this collection. Defines what fields documents can have and their constraints.

    • key (string, required) — The key/name of the field (e.g., "title", "author", "isbn").

    • required (boolean) — If true, this field must be provided for every document added to the collection. Documents missing required fields will be rejected at upload time.

    • inject_into_chunk (boolean) — If true, this field's value will be injected at the start of each chunk generated from documents (for contextual retrieval). Improves retrieval accuracy by providing context about the document.

    • unique (boolean) — If true, this field's value must be unique across all documents within this collection. Duplicate values will be rejected.

    • description (string) — Optional description of what this field represents.

  • collection_description (string) — Optional description of the collection.

**Request example:**

{
  "collection_name": "SEC Filings",
  "index_configuration": {
    "model_name": "grok-embedding-small"
  },
  "chunk_configuration": {
    "tokens_configuration": {
      "max_chunk_size_tokens": 1024,
      "chunk_overlap_tokens": 200,
      "encoding_name": "o200k_base"
    },
    "strip_whitespace": true
  },
  "collection_description": "Filings from the SEC for financial analysis"
}

**Response example:**

{
  "collection_id": "collection_80100614-300c-4609-959b-a138fa90f542",
  "collection_name": "SEC Filings",
  "created_at": "2025-09-16T18:36:09.790629Z",
  "index_configuration": {
    "model_name": "grok-embedding-small"
  },
  "chunk_configuration": {
    "tokens_configuration": {
      "max_chunk_size_tokens": 1024,
      "chunk_overlap_tokens": 200,
      "encoding_name": "o200k_base"
    },
    "strip_whitespace": true,
    "inject_name_into_chunks": false
  },
  "documents_count": 0,
  "collection_description": "Filings from the SEC for financial analysis"
}
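The request example above can be assembled programmatically. The sketch below is one possible way to do it in Python; the function name `create_collection_body` and its keyword arguments are hypothetical, and it covers only the fields used in the example (the token-based chunk configuration), not the full schema.

```python
import json


def create_collection_body(name, model_name=None, max_chunk_size_tokens=None,
                           chunk_overlap_tokens=None, encoding_name=None,
                           description=None):
    """Assemble a POST /v1/collections body, omitting unset optional fields."""
    body = {"collection_name": name}  # the only field marked as required
    if model_name is not None:
        body["index_configuration"] = {"model_name": model_name}
    tokens = {
        "max_chunk_size_tokens": max_chunk_size_tokens,
        "chunk_overlap_tokens": chunk_overlap_tokens,
        "encoding_name": encoding_name,
    }
    # Drop any token settings the caller did not supply.
    tokens = {key: value for key, value in tokens.items() if value is not None}
    if tokens:
        body["chunk_configuration"] = {"tokens_configuration": tokens}
    if description is not None:
        body["collection_description"] = description
    return body


body = create_collection_body(
    "SEC Filings",
    model_name="grok-embedding-small",
    max_chunk_size_tokens=1024,
    chunk_overlap_tokens=200,
    encoding_name="o200k_base",
    description="Filings from the SEC for financial analysis",
)
print(json.dumps(body, indent=2))
```

Omitting unset fields rather than sending nulls keeps the body close to the documented example; whether the service treats explicit nulls differently is not stated here.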

GET /v1/collections

List all the collections a team has.

Query Parameters

  • team_id (string) — The ID of the team that owns the collections being listed. If not provided, the team ID will be derived from your request credentials.

  • limit (integer) — Maximum number of objects to return. At most 100 items per request; defaults to 100 if not provided.

  • order ("ORDERING_UNKNOWN" | "ORDERING_ASCENDING" | "ORDERING_DESCENDING") — The ordering to sort the returned collections. If not provided, the default order is Descending.

  • sort_by ("COLLECTIONS_SORT_BY_NAME" | "COLLECTIONS_SORT_BY_AGE") — The parameter that the collections will be sorted by. If not provided, the default is to sort by `collection_name`.

  • pagination_token (string) — Optional token to retrieve the next page. Provided by `pagination_token` in a previous `ListCollectionsResponse`.

  • filter (string) — Filter expression to narrow down results. Supports filtering on: collection_id, collection_name (partial string matching), created_at, documents_count. Examples:

    - 'collection_id = "collection_123"'
    - 'collection_name:"SEC" AND documents_count:>10'
    - 'collection_name = "report"' (partial match)
    - 'created_at:>2025-01-01T00:00:00Z'

Response Body

  • collections (array<object>) — List of collections.

    • collection_id (string) — UUIDv4 that represents an ID of the collection.

    • collection_name (string) — Name of the collection.

    • created_at (string) — Timestamp for when the collection was created. The response examples show an RFC 3339 string such as "2025-09-16T18:36:09.790629Z".

    • index_configuration (object)

      • model_name (string) — Embedding model used to embed the collection's documents.
    • chunk_configuration (object)

      • chars_configuration (object)

        • max_chunk_size_chars (integer) — Max length per chunk.

        • chunk_overlap_chars (integer) — Overlap between chunks, both sides.

      • tokens_configuration (object)

        • max_chunk_size_tokens (integer) — Max length per chunk.

        • chunk_overlap_tokens (integer) — Overlap between chunks, both sides.

        • encoding_name (string) — Name of the encoding to use for the tokenizer.

      • ast_configuration (object) — Deprecated: Use CodeTokensConfiguration or CodeCharsConfiguration instead.

        • max_chunk_size_tokens (integer) — Max length per chunk.

        • encoding_name (string) — Name of the encoding to use for the tokenizer.

      • table_configuration (object)

        • max_chunk_size_tokens (integer) — Max length per chunk.

        • encoding_name (string) — Name of the encoding to use for the tokenizer.

      • markdown_tokens_configuration (object)

        • max_chunk_size_tokens (integer) — Max length per chunk.

        • chunk_overlap_tokens (integer) — Overlap between chunks, both sides.

        • encoding_name (string) — Name of the encoding to use for the tokenizer.

      • markdown_chars_configuration (object)

        • max_chunk_size_chars (integer) — Max length per chunk.

        • chunk_overlap_chars (integer) — Overlap between chunks, both sides.

      • code_tokens_configuration (object)

        • max_chunk_size_tokens (integer) — Max length per chunk.

        • chunk_overlap_tokens (integer) — Overlap between chunks, both sides.

        • encoding_name (string) — Name of the encoding to use for the tokenizer.

      • code_chars_configuration (object)

        • max_chunk_size_chars (integer) — Max length per chunk.

        • chunk_overlap_chars (integer) — Overlap between chunks, both sides.

      • bytes_configuration (object)

        • max_chunk_size_bytes (integer) — Max length per chunk in bytes.

        • chunk_overlap_bytes (integer) — Overlap between chunks in bytes.

      • strip_whitespace (boolean) — Remove leading/trailing whitespace.

      • inject_name_into_chunks (boolean) — Inject name into produced chunks.

    • documents_count (integer) — How many files the collection contains.

    • field_definitions (array<object>) — Field definitions for documents in this collection. Defines what fields documents can have and their constraints.

      • key (string, required) — The key/name of the field (e.g., "title", "author", "isbn").

      • required (boolean) — If true, this field must be provided for every document added to the collection. Documents missing required fields will be rejected at upload time.

      • inject_into_chunk (boolean) — If true, this field's value will be injected at the start of each chunk generated from documents (for contextual retrieval). Improves retrieval accuracy by providing context about the document.

      • unique (boolean) — If true, this field's value must be unique across all documents within this collection. Duplicate values will be rejected.

      • description (string) — Optional description of what this field represents.

    • collection_description (string) — Optional description of the collection.

  • pagination_token (string) — Token to be sent in the next `ListCollectionsRequest`'s `pagination_token` for retrieving the next page.

**Response example:**

{
  "collections": [
    {
      "collection_id": "collection_80100614-300c-4609-959b-a138fa90f542",
      "collection_name": "SEC Filings",
      "created_at": "2025-09-16T18:36:09.790629Z",
      "index_configuration": {
        "model_name": "grok-embedding-small"
      },
      "chunk_configuration": {
        "tokens_configuration": {
          "max_chunk_size_tokens": 1024,
          "chunk_overlap_tokens": 200,
          "encoding_name": "o200k_base"
        },
        "strip_whitespace": true,
        "inject_name_into_chunks": false
      },
      "documents_count": 0,
      "collection_type": "text",
      "collection_description": "Filings from the SEC for financial analysis"
    }
  ]
}
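The `pagination_token` round trip described above can be sketched as a small generator. This is an illustration under stated assumptions: `fetch_page` is a hypothetical callable standing in for the actual HTTP call, and the loop assumes that a missing or empty `pagination_token` in a response marks the last page, which this page does not state explicitly.

```python
from typing import Callable, Dict, Iterator


def iter_collections(fetch_page: Callable[[Dict[str, str]], dict],
                     limit: int = 100) -> Iterator[dict]:
    """Yield every collection, following pagination_token between pages."""
    token = None
    while True:
        params = {"limit": str(limit)}
        if token:
            params["pagination_token"] = token
        page = fetch_page(params)
        yield from page.get("collections", [])
        # Assumption: no token in the response means there are no more pages.
        token = page.get("pagination_token")
        if not token:
            break


# Stand-in for the real GET /v1/collections call, keyed by the token sent.
fake_pages = {
    None: {"collections": [{"collection_name": "A"}], "pagination_token": "t1"},
    "t1": {"collections": [{"collection_name": "B"}]},
}


def fake_fetch(params):
    return fake_pages[params.get("pagination_token")]


names = [c["collection_name"] for c in iter_collections(fake_fetch)]
print(names)
```

Passing the fetch function in as a parameter keeps the paging logic testable without network access.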

GET /v1/collections/{collection_id}

Get a collection's metadata.

Path Parameters

  • collection_id (string, required) — The ID of the collection to request.

Query Parameters

  • team_id (string) — The ID of the team that owns the collection. If not provided, the team ID will be derived from your request credentials.

Response Body

  • collection_id (string) — UUIDv4 that represents an ID of the collection.

  • collection_name (string) — Name of the collection.

  • created_at (string) — Timestamp for when the collection was created. The response examples show an RFC 3339 string such as "2025-09-16T18:36:09.790629Z".

  • index_configuration (object)

    • model_name (string) — Embedding model used to embed the collection's documents.
  • chunk_configuration (object)

    • chars_configuration (object)

      • max_chunk_size_chars (integer) — Max length per chunk.

      • chunk_overlap_chars (integer) — Overlap between chunks, both sides.

    • tokens_configuration (object)

      • max_chunk_size_tokens (integer) — Max length per chunk.

      • chunk_overlap_tokens (integer) — Overlap between chunks, both sides.

      • encoding_name (string) — Name of the encoding to use for the tokenizer.

    • ast_configuration (object) — Deprecated: Use CodeTokensConfiguration or CodeCharsConfiguration instead.

      • max_chunk_size_tokens (integer) — Max length per chunk.

      • encoding_name (string) — Name of the encoding to use for the tokenizer.

    • table_configuration (object)

      • max_chunk_size_tokens (integer) — Max length per chunk.

      • encoding_name (string) — Name of the encoding to use for the tokenizer.

    • markdown_tokens_configuration (object)

      • max_chunk_size_tokens (integer) — Max length per chunk.

      • chunk_overlap_tokens (integer) — Overlap between chunks, both sides.

      • encoding_name (string) — Name of the encoding to use for the tokenizer.

    • markdown_chars_configuration (object)

      • max_chunk_size_chars (integer) — Max length per chunk.

      • chunk_overlap_chars (integer) — Overlap between chunks, both sides.

    • code_tokens_configuration (object)

      • max_chunk_size_tokens (integer) — Max length per chunk.

      • chunk_overlap_tokens (integer) — Overlap between chunks, both sides.

      • encoding_name (string) — Name of the encoding to use for the tokenizer.

    • code_chars_configuration (object)

      • max_chunk_size_chars (integer) — Max length per chunk.

      • chunk_overlap_chars (integer) — Overlap between chunks, both sides.

    • bytes_configuration (object)

      • max_chunk_size_bytes (integer) — Max length per chunk in bytes.

      • chunk_overlap_bytes (integer) — Overlap between chunks in bytes.

    • strip_whitespace (boolean) — Remove leading/trailing whitespace.

    • inject_name_into_chunks (boolean) — Inject name into produced chunks.

  • documents_count (integer) — How many files the collection contains.

  • field_definitions (array<object>) — Field definitions for documents in this collection. Defines what fields documents can have and their constraints.

    • key (string, required) — The key/name of the field (e.g., "title", "author", "isbn").

    • required (boolean) — If true, this field must be provided for every document added to the collection. Documents missing required fields will be rejected at upload time.

    • inject_into_chunk (boolean) — If true, this field's value will be injected at the start of each chunk generated from documents (for contextual retrieval). Improves retrieval accuracy by providing context about the document.

    • unique (boolean) — If true, this field's value must be unique across all documents within this collection. Duplicate values will be rejected.

    • description (string) — Optional description of what this field represents.

  • collection_description (string) — Optional description of the collection.

**Response example:**

{
  "collection_id": "collection_80100614-300c-4609-959b-a138fa90f542",
  "collection_name": "SEC Filings",
  "created_at": "2025-09-16T18:36:09.790629Z",
  "index_configuration": {
    "model_name": "grok-embedding-small"
  },
  "chunk_configuration": {
    "tokens_configuration": {
      "max_chunk_size_tokens": 1024,
      "chunk_overlap_tokens": 200,
      "encoding_name": "o200k_base"
    },
    "strip_whitespace": true,
    "inject_name_into_chunks": false
  },
  "documents_count": 0,
  "collection_description": "Filings from the SEC for financial analysis"
}
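The `created_at` value in the response example is a timestamp string with a trailing `Z`. As a rough sketch, it can be turned into a timezone-aware `datetime` with the Python standard library; the helper name `parse_timestamp` is hypothetical, and the JSON below is a shortened copy of the example rather than a live response.

```python
import json
from datetime import datetime, timedelta


def parse_timestamp(value):
    """Parse a timestamp such as "2025-09-16T18:36:09.790629Z"."""
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11,
    # so normalise it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


raw = (
    '{"collection_name": "SEC Filings", '
    '"created_at": "2025-09-16T18:36:09.790629Z", '
    '"documents_count": 0}'
)
collection = json.loads(raw)
created = parse_timestamp(collection["created_at"])
print(created.isoformat())
```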

DELETE /v1/collections/{collection_id}

Delete a specific collection.

Path Parameters

  • collection_id (string, required) — The ID of the collection to delete.

Query Parameters

  • team_id (string) — The ID of the team that owns the collection. If not provided, the team ID will be derived from your request credentials.

**Response example:**

{}
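A delete call takes the collection ID in the path and, optionally, `team_id` as a query parameter. The sketch below only constructs the request; the function name `delete_collection_request`, the placeholder key, and the `team_123` value are hypothetical.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://management-api.x.ai"


def delete_collection_request(collection_id, api_key, team_id=None):
    """Build (but do not send) a DELETE /v1/collections/{collection_id} request."""
    # Percent-encode the ID so it is always a single path segment.
    path = "/v1/collections/" + urllib.parse.quote(collection_id, safe="")
    query = "?" + urllib.parse.urlencode({"team_id": team_id}) if team_id else ""
    return urllib.request.Request(
        BASE_URL + path + query,
        headers={"Authorization": f"Bearer {api_key}"},
        method="DELETE",
    )


request = delete_collection_request(
    "collection_80100614-300c-4609-959b-a138fa90f542",
    "YOUR_MANAGEMENT_API_KEY",
    team_id="team_123",  # hypothetical team ID
)
print(request.get_method(), request.full_url)
```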

PUT /v1/collections/{collection_id}

Update a collection's configuration.

Path Parameters

  • collection_id (string, required) — The ID of the collection to update.

Request Body

  • team_id (string) — The ID of the team that owns the collection. If not provided, the team ID will be derived from your request credentials.

  • collection_name (string) — Name of the collection.

  • chunk_configuration (object)

    • chars_configuration (object)

      • max_chunk_size_chars (integer) — Max length per chunk.

      • chunk_overlap_chars (integer) — Overlap between chunks, both sides.

    • tokens_configuration (object)

      • max_chunk_size_tokens (integer) — Max length per chunk.

      • chunk_overlap_tokens (integer) — Overlap between chunks, both sides.

      • encoding_name (string) — Name of the encoding to use for the tokenizer.

    • ast_configuration (object) — Deprecated: Use CodeTokensConfiguration or CodeCharsConfiguration instead.

      • max_chunk_size_tokens (integer) — Max length per chunk.

      • encoding_name (string) — Name of the encoding to use for the tokenizer.

    • table_configuration (object)

      • max_chunk_size_tokens (integer) — Max length per chunk.

      • encoding_name (string) — Name of the encoding to use for the tokenizer.

    • markdown_tokens_configuration (object)

      • max_chunk_size_tokens (integer) — Max length per chunk.

      • chunk_overlap_tokens (integer) — Overlap between chunks, both sides.

      • encoding_name (string) — Name of the encoding to use for the tokenizer.

    • markdown_chars_configuration (object)

      • max_chunk_size_chars (integer) — Max length per chunk.

      • chunk_overlap_chars (integer) — Overlap between chunks, both sides.

    • code_tokens_configuration (object)

      • max_chunk_size_tokens (integer) — Max length per chunk.

      • chunk_overlap_tokens (integer) — Overlap between chunks, both sides.

      • encoding_name (string) — Name of the encoding to use for the tokenizer.

    • code_chars_configuration (object)

      • max_chunk_size_chars (integer) — Max length per chunk.

      • chunk_overlap_chars (integer) — Overlap between chunks, both sides.

    • bytes_configuration (object)

      • max_chunk_size_bytes (integer) — Max length per chunk in bytes.

      • chunk_overlap_bytes (integer) — Overlap between chunks in bytes.

    • strip_whitespace (boolean) — Remove leading/trailing whitespace.

    • inject_name_into_chunks (boolean) — Inject name into produced chunks.

  • field_definition_updates (array<object>) — Field definition updates to apply to this collection (ADD or DELETE).

    • field_definition (object, required) — Definition of a field that can be attached to documents in a collection. Field definitions specify constraints and behaviors for document metadata within a collection.

      • key (string, required) — The key/name of the field (e.g., "title", "author", "isbn").

      • required (boolean) — If true, this field must be provided for every document added to the collection. Documents missing required fields will be rejected at upload time.

      • inject_into_chunk (boolean) — If true, this field's value will be injected at the start of each chunk generated from documents (for contextual retrieval). Improves retrieval accuracy by providing context about the document.

      • unique (boolean) — If true, this field's value must be unique across all documents within this collection. Duplicate values will be rejected.

      • description (string) — Optional description of what this field represents.

    • operation ("FIELD_DEFINITION_ADD" | "FIELD_DEFINITION_DELETE") — Operation to perform on a collection's field definition.

      - FIELD_DEFINITION_ADD: Add a new field definition or update an existing one. If the field key already exists, the definition will be updated. Note: New fields with `required=true` are not allowed (existing documents would fail validation).
      - FIELD_DEFINITION_DELETE: Delete an existing field definition. CASCADE behavior: Also removes the field value from all documents in the collection.

  • collection_description (string) — Optional description of the collection.

Response Body

  • collection_id (string) — UUIDv4 that represents an ID of the collection.

  • collection_name (string) — Name of the collection.

  • created_at (string) — Timestamp for when the collection was created. The response examples show an RFC 3339 string such as "2025-09-16T18:36:09.790629Z".

  • index_configuration (object)

    • model_name (string) — Embedding model used to embed the collection's documents.
  • chunk_configuration (object)

    • chars_configuration (object)

      • max_chunk_size_chars (integer) — Max length per chunk.

      • chunk_overlap_chars (integer) — Overlap between chunks, both sides.

    • tokens_configuration (object)

      • max_chunk_size_tokens (integer) — Max length per chunk.

      • chunk_overlap_tokens (integer) — Overlap between chunks, both sides.

      • encoding_name (string) — Name of the encoding to use for the tokenizer.

    • ast_configuration (object) — Deprecated: Use CodeTokensConfiguration or CodeCharsConfiguration instead.

      • max_chunk_size_tokens (integer) — Max length per chunk.

      • encoding_name (string) — Name of the encoding to use for the tokenizer.

    • table_configuration (object)

      • max_chunk_size_tokens (integer) — Max length per chunk.

      • encoding_name (string) — Name of the encoding to use for the tokenizer.

    • markdown_tokens_configuration (object)

      • max_chunk_size_tokens (integer) — Max length per chunk.

      • chunk_overlap_tokens (integer) — Overlap between chunks, both sides.

      • encoding_name (string) — Name of the encoding to use for the tokenizer.

    • markdown_chars_configuration (object)

      • max_chunk_size_chars (integer) — Max length per chunk.

      • chunk_overlap_chars (integer) — Overlap between chunks, both sides.

    • code_tokens_configuration (object)

      • max_chunk_size_tokens (integer) — Max length per chunk.

      • chunk_overlap_tokens (integer) — Overlap between chunks, both sides.

      • encoding_name (string) — Name of the encoding to use for the tokenizer.

    • code_chars_configuration (object)

      • max_chunk_size_chars (integer) — Max length per chunk.

      • chunk_overlap_chars (integer) — Overlap between chunks, both sides.

    • bytes_configuration (object)

      • max_chunk_size_bytes (integer) — Max length per chunk in bytes.

      • chunk_overlap_bytes (integer) — Overlap between chunks in bytes.

    • strip_whitespace (boolean) — Remove leading/trailing whitespace.

    • inject_name_into_chunks (boolean) — Inject name into produced chunks.

  • documents_count (integer) — How many files the collection contains.

  • field_definitions (array<object>) — Field definitions for documents in this collection. Defines what fields documents can have and their constraints.

    • key (string, required) — The key/name of the field (e.g., "title", "author", "isbn").

    • required (boolean) — If true, this field must be provided for every document added to the collection. Documents missing required fields will be rejected at upload time.

    • inject_into_chunk (boolean) — If true, this field's value will be injected at the start of each chunk generated from documents (for contextual retrieval). Improves retrieval accuracy by providing context about the document.

    • unique (boolean) — If true, this field's value must be unique across all documents within this collection. Duplicate values will be rejected.

    • description (string) — Optional description of what this field represents.

  • collection_description (string) — Optional description of the collection.

**Request example:**

{
  "collection_name": "SEC Filings (New)",
  "chunk_configuration": {
    "tokens_configuration": {
      "max_chunk_size_tokens": 1024,
      "chunk_overlap_tokens": 200,
      "encoding_name": "o200k_base"
    },
    "strip_whitespace": true,
    "inject_name_into_chunks": false
  },
  "collection_description": "Updated description of the collection"
}

**Response example:**

{
  "collection_id": "collection_80100614-300c-4609-959b-a138fa90f542",
  "collection_name": "SEC Filings",
  "created_at": "2025-09-16T18:36:09.790629Z",
  "index_configuration": {
    "model_name": "grok-embedding-small"
  },
  "chunk_configuration": {
    "tokens_configuration": {
      "max_chunk_size_tokens": 1024,
      "chunk_overlap_tokens": 200,
      "encoding_name": "o200k_base"
    },
    "strip_whitespace": true,
    "inject_name_into_chunks": false
  },
  "documents_count": 0,
  "collection_description": "Filings from the SEC for financial analysis"
}
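The request example above does not show `field_definition_updates`. As a hedged sketch, the snippet below builds a PUT body that adds one field definition and deletes another. The helper names `add_field` and `delete_field` and the field keys are hypothetical, and sending only `key` inside `field_definition` for a delete is an assumption. The ADD helper deliberately leaves out `required`, since new fields with `required=true` are documented as not allowed.

```python
import json


def add_field(key, description=None, inject_into_chunk=False, unique=False):
    """Build a FIELD_DEFINITION_ADD entry (adds or updates a definition)."""
    definition = {
        "key": key,
        "inject_into_chunk": inject_into_chunk,
        "unique": unique,
    }
    if description is not None:
        definition["description"] = description
    return {"field_definition": definition, "operation": "FIELD_DEFINITION_ADD"}


def delete_field(key):
    """Build a FIELD_DEFINITION_DELETE entry.

    Assumption: the key alone is enough to identify the definition.
    """
    return {"field_definition": {"key": key},
            "operation": "FIELD_DEFINITION_DELETE"}


body = {
    "field_definition_updates": [
        add_field("isbn", description="ISBN of the source book", unique=True),
        delete_field("author"),
    ]
}
print(json.dumps(body, indent=2))
```

Because a delete cascades to every document in the collection, it may be worth reviewing the generated body before sending it.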

POST /v1/collections/{collection_id}/documents/{file_id}

Add a document to a collection.

Path Parameters

  • collection_id (string, required) — The ID of the collection this document will be added to.

  • file_id (string, required) — The ID of the document to use for this request.

Request Body

  • team_id (string) — The ID of the team the document belongs to. If not provided, the team ID will be derived from your request credentials.

  • fields (object) — User-defined fields to add to this document in this new collection.

**Request example:**

{
  "fields": {
    "type": "10-Q"
  }
}

**Response example:**

{}
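Since documents missing required fields are rejected at upload time, a client may want to check `fields` against the collection's `field_definitions` before calling this endpoint. The sketch below shows one way to do that; the helper names and the sample field definitions are hypothetical, and the check mirrors only the `required` constraint, not `unique`.

```python
def missing_required_fields(field_definitions, fields):
    """Return the keys of required field definitions absent from fields."""
    return [
        definition["key"]
        for definition in field_definitions
        if definition.get("required") and definition["key"] not in fields
    ]


def add_document_path(collection_id, file_id):
    """Path for POST /v1/collections/{collection_id}/documents/{file_id}."""
    return f"/v1/collections/{collection_id}/documents/{file_id}"


# Hypothetical definitions, shaped like the field_definitions response field.
field_definitions = [
    {"key": "type", "required": True},
    {"key": "author"},
]

ok = missing_required_fields(field_definitions, {"type": "10-Q"})
bad = missing_required_fields(field_definitions, {"author": "Jane Doe"})
print(ok, bad)
print(add_document_path(
    "collection_80100614-300c-4609-959b-a138fa90f542",
    "file_94847856-a56f-4b1e-82dd-7fe0b3af43d9",
))
```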

GET /v1/collections/{collection_id}/documents

List documents in a collection.

Path Parameters

  • collection_id (string, required) — The ID of the collection to list documents from.

Query Parameters

  • team_id (string) — The ID of the team owning the documents. If not provided, the team ID will be derived from your request credentials.

  • limit (integer) — Maximum number of objects to return. At most 100 items per request; defaults to 100 if not provided.

  • order ("ORDERING_UNKNOWN" | "ORDERING_ASCENDING" | "ORDERING_DESCENDING") — The ordering to sort the returned documents. If not provided, the default order is Descending.

  • sort_by ("DOCUMENTS_SORT_BY_NAME" | "DOCUMENTS_SORT_BY_SIZE" | "DOCUMENTS_SORT_BY_AGE") — The parameter that the documents will be sorted by. If not provided, the default is to sort by `name`.

  • pagination_token (string) — Optional token to retrieve the next page. Provided by `pagination_token` in a previous `ListDocumentsResponse`.

  • name (string) — The name of the documents to get. Deprecated: use the filter parameter with "name:value" instead.

  • filter (string) — Filter expression to narrow down results. Supports filtering on file metadata (name, content_type, size_bytes, created_at) and document fields (status, fields.{key}). Examples:

    - 'status:DOCUMENT_STATUS_PROCESSED'
    - 'name:"quarterly" AND status:!DOCUMENT_STATUS_FAILED'
    - 'fields.isbn:"978-1-234567-89-0"'
    - 'size_bytes:>5000000 AND content_type:application/pdf'

Response Body

  • documents (array<object>) — List of documents.

    • file_metadata (object) — Metadata of an uploaded file.

      • file_id (string) — The document ID.

      • name (string) — The name of the document.

      • size_bytes (string) — The size of the document, in bytes.

      • content_type (string) — MIME type.

      • created_at (string) — Timestamp for when the document was created. The response example shows an RFC 3339 string such as "2025-09-16T19:06:53.472088Z".

      • expires_at (string) — The Unix timestamp for when the document will expire.

      • hash (string)

      • upload_status (string)

      • upload_error_message (string) — Error message if upload failed.

      • processing_status (string) — Processing status of the file (pending, processing, complete, failed, skipped).

      • file_path (string) — Optional: hierarchical path for the file (e.g., "folder1/subfolder"). This is relative to the team root and does not include the filename.

    • fields (object)

    • status ("DOCUMENT_STATUS_UNKNOWN" | "DOCUMENT_STATUS_PROCESSING" | "DOCUMENT_STATUS_PROCESSED" | "DOCUMENT_STATUS_FAILED")

    • error_message (string) — Any error that occurred while processing.

    • last_indexed_at (string) — Timestamp of when this document was last indexed. Empty if it hasn't been.

  • pagination_token (string) — Token to be sent in the next `ListDocumentsRequest`'s `pagination_token` for retrieving the next page.

**Response example:**

{
  "documents": [
    {
      "file_metadata": {
        "file_id": "file_94847856-a56f-4b1e-82dd-7fe0b3af43d9",
        "name": "tsla-20250630.txt",
        "size_bytes": "119237",
        "content_type": "text/plain",
        "created_at": "2025-09-16T19:06:53.472088Z",
        "expires_at": null,
        "hash": "a15b2225695f242af60e5d99a7455b0a2e371dac88283401ebc013dba1dfbc84"
      },
      "fields": {
        "type": "10-Q"
      },
      "status": "DOCUMENT_STATUS_PROCESSED",
      "error_message": ""
    }
  ]
}
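
Retrieving every document means following `pagination_token` from one response into the next request. The sketch below shows that loop in Python. It takes a caller-supplied `fetch_page` function (hypothetical) that performs the actual HTTP call, and it assumes that a missing or empty `pagination_token` marks the last page, which this reference does not state explicitly.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_documents(fetch_page: Callable[[Optional[str]], Dict]) -> Iterator[Dict]:
    """Yield documents from every page, following pagination_token between requests."""
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        yield from page.get("documents", [])
        token = page.get("pagination_token")
        # Assumption: no token (or an empty one) means there are no more pages.
        if not token:
            return


# Stand-in for real HTTP calls: maps the token sent to the page returned.
_FAKE_PAGES = {
    None: {
        "documents": [{"status": "DOCUMENT_STATUS_PROCESSED"}],
        "pagination_token": "page-2",
    },
    "page-2": {"documents": [{"status": "DOCUMENT_STATUS_FAILED"}]},
}

statuses = [doc["status"] for doc in iter_documents(lambda token: _FAKE_PAGES[token])]
print(statuses)
```

Keeping the HTTP call behind `fetch_page` lets the same loop work with any client library.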

GET /v1/collections/{collection_id}/documents/{file_id}

Retrieve document metadata in a collection.

Path Parameters

  • collection_id (string, required) — The ID of the collection this document belongs to.

  • file_id (string, required) — The ID of the document to use for this request.

Query Parameters

  • team_id (string) — The ID of the team the document belongs to. If not provided, the team ID will be derived from your request credentials.

Response Body

  • file_metadata (object) — Metadata of an uploaded file.

    • file_id (string) — The document ID.

    • name (string) — The name of the document.

    • size_bytes (string) — The size of the document, in bytes.

    • content_type (string) — MIME type.

    • created_at (string) — The timestamp for when the document was created (an RFC 3339 string in the response example).

    • expires_at (string) — The timestamp for when the document will expire (`null` in the response example).

    • hash (string)

    • upload_status (string)

    • upload_error_message (string) — Error message if upload failed.

    • processing_status (string) — Processing status of the file (pending, processing, complete, failed, skipped).

    • file_path (string) — Optional: hierarchical path for the file (e.g., "folder1/subfolder"). This is relative to the team root and does not include the filename.

  • fields (object)

  • status ("DOCUMENT_STATUS_UNKNOWN" | "DOCUMENT_STATUS_PROCESSING" | "DOCUMENT_STATUS_PROCESSED" | "DOCUMENT_STATUS_FAILED")

  • error_message (string) — Any error that occurred while processing.

  • last_indexed_at (string) — Timestamp of when this document was last indexed. Empty if the document has not been indexed yet.

**Response example:**

{
  "file_metadata": {
    "file_id": "file_94847856-a56f-4b1e-82dd-7fe0b3af43d9",
    "name": "tsla-20250630.txt",
    "size_bytes": "119237",
    "content_type": "text/plain",
    "created_at": "2025-09-16T19:06:53.472088Z",
    "expires_at": null,
    "hash": "a15b2225695f242af60e5d99a7455b0a2e371dac88283401ebc013dba1dfbc84"
  },
  "fields": {
    "type": "10-Q"
  },
  "status": "DOCUMENT_STATUS_PROCESSED",
  "error_message": ""
}
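
In this response, `size_bytes` is a string and `expires_at` may be `null`, so clients usually convert these values before using them. The following sketch parses the response example above into a typed object. The `DocumentSummary` class and `parse_document` function are hypothetical names, and the timestamp handling assumes the RFC 3339 format shown in the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class DocumentSummary:
    file_id: str
    name: str
    size_bytes: int
    created_at: Optional[datetime]
    expires_at: Optional[datetime]
    status: str


def _parse_timestamp(value: Optional[str]) -> Optional[datetime]:
    """Parse a timestamp like "2025-09-16T19:06:53.472088Z"; None or "" gives None."""
    if not value:
        return None
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_document(payload: dict) -> DocumentSummary:
    """Convert a document response into typed values."""
    meta = payload["file_metadata"]
    return DocumentSummary(
        file_id=meta["file_id"],
        name=meta["name"],
        size_bytes=int(meta["size_bytes"]),  # sent as a string in the response
        created_at=_parse_timestamp(meta.get("created_at")),
        expires_at=_parse_timestamp(meta.get("expires_at")),
        status=payload["status"],
    )


document = parse_document({
    "file_metadata": {
        "file_id": "file_94847856-a56f-4b1e-82dd-7fe0b3af43d9",
        "name": "tsla-20250630.txt",
        "size_bytes": "119237",
        "content_type": "text/plain",
        "created_at": "2025-09-16T19:06:53.472088Z",
        "expires_at": None,
    },
    "fields": {"type": "10-Q"},
    "status": "DOCUMENT_STATUS_PROCESSED",
    "error_message": "",
})
print(document.size_bytes, document.created_at, document.expires_at)
```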

PATCH /v1/collections/{collection_id}/documents/{file_id}

Regenerate indices for the given document.

Path Parameters

  • collection_id (string, required) — The ID of the collection that includes the document.

  • file_id (string, required) — The ID of the file to update.

Query Parameters

  • team_id (string) — The ID of the team that owns the document. If not provided, the team ID will be derived from your request credentials.

**Response example:**

{}
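
The empty response body does not say when re-indexing finishes, so one option is to poll the document's `status` afterwards. The sketch below shows such a loop in Python. `get_document` is a hypothetical callable that returns the parsed body of `GET /v1/collections/{collection_id}/documents/{file_id}`. Whether `status` switches back to `DOCUMENT_STATUS_PROCESSING` immediately after the PATCH is an assumption that this reference does not confirm.

```python
import time
from typing import Callable, Dict

TERMINAL_STATUSES = {"DOCUMENT_STATUS_PROCESSED", "DOCUMENT_STATUS_FAILED"}


def wait_until_indexed(
    get_document: Callable[[], Dict],
    max_attempts: int = 10,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll until the document reaches a terminal status or attempts run out."""
    for attempt in range(max_attempts):
        document = get_document()
        if document.get("status") in TERMINAL_STATUSES:
            return document
        # Do not sleep after the final attempt.
        if attempt < max_attempts - 1:
            sleep(delay_seconds)
    raise TimeoutError(f"document not in a terminal status after {max_attempts} attempts")


# Stand-in for real GET calls: first still processing, then processed.
_responses = iter([
    {"status": "DOCUMENT_STATUS_PROCESSING"},
    {"status": "DOCUMENT_STATUS_PROCESSED", "error_message": ""},
])
result = wait_until_indexed(lambda: next(_responses), sleep=lambda _seconds: None)
print(result["status"])
```

A caller would typically check `error_message` when the returned status is `DOCUMENT_STATUS_FAILED`.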

DELETE /v1/collections/{collection_id}/documents/{file_id}

Remove a document from a collection.

Path Parameters

  • collection_id (string, required) — The ID of the collection the document will be removed from.

  • file_id (string, required) — The file ID of the document to use for this request.

Query Parameters

  • team_id (string) — The ID of the team that owns the collection. If not provided, the team ID will be derived from your request credentials.

**Response example:**

{}
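
As an illustration, the request for this endpoint can be assembled with the Python standard library. The sketch below only builds the request and leaves sending it to the caller. The function name `build_delete_document_request`, the placeholder API key, and the collection ID `collection_123` are hypothetical. The base URL and `Authorization` header follow the Collection Management introduction of this reference.

```python
import urllib.request
from typing import Optional
from urllib.parse import quote, urlencode

BASE_URL = "https://management-api.x.ai"


def build_delete_document_request(
    api_key: str,
    collection_id: str,
    file_id: str,
    team_id: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) the DELETE request for one document."""
    # Percent-encode the IDs so unexpected characters cannot alter the path.
    path = (
        f"/v1/collections/{quote(collection_id, safe='')}"
        f"/documents/{quote(file_id, safe='')}"
    )
    # team_id is optional; omit the query string entirely when it is not given.
    query = "?" + urlencode({"team_id": team_id}) if team_id else ""
    return urllib.request.Request(
        BASE_URL + path + query,
        headers={"Authorization": f"Bearer {api_key}"},
        method="DELETE",
    )


request = build_delete_document_request(
    "YOUR_MANAGEMENT_API_KEY",  # placeholder, not a real key
    "collection_123",  # hypothetical collection ID
    "file_94847856-a56f-4b1e-82dd-7fe0b3af43d9",
)
print(request.get_method(), request.full_url)
```

Passing the result to `urllib.request.urlopen` would send it.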

GET /v1/collections/{collection_id}/documents:batchGet

Get the metadata of multiple documents in a single batch request.

Path Parameters

  • collection_id (string, required) — The ID of the collection that includes the documents.

Query Parameters

  • team_id (string) — The ID of the team that owns the documents. If not provided, the team ID will be derived from your request credentials.

  • file_ids (array<string>, required) — The IDs of the files whose document metadata should be retrieved.

Response Body

  • documents (array<object>) — Metadata of the requested documents.

    • file_metadata (object) — Metadata of an uploaded file.

      • file_id (string) — The document ID.

      • name (string) — The name of the document.

      • size_bytes (string) — The size of the document, in bytes.

      • content_type (string) — MIME type.

      • created_at (string) — The timestamp for when the document was created (an RFC 3339 string in the response example).

      • expires_at (string) — The timestamp for when the document will expire (`null` in the response example).

      • hash (string)

      • upload_status (string)

      • upload_error_message (string) — Error message if upload failed.

      • processing_status (string) — Processing status of the file (pending, processing, complete, failed, skipped).

      • file_path (string) — Optional: hierarchical path for the file (e.g., "folder1/subfolder"). This is relative to the team root and does not include the filename.

    • fields (object)

    • status ("DOCUMENT_STATUS_UNKNOWN" | "DOCUMENT_STATUS_PROCESSING" | "DOCUMENT_STATUS_PROCESSED" | "DOCUMENT_STATUS_FAILED")

    • error_message (string) — Any error that occurred while processing.

    • last_indexed_at (string) — Timestamp of when this document was last indexed. Empty if the document has not been indexed yet.

**Response example:**

{
  "documents": [
    {
      "file_metadata": {
        "file_id": "file_94847856-a56f-4b1e-82dd-7fe0b3af43d9",
        "name": "tsla-20250630.txt",
        "size_bytes": "119237",
        "content_type": "text/plain",
        "created_at": "2025-09-16T19:06:53.472088Z",
        "expires_at": null,
        "hash": "a15b2225695f242af60e5d99a7455b0a2e371dac88283401ebc013dba1dfbc84"
      },
      "fields": {},
      "status": "DOCUMENT_STATUS_PROCESSED",
      "error_message": ""
    }
  ]
}
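
Because `file_ids` is an array sent in the query string, the request URL has to carry several values. The sketch below builds such a URL in Python. It assumes the array is encoded as a repeated `file_ids` parameter, which this reference does not confirm. The function name `build_batch_get_url` and the IDs used are hypothetical.

```python
from typing import Iterable, Optional
from urllib.parse import parse_qs, quote, urlencode, urlsplit

BASE_URL = "https://management-api.x.ai"


def build_batch_get_url(
    collection_id: str,
    file_ids: Iterable[str],
    team_id: Optional[str] = None,
) -> str:
    """Build the batchGet URL, repeating file_ids once per requested document."""
    ids = list(file_ids)
    if not ids:
        raise ValueError("file_ids is required and must not be empty")
    # Assumption: arrays are sent as repeated query parameters.
    params = [("file_ids", file_id) for file_id in ids]
    if team_id:
        params.append(("team_id", team_id))
    path = f"/v1/collections/{quote(collection_id, safe='')}/documents:batchGet"
    return f"{BASE_URL}{path}?{urlencode(params)}"


url = build_batch_get_url("collection_123", ["file_a", "file_b"])
print(url)

# Parsing the query string recovers both IDs in order.
requested_ids = parse_qs(urlsplit(url).query)["file_ids"]
print(requested_ids)
```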