ragie-cli

A command line interface for importing various data formats into Ragie.

Installation

macOS using Homebrew (recommended)

# Add the Ragie tap
brew tap ragieai/tap

# Install ragie-cli
brew install ragie

Linux using apt

# Add repository
curl -1sLf \
  'https://dl.cloudsmith.io/public/ragieai/ragie-repo/setup.deb.sh' \
  | sudo -E bash

# Install ragie-cli
sudo apt install ragie

Manual Installation

  1. Make sure you have Go 1.16 or later installed
  2. Clone this repository
  3. Run go install in the repository root

Configuration

Set your Ragie API key as an environment variable:

export RAGIE_API_KEY=your_api_key_here

For the describe command, you'll also need to set your OpenAI API key:

export OPENAI_API_KEY=your_openai_api_key_here
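
A wrapper script can check for these variables before invoking the CLI. The following is a minimal sketch: the helper name `missing_api_keys` is hypothetical and not part of ragie-cli, and it assumes, as stated above, that only the describe command needs `OPENAI_API_KEY`.

```python
import os


def missing_api_keys(command, env=None):
    """Return the names of required environment variables that are unset or empty.

    `command` is the ragie subcommand about to be run (e.g. "import", "describe").
    """
    env = os.environ if env is None else env
    required = ["RAGIE_API_KEY"]
    if command == "describe":
        # Assumption from the README: describe also calls OpenAI.
        required.append("OPENAI_API_KEY")
    return [name for name in required if not env.get(name)]
```

An empty list means the environment is ready; otherwise the list names the variables still to export.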

Usage

Import YouTube Data

ragie import youtube path/to/youtube.json [--dry-run] [--delay 2.0] [--partition your-partition]

Import WordPress Data

ragie import wordpress path/to/wordpress.xml [--dry-run] [--delay 2.0] [--partition your-partition]

Import ReadmeIO Data

ragie import readmeio path/to/readme.zip [--dry-run] [--delay 2.0] [--partition your-partition]

Import Files from Directory

ragie import files path/to/directory [--dry-run] [--delay 2.0] [--partition your-partition]

The files importer recursively scans the specified directory and imports every non-empty file. Each file is imported as a document with the following metadata:

  • source_type: "files"
  • path: The relative path from the import directory
  • extension: The file extension
  • size: The file size in bytes
  • mod_time: The file's last modification time
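
To preview which files an import would pick up, the scan described above can be approximated in a few lines. This is an illustrative sketch, not the importer's actual code: the function name `describe_files` is hypothetical, and the exact formats of `extension` (shown here with a leading dot) and `mod_time` (shown here as an ISO 8601 UTC string) are assumptions.

```python
import os
from datetime import datetime, timezone


def describe_files(root):
    """Yield (full_path, metadata) for every non-empty file under `root`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            full_path = os.path.join(dirpath, name)
            info = os.stat(full_path)
            if info.st_size == 0:
                continue  # empty files are not imported
            modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
            yield full_path, {
                "source_type": "files",
                "path": os.path.relpath(full_path, root),  # relative to the import directory
                "extension": os.path.splitext(name)[1],    # assumed format, e.g. ".txt"
                "size": info.st_size,
                "mod_time": modified.isoformat(),          # assumed format
            }
```

Running the real command with `--dry-run` remains the authoritative way to see what would be imported.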

Import Files from ZIP Archive

ragie import zip path/to/archive.zip [--dry-run] [--delay 2.0] [--partition your-partition]

The zip importer processes every file in the ZIP archive without extracting the archive first. Each file is imported as a document with the following metadata:

  • source_type: "zip"
  • path: The path within the ZIP archive
  • extension: The file extension
  • size: The uncompressed file size in bytes
  • compressed_size: The compressed file size in bytes
  • mod_time: The file's last modification time
  • zip_source: The name of the source ZIP file
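
The same fields can be read straight from an archive's central directory, which is why no extraction is needed. The sketch below uses Python's standard `zipfile` module to illustrate the idea; it is not the importer's implementation. The name `describe_zip_entries` is hypothetical, the `extension` and `mod_time` formats are assumptions, and whether the real importer skips empty entries is not stated in this document.

```python
import os
import zipfile
from datetime import datetime


def describe_zip_entries(zip_file, zip_name):
    """Return a metadata dict for every file entry in the archive, without extracting it.

    `zip_file` is a path or a binary file-like object; `zip_name` is recorded as zip_source.
    """
    entries = []
    with zipfile.ZipFile(zip_file) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no content
            entries.append({
                "source_type": "zip",
                "path": info.filename,                               # path within the archive
                "extension": os.path.splitext(info.filename)[1],     # assumed format, e.g. ".md"
                "size": info.file_size,                              # uncompressed bytes
                "compressed_size": info.compress_size,               # compressed bytes
                "mod_time": datetime(*info.date_time).isoformat(),   # assumed format
                "zip_source": zip_name,
            })
    return entries
```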

Clear All Documents

ragie clear [--dry-run] [--partition your-partition]

Generate Tool Description

ragie describe [--partition your-partition] [--max-samples 10] [--shell-escape]

The describe command generates a description for a Ragie retrieval tool by analyzing document summaries from the specified partition. It uses OpenAI to create a coherent description of the knowledge base contents.

Flags:

  • --partition: Specify the partition to analyze (optional)
  • --max-samples: Maximum number of documents to use for description generation (default: 10)
  • --shell-escape: Output the description in a JSON- and shell-safe format (escapes quotes)

Retrieve Documents

ragie retrieve "your search query" [--top-k 8] [--filter '{"source_type":"files"}'] [--partition your-partition] [--rerank] [--max-chunks-per-document 5] [--recency-bias]

The retrieve command performs a semantic search over your documents in Ragie and returns the most relevant results as JSON. This is useful for testing your knowledge base or integrating with other tools.

Flags:

  • --top-k: Maximum number of results to return (default: 8)
  • --filter: JSON filter to apply to the search (e.g., '{"source_type":"files"}' to only search files)
  • --partition: Specify the partition to search in (optional)
  • --rerank: Rerank chunks for semantic relevance after the initial cosine-similarity search
  • --max-chunks-per-document: Maximum number of chunks to retrieve per document
  • --recency-bias: Enable recency bias to favor more recent documents

Examples:

# Basic search for documents about "API documentation"
ragie retrieve "API documentation"

# Search with custom limit and filter
ragie retrieve "configuration" --top-k 5 --filter '{"source_type":"files"}'

# Search in a specific partition with reranking
ragie retrieve "user guide" --partition production --rerank

# Search with recency bias and chunk limits
ragie retrieve "latest updates" --recency-bias --max-chunks-per-document 3
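
When calling retrieve from a script, building the argument list programmatically avoids shell-quoting problems with the JSON filter. The sketch below assembles an argument vector from the flags documented above; the function name `build_retrieve_args` and its parameter names are hypothetical, and it assumes the `ragie` binary is on your PATH.

```python
import json


def build_retrieve_args(query, top_k=None, metadata_filter=None, partition=None,
                        rerank=False, max_chunks_per_document=None, recency_bias=False):
    """Build an argv list for `ragie retrieve` using only the documented flags."""
    args = ["ragie", "retrieve", query]
    if top_k is not None:
        args += ["--top-k", str(top_k)]
    if metadata_filter is not None:
        # Serialize the filter dict to compact JSON, e.g. {"source_type":"files"}
        args += ["--filter", json.dumps(metadata_filter, separators=(",", ":"))]
    if partition:
        args += ["--partition", partition]
    if rerank:
        args.append("--rerank")
    if max_chunks_per_document is not None:
        args += ["--max-chunks-per-document", str(max_chunks_per_document)]
    if recency_bias:
        args.append("--recency-bias")
    return args
```

The resulting list can be passed to `subprocess.run(args, capture_output=True, text=True, check=True)`; because no shell is involved, the query and filter need no extra quoting.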

The output is formatted as JSON with the following structure:

{
  "scored_chunks": [
    {
      "text": "Document content chunk...",
      "score": 0.95,
      "id": "chunk_id",
      "index": 0,
      "metadata": {
        "source_type": "files",
        "path": "path/to/file.txt"
      },
      "document_id": "doc_id",
      "document_name": "Document Name",
      "document_metadata": {
        "source_type": "files",
        "path": "path/to/file.txt"
      },
      "links": {
        "self": {
          "href": "https://api.ragie.ai/chunks/chunk_id",
          "type": "application/json"
        }
      }
    }
  ]
}
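
Because the output is plain JSON, downstream tools can consume it directly. The following sketch parses output of the shape shown above and returns the chunks ordered by score; the function name `summarize_chunks` and the `min_score` threshold are hypothetical additions for illustration.

```python
import json


def summarize_chunks(output, min_score=0.0):
    """Parse `ragie retrieve` JSON output into (document_name, score, text) tuples.

    Chunks scoring below `min_score` are dropped; the rest are sorted best-first.
    """
    data = json.loads(output)
    rows = []
    for chunk in data.get("scored_chunks", []):
        score = chunk.get("score", 0.0)
        if score >= min_score:
            rows.append((chunk.get("document_name"), score, chunk.get("text", "")))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows
```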

Global Flags

  • --dry-run: Print what would happen without making changes
  • --delay: Delay between imports in seconds (default: 2.0)
  • --partition: Specify a custom partition for your data (e.g., "production", "staging", "test")

Development

  1. Clone the repository
  2. Run go mod download to install dependencies
  3. Make your changes
  4. Run go build to build the binary

Testing

Unit Tests

Run unit tests with:

go test ./...

Integration Tests

Integration tests require a valid Ragie API key and will make actual API calls. To run integration tests:

export RAGIE_API_KEY=your_api_key_here
export INTEGRATION_TEST=true
go test ./integration_test -v

Note: Integration tests will create and delete test documents in your Ragie account. They clean up after themselves, but you may want to use a test account.

License

This project is licensed under the MIT License - see the LICENSE file for details.
