Skip to content

CLI Reference

Basic Usage

bash
markitai <input> [options]

The <input> can be:

  • A file path (document.docx)
  • A directory path (./docs)
  • A URL (https://example.com)

Conversion Options

--llm

Enable LLM-powered format cleaning and optimization. By default, only .llm.md is written (base .md is skipped). Use --keep-base to write both.

bash
markitai document.docx --llm

--preset <name>

Use a predefined configuration preset.

PresetDescription
richLLM + alt + desc + screenshot
standardLLM + alt + desc
minimalBasic conversion only
bash
markitai document.pdf --preset rich

--alt

Generate alt text for images using AI.

bash
markitai document.pdf --alt

--desc

Generate detailed descriptions for images.

bash
markitai document.pdf --desc

--screenshot

Enable screenshot capture:

  • PDF/PPTX: Renders pages/slides as JPEG images
  • URLs: Captures full-page screenshots using Playwright
bash
# Document screenshots
markitai document.pdf --screenshot
markitai presentation.pptx --screenshot

# URL screenshots
markitai https://example.com --screenshot

TIP

For URLs, --screenshot automatically upgrades the fetch strategy to playwright if needed. The screenshot is saved as {domain}_path.full.jpg in the .markitai/screenshots/ subdirectory.

--screenshot-only

Capture screenshots only without extracting content. Behavior depends on --llm:

CommandOutput
--screenshot-onlyScreenshots only (no .md files)
--llm --screenshot-only.md + .llm.md + screenshots (LLM extracts from screenshots)
bash
# Just capture screenshots
markitai https://example.com --screenshot-only

# LLM extracts content purely from screenshots
markitai https://example.com --llm --screenshot-only

TIP

Use --llm --screenshot-only for pages where traditional content extraction fails (e.g., heavy JavaScript sites, social media).

--ocr

Enable OCR for scanned documents.

bash
markitai scanned.pdf --ocr

--pure

Transparent pass-through mode: LLM only does text cleaning, no frontmatter generation or post-processing.

bash
# Without --llm: writes raw markdown without frontmatter
markitai document.docx --pure

# With --llm: sends content through LLM for text cleaning only
markitai document.docx --llm --pure

# With --preset: preset controls features, --pure controls output format
markitai document.pdf --preset rich --pure

TIP

--pure and --llm are independent flags. --pure alone skips frontmatter generation; --pure --llm sends content to LLM for cleaning but returns raw output without generated metadata (description, tags, etc.).

WARNING

--pure silently overrides --alt, --desc, and --screenshot. A warning is displayed when these flags are used together.

--keep-base

Write base .md file even in LLM mode. By default, --llm only outputs .llm.md to avoid redundant files.

bash
# Default: only .llm.md is written
markitai document.docx --llm

# Keep both .md and .llm.md
markitai document.docx --llm --keep-base

--no-compress

Disable image compression.

bash
markitai document.pdf --no-compress

Output Options

-o, --output <path>

Specify output directory.

bash
markitai document.docx -o ./output

--resume

Resume interrupted batch processing.

bash
markitai ./docs -o ./output --resume

Concurrency Options

--llm-concurrency <n>

Number of concurrent LLM requests.

bash
markitai ./docs --llm --llm-concurrency 10

-j, --batch-concurrency <n>

Number of concurrent file processing tasks (default: 10).

bash
markitai ./docs -o ./output -j 4

TIP

For mixed file and URL batches, use --url-concurrency to control URL fetching separately. This prevents slow URLs from blocking file processing.

Cache Options

--no-cache

Disable LLM result caching (force fresh API calls).

bash
markitai document.docx --llm --no-cache

--no-cache-for <patterns>

Disable cache for specific files or patterns (comma-separated).

bash
# Single file
markitai ./docs --no-cache-for file1.pdf

# Glob pattern
markitai ./docs --no-cache-for "*.pdf"

# Multiple patterns
markitai ./docs --no-cache-for "*.pdf,reports/**"

URL Options

.urls File Support

When the input is a .urls file, Markitai automatically processes it as a URL batch.

bash
markitai urls.urls -o ./output

The .urls file format:

# Comments start with #
https://example.com/page1
https://example.com/page2

--glob, -g <pattern>

Restrict directory batch discovery to matching relative paths. Can be repeated to add multiple patterns. Prefix with ! to exclude.

bash
# Only process PDF files
markitai ./docs -o ./output -g "*.pdf"

# Process PDFs and DOCX files
markitai ./docs -o ./output -g "*.pdf" -g "*.docx"

# Exclude a subdirectory
markitai ./docs -o ./output -g '!drafts/**'

TIP

Only applies to directory input. Use single quotes in shells with history expansion (e.g., zsh) when using ! prefix.

--max-depth <n>

Override recursive directory scan depth for batch discovery. 0 means only scan the input directory itself (no recursion).

bash
markitai ./docs -o ./output --max-depth 2

--url-concurrency <n>

Number of concurrent URL fetch operations (default: 5). This is separate from --batch-concurrency to prevent slow URLs from blocking file processing.

bash
markitai ./docs -o ./output --url-concurrency 5

--defuddle

Force Defuddle API for URL fetching. Free, no authentication required, with excellent content cleaning.

bash
markitai https://example.com --defuddle

--static

Force static HTTP fetch with native webextract. No external API needed.

bash
markitai https://example.com --static

--playwright

Force browser rendering for URL fetching using Playwright. Useful for JavaScript-heavy SPA websites (e.g., x.com, dynamic web apps).

bash
markitai https://x.com/user/status/123 --playwright

TIP

To pre-install Playwright browsers:

bash
uv run playwright install chromium
# Linux also requires system dependencies:
uv run playwright install-deps chromium

--jina

Force Jina Reader API for URL fetching. A cloud-based alternative when browser rendering is not available.

bash
markitai https://example.com --jina

--cloudflare

Use Cloudflare as the cloud backend. This is a unified switch:

  • URL input: Uses Cloudflare Browser Rendering /markdown API
  • File input: Uses Cloudflare Workers AI toMarkdown for file conversion

Requires CLOUDFLARE_API_TOKEN and CLOUDFLARE_ACCOUNT_ID environment variables (or configure in markitai.json). Create an API token at dash.cloudflare.com/profile/api-tokens with Browser Rendering: Edit and Workers AI: Read permissions. See Configuration → Cloudflare Settings for details.

bash
# URL rendering via CF Browser Rendering
markitai https://example.com --cloudflare

# File conversion via CF Workers AI toMarkdown
markitai document.pdf --cloudflare

TIP

Cloudflare Browser Rendering is available on the Free plan. Workers AI toMarkdown is free for PDF/Office/CSV/XML; image conversion uses Neurons quota.

WARNING

--playwright, --defuddle, --static, --jina, and --cloudflare are mutually exclusive URL strategies. You can only use one at a time.

--kreuzberg

Force kreuzberg converter for all file formats, overriding native converters. This is a file conversion option (orthogonal to URL fetch strategies), and can be combined with --playwright, --defuddle, --static, or --jina. However, --kreuzberg and --cloudflare are mutually exclusive since both override file conversion.

bash
# Install kreuzberg
uv pip install markitai[kreuzberg]

# Use kreuzberg converter
markitai document.pdf --kreuzberg

# Combine with a URL strategy
markitai https://example.com --kreuzberg --playwright

Setup Commands

markitai init

Interactive setup wizard that checks dependencies, detects LLM providers (including ChatGPT and Gemini CLI), and generates a configuration file.

bash
# Interactive setup wizard
markitai init

# Quick mode (generate default config without prompts)
markitai init --yes

# Generate local project config (./markitai.json)
markitai init --local

# Specify output path
markitai init -o ./markitai.json

-I, --interactive

Enter interactive mode for guided file conversion setup.

bash
markitai -I

Configuration Commands

markitai config list

Display all configuration settings.

bash
markitai config list
markitai config list --format json

markitai config get <key>

Get a specific configuration value.

bash
markitai config get llm.enabled
markitai config get cache.enabled

markitai config set <key> <value>

Set a configuration value.

bash
markitai config set llm.enabled true
markitai config set cache.enabled false

markitai config path

Show configuration file paths.

bash
markitai config path

markitai config edit

Interactively edit configuration settings with a guided menu.

bash
markitai config edit

markitai config validate

Validate a configuration file.

bash
markitai config validate

Cache Commands

markitai cache stats

Display cache statistics.

bash
markitai cache stats
markitai cache stats --verbose    # Verbose mode
markitai cache stats --json       # JSON output
markitai cache stats --verbose --limit 50   # Limit entries shown

markitai cache clear

Clear cached data.

bash
markitai cache clear
markitai cache clear -y                       # Skip confirmation
markitai cache clear --include-spa-domains    # Also clear learned SPA domains

markitai cache spa-domains

View or manage learned SPA domains. These are domains automatically detected as requiring browser rendering.

bash
markitai cache spa-domains             # List learned domains
markitai cache spa-domains --json      # JSON output
markitai cache spa-domains --clear     # Clear all learned domains

TIP

SPA domains are learned automatically when static fetch detects JavaScript requirement. This speeds up subsequent requests by skipping wasted static fetch attempts.

Diagnostic Commands

markitai doctor

Check system health, dependencies, and authentication status. This is the primary diagnostic command.

bash
markitai doctor
markitai doctor --fix     # Auto-fix missing components
markitai doctor --json    # JSON output
markitai doctor --suggest-extras   # List recommended pip extras

This command verifies:

  • Playwright: For dynamic URL fetching (SPA rendering)
  • LibreOffice: For Office document conversion (doc, docx, xls, xlsx, ppt, pptx)
  • FFmpeg: For audio/video file processing (mp3, mp4, wav, etc.)
  • RapidOCR: For scanned document OCR (built-in, no external dependencies)
  • LLM API: Configuration and model status
  • Vision Model: For image analysis (auto-detected from litellm)
  • Local Provider Auth: Authentication status for Claude Agent and GitHub Copilot (if configured)

Example output:

┌──────────────────────────────────────────────────────────────────────────┐
│                           Dependency Status                               │
├─────────────────────┬────────┬──────────────────────────────┬────────────┤
│ Component           │ Status │ Description                  │ Details    │
├─────────────────────┼────────┼──────────────────────────────┼────────────┤
│ Playwright          │   ✓    │ Browser automation           │ Installed  │
│ LibreOffice         │   ✓    │ Office document conversion   │ v7.6.4     │
│ FFmpeg              │   ✓    │ Audio/video processing       │ v6.0       │
│ RapidOCR            │   ✓    │ OCR for scanned documents    │ v1.4.0     │
│ LLM API (copilot)   │   ✓    │ Content enhancement          │ 1 model(s) │
│ Copilot Auth        │   ✓    │ GitHub Copilot auth status   │ Authenticated │
│ Vision Model        │   ✓    │ Image analysis               │ 1 detected │
└─────────────────────┴────────┴──────────────────────────────┴────────────┘

TIP

When using local providers (claude-agent/ or copilot/), the doctor command also checks authentication status and provides resolution hints if authentication fails.

Authentication Commands

markitai auth

Authentication helpers for local providers (Gemini, Copilot, Claude, ChatGPT).

markitai auth gemini status

Show current Gemini authentication profile.

bash
markitai auth gemini status
markitai auth gemini status --json    # JSON output

markitai auth gemini login

Run Gemini OAuth login and save a Markitai-managed profile.

bash
# Default: Google One mode
markitai auth gemini login

# Code Assist mode with project ID
markitai auth gemini login --mode code-assist --project-id my-project

markitai auth copilot status

Show GitHub Copilot CLI authentication status.

bash
markitai auth copilot status
markitai auth copilot status --json    # JSON output

markitai auth copilot login

Run GitHub Copilot CLI authentication.

bash
markitai auth copilot login

markitai auth claude status

Show Claude Code CLI authentication status.

bash
markitai auth claude status
markitai auth claude status --json    # JSON output

markitai auth claude login

Run Claude Code CLI authentication.

bash
markitai auth claude login

markitai auth chatgpt status

Show ChatGPT OAuth authentication status.

bash
markitai auth chatgpt status
markitai auth chatgpt status --json    # JSON output

markitai auth chatgpt login

Run ChatGPT OAuth Device Code Flow authentication.

bash
markitai auth chatgpt login

TIP

You can also use markitai doctor to check authentication status for all configured providers at once.

Other Options

--quiet, -q

Suppress non-essential output.

bash
markitai document.docx --quiet

--verbose

Enable verbose output.

bash
markitai document.docx --verbose

--dry-run

Preview conversion without writing files.

bash
markitai document.docx --dry-run

-c, --config <path>

Specify configuration file path.

bash
markitai document.docx --config ./my-config.json

-V, --version

Show version information.

bash
markitai -V

-h, --help

Show help message.

bash
markitai -h
markitai config -h
markitai cache -h