CLI Reference
Basic Usage
markitai <input> [options]
The <input> can be:
- A file path (document.docx)
- A directory path (./docs)
- A URL (https://example.com)
Conversion Options
--llm
Enable LLM-powered format cleaning and optimization. By default, only .llm.md is written (base .md is skipped). Use --keep-base to write both.
markitai document.docx --llm
--preset <name>
Use a predefined configuration preset.
| Preset | Description |
|---|---|
| rich | LLM + alt + desc + screenshot |
| standard | LLM + alt + desc |
| minimal | Basic conversion only |
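The preset table can be read as a mapping from preset names to individual flags. The sketch below assumes each preset is equivalent to passing the listed flags one by one, which the table suggests but does not state; `preset_flags` is a hypothetical helper, not part of Markitai.

```shell
# Hypothetical helper: print the individual flags a preset appears to bundle,
# based only on the preset table above (assumed equivalence).
preset_flags() {
  case "$1" in
    rich)     echo "--llm --alt --desc --screenshot" ;;
    standard) echo "--llm --alt --desc" ;;
    minimal)  echo "" ;;
    *)        echo "unknown preset: $1" >&2; return 1 ;;
  esac
}

# Print the expanded command without running it.
echo "markitai document.pdf $(preset_flags standard)"
# prints: markitai document.pdf --llm --alt --desc
```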
markitai document.pdf --preset rich
--alt
Generate alt text for images using AI.
markitai document.pdf --alt
--desc
Generate detailed descriptions for images.
markitai document.pdf --desc
--screenshot
Enable screenshot capture:
- PDF/PPTX: Renders pages/slides as JPEG images
- URLs: Captures full-page screenshots using Playwright
# Document screenshots
markitai document.pdf --screenshot
markitai presentation.pptx --screenshot
# URL screenshots
markitai https://example.com --screenshot
TIP
For URLs, --screenshot automatically upgrades the fetch strategy to playwright if needed. The screenshot is saved as {domain}_path.full.jpg in the screenshots/ subdirectory.
--screenshot-only
Capture screenshots only without extracting content. Behavior depends on --llm:
| Command | Output |
|---|---|
| --screenshot-only | Screenshots only (no .md files) |
| --llm --screenshot-only | .md + .llm.md + screenshots (LLM extracts from screenshots) |
# Just capture screenshots
markitai https://example.com --screenshot-only
# LLM extracts content purely from screenshots
markitai https://example.com --llm --screenshot-only
TIP
Use --llm --screenshot-only for pages where traditional content extraction fails (e.g., heavy JavaScript sites, social media).
--ocr
Enable OCR for scanned documents.
markitai scanned.pdf --ocr
--pure
Transparent pass-through mode: LLM only does text cleaning, no frontmatter generation or post-processing.
# Without --llm: writes raw markdown without frontmatter
markitai document.docx --pure
# With --llm: sends content through LLM for text cleaning only
markitai document.docx --llm --pure
# With --preset: preset controls features, --pure controls output format
markitai document.pdf --preset rich --pure
TIP
--pure and --llm are independent flags. --pure alone skips frontmatter generation; --pure --llm sends content to LLM for cleaning but returns raw output without generated metadata (description, tags, etc.).
WARNING
--pure overrides --alt, --desc, and --screenshot. A warning is displayed when these flags are used together.
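In wrapper scripts it can help to catch this combination before invoking the CLI. The following is a minimal sketch of such a pre-flight check; `pure_conflicts` is a hypothetical helper that only inspects the argument list and does not call Markitai.

```shell
# Hypothetical pre-flight check: report flags that --pure would override.
# Returns 1 when --pure is combined with --alt, --desc, or --screenshot.
pure_conflicts() {
  has_pure=0
  conflicts=""
  for arg in "$@"; do
    case "$arg" in
      --pure) has_pure=1 ;;
      --alt|--desc|--screenshot) conflicts="$conflicts $arg" ;;
    esac
  done
  if [ "$has_pure" -eq 1 ] && [ -n "$conflicts" ]; then
    echo "overridden by --pure:$conflicts"
    return 1
  fi
  return 0
}

pure_conflicts document.pdf --pure --alt --desc || true
# prints: overridden by --pure: --alt --desc
```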
--keep-base
Write base .md file even in LLM mode. By default, --llm only outputs .llm.md to avoid redundant files.
# Default: only .llm.md is written
markitai document.docx --llm
# Keep both .md and .llm.md
markitai document.docx --llm --keep-base
--no-compress
Disable image compression.
markitai document.pdf --no-compress
Output Options
-o, --output <path>
Specify output directory.
markitai document.docx -o ./output
--resume
Resume interrupted batch processing.
markitai ./docs -o ./output --resume
Concurrency Options
--llm-concurrency <n>
Number of concurrent LLM requests.
markitai ./docs --llm --llm-concurrency 10
-j, --batch-concurrency <n>
Number of concurrent file processing tasks (default: 10).
markitai ./docs -o ./output -j 4
TIP
For mixed file and URL batches, use --url-concurrency to control URL fetching separately. This prevents slow URLs from blocking file processing.
Cache Options
--no-cache
Disable LLM result caching (force fresh API calls).
markitai document.docx --llm --no-cache
--no-cache-for <patterns>
Disable cache for specific files or patterns (comma-separated).
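When patterns are assembled in a script, they need to be joined with commas and kept quoted so the shell does not expand the globs before Markitai sees them. A minimal sketch, where `join_patterns` is a hypothetical helper:

```shell
# Hypothetical helper: join all arguments with commas.
# "$*" joins positional parameters using the first character of IFS;
# the subshell keeps the IFS change local.
join_patterns() {
  ( IFS=,; printf '%s\n' "$*" )
}

# Quote each pattern so the shell passes it through unexpanded.
patterns=$(join_patterns "*.pdf" "reports/**")

# Print the resulting command without running it.
printf 'markitai ./docs --no-cache-for "%s"\n' "$patterns"
# prints: markitai ./docs --no-cache-for "*.pdf,reports/**"
```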
# Single file
markitai ./docs --no-cache-for file1.pdf
# Glob pattern
markitai ./docs --no-cache-for "*.pdf"
# Multiple patterns
markitai ./docs --no-cache-for "*.pdf,reports/**"
URL Options
.urls File Support
When the input is a .urls file, Markitai automatically processes it as a URL batch.
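A .urls file can also be generated from a script. The sketch below assumes the format used in this section (one URL per line, with lines starting with # treated as comments) and writes to urls.urls in the current directory; the final count is only a sanity check and not something Markitai requires.

```shell
# Write a URL list: one URL per line, "#" starts a comment line.
# Note: this overwrites urls.urls in the current directory.
urls_file="urls.urls"
{
  echo "# Pages to convert"
  echo "https://example.com/page1"
  echo "https://example.com/page2"
} > "$urls_file"

# Sanity check: count the lines that are actual URLs.
grep -c '^https\{0,1\}://' "$urls_file"
```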
markitai urls.urls -o ./output
The .urls file format:
# Comments start with #
https://example.com/page1
https://example.com/page2
--url-concurrency <n>
Number of concurrent URL fetch operations (default: 5). This is separate from --batch-concurrency to prevent slow URLs from blocking file processing.
markitai ./docs -o ./output --url-concurrency 5
--playwright
Force browser rendering for URL fetching using Playwright. Useful for JavaScript-heavy SPA websites (e.g., x.com, dynamic web apps).
markitai https://x.com/user/status/123 --playwright
TIP
To pre-install Playwright browsers:
uv run playwright install chromium
# Linux also requires system dependencies:
uv run playwright install-deps chromium
--jina
Force Jina Reader API for URL fetching. A cloud-based alternative when browser rendering is not available.
markitai https://example.com --jina
--cloudflare
Use Cloudflare as the cloud backend. This is a unified switch:
- URL input: Uses Cloudflare Browser Rendering /markdown API
- File input: Uses Cloudflare Workers AI toMarkdown for file conversion
Requires CLOUDFLARE_API_TOKEN and CLOUDFLARE_ACCOUNT_ID environment variables (or configure in markitai.json). Create an API token at dash.cloudflare.com/profile/api-tokens with Browser Rendering: Edit and Workers AI: Read permissions. See Configuration → Cloudflare Settings for details.
# URL rendering via CF Browser Rendering
markitai https://example.com --cloudflare
# File conversion via CF Workers AI toMarkdown
markitai document.pdf --cloudflare
TIP
Cloudflare Browser Rendering is available on the Free plan. Workers AI toMarkdown is free for PDF/Office/CSV/XML; image conversion uses Neurons quota.
WARNING
--playwright, --jina, and --cloudflare are mutually exclusive. You can only use one at a time.
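One way to respect this constraint in a script is to derive the flag from a single setting, so that two backends can never be passed together. The sketch below is illustrative only: `fetch_flag`, the `auto` value, and the `MARKITAI_BACKEND` variable are made up for this example, and the Cloudflare check covers only the environment variables, not settings in markitai.json.

```shell
# Hypothetical helper: map one backend name to at most one fetch flag.
fetch_flag() {
  case "$1" in
    playwright) echo "--playwright" ;;
    jina)       echo "--jina" ;;
    cloudflare)
      # --cloudflare needs both variables unless they are set in markitai.json.
      if [ -z "${CLOUDFLARE_API_TOKEN:-}" ] || [ -z "${CLOUDFLARE_ACCOUNT_ID:-}" ]; then
        echo "CLOUDFLARE_API_TOKEN and CLOUDFLARE_ACCOUNT_ID must be set" >&2
        return 1
      fi
      echo "--cloudflare"
      ;;
    auto|"")    echo "" ;;  # no flag: leave the fetch strategy to Markitai
    *)          echo "unknown backend: $1" >&2; return 1 ;;
  esac
}

# MARKITAI_BACKEND is an assumed variable name for this sketch.
# The command is printed rather than executed.
flag=$(fetch_flag "${MARKITAI_BACKEND:-jina}") && echo "markitai https://example.com $flag"
```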
Setup Commands
markitai init
Interactive setup wizard that checks dependencies, detects LLM providers, and generates a configuration file.
# Interactive setup wizard
markitai init
# Quick mode (generate default config without prompts)
markitai init --yes
# Generate local project config (./markitai.json)
markitai init --local
# Specify output path
markitai init -o ./markitai.json
-I, --interactive
Enter interactive mode for guided file conversion setup.
markitai -I
Configuration Commands
markitai config list
Display all configuration settings.
markitai config list
markitai config list --format json
markitai config get <key>
Get a specific configuration value.
markitai config get llm.enabled
markitai config get cache.enabled
markitai config set <key> <value>
Set a configuration value.
markitai config set llm.enabled true
markitai config set cache.enabled false
markitai config path
Show configuration file paths.
markitai config path
markitai config validate
Validate a configuration file.
markitai config validate
Cache Commands
markitai cache stats
Display cache statistics.
markitai cache stats
markitai cache stats --verbose # Verbose mode
markitai cache stats --json # JSON output
markitai cache clear
Clear cached data.
markitai cache clear
markitai cache clear -y # Skip confirmation
markitai cache clear --include-spa-domains # Also clear learned SPA domains
markitai cache spa-domains
View or manage learned SPA domains. These are domains automatically detected as requiring browser rendering.
markitai cache spa-domains # List learned domains
markitai cache spa-domains --json # JSON output
markitai cache spa-domains --clear # Clear all learned domains
TIP
SPA domains are learned automatically when a static fetch detects that a page requires JavaScript. This speeds up subsequent requests to the same domain by skipping static fetch attempts that would fail.
Diagnostic Commands
markitai doctor
Check system health, dependencies, and authentication status. This is the primary diagnostic command.
markitai doctor
markitai doctor --fix # Auto-fix missing components
markitai doctor --json # JSON output
This command verifies:
- Playwright: For dynamic URL fetching (SPA rendering)
- LibreOffice: For Office document conversion (doc, docx, xls, xlsx, ppt, pptx)
- FFmpeg: For audio/video file processing (mp3, mp4, wav, etc.)
- RapidOCR: For scanned document OCR (built-in, no external dependencies)
- LLM API: Configuration and model status
- Vision Model: For image analysis (auto-detected from litellm)
- Local Provider Auth: Authentication status for Claude Agent and GitHub Copilot (if configured)
Example output:
┌─────────────────────────────────────────────────────────────────────────────┐
│                              Dependency Status                              │
├─────────────────────┬────────┬──────────────────────────────┬───────────────┤
│ Component           │ Status │ Description                  │ Details       │
├─────────────────────┼────────┼──────────────────────────────┼───────────────┤
│ Playwright          │ ✓      │ Browser automation           │ Installed     │
│ LibreOffice         │ ✓      │ Office document conversion   │ v7.6.4        │
│ FFmpeg              │ ✓      │ Audio/video processing       │ v6.0          │
│ RapidOCR            │ ✓      │ OCR for scanned documents    │ v1.4.0        │
│ LLM API (copilot)   │ ✓      │ Content enhancement          │ 1 model(s)    │
│ Copilot Auth        │ ✓      │ GitHub Copilot auth status   │ Authenticated │
│ Vision Model        │ ✓      │ Image analysis               │ 1 detected    │
└─────────────────────┴────────┴──────────────────────────────┴───────────────┘
TIP
When using local providers (claude-agent/ or copilot/), the doctor command also checks authentication status and provides resolution hints if authentication fails.
Other Options
--quiet, -q
Suppress non-essential output.
markitai document.docx --quiet
--verbose
Enable verbose output.
markitai document.docx --verbose
--dry-run
Preview conversion without writing files.
markitai document.docx --dry-run
-c, --config <path>
Specify configuration file path.
markitai document.docx --config ./my-config.json
-v, --version
Show version information.
markitai -v
-h, --help
Show help message.
markitai -h
markitai config -h
markitai cache -h