# Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
## 0.11.2 - 2026-03-14

### Fixed

- **Windows Compatibility**: Add Windows `GlobalMemoryStatusEx` RAM detection for proper heavy task semaphore sizing
- **Lazy Directory Creation**: Defer `~/.markitai/` directory creation from import-time to first write — prevents side effects when the tool is only imported or used read-only
  - `SPADomainCache`: mkdir moved from `__init__` to `save()`
  - `SQLiteCache`: mkdir moved from `__init__` to `get_connection()` with a `_dir_ensured` flag to avoid repeated syscalls
- **Default Output/Log Dir**: `DEFAULT_OUTPUT_DIR` and `DEFAULT_LOG_DIR` now default to `None` instead of hardcoded paths — the output directory must be explicitly specified via the CLI `-o` flag or a config file
- **Pyright Warnings**: Eliminate all 27 pyright warnings — suppress `reportUnsupportedDunderAll` for PEP 562 lazy-loading modules, fix `curl_cffi` `ProxySpec` TypedDict type mismatch
- **Schema Sync**: Update `config.schema.json` to match the new nullable types of `OutputConfig.dir` and `LogConfig.dir`
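The RAM-based semaphore sizing above can be sketched with ctypes. The `MEMORYSTATUSEX` layout matches the Win32 `GlobalMemoryStatusEx` API; the limit heuristic (one slot per ~2 GB, capped at 8) and the function names are illustrative, not markitai's actual tuning:

```python
import ctypes
import os
import sys


def total_ram_bytes() -> int:
    """Best-effort total physical RAM; GlobalMemoryStatusEx on Windows."""
    if sys.platform == "win32":
        class MEMORYSTATUSEX(ctypes.Structure):
            _fields_ = [
                ("dwLength", ctypes.c_ulong),
                ("dwMemoryLoad", ctypes.c_ulong),
                ("ullTotalPhys", ctypes.c_ulonglong),
                ("ullAvailPhys", ctypes.c_ulonglong),
                ("ullTotalPageFile", ctypes.c_ulonglong),
                ("ullAvailPageFile", ctypes.c_ulonglong),
                ("ullTotalVirtual", ctypes.c_ulonglong),
                ("ullAvailVirtual", ctypes.c_ulonglong),
                ("ullAvailExtendedVirtual", ctypes.c_ulonglong),
            ]

        stat = MEMORYSTATUSEX()
        stat.dwLength = ctypes.sizeof(MEMORYSTATUSEX)
        ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(stat))
        return stat.ullTotalPhys
    # POSIX fallback: page size * page count
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def heavy_task_limit(ram: int, per_task_gb: int = 2) -> int:
    """One semaphore slot per ~2 GB of RAM (illustrative), clamped to [1, 8]."""
    return max(1, min(8, ram // (per_task_gb * 1024**3)))
```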
## 0.11.1 - 2026-03-14

### Added

- **Interactive Pure Mode**: Add pure mode option to interactive CLI wizard

### Fixed

- **Pure Mode Vision Bypass**: `--pure` now correctly skips screenshot-only and vision enhancement paths, falling through to text-only LLM processing
- **Pure Mode Warning False Positive**: `--pure --screenshot-only` no longer warns about `--screenshot` being ignored
- **URL Content Validation**: Lower the `too_short` threshold from 100 to 30 characters — minimal landing pages were incorrectly rejected after stripping markdown syntax
- **Type Safety**: Fix `merge_llm_usage` parameter type to accept `LLMUsageByModel` (pyright warning)
- **Dead Code**: Remove unused `_format_standalone_image_markdown` alias

### Changed

- **CI**: Upgrade GitHub Actions to Node.js 24 compatible versions
## 0.11.0 - 2026-03-13

### Added

- **Pure Mode (`--pure`)**: Full implementation of transparent LLM pass-through mode — text cleaning only, no frontmatter generation or post-processing
- **Pure Mode Decoupled from LLM**: `--pure` no longer implies `--llm`; `--pure` alone writes raw markdown without frontmatter, `--pure --llm` sends content through LLM cleaning only
- **Image Vision in Pure Mode**: `--llm --pure` with image inputs routes to the Vision analysis path (`process_image_with_vision_pure`)
- **`--keep-base` CLI Option**: Explicitly write `base.md` even in LLM mode (default: skip `base.md` when LLM is enabled)
- **Image-Only Format Handling**: Skip image-only formats (PNG, JPG, etc.) in non-LLM/non-OCR mode with a clear warning
- **LLM Fallback**: Write `.md` as fallback when LLM processing fails
- **Batch Skip Summary**: Group skipped files by reason with example filenames in the batch summary
- **Pure Mode Warning**: Warn when `--pure` silently overrides `--alt`/`--desc`/`--screenshot`
- **Mode-Specific Cleaner Prompt**: `{mode_rules}` template variable in the cleaner prompt — standard mode gets image placeholder rules, pure mode gets YAML frontmatter preservation rules

### Fixed

- **URL Processors**: Respect the `--pure`/`--llm`/`--keep-base` flags for `base.md` output in both single and batch URL processing
- **Pure Mode Frontmatter**: `process_with_llm` uses `clean_document_pure()` instead of `process_document()` in pure mode, preventing LLM-generated frontmatter (description, tags, etc.)
- **Source Frontmatter Reconstruction**: Reconstruct the original YAML frontmatter from defuddle metadata before sending to the LLM in pure mode
- **Vision Prompt Drift**: Add a placeholder REMINDER to vision prompts to reduce LLM drift on `__MARKITAI_IMG_N__` placeholders
- **Stabilization Dedup**: Deduplicate stabilization calls and add a `paged_stabilized` guard
- **Vision JSON Mode**: Fix wrong message index in vision `json_mode` and a race condition in parallel gather
- **Misc Fixes**: Frontmatter regex, env variable quoting, Ctrl+C handling, hardcoded weight, docstring corrections
- **SVG as Image-Only**: Treat SVG as an image-only format in batch mode

### Changed

- **Output Strategy**: LLM mode skips writing the base `.md` by default (use `--keep-base` to override)
- **Test Performance**: Optimize test suite speed (~70s → ~30s)
## 0.10.0 - 2026-03-12

### Added

- **Auto-detect LLM Providers**: When no `markitai.json` config exists, automatically detect available providers from environment variables and authenticated CLI tools (Claude CLI, Copilot CLI, Gemini CLI, ChatGPT OAuth)
- **Shared Provider Detection**: Extract provider detection into the shared `cli/providers_detect.py` module for reuse across interactive and non-interactive modes

### Changed

- **Interactive Mode UX**: Separate OCR and screenshots from LLM features into an independent "Additional options" prompt, since they are local processing capabilities (RapidOCR, Playwright) that don't require an LLM
- **Feature Display**: Unified `build_feature_str()` in `ui.py` separates LLM features from local features with a `|` delimiter (e.g., `LLM alt desc | OCR screenshot`)
- **Interactive Mode Flow**: Show configured models after the user confirms LLM enablement, not before; warn when no provider is detected
- **Dependencies**: Raise minimum constraints to match tested versions (pymupdf4llm >=1.27.2, litellm >=1.82.0, pydantic >=2.12.0, pytest >=9.0.0, ruff >=0.15.0)
- **CLI Flags**: `-v` is now `--verbose` (was `--version`); `-V` is now `--version`

### Fixed

- **Image Alt Text Language**: Strip YAML frontmatter before extracting document context for image analysis, so alt text matches the document's actual language instead of defaulting to English
- **Interactive Provider Display**: Show the actual configured models from the config file instead of the auto-detected provider name
- **URL Processor Feature Display**: Add missing OCR to the URL processor dry-run features list
- **Cold Startup Performance**: Lazy imports in `cli/`, `processors/`, and `workflow/__init__.py` reduce cold startup from ~5s to ~0.3s
### Removed

- **Language Field**: Remove the LLM-generated `language` field from the Frontmatter model — the LLM should only generate `description` and `tags`, not infer extra metadata
## 0.9.2 - 2026-03-11

### Fixed

- **Copilot/Claude Login**: Revert subprocess output interception for copilot/claude-agent login — always use inherited stdio so the CLI sees a real TTY, fixing credential storage failures
- **Login Output Display**: Detect URL and device code on the same line (copilot outputs both together); track externally-printed lines for clean erasure after login
- **Error Message Clarity**: `format_error_message` followed `__context__` (the implicit exception chain) into wrapper exceptions like tenacity `RetryError`, replacing informative provider errors with opaque `<Future at 0x...>` messages in logs; it now only follows `__cause__` (explicit `raise X from Y`)
- **Error Message Consistency**: Use `format_error_message` in CLI catch-all handlers (`file.py`, `workflow/core.py`) to prevent opaque chained exception messages from reaching users

### Added

- `SubprocessInterceptor` URL+code same-line formatting for the copilot device code flow
- `OutputManager.track_external_lines()` for tracking terminal output from inherited-stdio subprocesses
## 0.9.1 - 2026-03-09

### Fixed

- **Provider Auth Preflight**: Add a `can_attempt_login()` guard to skip the login prompt when the provider SDK is missing; fix Rich markup swallowing `[gemini-cli]` via `escape()`; fix "Login failed: Login failed:" duplication
- **Install Scripts Extras Parsing**: Fix greedy regex (`\[.*\]` → `\[[^]]*\]`) that captured TOML outer brackets, corrupting extras names like `gemini-cli}]`
- **Install Scripts Resilience**: Progressive fallback when the full extras install fails (retry without SDK-dependent extras); fix `set -e` silent exit on `uv tool install` failure; fix PowerShell 5.x `Join-Path` 3-arg incompatibility
- **Install Scripts Extras Strategy**: Merge-based finalize (no longer replaces manually tracked extras); generic receipt parsing (future-proof for new extras)

### Added

- `markitai doctor --suggest-extras` as the single source of truth for install scripts to query recommended extras
- `can_attempt_login()` provider guard with `get_auth_resolution_hint()` fallback messages
- i18n key `not_found` for zh-CN and en in both setup scripts
## 0.9.0 - 2026-03-09

### Added

- **Fetch Strategy Priority**: Configurable global and per-domain strategy ordering via `strategy_priority` in `policy` and `domain_profiles`
- **Domain/IP Exemption**: New `local_only_patterns` config field restricts specified domains/IPs to local-only strategies (static, playwright) — supports exact domain, suffix (`.internal.com`), wildcard (`*.internal.com`), IP, and CIDR notation (`10.0.0.0/8`, `fd00::/8`)
- **NO_PROXY Integration**: `inherit_no_proxy` (default: true) automatically merges `NO_PROXY` environment variable patterns into the local-only exemptions
- **Fetch Security Feature**: README documentation for the new information security compliance capabilities
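The exemption matching described above (exact, suffix, wildcard, IP, CIDR) can be sketched with the stdlib `ipaddress` and `fnmatch` modules. This is a minimal illustration of the matching rules, not markitai's actual implementation; the function name is hypothetical:

```python
import fnmatch
import ipaddress


def is_local_only(host: str, patterns: list[str]) -> bool:
    """Return True if host matches any exemption pattern (sketch)."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        addr = None  # host is a domain name, not an IP literal

    for pat in patterns:
        if addr is not None:
            try:
                # Covers both bare IPs and CIDR ranges like 10.0.0.0/8
                if addr in ipaddress.ip_network(pat, strict=False):
                    return True
            except ValueError:
                pass  # pattern is a domain pattern, not IP/CIDR
            continue
        if host == pat:  # exact domain
            return True
        if pat.startswith(".") and host.endswith(pat):  # suffix: .internal.com
            return True
        if fnmatch.fnmatch(host, pat):  # wildcard: *.internal.com
            return True
    return False
```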
### Fixed

- **LLM Language Consistency**: Strengthened 5 prompt templates to prevent language translation when fetching mixed-language content (e.g., English UI + Chinese body) — the LLM now determines the output language from the body text, not the UI elements
## 0.8.1 - 2026-03-06

### Added

- **Defuddle Fetch Strategy**: New `defuddle` strategy (GET `https://defuddle.md/<url>`) as the top-priority URL fetch method — free, no auth, returns clean Markdown with YAML frontmatter (title, author, published, description, word_count, domain)
- **Aggressive Strategy Ordering**: Default ordering changed to `defuddle → jina → static → playwright → cloudflare` (both default and SPA scenarios)
- **CLI `--defuddle` Flag**: Force defuddle-only URL fetching (mutually exclusive with `--playwright`, `--jina`, `--cloudflare`)
- **DefuddleConfig**: Configurable timeout and RPM rate limiting (conservative defaults for undocumented API limits)

### Changed

- **FetchPolicyEngine**: Simplified ordering logic — removed `has_jina_key` branching; defuddle and jina always come first
- **max_strategy_hops**: Default increased from 4 to 5 to accommodate the new strategy
## 0.8.0 - 2026-03-06

### Added

- **Extended Format Support**: 20+ new file formats via markitdown and kreuzberg converters
  - Markitdown-based: HTML/HTM/XHTML, CSV, EPUB, MSG, IPYNB (Jupyter Notebook), Apple Numbers
  - Kreuzberg-based (optional dependency): TSV, XML, ODS, ODT, SVG, RTF, RST, ORG, TEX, EML
  - Kreuzberg is a pure Rust wheel — install with `uv pip install markitai[kreuzberg]`
- **Extended Image Support**: GIF, BMP, TIFF now supported by ImageConverter; BMP/TIFF auto-converted to PNG for LLM vision APIs
- **LLM Vision Format Helpers**: `is_llm_supported_image()` and `get_llm_effective_mime()` in `utils/mime.py` for transparent BMP/TIFF → PNG handling

### Fixed

- **Claude Agent SDK v0.1.46 compatibility**: Removed the deprecated `allow_dangerously_skip_permissions` parameter (`permission_mode="bypassPermissions"` is sufficient)
- **i18n test isolation**: Fixed a global state leak in `test_i18n.py` causing 3 integration tests to fail when run in the full suite
- **Import-time log leakage**: Kreuzberg registration logs changed from `logger.debug` to `logger.trace` to prevent terminal noise before CLI log setup

### Changed

- **Converter registry**: New `FileFormat` enum members for all added formats; kreuzberg registers as a gap-filler (only for formats without native converters)
- **Test fixtures**: Renamed to a consistent `sample.*` naming convention; added fixtures for all new formats; removed the orphaned `sample.mobi`
- **Markitdown lazy init**: `MarkItDown()` in `markitdown_ext.py` is now initialized on first use instead of at import time
## 0.7.0 - 2026-03-05

### Added

- **ChatGPT Provider (`chatgpt/`)**: Subscription-based provider using the ChatGPT OAuth Device Code Flow and Responses API. No extra SDK required — uses LiteLLM's built-in authenticator. Models: `chatgpt/gpt-5.2`, `chatgpt/codex-mini`, etc.
- **Gemini CLI Provider (`gemini-cli/`)**: Uses Google's Gemini CLI OAuth credentials (`~/.gemini/oauth_creds.json`) with automatic token refresh. Optional SDK: `uv add markitai[gemini-cli]`. Models: `gemini-cli/gemini-2.5-pro`, `gemini-cli/gemini-2.5-flash`, etc.
- **Weight=0 Model Disabling**: Setting `weight: 0` in the model config now explicitly disables the model (excluded from routing). Useful for temporarily disabling models without removing config.
- **Interactive Mode Enhancements**: Updated onboarding wizard with ChatGPT and Gemini CLI provider options

### Fixed

- **ZeroDivisionError in Router**: Models with `weight=0` are now filtered out before LiteLLM Router creation, preventing `division by zero` in the `simple-shuffle` routing strategy when all selected models have zero weight
- **Router Weight Selection**: The `_select_model` fallback uses `random.choice()` instead of `random.uniform(0, 0)` when all models have zero weight
### Changed

- **Weight Field Semantics**: The `weight` field description now clarifies that 0 = disabled. The minimum value is enforced at 0 (negative weights are rejected by validation)
## 0.6.1 - 2026-03-05

### Fixed

- **Claude Agent SDK compliance**: Add `allow_dangerously_skip_permissions=True` when using `bypassPermissions`, pass system messages via the SDK's `system_prompt` parameter instead of XML tags, set `additionalProperties: false` in the JSON object schema
- **Auth pre-check gaps**: Detect `GH_TOKEN`/`GITHUB_TOKEN` env vars as valid Copilot authentication; detect `CLAUDE_CODE_USE_BEDROCK`/`VERTEX`/`FOUNDRY` env vars as valid Claude authentication
- **Resolution hints**: Include env var alternatives in authentication error messages

### Changed

- **Docs**: Update the configuration guide and ai-tools-setup with env var auth methods
## 0.6.0 - 2026-03-04

### Added

- **Cloudflare Integration**: Unified cloud backend with two capabilities:
  - **Browser Rendering**: `--cloudflare` flag for cloud-based URL rendering via the CF `/markdown` API, with rate limiting, cache TTL, and advanced params (`user_agent`, `cookies`, `wait_for_selector`, `http_credentials`)
  - **Workers AI toMarkdown**: Cloud-based document conversion for PDF/XLSX/DOCX/PPTX (converter backend)
- **Fetch Policy Engine (`fetch_policy.py`)**: Policy-driven strategy ordering with domain-specific profiles, session persistence, and adaptive targeting
- **Domain Profiles**: Per-domain fetch config (`wait_for_selector`, `wait_for`, `extra_wait_ms`, `prefer_strategy`) in `markitai.json`
- **Playwright Session Persistence**: `session_mode` (isolated/domain_persistent) and `session_ttl_seconds` for reusing browser contexts across requests
- **Static HTTP Abstraction (`fetch_http.py`)**: Pluggable HTTP backend with `httpx` (default) and `curl-cffi` (TLS fingerprint impersonation) via the `MARKITAI_STATIC_HTTP` env var
- **Content Validation Gate**: All fetch strategies now validate content quality before accepting results
- **`api_base` env: syntax**: `"api_base": "env:MY_BASE_URL"` in model config for environment variable expansion
- **CF Markdown for Agents**: Content negotiation via the `Accept: text/markdown` header for Cloudflare-enabled sites
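The `env:` expansion convention can be sketched in a few lines; the prefix handling shown here is an assumption about the convention described above, and the function name is hypothetical:

```python
import os


def resolve_api_base(value: str) -> str:
    """Expand "env:VAR" to the variable's value; pass other strings through."""
    if value.startswith("env:"):
        var = value[len("env:"):]
        try:
            return os.environ[var]
        except KeyError:
            raise ValueError(f"api_base references unset env variable {var!r}")
    return value
```

This keeps secrets and per-machine endpoints out of the committed config file while leaving literal URLs untouched.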
### Changed

- **Vision Router Fallback**: When all vision models are disabled (`weight=0`), fall back to the main router with a warning instead of crashing
- **Playwright UTF-8 Encoding**: Force UTF-8 for HTML-to-Markdown conversion to prevent encoding errors
- **Integration Test Resilience**: Cloudflare integration tests now skip on rate limit (429) instead of failing

### Fixed

- **ZeroDivisionError in Vision Router**: Models with `weight=0` (disabled) are now filtered out before litellm Router creation, preventing `division by zero` in the `simple-shuffle` routing strategy
- **Dead Code Cleanup**: Removed 21 dead functions/classes across 15+ files (backward compat aliases, deprecated functions, unused constants)

### Removed

- `_html_to_text`, `_normalize_bypass_list`, `_get_proxy_bypass`, `get_proxy_for_url`, `_url_to_session_id` from `fetch.py`
- `sanitize_error_message` from `security.py`
- `_deep_update`, `get_config` from `config.py`
- `order_dict_keys_sorted`, `_order_image_entry` from `json_order.py`
- `reset_consoles` from `console.py`
- `get_llm_not_configured_hint` from `hints.py`
- `remove_uncommented_screenshots`, `_UNCOMMENTED_SCREENSHOT_RE` from `llm/content.py`
- `get_pending_urls`, `finish_url_processing` from `batch.py`
- `LLMUsageAccumulator` from `workflow/helpers.py`
- `DEFAULT_LOG_PANEL_MAX_LINES` from `constants.py`
- Multiple backward-compatibility aliases from `cli/processors/`
## 0.5.2 - 2026-02-07

### Fixed

- **SQLite ResourceWarning**: Close SQLite connections explicitly via a `_connect()` context manager, preventing `ResourceWarning: unclosed database` on Python 3.13
- **Windows path handling**: `context_display_name()` now handles `C:/` forward-slash Windows paths (was only handling `C:\`)
- **Windows install hints**: `markitai doctor` shows platform-specific install commands (PowerShell/winget on Windows, curl on Unix)
- **OAuth token expiry**: `markitai doctor` no longer reports "Token expired" when a valid refresh token exists
- **Config get output**: `markitai config get` renders Pydantic models as formatted JSON with syntax highlighting instead of a raw Python repr
- **Copilot ProviderError**: Added the missing `provider` kwarg when raising `ProviderError` for unsupported models
- **Pyright warnings**: Resolved all Pyright warnings (lazy `__all__`, type narrowing, optional imports)
### Changed

- **26 documentation fixes**: Comprehensive audit fixing docstring-to-code mismatches across all modules (llm, providers, converter, utils, config)
## 0.5.1 - 2026-02-07

### Added

- **Playwright auto-scroll**: Auto-scroll pages to trigger lazy-loaded content before extraction (up to 8 steps, inspired by baoyu-skills url-to-markdown)
- **DOM noise cleanup**: Remove navigation, ads, cookie banners, popups, and inline event handlers before content extraction
- **`python -m markitai`**: Add `__main__.py` for `-m` invocation support (fixes Windows execution)
- **Multi-provider detection**: Interactive mode (`-I`) now detects and displays all available LLM providers (DeepSeek, OpenRouter included)
- **Copilot GPT-5 series support**: GPT-5, GPT-5.1, GPT-5.2, GPT-5.1-Codex-Mini/Max, and GPT-5.2-Codex are now fully supported via the Copilot provider
- **22 new unit tests**: Vision fallback strategies, smart_truncate edge cases, content protection roundtrip, cache fingerprint collision resistance, batch thread safety

### Changed

- **Default models modernized**: Updated outdated defaults across init/interactive/doctor (haiku→sonnet, gpt-4o→gpt-5.2, gemini-2.0→2.5, claude-sonnet-4→4.5)
- **Init wizard**: Multi-provider default selection, API keys stored in `.env` instead of plaintext config, next-steps hints after completion
- **LLM code deduplication**: `document.py` now delegates `_protect_image_positions`/`_restore_image_positions` to shared functions in `content.py`
- **Cache fingerprint**: SHA256 over full content + page structure replaces `text[:1000]` prefix-based cache keys, preventing collisions for documents with identical prefixes
- **Batch thread safety**: Double-checked locking with timeout-based lock acquisition (5s) replaces non-blocking `acquire(blocking=force)`
- **LiteLLM model database**: Refreshed with 35 new models including Claude Opus 4.6
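The double-checked locking with a bounded lock wait can be sketched as follows; the module-level flag and function names are illustrative of the pattern, not markitai's actual batch code:

```python
import threading

_lock = threading.Lock()
_initialized = False


def ensure_initialized(setup, timeout: float = 5.0) -> bool:
    """Run setup() exactly once across threads, waiting at most `timeout`.

    First check is lock-free (fast path once initialized); the second
    check under the lock closes the race between the two.
    """
    global _initialized
    if _initialized:                      # check 1: no lock taken
        return True
    if not _lock.acquire(timeout=timeout):
        return False                      # timed out instead of blocking forever
    try:
        if not _initialized:              # check 2: under the lock
            setup()
            _initialized = True
        return True
    finally:
        _lock.release()
```

The timeout replaces a non-blocking `acquire()`, so a contended thread waits briefly for the winner rather than skipping initialization outright.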
### Fixed

- **DOM cleanup JS syntax error**: Selectors with double quotes (e.g., `[role="banner"]`) are now properly escaped via `json.dumps()` instead of f-string interpolation
- **Copilot model blocklist**: Removed the outdated GPT-5 series from `UNSUPPORTED_MODELS` (only o1/o3 reasoning models remain blocked)
- **CLI provider display**: Truncate the provider list with `(+N more)` when more than 3 are detected, preventing line overflow
## 0.5.0 - 2026-02-06

### Added

- **Unified UI system**: New `ui.py` components and `i18n.py` module with Chinese/English support across all CLI commands
- **`markitai init`**: One-stop setup wizard — checks dependencies, detects LLM providers, generates config
- **Interactive mode (`-I`)**: Guided setup with questionary prompts for new users
- **`doctor --fix`**: Auto-install missing components (e.g., Playwright)
- **Cross-platform install hints**: Platform-specific installation commands in doctor output
- **`MARKITAI_LOG_FORMAT`**: Environment variable override for the log format
- **JSON repair**: Fallback parser for malformed LLM JSON responses using `json_repair`
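The repair-fallback idea is: parse strictly first, and only on failure apply lenient fixes. The sketch below uses two hand-rolled repairs (stripping code fences, removing trailing commas) as a stdlib stand-in for the `json_repair` library the changelog refers to; the function name is hypothetical:

```python
import json
import re


def parse_llm_json(raw: str):
    """Strict json.loads first; on failure, try small mechanical repairs."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # Repair 1: strip a surrounding markdown code fence (``` or ```json)
    cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw.strip())
    # Repair 2: drop trailing commas before } or ]
    cleaned = re.sub(r",\s*([}\]])", r"\1", cleaned)
    return json.loads(cleaned)  # re-raises if still malformed
```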
### Changed

#### Performance

- **CLI startup**: Lazy-load processor and command modules (~3x faster `--help`)
- **Dependency checks**: Parallelized doctor and init with `ThreadPoolExecutor`
- **LLM processing**: Pre-compiled regex patterns and batched replacements
- **PDF rendering**: Parallel page rendering for standard and LLM modes
- **URL fetching**: Async-safe cache locking for concurrent requests
- **Executor**: Auto-detect the heavy task limit based on system RAM
- **Image processing**: Offloaded CPU-intensive work to a thread pool
- **Cache stats**: Merged stats and model breakdown into a single SQLite query

#### Refactoring

- **Batch UI**: Replaced the Rich table/LogPanel with a compact unified UI (progress bar with current file, completion summary)
- **Log format**: Default changed to human-readable text (was JSON)
- **LLM cache**: Deduplicated `SQLiteCache`/`PersistentCache` into `llm/cache.py`
- **Single file output**: Layered output with `--verbose` for detailed logs
- **Setup scripts**: Consolidated 10 scripts into 2 unified files (`setup.sh` + `setup.ps1`) with built-in i18n
### Fixed

- **Windows**: LibreOffice detection with fallback to `Program Files` paths (not just PATH)
- **Windows**: FFmpeg/CLI path display — show "installed" instead of long winget package paths
- **Windows**: `config path` alignment with dynamic padding and a continuous `│` column
- **Playwright**: Default `wait_for` changed to `domcontentloaded` (was `networkidle`, which caused hangs)
- **Config**: Schema and function defaults synced with constants
- **Exceptions**: Preserved exception chains (`raise ... from ...`) across the codebase
- **Cache**: Prevented a stale `markitai_processed` timestamp on cache hit
- **CLI**: Version flag reverted to `-v`/`--version`; `--verbose` kept without a short flag

### CI

- Added a Windows LibreOffice install step (`choco`) to the CI matrix
- Changed to `--all-extras` for comprehensive dependency testing
- Publish workflow: split unit/integration tests with `SKIP_LLM_TESTS`
## 0.4.2 - 2026-02-03

### Changed

- **Playwright defaults**: `wait_for` changed to `networkidle`, `extra_wait_ms` to 5000ms for better SPA support
- **Frontmatter validation**: Pydantic validators reject empty description/tags, triggering Instructor auto-retry
- **VitePress**: Upgraded to 2.0.0-alpha.16

### Fixed

- **X/Twitter content**: Pages now wait for full JS rendering before capture
- **Cache directories**: All caches now respect the `cache.global_dir` config instead of hardcoded paths
- **Setup scripts**: Improved piped execution (`curl | sh`), proper Playwright installation paths
- **Config init**: Added a `--yes`/`-y` flag for non-interactive use
## 0.4.1 - 2026-02-02

### Added

- **`markitai doctor`**: New diagnostic command for system health and auth status checking
- **Adaptive timeout**: Local providers auto-adjust timeout based on request complexity
- **Prompt caching**: Claude Agent caches long system prompts (>4KB) for cost reduction

### Changed

- `check-deps` renamed to `doctor` (old name kept as an alias)
- Improved error messages with resolution hints for local providers

### Fixed

- Request timeouts on large documents with Claude Agent / Copilot
- JSON extraction issues with control characters and markdown code blocks
## 0.4.0 - 2026-01-28

### Added

- **Claude Agent SDK**: `claude-agent/sonnet|opus|haiku` via the Claude Code CLI
- **GitHub Copilot SDK**: `copilot/claude-sonnet-4.5|gpt-4o|o1` models
- **URL HTTP caching**: ETag/Last-Modified conditional requests
- **Quiet mode**: `--quiet`/`-q` flag (auto-enabled for single file)
- **Module refactoring**: `cli.py` → `cli/`, `llm.py` → `llm/`, new `providers/`
- **Setup scripts hardening**: Default N for high-impact ops, version pinning
- **Docs**: CONTRIBUTING.md, architecture.md, ai-tools-setup.md, dependabot.yml

### Changed

- Python 3.13; docs reorganized to `docs/archive/`
- agent-browser locked to 0.7.6 (Windows bug in 0.8.x)
- Default `extra_wait_ms`: 1000 → 3000; Instructor mode: `JSON` → `MD_JSON`

### Fixed

- **Windows**: UTF-8 console, Copilot CLI path discovery, script argument quoting
- **LLM**: Frontmatter regex fallback, `source` field fix, vision/frontmatter error handling
- **Prompts**: Enhanced prompt leakage prevention, placeholder protection rules
- **Content**: Social media cleanup rules (X/Twitter, Facebook, Instagram)
- **Setup**: WSL detection, Python pymanager support, PATH refresh order
## 0.3.2 - 2026-01-27

### Added

- Chinese README (`README_ZH.md`) with language toggle
- Chinese setup scripts: `setup-zh.sh`, `setup-zh.ps1`, `setup-dev-zh.sh`, `setup-dev-zh.ps1`

### Changed

- Improved setup scripts with better error handling and user feedback
- Updated Python version note: 3.11-3.13 (3.14 not yet supported)
- Updated documentation language toggle links
## 0.3.1 - 2026-01-27

### Fixed

#### Prompt Leakage Prevention

- Split all prompts into `*_system.md` (role definition) and `*_user.md` (content template)
- Added `_validate_no_prompt_leakage()` to detect and handle prompt leakage in LLM output
- Updated LLM calls to use the proper `[{"role": "system"}, {"role": "user"}]` message structure

#### LLM Compatibility

- Fixed `max_tokens` exceeding the deepseek limit by using the minimum across all router models
- Fixed a terminal window popup on Windows when running agent-browser verification

#### URL Fetching

- Improved error messages for browser fetch timeout (no longer suggests installing when already attempted)
- Added auto-proxy detection for Jina API and browser fetching
  - Checks environment variables: `HTTPS_PROXY`, `HTTP_PROXY`, `ALL_PROXY`
  - Auto-detects local proxy ports: 7890 (Clash), 10808 (V2Ray), 1080 (SOCKS5), etc.

### Added

#### SPA Domain Learning

- New `SPADomainCache` for automatic detection and caching of JavaScript-heavy sites
- `markitai cache spa-domains` command to view/manage learned domains
- `markitai cache clear --include-spa-domains` option

#### Windows Performance Optimizations

- Thread pool optimization: Windows defaults to 4 workers (vs 8 on Linux/macOS)
- ONNX Runtime global singleton with preheat for the OCR engine
- OpenCV-based image compression (releases the GIL, 20-40% faster)
- Batch subprocess execution for agent-browser commands

### Changed

- Default image quality: 85 → 75
- Default image max_height: 1080 → 99999 (effectively unlimited)
- Default image min_area filter: 2500 → 5000
- Default URL concurrency: 3 → 5
- Default scan_max_depth: 10 → 5
- Extended fallback_patterns with more social media domains
## 0.3.0 - 2026-01-26

### Added

#### URL Conversion Support

- **Direct URL conversion**: `markitai <url>` converts web pages to Markdown
- **URL batch processing**: Support the `.urls` file format (text or JSON), auto-detected from input
- **URL image downloading**: `download_url_images()` with concurrent downloads (5 parallel)
- Automatic relative URL resolution for images
- Cross-platform filename sanitization (handles Windows illegal characters)
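Cross-platform filename sanitization of the kind described above can be sketched with a single regex over the characters Windows rejects, plus its trailing dot/space rule. The exact character set and fallback name are assumptions, not markitai's actual rules:

```python
import re

# Characters Windows forbids in file names, plus ASCII control chars.
_ILLEGAL = r'[<>:"/\\|?*\x00-\x1f]'


def sanitize_filename(name: str, replacement: str = "_") -> str:
    """Replace illegal characters and trim trailing dots/spaces (sketch)."""
    name = re.sub(_ILLEGAL, replacement, name)
    name = name.rstrip(" .")  # Windows also rejects trailing dots/spaces
    return name or "untitled"  # assumed fallback for fully-illegal input
```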
#### Multi-Source URL Fetching (`fetch.py`)

- Three fetch strategies: `--static`/`--agent-browser`/`--jina`
  - `static`: MarkItDown direct HTTP fetch (default, fastest)
  - `browser`: agent-browser headless rendering (for JS-heavy pages)
  - `jina`: Jina Reader API (cloud-based, no local deps)
  - `auto`: Smart fallback (static → browser/jina if JS detected)
- **FetchCache**: SQLite-based URL cache with LRU eviction (100MB default)
- **Screenshot capture**: `--screenshot` for full-page screenshots via browser
- **Multi-source content**: Parallel static + browser fetch with quality validation
- Domain pattern matching for auto-browser fallback (x.com, twitter.com, etc.)
- `FetchResult` with `static_content`, `browser_content`, `screenshot_path`

#### agent-browser Integration

- Headless browser automation via the `agent-browser` CLI
- Configurable wait states: `load`, `domcontentloaded`, `networkidle`
- Extra wait time for SPA rendering (`extra_wait_ms`)
- Session isolation for concurrent fetches
- `verify_agent_browser_ready()` with a cached readiness check
- Screenshot compression with Pillow (JPEG quality + max height)

#### URL LLM Enhancement

- New `prompts/url_enhance.md` for URL-specific content cleaning
- Multi-source LLM processing: combine static + browser + screenshot
- Smart content selection based on validity detection

#### Cache Enhancements

- **`--no-cache-for <pattern>`**: Selective cache bypass with glob patterns
  - Single file: `--no-cache-for file1.pdf`
  - Glob pattern: `--no-cache-for "*.pdf"`
  - Mixed: `--no-cache-for "*.pdf,reports/**"`
- **`markitai cache stats -v`**: Verbose mode with detailed cache entries
- **`--limit N`**: Control the number of entries in verbose output (default: 20)
- **`--scope project|global|all`**: Filter cache statistics by scope
- **`SQLiteCache.list_entries()`**: List cache entries with metadata
- **`SQLiteCache.stats_by_model()`**: Per-model cache statistics
- Improved cache hash: head + tail + length algorithm for better invalidation
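The head + tail + length hashing idea above can be sketched as follows — digesting a window from each end plus the total length means edits near the tail of a large document change the key, without hashing the whole file. The window size and function name are illustrative:

```python
import hashlib


def cache_key(data: bytes, window: int = 64 * 1024) -> str:
    """Digest head + tail + length so tail edits invalidate the entry."""
    h = hashlib.sha256()
    h.update(data[:window])              # head window
    h.update(data[-window:])             # tail window (whole data if small)
    h.update(str(len(data)).encode())    # total length guards middle swaps
    return h.hexdigest()
```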
#### Workflow Core Refactor (`workflow/core.py`)

- **`ConversionContext`**: Unified single-file conversion context
- **`convert_document_core()`**: Main conversion pipeline: `validate_and_detect_format()` → `convert_document()` → `process_embedded_images()` → `write_base_markdown()` → `process_with_vision_llm()`/`process_with_standard_llm()`
- Parallel document + image processing with proper dependency handling
- Alt text injection after LLM processing completes (race condition fix)

#### Official Website

- VitePress 2.x documentation site with bilingual support (English/Chinese)
- Custom theme with brand colors matching the logo
- Local search integration
- GitHub Actions auto-deployment to GitHub Pages

#### Project

- MIT License: Added LICENSE file

#### CI/CD

- `.github/workflows/ci.yml`: Automated testing on push/PR
- `.github/workflows/deploy-website.yml`: Website deployment to GitHub Pages

#### Code Architecture

- New `utils/paths.py`: `ensure_dir()`, `ensure_subdir()`, `ensure_assets_dir()`
- New `utils/mime.py`: `get_mime_type()`, `get_extension_from_mime()`
- New `utils/text.py`: `normalize_markdown_whitespace()`, text utilities
- New `utils/executor.py`: `run_in_executor()` with a shared ThreadPoolExecutor
- New `utils/output.py`: Output formatting helpers
- New `json_order.py`: Ordered JSON serialization for reports/state files
- New `urls.py`: `.urls` file parser (JSON and plain text formats)
- `LLMUsageAccumulator` class for centralized cost tracking
- `create_llm_processor()` factory function
- Unified `detect_language()` with a `get_language_name()` helper
- Centralized `IMAGE_EXTENSIONS`, `JS_REQUIRED_PATTERNS` constants

#### Configuration

- **`supports_vision` now optional**: Auto-detected from litellm when not explicitly set
  - No need to configure it manually for most models (GPT-4o, Gemini, Claude, etc.)
  - Explicit `supports_vision: true/false` overrides auto-detection if needed
### Changed

#### Package Rename

- `markit` → `markitai`: Package renamed for clarity
- The CLI command remains `markitai`

#### Python Version

- Python 3.11+ support: Lowered the minimum Python version from 3.13 to 3.11

#### CLI Behavior

- Single file mode: Direct stdout output (no logging by default)
- `--verbose`: Show logs before output in single file mode
- Batch processing behavior unchanged

#### Code Quality

- Refactored PowerShell COM conversion scripts (~18% code reduction)
- Unified MIME type mapping across the codebase
- Extracted common fixtures to `conftest.py`
- Improved error messages for network failures (SSL/connection/proxy)
- Architecture diagram updated in `docs/spec.md`

### Fixed

- URL filename cross-platform compatibility
- Cache invalidation for large documents (tail changes are now detected)
- Image analysis race condition with `.llm.md` file writing
## 0.2.4 - 2026-01-21

### Changed

- Restructured the `assets.json` format with a flat asset array
- Extract Live display management for early log capture
- Improved MS Office detection with file path fallback

### Fixed

- Add openpyxl FileVersion compatibility patch
- Add pptx XMLSyntaxError compatibility patch
- Enhanced `check_symlink_safety` with nested symlink detection
- LLM empty-response retry logic
- `normalize_frontmatter` for consistent YAML field order
## 0.2.3 - 2026-01-20

### Added

#### Persistent LLM Cache

- SQLite-based cache with LRU eviction and size limits (default 1GB)
- Dual-layer lookup: project cache + global cache
- `CacheConfig` in `MarkitaiConfig` with enabled/no_cache/max_size options
- `--no-cache` CLI flag: Skip reading but still write (Bun semantics)
- `markitai cache stats [--json]`: View cache statistics
- `markitai cache clear [--scope]`: Clear cache by scope

#### Vision Router Optimization

- Smart router selection: auto-detect image content in messages
- `vision_router` property filtering only `supports_vision=true` models
- Replace the hardcoded "vision" model name with "default" + smart routing

#### Legacy Office Conversion

- MS Office COM batch conversion: one app launch per file type
- `check_ms_word/excel/powerpoint_available()` registry-based detection
- Pre-convert legacy files before batch processing to reduce overhead

#### Performance (Phase 3)

- Parallel PDF processing: Concurrent page OCR & rendering
- Parallel image processing: `ProcessPoolExecutor` for CPU-bound compression
- Adaptive worker count based on file size
- LRU eviction and byte-size limits for the image cache
- Batch semaphore for memory pressure control

### Changed

- **OCR optimization**: `recognize_numpy()` and `recognize_pixmap()` for direct array processing
- Reuse the already-rendered pixmap in PDF OCR (avoid re-rendering)

### Fixed

- EMF/WMF format detection and PNG conversion support
- `DATA_URI_PATTERN` regex for hyphenated MIME types (x-emf, x-wmf)
- Base64 stripping: remove hallucinated images instead of replacing them
- Batch timing: record `start_at` before pre-conversion for accurate duration
- Pyright venv detection: add venvPath/venv to pyproject.toml
## 0.2.2 - 2026-01-20

### Added

- `constants.py` module to consolidate hardcoded values
- Unit tests for image and llm modules
- `convert_to_markdown.py` reference script
### Changed
- Centralized constants usage across config.py, llm.py, batch.py, image.py
- Improved LLM content restoration with garbage detection logic
- Enable parallel batch processing for image analysis
- Move state saving outside semaphore to reduce blocking
### Fixed
- Rich Panel markup parsing issue (escape file paths)
## 0.2.1 - 2026-01-20

### Added

#### LLM Usage Tracking

- Context-based usage tracking (per-file instead of global)
- `get_context_cost()` and `get_context_usage()` for per-file stats
- Thread-safe lock for concurrent access to usage dictionaries
#### Type System

- `types.py` with TypedDict definitions (ModelUsageStats, LLMUsageByModel, AssetDescription)
- `ImageAnalysis.llm_usage` for multi-model tracking (renamed from `model`)
#### Model Configuration

- `get_model_max_output_tokens()` using `litellm.get_model_info()`
- Auto-inject max_tokens with fallback to conservative default (8192)
#### Office Detection

- `utils/office.py` module with cross-platform detection
- `has_ms_office()`: Windows COM-based MS Office detection
- `find_libreoffice()`: PATH + common paths search with `@lru_cache`
#### Image Processing

- `strip_base64_images()` method
- `remove_nonexistent_images()` to clean LLM-hallucinated references
- Normalize whitespace for standalone image `.llm.md` output
### Changed

- File conflict rename strategy: `.2.md` → `.v2.md` for natural sort order
- Batch state: add `screenshots` field (separate from embedded images)
- Batch state: add `log_file` field for run traceability
- Store file paths as relative to `input_dir` in batch state
## 0.2.0 - 2026-01-19

### Added

- Monorepo architecture with uv workspace (`packages/markitai/`)
- LiteLLM integration for unified LLM provider access
- New converter modules: `pdf`, `office`, `image`, `text`, `legacy`
- Workflow system for single-file processing (`workflow/single.py`)
- Markdown-based prompt management system (`prompts/*.md`)
- Unified config with JSON schema validation (`config.schema.json`)
- Security module for path validation (`security.py`)
- Comprehensive test suite with fixtures
### Changed
- CLI rewritten with Click (replaced Typer)
- Requires Python 3.13+
### Removed

- Old `src/markitai/` structure and all legacy code
- Complex pipeline/router/state machine architecture
- Individual LLM provider implementations (OpenAI, Anthropic, etc.)
- Docker and CI scripts (to be re-added later)
### Breaking Changes
- Configuration format changed (see migration guide)
- CLI command syntax updated
- Python 3.12 and below no longer supported
## 0.1.6 - 2026-01-14

### Fixed
- Model routing strategy bugs
- Documentation accuracy improvements
## 0.1.5 - 2026-01-13

### Changed
- Refactored prompt management system for better maintainability
- Simplified cleaner module logic
## 0.1.4 - 2026-01-13

### Fixed
- JSON parsing edge cases in LLM responses
- Log formatting improvements for readability
## 0.1.3 - 2026-01-12

### Added
- Test coverage improved to 81%
### Changed

- Adopted `src` layout for project structure
- Reorganized documentation to `docs/reference/`
- Added GitHub Actions CI workflow
### Fixed
- Provider-specific bugs in fallback handling
## 0.1.2 - 2026-01-12

### Added

- Resilience features for network failures (retry logic, timeout handling)
- `CLAUDE.md` and `AGENTS.md` documentation for AI assistants
### Changed
- Log optimization for cleaner, more informative output
## 0.1.1 - 2026-01-11

### Changed
- Major architecture refactoring with service layer pattern
- Enhanced LLM support with better error handling and retries
## 0.1.0 - 2026-01-10

### Added

#### Capability-Based Model Routing

- `required_capability` and `prefer_capability` parameters for LLM calls
- Text tasks prioritize text-only models for cost efficiency
- Vision tasks automatically use vision-capable models
- Backward compatible: parameters default to None (round-robin behavior)
#### Lazy Model Initialization

- Providers loaded on-demand instead of all at startup
- Significantly reduced initialization time for single-file conversions
- `warmup()` method for batch mode to validate providers upfront
- `required_capabilities` parameter in `initialize()`
#### Concurrent Fallback Mechanism

- Primary model timeout triggers parallel backup model execution
- Neither model is interrupted; the first response wins
- Configurable via `llm.concurrent_fallback_timeout` (default: 180s)
- Handles Gemini 504 timeout scenarios gracefully
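The first-response-wins behavior can be sketched with asyncio. This is a simplified model of the mechanism, not markitai's actual code; in practice `primary` and `backup` would be LLM request coroutines:

```python
import asyncio


async def with_concurrent_fallback(primary, backup, timeout: float = 180.0):
    """Start `backup` only if `primary` exceeds `timeout`; first result wins.

    The primary task is shielded so the timeout does not cancel it:
    both models keep running, and whichever answers first is returned.
    """
    primary_task = asyncio.ensure_future(primary())
    try:
        return await asyncio.wait_for(asyncio.shield(primary_task), timeout)
    except asyncio.TimeoutError:
        backup_task = asyncio.ensure_future(backup())
        done, _pending = await asyncio.wait(
            {primary_task, backup_task}, return_when=asyncio.FIRST_COMPLETED
        )
        return done.pop().result()
```

`asyncio.shield` is what keeps the slow primary alive past the timeout; without it, `wait_for` would cancel the in-flight request instead of racing it against the backup.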
#### Execution Mode Support

- `--fast` flag for speed-optimized batch processing
- Fast mode: skips validation, limits fallback attempts, reduces logging
- Default mode: full validation, detailed logging, comprehensive retries
- Configurable via `execution.mode` in config file
#### Enhanced Statistics

- `BatchStats` class for comprehensive processing metrics
- Per-model tracking: calls, tokens, duration, estimated cost
- `ModelCostConfig` for optional cost estimation
- Summary format: "Complete: X success, Y failed | Total: Xs | Tokens: N"
### Changed
- CLI architecture refactored for better modularity
- Config format migrated from JSON to YAML
## 0.0.1 - 2026-01-08

### Added

- Initial release
- CLI commands: `convert`, `batch`, `config`, `provider`
- Multi-format support: Word (.doc, .docx), PowerPoint (.ppt, .pptx), Excel (.xls, .xlsx), PDF, HTML
- LLM enhancement: markdown formatting, frontmatter generation, image alt text
- 5 LLM providers with fallback: OpenAI, Anthropic, Gemini, Ollama, OpenRouter
- 3 PDF engines: pymupdf4llm (default), pymupdf, pdfplumber
- Image processing: extraction, compression (oxipng/mozjpeg), LLM analysis
- Batch processing with resume capability and concurrency control
- Unit and integration tests
- Docker multi-stage build
- Chinese and English documentation