# Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
## 0.11.2 - 2026-03-14

### Fixed

- **Windows Compatibility**: Add Windows `GlobalMemoryStatusEx` RAM detection for proper heavy task semaphore sizing
- **Lazy Directory Creation**: Defer `~/.markitai/` directory creation from import-time to first write — prevents side effects when the tool is only imported or used read-only
  - `SPADomainCache`: mkdir moved from `__init__` to `save()`
  - `SQLiteCache`: mkdir moved from `__init__` to `get_connection()` with a `_dir_ensured` flag to avoid repeated syscalls
- **Default Output/Log Dir**: `DEFAULT_OUTPUT_DIR` and `DEFAULT_LOG_DIR` now default to `None` instead of hardcoded paths — the output directory must be explicitly specified via the CLI `-o` flag or a config file
- **Pyright Warnings**: Eliminate all 27 pyright warnings — suppress `reportUnsupportedDunderAll` for PEP 562 lazy-loading modules, fix `curl_cffi` `ProxySpec` TypedDict type mismatch
- **Schema Sync**: Update `config.schema.json` to match the new nullable types of `OutputConfig.dir` and `LogConfig.dir`
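The RAM-based semaphore sizing above can be sketched with ctypes. The `MEMORYSTATUSEX` layout matches the Win32 `GlobalMemoryStatusEx` API; the limit heuristic (one slot per ~2 GB, capped at 8) and the function names are illustrative, not markitai's actual tuning:

```python
import ctypes
import os
import sys


def total_ram_bytes() -> int:
    """Best-effort total physical RAM; GlobalMemoryStatusEx on Windows."""
    if sys.platform == "win32":
        class MEMORYSTATUSEX(ctypes.Structure):
            _fields_ = [
                ("dwLength", ctypes.c_ulong),
                ("dwMemoryLoad", ctypes.c_ulong),
                ("ullTotalPhys", ctypes.c_ulonglong),
                ("ullAvailPhys", ctypes.c_ulonglong),
                ("ullTotalPageFile", ctypes.c_ulonglong),
                ("ullAvailPageFile", ctypes.c_ulonglong),
                ("ullTotalVirtual", ctypes.c_ulonglong),
                ("ullAvailVirtual", ctypes.c_ulonglong),
                ("ullAvailExtendedVirtual", ctypes.c_ulonglong),
            ]

        stat = MEMORYSTATUSEX()
        stat.dwLength = ctypes.sizeof(MEMORYSTATUSEX)
        ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(stat))
        return stat.ullTotalPhys
    # POSIX fallback: page size * page count
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def heavy_task_limit(ram: int, per_task_gb: int = 2) -> int:
    """One semaphore slot per ~2 GB of RAM (illustrative), clamped to [1, 8]."""
    return max(1, min(8, ram // (per_task_gb * 1024**3)))
```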
## 0.11.1 - 2026-03-14

### Added

- **Interactive Pure Mode**: Add pure mode option to interactive CLI wizard

### Fixed

- **Pure Mode Vision Bypass**: `--pure` now correctly skips screenshot-only and vision enhancement paths, falling through to text-only LLM processing
- **Pure Mode Warning False Positive**: `--pure --screenshot-only` no longer warns about `--screenshot` being ignored
- **URL Content Validation**: Lower the `too_short` threshold from 100 to 30 characters — minimal landing pages were incorrectly rejected after stripping markdown syntax
- **Type Safety**: Fix `merge_llm_usage` parameter type to accept `LLMUsageByModel` (pyright warning)
- **Dead Code**: Remove unused `_format_standalone_image_markdown` alias

### Changed

- **CI**: Upgrade GitHub Actions to Node.js 24 compatible versions
## 0.11.0 - 2026-03-13

### Added

- **Pure Mode (`--pure`)**: Full implementation of transparent LLM pass-through mode — text cleaning only, no frontmatter generation or post-processing
- **Pure Mode Decoupled from LLM**: `--pure` no longer implies `--llm`; `--pure` alone writes raw markdown without frontmatter, `--pure --llm` sends content through LLM cleaning only
- **Image Vision in Pure Mode**: `--llm --pure` with image inputs routes to the Vision analysis path (`process_image_with_vision_pure`)
- **`--keep-base` CLI Option**: Explicitly write `base.md` even in LLM mode (default: skip `base.md` when LLM is enabled)
- **Image-Only Format Handling**: Skip image-only formats (PNG, JPG, etc.) in non-LLM/non-OCR mode with a clear warning
- **LLM Fallback**: Write `.md` as fallback when LLM processing fails
- **Batch Skip Summary**: Group skipped files by reason with example filenames in the batch summary
- **Pure Mode Warning**: Warn when `--pure` silently overrides `--alt`/`--desc`/`--screenshot`
- **Mode-Specific Cleaner Prompt**: `{mode_rules}` template variable in the cleaner prompt — standard mode gets image placeholder rules, pure mode gets YAML frontmatter preservation rules

### Fixed

- **URL Processors**: Respect the `--pure`/`--llm`/`--keep-base` flags for `base.md` output in both single and batch URL processing
- **Pure Mode Frontmatter**: `process_with_llm` uses `clean_document_pure()` instead of `process_document()` in pure mode, preventing LLM-generated frontmatter (description, tags, etc.)
- **Source Frontmatter Reconstruction**: Reconstruct the original YAML frontmatter from defuddle metadata before sending to the LLM in pure mode
- **Vision Prompt Drift**: Add a placeholder REMINDER to vision prompts to reduce LLM drift on `__MARKITAI_IMG_N__` placeholders
- **Stabilization Dedup**: Deduplicate stabilization calls and add a `paged_stabilized` guard
- **Vision JSON Mode**: Fix wrong message index in vision `json_mode` and a race condition in parallel gather
- **Misc Fixes**: Frontmatter regex, env variable quoting, Ctrl+C handling, hardcoded weight, docstring corrections
- **SVG as Image-Only**: Treat SVG as an image-only format in batch mode

### Changed

- **Output Strategy**: LLM mode skips writing the base `.md` by default (use `--keep-base` to override)
- **Test Performance**: Optimize test suite speed (~70s → ~30s)
## 0.10.0 - 2026-03-12

### Added

- **Auto-detect LLM Providers**: When no `markitai.json` config exists, automatically detect available providers from environment variables and authenticated CLI tools (Claude CLI, Copilot CLI, Gemini CLI, ChatGPT OAuth)
- **Shared Provider Detection**: Extract provider detection into the shared `cli/providers_detect.py` module for reuse across interactive and non-interactive modes

### Changed

- **Interactive Mode UX**: Separate OCR and screenshots from LLM features into an independent "Additional options" prompt, since they are local processing capabilities (RapidOCR, Playwright) that don't require an LLM
- **Feature Display**: Unified `build_feature_str()` in `ui.py` separates LLM features from local features with a `|` delimiter (e.g., `LLM alt desc | OCR screenshot`)
- **Interactive Mode Flow**: Show configured models after the user confirms LLM enablement, not before; warn when no provider is detected
- **Dependencies**: Raise minimum constraints to match tested versions (pymupdf4llm >=1.27.2, litellm >=1.82.0, pydantic >=2.12.0, pytest >=9.0.0, ruff >=0.15.0)
- **CLI Flags**: `-v` is now `--verbose` (was `--version`); `-V` is now `--version`

### Fixed

- **Image Alt Text Language**: Strip YAML frontmatter before extracting document context for image analysis, so alt text matches the document's actual language instead of defaulting to English
- **Interactive Provider Display**: Show the actual configured models from the config file instead of the auto-detected provider name
- **URL Processor Feature Display**: Add missing OCR to the URL processor dry-run features list
- **Cold Startup Performance**: Lazy imports in `cli/`, `processors/`, and `workflow/__init__.py` reduce cold startup from ~5s to ~0.3s
### Removed

- **Language Field**: Remove the LLM-generated `language` field from the Frontmatter model — the LLM should only generate `description` and `tags`, not infer extra metadata
## 0.9.2 - 2026-03-11

### Fixed

- **Copilot/Claude Login**: Revert subprocess output interception for copilot/claude-agent login — always use inherited stdio so the CLI sees a real TTY, fixing credential storage failures
- **Login Output Display**: Detect URL and device code on the same line (copilot outputs both together); track externally-printed lines for clean erasure after login
- **Error Message Clarity**: `format_error_message` followed `__context__` (the implicit exception chain) into wrapper exceptions like tenacity `RetryError`, replacing informative provider errors with opaque `<Future at 0x...>` messages in logs; it now only follows `__cause__` (explicit `raise X from Y`)
- **Error Message Consistency**: Use `format_error_message` in CLI catch-all handlers (`file.py`, `workflow/core.py`) to prevent opaque chained exception messages from reaching users

### Added

- `SubprocessInterceptor` URL+code same-line formatting for the copilot device code flow
- `OutputManager.track_external_lines()` for tracking terminal output from inherited-stdio subprocesses
## 0.9.1 - 2026-03-09

### Fixed

- **Provider Auth Preflight**: Add a `can_attempt_login()` guard to skip the login prompt when the provider SDK is missing; fix Rich markup swallowing `[gemini-cli]` via `escape()`; fix "Login failed: Login failed:" duplication
- **Install Scripts Extras Parsing**: Fix greedy regex (`\[.*\]` → `\[[^]]*\]`) that captured TOML outer brackets, corrupting extras names like `gemini-cli}]`
- **Install Scripts Resilience**: Progressive fallback when the full extras install fails (retry without SDK-dependent extras); fix `set -e` silent exit on `uv tool install` failure; fix PowerShell 5.x `Join-Path` 3-arg incompatibility
- **Install Scripts Extras Strategy**: Merge-based finalize (no longer replaces manually tracked extras); generic receipt parsing (future-proof for new extras)

### Added

- `markitai doctor --suggest-extras` as the single source of truth for install scripts to query recommended extras
- `can_attempt_login()` provider guard with `get_auth_resolution_hint()` fallback messages
- i18n key `not_found` for zh-CN and en in both setup scripts
## 0.9.0 - 2026-03-09

### Added

- **Fetch Strategy Priority**: Configurable global and per-domain strategy ordering via `strategy_priority` in `policy` and `domain_profiles`
- **Domain/IP Exemption**: New `local_only_patterns` config field restricts specified domains/IPs to local-only strategies (static, playwright) — supports exact domain, suffix (`.internal.com`), wildcard (`*.internal.com`), IP, and CIDR notation (`10.0.0.0/8`, `fd00::/8`)
- **NO_PROXY Integration**: `inherit_no_proxy` (default: true) automatically merges `NO_PROXY` environment variable patterns into the local-only exemptions
- **Fetch Security Feature**: README documentation for the new information security compliance capabilities
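The exemption matching described above (exact, suffix, wildcard, IP, CIDR) can be sketched with the stdlib `ipaddress` and `fnmatch` modules. This is a minimal illustration of the matching rules, not markitai's actual implementation; the function name is hypothetical:

```python
import fnmatch
import ipaddress


def is_local_only(host: str, patterns: list[str]) -> bool:
    """Return True if host matches any exemption pattern (sketch)."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        addr = None  # host is a domain name, not an IP literal

    for pat in patterns:
        if addr is not None:
            try:
                # Covers both bare IPs and CIDR ranges like 10.0.0.0/8
                if addr in ipaddress.ip_network(pat, strict=False):
                    return True
            except ValueError:
                pass  # pattern is a domain pattern, not IP/CIDR
            continue
        if host == pat:  # exact domain
            return True
        if pat.startswith(".") and host.endswith(pat):  # suffix: .internal.com
            return True
        if fnmatch.fnmatch(host, pat):  # wildcard: *.internal.com
            return True
    return False
```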
### Fixed

- **LLM Language Consistency**: Strengthened 5 prompt templates to prevent language translation when fetching mixed-language content (e.g., English UI + Chinese body) — the LLM now determines the output language from the body text, not the UI elements
## 0.8.1 - 2026-03-06

### Added

- **Defuddle Fetch Strategy**: New `defuddle` strategy (GET `https://defuddle.md/<url>`) as the top-priority URL fetch method — free, no auth, returns clean Markdown with YAML frontmatter (title, author, published, description, word_count, domain)
- **Aggressive Strategy Ordering**: Default ordering changed to `defuddle → jina → static → playwright → cloudflare` (both default and SPA scenarios)
- **CLI `--defuddle` Flag**: Force defuddle-only URL fetching (mutually exclusive with `--playwright`, `--jina`, `--cloudflare`)
- **DefuddleConfig**: Configurable timeout and RPM rate limiting (conservative defaults for undocumented API limits)

### Changed

- **FetchPolicyEngine**: Simplified ordering logic — removed `has_jina_key` branching; defuddle and jina always come first
- **max_strategy_hops**: Default increased from 4 to 5 to accommodate the new strategy
## 0.8.0 - 2026-03-06

### Added

- **Extended Format Support**: 20+ new file formats via markitdown and kreuzberg converters
  - Markitdown-based: HTML/HTM/XHTML, CSV, EPUB, MSG, IPYNB (Jupyter Notebook), Apple Numbers
  - Kreuzberg-based (optional dependency): TSV, XML, ODS, ODT, SVG, RTF, RST, ORG, TEX, EML
  - Kreuzberg is a pure Rust wheel — install with `uv pip install markitai[kreuzberg]`
- **Extended Image Support**: GIF, BMP, TIFF now supported by ImageConverter; BMP/TIFF auto-converted to PNG for LLM vision APIs
- **LLM Vision Format Helpers**: `is_llm_supported_image()` and `get_llm_effective_mime()` in `utils/mime.py` for transparent BMP/TIFF → PNG handling

### Fixed

- **Claude Agent SDK v0.1.46 compatibility**: Removed the deprecated `allow_dangerously_skip_permissions` parameter (`permission_mode="bypassPermissions"` is sufficient)
- **i18n test isolation**: Fixed a global state leak in `test_i18n.py` causing 3 integration tests to fail when run in the full suite
- **Import-time log leakage**: Kreuzberg registration logs changed from `logger.debug` to `logger.trace` to prevent terminal noise before CLI log setup

### Changed

- **Converter registry**: New `FileFormat` enum members for all added formats; kreuzberg registers as a gap-filler (only for formats without native converters)
- **Test fixtures**: Renamed to a consistent `sample.*` naming convention; added fixtures for all new formats; removed the orphaned `sample.mobi`
- **Markitdown lazy init**: `MarkItDown()` in `markitdown_ext.py` is now initialized on first use instead of at import time
## 0.7.0 - 2026-03-05

### Added

- **ChatGPT Provider (`chatgpt/`)**: Subscription-based provider using the ChatGPT OAuth Device Code Flow and Responses API. No extra SDK required — uses LiteLLM's built-in authenticator. Models: `chatgpt/gpt-5.2`, `chatgpt/codex-mini`, etc.
- **Gemini CLI Provider (`gemini-cli/`)**: Uses Google's Gemini CLI OAuth credentials (`~/.gemini/oauth_creds.json`) with automatic token refresh. Optional SDK: `uv add markitai[gemini-cli]`. Models: `gemini-cli/gemini-2.5-pro`, `gemini-cli/gemini-2.5-flash`, etc.
- **Weight=0 Model Disabling**: Setting `weight: 0` in the model config now explicitly disables the model (excluded from routing). Useful for temporarily disabling models without removing config.
- **Interactive Mode Enhancements**: Updated onboarding wizard with ChatGPT and Gemini CLI provider options

### Fixed

- **ZeroDivisionError in Router**: Models with `weight=0` are now filtered out before LiteLLM Router creation, preventing `division by zero` in the `simple-shuffle` routing strategy when all selected models have zero weight
- **Router Weight Selection**: The `_select_model` fallback uses `random.choice()` instead of `random.uniform(0, 0)` when all models have zero weight
### Changed

- **Weight Field Semantics**: The `weight` field description now clarifies that 0 = disabled. The minimum value is enforced at 0 (negative weights are rejected by validation)
## 0.6.1 - 2026-03-05

### Fixed

- **Claude Agent SDK compliance**: Add `allow_dangerously_skip_permissions=True` when using `bypassPermissions`, pass system messages via the SDK's `system_prompt` parameter instead of XML tags, set `additionalProperties: false` in the JSON object schema
- **Auth pre-check gaps**: Detect `GH_TOKEN`/`GITHUB_TOKEN` env vars as valid Copilot authentication; detect `CLAUDE_CODE_USE_BEDROCK`/`VERTEX`/`FOUNDRY` env vars as valid Claude authentication
- **Resolution hints**: Include env var alternatives in authentication error messages

### Changed

- **Docs**: Update the configuration guide and ai-tools-setup with env var auth methods
## 0.6.0 - 2026-03-04

### Added

- **Cloudflare Integration**: Unified cloud backend with two capabilities:
  - **Browser Rendering**: `--cloudflare` flag for cloud-based URL rendering via the CF `/markdown` API, with rate limiting, cache TTL, and advanced params (`user_agent`, `cookies`, `wait_for_selector`, `http_credentials`)
  - **Workers AI toMarkdown**: Cloud-based document conversion for PDF/XLSX/DOCX/PPTX (converter backend)
- **Fetch Policy Engine (`fetch_policy.py`)**: Policy-driven strategy ordering with domain-specific profiles, session persistence, and adaptive targeting
- **Domain Profiles**: Per-domain fetch config (`wait_for_selector`, `wait_for`, `extra_wait_ms`, `prefer_strategy`) in `markitai.json`
- **Playwright Session Persistence**: `session_mode` (isolated/domain_persistent) and `session_ttl_seconds` for reusing browser contexts across requests
- **Static HTTP Abstraction (`fetch_http.py`)**: Pluggable HTTP backend with `httpx` (default) and `curl-cffi` (TLS fingerprint impersonation) via the `MARKITAI_STATIC_HTTP` env var
- **Content Validation Gate**: All fetch strategies now validate content quality before accepting results
- **`api_base` env: syntax**: `"api_base": "env:MY_BASE_URL"` in model config for environment variable expansion
- **CF Markdown for Agents**: Content negotiation via the `Accept: text/markdown` header for Cloudflare-enabled sites
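The `env:` expansion convention can be sketched in a few lines; the prefix handling shown here is an assumption about the convention described above, and the function name is hypothetical:

```python
import os


def resolve_api_base(value: str) -> str:
    """Expand "env:VAR" to the variable's value; pass other strings through."""
    if value.startswith("env:"):
        var = value[len("env:"):]
        try:
            return os.environ[var]
        except KeyError:
            raise ValueError(f"api_base references unset env variable {var!r}")
    return value
```

This keeps secrets and per-machine endpoints out of the committed config file while leaving literal URLs untouched.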
### Changed

- **Vision Router Fallback**: When all vision models are disabled (`weight=0`), fall back to the main router with a warning instead of crashing
- **Playwright UTF-8 Encoding**: Force UTF-8 for HTML-to-Markdown conversion to prevent encoding errors
- **Integration Test Resilience**: Cloudflare integration tests now skip on rate limit (429) instead of failing

### Fixed

- **ZeroDivisionError in Vision Router**: Models with `weight=0` (disabled) are now filtered out before litellm Router creation, preventing `division by zero` in the `simple-shuffle` routing strategy
- **Dead Code Cleanup**: Removed 21 dead functions/classes across 15+ files (backward compat aliases, deprecated functions, unused constants)

### Removed

- `_html_to_text`, `_normalize_bypass_list`, `_get_proxy_bypass`, `get_proxy_for_url`, `_url_to_session_id` from `fetch.py`
- `sanitize_error_message` from `security.py`
- `_deep_update`, `get_config` from `config.py`
- `order_dict_keys_sorted`, `_order_image_entry` from `json_order.py`
- `reset_consoles` from `console.py`
- `get_llm_not_configured_hint` from `hints.py`
- `remove_uncommented_screenshots`, `_UNCOMMENTED_SCREENSHOT_RE` from `llm/content.py`
- `get_pending_urls`, `finish_url_processing` from `batch.py`
- `LLMUsageAccumulator` from `workflow/helpers.py`
- `DEFAULT_LOG_PANEL_MAX_LINES` from `constants.py`
- Multiple backward-compatibility aliases from `cli/processors/`
## 0.5.2 - 2026-02-07

### Fixed

- **SQLite ResourceWarning**: Close SQLite connections explicitly via a `_connect()` context manager, preventing `ResourceWarning: unclosed database` on Python 3.13
- **Windows path handling**: `context_display_name()` now handles `C:/` forward-slash Windows paths (was only handling `C:\`)
- **Windows install hints**: `markitai doctor` shows platform-specific install commands (PowerShell/winget on Windows, curl on Unix)
- **OAuth token expiry**: `markitai doctor` no longer reports "Token expired" when a valid refresh token exists
- **Config get output**: `markitai config get` renders Pydantic models as formatted JSON with syntax highlighting instead of a raw Python repr
- **Copilot ProviderError**: Added the missing `provider` kwarg when raising `ProviderError` for unsupported models
- **Pyright warnings**: Resolved all Pyright warnings (lazy `__all__`, type narrowing, optional imports)
### Changed

- **26 documentation fixes**: Comprehensive audit fixing docstring-to-code mismatches across all modules (llm, providers, converter, utils, config)
## 0.5.1 - 2026-02-07

### Added

- **Playwright auto-scroll**: Auto-scroll pages to trigger lazy-loaded content before extraction (up to 8 steps, inspired by baoyu-skills url-to-markdown)
- **DOM noise cleanup**: Remove navigation, ads, cookie banners, popups, and inline event handlers before content extraction
- **`python -m markitai`**: Add `__main__.py` for `-m` invocation support (fixes Windows execution)
- **Multi-provider detection**: Interactive mode (`-I`) now detects and displays all available LLM providers (DeepSeek, OpenRouter included)
- **Copilot GPT-5 series support**: GPT-5, GPT-5.1, GPT-5.2, GPT-5.1-Codex-Mini/Max, and GPT-5.2-Codex are now fully supported via the Copilot provider
- **22 new unit tests**: Vision fallback strategies, smart_truncate edge cases, content protection roundtrip, cache fingerprint collision resistance, batch thread safety

### Changed

- **Default models modernized**: Updated outdated defaults across init/interactive/doctor (haiku→sonnet, gpt-4o→gpt-5.2, gemini-2.0→2.5, claude-sonnet-4→4.5)
- **Init wizard**: Multi-provider default selection, API keys stored in `.env` instead of plaintext config, next-steps hints after completion
- **LLM code deduplication**: `document.py` now delegates `_protect_image_positions`/`_restore_image_positions` to shared functions in `content.py`
- **Cache fingerprint**: SHA256 over full content + page structure replaces `text[:1000]` prefix-based cache keys, preventing collisions for documents with identical prefixes
- **Batch thread safety**: Double-checked locking with timeout-based lock acquisition (5s) replaces non-blocking `acquire(blocking=force)`
- **LiteLLM model database**: Refreshed with 35 new models including Claude Opus 4.6
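The double-checked locking with a bounded lock wait can be sketched as follows; the module-level flag and function names are illustrative of the pattern, not markitai's actual batch code:

```python
import threading

_lock = threading.Lock()
_initialized = False


def ensure_initialized(setup, timeout: float = 5.0) -> bool:
    """Run setup() exactly once across threads, waiting at most `timeout`.

    First check is lock-free (fast path once initialized); the second
    check under the lock closes the race between the two.
    """
    global _initialized
    if _initialized:                      # check 1: no lock taken
        return True
    if not _lock.acquire(timeout=timeout):
        return False                      # timed out instead of blocking forever
    try:
        if not _initialized:              # check 2: under the lock
            setup()
            _initialized = True
        return True
    finally:
        _lock.release()
```

The timeout replaces a non-blocking `acquire()`, so a contended thread waits briefly for the winner rather than skipping initialization outright.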
### Fixed

- **DOM cleanup JS syntax error**: Selectors with double quotes (e.g., `[role="banner"]`) are now properly escaped via `json.dumps()` instead of f-string interpolation
- **Copilot model blocklist**: Removed the outdated GPT-5 series from `UNSUPPORTED_MODELS` (only o1/o3 reasoning models remain blocked)
- **CLI provider display**: Truncate the provider list with `(+N more)` when more than 3 are detected, preventing line overflow
## 0.5.0 - 2026-02-06

### Added

- **Unified UI system**: New `ui.py` components and `i18n.py` module with Chinese/English support across all CLI commands
- **`markitai init`**: One-stop setup wizard — checks dependencies, detects LLM providers, generates config
- **Interactive mode (`-I`)**: Guided setup with questionary prompts for new users
- **`doctor --fix`**: Auto-install missing components (e.g., Playwright)
- **Cross-platform install hints**: Platform-specific installation commands in doctor output
- **`MARKITAI_LOG_FORMAT`**: Environment variable override for the log format
- **JSON repair**: Fallback parser for malformed LLM JSON responses using `json_repair`
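The repair-fallback idea is: parse strictly first, and only on failure apply lenient fixes. The sketch below uses two hand-rolled repairs (stripping code fences, removing trailing commas) as a stdlib stand-in for the `json_repair` library the changelog refers to; the function name is hypothetical:

```python
import json
import re


def parse_llm_json(raw: str):
    """Strict json.loads first; on failure, try small mechanical repairs."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # Repair 1: strip a surrounding markdown code fence (``` or ```json)
    cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw.strip())
    # Repair 2: drop trailing commas before } or ]
    cleaned = re.sub(r",\s*([}\]])", r"\1", cleaned)
    return json.loads(cleaned)  # re-raises if still malformed
```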
### Changed

#### Performance

- **CLI startup**: Lazy-load processor and command modules (~3x faster `--help`)
- **Dependency checks**: Parallelized doctor and init with `ThreadPoolExecutor`
- **LLM processing**: Pre-compiled regex patterns and batched replacements
- **PDF rendering**: Parallel page rendering for standard and LLM modes
- **URL fetching**: Async-safe cache locking for concurrent requests
- **Executor**: Auto-detect the heavy task limit based on system RAM
- **Image processing**: Offloaded CPU-intensive work to a thread pool
- **Cache stats**: Merged stats and model breakdown into a single SQLite query

#### Refactoring

- **Batch UI**: Replaced the Rich table/LogPanel with a compact unified UI (progress bar with current file, completion summary)
- **Log format**: Default changed to human-readable text (was JSON)
- **LLM cache**: Deduplicated `SQLiteCache`/`PersistentCache` into `llm/cache.py`
- **Single file output**: Layered output with `--verbose` for detailed logs
- **Setup scripts**: Consolidated 10 scripts into 2 unified files (`setup.sh` + `setup.ps1`) with built-in i18n
### Fixed

- **Windows**: LibreOffice detection with fallback to `Program Files` paths (not just PATH)
- **Windows**: FFmpeg/CLI path display — show "installed" instead of long winget package paths
- **Windows**: `config path` alignment with dynamic padding and a continuous `│` column
- **Playwright**: Default `wait_for` changed to `domcontentloaded` (was `networkidle`, which caused hangs)
- **Config**: Schema and function defaults synced with constants
- **Exceptions**: Preserved exception chains (`raise ... from ...`) across the codebase
- **Cache**: Prevented a stale `markitai_processed` timestamp on cache hit
- **CLI**: Version flag reverted to `-v`/`--version`; `--verbose` kept without a short flag

### CI

- Added a Windows LibreOffice install step (`choco`) to the CI matrix
- Changed to `--all-extras` for comprehensive dependency testing
- Publish workflow: split unit/integration tests with `SKIP_LLM_TESTS`
## 0.4.2 - 2026-02-03

### Changed

- **Playwright defaults**: `wait_for` changed to `networkidle`, `extra_wait_ms` to 5000ms for better SPA support
- **Frontmatter validation**: Pydantic validators reject empty description/tags, triggering Instructor auto-retry
- **VitePress**: Upgraded to 2.0.0-alpha.16

### Fixed

- **X/Twitter content**: Pages now wait for full JS rendering before capture
- **Cache directories**: All caches now respect the `cache.global_dir` config instead of hardcoded paths
- **Setup scripts**: Improved piped execution (`curl | sh`), proper Playwright installation paths
- **Config init**: Added a `--yes`/`-y` flag for non-interactive use
## 0.4.1 - 2026-02-02

### Added

- **`markitai doctor`**: New diagnostic command for system health and auth status checking
- **Adaptive timeout**: Local providers auto-adjust timeout based on request complexity
- **Prompt caching**: Claude Agent caches long system prompts (>4KB) for cost reduction

### Changed

- `check-deps` renamed to `doctor` (old name kept as an alias)
- Improved error messages with resolution hints for local providers

### Fixed

- Request timeouts on large documents with Claude Agent / Copilot
- JSON extraction issues with control characters and markdown code blocks
## 0.4.0 - 2026-01-28

### Added

- **Claude Agent SDK**: `claude-agent/sonnet|opus|haiku` via the Claude Code CLI
- **GitHub Copilot SDK**: `copilot/claude-sonnet-4.5|gpt-4o|o1` models
- **URL HTTP caching**: ETag/Last-Modified conditional requests
- **Quiet mode**: `--quiet`/`-q` flag (auto-enabled for single file)
- **Module refactoring**: `cli.py` → `cli/`, `llm.py` → `llm/`, new `providers/`
- **Setup scripts hardening**: Default N for high-impact ops, version pinning
- **Docs**: CONTRIBUTING.md, architecture.md, ai-tools-setup.md, dependabot.yml

### Changed

- Python 3.13; docs reorganized to `docs/archive/`
- agent-browser locked to 0.7.6 (Windows bug in 0.8.x)
- Default `extra_wait_ms`: 1000 → 3000; Instructor mode: `JSON` → `MD_JSON`

### Fixed

- **Windows**: UTF-8 console, Copilot CLI path discovery, script argument quoting
- **LLM**: Frontmatter regex fallback, `source` field fix, vision/frontmatter error handling
- **Prompts**: Enhanced prompt leakage prevention, placeholder protection rules
- **Content**: Social media cleanup rules (X/Twitter, Facebook, Instagram)
- **Setup**: WSL detection, Python pymanager support, PATH refresh order
## 0.3.2 - 2026-01-27

### Added

- Chinese README (`README_ZH.md`) with language toggle
- Chinese setup scripts: `setup-zh.sh`, `setup-zh.ps1`, `setup-dev-zh.sh`, `setup-dev-zh.ps1`

### Changed

- Improved setup scripts with better error handling and user feedback
- Updated Python version note: 3.11-3.13 (3.14 not yet supported)
- Updated documentation language toggle links
## 0.3.1 - 2026-01-27

### Fixed

#### Prompt Leakage Prevention

- Split all prompts into `*_system.md` (role definition) and `*_user.md` (content template)
- Added `_validate_no_prompt_leakage()` to detect and handle prompt leakage in LLM output
- Updated LLM calls to use the proper `[{"role": "system"}, {"role": "user"}]` message structure

#### LLM Compatibility

- Fixed `max_tokens` exceeding the deepseek limit by using the minimum across all router models
- Fixed a terminal window popup on Windows when running agent-browser verification

#### URL Fetching

- Improved error messages for browser fetch timeout (no longer suggests installing when already attempted)
- Added auto-proxy detection for Jina API and browser fetching
  - Checks environment variables: `HTTPS_PROXY`, `HTTP_PROXY`, `ALL_PROXY`
  - Auto-detects local proxy ports: 7890 (Clash), 10808 (V2Ray), 1080 (SOCKS5), etc.

### Added

#### SPA Domain Learning

- New `SPADomainCache` for automatic detection and caching of JavaScript-heavy sites
- `markitai cache spa-domains` command to view/manage learned domains
- `markitai cache clear --include-spa-domains` option

#### Windows Performance Optimizations

- Thread pool optimization: Windows defaults to 4 workers (vs 8 on Linux/macOS)
- ONNX Runtime global singleton with preheat for the OCR engine
- OpenCV-based image compression (releases the GIL, 20-40% faster)
- Batch subprocess execution for agent-browser commands

### Changed

- Default image quality: 85 → 75
- Default image max_height: 1080 → 99999 (effectively unlimited)
- Default image min_area filter: 2500 → 5000
- Default URL concurrency: 3 → 5
- Default scan_max_depth: 10 → 5
- Extended fallback_patterns with more social media domains
## 0.3.0 - 2026-01-26

### Added

#### URL Conversion Support

- **Direct URL conversion**: `markitai <url>` converts web pages to Markdown
- **URL batch processing**: Support the `.urls` file format (text or JSON), auto-detected from input
- **URL image downloading**: `download_url_images()` with concurrent downloads (5 parallel)
- Automatic relative URL resolution for images
- Cross-platform filename sanitization (handles Windows illegal characters)
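Cross-platform filename sanitization of the kind described above can be sketched with a single regex over the characters Windows rejects, plus its trailing dot/space rule. The exact character set and fallback name are assumptions, not markitai's actual rules:

```python
import re

# Characters Windows forbids in file names, plus ASCII control chars.
_ILLEGAL = r'[<>:"/\\|?*\x00-\x1f]'


def sanitize_filename(name: str, replacement: str = "_") -> str:
    """Replace illegal characters and trim trailing dots/spaces (sketch)."""
    name = re.sub(_ILLEGAL, replacement, name)
    name = name.rstrip(" .")  # Windows also rejects trailing dots/spaces
    return name or "untitled"  # assumed fallback for fully-illegal input
```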
#### Multi-Source URL Fetching (`fetch.py`)

- Three fetch strategies: `--static`/`--agent-browser`/`--jina`
  - `static`: MarkItDown direct HTTP fetch (default, fastest)
  - `browser`: agent-browser headless rendering (for JS-heavy pages)
  - `jina`: Jina Reader API (cloud-based, no local deps)
  - `auto`: Smart fallback (static → browser/jina if JS detected)
- **FetchCache**: SQLite-based URL cache with LRU eviction (100MB default)
- **Screenshot capture**: `--screenshot` for full-page screenshots via browser
- **Multi-source content**: Parallel static + browser fetch with quality validation
- Domain pattern matching for auto-browser fallback (x.com, twitter.com, etc.)
- `FetchResult` with `static_content`, `browser_content`, `screenshot_path`

#### agent-browser Integration

- Headless browser automation via the `agent-browser` CLI
- Configurable wait states: `load`, `domcontentloaded`, `networkidle`
- Extra wait time for SPA rendering (`extra_wait_ms`)
- Session isolation for concurrent fetches
- `verify_agent_browser_ready()` with a cached readiness check
- Screenshot compression with Pillow (JPEG quality + max height)

#### URL LLM Enhancement

- New `prompts/url_enhance.md` for URL-specific content cleaning
- Multi-source LLM processing: combine static + browser + screenshot
- Smart content selection based on validity detection

#### Cache Enhancements

- **`--no-cache-for <pattern>`**: Selective cache bypass with glob patterns
  - Single file: `--no-cache-for file1.pdf`
  - Glob pattern: `--no-cache-for "*.pdf"`
  - Mixed: `--no-cache-for "*.pdf,reports/**"`
- **`markitai cache stats -v`**: Verbose mode with detailed cache entries
- **`--limit N`**: Control the number of entries in verbose output (default: 20)
- **`--scope project|global|all`**: Filter cache statistics by scope
- **`SQLiteCache.list_entries()`**: List cache entries with metadata
- **`SQLiteCache.stats_by_model()`**: Per-model cache statistics
- Improved cache hash: head + tail + length algorithm for better invalidation
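The head + tail + length hashing idea above can be sketched as follows — digesting a window from each end plus the total length means edits near the tail of a large document change the key, without hashing the whole file. The window size and function name are illustrative:

```python
import hashlib


def cache_key(data: bytes, window: int = 64 * 1024) -> str:
    """Digest head + tail + length so tail edits invalidate the entry."""
    h = hashlib.sha256()
    h.update(data[:window])              # head window
    h.update(data[-window:])             # tail window (whole data if small)
    h.update(str(len(data)).encode())    # total length guards middle swaps
    return h.hexdigest()
```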
#### Workflow Core Refactor (`workflow/core.py`)

- **`ConversionContext`**: Unified single-file conversion context
- **`convert_document_core()`**: Main conversion pipeline: `validate_and_detect_format()` → `convert_document()` → `process_embedded_images()` → `write_base_markdown()` → `process_with_vision_llm()`/`process_with_standard_llm()`
- Parallel document + image processing with proper dependency handling
- Alt text injection after LLM processing completes (race condition fix)

#### Official Website

- VitePress 2.x documentation site with bilingual support (English/Chinese)
- Custom theme with brand colors matching the logo
- Local search integration
- GitHub Actions auto-deployment to GitHub Pages

#### Project

- MIT License: Added LICENSE file

#### CI/CD

- `.github/workflows/ci.yml`: Automated testing on push/PR
- `.github/workflows/deploy-website.yml`: Website deployment to GitHub Pages

#### Code Architecture

- New `utils/paths.py`: `ensure_dir()`, `ensure_subdir()`, `ensure_assets_dir()`
- New `utils/mime.py`: `get_mime_type()`, `get_extension_from_mime()`
- New `utils/text.py`: `normalize_markdown_whitespace()`, text utilities
- New `utils/executor.py`: `run_in_executor()` with a shared ThreadPoolExecutor
- New `utils/output.py`: Output formatting helpers
- New `json_order.py`: Ordered JSON serialization for reports/state files
- New `urls.py`: `.urls` file parser (JSON and plain text formats)
- `LLMUsageAccumulator` class for centralized cost tracking
- `create_llm_processor()` factory function
- Unified `detect_language()` with a `get_language_name()` helper
- Centralized `IMAGE_EXTENSIONS`, `JS_REQUIRED_PATTERNS` constants

#### Configuration

- **`supports_vision` now optional**: Auto-detected from litellm when not explicitly set
  - No need to configure it manually for most models (GPT-4o, Gemini, Claude, etc.)
  - Explicit `supports_vision: true/false` overrides auto-detection if needed
### Changed

#### Package Rename

- `markit` → `markitai`: Package renamed for clarity
- The CLI command remains `markitai`

#### Python Version

- Python 3.11+ support: Lowered the minimum Python version from 3.13 to 3.11

#### CLI Behavior

- Single file mode: Direct stdout output (no logging by default)
- `--verbose`: Show logs before output in single file mode
- Batch processing behavior unchanged

#### Code Quality

- Refactored PowerShell COM conversion scripts (~18% code reduction)
- Unified MIME type mapping across the codebase
- Extracted common fixtures to `conftest.py`
- Improved error messages for network failures (SSL/connection/proxy)
- Architecture diagram updated in `docs/spec.md`

### Fixed

- URL filename cross-platform compatibility
- Cache invalidation for large documents (tail changes are now detected)
- Image analysis race condition with `.llm.md` file writing
## 0.2.4 - 2026-01-21

### Changed

- Restructured the `assets.json` format with a flat asset array
- Extract Live display management for early log capture
- Improved MS Office detection with file path fallback

### Fixed

- Add openpyxl FileVersion compatibility patch
- Add pptx XMLSyntaxError compatibility patch
- Enhanced `check_symlink_safety` with nested symlink detection
- LLM empty-response retry logic
- `normalize_frontmatter` for consistent YAML field order
## 0.2.3 - 2026-01-20

### Added

#### Persistent LLM Cache

- SQLite-based cache with LRU eviction and size limits (default 1GB)
- Dual-layer lookup: project cache + global cache
- `CacheConfig` in `MarkitaiConfig` with enabled/no_cache/max_size options
- `--no-cache` CLI flag: Skip reading but still write (Bun semantics)
- `markitai cache stats [--json]`: View cache statistics
- `markitai cache clear [--scope]`: Clear cache by scope

#### Vision Router Optimization

- Smart router selection: auto-detect image content in messages
- `vision_router` property filtering only `supports_vision=true` models
- Replace the hardcoded "vision" model name with "default" + smart routing

#### Legacy Office Conversion

- MS Office COM batch conversion: one app launch per file type
- `check_ms_word/excel/powerpoint_available()` registry-based detection
- Pre-convert legacy files before batch processing to reduce overhead

#### Performance (Phase 3)

- Parallel PDF processing: Concurrent page OCR & rendering
- Parallel image processing: `ProcessPoolExecutor` for CPU-bound compression
- Adaptive worker count based on file size
- LRU eviction and byte-size limits for the image cache
- Batch semaphore for memory pressure control

### Changed

- **OCR optimization**: `recognize_numpy()` and `recognize_pixmap()` for direct array processing
- Reuse the already-rendered pixmap in PDF OCR (avoid re-rendering)

### Fixed

- EMF/WMF format detection and PNG conversion support
- `DATA_URI_PATTERN` regex for hyphenated MIME types (x-emf, x-wmf)
- Base64 stripping: remove hallucinated images instead of replacing them
- Batch timing: record `start_at` before pre-conversion for accurate duration
- Pyright venv detection: add venvPath/venv to pyproject.toml
## 0.2.2 - 2026-01-20

### Added

- `constants.py` module to consolidate hardcoded values
- Unit tests for image and llm modules
- `convert_to_markdown.py` reference script
### Changed
- Centralized constants usage across config.py, llm.py, batch.py, image.py
- Improved LLM content restoration with garbage detection logic
- Enable parallel batch processing for image analysis
- Move state saving outside semaphore to reduce blocking
### Fixed
- Rich Panel markup parsing issue (escape file paths)
## 0.2.1 - 2026-01-20

### Added

#### LLM Usage Tracking

- Context-based usage tracking (per-file instead of global)
- `get_context_cost()` and `get_context_usage()` for per-file stats
- Thread-safe lock for concurrent access to usage dictionaries
#### Type System

- `types.py` with TypedDict definitions (ModelUsageStats, LLMUsageByModel, AssetDescription)
- `ImageAnalysis.llm_usage` for multi-model tracking (renamed from `model`)
#### Model Configuration

- `get_model_max_output_tokens()` using `litellm.get_model_info()`
- Auto-inject max_tokens with fallback to conservative default (8192)
#### Office Detection

- `utils/office.py` module with cross-platform detection
- `has_ms_office()`: Windows COM-based MS Office detection
- `find_libreoffice()`: PATH + common paths search with `@lru_cache`
#### Image Processing

- `strip_base64_images()` method
- `remove_nonexistent_images()` to clean LLM-hallucinated references
- Normalize whitespace for standalone image `.llm.md` output
### Changed

- File conflict rename strategy: `.2.md` → `.v2.md` for natural sort order
- Batch state: add `screenshots` field (separate from embedded images)
- Batch state: add `log_file` field for run traceability
- Store file paths as relative to `input_dir` in batch state
## 0.2.0 - 2026-01-19

### Added

- Monorepo architecture with uv workspace (`packages/markitai/`)
- LiteLLM integration for unified LLM provider access
- New converter modules: `pdf`, `office`, `image`, `text`, `legacy`
- Workflow system for single-file processing (`workflow/single.py`)
- Markdown-based prompt management system (`prompts/*.md`)
- Unified config with JSON schema validation (`config.schema.json`)
- Security module for path validation (`security.py`)
- Comprehensive test suite with fixtures
### Changed
- CLI rewritten with Click (replaced Typer)
- Requires Python 3.13+
### Removed

- Old `src/markitai/` structure and all legacy code
- Complex pipeline/router/state machine architecture
- Individual LLM provider implementations (OpenAI, Anthropic, etc.)
- Docker and CI scripts (to be re-added later)
### Breaking Changes
- Configuration format changed (see migration guide)
- CLI command syntax updated
- Python 3.12 and below no longer supported
## 0.1.6 - 2026-01-14

### Fixed
- Model routing strategy bugs
- Documentation accuracy improvements
## 0.1.5 - 2026-01-13

### Changed
- Refactored prompt management system for better maintainability
- Simplified cleaner module logic
## 0.1.4 - 2026-01-13

### Fixed
- JSON parsing edge cases in LLM responses
- Log formatting improvements for readability
## 0.1.3 - 2026-01-12

### Added
- Test coverage improved to 81%
### Changed

- Adopted `src` layout for project structure
- Reorganized documentation to `docs/reference/`
- Added GitHub Actions CI workflow
### Fixed
- Provider-specific bugs in fallback handling
## 0.1.2 - 2026-01-12

### Added

- Resilience features for network failures (retry logic, timeout handling)
- `CLAUDE.md` and `AGENTS.md` documentation for AI assistants
### Changed
- Log optimization for cleaner, more informative output
## 0.1.1 - 2026-01-11

### Changed
- Major architecture refactoring with service layer pattern
- Enhanced LLM support with better error handling and retries
## 0.1.0 - 2026-01-10

### Added

#### Capability-Based Model Routing

- `required_capability` and `prefer_capability` parameters for LLM calls
- Text tasks prioritize text-only models for cost efficiency
- Vision tasks automatically use vision-capable models
- Backward compatible: parameters default to None (round-robin behavior)
#### Lazy Model Initialization

- Providers loaded on-demand instead of all at startup
- Significantly reduced initialization time for single-file conversions
- `warmup()` method for batch mode to validate providers upfront
- `required_capabilities` parameter in `initialize()`
#### Concurrent Fallback Mechanism

- Primary model timeout triggers parallel backup model execution
- Neither model is interrupted; the first response wins
- Configurable via `llm.concurrent_fallback_timeout` (default: 180s)
- Handles Gemini 504 timeout scenarios gracefully
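The first-response-wins behavior can be sketched with asyncio. This is a simplified model of the mechanism, not markitai's actual code; in practice `primary` and `backup` would be LLM request coroutines:

```python
import asyncio


async def with_concurrent_fallback(primary, backup, timeout: float = 180.0):
    """Start `backup` only if `primary` exceeds `timeout`; first result wins.

    The primary task is shielded so the timeout does not cancel it:
    both models keep running, and whichever answers first is returned.
    """
    primary_task = asyncio.ensure_future(primary())
    try:
        return await asyncio.wait_for(asyncio.shield(primary_task), timeout)
    except asyncio.TimeoutError:
        backup_task = asyncio.ensure_future(backup())
        done, _pending = await asyncio.wait(
            {primary_task, backup_task}, return_when=asyncio.FIRST_COMPLETED
        )
        return done.pop().result()
```

`asyncio.shield` is what keeps the slow primary alive past the timeout; without it, `wait_for` would cancel the in-flight request instead of racing it against the backup.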
#### Execution Mode Support

- `--fast` flag for speed-optimized batch processing
- Fast mode: skips validation, limits fallback attempts, reduces logging
- Default mode: full validation, detailed logging, comprehensive retries
- Configurable via `execution.mode` in config file
#### Enhanced Statistics

- `BatchStats` class for comprehensive processing metrics
- Per-model tracking: calls, tokens, duration, estimated cost
- `ModelCostConfig` for optional cost estimation
- Summary format: "Complete: X success, Y failed | Total: Xs | Tokens: N"
### Changed
- CLI architecture refactored for better modularity
- Config format migrated from JSON to YAML
## 0.0.1 - 2026-01-08

### Added

- Initial release
- CLI commands: `convert`, `batch`, `config`, `provider`
- Multi-format support: Word (.doc, .docx), PowerPoint (.ppt, .pptx), Excel (.xls, .xlsx), PDF, HTML
- LLM enhancement: markdown formatting, frontmatter generation, image alt text
- 5 LLM providers with fallback: OpenAI, Anthropic, Gemini, Ollama, OpenRouter
- 3 PDF engines: pymupdf4llm (default), pymupdf, pdfplumber
- Image processing: extraction, compression (oxipng/mozjpeg), LLM analysis
- Batch processing with resume capability and concurrency control
- Unit and integration tests
- Docker multi-stage build
- Chinese and English documentation