MarkItDown
Open-source pick
Content, Docs & Media
Convert PDFs, Office docs, images, HTML, audio, and more to clean Markdown using Microsoft's MarkItDown.
MarkItDown package details
Open-source curation scope
This listing is a convenience wrapper for discovery and attribution: it curates public setup links, install commands, and agent instructions. It does not bundle or relicense the upstream project unless explicitly stated. You can also go directly to the public upstream source linked on this page.
How to get started
Install RAPR AI
Download and install RAPR AI on your computer
Find in Marketplace
Open RAPR AI, go to Packages, and browse the marketplace
Install from Marketplace
Click Install. RAPR sets up the wrapper package, connector guidance, or skill instructions for this listing.
MarkItDown
Convert PDFs, Office files, images, audio, and other common formats to clean, usable Markdown with MarkItDown, Microsoft's open-source Python library for document-to-Markdown conversion.
What you can do
- Convert PDFs: Extract text content from PDF files into structured Markdown.
- Office documents: Convert DOCX, XLSX, and PPTX files while preserving headings, tables, and other structure.
- Images with OCR: Extract text from scanned documents and photos via an LLM vision plugin.
- Audio transcription: Convert MP3, WAV, and M4A recordings to Markdown transcripts.
- Batch processing: Convert entire folders of mixed file types in a single pipeline run.
- RAG pipelines: Use the Markdown output as input to chunking, embedding, and vector-store indexing for LLM retrieval.
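Batch processing can be scripted as a loop around a single converter object. This is a minimal sketch, assuming MarkItDown's documented `convert()` method, which returns a result whose `text_content` attribute holds the Markdown. `convert_folder` and its default extension list are hypothetical names chosen for illustration. The converter is passed in rather than created inside the function, so the same loop works with any object that exposes `convert()`.

```python
from pathlib import Path


def convert_folder(src_dir, out_dir, converter,
                   extensions=(".pdf", ".docx", ".pptx", ".xlsx")):
    """Convert every matching file under src_dir into a .md file under out_dir.

    `converter` is assumed to behave like markitdown.MarkItDown: its
    convert(path) returns an object with a `text_content` string.
    Returns the list of Markdown files written, in sorted source order.
    """
    src = Path(src_dir)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(src.rglob("*")):
        # Skip directories and file types outside the requested set
        # (extension match is case-insensitive).
        if not path.is_file() or path.suffix.lower() not in extensions:
            continue
        result = converter.convert(str(path))
        # Mirror the source tree so same-named files in different folders
        # do not overwrite each other.
        target = out / path.relative_to(src).with_suffix(".md")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(result.text_content, encoding="utf-8")
        written.append(target)
    return written
```

With the library installed, a run might be `convert_folder("docs", "docs_md", MarkItDown())` after `from markitdown import MarkItDown`. Two files that differ only by extension in the same folder (for example `a.pdf` and `a.docx`) would still map to the same `a.md`, so rename or adjust the target naming if that case matters.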
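Prerequisites
MarkItDown requires Python 3.10 or higher.
pip install markitdown
# the base install omits most format-specific dependencies; add extras per format, e.g.:
pip install "markitdown[pdf, docx, pptx]"
# or with all extras:
pip install "markitdown[all]"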
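After installing, converting a single file takes a few lines of Python. This is a minimal sketch, assuming the `MarkItDown` class and its `convert()` method as described in the upstream README. The `file_to_markdown` wrapper is a hypothetical helper; its optional `converter` argument exists only so the function can be exercised without the library installed.

```python
def file_to_markdown(path, converter=None):
    """Return the Markdown text produced for one file (hypothetical helper)."""
    if converter is None:
        # Deferred import: needs markitdown plus the extras for this file type.
        # enable_plugins=False keeps third-party plugins off (also the default).
        from markitdown import MarkItDown
        converter = MarkItDown(enable_plugins=False)
    # convert() returns a result object; the Markdown is in text_content.
    return converter.convert(path).text_content
```

For example, `print(file_to_markdown("report.pdf"))` prints the Markdown for that PDF. The upstream project also ships a command-line entry point, `markitdown path-to-file.pdf -o document.md`, for one-off conversions.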
Example requests
- "Convert all PDFs in the docs/ folder to Markdown for RAG indexing."
- "Extract the text from this Excel spreadsheet as a Markdown table."
- "Transcribe this MP3 meeting recording to Markdown."
- "Convert this PPTX presentation to Markdown so I can summarize it."
- "Pull the text from this scanned PDF image using the LLM plugin."
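The first request ends in RAG indexing, and MarkItDown's share of that job is producing the Markdown. Splitting the Markdown into retrieval chunks is a separate step that the library does not perform. One common approach, sketched here as an assumption rather than part of MarkItDown, is to cut at top-level headings so each chunk keeps its section title. `split_by_headings` and its `max_level` parameter are hypothetical names for illustration.

```python
import re

# ATX headings such as "# Title" or "## Section" (simplified: no closing-#
# stripping, no setext "===" headings).
_HEADING = re.compile(r"^(#{1,6})\s+(.*)$")
# Opening or closing code fence; any fence line toggles the in-fence state.
_FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})")


def split_by_headings(markdown, max_level=2):
    """Split Markdown into (heading, body) chunks at headings of level <= max_level.

    Text before the first qualifying heading gets the heading "".
    Deeper headings stay inside their parent chunk's body, and lines inside
    fenced code blocks are never treated as headings.
    """
    chunks = []
    heading, lines = "", []
    in_fence = False

    def flush():
        # Skip an empty preamble, but keep headed chunks even if the body is empty.
        if heading or any(text.strip() for text in lines):
            chunks.append((heading, "\n".join(lines).strip()))

    for line in markdown.splitlines():
        if _FENCE.match(line):
            in_fence = not in_fence
        match = None if in_fence else _HEADING.match(line)
        if match and len(match.group(1)) <= max_level:
            flush()
            heading, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    flush()
    return chunks
```

Each `(heading, body)` pair can then be embedded and stored with the heading as metadata. Very long sections may still need a further size-based split before embedding.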