ParseJet Documentation
ParseJet extracts text from any file or URL. One API call handles PDF, DOCX, YouTube, web pages, images, audio, video, and 25+ more formats.
Quick Start
Get your first parse result in under 60 seconds. No signup required.
Try it instantly
Paste any URL into ParseJet — no API key needed for your first 3 requests per day.
curl -X POST https://api.parsejet.com/v1/parse/auto/url \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com"}'
Get your API key
Sign in with Google or GitHub to get a free API key. The free tier includes 300 credits per month.
# Add your API key to requests
curl -X POST https://api.parsejet.com/v1/parse/auto/url \
-H "Authorization: Bearer pj_YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com"}'
Use the result
Every response returns the same JSON structure regardless of input format:
{
  "text": "Extracted text content...",
  "title": "Document Title",
  "source_type": "webpage",
  "metadata": { "url": "https://example.com" }
}
Authentication
ParseJet offers three levels of access. You can start using the API immediately without any authentication.
| Level | How to access | Rate limit | Best for |
|---|---|---|---|
| Anonymous | No headers | 3/day, 2MB | Quick testing |
| Session | Sign in (cookie) | 10/day, 5MB | Dashboard tool |
| API Key | Authorization: Bearer pj_xxx | By plan | Production |
Tip: You don't need an API key to get started. Just send requests directly — the first 3 per day are free with no signup.
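As a rough sketch of how the anonymous and API key levels differ on the wire, the helper below builds (but does not send) a request for /v1/parse/auto/url using only the Python standard library. The function name `build_parse_request` is hypothetical and not part of any ParseJet SDK; the only difference between the two levels is whether an Authorization header is attached.

```python
import json
import urllib.request
from typing import Optional

API_BASE = "https://api.parsejet.com"


def build_parse_request(url_to_parse: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a POST request for /v1/parse/auto/url without sending it."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # API Key level: pass the key as a Bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    # Anonymous level: no Authorization header is added.
    body = json.dumps({"url": url_to_parse}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/v1/parse/auto/url", data=body, headers=headers, method="POST"
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling `send(build_parse_request("https://example.com"))` would use one of the anonymous daily requests; passing `api_key="pj_YOUR_KEY"` switches to plan-based limits.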
Core Concepts
Supported formats
ParseJet auto-detects the format from the file extension or URL pattern. You don't need to specify the format — just send the file or URL to /v1/parse/auto and ParseJet handles the rest.
| Category | Formats | Credits |
|---|---|---|
| Text | TXT, MD, JSON, CSV, XML, HTML | 1 |
| Documents | DOCX, PPTX, XLSX, EPUB | 2 |
| Complex | PDF, web pages, video | 3 |
| YouTube | YouTube video URLs | 5 |
| Other | Audio (MP3, WAV), images (JPG, PNG), RSS, OPML, email, notebooks | 1 |
Credits
Each API request consumes credits based on the complexity of the format being parsed. Simple text files cost 1 credit, while YouTube transcripts cost 5. Your monthly credit allowance depends on your plan.
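To budget a batch before sending it, the credit table can be turned into a small lookup. This is a client-side estimate only: the costs are copied from the table above, but the extension-to-category mapping, the video extensions, the YouTube host list, and the rule that every other URL counts as a web page are assumptions made for illustration, not ParseJet's actual detection logic.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Costs follow the formats table; the extension lists are illustrative guesses.
CREDITS_BY_EXTENSION = {
    **dict.fromkeys(("txt", "md", "json", "csv", "xml", "html"), 1),  # Text
    **dict.fromkeys(("docx", "pptx", "xlsx", "epub"), 2),             # Documents
    **dict.fromkeys(("pdf", "mp4", "mkv", "avi", "mov"), 3),          # Complex
    **dict.fromkeys(("mp3", "wav", "jpg", "png"), 1),                 # Other
}
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "youtu.be"}  # assumed host list


def estimate_credits(source: str) -> int:
    """Estimate the credit cost of parsing a file name or URL."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        if parsed.hostname in YOUTUBE_HOSTS:
            return 5  # YouTube video URLs
        return 3      # assumption: any other URL is billed as a web page
    extension = PurePosixPath(source).suffix.lower().lstrip(".")
    if extension not in CREDITS_BY_EXTENSION:
        raise ValueError(f"no credit estimate for {source!r}")
    return CREDITS_BY_EXTENSION[extension]


batch = ["notes.txt", "slides.pptx", "report.pdf", "https://youtu.be/VIDEO_ID"]
total = sum(estimate_credits(item) for item in batch)
print(total)  # 1 + 2 + 3 + 5 = 11
```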
Output format
By default, ParseJet returns raw extracted text. Add ?output_format=markdown to any request to get post-processed output with detected headings, lists, tables, and code blocks.
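When requests are built in code, appending the query parameter by string concatenation is easy to get wrong if the URL already has a query string. A minimal sketch using the standard library is shown here; `with_output_format` is a hypothetical helper name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_output_format(endpoint_url: str, output_format: str = "markdown") -> str:
    """Return endpoint_url with output_format set, keeping other query parameters."""
    parts = urlsplit(endpoint_url)
    query = dict(parse_qsl(parts.query))
    query["output_format"] = output_format
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_output_format("https://api.parsejet.com/v1/parse/auto/file"))
# https://api.parsejet.com/v1/parse/auto/file?output_format=markdown
```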
Guide
Parse a PDF
Extract text from any PDF file, including scanned documents and multi-page reports.
Upload a PDF file
curl -X POST https://api.parsejet.com/v1/parse/auto/file \
-H "Authorization: Bearer pj_YOUR_KEY" \
-F "file=@report.pdf"
Convert to Markdown
Add output_format=markdown to preserve document structure:
curl -X POST "https://api.parsejet.com/v1/parse/auto/file?output_format=markdown" \
-H "Authorization: Bearer pj_YOUR_KEY" \
-F "file=@report.pdf"
Credit cost: 3 credits per PDF. Supports files up to your plan's file size limit (10MB-200MB).
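For environments without curl or an SDK, the same upload can be sketched with the Python standard library by encoding the multipart body by hand. The helper name `build_upload_request` is hypothetical, and the part's content type is set to a generic `application/octet-stream` on the assumption that the server detects the format from the file name.

```python
import urllib.request
import uuid

UPLOAD_ENDPOINT = "https://api.parsejet.com/v1/parse/auto/file"


def build_upload_request(filename: str, content: bytes, api_key: str,
                         endpoint: str = UPLOAD_ENDPOINT) -> urllib.request.Request:
    """Build a multipart/form-data POST with a single 'file' field (not sent)."""
    boundary = uuid.uuid4().hex
    # Note: filename is inserted as-is; names containing quotes would need escaping.
    body = b"".join([
        f"--{boundary}\r\n".encode("ascii"),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode("utf-8"),
        b"Content-Type: application/octet-stream\r\n\r\n",
        content,
        f"\r\n--{boundary}--\r\n".encode("ascii"),
    ])
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(endpoint, data=body, headers=headers, method="POST")
```

The resulting request can be passed to `urllib.request.urlopen`; reading the file with `open("report.pdf", "rb").read()` supplies the `content` argument.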
Guide
YouTube Transcripts
Get the full transcript of any YouTube video. Supports auto-generated captions in 100+ languages.
Get a transcript
curl -X POST https://api.parsejet.com/v1/parse/youtube \
-H "Content-Type: application/json" \
-d '{"url": "https://youtube.com/watch?v=VIDEO_ID"}'
Specify language
Use the language parameter for non-English videos:
curl -X POST https://api.parsejet.com/v1/parse/youtube \
-H "Content-Type: application/json" \
-d '{"url": "https://youtube.com/watch?v=VIDEO_ID", "language": "ja"}'
Or use auto-detect
The /v1/parse/auto/url endpoint automatically detects YouTube URLs:
curl -X POST https://api.parsejet.com/v1/parse/auto/url \
-H "Content-Type: application/json" \
-d '{"url": "https://youtu.be/VIDEO_ID"}'
Credit cost: 5 credits per YouTube video. Metadata includes video_id, channel, and duration.
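The request bodies used in the YouTube examples can be generated with a small helper that adds `language` only when it is given. The name `youtube_payload` is hypothetical, and the two-lowercase-letter check is a simple client-side approximation of an ISO 639-1 code, not a rule the API is known to enforce.

```python
import json
import re
from typing import Optional


def youtube_payload(video_url: str, language: Optional[str] = None) -> bytes:
    """Encode the JSON body for /v1/parse/youtube or /v1/parse/auto/url."""
    payload = {"url": video_url}
    if language is not None:
        # Approximate ISO 639-1 validation: exactly two lowercase letters.
        if not re.fullmatch(r"[a-z]{2}", language):
            raise ValueError(f"expected a two-letter language code, got {language!r}")
        payload["language"] = language
    return json.dumps(payload).encode("utf-8")
```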
Guide
Web Scraping
Extract the main content from any web page. ParseJet automatically removes navigation, ads, sidebars, and boilerplate.
curl -X POST https://api.parsejet.com/v1/parse/webpage \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com/blog/article"}'
Credit cost: 3 credits per web page. Returns clean text with title and source URL in metadata.
Guide
Office Documents
Parse Word (DOCX), Excel (XLSX), PowerPoint (PPTX), and CSV files. Just upload the file — ParseJet detects the format automatically.
# Works with any Office format
curl -X POST https://api.parsejet.com/v1/parse/auto/file \
-H "Authorization: Bearer pj_YOUR_KEY" \
-F "file=@presentation.pptx"
# Also works with spreadsheets
curl -X POST https://api.parsejet.com/v1/parse/auto/file \
-H "Authorization: Bearer pj_YOUR_KEY" \
-F "file=@data.xlsx"
Credit cost: 2 credits per Office document (DOCX, PPTX, XLSX). CSV is also accepted, but the formats table lists it as a text format at 1 credit.
API Reference
Response Format
All endpoints return the same JSON structure:
{
  "text": "Extracted text content...",
  "title": "Document Title",
  "source_type": "pdf",
  "metadata": { "pages": 12, "author": "Jane Doe" }
}
| Field | Type | Description |
|---|---|---|
| text | string | The extracted text content |
| title | string | Document or page title |
| source_type | string | Format identifier (pdf, webpage, youtube, etc.) |
| metadata | object | Format-specific metadata (page count, author, duration, etc.) |
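Because every endpoint shares this shape, a single typed container is enough on the client side. The sketch below is one way to model it in Python; the `ParseResult` class is illustrative and is not the type returned by the official SDK. Treating `title` and `metadata` as optional is a defensive assumption, since the table does not say whether they can be absent.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ParseResult:
    text: str
    title: str
    source_type: str
    metadata: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_json(cls, raw: str) -> "ParseResult":
        """Decode a response body into a ParseResult."""
        data = json.loads(raw)
        return cls(
            text=data["text"],
            title=data.get("title", ""),            # assumed optional
            source_type=data["source_type"],
            metadata=data.get("metadata") or {},    # assumed optional
        )


sample = (
    '{"text": "Extracted text content...", "title": "Document Title", '
    '"source_type": "pdf", "metadata": {"pages": 12, "author": "Jane Doe"}}'
)
result = ParseResult.from_json(sample)
print(result.source_type, result.metadata.get("pages"))  # prints: pdf 12
```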
/v1/parse/auto
The recommended endpoint. Auto-detects format from file extension or URL type. Accepts file (multipart) or url (form field), not both.
curl -X POST https://api.parsejet.com/v1/parse/auto \
-H "Authorization: Bearer pj_YOUR_KEY" \
-F "file=@document.pdf"
/v1/parse/auto/url
Parse any URL. Automatically distinguishes YouTube from regular web pages.
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | yes | URL to parse |
| language | string | no | ISO 639-1 code for YouTube transcript language |
curl -X POST https://api.parsejet.com/v1/parse/auto/url \
-H "Authorization: Bearer pj_YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com"}'
/v1/parse/auto/file
Parse any uploaded file. The format is detected from the file extension, with content-based detection as a fallback.
curl -X POST https://api.parsejet.com/v1/parse/auto/file \
-H "Authorization: Bearer pj_YOUR_KEY" \
-F "file=@spreadsheet.xlsx"
/v1/parse/webpage
Extract main content from a web page. Removes navigation, ads, and boilerplate.
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | yes | Web page URL |
curl -X POST https://api.parsejet.com/v1/parse/webpage \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com/article"}'
/v1/parse/youtube
Extract transcript from a YouTube video. Metadata includes video_id, channel, and duration.
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | yes | YouTube video URL or video ID |
| language | string | no | ISO 639-1 language code |
curl -X POST https://api.parsejet.com/v1/parse/youtube \
-H "Content-Type: application/json" \
-d '{"url": "https://youtube.com/watch?v=VIDEO_ID", "language": "en"}'
/v1/parse/audio
Parse audio files. Supports MP3, WAV, M4A, OGG, FLAC, WebM. Max 25MB.
| Field | Type | Required | Description |
|---|---|---|---|
| file | file | yes | Audio file |
| language | string | no | ISO 639-1 code |
| with_timestamps | boolean | no | Include word-level timestamps |
curl -X POST https://api.parsejet.com/v1/parse/audio \
-H "Authorization: Bearer pj_YOUR_KEY" \
-F "file=@recording.mp3" \
-F "language=en"
/v1/parse/video
Extract audio from video for transcription. Supports MP4, MKV, AVI, MOV, WebM.
curl -X POST https://api.parsejet.com/v1/parse/video \
-H "Authorization: Bearer pj_YOUR_KEY" \
-F "file=@lecture.mp4" \
-F "language=en"
/v1/parse/epub
Parse an EPUB ebook. The extracted text is organized by chapter.
curl -X POST https://api.parsejet.com/v1/parse/epub \
-H "Authorization: Bearer pj_YOUR_KEY" \
-F "file=@book.epub"
/v1/parse/feed
Parse an RSS or Atom feed. OPML files are handled by a separate endpoint, /v1/parse/opml.
curl -X POST https://api.parsejet.com/v1/parse/feed \
-H "Authorization: Bearer pj_YOUR_KEY" \
-F "file=@feed.xml"
/v1/parse/image
Analyze an image with a vision model. Supports JPG, PNG, GIF, BMP, WebP, TIFF. Max 20MB.
| Field | Type | Required | Description |
|---|---|---|---|
| file | file | yes | Image file |
| prompt | string | no | Custom prompt for image analysis |
| model | string | no | Vision model override |
curl -X POST https://api.parsejet.com/v1/parse/image \
-H "Authorization: Bearer pj_YOUR_KEY" \
-F "file=@photo.jpg" \
-F "prompt=Describe this image"
/v1/parse/image/ocr
Extract text from an image using OCR.
curl -X POST https://api.parsejet.com/v1/parse/image/ocr \
-H "Authorization: Bearer pj_YOUR_KEY" \
-F "file=@screenshot.png"
SDKs
Official SDKs
TypeScript / JavaScript
npm install parsejet
import { readFile } from "node:fs/promises";
import { ParseJet } from "parsejet";
const client = new ParseJet({ apiKey: "pj_YOUR_KEY" });
// Parse a URL
const urlResult = await client.parse.url("https://example.com");
console.log(urlResult.text);
// Parse a file: read it into a Buffer first
const buffer = await readFile("report.pdf");
const fileResult = await client.parse.file(buffer, "report.pdf");
console.log(fileResult.text);
Python
pip install parsejet
from parsejet import ParseJet
client = ParseJet(api_key="pj_YOUR_KEY")
# Parse a URL
result = client.parse.url("https://example.com")
print(result.text)
# Parse a file
with open("report.pdf", "rb") as f:
    result = client.parse.file(f, "report.pdf")
print(result.text)
AI Agents
MCP Server
Use ParseJet as an MCP (Model Context Protocol) server with Claude Code, Cursor, or any MCP-compatible AI agent.
Install
npm install -g @parsejet/mcp-server
Claude Code
Add to your project's .claude/settings.json:
{
  "mcpServers": {
    "parsejet": {
      "command": "parsejet-mcp",
      "env": {
        "PARSEJET_API_KEY": "pj_YOUR_KEY"
      }
    }
  }
}
Available tools
| Tool | Description |
|---|---|
| parse_url | Parse any URL (web page, YouTube, etc.) |
| parse_file | Parse a local file (PDF, DOCX, images, etc.) |
| get_youtube_transcript | Get YouTube video transcript with optional language |
Rate Limits & Pricing
ParseJet uses a credit-based system. Each request consumes credits based on the format complexity.
| Plan | Price | Credits/mo | RPM | Max file |
|---|---|---|---|---|
| Free | $0 | 300 | 5 | 10MB |
| Pro | $19/mo | 3,000 | 30 | 50MB |
| Business | $49/mo | 20,000 | 60 | 100MB |
| Scale | $99/mo | 50,000 | 200 | 200MB |
| Enterprise | Custom | Custom | Custom | Custom |
Responses include the X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset headers. 429 responses also include Retry-After.
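A client can use Retry-After to decide how long to pause after a 429. The sketch below assumes the header carries a number of seconds; if it is missing or not numeric (for example an HTTP date), it falls back to exponential backoff. The function name, the 60-second cap, and the backoff base are illustrative choices, not documented ParseJet behavior.

```python
from typing import Mapping


def retry_delay_seconds(headers: Mapping[str, str], attempt: int, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based) after a 429."""
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            # Assumption: Retry-After is expressed in seconds.
            return min(max(float(retry_after), 0.0), cap)
        except ValueError:
            pass  # not a number (e.g. an HTTP date): fall back to backoff
    # Exponential backoff: 1s, 2s, 4s, ... up to the cap.
    return min(2.0 ** attempt, cap)
```

A retry loop would call `time.sleep(retry_delay_seconds(response_headers, attempt))` between attempts and give up after a fixed number of tries. Note that a plain `dict` lookup is case-sensitive, so header names may need normalizing depending on the HTTP client.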
Error Codes
All errors return JSON with error and message fields.
| Status | Code | Description |
|---|---|---|
| 400 | unsupported_format | File type not supported |
| 401 | invalid_api_key | Missing or invalid API key |
| 413 | file_too_large | File exceeds plan limit |
| 422 | parse_error | File corrupted or unreadable |
| 429 | rate_limit_exceeded | RPM or daily/monthly limit hit |
| 502 | parser_unavailable | Parser backend unreachable |
| 504 | parser_timeout | Parse operation timed out |
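Client code can turn these error bodies into exceptions, as in the minimal sketch below. It assumes the `error` field holds the value from the Code column; the `ParseJetError` class and the choice of which codes count as retryable (429, 502, 504) are illustrative and not taken from the official SDK.

```python
import json

# Assumption: these failures are transient and worth retrying.
RETRYABLE_CODES = {"rate_limit_exceeded", "parser_unavailable", "parser_timeout"}


class ParseJetError(Exception):
    """Error decoded from a non-2xx ParseJet response (illustrative class)."""

    def __init__(self, status: int, code: str, message: str) -> None:
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message

    @property
    def retryable(self) -> bool:
        return self.code in RETRYABLE_CODES


def error_from_response(status: int, body: str) -> ParseJetError:
    """Build a ParseJetError from the HTTP status and the JSON error body."""
    data = json.loads(body)
    return ParseJetError(status, data["error"], data["message"])
```

For example, a 401 with `invalid_api_key` should surface to the caller immediately, while a 504 with `parser_timeout` can be retried after a delay.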