ParseJet Documentation

Name: ParseJet
Author: ParseJet

ParseJet extracts text from any file or URL. One API call handles PDF, DOCX, YouTube, web pages, images, audio, video, and 25+ more formats.

Quick Start

Get your first parse result in under 60 seconds. No signup required.

Try it instantly

Paste any URL into ParseJet — no API key needed for your first 3 requests per day.

curl -X POST https://api.parsejet.com/v1/parse/auto/url \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'

Get your API key

# Add your API key to requests
curl -X POST https://api.parsejet.com/v1/parse/auto/url \
  -H "Authorization: Bearer pj_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'

Use the result

Every response returns the same JSON structure regardless of input format:

{
  "text": "Extracted text content...",
  "title": "Document Title",
  "source_type": "webpage",
  "metadata": { "url": "https://example.com" }
}

Authentication

ParseJet offers three levels of access. You can start using the API immediately without any authentication.

Level	How to access	Rate limit	Best for
Anonymous	No headers	3/day, 2MB	Quick testing
Session	Sign in (cookie)	10/day, 5MB	Dashboard tool
API Key	`Authorization: Bearer pj_xxx`	By plan	Production

Tip: You don't need an API key to get started. Just send requests directly — the first 3 per day are free with no signup.
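
The access levels above differ only in what the request carries. As a minimal sketch using only Python's standard library, the helper below (the name `build_parse_request` is hypothetical and not part of any ParseJet SDK) adds the Authorization header only when a key is supplied, so the same code covers anonymous and API-key access:

```python
import json
import urllib.request
from typing import Optional

API_BASE = "https://api.parsejet.com"  # base URL taken from the curl examples


def build_parse_request(url: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a POST to /v1/parse/auto/url (hypothetical helper, not an SDK call)."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # API Key level: send a Bearer token. Without it the request is anonymous.
        headers["Authorization"] = f"Bearer {api_key}"
    body = json.dumps({"url": url}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/v1/parse/auto/url", data=body, headers=headers, method="POST"
    )
```

Passing the returned object to `urllib.request.urlopen` sends the request. Nothing is sent by the helper itself, which keeps it easy to inspect or test.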

Core Concepts

Supported formats

ParseJet auto-detects the format from the file extension or URL pattern. You don't need to specify the format — just send the file or URL to /v1/parse/auto and ParseJet handles the rest.

Category	Formats	Credits
Text	TXT, MD, JSON, CSV, XML, HTML	1
Documents	DOCX, PPTX, XLSX, EPUB	2
Complex	PDF, web pages, video	3
YouTube	YouTube video URLs	5
Other	Audio (MP3, WAV), images (JPG, PNG), RSS, OPML, email, notebooks	1

Credits

Each API request consumes credits based on the complexity of the format being parsed. Simple text files cost 1 credit, while YouTube transcripts cost 5. Your monthly credit allowance depends on your plan.
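
To budget a batch before sending it, the per-format costs can be summed locally. The sketch below copies the costs from the Supported formats table; mapping format names to lowercase file extensions is an assumption, and `estimate_credits` is an illustrative name, not a ParseJet function:

```python
from pathlib import PurePosixPath

# Credit costs copied from the Supported formats table.
# Treating each format name as a file extension is an assumption.
CREDITS_BY_EXTENSION = {
    **dict.fromkeys(["txt", "md", "json", "csv", "xml", "html"], 1),  # Text
    **dict.fromkeys(["docx", "pptx", "xlsx", "epub"], 2),             # Documents
    "pdf": 3,                                                         # Complex
    **dict.fromkeys(["mp3", "wav", "jpg", "png"], 1),                 # Other
}


def estimate_credits(filenames):
    """Sum the expected credit cost for a batch of files, by extension."""
    total = 0
    for name in filenames:
        ext = PurePosixPath(name).suffix.lower().lstrip(".")
        if ext not in CREDITS_BY_EXTENSION:
            raise ValueError(f"no credit cost listed for .{ext}")
        total += CREDITS_BY_EXTENSION[ext]
    return total
```

For example, one PDF, one DOCX, and one TXT file come to 3 + 2 + 1 = 6 credits.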

Output format

By default, ParseJet returns raw extracted text. Add ?output_format=markdown to any request to get post-processed output with detected headings, lists, tables, and code blocks.
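
When building request URLs in code, it is safer to let a URL library append the query parameter than to concatenate strings. A small sketch with Python's `urllib.parse` (the helper name `with_output_format` is illustrative):

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_output_format(endpoint: str, output_format: str = "markdown") -> str:
    """Return the endpoint URL with output_format set, keeping any existing query."""
    parts = urlsplit(endpoint)
    query = dict(parse_qsl(parts.query))
    query["output_format"] = output_format
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Calling it on `https://api.parsejet.com/v1/parse/auto/file` yields the same URL with `?output_format=markdown` appended.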

Guide

Parse a PDF

Extract text from any PDF file, including scanned documents and multi-page reports.

Upload a PDF file

curl -X POST https://api.parsejet.com/v1/parse/auto/file \
  -H "Authorization: Bearer pj_YOUR_KEY" \
  -F "file=@report.pdf"

Convert to Markdown

Add output_format=markdown to preserve document structure:

curl -X POST "https://api.parsejet.com/v1/parse/auto/file?output_format=markdown" \
  -H "Authorization: Bearer pj_YOUR_KEY" \
  -F "file=@report.pdf"

Credit cost: 3 credits per PDF. Files can be as large as your plan's file size limit (10 MB on Free, up to 200 MB on Scale).
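
The curl `-F "file=@report.pdf"` flag sends a multipart/form-data body. For environments without curl, the sketch below builds an equivalent upload request with Python's standard library. The helper name `build_file_upload` is hypothetical, and the part's `application/octet-stream` content type is an assumption about what the API accepts:

```python
import urllib.request
import uuid


def build_file_upload(
    filename: str,
    content: bytes,
    api_key: str,
    endpoint: str = "https://api.parsejet.com/v1/parse/auto/file",
) -> urllib.request.Request:
    """Build a multipart/form-data POST with one form field named "file"."""
    boundary = uuid.uuid4().hex
    # Note: filenames containing double quotes would need escaping first.
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        content,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(endpoint, data=body, headers=headers, method="POST")
```

Read the PDF with `open("report.pdf", "rb").read()`, pass the bytes as `content`, and send the result with `urllib.request.urlopen`.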

YouTube Transcripts

Get the full transcript of any YouTube video. Supports auto-generated captions in 100+ languages.

Get a transcript

curl -X POST https://api.parsejet.com/v1/parse/youtube \
  -H "Content-Type: application/json" \
  -d '{"url": "https://youtube.com/watch?v=VIDEO_ID"}'

Specify language

Use the language parameter for non-English videos:

curl -X POST https://api.parsejet.com/v1/parse/youtube \
  -H "Content-Type: application/json" \
  -d '{"url": "https://youtube.com/watch?v=VIDEO_ID", "language": "ja"}'

Or use auto-detect

The /v1/parse/auto/url endpoint automatically detects YouTube URLs:

curl -X POST https://api.parsejet.com/v1/parse/auto/url \
  -H "Content-Type: application/json" \
  -d '{"url": "https://youtu.be/VIDEO_ID"}'

Credit cost: 5 credits per YouTube video. Metadata includes video_id, channel, and duration.
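
The JSON bodies in the curl examples above can be generated in code so that `language` is only sent when needed. A minimal sketch, with an illustrative helper name and a two-letter check based on the ISO 639-1 requirement stated in the API Reference:

```python
import json
from typing import Optional


def youtube_payload(video_url: str, language: Optional[str] = None) -> bytes:
    """Encode the request body for /v1/parse/youtube or /v1/parse/auto/url."""
    payload = {"url": video_url}
    if language is not None:
        # ISO 639-1 codes are two letters, e.g. "en" or "ja".
        if len(language) != 2 or not language.isalpha():
            raise ValueError("language should be a two-letter ISO 639-1 code")
        payload["language"] = language.lower()
    return json.dumps(payload).encode("utf-8")
```

Omitting `language` leaves the key out of the body entirely, which matches the first curl example.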

Web Scraping

Extract the main content from any web page. ParseJet automatically removes navigation, ads, sidebars, and boilerplate.

curl -X POST https://api.parsejet.com/v1/parse/webpage \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/blog/article"}'

Credit cost: 3 credits per web page. Returns clean text with title and source URL in metadata.

Office Documents

Parse Word (DOCX), Excel (XLSX), PowerPoint (PPTX), and CSV files. Just upload the file — ParseJet detects the format automatically.

# Works with any Office format
curl -X POST https://api.parsejet.com/v1/parse/auto/file \
  -H "Authorization: Bearer pj_YOUR_KEY" \
  -F "file=@presentation.pptx"

# Also works with spreadsheets
curl -X POST https://api.parsejet.com/v1/parse/auto/file \
  -H "Authorization: Bearer pj_YOUR_KEY" \
  -F "file=@data.xlsx"

Credit cost: 2 credits per DOCX, PPTX, or XLSX document. CSV is listed as a Text format (1 credit) in the Supported formats table.

API Reference

Response Format

All endpoints return the same JSON structure:

{
  "text": "Extracted text content...",
  "title": "Document Title",
  "source_type": "pdf",
  "metadata": { "pages": 12, "author": "Jane Doe" }
}

Field	Type	Description
text	string	The extracted text content
title	string	Document or page title
source_type	string	Format identifier (pdf, webpage, youtube, etc.)
metadata	object	Format-specific metadata (page count, author, duration, etc.)
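
Because the four fields are the same for every format, a single typed container can hold any response. The sketch below uses a Python dataclass; the class name `ParseResult` is illustrative, and treating `title` and `metadata` as possibly absent is a defensive assumption rather than documented behavior:

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ParseResult:
    text: str
    title: str
    source_type: str
    metadata: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_json(cls, raw: str) -> "ParseResult":
        """Build a ParseResult from a raw JSON response body."""
        data = json.loads(raw)
        return cls(
            text=data["text"],
            title=data.get("title", ""),           # assumed optional
            source_type=data["source_type"],
            metadata=data.get("metadata") or {},   # assumed optional
        )
```

Format-specific values such as `pages` or `author` stay in `metadata`, so callers look them up with `result.metadata.get("pages")` instead of relying on attributes that only some formats provide.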

POST /v1/parse/auto

The recommended endpoint. Auto-detects format from file extension or URL type. Accepts file (multipart) or url (form field), not both.

curl -X POST https://api.parsejet.com/v1/parse/auto \
  -H "Authorization: Bearer pj_YOUR_KEY" \
  -F "file=@document.pdf"
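
Since the endpoint accepts a file or a url but not both, it can help to enforce that rule on the client before any request is made. A small sketch (the function name and return shape are illustrative, not part of an SDK):

```python
from typing import Optional


def choose_auto_input(file_path: Optional[str] = None, url: Optional[str] = None) -> dict:
    """Return the single form field to send to /v1/parse/auto."""
    # Exactly one of the two arguments must be provided.
    if (file_path is None) == (url is None):
        raise ValueError("pass exactly one of file_path or url")
    if file_path is not None:
        return {"field": "file", "value": file_path}
    return {"field": "url", "value": url}
```

Failing early like this avoids spending a request on input the API would reject.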

POST /v1/parse/auto/url

Parse any URL. Automatically distinguishes YouTube from regular web pages.

Parameter	Type	Required	Description
url	string	yes	URL to parse
language	string	no	ISO 639-1 code for YouTube transcript language

curl -X POST https://api.parsejet.com/v1/parse/auto/url \
  -H "Authorization: Bearer pj_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'

POST /v1/parse/auto/file

Parse any uploaded file. Detects the format from the file extension and falls back to content-based detection.

curl -X POST https://api.parsejet.com/v1/parse/auto/file \
  -H "Authorization: Bearer pj_YOUR_KEY" \
  -F "file=@spreadsheet.xlsx"

POST /v1/parse/webpage

Extract main content from a web page. Removes navigation, ads, and boilerplate.

Parameter	Type	Required	Description
url	string	yes	Web page URL

curl -X POST https://api.parsejet.com/v1/parse/webpage \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/article"}'

POST /v1/parse/youtube

Extract transcript from a YouTube video. Metadata includes video_id, channel, and duration.

Parameter	Type	Required	Description
url	string	yes	YouTube video URL or video ID
language	string	no	ISO 639-1 language code

curl -X POST https://api.parsejet.com/v1/parse/youtube \
  -H "Content-Type: application/json" \
  -d '{"url": "https://youtube.com/watch?v=VIDEO_ID", "language": "en"}'

POST /v1/parse/audio

Parse audio files. Supports MP3, WAV, M4A, OGG, FLAC, WebM. Max 25MB.

Field	Type	Required	Description
file	file	yes	Audio file
language	string	no	ISO 639-1 code
with_timestamps	boolean	no	Include word-level timestamps

curl -X POST https://api.parsejet.com/v1/parse/audio \
  -H "Authorization: Bearer pj_YOUR_KEY" \
  -F "file=@recording.mp3" -F "language=en"

POST /v1/parse/video

Extract the audio track from a video file and transcribe it. Supports MP4, MKV, AVI, MOV, WebM.

curl -X POST https://api.parsejet.com/v1/parse/video \
  -H "Authorization: Bearer pj_YOUR_KEY" \
  -F "file=@lecture.mp4" -F "language=en"

POST /v1/parse/epub

Parse an EPUB ebook. Extracted text is organized by chapter.

curl -X POST https://api.parsejet.com/v1/parse/epub \
  -H "Authorization: Bearer pj_YOUR_KEY" \
  -F "file=@book.epub"

POST /v1/parse/feed

Parse an RSS or Atom feed. OPML files are handled by a separate endpoint, /v1/parse/opml.

curl -X POST https://api.parsejet.com/v1/parse/feed \
  -H "Authorization: Bearer pj_YOUR_KEY" \
  -F "file=@feed.xml"

POST /v1/parse/image

Analyze an image. Supports JPG, PNG, GIF, BMP, WebP, TIFF. Max 20MB.

Field	Type	Required	Description
file	file	yes	Image file
prompt	string	no	Custom prompt for image analysis
model	string	no	Vision model override

curl -X POST https://api.parsejet.com/v1/parse/image \
  -H "Authorization: Bearer pj_YOUR_KEY" \
  -F "file=@photo.jpg" -F "prompt=Describe this image"

POST /v1/parse/image/ocr

Extract text from an image via OCR.

curl -X POST https://api.parsejet.com/v1/parse/image/ocr \
  -H "Authorization: Bearer pj_YOUR_KEY" \
  -F "file=@screenshot.png"

SDKs

Official SDKs

TypeScript / JavaScript

npm install parsejet

import { readFile } from "node:fs/promises";
import { ParseJet } from "parsejet";

const client = new ParseJet({ apiKey: "pj_YOUR_KEY" });

// Parse a URL
const urlResult = await client.parse.url("https://example.com");
console.log(urlResult.text);

// Parse a file (read it into a Buffer first)
const buffer = await readFile("report.pdf");
const fileResult = await client.parse.file(buffer, "report.pdf");
console.log(fileResult.text);

Python

pip install parsejet

from parsejet import ParseJet

client = ParseJet(api_key="pj_YOUR_KEY")

# Parse a URL
result = client.parse.url("https://example.com")
print(result.text)

# Parse a file
with open("report.pdf", "rb") as f:
    result = client.parse.file(f, "report.pdf")
    print(result.text)

AI Agents

MCP Server

Use ParseJet as an MCP (Model Context Protocol) server with Claude Code, Cursor, or any MCP-compatible AI agent.

Install

npm install -g @parsejet/mcp-server

Claude Code

Add to a .mcp.json file in your project root (Claude Code reads project-scoped MCP servers from there):

{
  "mcpServers": {
    "parsejet": {
      "command": "npx",
      "args": ["-y", "@parsejet/mcp-server"],
      "env": {
        "PARSEJET_API_KEY": "pj_YOUR_KEY"
      }
    }
  }
}

Cursor

Go to Settings → MCP Servers, add a new server:

{
  "mcpServers": {
    "parsejet": {
      "command": "npx",
      "args": ["-y", "@parsejet/mcp-server"],
      "env": {
        "PARSEJET_API_KEY": "pj_YOUR_KEY"
      }
    }
  }
}

Claude.ai (Remote)

For Claude.ai web, use the remote HTTP endpoint — no local install needed:

Endpoint:  https://api.parsejet.com/mcp
Transport: Streamable HTTP
Auth:      Bearer pj_YOUR_KEY (in Authorization header)

Go to Claude.ai → Settings → Integrations → Add MCP Server → Enter the URL above.

Available tools

Tool	Description
parse_url	Parse any URL (web page, YouTube, etc.)
parse_file	Parse a local file (PDF, DOCX, images, etc.)
get_youtube_transcript	Get YouTube video transcript with optional language

Rate Limits & Pricing

ParseJet uses a credit-based system. Each request consumes credits based on the format complexity.

Plan	Price	Credits/mo	RPM	Max file
Free	$0	300	5	10MB
Pro	$19/mo	3,000	30	50MB
Business	$49/mo	20,000	60	100MB
Scale	$99/mo	50,000	200	200MB
Enterprise	Custom	Custom	Custom	Custom
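
As a worked example of reading the table, the sketch below picks the lowest-priced listed plan that covers a monthly credit need and a largest file size. The values are copied from the rows above; the function name `cheapest_plan` is illustrative, and RPM is ignored for simplicity:

```python
PLANS = [
    # (name, price in USD per month, credits per month, max file size in MB)
    ("Free", 0, 300, 10),
    ("Pro", 19, 3_000, 50),
    ("Business", 49, 20_000, 100),
    ("Scale", 99, 50_000, 200),
]


def cheapest_plan(credits_needed: int, largest_file_mb: float) -> str:
    """Return the first (cheapest) plan whose limits cover the workload."""
    for name, _price, credits, max_mb in PLANS:
        if credits_needed <= credits and largest_file_mb <= max_mb:
            return name
    return "Enterprise"  # custom limits
```

For instance, 100 PDFs a month at 3 credits each is 300 credits, which fits the Free plan as long as no file exceeds 10 MB. A single 40 MB file pushes the same workload to Pro.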

Responses include the X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset headers. 429 responses also include Retry-After.
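
A client can use these headers to pace itself instead of retrying blindly. The sketch below makes two assumptions the documentation does not state: that Retry-After is a number of seconds, and that X-RateLimit-Reset is a Unix timestamp in seconds. Adjust it if the API uses other formats. It also expects header names with the exact casing shown.

```python
import time
from typing import Mapping, Optional


def seconds_to_wait(status: int, headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long to pause before the next request (0.0 if no pause is needed)."""
    if status == 429 and "Retry-After" in headers:
        # Assumption: Retry-After is delta-seconds, not an HTTP date.
        return float(headers["Retry-After"])
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    # Assumption: X-RateLimit-Reset is a Unix timestamp in seconds.
    now = time.time() if now is None else now
    return max(0.0, float(headers.get("X-RateLimit-Reset", now)) - now)
```

Call it after each response and pass the result to `time.sleep` before sending the next request.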

Error Codes

All errors return JSON with error and message fields.

Status	Code	Description
400	unsupported_format	File type not supported
401	invalid_api_key	Missing or invalid API key
413	file_too_large	File exceeds plan limit
422	parse_error	File corrupted or unreadable
429	rate_limit_exceeded	RPM or daily/monthly limit hit
502	parser_unavailable	Parser backend unreachable
504	parser_timeout	Parse operation timed out
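
Client code usually needs to tell errors worth retrying from errors that will fail again. The sketch below wraps an error body in an exception and classifies it by code. It assumes the `error` field carries the code from the table above, and the choice of which codes are retryable is a judgment call, not documented ParseJet behavior:

```python
import json

# Assumption: these codes describe transient conditions worth retrying.
RETRYABLE = {"rate_limit_exceeded", "parser_unavailable", "parser_timeout"}


class ParseJetError(Exception):
    """Illustrative exception type; not part of an official SDK."""

    def __init__(self, status: int, code: str, message: str):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message

    @property
    def retryable(self) -> bool:
        return self.code in RETRYABLE


def error_from_response(status: int, body: str) -> ParseJetError:
    """Build a ParseJetError from an HTTP status and a JSON error body."""
    data = json.loads(body)
    return ParseJetError(status, data["error"], data["message"])
```

Codes such as `invalid_api_key` or `file_too_large` are left out of the retryable set because repeating the same request would not change the outcome.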