MCP server for fetching web page content using the Playwright headless browser.
JavaScript Support: Unlike traditional web scrapers, Fetcher MCP uses Playwright to execute JavaScript, making it capable of handling dynamic web content and modern web applications.
Intelligent Content Extraction: Built-in Readability algorithm automatically extracts the main content from web pages, removing ads, navigation, and other non-essential elements.
Flexible Output Format: Supports both HTML and Markdown output formats, making it easy to integrate with various downstream applications.
Parallel Processing: The fetch_urls tool enables concurrent fetching of multiple URLs, significantly improving efficiency for batch operations.
Resource Optimization: Automatically blocks unnecessary resources (images, stylesheets, fonts, media) to reduce bandwidth usage and improve performance.
Robust Error Handling: Comprehensive error handling and logging ensure reliable operation even when dealing with problematic web pages.
Configurable Parameters: Fine-grained control over timeouts, content extraction, and output formatting to suit different use cases.
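The resource-blocking feature can be pictured with Playwright's request interception. The sketch below is an illustration, not Fetcher MCP's actual source: `RouteLike`, `BLOCKED_TYPES`, and `blockMedia` are names made up here, and `RouteLike` mirrors only the subset of Playwright's `Route` interface (`request().resourceType()`, `abort()`, `continue()`) that the handler needs.

```typescript
// Minimal structural type covering the parts of Playwright's Route used here.
// A real Playwright Route satisfies it, so no Playwright import is required.
type RouteLike = {
  request(): { resourceType(): string };
  abort(): Promise<void>;
  continue(): Promise<void>;
};

// Resource types named in the feature list: images, stylesheets, fonts, media.
const BLOCKED_TYPES = new Set(["image", "stylesheet", "font", "media"]);

// Abort requests for blocked resource types and let everything else through.
function blockMedia(route: RouteLike): Promise<void> {
  const type = route.request().resourceType();
  return BLOCKED_TYPES.has(type) ? route.abort() : route.continue();
}

// With Playwright installed, the handler would be registered before navigation:
//   await page.route("**/*", blockMedia);
```

Keeping the decision in a small handler makes it easy to switch off, which is presumably what a parameter such as disableMedia controls.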
Run directly with npx:
npx -y fetcher-mcp
Run with the --debug option to show the browser window for debugging:
npx -y fetcher-mcp --debug
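In Playwright terms, showing the browser window means launching with `headless: false`. The following is a minimal sketch of how a `--debug` flag could map to launch options. `launchOptionsFromArgv` is a hypothetical helper, and how Fetcher MCP actually parses its arguments is not shown in this document.

```typescript
// Hypothetical helper: derive Playwright launch options from CLI arguments.
// Assumption: the browser runs headless unless --debug is present.
function launchOptionsFromArgv(argv: string[]): { headless: boolean } {
  const debug = argv.includes("--debug");
  return { headless: !debug };
}

// With Playwright installed, the result could be passed to a launcher, e.g.
//   const browser = await chromium.launch(launchOptionsFromArgv(process.argv.slice(2)));
```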
Configure this MCP server in Claude Desktop:
On macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
On Windows: %APPDATA%/Claude/claude_desktop_config.json
{
  "mcpServers": {
    "fetcher": {
      "command": "npx",
      "args": ["-y", "fetcher-mcp"]
    }
  }
}
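When claude_desktop_config.json already lists other servers, the fetcher entry should be merged in rather than replacing the whole file. A minimal sketch of that merge follows. `addFetcherServer`, `ServerEntry`, and `ClaudeConfig` are illustrative names, not part of Fetcher MCP or Claude Desktop, and the optional `--debug` argument is an assumption that simply reuses the command-line flag described earlier.

```typescript
type ServerEntry = { command: string; args: string[] };
type ClaudeConfig = { mcpServers?: Record<string, ServerEntry>; [key: string]: unknown };

// Return a copy of the config with the "fetcher" server added,
// leaving any other configured servers and settings untouched.
function addFetcherServer(config: ClaudeConfig, debug = false): ClaudeConfig {
  const args = ["-y", "fetcher-mcp"];
  if (debug) args.push("--debug"); // assumption: the flag can be passed via "args"
  return {
    ...config,
    mcpServers: { ...config.mcpServers, fetcher: { command: "npx", args } },
  };
}

// Example: merge into an empty config and print JSON suitable for the file.
console.log(JSON.stringify(addFetcherServer({}), null, 2));
```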
fetch_url - Retrieve web page content from a specified URL
- url: The URL of the web page to fetch (required parameter)
- timeout: Page loading timeout in milliseconds, default is 30000 (30 seconds)
- waitUntil: Specifies when navigation is considered complete; options: 'load', 'domcontentloaded', 'networkidle', 'commit'; default is 'load'
- extractContent: Whether to intelligently extract the main content, default is true
- maxLength: Maximum length of returned content (in characters), default is no limit
- returnHtml: Whether to return HTML content instead of Markdown, default is false
- waitForNavigation: Whether to wait for additional navigation after initial page load (useful for sites with anti-bot verification), default is false
- navigationTimeout: Maximum time to wait for additional navigation in milliseconds, default is 10000 (10 seconds)
- disableMedia: Whether to disable media resources (images, stylesheets, fonts, media), default is true
- debug: Whether to enable debug mode (showing browser window); overrides the --debug command line flag if specified

fetch_urls - Batch retrieve web page content from multiple URLs in parallel
- urls: Array of URLs to fetch (required parameter)
- Other parameters are the same as fetch_url
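The parameter list above can be summarised as a TypeScript type plus a helper that fills in the documented defaults. This is a sketch for readers, not the server's own schema: `FetchUrlArgs`, `DEFAULTS`, `withDefaults`, and `toolCall` are hypothetical names, and the example URLs are placeholders. The request object that `toolCall` builds follows the JSON-RPC `tools/call` method defined by the Model Context Protocol.

```typescript
type WaitUntil = "load" | "domcontentloaded" | "networkidle" | "commit";

interface FetchUrlArgs {
  url: string;                 // required
  timeout?: number;            // ms, default 30000
  waitUntil?: WaitUntil;       // default "load"
  extractContent?: boolean;    // default true
  maxLength?: number;          // characters, default: no limit
  returnHtml?: boolean;        // default false (Markdown output)
  waitForNavigation?: boolean; // default false
  navigationTimeout?: number;  // ms, default 10000
  disableMedia?: boolean;      // default true
  debug?: boolean;             // no default: overrides --debug only if specified
}

// Defaults as documented; maxLength and debug are intentionally left unset.
const DEFAULTS = {
  timeout: 30000,
  waitUntil: "load" as WaitUntil,
  extractContent: true,
  returnHtml: false,
  waitForNavigation: false,
  navigationTimeout: 10000,
  disableMedia: true,
};

// Caller-supplied values win over the documented defaults.
function withDefaults(args: FetchUrlArgs): FetchUrlArgs {
  return { ...DEFAULTS, ...args };
}

// Build a JSON-RPC 2.0 "tools/call" request for fetch_url or fetch_urls.
function toolCall(id: number, name: "fetch_url" | "fetch_urls", args: object) {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

const single = toolCall(1, "fetch_url", withDefaults({ url: "https://example.com", returnHtml: true }));
const batch = toolCall(2, "fetch_urls", { urls: ["https://example.com/a", "https://example.com/b"] });
console.log(JSON.stringify(single), JSON.stringify(batch));
```

In practice an MCP client builds these requests itself, and omitted arguments fall back to the server's defaults, so spelling them out as `withDefaults` does is only useful for seeing what a bare call implies.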
Wait for Complete Loading: For websites using CAPTCHA, redirects, or other verification mechanisms, include in your prompt:
Please wait for the page to fully load
This will use the waitForNavigation: true parameter.
Increase Timeout Duration: For websites that load slowly:
Please set the page loading timeout to 60 seconds
This adjusts both the timeout and navigationTimeout parameters accordingly.
Preserve Original HTML Structure: When content extraction might fail:
Please preserve the original HTML content
Sets extractContent: false and returnHtml: true.
Fetch Complete Page Content: When extracted content is too limited:
Please fetch the complete webpage content instead of just the main content
Sets extractContent: false.
Return Content as HTML: When HTML format is needed instead of default Markdown:
Please return the content in HTML format
Sets returnHtml: true.
Please enable debug mode for this fetch operation
This sets debug: true even if the server was started without the --debug flag.
Manual Login: To log in using your own credentials:
Please run in debug mode so I can manually log in to the website
Sets debug: true or uses the --debug flag, keeping the browser window open for manual login.
Interacting with Debug Browser: When debug mode is enabled:
Enable Debug for Specific Requests: Even if the server is already running, you can enable debug mode for a specific request:
Please enable debug mode for this authentication step
Sets debug: true for this specific request only, opening the browser window for manual login.
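The prompts in these tips translate into argument overrides on the fetch call. The sketch below collects the mappings stated above in one lookup table. The tip names and `argsForTips` are hypothetical, and the 60-second entry assumes both timeouts are set to the requested duration, which the text only describes as adjusted "accordingly".

```typescript
type Overrides = {
  waitForNavigation?: boolean;
  timeout?: number;
  navigationTimeout?: number;
  extractContent?: boolean;
  returnHtml?: boolean;
  debug?: boolean;
};

// Tip names are made up for this sketch.
type TipName =
  | "waitForFullLoad"
  | "slowSite"
  | "preserveHtml"
  | "completePage"
  | "htmlOutput"
  | "debugThisRequest";

// Tip -> arguments the tip is said to set.
const TIP_OVERRIDES: Record<TipName, Overrides> = {
  waitForFullLoad: { waitForNavigation: true },
  // Assumption: a 60-second request sets both timeouts to 60000 ms.
  slowSite: { timeout: 60000, navigationTimeout: 60000 },
  preserveHtml: { extractContent: false, returnHtml: true },
  completePage: { extractContent: false },
  htmlOutput: { returnHtml: true },
  debugThisRequest: { debug: true },
};

// Combine several tips into one argument object for a given URL.
// Later tips win if two tips set the same argument.
function argsForTips(url: string, tips: TipName[]): Overrides & { url: string } {
  return tips.reduce<Overrides & { url: string }>(
    (acc, tip) => ({ ...acc, ...TIP_OVERRIDES[tip] }),
    { url },
  );
}

console.log(argsForTips("https://example.com", ["slowSite", "preserveHtml"]));
```

Combining tips this way mirrors how several instructions in one prompt would end up as a single set of arguments on the tool call.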
npm install
Install the browsers needed for Playwright:
npm run install-browser
npm run build
Use MCP Inspector for debugging:
npm run inspector
You can also enable visible browser mode for debugging:
node build/index.js --debug
Licensed under the MIT License