Skip to content

devalexanderdaza/puppeteer-extends

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

puppeteer-extends

License Build Status NPM Package

Modern, factory-based management for Puppeteer with advanced navigation features. Ideal for web scraping, testing, and automation.

Features

  • 🏭 Browser Factory - Multiple browser instances with intelligent reuse
  • đź§­ Enhanced Navigation - CloudFlare bypass, retry mechanisms, and more
  • 📝 TypeScript Support - Full type definitions for better DX
  • 🔄 Singleton Compatibility - Maintains backward compatibility
  • 🛡️ Stealth Mode - Integrated puppeteer-extra-plugin-stealth

Installation

pnpm install puppeteer-extends

Basic Usage

import { PuppeteerExtends, Logger } from 'puppeteer-extends';
import path from 'path';

const main = async () => {
  // Get browser instance with options
  const browser = await PuppeteerExtends.getBrowser({
    isHeadless: true,
    isDebug: true
  });
  
  // Create new page
  const page = await browser.newPage();
  
  // Navigate with enhanced features (CloudFlare bypass, retries)
  await PuppeteerExtends.goto(page, 'https://example.com', {
    maxRetries: 2,
    timeout: 30000
  });
  
  // Take screenshot
  await page.screenshot({ 
    path: path.join(__dirname, 'screenshot.png') 
  });
  
  // Clean up
  await PuppeteerExtends.closePage(page);
  await PuppeteerExtends.closeBrowser();
};

main().catch(console.error);

Multiple Browser Instances

import { PuppeteerExtends } from 'puppeteer-extends';

const main = async () => {
  // Create two browser instances with different configurations
  const mainBrowser = await PuppeteerExtends.getBrowser({
    instanceId: 'main',
    isHeadless: true
  });
  
  const proxyBrowser = await PuppeteerExtends.getBrowser({
    instanceId: 'proxy',
    isHeadless: true,
    customArguments: [
      '--proxy-server=http://my-proxy.com:8080',
      '--no-sandbox'
    ]
  });
  
  // Work with both browsers
  const mainPage = await mainBrowser.newPage();
  const proxyPage = await proxyBrowser.newPage();
  
  // Clean up specific browser
  await PuppeteerExtends.closeBrowser('proxy');
  
  // Or close all browsers at once
  await PuppeteerExtends.closeAllBrowsers();
};

Advanced Navigation Features

import { PuppeteerExtends } from 'puppeteer-extends';

const scrape = async () => {
  const browser = await PuppeteerExtends.getBrowser();
  const page = await browser.newPage();
  
  // Navigation with retries and custom options
  const success = await PuppeteerExtends.goto(page, 'https://example.com', {
    waitUntil: ['networkidle0'],
    timeout: 60000,
    maxRetries: 3,
    retryDelay: 5000,
    isDebug: true
  });
  
  if (success) {
    // Wait for specific element with timeout
    const elementVisible = await PuppeteerExtends.waitForSelector(
      page, 
      '.content-loaded', 
      10000
    );
    
    if (elementVisible) {
      // Extract data
      const data = await page.evaluate(() => {
        return {
          title: document.title,
          content: document.querySelector('.content')?.textContent
        };
      });
      
      console.log('Scraped data:', data);
    }
  }
  
  await PuppeteerExtends.closePage(page);
  await PuppeteerExtends.closeBrowser();
};

Customizing Logger

import { Logger } from 'puppeteer-extends';
import path from 'path';

// Set custom log directory
Logger.configure(path.join(__dirname, 'custom-logs'));

// Use logger
Logger.info('Starting browser');
Logger.debug('Debug information');
Logger.warn('Warning message');
Logger.error('Error occurred', new Error('Something went wrong'));

Plugin System

puppeteer-extends v1.7.0 introduces a powerful plugin system that allows extending functionality through lifecycle hooks:

import { PuppeteerExtends, StealthPlugin, ProxyPlugin } from 'puppeteer-extends';

// Register stealth plugin for improved anti-bot protection
await PuppeteerExtends.registerPlugin(new StealthPlugin({
  hideWebDriver: true,
  hideChromeRuntime: true
}));

// Register proxy rotation plugin
await PuppeteerExtends.registerPlugin(new ProxyPlugin({
  proxies: [
    { host: '123.45.67.89', port: 8080, username: 'user', password: 'pass' },
    { host: '98.76.54.32', port: 8080 },
  ],
  rotationStrategy: 'sequential',
  rotateOnNavigation: true,
  rotateOnError: true
}));

// Continue with normal usage - plugins will be active
const browser = await PuppeteerExtends.getBrowser();
const page = await browser.newPage();
// ...

Built-in Plugins

Plugin Description
StealthPlugin Enhances anti-bot detection protection
ProxyPlugin Provides proxy rotation with multiple strategies

Creating Custom Plugins

You can create custom plugins by implementing the PuppeteerPlugin interface:

import { PuppeteerPlugin, PluginContext } from 'puppeteer-extends';

class MyCustomPlugin implements PuppeteerPlugin {
  name = 'my-custom-plugin';
  
  // Called when plugin is registered
  async initialize(options?: any): Promise<void> {
    console.log('Plugin initialized with options:', options);
  }
  
  // Called before browser launch
  async onBeforeBrowserLaunch(options: any, context: PluginContext): Promise<any> {
    // Modify launch options if needed
    return {
      ...options,
      args: [...options.args, '--disable-features=site-per-process']
    };
  }
  
  // Called when a new page is created
  async onPageCreated(page: Page, context: PluginContext): Promise<void> {
    await page.evaluateOnNewDocument(() => {
      // Execute code in the page context
    });
  }
  
  // Called before navigation
  async onBeforeNavigation(page: Page, url: string, options: any, context: PluginContext): Promise<void> {
    console.log(`Navigating to: ${url}`);
  }
  
  // Handle errors
  async onError(error: Error, context: PluginContext): Promise<boolean> {
    console.error('Plugin caught error:', error);
    return false; // Return true if error was handled
  }
  
  // Called when plugin is unregistered
  async cleanup(): Promise<void> {
    console.log('Plugin cleanup');
  }
}

// Register the custom plugin
await PuppeteerExtends.registerPlugin(new MyCustomPlugin(), { 
  customOption: 'value' 
});

Available Plugin Hooks

Hook Description
initialize Called when plugin is registered
cleanup Called when plugin is unregistered
onBeforeBrowserLaunch Before browser instance is created
onAfterBrowserLaunch After browser instance is created
onBeforeBrowserClose Before browser instance is closed
onPageCreated When a new page is created
onBeforePageClose Before a page is closed
onBeforeNavigation Before navigation to a URL
onAfterNavigation After navigation completes (success or failure)
onError When an error occurs in any operation

Session Management

puppeteer-extends v1.7.0 introduces powerful session management capabilities, allowing you to persist cookies, localStorage, and other browser state between runs:

import { PuppeteerExtends, SessionManager } from 'puppeteer-extends';

// Create a session manager
const sessionManager = new SessionManager({
  name: 'my-session',
  sessionDir: './sessions',
  persistCookies: true,
  persistLocalStorage: true,
  domains: ['example.com'] // Only store cookies for these domains
});

// Get browser and create page
const browser = await PuppeteerExtends.getBrowser();
const page = await browser.newPage();

// Apply session data to page (cookies, localStorage, etc.)
await sessionManager.applySession(page);

// Navigate somewhere
await PuppeteerExtends.goto(page, 'https://example.com/login');

// Perform login or other actions that modify session
await page.type('#username', 'myuser');
await page.type('#password', 'mypassword');
await page.click('#login-button');

// Wait for login to complete
await PuppeteerExtends.waitForNavigation(page);

// Extract and save session data (cookies, localStorage, etc.)
await sessionManager.extractSession(page);

// Next time you run your script, the session will be loaded automatically

Using the SessionPlugin

For more integrated session management, use the built-in SessionPlugin:

import { PuppeteerExtends } from 'puppeteer-extends';
import { SessionPlugin } from 'puppeteer-extends/plugins';

// Register the session plugin
await PuppeteerExtends.registerPlugin(new SessionPlugin({
  name: 'my-session',
  persistCookies: true,
  persistLocalStorage: true,
  extractAfterNavigation: true, // Auto-save after each navigation
  applyBeforeNavigation: true   // Auto-apply before each navigation
}));

// Session management is now automatic
const browser = await PuppeteerExtends.getBrowser();
const page = await browser.newPage();

// Navigate anywhere - session will be applied/extracted automatically
await PuppeteerExtends.goto(page, 'https://example.com');

Session Options

Option Type Default Description
sessionDir string "./tmp/puppeteer-extends/sessions" Directory to store session files
name string "default" Session name (used for file name)
persistCookies boolean true Whether to save/load cookies
persistLocalStorage boolean true Whether to save/load localStorage
persistSessionStorage boolean false Whether to save/load sessionStorage
persistUserAgent boolean true Whether to save/load userAgent
domains string[] undefined Domains to include when saving cookies
extractAfterNavigation* boolean true Save session after each navigation
applyBeforeNavigation* boolean true Apply session before each navigation

*Only available in SessionPlugin

Event System

puppeteer-extends v1.7.0 includes a powerful event system that allows you to react to various lifecycle events:

import { PuppeteerExtends, Events, PuppeteerEvents } from 'puppeteer-extends';

// Listen for browser creation
Events.on(PuppeteerEvents.BROWSER_CREATED, (params) => {
  console.log(`Browser created with ID: ${params.instanceId}`);
});

// Listen for navigation events
Events.on(PuppeteerEvents.NAVIGATION_STARTED, (params) => {
  console.log(`Navigation started to: ${params.url}`);
});

Events.on(PuppeteerEvents.NAVIGATION_SUCCEEDED, (params) => {
  console.log(`Successfully navigated to: ${params.url}`);
});

// Listen for errors
Events.on(PuppeteerEvents.ERROR, (params) => {
  console.error(`Error in ${params.source}: ${params.error.message}`);
});

// Use one-time listeners
Events.once(PuppeteerEvents.PAGE_CREATED, (params) => {
  console.log('First page created!');
});

// Normal usage of puppeteer-extends will trigger these events
const browser = await PuppeteerExtends.getBrowser();
const page = await browser.newPage();
await PuppeteerExtends.goto(page, 'https://example.com');

Supported Events

Event Type Description
BROWSER_CREATED Browser instance created
BROWSER_CLOSED Browser instance closed
PAGE_CREATED New page created
PAGE_CLOSED Page closed
NAVIGATION_STARTED Navigation to URL started
NAVIGATION_SUCCEEDED Navigation completed successfully
NAVIGATION_FAILED Navigation failed after retries
ERROR Generic error occurred
NAVIGATION_ERROR Error during navigation
BROWSER_ERROR Error in browser operations
PAGE_ERROR Error in page operations
SESSION_APPLIED Session data applied to page
SESSION_EXTRACTED Session data extracted from page
SESSION_CLEARED Session data cleared
PLUGIN_REGISTERED Plugin registered
PLUGIN_UNREGISTERED Plugin unregistered

Asynchronous Event Handling

You can use asynchronous event handlers and wait for them:

// Register async event handler
Events.on(PuppeteerEvents.NAVIGATION_SUCCEEDED, async (params) => {
  // Do something asynchronous
  await saveToDatabase(params.url);
});

// Emit event and wait for all handlers to complete
await Events.emitAsync(PuppeteerEvents.CUSTOM_EVENT, { data: 'value' });

Captcha Handling

puppeteer-extends v1.7.0 introduces sophisticated captcha detection and solving capabilities:

import { PuppeteerExtends, CaptchaService } from 'puppeteer-extends';
import { CaptchaPlugin } from 'puppeteer-extends/plugins';

// Register the captcha plugin with automatic handling
await PuppeteerExtends.registerPlugin(new CaptchaPlugin({
  service: CaptchaService.TWOCAPTCHA, // or CaptchaService.ANTICAPTCHA
  apiKey: 'your-api-key',
  autoDetect: true,   // Auto-detect captchas on page load
  autoSolve: true     // Auto-solve detected captchas
}));

// Get browser and create page
const browser = await PuppeteerExtends.getBrowser();
const page = await browser.newPage();

// Navigate to a page - captchas will be automatically detected and solved
await PuppeteerExtends.goto(page, 'https://example.com/with-captcha');

// You can also handle captchas manually
const plugin = PuppeteerExtends.getPlugin('captcha-plugin');
const captchaHelper = plugin.getCaptchaHelper();

// Detect captchas
const captchas = await captchaHelper.detectCaptchas(page);

// Solve a specific captcha
if (captchas.recaptchaV2.length > 0) {
  const solution = await captchaHelper.solveRecaptchaV2(page, captchas.recaptchaV2[0]);
  console.log(`Solved captcha: ${solution.token}`);
}

Supported Captcha Types

Type Description
RECAPTCHA_V2 Standard and invisible reCAPTCHA v2
RECAPTCHA_V3 Score-based reCAPTCHA v3
HCAPTCHA hCaptcha challenges
IMAGE_CAPTCHA Traditional text/image captchas
FUNCAPTCHA Arkose Labs FunCaptcha
TURNSTILE Cloudflare Turnstile

Supported Services

Service Description
TWOCAPTCHA 2Captcha service
ANTICAPTCHA Anti-Captcha service

CaptchaPlugin Options

Option Type Default Description
service CaptchaService - Captcha solving service to use
apiKey string - API key for the service
autoDetect boolean true Auto-detect captchas on page load
autoSolve boolean true Auto-solve detected captchas
timeout number 120 Timeout for solving (seconds)
ignoreUrls string[] [] URLs to ignore when detecting captchas

API Reference

PuppeteerExtends

Method Description
getBrowser(options?) Get or create browser instance
closeBrowser(instanceId?) Close specific browser
closeAllBrowsers() Close all browser instances
goto(page, url, options?) Navigate to URL with enhanced features
waitForNavigation(page, options?) Wait for navigation to complete
waitForSelector(page, selector, timeout?) Wait for element to be visible
closePage(page) Close page safely

BrowserOptions

Option Type Default Description
isHeadless boolean true Launch browser in headless mode
isDebug boolean false Enable debug logging
customArguments string[] DEFAULT_BROWSER_ARGS Custom browser launch arguments
userDataDir string ./tmp/puppeteer-extends User data directory path
instanceId string "default" Browser instance identifier

NavigationOptions

Option Type Default Description
waitUntil string[] ["load", "networkidle0"] Navigation completion events
timeout number 30000 Navigation timeout (ms)
maxRetries number 1 Maximum retry attempts
retryDelay number 5000 Delay between retries (ms)
isDebug boolean false Enable debug logging
headers object {} Custom request headers

Testing

# Run all tests
pnpm run test

# Run with coverage
pnpm run test:coverage

# Run in watch mode
pnpm run test:watch

License

Apache-2.0

About

Easy manage and instance a puppeteer using a singleton pattern. Minimal configuration implemented for install and use.

Topics

Resources

License

Stars

Watchers

Forks

Contributors 2

  •  
  •  

Languages