
Web scraping at scale inevitably runs into anti-bot defenses, with IP bans being a primary obstacle. Puppeteer, despite its power, exposes a distinctive browser fingerprint that modern anti-bot systems are designed to detect. Merely rotating IP addresses without proper authentication and accompanying fingerprint spoofing is a recipe for swift blocking. This guide delivers a complete walkthrough for implementing secure proxy routing via both HTTP/S and SOCKS proxies, and seamlessly integrating them into your automation workflows. You'll gain secure configuration templates, practical code samples, and critical best practices to achieve reliable, uninterrupted data extraction.
Websites defend against bots using rate limiting and IP tracking. Because headless browsers leave unique fingerprints, simply rotating IPs isn't sufficient—without comprehensive identity management, your scraper will still get blocked.
Proxies solve the network layer of this problem. A proxy server intermediates requests, distributing them across multiple IP addresses to avoid rate limits and bans. However, IP rotation must always be paired with fingerprint spoofing to truly mimic organic traffic.
There are two primary proxy protocols used in web scraping:

- HTTP/S proxies (e.g., Squid), which understand web traffic and can cache responses or manipulate headers.
- SOCKS proxies (e.g., Dante), which tunnel arbitrary TCP connections without inspecting them.

Both are covered in detail below.
Without a proxy, every request comes from your machine's IP, quickly triggering blocks. With a properly configured proxy pool, requests originate from diverse IPs, enabling sustained data extraction.
Request flow through a proxy:
Puppeteer → Proxy Server → Target Website (the response follows the reverse path).
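The proxy-pool idea above can be sketched as a simple round-robin selector. This is a minimal illustration, not a library API; the class name and proxy addresses are placeholders:

```javascript
// Minimal round-robin proxy pool: each call hands out the next proxy,
// spreading requests evenly across all exit IPs.
class ProxyPool {
  constructor(proxies) {
    this.proxies = proxies;
    this.index = 0;
  }

  next() {
    const proxy = this.proxies[this.index];
    this.index = (this.index + 1) % this.proxies.length;
    return proxy;
  }
}

// Placeholder proxy addresses (TEST-NET range)
const pool = new ProxyPool([
  'http://203.0.113.10:3128',
  'http://203.0.113.11:3128',
  'http://203.0.113.12:3128',
]);

// Each new browser launch would then use a different exit IP, e.g.:
// puppeteer.launch({ args: [`--proxy-server=${pool.next()}`] })
```

More sophisticated pools weight proxies by observed success rate, but round-robin is a reasonable starting point.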
Let's dive into setting up your own authenticated HTTP/S proxy using Squid.
```bash
sudo apt update
sudo apt install squid apache2-utils -y
```
Create a proxy user with htpasswd (this creates the password file /etc/squid/passwords):

```bash
sudo htpasswd -c /etc/squid/passwords proxyuser
```

Note: The -c flag creates the file; omit it when adding subsequent users.

Next, configure squid.conf. Edit /etc/squid/squid.conf and add the authentication directives (you can also pin upstream resolvers with dns_nameservers):

```
auth_param basic program /usr/lib/squid/basic_ncsa_auth /etc/squid/passwords
auth_param basic realm proxy
acl authenticated proxy_auth REQUIRED
http_access allow authenticated
```
Configure your firewall (ufw) to allow the Squid port:

```bash
sudo ufw allow 3128/tcp
```
Restart Squid to apply the changes:

```bash
sudo systemctl restart squid
```

Test your setup using curl:
```bash
curl -x http://proxyuser:password@your-server-ip:3128 http://example.com
```

Security is critical. Always use strong, unique passwords. For production environments, restrict source IPs via ACLs, and consider tunneling Squid behind an HTTPS reverse proxy (such as Nginx) if you handle sensitive data. This secured HTTP proxy is now ready for your scripts.
| Proxy Port | Authentication Method | Notes |
|---|---|---|
| 3128 | basic_ncsa_auth (htpasswd) | Standard, widely compatible, recommended for Squid. |
| 8080 | None or custom | Alternative port; avoid leaving open without authentication. |
A secure squid.conf enforces both authentication and network-level access controls. Use this baseline for Ubuntu 24.04 to prevent unauthorized relaying.
```
# Squid secure configuration
http_port 3128
cache_dir ufs /var/spool/squid 100 16 256
dns_nameservers 8.8.8.8 8.8.4.4

# Hide client IP for better anonymity
forwarded_for delete
request_header_access Via deny all

# Authentication
auth_param basic program /usr/lib/squid/basic_ncsa_auth /etc/squid/passwords
auth_param basic realm proxy
acl authenticated proxy_auth REQUIRED

# Access Control Lists (ACLs)
acl localnet src 10.0.0.0/8 192.168.0.0/16  # Replace with your scraper's IPs
http_access allow localnet authenticated
http_access deny all
```

Key directives explained:
- `forwarded_for delete`: Strips the X-Forwarded-For header, hiding your scraper's real IP address from the target website.
- `acl localnet`: Restricts incoming connections to your trusted networks.
- `http_access allow localnet authenticated`: Only allows connections that match both the trusted IP range and valid credentials.

While HTTP proxies suffice for standard web scraping, SOCKS proxies like Dante provide greater flexibility for handling lower-level TCP connections.
A SOCKS proxy like Dante tunnels any TCP traffic, making it ideal for non-HTTP protocols and certain scraping edge cases. Unlike HTTP/S proxies, which parse application traffic, SOCKS operates lower in the stack at the session layer (OSI Layer 5), relaying sockets without inspecting their contents.
| Feature | HTTP/S Proxy (Squid) | SOCKS Proxy (Dante) |
|---|---|---|
| OSI Layer | 7 (Application) | 5 (Session) |
| Protocols | HTTP/HTTPS | Any TCP/UDP |
| Authentication | Basic, NTLM | SOCKS5: username/password; SOCKS4: none |
| Typical Use | Web scraping, caching, header manipulation | Raw socket tunneling, IP rotation |
Install Dante server:
```bash
sudo apt update && sudo apt install dante-server -y
```

Configure /etc/danted.conf. Here is a recommended SOCKS5 setup with authentication:
```
logoutput: syslog
user.privileged: root
user.unprivileged: nobody

# The interface and port Dante listens on
internal: 0.0.0.0 port = 1080

# The interface Dante uses for outgoing traffic
external: eth0

method: username

client pass {
    from: 0.0.0.0/0 to: 0.0.0.0/0
    log: connect disconnect error
}

pass {
    from: 0.0.0.0/0 to: 0.0.0.0/0
    protocol: tcp
    log: connect disconnect error
}
```

Note: Replace eth0 with your server's actual external network interface (e.g., ens3 or eth1).
Create a proxy user:
```bash
sudo adduser proxyuser
```

Set a strong password when prompted; Dante authenticates directly against system users.
Allow port 1080 in your firewall:
```bash
sudo ufw allow 1080/tcp
```

Restart and enable the service:
```bash
sudo systemctl restart danted
sudo systemctl enable danted
```

Test the setup:
```bash
curl --socks5-hostname localhost:1080 --socks5-user proxyuser:password http://example.com
```

To route your Puppeteer traffic through an authenticated HTTP/S proxy, pass the proxy server address in the puppeteer.launch arguments and supply the credentials via page.authenticate(). Do not embed credentials directly in the proxy URL string: Chromium ignores them, and you will get a 407 error.
```javascript
const puppeteer = require('puppeteer');

(async () => {
  // 1. Launch browser pointing to the proxy server
  const browser = await puppeteer.launch({
    args: ['--proxy-server=http://proxy.example.com:3128']
  });
  const page = await browser.newPage();

  // 2. Provide the username and password for the proxy
  await page.authenticate({
    username: 'proxyuser',
    password: 'securepassword123'
  });

  // 3. Navigate to the target
  await page.goto('https://api.ipify.org'); // Check your IP

  await browser.close();
})();
```

Note: Chromium does not support username/password authentication for SOCKS5 proxies, so page.authenticate() will not help there. For a SOCKS proxy like Dante, restrict access by source IP instead (client pass { from: YOUR_IP ... }) and disable password auth.

Effective proxy management depends on balancing evasion tactics with computational overhead. Rotating proxies per request maximizes stealth but increases latency; rotating per session reduces load but creates detectable behavior patterns.
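The per-session trade-off can be made concrete with a small rotator that reuses one proxy for a fixed number of requests before moving to the next. This is a sketch; the function name, proxy addresses, and threshold are illustrative:

```javascript
// Per-session rotation: hand out the same proxy for `requestsPerSession`
// calls, then advance to the next proxy in the list.
function makeSessionRotator(proxies, requestsPerSession) {
  let index = 0; // which proxy is currently "in session"
  let used = 0;  // how many requests the current session has consumed

  return function nextProxy() {
    if (used >= requestsPerSession) {
      used = 0;
      index = (index + 1) % proxies.length; // start a new session
    }
    used += 1;
    return proxies[index];
  };
}

// Placeholder proxies (TEST-NET range); five requests per "session"
const nextProxy = makeSessionRotator(
  ['http://198.51.100.1:3128', 'http://198.51.100.2:3128'],
  5
);
```

Setting requestsPerSession to 1 degenerates into per-request rotation; larger values keep a consistent IP per browsing "session" at the cost of more requests per exit IP.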
| Error | Common Cause | Fix |
|---|---|---|
| ERR_PROXY_CONNECTION_FAILED | Proxy is offline or blocked by firewall | Check ufw rules on the proxy server; ensure the service is running. |
| 407 Proxy Auth Required | Invalid credentials or missing page.authenticate() | Do not put credentials in the --proxy-server string; use page.authenticate(). |
| DNS leaks (HTTP proxy) | DNS queries bypass the proxy, revealing your location | Enforce DNS over proxy, or use SOCKS5. |
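One way to avoid the 407 pitfall from the table is to keep the server address and credentials separate from the start. This hypothetical splitProxyUrl helper (not part of Puppeteer's API) splits a combined proxy URL using Node's built-in URL class:

```javascript
// Split "http://user:pass@host:port" into the two pieces Puppeteer needs:
// the bare server for --proxy-server, and credentials for page.authenticate().
// Chromium ignores user:pass embedded in the --proxy-server string.
function splitProxyUrl(proxyUrl) {
  const u = new URL(proxyUrl);
  return {
    server: `${u.protocol}//${u.hostname}:${u.port}`,
    credentials: {
      username: decodeURIComponent(u.username),
      password: decodeURIComponent(u.password),
    },
  };
}

const { server, credentials } = splitProxyUrl(
  'http://proxyuser:securepassword123@proxy.example.com:3128'
);
// server      → pass as `--proxy-server=${server}`
// credentials → pass to page.authenticate(credentials)
```

Using decodeURIComponent also handles passwords containing characters like @ or :, which must be percent-encoded inside a URL.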
Implementing robust retry logic is highly recommended. Proxies drop connections frequently. Use a fallback mechanism to switch proxies if a page fails to load.
```javascript
async function fetchWithProxyFallback(url, proxyList, credentials) {
  for (const proxy of proxyList) {
    let browser;
    try {
      browser = await puppeteer.launch({ args: [`--proxy-server=${proxy}`] });
      const page = await browser.newPage();
      await page.authenticate(credentials);
      await page.goto(url, { waitUntil: 'domcontentloaded' });
      return { page, browser };
    } catch (err) {
      console.warn(`Proxy ${proxy} failed. Retrying...`);
      if (browser) await browser.close();
      if (proxy === proxyList[proxyList.length - 1]) throw err; // Rethrow if the last proxy fails
    }
  }
}
```

Proxy Server Management Cheatsheet
| Action | HTTP/S (Squid) | SOCKS (Dante) |
|---|---|---|
| Install (Ubuntu) | sudo apt install squid apache2-utils | sudo apt install dante-server |
| Create User | sudo htpasswd -c /etc/squid/passwords user | sudo adduser proxyuser |
| Check Status | systemctl status squid | systemctl status danted |
| Test Connection | curl -x http://user:pass@localhost:3128 http://example.com | curl --socks5 localhost:1080 --socks5-user user:pass http://example.com |
Effective web scraping with Puppeteer hinges on sophisticated proxy management. We've covered setting up secure HTTP/S and SOCKS servers, integrating them correctly using page.authenticate(), and navigating Chromium's infamous SOCKS5 limitations. Remember: proxies alone aren't a silver bullet. Combine reliable proxy routing with robust fingerprint spoofing (using libraries like puppeteer-extra-plugin-stealth) to truly bypass modern bot protections. Monitor your success rates, rotate your IP pools frequently, and adapt to evolving countermeasures.