Web Scraping Techniques

Method 1 · The Basics

HTTP Request Node

The simplest way to grab data. Just point it at a URL and it fetches whatever's there, like sending someone to tapau (take away) your lunch.

Easy

How it works

The HTTP Request node sends a GET request to any URL and returns the raw response — HTML, JSON, XML, whatever. Think of it as typing a URL in your browser, but n8n reads the page for you. Great for public APIs and simple pages.
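To make the mechanics concrete, here is a minimal Python sketch of what a "GET and decode" step does outside n8n. It is an illustration only, not how the node is implemented: the names `fetch` and `decode_body`, the User-Agent string, and the example URL in the comment are all assumptions.

```python
import json
from urllib.request import Request, urlopen


def decode_body(body: bytes, content_type: str):
    """Return parsed JSON for JSON responses, otherwise the body as text."""
    # Keep only the media type, dropping parameters such as "; charset=utf-8".
    media_type = content_type.split(";")[0].strip().lower()
    text = body.decode("utf-8", errors="replace")
    if media_type == "application/json" or media_type.endswith("+json"):
        return json.loads(text)
    return text


def fetch(url: str, timeout: float = 10.0):
    """Send a GET request and decode the response based on its Content-Type."""
    # Hypothetical User-Agent; some servers reject requests that send none.
    request = Request(url, headers={"User-Agent": "example-fetcher/0.1"})
    with urlopen(request, timeout=timeout) as response:
        content_type = response.headers.get("Content-Type", "")
        return decode_body(response.read(), content_type)


# Example call (placeholder URL, needs network access):
# page = fetch("https://example.com/")
```

Pointed at an HTML page, `fetch` hands back one long string; pointed at a JSON endpoint, it hands back a parsed dict or list, which mirrors the choice between the text and JSON response formats.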

Best for

Fetching JSON from public APIs (weather, currency rates, stock prices)
Downloading CSV/Excel files from a URL
Grabbing raw HTML from simple, static pages

Schedule Trigger → HTTP Request → Process Data

Pro tip: Set the response format to "JSON" if the API returns JSON. For HTML pages, use "String" (labelled "Text" in newer n8n versions) and pipe it to the HTML Extract node (Method 2).

Method 2 · CSS Selectors

HTTP Request + HTML Extract

Fetch a page, then surgically extract the exact pieces you want using CSS selectors. Like going to a buffet but only taking the prawns.

Easy

How it works

First, the HTTP Request node grabs the full HTML of a page. Then the HTML Extract node (named simply "HTML" in newer n8n versions) uses CSS selectors (like .price, h1, table tr) to pull out specific elements. You get clean, structured data from messy HTML.
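As a rough picture of what selector-based extraction does, here is a small Python sketch using only the standard library. Python ships no CSS selector engine, so this toy version understands just two selector shapes, a tag name (`h1`) or a single class (`.price`); compound selectors like `table tr` are out of scope. The class and function names are made up for the example.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they cannot contain text.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class SimpleExtractor(HTMLParser):
    """Collect the text of elements matching a tag name or a .class selector."""

    def __init__(self, selector: str):
        super().__init__()
        self.selector = selector
        self.results: list[str] = []
        self._tag = ""      # tag name of the element currently being captured
        self._depth = 0     # nesting depth of that tag; 0 means not capturing
        self._buffer: list[str] = []

    def _matches(self, tag, attrs) -> bool:
        if self.selector.startswith("."):
            classes = (dict(attrs).get("class") or "").split()
            return self.selector[1:] in classes
        return tag == self.selector

    def handle_starttag(self, tag, attrs):
        if self._depth:
            # Already capturing: only track nested tags of the same name.
            if tag == self._tag:
                self._depth += 1
        elif tag not in VOID_TAGS and self._matches(tag, attrs):
            self._tag, self._depth, self._buffer = tag, 1, []

    def handle_endtag(self, tag):
        if self._depth and tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace into single spaces.
                self.results.append(" ".join("".join(self._buffer).split()))

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract(html: str, selector: str) -> list[str]:
    """Return the text of every element matching the selector."""
    parser = SimpleExtractor(selector)
    parser.feed(html)
    parser.close()
    return parser.results
```

For real pages, a full selector engine is the better tool; the sketch only shows the idea of "walk the HTML, keep the text of whatever matches".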

Best for

Scraping product prices from e-commerce sites
Extracting article titles and links from news pages
Pulling table data (like exchange rates) from bank websites

Trigger → HTTP Request → HTML Extract → Google Sheets

How to find CSS selectors: Right-click any element on a website → "Inspect" → look at the class or ID. For example, a price might be .product-price or #main-price.

Method 3 · JavaScript Pages

Headless Browser (Puppeteer / Playwright)

For those annoying pages that load content with JavaScript. This method runs a real browser in the background, so the page's scripts execute just as they would for a human visitor.

Advanced

How it works

Some websites load data dynamically with JavaScript. The HTTP Request node never runs that JavaScript, so it receives only the initial HTML shell with the content missing. A headless browser (like Puppeteer) actually runs the JavaScript, waits for the content to appear, then extracts the data. It's like sending a robot to sit at a computer and browse for you.
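The "run the JavaScript, wait, then extract" sequence can be sketched in Python with Playwright's sync API. This is a stand-in for illustration, not the n8n Code node setup itself, and it assumes Playwright and a Chromium build are installed (`pip install playwright`, then `playwright install chromium`). The function names, the default timeout, and the URL and selector in the comment are assumptions.

```python
def clean_texts(texts: list[str]) -> list[str]:
    """Collapse whitespace and drop empty strings from extracted text."""
    cleaned = (" ".join(t.split()) for t in texts)
    return [t for t in cleaned if t]


def scrape_rendered(url: str, selector: str, timeout_ms: int = 15000) -> list[str]:
    """Load a page in headless Chromium and return the text of matching elements."""
    # Imported here so this module still loads where Playwright is not installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url, timeout=timeout_ms)
            # Block until the JavaScript-rendered element exists in the page.
            page.wait_for_selector(selector, timeout=timeout_ms)
            return clean_texts(page.locator(selector).all_inner_texts())
        finally:
            browser.close()


# Example call (placeholder URL and selector, needs network access):
# prices = scrape_rendered("https://example.com/products", ".product-price")
```

Waiting for a specific selector, rather than a fixed sleep, keeps the scrape as fast as the page allows and fails loudly if the content never shows up.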

Best for

Single-page apps (SPAs) built with React/Vue/Angular
Pages that require scrolling to load more content (infinite scroll)
Sites that need login or cookie acceptance before showing data

Trigger → Code Node (Puppeteer) → Process & Store

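When to use this: If Method 2 returns empty results or a blank page, the site probably uses JavaScript rendering. That's your cue to switch to a headless browser. You'll need Puppeteer or Playwright installed on a self-hosted n8n server, and the Code node must be allowed to load external modules (the NODE_FUNCTION_ALLOW_EXTERNAL environment variable).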
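That "empty results" check can be turned into a quick test on the raw HTML. The Python sketch below is a heuristic under loud assumptions: the function names, the 200-character threshold, and the idea of passing a `marker` (a class name or text you expect in the data) are all invented for the example, and a page can fool it in either direction.

```python
import re


def visible_text_length(html: str) -> int:
    """Rough count of visible characters: drop scripts, styles, and tags."""
    no_code = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", " ", html)
    no_tags = re.sub(r"(?s)<[^>]+>", " ", no_code)
    return len(" ".join(no_tags.split()))


def probably_js_rendered(html: str, marker: str, min_text: int = 200) -> bool:
    """Guess whether the wanted content is filled in by JavaScript after load."""
    if marker in html:
        # The data is already in the raw HTML, so plain extraction should work.
        return False
    has_scripts = "<script" in html.lower()
    # Scripts present but almost no visible text suggests an empty app shell.
    return has_scripts and visible_text_length(html) < min_text
```

A typical single-page app shell, little more than an empty `<div id="root">` plus a script tag, comes back as `True`, while a static page that already contains the marker comes back as `False`.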

Method 4 · The Smart Way

Hidden API Discovery

Why scrape the HTML when the website is already fetching clean JSON from its own API? Find the hidden API and use it directly. Work smarter, not harder.

Medium

How it works

Many modern websites fetch their data from a backend API. Open your browser's DevTools (F12 → Network → Fetch/XHR), browse the site normally, and watch the API calls fly by. Copy that API URL into n8n's HTTP Request node and, boom, you get clean JSON without parsing any HTML. Some endpoints only answer when the request carries the same headers or cookies the browser sent, so copy those along too. It's the cheat code of scraping.
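Once an endpoint has been spotted in the Network tab, replaying it is an ordinary HTTP call. Here is a hedged Python sketch of that replay; the endpoint, query parameters, and field names are hypothetical placeholders, since every site's API looks different.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_api_url(base: str, params: dict) -> str:
    """Attach query parameters (page number, search term, ...) to the endpoint."""
    return f"{base}?{urlencode(params)}"


def fetch_json(url: str, headers=None, timeout: float = 10.0):
    """Call the endpoint the way the site's own frontend does and parse the JSON."""
    # Extra headers copied from the browser request can be passed in `headers`.
    merged = {"Accept": "application/json", **(headers or {})}
    with urlopen(Request(url, headers=merged), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def pick_fields(items: list[dict], fields: list[str]) -> list[dict]:
    """Keep only the fields you care about from each returned record."""
    return [{field: item.get(field) for field in fields} for item in items]


# Example call (hypothetical endpoint, needs network access):
# url = build_api_url("https://shop.example/api/products", {"page": 1})
# rows = pick_fields(fetch_json(url), ["name", "price"])
```

Changing the `page` parameter in a loop is often all it takes to walk through a full listing, assuming the API paginates that way.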

Best for

E-commerce sites (product listings, reviews, inventory)
Social media data (posts, followers, engagement metrics)
Real estate listings, job boards, flight prices
Any site where you want structured data, not messy HTML

Trigger → HTTP Request (hidden API) → Clean JSON!

How to find hidden APIs: Open DevTools (F12) → Network tab → filter by "Fetch/XHR" → browse the page → look for requests returning JSON. Right-click the request → "Copy as cURL" → paste into n8n's HTTP Request node using the "Import cURL" option.

Method 5 · The Gentleman's Way

RSS Feed Reader

The polite way to scrape. Many sites want you to take their data via RSS: it's structured, published on purpose, and the server won't get mad at you.

Easy

How it works

RSS feeds are structured XML feeds that websites publish on purpose. n8n has a built-in RSS Read node that parses these feeds automatically. Just paste the RSS URL and you get titles, descriptions, dates, and links — no CSS selectors, no JavaScript, no drama. Check if a site has RSS by adding /feed or /rss to the URL.
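To show how little work a feed takes compared with HTML, here is a short Python sketch that parses RSS 2.0 text with the standard library. It is not how the RSS Read node works internally; the function names are assumptions, and Atom feeds (which use different, namespaced tags) are not handled.

```python
import xml.etree.ElementTree as ET


def candidate_feed_urls(site_url: str) -> list[str]:
    """Common places to look for a feed, following the /feed and /rss tip."""
    base = site_url.rstrip("/")
    return [f"{base}/feed", f"{base}/rss"]


def parse_rss(xml_text: str) -> list[dict]:
    """Turn RSS 2.0 XML into a list of dicts, one per <item>."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            # findtext returns the default when the child tag is missing.
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            "published": item.findtext("pubDate", default="").strip(),
            "description": item.findtext("description", default="").strip(),
        })
    return items
```

Each dict carries the same four fields mentioned above (title, description, date, link), ready to filter or forward.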

Best for

News monitoring — track industry news from multiple sources
Blog aggregation — compile posts from competitor blogs
Job postings from career sites that offer RSS
Government announcements, tender notices, gazette updates

Schedule → RSS Read → Filter New → Send Alert

Combine with AI: Pipe RSS items into an AI summarization step (for example, n8n's Summarization Chain node) to get a daily digest of industry news in plain English, then send it to Slack or Telegram. Your personal news assistant!