AITraining2U · n8n Workshop

Web Scraping Techniques

5 ways to pull data from the internet using n8n — from "easy peasy" to "I feel like a hacker."

Method 1 · The Basics
HTTP Request Node
The simplest way to grab data. Just point it at a URL and it fetches whatever's there, like sending someone to tapau (buy takeaway) your lunch.
Easy

The HTTP Request node sends a request (GET by default) to any URL and returns the raw response: HTML, JSON, XML, whatever. Think of it as typing a URL into your browser, except n8n reads the page for you. Great for public APIs and simple, static pages.

  • Fetching JSON from public APIs (weather, currency rates, stock prices)
  • Downloading CSV/Excel files from a URL
  • Grabbing raw HTML from simple, static pages
Schedule Trigger → HTTP Request → Process Data
Pro tip: Set the response format to "JSON" if the API returns JSON. For HTML pages, use "String" and pipe it to the HTML Extract node (Method 2).
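Curious what the node is doing under the hood? Here's a rough Python sketch of the same two modes. It is plain standard-library Python, not n8n code. One mode fetches JSON and decodes it. The other fetches the raw body as text so you can parse the HTML later. The function names fetch_json and fetch_text are made up for illustration.

```python
import json
import urllib.request


def fetch_json(url: str, timeout: float = 10.0):
    """GET a URL and decode the body as JSON (like the node's "JSON" response format)."""
    # urllib sends a GET when no request body is given.
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Assumption: the API answers in UTF-8, as most JSON APIs do.
        return json.loads(resp.read().decode("utf-8"))


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """GET a URL and return the raw body as text (like the "String" format), e.g. HTML to parse later."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # errors="replace" keeps going if a page has a few badly encoded bytes.
        return resp.read().decode("utf-8", errors="replace")
```

For example, fetch_json("https://api.example.com/rates") would return a Python dict, assuming that hypothetical endpoint serves JSON.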
Method 2 · CSS Selectors
HTTP Request + HTML Extract
Fetch a page, then surgically extract the exact pieces you want using CSS selectors. Like going to a buffet but only taking the prawns.
Easy

First, the HTTP Request node grabs the full HTML of a page. Then the HTML Extract node uses CSS selectors (like .price, h1, table tr) to pull out specific elements. You get clean, structured data from messy HTML.

  • Scraping product prices from e-commerce sites
  • Extracting article titles and links from news pages
  • Pulling table data (like exchange rates) from bank websites
Trigger → HTTP Request → HTML Extract → Google Sheets
How to find CSS selectors: Right-click any element on a website → "Inspect" → look at its class or ID. A class becomes a selector with a leading dot and an ID gets a leading hash, so a price might be .product-price or #main-price.
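To see what a selector like .product-price actually does, here's a stripped-down Python sketch built on the standard library's html.parser. It only supports single-class selectors and assumes reasonably well-formed HTML. The HTML Extract node uses a proper selector engine, so treat this as an illustration, not a replacement. ClassTextExtractor and select_class are hypothetical names.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element whose class list contains class_name."""

    def __init__(self, class_name: str):
        super().__init__()
        self.class_name = class_name
        self.depth = 0      # > 0 while we are inside a matching element
        self.buffer = []    # text pieces of the current match
        self.results = []   # text of each finished match

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # a child element nested inside the match
        elif self.class_name in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.buffer = []

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tags like <br/> contain no text and open nothing

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.results.append("".join(self.buffer).strip())

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def select_class(html: str, selector: str) -> list:
    """Rough stand-in for an HTML Extract selector such as '.product-price'."""
    if not selector.startswith("."):
        raise ValueError("this sketch only understands '.class' selectors")
    parser = ClassTextExtractor(selector[1:])
    parser.feed(html)
    parser.close()
    return parser.results
```

Text inside nested tags (say, a <b> inside the price) is joined into the match. This is the same behaviour you'd expect when extracting an element's text.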
Method 3 · JavaScript Pages
Headless Browser (Puppeteer / Playwright)
For those annoying pages that load their content with JavaScript. This method runs a real browser in the background, so to the website it looks much like an ordinary visitor.
Advanced

Some websites load their data with JavaScript after the page arrives, so the HTTP Request node only receives the initial HTML shell without the data you want. A headless browser (like Puppeteer) actually runs that JavaScript, waits for the page to finish rendering, then extracts the data. It's like sending a robot to sit at a computer and browse for you.

  • Single-page apps (SPAs) built with React/Vue/Angular
  • Pages that require scrolling to load more content (infinite scroll)
  • Sites that need login or cookie acceptance before showing data
Trigger → Code Node → Puppeteer → Process & Store
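When to use this: If Method 2 returns empty results or a blank page, the site probably uses JavaScript rendering. That's your cue to switch to a headless browser. You'll need Puppeteer or Playwright installed on your n8n server. On a self-hosted instance, the Code node can only load external npm modules like these if they are allowed through the NODE_FUNCTION_ALLOW_EXTERNAL environment variable.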
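Not sure whether a page is JavaScript-rendered? Here's one rough heuristic, sketched in Python. If the static HTML lacks the text you expect, the page is probably a JavaScript shell when either of these is true:

  • It has an empty app container. IDs like root, app or __next are common in React, Vue and Next.js apps.
  • It has almost no visible text outside <script> tags.

The function name, the mount-point IDs and the 20-word cutoff are all assumptions, so tune them for the sites you scrape.

```python
import re

# Empty app containers typical of SPA frameworks (assumed list, not exhaustive).
EMPTY_MOUNT = re.compile(
    r"""<div[^>]*\bid=["'](?:root|app|__next)["'][^>]*>\s*</div>""",
    re.IGNORECASE,
)
SCRIPT_OR_STYLE = re.compile(r"<(script|style)\b.*?</\1>", re.IGNORECASE | re.DOTALL)
TAG = re.compile(r"<[^>]+>")


def probably_needs_browser(html: str, expected_text: str, min_words: int = 20) -> bool:
    """Guess whether static HTML is only a JavaScript shell (a heuristic, not a guarantee)."""
    if expected_text.lower() in html.lower():
        return False  # the data is already in the static HTML, so Method 2 should work
    if EMPTY_MOUNT.search(html):
        return True  # an empty app container waiting for JavaScript to fill it
    # Drop <script>/<style> blocks and all tags, then count the visible words left.
    visible = TAG.sub(" ", SCRIPT_OR_STYLE.sub("", html))
    return len(visible.split()) < min_words  # min_words=20 is an arbitrary cutoff
```

Feed it whatever the HTTP Request node returned, plus a piece of text you can see in your real browser. A True result is a hint to try a headless browser, not proof.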
Method 4 · The Smart Way
Hidden API Discovery
Why scrape the HTML when the website is already fetching clean JSON from its own API? Find the hidden API and use it directly. Work smarter, not harder.
Medium

Many modern websites fetch their data from a backend API. Open your browser's Network tab (F12 → Network → Fetch/XHR), browse the site normally, and watch the API calls fly by. Copy that API URL into n8n's HTTP Request node and boom, you get clean JSON without parsing any HTML. It's the cheat code of scraping.

  • E-commerce sites (product listings, reviews, inventory)
  • Social media data (posts, followers, engagement metrics)
  • Real estate listings, job boards, flight prices
  • Any site where you want structured data, not messy HTML
Trigger → HTTP Request (hidden API) → Clean JSON!
How to find hidden APIs: Open DevTools (F12) → Network tab → filter by "Fetch/XHR" → browse the page → look for requests returning JSON. Right-click the request → "Copy as cURL" → paste into n8n's HTTP Request node using the "Import cURL" option.
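Hidden APIs are often paginated, so a common next step is walking through the pages. Below is a Python sketch that replays a page-numbered endpoint until it runs out of results. Several details are hypothetical:

  • the page query parameter starting at 1
  • the {"items": [...]} response shape
  • the shop.example URL in the usage note

Copy the real parameter names (page, offset, cursor, whatever the site uses) from the request you captured in DevTools.

```python
import json
import urllib.request
from typing import Callable
from urllib.parse import urlencode


def get_json(url: str) -> dict:
    """Default fetcher: GET the URL and decode the JSON body."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def fetch_all_pages(base_url: str, params: dict,
                    fetch: Callable[[str], dict] = get_json,
                    max_pages: int = 50) -> list:
    """Collect items from a page-numbered JSON API until a page comes back empty.

    ASSUMPTIONS (hypothetical API): pages are chosen with a `page` query
    parameter starting at 1, and each response looks like {"items": [...]}.
    """
    items = []
    for page in range(1, max_pages + 1):  # hard cap so a misbehaving API can't loop forever
        url = f"{base_url}?{urlencode({**params, 'page': page})}"
        batch = fetch(url).get("items", [])
        if not batch:
            break  # an empty page means we've reached the end
        items.extend(batch)
    return items
```

For example, fetch_all_pages("https://shop.example/api/products", {"category": "kopi"}) would request ...?category=kopi&page=1, then page=2, and so on. The fetch parameter lets you swap in a fetcher that sends the extra headers you copied from the cURL command.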
Method 5 · The Gentleman's Way
RSS Feed Reader
The polite way to scrape. Many sites want you to take their data via RSS: it's structured, offered by the site on purpose, and the server won't get mad at you.
Easy

RSS feeds are structured XML documents that websites publish on purpose. n8n has a built-in RSS Read node that parses these feeds automatically. Just paste the feed URL and you get titles, descriptions, dates, and links: no CSS selectors, no JavaScript, no drama. To check whether a site has a feed, try adding /feed or /rss to its URL. You can also look in the page source for a link tag with type="application/rss+xml".

  • News monitoring — track industry news from multiple sources
  • Blog aggregation — compile posts from competitor blogs
  • Job postings from career sites that offer RSS
  • Government announcements, tender notices, gazette updates
Schedule → RSS Read → Filter New → Send Alert
Combine with AI: Pipe RSS items into an AI summarization node (such as n8n's Summarization Chain) to get a daily digest of industry news in plain English, then send it to Slack or Telegram. Your personal news assistant!
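Here's roughly what the RSS Read → Filter New steps boil down to, as a Python sketch using the standard library's xml.etree.ElementTree. It only handles RSS 2.0 <item> elements. Atom feeds use namespaced <entry> elements and would need extra handling. The seen_links set is an assumption: it stands in for wherever your workflow remembers past items between runs.

```python
import xml.etree.ElementTree as ET


def parse_rss(xml_text: str) -> list:
    """Pull title, link and pubDate from each <item> of an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "published": (item.findtext("pubDate") or "").strip(),  # empty if missing
        })
    return items


def filter_new(items: list, seen_links: set) -> list:
    """Keep only items whose link hasn't been seen, and remember them for next time."""
    fresh = [it for it in items if it["link"] not in seen_links]
    seen_links.update(it["link"] for it in fresh)
    return fresh
```

Run filter_new on every scheduled check. Only items you haven't alerted on before make it through to the Send Alert step.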