5 ways to pull data from the internet using n8n — from "easy peasy" to "I feel like a hacker."
The HTTP Request node sends a request (GET by default) to any URL and returns the response. JSON comes back already parsed into fields, while HTML or XML arrives as raw text. Think of it as typing a URL into your browser, except n8n reads the page for you. Great for public APIs and simple, static pages.
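For intuition, here is a rough Python sketch of the same idea. It is not n8n's actual implementation. It sends a GET, parses the body as JSON if the `Content-Type` says so, and keeps it as text otherwise. The helper names `fetch` and `parse_body` are illustrative assumptions, as is the browser-like `User-Agent` header.

```python
import json
import urllib.request


def parse_body(body: bytes, content_type: str):
    """Decode a response body: JSON becomes Python objects, anything else stays text."""
    # Use the charset from the Content-Type header if present, else assume UTF-8.
    charset = "utf-8"
    for part in content_type.split(";")[1:]:
        key, _, value = part.strip().partition("=")
        if key.lower() == "charset" and value:
            charset = value.strip('"')
    text = body.decode(charset, errors="replace")
    if "json" in content_type.lower():
        return json.loads(text)
    return text  # HTML, XML, plain text: returned raw for later parsing


def fetch(url: str, timeout: float = 10.0):
    """Send a plain GET request and return the decoded body."""
    # Assumption: a browser-like User-Agent; some sites reject Python's default one.
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_body(resp.read(), resp.headers.get("Content-Type", ""))
```

Here is what to expect from `fetch("https://api.example.com/items")`, where the URL is a placeholder. A JSON API would return a dict or list, and an HTML page would return one long string.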
First, the HTTP Request node grabs the full HTML of a page. Then the HTML node's Extract HTML Content operation (the older HTML Extract node) uses CSS selectors like .price, h1, or table tr to pull out specific elements. You get clean, structured data from messy HTML.
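The node does the selector matching for you, but a stripped-down Python version shows the idea. This sketch understands only a single class selector like `.price`. It has no support for IDs, tag names, or descendant selectors such as `table tr`. It also assumes reasonably well-formed HTML. `ClassExtractor` and `select_text` are hypothetical names, not an n8n or library API.

```python
from html.parser import HTMLParser

# Tags that never get a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassExtractor(HTMLParser):
    """Collect the text of every element carrying a given CSS class (sketch only)."""

    def __init__(self, class_name: str):
        super().__init__()
        self.class_name = class_name
        self.depth = 0      # > 0 while inside a matching element
        self.current = []   # text fragments of the element being read
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # a nested tag inside a match
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # Collapse whitespace so the result reads like the rendered text.
            self.results.append(" ".join("".join(self.current).split()))

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def select_text(html: str, selector: str) -> list:
    """Return the text of elements matching a '.class' selector."""
    if not selector.startswith("."):
        raise ValueError("this sketch only supports '.class' selectors")
    parser = ClassExtractor(selector[1:])
    parser.feed(html)
    parser.close()
    return parser.results
```

A real CSS engine handles far more selector types. The structure is the same, though: walk the parsed HTML, keep the elements that match, and read their text.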
Some websites load their data with JavaScript after the page opens. The HTTP Request node then receives only the page skeleton, without the content you see in your browser. A headless browser (like Puppeteer) actually runs the JavaScript, waits for the page to finish loading, then extracts the data. It's like sending a robot to sit at a computer and browse for you. n8n does not ship Puppeteer itself, so in practice this usually means a community node or an external browser service.
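Before reaching for a headless browser, it can help to confirm that the data really is rendered by JavaScript. One rough check, sketched in Python: take the raw HTML the HTTP Request node received, and test whether text you can see in the browser appears in it outside `<script>` tags. `needs_headless_browser` is a hypothetical helper and only a heuristic, not part of n8n.

```python
from html.parser import HTMLParser

# Elements whose contents are not shown as page text when JavaScript runs.
HIDDEN = ("script", "style", "noscript", "template")


class VisibleText(HTMLParser):
    """Gather text that would appear on the page, skipping script/style blocks."""

    def __init__(self):
        super().__init__()
        self.skip = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in HIDDEN:
            self.skip += 1

    def handle_endtag(self, tag):
        if tag in HIDDEN and self.skip:
            self.skip -= 1

    def handle_data(self, data):
        if not self.skip:
            self.chunks.append(data)


def needs_headless_browser(raw_html: str, expected_text: str) -> bool:
    """True if text visible in the browser is missing from the raw HTML,
    which suggests JavaScript renders it after the page loads."""
    parser = VisibleText()
    parser.feed(raw_html)
    parser.close()
    visible = " ".join(" ".join(parser.chunks).split())
    return " ".join(expected_text.split()) not in visible
```

If this returns `True` for the data you want, a plain HTTP Request won't be enough. The page probably needs a headless browser, or the hidden API approach below.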
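Most modern websites fetch their data from a backend API. Open your browser's developer tools (F12 → Network, filtered to Fetch/XHR), browse the site normally, and watch the API calls fly by. Copy that API URL into n8n's HTTP Request node and, boom, you get clean JSON without parsing any HTML. If the call only works with certain headers or cookies, right-click it, choose Copy as cURL, and paste the result into the HTTP Request node's Import cURL dialog. It's the cheat code of scraping.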
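The same idea in plain Python, as a sketch: replay the request you spotted in the Network tab, then flatten the JSON list into one record per row. The endpoint URL, headers, and field names are hypothetical stand-ins for whatever you copy from DevTools. `build_api_request`, `fetch_api`, and `to_rows` are illustrative helpers.

```python
import json
import urllib.request


def build_api_request(url: str, copied_headers: dict) -> urllib.request.Request:
    """Recreate a request seen in DevTools, reusing the headers the site sent."""
    return urllib.request.Request(url, headers=copied_headers, method="GET")


def fetch_api(req: urllib.request.Request, timeout: float = 10.0):
    """Send the request and decode the JSON body (assumes a UTF-8 JSON response)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def to_rows(payload: dict, list_key: str, fields: list) -> list:
    """Keep only the chosen fields of each record in payload[list_key],
    similar in spirit to splitting a JSON array into separate n8n items."""
    return [{f: record.get(f) for f in fields} for record in payload.get(list_key, [])]


# Hypothetical endpoint and headers, standing in for values copied from DevTools.
req = build_api_request(
    "https://shop.example.com/api/products?page=1",
    {"Accept": "application/json", "X-Requested-With": "XMLHttpRequest"},
)
```

In practice you would pass `fetch_api(req)` to `to_rows`. Expect these internal endpoints to change without notice, since sites don't document them for outside use.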
RSS feeds are structured XML feeds that websites publish on purpose. n8n's built-in RSS Read node parses them automatically. Just paste the feed URL and you get titles, descriptions, dates, and links. No CSS selectors, no JavaScript, no drama. To check whether a site has a feed, try adding /feed or /rss to its URL. You can also look in the page source for a <link> tag with type="application/rss+xml".
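Parsing a feed is simple under the hood because the structure is fixed. Here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`, assuming an RSS 2.0 feed. Atom feeds use namespaced `<entry>` elements and would need different lookups. `parse_rss` and `candidate_feed_urls` are illustrative helpers, not part of n8n.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin


def candidate_feed_urls(site: str) -> list:
    """Common feed locations to try; plenty of sites use other paths or none."""
    base = site.rstrip("/") + "/"
    return [urljoin(base, "feed"), urljoin(base, "rss")]


def parse_rss(xml_text: str) -> list:
    """Turn an RSS 2.0 document into a list of dicts, one per <item>.
    Assumes RSS 2.0 element names; missing fields become empty strings."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
            "pubDate": item.findtext("pubDate", default=""),
        })
    return items
```

Because every RSS 2.0 item uses the same element names, the parser never needs selectors. That is the same reason the RSS Read node can work from just a URL.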