Articles

How to Power-Up LLMs with Web Scraping and RAG

In depth look at how to use LLM and web scraping for RAG applications using either LlamaIndex or LangChain.

Web Scraping With Cloud Browsers

Introduction cloud browsers and their benefits and a step-by-step setup with self-hosted Selenium-grid cloud browsers.

How to Scrape Forms

Learn how to scrape forms through a step-by-step guide using HTTP clients and headless browsers.

How to Build a Minimum Advertised Price (MAP) Monitoring Tool

Learn what minimum advertised price monitoring is and how to apply its concept using Python web scraping.

How to Scrape Reddit Posts, Subreddits and Profiles

In this article, we'll explore how to scrape Reddit. We'll extract various social data types from subreddits, posts, and user pages. All of which through plain HTTP requests without headless browser usage.

How to Scrape With Headless Firefox

Discover how to use headless Firefox with Selenium, Playwright, and Puppeteer for web scraping, including practical examples for each library.

How to Use Tor For Web Scraping

In this article, we'll explain web scraping using Tor. For this, we'll use Tor as a proxy server to change the IP address randomly in either HTTP or SOCKS, as well as using it as a rotating proxy server.

How to Know What Anti-Bot Service a Website is Using?

In this article we'll take a look at two popular tools: WhatWaf and Wafw00f which can identify what WAF service is used.

How to Scrape LinkedIn in 2024

In this scrape guide we'll be taking a look at one of the most popular web scraping targets - LinkedIn.com. We'll be scraping people profiles, company profiles as well as job listings and search.