# How to Scrape LinkedIn Profiles, Companies, and Jobs in 2026

 by [Mazen Ramadan](https://scrapfly.io/blog/author/mazen) Jun 25, 2026 28 min read [\#python](https://scrapfly.io/blog/tag/python) [\#scrapeguide](https://scrapfly.io/blog/tag/scrapeguide) 


 

 

   

Ever wondered why your LinkedIn scraper gets stopped dead, even when you're only reading public data? You view a few profiles, then hit a login wall, while your regular browser keeps browsing LinkedIn fine.

That's no accident. LinkedIn's public data is easy to reach but hard to reach at scale: fingerprinting, behavior detection, and fraud scoring flag bots before IP blocks fire. This guide has Python code to pull profiles, companies, jobs, and search anyway.

[Guide to LinkedIn API and Alternatives](https://scrapfly.io/blog/posts/guide-to-linkedin-api-and-alternatives): Explore the LinkedIn API, covering data endpoints, usage limitations, and accessibility.

[**Latest LinkedIn Scraper Code**: github.com/scrapfly/scrapfly-scrapers/linkedin-scraper](https://github.com/scrapfly/scrapfly-scrapers/tree/main/linkedin-scraper)



## Key Takeaways

- **Three layers, one fraud score:** auth wall, behavior tracking, and fingerprinting.
- **No login needed:** profiles, companies, jobs, and search are all public and scrapable.
- **Read the script tags:** profiles and companies ship data in `application/ld+json` blocks.
- **Jobs come from a hidden API:** `seeMoreJobPostings` pages 25 results at a time via `start`.
- **Jobs are the easy win:** Google indexes the jobs surface, so its detection bar is lower.
- **Scale brings blocks back:** rotate residential proxies, randomize TLS, act human.








## Why Scrape LinkedIn?

Consider what makes LinkedIn data valuable. Open LinkedIn profiles and company data fuel smarter business strategies and give organizations a real edge.

Here’s what you can achieve by accessing professional data effectively:

**Recruitment Intelligence:** Gain insights into job market trends and salary benchmarks, and analyze skill demand.

**Sales Prospecting:** Identify decision-makers, understand company org charts, and enrich contact details.

**Market Research:** Discover industry growth patterns, track competitor hiring, and study talent migration.

**Competitive Analysis:** Spot company expansion signals, examine team structures, and monitor strategic moves.

LinkedIn holds professional data that's publicly visible but defensively protected. The challenge isn't accessing data, it's accessing it at scale without detection. Below we break down the specific defenses you’ll hit when you try to scale a scraper.

## Why LinkedIn Scraping Fails Without Proper Tools

LinkedIn runs an advanced security system to stop web scrapers from accessing its public data. It quickly puts up an authentication wall, requiring visitors to log in after only a few profile views.

Beyond the login wall, LinkedIn tracks behavior and fingerprints requests. It weighs factors like IP address quality and origin, browser-specific headers and cookies, and device attributes, then combines these signals into a fraud score for each visitor.

Understanding these constraints gives context for the hands-on strategies and technical workarounds covered in the following sections.

### 1. Authentication Wall

LinkedIn aggressively limits access to its vast trove of data unless users are authenticated. Most profile, company, and job information is locked behind a login barrier after just a few page views.

Anonymous visitors are hit with sign-in prompts and blocked from seeing further details. This strict authentication wall is LinkedIn's primary line of defense against unauthorized scraping. Here are some common examples:

- **Profile views:** After viewing just 3–5 profiles, LinkedIn requires you to log in.
- **Company data:** Much of it is hidden unless you’re logged in.
- **Search:** Anonymous visitors are blocked from people search, although job search stays public.

**About Logging In:** Scraping while logged into an account might seem like an easy workaround. However, this goes against LinkedIn’s Terms of Service and can quickly lead to your accounts being permanently banned.

### 2. Behavioral Tracking

LinkedIn watches how people behave on the site and compares that to each incoming request. Scrapers usually stand out because their behavior looks different from a real user. Common differences include:

- **Request timing:** Human users don't view 100 profiles per minute; bots do.
- **Navigation patterns:** Real users click buttons, scroll down pages, and pause at irregular intervals; bots don't.
- **Mouse movements:** Real users produce natural mouse activity, such as highlighting text or hovering over elements. Bots produce none.
- **Referrer and navigation flow:** Real users arrive through internal links, such as search pages. Bots request the URL directly, without a typical navigation trail.

If your scraper does not mimic these behaviors, LinkedIn will notice. To avoid detection you need realistic timing, navigation, and interaction patterns. See our headless browser guide, which shows how to add natural clicks, scrolls, and pauses.

[How to Scrape Dynamic Websites Using Headless Web Browsers](https://scrapfly.io/blog/posts/scraping-using-browsers): Introduction to using web automation tools such as Puppeteer, Playwright, Selenium and ScrapFly to render dynamic websites for web scraping.
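As a rough illustration of the "realistic timing" point above, the sketch below spaces requests with irregular pauses instead of a fixed interval. The function names and the default base and jitter values are assumptions chosen for the example, not thresholds published by LinkedIn.

```python
import random
import time
from typing import Iterable, Iterator


def human_delay(base: float = 4.0, jitter: float = 3.0) -> float:
    """Return a randomized pause in seconds: base plus or minus up to `jitter`."""
    # never go below half a second, even with a large negative jitter
    return max(0.5, base + random.uniform(-jitter, jitter))


def paced(urls: Iterable[str], base: float = 4.0, jitter: float = 3.0) -> Iterator[str]:
    """Yield URLs one at a time, sleeping an irregular interval between them."""
    for index, url in enumerate(urls):
        if index:  # no pause before the first URL
            time.sleep(human_delay(base, jitter))
        yield url
```

Iterating with `for url in paced(profile_urls): ...` keeps the request rhythm uneven. Timing alone is only one of the signals listed above, so treat this as one piece of a larger setup.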

### 3. Request Fingerprinting

Beyond behavior, LinkedIn checks the technical fingerprint of each request. This includes several signals that are hard for simple scripts to fake:

- **IP address quality:** Whether the requesting IP address is residential, coming from a real home internet provider, or a datacenter address associated with a hosting network.
- **TLS analysis:** Every HTTPS request begins with a TLS handshake. The handshake parameters can be condensed into a fingerprint such as JA3, which LinkedIn compares with the fingerprints of normal browsers.
- **Headers and cookies:** Whether the request carries standard browser headers and the cookies a normal user would have. Such cookie chains are usually built up through natural navigation of the web app.
- **Device fingerprints:** Whether the browser's attributes match those of a real browser. These include browser metadata, device hardware capabilities, operating system and version, and other specifications that automated clients often fail to reproduce.

**Fraud Scoring Logic**

LinkedIn combines all the signals above to calculate a **fraud score** for each incoming client. This score is then compared against the average patterns of real users. Based on that comparison, LinkedIn decides whether to:

- **Approve** the request as legitimate and allow it through.
- **Flag or block** the request if it appears automated or suspicious.

In short, every request to LinkedIn is evaluated for authenticity before any data is served.

If your score looks different from typical users, you will be forced to log in, or blocked. For technical deep dives and defensive techniques, check our guide on browser fingerprint impersonation.

[Bypass Proxy Detection with Browser Fingerprint Impersonation](https://scrapfly.io/blog/posts/bypass-proxy-detection-with-browser-fingerprint-impersonation): Stop proxy blocks with browser fingerprint impersonation using this guide for Playwright, Selenium, curl-impersonate & Scrapfly.

## How LinkedIn Data is Loaded and How to Scrape it?

LinkedIn is a single-page application that loads its data through a few different methods, and our LinkedIn scraper uses a matching approach for each. Let's briefly explain how each one works under the hood, how to inspect it in the browser, and how to extract it in code.

### Rendering an HTML Template

Some LinkedIn pages are rendered on the server, so the data you see in the browser is already embedded in the HTML returned by the first response.

To scrape data from a page like this on LinkedIn, you only need to mimic what a browser does, then extract the information you care about. Here’s the general flow:

1. **Send a request** to the page URL and wait for a successful response.
2. **Read the response body**, which contains the full HTML of the page.
3. **Parse the HTML** and extract the fields you want using **CSS selectors** or **XPath**.

This approach is straightforward because the data is already there in the HTML so no need to load extra scripts or wait for background requests.
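The parsing step above can be sketched with the standard library alone. This is a minimal illustration on hypothetical, well-formed markup; the tag names and classes are invented for the example, and real pages are rarely valid XML, which is why the scrapers in this guide rely on a dedicated HTML selector library instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; real LinkedIn pages use different tags and classes.
SAMPLE_HTML = """
<html>
  <body>
    <h1 class="top-card__title">Jane Doe</h1>
    <a class="company-link" href="https://example.com/company/acme?trk=public">Acme</a>
  </body>
</html>
"""


def parse_sample(html: str) -> dict:
    """Extract a few fields from well-formed markup with XPath-style queries."""
    root = ET.fromstring(html.strip())
    link = root.find(".//a[@class='company-link']")
    return {
        "name": root.findtext(".//h1[@class='top-card__title']"),
        "company": link.text,
        # drop tracking query parameters from the link
        "companyUrl": link.get("href").split("?")[0],
    }
```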

### Hydrating the Page Using Script Tags

Many modern sites ship data inside `<script>` tags instead of visible HTML elements, which is common in apps using **server-side** or **client-side rendering**. The script tag holds the data in JSON, which the browser uses to build the visible content.



To scrape LinkedIn data from pages like this, you can:

1. **Request the page URL** and get the full HTML response.
2. **Parse the HTML** and find the `<script>` tag that contains the JSON data.
3. **Extract and load** that JSON into an object for further processing.

This technique is often called **hidden data scraping** because the useful information is buried inside script tags that most users and basic scrapers ignore.
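To make the three steps above concrete, here is a small standard-library sketch that collects every `application/ld+json` script tag from an HTML string. The class name, helper name, and sample snippet are assumptions made up for the illustration, not LinkedIn output.

```python
import json
from html.parser import HTMLParser
from typing import List


class LdJsonExtractor(HTMLParser):
    """Collect the parsed contents of every application/ld+json script tag."""

    def __init__(self) -> None:
        super().__init__()
        self._in_ld_json = False
        self._buffer: List[str] = []
        self.documents: List[dict] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_ld_json = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_ld_json:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_ld_json:
            self._in_ld_json = False
            self.documents.append(json.loads("".join(self._buffer)))


def extract_ld_json(html: str) -> List[dict]:
    """Return all JSON documents found in ld+json script tags."""
    parser = LdJsonExtractor()
    parser.feed(html)
    return parser.documents


# Hypothetical page snippet, not real LinkedIn output
SAMPLE = (
    '<html><head><script type="application/ld+json">'
    '{"@type": "Organization", "name": "Acme"}'
    "</script></head></html>"
)
```

The scrapers later in this guide do the same job with a single XPath query; the parser above only spells out what that query does.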

### Loading The Data from XHR Calls

Modern websites don’t always load everything in one go. Instead, they often use small background requests called **XHR calls** or **fetch requests** to get extra data as you browse.

For example, when you open a LinkedIn search page, the site first loads the layout, then sends hidden requests to get the actual profile or job results in JSON format. Once the data arrives, the page updates automatically.

To collect this kind of data, you can do the following:

1. **Start a headless browser**
2. **Open the target page** and watch its network activity
3. **Wait for or trigger** the XHR
4. **Capture and read** the response from the browser’s network logs

These XHR calls are often called hidden APIs because they work behind the scenes. In web scraping, getting data from them is known as hidden API scraping, since you’re using the same background requests that a browser already makes.
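When you are still exploring a page by hand, one way to review its background calls is to export the network tab as a HAR file and filter it for JSON responses. The sketch below assumes the standard HAR layout (`log.entries[].request.url` and `response.content`); the helper name and sample capture are hypothetical, and response bodies stored base64-encoded are not handled.

```python
import json
from typing import Dict, List


def find_json_calls(har: Dict, url_contains: str = "") -> List[Dict]:
    """Return JSON responses recorded in a HAR capture, optionally filtered by URL."""
    matches = []
    for entry in har.get("log", {}).get("entries", []):
        url = entry["request"]["url"]
        content = entry["response"].get("content", {})
        if "json" not in content.get("mimeType", ""):
            continue  # skip HTML, images, scripts, and other non-JSON responses
        if url_contains not in url:
            continue
        matches.append({"url": url, "body": json.loads(content.get("text") or "null")})
    return matches


# Hypothetical capture with one page load and one background API call
SAMPLE_HAR = {
    "log": {
        "entries": [
            {
                "request": {"url": "https://example.com/page"},
                "response": {"content": {"mimeType": "text/html", "text": "<html></html>"}},
            },
            {
                "request": {"url": "https://example.com/api/items?start=25"},
                "response": {"content": {"mimeType": "application/json", "text": '{"items": [1, 2]}'}},
            },
        ]
    }
}
```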

## Scraping LinkedIn with Scrapfly

Managing LinkedIn scraping in-house is a real overhead. That's why we built the Scrapfly LinkedIn scraper, an open-source tool ready to run. It uses Scrapfly's web scraping API, which handles the hard parts for you.



Scrapfly provides [web scraping](https://scrapfly.io/docs/scrape-api/getting-started), [screenshot](https://scrapfly.io/docs/screenshot-api/getting-started), and [extraction](https://scrapfly.io/docs/extraction-api/getting-started) APIs for data collection at scale.

- [Anti-bot protection bypass](https://scrapfly.io/docs/scrape-api/anti-scraping-protection) - scrape web pages without blocking.
- [Rotating residential proxies](https://scrapfly.io/docs/scrape-api/proxy) - prevent IP address and geographic blocks.
- [JavaScript rendering](https://scrapfly.io/docs/scrape-api/javascript-rendering) - scrape JavaScript-heavy pages through cloud browsers.
- [Full browser automation](https://scrapfly.io/docs/scrape-api/javascript-scenario) - control browsers to scroll, input, and click.
- [Format conversion](https://scrapfly.io/docs/scrape-api/getting-started#api_param_format) - scrape as HTML, JSON, Text, or Markdown.
- [Python](https://scrapfly.io/docs/sdk/python) and [Typescript](https://scrapfly.io/docs/sdk/typescript) SDKs, plus [Scrapy](https://scrapfly.io/docs/sdk/scrapy) and [no-code integrations](https://scrapfly.io/docs/integration/getting-started).

For LinkedIn, you point one request at a profile, company, or job URL and let Scrapfly handle the login wall, fingerprint checks, and proxy rotation.




### What LinkedIn Data Can You Scrape?

Web scraping means collecting raw web data, avoiding detection, and turning it into clean datasets. Our **LinkedIn scraper** runs this whole process for you, from fetching to parsing, and returns the output as **JSON** you can use right away.

With Scrapfly, you can collect different types of LinkedIn data such as:

- **Profiles:** Name, headline, current position, work history, education, skills, number of connections (if visible), and profile picture URL
- **Companies:** Name, industry, size, headquarters, specialties, follower count, recent posts, and related pages
- **Jobs:** Title, location, description, number of applicants, posted date, seniority level, employment type, and required skills
- **Search results:** People search (name, headline, location) and job search (title, company, location, posted date)

Scraping LinkedIn data with Scrapfly is simple and efficient. Here is an example:

```python
from scrapfly import ScrapflyClient, ScrapeConfig

client = ScrapflyClient(key="YOUR_API_KEY")
result = client.scrape(ScrapeConfig(
    "https://linkedin.com/in/username",
    asp=True,        # Enable anti-scraping protection
    render_js=True,  # Handle dynamic content
    country="US"
))
```



This setup takes care of the complex parts such as JavaScript rendering, location targeting, and anti-bot protection, so you can focus on analyzing the data instead of fighting detection systems.

## Scraping LinkedIn Code Examples

So far, we have covered all the key details needed to scrape LinkedIn data. Now it is time to move on to some practical examples. In the following sections, we will show ready-to-use Python code snippets for extracting data from different parts of LinkedIn.

These examples use **Scrapfly's Python SDK**, which takes care of the complex parts for you.

If you prefer libraries like [HTTPX](https://pypi.org/project/httpx/) or [requests](https://pypi.org/project/requests/) directly, you manage proxy rotation, fingerprint randomization, and session handling yourself. Without those, your scraper gets detected after a few requests.
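To give a sense of what managing rotation yourself involves, here is a minimal round-robin sketch. The helper name is hypothetical, and the proxy URLs and User-Agent strings you pass in are your own placeholders. Rotating headers does not change the TLS fingerprint, so this covers only part of the problem.

```python
from itertools import cycle
from typing import Dict, Iterator, List


def rotating_settings(proxies: List[str], user_agents: List[str]) -> Iterator[Dict]:
    """Yield per-request settings, cycling through proxies and User-Agent strings."""
    proxy_cycle, agent_cycle = cycle(proxies), cycle(user_agents)
    while True:
        yield {
            "proxy": next(proxy_cycle),
            "headers": {
                "User-Agent": next(agent_cycle),
                "Accept-Language": "en-US,en;q=0.5",
            },
        }
```

Each `next()` call on the generator returns the settings for one request, which you would then pass to your HTTP client of choice.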

To follow along with the examples, install the Scrapfly SDK with this command:

```bash
$ pip install scrapfly-sdk
```



### Scraping LinkedIn Profile Data

In this section, we'll extract publicly available data from LinkedIn user profiles. If we take a look at one of the public LinkedIn profiles (like the one for [Bill Gates](https://www.linkedin.com/in/williamhgates)) we can see loads of valuable public data:



(Image: LinkedIn public profile page)

Before we start scraping LinkedIn profiles, let's identify the HTML parsing approach. We can manually parse each data point from the HTML or **extract data from hidden script tags**.

[How to Scrape Hidden Web Data](https://scrapfly.io/blog/posts/how-to-scrape-hidden-web-data): The visible HTML doesn't always represent the whole dataset available on the page. In this article, we'll be taking a look at scraping of hidden web data. What is it and how can we scrape it using Python?

To locate this hidden data, we can follow these steps:

- Open the [browser developer tools](https://scrapfly.io/blog/answers/browser-developer-tools-in-web-scraping) by pressing the `F12` key.
- Search for the selector: `//script[@type='application/ld+json']`.

This will lead to a `script` tag with the following details:



This gets us the core details available on the page, though a few fields like the job title are missing, as the page is viewed publicly. To scrape it, we'll extract the script and parse it:

```python
import asyncio
import json
from typing import Dict, List
from parsel import Selector
from loguru import logger as log
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

SCRAPFLY = ScrapflyClient(key="Your Scrapfly API key")

BASE_CONFIG = {
    # bypass linkedin.com web scraping blocking
    "asp": True,
    # set the proxy country to US
    "country": "US",
    "headers": {
        "Accept-Language": "en-US,en;q=0.5"
    }
}

def refine_profile(data: Dict) -> Dict: 
    """refine and clean the parsed profile data"""
    parsed_data = {}
    profile_data = [key for key in data["@graph"] if key["@type"]=="Person"][0]
    profile_data["worksFor"] = [profile_data["worksFor"][0]]
    articles = [key for key in data["@graph"] if key["@type"]=="Article"]
    for article in articles:
        selector = Selector(article["articleBody"])
        article["articleBody"] = "".join(selector.xpath("//p/text()").getall())
    parsed_data["profile"] = profile_data
    parsed_data["posts"] = articles
    return parsed_data


def parse_profile(response: ScrapeApiResponse) -> Dict:
    """parse profile data from hidden script tags"""
    selector = response.selector
    data = json.loads(selector.xpath("//script[@type='application/ld+json']/text()").get())
    refined_data = refine_profile(data)
    return refined_data


async def scrape_profile(urls: List[str]) -> List[Dict]:
    """scrape public linkedin profile pages"""
    to_scrape = [ScrapeConfig(url, **BASE_CONFIG) for url in urls]
    data = []
    # scrape the URLs concurrently
    async for response in SCRAPFLY.concurrent_scrape(to_scrape):
        profile_data = parse_profile(response)
        data.append(profile_data)
    log.success(f"scraped {len(data)} profiles from Linkedin")
    return data
```



Run the code:

```python
async def run():
    profile_data = await scrape_profile(
        urls=[
            "https://www.linkedin.com/in/williamhgates"
        ]
    )
    # save the data to a JSON file
    with open("profile.json", "w", encoding="utf-8") as file:
        json.dump(profile_data, file, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    asyncio.run(run())
```



In the above LinkedIn profile scraper, we define three functions. Let's break them down:

- `scrape_profile()`: To request LinkedIn profile URLs concurrently and use the parsing logic to extract each profile's data.
- `parse_profile()`: To parse the `script` tag containing the profile data.
- `refine_profile()`: To refine and organize the extracted data.
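The node selection inside `refine_profile()` can be illustrated in isolation. The sketch below assumes the `@graph` layout used by the scraper above; the helper names and the sample document are hypothetical, and the sample is far smaller than a real profile.

```python
from typing import Dict, List, Optional


def nodes_of_type(data: Dict, node_type: str) -> List[Dict]:
    """Return every node in the JSON-LD `@graph` list with the given `@type`."""
    return [node for node in data.get("@graph", []) if node.get("@type") == node_type]


def first_person(data: Dict) -> Optional[Dict]:
    """Return the first Person node, or None when the page has none."""
    people = nodes_of_type(data, "Person")
    return people[0] if people else None


# Hypothetical, heavily trimmed document
SAMPLE = {
    "@graph": [
        {"@type": "Person", "name": "Jane Doe", "worksFor": []},
        {"@type": "Article", "headline": "Hello"},
    ]
}
```

Returning `None` instead of indexing blindly lets the caller skip pages where the expected node is absent, for example when a login wall was served instead of the profile.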

Here's a sample output of the LinkedIn profile data retrieved



With this LinkedIn lead scraper, we gather details on prospective leads: their job titles, companies, industries, and contact info. That data supports more personalized, strategic outreach.

For the full JSON output, see the [example dataset in Scrapfly's LinkedIn scraper repository](https://github.com/scrapfly/scrapfly-scrapers/blob/main/linkedin-scraper/results/profile.json).

Next, let's explore how to scrape **company data**!



### Scraping LinkedIn Company Data

LinkedIn company profiles include valuable data points like the company's industry, addresses, number of employees, jobs, and related companies. Company profiles are also public, meaning that we can scrape their full details!

Let's start by taking a look at a company profile page on LinkedIn such as [Microsoft](https://www.linkedin.com/company/microsoft):



(Image: LinkedIn company page)

Just like with people pages, the LinkedIn company page data can also be found in hidden `script` tags:



From the above image, we can see that the `script` tag doesn't contain the full company details. So to extract the entire company dataset we'll use a bit of HTML parsing as well:

```python
import asyncio
import json
import jmespath
from typing import Dict, List
from loguru import logger as log
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

SCRAPFLY = ScrapflyClient(key="Your ScrapFly API key")

BASE_CONFIG = {
    "asp": True,
    "country": "US",
    "headers": {
        "Accept-Language": "en-US,en;q=0.5"
    }
}

def strip_text(text):
    """remove extra spaces while handling None values"""
    return text.strip() if text is not None else text


def parse_company(response: ScrapeApiResponse) -> Dict:
    """parse company main overview page"""
    selector = response.selector
    script_data = json.loads(selector.xpath("//script[@type='application/ld+json']/text()").get())
    script_data = jmespath.search(
        """{
        name: name,
        url: url,
        mainAddress: address,
        description: description,
        numberOfEmployees: numberOfEmployees.value,
        logo: logo
        }""",
        script_data
    )
    data = {}
    for element in selector.xpath("//div[contains(@data-test-id, 'about-us')]"):
        name = element.xpath(".//dt/text()").get().strip()
        value = element.xpath(".//dd/text()").get().strip()
        data[name] = value
    addresses = []
    for element in selector.xpath("//div[contains(@id, 'address') and @id != 'address-0']"):
        address_lines = element.xpath(".//p/text()").getall()
        address = ", ".join(line.replace("\n", "").strip() for line in address_lines)
        addresses.append(address)
    affiliated_pages = []
    for element in selector.xpath("//section[@data-test-id='affiliated-pages']/div/div/ul/li"):
        affiliated_pages.append({
            "name": element.xpath(".//a/div/h3/text()").get().strip(),
            "industry": strip_text(element.xpath(".//a/div/p[1]/text()").get()),
            "address": strip_text(element.xpath(".//a/div/p[2]/text()").get()),
            "linkedinUrl": element.xpath(".//a/@href").get().split("?")[0]
        })
    similar_pages = []
    for element in selector.xpath("//section[@data-test-id='similar-pages']/div/div/ul/li"):
        similar_pages.append({
            "name": element.xpath(".//a/div/h3/text()").get().strip(),
            "industry": strip_text(element.xpath(".//a/div/p[1]/text()").get()),
            "address": strip_text(element.xpath(".//a/div/p[2]/text()").get()),
            "linkedinUrl": element.xpath(".//a/@href").get().split("?")[0]
        })
    data = {**script_data, **data}
    data["addresses"] = addresses    
    data["affiliatedPages"] = affiliated_pages
    data["similarPages"] = similar_pages
    return data


async def scrape_company(urls: List[str]) -> List[Dict]:
    """scrape public linkedin company pages"""
    to_scrape = [ScrapeConfig(url, **BASE_CONFIG) for url in urls]
    data = []
    async for response in SCRAPFLY.concurrent_scrape(to_scrape):
        data.append(parse_company(response))
    log.success(f"scraped {len(data)} companies from Linkedin")
    return data
```



Run the code:

```python
async def run():
    company_data = await scrape_company(
        urls=[
            "https://linkedin.com/company/microsoft"
        ]
    )
    # save the data to a JSON file
    with open("company.json", "w", encoding="utf-8") as file:
        json.dump(company_data, file, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    asyncio.run(run())
```



In the above LinkedIn scraping code, we define two functions. Let's break them down:

- `parse_company()`: To parse the company data from `script` tags, using [JMESPath](https://scrapfly.io/blog/posts/parse-json-jmespath-python) to refine it, and to parse the remaining HTML elements with XPath selectors.
- `scrape_company()`: To request the company page URLs while utilizing the parsing logic.

Here's a sample output of the extracted company information:



For the full JSON output, see the [example dataset in Scrapfly's LinkedIn scraper repository](https://github.com/scrapfly/scrapfly-scrapers/blob/main/linkedin-scraper/results/company.json).

The above data covers the company overview. Next, let's move on to LinkedIn's job data and the scrapers built for it.



## Scraping LinkedIn Jobs

You scrape [LinkedIn jobs data](https://scrapfly.io/use-case/jobs-web-scraping) from three public surfaces: search results, job pages, and a company's listings. Search and company lists use the hidden `seeMoreJobPostings` API; job pages expose `application/ld+json` data.

### Scraping LinkedIn Job Search Results

LinkedIn's job search covers millions of listings across industries and locations. You build a search by adding keyword and location parameters to the search URL, like this:

```shell
https://www.linkedin.com/jobs/search?keywords=python+developer&location=United+States
```



The URL above uses basic filters. It also accepts extra parameters to narrow results, such as date, experience level, or city.

LinkedIn returns the first batch of results with the page, then pulls more through a hidden API as you scroll. We'll request the first page to read the total result count, then page through the rest with that API.

[How to Scrape Hidden APIs](https://scrapfly.io/blog/posts/how-to-scrape-hidden-apis): In this tutorial we'll be taking a look at scraping hidden APIs which are becoming more and more common in modern dynamic websites - what's the best way to scrape them?

To find this hidden API, open your browser developer tools, select the network tab, filter by `Fetch/XHR`, and scroll the page. You'll see calls to the `seeMoreJobPostings/search` endpoint, paged with the `start` query parameter:

```shell
https://www.linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search?keywords=python+developer&location=United+States&start=25
```
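As a quick illustration of how `start` advances, the sketch below builds the pagination URLs for a given result count. It assumes 25 results per page, as noted in the takeaways; the helper name `pagination_urls` is made up for the example.

```python
from typing import List
from urllib.parse import urlencode

API_URL = "https://www.linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search"
PAGE_SIZE = 25  # results per page, assumed from the endpoint's paging behavior


def pagination_urls(keyword: str, location: str, total_results: int) -> List[str]:
    """Build the hidden API URLs for every page after the first one."""
    urls = []
    # the first page covers start=0, so paging begins at start=25
    for start in range(PAGE_SIZE, total_results, PAGE_SIZE):
        query = urlencode({"keywords": keyword, "location": location, "start": start})
        urls.append(f"{API_URL}?{query}")
    return urls
```

For 60 results this produces two URLs, with `start=25` and `start=50`. `urlencode()` escapes the spaces on its own, so the keyword is passed in unencoded.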



Here is the job search scraper:

```python
import json
import asyncio
from typing import Dict, List
from loguru import logger as log
from urllib.parse import urlencode, quote_plus
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

SCRAPFLY = ScrapflyClient(key="Your Scrapfly API key")

BASE_CONFIG = {
    "asp": True,
    "country": "US",
    "headers": {
        "Accept-Language": "en-US,en;q=0.5"
    },
    "render_js": True,
    "proxy_pool": "public_residential_pool"
}

def parse_job_search(response: ScrapeApiResponse) -> Dict:
    """parse job data from job search pages"""
    selector = response.selector
    total_results = selector.xpath("//span[contains(@class, 'job-count')]/text()").get()
    total_results = int(total_results.replace(",", "").replace("+", "")) if total_results else None
    data = []
    search_elements = selector.xpath("//section[contains(@class, 'results-list')]/ul/li")
    if len(search_elements) == 0:  # pagination pages have a different structure
        search_elements = selector.xpath("//li")
    for element in search_elements:
        data.append({
            "title": element.xpath(".//div/a/span/text()").get().strip(),
            "company": element.xpath(".//div/div[contains(@class, 'info')]/h4/a/text()").get().strip(),
            "address": element.xpath(".//div/div[contains(@class, 'info')]/div/span/text()").get().strip(),
            "timeAdded": element.xpath(".//div/div[contains(@class, 'info')]/div/time/@datetime").get(),
            "jobUrl": element.xpath(".//div/a/@href").get().split("?")[0],
            "companyUrl": element.xpath(".//div/div[contains(@class, 'info')]/h4/a/@href").get().split("?")[0],
        })
    return {"data": data, "total_results": total_results}

async def scrape_job_search(keyword: str, location: str, max_pages: int = None) -> List[Dict]:
    """scrape Linkedin job search"""

    def form_urls_params(keyword, location):
        """form the job search URL params"""
        params = {
            "keywords": keyword,  # urlencode() already escapes spaces and special characters
            "location": location,
        }
        return urlencode(params)

    first_page_url = "https://www.linkedin.com/jobs/search?" + form_urls_params(keyword, location)
    first_page = await SCRAPFLY.async_scrape(ScrapeConfig(first_page_url, **BASE_CONFIG))
    # parse the first page once and reuse both the jobs and the result count
    first_page_data = parse_job_search(first_page)
    data = first_page_data["data"]
    # fall back to the first page only when the result count is missing
    total_results = first_page_data["total_results"] or len(data)
    # limit the number of results to scrape, each page contains 25 results
    if max_pages and max_pages * 25 < total_results:
        total_results = max_pages * 25

    # the first page covers start=0, the remaining pages start at 25, 50, ...
    offsets = list(range(25, total_results, 25))
    log.info(f"scraped the first job page, {len(offsets)} more pages")
    # scrape the remaining pages concurrently
    other_pages_url = "https://www.linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search?"
    to_scrape = [
        ScrapeConfig(other_pages_url + form_urls_params(keyword, location) + f"&start={index}", **BASE_CONFIG)
        for index in offsets
    ]
    async for response in SCRAPFLY.concurrent_scrape(to_scrape):
        try:
            page_data = parse_job_search(response)["data"]
            data.extend(page_data)
        except Exception as e:
            log.error(f"An error occurred while scraping search pagination: {e}")

    log.success(f"scraped {len(data)} jobs from Linkedin job search")
    return data
```



Run the code:

```python
async def run():
    job_search_data = await scrape_job_search(
        keyword="Python Developer",
        location="United States",
        max_pages=3
    )
    # save the data to a JSON file
    with open("job_search.json", "w", encoding="utf-8") as file:
        json.dump(job_search_data, file, indent=2, ensure_ascii=False)

if __name__ == "__main__":
    asyncio.run(run())
```



We start by building the search URL from the keyword and location. Then we request the first page, read the result count, and page through the rest with the hidden API.

Here is an example output of the LinkedIn job search scraper:



The search results give you a list of jobs, but not the full text of each one. Let's pull that detail from the individual job pages.

### Scraping Individual Job Pages

Each LinkedIn job has its own page that holds the complete description, seniority level, employment type, and more. LinkedIn keeps this data in an `application/ld+json` script tag, so we use the hidden web data approach again.

Search for the selector `//script[@type='application/ld+json']` in the page source to find this data.

The `description` field holds the job description as encoded HTML. The scraper below reads the script tag for the structured fields, collects the readable description text from the page HTML, and drops the encoded field:

```python
import json
import asyncio
from typing import Dict, List
from loguru import logger as log
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

SCRAPFLY = ScrapflyClient(key="Your Scrapfly API key")

BASE_CONFIG = {
    "asp": True,
    "country": "US",
    "headers": {
        "Accept-Language": "en-US,en;q=0.5"
    },
    "render_js": True,
    "proxy_pool": "public_residential_pool"
}

def parse_job_page(response: ScrapeApiResponse):
    """parse individual job data from Linkedin job pages"""
    selector = response.selector
    script_data = json.loads(selector.xpath("//script[@type='application/ld+json']/text()").get())
    description = []
    for element in selector.xpath("//div[contains(@class, 'show-more')]/ul/li/text()").getall():
        text = element.replace("\n", "").strip()
        if len(text) != 0:
            description.append(text)
    script_data["jobDescription"] = description
    script_data.pop("description")  # remove the key with the encoded HTML
    return script_data

async def scrape_jobs(urls: List[str]) -> List[Dict]:
    """scrape Linkedin job pages"""
    to_scrape = [ScrapeConfig(url, **BASE_CONFIG) for url in urls]
    data = []
    # scrape the URLs concurrently
    async for response in SCRAPFLY.concurrent_scrape(to_scrape):
        try:
            job_data = parse_job_page(response)
            if job_data is None:
                raise Exception("Job page is expired, no hidden json data found")
            data.append(job_data)
        except Exception:
            log.debug(f"Job page with {response.context['url']} URL is expired")
    log.success(f"scraped {len(data)} jobs from Linkedin")
    return data
```



Run the code:

```python
async def run():
    job_data = await scrape_jobs(
        urls=[
            "https://www.linkedin.com/jobs/view/junior-software-developer-fresh-graduate-at-smart-is-4423536207",
            "https://www.linkedin.com/jobs/view/entry-level-python-developer-remote-at-synergisticit-4420671223",
            "https://www.linkedin.com/jobs/view/software-engineer-ai-team-at-sprig-4424454823"
        ]
    )
    # save the data to a JSON file
    with open("jobs.json", "w", encoding="utf-8") as file:
        json.dump(job_data, file, indent=2, ensure_ascii=False)

if __name__ == "__main__":
    asyncio.run(run())
```



We add the job page URLs to a list and request them concurrently. `parse_job_page()` reads the hidden script tag, then swaps the HTML-encoded `description` field for plain-text lines taken from the rendered description list. Postings expire often, so swap in current URLs when you run this.
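
If you would rather keep the text stored in the `description` field itself, it can be decoded with the standard library. The following is a minimal sketch under that assumption: `description_to_lines()` and the sample payload are made up for illustration and are not part of the scraper above or of real LinkedIn output.

```python
import html
import json
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect the non-empty text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.lines.append(text)


def description_to_lines(json_ld: str) -> list:
    """Decode the HTML-escaped `description` of a JSON-LD string into text lines."""
    payload = json.loads(json_ld)
    # turn entities such as &lt;p&gt; back into real tags
    decoded = html.unescape(payload.get("description", ""))
    collector = _TextCollector()
    collector.feed(decoded)
    collector.close()
    return collector.lines


# made-up payload for illustration only
SAMPLE_JSON_LD = json.dumps({
    "@type": "JobPosting",
    "title": "Python Developer",
    "description": "&lt;p&gt;Build scrapers&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Python&lt;/li&gt;&lt;li&gt;XPath&lt;/li&gt;&lt;/ul&gt;",
})

print(description_to_lines(SAMPLE_JSON_LD))
```

This keeps the description tied to the JSON data rather than to the page layout, at the cost of losing the list structure that the XPath approach targets.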

Here is what the job page scraper output looks like:

Job search covers listings across all companies. Next, let's narrow the same approach to a single company's openings.

### Scraping Jobs from a Specific Company

Every company page has its own jobs section under the `/jobs` path of the company URL:

LinkedIn loads this list on scroll through the same hidden API you saw in the search section, but with a company-specific endpoint:



Each results page is addressed with the `start` URL query parameter:

```shell
https://www.linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/microsoft-jobs-worldwide?start=25
```
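
The endpoint above can be derived from the company jobs page URL. Here is a small sketch of one way to do it with `urllib.parse`; the helper name `company_jobs_api_url()` is hypothetical, and it assumes the company slug is always the last path segment of the jobs page URL.

```python
from urllib.parse import urlencode, urlsplit

# endpoint prefix taken from the URL shown above
JOBS_API = "https://www.linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/"


def company_jobs_api_url(company_jobs_url: str, start: int) -> str:
    """Build the paginated API URL for a company jobs page (hypothetical helper)."""
    # use the last path segment so query strings and trailing slashes are ignored
    path = urlsplit(company_jobs_url).path.rstrip("/")
    slug = path.rsplit("/", 1)[-1]
    return f"{JOBS_API}{slug}?{urlencode({'start': start})}"


print(company_jobs_api_url("https://www.linkedin.com/jobs/microsoft-jobs-worldwide/?position=1", 25))
```

Parsing the URL instead of splitting on a fixed string keeps the slug clean when the input carries tracking parameters.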



We request the first company jobs page to read the result count, then page through the rest with that endpoint:

```python
import json
import asyncio
from typing import Dict, List
from loguru import logger as log
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

SCRAPFLY = ScrapflyClient(key="Your Scrapfly API key")

BASE_CONFIG = {
    "asp": True,
    "country": "US",
    "headers": {
        "Accept-Language": "en-US,en;q=0.5"
    },
    "render_js": True,
    "proxy_pool": "public_residential_pool"
}

def parse_company_jobs(response: ScrapeApiResponse) -> Dict:
    """parse job data from a company's jobs section"""
    selector = response.selector
    total_results = selector.xpath("//span[contains(@class, 'job-count')]/text()").get()
    total_results = int(total_results.replace(",", "").replace("+", "")) if total_results else None
    data = []
    search_elements = selector.xpath("//section[contains(@class, 'results-list')]/ul/li")
    if len(search_elements) == 0:  # pagination pages have a different structure
        search_elements = selector.xpath("//li")
    for element in search_elements:
        data.append({
            "title": element.xpath(".//div/a/span/text()").get().strip(),
            "company": element.xpath(".//div/div[contains(@class, 'info')]/h4/a/text()").get().strip(),
            "address": element.xpath(".//div/div[contains(@class, 'info')]/div/span/text()").get().strip(),
            "timeAdded": element.xpath(".//div/div[contains(@class, 'info')]/div/time/@datetime").get(),
            "jobUrl": element.xpath(".//div/a/@href").get().split("?")[0],
            "companyUrl": element.xpath(".//div/div[contains(@class, 'info')]/h4/a/@href").get().split("?")[0],
        })
    return {"data": data, "total_results": total_results}

async def scrape_company_jobs(url: str, max_pages: int = None) -> List[Dict]:
    """scrape a company's job listings"""
    first_page = await SCRAPFLY.async_scrape(ScrapeConfig(url, **BASE_CONFIG))
    first_page_data = parse_company_jobs(first_page)  # parse the first page once
    data = first_page_data["data"]
    # fall back to the first page size when the job count is missing
    total_results = first_page_data["total_results"] or len(data)
    # cap the total number of results, each page contains 25 results
    if max_pages and max_pages * 25 < total_results:
        total_results = max_pages * 25

    # scrape the remaining pages using the company jobs API
    company_slug = url.split("jobs/")[-1]
    jobs_api_url = "https://www.linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/" + company_slug
    to_scrape = [
        ScrapeConfig(jobs_api_url + f"?start={index}", **BASE_CONFIG)
        for index in range(25, total_results, 25)  # the first page already covers results 0-24
    ]
    log.info(f"scraped the first job page, {len(to_scrape)} more pages")
    async for response in SCRAPFLY.concurrent_scrape(to_scrape):
        try:
            page_data = parse_company_jobs(response)["data"]
            data.extend(page_data)
        except Exception as e:
            log.error(f"An error occurred while scraping company jobs pagination: {e}")

    log.success(f"scraped {len(data)} jobs from Linkedin company job pages")
    return data
```



Run the code:

```python
async def run():
    company_jobs_data = await scrape_company_jobs(
        url="https://www.linkedin.com/jobs/microsoft-jobs-worldwide",
        max_pages=3
    )
    # save the data to a JSON file
    with open("company_jobs.json", "w", encoding="utf-8") as file:
        json.dump(company_jobs_data, file, indent=2, ensure_ascii=False)

if __name__ == "__main__":
    asyncio.run(run())
```



The `parse_company_jobs()` function reads the job cards with XPath selectors. The `scrape_company_jobs()` function requests the company jobs page, then pages through the rest with the company jobs API.
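
The paging arithmetic is easy to get off by one, so it can help to isolate it. Below is a minimal sketch with two hypothetical helpers, `parse_job_count()` and `remaining_offsets()`, assuming 25 results per page and that the first page was already fetched from the regular jobs URL.

```python
import re
from typing import List, Optional

PAGE_SIZE = 25  # results per page, as described above


def parse_job_count(text: Optional[str]) -> int:
    """Turn a count label such as '1,000+' into an integer (0 when missing)."""
    digits = re.sub(r"\D", "", text or "")
    return int(digits) if digits else 0


def remaining_offsets(total_results: int, max_pages: Optional[int] = None) -> List[int]:
    """Return the `start` values for every page after the first one."""
    if max_pages is not None:
        total_results = min(total_results, max_pages * PAGE_SIZE)
    # the first page covers results 0-24, so paging starts at 25
    return list(range(PAGE_SIZE, total_results, PAGE_SIZE))


print(parse_job_count("1,000+"))
print(remaining_offsets(1000, max_pages=3))
print(remaining_offsets(60))
```

With `max_pages=3` this yields two extra requests, so three pages are scraped in total including the first.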

Here is an example output of the company jobs scraper:

LinkedIn is one of several job boards worth scraping. For more job data, see our guides on scraping [Indeed](https://scrapfly.io/blog/posts/how-to-scrape-indeedcom) and [Glassdoor](https://scrapfly.io/blog/posts/how-to-scrape-glassdoor). You can also pair this job page code with [crawling logic](https://scrapfly.io/blog/posts/crawling-with-python) to follow links from the search and company pages.

For more on web crawling with Python, see our dedicated tutorial.

[How to Crawl the Web with Python](https://scrapfly.io/blog/posts/crawling-with-python): an introduction to web crawling with Python, covering what web crawling is, how it differs from web scraping, and an example project that crawls Shopify-powered websites.

With this last feature, our LinkedIn scrapers are complete. They can scrape LinkedIn profile, company, and job data. But pushing the scraping rate too high will lead LinkedIn to detect and block your requests, so make sure to rotate high-quality residential proxies.
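
Keeping the request rate down is the other half of that advice. As a rough sketch, here is one generic way to cap concurrency and add random pauses with `asyncio`. The names `throttled_gather()` and `fake_fetch()` are hypothetical, `fake_fetch()` stands in for a real scraping call, and the limits are placeholder values rather than thresholds known to be safe for LinkedIn.

```python
import asyncio
import random
from typing import Awaitable, Callable, List


async def throttled_gather(
    urls: List[str],
    fetch: Callable[[str], Awaitable[str]],
    max_concurrency: int = 3,
    max_jitter: float = 0.5,
) -> List[str]:
    """Run `fetch` over `urls` with a concurrency cap and a random pause before each call."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(url: str) -> str:
        async with semaphore:
            # random delay so requests are not evenly spaced
            await asyncio.sleep(random.uniform(0, max_jitter))
            return await fetch(url)

    # gather keeps the results in the same order as `urls`
    return await asyncio.gather(*(worker(url) for url in urls))


async def fake_fetch(url: str) -> str:
    """Stand-in for a real request; swap in your own scraping call."""
    await asyncio.sleep(0.01)
    return f"fetched {url}"


demo_urls = [f"https://example.com/{i}" for i in range(5)]
results = asyncio.run(throttled_gather(demo_urls, fake_fetch, max_jitter=0.05))
print(len(results))
```

The semaphore bounds how many requests are in flight at once, while the jitter avoids the evenly spaced timing that behavior tracking can pick up on.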



## FAQ

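**Is scraping LinkedIn legal?**

In hiQ Labs v. LinkedIn, the Ninth Circuit held that scraping publicly accessible data likely does not violate the U.S. Computer Fraud and Abuse Act. That ruling does not cover contract claims, and LinkedIn's Terms of Service still prohibit automated access, so seek legal advice for commercial projects.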







**Can I scrape LinkedIn without an account?**

Yes. Public profiles, company pages, job search results, and job pages are all reachable without logging in, but plain requests usually hit a login wall after a few page views. Anything behind authentication stays out of reach, so unauthenticated scraping is limited to this public surface.







**Can you scrape LinkedIn Jobs without an account?**

Job search results and individual job pages are public, so you can scrape them without logging in. Features like Easy Apply, candidate matching, and the recruiter view need an account and stay out of reach.







**What's the difference between scraping LinkedIn Jobs and using the LinkedIn Jobs API?**

The official Jobs API works through partner agreements and ATS integrations, so it fits approved hiring tools but stays closed to most developers. Scraping the public jobs surface is the practical path for everyone else.







**Does LinkedIn detect job scraping differently from profile scraping?**

LinkedIn builds job pages to rank in Google, so the public jobs surface has a lower detection bar than the login-walled profile surface. You still need anti-bot protection at volume, but jobs are easier to collect than profiles.







**Why can't I use Puppeteer or Playwright for LinkedIn?**

You can, but they only render pages and ship no anti-detection setup, so you still handle residential proxies, session persistence, and fingerprint rotation yourself. Scrapfly manages all of that automatically.







**How much does Scrapfly cost for LinkedIn scraping?**

Pricing depends on your request volume and the features you turn on, like JavaScript rendering and residential proxies. Check the [pricing page](https://scrapfly.io/pricing) for current rates.










## Summary

LinkedIn protects professional data through layered blocking: authentication walls, behavior tracking, and request fingerprinting.

DIY scraping means managing session persistence, residential proxy rotation, TLS fingerprint randomization, and behavior emulation, a heavy maintenance burden.

Scrapfly handles this for you:

- **Automated anti-bot bypass:** Handles fingerprinting, behavior tracking, and fraud score evasion
- **Maintained scrapers:** Updated within 48 hours when LinkedIn changes structure
- **Production-ready code:** Extract profiles, companies, jobs, and search results without managing infrastructure
- **Residential proxy pools:** Rotate IPs automatically to prevent detection

When LinkedIn updates its defenses, Scrapfly updates the scraper. You focus on data, not anti-bot engineering.

**Ready to start?**

```bash
git clone https://github.com/scrapfly/scrapfly-scrapers.git
cd scrapfly-scrapers/linkedin-scraper
```



**Legal Disclaimer and Precautions**

This tutorial covers popular web scraping techniques for education. Interacting with public servers requires diligence and respect:

- Do not scrape at rates that could damage the website.
- Do not scrape data that's not available publicly.
- Do not store PII of EU citizens protected by GDPR.
- Do not repurpose *entire* public datasets, which can be illegal in some countries.

Scrapfly does not offer legal advice, but these are good general rules to follow. For more, you should consult a lawyer.



 
