# How to Scrape Naver.com: Search, Images, News & Blog Posts (Update 2026)

by [Ziad Shamndy](https://scrapfly.io/blog/author/ziad), May 29, 2026


South Korea's digital landscape is dominated by Naver.com, the country's leading search engine and web portal that processes over 74% of all search queries in Korea. Unlike Google's minimalist approach, Naver runs a comprehensive ecosystem of search, news, shopping, blogs, and specialized services, making it a goldmine of Korean market data.

The challenge is that Naver employs sophisticated anti-bot measures and serves dynamic content that trips up inexperienced scrapers. Many developers struggle with Korean character encoding, complex URL structures, and getting blocked by Naver's protection systems.

In this tutorial, you'll learn how to scrape Naver web search, image search, news articles, and blog posts using Python and the Scrapfly Web Scraping API. The code in this guide is the same code published in the [Scrapfly Naver scraper repository](https://github.com/scrapfly/scrapfly-scrapers/tree/main/naver-scraper).

## Key Takeaways

This guide scrapes four Naver surfaces (web search, image search, blog posts, and news articles) with the Scrapfly Python SDK. Each surface reuses the same `BASE_CONFIG` and async client.

- Build Naver search URLs with the NSO filter for sort and time period
- Extract structured web, image, blog, and news data with the Scrapfly client
- Run pagination concurrently across search pages with `concurrent_scrape`
- Render JavaScript and route through Korean residential proxies via `BASE_CONFIG`
- Parse JSON payloads embedded in Naver's HTML responses
- Save async scrape results as JSON files ready for downstream pipelines


## Understanding Naver's Structure

Naver organizes content across multiple specialized tabs, each with distinct URL patterns and data structures. This scraper covers four of them:

**Web search**. Naver's core search returns web pages and specialized content blocks. Results are rendered from a JSON payload embedded in the page as `entry.bootstrap(...)`.

**Image search**. The image tab serves results from a JavaScript object named `imageSearchTabData`. Pagination works through a `start` parameter rather than discrete page numbers.

**Blog platform**. Naver Blog is one of Korea's most popular blogging services. Posts live behind an iframe and need a direct content selector for the rendered HTML.

**News**. Naver News aggregates articles from hundreds of Korean publishers, served at `n.news.naver.com` with rich metadata around press, date, and sections.

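Most tabs combine a `where` parameter with a tab-specific `ssc` parameter on the same `search.naver.com/search.naver` endpoint. In this scraper's URL builder, web search sends only `where`, and the short-content tab sends only `ssc`.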

## Prerequisites and Setup

The scraper uses the Scrapfly Python SDK with async support and loguru for logging. Install both with pip.

```bash
$ pip install "scrapfly-sdk[all]" loguru
```



Set your Scrapfly API key as an environment variable. You can grab one from the [Scrapfly dashboard](https://scrapfly.io/dashboard).

```bash
$ export SCRAPFLY_KEY="your key from https://scrapfly.io/dashboard"
```



Now create `naver.py` and import the Scrapfly client, the async config, and helpers we'll use across the scraper.

```python
import os
import json
import re
from pathlib import Path
from urllib.parse import urlencode
from loguru import logger as log
from typing import List, Dict, TypedDict, Optional, Literal, Any

from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

SCRAPFLY = ScrapflyClient(key=os.environ["SCRAPFLY_KEY"])

BASE_CONFIG = {
    # Naver.com requires Anti Scraping Protection bypass feature.
    # for more: https://scrapfly.io/docs/scrape-api/anti-scraping-protection
    "asp": True,
    "render_js": True,
    "country": "kr", # match the domain's country
    "proxy_pool": "public_residential_pool",
}

output = Path(__file__).parent / "results"
output.mkdir(exist_ok=True)
```



`BASE_CONFIG` is the shared configuration every request reuses. `asp=True` activates Scrapfly's Anti Scraping Protection bypass, `render_js=True` runs a real browser so dynamic content loads, `country="kr"` routes through a Korean IP, and `proxy_pool="public_residential_pool"` uses residential proxies that match real Naver visitors.
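
Because `BASE_CONFIG` is a plain dictionary, a per-request variant can be derived without mutating the shared copy. The sketch below is illustrative only: `with_overrides` is a hypothetical helper name that is not part of the scraper, and the `cache` key is borrowed from the runner script later in this guide.

```python
from typing import Any, Dict

# Same keys as the shared configuration above.
BASE_CONFIG: Dict[str, Any] = {
    "asp": True,
    "render_js": True,
    "country": "kr",
    "proxy_pool": "public_residential_pool",
}


def with_overrides(base: Dict[str, Any], **overrides: Any) -> Dict[str, Any]:
    """Return a new config dict; the shared base is left untouched (hypothetical helper)."""
    return {**base, **overrides}


cached_config = with_overrides(BASE_CONFIG, cache=True)
print(cached_config)
print("cache" in BASE_CONFIG)  # the shared dict did not change
```

Building a copy keeps one job's settings from leaking into another when several scrapers share the module.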

## Building Naver Search URLs

Every Naver search tab uses the same endpoint with a different `where` and `ssc` combination. The helper below builds a single URL across all supported tabs and adds NSO (Naver Search Options) for sort and time period.

```python
SearchType = Literal["web", "image", "blog", "cafe", "kin", "news", "influencer", "short content", "video"]
SortType = Literal["sim", "date", "asc", "dsc"]
PeriodType = Literal["all", "1d", "1w", "1m", "6m", "1y"]


def _build_nso_filter(sort: Optional[SortType] = None, period: Optional[PeriodType] = None) -> str:
    """
    Build NSO (Naver Search Options) filter string.

    NSO format: so:{sort},p:{period}
    - so: sort order (dd=date desc, da=date asc, r=relevance)
    - p: period filter (1d, 1w, 1m, 6m, 1y)
    """
    nso_parts = []
    if sort:
        sort_map = {
            "sim": "r",  # relevance
            "date": "dd",  # date descending (recent first)
            "asc": "da",  # date ascending (oldest first)
            "dsc": "dd",  # date descending (alias)
        }
        nso_parts.append(f"so:{sort_map.get(sort, 'r')}")
    if period:
        nso_parts.append(f"p:{period}")

    return ",".join(nso_parts) if nso_parts else ""


def _build_search_url(
    query: str,
    search_type: SearchType = "web",
    page: int = 1,
    display: int = 10,
    sort: Optional[SortType] = None,
    period: Optional[PeriodType] = None,
    date_from: Optional[str] = None,
    date_to: Optional[str] = None,
) -> str:
    """Build comprehensive Naver search URL with all parameters."""
    base_url = "https://search.naver.com/search.naver"

    # calculate pagination start parameter: (page - 1) * display + 1
    start = (page - 1) * display + 1

    params = {
        "query": query,
        "start": start,
    }

    type_configs = {
        "web": {"where": "web"},
        "image": {"where": "image", "ssc": "tab.image.all"},
        "blog": {"where": "blog", "ssc": "tab.blog.all"},
        "cafe": {"where": "cafe", "ssc": "tab.cafe.all"},
        "kin": {"where": "kin", "ssc": "tab.kin.kqna"},
        "news": {"where": "news", "ssc": "tab.news.all"},
        "influencer": {"where": "influencer", "ssc": "tab.influencer.all"},
        "short content": {"ssc": "tab.shortents.all"},
        "video": {"where": "video", "ssc": "tab.video.all"},
    }

    params.update(type_configs.get(search_type, {"where": search_type}))

    nso = _build_nso_filter(sort, period)
    if nso:
        params["nso"] = nso

    query_string = urlencode(params)
    return f"{base_url}?{query_string}"
```



`(page - 1) * display + 1` is Naver's pagination formula. Page 1 starts at `start=1`, page 2 at `start=11`, and so on. `urlencode` takes care of percent-encoding Korean characters in the query.
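
To make the arithmetic and the encoding concrete, here is a small standalone sketch that uses only the standard library. `start_offset` is a hypothetical name for the formula above, and the parameter set is trimmed to the three keys needed for the demonstration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def start_offset(page: int, display: int = 10) -> int:
    """1-based index of the first result on a page (hypothetical helper)."""
    return (page - 1) * display + 1


params = {"query": "파이썬", "where": "web", "start": start_offset(page=3)}
url = "https://search.naver.com/search.naver?" + urlencode(params)
print(url)  # the Korean query is percent-encoded as UTF-8 bytes

# Decoding the query string recovers the original values.
decoded = parse_qs(urlsplit(url).query)
print(decoded["query"], decoded["start"])
```

Round-tripping through `parse_qs` is a quick way to confirm that a generated URL carries the intended query and offset.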

## Extracting the JSON Bootstrap From HTML

Naver hydrates search results from a JavaScript object embedded in the HTML. Two small helpers find that object and convert it into valid JSON.

```python
def _extract_json_from_html(content: str, start_pos: int) -> Optional[str]:
    """Extract a balanced JSON object from HTML at start_pos (opening brace); returns JSON or None."""
    brace_count = 0
    in_string = False
    escape = False

    for i in range(start_pos, len(content)):
        char = content[i]

        if escape:
            escape = False
            continue
        if char == "\\":
            escape = True
            continue
        if char == '"':
            in_string = not in_string
            continue

        if not in_string:
            if char == "{":
                brace_count += 1
            elif char == "}":
                brace_count -= 1
                if brace_count == 0:
                    return content[start_pos : i + 1]

    return None


def _js_to_json(js_str: str) -> str:
    """Convert JavaScript object notation to valid JSON."""

    def replacer(m: re.Match) -> str:
        s = m.group(0)
        if s[0] == '"':
            return s
        if s[0] == "'":
            inner = s[1:-1].replace("\\'", "'").replace('"', '\\"')
            return f'"{inner}"'
        return f'{m.group(1)}"{m.group(2)}":'

    js_str = re.sub(
        r'"(?:[^"\\]|\\.)*"|\'(?:[^\'\\]|\\.)*\'|([{,\s])([a-zA-Z_$][a-zA-Z0-9_$]*)\s*:',
        replacer,
        js_str,
    )
    return re.sub(r",(\s*[}\]])", r"\1", js_str)
```



`_extract_json_from_html` walks the HTML character by character counting balanced braces. `_js_to_json` converts loose JavaScript object notation, where keys are unquoted and strings use single quotes, into strict JSON the standard library can parse.
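
When the embedded payload is already strict JSON, the standard library's `json.JSONDecoder.raw_decode` is a possible alternative to manual brace counting: it parses one JSON value starting at a given index and reports where that value ends. This is a different technique from the helpers above, shown as a sketch; the HTML snippet is a made-up stand-in, and it would not work for loose JavaScript objects such as `imageSearchTabData`.

```python
import json
import re

# Made-up HTML fragment for illustration; real Naver markup is far larger.
html = (
    '<script>entry.bootstrap(document.getElementById("root"), '
    '{"body": {"note": "brace } inside a string"}});</script>'
)

match = re.search(r"entry\.bootstrap\([^,]+,\s*\{", html)
start = match.end() - 1  # index of the opening brace

# raw_decode parses a single JSON value at `start` and returns (value, end_index).
payload, end = json.JSONDecoder().raw_decode(html, start)
print(payload["body"]["note"])
print(html[end:end + 2])  # the text that follows the JSON object
```

Like the brace counter, `raw_decode` is not fooled by braces inside string values, because it tokenizes the JSON rather than counting characters.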

## Scraping Naver Web Search

The web search parser pulls each result from the bootstrap payload and reads pagination from the page footer.

```python
class SearchWebResult(TypedDict):
    title: str
    url: str
    snippet: str
    source: Optional[str]
    rank: int


def parse_web_search(result: ScrapeApiResponse) -> dict[str, Any]:
    """Parse web search results from JSON embedded in HTML."""
    content = result.content
    selector = result.selector
    results: List[SearchWebResult] = []

    pattern = r"entry\.bootstrap\([^,]+,\s*\{"
    match = re.search(pattern, content)

    if match:
        json_start = content.index("{", match.end() - 1)
        json_str = _extract_json_from_html(content, json_start)

        try:
            if json_str:
                data = json.loads(json_str)
            else:
                return {"results": [], "max_pages": None, "num_of_displayed_results": 0}
            items = data.get("body", {}).get("props", {}).get("children", [])

            if items and "props" in items[0]:
                for idx, item in enumerate(items[0]["props"].get("children", []), 1):
                    if item.get("templateId") != "webItem":
                        continue

                    props = item.get("props", {})
                    title = props.get("title", "").replace("<mark>", "").replace("</mark>", "")
                    url = props.get("href", "")

                    if not title or not url:
                        continue

                    source = None
                    subtexts = props.get("profile", {}).get("subTexts", [])
                    if subtexts:
                        source = subtexts[0].get("text", "") if isinstance(subtexts[0], dict) else str(subtexts[0])

                    click_log = props.get("clickLog", {})
                    rank = click_log.get("title", {}).get("r") or click_log.get("profile", {}).get("r") or idx

                    results.append(
                        {
                            "title": title,
                            "url": url,
                            "snippet": props.get("bodyText", "").replace("<mark>", "").replace("</mark>", ""),
                            "source": source,
                            "rank": rank,
                        }
                    )
        except (json.JSONDecodeError, KeyError, TypeError) as e:
            log.debug(f"JSON parsing error: {e}")

    max_pages = 1
    page_numbers = []
    for link in selector.css("div.sc_page_inner a.btn"):
        href = link.css("::attr(href)").get("")
        text = link.css("::text").get("")

        page_match = re.search(r"[?&]page=(\d+)", href)
        if page_match:
            page_numbers.append(int(page_match.group(1)))
        elif text and text.strip().isdigit():
            page_numbers.append(int(text.strip()))

    if page_numbers:
        max_pages = max(page_numbers)
    num_of_displayed_results = len(results)

    return {
        "results": results,
        "max_pages": max_pages,
        "num_of_displayed_results": num_of_displayed_results,
    }
```



The parser strips `<mark>` highlighting from titles and snippets so the output stays clean. `max_pages` comes from the numeric pagination links Naver renders at the bottom of the result page.
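
The two normalization steps, stripping highlight tags and falling back through the rank sources, can be tried in isolation. This is a minimal sketch: `strip_marks` and `resolve_rank` are hypothetical names, and `sample_props` is invented data shaped like the `props` dictionary the parser reads.

```python
import re
from typing import Any, Dict


def strip_marks(text: str) -> str:
    """Remove <mark> and </mark> highlight tags (hypothetical helper)."""
    return re.sub(r"</?mark>", "", text)


def resolve_rank(props: Dict[str, Any], fallback: int) -> int:
    """Prefer the title click-log rank, then the profile rank, then the loop index."""
    click_log = props.get("clickLog", {})
    return (
        click_log.get("title", {}).get("r")
        or click_log.get("profile", {}).get("r")
        or fallback
    )


# Invented sample shaped like one result's props.
sample_props = {
    "title": "<mark>Python</mark> Tutorial",
    "clickLog": {"profile": {"r": 4}},
}

print(strip_marks(sample_props["title"]))
print(resolve_rank(sample_props, fallback=1))
print(resolve_rank({}, fallback=7))
```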

The async function below scrapes the first page, parses pagination, then fans out concurrent requests for the remaining pages.

```python
async def scrape_web_search(
    query: str,
    max_pages: int = 3,
    sort: Optional[SortType] = None,
    period: Optional[PeriodType] = None,
    scrape_all_pages: bool = False,
) -> Dict[str, Any]:
    """Scrape Naver web search with pagination."""
    log.info(f"Scraping web search for query: {query}")

    results = []
    first_url = _build_search_url(query, "web", sort=sort, period=period)
    first_page = await SCRAPFLY.async_scrape(ScrapeConfig(first_url, **BASE_CONFIG))
    first_page_result = parse_web_search(first_page)
    total_pages = first_page_result["max_pages"] or 1  # max_pages is None when the payload cannot be extracted
    results = first_page_result["results"]
    displayed_results = first_page_result["num_of_displayed_results"]

    if scrape_all_pages:
        pages_to_scrape = total_pages
    else:
        pages_to_scrape = min(total_pages, max_pages)

    log.info(f"scraping {pages_to_scrape - 1} additional pages (total: {pages_to_scrape})")

    scraped_pages = 1
    if pages_to_scrape > 1:
        other_pages = [
            ScrapeConfig(
                _build_search_url(query, page=page, sort=sort, display=displayed_results, period=period),
                **BASE_CONFIG,
            )
            for page in range(2, pages_to_scrape + 1)
        ]
        async for result in SCRAPFLY.concurrent_scrape(other_pages):
            results.extend(parse_web_search(result)["results"])
            scraped_pages += 1

    log.success(f"scraped {len(results)} from Naver search")
    return {"results": results, "max_pages": total_pages}
```



`SCRAPFLY.concurrent_scrape` dispatches the remaining pages in parallel, up to your account's concurrency limit. When that limit covers every page, the whole job takes roughly two round trips (the first page, then all the others at once) instead of one round trip per page.
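
The first-page-then-fan-out pattern can be illustrated without the SDK. The sketch below uses plain `asyncio` with a semaphore as a stand-in for the concurrency limit; `fake_fetch` simulates a network call and `fetch_all` is a hypothetical name. It is not how `concurrent_scrape` is implemented internally.

```python
import asyncio
from typing import List


async def fake_fetch(page: int) -> str:
    """Simulated request; a real scraper would call the API here."""
    await asyncio.sleep(0.01)
    return f"page-{page}"


async def fetch_all(total_pages: int, limit: int = 5) -> List[str]:
    semaphore = asyncio.Semaphore(limit)  # caps the number of in-flight requests

    async def bounded(page: int) -> str:
        async with semaphore:
            return await fake_fetch(page)

    first = await fake_fetch(1)  # first page is fetched alone to learn the page count
    rest = await asyncio.gather(*(bounded(p) for p in range(2, total_pages + 1)))
    return [first, *rest]


pages = asyncio.run(fetch_all(total_pages=4))
print(pages)
```

`asyncio.gather` returns results in submission order, whereas an async iterator that yields responses as they complete may not, so sort by rank or page afterwards if order matters.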

Example `web_search.json`:

```json
{
  "results": [
    {
      "title": "Python Tutorial",
      "url": "https://www.geeksforgeeks.org/python/python-programming-language-tutorial/",
      "snippet": "Python is one of the most popular programming languages. It's simple to use, packed with features and supported by a wide range of libraries and frameworks...",
      "source": "www.geeksforgeeks.org",
      "rank": 1
    },
    {
      "title": "점프 투 파이썬 - 위키독스",
      "url": "https://wikidocs.net/book/1",
      "snippet": "이 책은 파이썬이란 언어를 처음 접해보는 독자들과 프로그래밍을 한 번도 해 본적이 없는 사람들을 대상으로 한다...",
      "source": "wikidocs.net",
      "rank": 2
    },
    {
      "title": "Python | endoflife.date",
      "url": "https://endoflife.date/python",
      "snippet": "Python is an interpreted, high-level, general-purpose programming language.",
      "source": "endoflife.date",
      "rank": 3
    }
  ],
  "max_pages": 10
}
  
```



## Scraping Naver Image Search

The image tab does not paginate with discrete page numbers. Naver loads more results when the user scrolls. The scraper works around this by requesting later offsets through the same `start` parameter, using the number of results on the first page as the `display` step.

```python
class SearchImageResult(TypedDict):
    title: str
    link: str
    source: str
    image_url: str
    thumbnail_url: str
    img_id: str
    color: str
    date: str
    writer: str
    domain: str
    rank: int


def parse_image_search(result: ScrapeApiResponse) -> Dict[str, Any]:
    """Parse image search results from JSON embedded in HTML."""
    content = result.content
    selector = result.selector
    results: List[SearchImageResult] = []
    data = None

    not_found = selector.css("div.not_found02")
    if not_found:
        log.info("No search results found (not_found02 element detected)")
        return {
            "results": [],
            "num_of_displayed_results": -1,
        }

    pattern = r"var\s+imageSearchTabData\s*=\s*\{"
    match = re.search(pattern, content)

    if match:
        json_start = content.index("{", match.end() - 1)
        js_str = _extract_json_from_html(content, json_start)

        try:
            if js_str:
                json_str = _js_to_json(js_str)
                data = json.loads(json_str)
                items = data.get("content", {}).get("items", [])

                for idx, item in enumerate(items, 1):
                    if item.get("type") != "image":
                        continue

                    viewer_thumb = item.get("viewerThumb", "")
                    thumbnail = item.get("thumbnail", viewer_thumb)

                    title = (
                        item.get("title", "")
                        .replace("<mark>", "")
                        .replace("</mark>", "")
                    )

                    results.append(
                        {
                            "title": title,
                            "link": item.get("link", ""),
                            "source": item.get("source", ""),
                            "image_url": viewer_thumb,
                            "thumbnail_url": thumbnail,
                            "img_id": item.get("imgId", ""),
                            "color": item.get("color", ""),
                            "date": item.get("dateInfo", ""),
                            "writer": item.get("writerTitle", ""),
                            "domain": item.get("tld", ""),
                            "rank": idx,
                        }
                    )
        except (json.JSONDecodeError, KeyError, TypeError) as e:
            log.error(f"JSON parsing error: {e}")

    num_of_displayed_results = len(results)

    return {
        "results": results,
        "num_of_displayed_results": num_of_displayed_results,
    }


async def scrape_image_search(
    query: str,
    max_pages: int = 3,
    sort: Optional[SortType] = None,
    period: Optional[PeriodType] = None,
    scrape_all_pages: bool = False,
) -> dict[str, Any]:
    """
    Scrape Naver image search with pagination.
    Naver image search uses scroll to load more results but we can still page through
    using the start parameter.
    """
    log.info(f"Scraping image search for query: {query}")

    results: List[SearchImageResult] = []
    first_url = _build_search_url(query, search_type="image", sort=sort, period=period)
    first_page = await SCRAPFLY.async_scrape(ScrapeConfig(first_url, **BASE_CONFIG))

    first_page_result = parse_image_search(first_page)
    results = first_page_result["results"]
    displayed_results = first_page_result["num_of_displayed_results"]

    scraped_pages = 1

    if scrape_all_pages:
        page = 2

        while True:
            page_url = _build_search_url(
                query,
                search_type="image",
                page=page,
                sort=sort,
                display=displayed_results,
                period=period,
            )
            page_result = await SCRAPFLY.async_scrape(
                ScrapeConfig(page_url, **BASE_CONFIG)
            )
            page_data = parse_image_search(page_result)
            current_displayed_results = page_data["num_of_displayed_results"]

            scraped_pages += 1
            page += 1
            if current_displayed_results == -1 or scraped_pages == 20:  # stop on the empty state or after 20 pages
                break
            results.extend(page_data["results"])
    else:
        # scrape up to max_pages
        if max_pages > 1:
            other_pages = [
                ScrapeConfig(
                    _build_search_url(query, search_type="image", page=page, sort=sort, display=displayed_results, period=period),
                    **BASE_CONFIG,
                )
                for page in range(2, max_pages + 1)
            ]
            async for result in SCRAPFLY.concurrent_scrape(other_pages):
                scraped_pages += 1
                page_data = parse_image_search(result)
                current_displayed_results = page_data["num_of_displayed_results"]
                if (
                    current_displayed_results == -1 or scraped_pages == 20
                ):  # stop on the empty state or after 20 pages
                    break
                results.extend(page_data["results"])

    log.success(f"Scraped {len(results)} image results from Naver")
    return {"results": results}
```



The hard cap of 20 pages protects against infinite loops when Naver keeps returning a few stale items. The `not_found02` CSS selector is Naver's empty-state element, used as the explicit signal that no more results are available.
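
The stop conditions can be sketched as a generic loop: keep paging until the empty-state sentinel appears or the cap is reached. This is a simplified illustration rather than the loop above (here the page that hits the cap is still collected); `collect_pages` and `fake_fetch_page` are hypothetical names, and the fetcher returns invented data.

```python
from typing import Any, Callable, Dict, List

MAX_PAGES = 20   # hard cap, mirroring the limit described above
NO_RESULTS = -1  # sentinel the parser returns for the empty-state page


def collect_pages(
    fetch_page: Callable[[int], Dict[str, Any]], max_pages: int = MAX_PAGES
) -> List[str]:
    """Page until the sentinel appears or max_pages is reached."""
    results: List[str] = []
    for page in range(1, max_pages + 1):
        data = fetch_page(page)
        if data["num_of_displayed_results"] == NO_RESULTS:
            break
        results.extend(data["results"])
    return results


def fake_fetch_page(page: int) -> Dict[str, Any]:
    """Invented fetcher: three pages of data, then the empty state."""
    if page > 3:
        return {"results": [], "num_of_displayed_results": NO_RESULTS}
    return {"results": [f"img-{page}"], "num_of_displayed_results": 1}


print(collect_pages(fake_fetch_page))
```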

Example `image_search.json`:

```json
{
  "results": [
    {
      "title": "대전파이썬학원 기초부터 단계별 커리큘럼 프로그래밍 핵심기술!",
      "link": "https://blog.naver.com/dongledongle111/224128920806",
      "source": "네이버 블로그",
      "image_url": "https://search.pstatic.net/common/?src=http%3A%2F%2Fblogfiles.naver.net%2F...",
      "thumbnail_url": "https://search.pstatic.net/common/?src=http%3A%2F%2Fblogfiles.naver.net%2F...",
      "img_id": "image_sas:blog_e00592fd75d08bb35037048816447a01",
      "color": "#E1B785",
      "date": "2025.12.31.",
      "writer": "바닐라떼",
      "domain": "blog.naver.com",
      "rank": 1
    },
    {
      "title": "데이터분석에 파이썬이 왜 필요해?",
      "link": "https://blog.naver.com/minchopizza/224112700826",
      "source": "네이버 블로그",
      "image_url": "https://search.pstatic.net/common/?src=...",
      "thumbnail_url": "https://search.pstatic.net/common/?src=...",
      "img_id": "image_sas:blog_03f1596ebde0670bf63830fb266a5786",
      "color": "#BDB8CE",
      "date": "2025.12.17.",
      "writer": "온 김에 구경하고 갑시다",
      "domain": "blog.naver.com",
      "rank": 3
    }
  ]
}
  
```



## Scraping Naver Blog Posts

Naver Blog posts live behind an iframe. Scrapfly's `render_js` follows the iframe automatically, so the parser only needs CSS selectors for the rendered post body.

```python
class BlogPost(TypedDict):
    url: str
    title: str
    content: str
    author: Optional[str]
    date: Optional[str]
    images: List[str]
    category: Optional[str]


def _parse_blog(result: ScrapeApiResponse, original_url: str) -> BlogPost:
    """Parse title and content from a blog post's iframe page."""
    sel = result.selector
    title = " ".join(sel.css("div.se-title-text ::text, .pcol1 ::text").getall()).strip()
    if not title:
        title = sel.css("meta[property='og:title']::attr(content)").get("").strip()
    content = " ".join(
        t.strip()
        for t in sel.css("div.se-main-container ::text, div#postViewArea ::text").getall()
        if t.strip()
    )
    author = sel.css("span.nick ::text, a.blog_author ::text").get()
    date = sel.css("span.se_publishDate ::text, em.se_publishDate ::text").get()
    images = sel.css(
        "div.se-title-cover img::attr(src), div.se-main-container .se-module-image img::attr(src)"
    ).getall()
    category = sel.css("div.blog2_series a::text").get()

    return {
        "url": original_url,
        "title": title,
        "content": content,
        "author": author,
        "date": date,
        "images": images,
        "category": category,
    }


async def scrape_blog_post(urls: List[str]) -> List[BlogPost]:
    """Scrape Naver blog posts concurrently."""
    log.info(f"scraping {len(urls)} blog posts")
    configs = [ScrapeConfig(url, **BASE_CONFIG) for url in urls]
    results: List[BlogPost] = []
    async for result in SCRAPFLY.concurrent_scrape(configs):
        url = str(result.config["url"])
        results.append(_parse_blog(result, url))

    log.success(f"scraped {len(results)} blog posts from Naver")
    return results
```



The CSS selectors combine the modern SmartEditor markup (`div.se-main-container`) with the legacy editor (`div#postViewArea`) so the same parser works across post ages. `og:title` is the fallback when the visible title element is missing.

Example `blog_post.json`:

```json
[
  {
    "url": "https://blog.naver.com/oro-mam/224289142276",
    "title": "내 기억에서 잊고 싶은 공포 영화들",
    "content": "From, 블로그씨 블로그씨는 공포 영화를 잘 못 봐요. 여러분들은 공포, 스릴러 장르의 영화를 좋아하시나요? 기억에 남는 영화가 있다면 소개해 주세요~ ... 정말 공포물은 너무 싫네요~",
    "author": "주부오리",
    "date": "2026. 5. 18. 15:12",
    "images": [
      "https://postfiles.pstatic.net/MjAyNjA1MThfMTg1/.../%EC%98%81%ED%99%94_%ED%81%B4%EB%A1%9C%EB%B2%84%ED%95%84%EB%93%9C.jpg?type=w80_blur"
    ],
    "category": "몹쓸신잡"
  }
]
  
```
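
For downstream pipelines, the `date` string can be converted to a `datetime`. The sketch below assumes the `YYYY. M. D. HH:MM` layout seen in the sample output above; other posts may render dates differently, so the hypothetical `parse_blog_date` helper returns `None` instead of raising. No time zone is attached.

```python
from datetime import datetime
from typing import Optional


def parse_blog_date(raw: Optional[str]) -> Optional[datetime]:
    """Parse a date like '2026. 5. 18. 15:12'; return None if the layout differs."""
    if not raw:
        return None
    try:
        return datetime.strptime(raw.strip(), "%Y. %m. %d. %H:%M")
    except ValueError:
        return None


parsed = parse_blog_date("2026. 5. 18. 15:12")
print(parsed.isoformat())
print(parse_blog_date("not a date"))
```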



## Scraping Naver News Articles

News articles live on `n.news.naver.com` and use stable CSS class names for the headline, byline, and body. The parser also captures the original publisher's URL when Naver links to it.

```python
class NewsArticle(TypedDict):
    url: str
    title: str
    description: Optional[str]
    content: str
    press: Optional[str]
    date: Optional[str]
    modified_date: Optional[str]
    images: List[str]
    sections: List[str]
    origin_url: Optional[str]


def _parse_news_article(result: ScrapeApiResponse) -> NewsArticle:
    """Parse title and content from a Naver news article page."""
    sel           = result.selector
    url           = str(result.config["url"])
    title         = sel.css("h2.media_end_head_headline span ::text, h2.media_end_head_headline ::text").get("").strip()
    description   = sel.css("meta[property='og:description']::attr(content)").get()
    content       = " ".join(sel.css("article#dic_area ::text").getall()).strip()
    press         = sel.css("span.media_end_head_top_press ::text, a.media_end_head_top_logo img::attr(alt)").get("").strip() or None
    date          = sel.css("._ARTICLE_DATE_TIME::attr(data-date-time)").get()
    modified_date = sel.css("._ARTICLE_MODIFY_DATE_TIME::attr(data-modify-date-time)").get()
    images        = [
        img.css("::attr(data-src)").get() or img.css("::attr(src)").get()
        for img in sel.css("article#dic_area img")
    ]
    images        = [src for src in images if src and not src.startswith("data:")]
    sections      = sel.css("em.media_end_categorize_item ::text").getall()
    origin_url    = sel.css("a.media_end_head_origin_link::attr(href)").get()
    return {
        "url": url,
        "title": title,
        "description": description,
        "content": content,
        "press": press,
        "date": date,
        "modified_date": modified_date,
        "images": images,
        "sections": sections,
        "origin_url": origin_url,
    }


async def scrape_news_article(urls: List[str]) -> List[NewsArticle]:
    """Scrape Naver news articles concurrently."""
    log.info(f"Scraping {len(urls)} news articles")
    configs = [ScrapeConfig(url, **BASE_CONFIG) for url in urls]
    results: List[NewsArticle] = []
    async for result in SCRAPFLY.concurrent_scrape(configs):
        results.append(_parse_news_article(result))

    log.success(f"scraped {len(results)} news articles from Naver")
    return results
```



The `data-src` lookup picks up lazy-loaded images, and the `data:` URI filter drops 1x1 placeholder pixels that appear in the rendered markup.
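
The same selection rule can be shown on plain dictionaries, which makes it easy to unit test without a rendered page. This is a sketch: `pick_sources` is a hypothetical name, and the attribute dictionaries and image URLs are invented.

```python
from typing import Dict, List


def pick_sources(img_attrs: List[Dict[str, str]]) -> List[str]:
    """Prefer data-src over src, then drop empty values and inline data: URIs."""
    picked: List[str] = []
    for attrs in img_attrs:
        src = attrs.get("data-src") or attrs.get("src")
        if src and not src.startswith("data:"):
            picked.append(src)
    return picked


# Invented attribute sets standing in for <img> elements.
sample = [
    {"data-src": "https://imgnews.pstatic.net/image/a.jpg", "src": "data:image/gif;base64,R0lGOD"},
    {"src": "https://imgnews.pstatic.net/image/b.jpg"},
    {"src": "data:image/gif;base64,R0lGOD"},
    {},
]
print(pick_sources(sample))
```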

Example `news_article.json`:

```json
[
  {
    "url": "https://n.news.naver.com/article/001/0015234569",
    "title": "Acting President Choi reiterates efforts to make S. Korea top 5 global bio leader",
    "description": "acting president-bio industry Acting President Choi reiterates efforts to make S. Korea top 5 global bio leader By Kim H",
    "content": "Acting President Choi Sang-mok on Wednesday reaffirmed the government's commitment to turning South Korea into one of the world's top five leaders in the advanced bio sector...",
    "press": "연합뉴스",
    "date": "2025-02-26 11:40:06",
    "modified_date": "2025-02-26 11:41:12",
    "images": [
      "https://imgnews.pstatic.net/image/001/2025/02/26/AEN20250226001600320_01_i_20250226114112774.jpg?type=w860"
    ],
    "sections": ["세계"],
    "origin_url": "https://en.yna.co.kr/view/AEN20250226001600320?input=2106m"
  },
  {
    "url": "https://n.news.naver.com/article/001/0015234568",
    "title": "최 권한대행, 첨단의료기기 체험",
    "description": "최상목 대통령 권한대행 부총리 겸 기획재정부 장관이 26일 충북 청주시 오송 첨단의료복합단지에서 시력장애 보조형 VR을 체험하고 있다. 2025.2.26",
    "content": "(청주=연합뉴스) 한상균 기자 = 최상목 대통령 권한대행 부총리 겸 기획재정부 장관이 26일 충북 청주시 오송 첨단의료복합단지에서 시력장애 보조형 VR을 체험하고 있다. 2025.2.26",
    "press": "연합뉴스",
    "date": "2025-02-26 11:40:03",
    "modified_date": "2025-02-26 11:41:11",
    "images": [
      "https://imgnews.pstatic.net/image/001/2025/02/26/PYH2025022607040001300_P4_20250226114111065.jpg?type=w860"
    ],
    "sections": ["정치", "경제", "사회"],
    "origin_url": "https://www.yna.co.kr/view/PYH20250226070400013?input=1196m"
  }
]
  
```



## Running the Full Scraper

With `naver.py` complete, the entry point lives in a separate `run.py` so the scraper module stays import-friendly. The runner kicks off all four scrape jobs and writes each result to `./results/`.

```python
import asyncio
import json
from pathlib import Path
import naver

output = Path(__file__).parent / "results"
output.mkdir(exist_ok=True)


async def run():
    # Disable the Scrapfly cache so every run fetches fresh pages (set to True to reuse cached responses)
    naver.BASE_CONFIG["cache"] = False

    print("running Naver scrape and saving results to ./results directory")

    # Scrape web search results
    search_data = await naver.scrape_web_search(query="파이썬", max_pages=3, period="6m")
    with open(output.joinpath("web_search.json"), "w", encoding="utf-8") as file:
        json.dump(search_data, file, indent=2, ensure_ascii=False)

    # Scrape image search results
    image_data = await naver.scrape_image_search(query="파이썬", max_pages=3, period="6m")
    with open(output.joinpath("image_search.json"), "w", encoding="utf-8") as file:
        json.dump(image_data, file, indent=2, ensure_ascii=False)

    # Scrape blog posts
    blog_posts = await naver.scrape_blog_post([
        "https://blog.naver.com/cherry_27_/224290687381",
        "https://blog.naver.com/jylove_0120/224289170856",
        "https://blog.naver.com/oro-mam/224289142276"
    ])
    with open(output.joinpath("blog_post.json"), "w", encoding="utf-8") as file:
        json.dump(blog_posts, file, indent=2, ensure_ascii=False)

    # Scrape news articles
    news_articles = await naver.scrape_news_article([
        "https://n.news.naver.com/article/001/0015234567",
        "https://n.news.naver.com/article/001/0015234568",
        "https://n.news.naver.com/article/001/0015234569",
    ])
    with open(output.joinpath("news_article.json"), "w", encoding="utf-8") as file:
        json.dump(news_articles, file, indent=2, ensure_ascii=False)

if __name__ == "__main__":
    asyncio.run(run())
```



Run it from the same directory.

```bash
$ python run.py
```



Every scraped artifact lands as a UTF-8 JSON file with `ensure_ascii=False`, which keeps Korean characters readable in the saved output.
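
The difference is easy to see with a one-field record. With the default `ensure_ascii=True`, `json.dumps` escapes every non-ASCII character; both forms decode to the same data, so the flag only affects readability and file size.

```python
import json

record = {"title": "파이썬"}

escaped = json.dumps(record)                       # default: non-ASCII becomes \uXXXX escapes
readable = json.dumps(record, ensure_ascii=False)  # Korean text is written as-is

print(escaped)
print(readable)

# Both strings decode back to the same Python object.
print(json.loads(escaped) == json.loads(readable))
```

When writing with `ensure_ascii=False`, open the file with `encoding="utf-8"`, as the runner does, so the output does not depend on the platform's default encoding.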



[Try Free →](https://scrapfly.io/register)## How Do You Avoid Getting Blocked by Naver?

Naver blocks scrapers by watching IP origin (especially non-Korean geos), request headers (especially missing `Accept-Language: ko-KR`), request rate, and behavior signals. Use a Korean IP, set realistic Korean browser headers, throttle requests, and fall back to [anti-bot bypass techniques](https://scrapfly.io/blog/posts/how-to-bypass-anti-bot-protection-when-web-scraping) or Scrapfly when Naver escalates.

### Why Does Naver Block Requests from Outside Korea?

Naver strongly prefers Korean IPs. Non-Korean IPs may get a simplified version with fewer features, trigger CAPTCHAs more aggressively, get rate-limited more strictly, or receive 403/429 on several surfaces.

Quick diagnostic: if your scraper works from a Korean network but fails on a cloud server hosted elsewhere, suspect geo-blocking before anti-bot detection. Test the same code through a Korean IP before assuming you need a full anti-bot solution. For any production Naver scraping, use a Korean residential or datacenter proxy. See our [introduction to proxies in web scraping](https://scrapfly.io/blog/posts/introduction-to-proxies-in-web-scraping) for the residential/datacenter/mobile tradeoffs.
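The diagnostic above can be sketched as a small comparison between a direct request and the same request routed through a Korean proxy. This is an illustration under stated assumptions: `diagnose` and `requests_status` are hypothetical helper names, the proxy URL in the usage note is a placeholder, and the HTTP status code is only a rough signal, since Naver can also return degraded content or a CAPTCHA page with a 200 status.

```python
from typing import Callable, Optional

BLOCK_STATUSES = {403, 429}  # the status codes mentioned above


def diagnose(fetch_status: Callable[[Optional[str]], int], korean_proxy: str) -> str:
    """Compare a direct request with one routed through a Korean proxy."""
    direct = fetch_status(None)
    via_korea = fetch_status(korean_proxy)
    if direct in BLOCK_STATUSES and via_korea == 200:
        return "geo-blocking"   # only the Korean route succeeds
    if direct in BLOCK_STATUSES and via_korea in BLOCK_STATUSES:
        return "other-block"    # look at headers, rate, or anti-bot detection
    return "no-block-seen"


def requests_status(proxy: Optional[str]) -> int:
    """Fetch a Naver search page and return its HTTP status code."""
    # Imported lazily so diagnose() can be tried without network access.
    import requests

    proxies = {"http": proxy, "https": proxy} if proxy else None
    response = requests.get(
        "https://search.naver.com/search.naver",
        params={"query": "파이썬"},
        headers={"Accept-Language": "ko-KR,ko;q=0.9,en;q=0.8"},
        proxies=proxies,
        timeout=15,
    )
    return response.status_code
```

A run would look like `diagnose(requests_status, "http://user:pass@kr-proxy.example:8000")`, with your own proxy URL in place of the placeholder. Passing the fetcher in as a callable keeps the decision logic separate from the network call, so it can be swapped for another HTTP client.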

### What Headers and Session Settings Does Naver Expect?

The key session headers for HTML scraping:

```python
import requests

session = requests.Session()
session.headers.update({
    "Accept-Language": "ko-KR,ko;q=0.9,en;q=0.8",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
    "Accept-Encoding": "gzip, deflate, br",
    "Connection": "keep-alive",
})
```



A persistent `requests.Session()` keeps cookies across calls, which Naver expects. Cold sessions get flagged more aggressively. Filter out Naver's ad and utility domains when cleaning scraped URL lists: `ader.naver.com`, `adcr.naver.com`, `help.naver.com`, `keep.naver.com`, `nid.naver.com`, `pay.naver.com`, and `m.pay.naver.com`.
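One way to apply that domain filter is an exact hostname check. The sketch below is a minimal version: `drop_utility_urls` is a hypothetical helper name and the sample URLs are invented for illustration.

```python
from urllib.parse import urlparse

# Ad and utility hosts listed above
EXCLUDED_HOSTS = {
    "ader.naver.com",
    "adcr.naver.com",
    "help.naver.com",
    "keep.naver.com",
    "nid.naver.com",
    "pay.naver.com",
    "m.pay.naver.com",
}


def drop_utility_urls(urls):
    """Return the URLs whose hostname is not an excluded Naver host."""
    kept = []
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        if host not in EXCLUDED_HOSTS:
            kept.append(url)
    return kept


# Invented sample URLs for illustration
sample = [
    "https://blog.naver.com/example/223456789012",
    "https://ader.naver.com/v1/some-ad-redirect",
    "https://nid.naver.com/nidlogin.login",
    "https://n.news.naver.com/article/001/0015234567",
]
cleaned = drop_utility_urls(sample)
print(cleaned)
```

The check matches whole hostnames rather than substrings, so `blog.naver.com` passes even though it shares the `naver.com` suffix with the excluded hosts.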

For a deeper walkthrough of which headers matter and why, see our [Python requests headers guide](https://scrapfly.io/blog/posts/python-requests-headers-guide) and [how headers are used to block scrapers](https://scrapfly.io/blog/posts/how-to-avoid-web-scraping-blocking-headers).

### How Do You Handle Naver's CAPTCHAs and Rate Limits?

CAPTCHAs appear when you scrape too fast, use a non-Korean IP, skip `Accept-Language: ko-KR`, or spam the same query repeatedly. Mitigation: randomize delays between 1 and 3 seconds, rotate User-Agents across a realistic Chrome/Firefox pool, rotate IPs across a Korean residential pool, and respect `Retry-After` headers on 429 responses.
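The delay and `Retry-After` advice can be combined in one helper, sketched below. `next_delay` is a hypothetical name, and the 30-second fallback is an assumption for responses whose `Retry-After` value is missing or given as an HTTP date, which this sketch does not parse.

```python
import random
from typing import Mapping


def next_delay(status_code: int, headers: Mapping[str, str], fallback: float = 30.0) -> float:
    """Return how many seconds to wait before sending the next request."""
    if status_code == 429:
        value = headers.get("Retry-After", "").strip()
        if value.isdigit():
            # The server told us how long to back off, in seconds
            return float(value)
        # Missing or date-formatted header: use the assumed fallback pause
        return fallback
    # Normal case: a randomized 1 to 3 second pause between requests
    return random.uniform(1.0, 3.0)
```

In a request loop this would be used as `time.sleep(next_delay(response.status_code, response.headers))`, so a 429 pauses for as long as the server asks and every other response gets a jittered delay.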

Building a Korean proxy setup yourself is non-trivial. You either invest engineering time into a Korean residential pool with CAPTCHA handling, or use a managed service like Scrapfly that handles it all in one API call.

## How Do You Scrape Naver at Scale with Scrapfly?

Scrapfly provides web scraping, screenshot, and extraction APIs for data collection at scale. For Naver specifically, Scrapfly's [Web Scraping API](https://scrapfly.io/products/web-scraping-api) is the fastest path to a production scraper because it handles every Naver-specific obstacle (Korean IP routing, JavaScript rendering of the `entry.bootstrap()` payload, anti-bot bypass, and CAPTCHAs) in a single API call.



Key features for Naver scraping:

- [Anti-Scraping Protection (ASP)](https://scrapfly.io/docs/scrape-api/anti-scraping-protection) bypasses Naver's fingerprinting, CAPTCHAs, and behavior detection with `asp=True`
- [Residential proxies](https://scrapfly.io/docs/scrape-api/proxy) in South Korea through `country="kr"`, no proxy pool to manage
- [JavaScript rendering](https://scrapfly.io/docs/scrape-api/javascript-rendering) through headless browsers with `render_js=True`, required for modern Naver SERPs
- [Sticky sessions](https://scrapfly.io/docs/scrape-api/session) via `session="naver-session"` so Naver sees returning-visitor cookies across requests
- [Python SDK](https://scrapfly.io/docs/sdk/python) for direct integration
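As a rough sketch, the options in this list map onto a single `ScrapeConfig` from the Scrapfly Python SDK. `NAVER_DEFAULTS`, `scrape_kwargs`, and `fetch_html` are hypothetical names used for illustration, and the values may not match the repository's `BASE_CONFIG` exactly.

```python
# Hypothetical defaults mirroring the feature list above; the repository's
# BASE_CONFIG may differ in its exact keys and values.
NAVER_DEFAULTS = {
    "asp": True,                 # anti-scraping protection bypass
    "country": "kr",             # route through South Korean proxies
    "render_js": True,           # load the page in a headless browser
    "session": "naver-session",  # reuse cookies across requests
}


def scrape_kwargs(url: str, **overrides) -> dict:
    """Merge the shared defaults with per-request overrides."""
    return {"url": url, **NAVER_DEFAULTS, **overrides}


def fetch_html(url: str, api_key: str) -> str:
    """Fetch one page through Scrapfly and return its HTML."""
    # Imported lazily so scrape_kwargs() works without the SDK installed.
    from scrapfly import ScrapeConfig, ScrapflyClient

    client = ScrapflyClient(key=api_key)
    result = client.scrape(ScrapeConfig(**scrape_kwargs(url)))
    return result.content
```

Keeping the defaults in one dict means a single request can still opt out of a feature, for example `scrape_kwargs(url, render_js=False)` for a page that does not need a browser.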

Here's the reference implementation that runs all four Naver scrapers (web search, image search, blog posts, and news articles) through Scrapfly:

```python
import asyncio
import naver  # the canonical scraper module from the repo above

# All four surfaces share the same BASE_CONFIG (asp, render_js, country="kr", residential pool)
async def main():
    web_results   = await naver.scrape_web_search(query="파이썬", max_pages=2)
    image_results = await naver.scrape_image_search(query="서울 야경", max_pages=2)
    blog_posts    = await naver.scrape_blog_post(urls=["https://blog.naver.com/example/223456789012"])
    news_articles = await naver.scrape_news_article(urls=["https://n.news.naver.com/article/001/0015234567"])

    print(f"Web: {len(web_results['results'])}, Images: {len(image_results['results'])}, "
          f"Blogs: {len(blog_posts)}, News: {len(news_articles)}")

asyncio.run(main())
```



Scrapfly is the fastest path when you need full content at scale: web and image SERPs, blog post bodies, and news article text, all behind Korean IPs and JavaScript rendering without managing the infrastructure yourself.






## FAQ

**Why am I getting blocked when scraping Naver?**

Naver runs IP-based rate limiting, browser fingerprinting, and behavioral analysis. The two most common triggers are non-Korean IPs and missing JavaScript rendering. The `BASE_CONFIG` in this guide enables `asp`, `render_js`, `country="kr"`, and a Korean residential proxy pool, which covers all three detection layers.

**How does the scraper handle Korean text?**

Scrapfly returns UTF-8 content by default and the parsers use Selector-based text extraction, so Korean characters survive end to end. The `json.dump` calls in `run.py` use `ensure_ascii=False` to keep Hangul readable in the saved files.

**Can I scrape more than the default page count?**

Yes. Pass `max_pages` to `scrape_web_search` or `scrape_image_search` for a higher hard cap, or set `scrape_all_pages=True` to follow every page Naver exposes. The image scraper caps at 20 pages to avoid runaway loops on duplicate result sets.

**Do I need a Korean proxy to scrape Naver?**

In practice, yes. Naver serves degraded content or CAPTCHAs to non-Korean IPs, so production scraping needs a Korean IP. Scrapfly with `country="kr"` handles Korean IP routing without you managing a proxy pool.

## Summary

Scraping Naver comes down to one decision per surface: pull the JSON Naver embeds in its HTML, or read the rendered DOM. Web and image search hide their results in a JavaScript object (`entry.bootstrap()` for web, `imageSearchTabData` for images) that you extract and parse as JSON. Blog posts and news articles keep their content in the rendered DOM behind stable selectors (`div.se-main-container` for blog bodies, `article#dic_area` for news).

Two things make or break a Naver scraper: a Korean IP and JavaScript rendering. Naver serves degraded content or CAPTCHAs to non-Korean IPs, and modern SERPs only populate after the browser runs. The shared `BASE_CONFIG` in this guide handles both with `country="kr"` and `render_js=True`.

The full scraper, including every function shown here, is published in the [Scrapfly Naver scraper repository](https://github.com/scrapfly/scrapfly-scrapers/tree/main/naver-scraper). For production-scale Naver scraping without building a Korean proxy pool and CAPTCHA pipeline yourself, [Scrapfly](https://scrapfly.io) handles all four surfaces in one API call.



**Legal Disclaimer and Precautions**

This tutorial covers popular web scraping techniques for education. Interacting with public servers requires diligence and respect:

- Do not scrape at rates that could damage the website.
- Do not scrape data that's not available publicly.
- Do not store PII of EU citizens protected by GDPR.
- Do not repurpose *entire* public datasets which can be illegal in some countries.

Scrapfly does not offer legal advice but these are good general rules to follow. For more you should consult a lawyer.

 
