# How to Scrape Air France Flights with Python in 2026

 by [Mayada Shaaban](https://scrapfly.io/blog/author/mayada-shaaban-90143e67) Jun 23, 2026 25 min read [\#api](https://scrapfly.io/blog/tag/api) [\#blocking](https://scrapfly.io/blog/tag/blocking) [\#headless-browser](https://scrapfly.io/blog/tag/headless-browser) [\#python](https://scrapfly.io/blog/tag/python) [\#scrapeguide](https://scrapfly.io/blog/tag/scrapeguide) 


 

 

         

Air France flight prices don't show up in plain HTML. The search widget runs in the browser and calls private endpoints on submit. A cookie modal, a form, and bot checks sit in front of the widget. A `requests.get` returns the home page, not the offers.

This guide runs one Air France round-trip search through the [Scrapfly Cloud Browser API](https://scrapfly.io/docs/cloud-browser-api/getting-started). Instead of scraping the rendered cards, it captures the GraphQL response the page makes on submit. That response becomes clean offer records you can store or compare.

The full, runnable version of this scraper lives in the [Scrapfly Air France scraper repo](https://github.com/scrapfly/scrapfly-scrapers/tree/main/airfrance-scraper). For background on the bot checks airline sites use and how to handle them, see [How to Bypass Anti-Bot Protection When Web Scraping](https://scrapfly.io/blog/posts/how-to-bypass-anti-bot-protection-when-web-scraping).



## Key Takeaways

- **One script, one record per offer.** A Cloud Browser session walks the widget end to end.
- **No plain HTTP.** `requests.get` returns the homepage; only the form reaches the offers.
- **Capture, don't parse.** Read the `SearchResultAvailableOffersQuery` JSON, not the DOM.
- **Currency follows the site.** Match `country=` to the regional domain (`FR` for `.fr`).
- **Wait on a real end state.** Watch the itinerary list, not `networkidle`. Retry if stuck.
- **One symptom, one setting.** Map each block to a Cloud Browser param; none is a guarantee.









## Why Scrape Air France Flight Data?

You scrape Air France direct when you need first-party data: prices, fare brands, cabin details, aircraft type, seats left in the bucket, CO2 per passenger, schedules, and promo fares. Aggregators clean that data up and add a margin of error.

Direct scraping gives you the price a customer would pay from the country you're testing, plus the operational fields most feeds drop.

Air France is a harder browser-session target than a static price feed. The data lives behind a search form, not a URL, and the page won't render offers without a real browser. More work to reach, but the resulting data is exactly what the customer sees.

Common reasons to pull this data start with price and schedule monitoring:

- Track fares on specific routes, including hub flights through Paris Charles de Gaulle (CDG) and Paris Orly (ORY)
- Watch promo fares and price drops over time on a fixed route
- Watch route, aircraft, and schedule changes

The same data also feeds downstream products and analysis:

- Add airline-direct prices to a travel app alongside aggregator data
- Compare Air France prices to Lufthansa, KLM, or British Airways for the same dates
- Pull market and emissions data for travel companies that need first-party prices

Direct vs API is a tradeoff. APIs work when the feed has the fields you need at the right price and refresh rate. Scrape direct when you need a field no API exposes: a fare brand, a promo banner, a country-specific price, or the CO2 figure.

One quick note on scraping responsibly. Pull only public search results, throttle your requests, and respect Air France's Terms of Use.

Skip account-only flows like Flying Blue, passenger details, or checkout. This guide stays on the public search page.

With the use case and scope clear, set up the project next.



## Project Setup

You need four things to run the code in this guide:

- Python 3.10+ as the runtime
- A Scrapfly account and API key with Cloud Browser API access. Grab a free key from the [Scrapfly dashboard](https://scrapfly.io/dashboard)
- The [Scrapfly Python SDK](https://scrapfly.io/docs/sdk/python) (`scrapfly-sdk`) to build the Cloud Browser session URL
- [Playwright for Python](https://scrapfly.io/blog/posts/web-scraping-with-playwright-and-python) to connect to the remote browser over CDP

Install the Python packages in one command:

```bash
pip install scrapfly-sdk playwright loguru
```



`scrapfly-sdk` builds the Cloud Browser session URL. `playwright` connects to it and listens for responses. [loguru](https://github.com/Delgan/loguru) prints readable logs. We parse the response as JSON, so you don't need an HTML parser.

You don't need `playwright install` for this guide. The Cloud Browser API runs Chromium remotely, so the CDP client doesn't need local browser binaries.

Why Cloud Browser instead of the Web Scraping API? The Web Scraping API is a one-shot fetch: you pass a URL, you get the rendered page. Air France needs a few clicks first, and a single fetch can't carry that state or catch the booking response.

[Cloud Browser](https://scrapfly.io/blog/posts/web-scraping-with-cloud-browsers) gives you a long-lived session with proxies, fingerprints, and captcha handling built in.

For the full parameter list, see the [Cloud Browser Playwright integration docs](https://scrapfly.io/docs/cloud-browser-api/playwright). Set your key in an environment variable first:

```bash
export SCRAPFLY_KEY="your_scrapfly_key_here"
```



This sets your key as an environment variable so the script reads it with `os.environ["SCRAPFLY_KEY"]` instead of hard-coding the value.
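If the variable is missing, `os.environ["SCRAPFLY_KEY"]` fails with a bare `KeyError`. A minimal sketch of a friendlier lookup follows. `load_scrapfly_key` is a hypothetical helper written for this guide, not part of the Scrapfly SDK:

```python
import os


def load_scrapfly_key(env=None):
    """Return the Scrapfly API key or fail with a readable message.

    Hypothetical helper for this guide; not a Scrapfly SDK function.
    `env` defaults to os.environ and exists only to make testing easy.
    """
    env = os.environ if env is None else env
    key = env.get("SCRAPFLY_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "SCRAPFLY_KEY is not set; export it before running the scraper"
        )
    return key
```

You could then pass `load_scrapfly_key()` wherever the script would otherwise read `os.environ["SCRAPFLY_KEY"]` directly.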

With the project set up, pick the search inputs for the flights you want to scrape.



## How to Find Air France Flights to Scrape

Air France doesn't publish a URL per flight. Every offer comes from a search the user runs in the widget. So picking which flights to scrape means picking the search inputs.

The widget across the bottom of the homepage holds every input you need: trip type, cabin, From, To, travel dates, and passenger count.

Six inputs define a single Air France round-trip search:

| Input | Example | Why it matters |
|---|---|---|
| Origin airport | `PAR` (Paris) | IATA code for the departure. `PAR` is the city group for both CDG and ORY; pass a specific code to narrow it |
| Destination airport | `TYO` (Tokyo) | IATA code for arrival. `TYO` groups Haneda (HND) and Narita (NRT); the result tells you which one each offer uses |
| Departure and return date | `2026-06-23` / `2026-06-30` | Pricing changes daily. Record both search dates so you can replay later |
| Trip type | Round trip, one way, or flexible dates | This guide uses round trip, the booking widget's default. The response shape is the same for one way |
| Passenger count | 1 adult | Multi-passenger searches can return fewer offers since not every fare bucket has enough seats left |
| Country / locale / currency | `fr` / `fr-FR` / `EUR` | The regional site drives currency and fare rules. `airfrance.fr` returns EUR, `airfrance.us` returns USD, `airfrance.co.uk` returns GBP |

A few notes on how the route and date inputs shift results:

- Origin and destination are always required. Typing a city group like `PAR` or `TYO` returns every airport in the metro; typing `CDG` narrows it to one airport with fewer connections.
- Trip type changes the required fields. Round trip needs a departure and a return date; one way needs only the departure.
- Departure date drives the biggest price swings. The same PAR to TYO flight can shift by a few hundred euros day to day. Record both search dates with every row.

Cabin behaves a little differently:

- The response carries every cabin's pricing, but this guide pulls the Economy product from each offer. Air France offers Premium Economy and Business only on long-haul aircraft.

One thing worth knowing: Air France doesn't carry the search filters in the URL. Every search lands on the same generic path (`/search/flights/0`) regardless of route, date, or cabin.

Cabin, passenger count, and a `bwsfe-state-searchStateUuid` live in `sessionStorage`; the route, dates, and trip type live on the server, keyed by that UUID.

Because a server-side UUID holds the page state, a deep-link trick (paste a result URL and skip the form) doesn't work. The page finds the UUID and asks the server for state, or falls back to a fresh search form.

Country, language, and currency also shape the price. The same route returns EUR on `airfrance.fr`, USD on `airfrance.us`, GBP on `airfrance.co.uk`, plus small point-of-sale gaps. For region comparisons, record country, currency, and timestamp on every result.

Pick one route and trip type and stick with it. We'll use PAR to TYO, round trip, 1 adult, about a week out. It's a main long-haul route that returns enough offers, nonstop and one-stop via hubs like Amsterdam, to exercise the parser.
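One way to keep those inputs consistent across runs is to validate them before a browser session opens. The sketch below is an illustration only: `FlightSearch` and its field names are hypothetical, and the checks cover just the basics described above (3-letter IATA codes, a return date on or after the departure, at least one adult).

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FlightSearch:
    """Hypothetical container for one round-trip search."""

    origin: str
    destination: str
    departure_date: str  # YYYY-MM-DD
    return_date: str  # YYYY-MM-DD
    adults: int = 1

    def __post_init__(self):
        for code in (self.origin, self.destination):
            # Accept airport codes (CDG) and city groups (PAR) alike.
            if not (len(code) == 3 and code.isalpha() and code.isupper()):
                raise ValueError(f"expected a 3-letter IATA code, got {code!r}")
        if self.origin == self.destination:
            raise ValueError("origin and destination must differ")
        # fromisoformat raises ValueError on malformed dates.
        out = date.fromisoformat(self.departure_date)
        back = date.fromisoformat(self.return_date)
        if back < out:
            raise ValueError("return_date is before departure_date")
        if self.adults < 1:
            raise ValueError("at least one adult is required")


search = FlightSearch("PAR", "TYO", "2026-06-23", "2026-06-30")
```

Failing early on a typo in a date or airport code is cheaper than finding out after a session has loaded the page.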

Don't build the scraper around result URLs. Air France does generate one after a search, but the page needs the server-side state behind the UUID to render. Going through the form is the safer path.

Out of scope here: account-only flows like Flying Blue, saved profiles, seat maps, add-ons, and checkout. We stay on the public search page.

With the search parameters defined, build the scraper to execute the search.



## Scraping Air France Search Results with Scrapfly Cloud Browser API

The plan: open a Cloud Browser session, start listening for responses, and connect Playwright over CDP. Then load `airfrance.fr`, dismiss the cookie modal, fill the form, wait for the itinerary list, and read the booking response.

See the Cloud Browser Playwright integration docs for the connection syntax. Turn on [Debug Mode](https://scrapfly.io/docs/cloud-browser-api/debug-mode) while you build this flow to replay each step from the dashboard.

Note: the Cloud Browser API is currently in beta. Parameter names and behavior can shift. Re-check the Cloud Browser docs before you copy a session config into production.

Use the Scrapfly Python SDK to build the WebSocket URL. The SDK formats the parameters for you:

```python
import os

from scrapfly import BrowserConfig, ScrapflyClient

client = ScrapflyClient(key=os.environ["SCRAPFLY_KEY"])

BROWSER_CONFIG = BrowserConfig(
    debug=True,
    country="FR",
    proxy_pool="residential",
    cache=True,
    blacklist=True,
    block_images=True,
    block_media=True,
    block_styles=True,
)
```



`proxy_pool="residential"` and `country="FR"` tie the session to a French residential exit, keeping locale and currency in sync with `airfrance.fr`.

`debug=True` records the run; `cache` and `blacklist` speed up repeat loads.

`block_images`, `block_media`, and `block_styles` strip the heavy assets you don't need; the offers still render and the booking response still fires.

Connect Playwright to that URL and reuse the existing context. The Cloud Browser session opens with one context ready:

```python
from playwright.sync_api import sync_playwright

def open_session():
    p = sync_playwright().start()
    cdp_url = client.cloud_browser(BROWSER_CONFIG)
    browser = p.chromium.connect_over_cdp(cdp_url, timeout=180_000)
    context = browser.contexts[0]
    page = context.pages[0] if context.pages else context.new_page()
    page.set_viewport_size({"width": 1920, "height": 1080})
    return p, browser, page
```



The 1920x1080 viewport matches a desktop and keeps the widget in its full layout. Mobile widths show a different widget with different selectors.

### Capturing the Air France Booking Response

The offers come back as a GraphQL response when the form submits. Attach a response listener before you load the page, so you don't miss anything, and keep every JSON body for later:

```python
def _start_xhr_collector(page):
    # Collect all XHR/fetch JSON responses into a list. Returns (list, handler).
    collected = []

    def on_response(response):
        if response.request.resource_type not in ("xhr", "fetch") or response.status != 200:
            return
        try:
            collected.append({"url": response.url, "payload": response.json()})
        except Exception:
            pass

    page.on("response", on_response)
    return collected, on_response
```



The listener keeps only successful `xhr`/`fetch` responses with a JSON body. After the search resolves, you'll filter this list for the one booking response. The operation name lives in the request URL, so a simple substring match finds it:

```python
GQL_BOOKING_OP = "operationName=SearchResultAvailableOffersQuery"


def _find_booking_response(xhrs):
    # Return the first GraphQL SearchResultAvailableOffersQuery response.
    for xhr in xhrs:
        if GQL_BOOKING_OP in xhr["url"] and xhr["payload"]:
            return xhr["payload"]
    raise ValueError(
        f"availableOffers not found in any collected XHR ({len(xhrs)} responses captured)"
    )
```



`_find_booking_response` scans the captured list and returns the first payload whose URL carries the `SearchResultAvailableOffersQuery` operation. If nothing matched, it raises with the count of responses seen, so you can debug from the capture.
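A substring match would also accept a longer operation name that happens to start the same way. A slightly stricter variant is sketched below, assuming the operation name travels as an `operationName` query parameter, as the substring constant suggests. The function names and the sample URLs are made up for illustration; real request URLs will differ.

```python
from urllib.parse import parse_qs, urlsplit

BOOKING_OPERATION = "SearchResultAvailableOffersQuery"


def is_booking_url(url):
    """True when the URL's operationName parameter equals the booking operation."""
    params = parse_qs(urlsplit(url).query)
    return BOOKING_OPERATION in params.get("operationName", [])


def find_booking_payloads(xhrs):
    """Return every non-empty booking payload, in capture order."""
    return [x["payload"] for x in xhrs if is_booking_url(x["url"]) and x["payload"]]


# Made-up capture list in the same shape the collector produces.
captured = [
    {"url": "https://example.com/gql?operationName=OtherQuery", "payload": {"data": {}}},
    {
        "url": "https://example.com/gql?foo=1&operationName=SearchResultAvailableOffersQuery",
        "payload": {"data": {"availableOffers": {}}},
    },
]
payloads = find_booking_payloads(captured)
```

Returning a list instead of the first hit leaves the choice to the caller, which helps when you want to inspect how many booking responses a run actually captured.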

### Loading Air France and Handling Cookies or Locale Prompts

Air France shows an "Air France uses cookies" modal on first load. A separate `cookie-banner.js` script renders it, and the Accept button has a stable ID, `#accept_cookies_btn`. Click it, then wait for the widget controls:

```python
def dismiss_cookie_banner(page):
    try:
        page.wait_for_selector("#accept_cookies_btn", timeout=15000)
        page.evaluate("() => document.querySelector('#accept_cookies_btn')?.click()")
        page.wait_for_selector("[data-testid='bwsfe-widget__trip-type-selector']", timeout=15000)
        page.wait_for_selector("[data-testid='bwsfe-widget__search-button']", timeout=15000)
    except Exception:
        pass
```



Clicking through `page.evaluate` instead of `locator.click()` skips the overlay's animation race, which can intercept the click. The ID is locale-independent (same on `.fr`, `.us`, `.co.uk`, and `.com.eg`). Use `#decline_cookies_btn` to opt out of marketing cookies.

Match the proxy `country` to the site you load. If you switch to `airfrance.us`, also set `country` to `US` so the locale and your exit IP agree. A mismatch will redirect you or trigger a region prompt.
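Keeping the site and the proxy country in one lookup makes a mismatch harder to introduce. The table below is a sketch: the currencies come from this guide, while the `GB` country code, the `en-US` and `en-GB` locale strings, and the helper name `site_settings` are assumptions to verify against the sites you actually load.

```python
from urllib.parse import urlsplit

# Domain -> session settings. Currencies are taken from the guide; the "GB"
# country code and the non-French locale strings are assumptions.
REGIONAL_SITES = {
    "airfrance.fr": {"country": "FR", "locale": "fr-FR", "currency": "EUR"},
    "airfrance.us": {"country": "US", "locale": "en-US", "currency": "USD"},
    "airfrance.co.uk": {"country": "GB", "locale": "en-GB", "currency": "GBP"},
}


def site_settings(landing_url):
    """Look up proxy country, locale, and currency for a landing URL."""
    host = urlsplit(landing_url).hostname or ""
    for domain, settings in REGIONAL_SITES.items():
        # Match the bare domain or any subdomain of it.
        if host == domain or host.endswith("." + domain):
            return settings
    raise KeyError(f"no regional settings for host {host!r}")


settings = site_settings("https://wwws.airfrance.fr/")
```

The returned `country` can feed the browser config, and `locale` and `currency` can be stamped on every record instead of being hard-coded.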

### Submitting the Air France Flight Search Form

The widget is a set of Angular Material controls under `<bw-search-widget>`. Every control exposes a stable `data-testid` (`bwsfe-widget__*`, `bwsfe-station-picker__*`), so prefer those over ARIA labels, which change per locale, or deep DOM paths.

SSR markers like `_ngcontent-server-app-c276913315` rotate on every build, so never select on them.

The booking widget defaults to a round trip, so you can skip the trip-type selector here. Start with the dates: the trigger is a button with `data-testid="bwsfe-datepicker__toggle-button"`.

Inside the picker, each day is a `<button>` with `id="bwc-day_YYYY_M_D"` on a **0-indexed** month, so June is 5.

The ID is locale-independent and deterministic, so prefer it. Click the departure day, then the return day, then confirm:

```python
from datetime import datetime


def _date_to_day_id(date_str):
    # Convert YYYY-MM-DD to the bwc-day ID format (0-indexed month, no leading zeros).
    d = datetime.strptime(date_str, "%Y-%m-%d")
    return f"{d.year}_{d.month - 1}_{d.day}"


def pick_date(page, date, return_date):
    page.wait_for_selector('[data-testid="bwsfe-datepicker__toggle-button"]', timeout=15000)
    toggle = page.locator('[data-testid="bwsfe-datepicker__toggle-button"]')
    if toggle.get_attribute("aria-expanded") != "true":
        toggle.click()
    page.wait_for_timeout(1500)

    page.click(f"#bwc-day_{_date_to_day_id(date)}")
    page.wait_for_timeout(500)
    page.click(f"#bwc-day_{_date_to_day_id(return_date)}")
    page.wait_for_timeout(500)
    page.click('[data-testid="bwc-calendar__confirm"]')
```



The picker shows two months at once (current + next). For dates further out, you'll need to click a "next month" arrow before the target day is in view.
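The number of arrow clicks can be computed up front. This is a sketch under two assumptions that are not confirmed here: the picker opens on the current month, and each click advances the view by one month. `next_month_clicks` is a hypothetical helper name.

```python
from datetime import date


def next_month_clicks(today, target, months_shown=2):
    """How many 'next month' clicks bring `target` into view.

    Assumes the picker opens on `today`'s month, shows `months_shown`
    consecutive months, and advances one month per click.
    """
    offset = (target.year - today.year) * 12 + (target.month - today.month)
    if offset < 0:
        raise ValueError("target month is in the past")
    # The first `months_shown` months are visible without clicking.
    return max(0, offset - (months_shown - 1))


clicks = next_month_clicks(date(2026, 6, 16), date(2026, 9, 1))  # 2
```

A September date searched in mid-June sits three months ahead, one of which is already visible, so two clicks are enough under these assumptions.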

Fill the route next. From and To aren't `<input>` elements; they're `contenteditable` `<span role="combobox">` nodes in the station picker. `locator.fill()` won't work on those, so click the span and use `keyboard.type`:

```python
def fill_station(page, origin, destination):
    for role, iata_code in (("origin", origin), ("destination", destination)):
        picker = page.locator(
            f"[data-testid='bwsfe-connection-picker__station-picker--{role}']"
        )
        field = picker.locator("[data-testid='bwsfe-station-picker__input']").first
        field.scroll_into_view_if_needed()
        field.click(force=True)
        page.wait_for_timeout(800)
        page.keyboard.type(iata_code, delay=80)
        page.wait_for_timeout(1800)
        page.keyboard.press("Enter")
        page.wait_for_timeout(800)
```



`force=True` skips Playwright's actionability checks (the "is this safe to click" tests), which Material widgets often fail while open/close animations run.

The 1.8-second wait gives the autocomplete time to return suggestions. Air France's IATA matching is fuzzy, so a code can match airports in other cities if you press Enter too early.

Now submit and wait for the itinerary list to render. Wrap the whole fill in a small retry: if the search button gets stuck in its loading state, refill once before giving up:

```python
SEARCH_BUTTON = "[data-testid='bwsfe-widget__search-button']"
SEARCH_BUTTON_LOADING = f"{SEARCH_BUTTON} .bwc-button-content--loading"


def _is_search_button_loading(page):
    return page.locator(SEARCH_BUTTON_LOADING).count() > 0


def _fill_search_form(page, origin, destination, date, return_date):
    for attempt in range(2):
        if attempt > 0:
            page.reload(wait_until="domcontentloaded", timeout=30000)
        dismiss_cookie_banner(page)
        pick_date(page, date, return_date)
        fill_station(page, origin, destination)
        page.wait_for_timeout(10_000)
        page.click(SEARCH_BUTTON)
        page.wait_for_load_state("domcontentloaded", timeout=10000)
        try:
            page.wait_for_selector("[data-testid='bwsfe-itinerary-list']", timeout=50000)
            return
        except Exception:
            if _is_search_button_loading(page):
                if attempt == 0:
                    continue
                raise Exception("Search button stuck in loading state after refill and retry")
            raise Exception("Itinerary list not found after search")
```



Waiting on `[data-testid='bwsfe-itinerary-list']` is the real end state. Airline pages keep connections open after the offers render (analytics, ad pixels, lazy panels), so `wait_until="networkidle"` can time out even though the offers are already on the page.

Waiting on the itinerary list, and checking the button's loading class on failure, tells you whether the search resolved.

Once you reach that end state, the offers are ready to parse.



## Parsing Air France Flight Offer Data

Once the search resolves, the offers sit in the captured `SearchResultAvailableOffersQuery` payload. The parser reads that JSON, no DOM scraping required, and each itinerary becomes a row you can store or compare.

### Air France Flight Offer Fields to Extract

Pull the same fields for every record. A fixed shape makes the data easy to compare later:

| Field | Type | Notes |
|---|---|---|
| `airline` | string | Operating carrier name (can differ from the marketing code, e.g. KLM on an `AF` number) |
| `flight_number` | string | Marketing carrier code + number, e.g. `AF 0186` |
| `departure_time` | string | `HH:MM` local, from the first segment |
| `departure_airport` | string | IATA code the offer departs from |
| `arrival_time` | string | `HH:MM` local, from the last segment |
| `arrival_airport` | string | IATA code the offer arrives at |
| `arrives_next_days` | integer | Day offset (`1` means next-day arrival) |
| `duration_minutes` | integer | Total trip duration |
| `stops` | integer | `0` for nonstop, `1+` for connections |
| `layovers` | list | One `{airport, duration_minutes}` per connection |
| `price` | string | Economy product amount |
| `currency` | string | Driven by the regional site you loaded (`EUR` on `.fr`) |
| `cabin_class` | string | `economy` for this guide |
| `seats_available` | integer or null | Seats left in the Economy bucket |
| `is_promo` | bool | Promo fare flag |
| `promo_title` | string or null | Promo label when present |
| `has_special_fare` | bool | Special-fare flag |
| `seat_map_eligible` | bool | Whether Air France offers seat selection |
| `plane_model` | string | Aircraft on the first segment, e.g. `Airbus A350-900` |
| `co2_kg` | integer | CO2 per passenger from the search metadata |
| `airport_change_warning` | list or null | Cities where you change airports during a connection |
| `scrape_country` | string | Proxy country you set on the Cloud Browser session |
| `scrape_locale` | string | Site locale you loaded |
| `captured_at` | string | UTC timestamp of the scrape |

The official scraper ran this against `airfrance.fr` for a PAR to TYO round-trip search. A clean record looks like this:

```json
{
  "airline": "Air France",
  "flight_number": "AF 0186",
  "departure_time": "09:40",
  "departure_airport": "CDG",
  "arrival_time": "05:55",
  "arrival_airport": "HND",
  "arrives_next_days": 1,
  "duration_minutes": 795,
  "stops": 0,
  "layovers": [],
  "price": "593.13",
  "currency": "EUR",
  "cabin_class": "economy",
  "seats_available": 6,
  "is_promo": false,
  "promo_title": null,
  "has_special_fare": false,
  "seat_map_eligible": true,
  "plane_model": "Airbus A350-900",
  "co2_kg": 733,
  "airport_change_warning": null,
  "scrape_country": "fr",
  "scrape_locale": "fr-FR",
  "captured_at": "2026-06-16T12:55:38+00:00"
}
```





Note that the search used the `PAR` and `TYO` city groups, but the offer resolves to specific airports (`CDG` to `HND`). One-stop offers fill in `layovers`, for example 145 minutes at Amsterdam on a KLM-operated `AF` flight.

The parser below walks `data.availableOffers.offerItineraries`, reads the active connection's segments for routing and timing, and pulls the Economy product out of `upsellCabinProducts`:

```python
import datetime as dt


def parse_flights(response, scrape_country="fr", scrape_locale="fr-FR"):
    offers = response["data"]["availableOffers"]
    captured_at = dt.datetime.now(dt.timezone.utc).isoformat()
    results = []

    for it in offers["offerItineraries"]:
        active = it["activeConnection"]
        segs = active["segments"]
        first, last = segs[0], segs[-1]

        # economy product info, from the priced upsell cabins
        economy = next(
            (c for p in it.get("upsellCabinProducts", [])
             for c in p["connections"]
             if c.get("cabinClass") == "ECONOMY" and c["price"]["amount"]),
            {},
        )

        layovers = [
            {
                "airport": segs[i]["destination"]["code"],
                "duration_minutes": segs[i].get("transferDuration"),
            }
            for i in range(len(segs) - 1)
        ]

        # airport change warning (e.g. land KIX, depart ITM)
        warnings = [
            w.get("city") for w in active.get("warnings", [])
            if w.get("__typename") == "OfferStationChangeWarning"
        ]

        results.append({
            "airline": active["operatingCarriers"][0]["name"],
            "flight_number": f"{first['marketingFlight']['carrier']['code']} {first['marketingFlight']['number']}",
            "departure_time": first["departureDateTime"][11:16],
            "departure_airport": first["origin"]["code"],
            "arrival_time": last["arrivalDateTime"][11:16],
            "arrival_airport": last["destination"]["code"],
            "arrives_next_days": active.get("dateVariation", 0),  # +1 means next day
            "duration_minutes": active["duration"],
            "stops": 0 if active["isDirect"] else len(segs) - 1,
            "layovers": layovers,
            # None (not the string "None") when no priced Economy product exists
            "price": str(economy["price"]["amount"]) if economy else None,
            "currency": "EUR",
            "cabin_class": "economy",
            "seats_available": economy.get("numberOfSeatsAvailable"),
            "is_promo": economy.get("isPromo", False),
            "promo_title": economy.get("promoTitle"),
            "has_special_fare": economy.get("hasSpecialFare", False),
            "seat_map_eligible": first.get("seatMapEligible", False),
            "plane_model": first["equipmentName"],
            "co2_kg": offers["searchMetadata"]["environmentalInformation"]["co2InKg"],
            "airport_change_warning": warnings if warnings else None,
            "scrape_country": scrape_country,
            "scrape_locale": scrape_locale,
            "captured_at": captured_at,
        })

    return results
```



Because the data comes straight from the booking response, the fields are already typed: no regex over visible text, no per-locale price parsing. We fix `currency` to `EUR` because the session loads `airfrance.fr`; align it with the site you load.

The last three fields don't come from the response. You stamp `scrape_country`, `scrape_locale`, and `captured_at` at scrape time so cross-region and cross-day comparisons stay accurate.
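Since the operation is private and can change, it can help to check each record against the fixed shape before storing it. The sketch below uses the field names from the table above; `check_record` is a hypothetical helper, and the checks are deliberately shallow.

```python
import re

EXPECTED_FIELDS = {
    "airline", "flight_number", "departure_time", "departure_airport",
    "arrival_time", "arrival_airport", "arrives_next_days", "duration_minutes",
    "stops", "layovers", "price", "currency", "cabin_class", "seats_available",
    "is_promo", "promo_title", "has_special_fare", "seat_map_eligible",
    "plane_model", "co2_kg", "airport_change_warning", "scrape_country",
    "scrape_locale", "captured_at",
}
HHMM = re.compile(r"^\d{2}:\d{2}$")


def check_record(record):
    """Return a list of problems; an empty list means the shape looks right."""
    problems = []
    missing = EXPECTED_FIELDS - set(record)
    extra = set(record) - EXPECTED_FIELDS
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected fields: {sorted(extra)}")
    # A missing Economy product can surface as None or the string "None".
    if record.get("price") in (None, "None", ""):
        problems.append("no Economy price")
    for field in ("departure_time", "arrival_time"):
        if not HHMM.match(str(record.get(field, ""))):
            problems.append(f"{field} is not HH:MM")
    return problems


problems = check_record({"airline": "Air France", "price": "593.13"})
```

Logging the problem list per record makes a silent change in the response shape visible on the first run after it happens.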

### Putting It Together

The top-level function opens a session, collects responses, fills the form, finds the booking payload, and parses it. It retries the whole flow up to three times and always tears the session down in `finally`:

```python
from loguru import logger as log

LANDING_URL = "https://wwws.airfrance.fr/"


def scrape_flights(origin, destination, departure_date, return_date, max_retries=3):
    last_error = None
    for attempt in range(1, max_retries + 1):
        p, browser, page = open_session()
        xhr_list, on_response = _start_xhr_collector(page)
        try:
            page.goto(LANDING_URL, wait_until="domcontentloaded", timeout=90_000)
            page.wait_for_timeout(5_000)
            _fill_search_form(page, origin, destination, departure_date, return_date)
            response = _find_booking_response(xhr_list)
            flights = parse_flights(response)
            log.success(f"scraped {len(flights)} flights for {origin} -> {destination}")
            return flights
        except Exception as e:
            last_error = e
            log.warning(f"scrape_flights attempt {attempt}/{max_retries} failed: {e}")
        finally:
            for cleanup in (
                lambda: page.remove_listener("response", on_response),
                browser.close,
                p.stop,
            ):
                try:
                    cleanup()
                except Exception:
                    pass
    raise last_error
```



`scrape_flights` is the entry point that ties the helpers together. It opens a session, starts the collector, runs `_fill_search_form`, then hands the captured traffic to `_find_booking_response` and `parse_flights`.

The retry loop opens a fresh session after each failure, and the `finally` block closes the browser so a failed attempt never leaks a billed session.

Call it with a route and two dates:

```python
flights = scrape_flights(
    origin="PAR",
    destination="TYO",
    departure_date="2026-06-23",
    return_date="2026-06-30",
)
```



Save each batch as JSON or push it to a database. Keep the `captured_at` timestamp on every row so cross-day comparisons stay accurate.
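For file storage, one option is JSON Lines: one record per line, appended on every run, so earlier captures are never rewritten. This is a sketch, and the helper names and file layout are illustrative rather than part of the official scraper.

```python
import json
from pathlib import Path


def append_jsonl(flights, path):
    """Append one JSON object per line; return the number of rows written."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        for row in flights:
            fh.write(json.dumps(row, ensure_ascii=False) + "\n")
    return len(flights)


def read_jsonl(path):
    """Load every non-empty line back into a list of dicts."""
    with Path(path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

A call such as `append_jsonl(flights, "data/par-tyo.jsonl")` after each scrape builds a history that you can filter by `captured_at` later.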

### DOM Results vs XHR/JSON Responses

Two paths exist for pulling offer data off the page:

- Capture the XHR/JSON response Air France makes on submit, like the code above. The data comes back clean and already typed, and it carries fields the cards don't render (CO2, aircraft, seats left, layover durations). The tradeoff is that it's tied to a private GraphQL operation that can change.
- Parse the rendered DOM. This path is easy to debug because you can see the same elements in DevTools, but the cards drop fields, need per-locale price parsing, and shift markup often.

Pick one and stick with it. We capture the booking response here because it's the cleaner, richer source and matches the [official Scrapfly Air France scraper](https://github.com/scrapfly/scrapfly-scrapers/tree/main/airfrance-scraper). If you go the DOM route instead, the `bwsfc-*` `data-testid` cards are the stable selectors.

For more on the capture technique, see [how to capture XHR requests with Playwright](https://scrapfly.io/blog/answers/how-to-capture-xhr-requests-playwright). Check that the response is public (not behind a server token) before you rely on it.

Parsing only matters once the page loads, so handle blocking next.



## Bypass Air France Blocking with Scrapfly Cloud Browser API

Air France runs the usual airline anti-bot stack: residential-only routes in some markets, bot-fingerprint checks, and an occasional captcha on submit. The symptoms map to a few Cloud Browser settings, so pick the fix that matches and don't stack them.

| Symptom | Likely cause | Cloud Browser response |
|---|---|---|
| Empty or mismatched results | Proxy country, locale, language, or currency don't match the site | Align `country` with the regional site (`FR` for `airfrance.fr`, `US` for `airfrance.us`), and record locale + currency on every row |
| Booking response never captured | Search didn't resolve, or button stuck loading | Wait on `bwsfe-itinerary-list`, check the button's loading class, and let the built-in retry refill once |
| Captcha/challenge page | Anti-bot challenge fired on submit | Turn on `debug=True` recordings to see what fired, then retry with a fresh session. No setting guarantees a bypass |
| Flaky multi-step flow | Session state drifting across steps | Use a fresh session per scrape (the default), and let `scrape_flights` retry up to three times |
| High bandwidth after the flow works | Heavy ads, images, and media on the result page | Keep `block_images=True`, `block_media=True`, and `block_styles=True`; the offers and the booking response still come through |

Cloud Browser handles the costly parts: residential proxies, [TLS and JA3 fingerprints](https://scrapfly.io/blog/posts/how-to-avoid-web-scraping-blocking-tls), real Chromium, and captcha solving. The Getting Started docs list every parameter.

### Proxy Country and Geo-Targeted Air France Prices

Air France prices the same route differently by point of sale. `country="FR"` returns EUR on `airfrance.fr`, while `country="US"` on `airfrance.us` returns USD with US fare rules.

The proxy country also sets the default language and currency. A mismatch can drop you onto a different regional site than the one you meant to scrape.

When you compare prices across regions, record `scrape_country`, `scrape_locale`, `currency`, and `captured_at` on every row. Without those fields the rows look the same and you lose the context that explains the price gap.
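With those fields on every row, a region comparison reduces to a group-by. The sketch below assumes the record shape produced by the parser (`price` as a string, plus `currency` and `scrape_country`). The sample rows are invented, and no currency conversion is attempted, so each currency stays in its own group.

```python
from decimal import Decimal


def cheapest_by_region(rows):
    """Map (scrape_country, currency) to the cheapest priced row."""
    best = {}
    for row in rows:
        # Skip rows where no Economy price was captured.
        if row.get("price") in (None, "None", ""):
            continue
        key = (row["scrape_country"], row["currency"])
        if key not in best or Decimal(row["price"]) < Decimal(best[key]["price"]):
            best[key] = row
    return best


# Invented rows for illustration only.
rows = [
    {"scrape_country": "fr", "currency": "EUR", "price": "640.00"},
    {"scrape_country": "fr", "currency": "EUR", "price": "593.13"},
    {"scrape_country": "us", "currency": "USD", "price": "701.50"},
    {"scrape_country": "us", "currency": "USD", "price": None},
]
cheapest = cheapest_by_region(rows)
```

`Decimal` keeps the string prices exact, which avoids float rounding when two fares differ by a few cents.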

For a deeper look at the headers and routing that shape geo-pricing, see [scraping in another language or currency](https://scrapfly.io/blog/posts/how-to-scrape-in-another-language-or-currency).

### Sessions, Debug Mode, and Captcha Handling

Cloud Browser sessions keep cookies, fingerprint, and proxy when you reuse the same `session` ID. Use a fresh session for one-off scrapes, and a stable ID only when you need warm state. See the [session resume docs](https://scrapfly.io/docs/cloud-browser-api/session-resume) for the format.

Always close the browser when you're done, even on errors. The `finally` block in `scrape_flights` removes the listener and tears down the session. A leaked session keeps running and billing until `timeout` runs out.
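The cleanup pattern looks roughly like the sketch below. It is not the code from `scrape_flights`; `run_with_cleanup` is a hypothetical helper, and it assumes `page` and `browser` expose Playwright-style sync methods (`page.on`, `page.remove_listener`, `browser.close`).

```python
from typing import Any, Callable


def run_with_cleanup(
    browser: Any,
    page: Any,
    on_response: Callable[[Any], None],
    work: Callable[[Any], Any],
) -> Any:
    """Run `work(page)` and always detach the listener and close the browser."""
    page.on("response", on_response)
    try:
        return work(page)
    finally:
        # Runs on success and on any exception, so the session is not leaked.
        try:
            page.remove_listener("response", on_response)
        finally:
            # Close even if detaching the listener itself fails.
            browser.close()
```

The nested `finally` is a deliberate choice: a failure while removing the listener should not stop the browser from closing, since the open session is what keeps billing.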

If a captcha or challenge page appears, check the `debug=True` recording to see what fired, then retry with a fresh session. See the [Captcha Solver docs](https://scrapfly.io/docs/cloud-browser-api/captcha-solver) for supported types and credit cost. No setting guarantees a bypass, and blocking shifts during sales and peak booking.
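A retry loop that starts each attempt on a new session might look like this minimal sketch. `retry_with_fresh_session` is a hypothetical helper, and the `airfrance-<hex>` session ID format is an assumption; check the session docs for the format Cloud Browser accepts. The default of three attempts mirrors the retry count mentioned for `scrape_flights`.

```python
import uuid
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_with_fresh_session(attempt: Callable[[str], T], max_attempts: int = 3) -> T:
    """Call `attempt(session_id)` with a new session ID until one call succeeds."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        # A new ID per attempt, so no cookies or challenge state carry over.
        session_id = f"airfrance-{uuid.uuid4().hex}"
        try:
            return attempt(session_id)
        except Exception as exc:  # broad on purpose in this sketch
            last_error = exc
    raise RuntimeError(f"all {max_attempts} attempts failed") from last_error
```

In real use, `attempt` would open the browser session with the given ID, run the search, and raise when a challenge page or an empty capture is detected. Catching only those specific errors is safer than the broad `except` shown here.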

For more on clearing challenges, see [how to bypass CAPTCHA while web scraping](https://scrapfly.io/blog/posts/how-to-bypass-captcha-while-web-scraping).






## FAQ

### Is there an Air France API for flight data or prices?

Official and partner APIs may exist for specific travel data use cases. Scrape direct when API coverage, fields, freshness, or cost do not fit your needs.

### Can BeautifulSoup scrape Air France flight results?

You don't need an HTML parser here; the offers come back as JSON you read directly. [BeautifulSoup](https://scrapfly.io/blog/posts/web-scraping-with-python-beautifulsoup) only helps on the DOM route, which needs a browser session first.

### Can I use Selenium instead of Playwright to scrape Air France?

[Selenium](https://scrapfly.io/blog/posts/web-scraping-with-selenium-and-python) can control a Cloud Browser session over CDP too. The same workflow works with Selenium if that's your team's tool; we use Playwright here because it matches Scrapfly's documentation.

### Why use the Scrapfly Cloud Browser API instead of the Web Scraping API for Air France?

Air France needs multi-step browser interaction and a response you capture mid-flow. Cloud Browser API suits that better than a one-shot request/response scrape.

### Can this Air France scraper work for Emirates or other airlines?

The Cloud Browser API workflow transfers to other airlines like Emirates, Lufthansa, or British Airways. Each airline has its own form controls, fare brands, response shape, anti-bot behavior, and QA requirements.


## Summary

The workflow is straightforward once the pieces are in place. Pick your search inputs: route, dates, trip type, passengers, country and locale. Open a Scrapfly Cloud Browser session, start listening for responses, and walk the widget step by step.

Wait on the itinerary list, then pull the captured `SearchResultAvailableOffersQuery` payload and parse it into structured records with price, times, layovers, aircraft, seats, and CO2. Map any blocking symptoms back to Cloud Browser settings.

Cloud Browser API fits this guide because Air France needs multi-step browser interaction and a response captured mid-flow. A one-shot HTTP request can't dismiss the cookie modal, fill the form, and catch the booking call. A managed browser session can.

If running browsers, proxies, fingerprints, and captcha handling in-house is your bottleneck, try the Scrapfly Cloud Browser API on your own routes.



**Legal Disclaimer and Precautions**

This tutorial covers popular web scraping techniques for education. Interacting with public servers requires diligence and respect:

- Do not scrape at rates that could damage the website.
- Do not scrape data that's not available publicly.
- Do not store PII of EU citizens protected by GDPR.
- Do not repurpose *entire* public datasets, which can be illegal in some countries.

Scrapfly does not offer legal advice, but these are good general rules to follow. For more, you should consult a lawyer.

 

 