Browser fingerprinting is the process of collecting information about a user's browser environment to generate a unique identifier or digital fingerprint for this user.
It uses JavaScript to gather information about a user's browser, operating system, location, time zone, system languages, fonts, extensions, devices, and screen resolution. The resulting fingerprint remains consistent across browsing sessions, which makes it more reliable than traditional methods like cookies.
In this article, we will delve into the inner workings of one of the prominent browser fingerprinting tools, CreepJS. We will discuss its advantages and limitations, demonstrate how to use it with popular browser automation tools, and present some alternative options that offer similar functionality.
What is CreepJS?
CreepJS is an open-source JavaScript-based project designed to detect leaks and vulnerabilities in modern anti-fingerprinting tools.
CreepJS includes several key features that help identify privacy vulnerabilities, fingerprinting patterns, and weaknesses in browser anti-fingerprinting techniques:
JavaScript Tampering Detection:
It detects and ignores prototype lies caused by anti-fingerprinting techniques, revealing attempts to modify the default behavior of browser APIs.
Fingerprint Profiling:
It generates detailed fingerprint profiles by capturing unique browser behaviors. This includes device properties, rendering capabilities, and browser-specific traits.
Browser Privacy Settings Analysis:
CreepJS inspects privacy-related browser settings to identify inconsistencies and deviations, highlighting areas where user privacy might be compromised.
Large-Scale Data Collection Simulation:
It simulates data collection on a large scale, validating the robustness of anti-fingerprinting measures across different APIs (e.g., Canvas, WebGL, etc.) and checking how consistently browsers handle this data.
New API Detection:
CreepJS detects new or updated APIs that are prone to fingerprinting and could compromise user privacy.
Cross-Browser and Cross-Platform Testing:
CreepJS is designed to test a wide range of browsers and platforms, including desktop and mobile environments. It compares and highlights privacy vulnerabilities across these environments.
Detailed Reporting:
It provides reports on detected fingerprinting vulnerabilities, including the potential risk of re-identification based on various browser traits.
High Entropy Data Collection:
The tool focuses on capturing high-entropy data, like GPU rendering, system fonts, and network conditions, to assess how much information can be used for fingerprinting.
Having explored the fundamental concept and purpose of CreepJS, we now turn our attention to its inner workings. While understanding what CreepJS is provides context for its role in browser fingerprinting, delving into how it operates reveals the true sophistication of this tool.
How does CreepJS work?
CreepJS's operation can be broken down into several key steps: data collection, processing, and analysis. CreepJS gathers information from numerous browser APIs, examines how the browser renders different elements, and even detects attempts to mask or alter this information. Let's dive deeper into each of these steps to understand the intricate workings of this powerful fingerprinting tool.
Data Collection:
CreepJS collects a wide range of data points from the user's browser and device. This includes information about the browser window, navigator properties, screen metrics, rendering capabilities (like Canvas and WebGL), audio context, installed fonts, CSS styles, mathematical calculations, console errors, and many more. The goal is to gather as much information as possible to create a unique fingerprint.
Hashing:
After collecting the data, CreepJS uses hashing algorithms to create unique identifiers for each piece of information. Hashing converts the collected data into fixed-size strings of characters, which are easier to compare and store. This process helps in creating a compact representation of the fingerprint.
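As a rough illustration of this step, the Python sketch below serializes one collected component deterministically and hashes it with SHA-256. The function name `hash_component`, the sample screen values, and the choice of SHA-256 are assumptions made for this example; they are not taken from CreepJS's source.

```python
import hashlib
import json


def hash_component(value) -> str:
    """Return a fixed-size hex digest for any JSON-serializable value."""
    # Sort keys so the same data always serializes to the same string.
    serialized = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


# Hypothetical screen metrics, used only to show the output shape.
screen_metrics = {"width": 1920, "height": 1080, "colorDepth": 24}
print(hash_component(screen_metrics))  # 64 hex characters
```

Because the serialization is deterministic, two visits that report the same values produce the same digest, and any changed value produces a different one.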
Fingerprint Creation:
Using the hashed data, CreepJS creates a comprehensive fingerprint object. This object contains all the collected and hashed information, organized into categories such as browser features, screen properties, audio capabilities, font information, and more. This fingerprint serves as a unique identifier for the user's browser and device combination.
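The structure described above can be sketched in Python as a dictionary of per-category hashes plus one identifier derived from all of them. The category names, sample values, and the `build_fingerprint` helper are hypothetical and only meant to show the shape of such an object, not CreepJS's actual format.

```python
import hashlib
import json


def _digest(value) -> str:
    """SHA-256 over a deterministic JSON serialization."""
    payload = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def build_fingerprint(components: dict) -> dict:
    """Hash each category, then derive one identifier from all category hashes."""
    hashes = {name: _digest(data) for name, data in components.items()}
    return {"components": hashes, "id": _digest(hashes)}


# Hypothetical collected data grouped by category.
collected = {
    "screen": {"width": 1920, "height": 1080},
    "fonts": ["Arial", "Helvetica"],
    "audio": {"sampleRate": 44100},
}
fingerprint = build_fingerprint(collected)
print(fingerprint["id"])
```

Keeping the per-category hashes next to the combined identifier makes it possible to see which component changed when two fingerprints differ.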
API Interaction:
CreepJS interacts with several APIs to process and analyze the fingerprint:
A Prediction API is used to decrypt and analyze fingerprints, comparing them against known patterns.
A Fingerprint API computes a fingerprint profile based on unique patterns detected.
A Web Traffic API analyzes patterns in web traffic to detect anomalies or suspicious behavior.
These APIs work together to process the fingerprint data and generate insights.
Entropy Analysis:
CreepJS calculates the entropy (a measure of uniqueness or randomness) of various components of the fingerprint. This helps in understanding how distinctive each aspect of the fingerprint is. Higher entropy indicates that a particular feature is more unique and thus more valuable for identification purposes.
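To make the idea of entropy concrete, here is a small Python sketch that measures how identifying a value is within a set of observations. It uses the standard Shannon definitions; the function names and the sample data are assumptions for illustration and say nothing about how CreepJS computes its own figures.

```python
import math
from collections import Counter


def surprisal_bits(value, observations) -> float:
    """Bits of identifying information carried by one observed value."""
    count = Counter(observations)[value]
    if count == 0:
        raise ValueError("value was never observed")
    return -math.log2(count / len(observations))


def shannon_entropy_bits(observations) -> float:
    """Average bits of information across all observed values."""
    total = len(observations)
    counts = Counter(observations)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical sample: time zones reported by eight visitors.
timezones = ["UTC"] * 4 + ["CET"] * 2 + ["PST", "JST"]
print(surprisal_bits("UTC", timezones))  # common value, few bits
print(surprisal_bits("JST", timezones))  # rare value, more bits
print(shannon_entropy_bits(timezones))
```

A value shared by half of all visitors carries one bit, while a value seen once in eight carries three, which is why rare traits are more useful for identification.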
Trust Score:
Based on various factors, including the consistency of the fingerprint over time, detected "lies" (inconsistencies in reported data), and other behavioral patterns, CreepJS computes a trust score. This score indicates how likely it is that the fingerprint represents a genuine, unmodified browser environment.
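The sketch below shows one way such a score could be derived in Python. It is purely hypothetical: the starting value, the penalty weights, and the `trust_score` function are assumptions made for illustration, and CreepJS's real scoring logic may differ substantially.

```python
def trust_score(lies: int, changed_components: int,
                lie_penalty: float = 10.0, change_penalty: float = 5.0) -> float:
    """Start from 100 and subtract a penalty for each detected lie and for
    each fingerprint component that changed since the previous visit.
    The weights are illustrative assumptions, not CreepJS's values."""
    score = 100.0 - lies * lie_penalty - changed_components * change_penalty
    # Clamp the result to the 0-100 range.
    return max(0.0, min(100.0, score))


print(trust_score(lies=0, changed_components=0))
print(trust_score(lies=2, changed_components=1))
```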
Resistance Detection:
CreepJS includes mechanisms to detect and fingerprint various privacy-enhancing technologies and browsers. This includes identifying features specific to browsers like Tor, Firefox with privacy enhancements, Brave, and others. It also attempts to detect privacy-focused browser extensions and tools that might be trying to mask or alter the browser's true fingerprint.
Rendering Results:
Finally, CreepJS presents the analyzed fingerprint data in a user-friendly interface. This typically includes visualizations of various metrics, the calculated trust score, predictions about the browser environment, and detailed breakdowns of the fingerprint components. This allows users (or researchers) to understand the uniqueness of their browser fingerprint and see what information might be leaking.
Now that we understand the core concepts of CreepJS's operation, let's explore some of the techniques and methodologies that enable it to create such detailed and unique browser fingerprints.
CreepJS Fingerprinting Techniques
CreepJS is a sophisticated browser fingerprinting tool that employs a wide array of techniques to create a detailed and unique profile of a user's browser and device. These techniques are not just theoretical concepts but practical implementations of advanced web technologies. Let's take a look at them.
Canvas Fingerprinting
Canvas fingerprinting uses the HTML5 Canvas element to draw graphics and text, then generates a hash from the resulting image data.
CreepJS collects various data points:
Image rendering
Paint operations
Text rendering
Emoji rendering
Text metrics
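A minimal canvas fingerprint can be sketched with Playwright for Python: draw something in the page, export the pixels, and hash the result. This assumes Playwright and its Chromium build are installed. The drawing commands, the `canvas_fingerprint` helper, and the use of SHA-256 are arbitrary choices for this example rather than what CreepJS does.

```python
import hashlib

# JavaScript evaluated in the page: draw a shape and some text, then export
# the canvas as a data URL. The drawing itself is an arbitrary example.
CANVAS_JS = """
() => {
  const canvas = document.createElement('canvas');
  canvas.width = 240;
  canvas.height = 60;
  const ctx = canvas.getContext('2d');
  ctx.fillStyle = '#f60';
  ctx.fillRect(10, 10, 100, 30);
  ctx.font = '16px Arial';
  ctx.fillStyle = '#069';
  ctx.fillText('canvas fingerprint test', 4, 40);
  return canvas.toDataURL();
}
"""


def hash_data_url(data_url: str) -> str:
    """Reduce the exported image to a fixed-size hex digest."""
    return hashlib.sha256(data_url.encode("utf-8")).hexdigest()


def canvas_fingerprint() -> str:
    """Render the canvas in headless Chromium and hash the output."""
    # Imported here so the helpers above work without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto("about:blank")
        data_url = page.evaluate(CANVAS_JS)
        browser.close()
    return hash_data_url(data_url)
```

Calling `canvas_fingerprint()` on machines with different GPUs, drivers, or font stacks will often return different digests, because the rendered pixels differ slightly.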
WebGL Fingerprinting
WebGL (Web Graphics Library) fingerprinting exploits the differences in how GPUs and graphics drivers render 3D scenes.
CreepJS collects detailed WebGL information:
WebGL parameters
GPU model and vendor information
Supported extensions
Rendering capabilities
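The kind of WebGL data listed above can be read with a few standard WebGL calls. The Python sketch below, which assumes Playwright is installed, collects the vendor, renderer, and extension list, and adds a simple heuristic for spotting software renderers. The helper names and the list of renderer hints are assumptions for this example and are not taken from CreepJS.

```python
# JavaScript evaluated in the page. WEBGL_debug_renderer_info is a standard
# extension that exposes the unmasked vendor and renderer strings.
WEBGL_JS = """
() => {
  const gl = document.createElement('canvas').getContext('webgl');
  if (!gl) return null;
  const info = gl.getExtension('WEBGL_debug_renderer_info');
  return {
    vendor: gl.getParameter(info ? info.UNMASKED_VENDOR_WEBGL : gl.VENDOR),
    renderer: gl.getParameter(info ? info.UNMASKED_RENDERER_WEBGL : gl.RENDERER),
    extensions: gl.getSupportedExtensions(),
  };
}
"""

# Substrings that commonly appear in software (non-GPU) renderer names.
# This list is a heuristic assumption and is not exhaustive.
SOFTWARE_RENDERER_HINTS = ("swiftshader", "llvmpipe", "software")


def looks_like_software_renderer(renderer: str) -> bool:
    """Return True when the renderer string suggests CPU-based rendering."""
    name = renderer.lower()
    return any(hint in name for hint in SOFTWARE_RENDERER_HINTS)


def collect_webgl_report():
    """Return a dict with vendor, renderer and extensions, or None if WebGL is unavailable."""
    # Imported here so the helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto("about:blank")
        report = page.evaluate(WEBGL_JS)
        browser.close()
    return report
```

A software renderer on a machine that otherwise claims to be a regular desktop is one of the signals that can make a headless browser stand out.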
Audio Fingerprinting
Audio fingerprinting uses the Web Audio API to generate and analyze audio signals. CreepJS implements this technique by collecting audio metadata:
Audio processing characteristics
Frequency and time domain data
Compressor gain reduction
Sample sums
Screen Fingerprinting
Screen fingerprinting gathers information about the user's display. CreepJS collects screen data:
Screen resolution
Color depth
Pixel depth
Available screen space
Device pixel ratio
CreepJS also uses CSS media queries to gather additional screen-related information.
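Collecting the same value through two channels makes it possible to cross-check them. The Python sketch below runs a few plausibility checks over already-collected screen values. The dictionary keys, the `mediaQueryWidth` field (standing in for a width confirmed through a CSS media query), and the specific rules are assumptions for illustration, not CreepJS's actual checks.

```python
def screen_inconsistencies(screen: dict) -> list:
    """Return a list of human-readable problems found in reported screen data."""
    issues = []
    # The usable area cannot be larger than the screen itself.
    if screen["availWidth"] > screen["width"] or screen["availHeight"] > screen["height"]:
        issues.append("available area exceeds screen resolution")
    # Browsers typically report the same value for both depth properties.
    if screen["colorDepth"] != screen["pixelDepth"]:
        issues.append("colorDepth and pixelDepth differ")
    if screen["devicePixelRatio"] <= 0:
        issues.append("non-positive devicePixelRatio")
    # A spoofed screen.width often disagrees with what CSS reports.
    if screen["mediaQueryWidth"] != screen["width"]:
        issues.append("CSS media query width disagrees with screen.width")
    return issues


# Hypothetical values: one plausible report and one with a spoofed width.
honest = {"width": 1920, "height": 1080, "availWidth": 1920, "availHeight": 1040,
          "colorDepth": 24, "pixelDepth": 24, "devicePixelRatio": 1.0,
          "mediaQueryWidth": 1920}
spoofed = dict(honest, width=1366)

print(screen_inconsistencies(honest))
print(screen_inconsistencies(spoofed))
```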
These techniques combined provide a comprehensive fingerprint of the user's device, making it difficult for users to hide their identity or use spoofing techniques without being detected. The fingerprints generated from these sources are then hashed and combined with other data points to create a unique identifier for the device and browser combination.
Using CreepJS with Browser Automation Tools
CreepJS, with its advanced detection capabilities, presents a unique hurdle for browser automation. It's designed to identify not just the browser itself, but also any attempts to mask or alter the browser's true identity - a common practice in automated browsing scenarios. This creates a fascinating cat-and-mouse game between fingerprinting techniques and automation tools.
Let's explore how popular frameworks like Playwright, Selenium, and Puppeteer intersect with sophisticated fingerprinting mechanisms like CreepJS.
For this we will:
Start a web browser
Tell it to go to the CreepJS test page hosted on GitHub
Wait 5 seconds for all tests to complete
Take a screenshot of the full page and save it to a file
To explore Playwright's fingerprint with CreepJS, install Playwright in your Python environment using the following pip command:
$ pip install playwright
Next, install the browser you want to automate. We will use Chromium:
$ playwright install chromium
To open CreepJS using Playwright, run the following Python script:
from playwright.sync_api import sync_playwright
import time

with sync_playwright() as p:
    browser = p.chromium.launch()
    context = browser.new_context()
    page = context.new_page()
    page.goto("https://abrahamjuliot.github.io/creepjs/")
    # wait for 5 seconds after page loads
    time.sleep(5)
    # take a screenshot of the full page
    page.screenshot(path="full-page-screenshot.png", full_page=True)
    # close the browser once the screenshot is saved
    browser.close()
To explore Selenium's fingerprint with CreepJS, install Selenium in your Python environment using the following pip command:
$ pip install selenium
You will also need ChromeDriver. Selenium 4.6 and later can download a matching driver automatically through Selenium Manager. With older versions, install ChromeDriver through the official download link and make sure the driver file is in the same directory as your Python script file.
To open CreepJS using Selenium, run the following Python script:
from selenium import webdriver
import time

driver = webdriver.Chrome()
driver.get("https://abrahamjuliot.github.io/creepjs/")
# wait for 5 seconds after page loads
time.sleep(5)
# take a screenshot (with Chrome, save_screenshot captures only the visible viewport)
driver.save_screenshot("full-page-screenshot-selenium.png")
# close the browser and end the WebDriver session
driver.quit()
To explore Puppeteer's fingerprint with CreepJS, initialize a Node.js project and install Puppeteer using the following commands:
$ npm init
$ npm install puppeteer
To open CreepJS using Puppeteer, run the following JavaScript code:
const puppeteer = require("puppeteer");

// resolve after the given number of milliseconds
async function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function main() {
  const browser = await puppeteer.launch({
    headless: false,
  });
  let page = await browser.newPage();
  await page.goto("https://abrahamjuliot.github.io/creepjs/", {
    waitUntil: "domcontentloaded",
  });
  // wait for 5 seconds after page loads
  await sleep(5000);
  // take a screenshot of the full page
  await page.screenshot({ path: "screenshot.png", fullPage: true });
  await page.close();
  await browser.close();
}

main();
With these scripts we can see exactly how our automated browsers are being fingerprinted by CreepJS. Next, let's use this information to fortify our web scrapers against detection.
Fortifying Web Scrapers Against CreepJS Detection
Many web browser automation tools inadvertently reveal information about themselves to JavaScript execution contexts. This means that JavaScript can easily detect if the browser is being controlled by a program instead of a human, allowing fingerprinting tools like CreepJS to quickly identify a web scraper. Strengthening our fingerprint should be our initial priority - we must conceal the traces left by our web scrapers.
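Two well-known examples of such traces are the `navigator.webdriver` flag and a user agent containing `HeadlessChrome`. The Python sketch below reads both through Playwright and reports which ones are present. The helper names are hypothetical, and real detectors such as CreepJS check far more than these two properties.

```python
# JavaScript evaluated in the page to read two commonly checked properties.
PROBE_JS = "() => ({ webdriver: navigator.webdriver, userAgent: navigator.userAgent })"


def automation_signals(props: dict) -> list:
    """Return the automation hints found in a dict of navigator properties."""
    signals = []
    if props.get("webdriver") is True:
        signals.append("navigator.webdriver is true")
    if "HeadlessChrome" in (props.get("userAgent") or ""):
        signals.append("user agent contains HeadlessChrome")
    return signals


def probe_with_playwright() -> list:
    """Launch headless Chromium and report which signals it exposes."""
    # Imported here so automation_signals works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto("about:blank")
        props = page.evaluate(PROBE_JS)
        browser.close()
    return automation_signals(props)
```

Running `probe_with_playwright()` before and after applying a stealth library is a quick way to confirm whether a given leak has actually been patched.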
The leaks exposed by browser automation tools are widely recognized within the web scraping community. As a result, all major browser automation tools used for web scraping have numerous community-supported extension libraries. These libraries share the common goal of fixing known leaks that could make your scraping tool's fingerprint easily detectable by CreepJS and other tools built on similar concepts and techniques.
Let's quickly go over some of the popular libraries used with browser automation tools to bypass fingerprint detection:
Playwright
playwright-extra: a lightweight plugin framework for Playwright that allows adding extra plugins, most importantly the stealth plugin, which applies various evasion techniques to make Playwright harder to detect.
undetected-playwright: a newer addition to the stealth library race, inspired by playwright-extra. undetected-playwright enhances Playwright by making it more difficult to detect when a browser is being controlled programmatically.
Selenium
undetected-chromedriver: a modified Selenium WebDriver with built-in measures to combat websites that block automated headless browsers scraping their pages.
SeleniumBase UC Mode: a feature of SeleniumBase that allows bots to appear human, which lets them evade detection from anti-bot services that try to block them or trigger CAPTCHAs on various websites. UC Mode is based on undetected-chromedriver, but includes multiple updates, fixes, and improvements.
Puppeteer
puppeteer-extra-plugin-stealth: a popular plugin for puppeteer-extra, the Puppeteer counterpart of playwright-extra, also used to prevent anti-bot detection.
rebrowser-puppeteer: a fork of puppeteer patched with rebrowser-patches, which is a collection of patches for puppeteer and playwright to avoid automation detection and leaks.
CreepJS Limitations
While CreepJS is a powerful and sophisticated browser fingerprinting tool, it's important to recognize that it's not without its limitations. Understanding these limitations is crucial for researchers, developers, and privacy professionals to accurately assess the tool's strengths and weaknesses in the context of digital fingerprinting and online privacy.
Ethical concerns: The extensive data collection and fingerprinting techniques used by CreepJS raise significant privacy concerns and may be seen as invasive by users.
Potential for false positives: The complex nature of the fingerprinting process might lead to misidentification of legitimate users as using privacy tools or spoofing attempts.
Arms race with privacy tools: As CreepJS improves its detection capabilities, privacy tools and browsers will likely adapt, leading to an ongoing cat-and-mouse game.
Browser updates: Frequent browser updates might change or remove access to some of the APIs CreepJS relies on, potentially affecting its effectiveness over time.
Limited scope: CreepJS is primarily designed for research and demonstration purposes, and its use is limited to its GitHub page.
Complexity: The intricate nature of CreepJS might make it challenging to implement and maintain in real-world applications.
In summary, while CreepJS is a powerful and comprehensive fingerprinting tool with advanced detection capabilities, its use comes with significant ethical, legal, and practical considerations that need to be carefully weighed.
CreepJS Alternatives
The landscape of browser fingerprinting is diverse, with various tools and libraries offering different approaches and capabilities; CreepJS is not the only player in the field. Understanding these alternatives is crucial for developers, researchers, and privacy advocates to gain a broader perspective on the current state of fingerprinting technology.
FingerprintJS
FingerprintJS is a browser fingerprinting library that operates on the client side and is source-available. It collects browser attributes and uses them to create a hashed identifier for each visitor. Because FingerprintJS generates fingerprints directly in the browser, its accuracy ranges between 40% and 60%. This means that if two different users access the service using browsers that are identical in version, vendor, and platform, FingerprintJS will be unable to differentiate between the two due to the identical attributes.
FingerprintJS also offers a live demo where you can see how it works and view your own browser fingerprint.
Broprint.js
Broprint.js is a JavaScript library that generates a unique identifier for different browsers, such as Chrome, Firefox, or any other browsers that support canvas and audio fingerprinting. Browser fingerprinting can be easily implemented with this library. Check out the live demo.
Scrapfly Fingerprint Tools
Scrapfly hosts a variety of tools to assist with automation development. We use these tools for internal development and share them online.
Scrapfly's fingerprinting tools can generate browser fingerprints using various browser fingerprinting techniques.
Scrapfly provides web scraping, screenshot, and extraction APIs for data collection at scale. Each product is equipped with an automatic bypass for any anti-bot system and we achieve this by:
Maintaining a fleet of real, reinforced web browsers with real fingerprint profiles.
Millions of self-healing proxies of the highest possible trust score.
Constantly evolving and adapting to new anti-bot systems.
We've been doing this publicly since 2020 with the best bypass on the market!
It takes Scrapfly several full-time engineers to maintain this system, so you don't have to!
Summary
In this article, we have taken a look at the most popular browser fingerprint testing tool, CreepJS, and what makes it special.
We have covered what CreepJS is, its inner workings, and how it implements several browser fingerprinting techniques. Further, we have taken a look at how these fingerprinting techniques can be bypassed using stealth enhancements for Puppeteer, Playwright, and Selenium.
Finally, take note that fingerprinting is a constantly evolving field, and you can find the latest fingerprint tests on the Scrapfly fingerprint tools page.