Guide to Google Image Search API and Alternatives
Learn how to access Google Image Search data without an official API. Explore alternatives and the best methods for data retrieval.
A Google Image Search API would let developers integrate Google Image Search functionality into their applications, giving them access to the vast collection of images indexed by Google and letting users search for images by keywords, image type, and other criteria.
Whether you're building an image search feature, creating a visual recognition tool, or developing content analysis software, this guide will help you understand your options for programmatically accessing image search functionality.
Google previously provided a dedicated Image Search API as part of its AJAX Search API suite, but this service was deprecated in 2011. Since then, developers looking for official Google-supported methods to access image search results have had limited options.
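However, Google does offer a partial solution through its Custom Search JSON API, which can return image results. This requires setting up a Custom Search Engine (CSE) with image search enabled and explicitly requesting image results, and it comes with significant limitations: the free tier allows only 100 queries per day, each request returns at most 10 results, and a single query can never page past the first 100 results.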
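For reference, here is a minimal sketch of such an image query. It uses only the Python standard library and assumes you already have an API key and a search engine ID (the cx parameter) for an engine with image search enabled. searchType=image switches the response to image results, and num accepts values from 1 to 10. The helper names build_image_query, parse_image_items, and search_images are our own, not part of any Google library.

```python
import json
import urllib.parse
import urllib.request

# Custom Search JSON API endpoint
CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_image_query(api_key, engine_id, query, num=10, start=1):
    """Build query parameters for an image search (num must be 1-10)."""
    if not 1 <= num <= 10:
        raise ValueError("The Custom Search JSON API returns at most 10 results per request")
    return {
        "key": api_key,         # your API key (placeholder)
        "cx": engine_id,        # your search engine ID (placeholder)
        "q": query,
        "searchType": "image",  # request image results instead of web pages
        "num": num,
        "start": start,         # 1-based index of the first result
    }


def parse_image_items(payload):
    """Extract image URL, title, host page and size from a JSON response."""
    results = []
    # Responses with no matches omit the "items" key entirely
    for item in payload.get("items", []):
        image_meta = item.get("image", {})
        results.append({
            "url": item.get("link"),
            "title": item.get("title"),
            "page_url": image_meta.get("contextLink"),
            "size": (image_meta.get("width"), image_meta.get("height")),
        })
    return results


def search_images(api_key, engine_id, query, num=10):
    """Run one image search request and return the parsed results."""
    params = build_image_query(api_key, engine_id, query, num=num)
    url = CSE_ENDPOINT + "?" + urllib.parse.urlencode(params)
    # urlopen raises HTTPError for non-2xx responses (e.g. quota exceeded)
    with urllib.request.urlopen(url, timeout=10) as response:
        return parse_image_items(json.load(response))
```

A call such as search_images("YOUR_API_KEY", "YOUR_ENGINE_ID", "mountain landscape") returns up to 10 dictionaries; to go further, increment start in steps of num until the 100-result ceiling is reached.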
For developers needing more robust image search capabilities, exploring alternative services is often necessary.
While Google does not provide an official Image Search API, there are several alternatives available:
Microsoft's Bing Image Search API provides a comprehensive solution for integrating image search capabilities into applications. Part of the Azure Cognitive Services suite, this API offers advanced search features and returns detailed metadata about images.
import requests
subscription_key = "YOUR_SUBSCRIPTION_KEY"
search_url = "https://api.bing.microsoft.com/v7.0/images/search"
search_term = "mountain landscape"
headers = {"Ocp-Apim-Subscription-Key": subscription_key}
params = {"q": search_term, "count": 10, "offset": 0, "mkt": "en-US", "safeSearch": "Moderate"}
response = requests.get(search_url, headers=headers, params=params)
response.raise_for_status()
search_results = response.json()
# Process the results
for image in search_results["value"]:
    print(f"URL: {image['contentUrl']}")
    print(f"Name: {image['name']}")
    print(f"Size: {image['width']}x{image['height']}")
    print("---")
In the above code, we're sending a request to the Bing Image Search API with our search term and additional parameters. The API returns a JSON response containing image URLs, names, and dimensions, which we can then process according to our application's needs.
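To collect more than one page of results, the count and offset parameters can be advanced in a loop. The sketch below assumes the nextOffset field that Bing includes in image search responses tells you where the next page starts, and falls back to offset + len(value) when it is missing. The page-fetching function is passed in so the paging logic can be tested without network access; collect_images and fetch_page are hypothetical names.

```python
def collect_images(fetch_page, max_images=50, page_size=10):
    """Page through image results until max_images is reached or results run out.

    fetch_page(offset, count) must return the parsed JSON dict of one response,
    e.g. requests.get(search_url, headers=headers,
                      params={"q": search_term, "count": count, "offset": offset}).json()
    """
    images = []
    offset = 0
    while len(images) < max_images:
        # Never ask for more results than we still need
        count = min(page_size, max_images - len(images))
        page = fetch_page(offset, count)
        batch = page.get("value", [])
        if not batch:
            break  # no more results
        images.extend(batch[:max_images - len(images)])
        # Prefer the offset suggested by the API; otherwise skip past this batch
        next_offset = page.get("nextOffset", offset + len(batch))
        if next_offset <= offset:
            break  # guard against an offset that does not advance
        offset = next_offset
    return images
```

Keeping the HTTP call outside the loop logic also makes it easy to add a delay between pages if you run into rate limits.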
The Bing API offers competitive pricing with a free tier that includes 1,000 transactions per month, making it accessible for small projects and testing before scaling.
DuckDuckGo doesn't offer an official API for image search, but it's worth noting that their image search results are primarily powered by Bing's search engine. For developers looking for a more privacy-focused approach, some have created unofficial wrappers around DuckDuckGo's search functionality.
Since this method relies on web scraping, it helps to be familiar with the basics first. If you're interested in learning more about web scraping and best practices, check out our article.
Now, let's move on to the example.
from playwright.sync_api import sync_playwright
from bs4 import BeautifulSoup
def scrape_duckduckgo_images():
    # Start Playwright in a context manager to ensure clean-up
    with sync_playwright() as p:
        # Launch the Chromium browser in non-headless mode for visual debugging
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()
        # Navigate to DuckDuckGo image search for 'python'
        page.goto("https://duckduckgo.com/?q=python&iax=images&ia=images")
        # Wait until the images load by waiting for the image selector to appear
        page.wait_for_selector(".tile--img__img")
        # Get the fully rendered page content including dynamically loaded elements
        content = page.content()
        # Parse the page content using BeautifulSoup for easier HTML traversal
        soup = BeautifulSoup(content, "html.parser")
        images = soup.find_all("img")
        # Loop through the first three images only
        for image in images[:3]:
            # Safely extract the 'src' attribute with a default message if not found
            src = image.get("src", "No src found")
            # Safely extract the 'alt' attribute with a default message if not found
            alt = image.get("alt", "No alt text")
            print(src)  # Print the image source URL
            print(alt)  # Print the image alt text
            print("---------------------------------")
        # Close the browser after the scraping is complete
        browser.close()
scrape_duckduckgo_images()
//external-content.duckduckgo.com/iu/?u=https%3A%2F%2Ftse3.mm.bing.net%2Fth%3Fid%3DOIP.jrcuppJ7JfrVrpa9iKnnnAHaHa%26pid%3DApi&f=1&ipt=a11d9de5b863682e82564114f090c443350005fe945cfdfdba2ca1a05a43fa2b&ipo=images
Advanced Python Tutorials - Real Python
---------------------------------
//external-content.duckduckgo.com/iu/?u=https%3A%2F%2Ftse2.mm.bing.net%2Fth%3Fid%3DOIP.Po6Ot_fcf7ya7xkrOL27hQHaES%26pid%3DApi&f=1&ipt=156829965359c98ab2bbc69fb73e2a4963284ff665c83887d6278d6cecc08841&ipo=images
¿Para qué sirve Python?
---------------------------------
//external-content.duckduckgo.com/iu/?u=https%3A%2F%2Ftse4.mm.bing.net%2Fth%3Fid%3DOIP._zLHmRNYHt-KYwYC8cC3RwHaHa%26pid%3DApi&f=1&ipt=04bdcfc11eee3ef4e96bf7d1b47230633b7c936363cf0c9f86c5dfa2e6fb4f32&ipo=images
¿Qué es Python y por qué debes aprender
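In the above code, we use Playwright to open DuckDuckGo's search page with the iax=images&ia=images parameters, which switch it to the image search interface. After waiting for the image tiles to render, we pass the page HTML to BeautifulSoup and print the src and alt text of the first three images. Because this approach is web scraping, it depends on DuckDuckGo's page markup and can break whenever that markup changes.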
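The src values printed above are protocol-relative links to DuckDuckGo's image proxy (external-content.duckduckgo.com), with the original image URL percent-encoded in the u query parameter. If you need the underlying URL, a small standard-library helper can unwrap it. This sketch assumes the proxy format seen in the sample output; original_image_url is our own hypothetical helper name.

```python
from urllib.parse import parse_qs, urlparse


def original_image_url(proxied_src):
    """Recover the target URL from a DuckDuckGo image-proxy link.

    Assumes the format seen in the sample output:
    //external-content.duckduckgo.com/iu/?u=<percent-encoded URL>&f=1&...
    """
    # Protocol-relative links ("//host/...") still parse into netloc and query
    parsed = urlparse(proxied_src)
    values = parse_qs(parsed.query).get("u")
    if parsed.netloc == "external-content.duckduckgo.com" and values:
        return values[0]  # parse_qs has already percent-decoded the value
    # Not a proxy link: just make it an absolute https URL
    return "https:" + proxied_src if proxied_src.startswith("//") else proxied_src
```

For the first result above, this yields a Bing thumbnail URL (https://tse3.mm.bing.net/th?id=...&pid=Api), which also illustrates that DuckDuckGo's image results are served from Bing.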
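Scraping Google Images is technically possible and can be a good approach when API options don't meet your specific requirements. However, several technical obstacles make it complex and often unreliable: Google's page markup changes frequently and uses obfuscated class names, and Google actively detects automated traffic and may respond with CAPTCHAs or blocks.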
For many applications, using an official API from Bing or another provider is a more sustainable approach. However, for specific use cases or when other options aren't viable, let's explore some effective scraping techniques.
ScrapFly provides web scraping, screenshot, and extraction APIs for data collection at scale.
Here's an example of how to scrape Google Images with the Scrapfly web scraping API:
from scrapfly import ScrapflyClient, ScrapeConfig, ScrapeApiResponse
scrapfly = ScrapflyClient(key="YOUR_SCRAPFLY_KEY")
result: ScrapeApiResponse = scrapfly.scrape(ScrapeConfig(
    tags=[
        "player", "project:default"
    ],
    format="json",
    extraction_model="search_engine_results",
    country="us",
    lang=[
        "en"
    ],
    asp=True,
    render_js=True,
    url="https://www.google.com/search?q=python&tbm=isch"
))
{
  "query": "python - Google Search",
  "results": [
    {
      "displayUrl": null,
      "publishDate": null,
      "richSnippet": null,
      "snippet": null,
      "title": "Wikipedia Python (programming language) - Wikipedia",
      "url": "https://en.wikipedia.org/wiki/Python_(programming_language)"
    },
    {
      "displayUrl": null,
      "publishDate": null,
      "richSnippet": null,
      "snippet": null,
      "title": "Juni Learning What is Python Coding? | Juni Learning",
      "url": "https://junilearning.com/blog/guide/what-is-python-101-for-students/"
    },
    {
      "displayUrl": null,
      "publishDate": null,
      "richSnippet": null,
      "snippet": null,
      "title": "Wikiversity Python - Wikiversity",
      "url": "https://en.wikiversity.org/wiki/Python"
    },
    ...
  ]
}
For a direct approach to scraping Google Images using Python, the following code demonstrates how to extract image data using Requests and BeautifulSoup:
import requests
from bs4 import BeautifulSoup
import random
from lxml import etree  # For XPath support
def scrape_google_images_bs4(query, num_results=20):
    # Set up headers to mimic a browser
    user_agents = [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36"
    ]
    headers = {
        "User-Agent": random.choice(user_agents),
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.5",
        "Referer": "https://www.google.com/"
    }
    # Make the request; requests URL-encodes the query parameters for us
    params = {"q": query, "tbm": "isch"}
    response = requests.get("https://www.google.com/search", headers=headers, params=params, timeout=10)
    if response.status_code != 200:
        print(f"Failed to retrieve the page: {response.status_code}")
        return []
    # Parse the HTML with BeautifulSoup, then convert it to an lxml tree for XPath
    soup = BeautifulSoup(response.text, "html.parser")
    dom = etree.HTML(str(soup))
    image_data = []
    # Absolute XPath to the result containers instead of class-based selection.
    # It mirrors Google's layout at the time of writing and must be updated
    # whenever that layout changes.
    base_xpath = "/html/body/div[3]/div/div[14]/div/div[2]/div[2]/div/div/div/div/div[1]/div/div/div"
    # Result containers are siblings, so index them 1..num_results
    for i in range(1, num_results + 1):
        try:
            # Create XPath for the current div
            current_xpath = f"{base_xpath}[{i}]"
            if not dom.xpath(current_xpath):
                continue
            item = {}
            # Get the data-lpage attribute (page URL) from the div
            page_url = dom.xpath(f"{current_xpath}/@data-lpage")
            if page_url:
                item["page_url"] = page_url[0]
            # Get the alt text of the image
            alt_text = dom.xpath(f"{current_xpath}//img/@alt")
            if alt_text:
                item["alt_text"] = alt_text[0]
            if item:
                image_data.append(item)
            # Stop if we've reached the requested number of results
            if len(image_data) >= num_results:
                break
        except Exception as e:
            print(f"Error processing element {i}: {e}")
    return image_data
# Example usage
image_data = scrape_google_images_bs4("python", num_results=5)
print(image_data)
[{'page_url': 'https://en.wikipedia.org/wiki/Python_(programming_language)', 'alt_text': '\u202aPython (programming language) - Wikipedia\u202c\u200f'},
{'page_url': 'https://beecrowd.com/blog-posts/best-python-courses/', 'alt_text': '\u202aPython: find out the best courses - beecrowd\u202c\u200f'},
{'page_url': 'https://junilearning.com/blog/guide/what-is-python-101-for-students/', 'alt_text': '\u202aWhat is Python Coding? | Juni Learning\u202c\u200f'},
{'page_url': 'https://medium.com/towards-data-science/what-is-a-python-environment-for-beginners-7f06911cf01a', 'alt_text': "\u202aWhat Is a 'Python Environment'? (For Beginners) | by Mark Jamison | TDS Archive | Medium\u202c\u200f"},
{'page_url': 'https://quantumzeitgeist.com/why-is-the-python-programming-language-so-popular/', 'alt_text': '\u202aWhy Is The Python Programming Language So Popular?\u202c\u200f'}]
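In the above code, we created a Google Images scraper that locates result containers by their XPath position instead of by class name, because Google's generated class names change often. Keep in mind that absolute XPaths are brittle too: any change to the page layout breaks them. The script mimics a browser by sending a randomly chosen user agent with each call, fetches the image results page for the given query, and extracts both the source page URL (the data-lpage attribute) and the image alt text from each result.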
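The alt text values in the output above are wrapped in invisible Unicode bidirectional formatting characters (\u202a, \u202c, \u200f). A small post-processing step can strip them. This sketch removes every character in Unicode category Cf (format characters), which covers those three but also drops other invisible characters such as zero-width joiners, so check that this suits your data; clean_alt_text and clean_results are hypothetical helper names.

```python
import unicodedata


def clean_alt_text(text):
    """Drop invisible Unicode format characters (category Cf) and trim whitespace."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf").strip()


def clean_results(image_data):
    """Return copies of scraped items with cleaned alt_text values."""
    cleaned = []
    for item in image_data:
        item = dict(item)  # copy so the caller's dicts are not mutated
        if "alt_text" in item:
            item["alt_text"] = clean_alt_text(item["alt_text"])
        cleaned.append(item)
    return cleaned
```

Applied to the sample output, clean_results(image_data) turns '\u202aPython (programming language) - Wikipedia\u202c\u200f' into 'Python (programming language) - Wikipedia'.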
Reverse image search allows you to find similar images and their sources using an image as the query instead of text. Implementing this requires a slightly different approach, often involving browser automation with tools like Selenium.
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager
import time
def google_reverse_image_search(image_url, max_results=5):
    # Set up Chrome options
    chrome_options = Options()
    # chrome_options.add_argument("--headless")  # Uncomment to run without a visible window
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--disable-dev-shm-usage")
    chrome_options.add_argument("--disable-gpu")
    chrome_options.add_argument("--window-size=1920,1080")
    chrome_options.add_argument("--lang=en-US,en")
    chrome_options.add_experimental_option('prefs', {'intl.accept_languages': 'en-US,en'})
    # Hide the most obvious automation flags
    chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
    chrome_options.add_experimental_option('useAutomationExtension', False)
    chrome_options.add_argument("--disable-blink-features=AutomationControlled")
    chrome_options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36")
    # Initialize the driver
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_options)
    try:
        # Navigate to Google Images
        driver.get("https://www.google.com/imghp?hl=en&gl=us")
        # Find and click the camera icon for reverse search
        camera_button = WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.XPATH, "//div[@aria-label='Search by image']"))
        )
        camera_button.click()
        # Wait for the URL input field and enter the image URL
        url_input = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.XPATH, "//input[@placeholder='Paste image link']"))
        )
        url_input.send_keys(image_url)
        # Click search button
        search_button = WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.XPATH, "//div[text()='Search']"))
        )
        search_button.click()
        # Wait for results page to load
        WebDriverWait(driver, 15).until(
            EC.presence_of_element_located((By.XPATH, "//div[contains(text(), 'All')]"))
        )
        similar_images = []
        # Extract data for the first max_results visually similar images.
        # The absolute XPaths and the 'Sva75c' id below mirror Google's markup
        # at the time of writing and will need updating when it changes.
        for i in range(max_results):
            try:
                # Get image element using index in XPath
                img_xpath = f"/html/body/div[3]/div/div[12]/div/div/div[2]/div[2]/div/div/div[1]/div/div/div/div/div/div/div[{i+1}]/div/div/div[1]/div/div/div/div/img"
                img = WebDriverWait(driver, 5).until(
                    EC.presence_of_element_located((By.XPATH, img_xpath))
                )
                # Open the larger preview for this result
                img.click()
                time.sleep(1)  # Give the preview panel time to render
                # The preview link points to the source website and wraps the large image
                img_container = WebDriverWait(driver, 5).until(
                    EC.presence_of_element_located((By.XPATH, "//*[@id='Sva75c']/div[2]/div[2]/div/div[2]/c-wiz/div/div[2]/div/a[1]"))
                )
                img_url = img_container.find_element(By.XPATH, "./img").get_attribute("src")
                # Get source website
                source_url = img_container.get_attribute("href")
                similar_images.append({
                    "url": img_url,
                    "source_url": source_url,
                })
            except Exception as e:
                print(f"Error extracting image {i+1}: {e}")
        return similar_images
    finally:
        # Clean up
        driver.quit()
# Example usage
sample_image_url = "https://avatars.githubusercontent.com/u/54183743?s=280&v=4"
similar_images = google_reverse_image_search(sample_image_url)
print("Similar Images:")
for idx, img in enumerate(similar_images, 1):
    print(f"Image {idx}:")
    print(f" URL: {img['url']}")
    print(f" Source: {img['source_url']}")
    print()
In the above code, we're using Selenium to automate a reverse image search. The script simulates a user visiting Google Images, clicking the camera icon, pasting an image URL, and starting the search. It then opens the first few visually similar results one at a time and records each preview image URL together with the website it comes from. A fuller implementation could also extract the pages that contain the image and other relevant information from the results page.
This method requires more resources than simple HTTP requests but provides access to functionality that isn't easily available through direct scraping. For production use, you would need to add error handling, result parsing, and potentially proxy rotation to avoid detection.
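As a starting point for the error handling and proxy rotation mentioned above, the sketch below retries a fetch with exponential backoff and cycles through a list of proxies between attempts. It is a generic, hypothetical helper: fetch_with_retries is our own name, the proxy addresses you pass in are placeholders, and fetch can be any callable that takes a proxy URL, for example a wrapper around requests.get(url, proxies={"https": proxy}, timeout=10), or one that starts Chrome with the --proxy-server argument.

```python
import itertools
import time


def fetch_with_retries(fetch, proxies, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(proxy) until it succeeds, rotating proxies between attempts.

    fetch: callable taking a proxy URL (or None) and returning a result or raising.
    proxies: list of proxy URLs (placeholders); an empty list means no proxy.
    sleep: injectable so tests can skip real waiting.
    """
    proxy_cycle = itertools.cycle(proxies) if proxies else itertools.repeat(None)
    last_error = None
    for attempt in range(max_attempts):
        proxy = next(proxy_cycle)
        try:
            return fetch(proxy)
        except Exception as error:  # in practice, catch specific network errors
            last_error = error
            if attempt < max_attempts - 1:
                # Exponential backoff: base_delay, 2x, 4x, ... between attempts
                sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"All {max_attempts} attempts failed") from last_error
```

Backing off exponentially spaces out repeated requests, which is gentler on the target site and less likely to trigger further blocking than retrying immediately.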
No, Google does not offer an official Image Search API. The previously available Google Image Search API was deprecated and is no longer supported.
Alternatives to a Google Image Search API include the Bing Image Search API, DuckDuckGo image search (through unofficial wrappers or scraping, since it has no official API), and image search offerings from other search engines like Yahoo and Yandex.
Scraping Google Images is possible, but it comes with challenges and legal considerations. It's important to use ethical scraping practices and consider using APIs provided by other search engines as alternatives.
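One concrete ethical practice is checking a site's robots.txt before requesting a page. Python's standard urllib.robotparser module can evaluate URLs against those rules. The sketch below parses rules supplied as a list of lines; the two-line rule set in the example is a simplified, hypothetical excerpt rather than a copy of Google's actual file, and is_allowed is our own helper name.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_lines, url, user_agent="*"):
    """Check a URL against robots.txt rules supplied as a list of lines."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    # Mark the rules as loaded so can_fetch() does not default to False
    parser.modified()
    return parser.can_fetch(user_agent, url)


# Hypothetical, simplified rules for illustration only
rules = ["User-agent: *", "Disallow: /search"]
print(is_allowed(rules, "https://www.google.com/search?q=python&tbm=isch"))
print(is_allowed(rules, "https://www.google.com/imghp"))
```

To check a live file instead, call RobotFileParser.set_url() with the site's robots.txt URL followed by read(), then use can_fetch() as above.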
In this article, we explored the Google Image Search API, its alternatives, and how to scrape Google Image Search results using Python. While Google does not offer an official Image Search API, developers can use the Google Custom Search JSON API or alternatives like Bing Image Search API and DuckDuckGo Image Search. Additionally, we discussed the challenges of scraping Google Images and provided example code snippets for scraping image search results.