How to Scrape Google Search Results with Python Without Getting Blocked (2026)
Learn how to scrape Google Search results with Python in 2026 — from DIY scrapers with curl_cffi to SERP APIs. Working code, no blocks, real-world SEO use cases.
Senior Developer

Why Google SERP Scraping Is the Hardest Job in Scraping
Scraping Google Search is arguably the most requested — and most misunderstood — Python scraping task in 2026.
It powers competitive SEO analysis, rank tracking, SERP feature monitoring, AI training data collection, and market research. If you know what keywords your competitors rank for and what SERP features appear for each one, you have a massive strategic advantage.
But Google's anti-bot defences are stricter in 2026 than in earlier years. If you try to scrape Google using a simple Python script with requests and BeautifulSoup, you will usually be blocked within the first 10 queries, often on the very first one. Google detects and blocks naive scrapers through TLS fingerprinting, JavaScript challenges, CAPTCHA injection, and blocking of data-centre IP ranges.
This guide gives you the full truth: what works, what doesn't, the legal context, and four working approaches ordered from DIY to fully managed.
Understanding Google's Anti-Bot Stack
Before writing any code, understand what you're up against:
Request arrives at Google
↓
TLS fingerprint check — Does your JA3 hash match a known browser?
↓
IP reputation check — Datacenter? Banned range? Previous violations?
↓
Header analysis — Correct order? sec-ch-ua present? Accept-Language set?
↓
JavaScript challenge — Can you execute JS? (Invisible to non-JS clients)
↓
Behavioural analysis — Request frequency, query patterns, session depth
↓
CAPTCHA — Last resort for uncertain cases

You need to defeat enough layers to score below Google's bot-confidence threshold. The more layers you address, the longer your scraper survives.
The Legal and Ethical Context
Scraping Google Search has been legally contested. The key points:
Google's Terms of Service prohibit automated scraping of search results without explicit permission
The Computer Fraud and Abuse Act (CFAA) in the US has been cited in cease-and-desist letters
However, Google offers its own Custom Search JSON API (100 free queries/day, paid beyond that) for legitimate programmatic access
For SEO tools, most professional platforms use licensed SERP data from providers like SerpApi, DataForSEO, or Bright Data — not DIY scrapers
Bottom line: Use the DIY approaches below for personal projects, learning, and low-volume research. For commercial or production SEO tools, use a licensed API.
Approach 1: The DIY Scraper (curl_cffi + BeautifulSoup)
This is the most technically educational approach and works for low-volume personal research.
Why requests fails immediately
import requests

# A plain requests call like this is usually blocked within the first few queries
r = requests.get(
    "https://www.google.com/search?q=python+web+scraping",
    headers={"User-Agent": "Mozilla/5.0"},
)
# requests follows redirects by default, so a block usually shows up as 429
# (the CAPTCHA page) rather than as the intermediate 302
print(r.status_code)

Python's requests has a distinctive TLS fingerprint that Google's systems recognise instantly.
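To make the fingerprinting idea concrete: a JA3 hash is the MD5 digest of five ClientHello fields (TLS version, cipher suites, extensions, elliptic curves, point formats) joined in a fixed text format. The sketch below computes it for two hypothetical clients. The numeric values are illustrative placeholders, not real captures from requests or Chrome, and the code does not claim to reproduce whatever checks Google actually runs.

```python
import hashlib


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Build the JA3 input: five comma-separated fields, values dash-joined."""
    fields = [
        str(version),
        "-".join(map(str, ciphers)),
        "-".join(map(str, extensions)),
        "-".join(map(str, curves)),
        "-".join(map(str, point_formats)),
    ]
    return ",".join(fields)


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """JA3 fingerprint: MD5 hex digest of the JA3 string."""
    raw = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(raw.encode("ascii")).hexdigest()


# Hypothetical ClientHello values, for illustration only
client_a = dict(
    version=771,
    ciphers=[4865, 4866, 4867],
    extensions=[0, 23, 65281],
    curves=[29, 23, 24],
    point_formats=[0],
)
# Same cipher suites offered in a different order
client_b = dict(client_a, ciphers=[4866, 4865, 4867])

print(ja3_string(**client_a))  # 771,4865-4866-4867,0-23-65281,29-23-24,0
print(ja3_hash(**client_a) == ja3_hash(**client_b))  # False
```

Because the hash covers the order of the offered values as well as the values themselves, changing only the User-Agent header leaves the fingerprint untouched, which is why header spoofing alone does not help.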
The working approach: curl_cffi with TLS impersonation
pip install curl_cffi beautifulsoup4 lxml pandas

from curl_cffi import requests as cffi_requests
from bs4 import BeautifulSoup
from urllib.parse import parse_qs, urlencode, urlparse
import pandas as pd
import time
import random

# Google-like headers — exact order matters
HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,"
              "image/avif,image/webp,image/apng,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Cache-Control": "max-age=0",
    "sec-ch-ua": '"Not_A Brand";v="8", "Chromium";v="120", "Google Chrome";v="120"',
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": '"Windows"',
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-User": "?1",
    "Upgrade-Insecure-Requests": "1",
}

PROXIES_POOL = [
    # Add residential proxies here: "http://user:pass@host:port"
    # Without residential proxies, you'll hit rate limits quickly
]


def get_google_results(
    query: str,
    num_results: int = 10,
    country: str = "us",
    language: str = "en"
) -> list[dict]:
    """
    Scrape Google SERP for organic results.
    Returns a list of {rank, title, url, snippet} dicts.
    """
    params = {
        "q": query,
        "num": num_results,
        "hl": language,
        "gl": country,
        "pws": "0",  # Disable personalised results
    }
    # urlencode keeps the dict's insertion order and escapes spaces and symbols
    url = f"https://www.google.com/search?{urlencode(params)}"
    proxy = random.choice(PROXIES_POOL) if PROXIES_POOL else None
    response = cffi_requests.get(
        url,
        headers=HEADERS,
        impersonate="chrome120",  # Exact TLS fingerprint match
        proxies={"https": proxy} if proxy else None,
        timeout=15,
    )
    if response.status_code != 200:
        raise RuntimeError(f"Got status {response.status_code}")
    return parse_serp(response.text)


def parse_serp(html: str) -> list[dict]:
    """Parse organic results from Google SERP HTML."""
    soup = BeautifulSoup(html, "lxml")
    results = []
    # Google's organic result containers — class names change frequently
    # These were accurate as of June 2026; update via DevTools if needed
    for result_div in soup.select("div.g"):
        title_el = result_div.select_one("h3")
        url_el = result_div.select_one("a[href]")
        snippet_el = result_div.select_one(".VwiC3b, .lEBKkf")
        if not title_el or not url_el:
            continue
        href = url_el.get("href", "")
        # Google wraps URLs in /url?q= redirects — extract and decode the real URL
        if href.startswith("/url?q="):
            href = parse_qs(urlparse(href).query).get("q", [href])[0]
        results.append({
            "rank": len(results) + 1,  # Count only kept results so ranks have no gaps
            "title": title_el.get_text(strip=True),
            "url": href,
            "snippet": snippet_el.get_text(strip=True) if snippet_el else "",
        })
    return results


def scrape_keyword_list(keywords: list[str], delay: tuple = (4, 9)) -> pd.DataFrame:
    """
    Scrape Google results for a list of keywords.
    Conservative delay between requests reduces ban risk.
    """
    all_results = []
    for i, keyword in enumerate(keywords):
        print(f"[{i+1}/{len(keywords)}] Scraping: '{keyword}'")
        try:
            results = get_google_results(keyword, num_results=10)
            for r in results:
                r["keyword"] = keyword
            all_results.extend(results)
        except Exception as e:
            print(f"  Error on '{keyword}': {e}")
        # Human-like delay between queries — NEVER remove this
        if i < len(keywords) - 1:  # No need to wait after the last keyword
            sleep_time = random.uniform(*delay)
            print(f"  Waiting {sleep_time:.1f}s before next query...")
            time.sleep(sleep_time)
    return pd.DataFrame(all_results)


# Example usage
keywords = [
    "python web scraping tutorial 2026",
    "best python scraping libraries",
    "how to scrape without getting blocked",
]
df = scrape_keyword_list(keywords)
df.to_csv("serp_results.csv", index=False)
print(df[["keyword", "rank", "title", "url"]].head(10))

Extracting SERP Features
Beyond organic results, Google SERPs contain rich features worth extracting:
def extract_serp_features(html: str) -> dict:
    """Extract SERP features: featured snippet, PAA, local pack, etc."""
    soup = BeautifulSoup(html, "lxml")
    features = {}

    # ── Featured Snippet (Position 0) ──────────────────────────
    featured = soup.select_one(".hgKElc, .LGOjhe")
    if featured:
        features["featured_snippet"] = featured.get_text(strip=True)

    # ── People Also Ask (PAA) ───────────────────────────────────
    paa_questions = soup.select(".related-question-pair span.CSkcDe")
    if paa_questions:
        features["people_also_ask"] = [q.get_text(strip=True) for q in paa_questions]

    # ── Related Searches ────────────────────────────────────────
    related = soup.select("a .s75CSd")
    if related:
        features["related_searches"] = [r.get_text(strip=True) for r in related]

    # ── Knowledge Panel ──────────────────────────────────────────
    kp_title = soup.select_one(".qrShPb span")
    if kp_title:
        features["knowledge_panel_entity"] = kp_title.get_text(strip=True)

    # ── Local Pack (Map Results) ─────────────────────────────────
    local_results = soup.select(".rllt__details")
    if local_results:
        features["local_pack"] = [r.get_text(strip=True) for r in local_results[:3]]

    return features

Approach 2: Google Custom Search API (Official, Free Tier)
For clean, reliable data without bot-detection risk, Google's official API is often the best choice for low-volume use:
pip install google-api-python-client

from googleapiclient.discovery import build
import pandas as pd

# Requirements:
# 1. Create project at console.developers.google.com
# 2. Enable "Custom Search API"
# 3. Create API key
# 4. Create Programmable Search Engine at programmablesearchengine.google.com
# 5. Get your Search Engine ID (cx)
API_KEY = "YOUR_GOOGLE_API_KEY"
CSE_ID = "YOUR_CUSTOM_SEARCH_ENGINE_ID"


def google_api_search(query: str, num: int = 10) -> list[dict]:
    """
    Official Google Custom Search API.
    Free tier: 100 queries/day.
    Paid: $5 per 1,000 queries beyond free tier.
    """
    service = build("customsearch", "v1", developerKey=API_KEY)
    results_list = []
    # API returns max 10 results per call; paginate for more
    for start in range(1, num + 1, 10):
        response = service.cse().list(
            q=query,
            cx=CSE_ID,
            start=start,
            num=min(10, num - start + 1),
        ).execute()
        for i, item in enumerate(response.get("items", []), start=start):
            results_list.append({
                "rank": i,
                "title": item.get("title"),
                "url": item.get("link"),
                "snippet": item.get("snippet"),
                "domain": item.get("displayLink"),
            })
    return results_list


# Clean, simple, and legal
results = google_api_search("python web scraping 2026", num=10)
df = pd.DataFrame(results)
print(df[["rank", "title", "domain"]])

Limitations: The Custom Search API searches your defined search engine scope — not all of Google. Results differ from google.com organic results, making it less suitable for pure rank-tracking.
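A second practical limit is pagination. The loop in google_api_search steps through results ten at a time, and the arithmetic can be isolated into a small helper. The sketch below is an illustration, not part of the official client: page_plan is a hypothetical name, and it assumes the documented Custom Search JSON API limits of at most 10 results per call and 100 results per query.

```python
def page_plan(num: int, page_size: int = 10, max_total: int = 100) -> list[tuple[int, int]]:
    """Return (start, count) pairs covering the first `num` results."""
    if num < 1:
        return []
    # Assumed API ceiling: requests past the 100th result are rejected
    num = min(num, max_total)
    return [
        (start, min(page_size, num - start + 1))
        for start in range(1, num + 1, page_size)
    ]


print(page_plan(25))  # [(1, 10), (11, 10), (21, 5)]
```

Each pair maps onto the start and num arguments of one API call, so 25 results cost three queries from the daily quota.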
Approach 3: SerpApi (Managed, Production-Ready)
For production SEO tools and higher volume, SerpApi is the industry standard. It handles all anti-bot complexity and delivers clean, structured JSON:
pip install google-search-results

from serpapi import GoogleSearch
import pandas as pd


def serpapi_search(
    query: str,
    api_key: str,
    location: str = "India",
    num: int = 10
) -> dict:
    """
    SerpApi Google Search — returns structured JSON with all SERP features.
    Pricing: $50/month for 5,000 searches (2026 rates).
    Free tier: 100 searches/month.
    """
    params = {
        "engine": "google",
        "q": query,
        "api_key": api_key,
        "location": location,
        "hl": "en",
        "gl": "in",
        "num": num,
        "no_cache": False,  # Use cache when available to save credits
    }
    search = GoogleSearch(params)
    results = search.get_dict()
    output = {
        "query": query,
        "organic_results": [],
        "featured_snippet": None,
        "people_also_ask": [],
        "related_searches": [],
        "total_results": results.get("search_information", {})
                                .get("total_results"),
    }

    # Organic results
    for r in results.get("organic_results", []):
        output["organic_results"].append({
            "rank": r.get("position"),
            "title": r.get("title"),
            "url": r.get("link"),
            "snippet": r.get("snippet"),
            "domain": r.get("displayed_link"),
        })

    # Featured snippet
    if "answer_box" in results:
        output["featured_snippet"] = results["answer_box"].get("answer") or \
            results["answer_box"].get("snippet")

    # People Also Ask
    for paa in results.get("related_questions", []):
        output["people_also_ask"].append({
            "question": paa.get("question"),
            "snippet": paa.get("snippet"),
        })

    # Related searches
    output["related_searches"] = [
        r.get("query") for r in results.get("related_searches", [])
    ]
    return output


# Usage
result = serpapi_search(
    query="python web scraping tutorial",
    api_key="YOUR_SERPAPI_KEY",
    location="Mumbai, India"
)
df = pd.DataFrame(result["organic_results"])
print(df[["rank", "title", "domain"]])
print("\nPeople Also Ask:")
for paa in result["people_also_ask"][:3]:
    print(f"  Q: {paa['question']}")

Approach 4: Building a Rank Tracker (Practical SEO Tool)
Putting it all together into a real-world keyword rank tracker:
import pandas as pd
import json
import time
import random
from datetime import datetime, timezone
from pathlib import Path


class KeywordRankTracker:
    """
    Track keyword rankings over time.
    Stores historical data in JSON for trend analysis.
    """

    def __init__(self, project_name: str, api_key: str):
        self.project_name = project_name
        self.api_key = api_key
        self.data_file = Path(f"{project_name}_rankings.json")
        self.history = self._load_history()

    def _load_history(self) -> list:
        if self.data_file.exists():
            with open(self.data_file) as f:
                return json.load(f)
        return []

    def _save_history(self):
        with open(self.data_file, "w") as f:
            json.dump(self.history, f, indent=2)

    def check_rankings(
        self,
        keywords: list[str],
        target_domain: str,
        location: str = "India"
    ) -> pd.DataFrame:
        """
        Check where target_domain ranks for each keyword.
        """
        timestamp = datetime.now(timezone.utc).isoformat()
        records = []
        for keyword in keywords:
            print(f"Checking: '{keyword}'")
            result = serpapi_search(keyword, self.api_key, location)
            ranking = None
            for r in result["organic_results"]:
                if target_domain.lower() in (r.get("domain") or "").lower():
                    ranking = r["rank"]
                    break
            record = {
                "timestamp": timestamp,
                "keyword": keyword,
                "domain": target_domain,
                "rank": ranking,  # None = not in top 10
                "in_top_10": ranking is not None,
            }
            records.append(record)
            self.history.append(record)
            time.sleep(random.uniform(1.5, 3.0))
        self._save_history()
        return pd.DataFrame(records)

    def get_trend_report(self) -> pd.DataFrame:
        """Show ranking changes over time for each keyword."""
        if not self.history:
            return pd.DataFrame()
        df = pd.DataFrame(self.history)
        df["timestamp"] = pd.to_datetime(df["timestamp"])
        df = df.sort_values("timestamp")
        # Pivot to show rank by date
        pivot = df.pivot_table(
            index="keyword",
            columns=df["timestamp"].dt.date,
            values="rank",
            aggfunc="first"
        )
        return pivot


# Usage
tracker = KeywordRankTracker(
    project_name="my_blog",
    api_key="YOUR_SERPAPI_KEY"
)

# Check weekly rankings
rankings = tracker.check_rankings(
    keywords=[
        "python web scraping tutorial",
        "python async scraping",
        "scrapy mongodb tutorial",
    ],
    target_domain="yourblog.com",
    location="India"
)
print("\nCurrent rankings:")
print(rankings[["keyword", "rank", "in_top_10"]].to_string(index=False))

# Show historical trend
trend = tracker.get_trend_report()
if not trend.empty:
    print("\nRanking trend:")
    print(trend)

Competitor SERP Analysis: Who's Outranking You and Why
def analyse_serp_competitors(
    keyword: str,
    your_domain: str,
    api_key: str
) -> dict:
    """
    Analyse who ranks in the top 10 for a keyword,
    what their titles/snippets look like, and where you stand.
    """
    result = serpapi_search(keyword, api_key, num=10)
    organics = result["organic_results"]
    your_rank = None
    competitor_analysis = []
    for r in organics:
        # The key can be present with a None value, so fall back to "" explicitly
        domain = r.get("domain") or ""
        is_you = your_domain.lower() in domain.lower()
        if is_you:
            your_rank = r["rank"]
        # Title length analysis (55-60 chars is Google's sweet spot)
        title_len = len(r.get("title") or "")
        # Snippet length analysis
        snippet_len = len(r.get("snippet") or "")
        competitor_analysis.append({
            "rank": r["rank"],
            "domain": domain,
            "title": r.get("title"),
            "title_length": title_len,
            "snippet_length": snippet_len,
            "is_you": is_you,
        })
    return {
        "keyword": keyword,
        "your_rank": your_rank or "Not in top 10",
        "gap_to_top": (your_rank - 1) if your_rank else None,
        "competitors": competitor_analysis,
        "featured_snippet_exists": result["featured_snippet"] is not None,
        "paa_count": len(result["people_also_ask"]),
        "paa_questions": [q["question"] for q in result["people_also_ask"]],
    }


# Run competitor analysis
analysis = analyse_serp_competitors(
    keyword="python web scraping tutorial 2026",
    your_domain="yourblog.com",
    api_key="YOUR_SERPAPI_KEY"
)
print(f"\nKeyword: {analysis['keyword']}")
print(f"Your rank: {analysis['your_rank']}")
print(f"Featured snippet: {'YES' if analysis['featured_snippet_exists'] else 'No'}")
print("\nTop 5 competitors:")
for c in analysis["competitors"][:5]:
    marker = " ← YOU" if c["is_you"] else ""
    print(f"  #{c['rank']} {c['domain']}{marker} — title: {c['title_length']} chars")
print(f"\nPeople Also Ask ({analysis['paa_count']} questions):")
for q in analysis["paa_questions"]:
    print(f"  • {q}")

Scaling Up: Async SERP Scraping
For bulk keyword research (hundreds of queries), async execution cuts runtime by 5–10x:
import asyncio
import random
from urllib.parse import quote_plus

import pandas as pd
from curl_cffi.requests import AsyncSession

# Reuses HEADERS and parse_serp() from Approach 1
MAX_CONCURRENCY = 5  # 5 concurrent — conservative for Google


async def async_google_search(
    session: AsyncSession,
    semaphore: asyncio.Semaphore,
    query: str,
    delay_range: tuple = (3, 7)
) -> tuple[str, list[dict]]:
    """Async version of Google scraper."""
    async with semaphore:
        await asyncio.sleep(random.uniform(*delay_range))
        # quote_plus escapes spaces and symbols such as & or + in the query
        params = f"q={quote_plus(query)}&num=10&hl=en&gl=in&pws=0"
        url = f"https://www.google.com/search?{params}"
        try:
            r = await session.get(
                url,
                headers=HEADERS,
                impersonate="chrome120",
                timeout=15,
            )
            results = parse_serp(r.text) if r.status_code == 200 else []
            return query, results
        except Exception as e:
            print(f"Failed '{query}': {e}")
            return query, []


async def bulk_serp_scrape(keywords: list[str]) -> pd.DataFrame:
    """Scrape hundreds of keywords asynchronously."""
    all_records = []
    # Create the semaphore inside the running event loop; a module-level
    # semaphore can bind to the wrong loop on Python versions before 3.10
    semaphore = asyncio.Semaphore(MAX_CONCURRENCY)
    async with AsyncSession(impersonate="chrome120") as session:
        tasks = [async_google_search(session, semaphore, kw) for kw in keywords]
        results = await asyncio.gather(*tasks)
    for keyword, serp_results in results:
        for r in serp_results:
            r["keyword"] = keyword
            all_records.append(r)
    return pd.DataFrame(all_records)


# At this concurrency, 100 keywords take ~2 minutes instead of ~15 minutes
keywords = [f"python {topic}" for topic in [
    "scraping tutorial", "async tutorial", "data pipeline",
    "mongodb tutorial", "playwright guide", "scrapy example"
]]
df = asyncio.run(bulk_serp_scrape(keywords))
df.to_csv("bulk_serp.csv", index=False)

Choosing Your Approach: Decision Guide
| Situation | Best Approach |
|---|---|
| Learning / personal project | DIY with curl_cffi (Approach 1) |
| < 100 queries/day, official use | Google Custom Search API (Approach 2) |
| Production SEO tool, any volume | SerpApi or DataForSEO (Approach 3) |
| Bulk research, 1000+ keywords | Async DIY with residential proxies |
| International SERP analysis | SerpApi (supports location targeting) |
| SERP feature monitoring (PAA, featured) | SerpApi (returns structured features) |
Common Errors and Fixes
Getting a CAPTCHA page immediately
Your IP is on a blocklist. Switch to a residential proxy or use SerpApi.
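It helps to detect a block explicitly instead of discovering it as an empty result list. The sketch below is one hedged way to do that. looks_blocked is a hypothetical helper, and its signals (a 429 or 503 status, a final URL under /sorry, and the "unusual traffic" wording on the interstitial page) are assumptions based on commonly observed behaviour that Google can change at any time.

```python
from urllib.parse import urlparse

# Assumed wording of Google's interstitial page; adjust if it changes
BLOCK_MARKERS = ("unusual traffic from your computer network",)


def looks_blocked(status_code: int, final_url: str, html: str) -> bool:
    """Heuristically decide whether a response is a block or CAPTCHA page."""
    if status_code in (429, 503):
        return True
    # CAPTCHA interstitials are typically served from a /sorry/ path
    if urlparse(final_url).path.startswith("/sorry"):
        return True
    body = html.lower()
    return any(marker in body for marker in BLOCK_MARKERS)


print(looks_blocked(200, "https://www.google.com/sorry/index?continue=x", ""))  # True
print(looks_blocked(200, "https://www.google.com/search?q=x", "<h3>ok</h3>"))   # False
```

A check like this could run in get_google_results before parse_serp is called, so that a blocked request triggers a proxy switch or a pause rather than silently returning no rows.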
HTML parses but no results found
Google has probably changed its CSS class names, which happens frequently. Open DevTools, inspect an organic result, find the selector that replaced div.g, and update the selectors in parse_serp().
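One way to soften selector drift is a class-independent fallback that runs when the main parser returns nothing. The sketch below uses only the standard library and rests on a structural assumption that is not guaranteed: that each organic title is an h3 nested inside an a tag with an href. AnchorHeadingParser and fallback_parse are hypothetical names, and the fallback does not extract snippets or unwrap /url?q= redirects.

```python
from html.parser import HTMLParser


class AnchorHeadingParser(HTMLParser):
    """Collect a result for every <h3> nested inside an <a href=...>."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._href = None        # href of the anchor currently open, if any
        self._in_h3 = False      # True while inside a linked <h3>
        self._title_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
        elif tag == "h3" and self._href:
            self._in_h3 = True
            self._title_parts = []

    def handle_endtag(self, tag):
        if tag == "h3" and self._in_h3:
            title = "".join(self._title_parts).strip()
            if title:
                self.results.append({
                    "rank": len(self.results) + 1,
                    "title": title,
                    "url": self._href,
                })
            self._in_h3 = False
        elif tag == "a":
            self._href = None

    def handle_data(self, data):
        if self._in_h3:
            self._title_parts.append(data)


def fallback_parse(html: str) -> list[dict]:
    """Class-agnostic fallback: returns rank, title and url only."""
    parser = AnchorHeadingParser()
    parser.feed(html)
    parser.close()
    return parser.results


sample = '<a href="https://example.com/a"><h3>First <b>result</b></h3></a><h3>Not linked</h3>'
print(fallback_parse(sample))
# [{'rank': 1, 'title': 'First result', 'url': 'https://example.com/a'}]
```

If the fallback finds results while parse_serp finds none, that is a strong hint the class names have changed rather than that the request was blocked.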
429 Too Many Requests
You're hitting Google's rate limit. Increase your delay to 6–10 seconds between queries, or switch to a proxy pool with IP rotation.
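A fixed delay is the simplest fix, but retries after a 429 usually behave better with a delay that grows on each attempt. The sketch below shows one common pattern, exponential backoff with jitter. backoff_delays is a hypothetical helper, and the 6-second floor and 120-second cap are illustrative assumptions, not values Google publishes.

```python
import random
from typing import Optional


def backoff_delays(
    retries: int,
    base: float = 6.0,
    cap: float = 120.0,
    rng: Optional[random.Random] = None,
) -> list[float]:
    """Return one randomised delay per retry, never below `base` seconds."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(retries):
        # The upper bound doubles on every attempt until it reaches the cap
        ceiling = min(cap, base * 2 ** attempt)
        delays.append(rng.uniform(base, ceiling))
    return delays


# Seeded generator so the example is repeatable
for attempt, delay in enumerate(backoff_delays(4, rng=random.Random(0)), start=1):
    print(f"retry {attempt}: wait {delay:.1f}s")
```

In practice each value would be passed to time.sleep() before the next attempt, and the loop would stop as soon as a request returns 200.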
Results look different from browser
You're likely getting a different locale. Add &hl=en&gl=us to your query params and set Accept-Language: en-US,en;q=0.9 in headers.
FAQ
Q: Is scraping Google legal? Technically it violates Google's Terms of Service. However, scraping publicly available search results for personal research purposes has not typically resulted in legal action against individuals. Commercial use of scraped Google data is a different matter — use SerpApi or DataForSEO for that.
Q: How many queries per day can I make before getting blocked? With rotating residential proxies and 4–9 second delays: 200–500 queries/day per IP. Without proxies: 20–50 before hitting a CAPTCHA.
Q: What's the best free alternative to SerpApi? The Google Custom Search JSON API gives 100 free queries/day officially. For DIY, use the curl_cffi approach above.
Q: Can I track rankings for Google India specifically? Yes — add &gl=in&hl=en to your query params, or set location: "India" in SerpApi params.
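For the DIY route, the locale answer above can be sketched as a pair of small helpers that keep the URL parameters and the Accept-Language header in agreement. build_search_url and accept_language are hypothetical names introduced for illustration; the hl, gl and pws parameters are the ones already used in this guide.

```python
from urllib.parse import urlencode


def build_search_url(query: str, country: str = "us", language: str = "en") -> str:
    """Build a search URL with explicit, URL-encoded locale parameters."""
    params = {"q": query, "hl": language, "gl": country, "pws": "0"}
    return "https://www.google.com/search?" + urlencode(params)


def accept_language(country: str = "us", language: str = "en") -> str:
    """Build an Accept-Language value that matches hl and gl."""
    return f"{language}-{country.upper()},{language};q=0.9"


print(build_search_url("python web scraping", country="in"))
# https://www.google.com/search?q=python+web+scraping&hl=en&gl=in&pws=0
print(accept_language("in"))
# en-IN,en;q=0.9
```

Sending gl=in together with an en-US header is an inconsistency that a real browser in India would rarely produce, so deriving both from the same arguments avoids one easy mismatch.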
Summary
| Approach | Cost | Volume | Reliability | Best For |
|---|---|---|---|---|
| curl_cffi DIY | Free | Low-Medium | Medium | Learning, personal use |
| Google CSE API | Free/paid | Low | High (official) | Authorised access |
| SerpApi | $50+/month | Unlimited | Very High | Production tools |
| DataForSEO | Pay-per-use | Unlimited | Very High | Enterprise SEO |