ZyVOP Logo
Content That Connects
© 2026 ZyVOP. Crafted with care for the developer community.


How to Scrape Google Search Results with Python Without Getting Blocked (2026)

Learn how to scrape Google Search results with Python in 2026, from DIY scrapers built on curl_cffi to SERP APIs, with working code and real-world SEO use cases.

ZyVOP

Senior Developer

June 7, 2026

Why Google SERP Scraping Is the Hardest Job in Scraping

Scraping Google Search is arguably the most requested — and most misunderstood — Python scraping task in 2026.

It powers competitive SEO analysis, rank tracking, SERP feature monitoring, AI training data collection, and market research. If you know what keywords your competitors rank for and what SERP features appear for each one, you have a massive strategic advantage.

But Google's anti-bot defences keep getting harder to beat. A simple Python script built on requests and BeautifulSoup will typically be blocked within a handful of queries, often on the very first one. Google detects naive scrapers through TLS fingerprinting, JavaScript challenges, and CAPTCHA interstitials, and it treats traffic from data-centre IP ranges with suspicion or blocks it outright.

This guide covers what works, what doesn't, the legal context, and four working approaches, ordered from DIY to fully managed.


Understanding Google's Anti-Bot Stack

Before writing any code, understand what you're up against:

Request arrives at Google
       ↓
TLS fingerprint check — Does your JA3 hash match a known browser?
       ↓
IP reputation check — Datacenter? Banned range? Previous violations?
       ↓
Header analysis — Correct order? sec-ch-ua present? Accept-Language set?
       ↓
JavaScript challenge — Can you execute JS? (Invisible to non-JS clients)
       ↓
Behavioural analysis — Request frequency, query patterns, session depth
       ↓
CAPTCHA — Last resort for uncertain cases

You need to defeat enough layers to score below Google's bot-confidence threshold. The more layers you address, the longer your scraper survives.


The Legal and Ethical Context

Scraping Google Search has been legally contested. The key points:

  • Google's Terms of Service prohibit automated scraping of search results without explicit permission

  • The Computer Fraud and Abuse Act (CFAA) in the US has been cited in cease-and-desist letters

  • However, Google offers its own Custom Search JSON API (100 free queries/day, paid beyond that) for legitimate programmatic access

  • For SEO tools, most professional platforms use licensed SERP data from providers like SerpApi, DataForSEO, or Bright Data — not DIY scrapers

Bottom line: Use the DIY approaches below for personal projects, learning, and low-volume research. For commercial or production SEO tools, use a licensed API.


Approach 1: The DIY Scraper (curl_cffi + BeautifulSoup)

This is the most technically educational approach and works for low-volume personal research.

Why requests fails immediately

import requests

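# This typically gets blocked within the first few queries in 2026
r = requests.get(
    "https://www.google.com/search?q=python+web+scraping",
    headers={"User-Agent": "Mozilla/5.0"},
    timeout=15,
)
# requests follows redirects by default, so a CAPTCHA redirect shows up
# in r.url (a /sorry/ page) rather than as a 302 status code
print(r.status_code, r.url)  # Often 429, or a redirect to a CAPTCHA page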

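Python's requests library (built on urllib3 and OpenSSL) produces a TLS handshake that looks nothing like a real browser's, and a bare "Mozilla/5.0" User-Agent does not hide that. Google's systems recognise the mismatch instantly.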
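To tell a genuine results page from a block, it helps to check each response against the layers described earlier before parsing it. The sketch below is a heuristic, not Google's documented behaviour: the `/sorry/` redirect path and the phrases in `BLOCK_TEXT_MARKERS` are assumptions based on commonly observed block pages, and `classify_response()` is a hypothetical helper. Verify the markers against HTML you have actually saved.

```python
from dataclasses import dataclass

# Hypothetical markers seen on Google block pages; verify against real HTML
BLOCK_TEXT_MARKERS = ("unusual traffic", "g-recaptcha", "captcha-form")


@dataclass
class BlockCheck:
    blocked: bool
    reason: str


def classify_response(status_code: int, final_url: str, html: str) -> BlockCheck:
    """Guess whether a Google response is a block rather than real results."""
    if status_code == 429:
        return BlockCheck(True, "rate limited (429)")
    # Assumption: CAPTCHA interstitials live under google.com/sorry/
    if "/sorry/" in final_url:
        return BlockCheck(True, "redirected to CAPTCHA interstitial")
    lowered = html.lower()
    for marker in BLOCK_TEXT_MARKERS:
        if marker in lowered:
            return BlockCheck(True, f"block marker found: {marker!r}")
    if status_code != 200:
        return BlockCheck(True, f"unexpected status {status_code}")
    return BlockCheck(False, "looks like a normal results page")
```

With either requests or curl_cffi, call it as `classify_response(r.status_code, r.url, r.text)` before handing the HTML to a parser, and log `reason` so you can see which layer is stopping you.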

The working approach: curl_cffi with TLS impersonation

pip install curl_cffi beautifulsoup4 lxml pandas
from curl_cffi import requests as cffi_requests
from bs4 import BeautifulSoup
from urllib.parse import urlencode, urlparse, parse_qs
import pandas as pd
import time
import random

# Chrome 120-style browser headers, kept consistent with impersonate="chrome120"
HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,"
              "image/avif,image/webp,image/apng,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Cache-Control": "max-age=0",
    "sec-ch-ua": '"Not_A Brand";v="8", "Chromium";v="120", "Google Chrome";v="120"',
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": '"Windows"',
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-User": "?1",
    "Upgrade-Insecure-Requests": "1",
}

PROXIES_POOL = [
    # Add residential proxies here: "http://user:pass@host:port"
    # Without residential proxies, you'll hit rate limits quickly
]

def get_google_results(
    query: str,
    num_results: int = 10,
    country: str = "us",
    language: str = "en"
) -> list[dict]:
    """
    Scrape Google SERP for organic results.
    Returns a list of {rank, title, url, snippet} dicts.
    """
    params = {
        "q": query,
        "num": num_results,
        "hl": language,
        "gl": country,
        "pws": "0",   # Disable personalised results
    }

    # urlencode() percent-encodes spaces and symbols in the query
    # and keeps the dict's insertion order
    url = f"https://www.google.com/search?{urlencode(params)}"

    proxy = random.choice(PROXIES_POOL) if PROXIES_POOL else None

    response = cffi_requests.get(
        url,
        headers=HEADERS,
        impersonate="chrome120",  # Mimic Chrome 120's TLS and HTTP/2 fingerprint
        proxies={"https": proxy} if proxy else None,
        timeout=15,
    )

    if response.status_code != 200:
        raise RuntimeError(f"Got status {response.status_code} for '{query}'")

    return parse_serp(response.text)


def parse_serp(html: str) -> list[dict]:
    """Parse organic results from Google SERP HTML."""
    soup = BeautifulSoup(html, "lxml")
    results = []

    # Google's organic result containers: class names change frequently.
    # These were accurate as of June 2026; update via DevTools if needed
    for result_div in soup.select("div.g"):
        title_el   = result_div.select_one("h3")
        url_el     = result_div.select_one("a[href]")
        snippet_el = result_div.select_one(".VwiC3b, .lEBKkf")

        if not title_el or not url_el:
            continue

        href = url_el.get("href", "")
        # Google sometimes wraps URLs in /url?q=... redirects;
        # extract the real target and percent-decode it
        if href.startswith("/url?"):
            href = parse_qs(urlparse(href).query).get("q", [href])[0]

        results.append({
            "rank":    len(results) + 1,   # Count only results actually kept
            "title":   title_el.get_text(strip=True),
            "url":     href,
            "snippet": snippet_el.get_text(strip=True) if snippet_el else "",
        })

    return results


def scrape_keyword_list(keywords: list[str], delay: tuple = (4, 9)) -> pd.DataFrame:
    """
    Scrape Google results for a list of keywords.
    Conservative delay between requests reduces ban risk.
    """
    all_results = []

    for i, keyword in enumerate(keywords):
        print(f"[{i+1}/{len(keywords)}] Scraping: '{keyword}'")
        try:
            results = get_google_results(keyword, num_results=10)
            for r in results:
                r["keyword"] = keyword
            all_results.extend(results)

        except Exception as e:
            print(f"  Error on '{keyword}': {e}")

        # Human-like delay between queries — NEVER remove this
        sleep_time = random.uniform(*delay)
        print(f"  Waiting {sleep_time:.1f}s before next query...")
        time.sleep(sleep_time)

    return pd.DataFrame(all_results)


# Example usage
keywords = [
    "python web scraping tutorial 2026",
    "best python scraping libraries",
    "how to scrape without getting blocked",
]

df = scrape_keyword_list(keywords)
df.to_csv("serp_results.csv", index=False)
print(df[["keyword", "rank", "title", "url"]].head(10))
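Because Google's class names change frequently, it can pay to keep the raw HTML of every SERP you fetch. When selectors break, you update `parse_serp()` and re-run it over the saved files instead of spending more queries and more ban risk. The helpers below are a hypothetical, standard-library-only sketch; the `raw_serps` folder and the file-naming scheme are assumptions.

```python
import hashlib
import re
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Callable


def save_raw_serp(query: str, html: str, folder: str = "raw_serps") -> Path:
    """Write one SERP's raw HTML to disk so it can be re-parsed later."""
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Readable slug plus a short hash of the full query, so long or
    # similar queries don't overwrite each other
    slug = re.sub(r"[^a-z0-9]+", "-", query.lower()).strip("-")[:50]
    digest = hashlib.sha256(query.encode("utf-8")).hexdigest()[:8]
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")

    path = out_dir / f"{stamp}_{slug}_{digest}.html"
    path.write_text(html, encoding="utf-8")
    return path


def reparse_saved(folder: str, parser: Callable[[str], Any]) -> dict[str, Any]:
    """Run a (possibly updated) parser over every saved HTML file."""
    return {
        p.name: parser(p.read_text(encoding="utf-8"))
        for p in sorted(Path(folder).glob("*.html"))
    }
```

For example, call `save_raw_serp(query, response.text)` inside `get_google_results()` just before `parse_serp()`, then run `reparse_saved("raw_serps", parse_serp)` after updating the selectors.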

Extracting SERP Features

Beyond organic results, Google SERPs contain rich features worth extracting:

def extract_serp_features(html: str) -> dict:
    """Extract SERP features: featured snippet, PAA, local pack, etc."""
    soup = BeautifulSoup(html, "lxml")
    features = {}

    # ── Featured Snippet (Position 0) ──────────────────────────
    featured = soup.select_one(".hgKElc, .LGOjhe")
    if featured:
        features["featured_snippet"] = featured.get_text(strip=True)

    # ── People Also Ask (PAA) ───────────────────────────────────
    paa_questions = soup.select(".related-question-pair span.CSkcDe")
    if paa_questions:
        features["people_also_ask"] = [q.get_text(strip=True) for q in paa_questions]

    # ── Related Searches ────────────────────────────────────────
    related = soup.select("a .s75CSd")
    if related:
        features["related_searches"] = [r.get_text(strip=True) for r in related]

    # ── Knowledge Panel ──────────────────────────────────────────
    kp_title = soup.select_one(".qrShPb span")
    if kp_title:
        features["knowledge_panel_entity"] = kp_title.get_text(strip=True)

    # ── Local Pack (Map Results) ─────────────────────────────────
    local_results = soup.select(".rllt__details")
    if local_results:
        features["local_pack"] = [r.get_text(strip=True) for r in local_results[:3]]

    return features

Approach 2: Google Custom Search API (Official, Free Tier)

For clean, reliable data without bot-detection risk, Google's official API is often the best choice for low-volume use:

pip install google-api-python-client
from googleapiclient.discovery import build
import pandas as pd

# Requirements:
# 1. Create project at console.developers.google.com
# 2. Enable "Custom Search API"
# 3. Create API key
# 4. Create Programmable Search Engine at programmablesearchengine.google.com
# 5. Get your Search Engine ID (cx)

API_KEY = "YOUR_GOOGLE_API_KEY"
CSE_ID  = "YOUR_CUSTOM_SEARCH_ENGINE_ID"

def google_api_search(query: str, num: int = 10) -> list[dict]:
    """
    Official Google Custom Search API.
    Free tier: 100 queries/day.
    Paid: $5 per 1,000 queries beyond free tier.
    """
    service = build("customsearch", "v1", developerKey=API_KEY)

    results_list = []
    # API returns max 10 results per call; paginate for more
    for start in range(1, num + 1, 10):
        response = service.cse().list(
            q=query,
            cx=CSE_ID,
            start=start,
            num=min(10, num - start + 1),
        ).execute()

        for i, item in enumerate(response.get("items", []), start=start):
            results_list.append({
                "rank":    i,
                "title":   item.get("title"),
                "url":     item.get("link"),
                "snippet": item.get("snippet"),
                "domain":  item.get("displayLink"),
            })

    return results_list


# Clean, simple, and legal
results = google_api_search("python web scraping 2026", num=10)
df = pd.DataFrame(results)
print(df[["rank", "title", "domain"]])

Limitations: The Custom Search API searches your defined search engine scope — not all of Google. Results differ from google.com organic results, making it less suitable for pure rank-tracking.
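Since the API returns at most 10 results per call, the pagination loop in `google_api_search()` spends one query per page: `num=30` costs three queries, not one. The sketch below turns that into a rough daily cost estimate using the figures quoted above (100 free queries/day, then $5 per 1,000). The constants and both helper functions are assumptions for illustration; check Google's current pricing before budgeting.

```python
import math

# Figures quoted in this article; verify against Google's pricing page
FREE_QUERIES_PER_DAY = 100
PRICE_PER_1000_USD = 5.00
RESULTS_PER_CALL = 10  # the API returns at most 10 results per call


def calls_needed(keywords: int, results_per_keyword: int) -> int:
    """Each paginated API call is one billable query."""
    return keywords * math.ceil(results_per_keyword / RESULTS_PER_CALL)


def daily_cost_usd(calls_per_day: int) -> float:
    """Cost of one day's calls after the free allowance is used up."""
    billable = max(0, calls_per_day - FREE_QUERIES_PER_DAY)
    return billable * PRICE_PER_1000_USD / 1000
```

As a worked example, tracking the top 30 results for 50 keywords needs `calls_needed(50, 30)` = 150 calls per day; 50 of those are billable, so `daily_cost_usd(150)` is about $0.25 per day.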


Approach 3: SerpApi (Managed, Production-Ready)

For production SEO tools and higher volume, a managed SERP API such as SerpApi is the usual choice. It handles the anti-bot work for you and delivers clean, structured JSON:

pip install google-search-results
from serpapi import GoogleSearch
import pandas as pd

def serpapi_search(
    query: str,
    api_key: str,
    location: str = "India",
    num: int = 10
) -> dict:
    """
    SerpApi Google Search — returns structured JSON with all SERP features.
    Pricing: $50/month for 5,000 searches (2026 rates).
    Free tier: 100 searches/month.
    """
    params = {
        "engine":   "google",
        "q":        query,
        "api_key":  api_key,
        "location": location,
        "hl":       "en",
        "gl":       "in",
        "num":      num,
        "no_cache": False,  # Use cache when available to save credits
    }

    search  = GoogleSearch(params)
    results = search.get_dict()

    output = {
        "query":            query,
        "organic_results":  [],
        "featured_snippet": None,
        "people_also_ask":  [],
        "related_searches": [],
        "total_results":    results.get("search_information", {})
                                   .get("total_results"),
    }

    # Organic results
    for r in results.get("organic_results", []):
        output["organic_results"].append({
            "rank":    r.get("position"),
            "title":   r.get("title"),
            "url":     r.get("link"),
            "snippet": r.get("snippet"),
            "domain":  r.get("displayed_link"),
        })

    # Featured snippet
    if "answer_box" in results:
        output["featured_snippet"] = results["answer_box"].get("answer") or \
                                     results["answer_box"].get("snippet")

    # People Also Ask
    for paa in results.get("related_questions", []):
        output["people_also_ask"].append({
            "question": paa.get("question"),
            "snippet":  paa.get("snippet"),
        })

    # Related searches
    output["related_searches"] = [
        r.get("query") for r in results.get("related_searches", [])
    ]

    return output


# Usage
result = serpapi_search(
    query="python web scraping tutorial",
    api_key="YOUR_SERPAPI_KEY",
    location="Mumbai, India"
)

df = pd.DataFrame(result["organic_results"])
print(df[["rank", "title", "domain"]])
print("\nPeople Also Ask:")
for paa in result["people_also_ask"][:3]:
    print(f"  Q: {paa['question']}")
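Credits are the main cost with SerpApi, so re-running the same query while you develop is wasteful. Besides SerpApi's own server-side cache (the `no_cache` parameter used above), you can keep a local copy of each JSON response. The wrapper below is a hypothetical sketch: `cached_search()` and the `serp_cache` folder are assumptions, entries never expire, and the API key is deliberately left out of the cache key.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def cached_search(
    params: dict,
    fetch: Callable[[dict], dict],
    cache_dir: str = "serp_cache",
) -> dict:
    """Return a cached JSON result for params, calling fetch only on a miss."""
    # Exclude the API key so rotating keys still hits the same cache entry
    key_params = {k: v for k, v in params.items() if k != "api_key"}
    key = hashlib.sha256(
        json.dumps(key_params, sort_keys=True).encode("utf-8")
    ).hexdigest()

    path = Path(cache_dir) / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))

    result = fetch(params)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result), encoding="utf-8")
    return result
```

With the SerpApi client this would look like `cached_search(params, lambda p: GoogleSearch(p).get_dict())`. For rank tracking, clear the folder between runs so each check sees a fresh SERP.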

Approach 4: Building a Rank Tracker (Practical SEO Tool)

Putting it all together into a real-world keyword rank tracker:

import pandas as pd
import json
import time
import random
from datetime import datetime, timezone
from pathlib import Path

class KeywordRankTracker:
    """
    Track keyword rankings over time.
    Stores historical data in JSON for trend analysis.
    """

    def __init__(self, project_name: str, api_key: str):
        self.project_name = project_name
        self.api_key      = api_key
        self.data_file    = Path(f"{project_name}_rankings.json")
        self.history      = self._load_history()

    def _load_history(self) -> list:
        if self.data_file.exists():
            with open(self.data_file) as f:
                return json.load(f)
        return []

    def _save_history(self):
        with open(self.data_file, "w") as f:
            json.dump(self.history, f, indent=2)

    def check_rankings(
        self,
        keywords: list[str],
        target_domain: str,
        location: str = "India"
    ) -> pd.DataFrame:
        """
        Check where target_domain ranks for each keyword.
        """
        timestamp = datetime.now(timezone.utc).isoformat()
        records   = []

        for keyword in keywords:
            print(f"Checking: '{keyword}'")
            result  = serpapi_search(keyword, self.api_key, location)
            ranking = None

            for r in result["organic_results"]:
                if target_domain.lower() in (r.get("domain") or "").lower():
                    ranking = r["rank"]
                    break

            record = {
                "timestamp": timestamp,
                "keyword":   keyword,
                "domain":    target_domain,
                "rank":      ranking,          # None = not in top 10
                "in_top_10": ranking is not None,
            }
            records.append(record)
            self.history.append(record)
            time.sleep(random.uniform(1.5, 3.0))

        self._save_history()
        return pd.DataFrame(records)

    def get_trend_report(self) -> pd.DataFrame:
        """Show ranking changes over time for each keyword."""
        if not self.history:
            return pd.DataFrame()

        df = pd.DataFrame(self.history)
        df["timestamp"] = pd.to_datetime(df["timestamp"])
        df = df.sort_values("timestamp")

        # Pivot to show rank by date
        pivot = df.pivot_table(
            index="keyword",
            columns=df["timestamp"].dt.date,
            values="rank",
            aggfunc="first"
        )
        return pivot


# Usage
tracker = KeywordRankTracker(
    project_name="my_blog",
    api_key="YOUR_SERPAPI_KEY"
)

# Check weekly rankings
rankings = tracker.check_rankings(
    keywords=[
        "python web scraping tutorial",
        "python async scraping",
        "scrapy mongodb tutorial",
    ],
    target_domain="yourblog.com",
    location="India"
)

print("\nCurrent rankings:")
print(rankings[["keyword", "rank", "in_top_10"]].to_string(index=False))

# Show historical trend
trend = tracker.get_trend_report()
if not trend.empty:
    print("\nRanking trend:")
    print(trend)
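One weak spot in `check_rankings()` is the substring test on `displayed_link`: `"yourblog.com"` also matches `"notyourblog.com"`. A stricter check compares the hostname of the result URL instead. The helper below is a hypothetical sketch (it needs Python 3.9+ for `str.removeprefix`) that accepts the domain itself and any of its subdomains.

```python
from urllib.parse import urlparse


def host_matches(url: str, target_domain: str) -> bool:
    """True if url's host is target_domain or one of its subdomains."""
    # .hostname is already lowercased and excludes port and credentials
    host = (urlparse(url).hostname or "").lower()
    target = target_domain.lower().removeprefix("www.")
    return host == target or host.endswith("." + target)
```

Inside `check_rankings()`, the condition would become `host_matches(r.get("url") or "", target_domain)`, since `serpapi_search()` stores SerpApi's `link` field under `"url"`.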

Competitor SERP Analysis: Who's Outranking You and Why

def analyse_serp_competitors(
    keyword: str,
    your_domain: str,
    api_key: str
) -> dict:
    """
    Analyse who ranks in the top 10 for a keyword,
    what their titles/snippets look like, and where you stand.
    """
    result    = serpapi_search(keyword, api_key, num=10)
    organics  = result["organic_results"]
    your_rank = None

    competitor_analysis = []
    for r in organics:
        domain     = r.get("domain") or ""   # displayed_link can be missing
        is_you     = your_domain.lower() in domain.lower()
        if is_you:
            your_rank = r["rank"]

        # Title length analysis: Google truncates titles by pixel width,
        # which usually works out to roughly 55-60 characters
        title_len  = len(r.get("title") or "")
        # Snippet length analysis
        snippet_len = len(r.get("snippet") or "")

        competitor_analysis.append({
            "rank":         r["rank"],
            "domain":       domain,
            "title":        r.get("title"),
            "title_length": title_len,
            "snippet_length": snippet_len,
            "is_you":       is_you,
        })

    return {
        "keyword":    keyword,
        "your_rank":  your_rank or "Not in top 10",
        "gap_to_top": (your_rank - 1) if your_rank else None,
        "competitors": competitor_analysis,
        "featured_snippet_exists": result["featured_snippet"] is not None,
        "paa_count":  len(result["people_also_ask"]),
        "paa_questions": [q["question"] for q in result["people_also_ask"]],
    }


# Run competitor analysis
analysis = analyse_serp_competitors(
    keyword="python web scraping tutorial 2026",
    your_domain="yourblog.com",
    api_key="YOUR_SERPAPI_KEY"
)

print(f"\nKeyword: {analysis['keyword']}")
print(f"Your rank: {analysis['your_rank']}")
print(f"Featured snippet: {'YES' if analysis['featured_snippet_exists'] else 'No'}")
print(f"\nTop 5 competitors:")
for c in analysis["competitors"][:5]:
    marker = " ← YOU" if c["is_you"] else ""
    print(f"  #{c['rank']} {c['domain']}{marker} — title: {c['title_length']} chars")

print(f"\nPeople Also Ask ({analysis['paa_count']} questions):")
for q in analysis["paa_questions"]:
    print(f"  • {q}")
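Running `analyse_serp_competitors()` across a whole keyword list shows which sites outrank you most often. The aggregation below is a hypothetical sketch that counts each competing site once per keyword. It assumes the `competitors` structure returned above, and it assumes SerpApi's `displayed_link` may carry a breadcrumb after the site (for example `"https://example.com › docs"`), so it keeps only the first token.

```python
from collections import Counter


def _site(displayed: str) -> str:
    """Keep the first token of a displayed link, e.g. 'https://a.com › docs'."""
    # Assumption: the site comes first and any breadcrumb follows a space
    parts = (displayed or "").split()
    return parts[0] if parts else ""


def top_competitor_domains(analyses: list[dict], n: int = 5) -> list[tuple[str, int]]:
    """Count how many keyword SERPs each competing site appears in."""
    counts: Counter = Counter()
    for analysis in analyses:
        # A set, so a site holding several positions counts once per keyword
        sites = {
            _site(c["domain"])
            for c in analysis["competitors"]
            if not c["is_you"] and c["domain"]
        }
        counts.update(sites)
    return counts.most_common(n)
```

Usage would look like `top_competitor_domains([analyse_serp_competitors(kw, "yourblog.com", "YOUR_SERPAPI_KEY") for kw in keywords])`, where `keywords` is your own list.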

Scaling Up: Async SERP Scraping

For bulk keyword research (hundreds of queries), async execution cuts runtime roughly in proportion to the concurrency limit, so about 5x with five requests in flight:

import asyncio
import random
from urllib.parse import urlencode

import pandas as pd
from curl_cffi.requests import AsyncSession

# Reuses HEADERS and parse_serp() from Approach 1


async def async_google_search(
    session: AsyncSession,
    semaphore: asyncio.Semaphore,
    query: str,
    delay_range: tuple = (3, 7)
) -> tuple[str, list[dict]]:
    """Async version of the Google scraper."""
    async with semaphore:
        # Random pause inside the semaphore spaces out requests per slot
        await asyncio.sleep(random.uniform(*delay_range))

        params = {"q": query, "num": 10, "hl": "en", "gl": "in", "pws": "0"}
        url    = f"https://www.google.com/search?{urlencode(params)}"

        try:
            r = await session.get(
                url,
                headers=HEADERS,
                impersonate="chrome120",
                timeout=15,
            )
            results = parse_serp(r.text) if r.status_code == 200 else []
            return query, results
        except Exception as e:
            print(f"Failed '{query}': {e}")
            return query, []


async def bulk_serp_scrape(keywords: list[str], concurrency: int = 5) -> pd.DataFrame:
    """Scrape many keywords concurrently, at most `concurrency` at a time."""
    # Create the semaphore inside the running event loop;
    # 5 concurrent requests is a conservative cap for Google
    semaphore   = asyncio.Semaphore(concurrency)
    all_records = []

    async with AsyncSession(impersonate="chrome120") as session:
        tasks   = [async_google_search(session, semaphore, kw) for kw in keywords]
        results = await asyncio.gather(*tasks)

    for keyword, serp_results in results:
        for r in serp_results:
            r["keyword"] = keyword
            all_records.append(r)

    return pd.DataFrame(all_records)


# With 3-7 s delays and 5 slots, 100 keywords take roughly 2 minutes,
# versus roughly 12-15 minutes run one at a time
keywords = [f"python {topic}" for topic in [
    "scraping tutorial", "async tutorial", "data pipeline",
    "mongodb tutorial", "playwright guide", "scrapy example"
]]

df = asyncio.run(bulk_serp_scrape(keywords))
df.to_csv("bulk_serp.csv", index=False)
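The speed-up comes entirely from the semaphore: with N keywords, a limit of k and an average per-request time of d (delay plus network), total runtime is roughly N × d / k instead of N × d. The self-contained sketch below swaps the network call for a hypothetical `fake_fetch()` so you can confirm offline that `asyncio.gather()` returns results in input order and that no more than `limit` requests are ever in flight.

```python
import asyncio
import random


async def fake_fetch(query: str, state: dict, semaphore: asyncio.Semaphore) -> str:
    """Stand-in for a network call that records how many run at once."""
    async with semaphore:
        state["in_flight"] += 1
        state["peak"] = max(state["peak"], state["in_flight"])
        await asyncio.sleep(random.uniform(0.01, 0.03))  # simulated latency
        state["in_flight"] -= 1
        return query


async def run_demo(queries: list[str], limit: int) -> tuple[list[str], int]:
    """Run all queries through one semaphore; return results and peak concurrency."""
    semaphore = asyncio.Semaphore(limit)
    state = {"in_flight": 0, "peak": 0}
    results = await asyncio.gather(*(fake_fetch(q, state, semaphore) for q in queries))
    return list(results), state["peak"]


results, peak = asyncio.run(run_demo([f"kw{i}" for i in range(20)], limit=5))
print(len(results), peak)
```

Raising `limit` shortens the run proportionally, but every extra slot is another simultaneous request from the same IP, which is exactly what Google's behavioural checks watch for.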

Choosing Your Approach: Decision Guide

  • Learning / personal project: DIY with curl_cffi (Approach 1)

  • Under 100 queries/day, official use: Google Custom Search API (Approach 2)

  • Production SEO tool, any volume: SerpApi or DataForSEO (Approach 3)

  • Bulk research, 1,000+ keywords: async DIY with residential proxies

  • International SERP analysis: SerpApi (supports location targeting)

  • SERP feature monitoring (PAA, featured snippets): SerpApi (returns structured features)


Common Errors and Fixes

Getting a CAPTCHA page immediately: your IP is on a blocklist. Switch to a residential proxy or use SerpApi.

HTML parses but no results are found: Google has changed its CSS class names, which happens frequently. Open DevTools → Inspect, find the new class for the .g result containers, and update parse_serp().

429 Too Many Requests: you're hitting Google's rate limit. Increase your delay to 6–10 seconds between queries, or switch to a proxy pool with IP rotation.
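Rather than a single fixed delay, a common pattern for 429 responses is exponential backoff with jitter: wait longer after each consecutive 429 and give up after a few attempts. The function below is a hypothetical sketch; `fetch` is any zero-argument callable returning an object with a `status_code` attribute, and the 10-second base delay and retry count are assumptions chosen to roughly match the 6–10 second guidance above.

```python
import random
import time
from typing import Any, Callable


def fetch_with_backoff(
    fetch: Callable[[], Any],
    max_retries: int = 4,
    base_delay: float = 10.0,
    max_delay: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fetch(), retrying with exponential backoff while it returns 429."""
    response = fetch()
    for attempt in range(max_retries):
        if response.status_code != 429:
            break
        # Ceiling doubles each retry (10 s, 20 s, 40 s, ...) up to max_delay;
        # waiting a random amount between half the ceiling and the ceiling
        # avoids retrying on a perfectly regular rhythm
        ceiling = min(max_delay, base_delay * 2 ** attempt)
        sleep(random.uniform(ceiling / 2, ceiling))
        response = fetch()
    return response
```

With the Approach 1 setup, a call might look like `fetch_with_backoff(lambda: cffi_requests.get(url, headers=HEADERS, impersonate="chrome120", timeout=15))`. The injectable `sleep` argument exists so the retry logic can be tested without real waiting.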

Results look different from your browser: you're likely getting a different locale. Add &hl=en&gl=us to your query params and set Accept-Language: en-US,en;q=0.9 in headers.


FAQ

Q: Is scraping Google legal? Technically it violates Google's Terms of Service. However, scraping publicly available search results for personal research purposes has not typically resulted in legal action against individuals. Commercial use of scraped Google data is a different matter — use SerpApi or DataForSEO for that.

Q: How many queries per day can I make before getting blocked? Google publishes no limit, so treat these as rough estimates: with rotating residential proxies and 4–9 second delays, around 200–500 queries/day per IP; without proxies, often only 20–50 before hitting a CAPTCHA.

Q: What's the best free alternative to SerpApi? The Google Custom Search JSON API gives 100 free queries/day officially. For DIY, use the curl_cffi approach above.

Q: Can I track rankings for Google India specifically? Yes — add &gl=in&hl=en to your query params, or set location: "India" in SerpApi params.
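The `gl` (country) and `hl` (interface language) parameters are what pin a query to a market. The helper below builds one search URL per market; `serp_url()` and the `MARKETS` table are hypothetical, and `urlencode()` takes care of encoding spaces and symbols such as `+`.

```python
from urllib.parse import urlencode

# Hypothetical market table: name -> (gl country code, hl interface language)
MARKETS = {
    "india": ("in", "en"),
    "us": ("us", "en"),
    "germany": ("de", "de"),
}


def serp_url(query: str, country: str, language: str) -> str:
    """Build a google.com search URL pinned to one country and language."""
    params = {"q": query, "hl": language, "gl": country, "pws": "0"}
    return f"https://www.google.com/search?{urlencode(params)}"


for market, (gl, hl) in MARKETS.items():
    print(market, serp_url("python web scraping", gl, hl))
```

For India, `serp_url("python web scraping", "in", "en")` produces `https://www.google.com/search?q=python+web+scraping&hl=en&gl=in&pws=0`.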


Summary

Approach         Cost          Volume                  Reliability       Best For
curl_cffi DIY    Free          Low-Medium              Medium            Learning, personal use
Google CSE API   Free/paid     Low                     High (official)   Authorised access
SerpApi          $50+/month    High (plan quota)       Very High         Production tools
DataForSEO       Pay-per-use   High (pay per request)  Very High         Enterprise SEO
