Which topics does this article cover?

It highlights python financial data scraping stocks crypto 2026, scrape yahoo finance python 2026, yfinance alternative python, python stock market data scraping, alpha vantage python tutorial.

Scraping Financial Data with Python: Stocks, Crypto & Market Intelligence (2026)

Q: What is "Scraping Financial Data with Python: Stocks, Crypto & Market Intelligence (2026)" about?

Financial data is the foundation of trading systems, investment research, and market intelligence platforms. Learn how to collect stock, crypto, news, earnings, and SEC filing data using Python, free APIs, custom scrapers, and production-ready storage pipelines.

Why Financial Data Is the Most Valuable Data You Can Scrape

Money moves on information. Stock prices react to earnings reports in milliseconds. Crypto markets swing violently on a single tweet. Interest rate decisions cascade through every asset class within seconds of announcement.

The developers and analysts who can collect, clean, and process financial data faster than others have a structural edge — whether they're building algorithmic trading strategies, financial dashboards, portfolio trackers, market research tools, or AI-powered investment assistants.

In 2026, the Python financial data ecosystem is rich but fragmented. The landscape looks like this:

yfinance remains the most popular free option, but relying on unofficial web scraping means it's prone to rate limits, blocked IPs, and broken requests — it broke significantly after Yahoo Finance's February 2025 redesign
Official financial APIs (Alpha Vantage, Polygon.io, Finnhub) offer reliability and proper data guarantees but cost money beyond their free tiers
Custom scrapers targeting financial news sites, SEC filings, and earnings call transcripts fill gaps that no API covers

This guide gives you the complete playbook: free tools for prototyping, production-grade APIs for serious work, and custom scrapers for the data nobody else is packaging.

Part 1: yfinance — Still Useful, But Know Its Limits

yfinance covers prices, financials, analyst estimates, options chains, and institutional holdings with no signup and no API key. It's free, with a comprehensive dataset going back 20+ years and an active community with 10,000+ GitHub stars.

The tradeoff is fragility — it's web scraping, not a true API. Yahoo can change endpoints without notice, and there is no SLA, support, or data accuracy guarantee.

Use yfinance for: prototyping, personal projects, learning, backtesting on historical data. Avoid it for: production trading systems, real-time alerts, mission-critical analysis.

pip install yfinance pandas matplotlib

Stock price history

import yfinance as yf
import pandas as pd
import matplotlib.pyplot as plt

def get_stock_history(
    ticker: str,
    period: str = "1y",     # Valid: 1d, 5d, 1mo, 3mo, 6mo, 1y, 2y, 5y, 10y, ytd, max
    interval: str = "1d"    # Valid: 1m, 2m, 5m, 15m, 30m, 60m, 90m, 1h, 1d, 5d, 1wk, 1mo
) -> pd.DataFrame:
    """
    Download historical OHLCV data for a stock ticker.

    Returns a DataFrame with columns:
    Open, High, Low, Close, Adj Close, Volume
    """
    stock = yf.Ticker(ticker)
    df    = stock.history(period=period, interval=interval)

    if df.empty:
        raise ValueError(f"No data returned for {ticker} — ticker may be invalid")

    # Clean up the index
    df.index = pd.to_datetime(df.index).tz_localize(None)
    df.index.name = "date"

    # Calculate daily return
    df["daily_return_pct"] = df["Close"].pct_change() * 100
    df["ticker"] = ticker

    return df

# Example: download 1 year of data for multiple tickers
tickers = ["AAPL", "MSFT", "GOOGL", "NVDA", "RELIANCE.NS"]  # .NS for NSE India

dfs = {}
for ticker in tickers:
    try:
        dfs[ticker] = get_stock_history(ticker, period="1y")
        print(f"  {ticker}: {len(dfs[ticker])} trading days")
    except Exception as e:
        print(f"  {ticker}: Error — {e}")

# Combine all into one DataFrame
combined = pd.concat(dfs.values())
combined.to_csv("stock_history.csv")

Fundamental data: financials, balance sheet, dividends

def get_fundamentals(ticker: str) -> dict:
    """
    Collect comprehensive fundamental data for a stock.
    Returns income statement, balance sheet, cash flow, and key metrics.
    """
    stock = yf.Ticker(ticker)
    info  = stock.info

    return {
        "overview": {
            "name":              info.get("longName"),
            "sector":            info.get("sector"),
            "industry":          info.get("industry"),
            "country":           info.get("country"),
            "employees":         info.get("fullTimeEmployees"),
            "website":           info.get("website"),
            "description":       (info.get("longBusinessSummary") or "")[:300],
        },
        "valuation": {
            "market_cap":        info.get("marketCap"),
            "pe_ratio":          info.get("trailingPE"),
            "forward_pe":        info.get("forwardPE"),
            "peg_ratio":         info.get("pegRatio"),
            "price_to_book":     info.get("priceToBook"),
            "ev_to_ebitda":      info.get("enterpriseToEbitda"),
        },
        "profitability": {
            "revenue_ttm":       info.get("totalRevenue"),
            "gross_margin":      info.get("grossMargins"),
            "operating_margin":  info.get("operatingMargins"),
            "net_margin":        info.get("profitMargins"),
            "roe":               info.get("returnOnEquity"),
            "roa":               info.get("returnOnAssets"),
        },
        "dividends": {
            "dividend_yield":    info.get("dividendYield"),
            "dividend_rate":     info.get("dividendRate"),
            "payout_ratio":      info.get("payoutRatio"),
            "ex_dividend_date":  info.get("exDividendDate"),
        },
        "analyst": {
            "recommendation":    info.get("recommendationKey"),
            "mean_target":       info.get("targetMeanPrice"),
            "high_target":       info.get("targetHighPrice"),
            "low_target":        info.get("targetLowPrice"),
            "num_analysts":      info.get("numberOfAnalystOpinions"),
        },
        "earnings": {
            "eps_trailing":      info.get("trailingEps"),
            "eps_forward":       info.get("forwardEps"),
            "earnings_growth":   info.get("earningsGrowth"),
            "revenue_growth":    info.get("revenueGrowth"),
        }
    }

# Analyse a stock
data = get_fundamentals("AAPL")
print(f"\n── {data['overview']['name']} ──")
print(f"Sector:       {data['overview']['sector']}")
print(f"Market Cap:   ${data['valuation']['market_cap']:,.0f}")
print(f"P/E Ratio:    {data['valuation']['pe_ratio']}")
print(f"Net Margin:   {data['profitability']['net_margin']:.1%}")
print(f"Analyst:      {data['analyst']['recommendation'].upper()}")
print(f"Target Price: ${data['analyst']['mean_target']}")

Options chain data

def get_options_chain(ticker: str, expiry: str = None) -> dict:
    """
    Fetch options chain (calls and puts) for a stock.

    Args:
        ticker: Stock ticker symbol
        expiry: Expiration date string "YYYY-MM-DD". If None, uses nearest expiry.
    """
    stock = yf.Ticker(ticker)
    expiries = stock.options

    if not expiries:
        return {"error": f"No options data for {ticker}"}

    target_expiry = expiry if expiry in expiries else expiries[0]
    print(f"Using expiry: {target_expiry}")

    chain = stock.option_chain(target_expiry)

    return {
        "expiry":      target_expiry,
        "calls":       chain.calls,
        "puts":        chain.puts,
        "all_expiries": list(expiries),
    }

options = get_options_chain("AAPL")
calls_df = options["calls"]
# Find near-the-money calls
current_price = yf.Ticker("AAPL").info.get("regularMarketPrice", 0)
near_money = calls_df[
    (calls_df["strike"] >= current_price * 0.95) &
    (calls_df["strike"] <= current_price * 1.05)
]
print(near_money[["strike", "lastPrice", "bid", "ask", "impliedVolatility", "openInterest"]])

Part 2: Alpha Vantage — Free API with Real Guarantees

Alpha Vantage is a popular free and paid API that provides stock, forex, and cryptocurrency data. Unlike yfinance, it's a proper API with official endpoints, a free tier of 25 requests/day (500/day with a free key), and no risk of breaking when a website redesigns.

pip install requests pandas
# Get a free API key at: alphavantage.co/support/#api-key

import httpx
import pandas as pd
import asyncio
from datetime import datetime

ALPHA_VANTAGE_KEY = "YOUR_FREE_API_KEY"
BASE_URL = "https://www.alphavantage.co/query"

async def av_request(params: dict) -> dict:
    """Make a request to Alpha Vantage API."""
    params["apikey"] = ALPHA_VANTAGE_KEY
    async with httpx.AsyncClient() as client:
        r = await client.get(BASE_URL, params=params, timeout=20)
        r.raise_for_status()
        data = r.json()

    # Check for API error messages
    if "Note" in data:
        raise Exception(f"Rate limit hit: {data['Note']}")
    if "Error Message" in data:
        raise Exception(f"API error: {data['Error Message']}")

    return data


async def get_daily_prices(ticker: str, outputsize: str = "compact") -> pd.DataFrame:
    """
    Get daily OHLCV data.
    outputsize: "compact" = last 100 days, "full" = 20+ years
    """
    data = await av_request({
        "function":   "TIME_SERIES_DAILY",
        "symbol":     ticker,
        "outputsize": outputsize,
    })

    ts = data.get("Time Series (Daily)", {})
    rows = []
    for date_str, values in ts.items():
        rows.append({
            "date":   date_str,
            "open":   float(values["1. open"]),
            "high":   float(values["2. high"]),
            "low":    float(values["3. low"]),
            "close":  float(values["4. close"]),
            "volume": int(values["5. volume"]),
        })

    df = pd.DataFrame(rows)
    df["date"] = pd.to_datetime(df["date"])
    df = df.sort_values("date").reset_index(drop=True)
    df["ticker"] = ticker
    return df


async def get_company_overview(ticker: str) -> dict:
    """Get fundamental company data from Alpha Vantage."""
    data = await av_request({"function": "OVERVIEW", "symbol": ticker})
    return {
        "name":            data.get("Name"),
        "description":     (data.get("Description") or "")[:300],
        "sector":          data.get("Sector"),
        "industry":        data.get("Industry"),
        "market_cap":      data.get("MarketCapitalization"),
        "pe_ratio":        data.get("PERatio"),
        "eps":             data.get("EPS"),
        "dividend_yield":  data.get("DividendYield"),
        "52_week_high":    data.get("52WeekHigh"),
        "52_week_low":     data.get("52WeekLow"),
        "analyst_target":  data.get("AnalystTargetPrice"),
    }


async def get_forex_rate(from_currency: str, to_currency: str) -> dict:
    """Get real-time forex exchange rate."""
    data = await av_request({
        "function":      "CURRENCY_EXCHANGE_RATE",
        "from_currency": from_currency,
        "to_currency":   to_currency,
    })
    rate_data = data.get("Realtime Currency Exchange Rate", {})
    return {
        "from":       rate_data.get("1. From_Currency Code"),
        "to":         rate_data.get("3. To_Currency Code"),
        "rate":       float(rate_data.get("5. Exchange Rate", 0)),
        "last_updated": rate_data.get("6. Last Refreshed"),
    }


async def get_crypto_price(symbol: str, market: str = "USD") -> pd.DataFrame:
    """Get daily crypto price data."""
    data = await av_request({
        "function": "DIGITAL_CURRENCY_DAILY",
        "symbol":   symbol,
        "market":   market,
    })

    ts = data.get("Time Series (Digital Currency Daily)", {})
    rows = []
    for date_str, values in list(ts.items())[:30]:  # Last 30 days
        rows.append({
            "date":   date_str,
            "open":   float(values.get(f"1a. open ({market})", 0)),
            "high":   float(values.get(f"2a. high ({market})", 0)),
            "low":    float(values.get(f"3a. low ({market})", 0)),
            "close":  float(values.get(f"4a. close ({market})", 0)),
            "volume": float(values.get("5. volume", 0)),
        })

    df = pd.DataFrame(rows)
    df["date"] = pd.to_datetime(df["date"])
    return df.sort_values("date").reset_index(drop=True)


# Run examples
async def main():
    print("── Stock prices (AAPL, last 100 days) ──")
    df = await get_daily_prices("AAPL")
    print(df.tail(5))

    print("\n── Company overview ──")
    overview = await get_company_overview("AAPL")
    for k, v in overview.items():
        print(f"  {k}: {v}")

    print("\n── USD/INR forex rate ──")
    rate = await get_forex_rate("USD", "INR")
    print(f"  1 {rate['from']} = {rate['rate']:.2f} {rate['to']}")
    print(f"  Last updated: {rate['last_updated']}")

    print("\n── Bitcoin (last 30 days) ──")
    btc = await get_crypto_price("BTC", "USD")
    print(btc.tail(5))

asyncio.run(main())

Part 3: Scraping Financial News and Earnings Calendars

No API packages the real-time financial news that moves markets. Custom scrapers fill that gap.

Reuters / Financial news scraper

import httpx
import asyncio
from bs4 import BeautifulSoup
from curl_cffi.requests import AsyncSession
import pandas as pd
from datetime import datetime, timezone

async def scrape_financial_news(
    query: str,
    sources: list[str] = None,
    max_articles: int = 20
) -> pd.DataFrame:
    """
    Scrape financial news headlines from multiple sources.
    """
    if sources is None:
        sources = [
            f"https://finance.yahoo.com/search?q={query.replace(' ', '+')}",
            f"https://finviz.com/search.ashx?p={query.replace(' ', '+')}",
        ]

    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120",
        "Accept-Language": "en-US,en;q=0.9",
    }

    articles = []

    async with AsyncSession(impersonate="chrome120") as session:
        for url in sources:
            try:
                r = await session.get(url, headers=headers, timeout=15)
                soup = BeautifulSoup(r.text, "lxml")

                # Generic news article extraction — adapt selectors per site
                # Yahoo Finance news items
                for item in soup.select("li.js-stream-content, div[data-test='LATEST_NEWS']"):
                    title_el = item.select_one("h3, a[data-ylk]")
                    link_el  = item.select_one("a[href]")
                    time_el  = item.select_one("time, span[data-reactid]")

                    if title_el:
                        articles.append({
                            "title":      title_el.get_text(strip=True),
                            "url":        link_el.get("href", "") if link_el else "",
                            "source":     url,
                            "scraped_at": datetime.now(timezone.utc).isoformat(),
                        })

                await asyncio.sleep(1.5)

            except Exception as e:
                print(f"  Error on {url}: {e}")

    df = pd.DataFrame(articles[:max_articles]).drop_duplicates(subset=["title"])
    return df


# Scrape news for a ticker
news_df = asyncio.run(scrape_financial_news("NVIDIA earnings 2026"))
print(news_df[["title", "scraped_at"]].head(10))
news_df.to_csv("financial_news.csv", index=False)

Scraping the Earnings Calendar

Earnings reports are the most market-moving events in equities. Knowing what's coming lets you position before the move.

async def get_earnings_calendar(week_offset: int = 0) -> pd.DataFrame:
    """
    Scrape the upcoming earnings calendar from Earnings Whispers.
    Returns companies reporting earnings in the target week.
    """
    # Earnings Whispers has a well-structured earnings calendar
    url = "https://www.earningswhispers.com/stocks"

    async with AsyncSession(impersonate="chrome120") as session:
        r = await session.get(
            url,
            headers={"User-Agent": "Mozilla/5.0 Chrome/120", "Accept-Language": "en-US"},
            timeout=20
        )

    soup = BeautifulSoup(r.text, "lxml")
    earnings = []

    # Each stock in the earnings calendar
    for item in soup.select(".earnings-estimates"):
        ticker_el = item.select_one(".ticker")
        name_el   = item.select_one(".company")
        date_el   = item.select_one(".date")
        time_el   = item.select_one(".time")   # BMO = Before Market Open, AMC = After
        eps_el    = item.select_one(".estimate")

        if ticker_el:
            earnings.append({
                "ticker":      ticker_el.get_text(strip=True),
                "company":     name_el.get_text(strip=True) if name_el else None,
                "report_date": date_el.get_text(strip=True) if date_el else None,
                "report_time": time_el.get_text(strip=True) if time_el else None,
                "eps_estimate": eps_el.get_text(strip=True) if eps_el else None,
            })

    return pd.DataFrame(earnings)


calendar = asyncio.run(get_earnings_calendar())
print(f"Found {len(calendar)} earnings reports this week")
print(calendar.head(10).to_string(index=False))

Part 4: Crypto Data — Free, Real-Time, No API Key

CoinGecko offers a generous free API that covers 10,000+ cryptocurrencies with real-time prices, market caps, historical data, and DeFi protocol data:

import httpx
import asyncio
import pandas as pd

COINGECKO_BASE = "https://api.coingecko.com/api/v3"

async def get_crypto_prices(
    coin_ids: list[str],
    vs_currency: str = "usd"
) -> pd.DataFrame:
    """
    Get current prices for multiple cryptocurrencies.
    No API key required on free tier (30 req/min).

    Common coin IDs: bitcoin, ethereum, solana, cardano, ripple
    Find IDs at: coingecko.com/en/all-cryptocurrencies
    """
    ids_str = ",".join(coin_ids)

    async with httpx.AsyncClient() as client:
        r = await client.get(
            f"{COINGECKO_BASE}/simple/price",
            params={
                "ids":                    ids_str,
                "vs_currencies":          vs_currency,
                "include_24hr_change":    "true",
                "include_24hr_vol":       "true",
                "include_market_cap":     "true",
                "include_last_updated_at":"true",
            },
            timeout=15
        )
        r.raise_for_status()
        data = r.json()

    rows = []
    for coin_id, values in data.items():
        rows.append({
            "coin_id":         coin_id,
            "price_usd":       values.get(vs_currency),
            "market_cap":      values.get(f"{vs_currency}_market_cap"),
            "volume_24h":      values.get(f"{vs_currency}_24h_vol"),
            "change_24h_pct":  values.get(f"{vs_currency}_24h_change"),
            "last_updated":    pd.to_datetime(
                                   values.get("last_updated_at"), unit="s"
                               ).isoformat() if values.get("last_updated_at") else None,
        })

    return pd.DataFrame(rows).sort_values("market_cap", ascending=False)


async def get_crypto_history(
    coin_id: str,
    days: int = 30,
    vs_currency: str = "usd"
) -> pd.DataFrame:
    """
    Get historical OHLCV data for a cryptocurrency.
    Free up to 365 days of daily candles.
    """
    async with httpx.AsyncClient() as client:
        r = await client.get(
            f"{COINGECKO_BASE}/coins/{coin_id}/ohlc",
            params={"vs_currency": vs_currency, "days": days},
            timeout=15
        )
        r.raise_for_status()
        data = r.json()   # [[timestamp_ms, open, high, low, close], ...]

    df = pd.DataFrame(data, columns=["timestamp_ms", "open", "high", "low", "close"])
    df["date"] = pd.to_datetime(df["timestamp_ms"], unit="ms")
    df["coin"] = coin_id
    return df[["date", "coin", "open", "high", "low", "close"]]


async def get_defi_protocols() -> pd.DataFrame:
    """Get top DeFi protocols by Total Value Locked (TVL)."""
    async with httpx.AsyncClient() as client:
        r = await client.get(
            f"{COINGECKO_BASE}/defi",
            timeout=15
        )
        data = r.json()

    return pd.DataFrame([{
        "name":              data.get("defi_market_cap"),
        "defi_market_cap":   data.get("defi_market_cap"),
        "eth_market_cap":    data.get("eth_market_cap"),
        "defi_dominance":    data.get("defi_dominance"),
        "top_coin":          data.get("top_coin_name"),
        "top_coin_dominance":data.get("top_coin_defi_dominance"),
    }])


async def crypto_dashboard():
    """Build a live crypto dashboard."""
    print("── Fetching crypto prices ──")
    prices_df = await get_crypto_prices(
        ["bitcoin", "ethereum", "solana", "cardano", "chainlink", "polkadot"]
    )
    print(prices_df[["coin_id", "price_usd", "change_24h_pct", "market_cap"]].to_string(index=False))

    print("\n── Bitcoin 30-day history ──")
    btc_history = await get_crypto_history("bitcoin", days=30)
    print(btc_history.tail(5))

    btc_high  = btc_history["high"].max()
    btc_low   = btc_history["low"].min()
    print(f"\n30-day range: ${btc_low:,.2f} – ${btc_high:,.2f}")

asyncio.run(crypto_dashboard())

Part 5: SEC EDGAR — Scraping Regulatory Filings

The U.S. SEC's EDGAR database is the most legally unambiguous financial data source — it's explicitly public, free, and comprehensive. Every public company's 10-K (annual), 10-Q (quarterly), 8-K (events), and proxy filings live here.

import httpx
import asyncio
import json
import pandas as pd

EDGAR_BASE = "https://data.sec.gov"
EDGAR_API  = "https://efts.sec.gov/LATEST/search-index"

# SEC requires a descriptive User-Agent with contact info
EDGAR_HEADERS = {
    "User-Agent": "FinancialResearchBot/1.0 (contact@yourdomain.com)",
    "Accept-Encoding": "gzip, deflate",
}

async def get_company_filings(ticker: str) -> dict:
    """
    Get all SEC filings for a company by ticker.
    Uses SEC EDGAR's official JSON API.
    """
    # Step 1: Look up CIK number from ticker
    async with httpx.AsyncClient(headers=EDGAR_HEADERS) as client:
        r = await client.get(
            "https://www.sec.gov/files/company_tickers.json",
            timeout=15
        )
        tickers_data = r.json()

    # Find CIK for this ticker
    cik = None
    for entry in tickers_data.values():
        if entry.get("ticker", "").upper() == ticker.upper():
            cik = str(entry["cik_str"]).zfill(10)
            company_name = entry.get("title")
            break

    if not cik:
        raise ValueError(f"Ticker {ticker} not found in EDGAR")

    print(f"Found {ticker}: {company_name} (CIK: {cik})")

    # Step 2: Get all filings for this CIK
    async with httpx.AsyncClient(headers=EDGAR_HEADERS) as client:
        r = await client.get(
            f"{EDGAR_BASE}/submissions/CIK{cik}.json",
            timeout=15
        )
        data = r.json()

    recent = data.get("filings", {}).get("recent", {})
    filings_df = pd.DataFrame({
        "form":          recent.get("form", []),
        "filed_at":      recent.get("filingDate", []),
        "report_date":   recent.get("reportDate", []),
        "accession_num": recent.get("accessionNumber", []),
        "description":   recent.get("primaryDocument", []),
    })

    return {
        "cik":          cik,
        "company":      company_name,
        "all_filings":  filings_df,
        "10k_filings":  filings_df[filings_df["form"] == "10-K"],
        "10q_filings":  filings_df[filings_df["form"] == "10-Q"],
        "8k_filings":   filings_df[filings_df["form"] == "8-K"],
    }


async def download_filing_text(cik: str, accession_num: str) -> str:
    """Download the full text of a specific SEC filing."""
    # Reformat accession number for URL: remove dashes
    acc_clean = accession_num.replace("-", "")
    cik_clean = cik.lstrip("0")

    index_url = (
        f"https://www.sec.gov/Archives/edgar/data/"
        f"{cik_clean}/{acc_clean}/{accession_num}-index.htm"
    )

    async with httpx.AsyncClient(headers=EDGAR_HEADERS) as client:
        r = await client.get(index_url, timeout=20)
        soup = BeautifulSoup(r.text, "lxml")

        # Find the primary document link
        doc_link = soup.select_one("table.tableFile2 a")
        if not doc_link:
            return ""

        doc_url = f"https://www.sec.gov{doc_link['href']}"
        doc_r   = await client.get(doc_url, timeout=30)
        return doc_r.text


async def main_edgar():
    filings = await get_company_filings("AAPL")

    print(f"\n── {filings['company']} SEC Filings ──")
    print(f"10-K (Annual): {len(filings['10k_filings'])} filings")
    print(f"10-Q (Quarterly): {len(filings['10q_filings'])} filings")
    print(f"8-K (Events): {len(filings['8k_filings'])} filings")

    print("\nMost recent 10-K filings:")
    print(filings["10k_filings"].head(3).to_string(index=False))

asyncio.run(main_edgar())

Part 6: Building a Portfolio Tracker

Combine everything into a live portfolio monitoring dashboard:

import asyncio
import pandas as pd
from datetime import datetime, timezone
import yfinance as yf

PORTFOLIO = {
    # ticker: (shares, avg_buy_price)
    "AAPL":        (10,  175.50),
    "MSFT":        (5,   380.00),
    "NVDA":        (3,   450.00),
    "RELIANCE.NS": (50,  2800.00),
}

async def get_current_prices(tickers: list[str]) -> dict:
    """Get current prices for all portfolio tickers."""
    prices = {}
    for ticker in tickers:
        try:
            stock = yf.Ticker(ticker)
            info  = stock.fast_info
            prices[ticker] = {
                "price":       info.last_price,
                "prev_close":  info.previous_close,
                "currency":    info.currency,
            }
        except Exception:
            prices[ticker] = {"price": None, "prev_close": None, "currency": "USD"}
    return prices


def build_portfolio_report(portfolio: dict, current_prices: dict) -> pd.DataFrame:
    """Generate a portfolio performance report."""
    rows = []
    for ticker, (shares, avg_price) in portfolio.items():
        price_data   = current_prices.get(ticker, {})
        current      = price_data.get("price") or 0
        prev_close   = price_data.get("prev_close") or 0
        currency     = price_data.get("currency", "USD")

        cost_basis   = shares * avg_price
        market_value = shares * current
        gain_loss    = market_value - cost_basis
        gain_pct     = (gain_loss / cost_basis * 100) if cost_basis else 0
        day_change   = ((current - prev_close) / prev_close * 100) if prev_close else 0

        rows.append({
            "ticker":       ticker,
            "shares":       shares,
            "avg_price":    avg_price,
            "current":      round(current, 2),
            "cost_basis":   round(cost_basis, 2),
            "market_value": round(market_value, 2),
            "gain_loss":    round(gain_loss, 2),
            "gain_pct":     round(gain_pct, 2),
            "day_change_%": round(day_change, 2),
            "currency":     currency,
        })

    df = pd.DataFrame(rows)

    total_cost   = df["cost_basis"].sum()
    total_value  = df["market_value"].sum()
    total_gain   = df["gain_loss"].sum()
    total_return = (total_gain / total_cost * 100) if total_cost else 0

    print(f"\n{'═'*60}")
    print(f"  PORTFOLIO REPORT — {datetime.now().strftime('%Y-%m-%d %H:%M')}")
    print(f"{'═'*60}")
    print(df.to_string(index=False))
    print(f"\n  Total invested:  ${total_cost:>10,.2f}")
    print(f"  Market value:    ${total_value:>10,.2f}")
    print(f"  Total P&L:       ${total_gain:>10,.2f} ({total_return:+.2f}%)")
    print(f"{'═'*60}")

    return df


async def run_portfolio():
    tickers       = list(PORTFOLIO.keys())
    current       = await get_current_prices(tickers)
    report        = build_portfolio_report(PORTFOLIO, current)
    report.to_csv("portfolio_report.csv", index=False)

asyncio.run(run_portfolio())

Part 7: Financial Data Pipeline with PostgreSQL

For production-grade financial data storage with proper indexing:

from sqlalchemy import create_engine, text
import pandas as pd

ENGINE = create_engine("postgresql://user:password@localhost:5432/findata")

def init_financial_schema():
    """Create tables optimised for time-series financial data."""
    with ENGINE.connect() as conn:
        conn.execute(text("""
            CREATE TABLE IF NOT EXISTS prices (
                id         BIGSERIAL PRIMARY KEY,
                ticker     TEXT NOT NULL,
                date       DATE NOT NULL,
                open       NUMERIC(12, 4),
                high       NUMERIC(12, 4),
                low        NUMERIC(12, 4),
                close      NUMERIC(12, 4),
                adj_close  NUMERIC(12, 4),
                volume     BIGINT,
                source     TEXT DEFAULT 'yfinance',
                UNIQUE (ticker, date)
            );

            CREATE INDEX IF NOT EXISTS idx_prices_ticker_date
                ON prices (ticker, date DESC);

            CREATE TABLE IF NOT EXISTS fundamentals (
                id           BIGSERIAL PRIMARY KEY,
                ticker       TEXT NOT NULL,
                scraped_at   TIMESTAMPTZ DEFAULT NOW(),
                market_cap   BIGINT,
                pe_ratio     NUMERIC(10, 2),
                eps          NUMERIC(10, 4),
                revenue      BIGINT,
                net_margin   NUMERIC(8, 4),
                data         JSONB
            );
        """))
        conn.commit()

def upsert_prices(df: pd.DataFrame):
    """Upsert price history — insert new, update existing."""
    df.to_sql("prices_staging", ENGINE, if_exists="replace", index=False)
    with ENGINE.connect() as conn:
        conn.execute(text("""
            INSERT INTO prices (ticker, date, open, high, low, close, volume)
            SELECT ticker, date, open, high, low, close, volume
            FROM prices_staging
            ON CONFLICT (ticker, date) DO UPDATE SET
                close  = EXCLUDED.close,
                volume = EXCLUDED.volume
        """))
        conn.commit()
    print(f"Upserted {len(df)} price records.")

Choosing Your Data Source

Source	Cost	Reliability	Best for
yfinance	Free	Medium (scraping-based)	Prototyping, personal projects
Alpha Vantage	Free / $50+/mo	High	Production stocks + forex + crypto
Polygon.io	Free / $29+/mo	Very High	Real-time, high-frequency data
Finnhub	Free / paid	High	News sentiment + fundamentals
CoinGecko	Free / paid	High	Crypto data at any scale
SEC EDGAR	Free (official)	Very High	Regulatory filings (US stocks)
Custom scraper	Free (infra cost)	Low-Medium	Earnings calendars, news, niche data

FAQ

Q: Is yfinance reliable enough for backtesting trading strategies? For historical data exploration, yes. For production trading decisions, no — the scraping-based approach means occasional data gaps, split-adjustment errors, and outages. Use Polygon.io or a proper data vendor for anything where data quality directly affects money.

Q: What's the best free crypto data source in 2026? CoinGecko's free API covers 10,000+ coins with current prices, market caps, historical OHLCV, and DeFi data at 30 requests/minute. No API key needed for basic usage.

Q: Can I use SEC EDGAR data commercially? Yes — SEC EDGAR data is explicitly in the public domain and freely available for commercial use. The only requirement is identifying your User-Agent with contact information in the request header.

Q: How do I get Indian stock market data (NSE/BSE)? Append .NS for NSE or .BO for BSE to ticker symbols in yfinance (e.g. RELIANCE.NS). For a proper Indian market API, NSE India has an unofficial endpoint, and vendors like Zerodha's Kite Connect offer official API access.

Summary

Task	Tool	Notes
Quick stock history	`yfinance`	Free, but can break
Production stock data	Alpha Vantage / Polygon.io	Reliable, free tiers available
Crypto prices	CoinGecko API	Free, 10k+ coins
Financial news	Custom scraper	No API covers live news fully
Earnings calendar	Custom scraper	Earnings Whispers, Zacks
SEC filings	EDGAR JSON API	Official, free, legally clear
Portfolio tracking	`yfinance` + pandas	Combine with any price source

Why Financial Data Is the Most Valuable Data You Can Scrape

In 2026, the Python financial data ecosystem is rich but fragmented. The landscape looks like this:

yfinance remains the most popular free option, but relying on unofficial web scraping means it's prone to rate limits, blocked IPs, and broken requests — it broke significantly after Yahoo Finance's February 2025 redesign
Official financial APIs (Alpha Vantage, Polygon.io, Finnhub) offer reliability and proper data guarantees but cost money beyond their free tiers
Custom scrapers targeting financial news sites, SEC filings, and earnings call transcripts fill gaps that no API covers

This guide gives you the complete playbook: free tools for prototyping, production-grade APIs for serious work, and custom scrapers for the data nobody else is packaging.

Part 1: yfinance — Still Useful, But Know Its Limits

The tradeoff is fragility — it's web scraping, not a true API. Yahoo can change endpoints without notice, and there is no SLA, support, or data accuracy guarantee.

Use yfinance for: prototyping, personal projects, learning, backtesting on historical data. Avoid it for: production trading systems, real-time alerts, mission-critical analysis.

pip install yfinance pandas matplotlib

Stock price history

import yfinance as yf
import pandas as pd
import matplotlib.pyplot as plt

def get_stock_history(
    ticker: str,
    period: str = "1y",     # Valid: 1d, 5d, 1mo, 3mo, 6mo, 1y, 2y, 5y, 10y, ytd, max
    interval: str = "1d"    # Valid: 1m, 2m, 5m, 15m, 30m, 60m, 90m, 1h, 1d, 5d, 1wk, 1mo
) -> pd.DataFrame:
    """
    Download historical OHLCV data for a stock ticker.

    Returns a DataFrame with columns:
    Open, High, Low, Close, Adj Close, Volume
    """
    stock = yf.Ticker(ticker)
    df    = stock.history(period=period, interval=interval)

    if df.empty:
        raise ValueError(f"No data returned for {ticker} — ticker may be invalid")

    # Clean up the index
    df.index = pd.to_datetime(df.index).tz_localize(None)
    df.index.name = "date"

    # Calculate daily return
    df["daily_return_pct"] = df["Close"].pct_change() * 100
    df["ticker"] = ticker

    return df

# Example: download 1 year of data for multiple tickers
tickers = ["AAPL", "MSFT", "GOOGL", "NVDA", "RELIANCE.NS"]  # .NS for NSE India

dfs = {}
for ticker in tickers:
    try:
        dfs[ticker] = get_stock_history(ticker, period="1y")
        print(f"  {ticker}: {len(dfs[ticker])} trading days")
    except Exception as e:
        print(f"  {ticker}: Error — {e}")

# Combine all into one DataFrame
combined = pd.concat(dfs.values())
combined.to_csv("stock_history.csv")

Fundamental data: financials, balance sheet, dividends

def get_fundamentals(ticker: str) -> dict:
    """
    Collect comprehensive fundamental data for a stock.
    Returns income statement, balance sheet, cash flow, and key metrics.
    """
    stock = yf.Ticker(ticker)
    info  = stock.info

    return {
        "overview": {
            "name":              info.get("longName"),
            "sector":            info.get("sector"),
            "industry":          info.get("industry"),
            "country":           info.get("country"),
            "employees":         info.get("fullTimeEmployees"),
            "website":           info.get("website"),
            "description":       (info.get("longBusinessSummary") or "")[:300],
        },
        "valuation": {
            "market_cap":        info.get("marketCap"),
            "pe_ratio":          info.get("trailingPE"),
            "forward_pe":        info.get("forwardPE"),
            "peg_ratio":         info.get("pegRatio"),
            "price_to_book":     info.get("priceToBook"),
            "ev_to_ebitda":      info.get("enterpriseToEbitda"),
        },
        "profitability": {
            "revenue_ttm":       info.get("totalRevenue"),
            "gross_margin":      info.get("grossMargins"),
            "operating_margin":  info.get("operatingMargins"),
            "net_margin":        info.get("profitMargins"),
            "roe":               info.get("returnOnEquity"),
            "roa":               info.get("returnOnAssets"),
        },
        "dividends": {
            "dividend_yield":    info.get("dividendYield"),
            "dividend_rate":     info.get("dividendRate"),
            "payout_ratio":      info.get("payoutRatio"),
            "ex_dividend_date":  info.get("exDividendDate"),
        },
        "analyst": {
            "recommendation":    info.get("recommendationKey"),
            "mean_target":       info.get("targetMeanPrice"),
            "high_target":       info.get("targetHighPrice"),
            "low_target":        info.get("targetLowPrice"),
            "num_analysts":      info.get("numberOfAnalystOpinions"),
        },
        "earnings": {
            "eps_trailing":      info.get("trailingEps"),
            "eps_forward":       info.get("forwardEps"),
            "earnings_growth":   info.get("earningsGrowth"),
            "revenue_growth":    info.get("revenueGrowth"),
        }
    }

# Analyse a stock
data = get_fundamentals("AAPL")
print(f"\n── {data['overview']['name']} ──")
print(f"Sector:       {data['overview']['sector']}")
print(f"Market Cap:   ${data['valuation']['market_cap']:,.0f}")
print(f"P/E Ratio:    {data['valuation']['pe_ratio']}")
print(f"Net Margin:   {data['profitability']['net_margin']:.1%}")
print(f"Analyst:      {data['analyst']['recommendation'].upper()}")
print(f"Target Price: ${data['analyst']['mean_target']}")

Options chain data

def get_options_chain(ticker: str, expiry: str = None) -> dict:
    """
    Fetch options chain (calls and puts) for a stock.

    Args:
        ticker: Stock ticker symbol
        expiry: Expiration date string "YYYY-MM-DD". If None, uses nearest expiry.
    """
    stock = yf.Ticker(ticker)
    expiries = stock.options

    if not expiries:
        return {"error": f"No options data for {ticker}"}

    target_expiry = expiry if expiry in expiries else expiries[0]
    print(f"Using expiry: {target_expiry}")

    chain = stock.option_chain(target_expiry)

    return {
        "expiry":      target_expiry,
        "calls":       chain.calls,
        "puts":        chain.puts,
        "all_expiries": list(expiries),
    }

options = get_options_chain("AAPL")
calls_df = options["calls"]
# Find near-the-money calls
current_price = yf.Ticker("AAPL").info.get("regularMarketPrice", 0)
near_money = calls_df[
    (calls_df["strike"] >= current_price * 0.95) &
    (calls_df["strike"] <= current_price * 1.05)
]
print(near_money[["strike", "lastPrice", "bid", "ask", "impliedVolatility", "openInterest"]])

Part 2: Alpha Vantage — Free API with Real Guarantees

pip install requests pandas
# Get a free API key at: alphavantage.co/support/#api-key

import httpx
import pandas as pd
import asyncio
from datetime import datetime

ALPHA_VANTAGE_KEY = "YOUR_FREE_API_KEY"
BASE_URL = "https://www.alphavantage.co/query"

async def av_request(params: dict) -> dict:
    """Make a request to Alpha Vantage API."""
    params["apikey"] = ALPHA_VANTAGE_KEY
    async with httpx.AsyncClient() as client:
        r = await client.get(BASE_URL, params=params, timeout=20)
        r.raise_for_status()
        data = r.json()

    # Check for API error messages
    if "Note" in data:
        raise Exception(f"Rate limit hit: {data['Note']}")
    if "Error Message" in data:
        raise Exception(f"API error: {data['Error Message']}")

    return data


async def get_daily_prices(ticker: str, outputsize: str = "compact") -> pd.DataFrame:
    """
    Get daily OHLCV data.
    outputsize: "compact" = last 100 days, "full" = 20+ years
    """
    data = await av_request({
        "function":   "TIME_SERIES_DAILY",
        "symbol":     ticker,
        "outputsize": outputsize,
    })

    ts = data.get("Time Series (Daily)", {})
    rows = []
    for date_str, values in ts.items():
        rows.append({
            "date":   date_str,
            "open":   float(values["1. open"]),
            "high":   float(values["2. high"]),
            "low":    float(values["3. low"]),
            "close":  float(values["4. close"]),
            "volume": int(values["5. volume"]),
        })

    df = pd.DataFrame(rows)
    df["date"] = pd.to_datetime(df["date"])
    df = df.sort_values("date").reset_index(drop=True)
    df["ticker"] = ticker
    return df


async def get_company_overview(ticker: str) -> dict:
    """Get fundamental company data from Alpha Vantage."""
    data = await av_request({"function": "OVERVIEW", "symbol": ticker})
    return {
        "name":            data.get("Name"),
        "description":     (data.get("Description") or "")[:300],
        "sector":          data.get("Sector"),
        "industry":        data.get("Industry"),
        "market_cap":      data.get("MarketCapitalization"),
        "pe_ratio":        data.get("PERatio"),
        "eps":             data.get("EPS"),
        "dividend_yield":  data.get("DividendYield"),
        "52_week_high":    data.get("52WeekHigh"),
        "52_week_low":     data.get("52WeekLow"),
        "analyst_target":  data.get("AnalystTargetPrice"),
    }


async def get_forex_rate(from_currency: str, to_currency: str) -> dict:
    """Get real-time forex exchange rate."""
    data = await av_request({
        "function":      "CURRENCY_EXCHANGE_RATE",
        "from_currency": from_currency,
        "to_currency":   to_currency,
    })
    rate_data = data.get("Realtime Currency Exchange Rate", {})
    return {
        "from":       rate_data.get("1. From_Currency Code"),
        "to":         rate_data.get("3. To_Currency Code"),
        "rate":       float(rate_data.get("5. Exchange Rate", 0)),
        "last_updated": rate_data.get("6. Last Refreshed"),
    }


async def get_crypto_price(symbol: str, market: str = "USD") -> pd.DataFrame:
    """Get daily crypto price data."""
    data = await av_request({
        "function": "DIGITAL_CURRENCY_DAILY",
        "symbol":   symbol,
        "market":   market,
    })

    ts = data.get("Time Series (Digital Currency Daily)", {})
    rows = []
    for date_str, values in list(ts.items())[:30]:  # Last 30 days
        rows.append({
            "date":   date_str,
            "open":   float(values.get(f"1a. open ({market})", 0)),
            "high":   float(values.get(f"2a. high ({market})", 0)),
            "low":    float(values.get(f"3a. low ({market})", 0)),
            "close":  float(values.get(f"4a. close ({market})", 0)),
            "volume": float(values.get("5. volume", 0)),
        })

    df = pd.DataFrame(rows)
    df["date"] = pd.to_datetime(df["date"])
    return df.sort_values("date").reset_index(drop=True)


# Run examples
async def main():
    print("── Stock prices (AAPL, last 100 days) ──")
    df = await get_daily_prices("AAPL")
    print(df.tail(5))

    print("\n── Company overview ──")
    overview = await get_company_overview("AAPL")
    for k, v in overview.items():
        print(f"  {k}: {v}")

    print("\n── USD/INR forex rate ──")
    rate = await get_forex_rate("USD", "INR")
    print(f"  1 {rate['from']} = {rate['rate']:.2f} {rate['to']}")
    print(f"  Last updated: {rate['last_updated']}")

    print("\n── Bitcoin (last 30 days) ──")
    btc = await get_crypto_price("BTC", "USD")
    print(btc.tail(5))

asyncio.run(main())

Part 3: Scraping Financial News and Earnings Calendars

No API packages the real-time financial news that moves markets. Custom scrapers fill that gap.

Reuters / Financial news scraper

import httpx
import asyncio
from bs4 import BeautifulSoup
from curl_cffi.requests import AsyncSession
import pandas as pd
from datetime import datetime, timezone

async def scrape_financial_news(
    query: str,
    sources: list[str] = None,
    max_articles: int = 20
) -> pd.DataFrame:
    """
    Scrape financial news headlines from multiple sources.
    """
    if sources is None:
        sources = [
            f"https://finance.yahoo.com/search?q={query.replace(' ', '+')}",
            f"https://finviz.com/search.ashx?p={query.replace(' ', '+')}",
        ]

    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120",
        "Accept-Language": "en-US,en;q=0.9",
    }

    articles = []

    async with AsyncSession(impersonate="chrome120") as session:
        for url in sources:
            try:
                r = await session.get(url, headers=headers, timeout=15)
                soup = BeautifulSoup(r.text, "lxml")

                # Generic news article extraction — adapt selectors per site
                # Yahoo Finance news items
                for item in soup.select("li.js-stream-content, div[data-test='LATEST_NEWS']"):
                    title_el = item.select_one("h3, a[data-ylk]")
                    link_el  = item.select_one("a[href]")
                    time_el  = item.select_one("time, span[data-reactid]")

                    if title_el:
                        articles.append({
                            "title":      title_el.get_text(strip=True),
                            "url":        link_el.get("href", "") if link_el else "",
                            "source":     url,
                            "scraped_at": datetime.now(timezone.utc).isoformat(),
                        })

                await asyncio.sleep(1.5)

            except Exception as e:
                print(f"  Error on {url}: {e}")

    df = pd.DataFrame(articles[:max_articles]).drop_duplicates(subset=["title"])
    return df


# Scrape news for a ticker
news_df = asyncio.run(scrape_financial_news("NVIDIA earnings 2026"))
print(news_df[["title", "scraped_at"]].head(10))
news_df.to_csv("financial_news.csv", index=False)

Scraping the Earnings Calendar

Earnings reports are the most market-moving events in equities. Knowing what's coming lets you position before the move.

async def get_earnings_calendar(week_offset: int = 0) -> pd.DataFrame:
    """
    Scrape the upcoming earnings calendar from Earnings Whispers.
    Returns companies reporting earnings in the target week.
    """
    # Earnings Whispers has a well-structured earnings calendar
    url = "https://www.earningswhispers.com/stocks"

    async with AsyncSession(impersonate="chrome120") as session:
        r = await session.get(
            url,
            headers={"User-Agent": "Mozilla/5.0 Chrome/120", "Accept-Language": "en-US"},
            timeout=20
        )

    soup = BeautifulSoup(r.text, "lxml")
    earnings = []

    # Each stock in the earnings calendar
    for item in soup.select(".earnings-estimates"):
        ticker_el = item.select_one(".ticker")
        name_el   = item.select_one(".company")
        date_el   = item.select_one(".date")
        time_el   = item.select_one(".time")   # BMO = Before Market Open, AMC = After
        eps_el    = item.select_one(".estimate")

        if ticker_el:
            earnings.append({
                "ticker":      ticker_el.get_text(strip=True),
                "company":     name_el.get_text(strip=True) if name_el else None,
                "report_date": date_el.get_text(strip=True) if date_el else None,
                "report_time": time_el.get_text(strip=True) if time_el else None,
                "eps_estimate": eps_el.get_text(strip=True) if eps_el else None,
            })

    return pd.DataFrame(earnings)


calendar = asyncio.run(get_earnings_calendar())
print(f"Found {len(calendar)} earnings reports this week")
print(calendar.head(10).to_string(index=False))

Part 4: Crypto Data — Free, Real-Time, No API Key

CoinGecko offers a generous free API that covers 10,000+ cryptocurrencies with real-time prices, market caps, historical data, and DeFi protocol data:

import httpx
import asyncio
import pandas as pd

COINGECKO_BASE = "https://api.coingecko.com/api/v3"

async def get_crypto_prices(
    coin_ids: list[str],
    vs_currency: str = "usd"
) -> pd.DataFrame:
    """
    Get current prices for multiple cryptocurrencies.
    No API key required on free tier (30 req/min).

    Common coin IDs: bitcoin, ethereum, solana, cardano, ripple
    Find IDs at: coingecko.com/en/all-cryptocurrencies
    """
    ids_str = ",".join(coin_ids)

    async with httpx.AsyncClient() as client:
        r = await client.get(
            f"{COINGECKO_BASE}/simple/price",
            params={
                "ids":                    ids_str,
                "vs_currencies":          vs_currency,
                "include_24hr_change":    "true",
                "include_24hr_vol":       "true",
                "include_market_cap":     "true",
                "include_last_updated_at":"true",
            },
            timeout=15
        )
        r.raise_for_status()
        data = r.json()

    rows = []
    for coin_id, values in data.items():
        rows.append({
            "coin_id":         coin_id,
            "price_usd":       values.get(vs_currency),
            "market_cap":      values.get(f"{vs_currency}_market_cap"),
            "volume_24h":      values.get(f"{vs_currency}_24h_vol"),
            "change_24h_pct":  values.get(f"{vs_currency}_24h_change"),
            "last_updated":    pd.to_datetime(
                                   values.get("last_updated_at"), unit="s"
                               ).isoformat() if values.get("last_updated_at") else None,
        })

    return pd.DataFrame(rows).sort_values("market_cap", ascending=False)


async def get_crypto_history(
    coin_id: str,
    days: int = 30,
    vs_currency: str = "usd"
) -> pd.DataFrame:
    """
    Get historical OHLCV data for a cryptocurrency.
    Free up to 365 days of daily candles.
    """
    async with httpx.AsyncClient() as client:
        r = await client.get(
            f"{COINGECKO_BASE}/coins/{coin_id}/ohlc",
            params={"vs_currency": vs_currency, "days": days},
            timeout=15
        )
        r.raise_for_status()
        data = r.json()   # [[timestamp_ms, open, high, low, close], ...]

    df = pd.DataFrame(data, columns=["timestamp_ms", "open", "high", "low", "close"])
    df["date"] = pd.to_datetime(df["timestamp_ms"], unit="ms")
    df["coin"] = coin_id
    return df[["date", "coin", "open", "high", "low", "close"]]


async def get_defi_protocols() -> pd.DataFrame:
    """Get top DeFi protocols by Total Value Locked (TVL)."""
    async with httpx.AsyncClient() as client:
        r = await client.get(
            f"{COINGECKO_BASE}/defi",
            timeout=15
        )
        data = r.json()

    return pd.DataFrame([{
        "name":              data.get("defi_market_cap"),
        "defi_market_cap":   data.get("defi_market_cap"),
        "eth_market_cap":    data.get("eth_market_cap"),
        "defi_dominance":    data.get("defi_dominance"),
        "top_coin":          data.get("top_coin_name"),
        "top_coin_dominance":data.get("top_coin_defi_dominance"),
    }])


async def crypto_dashboard():
    """Build a live crypto dashboard."""
    print("── Fetching crypto prices ──")
    prices_df = await get_crypto_prices(
        ["bitcoin", "ethereum", "solana", "cardano", "chainlink", "polkadot"]
    )
    print(prices_df[["coin_id", "price_usd", "change_24h_pct", "market_cap"]].to_string(index=False))

    print("\n── Bitcoin 30-day history ──")
    btc_history = await get_crypto_history("bitcoin", days=30)
    print(btc_history.tail(5))

    btc_high  = btc_history["high"].max()
    btc_low   = btc_history["low"].min()
    print(f"\n30-day range: ${btc_low:,.2f} – ${btc_high:,.2f}")

asyncio.run(crypto_dashboard())

Part 5: SEC EDGAR — Scraping Regulatory Filings

import httpx
import asyncio
import json
import pandas as pd

EDGAR_BASE = "https://data.sec.gov"
EDGAR_API  = "https://efts.sec.gov/LATEST/search-index"

# SEC requires a descriptive User-Agent with contact info
EDGAR_HEADERS = {
    "User-Agent": "FinancialResearchBot/1.0 (contact@yourdomain.com)",
    "Accept-Encoding": "gzip, deflate",
}

async def get_company_filings(ticker: str) -> dict:
    """
    Get all SEC filings for a company by ticker.
    Uses SEC EDGAR's official JSON API.
    """
    # Step 1: Look up CIK number from ticker
    async with httpx.AsyncClient(headers=EDGAR_HEADERS) as client:
        r = await client.get(
            "https://www.sec.gov/files/company_tickers.json",
            timeout=15
        )
        tickers_data = r.json()

    # Find CIK for this ticker
    cik = None
    for entry in tickers_data.values():
        if entry.get("ticker", "").upper() == ticker.upper():
            cik = str(entry["cik_str"]).zfill(10)
            company_name = entry.get("title")
            break

    if not cik:
        raise ValueError(f"Ticker {ticker} not found in EDGAR")

    print(f"Found {ticker}: {company_name} (CIK: {cik})")

    # Step 2: Get all filings for this CIK
    async with httpx.AsyncClient(headers=EDGAR_HEADERS) as client:
        r = await client.get(
            f"{EDGAR_BASE}/submissions/CIK{cik}.json",
            timeout=15
        )
        data = r.json()

    recent = data.get("filings", {}).get("recent", {})
    filings_df = pd.DataFrame({
        "form":          recent.get("form", []),
        "filed_at":      recent.get("filingDate", []),
        "report_date":   recent.get("reportDate", []),
        "accession_num": recent.get("accessionNumber", []),
        "description":   recent.get("primaryDocument", []),
    })

    return {
        "cik":          cik,
        "company":      company_name,
        "all_filings":  filings_df,
        "10k_filings":  filings_df[filings_df["form"] == "10-K"],
        "10q_filings":  filings_df[filings_df["form"] == "10-Q"],
        "8k_filings":   filings_df[filings_df["form"] == "8-K"],
    }


async def download_filing_text(cik: str, accession_num: str) -> str:
    """Download the full text of a specific SEC filing."""
    # Reformat accession number for URL: remove dashes
    acc_clean = accession_num.replace("-", "")
    cik_clean = cik.lstrip("0")

    index_url = (
        f"https://www.sec.gov/Archives/edgar/data/"
        f"{cik_clean}/{acc_clean}/{accession_num}-index.htm"
    )

    async with httpx.AsyncClient(headers=EDGAR_HEADERS) as client:
        r = await client.get(index_url, timeout=20)
        soup = BeautifulSoup(r.text, "lxml")

        # Find the primary document link
        doc_link = soup.select_one("table.tableFile2 a")
        if not doc_link:
            return ""

        doc_url = f"https://www.sec.gov{doc_link['href']}"
        doc_r   = await client.get(doc_url, timeout=30)
        return doc_r.text


async def main_edgar():
    filings = await get_company_filings("AAPL")

    print(f"\n── {filings['company']} SEC Filings ──")
    print(f"10-K (Annual): {len(filings['10k_filings'])} filings")
    print(f"10-Q (Quarterly): {len(filings['10q_filings'])} filings")
    print(f"8-K (Events): {len(filings['8k_filings'])} filings")

    print("\nMost recent 10-K filings:")
    print(filings["10k_filings"].head(3).to_string(index=False))

asyncio.run(main_edgar())

Part 6: Building a Portfolio Tracker

Combine everything into a live portfolio monitoring dashboard:

import asyncio
import pandas as pd
from datetime import datetime, timezone
import yfinance as yf

PORTFOLIO = {
    # ticker: (shares, avg_buy_price)
    "AAPL":        (10,  175.50),
    "MSFT":        (5,   380.00),
    "NVDA":        (3,   450.00),
    "RELIANCE.NS": (50,  2800.00),
}

async def get_current_prices(tickers: list[str]) -> dict:
    """Get current prices for all portfolio tickers."""
    prices = {}
    for ticker in tickers:
        try:
            stock = yf.Ticker(ticker)
            info  = stock.fast_info
            prices[ticker] = {
                "price":       info.last_price,
                "prev_close":  info.previous_close,
                "currency":    info.currency,
            }
        except Exception:
            prices[ticker] = {"price": None, "prev_close": None, "currency": "USD"}
    return prices


def build_portfolio_report(portfolio: dict, current_prices: dict) -> pd.DataFrame:
    """Generate a portfolio performance report."""
    rows = []
    for ticker, (shares, avg_price) in portfolio.items():
        price_data   = current_prices.get(ticker, {})
        current      = price_data.get("price") or 0
        prev_close   = price_data.get("prev_close") or 0
        currency     = price_data.get("currency", "USD")

        cost_basis   = shares * avg_price
        market_value = shares * current
        gain_loss    = market_value - cost_basis
        gain_pct     = (gain_loss / cost_basis * 100) if cost_basis else 0
        day_change   = ((current - prev_close) / prev_close * 100) if prev_close else 0

        rows.append({
            "ticker":       ticker,
            "shares":       shares,
            "avg_price":    avg_price,
            "current":      round(current, 2),
            "cost_basis":   round(cost_basis, 2),
            "market_value": round(market_value, 2),
            "gain_loss":    round(gain_loss, 2),
            "gain_pct":     round(gain_pct, 2),
            "day_change_%": round(day_change, 2),
            "currency":     currency,
        })

    df = pd.DataFrame(rows)

    total_cost   = df["cost_basis"].sum()
    total_value  = df["market_value"].sum()
    total_gain   = df["gain_loss"].sum()
    total_return = (total_gain / total_cost * 100) if total_cost else 0

    print(f"\n{'═'*60}")
    print(f"  PORTFOLIO REPORT — {datetime.now().strftime('%Y-%m-%d %H:%M')}")
    print(f"{'═'*60}")
    print(df.to_string(index=False))
    print(f"\n  Total invested:  ${total_cost:>10,.2f}")
    print(f"  Market value:    ${total_value:>10,.2f}")
    print(f"  Total P&L:       ${total_gain:>10,.2f} ({total_return:+.2f}%)")
    print(f"{'═'*60}")

    return df


async def run_portfolio():
    tickers       = list(PORTFOLIO.keys())
    current       = await get_current_prices(tickers)
    report        = build_portfolio_report(PORTFOLIO, current)
    report.to_csv("portfolio_report.csv", index=False)

asyncio.run(run_portfolio())

Part 7: Financial Data Pipeline with PostgreSQL

For production-grade financial data storage with proper indexing:

from sqlalchemy import create_engine, text
import pandas as pd

ENGINE = create_engine("postgresql://user:password@localhost:5432/findata")

def init_financial_schema():
    """Create tables optimised for time-series financial data."""
    with ENGINE.connect() as conn:
        conn.execute(text("""
            CREATE TABLE IF NOT EXISTS prices (
                id         BIGSERIAL PRIMARY KEY,
                ticker     TEXT NOT NULL,
                date       DATE NOT NULL,
                open       NUMERIC(12, 4),
                high       NUMERIC(12, 4),
                low        NUMERIC(12, 4),
                close      NUMERIC(12, 4),
                adj_close  NUMERIC(12, 4),
                volume     BIGINT,
                source     TEXT DEFAULT 'yfinance',
                UNIQUE (ticker, date)
            );

            CREATE INDEX IF NOT EXISTS idx_prices_ticker_date
                ON prices (ticker, date DESC);

            CREATE TABLE IF NOT EXISTS fundamentals (
                id           BIGSERIAL PRIMARY KEY,
                ticker       TEXT NOT NULL,
                scraped_at   TIMESTAMPTZ DEFAULT NOW(),
                market_cap   BIGINT,
                pe_ratio     NUMERIC(10, 2),
                eps          NUMERIC(10, 4),
                revenue      BIGINT,
                net_margin   NUMERIC(8, 4),
                data         JSONB
            );
        """))
        conn.commit()

def upsert_prices(df: pd.DataFrame):
    """Upsert price history — insert new, update existing."""
    df.to_sql("prices_staging", ENGINE, if_exists="replace", index=False)
    with ENGINE.connect() as conn:
        conn.execute(text("""
            INSERT INTO prices (ticker, date, open, high, low, close, volume)
            SELECT ticker, date, open, high, low, close, volume
            FROM prices_staging
            ON CONFLICT (ticker, date) DO UPDATE SET
                close  = EXCLUDED.close,
                volume = EXCLUDED.volume
        """))
        conn.commit()
    print(f"Upserted {len(df)} price records.")

Choosing Your Data Source

Source	Cost	Reliability	Best for
yfinance	Free	Medium (scraping-based)	Prototyping, personal projects
Alpha Vantage	Free / $50+/mo	High	Production stocks + forex + crypto
Polygon.io	Free / $29+/mo	Very High	Real-time, high-frequency data
Finnhub	Free / paid	High	News sentiment + fundamentals
CoinGecko	Free / paid	High	Crypto data at any scale
SEC EDGAR	Free (official)	Very High	Regulatory filings (US stocks)
Custom scraper	Free (infra cost)	Low-Medium	Earnings calendars, news, niche data

FAQ

Summary

Task	Tool	Notes
Quick stock history	`yfinance`	Free, but can break
Production stock data	Alpha Vantage / Polygon.io	Reliable, free tiers available
Crypto prices	CoinGecko API	Free, 10k+ coins
Financial news	Custom scraper	No API covers live news fully
Earnings calendar	Custom scraper	Earnings Whispers, Zacks
SEC filings	EDGAR JSON API	Official, free, legally clear
Portfolio tracking	`yfinance` + pandas	Combine with any price source

Scraping Financial Data with Python: Stocks, Crypto & Market Intelligence (2026)

Why Financial Data Is the Most Valuable Data You Can Scrape

Part 1: yfinance — Still Useful, But Know Its Limits

Stock price history

Fundamental data: financials, balance sheet, dividends

Options chain data

Part 2: Alpha Vantage — Free API with Real Guarantees

Part 3: Scraping Financial News and Earnings Calendars

Reuters / Financial news scraper

Scraping the Earnings Calendar

Part 4: Crypto Data — Free, Real-Time, No API Key

Part 5: SEC EDGAR — Scraping Regulatory Filings

Part 6: Building a Portfolio Tracker

Part 7: Financial Data Pipeline with PostgreSQL

Choosing Your Data Source

FAQ

Summary

ZyVOP

Comments (0)

Scraping Financial Data with Python: Stocks, Crypto & Market Intelligence (2026)

Why Financial Data Is the Most Valuable Data You Can Scrape

Part 1: yfinance — Still Useful, But Know Its Limits

Stock price history

Fundamental data: financials, balance sheet, dividends

Options chain data

Part 2: Alpha Vantage — Free API with Real Guarantees

Part 3: Scraping Financial News and Earnings Calendars

Reuters / Financial news scraper

Scraping the Earnings Calendar

Part 4: Crypto Data — Free, Real-Time, No API Key

Part 5: SEC EDGAR — Scraping Regulatory Filings

Part 6: Building a Portfolio Tracker

Part 7: Financial Data Pipeline with PostgreSQL

Choosing Your Data Source

FAQ

Summary

ZyVOP

Comments (0)

Popular Tags