ZyVOP Logo
Content That Connects
SeriesCategoriesTags
ZyVOP Logo
Content That Connects

Empowering developers and creators with cutting-edge insights, comprehensive tutorials, and innovative solutions for the digital future.

Content

  • Tags
  • Write Article

Company

  • About Us
  • Contact

Connect

  • Privacy Policy
  • Terms of Service
  • Cookie Policy
  • DMCA Policy
  • Code of Conduct

© 2026 ZyVOP. Crafted with care for the developer community.

Made with ❤️ by the ZyVOP team
All systems operational
HomeScraping Financial Data with Python: Stocks, Crypto & Market Intelligence (2026)

Scraping Financial Data with Python: Stocks, Crypto & Market Intelligence (2026)

Build a complete Python financial data pipeline in 2026 — scrape stock prices, crypto data, earnings, and news using yfinance, Alpha Vantage, Polygon.io, and custom scrapers.

#python financial data scraping stocks crypto 2026#scrape yahoo finance python 2026#yfinance alternative python#python stock market data scraping#alpha vantage python tutorial#python crypto price scraper#scrapping 2026 python
Z
ZyVOP

Senior Developer

June 8, 2026
14 min read
4 views
Scraping Financial Data with Python: Stocks, Crypto & Market Intelligence (2026)

Why Financial Data Is the Most Valuable Data You Can Scrape

Money moves on information. Stock prices react to earnings reports in milliseconds. Crypto markets swing violently on a single tweet. Interest rate decisions cascade through every asset class within seconds of announcement.

The developers and analysts who can collect, clean, and process financial data faster than others have a structural edge — whether they're building algorithmic trading strategies, financial dashboards, portfolio trackers, market research tools, or AI-powered investment assistants.

In 2026, the Python financial data ecosystem is rich but fragmented. The landscape looks like this:

  • yfinance remains the most popular free option, but relying on unofficial web scraping means it's prone to rate limits, blocked IPs, and broken requests — it broke significantly after Yahoo Finance's February 2025 redesign

  • Official financial APIs (Alpha Vantage, Polygon.io, Finnhub) offer reliability and proper data guarantees but cost money beyond their free tiers

  • Custom scrapers targeting financial news sites, SEC filings, and earnings call transcripts fill gaps that no API covers

This guide gives you the complete playbook: free tools for prototyping, production-grade APIs for serious work, and custom scrapers for the data nobody else is packaging.


Part 1: yfinance — Still Useful, But Know Its Limits

yfinance covers prices, financials, analyst estimates, options chains, and institutional holdings with no signup and no API key. It's free, with a comprehensive dataset going back 20+ years and an active community with 10,000+ GitHub stars.

The tradeoff is fragility — it's web scraping, not a true API. Yahoo can change endpoints without notice, and there is no SLA, support, or data accuracy guarantee.

Use yfinance for: prototyping, personal projects, learning, backtesting on historical data. Avoid it for: production trading systems, real-time alerts, mission-critical analysis.

pip install yfinance pandas matplotlib

Stock price history

import yfinance as yf
import pandas as pd
import matplotlib.pyplot as plt

def get_stock_history(
    ticker: str,
    period: str = "1y",     # Valid: 1d, 5d, 1mo, 3mo, 6mo, 1y, 2y, 5y, 10y, ytd, max
    interval: str = "1d"    # Valid: 1m, 2m, 5m, 15m, 30m, 60m, 90m, 1h, 1d, 5d, 1wk, 1mo
) -> pd.DataFrame:
    """
    Download historical OHLCV data for a stock ticker.

    Returns a DataFrame with columns:
    Open, High, Low, Close, Adj Close, Volume
    """
    stock = yf.Ticker(ticker)
    df    = stock.history(period=period, interval=interval)

    if df.empty:
        raise ValueError(f"No data returned for {ticker} — ticker may be invalid")

    # Clean up the index
    df.index = pd.to_datetime(df.index).tz_localize(None)
    df.index.name = "date"

    # Calculate daily return
    df["daily_return_pct"] = df["Close"].pct_change() * 100
    df["ticker"] = ticker

    return df

# Example: download 1 year of data for multiple tickers
tickers = ["AAPL", "MSFT", "GOOGL", "NVDA", "RELIANCE.NS"]  # .NS for NSE India

dfs = {}
for ticker in tickers:
    try:
        dfs[ticker] = get_stock_history(ticker, period="1y")
        print(f"  {ticker}: {len(dfs[ticker])} trading days")
    except Exception as e:
        print(f"  {ticker}: Error — {e}")

# Combine all into one DataFrame
combined = pd.concat(dfs.values())
combined.to_csv("stock_history.csv")

Fundamental data: financials, balance sheet, dividends

def get_fundamentals(ticker: str) -> dict:
    """
    Collect comprehensive fundamental data for a stock.
    Returns income statement, balance sheet, cash flow, and key metrics.
    """
    stock = yf.Ticker(ticker)
    info  = stock.info

    return {
        "overview": {
            "name":              info.get("longName"),
            "sector":            info.get("sector"),
            "industry":          info.get("industry"),
            "country":           info.get("country"),
            "employees":         info.get("fullTimeEmployees"),
            "website":           info.get("website"),
            "description":       (info.get("longBusinessSummary") or "")[:300],
        },
        "valuation": {
            "market_cap":        info.get("marketCap"),
            "pe_ratio":          info.get("trailingPE"),
            "forward_pe":        info.get("forwardPE"),
            "peg_ratio":         info.get("pegRatio"),
            "price_to_book":     info.get("priceToBook"),
            "ev_to_ebitda":      info.get("enterpriseToEbitda"),
        },
        "profitability": {
            "revenue_ttm":       info.get("totalRevenue"),
            "gross_margin":      info.get("grossMargins"),
            "operating_margin":  info.get("operatingMargins"),
            "net_margin":        info.get("profitMargins"),
            "roe":               info.get("returnOnEquity"),
            "roa":               info.get("returnOnAssets"),
        },
        "dividends": {
            "dividend_yield":    info.get("dividendYield"),
            "dividend_rate":     info.get("dividendRate"),
            "payout_ratio":      info.get("payoutRatio"),
            "ex_dividend_date":  info.get("exDividendDate"),
        },
        "analyst": {
            "recommendation":    info.get("recommendationKey"),
            "mean_target":       info.get("targetMeanPrice"),
            "high_target":       info.get("targetHighPrice"),
            "low_target":        info.get("targetLowPrice"),
            "num_analysts":      info.get("numberOfAnalystOpinions"),
        },
        "earnings": {
            "eps_trailing":      info.get("trailingEps"),
            "eps_forward":       info.get("forwardEps"),
            "earnings_growth":   info.get("earningsGrowth"),
            "revenue_growth":    info.get("revenueGrowth"),
        }
    }

# Analyse a stock
data = get_fundamentals("AAPL")
print(f"\n── {data['overview']['name']} ──")
print(f"Sector:       {data['overview']['sector']}")
print(f"Market Cap:   ${data['valuation']['market_cap']:,.0f}")
print(f"P/E Ratio:    {data['valuation']['pe_ratio']}")
print(f"Net Margin:   {data['profitability']['net_margin']:.1%}")
print(f"Analyst:      {data['analyst']['recommendation'].upper()}")
print(f"Target Price: ${data['analyst']['mean_target']}")

Options chain data

def get_options_chain(ticker: str, expiry: str = None) -> dict:
    """
    Fetch options chain (calls and puts) for a stock.

    Args:
        ticker: Stock ticker symbol
        expiry: Expiration date string "YYYY-MM-DD". If None, uses nearest expiry.
    """
    stock = yf.Ticker(ticker)
    expiries = stock.options

    if not expiries:
        return {"error": f"No options data for {ticker}"}

    target_expiry = expiry if expiry in expiries else expiries[0]
    print(f"Using expiry: {target_expiry}")

    chain = stock.option_chain(target_expiry)

    return {
        "expiry":      target_expiry,
        "calls":       chain.calls,
        "puts":        chain.puts,
        "all_expiries": list(expiries),
    }

options = get_options_chain("AAPL")
calls_df = options["calls"]
# Find near-the-money calls
current_price = yf.Ticker("AAPL").info.get("regularMarketPrice", 0)
near_money = calls_df[
    (calls_df["strike"] >= current_price * 0.95) &
    (calls_df["strike"] <= current_price * 1.05)
]
print(near_money[["strike", "lastPrice", "bid", "ask", "impliedVolatility", "openInterest"]])

Part 2: Alpha Vantage — Free API with Real Guarantees

Alpha Vantage is a popular free and paid API that provides stock, forex, and cryptocurrency data. Unlike yfinance, it's a proper API with official endpoints, a free tier of 25 requests/day (500/day with a free key), and no risk of breaking when a website redesigns.

pip install requests pandas
# Get a free API key at: alphavantage.co/support/#api-key
import httpx
import pandas as pd
import asyncio
from datetime import datetime

ALPHA_VANTAGE_KEY = "YOUR_FREE_API_KEY"
BASE_URL = "https://www.alphavantage.co/query"

async def av_request(params: dict) -> dict:
    """Make a request to Alpha Vantage API."""
    params["apikey"] = ALPHA_VANTAGE_KEY
    async with httpx.AsyncClient() as client:
        r = await client.get(BASE_URL, params=params, timeout=20)
        r.raise_for_status()
        data = r.json()

    # Check for API error messages
    if "Note" in data:
        raise Exception(f"Rate limit hit: {data['Note']}")
    if "Error Message" in data:
        raise Exception(f"API error: {data['Error Message']}")

    return data


async def get_daily_prices(ticker: str, outputsize: str = "compact") -> pd.DataFrame:
    """
    Get daily OHLCV data.
    outputsize: "compact" = last 100 days, "full" = 20+ years
    """
    data = await av_request({
        "function":   "TIME_SERIES_DAILY",
        "symbol":     ticker,
        "outputsize": outputsize,
    })

    ts = data.get("Time Series (Daily)", {})
    rows = []
    for date_str, values in ts.items():
        rows.append({
            "date":   date_str,
            "open":   float(values["1. open"]),
            "high":   float(values["2. high"]),
            "low":    float(values["3. low"]),
            "close":  float(values["4. close"]),
            "volume": int(values["5. volume"]),
        })

    df = pd.DataFrame(rows)
    df["date"] = pd.to_datetime(df["date"])
    df = df.sort_values("date").reset_index(drop=True)
    df["ticker"] = ticker
    return df


async def get_company_overview(ticker: str) -> dict:
    """Get fundamental company data from Alpha Vantage."""
    data = await av_request({"function": "OVERVIEW", "symbol": ticker})
    return {
        "name":            data.get("Name"),
        "description":     (data.get("Description") or "")[:300],
        "sector":          data.get("Sector"),
        "industry":        data.get("Industry"),
        "market_cap":      data.get("MarketCapitalization"),
        "pe_ratio":        data.get("PERatio"),
        "eps":             data.get("EPS"),
        "dividend_yield":  data.get("DividendYield"),
        "52_week_high":    data.get("52WeekHigh"),
        "52_week_low":     data.get("52WeekLow"),
        "analyst_target":  data.get("AnalystTargetPrice"),
    }


async def get_forex_rate(from_currency: str, to_currency: str) -> dict:
    """Get real-time forex exchange rate."""
    data = await av_request({
        "function":      "CURRENCY_EXCHANGE_RATE",
        "from_currency": from_currency,
        "to_currency":   to_currency,
    })
    rate_data = data.get("Realtime Currency Exchange Rate", {})
    return {
        "from":       rate_data.get("1. From_Currency Code"),
        "to":         rate_data.get("3. To_Currency Code"),
        "rate":       float(rate_data.get("5. Exchange Rate", 0)),
        "last_updated": rate_data.get("6. Last Refreshed"),
    }


async def get_crypto_price(symbol: str, market: str = "USD") -> pd.DataFrame:
    """Get daily crypto price data."""
    data = await av_request({
        "function": "DIGITAL_CURRENCY_DAILY",
        "symbol":   symbol,
        "market":   market,
    })

    ts = data.get("Time Series (Digital Currency Daily)", {})
    rows = []
    for date_str, values in list(ts.items())[:30]:  # Last 30 days
        rows.append({
            "date":   date_str,
            "open":   float(values.get(f"1a. open ({market})", 0)),
            "high":   float(values.get(f"2a. high ({market})", 0)),
            "low":    float(values.get(f"3a. low ({market})", 0)),
            "close":  float(values.get(f"4a. close ({market})", 0)),
            "volume": float(values.get("5. volume", 0)),
        })

    df = pd.DataFrame(rows)
    df["date"] = pd.to_datetime(df["date"])
    return df.sort_values("date").reset_index(drop=True)


# Run examples
async def main():
    print("── Stock prices (AAPL, last 100 days) ──")
    df = await get_daily_prices("AAPL")
    print(df.tail(5))

    print("\n── Company overview ──")
    overview = await get_company_overview("AAPL")
    for k, v in overview.items():
        print(f"  {k}: {v}")

    print("\n── USD/INR forex rate ──")
    rate = await get_forex_rate("USD", "INR")
    print(f"  1 {rate['from']} = {rate['rate']:.2f} {rate['to']}")
    print(f"  Last updated: {rate['last_updated']}")

    print("\n── Bitcoin (last 30 days) ──")
    btc = await get_crypto_price("BTC", "USD")
    print(btc.tail(5))

asyncio.run(main())

Part 3: Scraping Financial News and Earnings Calendars

No API packages the real-time financial news that moves markets. Custom scrapers fill that gap.

Reuters / Financial news scraper

import httpx
import asyncio
from bs4 import BeautifulSoup
from curl_cffi.requests import AsyncSession
import pandas as pd
from datetime import datetime, timezone

async def scrape_financial_news(
    query: str,
    sources: list[str] = None,
    max_articles: int = 20
) -> pd.DataFrame:
    """
    Scrape financial news headlines from multiple sources.
    """
    if sources is None:
        sources = [
            f"https://finance.yahoo.com/search?q={query.replace(' ', '+')}",
            f"https://finviz.com/search.ashx?p={query.replace(' ', '+')}",
        ]

    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120",
        "Accept-Language": "en-US,en;q=0.9",
    }

    articles = []

    async with AsyncSession(impersonate="chrome120") as session:
        for url in sources:
            try:
                r = await session.get(url, headers=headers, timeout=15)
                soup = BeautifulSoup(r.text, "lxml")

                # Generic news article extraction — adapt selectors per site
                # Yahoo Finance news items
                for item in soup.select("li.js-stream-content, div[data-test='LATEST_NEWS']"):
                    title_el = item.select_one("h3, a[data-ylk]")
                    link_el  = item.select_one("a[href]")
                    time_el  = item.select_one("time, span[data-reactid]")

                    if title_el:
                        articles.append({
                            "title":      title_el.get_text(strip=True),
                            "url":        link_el.get("href", "") if link_el else "",
                            "source":     url,
                            "scraped_at": datetime.now(timezone.utc).isoformat(),
                        })

                await asyncio.sleep(1.5)

            except Exception as e:
                print(f"  Error on {url}: {e}")

    df = pd.DataFrame(articles[:max_articles]).drop_duplicates(subset=["title"])
    return df


# Scrape news for a ticker
news_df = asyncio.run(scrape_financial_news("NVIDIA earnings 2026"))
print(news_df[["title", "scraped_at"]].head(10))
news_df.to_csv("financial_news.csv", index=False)

Scraping the Earnings Calendar

Earnings reports are the most market-moving events in equities. Knowing what's coming lets you position before the move.

async def get_earnings_calendar(week_offset: int = 0) -> pd.DataFrame:
    """
    Scrape the upcoming earnings calendar from Earnings Whispers.
    Returns companies reporting earnings in the target week.
    """
    # Earnings Whispers has a well-structured earnings calendar
    url = "https://www.earningswhispers.com/stocks"

    async with AsyncSession(impersonate="chrome120") as session:
        r = await session.get(
            url,
            headers={"User-Agent": "Mozilla/5.0 Chrome/120", "Accept-Language": "en-US"},
            timeout=20
        )

    soup = BeautifulSoup(r.text, "lxml")
    earnings = []

    # Each stock in the earnings calendar
    for item in soup.select(".earnings-estimates"):
        ticker_el = item.select_one(".ticker")
        name_el   = item.select_one(".company")
        date_el   = item.select_one(".date")
        time_el   = item.select_one(".time")   # BMO = Before Market Open, AMC = After
        eps_el    = item.select_one(".estimate")

        if ticker_el:
            earnings.append({
                "ticker":      ticker_el.get_text(strip=True),
                "company":     name_el.get_text(strip=True) if name_el else None,
                "report_date": date_el.get_text(strip=True) if date_el else None,
                "report_time": time_el.get_text(strip=True) if time_el else None,
                "eps_estimate": eps_el.get_text(strip=True) if eps_el else None,
            })

    return pd.DataFrame(earnings)


calendar = asyncio.run(get_earnings_calendar())
print(f"Found {len(calendar)} earnings reports this week")
print(calendar.head(10).to_string(index=False))

Part 4: Crypto Data — Free, Real-Time, No API Key

CoinGecko offers a generous free API that covers 10,000+ cryptocurrencies with real-time prices, market caps, historical data, and DeFi protocol data:

import httpx
import asyncio
import pandas as pd

COINGECKO_BASE = "https://api.coingecko.com/api/v3"

async def get_crypto_prices(
    coin_ids: list[str],
    vs_currency: str = "usd"
) -> pd.DataFrame:
    """
    Get current prices for multiple cryptocurrencies.
    No API key required on free tier (30 req/min).

    Common coin IDs: bitcoin, ethereum, solana, cardano, ripple
    Find IDs at: coingecko.com/en/all-cryptocurrencies
    """
    ids_str = ",".join(coin_ids)

    async with httpx.AsyncClient() as client:
        r = await client.get(
            f"{COINGECKO_BASE}/simple/price",
            params={
                "ids":                    ids_str,
                "vs_currencies":          vs_currency,
                "include_24hr_change":    "true",
                "include_24hr_vol":       "true",
                "include_market_cap":     "true",
                "include_last_updated_at":"true",
            },
            timeout=15
        )
        r.raise_for_status()
        data = r.json()

    rows = []
    for coin_id, values in data.items():
        rows.append({
            "coin_id":         coin_id,
            "price_usd":       values.get(vs_currency),
            "market_cap":      values.get(f"{vs_currency}_market_cap"),
            "volume_24h":      values.get(f"{vs_currency}_24h_vol"),
            "change_24h_pct":  values.get(f"{vs_currency}_24h_change"),
            "last_updated":    pd.to_datetime(
                                   values.get("last_updated_at"), unit="s"
                               ).isoformat() if values.get("last_updated_at") else None,
        })

    return pd.DataFrame(rows).sort_values("market_cap", ascending=False)


async def get_crypto_history(
    coin_id: str,
    days: int = 30,
    vs_currency: str = "usd"
) -> pd.DataFrame:
    """
    Get historical OHLCV data for a cryptocurrency.
    Free up to 365 days of daily candles.
    """
    async with httpx.AsyncClient() as client:
        r = await client.get(
            f"{COINGECKO_BASE}/coins/{coin_id}/ohlc",
            params={"vs_currency": vs_currency, "days": days},
            timeout=15
        )
        r.raise_for_status()
        data = r.json()   # [[timestamp_ms, open, high, low, close], ...]

    df = pd.DataFrame(data, columns=["timestamp_ms", "open", "high", "low", "close"])
    df["date"] = pd.to_datetime(df["timestamp_ms"], unit="ms")
    df["coin"] = coin_id
    return df[["date", "coin", "open", "high", "low", "close"]]


async def get_defi_protocols() -> pd.DataFrame:
    """Get top DeFi protocols by Total Value Locked (TVL)."""
    async with httpx.AsyncClient() as client:
        r = await client.get(
            f"{COINGECKO_BASE}/defi",
            timeout=15
        )
        data = r.json()

    return pd.DataFrame([{
        "name":              data.get("defi_market_cap"),
        "defi_market_cap":   data.get("defi_market_cap"),
        "eth_market_cap":    data.get("eth_market_cap"),
        "defi_dominance":    data.get("defi_dominance"),
        "top_coin":          data.get("top_coin_name"),
        "top_coin_dominance":data.get("top_coin_defi_dominance"),
    }])


async def crypto_dashboard():
    """Build a live crypto dashboard."""
    print("── Fetching crypto prices ──")
    prices_df = await get_crypto_prices(
        ["bitcoin", "ethereum", "solana", "cardano", "chainlink", "polkadot"]
    )
    print(prices_df[["coin_id", "price_usd", "change_24h_pct", "market_cap"]].to_string(index=False))

    print("\n── Bitcoin 30-day history ──")
    btc_history = await get_crypto_history("bitcoin", days=30)
    print(btc_history.tail(5))

    btc_high  = btc_history["high"].max()
    btc_low   = btc_history["low"].min()
    print(f"\n30-day range: ${btc_low:,.2f} – ${btc_high:,.2f}")

asyncio.run(crypto_dashboard())

Part 5: SEC EDGAR — Scraping Regulatory Filings

The U.S. SEC's EDGAR database is the most legally unambiguous financial data source — it's explicitly public, free, and comprehensive. Every public company's 10-K (annual), 10-Q (quarterly), 8-K (events), and proxy filings live here.

import httpx
import asyncio
import json
import pandas as pd

EDGAR_BASE = "https://data.sec.gov"
EDGAR_API  = "https://efts.sec.gov/LATEST/search-index"

# SEC requires a descriptive User-Agent with contact info
EDGAR_HEADERS = {
    "User-Agent": "FinancialResearchBot/1.0 (contact@yourdomain.com)",
    "Accept-Encoding": "gzip, deflate",
}

async def get_company_filings(ticker: str) -> dict:
    """
    Get all SEC filings for a company by ticker.
    Uses SEC EDGAR's official JSON API.
    """
    # Step 1: Look up CIK number from ticker
    async with httpx.AsyncClient(headers=EDGAR_HEADERS) as client:
        r = await client.get(
            "https://www.sec.gov/files/company_tickers.json",
            timeout=15
        )
        tickers_data = r.json()

    # Find CIK for this ticker
    cik = None
    for entry in tickers_data.values():
        if entry.get("ticker", "").upper() == ticker.upper():
            cik = str(entry["cik_str"]).zfill(10)
            company_name = entry.get("title")
            break

    if not cik:
        raise ValueError(f"Ticker {ticker} not found in EDGAR")

    print(f"Found {ticker}: {company_name} (CIK: {cik})")

    # Step 2: Get all filings for this CIK
    async with httpx.AsyncClient(headers=EDGAR_HEADERS) as client:
        r = await client.get(
            f"{EDGAR_BASE}/submissions/CIK{cik}.json",
            timeout=15
        )
        data = r.json()

    recent = data.get("filings", {}).get("recent", {})
    filings_df = pd.DataFrame({
        "form":          recent.get("form", []),
        "filed_at":      recent.get("filingDate", []),
        "report_date":   recent.get("reportDate", []),
        "accession_num": recent.get("accessionNumber", []),
        "description":   recent.get("primaryDocument", []),
    })

    return {
        "cik":          cik,
        "company":      company_name,
        "all_filings":  filings_df,
        "10k_filings":  filings_df[filings_df["form"] == "10-K"],
        "10q_filings":  filings_df[filings_df["form"] == "10-Q"],
        "8k_filings":   filings_df[filings_df["form"] == "8-K"],
    }


async def download_filing_text(cik: str, accession_num: str) -> str:
    """Download the full text of a specific SEC filing."""
    # Reformat accession number for URL: remove dashes
    acc_clean = accession_num.replace("-", "")
    cik_clean = cik.lstrip("0")

    index_url = (
        f"https://www.sec.gov/Archives/edgar/data/"
        f"{cik_clean}/{acc_clean}/{accession_num}-index.htm"
    )

    async with httpx.AsyncClient(headers=EDGAR_HEADERS) as client:
        r = await client.get(index_url, timeout=20)
        soup = BeautifulSoup(r.text, "lxml")

        # Find the primary document link
        doc_link = soup.select_one("table.tableFile2 a")
        if not doc_link:
            return ""

        doc_url = f"https://www.sec.gov{doc_link['href']}"
        doc_r   = await client.get(doc_url, timeout=30)
        return doc_r.text


async def main_edgar():
    filings = await get_company_filings("AAPL")

    print(f"\n── {filings['company']} SEC Filings ──")
    print(f"10-K (Annual): {len(filings['10k_filings'])} filings")
    print(f"10-Q (Quarterly): {len(filings['10q_filings'])} filings")
    print(f"8-K (Events): {len(filings['8k_filings'])} filings")

    print("\nMost recent 10-K filings:")
    print(filings["10k_filings"].head(3).to_string(index=False))

asyncio.run(main_edgar())

Part 6: Building a Portfolio Tracker

Combine everything into a live portfolio monitoring dashboard:

import asyncio
import pandas as pd
from datetime import datetime, timezone
import yfinance as yf

PORTFOLIO = {
    # ticker: (shares, avg_buy_price)
    "AAPL":        (10,  175.50),
    "MSFT":        (5,   380.00),
    "NVDA":        (3,   450.00),
    "RELIANCE.NS": (50,  2800.00),
}

async def get_current_prices(tickers: list[str]) -> dict:
    """Get current prices for all portfolio tickers."""
    prices = {}
    for ticker in tickers:
        try:
            stock = yf.Ticker(ticker)
            info  = stock.fast_info
            prices[ticker] = {
                "price":       info.last_price,
                "prev_close":  info.previous_close,
                "currency":    info.currency,
            }
        except Exception:
            prices[ticker] = {"price": None, "prev_close": None, "currency": "USD"}
    return prices


def build_portfolio_report(portfolio: dict, current_prices: dict) -> pd.DataFrame:
    """Generate a portfolio performance report."""
    rows = []
    for ticker, (shares, avg_price) in portfolio.items():
        price_data   = current_prices.get(ticker, {})
        current      = price_data.get("price") or 0
        prev_close   = price_data.get("prev_close") or 0
        currency     = price_data.get("currency", "USD")

        cost_basis   = shares * avg_price
        market_value = shares * current
        gain_loss    = market_value - cost_basis
        gain_pct     = (gain_loss / cost_basis * 100) if cost_basis else 0
        day_change   = ((current - prev_close) / prev_close * 100) if prev_close else 0

        rows.append({
            "ticker":       ticker,
            "shares":       shares,
            "avg_price":    avg_price,
            "current":      round(current, 2),
            "cost_basis":   round(cost_basis, 2),
            "market_value": round(market_value, 2),
            "gain_loss":    round(gain_loss, 2),
            "gain_pct":     round(gain_pct, 2),
            "day_change_%": round(day_change, 2),
            "currency":     currency,
        })

    df = pd.DataFrame(rows)

    total_cost   = df["cost_basis"].sum()
    total_value  = df["market_value"].sum()
    total_gain   = df["gain_loss"].sum()
    total_return = (total_gain / total_cost * 100) if total_cost else 0

    print(f"\n{'═'*60}")
    print(f"  PORTFOLIO REPORT — {datetime.now().strftime('%Y-%m-%d %H:%M')}")
    print(f"{'═'*60}")
    print(df.to_string(index=False))
    print(f"\n  Total invested:  ${total_cost:>10,.2f}")
    print(f"  Market value:    ${total_value:>10,.2f}")
    print(f"  Total P&L:       ${total_gain:>10,.2f} ({total_return:+.2f}%)")
    print(f"{'═'*60}")

    return df


async def run_portfolio():
    tickers       = list(PORTFOLIO.keys())
    current       = await get_current_prices(tickers)
    report        = build_portfolio_report(PORTFOLIO, current)
    report.to_csv("portfolio_report.csv", index=False)

asyncio.run(run_portfolio())

Part 7: Financial Data Pipeline with PostgreSQL

For production-grade financial data storage with proper indexing:

from sqlalchemy import create_engine, text
import pandas as pd

ENGINE = create_engine("postgresql://user:password@localhost:5432/findata")

def init_financial_schema():
    """Create tables optimised for time-series financial data."""
    with ENGINE.connect() as conn:
        conn.execute(text("""
            CREATE TABLE IF NOT EXISTS prices (
                id         BIGSERIAL PRIMARY KEY,
                ticker     TEXT NOT NULL,
                date       DATE NOT NULL,
                open       NUMERIC(12, 4),
                high       NUMERIC(12, 4),
                low        NUMERIC(12, 4),
                close      NUMERIC(12, 4),
                adj_close  NUMERIC(12, 4),
                volume     BIGINT,
                source     TEXT DEFAULT 'yfinance',
                UNIQUE (ticker, date)
            );

            CREATE INDEX IF NOT EXISTS idx_prices_ticker_date
                ON prices (ticker, date DESC);

            CREATE TABLE IF NOT EXISTS fundamentals (
                id           BIGSERIAL PRIMARY KEY,
                ticker       TEXT NOT NULL,
                scraped_at   TIMESTAMPTZ DEFAULT NOW(),
                market_cap   BIGINT,
                pe_ratio     NUMERIC(10, 2),
                eps          NUMERIC(10, 4),
                revenue      BIGINT,
                net_margin   NUMERIC(8, 4),
                data         JSONB
            );
        """))
        conn.commit()

def upsert_prices(df: pd.DataFrame):
    """Upsert price history — insert new, update existing."""
    df.to_sql("prices_staging", ENGINE, if_exists="replace", index=False)
    with ENGINE.connect() as conn:
        conn.execute(text("""
            INSERT INTO prices (ticker, date, open, high, low, close, volume)
            SELECT ticker, date, open, high, low, close, volume
            FROM prices_staging
            ON CONFLICT (ticker, date) DO UPDATE SET
                close  = EXCLUDED.close,
                volume = EXCLUDED.volume
        """))
        conn.commit()
    print(f"Upserted {len(df)} price records.")

Choosing Your Data Source

Source

Cost

Reliability

Best for

yfinance

Free

Medium (scraping-based)

Prototyping, personal projects

Alpha Vantage

Free / $50+/mo

High

Production stocks + forex + crypto

Polygon.io

Free / $29+/mo

Very High

Real-time, high-frequency data

Finnhub

Free / paid

High

News sentiment + fundamentals

CoinGecko

Free / paid

High

Crypto data at any scale

SEC EDGAR

Free (official)

Very High

Regulatory filings (US stocks)

Custom scraper

Free (infra cost)

Low-Medium

Earnings calendars, news, niche data


FAQ

Q: Is yfinance reliable enough for backtesting trading strategies? For historical data exploration, yes. For production trading decisions, no — the scraping-based approach means occasional data gaps, split-adjustment errors, and outages. Use Polygon.io or a proper data vendor for anything where data quality directly affects money.

Q: What's the best free crypto data source in 2026? CoinGecko's free API covers 10,000+ coins with current prices, market caps, historical OHLCV, and DeFi data at 30 requests/minute. No API key needed for basic usage.

Q: Can I use SEC EDGAR data commercially? Yes — SEC EDGAR data is explicitly in the public domain and freely available for commercial use. The only requirement is identifying your User-Agent with contact information in the request header.

Q: How do I get Indian stock market data (NSE/BSE)? Append .NS for NSE or .BO for BSE to ticker symbols in yfinance (e.g. RELIANCE.NS). For a proper Indian market API, NSE India has an unofficial endpoint, and vendors like Zerodha's Kite Connect offer official API access.


Summary

Task

Tool

Notes

Quick stock history

yfinance

Free, but can break

Production stock data

Alpha Vantage / Polygon.io

Reliable, free tiers available

Crypto prices

CoinGecko API

Free, 10k+ coins

Financial news

Custom scraper

No API covers live news fully

Earnings calendar

Custom scraper

Earnings Whispers, Zacks

SEC filings

EDGAR JSON API

Official, free, legally clear

Portfolio tracking

yfinance + pandas

Combine with any price source

Z

ZyVOP

Passionate developer sharing knowledge about modern web technologies and best practices.

Comments (0)

Login to post a comment.

Table of Contents

Why Financial Data Is the Most Valuable Data You Can ScrapePart 1: yfinance — Still Useful, But Know Its LimitsStock price historyFundamental data: financials, balance sheet, dividendsOptions chain dataPart 2: Alpha Vantage — Free API with Real GuaranteesPart 3: Scraping Financial News and Earnings CalendarsReuters / Financial news scraperScraping the Earnings CalendarPart 4: Crypto Data — Free, Real-Time, No API KeyPart 5: SEC EDGAR — Scraping Regulatory FilingsPart 6: Building a Portfolio TrackerPart 7: Financial Data Pipeline with PostgreSQLChoosing Your Data SourceFAQSummary

Stay Updated

Get the latest articles delivered to your inbox.

We respect your privacy. Unsubscribe anytime.

Popular Tags

#.env.example Node.js#0x profiling#10x faster python scraper tutorial#12-factor#2026#AI#AI agents#AI code security#AI coding#AI coding tools 2026