Scraping Financial Data with Python: Stocks, Crypto & Market Intelligence (2026)
Build a complete Python financial data pipeline in 2026 — scrape stock prices, crypto data, earnings, and news using yfinance, Alpha Vantage, Polygon.io, and custom scrapers.
Senior Developer

Why Financial Data Is the Most Valuable Data You Can Scrape
Money moves on information. Stock prices react to earnings reports in milliseconds. Crypto markets swing violently on a single tweet. Interest rate decisions cascade through every asset class within seconds of announcement.
The developers and analysts who can collect, clean, and process financial data faster than others have a structural edge — whether they're building algorithmic trading strategies, financial dashboards, portfolio trackers, market research tools, or AI-powered investment assistants.
In 2026, the Python financial data ecosystem is rich but fragmented. The landscape looks like this:
yfinance remains the most popular free option, but relying on unofficial web scraping means it's prone to rate limits, blocked IPs, and broken requests — it broke significantly after Yahoo Finance's February 2025 redesign
Official financial APIs (Alpha Vantage, Polygon.io, Finnhub) offer reliability and proper data guarantees but cost money beyond their free tiers
Custom scrapers targeting financial news sites, SEC filings, and earnings call transcripts fill gaps that no API covers
This guide gives you the complete playbook: free tools for prototyping, production-grade APIs for serious work, and custom scrapers for the data nobody else is packaging.
Part 1: yfinance — Still Useful, But Know Its Limits
yfinance covers prices, financials, analyst estimates, options chains, and institutional holdings with no signup and no API key. It's free, with a comprehensive dataset going back 20+ years and an active community with 10,000+ GitHub stars.
The tradeoff is fragility — it's web scraping, not a true API. Yahoo can change endpoints without notice, and there is no SLA, support, or data accuracy guarantee.
Use yfinance for: prototyping, personal projects, learning, backtesting on historical data. Avoid it for: production trading systems, real-time alerts, mission-critical analysis.
pip install yfinance pandas matplotlib
Stock price history
import yfinance as yf
import pandas as pd
import matplotlib.pyplot as plt
def get_stock_history(
ticker: str,
period: str = "1y", # Valid: 1d, 5d, 1mo, 3mo, 6mo, 1y, 2y, 5y, 10y, ytd, max
interval: str = "1d" # Valid: 1m, 2m, 5m, 15m, 30m, 60m, 90m, 1h, 1d, 5d, 1wk, 1mo
) -> pd.DataFrame:
"""
Download historical OHLCV data for a stock ticker.
Returns a DataFrame with columns:
Open, High, Low, Close, Adj Close, Volume
"""
stock = yf.Ticker(ticker)
df = stock.history(period=period, interval=interval)
if df.empty:
raise ValueError(f"No data returned for {ticker} — ticker may be invalid")
# Clean up the index
df.index = pd.to_datetime(df.index).tz_localize(None)
df.index.name = "date"
# Calculate daily return
df["daily_return_pct"] = df["Close"].pct_change() * 100
df["ticker"] = ticker
return df
# Example: download 1 year of data for multiple tickers
tickers = ["AAPL", "MSFT", "GOOGL", "NVDA", "RELIANCE.NS"] # .NS for NSE India
dfs = {}
for ticker in tickers:
try:
dfs[ticker] = get_stock_history(ticker, period="1y")
print(f" {ticker}: {len(dfs[ticker])} trading days")
except Exception as e:
print(f" {ticker}: Error — {e}")
# Combine all into one DataFrame
combined = pd.concat(dfs.values())
combined.to_csv("stock_history.csv")Fundamental data: financials, balance sheet, dividends
def get_fundamentals(ticker: str) -> dict:
"""
Collect comprehensive fundamental data for a stock.
Returns income statement, balance sheet, cash flow, and key metrics.
"""
stock = yf.Ticker(ticker)
info = stock.info
return {
"overview": {
"name": info.get("longName"),
"sector": info.get("sector"),
"industry": info.get("industry"),
"country": info.get("country"),
"employees": info.get("fullTimeEmployees"),
"website": info.get("website"),
"description": (info.get("longBusinessSummary") or "")[:300],
},
"valuation": {
"market_cap": info.get("marketCap"),
"pe_ratio": info.get("trailingPE"),
"forward_pe": info.get("forwardPE"),
"peg_ratio": info.get("pegRatio"),
"price_to_book": info.get("priceToBook"),
"ev_to_ebitda": info.get("enterpriseToEbitda"),
},
"profitability": {
"revenue_ttm": info.get("totalRevenue"),
"gross_margin": info.get("grossMargins"),
"operating_margin": info.get("operatingMargins"),
"net_margin": info.get("profitMargins"),
"roe": info.get("returnOnEquity"),
"roa": info.get("returnOnAssets"),
},
"dividends": {
"dividend_yield": info.get("dividendYield"),
"dividend_rate": info.get("dividendRate"),
"payout_ratio": info.get("payoutRatio"),
"ex_dividend_date": info.get("exDividendDate"),
},
"analyst": {
"recommendation": info.get("recommendationKey"),
"mean_target": info.get("targetMeanPrice"),
"high_target": info.get("targetHighPrice"),
"low_target": info.get("targetLowPrice"),
"num_analysts": info.get("numberOfAnalystOpinions"),
},
"earnings": {
"eps_trailing": info.get("trailingEps"),
"eps_forward": info.get("forwardEps"),
"earnings_growth": info.get("earningsGrowth"),
"revenue_growth": info.get("revenueGrowth"),
}
}
# Analyse a stock
data = get_fundamentals("AAPL")
print(f"\n── {data['overview']['name']} ──")
print(f"Sector: {data['overview']['sector']}")
print(f"Market Cap: ${data['valuation']['market_cap']:,.0f}")
print(f"P/E Ratio: {data['valuation']['pe_ratio']}")
print(f"Net Margin: {data['profitability']['net_margin']:.1%}")
print(f"Analyst: {data['analyst']['recommendation'].upper()}")
print(f"Target Price: ${data['analyst']['mean_target']}")Options chain data
def get_options_chain(ticker: str, expiry: str = None) -> dict:
"""
Fetch options chain (calls and puts) for a stock.
Args:
ticker: Stock ticker symbol
expiry: Expiration date string "YYYY-MM-DD". If None, uses nearest expiry.
"""
stock = yf.Ticker(ticker)
expiries = stock.options
if not expiries:
return {"error": f"No options data for {ticker}"}
target_expiry = expiry if expiry in expiries else expiries[0]
print(f"Using expiry: {target_expiry}")
chain = stock.option_chain(target_expiry)
return {
"expiry": target_expiry,
"calls": chain.calls,
"puts": chain.puts,
"all_expiries": list(expiries),
}
options = get_options_chain("AAPL")
calls_df = options["calls"]
# Find near-the-money calls
current_price = yf.Ticker("AAPL").info.get("regularMarketPrice", 0)
near_money = calls_df[
(calls_df["strike"] >= current_price * 0.95) &
(calls_df["strike"] <= current_price * 1.05)
]
print(near_money[["strike", "lastPrice", "bid", "ask", "impliedVolatility", "openInterest"]])Part 2: Alpha Vantage — Free API with Real Guarantees
Alpha Vantage is a popular free and paid API that provides stock, forex, and cryptocurrency data. Unlike yfinance, it's a proper API with official endpoints, a free tier of 25 requests/day (500/day with a free key), and no risk of breaking when a website redesigns.
pip install requests pandas
# Get a free API key at: alphavantage.co/support/#api-keyimport httpx
import pandas as pd
import asyncio
from datetime import datetime
ALPHA_VANTAGE_KEY = "YOUR_FREE_API_KEY"
BASE_URL = "https://www.alphavantage.co/query"
async def av_request(params: dict) -> dict:
"""Make a request to Alpha Vantage API."""
params["apikey"] = ALPHA_VANTAGE_KEY
async with httpx.AsyncClient() as client:
r = await client.get(BASE_URL, params=params, timeout=20)
r.raise_for_status()
data = r.json()
# Check for API error messages
if "Note" in data:
raise Exception(f"Rate limit hit: {data['Note']}")
if "Error Message" in data:
raise Exception(f"API error: {data['Error Message']}")
return data
async def get_daily_prices(ticker: str, outputsize: str = "compact") -> pd.DataFrame:
"""
Get daily OHLCV data.
outputsize: "compact" = last 100 days, "full" = 20+ years
"""
data = await av_request({
"function": "TIME_SERIES_DAILY",
"symbol": ticker,
"outputsize": outputsize,
})
ts = data.get("Time Series (Daily)", {})
rows = []
for date_str, values in ts.items():
rows.append({
"date": date_str,
"open": float(values["1. open"]),
"high": float(values["2. high"]),
"low": float(values["3. low"]),
"close": float(values["4. close"]),
"volume": int(values["5. volume"]),
})
df = pd.DataFrame(rows)
df["date"] = pd.to_datetime(df["date"])
df = df.sort_values("date").reset_index(drop=True)
df["ticker"] = ticker
return df
async def get_company_overview(ticker: str) -> dict:
"""Get fundamental company data from Alpha Vantage."""
data = await av_request({"function": "OVERVIEW", "symbol": ticker})
return {
"name": data.get("Name"),
"description": (data.get("Description") or "")[:300],
"sector": data.get("Sector"),
"industry": data.get("Industry"),
"market_cap": data.get("MarketCapitalization"),
"pe_ratio": data.get("PERatio"),
"eps": data.get("EPS"),
"dividend_yield": data.get("DividendYield"),
"52_week_high": data.get("52WeekHigh"),
"52_week_low": data.get("52WeekLow"),
"analyst_target": data.get("AnalystTargetPrice"),
}
async def get_forex_rate(from_currency: str, to_currency: str) -> dict:
"""Get real-time forex exchange rate."""
data = await av_request({
"function": "CURRENCY_EXCHANGE_RATE",
"from_currency": from_currency,
"to_currency": to_currency,
})
rate_data = data.get("Realtime Currency Exchange Rate", {})
return {
"from": rate_data.get("1. From_Currency Code"),
"to": rate_data.get("3. To_Currency Code"),
"rate": float(rate_data.get("5. Exchange Rate", 0)),
"last_updated": rate_data.get("6. Last Refreshed"),
}
async def get_crypto_price(symbol: str, market: str = "USD") -> pd.DataFrame:
"""Get daily crypto price data."""
data = await av_request({
"function": "DIGITAL_CURRENCY_DAILY",
"symbol": symbol,
"market": market,
})
ts = data.get("Time Series (Digital Currency Daily)", {})
rows = []
for date_str, values in list(ts.items())[:30]: # Last 30 days
rows.append({
"date": date_str,
"open": float(values.get(f"1a. open ({market})", 0)),
"high": float(values.get(f"2a. high ({market})", 0)),
"low": float(values.get(f"3a. low ({market})", 0)),
"close": float(values.get(f"4a. close ({market})", 0)),
"volume": float(values.get("5. volume", 0)),
})
df = pd.DataFrame(rows)
df["date"] = pd.to_datetime(df["date"])
return df.sort_values("date").reset_index(drop=True)
# Run examples
async def main():
print("── Stock prices (AAPL, last 100 days) ──")
df = await get_daily_prices("AAPL")
print(df.tail(5))
print("\n── Company overview ──")
overview = await get_company_overview("AAPL")
for k, v in overview.items():
print(f" {k}: {v}")
print("\n── USD/INR forex rate ──")
rate = await get_forex_rate("USD", "INR")
print(f" 1 {rate['from']} = {rate['rate']:.2f} {rate['to']}")
print(f" Last updated: {rate['last_updated']}")
print("\n── Bitcoin (last 30 days) ──")
btc = await get_crypto_price("BTC", "USD")
print(btc.tail(5))
asyncio.run(main())Part 3: Scraping Financial News and Earnings Calendars
No API packages the real-time financial news that moves markets. Custom scrapers fill that gap.
Reuters / Financial news scraper
import httpx
import asyncio
from bs4 import BeautifulSoup
from curl_cffi.requests import AsyncSession
import pandas as pd
from datetime import datetime, timezone
async def scrape_financial_news(
query: str,
sources: list[str] = None,
max_articles: int = 20
) -> pd.DataFrame:
"""
Scrape financial news headlines from multiple sources.
"""
if sources is None:
sources = [
f"https://finance.yahoo.com/search?q={query.replace(' ', '+')}",
f"https://finviz.com/search.ashx?p={query.replace(' ', '+')}",
]
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120",
"Accept-Language": "en-US,en;q=0.9",
}
articles = []
async with AsyncSession(impersonate="chrome120") as session:
for url in sources:
try:
r = await session.get(url, headers=headers, timeout=15)
soup = BeautifulSoup(r.text, "lxml")
# Generic news article extraction — adapt selectors per site
# Yahoo Finance news items
for item in soup.select("li.js-stream-content, div[data-test='LATEST_NEWS']"):
title_el = item.select_one("h3, a[data-ylk]")
link_el = item.select_one("a[href]")
time_el = item.select_one("time, span[data-reactid]")
if title_el:
articles.append({
"title": title_el.get_text(strip=True),
"url": link_el.get("href", "") if link_el else "",
"source": url,
"scraped_at": datetime.now(timezone.utc).isoformat(),
})
await asyncio.sleep(1.5)
except Exception as e:
print(f" Error on {url}: {e}")
df = pd.DataFrame(articles[:max_articles]).drop_duplicates(subset=["title"])
return df
# Scrape news for a ticker
news_df = asyncio.run(scrape_financial_news("NVIDIA earnings 2026"))
print(news_df[["title", "scraped_at"]].head(10))
news_df.to_csv("financial_news.csv", index=False)Scraping the Earnings Calendar
Earnings reports are the most market-moving events in equities. Knowing what's coming lets you position before the move.
async def get_earnings_calendar(week_offset: int = 0) -> pd.DataFrame:
"""
Scrape the upcoming earnings calendar from Earnings Whispers.
Returns companies reporting earnings in the target week.
"""
# Earnings Whispers has a well-structured earnings calendar
url = "https://www.earningswhispers.com/stocks"
async with AsyncSession(impersonate="chrome120") as session:
r = await session.get(
url,
headers={"User-Agent": "Mozilla/5.0 Chrome/120", "Accept-Language": "en-US"},
timeout=20
)
soup = BeautifulSoup(r.text, "lxml")
earnings = []
# Each stock in the earnings calendar
for item in soup.select(".earnings-estimates"):
ticker_el = item.select_one(".ticker")
name_el = item.select_one(".company")
date_el = item.select_one(".date")
time_el = item.select_one(".time") # BMO = Before Market Open, AMC = After
eps_el = item.select_one(".estimate")
if ticker_el:
earnings.append({
"ticker": ticker_el.get_text(strip=True),
"company": name_el.get_text(strip=True) if name_el else None,
"report_date": date_el.get_text(strip=True) if date_el else None,
"report_time": time_el.get_text(strip=True) if time_el else None,
"eps_estimate": eps_el.get_text(strip=True) if eps_el else None,
})
return pd.DataFrame(earnings)
calendar = asyncio.run(get_earnings_calendar())
print(f"Found {len(calendar)} earnings reports this week")
print(calendar.head(10).to_string(index=False))Part 4: Crypto Data — Free, Real-Time, No API Key
CoinGecko offers a generous free API that covers 10,000+ cryptocurrencies with real-time prices, market caps, historical data, and DeFi protocol data:
import httpx
import asyncio
import pandas as pd
COINGECKO_BASE = "https://api.coingecko.com/api/v3"
async def get_crypto_prices(
coin_ids: list[str],
vs_currency: str = "usd"
) -> pd.DataFrame:
"""
Get current prices for multiple cryptocurrencies.
No API key required on free tier (30 req/min).
Common coin IDs: bitcoin, ethereum, solana, cardano, ripple
Find IDs at: coingecko.com/en/all-cryptocurrencies
"""
ids_str = ",".join(coin_ids)
async with httpx.AsyncClient() as client:
r = await client.get(
f"{COINGECKO_BASE}/simple/price",
params={
"ids": ids_str,
"vs_currencies": vs_currency,
"include_24hr_change": "true",
"include_24hr_vol": "true",
"include_market_cap": "true",
"include_last_updated_at":"true",
},
timeout=15
)
r.raise_for_status()
data = r.json()
rows = []
for coin_id, values in data.items():
rows.append({
"coin_id": coin_id,
"price_usd": values.get(vs_currency),
"market_cap": values.get(f"{vs_currency}_market_cap"),
"volume_24h": values.get(f"{vs_currency}_24h_vol"),
"change_24h_pct": values.get(f"{vs_currency}_24h_change"),
"last_updated": pd.to_datetime(
values.get("last_updated_at"), unit="s"
).isoformat() if values.get("last_updated_at") else None,
})
return pd.DataFrame(rows).sort_values("market_cap", ascending=False)
async def get_crypto_history(
coin_id: str,
days: int = 30,
vs_currency: str = "usd"
) -> pd.DataFrame:
"""
Get historical OHLCV data for a cryptocurrency.
Free up to 365 days of daily candles.
"""
async with httpx.AsyncClient() as client:
r = await client.get(
f"{COINGECKO_BASE}/coins/{coin_id}/ohlc",
params={"vs_currency": vs_currency, "days": days},
timeout=15
)
r.raise_for_status()
data = r.json() # [[timestamp_ms, open, high, low, close], ...]
df = pd.DataFrame(data, columns=["timestamp_ms", "open", "high", "low", "close"])
df["date"] = pd.to_datetime(df["timestamp_ms"], unit="ms")
df["coin"] = coin_id
return df[["date", "coin", "open", "high", "low", "close"]]
async def get_defi_protocols() -> pd.DataFrame:
"""Get top DeFi protocols by Total Value Locked (TVL)."""
async with httpx.AsyncClient() as client:
r = await client.get(
f"{COINGECKO_BASE}/defi",
timeout=15
)
data = r.json()
return pd.DataFrame([{
"name": data.get("defi_market_cap"),
"defi_market_cap": data.get("defi_market_cap"),
"eth_market_cap": data.get("eth_market_cap"),
"defi_dominance": data.get("defi_dominance"),
"top_coin": data.get("top_coin_name"),
"top_coin_dominance":data.get("top_coin_defi_dominance"),
}])
async def crypto_dashboard():
"""Build a live crypto dashboard."""
print("── Fetching crypto prices ──")
prices_df = await get_crypto_prices(
["bitcoin", "ethereum", "solana", "cardano", "chainlink", "polkadot"]
)
print(prices_df[["coin_id", "price_usd", "change_24h_pct", "market_cap"]].to_string(index=False))
print("\n── Bitcoin 30-day history ──")
btc_history = await get_crypto_history("bitcoin", days=30)
print(btc_history.tail(5))
btc_high = btc_history["high"].max()
btc_low = btc_history["low"].min()
print(f"\n30-day range: ${btc_low:,.2f} – ${btc_high:,.2f}")
asyncio.run(crypto_dashboard())Part 5: SEC EDGAR — Scraping Regulatory Filings
The U.S. SEC's EDGAR database is the most legally unambiguous financial data source — it's explicitly public, free, and comprehensive. Every public company's 10-K (annual), 10-Q (quarterly), 8-K (events), and proxy filings live here.
import httpx
import asyncio
import json
import pandas as pd
EDGAR_BASE = "https://data.sec.gov"
EDGAR_API = "https://efts.sec.gov/LATEST/search-index"
# SEC requires a descriptive User-Agent with contact info
EDGAR_HEADERS = {
"User-Agent": "FinancialResearchBot/1.0 (contact@yourdomain.com)",
"Accept-Encoding": "gzip, deflate",
}
async def get_company_filings(ticker: str) -> dict:
"""
Get all SEC filings for a company by ticker.
Uses SEC EDGAR's official JSON API.
"""
# Step 1: Look up CIK number from ticker
async with httpx.AsyncClient(headers=EDGAR_HEADERS) as client:
r = await client.get(
"https://www.sec.gov/files/company_tickers.json",
timeout=15
)
tickers_data = r.json()
# Find CIK for this ticker
cik = None
for entry in tickers_data.values():
if entry.get("ticker", "").upper() == ticker.upper():
cik = str(entry["cik_str"]).zfill(10)
company_name = entry.get("title")
break
if not cik:
raise ValueError(f"Ticker {ticker} not found in EDGAR")
print(f"Found {ticker}: {company_name} (CIK: {cik})")
# Step 2: Get all filings for this CIK
async with httpx.AsyncClient(headers=EDGAR_HEADERS) as client:
r = await client.get(
f"{EDGAR_BASE}/submissions/CIK{cik}.json",
timeout=15
)
data = r.json()
recent = data.get("filings", {}).get("recent", {})
filings_df = pd.DataFrame({
"form": recent.get("form", []),
"filed_at": recent.get("filingDate", []),
"report_date": recent.get("reportDate", []),
"accession_num": recent.get("accessionNumber", []),
"description": recent.get("primaryDocument", []),
})
return {
"cik": cik,
"company": company_name,
"all_filings": filings_df,
"10k_filings": filings_df[filings_df["form"] == "10-K"],
"10q_filings": filings_df[filings_df["form"] == "10-Q"],
"8k_filings": filings_df[filings_df["form"] == "8-K"],
}
async def download_filing_text(cik: str, accession_num: str) -> str:
"""Download the full text of a specific SEC filing."""
# Reformat accession number for URL: remove dashes
acc_clean = accession_num.replace("-", "")
cik_clean = cik.lstrip("0")
index_url = (
f"https://www.sec.gov/Archives/edgar/data/"
f"{cik_clean}/{acc_clean}/{accession_num}-index.htm"
)
async with httpx.AsyncClient(headers=EDGAR_HEADERS) as client:
r = await client.get(index_url, timeout=20)
soup = BeautifulSoup(r.text, "lxml")
# Find the primary document link
doc_link = soup.select_one("table.tableFile2 a")
if not doc_link:
return ""
doc_url = f"https://www.sec.gov{doc_link['href']}"
doc_r = await client.get(doc_url, timeout=30)
return doc_r.text
async def main_edgar():
filings = await get_company_filings("AAPL")
print(f"\n── {filings['company']} SEC Filings ──")
print(f"10-K (Annual): {len(filings['10k_filings'])} filings")
print(f"10-Q (Quarterly): {len(filings['10q_filings'])} filings")
print(f"8-K (Events): {len(filings['8k_filings'])} filings")
print("\nMost recent 10-K filings:")
print(filings["10k_filings"].head(3).to_string(index=False))
asyncio.run(main_edgar())Part 6: Building a Portfolio Tracker
Combine everything into a live portfolio monitoring dashboard:
import asyncio
import pandas as pd
from datetime import datetime, timezone
import yfinance as yf
PORTFOLIO = {
# ticker: (shares, avg_buy_price)
"AAPL": (10, 175.50),
"MSFT": (5, 380.00),
"NVDA": (3, 450.00),
"RELIANCE.NS": (50, 2800.00),
}
async def get_current_prices(tickers: list[str]) -> dict:
"""Get current prices for all portfolio tickers."""
prices = {}
for ticker in tickers:
try:
stock = yf.Ticker(ticker)
info = stock.fast_info
prices[ticker] = {
"price": info.last_price,
"prev_close": info.previous_close,
"currency": info.currency,
}
except Exception:
prices[ticker] = {"price": None, "prev_close": None, "currency": "USD"}
return prices
def build_portfolio_report(portfolio: dict, current_prices: dict) -> pd.DataFrame:
"""Generate a portfolio performance report."""
rows = []
for ticker, (shares, avg_price) in portfolio.items():
price_data = current_prices.get(ticker, {})
current = price_data.get("price") or 0
prev_close = price_data.get("prev_close") or 0
currency = price_data.get("currency", "USD")
cost_basis = shares * avg_price
market_value = shares * current
gain_loss = market_value - cost_basis
gain_pct = (gain_loss / cost_basis * 100) if cost_basis else 0
day_change = ((current - prev_close) / prev_close * 100) if prev_close else 0
rows.append({
"ticker": ticker,
"shares": shares,
"avg_price": avg_price,
"current": round(current, 2),
"cost_basis": round(cost_basis, 2),
"market_value": round(market_value, 2),
"gain_loss": round(gain_loss, 2),
"gain_pct": round(gain_pct, 2),
"day_change_%": round(day_change, 2),
"currency": currency,
})
df = pd.DataFrame(rows)
total_cost = df["cost_basis"].sum()
total_value = df["market_value"].sum()
total_gain = df["gain_loss"].sum()
total_return = (total_gain / total_cost * 100) if total_cost else 0
print(f"\n{'═'*60}")
print(f" PORTFOLIO REPORT — {datetime.now().strftime('%Y-%m-%d %H:%M')}")
print(f"{'═'*60}")
print(df.to_string(index=False))
print(f"\n Total invested: ${total_cost:>10,.2f}")
print(f" Market value: ${total_value:>10,.2f}")
print(f" Total P&L: ${total_gain:>10,.2f} ({total_return:+.2f}%)")
print(f"{'═'*60}")
return df
async def run_portfolio():
tickers = list(PORTFOLIO.keys())
current = await get_current_prices(tickers)
report = build_portfolio_report(PORTFOLIO, current)
report.to_csv("portfolio_report.csv", index=False)
asyncio.run(run_portfolio())Part 7: Financial Data Pipeline with PostgreSQL
For production-grade financial data storage with proper indexing:
from sqlalchemy import create_engine, text
import pandas as pd
ENGINE = create_engine("postgresql://user:password@localhost:5432/findata")
def init_financial_schema():
"""Create tables optimised for time-series financial data."""
with ENGINE.connect() as conn:
conn.execute(text("""
CREATE TABLE IF NOT EXISTS prices (
id BIGSERIAL PRIMARY KEY,
ticker TEXT NOT NULL,
date DATE NOT NULL,
open NUMERIC(12, 4),
high NUMERIC(12, 4),
low NUMERIC(12, 4),
close NUMERIC(12, 4),
adj_close NUMERIC(12, 4),
volume BIGINT,
source TEXT DEFAULT 'yfinance',
UNIQUE (ticker, date)
);
CREATE INDEX IF NOT EXISTS idx_prices_ticker_date
ON prices (ticker, date DESC);
CREATE TABLE IF NOT EXISTS fundamentals (
id BIGSERIAL PRIMARY KEY,
ticker TEXT NOT NULL,
scraped_at TIMESTAMPTZ DEFAULT NOW(),
market_cap BIGINT,
pe_ratio NUMERIC(10, 2),
eps NUMERIC(10, 4),
revenue BIGINT,
net_margin NUMERIC(8, 4),
data JSONB
);
"""))
conn.commit()
def upsert_prices(df: pd.DataFrame):
"""Upsert price history — insert new, update existing."""
df.to_sql("prices_staging", ENGINE, if_exists="replace", index=False)
with ENGINE.connect() as conn:
conn.execute(text("""
INSERT INTO prices (ticker, date, open, high, low, close, volume)
SELECT ticker, date, open, high, low, close, volume
FROM prices_staging
ON CONFLICT (ticker, date) DO UPDATE SET
close = EXCLUDED.close,
volume = EXCLUDED.volume
"""))
conn.commit()
print(f"Upserted {len(df)} price records.")Choosing Your Data Source
Source | Cost | Reliability | Best for |
|---|---|---|---|
yfinance | Free | Medium (scraping-based) | Prototyping, personal projects |
Alpha Vantage | Free / $50+/mo | High | Production stocks + forex + crypto |
Polygon.io | Free / $29+/mo | Very High | Real-time, high-frequency data |
Finnhub | Free / paid | High | News sentiment + fundamentals |
CoinGecko | Free / paid | High | Crypto data at any scale |
SEC EDGAR | Free (official) | Very High | Regulatory filings (US stocks) |
Custom scraper | Free (infra cost) | Low-Medium | Earnings calendars, news, niche data |
FAQ
Q: Is yfinance reliable enough for backtesting trading strategies? For historical data exploration, yes. For production trading decisions, no — the scraping-based approach means occasional data gaps, split-adjustment errors, and outages. Use Polygon.io or a proper data vendor for anything where data quality directly affects money.
Q: What's the best free crypto data source in 2026? CoinGecko's free API covers 10,000+ coins with current prices, market caps, historical OHLCV, and DeFi data at 30 requests/minute. No API key needed for basic usage.
Q: Can I use SEC EDGAR data commercially? Yes — SEC EDGAR data is explicitly in the public domain and freely available for commercial use. The only requirement is identifying your User-Agent with contact information in the request header.
Q: How do I get Indian stock market data (NSE/BSE)? Append .NS for NSE or .BO for BSE to ticker symbols in yfinance (e.g. RELIANCE.NS). For a proper Indian market API, NSE India has an unofficial endpoint, and vendors like Zerodha's Kite Connect offer official API access.
Summary
Task | Tool | Notes |
|---|---|---|
Quick stock history |
| Free, but can break |
Production stock data | Alpha Vantage / Polygon.io | Reliable, free tiers available |
Crypto prices | CoinGecko API | Free, 10k+ coins |
Financial news | Custom scraper | No API covers live news fully |
Earnings calendar | Custom scraper | Earnings Whispers, Zacks |
SEC filings | EDGAR JSON API | Official, free, legally clear |
Portfolio tracking |
| Combine with any price source |
Comments (0)
Login to post a comment.