ZyVOP Logo
Content That Connects
SeriesCategoriesTags
ZyVOP Logo
Content That Connects

Empowering developers and creators with cutting-edge insights, comprehensive tutorials, and innovative solutions for the digital future.

Content

  • Tags
  • Write Article

Company

  • About Us
  • Contact

Connect

  • Privacy Policy
  • Terms of Service
  • Cookie Policy
  • DMCA Policy
  • Code of Conduct

© 2026 ZyVOP. Crafted with care for the developer community.

Made with ❤️ by the ZyVOP team
All systems operational
HomeFull-Text Search in PostgreSQL: Skip Elasticsearch for the First 10 Million Rows

Full-Text Search in PostgreSQL: Skip Elasticsearch for the First 10 Million Rows

Build fast, typo-tolerant search in PostgreSQL using full-text search, relevance ranking, autocomplete, and fuzzy matching — without Elasticsearch.

#PostgreSQL full-text search#tsvector tsquery Node.js#Postgres search 2026#pg_trgm fuzzy search#ts_rank Postgres#GIN index full-text search#websearch_to_tsquery#Postgres vs Elasticsearch
Z
ZyVOP

Senior Developer

May 26, 2026
7 min read
4 views
Full-Text Search in PostgreSQL: Skip Elasticsearch for the First 10 Million Rows

The reflex when someone asks for search is to reach for Elasticsearch or Algolia. For a lot of products, that reflex costs you operational complexity, a monthly bill, and a synchronization problem — keeping Elasticsearch in sync with your Postgres database — that you will debug at the worst possible time.

Postgres has a mature, production-capable full-text search engine built in. It handles most product search requirements up to tens of millions of rows, supports ranking, multilingual stemming, fuzzy matching, and prefix search. If your search volume fits those bounds, this is the right tool.

This guide covers everything you need: GIN indexes, tsvector, tsquery, ranking, partial matches, and the query builder that ties it together.


How Postgres Full-Text Search Works

Postgres represents a searchable document as a tsvector — a sorted list of lexemes (normalized word forms) with their positions. A search query is a tsquery — a boolean expression of lexemes.

-- A tsvector normalizes words and strips stop words
SELECT to_tsvector('english', 'The quick brown foxes are jumping');
-- Result: 'brown':3 'fox':4 'jump':6 'quick':2
-- 'The' and 'are' are stop words, removed
-- 'foxes' → 'fox', 'jumping' → 'jump' (stemming)

-- A tsquery is what you search for
SELECT to_tsquery('english', 'foxes & jumping');
-- Result: 'fox' & 'jump'

-- Match check
SELECT to_tsvector('english', 'The quick brown foxes are jumping')
  @@ to_tsquery('english', 'foxes & jumping');
-- Result: true

The @@ operator is the match operator. The @> and <@ operators check containment. This is the foundation of everything that follows.


Setting Up Your Schema

Option 1: Generated Column (Recommended)

Store the tsvector as a generated column — Postgres maintains it automatically on insert and update.

-- Products table with full-text search
ALTER TABLE products
  ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(name, '')),        'A') ||
    setweight(to_tsvector('english', coalesce(description, '')), 'B') ||
    setweight(to_tsvector('english', coalesce(category, '')),    'C')
  ) STORED;

-- GIN index on the generated column — makes search fast
CREATE INDEX CONCURRENTLY idx_products_search
  ON products USING GIN(search_vector);

setweight assigns importance to different fields. 'A' is highest weight, 'D' is lowest. When ranking results, name matches rank higher than description matches — which is the right behavior.

Option 2: Trigger-Maintained Column (For Complex Cases)

If your search vector needs data from joined tables or complex logic, use a trigger:

-- Add a regular tsvector column
ALTER TABLE products ADD COLUMN search_vector tsvector;

-- Function to rebuild the vector
CREATE OR REPLACE FUNCTION update_product_search_vector()
RETURNS TRIGGER AS $$
BEGIN
  NEW.search_vector :=
    setweight(to_tsvector('english', coalesce(NEW.name, '')),        'A') ||
    setweight(to_tsvector('english', coalesce(NEW.sku, '')),         'A') ||
    setweight(to_tsvector('english', coalesce(NEW.description, '')), 'B') ||
    setweight(to_tsvector('english', coalesce(NEW.brand, '')),       'C') ||
    setweight(to_tsvector('english', coalesce(NEW.tags::text, '')),  'C');
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER products_search_vector_update
  BEFORE INSERT OR UPDATE ON products
  FOR EACH ROW EXECUTE FUNCTION update_product_search_vector();

-- Backfill existing rows
UPDATE products SET name = name;

CREATE INDEX CONCURRENTLY idx_products_search ON products USING GIN(search_vector);

Basic Search Query

// src/services/searchService.ts

export async function searchProducts(params: {
  query:    string;
  tenantId: string;
  limit:    number;
  cursor?:  string;
}) {
  const { query, tenantId, limit } = params;

  // Sanitize and format the query
  // websearch_to_tsquery is more forgiving than to_tsquery
  // It handles: "exact phrase", word1 OR word2, -excluded
  // It does NOT throw on malformed input — safe for user input
  const result = await db.query(`
    SELECT
      id, name, description, price, stock,
      ts_rank(search_vector, query)           AS rank,
      ts_headline(
        'english',
        description,
        query,
        'MaxWords=20, MinWords=10, StartSel=<mark>, StopSel=</mark>'
      ) AS excerpt
    FROM
      products,
      websearch_to_tsquery('english', $1) query
    WHERE
      tenant_id    = $2
      AND active   = true
      AND search_vector @@ query
    ORDER BY rank DESC, name ASC
    LIMIT $3
  `, [query, tenantId, limit + 1]);

  return result.rows;
}

websearch_to_tsquery is the right function for user input. Unlike to_tsquery, it does not throw on malformed queries — a user typing a single apostrophe will not crash your endpoint.

ts_headline generates a snippet from the matching document with the search terms highlighted. The StartSel and StopSel parameters let you wrap matches in any HTML tags you want.


Prefix Search (Autocomplete)

Standard tsquery requires complete words. For autocomplete — matching as the user types — use prefix matching:

-- Standard: matches 'run' exactly
to_tsquery('english', 'run')  -- matches: run, runs, running (stemming)

-- Prefix: matches anything starting with 'run'
to_tsquery('english', 'run:*')  -- matches: run, runner, runaway, running
export async function autocomplete(params: {
  query:    string;
  tenantId: string;
  limit?:   number;
}) {
  const { query, tenantId, limit = 10 } = params;

  // Sanitize input — strip special characters that would break tsquery
  const sanitized = query
    .trim()
    .replace(/[^a-zA-Z0-9\s-]/g, '')
    .split(/\s+/)
    .filter(Boolean)
    .join(' & ');

  if (!sanitized) return [];

  // Add :* prefix to the last word for prefix matching
  const tsQuery = sanitized.replace(/(\S+)$/, '$1:*');

  const result = await db.query(`
    SELECT
      id, name, price,
      ts_rank(search_vector, to_tsquery('english', $1)) AS rank
    FROM products
    WHERE
      tenant_id    = $2
      AND active   = true
      AND search_vector @@ to_tsquery('english', $1)
    ORDER BY rank DESC, name ASC
    LIMIT $3
  `, [tsQuery, tenantId, limit]);

  return result.rows;
}

Fuzzy Matching with pg_trgm

Full-text search requires the user to spell words correctly. pg_trgm adds trigram-based fuzzy matching — finding words that are "similar" even with typos.

-- Enable the extension
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Trigram index for fuzzy matching on name
CREATE INDEX CONCURRENTLY idx_products_name_trgm
  ON products USING GIN(name gin_trgm_ops);
// Combine full-text and fuzzy search
export async function searchWithFuzzy(params: {
  query:    string;
  tenantId: string;
  limit:    number;
}) {
  const { query, tenantId, limit } = params;

  const result = await db.query(`
    SELECT
      id, name, description, price,
      -- Score: full-text rank + trigram similarity, combined
      (
        COALESCE(ts_rank(search_vector, websearch_to_tsquery('english', $1)), 0) * 2 +
        similarity(name, $1)
      ) AS combined_score
    FROM products
    WHERE
      tenant_id  = $2
      AND active = true
      AND (
        -- Full-text match OR fuzzy name match
        search_vector @@ websearch_to_tsquery('english', $1)
        OR similarity(name, $1) > 0.3
      )
    ORDER BY combined_score DESC
    LIMIT $3
  `, [query, tenantId, limit]);

  return result.rows;
}

The similarity() function returns a 0–1 score. 0.3 is a reasonable threshold — lower catches more typos but more false positives.


The Complete Search Endpoint

// src/routes/search.ts
import { z } from 'zod';

const SearchSchema = z.object({
  q:      z.string().min(1).max(200).trim(),
  limit:  z.coerce.number().int().min(1).max(50).default(20),
  cursor: z.string().optional(),
  sort:   z.enum(['relevance', 'price_asc', 'price_desc', 'newest']).default('relevance'),
  fuzzy:  z.coerce.boolean().default(false),
});

router.get('/search', authenticate, async (req, res) => {
  const parsed = SearchSchema.safeParse(req.query);
  if (!parsed.success) {
    return res.status(400).json({ error: 'Invalid search parameters' });
  }

  const { q, limit, cursor, sort, fuzzy } = parsed.data;

  // Skip search for very short queries — return empty or popular items
  if (q.length < 2) {
    return res.json({ data: [], pagination: { hasNextPage: false } });
  }

  try {
    const results = fuzzy
      ? await searchWithFuzzy({ query: q, tenantId: req.tenant.id, limit: limit + 1 })
      : await searchProducts({ query: q, tenantId: req.tenant.id, limit: limit + 1, cursor });

    const hasNextPage = results.length > limit;
    if (hasNextPage) results.pop();

    res.json({
      data: results,
      query: q,
      pagination: { hasNextPage },
    });

  } catch (err: any) {
    // If tsquery throws (malformed input that got through sanitization)
    // return empty results rather than a 500
    if (err.code === '42601') {
      return res.json({ data: [], query: q, pagination: { hasNextPage: false } });
    }
    throw err;
  }
});

// Autocomplete endpoint — fast, prefix-based
router.get('/search/suggest', authenticate, async (req, res) => {
  const q = z.string().min(2).max(100).trim().safeParse(req.query.q);
  if (!q.success) return res.json([]);

  const suggestions = await autocomplete({
    query:    q.data,
    tenantId: req.tenant.id,
    limit:    8,
  });

  res.json(suggestions.map(s => ({ id: s.id, name: s.name, price: s.price })));
});

When to Move to Elasticsearch

Postgres full-text search is genuinely capable for most product requirements. Consider moving when:

  • Your search index exceeds ~50 million rows and query time degrades noticeably

  • You need faceted search with real-time facet counts (category filter counts that update as you type)

  • You need multi-language stemming across many languages in the same index

  • You need ML-based semantic/vector search (though Postgres pgvector extension now handles this too)

  • You need search across multiple tables/indices with different schemas simultaneously

For everything else — product search, user search, content search, orders, documents — Postgres handles it, and you get one less system to operate.

Z

ZyVOP

Passionate developer sharing knowledge about modern web technologies and best practices.

Comments (0)

Login to post a comment.

Stay Updated

Get the latest articles delivered to your inbox.

We respect your privacy. Unsubscribe anytime.

Popular Tags

#.env.example Node.js#0x profiling#12-factor#AI agents#AI code security#AI coding tools 2026#AI-assisted development#AI-generated vulnerabilities#ALTER TABLE no lock#API Design