Which topics does this article cover?

It highlights PostgreSQL full-text search, tsvector tsquery Node.js, Postgres search 2026, pg_trgm fuzzy search, ts_rank Postgres.

Full-Text Search in PostgreSQL: Skip Elasticsearch for the First 10 Million Rows

Q: What is "Full-Text Search in PostgreSQL: Skip Elasticsearch for the First 10 Million Rows" about?

A Postgres-powered search system using tsvector, tsquery, GIN indexes, ts_rank relevance scoring, ts_headline highlights, prefix autocomplete, and pg_trgm fuzzy matching for typo tolerance — complete with cursor pagination and hybrid full-text + fuzzy search scoring, without external search infrastructure.

The reflex when someone asks for search is to reach for Elasticsearch or Algolia. For a lot of products, that reflex costs you operational complexity, a monthly bill, and a synchronization problem — keeping Elasticsearch in sync with your Postgres database — that you will debug at the worst possible time.

Postgres has a mature, production-capable full-text search engine built in. It handles most product search requirements up to tens of millions of rows, supports ranking, multilingual stemming, fuzzy matching, and prefix search. If your search volume fits those bounds, this is the right tool.

This guide covers everything you need: GIN indexes, tsvector, tsquery, ranking, partial matches, and the query builder that ties it together.

How Postgres Full-Text Search Works

Postgres represents a searchable document as a tsvector — a sorted list of lexemes (normalized word forms) with their positions. A search query is a tsquery — a boolean expression of lexemes.

-- A tsvector normalizes words and strips stop words
SELECT to_tsvector('english', 'The quick brown foxes are jumping');
-- Result: 'brown':3 'fox':4 'jump':6 'quick':2
-- 'The' and 'are' are stop words, removed
-- 'foxes' → 'fox', 'jumping' → 'jump' (stemming)

-- A tsquery is what you search for
SELECT to_tsquery('english', 'foxes & jumping');
-- Result: 'fox' & 'jump'

-- Match check
SELECT to_tsvector('english', 'The quick brown foxes are jumping')
  @@ to_tsquery('english', 'foxes & jumping');
-- Result: true

The @@ operator is the match operator. The @> and <@ operators check containment. This is the foundation of everything that follows.

Setting Up Your Schema

Option 1: Generated Column (Recommended)

Store the tsvector as a generated column — Postgres maintains it automatically on insert and update.

-- Products table with full-text search
ALTER TABLE products
  ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(name, '')),        'A') ||
    setweight(to_tsvector('english', coalesce(description, '')), 'B') ||
    setweight(to_tsvector('english', coalesce(category, '')),    'C')
  ) STORED;

-- GIN index on the generated column — makes search fast
CREATE INDEX CONCURRENTLY idx_products_search
  ON products USING GIN(search_vector);

setweight assigns importance to different fields. 'A' is highest weight, 'D' is lowest. When ranking results, name matches rank higher than description matches — which is the right behavior.

Option 2: Trigger-Maintained Column (For Complex Cases)

If your search vector needs data from joined tables or complex logic, use a trigger:

-- Add a regular tsvector column
ALTER TABLE products ADD COLUMN search_vector tsvector;

-- Function to rebuild the vector
CREATE OR REPLACE FUNCTION update_product_search_vector()
RETURNS TRIGGER AS $$
BEGIN
  NEW.search_vector :=
    setweight(to_tsvector('english', coalesce(NEW.name, '')),        'A') ||
    setweight(to_tsvector('english', coalesce(NEW.sku, '')),         'A') ||
    setweight(to_tsvector('english', coalesce(NEW.description, '')), 'B') ||
    setweight(to_tsvector('english', coalesce(NEW.brand, '')),       'C') ||
    setweight(to_tsvector('english', coalesce(NEW.tags::text, '')),  'C');
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER products_search_vector_update
  BEFORE INSERT OR UPDATE ON products
  FOR EACH ROW EXECUTE FUNCTION update_product_search_vector();

-- Backfill existing rows
UPDATE products SET name = name;

CREATE INDEX CONCURRENTLY idx_products_search ON products USING GIN(search_vector);

Basic Search Query

// src/services/searchService.ts

export async function searchProducts(params: {
  query:    string;
  tenantId: string;
  limit:    number;
  cursor?:  string;
}) {
  const { query, tenantId, limit } = params;

  // Sanitize and format the query
  // websearch_to_tsquery is more forgiving than to_tsquery
  // It handles: "exact phrase", word1 OR word2, -excluded
  // It does NOT throw on malformed input — safe for user input
  const result = await db.query(`
    SELECT
      id, name, description, price, stock,
      ts_rank(search_vector, query)           AS rank,
      ts_headline(
        'english',
        description,
        query,
        'MaxWords=20, MinWords=10, StartSel=<mark>, StopSel=</mark>'
      ) AS excerpt
    FROM
      products,
      websearch_to_tsquery('english', $1) query
    WHERE
      tenant_id    = $2
      AND active   = true
      AND search_vector @@ query
    ORDER BY rank DESC, name ASC
    LIMIT $3
  `, [query, tenantId, limit + 1]);

  return result.rows;
}

websearch_to_tsquery is the right function for user input. Unlike to_tsquery, it does not throw on malformed queries — a user typing a single apostrophe will not crash your endpoint.

ts_headline generates a snippet from the matching document with the search terms highlighted. The StartSel and StopSel parameters let you wrap matches in any HTML tags you want.

Prefix Search (Autocomplete)

Standard tsquery requires complete words. For autocomplete — matching as the user types — use prefix matching:

-- Standard: matches 'run' exactly
to_tsquery('english', 'run')  -- matches: run, runs, running (stemming)

-- Prefix: matches anything starting with 'run'
to_tsquery('english', 'run:*')  -- matches: run, runner, runaway, running

export async function autocomplete(params: {
  query:    string;
  tenantId: string;
  limit?:   number;
}) {
  const { query, tenantId, limit = 10 } = params;

  // Sanitize input — strip special characters that would break tsquery
  const sanitized = query
    .trim()
    .replace(/[^a-zA-Z0-9\s-]/g, '')
    .split(/\s+/)
    .filter(Boolean)
    .join(' & ');

  if (!sanitized) return [];

  // Add :* prefix to the last word for prefix matching
  const tsQuery = sanitized.replace(/(\S+)$/, '$1:*');

  const result = await db.query(`
    SELECT
      id, name, price,
      ts_rank(search_vector, to_tsquery('english', $1)) AS rank
    FROM products
    WHERE
      tenant_id    = $2
      AND active   = true
      AND search_vector @@ to_tsquery('english', $1)
    ORDER BY rank DESC, name ASC
    LIMIT $3
  `, [tsQuery, tenantId, limit]);

  return result.rows;
}

Fuzzy Matching with pg_trgm

Full-text search requires the user to spell words correctly. pg_trgm adds trigram-based fuzzy matching — finding words that are "similar" even with typos.

-- Enable the extension
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Trigram index for fuzzy matching on name
CREATE INDEX CONCURRENTLY idx_products_name_trgm
  ON products USING GIN(name gin_trgm_ops);

// Combine full-text and fuzzy search
export async function searchWithFuzzy(params: {
  query:    string;
  tenantId: string;
  limit:    number;
}) {
  const { query, tenantId, limit } = params;

  const result = await db.query(`
    SELECT
      id, name, description, price,
      -- Score: full-text rank + trigram similarity, combined
      (
        COALESCE(ts_rank(search_vector, websearch_to_tsquery('english', $1)), 0) * 2 +
        similarity(name, $1)
      ) AS combined_score
    FROM products
    WHERE
      tenant_id  = $2
      AND active = true
      AND (
        -- Full-text match OR fuzzy name match
        search_vector @@ websearch_to_tsquery('english', $1)
        OR similarity(name, $1) > 0.3
      )
    ORDER BY combined_score DESC
    LIMIT $3
  `, [query, tenantId, limit]);

  return result.rows;
}

The similarity() function returns a 0–1 score. 0.3 is a reasonable threshold — lower catches more typos but more false positives.

The Complete Search Endpoint

// src/routes/search.ts
import { z } from 'zod';

const SearchSchema = z.object({
  q:      z.string().min(1).max(200).trim(),
  limit:  z.coerce.number().int().min(1).max(50).default(20),
  cursor: z.string().optional(),
  sort:   z.enum(['relevance', 'price_asc', 'price_desc', 'newest']).default('relevance'),
  fuzzy:  z.coerce.boolean().default(false),
});

router.get('/search', authenticate, async (req, res) => {
  const parsed = SearchSchema.safeParse(req.query);
  if (!parsed.success) {
    return res.status(400).json({ error: 'Invalid search parameters' });
  }

  const { q, limit, cursor, sort, fuzzy } = parsed.data;

  // Skip search for very short queries — return empty or popular items
  if (q.length < 2) {
    return res.json({ data: [], pagination: { hasNextPage: false } });
  }

  try {
    const results = fuzzy
      ? await searchWithFuzzy({ query: q, tenantId: req.tenant.id, limit: limit + 1 })
      : await searchProducts({ query: q, tenantId: req.tenant.id, limit: limit + 1, cursor });

    const hasNextPage = results.length > limit;
    if (hasNextPage) results.pop();

    res.json({
      data: results,
      query: q,
      pagination: { hasNextPage },
    });

  } catch (err: any) {
    // If tsquery throws (malformed input that got through sanitization)
    // return empty results rather than a 500
    if (err.code === '42601') {
      return res.json({ data: [], query: q, pagination: { hasNextPage: false } });
    }
    throw err;
  }
});

// Autocomplete endpoint — fast, prefix-based
router.get('/search/suggest', authenticate, async (req, res) => {
  const q = z.string().min(2).max(100).trim().safeParse(req.query.q);
  if (!q.success) return res.json([]);

  const suggestions = await autocomplete({
    query:    q.data,
    tenantId: req.tenant.id,
    limit:    8,
  });

  res.json(suggestions.map(s => ({ id: s.id, name: s.name, price: s.price })));
});

When to Move to Elasticsearch

Postgres full-text search is genuinely capable for most product requirements. Consider moving when:

Your search index exceeds ~50 million rows and query time degrades noticeably
You need faceted search with real-time facet counts (category filter counts that update as you type)
You need multi-language stemming across many languages in the same index
You need ML-based semantic/vector search (though Postgres pgvector extension now handles this too)
You need search across multiple tables/indices with different schemas simultaneously

For everything else — product search, user search, content search, orders, documents — Postgres handles it, and you get one less system to operate.

This guide covers everything you need: GIN indexes, tsvector, tsquery, ranking, partial matches, and the query builder that ties it together.

How Postgres Full-Text Search Works

Postgres represents a searchable document as a tsvector — a sorted list of lexemes (normalized word forms) with their positions. A search query is a tsquery — a boolean expression of lexemes.

-- A tsvector normalizes words and strips stop words
SELECT to_tsvector('english', 'The quick brown foxes are jumping');
-- Result: 'brown':3 'fox':4 'jump':6 'quick':2
-- 'The' and 'are' are stop words, removed
-- 'foxes' → 'fox', 'jumping' → 'jump' (stemming)

-- A tsquery is what you search for
SELECT to_tsquery('english', 'foxes & jumping');
-- Result: 'fox' & 'jump'

-- Match check
SELECT to_tsvector('english', 'The quick brown foxes are jumping')
  @@ to_tsquery('english', 'foxes & jumping');
-- Result: true

The @@ operator is the match operator. The @> and <@ operators check containment. This is the foundation of everything that follows.

Setting Up Your Schema

Option 1: Generated Column (Recommended)

Store the tsvector as a generated column — Postgres maintains it automatically on insert and update.

-- Products table with full-text search
ALTER TABLE products
  ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(name, '')),        'A') ||
    setweight(to_tsvector('english', coalesce(description, '')), 'B') ||
    setweight(to_tsvector('english', coalesce(category, '')),    'C')
  ) STORED;

-- GIN index on the generated column — makes search fast
CREATE INDEX CONCURRENTLY idx_products_search
  ON products USING GIN(search_vector);

setweight assigns importance to different fields. 'A' is highest weight, 'D' is lowest. When ranking results, name matches rank higher than description matches — which is the right behavior.

Option 2: Trigger-Maintained Column (For Complex Cases)

If your search vector needs data from joined tables or complex logic, use a trigger:

-- Add a regular tsvector column
ALTER TABLE products ADD COLUMN search_vector tsvector;

-- Function to rebuild the vector
CREATE OR REPLACE FUNCTION update_product_search_vector()
RETURNS TRIGGER AS $$
BEGIN
  NEW.search_vector :=
    setweight(to_tsvector('english', coalesce(NEW.name, '')),        'A') ||
    setweight(to_tsvector('english', coalesce(NEW.sku, '')),         'A') ||
    setweight(to_tsvector('english', coalesce(NEW.description, '')), 'B') ||
    setweight(to_tsvector('english', coalesce(NEW.brand, '')),       'C') ||
    setweight(to_tsvector('english', coalesce(NEW.tags::text, '')),  'C');
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER products_search_vector_update
  BEFORE INSERT OR UPDATE ON products
  FOR EACH ROW EXECUTE FUNCTION update_product_search_vector();

-- Backfill existing rows
UPDATE products SET name = name;

CREATE INDEX CONCURRENTLY idx_products_search ON products USING GIN(search_vector);

Basic Search Query

// src/services/searchService.ts

export async function searchProducts(params: {
  query:    string;
  tenantId: string;
  limit:    number;
  cursor?:  string;
}) {
  const { query, tenantId, limit } = params;

  // Sanitize and format the query
  // websearch_to_tsquery is more forgiving than to_tsquery
  // It handles: "exact phrase", word1 OR word2, -excluded
  // It does NOT throw on malformed input — safe for user input
  const result = await db.query(`
    SELECT
      id, name, description, price, stock,
      ts_rank(search_vector, query)           AS rank,
      ts_headline(
        'english',
        description,
        query,
        'MaxWords=20, MinWords=10, StartSel=<mark>, StopSel=</mark>'
      ) AS excerpt
    FROM
      products,
      websearch_to_tsquery('english', $1) query
    WHERE
      tenant_id    = $2
      AND active   = true
      AND search_vector @@ query
    ORDER BY rank DESC, name ASC
    LIMIT $3
  `, [query, tenantId, limit + 1]);

  return result.rows;
}

websearch_to_tsquery is the right function for user input. Unlike to_tsquery, it does not throw on malformed queries — a user typing a single apostrophe will not crash your endpoint.

ts_headline generates a snippet from the matching document with the search terms highlighted. The StartSel and StopSel parameters let you wrap matches in any HTML tags you want.

Prefix Search (Autocomplete)

Standard tsquery requires complete words. For autocomplete — matching as the user types — use prefix matching:

-- Standard: matches 'run' exactly
to_tsquery('english', 'run')  -- matches: run, runs, running (stemming)

-- Prefix: matches anything starting with 'run'
to_tsquery('english', 'run:*')  -- matches: run, runner, runaway, running

export async function autocomplete(params: {
  query:    string;
  tenantId: string;
  limit?:   number;
}) {
  const { query, tenantId, limit = 10 } = params;

  // Sanitize input — strip special characters that would break tsquery
  const sanitized = query
    .trim()
    .replace(/[^a-zA-Z0-9\s-]/g, '')
    .split(/\s+/)
    .filter(Boolean)
    .join(' & ');

  if (!sanitized) return [];

  // Add :* prefix to the last word for prefix matching
  const tsQuery = sanitized.replace(/(\S+)$/, '$1:*');

  const result = await db.query(`
    SELECT
      id, name, price,
      ts_rank(search_vector, to_tsquery('english', $1)) AS rank
    FROM products
    WHERE
      tenant_id    = $2
      AND active   = true
      AND search_vector @@ to_tsquery('english', $1)
    ORDER BY rank DESC, name ASC
    LIMIT $3
  `, [tsQuery, tenantId, limit]);

  return result.rows;
}

Fuzzy Matching with pg_trgm

Full-text search requires the user to spell words correctly. pg_trgm adds trigram-based fuzzy matching — finding words that are "similar" even with typos.

-- Enable the extension
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Trigram index for fuzzy matching on name
CREATE INDEX CONCURRENTLY idx_products_name_trgm
  ON products USING GIN(name gin_trgm_ops);

// Combine full-text and fuzzy search
export async function searchWithFuzzy(params: {
  query:    string;
  tenantId: string;
  limit:    number;
}) {
  const { query, tenantId, limit } = params;

  const result = await db.query(`
    SELECT
      id, name, description, price,
      -- Score: full-text rank + trigram similarity, combined
      (
        COALESCE(ts_rank(search_vector, websearch_to_tsquery('english', $1)), 0) * 2 +
        similarity(name, $1)
      ) AS combined_score
    FROM products
    WHERE
      tenant_id  = $2
      AND active = true
      AND (
        -- Full-text match OR fuzzy name match
        search_vector @@ websearch_to_tsquery('english', $1)
        OR similarity(name, $1) > 0.3
      )
    ORDER BY combined_score DESC
    LIMIT $3
  `, [query, tenantId, limit]);

  return result.rows;
}

The similarity() function returns a 0–1 score. 0.3 is a reasonable threshold — lower catches more typos but more false positives.

The Complete Search Endpoint

// src/routes/search.ts
import { z } from 'zod';

const SearchSchema = z.object({
  q:      z.string().min(1).max(200).trim(),
  limit:  z.coerce.number().int().min(1).max(50).default(20),
  cursor: z.string().optional(),
  sort:   z.enum(['relevance', 'price_asc', 'price_desc', 'newest']).default('relevance'),
  fuzzy:  z.coerce.boolean().default(false),
});

router.get('/search', authenticate, async (req, res) => {
  const parsed = SearchSchema.safeParse(req.query);
  if (!parsed.success) {
    return res.status(400).json({ error: 'Invalid search parameters' });
  }

  const { q, limit, cursor, sort, fuzzy } = parsed.data;

  // Skip search for very short queries — return empty or popular items
  if (q.length < 2) {
    return res.json({ data: [], pagination: { hasNextPage: false } });
  }

  try {
    const results = fuzzy
      ? await searchWithFuzzy({ query: q, tenantId: req.tenant.id, limit: limit + 1 })
      : await searchProducts({ query: q, tenantId: req.tenant.id, limit: limit + 1, cursor });

    const hasNextPage = results.length > limit;
    if (hasNextPage) results.pop();

    res.json({
      data: results,
      query: q,
      pagination: { hasNextPage },
    });

  } catch (err: any) {
    // If tsquery throws (malformed input that got through sanitization)
    // return empty results rather than a 500
    if (err.code === '42601') {
      return res.json({ data: [], query: q, pagination: { hasNextPage: false } });
    }
    throw err;
  }
});

// Autocomplete endpoint — fast, prefix-based
router.get('/search/suggest', authenticate, async (req, res) => {
  const q = z.string().min(2).max(100).trim().safeParse(req.query.q);
  if (!q.success) return res.json([]);

  const suggestions = await autocomplete({
    query:    q.data,
    tenantId: req.tenant.id,
    limit:    8,
  });

  res.json(suggestions.map(s => ({ id: s.id, name: s.name, price: s.price })));
});

When to Move to Elasticsearch

Postgres full-text search is genuinely capable for most product requirements. Consider moving when:

Your search index exceeds ~50 million rows and query time degrades noticeably
You need faceted search with real-time facet counts (category filter counts that update as you type)
You need multi-language stemming across many languages in the same index
You need ML-based semantic/vector search (though Postgres pgvector extension now handles this too)
You need search across multiple tables/indices with different schemas simultaneously

For everything else — product search, user search, content search, orders, documents — Postgres handles it, and you get one less system to operate.

Full-Text Search in PostgreSQL: Skip Elasticsearch for the First 10 Million Rows

How Postgres Full-Text Search Works

Setting Up Your Schema

Option 1: Generated Column (Recommended)

Option 2: Trigger-Maintained Column (For Complex Cases)

Basic Search Query

Prefix Search (Autocomplete)

Fuzzy Matching with pg_trgm

The Complete Search Endpoint

When to Move to Elasticsearch

ZyVOP

Comments (0)

Full-Text Search in PostgreSQL: Skip Elasticsearch for the First 10 Million Rows

How Postgres Full-Text Search Works

Setting Up Your Schema

Option 1: Generated Column (Recommended)

Option 2: Trigger-Maintained Column (For Complex Cases)

Basic Search Query

Prefix Search (Autocomplete)

Fuzzy Matching with pg_trgm

The Complete Search Endpoint

When to Move to Elasticsearch

ZyVOP

Comments (0)

Popular Tags