Schedule - PGConf.EU 2013

Next generation of GIN

Date: 2013-10-31
Time: 13:40–14:30
Room: Pembroke
Level: Intermediate

This talk presents a set of advances that significantly improve the GIN index. Their primary goal is to make full-text search (FTS) in PostgreSQL as fast as it is in standalone solutions such as Sphinx and Solr. However, they have many other applications.

The set of advances is as follows:
* Compression of item pointers in the index
* Storing additional information in posting trees and posting lists
* Fast scan: skipping parts of posting trees during a scan
* Sorting results in the index
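To give a feel for why compressing item pointers can shrink an index, here is a minimal sketch of delta encoding combined with variable-byte encoding over a sorted list of integers. The abstract does not say which encoding GIN uses, so this is only an illustration of the general technique; the function names and the use of plain integers in place of real item pointers are assumptions made for the example.

```python
def encode_varbyte(sorted_ids):
    """Delta-encode a sorted list of non-negative ints, 7 bits per byte.

    Illustration only: real item pointers are (block, offset) pairs,
    modelled here as plain integers.
    """
    out = bytearray()
    prev = 0
    for value in sorted_ids:
        delta = value - prev  # small when neighbouring ids are close
        prev = value
        while delta >= 0x80:
            out.append((delta & 0x7F) | 0x80)  # high bit: more bytes follow
            delta >>= 7
        out.append(delta)
    return bytes(out)


def decode_varbyte(data):
    """Inverse of encode_varbyte."""
    ids = []
    current = 0
    delta = 0
    shift = 0
    for byte in data:
        delta |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            current += delta
            ids.append(current)
            delta = 0
            shift = 0
    return ids
```

Because a posting list is sorted, the gaps between neighbouring entries are usually much smaller than the entries themselves, so most gaps fit in one or two bytes instead of a fixed-width pointer.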

These advances bring the following benefits to GIN indexes:
* Indexes become about 2 times smaller without any work in the opclass.
* Using additional information for filtering enables new features for GIN opclasses: better phrase search, better array similarity search, inverse FTS search (search for tsqueries matching a tsvector), inverse regex search (search for regexes matching a string), and better string similarity using positioned n-grams.
* Fast scan dramatically speeds up GIN search in the "frequent_term & rare_term" case.
* Using additional information for sorting in the index accelerates ranking in FTS and dramatically reduces its I/O.
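The "frequent_term & rare_term" case can be sketched as an intersection of two sorted posting lists in which the rare list drives the scan and the frequent list is skipped through rather than read entry by entry. This is a simplified model under stated assumptions: posting lists are plain sorted Python lists, and a binary search stands in for descending a posting tree. It is not the GIN implementation, and the function name is hypothetical.

```python
from bisect import bisect_left


def intersect_with_skips(rare, frequent):
    """Intersect two sorted id lists, skipping ahead in the frequent one.

    For each id in the rare list, jump directly to the first entry of the
    frequent list that is >= that id instead of scanning every entry.
    """
    matches = []
    pos = 0
    for doc_id in rare:
        # Binary search stands in for a posting-tree descent (assumption).
        pos = bisect_left(frequent, doc_id, pos)
        if pos == len(frequent):
            break  # the frequent list is exhausted
        if frequent[pos] == doc_id:
            matches.append(doc_id)
    return matches
```

With a rare list of a few entries and a frequent list of many thousands, the work is roughly proportional to the rare list times a logarithmic search cost, rather than to the length of the frequent list.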

We present the results of FTS benchmarks on several datasets (6 M and 15 M documents) under real-life load for the PostgreSQL and Sphinx full-text search engines, and demonstrate that the improved PostgreSQL FTS (with all its ACID overhead) outperforms the standalone Sphinx search engine.

Speakers

Alexander Korotkov
Oleg Bartunov
