Query Analysis & Clustering

Discover hidden patterns in user search behavior by automatically grouping similar queries.

From Raw Queries to Actionable Insights

A raw query log is full of spelling variations, typos, and different phrasings of the same intent. Our query clustering feature cuts through the noise by grouping queries that are near-identical in spelling, such as "i phone" and "iphone," or "mangetout" and "mange tout."

The Clustering Pipeline

Bigram Transformation

Each query is broken down into a set of bigrams (overlapping pairs of adjacent characters). For example, "apple" becomes {"ap", "pp", "pl", "le"}. A typo changes only a few bigrams, so a misspelled query still shares most of its set with the correct spelling.
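As a minimal sketch of this step, the Python below builds the bigram set for a query and measures how much two sets overlap. The function names are illustrative, and lowercasing the query and keeping spaces are assumptions rather than documented behavior of the feature.

```python
def bigrams(query: str) -> set[str]:
    """Return the set of overlapping character pairs in a query."""
    text = query.lower()  # assumption: matching is case-insensitive
    return {text[i:i + 2] for i in range(len(text) - 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of bigrams common to both sets (1.0 means identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


print(sorted(bigrams("apple")))  # ['ap', 'le', 'pl', 'pp']
# "iphone" and "i phone" share 4 of their 7 distinct bigrams.
print(jaccard(bigrams("iphone"), bigrams("i phone")))
```

In this sketch the space in "i phone" contributes two extra bigrams, which is why the overlap is high but not perfect.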

MinHash Fingerprinting

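A compact "fingerprint" is calculated for each set of bigrams using the MinHash algorithm. The more bigrams two queries share, the more positions their fingerprints agree on, so similarity can be estimated from the fingerprints alone without comparing the full sets.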
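One way to picture MinHash is the sketch below. It is an illustration under stated assumptions, not the product's implementation: the fingerprint length of 64, the seeded BLAKE2b hash, and all function names are choices made for this example.

```python
import hashlib

NUM_HASHES = 64  # assumed fingerprint length, chosen for illustration


def bigrams(query: str) -> set[str]:
    text = query.lower()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def seeded_hash(seed: int, token: str) -> int:
    """Deterministic 64-bit hash of a token; each seed acts as a separate hash function."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash(tokens: set[str], num_hashes: int = NUM_HASHES) -> list[int]:
    """For each hash function, keep the smallest hash value over all tokens."""
    if not tokens:
        raise ValueError("cannot fingerprint an empty set")
    return [min(seeded_hash(seed, t) for t in tokens) for seed in range(num_hashes)]


def estimated_similarity(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of positions where two fingerprints agree."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


close = estimated_similarity(minhash(bigrams("iphone")), minhash(bigrams("i phone")))
far = estimated_similarity(minhash(bigrams("iphone")), minhash(bigrams("banana")))
print(close, far)
```

The fraction of agreeing positions is an estimate of the bigram overlap between the two queries, and a longer fingerprint gives a tighter estimate at the cost of more hashing.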

Locality-Sensitive Hashing (LSH)

To avoid comparing every query to every other query, we use LSH. This technique places similar fingerprints into the same "buckets" with high probability, so only queries that share a bucket need to be compared. It plays the same role as an approximate nearest-neighbor search, specialized for this task.
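A common way to bucket MinHash fingerprints is the banding scheme sketched below. Whether the feature uses banding, and with which parameters, is an assumption. The function name and the tiny hand-written fingerprints are illustrative only.

```python
from collections import defaultdict
from itertools import combinations


def lsh_candidate_pairs(signatures: dict[str, list[int]], bands: int, rows: int) -> set[tuple[str, str]]:
    """Return pairs of keys whose fingerprints are identical in at least one band."""
    buckets = defaultdict(set)
    for key, sig in signatures.items():
        if len(sig) != bands * rows:
            raise ValueError("fingerprint length must equal bands * rows")
        for band in range(bands):
            chunk = tuple(sig[band * rows:(band + 1) * rows])
            # The band index is part of the bucket key, so band 0 of one
            # fingerprint never collides with band 1 of another.
            buckets[(band, chunk)].add(key)

    pairs = set()
    for members in buckets.values():
        for a, b in combinations(sorted(members), 2):
            pairs.add((a, b))
    return pairs


# Hand-written 4-value fingerprints, split into 2 bands of 2 rows each.
signatures = {
    "iphone": [1, 2, 3, 4],
    "i phone": [1, 2, 9, 9],  # agrees with "iphone" on the first band
    "banana": [7, 7, 7, 7],
}
print(lsh_candidate_pairs(signatures, bands=2, rows=2))  # {('i phone', 'iphone')}
```

With b bands of r rows, two fingerprints that agree on a fraction s of their positions share at least one bucket with probability 1 - (1 - s^r)^b. More bands catch more true matches, while more rows per band filter out more false candidates.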

Final Clustering

The system then runs a more precise (but slower) fuzzy matching comparison only on the small groups of candidates identified by LSH. The result is a clean set of clusters, each containing similar queries.
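The verification step might look like the sketch below. The source does not say which fuzzy matcher is used, so Python's difflib.SequenceMatcher stands in for it here. The 0.8 threshold, the union-find merging, and the function name are likewise assumptions for illustration.

```python
from difflib import SequenceMatcher


def cluster(queries: list[str], candidate_pairs: list[tuple[str, str]], threshold: float = 0.8) -> list[list[str]]:
    """Merge candidate pairs that pass a fuzzy-match check into clusters."""
    parent = {q: q for q in queries}

    def find(q: str) -> str:
        # Follow parent links to the cluster representative, shortening the path as we go.
        while parent[q] != q:
            parent[q] = parent[parent[q]]
            q = parent[q]
        return q

    for a, b in candidate_pairs:
        # Stand-in fuzzy matcher: only pairs at or above the threshold are merged.
        if SequenceMatcher(None, a, b).ratio() >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for q in queries:
        groups.setdefault(find(q), []).append(q)
    return list(groups.values())


queries = ["iphone", "i phone", "mangetout", "mange tout", "banana"]
candidates = [("iphone", "i phone"), ("mangetout", "mange tout"), ("iphone", "banana")]
clusters = cluster(queries, candidates)
print(clusters)
```

Here the pair ("iphone", "banana") is a false candidate that the fuzzy check rejects, so "banana" ends up in a cluster of its own.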
