Architecture
MygramDB runs as a sidecar process alongside MySQL (8.4/9.x) or MariaDB (10.6+/11.x). It reads the upstream server's binary log to build and maintain an in-memory full-text index, then serves search queries over TCP and HTTP. The server flavor is auto-detected from SELECT VERSION(), so the same binary and config work with either.
System Overview
Components:
- BinlogReader -- Connects to MySQL as a replica. Receives row-level events (INSERT, UPDATE, DELETE) via GTID-based binlog streaming.
- Index -- In-memory n-gram index. Maps n-gram strings to posting lists (sorted document ID sets).
- DocumentStore -- Maps internal DocIDs to MySQL primary keys and stores filter column values. Optionally stores document text for verify_text.
- SearchHandler -- Parses queries, executes the search pipeline, manages the query cache.
- Snapshot -- Periodic dump of index state and GTID position to disk for fast restart.
Data Flow
MygramDB operates in three phases:
Phase 1: Initial Snapshot
On first startup (no existing dump file), MygramDB performs a consistent snapshot of the source table, recording the GTID position at which the snapshot was taken. This guarantees no data is missed or duplicated between the snapshot and subsequent binlog events.
Phase 2: Live Replication
After the initial snapshot, MygramDB switches to binlog streaming:
The BinlogReader thread reads events into a bounded queue. A worker thread dequeues events and applies them to the index and document store. This decoupling allows the binlog reader to keep up with MySQL even during bursts.
On connection loss, MygramDB reconnects with exponential backoff (500ms to 10s) and resumes from the last processed GTID position. No data is lost or replayed.
Phase 3: Query Processing
Search queries arrive via TCP (port 11016, default) or HTTP (port 8080, disabled by default) and are processed through the search pipeline.
Thread Model
Since v1.5.3, MygramDB uses an event-driven Reactor I/O model for TCP connections. The reactor uses epoll (Linux) or kqueue (macOS) to multiplex thousands of connections onto a single event-loop thread, dispatching work to a bounded worker pool.
Concurrency model:
- The Reactor thread handles all TCP I/O (accept, read, write) via epoll/kqueue. No thread-per-connection overhead — thousands of idle connections consume no threads.
- Parsed requests are dispatched to the Worker Thread Pool for query execution.
- The Index and DocumentStore are protected by std::shared_mutex, allowing multiple concurrent readers with a single writer.
- Search queries acquire a read lock. Binlog event processing acquires a write lock.
- This suits the read-heavy workload: searches never block each other, and writes block readers only briefly during index updates.
- Per-connection backpressure (api.tcp.max_write_queue_bytes, default 16 MiB) force-closes slow clients whose write queue exceeds the cap, preventing memory exhaustion.
- Atomic counters are used for statistics (query count, cache hits) to avoid lock contention on the hot path.
All threads are joined on shutdown. No threads are detached.
Persistence
MygramDB uses snapshot-based persistence, not a write-ahead log (WAL).
How it works:
- A background scheduler periodically serializes the full index, document store, and current GTID position to disk.
- On restart, MygramDB loads the snapshot and resumes binlog replication from the saved GTID.
- Events between the snapshot and current MySQL position are replayed automatically.
Since v1.5.0, snapshot writes use atomic file operations (write to temp file, then rename) to prevent corruption if the process is interrupted during a dump.
If no snapshot exists, MygramDB performs a full initial snapshot from MySQL (Phase 1 above).
Memory Layout
Sizing reference (1.1M Wikipedia articles, avg. 666 chars):
| Component | Memory |
|---|---|
| Index (n-gram map + posting lists) | ~813 MB |
| DocumentStore + Text Store | ~1.54 GB |
| Total RSS | ~2.53 GB |
The text store is allocated only when verify_text is enabled. Without it, memory usage is approximately 813 MB for the same dataset.
Posting lists are the largest component. Their memory efficiency depends on the compression strategy -- delta encoding for sparse n-grams, Roaring bitmaps for dense ones. See How It Works for details on the adaptive compression.
For search pipeline details, see How It Works. For performance numbers, see Benchmarks.