Why is MySQL FULLTEXT so slow?

MySQL FULLTEXT is slow because it stores indexes on disk using B-tree pages, requires disk I/O for every query, and uses uncompressed posting lists. MygramDB solves this with in-memory N-gram indexing delivering consistent sub-millisecond latency.

How does MygramDB sync with MySQL?

MygramDB uses GTID-based binlog replication to sync with MySQL in real time. It acts as a MySQL replica and receives changes through the binary log, so no ETL pipelines or manual sync are needed. Write to MySQL as usual; MygramDB updates automatically.

How much faster is MygramDB than MySQL FULLTEXT?

On a dataset of 1.1M Wikipedia articles, MygramDB delivers sub-millisecond search latency, compared with 500 ms to 2.5 s for MySQL FULLTEXT. COUNT queries are thousands of times faster. With verify_text enabled (v1.5.0), results match MySQL exactly. Benchmarks are reproducible via make bench-up.

Does MygramDB support Japanese/Chinese/Korean text?

Yes. MygramDB supports CJK text through ICU-based Unicode normalization and N-gram tokenization, so Japanese, Chinese, and Korean text can be indexed and searched without additional plugins or configuration.

What is the difference between MygramDB and Elasticsearch?

MygramDB is a single-binary deployment with direct MySQL binlog sync, sub-millisecond latency, and low operational complexity. Elasticsearch offers distributed search and advanced features but requires cluster management, ETL pipelines, and JVM tuning. Choose MygramDB for simpler MySQL-based applications; Elasticsearch for large-scale distributed search.

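HTTP API Guide

MygramDB provides a RESTful JSON API that makes it easy to integrate with web applications and HTTP clients.

Differences from the TCP API

The HTTP API is the entry point for search, count, facet, and document retrieval over JSON. Operational commands such as SYNC, DUMP, and SET are run through the TCP/CLI interface.

Configuration

Enable the HTTP server in config.yaml: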

yaml

api:
  tcp:
    bind: "127.0.0.1"
    port: 11016
  http:
    enable: true          # Enable the HTTP server
    bind: "127.0.0.1"     # Bind address (default: localhost only)
    port: 8080            # HTTP port (default: 8080)
    enable_cors: false    # Enable only when exposing the API to browsers
    cors_allow_origin: "" # Origin to allow when CORS is enabled

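Security When Exposing the API

By default, the TCP and HTTP servers bind only to the loopback interface. If you need to expose them, set api.tcp.bind / api.http.bind explicitly, restrict the allowed IPs tightly with network.allow_cidrs, and place a reverse proxy with TLS and authentication in front of MygramDB. CORS is disabled by default; when you enable it, restrict it to trusted origins.

Terminology

CORS is the mechanism that lets a browser call an API on a different origin. It is usually unnecessary for server-to-server communication. Enable it, limited to trusted origins, only when browsers call MygramDB directly.

API Endpoints

Except for /metrics, which returns Prometheus text, all responses are JSON with Content-Type: application/json. Table routes use a single identifier segment, /tables/{identity}. {identity} is the database-qualified name `database.table` (for example, the articles table in the app_db database is /tables/app_db.articles). In a single-database configuration (exactly one distinct database is configured), an unqualified table name also resolves to that database, for example /tables/articles. Qualification is required only when the configuration spans two or more databases, in which case unqualified names are rejected as ambiguous.

In v1.7.0, the legacy routes /{table}/search, /{table}/count, and /{table}/{primary_key} were removed. Use /tables/{identity}/... for all table operations.

How to think about identity

identity is the name that uniquely identifies the table to search. Use articles in a single-database setup and app_db.articles in a multi-database setup.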
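The resolution rule above can be sketched as a small path builder. This is a minimal illustration, not part of MygramDB: the function name `table_route` and its parameters are hypothetical, and the helper only assembles the path. The server still decides whether an unqualified name is ambiguous.

```python
from typing import Optional
from urllib.parse import quote


def table_route(table: str, database: Optional[str] = None, suffix: str = "search") -> str:
    """Build a /tables/{identity}/{suffix} path (hypothetical client-side helper).

    The identity is "database.table" when a database is given, and the bare
    table name otherwise. Bare names only resolve in single-database setups.
    """
    identity = f"{database}.{table}" if database else table
    # Percent-encode unusual characters; letters, digits, "_", "." and "-"
    # pass through unchanged.
    return f"/tables/{quote(identity, safe='')}/{quote(suffix, safe='')}"


print(table_route("articles", "app_db"))          # qualified identity
print(table_route("articles", suffix="count"))    # unqualified identity
```

Passing a primary key as the last segment yields the document-retrieval route, for example `table_route("threads", "app_db", "thread_12345")`.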

POST /tables/{identity}/search

Full-text search with filters and pagination.

Request:

http

POST /tables/app_db.threads/search HTTP/1.1
Content-Type: application/json

{
  "q": "breaking news AND tech NOT old",
  "filters": {
    "status": 1,
    "category": "tech"
  },
  "limit": 50,
  "offset": 0
}

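Request body parameters:

Field	Type	Required	Description
`q`	string	Yes	Search query using the AND/OR/NOT operators
`filters`	object	No	Filter conditions (column: value pairs)
`limit`	integer	No	Maximum number of results to return (default: 100, max: 1000)
`offset`	integer	No	Number of results to skip (default: 0)
`sort`	object	No	Sort settings (for example `{"column": "_score", "order": "DESC"}`)
`highlight`	object	No	Highlight settings (see below)
`fuzzy`	integer	No	Edit distance for fuzzy search (`1` or `2`)

Keep control clauses out of q

In the HTTP API, do not mix LIMIT, FILTER, SORT, or similar clauses into the q string; specify them in the dedicated JSON fields. This lets the server validate HTTP requests safely and reduces behavioral differences from the TCP protocol.

Query syntax:

Simple search: "keyword"
Quoted phrase: "\"breaking news\"" (searches for the phrase with the word order preserved)
AND operator: "tech AND AI AND machine learning"
OR operator: "(mysql OR postgresql) AND performance"
NOT operator: "news NOT sports"
Combination: "tech AND AI NOT old"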
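One way to keep control clauses out of q is to assemble the request body from separate arguments. The sketch below is illustrative only: `build_search_body` is a hypothetical helper, and the checks mirror the limits listed in the parameter table (limit up to 1000, fuzzy of 1 or 2). The server remains the authority on validation.

```python
from typing import Any, Dict, Optional


def build_search_body(
    q: str,
    filters: Optional[Dict[str, Any]] = None,
    limit: int = 100,
    offset: int = 0,
    sort: Optional[Dict[str, str]] = None,
    fuzzy: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble a search request body with control options as JSON fields."""
    if not q:
        raise ValueError("q is required")
    if limit > 1000:
        raise ValueError("limit must not exceed 1000")
    if fuzzy is not None and fuzzy not in (1, 2):
        raise ValueError("fuzzy must be 1 or 2")

    # q carries only the boolean query; everything else is a separate field.
    body: Dict[str, Any] = {"q": q, "limit": limit, "offset": offset}
    if filters:
        body["filters"] = filters
    if sort:
        body["sort"] = sort
    if fuzzy is not None:
        body["fuzzy"] = fuzzy
    return body


body = build_search_body("tech AND AI NOT old", filters={"status": 1}, limit=50)
print(body)
```

The resulting dictionary can be passed as the JSON payload of a POST to /tables/{identity}/search.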

Response (200 OK):

json

{
  "count": 2,
  "limit": 50,
  "offset": 0,
  "results": [
    {
      "doc_id": 101,
      "primary_key": "article_101",
      "filters": {
        "status": 1,
        "category": "tech"
      }
    },
    {
      "doc_id": 205,
      "primary_key": "article_205",
      "filters": {
        "status": 1,
        "category": "tech"
      }
    }
  ]
}

Error response (400 Bad Request):

json

{
  "error": "Missing required field: q"
}

Error response (500 Internal Server Error):

json

{
  "error": "Internal error: database connection failed"
}

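Highlight settings:

Field	Type	Default	Description
`open_tag`	string	`<em>`	Opening tag for a highlighted term
`close_tag`	string	`</em>`	Closing tag for a highlighted term
`snippet_length`	integer	100	Maximum number of code points per snippet (1-10,000)
`max_fragments`	integer	3	Maximum number of snippet fragments (1-100)

Example search with highlighting: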

http

POST /tables/app_db.articles/search HTTP/1.1
Content-Type: application/json

{
  "q": "機械学習",
  "highlight": {
    "open_tag": "<strong>",
    "close_tag": "</strong>",
    "snippet_length": 150,
    "max_fragments": 5
  },
  "sort": {"column": "_score", "order": "DESC"},
  "limit": 10
}

POST /tables/{identity}/count

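Counts the documents that match a full-text query and optional filters. COUNT returns only the total number of matches. Pagination, highlighting, fuzzy search, and _score sorting are search-only features and are rejected by this endpoint.

Request: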

http

POST /tables/app_db.threads/count HTTP/1.1
Content-Type: application/json

{
  "q": "breaking news AND tech",
  "filters": {
    "status": 1
  }
}

Response (200 OK):

json

{
  "count": 42
}
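When the same query drives both a result page and a total, the count body can be derived from the search body by dropping the search-only fields. This is a minimal sketch: `to_count_body` is a hypothetical helper, and the accepted field set (q and filters) is inferred from the request shown above rather than from a published schema.

```python
from typing import Any, Dict

# Fields the count endpoint is assumed to accept, based on the example request.
COUNT_FIELDS = ("q", "filters")


def to_count_body(search_body: Dict[str, Any]) -> Dict[str, Any]:
    """Derive a count request body from a search request body."""
    if "q" not in search_body:
        raise ValueError("q is required")
    # limit, offset, sort, highlight and fuzzy are dropped because the
    # count endpoint rejects search-only options.
    return {key: search_body[key] for key in COUNT_FIELDS if key in search_body}


search_body = {
    "q": "breaking news AND tech",
    "filters": {"status": 1},
    "limit": 50,
    "offset": 0,
}
print(to_count_body(search_body))
```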

POST /tables/{identity}/facet

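Returns facet buckets for a column. The scope can optionally be narrowed with a query and filters.

Request body parameters:

Field	Type	Required	Description
`column`	string	Yes	Filter column to aggregate
`q`	string	No	Search query that narrows the facet scope
`filters`	object	No	Additional filter conditions
`limit`	integer	No	Maximum number of facet buckets to return (1-1000)

offset, sort, highlight, and fuzzy are not supported by FACET; specifying any of them results in an error.

Request: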

http

POST /tables/app_db.articles/facet HTTP/1.1
Content-Type: application/json

{
  "column": "category",
  "q": "database OR mysql",
  "filters": {
    "status": 1
  },
  "limit": 10
}

Response (200 OK):

json

{
  "column": "category",
  "count": 2,
  "facets": [
    {"value": "tech", "count": 15},
    {"value": "ops", "count": 7}
  ]
}
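A client can catch unsupported facet options before sending the request and flatten the response into a lookup table. The sketch below assumes the request and response shapes shown above; `check_facet_body` and `facet_counts` are hypothetical helper names, not part of MygramDB.

```python
from typing import Any, Dict

# Options documented as unsupported by the facet endpoint.
FACET_UNSUPPORTED = frozenset({"offset", "sort", "highlight", "fuzzy"})


def check_facet_body(body: Dict[str, Any]) -> Dict[str, Any]:
    """Validate a facet request body on the client side and return it."""
    if not body.get("column"):
        raise ValueError("column is required")
    rejected = sorted(FACET_UNSUPPORTED.intersection(body))
    if rejected:
        raise ValueError("not supported by facet: " + ", ".join(rejected))
    limit = body.get("limit")
    if limit is not None and not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")
    return body


def facet_counts(response: Dict[str, Any]) -> Dict[Any, int]:
    """Turn the "facets" array of a response into a value -> count mapping."""
    return {bucket["value"]: bucket["count"] for bucket in response.get("facets", [])}


response = {
    "column": "category",
    "count": 2,
    "facets": [{"value": "tech", "count": 15}, {"value": "ops", "count": 7}],
}
print(facet_counts(response))
```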

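Scope of the HTTP API

The HTTP API exposes search, count, facets, document retrieval, health, metrics, replication status, and a view of the configuration with sensitive values redacted. Administrative commands such as SET, SHOW VARIABLES, SYNC, and DUMP remain available through the TCP/CLI protocol and are not exposed as HTTP routes.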

GET /tables/{identity}/{primary_key}

Retrieves a single document by primary key. The response also includes the internal doc_id.

Request:

http

GET /tables/app_db.threads/thread_12345 HTTP/1.1

Response (200 OK):

json

{
  "doc_id": 12345,
  "primary_key": "thread_12345",
  "filters": {
    "status": 1,
    "user_id": 42
  }
}

Error response (404 Not Found):

json

{
  "error": "Document not found"
}

GET /info

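Server information and detailed statistics (Redis-style monitoring information).

What /info is for

/info is suited to being read by people or consumed as JSON by simple monitoring scripts. For continuous monitoring with Prometheus/Grafana, use /metrics.

Request: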

http

GET /info HTTP/1.1

Response (200 OK):

json

{
  "server": "MygramDB",
  "version": "1.0.0",
  "uptime_seconds": 3600,
  "total_requests": 15000,
  "total_commands_processed": 15000,
  "memory": {
    "used_memory_bytes": 524288000,
    "used_memory_human": "500.00 MB",
    "peak_memory_bytes": 629145600,
    "peak_memory_human": "600.00 MB",
    "used_memory_index": "400.00 MB",
    "used_memory_documents": "100.00 MB",
    "total_system_memory": 17179869184,
    "total_system_memory_human": "16.00 GB",
    "available_system_memory": 9126805504,
    "available_system_memory_human": "8.50 GB",
    "system_memory_usage_ratio": 0.47,
    "process_rss": 545259520,
    "process_rss_human": "520.00 MB",
    "process_rss_peak": 629145600,
    "process_rss_peak_human": "600.00 MB",
    "memory_health": "HEALTHY"
  },
  "index": {
    "total_documents": 1000000,
    "total_terms": 1500000,
    "total_postings": 5000000,
    "avg_postings_per_term": 3.33,
    "delta_encoded_lists": 1200000,
    "roaring_bitmap_lists": 300000
  },
  "tables": {
    "products": {
      "documents": 500000,
      "terms": 800000,
      "postings": 2500000,
      "ngram_size": 2,
      "memory_bytes": 262144000,
      "memory_human": "250.00 MB"
    },
    "users": {
      "documents": 500000,
      "terms": 700000,
      "postings": 2500000,
      "ngram_size": 1,
      "memory_bytes": 262144000,
      "memory_human": "250.00 MB"
    }
  }
}

Response fields:

Field	Description
`server`	Server name (MygramDB)
`version`	Server version
`uptime_seconds`	Server uptime in seconds
`total_requests`	Total number of requests processed
`total_commands_processed`	Total number of commands processed
Memory (application)
`memory.used_memory_bytes`	Current memory usage in bytes (index + documents)
`memory.used_memory_human`	Current memory usage in human-readable form
`memory.peak_memory_bytes`	Peak memory usage in bytes
`memory.peak_memory_human`	Peak memory usage in human-readable form
`memory.used_memory_index`	Memory used by the index
`memory.used_memory_documents`	Memory used by the document store
Memory (system)
`memory.total_system_memory`	Total physical RAM in bytes
`memory.total_system_memory_human`	Total system memory in human-readable form
`memory.available_system_memory`	Available physical RAM in bytes
`memory.available_system_memory_human`	Available memory in human-readable form
`memory.system_memory_usage_ratio`	System-wide memory usage ratio (0.0-1.0)
Memory (process)
`memory.process_rss`	Process RSS (physical memory in use) in bytes
`memory.process_rss_human`	Process RSS in human-readable form
`memory.process_rss_peak`	Peak RSS since the process started, in bytes
`memory.process_rss_peak_human`	Peak RSS in human-readable form
Memory (health)
`memory.memory_health`	Memory health status (HEALTHY/WARNING/CRITICAL/UNKNOWN)
Index (aggregate)
`index.total_documents`	Total number of documents across all tables
`index.total_terms`	Total number of unique terms
`index.total_postings`	Total number of postings
`index.avg_postings_per_term`	Average number of postings per term
`index.delta_encoded_lists`	Number of posting lists that use delta compression
`index.roaring_bitmap_lists`	Number of posting lists that use Roaring Bitmaps
Tables (per table)
`tables.<name>.documents`	Number of documents in the table
`tables.<name>.terms`	Number of terms in the table
`tables.<name>.postings`	Number of postings in the table
`tables.<name>.ngram_size`	N-gram size of the table
`tables.<name>.memory_bytes`	Memory usage of the table in bytes
`tables.<name>.memory_human`	Memory usage of the table in human-readable form

Memory health status:

HEALTHY: at least 20% of system memory is available
WARNING: 10-20% of system memory is available
CRITICAL: less than 10% of system memory is available (OPTIMIZE is rejected)
UNKNOWN: the status cannot be determined
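The thresholds above can be reproduced on the client side from the system memory fields of /info. This is only a sketch of the documented bands: it assumes the ratio is available_system_memory divided by total_system_memory, which the source does not state explicitly, and `classify_memory_health` is a hypothetical name.

```python
def classify_memory_health(total_bytes: int, available_bytes: int) -> str:
    """Map system memory figures to the documented health bands."""
    if total_bytes <= 0 or available_bytes < 0:
        return "UNKNOWN"  # no usable figures, so no status can be derived
    available_ratio = available_bytes / total_bytes
    if available_ratio >= 0.20:
        return "HEALTHY"
    if available_ratio >= 0.10:
        return "WARNING"
    return "CRITICAL"


# Figures taken from the sample /info response: about 53% of RAM is available.
print(classify_memory_health(17179869184, 9126805504))
```

A monitoring script could compare this value with memory.memory_health from the same response and alert when either one leaves HEALTHY.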

This endpoint is well suited to integration with monitoring tools that support JSON.

GET /metrics

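Metrics endpoint in the Prometheus exposition format, for monitoring and alerting.

Terminology

The Prometheus exposition format is the text format Prometheus uses to collect metrics. Besides values, it carries types such as Counter and Gauge as well as labels, which makes the data easy to handle as time series in tools such as Grafana.

Request: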

http

GET /metrics HTTP/1.1

Response (200 OK):

prometheus

# HELP mygramdb_server_info MygramDB server information
# TYPE mygramdb_server_info gauge
mygramdb_server_info{version="1.0.0"} 1

# HELP mygramdb_server_uptime_seconds Server uptime in seconds
# TYPE mygramdb_server_uptime_seconds counter
mygramdb_server_uptime_seconds 3600

# HELP mygramdb_memory_used_bytes Current memory usage in bytes
# TYPE mygramdb_memory_used_bytes gauge
mygramdb_memory_used_bytes{type="index"} 419430400
mygramdb_memory_used_bytes{type="documents"} 104857600
mygramdb_memory_used_bytes{type="total"} 524288000

# HELP mygramdb_memory_health_status Memory health status (0=UNKNOWN, 1=HEALTHY, 2=WARNING, 3=CRITICAL)
# TYPE mygramdb_memory_health_status gauge
mygramdb_memory_health_status 1

# HELP mygramdb_index_documents_total Total number of documents in the index
# TYPE mygramdb_index_documents_total gauge
mygramdb_index_documents_total{table="products"} 500000
mygramdb_index_documents_total{table="users"} 500000

# HELP mygramdb_command_total Total number of commands executed by type
# TYPE mygramdb_command_total counter
mygramdb_command_total{command="search"} 10000
mygramdb_command_total{command="count"} 2000
mygramdb_command_total{command="get"} 3000

Content-Type: text/plain; version=0.0.4; charset=utf-8

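Metric categories:

Category	Description
Server metrics	Server version, uptime, number of commands processed
Command statistics	Execution counts by command type (search, count, get, info, and so on)
Memory metrics	Application memory (index/documents), system memory, process RSS, health status
Index metrics	Document count, term count, posting count, optimization status (per table, with the `table` label)
Client metrics	Current connections, cumulative connections
Replication metrics	Replication status, processed event count, per-operation counters (MySQL builds only)

Metric types:

Counter: a monotonically increasing value (for example mygramdb_command_total)
Gauge: a value that can go up and down (for example mygramdb_memory_used_bytes)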
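For ad hoc scripts that read /metrics without a Prometheus server, the sample lines can be split into name, labels, and value. The parser below is a deliberately small sketch and not a full implementation of the exposition format: it skips comment lines, ignores lines that carry a trailing timestamp, and does not unescape label values. `parse_metrics` is a hypothetical name.

```python
import re
from typing import Dict, List, Tuple

# metric_name, optional {label="value",...}, then a single value token.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_metrics(text: str) -> List[Sample]:
    """Parse sample lines of Prometheus text into (name, labels, value)."""
    samples: List[Sample] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # unsupported shape, e.g. a line with a timestamp
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


text = """# TYPE mygramdb_command_total counter
mygramdb_command_total{command="search"} 10000
mygramdb_memory_health_status 1
"""
for name, labels, value in parse_metrics(text):
    print(name, labels, value)
```

For production use, a maintained parser such as the one shipped with the prometheus_client package is a safer choice than a hand-written regular expression.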

Prometheus scrape configuration:

yaml

scrape_configs:
  - job_name: 'mygramdb'
    scrape_interval: 15s
    static_configs:
      - targets: ['localhost:8080']
        labels:
          environment: 'production'

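Key features:

Standard Prometheus format: compatible with any Prometheus-based monitoring stack
Multi-dimensional metrics: grouping by label (for example table, command, status)
Memory health tracking: numeric status values for alerting (0=UNKNOWN, 1=HEALTHY, 2=WARNING, 3=CRITICAL)
Per-table metrics: index statistics broken down by table name
Backward compatibility: the existing /info endpoint is unchanged

Comparison with /info:

Feature	`/info`	`/metrics`
Format	JSON	Prometheus text
Use case	General monitoring, debugging	Prometheus/Grafana integration
Metric types	Untyped values	Typed metrics (Counter/Gauge)
Multi-dimensional support	Limited	Full label support
Compatibility	Any HTTP client	Prometheus ecosystem

Both endpoints expose the same underlying data in different formats. Use /metrics for Prometheus integration and /info for general monitoring or human-readable output.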

GET /health

Health check endpoint for load balancers and monitoring.

Request:

http

GET /health HTTP/1.1

Response (200 OK):

json

{
  "status": "ok",
  "timestamp": 1699000000
}

GET /health/live

Liveness probe. As long as the HTTP server process is running, it returns 200 OK, even while data is loading or replication is degraded.

GET /health/ready

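Readiness probe for traffic control. It returns 200 OK only when the server can accept search traffic. It returns 503 Service Unavailable during DUMP LOAD or when configured replication is unavailable.

When to use live and ready

On Kubernetes, use /health/live as the liveness probe and /health/ready as the readiness probe. live answers "should the process be restarted?", while ready answers "is it safe to route search traffic here?".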
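Outside Kubernetes, a deployment script can apply the same distinction by polling /health/ready until it returns 200. The sketch below uses only the Python standard library; `http_status` and `wait_until_ready` are hypothetical names, and the retry count and delay are arbitrary defaults rather than recommended values.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status code for a GET request, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # for example 503 while a dump is loading
    except (urllib.error.URLError, OSError):
        return 0  # connection refused, timeout, DNS failure


def wait_until_ready(probe: Callable[[], int], attempts: int = 10,
                     delay_seconds: float = 1.0) -> bool:
    """Call probe() until it returns 200 or the attempts are used up."""
    for attempt in range(attempts):
        if probe() == 200:
            return True
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False


# Simulated probe: two 503 responses while loading, then ready.
responses = iter([503, 503, 200])
print(wait_until_ready(lambda: next(responses), attempts=5, delay_seconds=0))
```

Against a running server, the probe would be `lambda: http_status("http://localhost:8080/health/ready")`, assuming the default bind address and port from the configuration section.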

GET /health/detail

Detailed monitoring snapshot. It returns 200 OK with either "status": "healthy" or "status": "degraded". For load balancer readiness decisions, use /health/ready instead of this endpoint.

GET /config

Summary of the current server configuration (sensitive values are not returned).

Request:

http

GET /config HTTP/1.1

Response (200 OK):

json

{
  "mysql": {
    "configured": true,
    "database_defined": true
  },
  "api": {
    "tcp": {
      "enabled": true
    },
    "http": {
      "enabled": true,
      "cors_enabled": false
    }
  },
  "network": {
    "allow_cidrs_configured": false
  },
  "replication": {
    "enable": true
  },
  "notes": "Sensitive information is not served over HTTP. Use CONFIG SHOW over a secure connection."
}

GET /replication/status

MySQL replication status (only when replication is enabled).

Request:

http

GET /replication/status HTTP/1.1

Response (200 OK):

json

{
  "enabled": true,
  "current_gtid": "3E11FA47-71CA-11E1-9E33-C80AA9429562:1-5"
}

Error response (503 Service Unavailable):

json

{
  "error": "Replication not configured"
}

CORS Support

For direct browser access, set api.http.enable_cors: true and specify a trusted origin in api.http.cors_allow_origin. Leave CORS disabled when you do not need it.

CORS headers:

Access-Control-Allow-Origin: https://app.example.com
Access-Control-Allow-Methods: GET, POST, OPTIONS
Access-Control-Allow-Headers: Content-Type

Usage Examples

The examples below use the unqualified threads identifier, which works in a single-database configuration. With two or more databases, qualify it as app_db.threads (for example /tables/app_db.threads/search).

cURL

Search:

bash

curl -X POST http://localhost:8080/tables/threads/search \
  -H "Content-Type: application/json" \
  -d '{
    "q": "機械学習 AND Python",
    "filters": {"status": 1},
    "limit": 10
  }'

Facets:

bash

curl -X POST http://localhost:8080/tables/threads/facet \
  -H "Content-Type: application/json" \
  -d '{
    "column": "category",
    "q": "機械学習",
    "filters": {"status": 1},
    "limit": 10
  }'

Document retrieval:

bash

curl http://localhost:8080/tables/threads/thread_12345

Health check:

bash

curl http://localhost:8080/health

JavaScript (fetch)

javascript

// Search
const response = await fetch('http://localhost:8080/tables/threads/search', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    q: '機械学習 AND Python',
    filters: { status: 1 },
    limit: 10
  })
});

const data = await response.json();
console.log(`Found ${data.count} results`);
data.results.forEach(doc => {
  console.log(`Document ${doc.doc_id}: ${doc.primary_key}`);
});

Python (requests)

python

import requests

# Search
response = requests.post('http://localhost:8080/tables/threads/search', json={
    'q': '機械学習 AND Python',
    'filters': {'status': 1},
    'limit': 10
})

response.raise_for_status()  # Raise on 4xx/5xx instead of parsing an error body
data = response.json()
print(f"Found {data['count']} results")
for doc in data['results']:
    print(f"Document {doc['doc_id']}: {doc['primary_key']}")

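Performance Considerations

Connection pooling: use HTTP keep-alive to keep latency low
Pagination: use limit and offset for large result sets
Caching: consider caching frequent queries at the application layer
Network security: use network.allow_cidrs to restrict access to trusted IP ranges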
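The pagination advice can be sketched as a generator that walks a result set page by page. This is an illustration under assumptions: `iter_results` and `fetch_page` are hypothetical names, and the loop stops on a short page because the source does not say whether the count field of a search response is the total number of matches or the size of the returned page.

```python
from typing import Any, Callable, Dict, Iterator


def iter_results(fetch_page: Callable[[int, int], Dict[str, Any]],
                 page_size: int = 100) -> Iterator[Dict[str, Any]]:
    """Yield every result by requesting pages of page_size with limit/offset."""
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        results = page.get("results", [])
        yield from results
        offset += len(results)
        if len(results) < page_size:
            break  # a short (or empty) page means there is nothing left


# Stand-in for a real HTTP call: slices an in-memory list of documents.
documents = [{"doc_id": i, "primary_key": f"article_{i}"} for i in range(5)]


def fake_fetch(limit: int, offset: int) -> Dict[str, Any]:
    return {"results": documents[offset:offset + limit]}


print([doc["doc_id"] for doc in iter_results(fake_fetch, page_size=2)])
```

In a real client, fetch_page would POST the query with the given limit and offset to /tables/{identity}/search over a reused connection, which also covers the keep-alive advice.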

Error Handling

All error responses follow this format:

json

{
  "error": "Description of the error"
}

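HTTP status codes:

Code	Description
200	Success
400	Bad request (invalid input)
404	Not found (the document does not exist)
500	Internal server error
503	Service unavailable (the feature is not enabled, or the server is not ready)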
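A client can translate these status codes and the error field into exceptions so that callers do not inspect raw responses. The sketch below is one possible mapping, not an official client: the exception classes and `parse_response` are hypothetical names, and it assumes an error body is a JSON object with a single error string as shown above.

```python
import json
from typing import Any, Dict


class MygramHTTPError(Exception):
    """Base error for non-200 responses (hypothetical client-side type)."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message


class NotFoundError(MygramHTTPError):
    """404: the requested document does not exist."""


class UnavailableError(MygramHTTPError):
    """503: the feature is disabled or the server is not ready."""


def parse_response(status: int, body: str) -> Dict[str, Any]:
    """Return the JSON payload for 200, otherwise raise a typed error."""
    try:
        payload = json.loads(body) if body else {}
    except json.JSONDecodeError:
        payload = {}  # e.g. an HTML error page from a reverse proxy
    if status == 200:
        return payload
    message = payload.get("error", "unknown error") if isinstance(payload, dict) else "unknown error"
    if status == 404:
        raise NotFoundError(status, message)
    if status == 503:
        raise UnavailableError(status, message)
    raise MygramHTTPError(status, message)


print(parse_response(200, '{"count": 42}'))
```

Separating 503 from other failures lets callers retry after a delay, while 400 and 404 usually indicate a request that should not be retried unchanged.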

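Monitoring

The HTTP API provides several endpoints for monitoring and observability:

Health check: GET /health - simple health check for load balancers
JSON metrics: GET /info - detailed statistics in JSON for general monitoring tools
Prometheus metrics: GET /metrics - Prometheus-compatible metrics for time-series monitoring and alerting
Replication status: GET /replication/status - MySQL replication status

Monitoring Stack Integration

Prometheus + Grafana:

Configure Prometheus to scrape the /metrics endpoint
Import a Grafana dashboard for MygramDB
Set up alerts based on memory health, query latency, and replication lag

Other monitoring tools:

Datadog/New Relic: parse the JSON from the /info endpoint
Zabbix: HTTP agent checks against /health and /info
Nagios/Icinga: check scripts that use the /health endpoint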
