wpForo v3 AI Edition Documentation

AI Content Indexing

What Is AI Content Indexing?

AI Content Indexing is the process of converting your forum content into a format that AI features can understand and search. Instead of matching exact keywords like traditional search, AI indexing captures the meaning behind each topic and reply.

When your content is indexed, each post is converted into a numerical representation called a vector embedding — a series of numbers that encodes what the text means. These vectors allow the system to find content that is conceptually similar to a user’s query, even when the exact words are different.

Example: A user searching for “my site won’t load after updating” will find a thread titled “blank page after plugin upgrade” because the two mean the same thing, even though they share almost no words.
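To make "conceptually similar" concrete, here is a minimal sketch of cosine similarity, a standard way to compare two embeddings. The three-dimensional vectors are made-up toy values chosen for illustration; real embeddings come from the AI model and have many more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)

# Toy vectors (hypothetical values, not real model output).
query = [0.9, 0.1, 0.0]            # "my site won't load after updating"
similar_thread = [0.8, 0.2, 0.1]   # "blank page after plugin upgrade"
unrelated_thread = [0.0, 0.1, 0.9]

# The related thread scores higher than the unrelated one.
print(cosine_similarity(query, similar_thread) > cosine_similarity(query, unrelated_thread))  # True
```

A score close to 1 means the two texts point in nearly the same direction in embedding space; a score near 0 means they are unrelated.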

wpForo offers two ways to store these indexed vectors: Local Storage (in your WordPress database) and Cloud Storage (in dedicated cloud infrastructure). Both enable semantic search, but they differ significantly in how they process, store, and search your content.

Two Storage Modes

Local Storage

Vector embeddings are stored in your WordPress database. Search runs directly on your server using PHP.

  • Subscription plan: All
  • Vectors saved in your MySQL database
  • Search computed by your web server
  • Good for small to medium forums
  • No external storage dependency

Cloud Storage

Vector embeddings are stored in dedicated cloud infrastructure with optimized search algorithms.

  • Subscription plan: Business+
  • Vectors stored in purpose-built cloud index
  • Search powered by specialized algorithms
  • Scales to any forum size
  • No database or server load

Key Difference: Both modes use the same AI model to generate embeddings. The critical difference is where the embeddings are stored and how they are searched. Cloud mode uses purpose-built search algorithms that are fundamentally faster and more accurate than what’s possible in a WordPress database.

Full Feature Comparison

| Feature | Local Storage | Cloud Storage |
| --- | --- | --- |
| Embedding generation | Via cloud API (vectors returned to your server) | Entirely in the cloud (processed and stored remotely) |
| Vector storage | Your WordPress MySQL database | Dedicated cloud vector index (isolated per forum) |
| Search method | PHP-based sequential comparison | Optimized k-nearest-neighbor (k-NN) algorithms |
| Search scope | Processes vectors in batches of 2,000 | Searches entire index in a single operation |
| Search quality at scale | May miss best matches on large forums | Always finds the closest matches across all content |
| Search speed (100K+ posts) | Slower: CPU-bound computation on your server | Near-instant: sub-second regardless of forum size |
| Database impact | ~3–4 MB per 1,000 topics | Zero: no data added to your database |
| Server CPU load | Each search query uses CPU for vector math | Zero: search runs on cloud infrastructure |
| Result caching | Similar topics cached for 1 hour locally | Fresh results every time (fast enough without cache) |
| Image & document indexing | Supported (processed one at a time) | Supported (processed in parallel on cloud) |
| Deduplication | Local two-stage (group + fingerprint) | Server-side intelligent deduplication |
| Recommended forum size | Under 50,000 posts | Any size |

How the Embedding Process Differs

Embedding is the process of converting text into numerical vectors. Both modes use the same high-quality AI model with high-dimensional vectors. The difference is in how and where this process happens.

Local Mode: Two-Step Process

  1. Your Forum Content
  2. Sent to Cloud API (batches of up to 100 items)
  3. AI Model Generates Vectors
  4. Vectors Returned to Your Server
  5. Stored in Your WordPress Database (binary format)

In local mode, your content is sent to the cloud API to generate embeddings, but the resulting vectors are returned to your server and stored in your WordPress database. This means:

  • Your database grows by ~3–4 MB for every 1,000 topics indexed
  • Each API call carries at most 100 items, so large indexing jobs require multiple round trips
  • Images and documents must be processed one at a time
  • Your server stores and manages all vector data
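The 100-item limit can be sketched as simple list batching. This is an illustration only: `make_batches` is a hypothetical helper, not part of wpForo, and no real API call is made.

```python
def make_batches(items, batch_size=100):
    """Split items into consecutive batches of at most batch_size."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

# 1,050 placeholder items stand in for posts waiting to be embedded.
posts = [f"post-{n}" for n in range(1050)]
batches = make_batches(posts)

print(len(batches))      # 11 round trips for 1,050 items
print(len(batches[-1]))  # 50 items in the final, partial batch
```

Each batch corresponds to one round trip, so the number of API calls grows in direct proportion to the amount of content being indexed.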

Cloud Mode: Fully Managed Process

  1. Your Forum Content
  2. Sent to Cloud Infrastructure
  3. AI Model Generates Vectors
  4. Vectors Stored in Dedicated Cloud Index (optimized for fast retrieval; nothing returns to your server)

In cloud mode, the entire embedding process happens remotely. Your content is sent to the cloud, processed, and stored in a dedicated vector index built specifically for fast search. Nothing comes back to your database.

  • Zero impact on your WordPress database size
  • Background processing — content is queued and processed asynchronously
  • All content types (text, images, documents) processed efficiently
  • Your forum gets a dedicated, isolated index for complete data separation

Important: Even in local mode, the cloud API is required to generate embeddings; your WordPress server cannot create vector embeddings on its own. The difference is that local mode downloads and stores the vectors locally, while cloud mode keeps them in optimized cloud storage.

Why Cloud Delivers Better Search

The most significant difference between the two modes is not just speed — it’s search quality. Cloud mode finds better, more relevant results.

How Local Search Works

When a user searches in local mode, your WordPress server must:

  1. Generate a query vector: The search query is sent to the cloud API to create a vector embedding (this step is the same in both modes).
  2. Load vectors from the database in batches: Your server loads stored vectors from MySQL in groups of 2,000 at a time, to avoid running out of memory.
  3. Calculate similarity one by one: For each stored vector, PHP calculates a mathematical similarity score (cosine similarity). This is pure CPU computation on your server.
  4. Sort and return results: After checking all batches, the top results are sorted and returned.

The Problem at Scale: On a forum with 500,000 posts, each search query requires your PHP server to perform hundreds of thousands of mathematical comparisons. This is slow (seconds instead of milliseconds) and keeps your CPU busy, potentially affecting other visitors loading the site at the same time.
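The steps above can be sketched as a batched sequential scan. This is a Python illustration of the idea, not wpForo's PHP code: the `stored` list stands in for rows loaded from MySQL, and the function names are hypothetical.

```python
import heapq
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def sequential_search(query_vec, stored, top_k=3, batch_size=2000):
    """Scan every stored (post_id, vector) pair in batches and keep the best top_k.

    A real implementation would fetch each batch with a database query
    instead of slicing an in-memory list.
    """
    best = []  # min-heap of (score, post_id), never larger than top_k
    for start in range(0, len(stored), batch_size):
        batch = stored[start:start + batch_size]
        for post_id, vec in batch:
            score = cosine_similarity(query_vec, vec)
            if len(best) < top_k:
                heapq.heappush(best, (score, post_id))
            elif score > best[0][0]:
                heapq.heapreplace(best, (score, post_id))
    return sorted(best, reverse=True)  # highest score first

stored = [(1, [1.0, 0.0]), (2, [0.0, 1.0]), (3, [0.7, 0.7]), (4, [-1.0, 0.0])]
results = sequential_search([1.0, 0.1], stored, top_k=2, batch_size=2)
print([post_id for _, post_id in results])  # [1, 3]
```

The inner loop runs once per stored vector, which is why the cost of every query grows with the size of the forum.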

How Cloud Search Works

Cloud search takes a fundamentally different approach. Instead of comparing the query against every vector one by one, the cloud index uses k-nearest-neighbor (k-NN) search backed by index structures specifically designed to find the most similar items without scanning everything.

Think of it like the difference between looking through every book in a library one by one (local), versus using the library’s cataloging system to go straight to the right shelf (cloud).
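The catalog analogy can be illustrated with a toy bucketed index: vectors are grouped around a few centroids, and a query is compared only against the bucket nearest to it. This is one common family of nearest-neighbor indexing, shown purely as an illustration; the documentation does not say which index structure the cloud service uses, and the centroids and function names here are hypothetical.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def nearest_centroid(vec, centroids):
    """Return the index of the centroid most similar to vec."""
    return max(range(len(centroids)), key=lambda i: cosine_similarity(vec, centroids[i]))

def build_index(stored, centroids):
    """Assign each (post_id, vector) pair to the bucket of its nearest centroid."""
    buckets = {i: [] for i in range(len(centroids))}
    for post_id, vec in stored:
        buckets[nearest_centroid(vec, centroids)].append((post_id, vec))
    return buckets

def indexed_search(query_vec, buckets, centroids, top_k=3):
    """Score only the vectors in the bucket closest to the query."""
    candidates = buckets[nearest_centroid(query_vec, centroids)]
    scored = [(cosine_similarity(query_vec, vec), post_id) for post_id, vec in candidates]
    return sorted(scored, reverse=True)[:top_k], len(candidates)

centroids = [[1.0, 0.0], [0.0, 1.0]]  # hand-picked for the demo
stored = [(1, [0.9, 0.1]), (2, [0.8, 0.3]), (3, [0.1, 0.9]), (4, [0.2, 0.8])]
buckets = build_index(stored, centroids)

results, compared = indexed_search([1.0, 0.2], buckets, centroids, top_k=1)
print(results[0][1], compared)  # 1 2  (best match is post 1; only 2 of 4 vectors compared)
```

Here only half of the stored vectors are examined. Production indexes generally use more elaborate structures and probe more than one region to keep accuracy high.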

Local Search

  • Sequential comparison (one by one)
  • Processes vectors in batches of 2,000
  • CPU-intensive on your web server
  • Lower similarity scores (5–25% range)
  • May miss relevant results at scale
  • Caches similar topics for 1 hour

Cloud Search

  • Indexed search (goes directly to best matches)
  • Searches entire index in one operation
  • Zero load on your web server
  • Higher similarity scores (30–90% range)
  • Always finds the best matches
  • Fresh results every time (no stale cache)

Why Higher Scores Matter: Cloud search produces higher similarity scores (30–90%) compared to local search (5–25%) because the cloud’s search algorithms are mathematically more precise. This means cloud search can more confidently rank results, showing users the most relevant content first and filtering out marginal matches more effectively.

Performance at Scale

As your forum grows, the performance gap between local and cloud storage widens dramatically.

| Forum Size | Local Search Time | Cloud Search Time | Local DB Size Added |
| --- | --- | --- | --- |
| 5,000 posts | Fast (milliseconds) | Fast (milliseconds) | ~15–20 MB |
| 50,000 posts | Noticeable delay | Fast (milliseconds) | ~150–200 MB |
| 250,000 posts | Slow (seconds) | Fast (milliseconds) | ~750 MB – 1 GB |
| 1,000,000+ posts | Very slow / impractical | Fast (milliseconds) | ~3–4 GB+ |

Why the Gap Grows

Local mode has a linear relationship with content size: more posts means more vectors to compare on every search query. Your server does more work, searches take longer, and the database keeps growing.

Cloud mode uses indexed data structures that scale logarithmically — even if your forum doubles in size, search time barely increases. The search algorithm navigates directly to the most relevant results without examining every single vector.
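A rough numerical sketch of linear versus logarithmic growth follows. The step counts are order-of-magnitude illustrations that ignore constant factors; they are not measurements of either mode.

```python
import math

def linear_comparisons(n):
    """A full scan compares the query against every stored vector."""
    return n

def logarithmic_steps(n):
    """Rough step count for a structure whose work scales as log2(n)."""
    return math.ceil(math.log2(n)) if n > 1 else 1

for n in (5_000, 50_000, 250_000, 1_000_000):
    print(f"{n:>9,} vectors: scan={linear_comparisons(n):>9,}  indexed~{logarithmic_steps(n)}")
```

Doubling the input doubles the work for a scan, but adds only one step to the logarithmic count.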

Server Load Impact: In local mode, every search query uses your web server’s CPU for vector math calculations. On a busy forum with 100 concurrent users searching, this can noticeably slow down your entire website, including page loads for users who aren’t even searching. Cloud mode eliminates this entirely: search happens on remote infrastructure, so your server handles only the lightweight API call.

Which Mode Should I Use?

| Forum size | Recommendation | Notes |
| --- | --- | --- |
| Under 10,000 posts | Local | Excellent performance, no external storage needed |
| 10,000 – 50,000 posts | Local | Good performance. Cloud available if you want faster searches and zero DB growth |
| 50,000 – 100,000 posts | Cloud recommended | Local search slows down and adds hundreds of MB to your database |
| 100,000 – 500,000 posts | Cloud strongly recommended | Local search becomes impractically slow |
| Over 500,000 posts | Cloud essential | Only cloud mode can deliver usable search at this scale |
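The size tiers above can be expressed as a small lookup function. This is a sketch: `recommend_mode` is a hypothetical helper, and because the listed ranges share their boundary values, treating each upper bound as inclusive is an assumption.

```python
def recommend_mode(post_count):
    """Map a post count to the storage recommendation for its size tier.

    Assumption: each tier's upper bound is inclusive (e.g. exactly 50,000
    posts still falls in the "Local" tier).
    """
    if post_count < 10_000:
        return "Local"
    if post_count <= 50_000:
        return "Local (Cloud optional)"
    if post_count <= 100_000:
        return "Cloud recommended"
    if post_count <= 500_000:
        return "Cloud strongly recommended"
    return "Cloud essential"

for count in (2_500, 30_000, 75_000, 400_000, 2_000_000):
    print(f"{count:>9,} posts -> {recommend_mode(count)}")
```

Keep in mind that Cloud Storage requires a Business or higher plan, so the recommendation is only actionable on those plans.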

Beyond Scale: Other Reasons to Choose Cloud

  • Better search accuracy: k-NN algorithms find more precise matches than sequential comparison
  • Zero database growth: your WordPress database stays lean
  • No server CPU overhead: search processing happens remotely
  • Fresh results: no 1-hour caching delay on similar topics
  • Future-proof: your search stays fast as your forum grows

Large Forums: If your forum has hundreds of thousands of topics or millions of posts, cloud storage is strongly recommended. Local storage at that scale would add gigabytes to your WordPress database, searches would take seconds instead of milliseconds, and results may not include the most relevant matches.

What Gets Indexed

When you index your forum, the following content is processed and made searchable:

  • Topic titles and body content
  • All replies within each topic
  • Author display names
  • Forum names and topic URLs
  • Dates (when topics/replies were created)
  • Solved status and best answer flags
  • Like/vote counts
  • Image and document content (Professional plan and above)

How Content Is Processed

Each topic and its replies are split into smaller, overlapping chunks of text. Each chunk is then converted into a high-dimensional vector by an AI model. These vectors capture the meaning of the text, not just the keywords. When someone searches, their query is converted into a vector the same way, and the system finds the closest matching content vectors.
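Overlapping chunking can be sketched as a sliding window over words. The chunk size, the overlap, and the choice to split on words are all assumptions made for illustration; the documentation does not state the actual values or splitting rules.

```python
def chunk_words(text, chunk_size=200, overlap=40):
    """Split text into word-based chunks where consecutive chunks share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks

text = " ".join(f"w{n}" for n in range(10))
print(chunk_words(text, chunk_size=4, overlap=1))
# ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

The shared words at each boundary help ensure that a sentence straddling two chunks is still represented in full in at least one of them.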

Smart Updates: Indexing tracks changes at the individual post level using content fingerprints. When you re-index, only new or modified content is processed; unchanged posts are skipped. Deleted posts are automatically removed from the index. Re-indexing unchanged content does not consume additional credits.
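Fingerprint-based change tracking can be sketched with a content hash per post. SHA-256 and the function names below are assumptions for illustration; the documentation does not specify how fingerprints are computed.

```python
import hashlib

def fingerprint(content):
    """Return a stable SHA-256 hex digest of a post's text."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def plan_reindex(current_posts, indexed_fingerprints):
    """Compare current posts against the fingerprints stored at the last indexing run.

    current_posts: {post_id: text}; indexed_fingerprints: {post_id: digest}.
    Returns (to_index, to_remove, unchanged) as sorted lists of post ids.
    """
    to_index, unchanged = [], []
    for post_id, text in current_posts.items():
        if indexed_fingerprints.get(post_id) == fingerprint(text):
            unchanged.append(post_id)
        else:
            to_index.append(post_id)  # new post, or text changed since last run
    # Anything indexed earlier that no longer exists must be removed.
    to_remove = [pid for pid in indexed_fingerprints if pid not in current_posts]
    return sorted(to_index), sorted(to_remove), sorted(unchanged)

indexed = {1: fingerprint("hello"), 2: fingerprint("old text"), 3: fingerprint("gone")}
current = {1: "hello", 2: "new text", 4: "brand new"}
print(plan_reindex(current, indexed))  # ([2, 4], [3], [1])
```

Only the posts in the first list would be sent for embedding again, which is what keeps repeat indexing runs cheap.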

Your Data & Privacy

We understand your forum data is valuable. Here’s exactly what happens with it in each mode:

Cloud Storage Mode

Content is sent to secure cloud infrastructure for processing and storage. Each forum gets its own isolated, dedicated index — your data is completely separated from other forums. All data is encrypted both in transit and at rest.

Local Storage Mode

Content is sent to the cloud API only to generate vector embeddings. The resulting vectors are returned to your server and stored in your WordPress database. The cloud does not permanently store your content in local mode.

Never Sent or Stored

  • User passwords
  • Email addresses
  • IP addresses
  • Private topics or unpublished drafts
  • WordPress admin credentials or configuration data

Getting Started

Please note that Cloud Storage is only available for Business and higher subscription plans. Only Local Storage is available for the Free, Starter, and Professional plans.

  1. Choose Your Storage Mode: Go to wpForo Settings > AI Features and select either Local or Cloud (gVectors on AWS) storage mode. For forums over 50,000 posts, we recommend Cloud.
  2. Start Indexing: Go to AI Features > Content Indexing and click Start Indexing. You can index your entire forum or select specific forums, date ranges, or tags.
  3. Monitor Progress: The indexing progress bar shows how many topics have been processed. For very large forums, you can index in batches, for example one forum at a time.
  4. Done, AI Features Are Active: Once indexing completes, semantic search, topic suggestions, and related topics are automatically enabled for your users.

Tip for Very Large Forums: If you have hundreds of thousands of topics, we recommend indexing in batches by forum or date range rather than all at once. This keeps the process manageable and lets you verify results as you go.

AI Features Powered by Indexing

Once your content is indexed, several AI-powered features become available to your users:

🔍

Semantic Search

Users find relevant threads by meaning, not just keywords. Great for technical forums where users may not know the exact terminology.

💡

Topic Suggestions

When someone starts creating a new topic, the system suggests existing topics that discuss the same subject — reducing duplicate threads.

🔗

Related Topics

Shown alongside topics your users are reading, helping them discover relevant discussions and keeping them engaged.

🤖

AI Chatbot

An AI assistant that answers questions using your forum’s knowledge base. It references actual forum discussions in its responses.

Frequently Asked Questions

Can I switch between Local and Cloud mode?

Yes. You can switch storage modes at any time in the settings. After switching, you’ll need to re-index your content in the new mode. Your original forum data is never affected — only the search index changes.

If both modes use the cloud API to generate embeddings, why is cloud mode better?

The embedding generation is the same quality in both modes. The difference is in storage and search. Cloud mode stores vectors in a purpose-built search index using k-NN algorithms that find the best matches instantly across your entire content. Local mode stores vectors in your MySQL database and searches them sequentially using PHP, which is slower, uses your server’s CPU, and may miss the best matches on large forums.

Does indexing affect my forum’s performance?

No. Indexing runs in the background and processes content in small batches. Your forum remains fully usable during indexing. In Cloud mode, the heavy processing happens on external infrastructure, so there’s virtually no impact on your hosting.

How much database space does Local mode use?

Approximately 3–4 MB per 1,000 indexed topics. For a forum with 10,000 topics, that’s about 30–40 MB. For 100,000 topics, that’s 300–400 MB. For very large forums (500,000+ topics), this can reach several gigabytes. Cloud mode uses zero database space.
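The arithmetic behind these figures can be sketched as follows, using the stated rate of 3–4 MB per 1,000 topics. The function name is hypothetical, and actual growth will vary with topic length and reply count.

```python
def estimate_local_db_growth_mb(topic_count, mb_per_thousand=(3, 4)):
    """Return (low, high) estimated database growth in MB for Local mode."""
    low, high = mb_per_thousand
    return topic_count / 1000 * low, topic_count / 1000 * high

for topics in (10_000, 100_000):
    low, high = estimate_local_db_growth_mb(topics)
    print(f"{topics:,} topics: about {low:,.0f} to {high:,.0f} MB")
```

The estimate is linear, so every additional 1,000 topics adds the same 3–4 MB regardless of how large the index already is.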

Does indexing use credits?

Yes, indexing consumes credits from your plan. The base cost is 1 credit per topic (including all its replies). Re-indexing unchanged content does not consume additional credits — only new or modified content uses credits.
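As a rough sketch of the base cost rule, the function below counts one credit for each new or modified topic and nothing for unchanged ones. The state labels and function name are hypothetical, and any charges beyond the base cost are not modeled.

```python
def credits_needed(topic_states, credits_per_topic=1):
    """Count base credits: new or modified topics are billed, unchanged ones are free.

    topic_states: {topic_id: "new" | "modified" | "unchanged"}
    """
    billable = sum(1 for state in topic_states.values() if state in ("new", "modified"))
    return billable * credits_per_topic

states = {101: "new", 102: "unchanged", 103: "modified", 104: "unchanged"}
print(credits_needed(states))  # 2
```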

What happens if I delete a topic?

When you delete a topic from your forum and run indexing, its vectors are automatically removed from the index. In Cloud mode, this happens during the next sync. No manual cleanup is needed.

Is my data secure in Cloud mode?

Yes. Your forum content is stored in an isolated, encrypted environment. Each forum has its own dedicated index — your data is completely separated from other forums. All data transmission uses encryption in transit (TLS).

Can I index only specific forums or topics?

Yes. The indexing interface lets you filter by forum, date range, topic tags, or even specific topic URLs. This is especially useful for large forums where you want to index content gradually.

Why do local search scores look different from cloud search scores?

Local search produces lower similarity scores (typically 5–25%) while cloud search produces higher scores (30–90%). This is due to different search algorithms, not different embedding quality. The system automatically adjusts score thresholds for each mode, so result quality is consistent at small scale. At large scale, cloud mode produces more accurate rankings.

I have millions of posts. Should I use Cloud mode?

Absolutely. Cloud mode is designed to handle forums of any size with consistent sub-second search performance. A forum with 10 million posts will get the same fast search results as a forum with 10,000 posts. Local mode at that scale would require gigabytes of database storage and searches would take several seconds per query.