FoxWeb Full-Text Search Engine

Table of Contents

Description
Working with Search Results
How Searches Work
Getting Help
Contact Information
Sample Application
Download
File List
Reference

Description

The FoxWeb Full-Text Search Engine is a VFP class that provides fast and efficient full-text searches in FoxPro tables. To support this functionality, the class creates an index that maps each word contained in one or more fields of the source table. The indexed fields may contain either plain text or HTML data. In the case of HTML data, all HTML tags are removed before the remaining text is indexed. This class was developed to provide full-text search capabilities for the FoxWeb Forum, but is now provided for free to the FoxPro community.

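The class supports searches for single words, compound words and quoted phrases, wildcard characters (* and ?), matching of all search terms (AND) or any search term (OR), and restriction of a search to a subset of the indexed fields.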

Working with Search Results

The Search method returns matching results in a cursor, which contains the key field values of the matching records, as well as the number of matches in each record. This cursor can be combined with the source table in several ways, for example with a JOIN or with an IN subquery. Additional WHERE clauses can be used to further limit the results:

SELECT * FROM messages ;
    JOIN Results ON messages.MsgID = Results.IndexValue ;
    WHERE messages.timestamp > M.StartDate ;
    ORDER BY Results.TotWords, Results.Frequency, messages.timestamp

A more detailed example can be found in the description of the Search method in the Reference section.

How Searches Work

The Full-Text Search object first splits the search phrase into one or more search terms, separated from each other by either spaces or commas. Search terms can be single words, compound words (two or more words linked by a dash or other symbol), or phrases (sets of words enclosed in double quotes). Each search term is broken down into its individual words (a word is a series of letters, numbers and underscore characters), while other characters are discarded. This means that the search term "tree-top" will match any occurrence of the word "tree" followed by "top", including "tree-top", "tree top" and "tree, top". The search engine then tries to find matches for the various search terms, taking into account any wildcard characters in them.
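The word-splitting rule described above can be illustrated with a short sketch. SplitWords is a hypothetical helper written for this illustration; it is not part of the fwFullText class, and it is not necessarily how the engine is implemented. It only applies the stated definition of a word (letters, numbers and underscores) and ignores wildcard handling.

```foxpro
* Illustration only: SplitWords is a hypothetical helper, not part of fwFullText.
LOCAL lcTerm
lcTerm = "tree-top"
* Lists the words found in the term, separated by commas
? SplitWords(lcTerm)

FUNCTION SplitWords(tcTerm)
    LOCAL lcResult, lcWord, lcChar, lnPos
    lcResult = ""
    lcWord = ""
    FOR lnPos = 1 TO LEN(tcTerm)
        lcChar = SUBSTR(tcTerm, lnPos, 1)
        IF ISALPHA(lcChar) OR ISDIGIT(lcChar) OR lcChar == "_"
            * Letters, numbers and underscores extend the current word
            lcWord = lcWord + lcChar
        ELSE
            * Any other character ends the current word and is discarded
            IF NOT EMPTY(lcWord)
                lcResult = lcResult + IIF(EMPTY(lcResult), "", ",") + lcWord
                lcWord = ""
            ENDIF
        ENDIF
    NEXT
    * Add the last word, if the term did not end with a separator
    IF NOT EMPTY(lcWord)
        lcResult = lcResult + IIF(EMPTY(lcResult), "", ",") + lcWord
    ENDIF
    RETURN lcResult
ENDFUNC
```

With this rule, "tree-top", "tree top" and "tree, top" all reduce to the same two words, which is why they match the same search term.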


Support

Getting Help

The FoxWeb Full-Text Search Engine is provided to the FoxPro community for free, so we can't offer direct support. However, if you have a question you can visit the FoxWeb Forum, where you can search previous postings, post a new message and get help from other users and the developers.

Contact Information

You may send bug reports and other correspondence to support@foxweb.com.


Sample Application

The FTExample form illustrates the use of most of the FoxWeb Full-Text Search Engine's features.  The form's Init method instantiates a full-text object, from a sub-class that is defined in FTWrapper.prg.  The FTWrapper sub-class is used in order to provide status updates while the object is in the process of indexing.  When you run this form from VFP, you will first be asked to select the table you would like to work with.  The table sample.dbf is provided, but you can use this form with your own tables. 

The FTExample form has three tabs, each with a distinct use:

Create Index

This tab allows the creation of a full-text index for any VFP table.  By default, its fields are populated for indexing sample.dbf, which is provided with this sample application.  Although some of the controls in this page may appear confusing at first, they correspond to properties and methods of the Full-Text Search Engine class, so they are pretty easy to figure out.  The three buttons in this page correspond to the CreateIndex, UpdateIndex and DeleteIndex methods, while all remaining controls correspond to various properties that affect the indexing process.

Search

The Search tab provides the ability to perform full-text searches against an indexed table.  The controls in this page are only enabled if the selected table has already been indexed.

Index Information

This tab displays information about the full-text index of the selected table.  This information is retrieved via the GetIndexAttributes method.


Download

The latest version can always be downloaded from the FoxWeb web site:

Download the FoxWeb FullText class and support files

After downloading the zip file, extract its contents into an empty folder on your computer.


File List

fwFullText7.fxp, fwFullText8.fxp, etc. | Compiled versions of the fwFullText class, each created with a different version of VFP. You can either rename the one that corresponds to your version of VFP to fwFullText.fxp, or you can use the technique utilized in the FTExample sample form to automatically load the correct file.
fwUtil.prg | FoxWeb utility class. This file contains a number of useful all-purpose methods. It is required by the fwFullText class and must be stored in the same location.
fwFullText.htm | This documentation.
FTExample.sct, FTExample.scx | Sample form that illustrates the use of the FoxWeb Full-Text Search Engine.
FTWrapper.prg | This file sub-classes the fwFullText class and is utilized by the FTExample sample application.
FullTextNoise.txt | Sample noise file utilized by the FTExample sample application.
sample.dbf, sample.fpt, sample.cdx | Sample data table utilized by the FTExample sample application.
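
As an alternative to renaming the compiled file, the file name can be built at run time. The following is a minimal sketch, not a copy of the code in the FTExample form. It assumes that the compiled files follow the naming pattern shown above (fwFullText followed by the major version number) and that they are in the current folder.

```foxpro
* Sketch only: builds the file name from the running VFP major version.
* Assumption: files are named fwFullText<major version>.fxp (e.g. fwFullText9.fxp)
LOCAL lcClassFile, oFullText
* VERSION(5) returns the version as a number, such as 900 for VFP 9
lcClassFile = 'fwFullText' + TRANSFORM(INT(VERSION(5) / 100)) + '.fxp'
IF FILE(lcClassFile)
    oFullText = NEWOBJECT('FullText', lcClassFile)
ELSE
    ? 'No compiled class file found for this version of VFP: ' + lcClassFile
ENDIF
```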

Reference of fwFullText Class

Properties

Errors | Contains messages generated by the Full-Text Search Engine.
ExclusiveIndexing | Should be set before a call to the CreateIndex or UpdateIndex method to determine whether indexing should be done in exclusive or shared mode. Exclusive mode is faster, but it locks the full-text index, preventing searches from occurring during the indexing process. Shared mode allows the partial index to be searched.
LastCorrectedSearchPhrase | Returns a syntactically valid search phrase after a call to the Search method.
LastSearchTally | Returns the number of matches found by the last call to the Search method.
MaxHits | Should be set to the maximum number of results to return. If the search yields more records, then an error is returned.
MaxHitsPercent | Should be set to the maximum percentage of records from the searched table to return. If the search yields more records, then an error is returned. For example, if the table contains 150 records and more than 75 (50%) meet the search criteria, then an error is returned.
Table | Must point to the table to be indexed. It should be set to point to the same table before a call to any of the methods below.

Methods

CreateIndex | Creates a full-text index.
DeleteIndex | Deletes an existing full-text index.
GetIndexAttributes | Returns information about an existing full-text index.
IsIndexed | Returns a boolean value, indicating whether a full-text index has been created for the selected table.
Search | Performs a full-text search.
UpdateIndex | Updates a previously-created full-text index.
UpdateRecordIndex | Updates the full-text index for a particular record. It can be used to keep the full-text index up to date continuously as records are added, edited and deleted.
Version | Returns version information.

Events

StatusMessage | Gets called during full-text index creation.

Errors Property

Description

The Errors property is an object, based on the Errors class, which contains messages generated by the various methods of the Full-Text Search Engine. It can be used to retrieve errors and other messages after a call to a method of the fwFullText object. This property supports the following properties and methods:

Name | Type | Description
Count | property | Returns the total number of messages.
Item | method | Accepts a single numerical argument specifying the requested message, and returns that message as an Error object. Error objects support the following properties: Description (message text), Number (the error number), Severity (a number indicating the severity of the error), Source (character string, indicating the source of the message).

Example
oFullText = NEWOBJECT('FullText', 'fwFullText.fxp')
* The Table property must be set before calling any method
oFullText.Table = 'ForumMessages'
oFullText.Search('keyword', 'ResultCursor', 'FieldToSearch')
IF oFullText.Errors.Count > 0
    * Read each message in the Errors object
    FOR M.i = 1 TO oFullText.Errors.Count
        * Retrieve the current error
        ? 'Message: ' + oFullText.Errors.Item(M.i).Description
        * TRANSFORM() converts the numeric values to character strings
        ? 'Number: ' + TRANSFORM(oFullText.Errors.Item(M.i).Number)
        ? 'Severity: ' + TRANSFORM(oFullText.Errors.Item(M.i).Severity)
        ? 'Source: ' + oFullText.Errors.Item(M.i).Source
    NEXT
ENDIF

The above code checks if there are any messages and lists them on the screen.


ExclusiveIndexing Property

Description

Should be set before a call to the CreateIndex or UpdateIndex method to determine whether indexing should be done in exclusive or shared mode. Exclusive mode is faster, but it locks the full-text index, preventing searches from occurring during the indexing process. Shared mode allows the partial index to be searched.

This property is read/write.

Syntax

oFullText.ExclusiveIndexing = .T.
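
For context, the following sketch shows the property being set before an index update so that searches can continue while indexing is in progress. The table name ForumMessages is borrowed from the other examples in this document.

```foxpro
oFullText = NEWOBJECT('FullText', 'fwFullText.fxp')
oFullText.Table = 'ForumMessages'
* Shared mode: slower, but the partial index remains searchable
oFullText.ExclusiveIndexing = .F.
RetValue = oFullText.UpdateIndex()
RELEASE oFullText
CLEAR CLASS fulltext
```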


LastCorrectedSearchPhrase Property

Description

Returns a syntactically valid search phrase after a call to the Search method. Can be used to replace the contents of the search form with a valid search.

This property is read only.

Syntax

cLastCorrectedSearchPhrase = oFullText.LastCorrectedSearchPhrase


LastSearchTally Property

Description

Returns the number of matches found by the last call to the Search method.

This property is read only.

Syntax

nLastSearchTally = oFullText.LastSearchTally


MaxHits Property

Description

Should be set to the maximum number of results to return. If the search yields more records, then an error is returned. The default value for this property is 100.

This property is read/write.

Syntax

oFullText.MaxHits = 100


MaxHitsPercent Property

Description

Should be set to the maximum percentage of records from searched table to return. If the search yields more records, then an error is returned. For example, if the table contains 150 records and more than 75 (50%) meet the search criteria, then an error is returned. The default value for this property is 50.

This property is read/write.

Syntax

oFullText.MaxHitsPercent = 50
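
The two limits can be combined. The sketch below lowers both limits before a search and lists the messages in the Errors object if the search fails. It assumes that Search returns .F. when a limit is exceeded, which the description above implies but does not state explicitly. The search phrase and cursor name are placeholders.

```foxpro
oFullText = NEWOBJECT('FullText', 'fwFullText.fxp')
oFullText.Table = 'ForumMessages'
* Accept at most 25 matches, and no more than 10% of the table
oFullText.MaxHits = 25
oFullText.MaxHitsPercent = 10
IF NOT oFullText.Search('index*', 'FTSearch')
    * Assumption: exceeding either limit makes Search return .F.
    FOR M.i = 1 TO oFullText.Errors.Count
        ? oFullText.Errors.Item(M.i).Description
    NEXT
ENDIF
RELEASE oFullText
CLEAR CLASS fulltext
```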


Table Property

Description

Must point to the table to be indexed. It should be set to point to the same table before a call to any of the methods below.

This property is read/write.

Syntax

oFullText.Table = 'c:\data\messages.dbf'


CreateIndex Method

Description

Creates a full-text index from scratch.

Syntax

bSuccess = oFullText.CreateIndex(cKeyField, cTextFieldString, bProximity, cNoiseWords, nKeywordIDSize)

Parameters
cKeyField

The name of the key field, which is used to uniquely identify records in the table being indexed.

cTextFieldString

A comma-delimited string, containing all the fields that should be indexed. The full-text engine can index character and memo fields. Each field name can be followed with an optional colon (:) character and a content type indicator. In these cases, the content type-specific keywords will be omitted from the index. For example, "content:html" can be used to indicate that the content field contains HTML data and that all HTML tags are to be stripped before the contents of the field are indexed. Currently the only supported content type is "html".

bProximity

Boolean value, indicating whether the index should contain word proximity information. When this value is set to .F., then the index files are smaller, but you will not be able to search for phrases, or compound words. The recommended value for bProximity is .T..

cNoiseWords

A delimited list of words to exclude from the full-text index.  The words can be separated with commas, spaces, or carriage returns.  Specifying noise words that are used too often in the indexed table can dramatically reduce the size of your full-text index, but will prevent searching on those words. The provided FullTextNoise.txt file contains a good sample of words for the English language.

nKeywordIDSize

The number of bytes used to store keyword IDs. This value can be either 2, or 4. If you expect the number of unique words being indexed to exceed 64,000, then you should use 4, otherwise use 2, which results in a smaller full-text index. If you specify a value of 2 and the number of keywords is higher than what can be represented with two bytes, you will receive an error message.

Return Value

Boolean: Whether the call to the current method succeeded.

Comments

A text message, providing statistics and final results of the indexing process is inserted in the Errors object.

Example
oFullText = NEWOBJECT('FullText', 'fwFullText.fxp')
oFullText.Table = 'ForumMessages'
RetValue = oFullText.CreateIndex('MsgId', 'content:html,subject,user', .T., FILETOSTR('FullTextNoise.txt'), 2)
RELEASE oFullText
CLEAR CLASS fulltext

DeleteIndex Method

Description

Deletes a previously-created full-text index.

Syntax

bSuccess = oFullText.DeleteIndex()

Return Value

Boolean: Whether the index was deleted successfully.
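
A minimal usage sketch, following the pattern of the other examples in this document:

```foxpro
oFullText = NEWOBJECT('FullText', 'fwFullText.fxp')
oFullText.Table = 'ForumMessages'
IF NOT oFullText.DeleteIndex()
    * List any messages left in the Errors object
    FOR M.i = 1 TO oFullText.Errors.Count
        ? oFullText.Errors.Item(M.i).Description
    NEXT
ENDIF
RELEASE oFullText
CLEAR CLASS fulltext
```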


GetIndexAttributes Method

Description

Retrieves information about an existing full-text index.

Syntax

bSuccess = oFullText.GetIndexAttributes([@cKeyField], [@aTextFields], [@bProximity], [@cNoiseWords], [@nKeywordIDSize])

Parameters

This method accepts up to five memory variables, passed by reference (preceded with the @ character). These variables are populated with the following information.

cKeyField

The name of the key field.

aTextFields

A two-dimensional array, containing all index fields (one in each row). The first column contains the name of the field and the second contains the data type (currently "html", or nothing).

bProximity

Indicates whether the index contains word proximity information.

cNoiseWords

Comma separated list of noise words excluded from the full-text index.

nKeywordIDSize

Indicates the number of bytes used to store keyword IDs.

Return Value

Boolean: Indicates whether the method call succeeded.
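
The following sketch shows one way to call the method and display the returned values. The variable names are arbitrary, and the array is declared before the call so that it can be passed by reference.

```foxpro
LOCAL lcKeyField, llProximity, lcNoiseWords, lnKeywordIDSize, lnRow
LOCAL ARRAY laTextFields[1]
* Initialize the variables that will be passed by reference
STORE '' TO lcKeyField, lcNoiseWords
llProximity = .F.
lnKeywordIDSize = 0
oFullText = NEWOBJECT('FullText', 'fwFullText.fxp')
oFullText.Table = 'ForumMessages'
IF oFullText.GetIndexAttributes(@lcKeyField, @laTextFields, @llProximity, ;
        @lcNoiseWords, @lnKeywordIDSize)
    ? 'Key field: ' + lcKeyField
    ? 'Proximity: ' + TRANSFORM(llProximity)
    ? 'Keyword ID size: ' + TRANSFORM(lnKeywordIDSize)
    * One row per indexed field: name in column 1, content type in column 2
    FOR lnRow = 1 TO ALEN(laTextFields, 1)
        ? 'Indexed field: ' + TRANSFORM(laTextFields[lnRow, 1]) + ;
            ' ' + TRANSFORM(laTextFields[lnRow, 2])
    NEXT
ENDIF
RELEASE oFullText
CLEAR CLASS fulltext
```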


IsIndexed Method

Description

Can be used to determine whether a full-text index exists for the selected table.

Syntax

bIsIndexed = oFullText.IsIndexed()

Return Value

Boolean: The return value is .T. if a full-text index has already been created for the selected table.
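
A common use of IsIndexed is to decide between creating and updating an index. The sketch below reuses the arguments from the CreateIndex example in this document.

```foxpro
oFullText = NEWOBJECT('FullText', 'fwFullText.fxp')
oFullText.Table = 'ForumMessages'
IF oFullText.IsIndexed()
    * An index already exists, so bring it up to date
    RetValue = oFullText.UpdateIndex()
ELSE
    * No index yet, so create one from scratch
    RetValue = oFullText.CreateIndex('MsgId', 'content:html,subject,user', ;
        .T., FILETOSTR('FullTextNoise.txt'), 2)
ENDIF
RELEASE oFullText
CLEAR CLASS fulltext
```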


Search Method

Description

The Search method performs a full-text search, using a previously-created index.

Syntax

bSuccess = oFullText.Search(cSearchPhrase, cResultCursorName, [cTextFieldString], [bAnyWords])

Parameters
cSearchPhrase

The search condition, which consists of a series of space-separated search terms. Each term can be either a word or a phrase enclosed in quotes. Wildcard characters (* for multiple characters and ? for single characters) can be used, but they can slow the search down if they are at the beginning of a word. For example, searching on "Cook*" is fast, but searching on "*ing" is slower.

cResultCursorName

The name of the cursor, where the result set will be stored. This cursor will contain three columns:

Field Name | Data Type | Description
IndexValue | Varies | The key of the matching record. The data type of this field will match the data type of the corresponding field in the table being searched.
TotWords | Numeric | The number of terms from cSearchPhrase contained in this particular record. If bAnyWords was .F., then this value will always match the total number of terms passed in cSearchPhrase.
Frequency | Numeric | The total number of times any of the terms contained in cSearchPhrase were found in this particular record.

The calling program can further filter the source table by using a query of the form SELECT ... WHERE KeyField IN (SELECT IndexValue FROM ResultCursorName), where KeyField is the key field of the source table.

cTextFieldString

An optional comma-delimited string, containing all the fields that should be searched. Obviously, all fields in the list must have already been indexed via the CreateIndex method. If cTextFieldString is not passed then all indexed fields are searched.

bAnyWords

A boolean value, indicating whether the Search method will look for records containing all search terms (AND), or any search terms (OR). If this argument is not passed then the search will default to matching only records containing all search terms.

Return Value

Boolean: Indicates whether the search completed without errors (.T.) or failed (.F.). Use the Errors object to retrieve details about any error(s).

Example
oFullText = NEWOBJECT('FullText', 'fwFullText.fxp')
oFullText.Table = 'ForumMessages'
M.FTSearchResult = oFullText.Search(;
    '"constant change" arbitrat*', ;
    'FTSearch', ;
    'MessageText')
M.SearchPhrase = oFullText.LastCorrectedSearchPhrase
DO CASE
CASE NOT M.FTSearchResult
    M.ErrMsg = 'Search failed:' + CHR(13) + CHR(10)
    * Read each message in the Errors object
    FOR M.i = 1 TO oFullText.Errors.Count
        * Retrieve the current error
        oError = oFullText.Errors.Item(M.i)
        M.ErrMsg = M.ErrMsg + oError.Description + CHR(13) + CHR(10)
    NEXT
CASE oFullText.LastSearchTally = 0
    M.ErrMsg = 'No messages meet your search criteria'
OTHERWISE
    * Each continued line of the SELECT statement ends with a semicolon
    SELECT ForumMessages.* FROM ForumMessages ;
        JOIN FTSearch ON ForumMessages.MsgID = FTSearch.IndexValue ;
        ORDER BY FTSearch.TotWords, FTSearch.Frequency
    * ...Add code to process the result set
ENDCASE
RELEASE oFullText
CLEAR CLASS fulltext

This will search for records in which the MessageText field contains both the phrase "constant change" and at least one word starting with "arbitrat". Results will be inserted in a cursor called 'FTSearch'.
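
As an alternative to the JOIN used above, the result cursor can be used in an IN subquery. This is a sketch that assumes the FTSearch cursor created by the previous example is still open. The output cursor name Matches is arbitrary.

```foxpro
* Assumes the FTSearch cursor created by the Search call above
SELECT * FROM ForumMessages ;
    WHERE MsgID IN (SELECT IndexValue FROM FTSearch) ;
    INTO CURSOR Matches
```

This form returns only the fields of the source table, so it suits cases where the TotWords and Frequency columns are not needed for sorting.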


UpdateIndex Method

Description

Updates a previously-created index. It is more convenient than CreateIndex, because you don't have to pass the key field and fields to be indexed again. It uses the values originally passed to the CreateIndex method.

Syntax

bSuccess = oFullText.UpdateIndex()

Return Value

Boolean: Whether the call to the current method succeeded.

Comments

A text message, providing statistics and final results of the indexing process is inserted in the Errors object.

Example
oFullText = NEWOBJECT('FullText', 'fwFullText.fxp')
oFullText.Table = 'ForumMessages'
RetValue = oFullText.UpdateIndex()
RELEASE oFullText
CLEAR CLASS fulltext

UpdateRecordIndex Method

Description

Updates a previously-created full-text index for a particular record. It can be used to keep the full-text index up to date as individual records are added, edited and deleted, without the need to do a full re-index of the whole table.

Syntax

bSuccess = oFullText.UpdateRecordIndex(vIndexValue[, bDeleteRecord])

Parameters
vIndexValue

The value of the key field of the record to be re-indexed. This can be an existing record that was changed, or a new record just added to the table. The data type of this parameter will match the data type of the key field in the table being indexed.

bDeleteRecord

This optional parameter should be set to .T. if the record specified by vIndexValue was deleted from the table.

Return Value

Boolean: Whether the call to the current method succeeded.

Example
UPDATE ForumMessages SET content = M.NewContent WHERE MsgID = M.MsgID
oFullText = NEWOBJECT('FullText', 'fwFullText.fxp')
oFullText.Table = 'ForumMessages'
RetValue = oFullText.UpdateRecordIndex(M.MsgID, .F.)
RELEASE oFullText
CLEAR CLASS fulltext
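
For a deleted record, the same call is made with bDeleteRecord set to .T.. The sketch below mirrors the example above and assumes that M.MsgID holds the key of the record being removed.

```foxpro
DELETE FROM ForumMessages WHERE MsgID = M.MsgID
oFullText = NEWOBJECT('FullText', 'fwFullText.fxp')
oFullText.Table = 'ForumMessages'
* .T. tells the engine that the record was deleted from the table
RetValue = oFullText.UpdateRecordIndex(M.MsgID, .T.)
RELEASE oFullText
CLEAR CLASS fulltext
```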

Version Method

Description

Returns version information for the object.

Syntax

nVersion = oFullText.Version([nType])

Parameters
nType

Specifies the type of information requested. If nType is 0 or omitted, then the method returns the class version. If nType is 1, then the method returns the version of VFP used to compile it.

Return Value

Numeric: The requested version information.
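
A short usage sketch that displays both values:

```foxpro
oFullText = NEWOBJECT('FullText', 'fwFullText.fxp')
* TRANSFORM() converts the numeric return values for display
? 'Class version: ' + TRANSFORM(oFullText.Version())
? 'Compiled with VFP version: ' + TRANSFORM(oFullText.Version(1))
RELEASE oFullText
CLEAR CLASS fulltext
```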


StatusMessage Event

Description

This event gets called during full-text index creation and can be overridden in a sub-class to support application-specific progress reporting.

Parameters
nCurrentRecord

The record currently being indexed.

nTotalRecords

The total number of records in the table being indexed. You can use the expression nCurrentRecord / nTotalRecords * 100 to determine the percentage of the records that have already been indexed.

nStartSeconds

The value returned by the SECONDS() function at the start of the indexing process. You can use the expression SECONDS() - nStartSeconds to determine how long the indexing process has been in progress.
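
The following is a minimal sketch of a sub-class that overrides the event to show progress in a WAIT window. It is not the code found in FTWrapper.prg. The class name ProgressFullText is hypothetical, and the sketch assumes that fwFullText.fxp can be loaded with SET PROCEDURE so that the FullText class is available as a parent class.

```foxpro
* Assumption: fwFullText.fxp is in the current folder or on the search path
SET PROCEDURE TO fwFullText.fxp ADDITIVE
oFullText = CREATEOBJECT('ProgressFullText')
oFullText.Table = 'ForumMessages'
RetValue = oFullText.CreateIndex('MsgId', 'content:html,subject,user', ;
    .T., FILETOSTR('FullTextNoise.txt'), 2)
* Remove the last progress message
WAIT CLEAR
RELEASE oFullText

* Hypothetical sub-class used only for this illustration
DEFINE CLASS ProgressFullText AS FullText
    PROCEDURE StatusMessage(nCurrentRecord, nTotalRecords, nStartSeconds)
        LOCAL lnPercent, lnElapsed
        * Guard against division by zero for an empty table
        lnPercent = IIF(nTotalRecords > 0, nCurrentRecord / nTotalRecords * 100, 0)
        lnElapsed = SECONDS() - nStartSeconds
        WAIT WINDOW NOWAIT 'Indexing: ' + TRANSFORM(INT(lnPercent)) + '% (' + ;
            TRANSFORM(INT(lnElapsed)) + ' sec)'
    ENDPROC
ENDDEFINE
```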

Notes

If this event is not sub-classed, then it will do one of the following:


© Aegis Group