FullText Searches in SQL Server – I
Posted by decipherinfosys on December 14, 2008
One of the readers had asked in a recent request to write up a series on FullText Searches in SQL Server. Not many people use this feature and we recently had a need to use it in a project. In a series of posts, we will cover this feature in SQL Server 2005 and will also keep SQL Server 2008 in mind since we had implemented it in that version for a client of ours. This feature has been available in SQL Server since version 7.0. In SQL Server 2008, it is called iFTS (Integrated Full Text Search) and we can use this feature to do searches against data stored in SQL Server (documents, XML data or text/binary data columns).
There are different aspects of iFTS that one needs to know and we will break it up into these parts:
Part 1: How to create full text catalogs and indexes.
Part 2: How to use CONTAINS, CONTAINSTABLE, FREETEXT and FREETEXTTABLE to do these searches via T-SQL.
Part 3: Some complex search expressions, proximity searches and weighted searches.
Part 4: Using Thesaurus based word expansion and substitution.
Part 5: Administration of iFTS
In this post, we will look at how we can go about creating the catalogs and the indexes. Before we start, make sure that the SQL Server Full Text Service (msftesql) is running.
Let’s open up SSMS and for a given user database, go under Storage and then “Full Text Catalogs”. Right click on it and select “New Full Text Catalog” and you will see the dialog box like the one below in which you can enter information to create a new full text catalog:
The rest of the fields are pretty much self explanatory. The Accent Sensitivity field is to differentiate between characters with or without the accent marks. If you are more of a command line person, the actual command that gets fired when we create a catalog is:
create fulltext catalog TestCatalog
on filegroup [primary]
in path ‘c:\ftc\’
with accent_sensitivity = ON
Once the catalog has been created (you will see a folder created under that path: C:\FTC\), the next step is to create the full text index which can be done from SSMS by choosing the Full Text Index. One thing that confuses folks when playing with full text indexes in the beginning is the distinction between regular indexes and full text indexes. SQL Server uses b-tree structures for clustered or non-clustered indexes but a full text index is stored as inverted index structure. It uses the unique index that we will choose when we go through the wizard and thus relates the full text index entries to the table entries.
Open up the user database, navigate to the table that you want to create the Full Text Index on, right click on it and select the options as shown in the image below:
This will bring up the wizard- after the very first introductory screen, on the second one you will see that it will ask you to select a unique index for that table – this is what we had talked about above:
In the next screen, you have to select the table columns that you want to index – you can choose any column of the string data types (char, varchar, varchar(max), text and their unicode equivalents) or the binary data types (binary, varbinary, varbinary(max)) or xml data type columns. And then also select the language:
The next step in the wizard asks you about how you want to track changes to the index – if you want the data to be automatically updated, let it be at the default option – if however, your usage is only for reports which are going to be run only at specific time ranges of a day, then you might want to choose a manual run instead:
In the next step, we will now get the chance to choose the catalog – we will choose the TestCatalog that we created above – though it also gives you an option right here to create a new catalog in case you want to:
If you have chosen to do manual updates, in the next screen, you can then chose to define the population schedules – for our example, since we chose the automatic mode, there is no need for us to define a population schedule:
And that is it. You will need to press on Finish and your index will be created – you will see a summary screen and can also see the report once the index has been created after you clicked on Finish. All this can obviously also be done through T-SQL using the “CREATE FULLTEXT INDEX” command. Look up BOL for the syntax and the examples. After the index has been created, you will see new entries under the catalog folder (C:\FTC\TestCatalog\MssearchCatalogDir).
Before we close this post, it is also important to explain the difference in Full Text Search (FTS) in SQL Server 2005 and iFTS (Integrated Full Text Search) in SQL Server 2008. As we mentioned above, in the case of SQL Server 2005, FTS is implemented via a full text engine service (msftesql.exe). This service provided the majority of the FTS functionality including indexing, processing and filter daemon management. In the case of SQL Server 2008, there is no msftesql.exe – instead this functionality has moved into SQLSERVR.EXE and hence the word “i” in iFTS for “integrated”. It does include a filter daemon host process FDHOST.exe. While SQLSERVR.exe provides the iFTS components of indexing, querying, query processor and daemon process manager, the fdhost.exe provides filters and word breakers components.
In the next post, we will take a look at how we can use contains, containstable, freetext and freetexttable in T-SQL to do queries on the data.