Systems Engineering and RDBMS

FullText Searches in SQL Server – II

Posted by decipherinfosys on December 15, 2008

Yesterday, we had covered how to create full text catalogs and indexes and also the difference in the Full Text Search architecture between SQL Server 2005 and SQL Server 2008. Today, we will cover how we can query the full text index. There are four ways to query a full text index in T-SQL:

a) CONTAINS and FREETEXT predicates – these help to retrieve records from a table that match a specific search criteria and works kind of like the EXISTS clause.
b) CONTAINSTABLE and FREETEXTTABLE functions – these not only return the resultsets but in addition return a KEY column which is for the unique index that you selected when creating the index and a RANK column which gives us the relevance rating.

We will use the same table that we had used before in yesterday’s post – OPEN_TRAN_TEST. In it, we had created the full text index on COL2 which has this data set:

COL1        COL2
----------- ----------
1           TEST
3           TEST2
6           T2 test
7           a TEST2

And a simple query like:

select *
from dbo.open_tran_test
where freetext (*, ‘T2’)

will give:

COL1        COL2
----------- ----------
6           T2 test

while, this query:

select *
from dbo.open_tran_test
where freetext (*, ‘test2’)

will give:

COL1        COL2
----------- ----------
3           TEST2
7           a TEST2

The “*” denotes the wild card search which means that the query is supposed to search in all the full text index columns and the string “T2” or “TEST2” indicates the string to search for in a word match. So, essentially what FREETEXT does is that it uses a word breaker to break a string into individual words and in the above example, it has done an exact match for those words. We can also use it to do approximate matches of words either by:

i) Stemming words to locate the inflectional form of words, or
ii) Performing word expansions and replacements based on thesaurus files.

So, what do these 2 points mentioned above mean? It means that given a word that we are searching for, the FREETEXT predicate and the FREETEXTTABLE function can automatically stem words to find inflectional forms – so, in our example of “test” word, it can perform automatic stemming and find inflectional forms of “test” to include: testing, testable, tests etc…here is an example – let’s create some more records:

insert into open_tran_test (col2) values (‘tested 1’);
insert into open_tran_test (col2) values (‘1 testing’);
insert into open_tran_test (col2) values (‘more tests’);
insert into open_tran_test (col2) values (‘1 testable’);

And now the search:

select *
from dbo.open_tran_test
where freetext (*, ‘test’)

yields:

COL1        COL2
----------- ----------
1           TEST
6           T2 test
8           tested 1
10          more tests
13          1 testing

We can make the search more specific using the Language and the specific column:

select *
from dbo.open_tran_test
where freetext (col2, ‘test’, Language 1033);

Where Language 1033 indicates LCID 1033 which stands for US English. We will cover the Thesaurus files in another post. The FREETEXTTABLE function allows us to specify the table name, the column names to search for, the text string to search for, the language and the top_n_by_rank value to specify the top n rank columns. Here is an example of such a query:

select *
from freetexttable (dbo.open_tran_test, col2, ‘test’, Language 1033, 10) as x
inner join dbo.open_tran_test as ott
on x.[key] = ott.col1

And the output will contain the KEY column and the RANK column along with the 2 columns from the open_tran_test table:

KEY         RANK        COL1        COL2
----------- ----------- ----------- ----------
6           0           6           T2 test
8           0           8           tested 1
10          0           10          more tests
13          0           13          1 testing
1           0           1           TEST

Let’s now look at the CONTAINS predicate and the CONTAINSTABLE function as well. CONTAINS allows to do more complex searches. CONTAINS allows us to search for a word or phrase prefix as well as a word in the proximity of another word and also inflectional searches. So, the same query from above that was using freetext predicate now becomes:

select *
from dbo.open_tran_test
where contains(col2, ‘test’, language 1033)

However, this will return only direct matches:

COL1        COL2
----------- ----------
1           TEST
6           T2 test

which is a different behavior than freetext. If we want the inflectional forms or thesaurus synonyms, we then have to use FORMSOF generation term in our searches:

select *
from dbo.open_tran_test
where contains(col2, ‘FORMSOF(INFLECTIONAL, test)’, language 1033)

And now we will get exactly the same result as we did with freetext:

COL1        COL2
----------- ----------
1           TEST
6           T2 test
8           tested 1
10          more tests
13          1 testing

So, where is the flexibility? This seems to be less flexible than freetext. Well, there are quite a few things that you can do with CONTAINS that you cannot do with freetext:

a) You can specify multiple search terms using boolean operators like AND, OR, and AND NOT. Example:

select *
from dbo.open_tran_test
where contains(col2, ‘FORMSOF(INFLECTIONAL, test) OR test2’, language 1033)

b) In addition, CONTAINS also supports prefix searches using a wildcard. Example:

select *
from dbo.open_tran_test
where contains (col2, ‘”test*”‘)

This will return all the rows that have words in them that start with “test”.

c) The third advantage of CONTAINS predicate over FREETEXT is that it allows us to do proximity searches using the NEAR keyword. The NEAR keyword returns matches for words that are close to one another in the searched column. And by close, the words have to be in the proximity of each other within 50 words. Example:

select *
from dbo.open_tran_test
where contains (col2, ‘T2 NEAR tests’)

d) CONTAINS also allows us to assign different weightings to different words:

select *
from dbo.open_tran_test
where contains (col2, ‘ISABOUT (test weight (0.7), T2 weight (0.3))’);

In all these examples, we have used static strings for the searches for illustration purposes. One can easily use variables instead to do these searches.

CONTAINSTABLE has the same search capabilities like the CONTAINS predicate with the additional advantage of getting the KEY and the RANK columns back.

2 Responses to “FullText Searches in SQL Server – II”

  1. sql wildcard,sql wildcards,sql rollback,rollback sql,sql copy table,sql sum,sql mirroring,sum sql,sql cluster,sql server performance,truncate in sql,backup sql,backup sql database,backup sql server,sql performance,date functions in sql,sql over,trunc…

    […]FullText Searches in SQL Server – II « Systems Engineering and RDBMS[…]…

  2. sql wildcard,sql wildcards,sql rollback,rollback sql,sql copy table,sql sum,sql mirroring,sum sql,sql cluster,sql server performance,truncate in sql,backup sql,backup sql database,backup sql server,sql performance,date functions in sql,sql over,trunc…

    […]FullText Searches in SQL Server – II « Systems Engineering and RDBMS[…]…

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
%d bloggers like this: