Reading Text Files and csv or xls files using SQL code
Posted by decipherinfosys on February 4, 2009
A reader recently asked that occasionally she gets some text files or csv/xls files from their sales team that she needs to load up and query upon in order to provide some analysis back to the teams. She used one of the SSIS packages that we had presented on our blog before in order to dynamically load up any text file or csv/xls file – and used to dump that data into separate tables in a separate schema. She wanted to get away from that process and just be able to query those files directly since after the analysis is done, she had to drop those tables.
An alternative is to use either linked servers or just use the openrowset or opendatasource queries in the case of SQL Server. In the case of Oracle, you can use external tables or using heterogenous database links.
Let’s take up an example to illustrate this – say we have a xls file called SalesData.xls which resides in the C:\Blog folder on the server where we are going to write this query. Note – the file needs to reside on that server and not on your desktop/laptop. Let’s write up queries using openrowset and opendatasource to query the file.
FROM OPENROWSET(‘Microsoft.Jet.OLEDB.4.0’, ‘Excel 8.0;Database=C:\Blog\SalesData.xls’, [SalesData_tab$])
FROM OPENDATASOURCE (‘Microsoft.Jet.OLEDB.4.0’, ‘Data Source=C:\Blog\SalesData.xls;Extended Properties=Excel 8.0’)…[SalesData_tab$]
And there are BULK load options also available using which you can specify the error file, the rows per batch, code page etc.. A format file can be specified as well in the case of flat file imports.
Please do note that usage of openrowset as well as opendatasource requires three considerations:
a) DisallowAdhocAccess registry option should be set to 0 for specific OLEDB provider entry.
b) “Ad Hoc Distributed Queries” advance option should be enabled using sp_configure system stored procedure for SQL Server instance.
c) Authentication Delegation must be configured.
You can read up more on Openrowset and Opendatsource from the online BOL version. These are pretty powerful features and gives you an alternative to loading and storing temporary data in the database. And since you can directly write select queries against such files, you can then join the output from those files to the data that resides in tables in your database and do your analysis without the need to create data structures in the database to store this temporary data.