We're trying to load a text file into a SQL Server database and ran into a bit of a challenge. The text file contains answers to questions on a questionnaire and each line represents a single response, so it'd look something like this:
The trouble we face is that, in some cases, the questions may be "check all that apply", as opposed to just "check one answer". Those types of questions can have multiple answers and the file we receive notes them like this:
In this case, the user selected both answers 4 and 5 for question 456. In the end, we'd like to have this represented by two rows in the database, one with answer #4 and one with answer #5.
Anyone have any slick ideas for how we might make that happen using SQL?
Normalize the data so that the answers and the questions are in separate tables. For example, if a question has one answer, that would be 2 rows: one in the question table and one in the answer table. If a question has 2 answers, that would be 3 rows: one in the question table and two in the answer table.
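A minimal sketch of that normalized layout, assuming SQL Server syntax; the table and column names here are hypothetical, not taken from the original poster's schema:

```sql
-- Hypothetical normalized schema: one row per question,
-- one row per individual answer to that question.
CREATE TABLE Question (
    QuestionId     INT PRIMARY KEY,
    ResponseId     INT NOT NULL,   -- which questionnaire response this belongs to
    QuestionNumber INT NOT NULL
);

CREATE TABLE Answer (
    AnswerId    INT IDENTITY PRIMARY KEY,
    QuestionId  INT NOT NULL REFERENCES Question (QuestionId),
    AnswerValue INT NOT NULL       -- e.g. 4 or 5 for a "check all that apply" question
);
```

With this layout, a "check all that apply" response simply becomes multiple Answer rows pointing at the same Question row.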
Scott Selikoff wrote:Normalize the data so that the answers and the questions are in separate tables. For example, if a question has one answer, that would be 2 rows: one in the question table and one in the answer table. If a question has 2 answers, that would be 3 rows: one in the question table and two in the answer table.
That's exactly what we're shooting for. We have a question table and we have an answer table, set up just as you've described.
The trouble is that we're trying to load a text file in which two answers are represented by a single row into that table structure. We're using SSIS to load the data, so we're essentially working in SQL. We've considered using a cursor to step over the rows in the file and process each one, but the amount of data we're working with is so large that a cursor is out of the question - it's just too slow.
What we have now is a "load table" that looks like this:
...and we want to translate that into this structure:
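The original table layouts didn't survive here, but the shape of the problem can be sketched like this, assuming the multi-answer field arrives as a comma-separated string (the names LoadTable and AnswerList are hypothetical):

```sql
-- Hypothetical staging table populated directly from the text file.
-- A "check all that apply" question lands as a single row whose
-- AnswerList column holds something like '4,5'.
CREATE TABLE LoadTable (
    ResponseId INT,
    QuestionId INT,
    AnswerList VARCHAR(100)   -- '4' for single answers, '4,5' for multiple
);

-- Desired outcome in the normalized answer table:
--   (456, '4,5')  becomes  (456, 4)  and  (456, 5)
```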
So this issue isn't really about normalization, it's about performance tuning.
You can build a stored procedure to insert the data from a single string, although I find stored procedures too unruly and frustrating to maintain. You can also drop all your indexes and foreign key constraints, perform a bulk load of the data, and then reinstate the constraints; for large data sets this is often a big win. Otherwise, there's not much you can do. The data has to get into the database somehow.
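One set-based alternative to a cursor is to split the multi-answer column during the insert. This is a sketch, not the poster's actual code: it assumes SQL Server 2016 or later (for the built-in STRING_SPLIT function) and hypothetical table names LoadTable(QuestionId, AnswerList) and Answer(QuestionId, AnswerValue), where AnswerList is a comma-separated string such as '4,5':

```sql
-- Fan out each comma-separated answer list into one row per answer,
-- in a single set-based statement (no cursor required).
INSERT INTO Answer (QuestionId, AnswerValue)
SELECT lt.QuestionId,
       CAST(s.value AS INT)
FROM   LoadTable lt
CROSS APPLY STRING_SPLIT(lt.AnswerList, ',') AS s;
```

On versions of SQL Server before 2016, the same effect can be had by joining the load table to a numbers (tally) table and carving out each answer with CHARINDEX and SUBSTRING; the query is uglier but still set-based, so it avoids the row-by-row cost of a cursor.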