A Varchar(1000) field vs 1000 Bit fields 
Author Message
 A Varchar(1000) field vs 1000 Bit fields

Recently we are working on a project that involves storing info about the
user's interests. The user is allowed to choose multiple interests and we
want to be able to query out any users to a specific interest.

Right now there seems to be two ways. Let's assume there are 1000 different
insterests. One way is to have a varchar(1000) or char(1000) field with a
"10101010..."-like string to mark the user interests and we can use
CHARINDEX function for queries. The other way is to use 1000 bit fields. I
would say the latter solution would be faster in executing the queries.

My questions are:
Which scheme do you think is better? Is there any other and more efficient
solutions to achieve this?

Looking forward to your inputs.
TH



Mon, 21 Nov 2005 16:30:37 GMT
 A Varchar(1000) field vs 1000 Bit fields

Queries against a CHAR(1000) will be more expensive, e.g.

    SELECT LikesWalksOnTheBeach FROM interestsTable WHERE userID=1

becomes a string nightmare,

    SELECT LikesOnTheBeach = CONVERT(BIT, SUBSTRING(bigColumn, 445, 1))
     FROM interestsTable WHERE userID=1

Also updating is going to be an even bigger nightmare, as you can imagine...

I would opt for the BIT fields.  As gruesome as the CREATE TABLE statement
is, it's going to be much easier to work with the data... and you're going
to have to deal with all 1000 values individually in some portion of your
application anyway, so whether you do that in T-SQL or application code
doesn't matter too much...

Also, you *can* (but won't always) get 8 bits into 1 byte so, depending on
the number of rows, you might see some space savings as well.


Quote:
> Recently we are working on a project that involves storing info about the
> user's interests. The user is allowed to choose multiple interests and we
> want to be able to query out any users to a specific interest.

> Right now there seems to be two ways. Let's assume there are 1000
different
> insterests. One way is to have a varchar(1000) or char(1000) field with a
> "10101010..."-like string to mark the user interests and we can use
> CHARINDEX function for queries. The other way is to use 1000 bit fields. I
> would say the latter solution would be faster in executing the queries.

> My questions are:
> Which scheme do you think is better? Is there any other and more efficient
> solutions to achieve this?

> Looking forward to your inputs.
> TH



Mon, 21 Nov 2005 16:37:50 GMT
 A Varchar(1000) field vs 1000 Bit fields

Of course, normalizing it into First Normal Form would make things easier still ...

--
Tom

---------------------------------------------------------------
Thomas A. Moreau, BSc, PhD, MCSE, MCDBA
SQL Server MVP
Columnist, SQL Server Professional

www.pinnaclepublishing.com/sql


Queries against a CHAR(1000) will be more expensive, e.g.

    SELECT LikesWalksOnTheBeach FROM interestsTable WHERE userID=1

becomes a string nightmare,

    SELECT LikesOnTheBeach = CONVERT(BIT, SUBSTRING(bigColumn, 445, 1))
     FROM interestsTable WHERE userID=1

Also updating is going to be an even bigger nightmare, as you can imagine...

I would opt for the BIT fields.  As gruesome as the CREATE TABLE statement
is, it's going to be much easier to work with the data... and you're going
to have to deal with all 1000 values individually in some portion of your
application anyway, so whether you do that in T-SQL or application code
doesn't matter too much...

Also, you *can* (but won't always) get 8 bits into 1 byte so, depending on
the number of rows, you might see some space savings as well.


Quote:
> Recently we are working on a project that involves storing info about the
> user's interests. The user is allowed to choose multiple interests and we
> want to be able to query out any users to a specific interest.

> Right now there seems to be two ways. Let's assume there are 1000
different
> insterests. One way is to have a varchar(1000) or char(1000) field with a
> "10101010..."-like string to mark the user interests and we can use
> CHARINDEX function for queries. The other way is to use 1000 bit fields. I
> would say the latter solution would be faster in executing the queries.

> My questions are:
> Which scheme do you think is better? Is there any other and more efficient
> solutions to achieve this?

> Looking forward to your inputs.
> TH



Mon, 21 Nov 2005 16:41:24 GMT
 A Varchar(1000) field vs 1000 Bit fields
Both of your schemes make queries very complex across all those columns and
and you will have problems when you want to add new interests. Here's an
alternative:

CREATE TABLE Interests (interestid INTEGER PRIMARY KEY, interestdescription
VARCHAR(35) UNIQUE NOT NULL)

CREATE TABLE Users (userid INTEGER PRIMARY KEY, username VARCHAR(35) NOT
NULL)

CREATE TABLE UserInterests (userid INTEGER NOT NULL REFERENCES Users
(userid), interestid INTEGER NOT NULL REFERENCES interests (interestid))

--
David Portas
------------
Please reply only to the newsgroup
--


Quote:
> Recently we are working on a project that involves storing info about the
> user's interests. The user is allowed to choose multiple interests and we
> want to be able to query out any users to a specific interest.

> Right now there seems to be two ways. Let's assume there are 1000
different
> insterests. One way is to have a varchar(1000) or char(1000) field with a
> "10101010..."-like string to mark the user interests and we can use
> CHARINDEX function for queries. The other way is to use 1000 bit fields. I
> would say the latter solution would be faster in executing the queries.

> My questions are:
> Which scheme do you think is better? Is there any other and more efficient
> solutions to achieve this?

> Looking forward to your inputs.
> TH



Mon, 21 Nov 2005 16:45:43 GMT
 A Varchar(1000) field vs 1000 Bit fields
Oops! Missed the PK off that last table:

CREATE TABLE UserInterests (userid INTEGER NOT NULL REFERENCES Users
(userid), interestid INTEGER NOT NULL REFERENCES interests (interestid),
PRIMARY KEY (userid, interestid))

--
David Portas
------------
Please reply only to the newsgroup
--



Mon, 21 Nov 2005 16:48:38 GMT
 A Varchar(1000) field vs 1000 Bit fields
Quote:
>> Of course, normalizing it into First Normal Form would make things easier

still ...

True enough...



Mon, 21 Nov 2005 16:46:45 GMT
 A Varchar(1000) field vs 1000 Bit fields

Quote:
> I would opt for the BIT fields.

Does it mean I have 1000(!!!) columns on a table?

regards

fabrizio



Mon, 21 Nov 2005 16:53:51 GMT
 A Varchar(1000) field vs 1000 Bit fields
Questions:

1. What will you do when you have an additional interest in the future?
Alter the table?
2. How do you know which position in the string represent a specific
interest? Is it represented in another table or documented in your data
dictionary?
3. What kind of queries do you run against this table? What are your data
requirements? How does your client applications/report writers use this
data?
4. How often do you update the response to a particular interest? Do you
delete a response?
5. If you are using 1/0 how do you distingush a negative response (say 0)
from a lack of response?

Assuming answers for the above questions, I would recommend the following
schema,

CREATE TABLE Interests (
     InterestId INT NOT NULL PRIMARY KEY,
     Descr VARCHAR(50) NOT NULL);
CREATE TABLE Users (
     UserId INT NOT NULL PRIMARY KEY,
     ....);
CREATE TABLE Responses (
     UserId INT NOT NULL
          REFERENCES Users(UserId)
          ON UPDATE CASCADE
          ON DELETE CASCADE,
     Interest_id INT NOT NULL
          REFERENCES Interests(Interest_id)
          ON UPDATE CASCADE
          ON DELETE CASCADE,
     Response CHAR(1) NOT NULL
          DEFAULT ('U')
     CHECK(Response IN ('Y', 'N', 'U')),
 PRIMARY KEY (UserId, InterestId));

Guessing your data requirements & business model, this normalized schema may
provide maximum flexibility for your queries.

--
- Anith
( Please reply to newsgroups only )



Mon, 21 Nov 2005 16:59:27 GMT
 A Varchar(1000) field vs 1000 Bit fields
Thank you all for your invaluable inputs and comments.
I think normalization is what I should do.

Thanks,
TH



Mon, 21 Nov 2005 20:19:21 GMT
 A Varchar(1000) field vs 1000 Bit fields

Good decision.  You'll use less disk space, it'll run faster and your code will be easier to maintain.  If you need more help, we're here.

--
Tom

---------------------------------------------------------------
Thomas A. Moreau, BSc, PhD, MCSE, MCDBA
SQL Server MVP
Columnist, SQL Server Professional

www.pinnaclepublishing.com/sql

Thank you all for your invaluable inputs and comments.
I think normalization is what I should do.

Thanks,
TH



Mon, 21 Nov 2005 20:27:46 GMT
 
 [ 10 post ] 

 Relevant Pages 

1. A2K crashes with text fields over 1000 chars

2. Selfcreated Table with AutoInc-Field starting at 1000?

3. Table with ftAutoInc-Field starting at 1000?

4. Help with relationships Too slow to link databases with 1000+ fields

5. can not fetch more than 255 chars from varchar(1000) column

6. varchar(1000) being truncated

7. Varchar(1000) returns 255 char only in SQL 7

8. Problem inserting more then 1000 chars in varchar(4000) column

9. Varchar (4096) only accepting 1000 characters

10. varchar 8000 maxing out at 1000 char (SQL 2000)

11. can only fetch first 255 chars from varchar(1000) column

12. subset_filterclause in sp_addmergearticle allows 2000 chars vs sysmergearticles allows 1000 chars


 
Powered by phpBB® Forum Software