Puzzler: Distinct counts on aggregate data using cube 
Author Message
 Puzzler: Distinct counts on aggregate data using cube

Hi all,

I have an odd need.  I have constructed a simple sample of what I am trying
to do.  It turns out that my sample is a bit stupid but here goes.

First, run creation sql at the bottom of this message to create the tables
and data that you need for my example.

Now, look at the results of the following.  Notice Goerge bought 2 jackets,
Bob 2 pants and 2 socks:

SELECT CustomerName, ItemName, COUNT(*)
FROM Purchase
JOIN Item ON Purchase.Item = Item.ItemId
JOIN Customer ON Purchase.Customer = Customer.CustomerId
GROUP BY CustomerName, ItemName

CustomerName ItemName
------------ ---------- -----------
Dan          gloves     1
Dan          hat        1
Goerge       hat        1
Goerge       jacket     2
Bob          pants      2
Dan          pants      1
Bob          shirt      1
Bob          socks      2

This query shows the total number of each item that was sold:

SELECT CASE WHEN GROUPING(ItemName) = 1 THEN 'All Items' ELSE ItemName END
AS ItemName, COUNT(*) AS NumberSold
FROM Purchase
JOIN Item ON Purchase.Item = Item.ItemId
GROUP BY ItemName WITH CUBE

ItemName   NumberSold
---------- -----------
gloves     1
hat        2
jacket     2
pants      3
shirt      1
socks      2
All Items  11

We can see how many distinct cutomers purchased each item with the
following:

SELECT ItemName, COUNT(DISTINCT Customer) AS NumberSold
FROM Purchase
JOIN Item ON Purchase.Item = Item.ItemId
GROUP BY ItemName

ItemName   NumberSold
---------- -----------
gloves     1
hat        2
jacket     1
pants      2
shirt      1
socks      1

What I need to do is find out both with one query.  I would like to be able
to do something like the following:

SELECT ItemName, CustomerName, COUNT(*) AS NumberSold, COUNT(DISTINCT
CustomerId)
FROM Purchase
JOIN Item ON Purchase.Item = Item.ItemId
JOIN Customer ON Purchase.Customer = Customer.CustomerId
GROUP BY ItemName, CustomerName WITH CUBE

This is not possible.  Taking into account that I need a lightning fast
query as I am dealing with a lot of data.  Also, my group by's contain a
much larger number of columns so even if a huge number of unions would
perform, they would be very inconvinient to build.

Thanks in advance for any help you can provide,
Matthew Field

****************************************************************
THE FOLLOWING IS THE SQL NEED TO RUN THE EXAMPLE QUERIES
****************************************************************

if exists (select * from sysobjects where id =
object_id(N'[dbo].[Customer]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
drop table [dbo].[Customer]
GO

if exists (select * from sysobjects where id = object_id(N'[dbo].[Item]')
and OBJECTPROPERTY(id, N'IsUserTable') = 1)
drop table [dbo].[Item]
GO

if exists (select * from sysobjects where id =
object_id(N'[dbo].[Purchase]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
drop table [dbo].[Purchase]
GO

CREATE TABLE [dbo].[Customer] (
 [CustomerId] [int] NOT NULL ,
 [CustomerName] [char] (10) NOT NULL
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[Item] (
 [ItemId] [int] NOT NULL ,
 [ItemName] [char] (10) NOT NULL
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[Purchase] (
 [Item] [int] NOT NULL ,
 [Customer] [int] NOT NULL
) ON [PRIMARY]
GO

INSERT INTO Item VALUES (1, 'socks     ')
INSERT INTO Item VALUES (2, 'shirt     ')
INSERT INTO Item VALUES (3, 'pants     ')
INSERT INTO Item VALUES (4, 'jacket    ')
INSERT INTO Item VALUES (5, 'hat       ')
INSERT INTO Item VALUES (6, 'gloves    ')

INSERT INTO Customer VALUES (1, 'Bob       ')
INSERT INTO Customer VALUES (2, 'Dan       ')
INSERT INTO Customer VALUES (3, 'Frank     ')
INSERT INTO Customer VALUES (4, 'Goerge    ')
INSERT INTO Customer VALUES (5, 'Biff      ')

INSERT INTO Purchase VALUES (1, 1)
INSERT INTO Purchase VALUES (2, 1)
INSERT INTO Purchase VALUES (3, 1)
INSERT INTO Purchase VALUES (3, 2)
INSERT INTO Purchase VALUES (6, 2)
INSERT INTO Purchase VALUES (5, 2)
INSERT INTO Purchase VALUES (5, 4)
INSERT INTO Purchase VALUES (4, 4)
INSERT INTO Purchase VALUES (4, 4)
INSERT INTO Purchase VALUES (3, 1)
INSERT INTO Purchase VALUES (1, 1)



Sun, 23 Nov 2003 04:40:53 GMT
 Puzzler: Distinct counts on aggregate data using cube

Please give an example of the result set you would like.

David


Quote:
> Hi all,

> I have an odd need.  I have constructed a simple sample of what I am
trying
> to do.  It turns out that my sample is a bit stupid but here goes.

> First, run creation sql at the bottom of this message to create the tables
> and data that you need for my example.

> Now, look at the results of the following.  Notice Goerge bought 2
jackets,
> Bob 2 pants and 2 socks:

> SELECT CustomerName, ItemName, COUNT(*)
> FROM Purchase
> JOIN Item ON Purchase.Item = Item.ItemId
> JOIN Customer ON Purchase.Customer = Customer.CustomerId
> GROUP BY CustomerName, ItemName

> CustomerName ItemName
> ------------ ---------- -----------
> Dan          gloves     1
> Dan          hat        1
> Goerge       hat        1
> Goerge       jacket     2
> Bob          pants      2
> Dan          pants      1
> Bob          shirt      1
> Bob          socks      2

> This query shows the total number of each item that was sold:

> SELECT CASE WHEN GROUPING(ItemName) = 1 THEN 'All Items' ELSE ItemName END
> AS ItemName, COUNT(*) AS NumberSold
> FROM Purchase
> JOIN Item ON Purchase.Item = Item.ItemId
> GROUP BY ItemName WITH CUBE

> ItemName   NumberSold
> ---------- -----------
> gloves     1
> hat        2
> jacket     2
> pants      3
> shirt      1
> socks      2
> All Items  11

> We can see how many distinct cutomers purchased each item with the
> following:

> SELECT ItemName, COUNT(DISTINCT Customer) AS NumberSold
> FROM Purchase
> JOIN Item ON Purchase.Item = Item.ItemId
> GROUP BY ItemName

> ItemName   NumberSold
> ---------- -----------
> gloves     1
> hat        2
> jacket     1
> pants      2
> shirt      1
> socks      1

> What I need to do is find out both with one query.  I would like to be
able
> to do something like the following:

> SELECT ItemName, CustomerName, COUNT(*) AS NumberSold, COUNT(DISTINCT
> CustomerId)
> FROM Purchase
> JOIN Item ON Purchase.Item = Item.ItemId
> JOIN Customer ON Purchase.Customer = Customer.CustomerId
> GROUP BY ItemName, CustomerName WITH CUBE

> This is not possible.  Taking into account that I need a lightning fast
> query as I am dealing with a lot of data.  Also, my group by's contain a
> much larger number of columns so even if a huge number of unions would
> perform, they would be very inconvinient to build.

> Thanks in advance for any help you can provide,
> Matthew Field

> ****************************************************************
> THE FOLLOWING IS THE SQL NEED TO RUN THE EXAMPLE QUERIES
> ****************************************************************

> if exists (select * from sysobjects where id =
> object_id(N'[dbo].[Customer]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
> drop table [dbo].[Customer]
> GO

> if exists (select * from sysobjects where id = object_id(N'[dbo].[Item]')
> and OBJECTPROPERTY(id, N'IsUserTable') = 1)
> drop table [dbo].[Item]
> GO

> if exists (select * from sysobjects where id =
> object_id(N'[dbo].[Purchase]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
> drop table [dbo].[Purchase]
> GO

> CREATE TABLE [dbo].[Customer] (
>  [CustomerId] [int] NOT NULL ,
>  [CustomerName] [char] (10) NOT NULL
> ) ON [PRIMARY]
> GO

> CREATE TABLE [dbo].[Item] (
>  [ItemId] [int] NOT NULL ,
>  [ItemName] [char] (10) NOT NULL
> ) ON [PRIMARY]
> GO

> CREATE TABLE [dbo].[Purchase] (
>  [Item] [int] NOT NULL ,
>  [Customer] [int] NOT NULL
> ) ON [PRIMARY]
> GO

> INSERT INTO Item VALUES (1, 'socks     ')
> INSERT INTO Item VALUES (2, 'shirt     ')
> INSERT INTO Item VALUES (3, 'pants     ')
> INSERT INTO Item VALUES (4, 'jacket    ')
> INSERT INTO Item VALUES (5, 'hat       ')
> INSERT INTO Item VALUES (6, 'gloves    ')

> INSERT INTO Customer VALUES (1, 'Bob       ')
> INSERT INTO Customer VALUES (2, 'Dan       ')
> INSERT INTO Customer VALUES (3, 'Frank     ')
> INSERT INTO Customer VALUES (4, 'Goerge    ')
> INSERT INTO Customer VALUES (5, 'Biff      ')

> INSERT INTO Purchase VALUES (1, 1)
> INSERT INTO Purchase VALUES (2, 1)
> INSERT INTO Purchase VALUES (3, 1)
> INSERT INTO Purchase VALUES (3, 2)
> INSERT INTO Purchase VALUES (6, 2)
> INSERT INTO Purchase VALUES (5, 2)
> INSERT INTO Purchase VALUES (5, 4)
> INSERT INTO Purchase VALUES (4, 4)
> INSERT INTO Purchase VALUES (4, 4)
> INSERT INTO Purchase VALUES (3, 1)
> INSERT INTO Purchase VALUES (1, 1)



Sun, 23 Nov 2003 05:39:19 GMT
 Puzzler: Distinct counts on aggregate data using cube
Interestingly the RAC crosstab and pivoting utility
incorporates all your requirements:).


'Purchase join Item on Purchase.Item = Item.ItemId

'select ItemName,
cast(dbo.racrowfval(rowfunct,1) as varchar(3)) as [distinct cnt],
funct,_pvtcols_ from rac order by rd'

ItemName distinct cnt funct  Totals  Bob   Dan   Goerge
--------     ------------ ----- ------ ------ ------ ------
gloves       1                 cnt   1                   1
hat            2                 cnt   2                   1        1
jacket       1                 cnt   2                             2
pants        2                 cnt   3         2        1
shirt          1                 cnt   1         1
socks       1                 cnt   2         2
Totals      3                  cnt   11      5         3      3

stevie
Check out rac at:
www.angelfire.com/ny4/rac/
Inquire for new s2k beta.



Sun, 23 Nov 2003 06:46:53 GMT
 Puzzler: Distinct counts on aggregate data using cube
The result set that I would like would be this:

Item               Customer          NumberSold
----------     -------------         -----------
All Items      All Customers         8
All Items      Bob                       5
All Items      Dan                       3
All Items      Goerge                    3
gloves          All Customers             1
gloves          Dan                       1
hat               All Customers         2
hat               Dan                       1
hat               Goerge                    1
jacket           All Customers         1
jacket           Goerge                    2
pants            All Customers         2
pants            Bob                           2
pants            Dan                           1
shirt              All Customers             1
shirt              Bob                           1
socks            All Customers             1
socks             Bob                           2

Notice that the sock total is the number of unique users that purchased
socks.  This number is 1 even though the total number of socks purchased was
2 because Bob bought 2 pair.  This is the result set that I believe should
be produced if the following query could execute with COUNT(*) replaced with
COUNT(DISTINCT CustomerId):

SELECT CASE WHEN GROUPING(ItemName) = 1 THEN 'All Items' ELSE ItemName END
AS Item,
 CASE WHEN GROUPING(CustomerName) = 1 THEN 'All Customers' ELSE CustomerName
END AS Customer,
 COUNT(*) AS NumberSold
FROM Purchase
JOIN Item ON Purchase.Item = Item.ItemId
JOIN Customer ON Purchase.Customer = Customer.CustomerId
GROUP BY ItemName, CustomerName WITH CUBE
ORDER BY ItemName, CustomerName


Quote:
> Please give an example of the result set you would like.

> David


> > Hi all,

> > I have an odd need.  I have constructed a simple sample of what I am
> trying
> > to do.  It turns out that my sample is a bit stupid but here goes.

> > First, run creation sql at the bottom of this message to create the
tables
> > and data that you need for my example.

> > Now, look at the results of the following.  Notice Goerge bought 2
> jackets,
> > Bob 2 pants and 2 socks:

> > SELECT CustomerName, ItemName, COUNT(*)
> > FROM Purchase
> > JOIN Item ON Purchase.Item = Item.ItemId
> > JOIN Customer ON Purchase.Customer = Customer.CustomerId
> > GROUP BY CustomerName, ItemName

> > CustomerName ItemName
> > ------------ ---------- -----------
> > Dan          gloves     1
> > Dan          hat        1
> > Goerge       hat        1
> > Goerge       jacket     2
> > Bob          pants      2
> > Dan          pants      1
> > Bob          shirt      1
> > Bob          socks      2

> > This query shows the total number of each item that was sold:

> > SELECT CASE WHEN GROUPING(ItemName) = 1 THEN 'All Items' ELSE ItemName
END
> > AS ItemName, COUNT(*) AS NumberSold
> > FROM Purchase
> > JOIN Item ON Purchase.Item = Item.ItemId
> > GROUP BY ItemName WITH CUBE

> > ItemName   NumberSold
> > ---------- -----------
> > gloves     1
> > hat        2
> > jacket     2
> > pants      3
> > shirt      1
> > socks      2
> > All Items  11

> > We can see how many distinct cutomers purchased each item with the
> > following:

> > SELECT ItemName, COUNT(DISTINCT Customer) AS NumberSold
> > FROM Purchase
> > JOIN Item ON Purchase.Item = Item.ItemId
> > GROUP BY ItemName

> > ItemName   NumberSold
> > ---------- -----------
> > gloves     1
> > hat        2
> > jacket     1
> > pants      2
> > shirt      1
> > socks      1

> > What I need to do is find out both with one query.  I would like to be
> able
> > to do something like the following:

> > SELECT ItemName, CustomerName, COUNT(*) AS NumberSold, COUNT(DISTINCT
> > CustomerId)
> > FROM Purchase
> > JOIN Item ON Purchase.Item = Item.ItemId
> > JOIN Customer ON Purchase.Customer = Customer.CustomerId
> > GROUP BY ItemName, CustomerName WITH CUBE

> > This is not possible.  Taking into account that I need a lightning fast
> > query as I am dealing with a lot of data.  Also, my group by's contain a
> > much larger number of columns so even if a huge number of unions would
> > perform, they would be very inconvinient to build.

> > Thanks in advance for any help you can provide,
> > Matthew Field

> > ****************************************************************
> > THE FOLLOWING IS THE SQL NEED TO RUN THE EXAMPLE QUERIES
> > ****************************************************************

> > if exists (select * from sysobjects where id =
> > object_id(N'[dbo].[Customer]') and OBJECTPROPERTY(id, N'IsUserTable') =
1)
> > drop table [dbo].[Customer]
> > GO

> > if exists (select * from sysobjects where id =

object_id(N'[dbo].[Item]')

- Show quoted text -

Quote:
> > and OBJECTPROPERTY(id, N'IsUserTable') = 1)
> > drop table [dbo].[Item]
> > GO

> > if exists (select * from sysobjects where id =
> > object_id(N'[dbo].[Purchase]') and OBJECTPROPERTY(id, N'IsUserTable') =
1)
> > drop table [dbo].[Purchase]
> > GO

> > CREATE TABLE [dbo].[Customer] (
> >  [CustomerId] [int] NOT NULL ,
> >  [CustomerName] [char] (10) NOT NULL
> > ) ON [PRIMARY]
> > GO

> > CREATE TABLE [dbo].[Item] (
> >  [ItemId] [int] NOT NULL ,
> >  [ItemName] [char] (10) NOT NULL
> > ) ON [PRIMARY]
> > GO

> > CREATE TABLE [dbo].[Purchase] (
> >  [Item] [int] NOT NULL ,
> >  [Customer] [int] NOT NULL
> > ) ON [PRIMARY]
> > GO

> > INSERT INTO Item VALUES (1, 'socks     ')
> > INSERT INTO Item VALUES (2, 'shirt     ')
> > INSERT INTO Item VALUES (3, 'pants     ')
> > INSERT INTO Item VALUES (4, 'jacket    ')
> > INSERT INTO Item VALUES (5, 'hat       ')
> > INSERT INTO Item VALUES (6, 'gloves    ')

> > INSERT INTO Customer VALUES (1, 'Bob       ')
> > INSERT INTO Customer VALUES (2, 'Dan       ')
> > INSERT INTO Customer VALUES (3, 'Frank     ')
> > INSERT INTO Customer VALUES (4, 'Goerge    ')
> > INSERT INTO Customer VALUES (5, 'Biff      ')

> > INSERT INTO Purchase VALUES (1, 1)
> > INSERT INTO Purchase VALUES (2, 1)
> > INSERT INTO Purchase VALUES (3, 1)
> > INSERT INTO Purchase VALUES (3, 2)
> > INSERT INTO Purchase VALUES (6, 2)
> > INSERT INTO Purchase VALUES (5, 2)
> > INSERT INTO Purchase VALUES (5, 4)
> > INSERT INTO Purchase VALUES (4, 4)
> > INSERT INTO Purchase VALUES (4, 4)
> > INSERT INTO Purchase VALUES (3, 1)
> > INSERT INTO Purchase VALUES (1, 1)



Sun, 23 Nov 2003 21:47:51 GMT
 Puzzler: Distinct counts on aggregate data using cube
I have dowloaded your tool and the exec that you have listed below does not

parameters.  What am I doing wrong?

Matt


Quote:
> Interestingly the RAC crosstab and pivoting utility
> incorporates all your requirements:).





> 'Purchase join Item on Purchase.Item = Item.ItemId
> join Customer on Purchase.Customer = Customer.CustomerId',


> 'select ItemName,
> cast(dbo.racrowfval(rowfunct,1) as varchar(3)) as [distinct cnt],
> funct,_pvtcols_ from rac order by rd'

> ItemName distinct cnt funct  Totals  Bob   Dan   Goerge
> --------     ------------ ----- ------ ------ ------ ------
> gloves       1                 cnt   1                   1
> hat            2                 cnt   2                   1        1
> jacket       1                 cnt   2                             2
> pants        2                 cnt   3         2        1
> shirt          1                 cnt   1         1
> socks       1                 cnt   2         2
> Totals      3                  cnt   11      5         3      3

> stevie
> Check out rac at:
> www.angelfire.com/ny4/rac/
> Inquire for new s2k beta.



Sun, 23 Nov 2003 22:40:22 GMT
 Puzzler: Distinct counts on aggregate data using cube



Quote:
> I have dowloaded your tool and the exec that you have listed below does
not

> parameters.  What am I doing wrong?

This is based on the new s2k beta.I will e-mail the script to you.

stevie



Sun, 23 Nov 2003 23:35:30 GMT
 
 [ 6 post ] 

 Relevant Pages 

1. distinct count in Aggregate function

2. Distinct Count Aggregate Function?

3. Problem with Distinct Count Aggregate function

4. CUBE cannot compute distinct aggregates.

5. distinct count in Aggregate function?

6. Distinct Count - Aggregate Function on Measures

7. using SELECT COUNT(*) to count distinct records

8. DISTINCT COUNT in Cube Editor on Dim columns

9. DISTINCT COUNT - virtual cubes

10. Process the Cube - Distinct Count

11. Counting Distinct in OLAP Cube problem?

12. only 1 distinct count measure per cube?


 
Powered by phpBB® Forum Software