questions about basic data warehouse concept 
Author Message
 questions about basic data warehouse concept

Dear all,

I have already implemented a data warehouse for my company, but I
still don't understand some basic concepts introduced by Ralph Kimball
and many others, can anyone help me out?

1.The meaning and justification of an Operational Data Store or ODS:
as far as I know, an ODS is characteristic of:
a.It is current, not historical
b.It is operational, it is subject to changes
c.It can serves as a staging area. Data are transported from OLTP's to
the warehouse.
d.It is integrated. This means there is only one ODS to accept data
from possibly multiple OLTP data sources; these data are not only
extracted and loaded, but also transformed in the process.
e.It is not used as analysis, data mining, querying, reporting or
anything. I cannot see any reasons we need an ODS, nor do I know the
purpose of its introduction.

2.The myth of the star schema and the snowflake architecture:
We use Cognos Power Play as the OLAP tool, and Cognos Impromptu as the
reporting tool, both support native connection to Oracle. We have
created Power Cubes for marketing and sales users. Power Cubes are the
so-called multi-dimensional databases, which the star schema and the
snowflake architecture are meant for. Power Cubes are separate objects
from the database, they cannot be directly manipulated by SQL or
anything, only be sliced and diced by Power Play client program. The
star schema and the snowflake architecture are actually embeded in a
Power Play Transformer model, the original database structure is
rather irrelavent to the performance of the Power Cubes. Even if the
database structure is fully normalized, it is the transforming process
that is impacted, not the Power cubes themselves. Therefore, the star
schema and the snowflake architecture, which are extensively
emphasized by many books and articles, are completely useless when
utilizing a tool like Cognos Power Play. Maybe inside Power Play are
start schema and snowflake architecture, but who knows?

3.How many data marts are there:
We would naturally assume one Power cube is one data mart, because one
Power cube is one multi-dimensional database. However, data marts
should be subject-oriented, remember? Our marketing cubes and sales
cubes actually share the same 'subject', the sales, they focus on
partly different dimensions and measures though. If we have only one
subject - sales, but two cubes for marketing and sales respectively,
how many data marts do we have?  Furthermore, there are limitations in
the creation of a Power cube, such as max. 50 million source data and
200M max. size of the cube file. (The latter is a practical
limitation, not one of the tool, a too big cube can be very slow)
Therefore, we might have to make five cubes for one department, then
again how many data marts do we have? One or five?

Your comments are highly appreciated, thaks in advance.

Dino



Thu, 26 Feb 2004 21:35:40 GMT
 questions about basic data warehouse concept


Quote:
> Dear all,

> I have already implemented a data warehouse for my company, but I
> still don't understand some basic concepts introduced by Ralph Kimball
> and many others, can anyone help me out?

> 1.The meaning and justification of an Operational Data Store or ODS:
> as far as I know, an ODS is characteristic of:
> a.It is current, not historical
> b.It is operational, it is subject to changes
> c.It can serves as a staging area. Data are transported from OLTP's to
> the warehouse.
> d.It is integrated. This means there is only one ODS to accept data
> from possibly multiple OLTP data sources; these data are not only
> extracted and loaded, but also transformed in the process.
> e.It is not used as analysis, data mining, querying, reporting or
> anything. I cannot see any reasons we need an ODS, nor do I know the
> purpose of its introduction.

ODSs are fairly specialized and not many companies need them. They are
typically used if you want to do near real-time operational reporting on
combined data from multiple OLTP systems.

- Show quoted text -

Quote:

> 2.The myth of the star schema and the snowflake architecture:
> We use Cognos Power Play as the OLAP tool, and Cognos Impromptu as the
> reporting tool, both support native connection to Oracle. We have
> created Power Cubes for marketing and sales users. Power Cubes are the
> so-called multi-dimensional databases, which the star schema and the
> snowflake architecture are meant for. Power Cubes are separate objects
> from the database, they cannot be directly manipulated by SQL or
> anything, only be sliced and diced by Power Play client program. The
> star schema and the snowflake architecture are actually embeded in a
> Power Play Transformer model, the original database structure is
> rather irrelavent to the performance of the Power Cubes. Even if the
> database structure is fully normalized, it is the transforming process
> that is impacted, not the Power cubes themselves. Therefore, the star
> schema and the snowflake architecture, which are extensively
> emphasized by many books and articles, are completely useless when
> utilizing a tool like Cognos Power Play. Maybe inside Power Play are
> start schema and snowflake architecture, but who knows?

Kimball's books and teachings assume you're using a relational database.
While this might be true in some cases, most of the popular tools are
actually MOLAPs which use an optimized compressed cube structure that isn't
even logically a star schema. Remember that the whole star/snowflake idea is
to make a relational database simulate a cube; you don't have to do that if
you have a real multidimensional database (like PowerPlay and lots of
others). And once you do have a cube, you don't need to worry about
structuring the relational feeder systems for optimum query performance.

These are some of of the tools that (usually) service queries from MOLAP
cubes, not directly from relational databases (and which are therefore
ignored by Kimball): Analysis Services, BusinessObjects, Essbase, Express,
Gentia, Holos, iTM1, PowerPlay.

Quote:

> 3.How many data marts are there:
> We would naturally assume one Power cube is one data mart, because one
> Power cube is one multi-dimensional database. However, data marts
> should be subject-oriented, remember? Our marketing cubes and sales
> cubes actually share the same 'subject', the sales, they focus on
> partly different dimensions and measures though. If we have only one
> subject - sales, but two cubes for marketing and sales respectively,
> how many data marts do we have?  Furthermore, there are limitations in
> the creation of a Power cube, such as max. 50 million source data and
> 200M max. size of the cube file. (The latter is a practical
> limitation, not one of the tool, a too big cube can be very slow)
> Therefore, we might have to make five cubes for one department, then
> again how many data marts do we have? One or five?

PowerPlay needs you to split a logical data mart into multiple powercubes
because it doesn't have a very scalable cube engine. Other MOLAP servers
don't have this limit, so one logically linked cube set can serve the needs
of a variety of people with interest in the same subject area, regardless of
the data volumes or dimensional complexity. Microsoft Analysis Services and
Oracle Express would be examples of very scalable MOLAP servers that can
handle hundreds of gigabytes of input data (of course, MOLAPs store
multidimensional data in a much smaller volume than the input data in a
relational database).

Nigel Pendse
OLAP Solutions
http://www.olapreport.com



Fri, 27 Feb 2004 21:28:55 GMT
 questions about basic data warehouse concept

Quote:

> Dear all,

> I have already implemented a data warehouse for my company, but I
> still don't understand some basic concepts introduced by Ralph Kimball
> and many others, can anyone help me out?

> 1.The meaning and justification of an Operational Data Store or ODS:
> as far as I know, an ODS is characteristic of:
> a.It is current, not historical
> b.It is operational, it is subject to changes
> c.It can serves as a staging area. Data are transported from OLTP's to
> the warehouse.
> d.It is integrated. This means there is only one ODS to accept data
> from possibly multiple OLTP data sources; these data are not only
> extracted and loaded, but also transformed in the process.
> e.It is not used as analysis, data mining, querying, reporting or
> anything. I cannot see any reasons we need an ODS, nor do I know the
> purpose of its introduction.

The above is correct.  The difference between Kimball and Inmon is the
ODS.  Kimball would created and ODS and throw it away.  Inmon would
update the Enterprise DW and populate all marts from there.  Inmon
does not like to throw things away.  However, in most large DWs, we
use the ODS to see what has changed, write that to our ODS, update the
DW, then toss the ODS.

- Show quoted text -

Quote:

> 2.The myth of the star schema and the snowflake architecture:
> We use Cognos Power Play as the OLAP tool, and Cognos Impromptu as the
> reporting tool, both support native connection to Oracle. We have
> created Power Cubes for marketing and sales users. Power Cubes are the
> so-called multi-dimensional databases, which the star schema and the
> snowflake architecture are meant for. Power Cubes are separate objects
> from the database, they cannot be directly manipulated by SQL or
> anything, only be sliced and diced by Power Play client program. The
> star schema and the snowflake architecture are actually embeded in a
> Power Play Transformer model, the original database structure is
> rather irrelavent to the performance of the Power Cubes. Even if the
> database structure is fully normalized, it is the transforming process
> that is impacted, not the Power cubes themselves. Therefore, the star
> schema and the snowflake architecture, which are extensively
> emphasized by many books and articles, are completely useless when
> utilizing a tool like Cognos Power Play. Maybe inside Power Play are
> start schema and snowflake architecture, but who knows?

Not useless.  A star/snowflake is a dimensional model and is
relational.  As opposed to an ODS.  All query and reporting is done
against the star.  Now a lot of us are moving away from the
start/snowflake in the enterprise DW, but they are still essential for
the marts.  All tools work best against the star schema.  Don't
confuse cubes with marts.  Cubes are built from marts.  They are a
database, but since they can only be accessed via the PP tool, they
are not marts.  If you build cubes against the enterprise DW and
bypass the marts,then yes, the star schemas are useless as far as the
cubes are concerned.  However, if you build cubes against the mart
then they are not.  For instance when you drill through from PP to
Impromptu, you will utilize the star.

Quote:

> 3.How many data marts are there:
> We would naturally assume one Power cube is one data mart, because one
> Power cube is one multi-dimensional database. However, data marts
> should be subject-oriented, remember? Our marketing cubes and sales
> cubes actually share the same 'subject', the sales, they focus on
> partly different dimensions and measures though. If we have only one
> subject - sales, but two cubes for marketing and sales respectively,
> how many data marts do we have?  Furthermore, there are limitations in
> the creation of a Power cube, such as max. 50 million source data and
> 200M max. size of the cube file. (The latter is a practical
> limitation, not one of the tool, a too big cube can be very slow)
> Therefore, we might have to make five cubes for one department, then
> again how many data marts do we have? One or five?

A mart is a set of relational tables.  A cube is a highly indexed flat
file.  It is not a mart.  It sounds like you have a sales mart and a
marketing mart, which together is your DW.   This is a common
practice.  Now you have cubes built against the DW (combo sales and
marketing).  This is common also.  But again, the cubes are not marts.
 They are highly indexded flat files, not relational tables.


Fri, 27 Feb 2004 23:18:17 GMT
 questions about basic data warehouse concept
On Mon, 10 Sep 2001 14:28:55 +0100, "Nigel Pendse"

Quote:

>> 3.How many data marts are there:
>> We would naturally assume one Power cube is one data mart, because one
>> Power cube is one multi-dimensional database. However, data marts
>> should be subject-oriented, remember? Our marketing cubes and sales
>> cubes actually share the same 'subject', the sales, they focus on
>> partly different dimensions and measures though. If we have only one
>> subject - sales, but two cubes for marketing and sales respectively,
>> how many data marts do we have?  Furthermore, there are limitations in
>> the creation of a Power cube, such as max. 50 million source data and
>> 200M max. size of the cube file. (The latter is a practical
>> limitation, not one of the tool, a too big cube can be very slow)
>> Therefore, we might have to make five cubes for one department, then
>> again how many data marts do we have? One or five?

>PowerPlay needs you to split a logical data mart into multiple powercubes
>because it doesn't have a very scalable cube engine. Other MOLAP servers
>don't have this limit, so one logically linked cube set can serve the needs
>of a variety of people with interest in the same subject area, regardless of
>the data volumes or dimensional complexity. Microsoft Analysis Services and
>Oracle Express would be examples of very scalable MOLAP servers that can
>handle hundreds of gigabytes of input data (of course, MOLAPs store
>multidimensional data in a much smaller volume than the input data in a
>relational database).

I am so grateful to your informative reply. In addition to
scalability, Power Play has another big problem: some measures'
failing to roll-up, so as not to be included in the cube. Examples are
"order count", "dealer count", "sales per order", "sales per dealer",
"units per order", "units per dealer" (dealers are membership
customers for a direct-selling company), etc. (note) I hope MOLAP
servers can resolve this problem too, since these are critical
measures for both marketing and sales departments. I also wonder how
an MOLAP server works: (1) does it work as a middle tier/an
application server? (2) does it work like a materialized view with
defineable refresh frequency? (3) does it allow flexible query with
SQL or something?

As for the data mart part, I wonder what a 'subject' is defined.
Actually the marketing and sales data mart share the same fact table,
they just include partly different measures and dimensions, do you
think these are two 'logical' data marts or one 'logical' data mart?

Quote:

>Nigel Pendse
>OLAP Solutions
>http://www.olapreport.com

BTW, can you show me how to pronounce 'Pendse'? :-)

Thanks again,
Dino

note: there is 'category count' in modeling measures in Power Play,
but it requires the related column become a dimension so that its
categories (distict values of the dimension) can be counted. However,
when there are too many orders and dealers, they will explod the model
when becoming dimensions. Power Play doesn't allow category count
without a corresponding dimension.



Sat, 28 Feb 2004 00:47:10 GMT
 questions about basic data warehouse concept
Hi, Dino;

 Can't help but ask you to get some assistance in your "cube"
modeling. The kinds of measures you're talking about work quite well
in Powerplay, and in fact it has many "built-in" functions and
features to handle these kinds of things.

 Easy things are easy to do. Hard things are hard to do. I don't care
what product you're talking about, the technology choices made by the
designers affect just which side of the line (easy or hard) a
particular feature lies. Depending on how you choose to solve the
problem with the product, that can also complicate the solution
unnecessarily.

 I'm thinking that you might be having issues because of mixing
granularity in the supporting queries for the cube. This can, if not
handled, cause strangeness.

Regards,

Buck Webb

On Tue, 11 Sep 2001 00:47:10 +0800, Dino Hsu

Quote:

>On Mon, 10 Sep 2001 14:28:55 +0100, "Nigel Pendse"

<snip>

>I am so grateful to your informative reply. In addition to
>scalability, Power Play has another big problem: some measures'
>failing to roll-up, so as not to be included in the cube. Examples are
>"order count", "dealer count", "sales per order", "sales per dealer",
>"units per order", "units per dealer" (dealers are membership
>customers for a direct-selling company), etc. (note) I hope MOLAP
>servers can resolve this problem too, since these are critical
>measures for both marketing and sales departments. I also wonder how
>an MOLAP server works: (1) does it work as a middle tier/an
>application server? (2) does it work like a materialized view with
>defineable refresh frequency? (3) does it allow flexible query with
>SQL or something?

>As for the data mart part, I wonder what a 'subject' is defined.
>Actually the marketing and sales data mart share the same fact table,
>they just include partly different measures and dimensions, do you
>think these are two 'logical' data marts or one 'logical' data mart?

<snip>

>note: there is 'category count' in modeling measures in Power Play,
>but it requires the related column become a dimension so that its
>categories (distict values of the dimension) can be counted. However,
>when there are too many orders and dealers, they will explod the model
>when becoming dimensions. Power Play doesn't allow category count
>without a corresponding dimension.



Sat, 28 Feb 2004 02:19:07 GMT
 questions about basic data warehouse concept

Quote:
> On Mon, 10 Sep 2001 14:28:55 +0100, "Nigel Pendse"

> >PowerPlay needs you to split a logical data mart into multiple powercubes
> >because it doesn't have a very scalable cube engine. Other MOLAP servers
> >don't have this limit, so one logically linked cube set can serve the
needs
> >of a variety of people with interest in the same subject area, regardless
of
> >the data volumes or dimensional complexity. Microsoft Analysis Services
and
> >Oracle Express would be examples of very scalable MOLAP servers that can
> >handle hundreds of gigabytes of input data (of course, MOLAPs store
> >multidimensional data in a much smaller volume than the input data in a
> >relational database).

> I am so grateful to your informative reply. In addition to
> scalability, Power Play has another big problem: some measures'
> failing to roll-up, so as not to be included in the cube. Examples are
> "order count", "dealer count", "sales per order", "sales per dealer",
> "units per order", "units per dealer" (dealers are membership
> customers for a direct-selling company), etc. (note) I hope MOLAP
> servers can resolve this problem too, since these are critical
> measures for both marketing and sales departments. I also wonder how
> an MOLAP server works: (1) does it work as a middle tier/an
> application server? (2) does it work like a materialized view with
> defineable refresh frequency? (3) does it allow flexible query with
> SQL or something?

Most/all MOLAP servers, including PowerPlay, can deal with semi-additive and
calculated measures. You just have to know how to use the product.

MOLAP servers are sometimes read-only, refreshed from one or more feeder
systems on a periodic basis. But many can also handle read-write apps where
some of the data is input by users (eg, for plans and forecasts).

MOLAP servers don't use SQL for queries (though, of course, they do use SQL
to read data from RDBMSs). But they do allow very flexible queries using a
varuety of APIs. There isn't a sinfle standard, but OLE DB for OLAP is used
by several.



Sat, 28 Feb 2004 04:18:46 GMT
 questions about basic data warehouse concept
On Mon, 10 Sep 2001 21:18:46 +0100, "Nigel Pendse"

I would like to confirm a few points:

Quote:

>Most/all MOLAP servers, including PowerPlay, can deal with semi-additive and
>calculated measures. You just have to know how to use the product.

Can you explain what semi-additive means? Maybe it is the good feature
I have been looking for.
A measure can roll up through differnt dimensions and levels, a
dimension is like a building and a level is like a layer (floor).
Normally a measure goes up (rolls up) one layer (say from 1F to 2F)
with addition, while average, maximum, minimun, (time series) and
count are also possible. There are two major limitation in Cognos
Trasnformer 6.5:
1.Only one roll-up method for one measure is supported, we cannot use
different roll-up methods for different dimensions or levels which the
measure goes against.
2.The 'category count' is a good feature for measures such as "dealer
count" or "order count", but it is impratical when the measure has too
many categories (more than 10,000).
Of course, the vendor has to solve this problem, the reason I ask here
is to know whether people encounter the same problem with tools other
than Cognos or tools other than cubes.

Quote:

>MOLAP servers are sometimes read-only, refreshed from one or more feeder
>systems on a periodic basis. But many can also handle read-write apps where
>some of the data is input by users (eg, for plans and forecasts).

Do you mean the data are 'updateable' by 'read-write'? Imaginably, an
MOLAP server is supposed to be more flexible than a flat file cube in
in terms of updateability.
Cubes created by Cognos Transformer are 'insertable' but not
'updateable'. It's incremental creation requires the administrator to
take care of problem of possible re-application of the same data.

Quote:

>MOLAP servers don't use SQL for queries (though, of course, they do use SQL
>to read data from RDBMSs). But they do allow very flexible queries using a
>varuety of APIs. There isn't a sinfle standard, but OLE DB for OLAP is used
>by several.

What is OLD DB, I saw this terms several times. Does it mean databases
such as Oracle, DB2, SQL server, Sybase?

Thanks again,
Dino



Sat, 28 Feb 2004 11:44:04 GMT
 
 [ 7 post ] 

 Relevant Pages 

1. data warehousing concept is it a successful concept?

2. 0-National-255080--Data Warehousing-DBMS-ORACLE-Visual Basic-ESSBASE-Hyperion-Data Warehouse Architect

3. Data warehousing experts needed to build/validate concepts test

4. Data Warehouse concept

5. general question about basic trigger concepts

6. US-CA-LA SYBASE DATA WAREHOUSE VISUAL BASIC POWERBUILDER DATA MODELING ANALYSIS ESSBASE

7. US-CA-LA SYBASE DATA WAREHOUSE VISUAL BASIC POWERBUILDER DATA MODELING ANALYSIS ESSBASE

8. MD-Annapolis Junction-240349--Data Warehousing-ORACLE-SQL-PL/SQL-Data Modeling-Data Warehouse Analysts

9. Oracle data warehousing and MSSQL data warehousing

10. VA-VA-103908--Data Warehousing-Software/Hardware-UNIX-ORACLE-Windows-DATA WAREHOUSE SPECIALIST, NCI

11. OH-Akron-111080--Data Warehousing-ORACLE-Data Warehouse Analyst


 
Powered by phpBB® Forum Software