For The Gurus: On-The-Fly aggregation vs. end-of-day aggregation 
Author Message
 For The Gurus: On-The-Fly aggregation vs. end-of-day aggregation



Quote:
>Heres a question for the DB Gurus (you may need to read it closely)
>An application has 10 threads, each of them reading about 1 gigabyte
>records off the network in a constant stream. Each record is about
>100 bytes in length and has a composite key (multiple columns make
>the unique key).
>This data needs to be stored in the database as is.
>Additionally, this data needs to be summarized for daily, monthly,
>quarterly and yearly totals.

The answer is "Do the Daily Aggregation On the Fly" and the others in
batch.

I assume you are not going to accumulate the individual records for
any length of time.  This would otherwise amount to 3.65 TeraBytes per
year!!!

My approach would be to partition the Daily aggregate table as well as
the incoming transactions so as to avoid a contention problem.  I
would then "insert or update" each transaction into the appropriate
Daily aggregate row using a stored procedure.  This works out to a
transaction rate of about:
 1GB / 100B / 24 / 3600 = 116 TPS per thread.

This is feasable with reasonably sized PC hardware.  Ten threads or
1160TPS is a bit trickier but not out of the range of a very high end
Wintel (read 4 CPUs and 1GB or ram) box or a decent sized UNIX SMP
box.  I am assuming that the traffic is not uniform but might burst as
high as 2500 TPS.

At the end of the day, you can update the Monthly aggregate table
using the Daily aggregate.  Depending upon the number of records at
the aggregate level, you may not even need to create and store the
Quarterly and Yearly aggregates.  If there are not two many records
and/or the response time requirement is measured in seconds, you can
summarize these "at query time".

Warning:  Choose your database and platform very carefully.  I would
use SYBASE on a nice SUN UltraSparc server with lots of smaller disk
spindles (mirrored if necessary but absolutely no RAID5)  I know these
products and have configured enough systems to feel comfortable doing
a capacity plan off the cuff.  If you are not, I urge you to benchmark
your options.

One more note.  SYBASE and ORACLE went head to head on a benchmark a
couple of years ago (this was prior to System 11 and ASE) for (I
believe) the Swedish National Telephone System.  The benchmark results
floated around the web for quite some time so you may be able to find
it.  The jist of the results were that SYBASE handle roughly twice the
transaction rate and twice the number of users while consuming about
half the CPU capacity and 60% of the disk space than ORACLE.
System-11 and ASE are substantially faster than the version of SYBASE
used in that benchmark so things can only have gotten better.

E-mail me if you have more specific questions.
Good luck,
Bill Grant.



Sun, 27 Aug 2000 03:00:00 GMT
 For The Gurus: On-The-Fly aggregation vs. end-of-day aggregation

Quote:

>>Heres a question for the DB Gurus (you may need to read it closely)
>>An application has 10 threads, each of them reading about 1 gigabyte
>>records off the network in a constant stream. Each record is about
>>100 bytes in length and has a composite key (multiple columns make
>>the unique key).
>>This data needs to be stored in the database as is.
>>Additionally, this data needs to be summarized for daily, monthly,
>>quarterly and yearly totals.
>The answer is "Do the Daily Aggregation On the Fly" and the others in
>batch.
>I assume you are not going to accumulate the individual records for
>any length of time.  This would otherwise amount to 3.65 TeraBytes per
>year!!!
>My approach would be to partition the Daily aggregate table as well as
>the incoming transactions so as to avoid a contention problem.  I
>would then "insert or update" each transaction into the appropriate
>Daily aggregate row using a stored procedure.  This works out to a
>transaction rate of about:
> 1GB / 100B / 24 / 3600 = 116 TPS per thread.
>This is feasable with reasonably sized PC hardware.  Ten threads or
>1160TPS is a bit trickier but not out of the range of a very high end
>Wintel (read 4 CPUs and 1GB or ram) box or a decent sized UNIX SMP
>box.  I am assuming that the traffic is not uniform but might burst as
>high as 2500 TPS.
>At the end of the day, you can update the Monthly aggregate table
>using the Daily aggregate.  Depending upon the number of records at
>the aggregate level, you may not even need to create and store the
>Quarterly and Yearly aggregates.  If there are not two many records
>and/or the response time requirement is measured in seconds, you can
>summarize these "at query time".
>Warning:  Choose your database and platform very carefully.  I would
>use SYBASE on a nice SUN UltraSparc server with lots of smaller disk
>spindles (mirrored if necessary but absolutely no RAID5)  I know these
>products and have configured enough systems to feel comfortable doing
>a capacity plan off the cuff.  If you are not, I urge you to benchmark
>your options.
>One more note.  SYBASE and ORACLE went head to head on a benchmark a
>couple of years ago (this was prior to System 11 and ASE) for (I
>believe) the Swedish National Telephone System.  The benchmark results
>floated around the web for quite some time so you may be able to find
>it.  The jist of the results were that SYBASE handle roughly twice the
>transaction rate and twice the number of users while consuming about
>half the CPU capacity and 60% of the disk space than ORACLE.
>System-11 and ASE are substantially faster than the version of SYBASE
>used in that benchmark so things can only have gotten better.

Another point worth mentioning is that some servers, such as Interbase, are
especially designed so that "readers don't block writers," using a versioning-
based storage approach instead of the traditional locks in a way that can
drastically reduce the probability of contention.

Also, see if it is possible to place the ten incoming streams of requests into
a single queue for processing.  See if it is possible to somehow pre-sort or
otherwise optimize that stream to minimize disk-arm movement.  (Perhaps the
server you use is smart enough to know about such things.)  When all is said
and done, it really does come down to minimizing the physical motion of the
disk mechanisms...



Sun, 27 Aug 2000 03:00:00 GMT
 
 [ 2 post ] 

 Relevant Pages 

1. For The Gurus: On-The-Fly aggregation vs. end-of-day aggregation

2. For The Gurus: On-The-Fly aggregation vs. end-of-day aggregation

3. For The Gurus: On-The-Fly aggregation vs. end-of-day aggregation

4. For The Gurus: On-The-Fly aggregation vs. end-of-day aggregation

5. Aggregation Wizard fails to find any aggregations

6. To AS2K guru : 100% aggregation is false! it missed many aggregation

7. multidimension vs 2 single dimension aggregations

8. Calculation Order vs. Aggregation Order

9. Aggregation vs Calculation

10. repost:adt vs aggregation (cross-posted)

11. Comparative vs Aggregation MDDB values

12. Last Day of Month, End of Month, First Day of Month


 
Powered by phpBB® Forum Software