How to select the <= date row? 
Author Message
 How to select the <= date row?

Hi there,  I would appreciate it if you could give me some ideas on the
following problem:

I have a couple of tables related to each other and they store time
variant data.  The "main table" contains event info.  All the general
details of any event that effects the value of a policy are logged here.  
Specific info about each event is logged in a set of event-specific
tables.  The result of this is that not all the tables get a new row for
every event that is logged in the "main table".

I need to get the complete status of a policy, or set of policies, as it
was at any particular point in time.  This means supplying at least a date
at which the policy's value should be retrieved, say 1 Jan 1995.  Normaly
one would just join all the tables policy number, but in this case there
may not be row for the specified date in one or more of the related event
specific tables.  In this case I need to get the row for the closesed
previous date.  The row with the largest date less than or equal to the
specified date, is the one that was still in force at the specified date.

Two things make this a particularly challenging problem:

 1) This query has to be done from a query tool, so we are restricted to a
    single SQL statement (unless Business Objects can build a report from
    more than one SQL statement).

2) Selecting the max(specified_date) where event_date <= specified_date
   works fine, but it will take a couple of days to wade through 40
   million rows.

e.g.:
Main_Event_Tbl          Event_A_Tbl         Event_B_Tbl
--------------          -----------         -----------
95/01/01                95/01/01
95/02/01                                    95/02/01
95/03/01                                    95/03/01



Mon, 23 Nov 1998 03:00:00 GMT
 How to select the <= date row?

There are a couple solutions you could try, depending on your specific
environment.  First, if you have bench marked your query and
empirically know that it will take a couple days to run the SQL you
suggested, have you t
ried using parallel query to improve the performance?  Parallel query
can help in any situation.

Second, you may have to consider adding another reference table (this
is only one of many possibilities) as follows.  Since the fastest
access path in ORACLE is by rowid, create a single table that stores
the policy id, t
he date-time stamp of the transaction, and the rowid of each row for
each table.  In the event you add another detail table, you can add
another column to this reference table.  Then, when you need to search
for a policy
for a given date, just hit the reference table with the policy id and
the desired date, get the rowid for each table, and hit the other
tables with rowid.  This should be doable in one SQL statement.
Illustration below.
 As you are aware, it may take some time to build this table in the
first place, but that is a fixed cost.  You will also need some AFTER
INSERT OR UPDATE OR DELETE triggers on each table to maintain the
reference table.

create table
    policy_reference_table (
        policy_id   number not null
        ,dts        date   not null
        ,dt1_rid    rowid  not null  -- based on rules for each table
        ,dt2_rid    rowid            -- Suggest better names be used
        ,...
    )
/

In the above, dt1_rid is a column containing the rowid of detail table
one, dt2_rid is the rowid for detail table 2, etc...  I expect you
would substitute actual table names or some reasonable abbreviation.  I
also recomm
end foreign key constraints on policy_id back to the master table.
Below is the SQL statement.  It should be quite fast.  You might even
consider creating a view for this query.

select
    p.policy_col_1       -- column of interest in your master table
    ,p.policy_col_2      -- column of interest in your master table
    ,...                 -- etc...
    ,rt1.detail_col_1    -- column of interest in a detail table
    ,rt2.detail_col_2    -- column of interest in another detail table
    ,...                 -- etc...
from
    policy_reference_table prt  -- this is the new reference table
    policy_table           p    -- this is your existing master table
    policy_ref_table_1     rt1  -- this is a detail table
    policy_ref_table_2     rt2  -- this is a detail table
    ...
where
    prt.policy_id   = &policy_id
and prt.dts         = &given_date
and p.policy_id     = prt.policy_id
and rt1.rowid       = prt.dt1_rid
and rt2.rowid       = prt.dt2_rid
and ...

Hope this helps.

Steve
804-262-6332

------------------------

Quote:
Mahs) writes:

>Hi there,  I would appreciate it if you could give me some ideas on
the
>following problem:

>I have a couple of tables related to each other and they store time
>variant data.  The "main table" contains event info.  All the general
>details of any event that effects the value of a policy are logged
here.  
>Specific info about each event is logged in a set of event-specific
>tables.  The result of this is that not all the tables get a new row
for
>every event that is logged in the "main table".

>I need to get the complete status of a policy, or set of policies, as
it
>was at any particular point in time.  This means supplying at least a
date
>at which the policy's value should be retrieved, say 1 Jan 1995.
Normaly
>one would just join all the tables policy number, but in this case
there
>may not be row for the specified date in one or more of the related
event
>specific tables.  In this case I need to get the row for the closesed
>previous date.  The row with the largest date less than or equal to
the
>specified date, is the one that was still in force at the specified
date.

>Two things make this a particularly challenging problem:

> 1) This query has to be done from a query tool, so we are restricted
to a
>    single SQL statement (unless Business Objects can build a report
from
>    more than one SQL statement).

>2) Selecting the max(specified_date) where event_date <=
specified_date
>   works fine, but it will take a couple of days to wade through 40
>   million rows.

>e.g.:
>Main_Event_Tbl          Event_A_Tbl         Event_B_Tbl
>--------------          -----------         -----------
>95/01/01                95/01/01
>95/02/01                                    95/02/01
>95/03/01                                    95/03/01



Mon, 23 Nov 1998 03:00:00 GMT
 How to select the <= date row?

I believe the problem he is having is not in doing a join across
several tables, but in selecting data which may or may not be in
a number of detail tables and in select a single date <= an input date.

To solve the problem of joining tables in which a join may or may
not exist, I would suggest performing an outer join. Usually you
do something like:

        select  main.id, subtbl_a.data1, subtbl_b.data2
        from    main, subtbl_a, subtbl_b
        where   main.id *= subtbl_a.id
        and     main.id *= subtbl_b.id

The larger problem is selecting the ids from the main table which
are closest to, but not past, the given date. A simple SQL solution
to this is to self join the main table to itself and select the max
date where the date is less than the cut-off date, grouped by id, as
follows:

        select b.id, max(b2.dt_event)
        from main b, main b2
        where
        b.id = b2.id
        and b2.dt_event <= "desired date"
        group by b.id

It is hard to believe that this would run for "a couple of days", but
I don't know what your setup is, so maybe it is possible. But if this is
OK, then just expand the query to outer join the detail tables as above.

If this is unacceptable, then I would suggest running a nightly cron
job to append to a small two column table which would contain those
ids which are "active" today. This would basically be the preceding
query.  You might at first have to run this over all the dates in
your main table, but that is only done once. Then you could perform
a simple equijoin from this small table, indexed on date, to the main
table, followed by the outer joins to get all your data back.

Finally, can you run stored procedures and just "select" the result
set back to your application? This would enable you to have multiple
SQL statements, but on the server, not the client.


|>There are a couple solutions you could try, depending on your specific
|>environment.  First, if you have bench marked your query and
|>empirically know that it will take a couple days to run the SQL you
|>suggested, have you t
|>ried using parallel query to improve the performance?  Parallel query
|>can help in any situation.
|>
|>Second, you may have to consider adding another reference table (this
|>is only one of many possibilities) as follows.  Since the fastest
|>access path in ORACLE is by rowid, create a single table that stores
|>the policy id, t
|>he date-time stamp of the transaction, and the rowid of each row for
|>each table.  In the event you add another detail table, you can add
|>another column to this reference table.  Then, when you need to search
|>for a policy
|>for a given date, just hit the reference table with the policy id and
|>the desired date, get the rowid for each table, and hit the other
|>tables with rowid.  This should be doable in one SQL statement.
|>Illustration below.
|> As you are aware, it may take some time to build this table in the
|>first place, but that is a fixed cost.  You will also need some AFTER
|>INSERT OR UPDATE OR DELETE triggers on each table to maintain the
|>reference table.
|>
|>create table
|>    policy_reference_table (
|>        policy_id   number not null
|>        ,dts        date   not null
|>        ,dt1_rid    rowid  not null  -- based on rules for each table
|>        ,dt2_rid    rowid            -- Suggest better names be used
|>        ,...
|>    )
|>/
|>
|>In the above, dt1_rid is a column containing the rowid of detail table
|>one, dt2_rid is the rowid for detail table 2, etc...  I expect you
|>would substitute actual table names or some reasonable abbreviation.  I
|>also recomm
|>end foreign key constraints on policy_id back to the master table.
|>Below is the SQL statement.  It should be quite fast.  You might even
|>consider creating a view for this query.
|>
|>select
|>    p.policy_col_1       -- column of interest in your master table
|>    ,p.policy_col_2      -- column of interest in your master table
|>    ,...                 -- etc...
|>    ,rt1.detail_col_1    -- column of interest in a detail table
|>    ,rt2.detail_col_2    -- column of interest in another detail table
|>    ,...                 -- etc...
|>from
|>    policy_reference_table prt  -- this is the new reference table
|>    policy_table           p    -- this is your existing master table
|>    policy_ref_table_1     rt1  -- this is a detail table
|>    policy_ref_table_2     rt2  -- this is a detail table
|>    ...
|>where
|>    prt.policy_id   = &policy_id
|>and prt.dts         = &given_date
|>and p.policy_id     = prt.policy_id
|>and rt1.rowid       = prt.dt1_rid
|>and rt2.rowid       = prt.dt2_rid
|>and ...
|>
|>
|>Hope this helps.
|>
|>Steve
|>804-262-6332
|>
|>
|>
|>------------------------

Quote:
|>Mahs) writes:

|>>
|>>Hi there,  I would appreciate it if you could give me some ideas on
|>the
|>>following problem:
|>>
|>>I have a couple of tables related to each other and they store time
|>>variant data.  The "main table" contains event info.  All the general
|>>details of any event that effects the value of a policy are logged
|>here.  
|>>Specific info about each event is logged in a set of event-specific
|>>tables.  The result of this is that not all the tables get a new row
|>for
|>>every event that is logged in the "main table".
|>>
|>>I need to get the complete status of a policy, or set of policies, as
|>it
|>>was at any particular point in time.  This means supplying at least a
|>date
|>>at which the policy's value should be retrieved, say 1 Jan 1995.
|>Normaly
|>>one would just join all the tables policy number, but in this case
|>there
|>>may not be row for the specified date in one or more of the related
|>event
|>>specific tables.  In this case I need to get the row for the closesed
|>>previous date.  The row with the largest date less than or equal to
|>the
|>>specified date, is the one that was still in force at the specified
|>date.
|>>
|>>Two things make this a particularly challenging problem:
|>>
|>> 1) This query has to be done from a query tool, so we are restricted
|>to a
|>>    single SQL statement (unless Business Objects can build a report
|>from
|>>    more than one SQL statement).
|>>
|>>2) Selecting the max(specified_date) where event_date <=
|>specified_date
|>>   works fine, but it will take a couple of days to wade through 40
|>>   million rows.
|>>
|>>e.g.:
|>>Main_Event_Tbl          Event_A_Tbl         Event_B_Tbl
|>>--------------          -----------         -----------
|>>95/01/01                95/01/01
|>>95/02/01                                    95/02/01
|>>95/03/01                                    95/03/01
|>>
|>
|>


Mon, 23 Nov 1998 03:00:00 GMT
 How to select the <= date row?



Quote:
>Two things make this a particularly challenging problem:

> 1) This query has to be done from a query tool, so we are restricted to a
>    single SQL statement (unless Business Objects can build a report from
>    more than one SQL statement).

>2) Selecting the max(specified_date) where event_date <= specified_date
>   works fine, but it will take a couple of days to wade through 40
>   million rows.

could you try something like

set rowcount = 50
select max (specified_date) ORDER by date

__________________________________________________
Reinoud van Leeuwen

http://www.xs4all.nl/~reinoud
__________________________________________________



Mon, 23 Nov 1998 03:00:00 GMT
 How to select the <= date row?

If the date field is indexed, it seems to me that there is enough information
in a " max(specified_date) where event_date <= specified_date" statement to
tell the server how to quickly locate the value.  Even if it does a scan, it
will scan the index because only the indexed field is involved.

Quote:

> Hi there,  I would appreciate it if you could give me some ideas on the
> following problem:

> I have a couple of tables related to each other and they store time
> variant data.  The "main table" contains event info.  All the general
> details of any event that effects the value of a policy are logged here.
> Specific info about each event is logged in a set of event-specific
> tables.  The result of this is that not all the tables get a new row for
> every event that is logged in the "main table".

> I need to get the complete status of a policy, or set of policies, as it
> was at any particular point in time.  This means supplying at least a date
> at which the policy's value should be retrieved, say 1 Jan 1995.  Normaly
> one would just join all the tables policy number, but in this case there
> may not be row for the specified date in one or more of the related event
> specific tables.  In this case I need to get the row for the closesed
> previous date.  The row with the largest date less than or equal to the
> specified date, is the one that was still in force at the specified date.

> Two things make this a particularly challenging problem:

>  1) This query has to be done from a query tool, so we are restricted to a
>     single SQL statement (unless Business Objects can build a report from
>     more than one SQL statement).

> 2) Selecting the max(specified_date) where event_date <= specified_date
>    works fine, but it will take a couple of days to wade through 40
>    million rows.

> e.g.:
> Main_Event_Tbl          Event_A_Tbl         Event_B_Tbl
> --------------          -----------         -----------
> 95/01/01                95/01/01
> 95/02/01                                    95/02/01
> 95/03/01                                    95/03/01



Mon, 23 Nov 1998 03:00:00 GMT
 How to select the <= date row?

Quote:

> To solve the problem of joining tables in which a join may or may
> not exist, I would suggest performing an outer join. Usually you
> do something like:

>         select  main.id, subtbl_a.data1, subtbl_b.data2
>         from    main, subtbl_a, subtbl_b
>         where   main.id *= subtbl_a.id
>         and     main.id *= subtbl_b.id

Since this was cross posted to Informix, try this:

        SELECT  <Your data set here>
        FROM    main A, OUTER subtbl_a SA, OUTER subtbl_b SB
        WHERE   A.event_date <= "<Your Date>"
        AND     A.id = SA.id
        AND     A.id = SB.id
        ORDER BY A.event_date DESCENDING

Quote:
> |>>Two things make this a particularly challenging problem:
> |>>
> |>> 1) This query has to be done from a query tool, so we are restricted
> |>to a
> |>>    single SQL statement (unless Business Objects can build a report
> |>from
> |>>    more than one SQL statement).
> |>>
> |>>2) Selecting the max(specified_date) where event_date <=
> |>specified_date
> |>>   works fine, but it will take a couple of days to wade through 40
> |>>   million rows.
> |>>

It sounds like you may not have a good index. If you can use
ISQL /DBACCESS, if this is an Informix problem, then you should
be ok.

If you try the following:
        SELECT *
        FROM    main
        WHERE   event_date <= "<YOUR DATE>"
        ORDER BY event_date DESCENDING
And it returns quickly, then there might be something wrong
with your optimizer. If so, try this query:

        SELECT  <Your data set here>
        FROM    main A, OUTER subtbl_a SA, OUTER subtbl_b SB
        WHERE   A.id = SA.id
        AND     A.id = SB.id
        AND     A.id IN (
                        SELECT id
                        FROM    main
                        WHERE   event_date <= "<YOUR DATE>"
                        )
        ORDER BY A.event_date DESCENDING

The nested query should reduce the rows that are returned.
You may also want to check that you are joining the tables
correctly. You need to make sure that the IDs are unique, otherwise the
result will have bad data with too many rows.

Just my $.02.

-Mikey
PS,     If you only wanted the one row, then use this query:

        SELECT  <Your data set here>
        FROM    main A, OUTER subtbl_a SA, OUTER subtbl_b SB
        WHERE   A.id = SA.id
        AND     A.id = SB.id
        AND     A.id IN (
                        SELECT id
                        FROM    main
                        WHERE   MAX(event_date) <= "<YOUR DATE>"
                        )
This should return one row, however if this takes a while check
your indexes.



Wed, 25 Nov 1998 03:00:00 GMT
 How to select the <= date row?

Quote:

>Hi there,  I would appreciate it if you could give me some ideas on the
>following problem:
>I have a couple of tables related to each other and they store time
>variant data.  The "main table" contains event info.  All the general
>details of any event that effects the value of a policy are logged here.  
>Specific info about each event is logged in a set of event-specific
>tables.  The result of this is that not all the tables get a new row for
>every event that is logged in the "main table".
>I need to get the complete status of a policy, or set of policies, as it
>was at any particular point in time.  This means supplying at least a date
>at which the policy's value should be retrieved, say 1 Jan 1995.  Normaly
>one would just join all the tables policy number, but in this case there
>may not be row for the specified date in one or more of the related event
>specific tables.  In this case I need to get the row for the closesed
>previous date.  The row with the largest date less than or equal to the
>specified date, is the one that was still in force at the specified date.
>Two things make this a particularly challenging problem:
> 1) This query has to be done from a query tool, so we are restricted to a
>    single SQL statement (unless Business Objects can build a report from
>    more than one SQL statement).
>2) Selecting the max(specified_date) where event_date <= specified_date
>   works fine, but it will take a couple of days to wade through 40
>   million rows.
>e.g.:
>Main_Event_Tbl          Event_A_Tbl         Event_B_Tbl
>--------------          -----------         -----------
>95/01/01                95/01/01
>95/02/01                                    95/02/01
>95/03/01                                    95/03/01

Would it help to have each policy have a unique identifier (identity
column?) and store an EffectiveFrom and EffectiveTo date with each
record? I do this for all history-bearing columns. The PK would be the
PolicyId/EffectiveTo combo:

PolicyId int identity
...
Status char(1)
EffectiveFrom datetime
EffectiveTo datetime

To get the record in effect as of a given datetime, use the following
terms in your where clause:

...


---

The first time you write a new policy, set the EffectiveFrom =
getdate() and EffectiveTo = '12-31-9999'. Updates to a policy first
set the current record's EffectiveTo = getdate(), and insert a new
record with the same Id, EffectivrFrom = getdate(), EffectiveTo =
'12-31-9999'. To delete a policy, set its Status from 'A'ctive to
'I'nactive and write a new record. That way, "as of" some prior date,
the record is still active, but after that it's deleted.

If you're doing something totally different then forgive my rambling.

Steve



Fri, 27 Nov 1998 03:00:00 GMT
 How to select the <= date row?

Quote:

> ....
> 2) Selecting the max(specified_date) where event_date <= specified_date
>    works fine, but it will take a couple of days to wade through 40
>    million rows.
> .... A couple of days?? What are you using, Oracle? :)

Seriously, though, I assume you have an index that can be used in the query,
i.e. one with a reverse ordered specified_date.

----------------------------------------------------------------------
John H. Frantz           Power-4gl: Extending Informix-4gl



Fri, 27 Nov 1998 03:00:00 GMT
 How to select the <= date row?

Quote:


>>Hi there,  I would appreciate it if you could give me some ideas on the
>>following problem:

>>I have a couple of tables related to each other and they store time
>>variant data.  The "main table" contains event info.  All the general
>>details of any event that effects the value of a policy are logged here.  
>>Specific info about each event is logged in a set of event-specific
>>tables.  The result of this is that not all the tables get a new row for
>>every event that is logged in the "main table".

>>I need to get the complete status of a policy, or set of policies, as it
>>was at any particular point in time.  This means supplying at least a date
>>at which the policy's value should be retrieved, say 1 Jan 1995.  

At ISUG '96, bill Cartwright presented "Using Sybase Triggers to Enforce 'Temporal'
Integrity". His technique allowed users to make inserts, updates, deletes, etc. and the
triggers would mark the rows with a begin or end time. It is a really slick technique.
His presentation should be on the Sybase web home page under events.

Bob Munson



Sat, 28 Nov 1998 03:00:00 GMT
 How to select the <= date row?

The query that starts with

Quote:
>2) Selecting the max(specified_date) where event_date <= specified_date
>   works fine, but it will take a couple of days to wade through 40
>   million rows.

is a step in the right direction, you just need to tell oracle to use
the specific index (on the date) and then stop after 1 row. like...

SELECT /* INDEX_DESC(eventA event_date_idx) */ date_col  
FROM eventA
WHERE event_date <= to_date(:t)

Check out Chapter 5 of the Application Developer's Guide (Tuning SQL
Statements), specificly the part on hints like INDEX_DESC

Chip Dawes
Motorola Manufacturing Systems



Sun, 29 Nov 1998 03:00:00 GMT
 How to select the <= date row?

Quote:


>>Hi there,  I would appreciate it if you could give me some ideas on the
>>following problem:
.....>
>>I need to get the complete status of a policy, or set of policies, as it
>>was at any particular point in time.  This means supplying at least a date
>>at which the policy's value should be retrieved, say 1 Jan 1995.  Normaly
>>one would just join all the tables policy number, but in this case there
>>may not be row for the specified date in one or more of the related event
>>specific tables.  In this case I need to get the row for the closesed
>>previous date.  The row with the largest date less than or equal to the
>>specified date, is the one that was still in force at the specified date.

>>Two things make this a particularly challenging problem:

>> 1) This query has to be done from a query tool, so we are restricted to a
>>    single SQL statement (unless Business Objects can build a report from
>>    more than one SQL statement).

>>2) Selecting the max(specified_date) where event_date <= specified_date
>>   works fine, but it will take a couple of days to wade through 40
>>   million rows.

I ran into the same problem.  I joined the 2 tables and ran my
reports against the 3rd table, at a time when no more data-entry
was being done.  If you are just doing an occasional query maybe
you could just query the 2 table and not join them?  I noticed
that performance can vary greatly depending on how the indexes are
set up.  With tables as big as yours, I don't know if this would
ever run fast enough but here it is...  I ran this in a vms .com file
in batch.

create table aud_join as
select x.rescode, x.startdate, x.enddate, x.pgmcode, x.hospcode,
  x.pgylevel, h.startdate h_startdate, h.enddate h_enddate,
  h.hospcode h_hospcode
from system.resstat95 x, system.reshosp95 h
  where (x.startdate, x.rescode) in
  (select max(startdate), rescode
   from system.resstat95 y
where (y.rescode = h.rescode and
 y.startdate <= h.startdate)
 group by y.rescode);

- Peggy -



Fri, 04 Dec 1998 03:00:00 GMT
 How to select the <= date row?


Quote:
> I need to get the complete status of a policy, or set of policies, as it
> was at any particular point in time.  This means supplying at least a
date
> at which the policy's value should be retrieved, say 1 Jan 1995.
Normaly
> one would just join all the tables policy number, but in this case there
> may not be row for the specified date in one or more of the related
event
> specific tables.  In this case I need to get the row for the closesed
> previous date.  The row with the largest date less than or equal to the
> specified date, is the one that was still in force at the specified

date.

Well, you could include a start and end date/time, index it, and use
between for your date in question.



Mon, 07 Dec 1998 03:00:00 GMT
 
 [ 12 post ] 

 Relevant Pages 

1. How to select the <= date row

2. How to select the <= date row

3. How to select the <= date row?

4. How to select the <= date row

5. How to select the <= date row?

6. <<<<<<<PROGRAMA GRATUITO PARA WINDOWS 95 /NT<<<<<<<<<<<

7. one SQL datacontrol or many TABLE datacontrols<<<<<<<<<<<<<<<<<

8. printing Individual Form Pages <<<<<<<<<<<<

9. Access Programming<<<<<<<<<<<<

10. Selecting a date field <-- an SQL question

11. question regarding select (multiple rows select into one result row)

12. Selecting rows between 2 date


 
Powered by phpBB® Forum Software