How to select the <= date row?
Author |
Message |
Stefan Ma #1 / 12
|
 How to select the <= date row?
Hi there, I would appreciate it if you could give me some ideas on the following problem: I have a couple of tables related to each other and they store time variant data. The "main table" contains event info. All the general details of any event that effects the value of a policy are logged here. Specific info about each event is logged in a set of event-specific tables. The result of this is that not all the tables get a new row for every event that is logged in the "main table". I need to get the complete status of a policy, or set of policies, as it was at any particular point in time. This means supplying at least a date at which the policy's value should be retrieved, say 1 Jan 1995. Normaly one would just join all the tables policy number, but in this case there may not be row for the specified date in one or more of the related event specific tables. In this case I need to get the row for the closesed previous date. The row with the largest date less than or equal to the specified date, is the one that was still in force at the specified date. Two things make this a particularly challenging problem: 1) This query has to be done from a query tool, so we are restricted to a single SQL statement (unless Business Objects can build a report from more than one SQL statement). 2) Selecting the max(specified_date) where event_date <= specified_date works fine, but it will take a couple of days to wade through 40 million rows. e.g.: Main_Event_Tbl Event_A_Tbl Event_B_Tbl -------------- ----------- ----------- 95/01/01 95/01/01 95/02/01 95/02/01 95/03/01 95/03/01
|
Wed, 18 Jun 1902 08:00:00 GMT |
|
 |
Steve Lo #2 / 12
|
 How to select the <= date row?
There are a couple solutions you could try, depending on your specific environment. First, if you have bench marked your query and empirically know that it will take a couple days to run the SQL you suggested, have you t ried using parallel query to improve the performance? Parallel query can help in any situation. Second, you may have to consider adding another reference table (this is only one of many possibilities) as follows. Since the fastest access path in ORACLE is by rowid, create a single table that stores the policy id, t he date-time stamp of the transaction, and the rowid of each row for each table. In the event you add another detail table, you can add another column to this reference table. Then, when you need to search for a policy for a given date, just hit the reference table with the policy id and the desired date, get the rowid for each table, and hit the other tables with rowid. This should be doable in one SQL statement. Illustration below. As you are aware, it may take some time to build this table in the first place, but that is a fixed cost. You will also need some AFTER INSERT OR UPDATE OR DELETE triggers on each table to maintain the reference table. create table policy_reference_table ( policy_id number not null ,dts date not null ,dt1_rid rowid not null -- based on rules for each table ,dt2_rid rowid -- Suggest better names be used ,... ) / In the above, dt1_rid is a column containing the rowid of detail table one, dt2_rid is the rowid for detail table 2, etc... I expect you would substitute actual table names or some reasonable abbreviation. I also recomm end foreign key constraints on policy_id back to the master table. Below is the SQL statement. It should be quite fast. You might even consider creating a view for this query. select p.policy_col_1 -- column of interest in your master table ,p.policy_col_2 -- column of interest in your master table ,... -- etc... ,rt1.detail_col_1 -- column of interest in a detail table ,rt2.detail_col_2 -- column of interest in another detail table ,... -- etc... from policy_reference_table prt -- this is the new reference table policy_table p -- this is your existing master table policy_ref_table_1 rt1 -- this is a detail table policy_ref_table_2 rt2 -- this is a detail table ... where prt.policy_id = &policy_id and prt.dts = &given_date and p.policy_id = prt.policy_id and rt1.rowid = prt.dt1_rid and rt2.rowid = prt.dt2_rid and ... Hope this helps. Steve 804-262-6332 ------------------------
Quote: Mahs) writes: >Hi there, I would appreciate it if you could give me some ideas on the >following problem: >I have a couple of tables related to each other and they store time >variant data. The "main table" contains event info. All the general >details of any event that effects the value of a policy are logged here. >Specific info about each event is logged in a set of event-specific >tables. The result of this is that not all the tables get a new row for >every event that is logged in the "main table". >I need to get the complete status of a policy, or set of policies, as it >was at any particular point in time. This means supplying at least a date >at which the policy's value should be retrieved, say 1 Jan 1995. Normaly >one would just join all the tables policy number, but in this case there >may not be row for the specified date in one or more of the related event >specific tables. In this case I need to get the row for the closesed >previous date. The row with the largest date less than or equal to the >specified date, is the one that was still in force at the specified date. >Two things make this a particularly challenging problem: > 1) This query has to be done from a query tool, so we are restricted to a > single SQL statement (unless Business Objects can build a report from > more than one SQL statement). >2) Selecting the max(specified_date) where event_date <= specified_date > works fine, but it will take a couple of days to wade through 40 > million rows. >e.g.: >Main_Event_Tbl Event_A_Tbl Event_B_Tbl >-------------- ----------- ----------- >95/01/01 95/01/01 >95/02/01 95/02/01 >95/03/01 95/03/01
|
Wed, 18 Jun 1902 08:00:00 GMT |
|
 |
Bob Berm #3 / 12
|
 How to select the <= date row?
I believe the problem he is having is not in doing a join across several tables, but in selecting data which may or may not be in a number of detail tables and in select a single date <= an input date. To solve the problem of joining tables in which a join may or may not exist, I would suggest performing an outer join. Usually you do something like: select main.id, subtbl_a.data1, subtbl_b.data2 from main, subtbl_a, subtbl_b where main.id *= subtbl_a.id and main.id *= subtbl_b.id The larger problem is selecting the ids from the main table which are closest to, but not past, the given date. A simple SQL solution to this is to self join the main table to itself and select the max date where the date is less than the cut-off date, grouped by id, as follows: select b.id, max(b2.dt_event) from main b, main b2 where b.id = b2.id and b2.dt_event <= "desired date" group by b.id It is hard to believe that this would run for "a couple of days", but I don't know what your setup is, so maybe it is possible. But if this is OK, then just expand the query to outer join the detail tables as above. If this is unacceptable, then I would suggest running a nightly cron job to append to a small two column table which would contain those ids which are "active" today. This would basically be the preceding query. You might at first have to run this over all the dates in your main table, but that is only done once. Then you could perform a simple equijoin from this small table, indexed on date, to the main table, followed by the outer joins to get all your data back. Finally, can you run stored procedures and just "select" the result set back to your application? This would enable you to have multiple SQL statements, but on the server, not the client.
|>There are a couple solutions you could try, depending on your specific |>environment. First, if you have bench marked your query and |>empirically know that it will take a couple days to run the SQL you |>suggested, have you t |>ried using parallel query to improve the performance? Parallel query |>can help in any situation. |> |>Second, you may have to consider adding another reference table (this |>is only one of many possibilities) as follows. Since the fastest |>access path in ORACLE is by rowid, create a single table that stores |>the policy id, t |>he date-time stamp of the transaction, and the rowid of each row for |>each table. In the event you add another detail table, you can add |>another column to this reference table. Then, when you need to search |>for a policy |>for a given date, just hit the reference table with the policy id and |>the desired date, get the rowid for each table, and hit the other |>tables with rowid. This should be doable in one SQL statement. |>Illustration below. |> As you are aware, it may take some time to build this table in the |>first place, but that is a fixed cost. You will also need some AFTER |>INSERT OR UPDATE OR DELETE triggers on each table to maintain the |>reference table. |> |>create table |> policy_reference_table ( |> policy_id number not null |> ,dts date not null |> ,dt1_rid rowid not null -- based on rules for each table |> ,dt2_rid rowid -- Suggest better names be used |> ,... |> ) |>/ |> |>In the above, dt1_rid is a column containing the rowid of detail table |>one, dt2_rid is the rowid for detail table 2, etc... I expect you |>would substitute actual table names or some reasonable abbreviation. I |>also recomm |>end foreign key constraints on policy_id back to the master table. |>Below is the SQL statement. It should be quite fast. You might even |>consider creating a view for this query. |> |>select |> p.policy_col_1 -- column of interest in your master table |> ,p.policy_col_2 -- column of interest in your master table |> ,... -- etc... |> ,rt1.detail_col_1 -- column of interest in a detail table |> ,rt2.detail_col_2 -- column of interest in another detail table |> ,... -- etc... |>from |> policy_reference_table prt -- this is the new reference table |> policy_table p -- this is your existing master table |> policy_ref_table_1 rt1 -- this is a detail table |> policy_ref_table_2 rt2 -- this is a detail table |> ... |>where |> prt.policy_id = &policy_id |>and prt.dts = &given_date |>and p.policy_id = prt.policy_id |>and rt1.rowid = prt.dt1_rid |>and rt2.rowid = prt.dt2_rid |>and ... |> |> |>Hope this helps. |> |>Steve |>804-262-6332 |> |> |> |>------------------------
Quote: |>Mahs) writes:
|>> |>>Hi there, I would appreciate it if you could give me some ideas on |>the |>>following problem: |>> |>>I have a couple of tables related to each other and they store time |>>variant data. The "main table" contains event info. All the general |>>details of any event that effects the value of a policy are logged |>here. |>>Specific info about each event is logged in a set of event-specific |>>tables. The result of this is that not all the tables get a new row |>for |>>every event that is logged in the "main table". |>> |>>I need to get the complete status of a policy, or set of policies, as |>it |>>was at any particular point in time. This means supplying at least a |>date |>>at which the policy's value should be retrieved, say 1 Jan 1995. |>Normaly |>>one would just join all the tables policy number, but in this case |>there |>>may not be row for the specified date in one or more of the related |>event |>>specific tables. In this case I need to get the row for the closesed |>>previous date. The row with the largest date less than or equal to |>the |>>specified date, is the one that was still in force at the specified |>date. |>> |>>Two things make this a particularly challenging problem: |>> |>> 1) This query has to be done from a query tool, so we are restricted |>to a |>> single SQL statement (unless Business Objects can build a report |>from |>> more than one SQL statement). |>> |>>2) Selecting the max(specified_date) where event_date <= |>specified_date |>> works fine, but it will take a couple of days to wade through 40 |>> million rows. |>> |>>e.g.: |>>Main_Event_Tbl Event_A_Tbl Event_B_Tbl |>>-------------- ----------- ----------- |>>95/01/01 95/01/01 |>>95/02/01 95/02/01 |>>95/03/01 95/03/01 |>> |> |>
|
Wed, 18 Jun 1902 08:00:00 GMT |
|
 |
Reinoud van Leeuw #4 / 12
|
 How to select the <= date row?
Quote: >Two things make this a particularly challenging problem: > 1) This query has to be done from a query tool, so we are restricted to a > single SQL statement (unless Business Objects can build a report from > more than one SQL statement). >2) Selecting the max(specified_date) where event_date <= specified_date > works fine, but it will take a couple of days to wade through 40 > million rows.
could you try something like set rowcount = 50 select max (specified_date) ORDER by date __________________________________________________ Reinoud van Leeuwen
http://www.xs4all.nl/~reinoud __________________________________________________
|
Wed, 18 Jun 1902 08:00:00 GMT |
|
 |
Steve Jorgense #5 / 12
|
 How to select the <= date row?
If the date field is indexed, it seems to me that there is enough information in a " max(specified_date) where event_date <= specified_date" statement to tell the server how to quickly locate the value. Even if it does a scan, it will scan the index because only the indexed field is involved. Quote:
> Hi there, I would appreciate it if you could give me some ideas on the > following problem: > I have a couple of tables related to each other and they store time > variant data. The "main table" contains event info. All the general > details of any event that effects the value of a policy are logged here. > Specific info about each event is logged in a set of event-specific > tables. The result of this is that not all the tables get a new row for > every event that is logged in the "main table". > I need to get the complete status of a policy, or set of policies, as it > was at any particular point in time. This means supplying at least a date > at which the policy's value should be retrieved, say 1 Jan 1995. Normaly > one would just join all the tables policy number, but in this case there > may not be row for the specified date in one or more of the related event > specific tables. In this case I need to get the row for the closesed > previous date. The row with the largest date less than or equal to the > specified date, is the one that was still in force at the specified date. > Two things make this a particularly challenging problem: > 1) This query has to be done from a query tool, so we are restricted to a > single SQL statement (unless Business Objects can build a report from > more than one SQL statement). > 2) Selecting the max(specified_date) where event_date <= specified_date > works fine, but it will take a couple of days to wade through 40 > million rows. > e.g.: > Main_Event_Tbl Event_A_Tbl Event_B_Tbl > -------------- ----------- ----------- > 95/01/01 95/01/01 > 95/02/01 95/02/01 > 95/03/01 95/03/01
|
Wed, 18 Jun 1902 08:00:00 GMT |
|
 |
Mike Sege #6 / 12
|
 How to select the <= date row?
Quote:
> To solve the problem of joining tables in which a join may or may > not exist, I would suggest performing an outer join. Usually you > do something like: > select main.id, subtbl_a.data1, subtbl_b.data2 > from main, subtbl_a, subtbl_b > where main.id *= subtbl_a.id > and main.id *= subtbl_b.id
Since this was cross posted to Informix, try this: SELECT <Your data set here> FROM main A, OUTER subtbl_a SA, OUTER subtbl_b SB WHERE A.event_date <= "<Your Date>" AND A.id = SA.id AND A.id = SB.id ORDER BY A.event_date DESCENDING Quote: > |>>Two things make this a particularly challenging problem: > |>> > |>> 1) This query has to be done from a query tool, so we are restricted > |>to a > |>> single SQL statement (unless Business Objects can build a report > |>from > |>> more than one SQL statement). > |>> > |>>2) Selecting the max(specified_date) where event_date <= > |>specified_date > |>> works fine, but it will take a couple of days to wade through 40 > |>> million rows. > |>>
It sounds like you may not have a good index. If you can use ISQL /DBACCESS, if this is an Informix problem, then you should be ok. If you try the following: SELECT * FROM main WHERE event_date <= "<YOUR DATE>" ORDER BY event_date DESCENDING And it returns quickly, then there might be something wrong with your optimizer. If so, try this query: SELECT <Your data set here> FROM main A, OUTER subtbl_a SA, OUTER subtbl_b SB WHERE A.id = SA.id AND A.id = SB.id AND A.id IN ( SELECT id FROM main WHERE event_date <= "<YOUR DATE>" ) ORDER BY A.event_date DESCENDING The nested query should reduce the rows that are returned. You may also want to check that you are joining the tables correctly. You need to make sure that the IDs are unique, otherwise the result will have bad data with too many rows. Just my $.02. -Mikey PS, If you only wanted the one row, then use this query: SELECT <Your data set here> FROM main A, OUTER subtbl_a SA, OUTER subtbl_b SB WHERE A.id = SA.id AND A.id = SB.id AND A.id IN ( SELECT id FROM main WHERE MAX(event_date) <= "<YOUR DATE>" ) This should return one row, however if this takes a while check your indexes.
|
Wed, 18 Jun 1902 08:00:00 GMT |
|
 |
Steven Lamot #7 / 12
|
 How to select the <= date row?
Quote:
>Hi there, I would appreciate it if you could give me some ideas on the >following problem: >I have a couple of tables related to each other and they store time >variant data. The "main table" contains event info. All the general >details of any event that effects the value of a policy are logged here. >Specific info about each event is logged in a set of event-specific >tables. The result of this is that not all the tables get a new row for >every event that is logged in the "main table". >I need to get the complete status of a policy, or set of policies, as it >was at any particular point in time. This means supplying at least a date >at which the policy's value should be retrieved, say 1 Jan 1995. Normaly >one would just join all the tables policy number, but in this case there >may not be row for the specified date in one or more of the related event >specific tables. In this case I need to get the row for the closesed >previous date. The row with the largest date less than or equal to the >specified date, is the one that was still in force at the specified date. >Two things make this a particularly challenging problem: > 1) This query has to be done from a query tool, so we are restricted to a > single SQL statement (unless Business Objects can build a report from > more than one SQL statement). >2) Selecting the max(specified_date) where event_date <= specified_date > works fine, but it will take a couple of days to wade through 40 > million rows. >e.g.: >Main_Event_Tbl Event_A_Tbl Event_B_Tbl >-------------- ----------- ----------- >95/01/01 95/01/01 >95/02/01 95/02/01 >95/03/01 95/03/01
Would it help to have each policy have a unique identifier (identity column?) and store an EffectiveFrom and EffectiveTo date with each record? I do this for all history-bearing columns. The PK would be the PolicyId/EffectiveTo combo: PolicyId int identity ... Status char(1) EffectiveFrom datetime EffectiveTo datetime To get the record in effect as of a given datetime, use the following terms in your where clause: ...
--- The first time you write a new policy, set the EffectiveFrom = getdate() and EffectiveTo = '12-31-9999'. Updates to a policy first set the current record's EffectiveTo = getdate(), and insert a new record with the same Id, EffectivrFrom = getdate(), EffectiveTo = '12-31-9999'. To delete a policy, set its Status from 'A'ctive to 'I'nactive and write a new record. That way, "as of" some prior date, the record is still active, but after that it's deleted. If you're doing something totally different then forgive my rambling. Steve
|
Wed, 18 Jun 1902 08:00:00 GMT |
|
 |
John H. Frant #8 / 12
|
 How to select the <= date row?
Quote:
> .... > 2) Selecting the max(specified_date) where event_date <= specified_date > works fine, but it will take a couple of days to wade through 40 > million rows. > .... A couple of days?? What are you using, Oracle? :)
Seriously, though, I assume you have an index that can be used in the query, i.e. one with a reverse ordered specified_date. ---------------------------------------------------------------------- John H. Frantz Power-4gl: Extending Informix-4gl
|
Wed, 18 Jun 1902 08:00:00 GMT |
|
 |
Robert Munso #9 / 12
|
 How to select the <= date row?
Quote:
>>Hi there, I would appreciate it if you could give me some ideas on the >>following problem: >>I have a couple of tables related to each other and they store time >>variant data. The "main table" contains event info. All the general >>details of any event that effects the value of a policy are logged here. >>Specific info about each event is logged in a set of event-specific >>tables. The result of this is that not all the tables get a new row for >>every event that is logged in the "main table". >>I need to get the complete status of a policy, or set of policies, as it >>was at any particular point in time. This means supplying at least a date >>at which the policy's value should be retrieved, say 1 Jan 1995.
At ISUG '96, bill Cartwright presented "Using Sybase Triggers to Enforce 'Temporal' Integrity". His technique allowed users to make inserts, updates, deletes, etc. and the triggers would mark the rows with a begin or end time. It is a really slick technique. His presentation should be on the Sybase web home page under events.
Bob Munson
|
Wed, 18 Jun 1902 08:00:00 GMT |
|
 |
Chip Dawe #10 / 12
|
 How to select the <= date row?
The query that starts with Quote: >2) Selecting the max(specified_date) where event_date <= specified_date > works fine, but it will take a couple of days to wade through 40 > million rows.
is a step in the right direction, you just need to tell oracle to use the specific index (on the date) and then stop after 1 row. like... SELECT /* INDEX_DESC(eventA event_date_idx) */ date_col FROM eventA WHERE event_date <= to_date(:t) Check out Chapter 5 of the Application Developer's Guide (Tuning SQL Statements), specificly the part on hints like INDEX_DESC Chip Dawes Motorola Manufacturing Systems
|
Wed, 18 Jun 1902 08:00:00 GMT |
|
 |
MARGARET MARY-THERESA BROWN, SUNY BUFFA #11 / 12
|
 How to select the <= date row?
Quote:
>>Hi there, I would appreciate it if you could give me some ideas on the >>following problem: .....> >>I need to get the complete status of a policy, or set of policies, as it >>was at any particular point in time. This means supplying at least a date >>at which the policy's value should be retrieved, say 1 Jan 1995. Normaly >>one would just join all the tables policy number, but in this case there >>may not be row for the specified date in one or more of the related event >>specific tables. In this case I need to get the row for the closesed >>previous date. The row with the largest date less than or equal to the >>specified date, is the one that was still in force at the specified date. >>Two things make this a particularly challenging problem: >> 1) This query has to be done from a query tool, so we are restricted to a >> single SQL statement (unless Business Objects can build a report from >> more than one SQL statement). >>2) Selecting the max(specified_date) where event_date <= specified_date >> works fine, but it will take a couple of days to wade through 40 >> million rows.
I ran into the same problem. I joined the 2 tables and ran my reports against the 3rd table, at a time when no more data-entry was being done. If you are just doing an occasional query maybe you could just query the 2 table and not join them? I noticed that performance can vary greatly depending on how the indexes are set up. With tables as big as yours, I don't know if this would ever run fast enough but here it is... I ran this in a vms .com file in batch. create table aud_join as select x.rescode, x.startdate, x.enddate, x.pgmcode, x.hospcode, x.pgylevel, h.startdate h_startdate, h.enddate h_enddate, h.hospcode h_hospcode from system.resstat95 x, system.reshosp95 h where (x.startdate, x.rescode) in (select max(startdate), rescode from system.resstat95 y where (y.rescode = h.rescode and y.startdate <= h.startdate) group by y.rescode); - Peggy -
|
Wed, 18 Jun 1902 08:00:00 GMT |
|
 |
Paul J. LaCross #12 / 12
|
 How to select the <= date row?
Quote: > I need to get the complete status of a policy, or set of policies, as it > was at any particular point in time. This means supplying at least a date > at which the policy's value should be retrieved, say 1 Jan 1995. Normaly > one would just join all the tables policy number, but in this case there > may not be row for the specified date in one or more of the related event > specific tables. In this case I need to get the row for the closesed > previous date. The row with the largest date less than or equal to the > specified date, is the one that was still in force at the specified
date. Well, you could include a start and end date/time, index it, and use between for your date in question.
|
Wed, 18 Jun 1902 08:00:00 GMT |
|
|
|