I've got two tables; one is an archive of the other (the tables are identical). I'd like to migrate records from one to the other (as in move: insert into one while deleting from the other).
I know I can start a transaction, do an INSERT INTO...SELECT followed by a DELETE, check the rows affected, then close with a commit (or a rollback if the counts don't match), but it seems as though I might be overthinking it. Is this considered the optimal approach?|||No, not really. An alternate approach might be to insert into the historical table with an insert date and update the "current" table with an active flag. Your OLTP queries will always look for the active flag.
If you use this approach, you're not deleting thousands of rows out of your tables, messing up your statistics, and blowing holes in your indexes during the middle of the day.|||Right. I did write an SP to UNION the two tables (origin and archive) if I needed to include history in my queries, and I don't much like it, so I can see what you mean. Just add an Archived flag to the origin table, then run a process regularly to insert non-archived records into the historical archive. In which case, I'll need to include that Archived flag in my indexes for optimization.
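Roughly, that flag idea would look something like this (a sketch only; the Archived column and index name are hypothetical, while tblTrans and dtTimeStamp are the names used in the code further down the thread):
ALTER TABLE tblTrans ADD Archived bit NOT NULL DEFAULT 0
-- Any index meant to serve active-row lookups would need the flag added, e.g.:
CREATE NONCLUSTERED INDEX IX_tblTrans_Archived ON tblTrans (Archived, dtTimeStamp)
-- and every OLTP query would filter on it:
SELECT * FROM tblTrans WHERE Archived = 0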
If I go this route though, now that I reflect, I won't actually need an archive table (unless I simply want a physical backup, which is irrelevant with our nightly backup procedures). I suppose I could offload historical queries to the archive table, but historical research is rarely done, so the savings to the origin table would be negligible. With that, I'm going to have to re-evaluate my need for a second 'archive' table. Assuming historical research is not a regular activity, can you think of a reason why it would be beneficial to maintain a separate archive table if I implement an archive flag within the origin table?|||You didn't mention volume...but I would say the indicator won't optimize well
Also, I imagine there would be more UPDATE activity on the current table and none on the history table...so they might have different indexing strategies...
MOO
I'd go with 2|||Volume.. right.. my bad. Currently there's a low amount of data (only about 8k records), but this will increase as the application is used (these records represent transactions sent to our payroll system, so you can imagine this growing quite large).
So, Brett, you believe that two tables with an INSERT INTO...SELECT and a DELETE wrapped in a transaction would be optimal? Or is there an alternative SQL query that could accomplish the same job? I do understand the thought about different indexes; it does make sense.
I believe I'm just concerned with making sure all the INSERTs execute and the DELETEs don't even begin until after the INSERTs complete. Although, with a transaction, the issue is moot I suppose, since all of the INSERTs and DELETEs can roll back if an issue is encountered. I realize I could do a 1-to-1 migration, but it would require starting and committing a transaction for every record being archived, and that doesn't seem efficient.|||Well, I guess if you want to track historical views of the data (a common practice), I would use a trigger to move the historical data, so I would not be bound to any one process...
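Something along these lines, as a rough sketch (the tblTransHistory table and dtArchived column are made up for illustration; iTransID and dtTimeStamp are the columns used later in the thread):
CREATE TRIGGER trgTransHistory ON tblTrans
AFTER UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON
    -- deleted holds the before image of every row touched; copy whichever columns you want to keep
    INSERT INTO tblTransHistory (iTransID, dtTimeStamp, dtArchived)
    SELECT d.iTransID, d.dtTimeStamp, GETDATE()
    FROM deleted d
END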
With that said, it sounds like you only want to keep the last change...right?
I would keep them all...
What business need is this meant to support?|||If you are going to delete from the production table, your delete statement should include a join to the archive table on the primary key. That way, you ensure you do not delete any records which have not been moved to the archive table.
delete production
from production
inner join archive on production.pkey = archive.pkey|||Ok.. The requirement is that I keep a record of the transactions sent to payroll for umpteen years. To smooth this out a bit, I was going to archive anything over 12 months old into an archive table, removing the original record from the original table once it had been moved.
I figured that as the original table grows, queries against it will take longer unless I do something. Hence the creation of the archive table. Now, historical research is only done by administrators of the system, so I'm not worried about that; I'm more concerned about the primary transaction table since it's hit pretty frequently. So I figured I'd offload transactions over 12 months old to the historical table to try to keep things clean, but I wanted to determine the safest and most efficient means of migrating those records.
If I have to do it in two queries, that's fine, I just wasn't sure if I was on the right track.
Blindman.. I like that delete, I didn't even think of doing it that way.
BEGIN TRAN T1
INSERT INTO tblTransArchive SELECT * FROM tblTrans WHERE dtTimeStamp < DATEADD(m, -12, GETDATE())
DELETE tblTrans FROM tblTrans T INNER JOIN tblTransArchive TA ON T.iTransID = TA.iTransID
COMMIT TRAN T1
This code works, and I'm fine with it, as long as it's the optimal way to do it.|||Here's the SP I created based around that:
CREATE PROCEDURE [dbo].[spArchiveTrans] AS
DECLARE @InsertCount int
DECLARE @DeleteCount int
DECLARE @Error int
BEGIN TRAN T1
INSERT INTO tblTransArchive SELECT * FROM tblTrans WHERE dtTimeStamp < DATEADD(m, -12, GETDATE())
SELECT @InsertCount = @@ROWCOUNT, @Error = @@ERROR
IF @Error = 0
BEGIN
    IF @InsertCount > 0
    BEGIN
        DELETE tblTrans FROM tblTrans T INNER JOIN tblTransArchive TA ON T.iTransID = TA.iTransID
        SELECT @DeleteCount = @@ROWCOUNT, @Error = @@ERROR
        IF @Error = 0
        BEGIN
            IF @InsertCount = @DeleteCount
                COMMIT TRAN T1
            ELSE
            BEGIN
                ROLLBACK TRAN T1
                RAISERROR('Archive totals did not match. Transactions rolled back.', 16, 1)
            END
        END
        ELSE
        BEGIN
            ROLLBACK TRAN T1
            RAISERROR('Error removing transactions from origin. Transactions rolled back.', 16, 1)
        END
    END
    ELSE
    BEGIN
        --Nothing to do
        ROLLBACK TRAN T1
    END
END
ELSE
BEGIN
    ROLLBACK TRAN T1
    RAISERROR('Error inserting transactions into archive. Transactions rolled back.', 16, 1)
END
GO|||Why don't you run a test with that archive flag idea first? Compare that to the performance issues all this moving of data and testing is going to cause you. You really want to nail this thing the first time around, or you'll be paying for it later.|||Yeah.. I'll give that a shot. My only concern with the archive flag is that the data will always be there, intertwined with the active data, so every request for active data will have to sift through archived records. If I run a cleanup procedure every night, the performance impact is negligible. Alternatively, I'll have to find all the queries that hit the original table and add the flag condition to each one to ensure it pulls the correct records, which I'm not complaining about, just concerned about missing one.
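If the nightly cleanup route wins out, a single-statement variant may also be worth a look. This is just a sketch; it assumes SQL Server 2005 or later, and that tblTransArchive has matching columns and no identity column, triggers, or foreign keys (which the OUTPUT...INTO clause does not allow on its target). Because the delete and the archive insert are one atomic statement, there is no count to reconcile, and a failure leaves both tables untouched:
DELETE tblTrans
OUTPUT deleted.* INTO tblTransArchive
WHERE dtTimeStamp < DATEADD(m, -12, GETDATE())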