Friday, March 30, 2012

Moving whole rows in the script component

I am new to script components. I would like to make a simple filter that either passes a row through untouched or eliminates it. I have my input and output buffers set the same, and I have it set as asynchronous. Now these are big rows. Is there a painless way to copy all columns from the input to output buffer, or do I have to do a "Output0Buffer.Col1 = Row.Col1" for each column?

No painless way that I am aware off.

A question: why are you doing this in a script component? This seems more suited to a conditional split. Just send the rows to be eliminated to an output that isn't connected to another component, and they are gone.

If you have to use a script component (perhaps because of extremely complex logic), you can still use synchronous outputs (which means all your columns will carry over). Just create two outputs (both synchronous) and send rows you want to one, rows to discard to another. See Jamie's post here for some info on this: http://blogs.conchango.com/jamiethomson/archive/2005/09/05/SSIS-Nugget_3A00_-Multiple-outputs-from-a-synchronous-script-transform.aspx

|||I did simplify some here. I want to sort removing duplicates of an ID column, but of the dupes, I want it to select the one with the earliest value in a date column. I tried sorting by the date, then by the ID (removing duplicates), but the date sorting was lost, of course. So the conditional split doesn't work because the decision is dependent on more than one row.

I see a way to do this with SQL, making a few views, but it ended up being overly complicated. I also have it working synchronously as you suggested. That was an easy solution! That's the one I'll keep.

I guess I liked the idea of it working asynchronous more at first, but I didn't have a good reason why! I suppose it's faster to code and faster to process doing it synchronously.

Thanks for the quick response.
|||For future reference, here's all I had to do:

Sorted the dataflow coming in on the ID that I wanted to be unique (DocumentUniqueID).
Connected up a new Script Component.
Checked off all the Input Columns (and left them as ReadOnly).
Added an Output.
Set its SynchronousInputID to "Input 0".
Set its ExclusionGroup to 1.
Added this code to the script:

Code Snippet

Public Class ScriptMain
Inherits UserComponent

Private lastUniqueID As String

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
If Row.DocumentUniqueID = lastUniqueID Then
Row.DirectRowToOutput1()
End If

lastUniqueID = Row.DocumentUniqueID
End Sub

End Class


|||

If the source data is in Oracle, SQL Server 2005 or any other DB that has rank() function; you could solve the problem with plain SQL. You can generate an extra column called, let's say, MyRank that will sort each set of duplicate IDs using the given criteria; in you case date. Then you use the conditional split based on MyRank column. I talked about it in this post:

http://rafael-salas.blogspot.com/2007/04/remove-duplicates-using-t-sql-rank.html

|||

Rafael Salas wrote:

If the source data is in Oracle, SQL Server 2005 or any other DB that has rank() function; you could solve the problem with plain SQL. You can generate an extra column called, let's say, MyRank that will sort each set of duplicate IDs using the given criteria; in you case date. Then you use the conditional split based on MyRank column. I talked about it in this post:

http://rafael-salas.blogspot.com/2007/04/remove-duplicates-using-t-sql-rank.html

That's an even better solution. RANK() isn't a function I've used extensively, I overlooked it. Thanks for the tip.
sql

No comments:

Post a Comment