SSIS Aggregate Transformation
Like the aggregate functions in SQL SELECT statement, the Aggregate transformation is used to apply the aggregate functions to the input columns and put the result in the output columns. The GROUP By clause is provided to specify groups to aggregate across.
The operations supported in the transformation are listed below.
Operation | Description | Transact-SQL Keywords |
---|---|---|
Group by | Divides datasets into groups | GROUP BY |
Count | The number of rows in a group | COUNT |
Count distinct | The number of unique non-null values in a group | N/A |
Sum | The total of the values in a column | SUM |
Average | The average of the values in a column | AVG |
Minimum | The minimum value in a group | MIN |
Maximum | The maximum value in a group | MAX |
In the following example, I'm going to use the Aggregate transformation to get the number of books and its total cost by the year of the bought date. We'll use the the same data source as that in the previous section.
- Copy & past to a copy of the package Multicast.dtsx and rename the copy to Aggregate.dtsx.
- Open the package and click Data Flow tab, then remove the output from the Multicast to the Row Count transform.
- Drag and drop Derived Column and Aggregate transformation to the editor area and link them together as shown below.
- Right-click Derived Column transform and choose "Edit..." to open its editor. Add a new column Year and configure its expression like below then click OK button.
- Right-click the Aggregate transform and choose "Edit..." to open the Aggregate Transformation Editor. Then choose Year, (*) and Price and make their settings.
- Click OK button. Then add Data Viewer before and after the Aggregate transform.
- Run the package.
We'll group by the new column "Year" in the next step.
We chose "Group by" operation for Year, "Count all" for all the records with its output alias changed to "Number of Books", and "Sum" for Price with Output Alias changed to "Total Cost". CountDistinctScale and CountDistinctKeys are properties for Distinct count operation and their settings will improve running performance.
The left side is the Input data of the Aggregate transform and the right side is the output data. As you can see the Aggregate transformation generates aggregation data based on its input data.