Wednesday, March 10, 2010

Understanding the difference between control flow and data flow in SSIS

You might wonder what the difference is between the control flow designer and the data flow designer. At a glance they look similar, so why are they split into two designer windows? If you suspect there is a definite reason behind this, you are right. Understanding the nature of each will help you design scalable packages. Below is the architectural background that differentiates them.

From an architectural perspective, the smallest unit in the control flow designer is the task, and the smallest unit in the data flow designer is the component. A task must finish (with success, failure or completion) before the next task connected to it can start. In the data flow designer, a component does not wait for other components to finish. All of them work together, processing and managing data in a streaming way.
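To make the contrast concrete, here is a minimal Python sketch of the two behaviours. It is only an analogy, not SSIS code: the function names (`extract_file`, `load_table`, `multiply_by_ten` and so on) are made up for illustration, and a real data flow moves rows through the pipeline in buffers rather than one row at a time as the generators do here.

```python
from typing import Callable, Iterable, Iterator, List

Log = List[str]

# Control flow analogy: each task is a unit of work that runs to completion.
def extract_file(log: Log) -> None:
    log.append("extract_file finished")

def load_table(log: Log) -> None:
    log.append("load_table finished")

def run_tasks_in_order(tasks: List[Callable[[Log], None]], log: Log) -> None:
    for task in tasks:
        task(log)  # returns only when this task has completed

# Data flow analogy: components are generators, so rows stream through them.
def source(values: Iterable[int], log: Log) -> Iterator[int]:
    for value in values:
        log.append(f"source read {value}")
        yield value

def multiply_by_ten(rows: Iterable[int], log: Log) -> Iterator[int]:
    for row in rows:
        log.append(f"transform saw {row}")
        yield row * 10

def destination(rows: Iterable[int], log: Log) -> List[int]:
    written = []
    for row in rows:
        log.append(f"destination wrote {row}")
        written.append(row)
    return written

control_log: Log = []
run_tasks_in_order([extract_file, load_table], control_log)

data_log: Log = []
written = destination(
    multiply_by_ten(source([1, 2], data_log), data_log), data_log
)

print(control_log)
print(data_log)
```

In the control flow log, the second task appears only after the first has finished. In the data flow log, the entries for source, transform and destination are interleaved row by row, because the destination starts writing while the source is still reading.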

In the control flow designer, a task functions as an isolated unit of work. The concepts behind tasks are workflow orchestration, process orientation, serial or parallel execution, and synchronous processing. Components in the data flow designer follow a different concept. The data flow designer is information oriented: it is about data correlation and transformation, coordinated processing, and streaming, and it must have sources and destinations. Components can branch, split and merge data, and they provide parallel processing. All transformations work together from source to destination while data is still flowing in from the source. All components run at the same time in a coordinated, streaming fashion.
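The splitting and merging idea can be sketched in a few lines of Python. This is an illustrative analogy only, assuming a hypothetical order feed with an `amount` column and an arbitrary threshold of 100. Rows are routed to one of two branches by a condition, each branch applies its own transformation, and both branches feed a single merged output without waiting for the source to finish.

```python
from typing import Any, Dict, Iterable, Iterator, List

Row = Dict[str, Any]

def source(rows: Iterable[Row]) -> Iterator[Row]:
    # Emits rows one at a time, like a source component.
    yield from rows

def tag_large(row: Row) -> Row:
    # Branch for rows that meet the condition.
    return {**row, "tier": "large"}

def tag_small(row: Row) -> Row:
    # Branch for all remaining rows.
    return {**row, "tier": "small"}

def split_and_merge(rows: Iterable[Row], threshold: float) -> Iterator[Row]:
    # Routes each row to a branch, then yields it into one merged stream.
    for row in rows:
        if row["amount"] >= threshold:
            yield tag_large(row)
        else:
            yield tag_small(row)

def destination(rows: Iterable[Row]) -> List[Row]:
    # Collects the merged stream, like a destination component.
    return list(rows)

orders = [
    {"id": 1, "amount": 250.0},
    {"id": 2, "amount": 40.0},
    {"id": 3, "amount": 100.0},
]
result = destination(split_and_merge(source(orders), threshold=100.0))
tiers = [row["tier"] for row in result]
print(tiers)
```

Each row passes through source, branch and destination before the next row is read, which mirrors how transformations keep working while data is still arriving.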

A package can contain more than one data flow task. The total package execution time is measured from the moment the first task starts until the last task finishes. This may seem obvious, but it matters because some tasks can run in parallel. Maximizing parallel processing is recommended whenever possible to reduce overall package execution time.
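A small calculation shows why parallel tasks shorten the package. The sketch below is a simplified model, not something SSIS exposes. It assumes made-up task names and durations, assumes a task starts as soon as every task it depends on has finished, and assumes the machine can run any number of tasks at once.

```python
from typing import Dict, List

def package_duration(
    durations: Dict[str, float], depends_on: Dict[str, List[str]]
) -> float:
    """Time from the start of the first task to the end of the last one."""
    finish: Dict[str, float] = {}

    def finish_time(task: str) -> float:
        if task not in finish:
            # A task starts when all of its predecessors have finished.
            start = max(
                (finish_time(dep) for dep in depends_on.get(task, [])),
                default=0.0,
            )
            finish[task] = start + durations[task]
        return finish[task]

    return max(finish_time(task) for task in durations)

# Hypothetical durations in minutes.
durations = {"load_customers": 4.0, "load_orders": 6.0, "build_report": 3.0}

# Layout 1: every task waits for the previous one.
serial = {"load_orders": ["load_customers"], "build_report": ["load_orders"]}

# Layout 2: the two loads run in parallel, the report waits for both.
parallel = {"build_report": ["load_customers", "load_orders"]}

serial_minutes = package_duration(durations, serial)
parallel_minutes = package_duration(durations, parallel)
print(serial_minutes, parallel_minutes)
```

With these numbers the serial layout takes 4 + 6 + 3 = 13 minutes, while the parallel layout takes max(4, 6) + 3 = 9 minutes, because the shorter load is hidden behind the longer one.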
