
Understanding the Designer tool
Now that we have reviewed all the important server and client components of our new Data Services installation, it is time to get familiar with the most heavily used and most important tool in the Data Services product package. It will be our main focus in the following chapters, and of course, I am talking about our development GUI: the Designer tool.
Every object you create in Designer is stored in a local object library, which is a logical storage unit within the physical local repository database. In this recipe, we will log in to a local repository via Designer, configure a couple of settings, and write our first "Hello World" program.
Getting ready…
Your Data Services ETL development environment is fully deployed and configured, so go ahead and start the Designer application.
How to do it…
First, let's change some default options to make our development life a little bit easier and to see how the options windows in Data Services look:
- When you launch your Designer application, you see quite a sophisticated login screen. Enter the etl username we created in one of the previous recipes and its password to see the list of repositories available in the system.
- At this point, you should see only one local repository, DS4_REPO, that was created by default during the Data Services installation. Double-click on it.
- You should see your Designer application started.
- Go to Tools | Options.
- In the opened window, expand the Designer tree and choose General.
- Set the Number of characters in workspace icon name option to 50 and select the Automatically calculate column mappings checkbox.
- Click on OK to close the options window.
Before we create our first "Hello World" program, let's quickly take a look at Designer's user interface.
In this recipe, you will be required to work with only two areas: Local Object Library and the main development area, which is the biggest window on the right-hand side and opens with the Start Page tab by default.
Local Object Library contains tabs with lists of objects you can create or use during your ETL development. These objects include Projects, Jobs, Work Flows, Data Flows, Transforms, Datastores, Formats, and Custom Functions:

All tabs are empty, as you have not created any objects of any kind yet, except for the Transforms tab. This tab contains a predefined set of transforms available for you to use for ETL development. Data Services does not allow you to create your own transforms (there is an exception that we will discuss in the upcoming chapters). So, everything you see on this tab is basically everything that is available for you to manipulate your data with.
Now, let's create our first "Hello World" program. As ETL development in Data Services is quite different from the usual experience of developing with a programming language, we should agree on what our first program should do. In almost any book about a programming language, this kind of program simply outputs a "Hello World" string onto your screen. In our case, we will generate a "Hello World" string and output it into a table that will be automatically created by Data Services in our target database.
- In the Designer application, go to the Local Object Library window, choose the Jobs tab, right-click on the Batch Jobs tree, and select New from the list of options that appears.
- Choose the name for the new job, Job_HelloWorld, and enter it. After the job is created, double-click on it.
- You will enter the job design window (see Job_HelloWorld – Job at the bottom of the application), and now you can add objects to your job and set up its variables and parameters.
- In the design window of the Job_HelloWorld – Job tab, create a dataflow. To do this, choose the Data Flow object from the right tool panel and left-click on the main design window to create it. Name it DF_HelloWorld.
- Double-click on the newly created dataflow (or just click once on its title) to open the Data Flow design window. It appears as another tab in the main design window area.
- Now that we are designing the processing unit, or dataflow, we can choose transforms from the Transforms tab of the Local Object Library window to perform manipulations on the data. Click on the Transforms tab.
- Here, select the Platform transforms tree and drag and drop the Row_Generation transform from it to the Data Flow design window.
Note
As we are generating a new "Hello World!" string, we should use the Row_Generation transform. It is a very useful way of generating rows in Data Services. All other transforms perform operations on rows extracted from source objects (tables or files) that pass from source to target within a dataflow. In this example, we do not have a source table. Hence, we have to generate a record; a rough SQL analogy of the transform's output follows this note.
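If it helps to think in database terms, the Row_Generation output is roughly comparable to a SELECT that produces a row out of nothing. The following T-SQL is only an analogy; the column name shown is an assumption and not something taken from this recipe:

```sql
-- Rough analogy only: a default Row_Generation transform emits a single row
-- with one integer column starting at 0 (the column name here is assumed).
SELECT 0 AS DI_ROW_ID;
```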
- By default, the Row_Generation transform generates only one row with the ID as 0. Now, we have to create our string and present it as a field in the future target table. For this, we need to use the Query transform. Select it from the right tool panel or drag and drop it from the Platform section of the Transforms tab. The icon of the Query transform looks like this:
- In the Data Flow design window, link Row_Generation to Query, as shown here, and double-click on the Query transform to open the Query Editor tab:
Note
In the next chapter, we will explain the details of the Query transform. In the meantime, let's just say that this is one of the most used transforms in Data Services. It allows you to join flows of your data and modify the dataset by adding/removing columns in the row, changing data types, and performing grouping operations. On the left-hand side of the Query Editor, you will see an incoming set of columns, and on the right-hand side, you will see the output. This is where you will define all your transformation functions for specific fields or assign hard-coded values. We are not interested in the incoming ID generated by the Row_Generation transform. For us, it served the purpose of creating a row that will hold our "Hello World!" value and will be inserted in a table.
- In the right panel of Query Editor, right-click on Query and choose New Output Column…:
- Select the following settings in the opened Column Properties window to define the properties of our newly created column and click on OK:
- Now that our generated row has one column, we have to populate it with a value. For this, we have to use the Mapping tab in Query Editor. Select our output field TEXT and enter the 'Hello World!' value in the Mapping tab window. Do not forget the single quotes, which denote a string in DS. Then, close Query Editor either with the tab cross in the top-right corner (do not confuse it with the Designer application cross that is located dangerously close to it) or just use the Back button (Alt + Left), a green arrow icon in the top instrument panel.
At this point, we have a source in our dataflow. We also have a transformation object (the Query transform), which defines our text column and assigns a value to it. What is missing is a target object that we will insert our row into.
As we will use a table as a target object, we have to create a reference to a database within Data Services. We will use this reference to create a target table. These database references are called datastores and serve as a representation of the database layer. In the next step, we will create a reference to our STAGE database created in the previous chapter.
- Go to the Datastores tab of Local Object Library. Then, right-click on the empty window and select New to open the Create New Datastore window.
- Choose the following settings for the newly created datastore object:
- Repeat steps 12 and 13 to create the rest of the datastore objects connected to the databases we created in the previous recipes. Use the same database server name and user credentials and change only the Datastore Name and Database name fields when creating the new datastores. See the following table for reference:
Now, you should have four datastores created, referencing all the databases created on the SQL Server: DS_STAGE, DS_ODS, DWH, and OLTP.
- Now, we can use the DS_STAGE datastore to create our target table. Go back to the DF_HelloWorld dataflow in the Data Flow tab of the design window and select Template Table from the right tool panel. Put it on the right-hand side of the Query transform and choose HELLO_WORLD as the table name in the DS_STAGE datastore.
- Our final dataflow should look like this now:
- Go back to the Job_HelloWorld – Job tab and click on the Validate All button in the top instrument panel. You should get the following message in the output window of Designer on the left-hand side of your screen: Validate: No Errors Found (BODI-1270017).
- Now, we are ready to execute our first job. For this, use the Execute… (F8) button from the top instrument panel. Agree to save the current objects and click on OK on the following screen.
- Check that the log screen showing the execution steps contains no execution errors. Then, go to your SQL Server Management Studio, open the STAGE database, and check the contents of the newly created HELLO_WORLD table. It has just one column, TEXT, with only one value, "Hello World!". You can also verify the result with a quick query, as sketched below.
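For reference, here is a minimal query you could run in SQL Server Management Studio to confirm the result; the dbo schema is an assumption based on the default SQL Server settings:

```sql
-- Run against the STAGE database after the job completes.
-- [TEXT] is bracketed only to avoid any clash with the TEXT keyword.
SELECT [TEXT]
FROM dbo.HELLO_WORLD;
-- Expected output: one row with the value 'Hello World!'
```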
How it works…
"Hello World!" is a small example that introduces a lot of general and even sophisticated concepts. In the following sections, we will quickly review the most important ones. They will help you get familiar with the development environment in Data Services Designer. Keep in mind that we will return to all these subjects again throughout the book, discussing them in more detail.
Executing ETL code in Data Services
To execute any ETL code developed in the Data Services Designer tool, you have to create a job object. In Data Services, the job is the only executable object. Everything else goes inside the job.
ETL code is organized as a hierarchy of objects inside the job object. To place one object inside another, you have to open the parent object in the main workspace design area and then drag and drop the required object into it. In our recipe, we created a job object and placed the dataflow object in it. We then opened the dataflow object in the workspace area and placed transform objects inside it. As you can see in the following screenshot, workspace areas opened previously can be accessed through the tabs at the bottom of the workspace area:

The Project Area panel can display the hierarchy of objects in the form of a tree. To see it, you have to assign your newly created job to a specific project and open the project in Project Area by double-clicking on the project object in Local Object Library.
Executable ETL code consists of one job object, which can contain script, dataflow, and workflow objects combined in various ways inside the job.
As you saw from the recipe steps, you can create a new job by going to Local Object Library | Jobs.
Although you can combine all types of objects by placing them in the job directly, some objects, for example, transform objects, can be placed only inside dataflow objects, as the dataflow is the only type of object that can process and actually migrate data (on a row-by-row basis). Hence, all transformations should happen only inside a dataflow. In the same way, you can place datastore objects, such as tables and views, directly only in dataflows, as source and target objects for the data to be moved from source to target and transformed along the way. When a dataflow object is executed within the job, it reads data row by row from the source and passes each row from left to right through the transform objects inside the dataflow until the row reaches the end and is sent to the target object, which is usually a database table.
Throughout this book, you will learn the purpose of each object type and how and when it can be used.
For now, remember that all objects inside the job are executed sequentially from left to right if they are connected, and simultaneously if they are not. Another important rule is that the parent object starts executing first, followed by the objects inside it. The parent object completes its execution only after all child objects have completed successfully.
Validating ETL code
To avoid job execution failures due to incorrect ETL syntax, you can validate the job and all its objects with the Validate Current or Validate All button on the top instrument panel inside the Designer tool:

Validate Current validates only the current object opened in the workspace design area and the script objects in it; it does not validate the underlying child objects such as dataflows and workflows. In the preceding example, the object opened in the workspace is a job object that has one child dataflow object, DF_HelloWorld, inside it. Only the job object will be validated, not DF_HelloWorld.
Validate All validates the current object and all underlying objects. So, both the object currently opened in the workspace and all objects you see in the workspace are validated. The same applies to the objects nested inside them, down to the very end of the object hierarchy.
So, to validate the whole job and its objects, you have to go to the job level by opening the job object in the workspace area and clicking on the Validate All button on the top instrument panel.
Validation results are displayed in the Output panel. Warning messages do not affect the execution of the job; they often indicate possible ETL design problems or show data type conversions performed by Data Services automatically. Error messages in the Output | Errors tab mean syntax or critical design errors made in the ETL. If you try to run the job after seeing "red" error validation messages, it will fail at the very beginning of execution with exactly the same errors, as every job is implicitly validated when executed.
Always validate your job manually before executing it to avoid job failures due to incorrect syntax or incorrect ETL design.
Template tables
Template tables are a convenient way to specify a target table that does not yet exist in the database and send data to it. When the dataflow object in which the template target table object is placed is executed, it runs two DDL commands, DROP TABLE <template table name> and CREATE TABLE <template table name>, using the output schema (set of columns) of the last object inside the dataflow before the target template table. Only after that does the dataflow process all the data from the source, passing rows from left to right through all transformations, and finally insert the data into the freshly created target table.
Note
Note that tables are not created on the database level from template tables until the ETL code (dataflow object) is executed within Data Services. Simply placing the template table object inside a dataflow and creating it in a datastore structure is not enough for the actual physical table to be created in the database. You have to run your code.
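As a rough sketch, the statements below approximate what the dataflow issues against the STAGE database for our HELLO_WORLD template table at run time; the exact SQL, the dbo schema, and the varchar(255) column definition are assumptions, not a trace of what Data Services actually generates:

```sql
-- Sketch of the implicit DDL run by the dataflow before loading data:
IF OBJECT_ID('dbo.HELLO_WORLD', 'U') IS NOT NULL
    DROP TABLE dbo.HELLO_WORLD;      -- DROP TABLE <template table name>

CREATE TABLE dbo.HELLO_WORLD (       -- CREATE TABLE <template table name>
    [TEXT] varchar(255) NULL         -- column copied from the Query output schema;
);                                   -- the data type and length are assumptions

-- Only after this does the dataflow insert the generated row.
```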
Template tables are displayed under a separate category in the datastore, apart from normal table objects:

Template tables are extremely useful during ETL development and testing. They free you from having to go down to the database level and change the structure of tables by altering, dropping, or creating them manually whenever the ETL code that inserts data into the table changes. Every time the dataflow runs, it drops and recreates the database table defined through the template table object, using the table structure currently required by your ETL code.
Template table objects are easily converted to normal table objects using the "Import" command on them. This command is available from the object's context menu in the dataflow workspace or in the Datastores tab of Local Object Library.
Query transform basics
Query transform is one of the most important and most often used transform objects in Data Services. Its main purpose is to read data from the object(s) on its left (input schemas) and send data to the output schema (the object to the right of the Query transform). You can join multiple datasets with the help of the Query transform using the syntax rules of the SQL language.
Additionally, you can specify the mapping rules for the output schema columns inside the Query transform by applying various functions to the mapped fields. You can also specify hard-coded values or even create additional output schema columns, as we did in our HelloWorld example.
The example in the next screenshot is not from our HelloWorld example. However, it demonstrates how a row extracted from the source object (input schema) can be augmented with extra columns, or can get its columns renamed or its values transformed by functions applied to the columns:

See how columns from two different tables are combined in a single dataset in the output schema, with columns renamed according to new standards and new columns created with NULL values in them.
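Since the screenshot itself is not reproduced here, a rough T-SQL analogy may help; every table and column name below is hypothetical and only illustrates the kind of SELECT such a Query transform corresponds to:

```sql
-- Hypothetical analogy of a Query transform joining two input schemas,
-- renaming columns to new standards, and adding new columns with NULL values.
SELECT
    c.CUST_ID        AS CUSTOMER_ID,    -- column renamed in the output schema
    c.CUST_NAME      AS CUSTOMER_NAME,  -- column renamed in the output schema
    a.CITY,                             -- column passed through unchanged
    UPPER(a.COUNTRY) AS COUNTRY,        -- value transformed by a function
    NULL             AS LOAD_DATE       -- new output column with a NULL value
FROM dbo.CUSTOMER AS c
JOIN dbo.ADDRESS  AS a
    ON a.CUST_ID = c.CUST_ID;
```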
The HelloWorld example
You have just created the simplest dataflow processing unit and executed it within your first job.
The dataflow object in our example has the Row_Generation transform, which generates rows with only one field. We generated one row with the help of this transform and added an extra field to the row with the help of the Query transform. We then inserted our final row into the HELLO_WORLD table, created automatically by Data Services in the STAGE database.
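Logically, the whole DF_HelloWorld dataflow boils down to a single INSERT ... SELECT; the sketch below assumes the template table has already been created with its single TEXT column and that it lives in the default dbo schema:

```sql
-- Logical equivalent of DF_HelloWorld (sketch only):
-- Row_Generation supplies one row, the Query transform maps the hard-coded
-- string to the TEXT column, and the template table receives the result.
INSERT INTO dbo.HELLO_WORLD ([TEXT])
SELECT 'Hello World!';
```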
You have also configured a couple of Designer properties and created a datastore object that represents the Data Services view of the underlying database level. Not all database objects (tables and views) are visible within your datastore by default. You have to import only those you are going to work with. In our HelloWorld example, we did not import the table into the datastore, as we used a template table. To import a table that exists in the database into your datastore so that it can be used in ETL development, you can perform the following steps:
- Go to Local Object Library | Datastores.
- Expand the datastore object you want to import the table in.
- Double-click on the Tables section to open the list of database tables available for import:
- Right-click on the specific table in the External Metadata list and choose Import from the table context menu.
- The table object will now appear in the Tables section of the chosen datastore. As it has not yet been placed in any dataflow object, the Usage column shows a 0 value:
Creating different datastores for the same database could also be a flexible and convenient way of categorizing your source and target systems.
There is also a concept of configurations: you can create multiple configurations of the same datastore with different parameters and switch between them. This is very useful when you are working in a complex development environment with development, test, and production databases. However, this is a topic for the upcoming chapters.