CloverETL examples

To download the example package, go to the download page.

Complex examples

These examples combine several components to perform useful transformations similar to those you could use in real life.

Join example

This graph combines information about orders and about the individual items/products purchased within each order. It also adds information about which customer ordered the goods. To get all this information, it joins data from ORDERS.DBF (a dBase table), ODETAILS.DBF and Customers.txt. The first two data sets are joined using the MERGE_JOIN component, so they have to be sorted first. Then information about the customer is added using HASH_JOIN. As both joins require transformation code, two transformation classes are embedded directly in the graph file and dynamically compiled at run time. Because the orders data set may reference a customer who is not in the Customers.txt file, the second join is defined as a left join. At the end, EXT_FILTER is used to split the data into two sets: one with complete customer info and the other with the customer info missing.
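The data flow described above (sort, merge join, hash left join, two-way split) can be sketched in plain Java. This is only an illustration of the join techniques under stated assumptions, not CloverETL code: the record types `Order`, `Detail`, `Customer` and `Row`, their fields and the sample values are hypothetical, since the real layouts of ORDERS.DBF, ODETAILS.DBF and Customers.txt are not given here. It needs Java 16 or later (records).

```java
import java.util.*;

// Hypothetical record shapes; the real field layouts of ORDERS.DBF, ODETAILS.DBF
// and Customers.txt are not given on this page.
record Order(int orderId, int customerId) {}
record Detail(int orderId, String product, int quantity) {}
record Customer(int customerId, String name) {}
record Row(int orderId, String product, int quantity, int customerId, String customerName) {}

class OrderJoinSketch {

    // Sorted merge join on orderId; both lists must already be sorted by orderId.
    // Assumes each orderId appears at most once in orders (one order, many details).
    static List<Row> mergeJoin(List<Order> orders, List<Detail> details) {
        List<Row> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < orders.size() && j < details.size()) {
            Order o = orders.get(i);
            Detail d = details.get(j);
            if (o.orderId() < d.orderId()) {
                i++;                 // order without details: dropped (inner join)
            } else if (o.orderId() > d.orderId()) {
                j++;                 // detail without an order: dropped
            } else {
                out.add(new Row(o.orderId(), d.product(), d.quantity(), o.customerId(), null));
                j++;                 // stay on this order: the next detail may belong to it too
            }
        }
        return out;
    }

    // Hash left join: index customers in memory, then look up every row.
    // Rows whose customer is unknown are kept with a null customerName.
    static List<Row> hashLeftJoin(List<Row> rows, List<Customer> customers) {
        Map<Integer, String> nameById = new HashMap<>();
        for (Customer c : customers) nameById.put(c.customerId(), c.name());
        List<Row> out = new ArrayList<>();
        for (Row r : rows) {
            out.add(new Row(r.orderId(), r.product(), r.quantity(), r.customerId(),
                    nameById.get(r.customerId())));
        }
        return out;
    }

    // Two-way split: true -> customer info complete, false -> customer info missing.
    static Map<Boolean, List<Row>> split(List<Row> rows) {
        Map<Boolean, List<Row>> parts = new HashMap<>();
        parts.put(true, new ArrayList<>());
        parts.put(false, new ArrayList<>());
        for (Row r : rows) parts.get(r.customerName() != null).add(r);
        return parts;
    }

    public static void main(String[] args) {
        List<Order> orders = new ArrayList<>(List.of(new Order(2, 20), new Order(1, 10)));
        List<Detail> details = new ArrayList<>(List.of(
                new Detail(1, "pen", 3), new Detail(2, "ink", 1), new Detail(1, "pad", 2)));
        // A merge join needs sorted inputs, so sort both sides by the join key first.
        orders.sort(Comparator.comparingInt(Order::orderId));
        details.sort(Comparator.comparingInt(Detail::orderId));
        List<Row> joined = hashLeftJoin(mergeJoin(orders, details), List.of(new Customer(10, "Alice")));
        Map<Boolean, List<Row>> parts = split(joined);
        System.out.println("complete: " + parts.get(true).size() + ", missing: " + parts.get(false).size());
    }
}
```

With the made-up sample data, `main` prints `complete: 2, missing: 1`: customer 20 is not in the customer list, so the left join keeps that order's row but leaves its customer name empty. The merge join only works because both inputs are sorted first, while the hash join needs no sorting but keeps the smaller input (customers) in memory.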

Note: To run successfully, the "janino.jar" or "tools.jar" library (present in $JAVA_HOME/lib/) has to be part of the CLASSPATH.


We assume that the examples are unzipped to the CloverETL home directory. When executing this example from the command line, you should be in the examples/SimpleExamples subdirectory (e.g. /home/user/cloverETL/examples/SimpleExamples).

(Older examples are not divided into four subdirectories, and you should run them from within their examples directory. In that case, replace each ../../ in the commands with ../.)

Remember that (on Windows) you should use semicolons instead of colons in the following commands:

Executing:
java -cp "../../lib/cloveretl.engine.jar:../../lib/commons-logging.jar:../../lib/log4j-1.2.12.jar:../../lib/javolution.jar:trans:../../lib/janino.jar" org.jetel.main.runGraph -plugins ../../plugins graph/graphDBFJoin.grf

The graph graphDBFJoinTL.grf is very similar to the preceding one, but its transformation is written in the CloverETL transformation language (TL).

Executing:
java -cp "../../lib/cloveretl.engine.jar:../../lib/commons-logging.jar:../../lib/log4j-1.2.12.jar:../../lib/javolution.jar:trans" org.jetel.main.runGraph -plugins ../../plugins graph/graphDBFJoinTL.grf
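The colon/semicolon rule for these commands is the platform's classpath separator, which Java exposes as `File.pathSeparator`. As a small sketch, the classpath of the first command can be assembled so it comes out right on either platform. The helper class `ClasspathSketch` is hypothetical and not part of CloverETL; the jar list is copied from the command above.

```java
import java.io.File;
import java.util.List;

class ClasspathSketch {
    // Entries copied from the graphDBFJoin.grf command; "trans" holds the transformation classes.
    static final List<String> ENTRIES = List.of(
            "../../lib/cloveretl.engine.jar",
            "../../lib/commons-logging.jar",
            "../../lib/log4j-1.2.12.jar",
            "../../lib/javolution.jar",
            "trans",
            "../../lib/janino.jar");

    // Join classpath entries with the given separator.
    static String classpath(List<String> entries, String separator) {
        return String.join(separator, entries);
    }

    public static void main(String[] args) {
        // File.pathSeparator is ":" on Unix-like systems and ";" on Windows.
        String cp = classpath(ENTRIES, File.pathSeparator);
        System.out.println("java -cp \"" + cp + "\" org.jetel.main.runGraph"
                + " -plugins ../../plugins graph/graphDBFJoin.grf");
    }
}
```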

Intersection example

This graph reads personal data from two sources and finds records with the same values in corresponding fields (lname matched with last_name, fname with first_name). Matching pairs are transformed into one output record and saved in the intersection_customer_employee.txt file.
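The intersection can be sketched as an equality join on the (last name, first name) pair: index one source by that pair, probe it with the other, and merge each matching pair into one record. Only the name fields come from the description above; the record types, the extra `city` and `empNo` fields and the sample names are hypothetical, and this is not how the graph itself is implemented. Requires Java 16 or later.

```java
import java.util.*;

// Hypothetical row shapes: only the name fields come from the description above
// (lname/fname in one source, last_name/first_name in the other).
record CustomerRow(String lname, String fname, String city) {}
record EmployeeRow(String lastName, String firstName, int empNo) {}
record MatchRow(String lastName, String firstName, String city, int empNo) {}

class IntersectionSketch {

    // Index one source by (last name, first name), then probe with the other.
    // Every matching pair is merged into a single output record.
    static List<MatchRow> intersect(List<CustomerRow> customers, List<EmployeeRow> employees) {
        Map<List<String>, List<EmployeeRow>> byName = new HashMap<>();
        for (EmployeeRow e : employees) {
            byName.computeIfAbsent(List.of(e.lastName(), e.firstName()), k -> new ArrayList<>()).add(e);
        }
        List<MatchRow> out = new ArrayList<>();
        for (CustomerRow c : customers) {
            for (EmployeeRow e : byName.getOrDefault(List.of(c.lname(), c.fname()), List.of())) {
                out.add(new MatchRow(c.lname(), c.fname(), c.city(), e.empNo()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<CustomerRow> customers = List.of(
                new CustomerRow("Novak", "Jan", "Brno"), new CustomerRow("Smith", "Ann", "Austin"));
        List<EmployeeRow> employees = List.of(
                new EmployeeRow("Smith", "Ann", 7), new EmployeeRow("Brown", "Tom", 8));
        intersect(customers, employees).forEach(System.out::println);
    }
}
```

Using `List.of(last, first)` as the map key relies on `List` value equality, which avoids the collisions a concatenated string key could produce.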


We assume that the examples are unzipped to the CloverETL home directory. When executing this example from the command line, you should be in the examples/SimpleExamples subdirectory (e.g. /home/user/cloverETL/examples/SimpleExamples).

(Older examples are not divided into four subdirectories, and you should run them from within their examples directory. In that case, replace each ../../ in the commands with ../.)

Remember that (on Windows) you should use semicolons instead of colons in the following command:

Executing:
java -cp "../../lib/cloveretl.engine.jar:../../lib/commons-logging.jar:../../lib/log4j-1.2.12.jar:../../lib/javolution.jar:trans:../../lib/janino.jar" org.jetel.main.runGraph -plugins ../../plugins graph/graphIntersectData.grf

Approximative join example

The Approximative Join component is used for joining data that are similar on given fields. It requires the data to be prepared: it joins records from two data flows that have the same value of the matching key and a similar value of the join key. In this example, data are read from the file customers0.dat and from the database table employee. For both flows, a matching key is generated, consisting of 4 letters of the last name (flat file: lname, database: last_name) and 3 letters of the first name (flat file: fname, database: first_name). The Approximative Join component then joins the data from these flows (sorted by the matching key). It compares only records with the same matching key, but sends to the conforming output only those whose join key is similar enough (the conformity attribute). Among the conforming records you can find, for example, this one: 4 Damstra Robert Damstra Roberta 0.875, with conformity 0.875 (a conformity of 1 means that the records are identical).
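The two-stage idea (exact match on a short matching key, then a similarity threshold on the full names) can be sketched as follows. This is a stand-in under explicit assumptions, not the component's algorithm: the key takes the leading 4 + 3 letters upper-cased, records are grouped with a hash map instead of being sorted, and conformity is approximated by a normalized Levenshtein similarity. The `Name` record, the helper names and the threshold value are hypothetical. Requires Java 16 or later.

```java
import java.util.*;

// Hypothetical name pair used on both sides of the join.
record Name(String last, String first) {}

class ApproxJoinSketch {

    // Matching key: 4 letters of the last name + 3 letters of the first name.
    // Assumption: leading letters, upper-cased; shorter names are used whole.
    static String matchingKey(Name n) {
        return prefix(n.last(), 4) + prefix(n.first(), 3);
    }

    private static String prefix(String s, int len) {
        return s.substring(0, Math.min(len, s.length())).toUpperCase(Locale.ROOT);
    }

    // Stand-in conformity measure, NOT the component's own formula:
    // 1 - (Levenshtein distance / length of the longer string); identical strings give 1.0.
    static double similarity(String a, String b) {
        int longer = Math.max(a.length(), b.length());
        return longer == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / longer;
    }

    // Classic two-row dynamic-programming edit distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    // Only pairs with the same matching key are compared; a pair is "conforming"
    // when the similarity of the full names reaches the threshold.
    static List<String> conforming(List<Name> left, List<Name> right, double threshold) {
        Map<String, List<Name>> rightByKey = new HashMap<>();
        for (Name r : right) rightByKey.computeIfAbsent(matchingKey(r), k -> new ArrayList<>()).add(r);
        List<String> out = new ArrayList<>();
        for (Name l : left) {
            for (Name r : rightByKey.getOrDefault(matchingKey(l), List.of())) {
                double conformity = similarity(l.last() + " " + l.first(), r.last() + " " + r.first());
                if (conformity >= threshold) {
                    out.add(String.format(Locale.ROOT, "%s %s %s %s %.3f",
                            l.last(), l.first(), r.last(), r.first(), conformity));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Name> file = List.of(new Name("Damstra", "Robert"));
        List<Name> table = List.of(new Name("Damstra", "Roberta"), new Name("Dampier", "Rob"));
        conforming(file, table, 0.8).forEach(System.out::println);
    }
}
```

"Dampier Rob" gets a different matching key (DAMPROB vs DAMSROB), so it is never compared. For the Damstra pair this stand-in measure gives about 0.933 rather than the 0.875 quoted above, which is a reminder that it is not the component's own conformity formula.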

This graph also illustrates the use of the CustomizedRecordTransform class. This class extends the abstract class DataRecordTransform and makes it relatively easy to create complex transformations.


We assume that the examples are unzipped to the CloverETL home directory. When executing this example from the command line, you should be in the examples/ExtExamples subdirectory (e.g. /home/user/cloverETL/examples/ExtExamples).

(Older examples are not divided into four subdirectories, and you should run them from within their examples directory. In that case, replace each ../../ in the commands with ../.)

Remember that (on Windows) you should use semicolons instead of colons in the following command:

Executing:
Prepare transform class:
javac -cp "../../lib/cloveretl.engine.jar:../../lib/commons-logging.jar" trans/customizedTransformExample.java

Execute graph:
java -cp "../../lib/cloveretl.engine.jar:../../lib/commons-logging.jar:../../lib/log4j-1.2.12.jar:../../lib/javolution.jar:trans:../../lib/janino.jar" org.jetel.main.runGraph -plugins ../../plugins graph/graphAproximativeJoin.grf

Real-life example

This is a practical illustration of CloverETL usage. This graph transforms clients' data for a bank branch. The inputs are clients, monthly interest, service charges and half-year bonuses. The outputs are revenues aggregated by client, clients without revenues, and clients from the database who are not in the input file.
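The three outputs boil down to a group-by sum and two set differences. The sketch below shows that logic in plain Java under stated assumptions: the `Revenue` record, the client IDs and the amounts are hypothetical, and it assumes each interest, charge or bonus item is already signed the way the branch counts it as revenue. Requires Java 16 or later.

```java
import java.util.*;

// Hypothetical revenue item: one row per monthly interest, service charge or half-year bonus.
// Assumption: amounts are already signed the way the branch counts them as revenue.
record Revenue(String clientId, double amount) {}

class RevenueSketch {

    // Aggregate all revenue items per client.
    static Map<String, Double> totalsByClient(List<Revenue> items) {
        Map<String, Double> totals = new TreeMap<>();
        for (Revenue r : items) totals.merge(r.clientId(), r.amount(), Double::sum);
        return totals;
    }

    // Clients from the input file without any revenue item.
    static Set<String> clientsWithoutRevenue(Set<String> fileClients, Map<String, Double> totals) {
        Set<String> result = new TreeSet<>(fileClients);
        result.removeAll(totals.keySet());
        return result;
    }

    // Clients in the database that are missing from the input file.
    static Set<String> dbClientsNotInFile(Set<String> dbClients, Set<String> fileClients) {
        Set<String> result = new TreeSet<>(dbClients);
        result.removeAll(fileClients);
        return result;
    }

    public static void main(String[] args) {
        Set<String> fileClients = Set.of("C1", "C2", "C3");
        Set<String> dbClients = Set.of("C1", "C2", "C4");
        Map<String, Double> totals = totalsByClient(List.of(
                new Revenue("C1", 12.5), new Revenue("C1", 3.0), new Revenue("C2", 7.0)));
        System.out.println("revenues: " + totals);
        System.out.println("without revenues: " + clientsWithoutRevenue(fileClients, totals));
        System.out.println("in database only: " + dbClientsNotInFile(dbClients, fileClients));
    }
}
```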


We assume that the examples are unzipped to the CloverETL home directory. When executing this example from the command line, you should be in the examples/AdvancedExamples subdirectory (e.g. /home/user/cloverETL/examples/AdvancedExamples).

(Older examples are not divided into four subdirectories, and you should run them from within their examples directory. In that case, replace each ../../ in the commands with ../.)

Remember that (on Windows) you should use semicolons instead of colons in the following command:

Executing:
java -cp "../../lib/cloveretl.engine.jar:../../lib/commons-logging.jar:../../lib/log4j-1.2.12.jar:../../lib/javolution.jar:trans:../../lib/janino.jar" org.jetel.main.runGraph -plugins ../../plugins graph/graphRevenues.grf

Slowly Changing Dimension (SCD) example

In a Type 2 Slowly Changing Dimension, a new record is added to the table to represent the new information, so both the original and the new record are present. The new record gets its own synthetic primary key.

In our example, we originally have the following dimension of exchange rates:

exchange_right_id|Country|Currency|Amount|Code|Valid_to|Valid_from
2731|Australia|dollar|1|AUD||07-18-2007

When new exchange rates arrive, we add a new record to the dimension and update the "Valid_to" field of the old record. The matching key is composed of these fields: Country, Currency, Amount, Code. For better understanding, our example uses four files:

  • exchange_rates_DIM.txt - the dimension file (serves as the source and will be updated with the graph result)
  • exchange_rates_DIM_insert.txt - new records for the dimension. These records should be inserted into exchange_rates_DIM.txt (this functionality is not provided by our graph)
  • exchange_rates_DIM_update.txt - these records should be updated in the existing dimension, exchange_rates_DIM.txt (this functionality is not provided by our graph)
  • exchange_rates_matched.txt - records that match existing records in the dimension
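The Type 2 logic described above (compare incoming rows with the current version by matching key, close the old version by filling Valid_to, add a new version with a new synthetic key) can be sketched in plain Java. This is an illustration under assumptions, not the graph's implementation. The `rate` field is hypothetical because the sample header shows no rate column. How rows are sorted into the insert, update and matched buckets is only a guess at what the three result files hold. The sample rates are made up. Requires Java 16 or later.

```java
import java.util.*;

// Hypothetical row shape following the sample header above, plus an assumed "rate"
// column (the sample does not show one) used to detect a changed exchange rate.
record RateRow(int id, String country, String currency, int amount, String code,
               double rate, String validTo, String validFrom) {
    // Matching key from the description: Country, Currency, Amount, Code.
    List<Object> matchingKey() {
        return List.of(country, currency, amount, code);
    }
}

class ScdType2Sketch {

    // Buckets loosely mirroring the _insert, _update and _matched files (assumed semantics).
    static final class Result {
        final List<RateRow> inserts = new ArrayList<>();
        final List<RateRow> updates = new ArrayList<>();
        final List<RateRow> matched = new ArrayList<>();
    }

    static Result apply(List<RateRow> dimension, List<RateRow> incoming, String changeDate) {
        // Current version of each key = the row whose Valid_to is still empty.
        Map<List<Object>, RateRow> current = new HashMap<>();
        int nextId = 0;
        for (RateRow d : dimension) {
            if (d.validTo().isEmpty()) current.put(d.matchingKey(), d);
            nextId = Math.max(nextId, d.id() + 1);
        }
        Result res = new Result();
        for (RateRow in : incoming) {
            RateRow old = current.get(in.matchingKey());
            if (old != null && Double.compare(old.rate(), in.rate()) == 0) {
                res.matched.add(old);                      // unchanged: nothing to do
                continue;
            }
            if (old != null) {
                // Close the old version instead of overwriting it, so history is kept.
                res.updates.add(new RateRow(old.id(), old.country(), old.currency(), old.amount(),
                        old.code(), old.rate(), changeDate, old.validFrom()));
            }
            // The new version (or a brand-new key) gets its own synthetic primary key.
            RateRow added = new RateRow(nextId++, in.country(), in.currency(), in.amount(),
                    in.code(), in.rate(), "", changeDate);
            res.inserts.add(added);
            current.put(added.matchingKey(), added);
        }
        return res;
    }

    public static void main(String[] args) {
        List<RateRow> dim = List.of(
                new RateRow(2731, "Australia", "dollar", 1, "AUD", 20.0, "", "07-18-2007"));
        List<RateRow> arrived = List.of(
                new RateRow(0, "Australia", "dollar", 1, "AUD", 20.5, "", ""));
        Result r = apply(dim, arrived, "07-19-2007");
        System.out.println("insert: " + r.inserts);
        System.out.println("update: " + r.updates);
        System.out.println("matched: " + r.matched);
    }
}
```

Both versions of the AUD row survive: the old one with Valid_to filled in and the new one with key 2732. This is exactly the growth in table size listed under the disadvantages.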

Advantages:

  • This allows us to accurately keep all historical information.

Disadvantages:

  • This will cause the size of the table to grow. In cases where the number of rows for the table is very high to start with, storage and performance can become a concern.
  • This necessarily complicates the ETL process.

When to use Type 2:

A Type 2 slowly changing dimension should be used when the data warehouse needs to track historical changes.


We assume that the examples are unzipped to the CloverETL home directory. When executing this example from the command line, you should be in the examples/AdvancedExamples subdirectory (e.g. /home/user/cloverETL/examples/AdvancedExamples).

(Older examples are not divided into four subdirectories, and you should run them from within their examples directory. In that case, replace each ../../ in the commands with ../.)

Remember that (on Windows) you should use semicolons instead of colons in the following command:

Executing:
java -cp "../../lib/cloveretl.engine.jar:../../lib/commons-logging.jar:../../lib/log4j-1.2.12.jar:../../lib/javolution.jar:trans:../../lib/janino.jar" org.jetel.main.runGraph -plugins ../../plugins graph/SCDType2_example1.grf


SOAP example

The purpose of this graph is to get information about maximum air temperatures provided by the National Digital Forecast Database via the SOAP protocol. The XMLXPathReader component sends a request specified by a parameter, reads the XML file containing the weather information, maps the received information to the component's output ports and sends it through these ports to the connected edges. Location information is sent out through ports 0 and 1 and then joined together. Time range information is sent out through ports 2 and 3 and is also joined together; the time ranges are then grouped using the Dedup component according to the layout key. Temperature information is sent out through ports 4 and 5 and is joined together as well; the temperatures and further weather information are then grouped using the Dedup component according to the layout key and location. The temperatures and time ranges are joined together, and the resulting records are joined with the location information. The final records are written to two Excel files: the first contains the full information, whereas the second contains the average, minimum and maximum temperature values.
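Two of the steps above, deduplicating by key and computing the average, minimum and maximum for the second output file, can be sketched in plain Java. Assumptions: the `TempRow` record and its values are hypothetical, the Dedup step is modelled as keep-first on (layout key, location) with a map rather than on sorted input, and the SOAP request, the XPath mapping and the joins are not shown. Requires Java 16 or later.

```java
import java.util.*;

// Hypothetical flattened row: layout key, location and one maximum temperature value.
record TempRow(String layoutKey, String location, int maxTemp) {}

class WeatherSketch {

    // Keep-first deduplication on (layout key, location). A map is used here, so the
    // input does not need to be sorted; this is a simplification.
    static List<TempRow> dedup(List<TempRow> rows) {
        Map<List<String>, TempRow> firstByKey = new LinkedHashMap<>();
        for (TempRow r : rows) firstByKey.putIfAbsent(List.of(r.layoutKey(), r.location()), r);
        return new ArrayList<>(firstByKey.values());
    }

    // Minimum, maximum and average temperature, as in the second output file.
    static IntSummaryStatistics stats(List<TempRow> rows) {
        return rows.stream().mapToInt(TempRow::maxTemp).summaryStatistics();
    }

    public static void main(String[] args) {
        List<TempRow> rows = List.of(
                new TempRow("layout-A", "point1", 80),
                new TempRow("layout-A", "point1", 80),   // duplicate to be removed
                new TempRow("layout-B", "point1", 70));
        IntSummaryStatistics s = stats(dedup(rows));
        System.out.printf("min=%d max=%d avg=%.1f%n", s.getMin(), s.getMax(), s.getAverage());
    }
}
```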


We assume that the examples are unzipped to the CloverETL home directory. When executing this example from the command line, you should be in the examples/AdvancedExamples subdirectory (e.g. /home/user/cloverETL/examples/AdvancedExamples).

(Older examples are not divided into four subdirectories, and you should run them from within their examples directory. In that case, replace each ../../ in the commands with ../.)

Remember that (on Windows) you should use semicolons instead of colons in the following command:

Executing:
java -cp "../../lib/cloveretl.engine.jar:../../lib/commons-logging.jar:../../lib/log4j-1.2.12.jar:../../lib/javolution.jar:trans:../../lib/janino.jar" org.jetel.main.runGraph -plugins ../../plugins graph/SOAP_EXAMPLE.grf


complex_examples.txt · Last modified: 2009/09/16 16:02 by jausperger