CloverETL examples

To download example package go to download page.

Simple Examples

SimpleCopy example

This example illustrates usage of SimpleCopy and Trash components. It reads employees.dat file, parses the records, sends them to SimpleCopy component which in turn replicates them to several Trash components connected to SimpleCopy component. One of the Trash components is configured to dump all incoming records into text file Trash1_output.txt - mostly for debugging purpose. The Trash component normally discards all records it receives on input port.

Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples" org.jetel.main.runGraph -plugins ../plugins graphSimpleCopy.grf

Or you can use proper script:


This is the expected output produced by the framework:

***  CloverETL framework/transformation graph runner ver 2.1, (c) 2002-06 D.Pavlis, released under GNU Lesser General Public License  ***
 Running with framework version: 2.1 build#385 compiled 14/02/2007 18:12:57

DEBUG [main] - Plugin org.jetel.lookup loaded.
        id - org.jetel.lookup
        version - 2.1.0
        provider-name - null
                point-id  - lookup - {className=org.jetel.lookup.SimpleLookupTable, type=simpleLookup}
                point-id  - lookup - {className=org.jetel.lookup.DBLookupTable, type=dbLookup}

DEBUG [main] - Plugin org.jetel.sequence loaded.
        id - org.jetel.sequence
        version - 2.1.0
        provider-name - null
                point-id  - sequence - {className=org.jetel.sequence.SimpleSequence, type=SIMPLE_SEQUENCE}
                point-id  - sequence - {className=org.jetel.sequence.PrimitiveSequence, type=PRIMITIVE_SEQUENCE}

DEBUG [main] - Plugin org.jetel.thirdparty loaded.
        id - org.jetel.thirdparty
        version - 2.1.0
        provider-name - null
                point-id  - component - {type=XML_EXTRACT, className=org.jetel.component.XMLExtract}
                point-id  - component - {type=FILTER, className=org.jetel.component.Filter}
                point-id  - component - {type=LDAP_READER, className=com.linagora.component.LdapReader}
                point-id  - component - {type=LDAP_WRITER, className=com.linagora.component.LdapWriter}

DEBUG [main] - Plugin org.jetel.connection loaded.
        id - org.jetel.connection
        version - 2.1.0
        provider-name - null
                point-id  - connection - {className=org.jetel.connection.DBConnection, type=JDBC}
                point-id  - connection - {className=org.jetel.connection.JmsConnection, type=JMS}

DEBUG [main] - Plugin org.jetel.component loaded.
        id - org.jetel.component
        version - 2.1.0
        provider-name - null
                point-id  - component - {type=DATA_READER, className=org.jetel.component.DataReader}
                point-id  - component - {type=DATA_WRITER, className=org.jetel.component.DataWriter}
                point-id  - component - {type=DELIMITED_DATA_READER, className=org.jetel.component.DelimitedDataReader}
                point-id  - component - {type=DELIMITED_DATA_WRITER, className=org.jetel.component.DelimitedDataWriter}
                point-id  - component - {type=SIMPLE_COPY, className=org.jetel.component.SimpleCopy}
                point-id  - component - {type=CONCATENATE, className=org.jetel.component.Concatenate}
                point-id  - component - {type=SIMPLE_GATHER, className=org.jetel.component.SimpleGather}
                point-id  - component - {type=REFORMAT, className=org.jetel.component.Reformat}
                point-id  - component - {type=DB_INPUT_TABLE, className=org.jetel.component.DBInputTable}
                point-id  - component - {type=SORT, className=org.jetel.component.Sort}
                point-id  - component - {type=DB_OUTPUT_TABLE, className=org.jetel.component.DBOutputTable}
                point-id  - component - {type=FIXLEN_DATA_WRITER, className=org.jetel.component.FixLenDataWriter}
                point-id  - component - {type=DEDUP, className=org.jetel.component.Dedup}
                point-id  - component - {type=FIXLEN_DATA_READER, className=org.jetel.component.FixLenDataReader}
                point-id  - component - {type=MERGE, className=org.jetel.component.Merge}
                point-id  - component - {type=MERGE_JOIN, className=org.jetel.component.MergeJoin}
                point-id  - component - {type=EXT_MERGE_JOIN, className=org.jetel.component.MergeJoin}
                point-id  - component - {type=SORTED_JOIN, className=org.jetel.component.MergeJoin}
                point-id  - component - {type=TRASH, className=org.jetel.component.Trash}
                point-id  - component - {type=DB_EXECUTE, className=org.jetel.component.DBExecute}
                point-id  - component - {type=HASH_JOIN, className=org.jetel.component.HashJoin}
                point-id  - component - {type=EXT_HASH_JOIN, className=org.jetel.component.HashJoin}
                point-id  - component - {type=CHECK_FOREIGN_KEY, className=org.jetel.component.CheckForeignKey}
                point-id  - component - {type=DBF_DATA_READER, className=org.jetel.component.DBFDataReader}
                point-id  - component - {type=EXT_FILTER, className=org.jetel.component.ExtFilter}
                point-id  - component - {type=EXT_SORT, className=org.jetel.component.ExtSort}
                point-id  - component - {type=PARTITION, className=org.jetel.component.Partition}
                point-id  - component - {type=DATA_INTERSECTION, className=org.jetel.component.DataIntersection}
                point-id  - component - {type=AGGREGATE, className=org.jetel.component.Aggregate}
                point-id  - component - {type=SYS_EXECUTE, className=org.jetel.component.SystemExecute}
                point-id  - component - {type=KEY_GEN, className=org.jetel.component.KeyGenerator}
                point-id  - component - {type=APROX_MERGE_JOIN, className=org.jetel.component.AproxMergeJoin}
                point-id  - component - {type=DBJOIN, className=org.jetel.component.DBJoin}
                point-id  - component - {type=XLS_READER, className=org.jetel.component.XLSReader}
                point-id  - component - {type=XLS_WRITER, className=org.jetel.component.XLSWriter}
                point-id  - component - {type=CLOVER_WRITER, className=org.jetel.component.CloverDataWriter}
                point-id  - component - {type=CLOVER_READER, className=org.jetel.component.CloverDataReader}
                point-id  - component - {className=org.jetel.component.MysqlDataReader, type=MYSQL_DATA_READER}
                point-id  - component - {className=org.jetel.component.MysqlDataWriter, type=MYSQL_DATA_WRITER}
                point-id  - component - {className=org.jetel.component.StructureWriter, type=STRUCTURE_WRITER}
                point-id  - component - {className=org.jetel.component.Normalizer, type=NORMALIZER}
                point-id  - component - {className=org.jetel.component.Denormalizer, type=DENORMALIZER}
                point-id  - component - {className=org.jetel.component.JmsReader, type=JMS_READER}
                point-id  - component - {className=org.jetel.component.JmsWriter, type=JMS_WRITER}
                point-id  - component - {className=org.jetel.component.LookupJoin, type=LOOKUP_JOIN}
                point-id  - component - {className=org.jetel.component.LookupTableReaderWriter, type=LOOKUP_TABLE_READER_WRITER}
                point-id  - component - {className=org.jetel.component.DataGenerator, type=DATA_GENERATOR}
                point-id  - component - {className=org.jetel.component.MultiLevelReader, type=MULTI_LEVEL_READER}
                point-id  - component - {className=org.jetel.component.SequenceChecker, type=SEQUENCE_CHECKER}

DEBUG [main] - Plugin cz.opensys.oracle loaded.
        id - cz.opensys.oracle
        version - 2.1.0
        provider-name -
                point-id  - component - {className=cz.opensys.oracle.component.OracleDataReader, type=ORACLE_DATA_READER}
                point-id  - component - {className=cz.opensys.oracle.component.OracleDataWriter, type=ORACLE_DATA_WRITER}

Graph definition file: graphSimpleCopy.grf
INFO  [main] - Checking graph configuration...
DEBUG [main] - Opening input file /home/avackova/home/eclipse/cloverETL/examples/data/delimited/employees.dat
DEBUG [main] - Reading input file /home/avackova/home/eclipse/cloverETL/examples/data/delimited/employees.dat
INFO  [main] - Starting WatchDog thread ...
INFO  [WatchDog] - Thread started.
INFO  [WatchDog] - Running on 1 CPU(s) max available memory for JVM 65088 KB
INFO  [WatchDog] - [Clover] Initializing phase: 0
DEBUG [WatchDog] -  initializing edges:
DEBUG [WatchDog] -  all edges initialized successfully...
DEBUG [WatchDog] -  initializing nodes:
DEBUG [WatchDog] -      BROADCAST ...OK
DEBUG [WatchDog] -      BROADCAST2 ...OK
DEBUG [WatchDog] - Opening input file /home/avackova/home/eclipse/cloverETL/examples/data/delimited/employees.dat
DEBUG [WatchDog] - Reading input file /home/avackova/home/eclipse/cloverETL/examples/data/delimited/employees.dat
DEBUG [WatchDog] -      INPUT1 ...OK
DEBUG [WatchDog] -      TRASH1 ...OK
DEBUG [WatchDog] -      TRASH2 ...OK
DEBUG [WatchDog] -      TRASH3 ...OK
INFO  [WatchDog] - [Clover] phase: 0 initialized successfully.
INFO  [WatchDog] - Starting up all nodes in phase [0]
DEBUG [WatchDog] - BROADCAST ... started
DEBUG [WatchDog] - BROADCAST2 ... started
DEBUG [WatchDog] - INPUT1 ... started
DEBUG [WatchDog] - TRASH1 ... started
DEBUG [WatchDog] - TRASH2 ... started
DEBUG [WatchDog] - TRASH3 ... started
INFO  [WatchDog] - Sucessfully started all nodes in phase!
INFO  [WatchDog] - Execution of phase [0] successfully finished - elapsed time(sec): 1
INFO  [WatchDog] - ---------------------** Start of tracking Log for phase [0] **-------------------
INFO  [WatchDog] - Time: 15/03/07 15:22:29
INFO  [WatchDog] - Node                   Status     Port      #Records         #KB  Rec/s   KB/s
INFO  [WatchDog] - ----------------------------------------------------------------------------------
INFO  [WatchDog] - BROADCAST              FINISHED_O
INFO  [WatchDog] - %CPU:-1.9                          In:0           12           8      0       0
INFO  [WatchDog] -                                   Out:0           12           8      0       0
INFO  [WatchDog] -                                   Out:1           12           8      0       0
INFO  [WatchDog] -                                   Out:2           12           8      0       0
INFO  [WatchDog] - BROADCAST2             FINISHED_O
INFO  [WatchDog] - %CPU:-1.9                          In:0           12           8      0       0
INFO  [WatchDog] -                                   Out:0           12           8      0       0
INFO  [WatchDog] - INPUT1                 FINISHED_O
INFO  [WatchDog] - %CPU:-1.9                         Out:0           12           8      0       0
INFO  [WatchDog] - TRASH1                 FINISHED_O
INFO  [WatchDog] - %CPU:-1.9                          In:0           12           8      0       0
INFO  [WatchDog] - TRASH2                 FINISHED_O
INFO  [WatchDog] - %CPU:-1.9                          In:0           12           8      0       0
INFO  [WatchDog] - TRASH3                 FINISHED_O
INFO  [WatchDog] - %CPU:-1.9                          In:0           12           8      0       0
INFO  [WatchDog] - ---------------------------------** End of Log **--------------------------------
INFO  [WatchDog] - Forcing garbage collection ...
INFO  [WatchDog] - -----------------------** Summary of Phases execution **---------------------
INFO  [WatchDog] - Phase#            Finished Status         RunTime(sec)    MemoryAllocation(KB)
INFO  [WatchDog] - 0                 0                                  1              1007
INFO  [WatchDog] - ------------------------------** End of Summary **---------------------------
INFO  [main] - WatchDog thread finished - total execution time: 1 (sec)
INFO  [main] - Graph execution finished successfully
Execution of graph successful !

Reformat example

Reformat component changes format and/or content of data records by applying “reformat” method on all data records flowing through it. For every record it receives through input port it calls transform() method of specified class (name defined through transformClass attribute of Reformat component), which has to implement org.jetel.component.RecordTransform interface. The transformation class has to be compiled separately prior to running Reformat component and it must reside in directory accessed by Java when searching for classes (i.e. either included in CLASSPATH or defined using ”-cp” parameter when starting java VM).

Executing:
Prepare reformat class:
javac -cp "../cloveretl.engine.jar" javaExamples/reformatOrders.java
Execute graph:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples" org.jetel.main.runGraph -plugins ../plugins graphOrdersReformat.grf

Reformat example with transformation class source embedded

This example is similar to previous one except transformation class (its source code) is directly embedded into graph definition file. The source code is automatically compiled when the graph is started and then transformation object is created to be executed for each record.
When the graph is ran subsequently, the source is recompiled only if it changes between two runs.
In order to run successfully, environment variable $JAVA_HOME have to be set correctly.

Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples:$JAVA_HOME/lib/tools.jar" org.jetel.main.runGraph -plugins ../plugins graphOrdersReformatInline.grf

Reformat example with Transformation Language Lite

This example is similar to previous ones except transformation is given in special form. If you need simple transformation, when output fields are combinations of inputs fields and some parameters, this is the simplest way how to do it.

Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples:$JAVA_HOME/lib/tools.jar" org.jetel.main.runGraph -plugins ../plugins graphOrdersReformatUsingTransform.grf

Reformat example with Transformation Language

This example is similar to previous ones except transformation is given in Etl transform language. In this language you can write complicated transformations in relatively easy way.

Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples" org.jetel.main.runGraph -plugins ../plugins graphOrdersTLReformat.grf

Merge example

Merge component combines several sorted input data flows into one based on specified key. The resulting data flow is sorted according to specified key.
This component can be successfully used when you have several sorted data files and you need to combine them into one big sorted file, but you don't want to waste resources on resorting the big file.
The testing graph reads data form orders.sorted file (actually three readers access the file at the same time) and stores the result into orders.merged file.

Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples" org.jetel.main.runGraph -plugins ../plugins graphMergeData.grf

Filter example

Filter component allows filtering any input data records based on specified filter condition. It can split flow of data into two. One with records which fulfill the condition and the other with those which don't.
Filter condition is composed of one or more simple comparisons <,>,<=,>=,==,!=. Individual comparisons can be combined using logical operators “AND” and “OR”. Parenthesis can be used to group comparisons. There are some logical functions (see description of Extended Filter Component) you can use too.
This graph is a bit more complex and illustrates usage of Filter and HashJoin components. It produces four output files:

  • all employees who were hired before 1993-12-31
  • all employees who have in their memo that they are “fluent” in some language
  • all employees who have in their memo that they are “fluent” in some language and were hired before 1993-12-31
  • all employees who were hired after 1993-12-31

Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples":$JAVA_HOME/lib/tools.jar"" org.jetel.main.runGraph -plugins ../plugins graphExtFilter.grf

Join example

Join component joins two or more sorted input data flows into one based on specified key. It can produce classical inner join or left outer join (based on parameters).
This component can be useful when you have main data set - for example with orders (driver) containing part number and you need to join it with other data set containing price for each part (slave).
The testing graph reads data form orders.dat file (driver) and employees.dat and produces records with combined information from both files.
Each pair of corresponding records is transformed by reformatJoinTest.class. The source code for this class is attached as well.

Executing:
Prepare reformat class:
javac -cp "../cloveretl.engine.jar" javaExamples/reformatJoinTest.java

Execute graph:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples" org.jetel.main.runGraph -plugins ../plugins graphJoinData.grf

Hash Join example

Hash Join component joins two or more input data flows into one based on specified key. It can produce classical inner join or left outer join (based on parameters).
Functionality of this component is similar to previous one, but it doesn't require input data to be sorted. It is most suited to join data sets when one is relatively large (master) and the others are small (slaves) - typically when using lookup table to convert key into corresponding value.
The testing graph reads data form orders.dat file (driver) and employees.dat and produces records with combined information from both files.
Each pair of corresponding records is transformed by reformatJoinTest.class. The source code for this class is attached as well.

Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples" org.jetel.main.runGraph -plugins ../plugins graphJoinHash.grf

Hash Join with transformation class source embedded

As with Reformat component, transformation class which is executed for input records can be embedded directly into graph. It is again dynamically compiled at run-time.
In this example you can see joining more then two input sets. Master data from orders.dat are joined with two slave inputs: first employees.dat due to key field EmployeeID and with the second, customers_delimited.txt, due to key field CustomerID. So in output record there are information from all three input files.

Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples:$JAVA_HOME/lib/tools.jar" org.jetel.main.runGraph -plugins ../plugins graphJoinHashInline.grf

Hash Join with Transformation Language Lite

As with Reformat component, transformation can be written in summary form.

Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples:$JAVA_HOME/lib/tools.jar" org.jetel.main.runGraph -plugins ../plugins graphJoinHashUsingTransform.grf

Hash Join with Transformation Language

As with Reformat component, transformation can be written in Etl transform language.

Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples" org.jetel.main.runGraph -plugins ../plugins graphJoinHashUsingTransformLanguage.grf

Sort example

Sort component sorts input data flow based on specified key. You can use more than one field name for key.
Sorted data flow is required by several components (Merge, Join, etc..) so this is very useful component.
The testing graph reads data form orders.dat file, sorts them and stores the result into orders.sorted file.

Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples" org.jetel.main.runGraph -plugins ../plugins graphSortData.grf

Sort Universal example

Similar to previous example, this one sorts input data. The difference is that the exact datafile to be sorted is not hardcoded within graph definition, it is determined based on supplied parameters.
CloverETL allows global graph properties to be defined and these properties can be reference from within the graph. Each reference is replaced by property value at run-time. In our example parameter's definitions are embedded in graph code (Executing: java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples" org.jetel.main.runGraph -plugins ../plugins graphSortUniversal.grf) but it is possible to define parameters in-line. In such case you can pass on parameters in executing command:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples" org.jetel.main.runGraph -plugins ../plugins -P:sortKey=<..your.fieldName..> -P:metadata=<..metadata filename..> -P:data=<..datafile..> -P:outputData=<..datafile..> graphSortUniversal.grf

like this

java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples" org.jetel.main.runGraph -plugins ../plugins -P:sortKey="CompanyName" -P:metadata="metadata/delimited/customers.fmt" -P:data="data/delimited/customers_delimited.txt" -P:outputData="output/output.txt" graphSortUniversal.grf

Phases example

Components can be grouped into individual phases. Each phase has assigned a unique number. In each transformation graph, there must exists at least one phase (with any number assigned to it). Execution of graph is done by phases. It starts with phase with the lowest number and continues in ascending order. All components in particular phase are initialized and run. Then components from next phase are started and so on. If there are any connections (Edges) between components from different phases, these connections are buffered so the first component (from preceding phase) can finish its job and the second component (from following - not necessarily next) can read the data produced by the fist component.

Example presented here is divided into three phases. First,data is read and sorted, in second,data is copied into three similar flows. In third, data is again merged into one flow and saved to file.

Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples" org.jetel.main.runGraph -plugins ../plugins graphPhasesDemo.grf

Partition data example

Partition component offers various ways of splitting input data flow into several output flows. The exact number of output flows depends on how many output ports are connected.
There are currently three partition algorithms implemented:

  • Round-Robin - first record goes into first port, second into second…etc. When the last port was used, it starts again with the first one.
  • Hash - based on specified key, a hash value is calculated and based on it the output port is selected. It should be secured that for two different records with same keys, the exact same output port is selected.
  • Range - based on specified key and range array, records are sent to output port based on value present in key-field. It is very similar to CASE operator, with ranges defined by their upper boundary (inclusive).

It is possible to write your own partition function. It has to implement org.jetel.component.partition.PartitionFunction and can be write as java source or in Etl transform language.

Executing example:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples" org.jetel.main.runGraph -plugins ../plugins graphPartition.grf

System call example

It is possible to call system command from CloverETL framework. System execute component can be standalone component or can work with records flowing form other component. Output from executing system command can be sent to other component or can be logged to file. If there is not given interpreter, “command” attribute is treated as simple system command with parameters, else “command” attribute is saved in temporary batch file and there is called interpreter to execute this script

Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples" org.jetel.main.runGraph -plugins ../plugins graphSystemExecute.grf

Working with Excel files example

CloverETL can read and write data from/to xls files. In this example data are read from ORDERS.xls file, partitioned to 3 parts and each of thees parts are saved in different sheets in one of orders.partitioned0.xls, … , orders.partitioned7.xls files (see Writers description).

Important: 1.Because it is impossible to write to different sheets in the same time each writing component has to be in different phase. 2. jxl.jar (for older CloverETL poi-2.5.1.jar) - reading/writing to excel files library, has to be in execution path.

Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples:../lib/jxl.jar" org.jetel.main.runGraph -plugins ../plugins graphXLSReadWrite.grf

XML extract example

CloverETL can read data from xml files. In this example data are read from employees.xml file. “Mapping” attribute describes dependencies between xml nodes and metadata fields.

Important: tools.jar has to be in execution path.

Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples":$JAVA_HOME/lib/tools.jar"" org.jetel.main.runGraph -plugins ../plugins graphXMLExtract.grf

XML XPath reader example

CloverETL can read data from xml files. In this example data are read from employees.xml file. “Mapping” attribute describes dependencies between xml nodes and metadata fields.

Important: tools.jar has to be in execution path.

Executing:
java -cp "../cloveretl.engine.jar:../lib/saxon8.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples":$JAVA_HOME/lib/tools.jar"" org.jetel.main.runGraph -plugins ../plugins graphXPathReader.grf

Different ways of reading files

There are few components for reading data from flat file: Delimited Data Reader – reads delimited data, Fixlen Data Reader – reads data of fixed length and Universal Data Reader, which can read delimited or fixed length data depending on metadata on output port. There are others reader components for reading data in different formats: DBFDataReader – reads dBase tables, XMLExtract – reads xml files, XLSDataReader – reads excel files, Clover Data Reader – reads data saved in clover internal format. Most of thees readers can read data from “MultiFile” - it means that in fileURL attribute can be wildcards as * or ?. (For more information see Readers.) This is illustrated by following example:

Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples:$JAVA_HOME/lib/tools.jar" org.jetel.main.runGraph -plugins ../plugins graphCloverData.grf

It is possible to omit some data from flat file. For such purpose there is “shift” attribute on metadata, which indicates how many characters should be omitted from last field. Usage of this feature you can see in following example:

Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples" org.jetel.main.runGraph -plugins ../plugins graphGenerateData.grf

Fix lenght data reader and byte mode using. This graph demonstrates difference between byte mode and char mode for FixLenDataReader.


Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples"" org.jetel.main.runGraph -plugins ../plugins graphFixLenByteMode.grf

This file is an input file.

BLOB

Database related examples

When experimenting with database, presented example graphs expect that there exists database table “EMPLOYEE” with following structure:

create table EMPLOYEE (
  EMP_NO INTEGER,
  FIRST_NAME VARCHAR(80),
  LAST_NAME VARCHAR(80),
  PHONE_EXT VARCHAR(80),
  HIRE_DATE DATE,
  DEPT_NO VARCHAR(80),
  JOB_CODE VARCHAR(80),
  JOB_GRADE NUMERIC,
  JOB_COUNTRY VARCHAR(80),
  SALARY NUMERIC,
  FULL_NAME VARCHAR(80)
)

If you wish to experiment with unloading data from this table, you can use populate_employee_table.sql file (supplied with these examples) to insert some records into the table. There is also Oracle specific file populate_employee_table.ora.

DBUnload example

DBInputTable component can be used for extracting data from database. It allows a SQL query to be specified and run against database. The result set produced by DB is then sent by the component to output port - and gets propagated to other processing nodes. It uses JDBC to communicate with DB.
You may wonder why to use CloverETL for processing data which is already in DB. It should be faster to perform any processing within DB itself. Well, it is not always true.
DBUnload component (as all other DB related components) needs to know which JDBC driver to use, what is the DB connection string and under what user name it should connect to database. These parameters can be provided directly to component (one by one) or special config file can be created from which component is able to read required info.
Following is an example of config file which can be used to connect to Borland Interbase DB:


dbDriver=interbase.interclient.Driver
dbURL=jdbc:interbase://localhost/home/projects/interbase/examples/database/employee.gdb
user=SYSDBA
password=masterkey

I hope that the parameters (their names) are self-explanatory. If not, read some JDBC related documentation.

In this example, the DBInputTable component connects to database and unloads all records from employee table. It then stores the records in employees.list.out file.

Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples" org.jetel.main.runGraph -plugins ../plugins graphDBUnload.grf

DBLoad example

DBOutputTable component does the opposite to DBInputTable, it populates database table with data processed by CloverETL framework. Similarly to DBInputTable, it requires some JDBC related parameters to be supplied. In order to be able to run this example, you have to create a DB table with the name EMPLOYEE_TMP and structure similar to EMPLOYEE table presented above.

Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples" org.jetel.main.runGraph -plugins ../plugins graphDBLoad.grf

DBJoin example

DBJoin component joins records from input port with records from database. Each pair of matching records is sent to transformation (can be written in four ways : external class, java source embedded in graph definition, summary transformation or in Transformation Language) to get output record. In this example for records from flat file are searched records from database where field lname on input has the same value as field last_name in database records.

Executing:
Prepare transform class:
javac -cp "../cloveretl.engine.jar" javaExamples/TransformForDBjoin.java

Execute graph:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples:$JAVA_HOME/lib/tools.jar" org.jetel.main.runGraph -plugins ../plugins graphDBJoin.grf

DBLookup example

Graph graphDBLookup.grf has similar functionality as preceding graph, but you can see usage of Database Lookup Table.

Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples":$JAVA_HOME/lib/tools.jar"" org.jetel.main.runGraph -plugins ../plugins graphDBLookup.grf

Complex examples

This examples combines several components to perform useful transformation similar to one you could use in a real life.

Join example

Thees graphs combine information about orders made and information about individual items/products purchased within each order. It also adds information about which customer ordered the goods. To get all this information, it needs to join data from ORDERS.DBF (a dBase table) ODETAILS.DBF and Customers.txt. First two data sets are joined using MERGE_JOIN component, thus they have to be sorted first. Then information about customer is added using HASH_JOIN. As both joins require transformation code, two transformation classes are embedded directly into graph file and dynamically compiled at run-time. As there is a possibility that data set containing info about orders references customer which is not in Customers.txt file, second join is defined to be a left join. At the end, EXT_FILTER is used to split data to two sets - one with customer info complete and the other with missing.

Note:In order to run successfully,”tools.jar” library (present in $JAVA_HOME/lib/) has to be part of CLASSPATH.

Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples:$JAVA_HOME/lib/tools.jar" org.jetel.main.runGraph -plugins ../plugins graphDBFJoin.grf

Graph graphDBFJoinTL.grf is very similar to preceding one, but with transformation written in Etl transformation language .

Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples" org.jetel.main.runGraph -plugins ../plugins graphDBFJoinTL.grf

Intersection example

This graph reads personal data from two sources and finds records with the same values on corresponding fields (lname – last_name, fname – first_name). Matching pairs are transformed into one output record and saved in intersection_customer_employee.txt file.

Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples:$JAVA_HOME/lib/tools.jar" org.jetel.main.runGraph -plugins ../plugins graphIntersectData.grf

Approximative join example

Approximative join component is used for joining data which are similar on given fields. It requires data to be prepared: it joins data from two data flows with the same value of matching key and similar value of join key. In this example data are read from from file customers0.dat and from database table employee; then for both flows there is generated matching key consisting of 4 letters of last name (flat file: lname, database: last_name) and 3 letters of first name (flat file: fname, database: first_name) . Approximative join component joins data from thees flows (sorted due the matching key); it compares records with the same matching key only, but to conforming output sends only thees, which join key is similar enough (conformity attribute), so among conforming records you can found such record: 4 Damstra Robert Damstra Roberta 0.875 - with conformity 0.875 (conformity equals 1 means that records are identical).

This graph illustrates usage of CustomizedRecordTransform class too. This class extend abstract class DataRecordTransform and allows to create complex transformation in relatively easy way.

Executing:
Prepare transform class:
javac -cp "../cloveretl.engine.jar:../lib/commons-logging.jar" javaExamples/customizedTransformExample.java
Execute graph:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples:$JAVA_HOME/lib/tools.jar" org.jetel.main.runGraph -plugins ../plugins graphAproximativeJoin.grf

Real life example

This is practical illustration of CloverETL usage . This graph transforms client's data for bank's branch . On inputs are clients, monthly interests, service charges and half-year bonuses. On outputs we have aggregated revenues by clients, clients without revenues and clients from database, who are not in input file.

Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples:$JAVA_HOME/lib/tools.jar" org.jetel.main.runGraph -plugins ../plugins graphRevenues.grf

Slowly Changing Dimension (SCD) example

In Type 2 Slowly Changing Dimension, a new record is added to the table to represent the new information. Therefore, both the original and the new record will be present. The new record gets its own synthetic primary key.

In our example, we originally have dimension of exchange rates:

exchange_right_id|Country|Currency|Amount|Code|Valid_to|Valid_from|2731|Australia|dollar|1|AUD||07-18-2007

After new exchange rates arrived we add new record to the dimension and update field „Valid_to“ for old record. Matching key is composed from these fields: Country, Currency, Amount, Code. For better unerstanding we have four files in our example:

  • exchange_rates_DIM.txt - dimension file (serve as source and will be updated by graph result)
  • exchange_rates_DIM_insert.txt - new records for dimension. This records should be inserted to exchange_rates_DIM.txt (this functionallity is not provided by our graph)
  • exchange_rates_DIM_update.txt - these records should be updated in existing dimesion (exchange_rates_DIM.txt This functionallity is not provided by our graph))
  • exchange_rates_matched.txt - these records match

Advantages:

  • This allows us to accurately keep all historical information.

Disadvantages:

  • This will cause the size of the table to grow. In cases where the number of rows for the table is very high to start with, storage and performance can become a concern.
  • This necessarily complicates the ETL process.

When to use Type 2:

Type 2 slowly changing dimension should be used when it is necessary for the data warehouse to track historical changes.

Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples:$JAVA_HOME/lib/tools.jar" org.jetel.main.runGraph -plugins ../plugins SCDType2.grf


Index of all examples

graphAggregateSorted.grf This graph illustrates usage of Aggregate component.
graphAggregateUnsorted.grf This graph illustrates usage of Aggregate component.
graphAproximativeJoin.grf In this example data are read from from file customers0.dat and from database table employee; then for both flows there is generated matching key consisting of 4 letters of last name (flat file: lname, database: last_name) and 3 letters of first name (flat file: fname, database: first_name) . Approximative join component joins data from thees flows (sorted due the matching key); it compares records with the same matching key only, but to conforming output sends only thees, which join key is similar enough (conformity attribute), so among conforming records you can found such record: “4 Damstra Robert Damstra Roberta 0.875” - with conformity 0.875 (conformity equals 1 means that records are identical). This graph illustrates usage of CustomizedRecordTransform class too. This class extend abstract class DataRecordTransform and allows to create complex transformation in relatively easy way.
graphCheckForeignKey.grf This graph shows usage of CheckForeignKey component. Data are read from table customers and City and State fields are checked against the table of known cities. If city is not found between known cities null is substituted on requested fields. On output are printed customers, for which there wasn't found city and state in city table.
graphCloverData.grf This graph illustrates usage of CloverDataReader/Writer components. This components reads/writes data in clover internal format and can be used for moving data between different graphs. Reading/writing data in clover format is faster then in any other format. This graph shows also how works StructureWriter component (it can be used for writing records and some additional information eg. in xml format).
graphDBExecute.grf This graph illustrates how-to run SQL/DDL/DML commands against database. Commands in SQLCode are delimited by semicolon and are executed one by one. If needed, they may be grouped into single transaction. This graph creates two tables (EMPLOYEE and EMPLOYEE_TMP) and populates table EMPLOYEE with sample data.
graphDBFJoin.grf This graph illustrates complex transformation. It reads data about orders made and combines them (joins) with data about which items were purchased. It then adds information about which company made particular order/bought particular item. It illustrates usage of DBFDataReader,Sort,MergeJoin,HashJoin,Filter. It also shows how to embed data transformation routine directly into transformation graph layout file (both joins have the transformations embedded). Note: It is faster to reference record's fields by index as opposite to names – used in this example.
graphDBFJoinTL.grf This graph illustrates complex transformation like preceding example, but transformations are written in Transformation Language.
graphDBJoin.grf This graph shows usage of DBJoin component: data are read from flat file and joined with the records from database due to conforming key field (lname in records read from file, last_name in records read from database).
graphDBFLoad.grf This graph illustrates reading data from DBF file.
graphDBLoad.grf This graph illustrates how to propagate data to database.
graphDBLoad5.grf This graph illustrates usage of dbFields and cloverFields parameters of DB_OUTPUT_TABLE component. Parameter dbFields allows to specify that only certain fields of target DB table will be populated. Parameter cloverFields allows to specify which input/Clover fields are used for populating target fields. The final mapping is determined by the order in which Clover fields and DB fields appear in the parameters or can be set directly. Note: parameter batchMode allows for grouping of inserts into DB thus increasing throughput. It works only with DBs/JDBC drivers which support this feature (e.g. Oracle).
graphDBLoad6.grf This graph illustrates how to run hand-written SQL/DML against DB. Parameters (question-marks) are substituted by input fields (in order they are listed in cloverFields). Also target parameter types are devised from input field types - this may cause some problems when trying to map incompatible data types.
graphDBLookup.grf This graph illustrates how to join data using reformat component. graphLookupJoin.grf shows how to do similar transformation in easier way.
graphDBRead.grf This graph reads personal data from three sources and finds records with the same values on corresponding fields (lname – last_name, fname – first_name). Matching pairs are transformed into one output record and saved in intersection_customer_employee.txt file.
graphDBUnload.grf In this example, the DBInputTable component connects to database and unloads all records from employee table. It then stores the records in employees.list.out file.
graphDBUnloadParametrized.grf This graph illustrates how to read not all records from database. This example is similar to graphDBJoin.grf, but found records from database are not sent to transformation function, but directly to output port. For illustration in the next phase are read all data from the same table.
graphDBUnloadUniversal.grf This graph illustrates usage of dynamic metadata generation based on SQL query. It allows unloading of database data without prior assembly of DB table metadata. Dynamic metadata can be for example used for easy migration of data between two different databases when one universal graph with parameters specifying which table should be unload/migrated can be used. This graph requires db_table parameter to be defined. When running this graph, use command like this one: java -cp ”../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples” org.jetel.main.runGraph -plugins ../plugins-P:db_table=Employee graphDBUnloadUniversal.grf Note: where clause 1=0 in SQL query in dynamic metadata definition is there only for speeding up query processing - no real data is needed, only metadata describing the result - thus we specify condition which ensures that no data should really be returned - it is up to database whether it will optimize the execution plan based on this.
graphDenormalizeInline.grf This graph demonstrates functionality of DENORMALIZER component with denormalization specified by inline Java source.
graphDenormalizeTL.grf This graph demonstrates functionality of DENORMALIZER component with denormalization specified by inline Transformation Language source.
graphGenerateData.grf This graph illustrates usage of Data Generator, Reader, Writer components. It also shows how to use shift attribute on metadata.
graphExtFilter.grf This illustrates usage of Filter and HashJoin components.
graphExtFilter2.grf This graph shows some more functionality of EXT_FILTER component. Filters all employees younger than 41 years. The current age is calculated as today's date minus date of birth. Notice that instead of classical comparison operators ==,!=, …etc… one can use their textual abbreviations .eq. .ne. .lt. .gt. …etc…
graphFilter.grf This graph demonstrates functionality of Filter component. It can filter on text, date, integer, numeric fields with comparison [<, >, ==, <=, >=, !=]. Text fields can also be compared to a Java regexp using ~ (tilda) operator. A filter can be made of different parts separated by a ”;” (semicolon). If one of the parts is verified, the record passes the filter (it's an OR combination of the parts, and AND can be achieved by several filters cascaded). Date format used for comparison depends on input field's format – e.g. if input field HireDate has defined date format “yyyy-MM-dd”, then the same format must be used in filter when specifying date constant. This graph produces two output files: First with all employees who were hired before 1993-12-31. Second with all employees who have in their memo that they are “fluent” in some language Note: some of the comparison operators have to be XML encoded - e.g. ”>” as ”.gt.” etc.
graphFixLenByteMode.grf This graph demonstrates difference between byte mode and char mode for FixLenDataReader.
graphIntersectData.grf This graph demonstrates functionality of DATA_INTERSECT component - performs joining of two sorted (based on specified key) data flows (A and B) and outputs: 1. to port 0 records found only in flow A 2. to port 1 records found in A & B 3. to port 2 records found only in flow B . Records both in flow A and B should be unique
graphJms.grf This graph illustrates usage of JMS Reader/Writer Components. It requires ACtiveMQ JMS server to be running on localhost!
graphJoinData.grf This graph demonstrates functionality of SORT and MERGE_JOIN components. MERGE_JOIN component performs joining of master data with slave data based on specified key. It requires both master and slave data to be sorted according to used key.
graphJoinHash.grf This graph demonstrates functionality of HASH_JOIN components. HASH_JOIN component performs joining of master data with slave data based on specified key. First all slave data (from all slave ports) are read and hash-tables are constructed from them. Then for every master record, attempt is made to find corresponding slave-records. HASH_JOIN performs better than SORT/MERGE_JOIN when you have relatively small slave data set and big master data set. It doesn't require master and slave data to be sorted.
graphJoinHashInline.grf This graph demonstrates how to use HASH_JOIN component for 3way join. HASH_JOIN component performs joining of master data with slave data based on specified key. First all slave data are read and hash-tables are constructed from them. Then for every master record, attempt is made to find corresponding slave-records.
graphJoinHashUsingTransform.grf This graph demonstrates functionality of HASH_JOIN components. Transformation function is written in Transformation Language Lite.
graphJoinHashUsingTransformLanguage.grf This graph demonstrates functionality of HASH_JOIN components. Transformation function is written in Transformation Language,
graphJoinMergeInline.grf This graph illustrates usage of MERGE component. It merges data based on specified key. It can merge master flow with a few slave flows.
graphLdapReader_Uninett.grf Reads information from LDAP directory.
graphLdapReaderWriter.grf This graph illustrates usage of LDAP Reader/Writer components. It requires LDAP server properly installed and configured.
graphLookupJoin.grf This graph joins data from file with data from lookup table.
graphLookupReader.grf This graph illustrates usage of in-memory lookup table.
graphMergeData.grf This graph illustrates usage of MERGE component. It merges data based on specified key.
graphMysqlWriter.grf This graph illustrates usage of MySQL Writer Component.
graphNormalizeInline.grf This graph demonstrates functionality of NORMALIZER component with normalization specified by in-line Java source.
graphNormalizeTL.grf This graph demonstrates functionality of NORMALIZER component with normalization specified by in-line Transformation Language source.
graphOrdersReformat.grf This example illustrates usage of Reformat component. Source data contains info about orders customers made. We add artificial key - ProductID which is a sequence of integers. We also combine individual fields of ship-to-address (address,city,country) into one single field. Reformat component calls reformatOrders class which implements RecordTransform interface (required by Reformat component). The directory in which recormatOrders class exists must be part of the Java classpath
graphOrdersReformatExternTransform.grf This example illustrates usage of Reformat component with transformation code written in extern file. The source code is automatically compiled at runtime. For successful compilation, tools.jar library must be part of CLASSPATH.
graphOrdersReformatInline.grf This example illustrates usage of Reformat component with transformation code (class) inlined (saved with the graph itself). The source code is automatically compiled at runtime. For successful compilation, tools.jar library must be part of CLASSPATH.
graphOrdersReformatUsingTransform.grf This example illustrates usage of Reformat component with transformation written in Transformation Language Lite.
graphOrdersTLReformat.grf This example illustrates usage of Reformat component with transformation written in Transformation Language.
graphParametrizedLookup.grf This graph demonstrates usage of Node-level parameters.
graphPartition.grf This graph demonstrates functionality of Partition component. This component can be used to split input data flow into several (based on number of connected output ports) output data flows.
graphPhasesDemo.grf This graph illustrates usage of several components in different phases. First, all components from phase 0 are run, then the execution continues with phase 1. Data which crosses phase boundary is automatically buffered so the data producing node can finish its work and the data consuming node can later start reading the data.
graphRangeLookup.grf This graph ilustrates usage of range lookup table. During phase 0 lookup table is read from xls file to memory and in the next phase for data from flat file are looked up records from lookup table.
SCDType2.grf Graph with SCD transformation (see Slowly Changing Dimension (SCD) example).
graphRevenues.grf This is practical illustration of usage CloverETL. This graph transforms data for branch bank. On inputs are clients, month interests and charges and half-year bonus. On outputs we have aggregated revenues by clients, clients without revenues and clients from database, who are not in input file.
graphSequence.grf This example extends graphOrdersReformat.grf in a sense that the counter used for generating PRODUCTID is taken from Sequence object which is persistent between graph executions – i.e. you get continuous sequence of unique numbers. For successful compilation, tools.jar library must be part of CLASSPATH.
graphSequenceCheckData.grf This graph illustrates usage of Sequence Checker Component.
graphSimpleCopy.grf This graph demonstrates functionality of SimpleCopy component. Everything brought to SimpleCopy on port 0 is duplicated onto all connected output ports. It also shows functionality of Trash Component. It discards everything which is sent into it. Its purpose is debugging - shows how many records ended in it and can print incoming records, if desired (option debugPrint)
graphSimpleCopyEmbeddedMetadata.grf This graph demonstrates functionality of SimpleCopy component. Everything brought to SimpleCopy on port 0 is duplicated onto all connected output ports. It also shows functionality of Trash Component. It discards everything which is sent into it. Its purpose is debugging - shows how many records ended in it and can print incoming records, if desired (option debugPrint). Metadata in this graph is embedded directly to graph instead of external file.
graphSimpleCopyLocale.grf This graph demonstrates functionality of SimpleCopy component. Everything brought to SimpleCopy on port 0 is duplicated onto all connected output ports. It also shows functionality of Trash Component. It discards everything which is sent into it. Its purpose is debugging - shows how many records ended in it and can print incoming records, if desired (option debugPrint). Metadata in this graph is embedded directly to graph instead of external file. Output metadata is the same as input except that for BirthDate and HireDate, locale property is defined. It allows to specify language or language.Country locale which is used when formatting output. In this example, date info on input is expected to be in dd/MM/yyyy format, whereas on output it should follow US/UK standard → (usually defined as MM/dd/YYYY).
graphSimpleLookup.grf This graph illustrates how to join data using reformat component.
graphSortData.grf This graph illustrates usage of Sort component. It sorts data based on specified key.
graphSortUniversal.grf This graph illustrates usage of Sort component. It sorts data based on specified key. It also shows how to use parameters/global properties within graph. For executing see Sort Universal example.
graphSystemExecute.grf This graph ilustrates usage of System Execute Component
graphViewData.grf This graph is a simple utility which demonstrates usage of parameters. It allows viewing content of particular data file (in delimited format). It reads in data and stores them in debug format.
graphXLSReadWrite.grf In this example data are read from ORDERS.xls file, partitioned to 3 parts and each of thees parts are saved in orders.partitioned$.xls “multifile” in different sheets. (See Different ways of reading files example.)
graphXMLExtract.grf In this example data are read from employees.xml file. “Mapping” attribute describes dependencies between xml nodes and metadata fields.



examples.txt · Last modified: 2009/09/16 16:01 by jausperger
Back to top
chimeric.de = chi`s home Valid CSS Driven by DokuWiki do yourself a favour and use a real browser - get firefox!! Recent changes RSS feed Valid XHTML 1.0