To download example package go to download page.
This example illustrates usage of SimpleCopy and Trash components. It reads employees.dat file, parses the records, sends them to SimpleCopy component which in turn replicates them to several Trash components connected to SimpleCopy component. One of the Trash components is configured to dump all incoming records into text file Trash1_output.txt - mostly for debugging purpose. The Trash component normally discards all records it receives on input port.
Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples" org.jetel.main.runGraph -plugins ../plugins graphSimpleCopy.grf
Or you can use proper script:
./run.sh graphSimpleCopy.grf run graphSimpleCopy.grf
This is the expected output produced by the framework:
*** CloverETL framework/transformation graph runner ver 2.1, (c) 2002-06 D.Pavlis, released under GNU Lesser General Public License ***
Running with framework version: 2.1 build#385 compiled 14/02/2007 18:12:57
DEBUG [main] - Plugin org.jetel.lookup loaded.
id - org.jetel.lookup
version - 2.1.0
provider-name - null
point-id - lookup - {className=org.jetel.lookup.SimpleLookupTable, type=simpleLookup}
point-id - lookup - {className=org.jetel.lookup.DBLookupTable, type=dbLookup}
DEBUG [main] - Plugin org.jetel.sequence loaded.
id - org.jetel.sequence
version - 2.1.0
provider-name - null
point-id - sequence - {className=org.jetel.sequence.SimpleSequence, type=SIMPLE_SEQUENCE}
point-id - sequence - {className=org.jetel.sequence.PrimitiveSequence, type=PRIMITIVE_SEQUENCE}
DEBUG [main] - Plugin org.jetel.thirdparty loaded.
id - org.jetel.thirdparty
version - 2.1.0
provider-name - null
point-id - component - {type=XML_EXTRACT, className=org.jetel.component.XMLExtract}
point-id - component - {type=FILTER, className=org.jetel.component.Filter}
point-id - component - {type=LDAP_READER, className=com.linagora.component.LdapReader}
point-id - component - {type=LDAP_WRITER, className=com.linagora.component.LdapWriter}
DEBUG [main] - Plugin org.jetel.connection loaded.
id - org.jetel.connection
version - 2.1.0
provider-name - null
point-id - connection - {className=org.jetel.connection.DBConnection, type=JDBC}
point-id - connection - {className=org.jetel.connection.JmsConnection, type=JMS}
DEBUG [main] - Plugin org.jetel.component loaded.
id - org.jetel.component
version - 2.1.0
provider-name - null
point-id - component - {type=DATA_READER, className=org.jetel.component.DataReader}
point-id - component - {type=DATA_WRITER, className=org.jetel.component.DataWriter}
point-id - component - {type=DELIMITED_DATA_READER, className=org.jetel.component.DelimitedDataReader}
point-id - component - {type=DELIMITED_DATA_WRITER, className=org.jetel.component.DelimitedDataWriter}
point-id - component - {type=SIMPLE_COPY, className=org.jetel.component.SimpleCopy}
point-id - component - {type=CONCATENATE, className=org.jetel.component.Concatenate}
point-id - component - {type=SIMPLE_GATHER, className=org.jetel.component.SimpleGather}
point-id - component - {type=REFORMAT, className=org.jetel.component.Reformat}
point-id - component - {type=DB_INPUT_TABLE, className=org.jetel.component.DBInputTable}
point-id - component - {type=SORT, className=org.jetel.component.Sort}
point-id - component - {type=DB_OUTPUT_TABLE, className=org.jetel.component.DBOutputTable}
point-id - component - {type=FIXLEN_DATA_WRITER, className=org.jetel.component.FixLenDataWriter}
point-id - component - {type=DEDUP, className=org.jetel.component.Dedup}
point-id - component - {type=FIXLEN_DATA_READER, className=org.jetel.component.FixLenDataReader}
point-id - component - {type=MERGE, className=org.jetel.component.Merge}
point-id - component - {type=MERGE_JOIN, className=org.jetel.component.MergeJoin}
point-id - component - {type=EXT_MERGE_JOIN, className=org.jetel.component.MergeJoin}
point-id - component - {type=SORTED_JOIN, className=org.jetel.component.MergeJoin}
point-id - component - {type=TRASH, className=org.jetel.component.Trash}
point-id - component - {type=DB_EXECUTE, className=org.jetel.component.DBExecute}
point-id - component - {type=HASH_JOIN, className=org.jetel.component.HashJoin}
point-id - component - {type=EXT_HASH_JOIN, className=org.jetel.component.HashJoin}
point-id - component - {type=CHECK_FOREIGN_KEY, className=org.jetel.component.CheckForeignKey}
point-id - component - {type=DBF_DATA_READER, className=org.jetel.component.DBFDataReader}
point-id - component - {type=EXT_FILTER, className=org.jetel.component.ExtFilter}
point-id - component - {type=EXT_SORT, className=org.jetel.component.ExtSort}
point-id - component - {type=PARTITION, className=org.jetel.component.Partition}
point-id - component - {type=DATA_INTERSECTION, className=org.jetel.component.DataIntersection}
point-id - component - {type=AGGREGATE, className=org.jetel.component.Aggregate}
point-id - component - {type=SYS_EXECUTE, className=org.jetel.component.SystemExecute}
point-id - component - {type=KEY_GEN, className=org.jetel.component.KeyGenerator}
point-id - component - {type=APROX_MERGE_JOIN, className=org.jetel.component.AproxMergeJoin}
point-id - component - {type=DBJOIN, className=org.jetel.component.DBJoin}
point-id - component - {type=XLS_READER, className=org.jetel.component.XLSReader}
point-id - component - {type=XLS_WRITER, className=org.jetel.component.XLSWriter}
point-id - component - {type=CLOVER_WRITER, className=org.jetel.component.CloverDataWriter}
point-id - component - {type=CLOVER_READER, className=org.jetel.component.CloverDataReader}
point-id - component - {className=org.jetel.component.MysqlDataReader, type=MYSQL_DATA_READER}
point-id - component - {className=org.jetel.component.MysqlDataWriter, type=MYSQL_DATA_WRITER}
point-id - component - {className=org.jetel.component.StructureWriter, type=STRUCTURE_WRITER}
point-id - component - {className=org.jetel.component.Normalizer, type=NORMALIZER}
point-id - component - {className=org.jetel.component.Denormalizer, type=DENORMALIZER}
point-id - component - {className=org.jetel.component.JmsReader, type=JMS_READER}
point-id - component - {className=org.jetel.component.JmsWriter, type=JMS_WRITER}
point-id - component - {className=org.jetel.component.LookupJoin, type=LOOKUP_JOIN}
point-id - component - {className=org.jetel.component.LookupTableReaderWriter, type=LOOKUP_TABLE_READER_WRITER}
point-id - component - {className=org.jetel.component.DataGenerator, type=DATA_GENERATOR}
point-id - component - {className=org.jetel.component.MultiLevelReader, type=MULTI_LEVEL_READER}
point-id - component - {className=org.jetel.component.SequenceChecker, type=SEQUENCE_CHECKER}
DEBUG [main] - Plugin cz.opensys.oracle loaded.
id - cz.opensys.oracle
version - 2.1.0
provider-name -
point-id - component - {className=cz.opensys.oracle.component.OracleDataReader, type=ORACLE_DATA_READER}
point-id - component - {className=cz.opensys.oracle.component.OracleDataWriter, type=ORACLE_DATA_WRITER}
Graph definition file: graphSimpleCopy.grf
INFO [main] - Checking graph configuration...
DEBUG [main] - Opening input file /home/avackova/home/eclipse/cloverETL/examples/data/delimited/employees.dat
DEBUG [main] - Reading input file /home/avackova/home/eclipse/cloverETL/examples/data/delimited/employees.dat
INFO [main] - Starting WatchDog thread ...
INFO [WatchDog] - Thread started.
INFO [WatchDog] - Running on 1 CPU(s) max available memory for JVM 65088 KB
INFO [WatchDog] - [Clover] Initializing phase: 0
DEBUG [WatchDog] - initializing edges:
DEBUG [WatchDog] - all edges initialized successfully...
DEBUG [WatchDog] - initializing nodes:
DEBUG [WatchDog] - BROADCAST ...OK
DEBUG [WatchDog] - BROADCAST2 ...OK
DEBUG [WatchDog] - Opening input file /home/avackova/home/eclipse/cloverETL/examples/data/delimited/employees.dat
DEBUG [WatchDog] - Reading input file /home/avackova/home/eclipse/cloverETL/examples/data/delimited/employees.dat
DEBUG [WatchDog] - INPUT1 ...OK
DEBUG [WatchDog] - TRASH1 ...OK
DEBUG [WatchDog] - TRASH2 ...OK
DEBUG [WatchDog] - TRASH3 ...OK
INFO [WatchDog] - [Clover] phase: 0 initialized successfully.
INFO [WatchDog] - Starting up all nodes in phase [0]
DEBUG [WatchDog] - BROADCAST ... started
DEBUG [WatchDog] - BROADCAST2 ... started
DEBUG [WatchDog] - INPUT1 ... started
DEBUG [WatchDog] - TRASH1 ... started
DEBUG [WatchDog] - TRASH2 ... started
DEBUG [WatchDog] - TRASH3 ... started
INFO [WatchDog] - Sucessfully started all nodes in phase!
INFO [WatchDog] - Execution of phase [0] successfully finished - elapsed time(sec): 1
INFO [WatchDog] - ---------------------** Start of tracking Log for phase [0] **-------------------
INFO [WatchDog] - Time: 15/03/07 15:22:29
INFO [WatchDog] - Node Status Port #Records #KB Rec/s KB/s
INFO [WatchDog] - ----------------------------------------------------------------------------------
INFO [WatchDog] - BROADCAST FINISHED_O
INFO [WatchDog] - %CPU:-1.9 In:0 12 8 0 0
INFO [WatchDog] - Out:0 12 8 0 0
INFO [WatchDog] - Out:1 12 8 0 0
INFO [WatchDog] - Out:2 12 8 0 0
INFO [WatchDog] - BROADCAST2 FINISHED_O
INFO [WatchDog] - %CPU:-1.9 In:0 12 8 0 0
INFO [WatchDog] - Out:0 12 8 0 0
INFO [WatchDog] - INPUT1 FINISHED_O
INFO [WatchDog] - %CPU:-1.9 Out:0 12 8 0 0
INFO [WatchDog] - TRASH1 FINISHED_O
INFO [WatchDog] - %CPU:-1.9 In:0 12 8 0 0
INFO [WatchDog] - TRASH2 FINISHED_O
INFO [WatchDog] - %CPU:-1.9 In:0 12 8 0 0
INFO [WatchDog] - TRASH3 FINISHED_O
INFO [WatchDog] - %CPU:-1.9 In:0 12 8 0 0
INFO [WatchDog] - ---------------------------------** End of Log **--------------------------------
INFO [WatchDog] - Forcing garbage collection ...
INFO [WatchDog] - -----------------------** Summary of Phases execution **---------------------
INFO [WatchDog] - Phase# Finished Status RunTime(sec) MemoryAllocation(KB)
INFO [WatchDog] - 0 0 1 1007
INFO [WatchDog] - ------------------------------** End of Summary **---------------------------
INFO [main] - WatchDog thread finished - total execution time: 1 (sec)
INFO [main] - Graph execution finished successfully
Execution of graph successful !
Reformat component changes format and/or content of data records by applying “reformat” method on all data records flowing through it. For every record it receives through input port it calls transform() method of specified class (name defined through transformClass attribute of Reformat component), which has to implement org.jetel.component.RecordTransform interface. The transformation class has to be compiled separately prior to running Reformat component and it must reside in directory accessed by Java when searching for classes (i.e. either included in CLASSPATH or defined using ”-cp” parameter when starting java VM).
Executing:
Prepare reformat class:
javac -cp "../cloveretl.engine.jar" javaExamples/reformatOrders.java
Execute graph:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples" org.jetel.main.runGraph -plugins ../plugins graphOrdersReformat.grf
This example is similar to previous one except transformation class (its source code) is directly embedded into graph definition file. The source code is automatically compiled when the graph is started and then transformation object is created to be executed for each record.
When the graph is ran subsequently, the source is recompiled only if it changes between two runs.
In order to run successfully, environment variable $JAVA_HOME have to be set correctly.
Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples:$JAVA_HOME/lib/tools.jar" org.jetel.main.runGraph -plugins ../plugins graphOrdersReformatInline.grf
This example is similar to previous ones except transformation is given in special form. If you need simple transformation, when output fields are combinations of inputs fields and some parameters, this is the simplest way how to do it.
Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples:$JAVA_HOME/lib/tools.jar" org.jetel.main.runGraph -plugins ../plugins graphOrdersReformatUsingTransform.grf
This example is similar to previous ones except transformation is given in Etl transform language. In this language you can write complicated transformations in relatively easy way.
Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples" org.jetel.main.runGraph -plugins ../plugins graphOrdersTLReformat.grf
Merge component combines several sorted input data flows into one based on specified key. The resulting data flow is sorted according to specified key.
This component can be successfully used when you have several sorted data files and you need to combine them into one big sorted file, but you don't want to waste resources on resorting the big file.
The testing graph reads data form orders.sorted file (actually three readers access the file at the same time) and stores the result into orders.merged file.
Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples" org.jetel.main.runGraph -plugins ../plugins graphMergeData.grf
Filter component allows filtering any input data records based on specified filter condition. It can split flow of data into two. One with records which fulfill the condition and the other with those which don't.
Filter condition is composed of one or more simple comparisons <,>,<=,>=,==,!=. Individual comparisons can be combined using logical operators “AND” and “OR”. Parenthesis can be used to group comparisons. There are some logical functions (see description of Extended Filter Component) you can use too.
This graph is a bit more complex and illustrates usage of Filter and HashJoin components. It produces four output files:
Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples":$JAVA_HOME/lib/tools.jar"" org.jetel.main.runGraph -plugins ../plugins graphExtFilter.grf
Join component joins two or more sorted input data flows into one based on specified key. It can produce classical inner join or left outer join (based on parameters).
This component can be useful when you have main data set - for example with orders (driver) containing part number and you need to join it with other data set containing price for each part (slave).
The testing graph reads data form orders.dat file (driver) and employees.dat and produces records with combined information from both files.
Each pair of corresponding records is transformed by reformatJoinTest.class. The source code for this class is attached as well.
Executing:
Prepare reformat class:
javac -cp "../cloveretl.engine.jar" javaExamples/reformatJoinTest.java
Execute graph:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples" org.jetel.main.runGraph -plugins ../plugins graphJoinData.grf
Hash Join component joins two or more input data flows into one based on specified key. It can produce classical inner join or left outer join (based on parameters).
Functionality of this component is similar to previous one, but it doesn't require input data to be sorted. It is most suited to join data sets when one is relatively large (master) and the others are small (slaves) - typically when using lookup table to convert key into corresponding value.
The testing graph reads data form orders.dat file (driver) and employees.dat and produces records with combined information from both files.
Each pair of corresponding records is transformed by reformatJoinTest.class. The source code for this class is attached as well.
Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples" org.jetel.main.runGraph -plugins ../plugins graphJoinHash.grf
As with Reformat component, transformation class which is executed for input records can be embedded directly into graph. It is again dynamically compiled at run-time.
In this example you can see joining more then two input sets. Master data from orders.dat are joined with two slave inputs: first employees.dat due to key field EmployeeID and with the second, customers_delimited.txt, due to key field CustomerID. So in output record there are information from all three input files.
Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples:$JAVA_HOME/lib/tools.jar" org.jetel.main.runGraph -plugins ../plugins graphJoinHashInline.grf
As with Reformat component, transformation can be written in summary form.
Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples:$JAVA_HOME/lib/tools.jar" org.jetel.main.runGraph -plugins ../plugins graphJoinHashUsingTransform.grf
As with Reformat component, transformation can be written in Etl transform language.
Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples" org.jetel.main.runGraph -plugins ../plugins graphJoinHashUsingTransformLanguage.grf
Sort component sorts input data flow based on specified key. You can use more than one field name for key.
Sorted data flow is required by several components (Merge, Join, etc..) so this is very useful component.
The testing graph reads data form orders.dat file, sorts them and stores the result into orders.sorted file.
Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples" org.jetel.main.runGraph -plugins ../plugins graphSortData.grf
Similar to previous example, this one sorts input data. The difference is that the exact datafile to be sorted is not hardcoded within graph definition, it is determined based on supplied parameters.
CloverETL allows global graph properties to be defined and these properties can be reference from within the graph. Each reference is replaced by property value at run-time. In our example parameter's definitions are embedded in graph code (Executing: java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples" org.jetel.main.runGraph -plugins ../plugins graphSortUniversal.grf) but it is possible to define parameters in-line. In such case you can pass on parameters in executing command:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples" org.jetel.main.runGraph -plugins ../plugins -P:sortKey=<..your.fieldName..> -P:metadata=<..metadata filename..> -P:data=<..datafile..> -P:outputData=<..datafile..> graphSortUniversal.grf
like this
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples" org.jetel.main.runGraph -plugins ../plugins -P:sortKey="CompanyName" -P:metadata="metadata/delimited/customers.fmt" -P:data="data/delimited/customers_delimited.txt" -P:outputData="output/output.txt" graphSortUniversal.grf
Components can be grouped into individual phases. Each phase has assigned a unique number. In each transformation graph, there must exists at least one phase (with any number assigned to it). Execution of graph is done by phases. It starts with phase with the lowest number and continues in ascending order. All components in particular phase are initialized and run. Then components from next phase are started and so on. If there are any connections (Edges) between components from different phases, these connections are buffered so the first component (from preceding phase) can finish its job and the second component (from following - not necessarily next) can read the data produced by the fist component.
Example presented here is divided into three phases. First,data is read and sorted, in second,data is copied into three similar flows. In third, data is again merged into one flow and saved to file.
Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples" org.jetel.main.runGraph -plugins ../plugins graphPhasesDemo.grf
Partition component offers various ways of splitting input data flow into several output flows. The exact number of output flows depends on how many output ports are connected.
There are currently three partition algorithms implemented:
It is possible to write your own partition function. It has to implement org.jetel.component.partition.PartitionFunction and can be write as java source or in Etl transform language.
Executing example:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples" org.jetel.main.runGraph -plugins ../plugins graphPartition.grf
It is possible to call system command from CloverETL framework. System execute component can be standalone component or can work with records flowing form other component. Output from executing system command can be sent to other component or can be logged to file. If there is not given interpreter, “command” attribute is treated as simple system command with parameters, else “command” attribute is saved in temporary batch file and there is called interpreter to execute this script
Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples" org.jetel.main.runGraph -plugins ../plugins graphSystemExecute.grf
CloverETL can read and write data from/to xls files. In this example data are read from ORDERS.xls file, partitioned to 3 parts and each of thees parts are saved in different sheets in one of orders.partitioned0.xls, … , orders.partitioned7.xls files (see Writers description).
Important: 1.Because it is impossible to write to different sheets in the same time each writing component has to be in different phase. 2. jxl.jar (for older CloverETL poi-2.5.1.jar) - reading/writing to excel files library, has to be in execution path.
Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples:../lib/jxl.jar" org.jetel.main.runGraph -plugins ../plugins graphXLSReadWrite.grf
CloverETL can read data from xml files. In this example data are read from employees.xml file. “Mapping” attribute describes dependencies between xml nodes and metadata fields.
Important: tools.jar has to be in execution path.
Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples":$JAVA_HOME/lib/tools.jar"" org.jetel.main.runGraph -plugins ../plugins graphXMLExtract.grf
CloverETL can read data from xml files. In this example data are read from employees.xml file. “Mapping” attribute describes dependencies between xml nodes and metadata fields.
Important: tools.jar has to be in execution path.
Executing:
java -cp "../cloveretl.engine.jar:../lib/saxon8.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples":$JAVA_HOME/lib/tools.jar"" org.jetel.main.runGraph -plugins ../plugins graphXPathReader.grf
There are few components for reading data from flat file: Delimited Data Reader – reads delimited data, Fixlen Data Reader – reads data of fixed length and Universal Data Reader, which can read delimited or fixed length data depending on metadata on output port. There are others reader components for reading data in different formats: DBFDataReader – reads dBase tables, XMLExtract – reads xml files, XLSDataReader – reads excel files, Clover Data Reader – reads data saved in clover internal format. Most of thees readers can read data from “MultiFile” - it means that in fileURL attribute can be wildcards as * or ?. (For more information see Readers.) This is illustrated by following example:
Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples:$JAVA_HOME/lib/tools.jar" org.jetel.main.runGraph -plugins ../plugins graphCloverData.grf
It is possible to omit some data from flat file. For such purpose there is “shift” attribute on metadata, which indicates how many characters should be omitted from last field. Usage of this feature you can see in following example:
Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples" org.jetel.main.runGraph -plugins ../plugins graphGenerateData.grf
Fix lenght data reader and byte mode using. This graph demonstrates difference between byte mode and char mode for FixLenDataReader.
Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples"" org.jetel.main.runGraph -plugins ../plugins graphFixLenByteMode.grf
This file is an input file.
When experimenting with database, presented example graphs expect that there exists database table “EMPLOYEE” with following structure:
create table EMPLOYEE ( EMP_NO INTEGER, FIRST_NAME VARCHAR(80), LAST_NAME VARCHAR(80), PHONE_EXT VARCHAR(80), HIRE_DATE DATE, DEPT_NO VARCHAR(80), JOB_CODE VARCHAR(80), JOB_GRADE NUMERIC, JOB_COUNTRY VARCHAR(80), SALARY NUMERIC, FULL_NAME VARCHAR(80) )
If you wish to experiment with unloading data from this table, you can use populate_employee_table.sql file (supplied with these examples) to insert some records into the table. There is also Oracle specific file populate_employee_table.ora.
DBInputTable component can be used for extracting data from database. It allows a SQL query to be specified and run against database. The result set produced by DB is then sent by the component to output port - and gets propagated to other processing nodes. It uses JDBC to communicate with DB.
You may wonder why to use CloverETL for processing data which is already in DB. It should be faster to perform any processing within DB itself. Well, it is not always true.
DBUnload component (as all other DB related components) needs to know which JDBC driver to use, what is the DB connection string and under what user name it should connect to database. These parameters can be provided directly to component (one by one) or special config file can be created from which component is able to read required info.
Following is an example of config file which can be used to connect to Borland Interbase DB:
dbDriver=interbase.interclient.Driver dbURL=jdbc:interbase://localhost/home/projects/interbase/examples/database/employee.gdb user=SYSDBA password=masterkey
I hope that the parameters (their names) are self-explanatory. If not, read some JDBC related documentation.
In this example, the DBInputTable component connects to database and unloads all records from employee table. It then stores the records in employees.list.out file.
Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples" org.jetel.main.runGraph -plugins ../plugins graphDBUnload.grf
DBOutputTable component does the opposite to DBInputTable, it populates database table with data processed by CloverETL framework. Similarly to DBInputTable, it requires some JDBC related parameters to be supplied. In order to be able to run this example, you have to create a DB table with the name EMPLOYEE_TMP and structure similar to EMPLOYEE table presented above.
Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples" org.jetel.main.runGraph -plugins ../plugins graphDBLoad.grf
DBJoin component joins records from input port with records from database. Each pair of matching records is sent to transformation (can be written in four ways : external class, java source embedded in graph definition, summary transformation or in Transformation Language) to get output record. In this example for records from flat file are searched records from database where field lname on input has the same value as field last_name in database records.
Executing:
Prepare transform class:
javac -cp "../cloveretl.engine.jar" javaExamples/TransformForDBjoin.java
Execute graph:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples:$JAVA_HOME/lib/tools.jar" org.jetel.main.runGraph -plugins ../plugins graphDBJoin.grf
Graph graphDBLookup.grf has similar functionality as preceding graph, but you can see usage of Database Lookup Table.
Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples":$JAVA_HOME/lib/tools.jar"" org.jetel.main.runGraph -plugins ../plugins graphDBLookup.grf
This examples combines several components to perform useful transformation similar to one you could use in a real life.
Thees graphs combine information about orders made and information about individual items/products purchased within each order. It also adds information about which customer ordered the goods. To get all this information, it needs to join data from ORDERS.DBF (a dBase table) ODETAILS.DBF and Customers.txt. First two data sets are joined using MERGE_JOIN component, thus they have to be sorted first. Then information about customer is added using HASH_JOIN. As both joins require transformation code, two transformation classes are embedded directly into graph file and dynamically compiled at run-time. As there is a possibility that data set containing info about orders references customer which is not in Customers.txt file, second join is defined to be a left join. At the end, EXT_FILTER is used to split data to two sets - one with customer info complete and the other with missing.
Note:In order to run successfully,”tools.jar” library (present in $JAVA_HOME/lib/) has to be part of CLASSPATH.
Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples:$JAVA_HOME/lib/tools.jar" org.jetel.main.runGraph -plugins ../plugins graphDBFJoin.grf
Graph graphDBFJoinTL.grf is very similar to preceding one, but with transformation written in Etl transformation language
.
Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples" org.jetel.main.runGraph -plugins ../plugins graphDBFJoinTL.grf
This graph reads personal data from two sources and finds records with the same values on corresponding fields (lname – last_name, fname – first_name). Matching pairs are transformed into one output record and saved in intersection_customer_employee.txt file.
Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples:$JAVA_HOME/lib/tools.jar" org.jetel.main.runGraph -plugins ../plugins graphIntersectData.grf
Approximative join component is used for joining data which are similar on given fields. It requires data to be prepared: it joins data from two data flows with the same value of matching key and similar value of join key. In this example data are read from from file customers0.dat and from database table employee; then for both flows there is generated matching key consisting of 4 letters of last name (flat file: lname, database: last_name) and 3 letters of first name (flat file: fname, database: first_name) . Approximative join component joins data from thees flows (sorted due the matching key); it compares records with the same matching key only, but to conforming output sends only thees, which join key is similar enough (conformity attribute), so among conforming records you can found such record: 4 Damstra Robert Damstra Roberta 0.875 - with conformity 0.875 (conformity equals 1 means that records are identical).
This graph illustrates usage of CustomizedRecordTransform class too. This class extend abstract class DataRecordTransform and allows to create complex transformation in relatively easy way.
Executing:
Prepare transform class:
javac -cp "../cloveretl.engine.jar:../lib/commons-logging.jar" javaExamples/customizedTransformExample.java
Execute graph:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples:$JAVA_HOME/lib/tools.jar" org.jetel.main.runGraph -plugins ../plugins graphAproximativeJoin.grf
This is practical illustration of CloverETL usage . This graph transforms client's data for bank's branch . On inputs are clients, monthly interests, service charges and half-year bonuses. On outputs we have aggregated revenues by clients, clients without revenues and clients from database, who are not in input file.
Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples:$JAVA_HOME/lib/tools.jar" org.jetel.main.runGraph -plugins ../plugins graphRevenues.grf
In Type 2 Slowly Changing Dimension, a new record is added to the table to represent the new information. Therefore, both the original and the new record will be present. The new record gets its own synthetic primary key.
In our example, we originally have dimension of exchange rates:
exchange_right_id|Country|Currency|Amount|Code|Valid_to|Valid_from|2731|Australia|dollar|1|AUD||07-18-2007
After new exchange rates arrived we add new record to the dimension and update field „Valid_to“ for old record. Matching key is composed from these fields: Country, Currency, Amount, Code. For better unerstanding we have four files in our example:
Advantages:
Disadvantages:
When to use Type 2:
Type 2 slowly changing dimension should be used when it is necessary for the data warehouse to track historical changes.
Executing:
java -cp "../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples:$JAVA_HOME/lib/tools.jar" org.jetel.main.runGraph -plugins ../plugins SCDType2.grf
| graphAggregateSorted.grf | This graph illustrates usage of Aggregate component. |
| graphAggregateUnsorted.grf | This graph illustrates usage of Aggregate component. |
| graphAproximativeJoin.grf | In this example data are read from from file customers0.dat and from database table employee; then for both flows there is generated matching key consisting of 4 letters of last name (flat file: lname, database: last_name) and 3 letters of first name (flat file: fname, database: first_name) . Approximative join component joins data from thees flows (sorted due the matching key); it compares records with the same matching key only, but to conforming output sends only thees, which join key is similar enough (conformity attribute), so among conforming records you can found such record: “4 Damstra Robert Damstra Roberta 0.875” - with conformity 0.875 (conformity equals 1 means that records are identical). This graph illustrates usage of CustomizedRecordTransform class too. This class extend abstract class DataRecordTransform and allows to create complex transformation in relatively easy way. |
| graphCheckForeignKey.grf | This graph shows usage of CheckForeignKey component. Data are read from table customers and City and State fields are checked against the table of known cities. If city is not found between known cities null is substituted on requested fields. On output are printed customers, for which there wasn't found city and state in city table. |
| graphCloverData.grf | This graph illustrates usage of CloverDataReader/Writer components. This components reads/writes data in clover internal format and can be used for moving data between different graphs. Reading/writing data in clover format is faster then in any other format. This graph shows also how works StructureWriter component (it can be used for writing records and some additional information eg. in xml format). |
| graphDBExecute.grf | This graph illustrates how-to run SQL/DDL/DML commands against database. Commands in SQLCode are delimited by semicolon and are executed one by one. If needed, they may be grouped into single transaction. This graph creates two tables (EMPLOYEE and EMPLOYEE_TMP) and populates table EMPLOYEE with sample data. |
| graphDBFJoin.grf | This graph illustrates complex transformation. It reads data about orders made and combines them (joins) with data about which items were purchased. It then adds information about which company made particular order/bought particular item. It illustrates usage of DBFDataReader,Sort,MergeJoin,HashJoin,Filter. It also shows how to embed data transformation routine directly into transformation graph layout file (both joins have the transformations embedded). Note: It is faster to reference record's fields by index as opposite to names – used in this example. |
| graphDBFJoinTL.grf | This graph illustrates complex transformation like preceding example, but transformations are written in Transformation Language. |
| graphDBJoin.grf | This graph shows usage of DBJoin component: data are read from flat file and joined with the records from database due to conforming key field (lname in records read from file, last_name in records read from database). |
| graphDBFLoad.grf | This graph illustrates reading data from DBF file. |
| graphDBLoad.grf | This graph illustrates how to propagate data to database. |
| graphDBLoad5.grf | This graph illustrates usage of dbFields and cloverFields parameters of DB_OUTPUT_TABLE component. Parameter dbFields allows to specify that only certain fields of target DB table will be populated. Parameter cloverFields allows to specify which input/Clover fields are used for populating target fields. The final mapping is determined by the order in which Clover fields and DB fields appear in the parameters or can be set directly. Note: parameter batchMode allows for grouping of inserts into DB thus increasing throughput. It works only with DBs/JDBC drivers which support this feature (e.g. Oracle). |
| graphDBLoad6.grf | This graph illustrates how to run hand-written SQL/DML against DB. Parameters (question-marks) are substituted by input fields (in order they are listed in cloverFields). Also target parameter types are devised from input field types - this may cause some problems when trying to map incompatible data types. |
| graphDBLookup.grf | This graph illustrates how to join data using reformat component. graphLookupJoin.grf shows how to do similar transformation in easier way. |
| graphDBRead.grf | This graph reads personal data from three sources and finds records with the same values on corresponding fields (lname – last_name, fname – first_name). Matching pairs are transformed into one output record and saved in intersection_customer_employee.txt file. |
| graphDBUnload.grf | In this example, the DBInputTable component connects to database and unloads all records from employee table. It then stores the records in employees.list.out file. |
| graphDBUnloadParametrized.grf | This graph illustrates how to read not all records from database. This example is similar to graphDBJoin.grf, but found records from database are not sent to transformation function, but directly to output port. For illustration in the next phase are read all data from the same table. |
| graphDBUnloadUniversal.grf | This graph illustrates usage of dynamic metadata generation based on SQL query. It allows unloading of database data without prior assembly of DB table metadata. Dynamic metadata can be for example used for easy migration of data between two different databases when one universal graph with parameters specifying which table should be unload/migrated can be used. This graph requires db_table parameter to be defined. When running this graph, use command like this one: java -cp ”../cloveretl.engine.jar:../lib/commons-logging.jar:../lib/log4j-1.2.12.zip:../lib/javolution.jar:javaExamples” org.jetel.main.runGraph -plugins ../plugins-P:db_table=Employee graphDBUnloadUniversal.grf Note: where clause 1=0 in SQL query in dynamic metadata definition is there only for speeding up query processing - no real data is needed, only metadata describing the result - thus we specify condition which ensures that no data should really be returned - it is up to database whether it will optimize the execution plan based on this. |
| graphDenormalizeInline.grf | This graph demonstrates functionality of DENORMALIZER component with denormalization specified by inline Java source. |
| graphDenormalizeTL.grf | This graph demonstrates functionality of DENORMALIZER component with denormalization specified by inline Transformation Language source. |
| graphGenerateData.grf | This graph illustrates usage of Data Generator, Reader, Writer components. It also shows how to use shift attribute on metadata. |
| graphExtFilter.grf | This illustrates usage of Filter and HashJoin components. |
| graphExtFilter2.grf | This graph shows some more functionality of EXT_FILTER component. Filters all employees younger than 41 years. The current age is calculated as today's date minus date of birth. Notice that instead of classical comparison operators ==,!=, …etc… one can use their textual abbreviations .eq. .ne. .lt. .gt. …etc… |
| graphFilter.grf | This graph demonstrates functionality of Filter component. It can filter on text, date, integer, numeric fields with comparison [<, >, ==, <=, >=, !=]. Text fields can also be compared to a Java regexp using ~ (tilda) operator. A filter can be made of different parts separated by a ”;” (semicolon). If one of the parts is verified, the record passes the filter (it's an OR combination of the parts, and AND can be achieved by several filters cascaded). Date format used for comparison depends on input field's format – e.g. if input field HireDate has defined date format “yyyy-MM-dd”, then the same format must be used in filter when specifying date constant. This graph produces two output files: First with all employees who were hired before 1993-12-31. Second with all employees who have in their memo that they are “fluent” in some language Note: some of the comparison operators have to be XML encoded - e.g. ”>” as ”.gt.” etc. |
| graphFixLenByteMode.grf | This graph demonstrates difference between byte mode and char mode for FixLenDataReader. |
| graphIntersectData.grf | This graph demonstrates functionality of DATA_INTERSECT component - performs joining of two sorted (based on specified key) data flows (A and B) and outputs: 1. to port 0 records found only in flow A 2. to port 1 records found in A & B 3. to port 2 records found only in flow B . Records both in flow A and B should be unique |
| graphJms.grf | This graph illustrates usage of JMS Reader/Writer Components. It requires ACtiveMQ JMS server to be running on localhost! |
| graphJoinData.grf | This graph demonstrates functionality of SORT and MERGE_JOIN components. MERGE_JOIN component performs joining of master data with slave data based on specified key. It requires both master and slave data to be sorted according to used key. |
| graphJoinHash.grf | This graph demonstrates functionality of HASH_JOIN components. HASH_JOIN component performs joining of master data with slave data based on specified key. First all slave data (from all slave ports) are read and hash-tables are constructed from them. Then for every master record, attempt is made to find corresponding slave-records. HASH_JOIN performs better than SORT/MERGE_JOIN when you have relatively small slave data set and big master data set. It doesn't require master and slave data to be sorted. |
| graphJoinHashInline.grf | This graph demonstrates how to use HASH_JOIN component for 3way join. HASH_JOIN component performs joining of master data with slave data based on specified key. First all slave data are read and hash-tables are constructed from them. Then for every master record, attempt is made to find corresponding slave-records. |
| graphJoinHashUsingTransform.grf | This graph demonstrates functionality of HASH_JOIN components. Transformation function is written in Transformation Language Lite. |
| graphJoinHashUsingTransformLanguage.grf | This graph demonstrates functionality of HASH_JOIN components. Transformation function is written in Transformation Language, |
| graphJoinMergeInline.grf | This graph illustrates usage of MERGE component. It merges data based on specified key. It can merge master flow with a few slave flows. |
| graphLdapReader_Uninett.grf | Reads information from LDAP directory. |
| graphLdapReaderWriter.grf | This graph illustrates usage of LDAP Reader/Writer components. It requires LDAP server properly installed and configured. |
| graphLookupJoin.grf | This graph joins data from file with data from lookup table. |
| graphLookupReader.grf | This graph illustrates usage of in-memory lookup table. |
| graphMergeData.grf | This graph illustrates usage of MERGE component. It merges data based on specified key. |
| graphMysqlWriter.grf | This graph illustrates usage of MySQL Writer Component. |
| graphNormalizeInline.grf | This graph demonstrates functionality of NORMALIZER component with normalization specified by in-line Java source. |
| graphNormalizeTL.grf | This graph demonstrates functionality of NORMALIZER component with normalization specified by in-line Transformation Language source. |
| graphOrdersReformat.grf | This example illustrates usage of Reformat component. Source data contains info about orders customers made. We add artificial key - ProductID which is a sequence of integers. We also combine individual fields of ship-to-address (address,city,country) into one single field. Reformat component calls reformatOrders class which implements RecordTransform interface (required by Reformat component). The directory in which recormatOrders class exists must be part of the Java classpath |
| graphOrdersReformatExternTransform.grf | This example illustrates usage of Reformat component with transformation code written in extern file. The source code is automatically compiled at runtime. For successful compilation, tools.jar library must be part of CLASSPATH. |
| graphOrdersReformatInline.grf | This example illustrates usage of Reformat component with transformation code (class) inlined (saved with the graph itself). The source code is automatically compiled at runtime. For successful compilation, tools.jar library must be part of CLASSPATH. |
| graphOrdersReformatUsingTransform.grf | This example illustrates usage of Reformat component with transformation written in Transformation Language Lite. |
| graphOrdersTLReformat.grf | This example illustrates usage of Reformat component with transformation written in Transformation Language. |
| graphParametrizedLookup.grf | This graph demonstrates usage of Node-level parameters. |
| graphPartition.grf | This graph demonstrates functionality of Partition component. This component can be used to split input data flow into several (based on number of connected output ports) output data flows. |
| graphPhasesDemo.grf | This graph illustrates usage of several components in different phases. First, all components from phase 0 are run, then the execution continues with phase 1. Data which crosses phase boundary is automatically buffered so the data producing node can finish its work and the data consuming node can later start reading the data. |
| graphRangeLookup.grf | This graph ilustrates usage of range lookup table. During phase 0 lookup table is read from xls file to memory and in the next phase for data from flat file are looked up records from lookup table. |
| SCDType2.grf | Graph with SCD transformation (see Slowly Changing Dimension (SCD) example). |
| graphRevenues.grf | This is practical illustration of usage CloverETL. This graph transforms data for branch bank. On inputs are clients, month interests and charges and half-year bonus. On outputs we have aggregated revenues by clients, clients without revenues and clients from database, who are not in input file. |
| graphSequence.grf | This example extends graphOrdersReformat.grf in a sense that the counter used for generating PRODUCTID is taken from Sequence object which is persistent between graph executions – i.e. you get continuous sequence of unique numbers. For successful compilation, tools.jar library must be part of CLASSPATH. |
| graphSequenceCheckData.grf | This graph illustrates usage of Sequence Checker Component. |
| graphSimpleCopy.grf | This graph demonstrates functionality of SimpleCopy component. Everything brought to SimpleCopy on port 0 is duplicated onto all connected output ports. It also shows functionality of Trash Component. It discards everything which is sent into it. Its purpose is debugging - shows how many records ended in it and can print incoming records, if desired (option debugPrint) |
| graphSimpleCopyEmbeddedMetadata.grf | This graph demonstrates functionality of SimpleCopy component. Everything brought to SimpleCopy on port 0 is duplicated onto all connected output ports. It also shows functionality of Trash Component. It discards everything which is sent into it. Its purpose is debugging - shows how many records ended in it and can print incoming records, if desired (option debugPrint). Metadata in this graph is embedded directly to graph instead of external file. |
| graphSimpleCopyLocale.grf | This graph demonstrates functionality of SimpleCopy component. Everything brought to SimpleCopy on port 0 is duplicated onto all connected output ports. It also shows functionality of Trash Component. It discards everything which is sent into it. Its purpose is debugging - shows how many records ended in it and can print incoming records, if desired (option debugPrint). Metadata in this graph is embedded directly to graph instead of external file. Output metadata is the same as input except that for BirthDate and HireDate, locale property is defined. It allows to specify language or language.Country locale which is used when formatting output. In this example, date info on input is expected to be in dd/MM/yyyy format, whereas on output it should follow US/UK standard → (usually defined as MM/dd/YYYY). |
| graphSimpleLookup.grf | This graph illustrates how to join data using reformat component. |
| graphSortData.grf | This graph illustrates usage of Sort component. It sorts data based on specified key. |
| graphSortUniversal.grf | This graph illustrates usage of Sort component. It sorts data based on specified key. It also shows how to use parameters/global properties within graph. For executing see Sort Universal example. |
| graphSystemExecute.grf | This graph ilustrates usage of System Execute Component |
| graphViewData.grf | This graph is a simple utility which demonstrates usage of parameters. It allows viewing content of particular data file (in delimited format). It reads in data and stores them in debug format. |
| graphXLSReadWrite.grf | In this example data are read from ORDERS.xls file, partitioned to 3 parts and each of thees parts are saved in orders.partitioned$.xls “multifile” in different sheets. (See Different ways of reading files example.) |
| graphXMLExtract.grf | In this example data are read from employees.xml file. “Mapping” attribute describes dependencies between xml nodes and metadata fields. |