CloverETL examples

To download example package go to download page.

We assume that examples are unzipped to the CloverETL home directory. When executing examples from command line you should be in the examples/SimpleExamples subdirectory (eg. /home/user/cloverETL/examples/SimpleExamples).

(Older examples are not divided into four subdirectories and you should run them within their examples directory. Thus, in the commands the groups of caracters (../../) should be replaced by ../ only.)

Remember that (on Windows) you should use semicolons instead of colons in commands shown below.

Simple Examples

SimpleCopy example

This example illustrates usage of SimpleCopy and Trash components. It reads employees.dat file, parses the records, sends them to SimpleCopy component which in turn replicates them to several Trash components connected to SimpleCopy component. One of the Trash components is configured to dump all incoming records into text file Trash1_output.txt - mostly for debugging purpose. The Trash component normally discards all records it receives on input port.

Executing:
java -cp "../../lib/cloveretl.engine.jar:../../lib/commons-logging.jar:../../lib/log4j-1.2.12.jar:../../lib/javolution.jar:trans" org.jetel.main.runGraph -plugins ../../plugins graph/graphSimpleCopy.grf

Or you can use proper script:


This is the expected output produced by the framework:

***  CloverETL framework/transformation graph runner ver 2.1, (c) 2002-06 D.Pavlis, released under GNU Lesser General Public License  ***
 Running with framework version: 2.1 build#385 compiled 14/02/2007 18:12:57

DEBUG [main] - Plugin org.jetel.lookup loaded.
        id - org.jetel.lookup
        version - 2.1.0
        provider-name - null
                point-id  - lookup - {className=org.jetel.lookup.SimpleLookupTable, type=simpleLookup}
                point-id  - lookup - {className=org.jetel.lookup.DBLookupTable, type=dbLookup}

DEBUG [main] - Plugin org.jetel.sequence loaded.
        id - org.jetel.sequence
        version - 2.1.0
        provider-name - null
                point-id  - sequence - {className=org.jetel.sequence.SimpleSequence, type=SIMPLE_SEQUENCE}
                point-id  - sequence - {className=org.jetel.sequence.PrimitiveSequence, type=PRIMITIVE_SEQUENCE}

DEBUG [main] - Plugin org.jetel.thirdparty loaded.
        id - org.jetel.thirdparty
        version - 2.1.0
        provider-name - null
                point-id  - component - {type=XML_EXTRACT, className=org.jetel.component.XMLExtract}
                point-id  - component - {type=FILTER, className=org.jetel.component.Filter}
                point-id  - component - {type=LDAP_READER, className=com.linagora.component.LdapReader}
                point-id  - component - {type=LDAP_WRITER, className=com.linagora.component.LdapWriter}

DEBUG [main] - Plugin org.jetel.connection loaded.
        id - org.jetel.connection
        version - 2.1.0
        provider-name - null
                point-id  - connection - {className=org.jetel.connection.DBConnection, type=JDBC}
                point-id  - connection - {className=org.jetel.connection.JmsConnection, type=JMS}

DEBUG [main] - Plugin org.jetel.component loaded.
        id - org.jetel.component
        version - 2.1.0
        provider-name - null
                point-id  - component - {type=DATA_READER, className=org.jetel.component.DataReader}
                point-id  - component - {type=DATA_WRITER, className=org.jetel.component.DataWriter}
                point-id  - component - {type=DELIMITED_DATA_READER, className=org.jetel.component.DelimitedDataReader}
                point-id  - component - {type=DELIMITED_DATA_WRITER, className=org.jetel.component.DelimitedDataWriter}
                point-id  - component - {type=SIMPLE_COPY, className=org.jetel.component.SimpleCopy}
                point-id  - component - {type=CONCATENATE, className=org.jetel.component.Concatenate}
                point-id  - component - {type=SIMPLE_GATHER, className=org.jetel.component.SimpleGather}
                point-id  - component - {type=REFORMAT, className=org.jetel.component.Reformat}
                point-id  - component - {type=DB_INPUT_TABLE, className=org.jetel.component.DBInputTable}
                point-id  - component - {type=SORT, className=org.jetel.component.Sort}
                point-id  - component - {type=DB_OUTPUT_TABLE, className=org.jetel.component.DBOutputTable}
                point-id  - component - {type=FIXLEN_DATA_WRITER, className=org.jetel.component.FixLenDataWriter}
                point-id  - component - {type=DEDUP, className=org.jetel.component.Dedup}
                point-id  - component - {type=FIXLEN_DATA_READER, className=org.jetel.component.FixLenDataReader}
                point-id  - component - {type=MERGE, className=org.jetel.component.Merge}
                point-id  - component - {type=MERGE_JOIN, className=org.jetel.component.MergeJoin}
                point-id  - component - {type=EXT_MERGE_JOIN, className=org.jetel.component.MergeJoin}
                point-id  - component - {type=SORTED_JOIN, className=org.jetel.component.MergeJoin}
                point-id  - component - {type=TRASH, className=org.jetel.component.Trash}
                point-id  - component - {type=DB_EXECUTE, className=org.jetel.component.DBExecute}
                point-id  - component - {type=HASH_JOIN, className=org.jetel.component.HashJoin}
                point-id  - component - {type=EXT_HASH_JOIN, className=org.jetel.component.HashJoin}
                point-id  - component - {type=CHECK_FOREIGN_KEY, className=org.jetel.component.CheckForeignKey}
                point-id  - component - {type=DBF_DATA_READER, className=org.jetel.component.DBFDataReader}
                point-id  - component - {type=EXT_FILTER, className=org.jetel.component.ExtFilter}
                point-id  - component - {type=EXT_SORT, className=org.jetel.component.ExtSort}
                point-id  - component - {type=PARTITION, className=org.jetel.component.Partition}
                point-id  - component - {type=DATA_INTERSECTION, className=org.jetel.component.DataIntersection}
                point-id  - component - {type=AGGREGATE, className=org.jetel.component.Aggregate}
                point-id  - component - {type=SYS_EXECUTE, className=org.jetel.component.SystemExecute}
                point-id  - component - {type=KEY_GEN, className=org.jetel.component.KeyGenerator}
                point-id  - component - {type=APROX_MERGE_JOIN, className=org.jetel.component.AproxMergeJoin}
                point-id  - component - {type=DBJOIN, className=org.jetel.component.DBJoin}
                point-id  - component - {type=XLS_READER, className=org.jetel.component.XLSReader}
                point-id  - component - {type=XLS_WRITER, className=org.jetel.component.XLSWriter}
                point-id  - component - {type=CLOVER_WRITER, className=org.jetel.component.CloverDataWriter}
                point-id  - component - {type=CLOVER_READER, className=org.jetel.component.CloverDataReader}
                point-id  - component - {className=org.jetel.component.MysqlDataReader, type=MYSQL_DATA_READER}
                point-id  - component - {className=org.jetel.component.MysqlDataWriter, type=MYSQL_DATA_WRITER}
                point-id  - component - {className=org.jetel.component.StructureWriter, type=STRUCTURE_WRITER}
                point-id  - component - {className=org.jetel.component.Normalizer, type=NORMALIZER}
                point-id  - component - {className=org.jetel.component.Denormalizer, type=DENORMALIZER}
                point-id  - component - {className=org.jetel.component.JmsReader, type=JMS_READER}
                point-id  - component - {className=org.jetel.component.JmsWriter, type=JMS_WRITER}
                point-id  - component - {className=org.jetel.component.LookupJoin, type=LOOKUP_JOIN}
                point-id  - component - {className=org.jetel.component.LookupTableReaderWriter, type=LOOKUP_TABLE_READER_WRITER}
                point-id  - component - {className=org.jetel.component.DataGenerator, type=DATA_GENERATOR}
                point-id  - component - {className=org.jetel.component.MultiLevelReader, type=MULTI_LEVEL_READER}
                point-id  - component - {className=org.jetel.component.SequenceChecker, type=SEQUENCE_CHECKER}

DEBUG [main] - Plugin cz.opensys.oracle loaded.
        id - cz.opensys.oracle
        version - 2.1.0
        provider-name -
                point-id  - component - {className=cz.opensys.oracle.component.OracleDataReader, type=ORACLE_DATA_READER}
                point-id  - component - {className=cz.opensys.oracle.component.OracleDataWriter, type=ORACLE_DATA_WRITER}

Graph definition file: graphSimpleCopy.grf
INFO  [main] - Checking graph configuration...
DEBUG [main] - Opening input file /home/avackova/home/eclipse/cloverETL/examples/data/delimited/employees.dat
DEBUG [main] - Reading input file /home/avackova/home/eclipse/cloverETL/examples/data/delimited/employees.dat
INFO  [main] - Starting WatchDog thread ...
INFO  [WatchDog] - Thread started.
INFO  [WatchDog] - Running on 1 CPU(s) max available memory for JVM 65088 KB
INFO  [WatchDog] - [Clover] Initializing phase: 0
DEBUG [WatchDog] -  initializing edges:
DEBUG [WatchDog] -  all edges initialized successfully...
DEBUG [WatchDog] -  initializing nodes:
DEBUG [WatchDog] -      BROADCAST ...OK
DEBUG [WatchDog] -      BROADCAST2 ...OK
DEBUG [WatchDog] - Opening input file /home/avackova/home/eclipse/cloverETL/examples/data/delimited/employees.dat
DEBUG [WatchDog] - Reading input file /home/avackova/home/eclipse/cloverETL/examples/data/delimited/employees.dat
DEBUG [WatchDog] -      INPUT1 ...OK
DEBUG [WatchDog] -      TRASH1 ...OK
DEBUG [WatchDog] -      TRASH2 ...OK
DEBUG [WatchDog] -      TRASH3 ...OK
INFO  [WatchDog] - [Clover] phase: 0 initialized successfully.
INFO  [WatchDog] - Starting up all nodes in phase [0]
DEBUG [WatchDog] - BROADCAST ... started
DEBUG [WatchDog] - BROADCAST2 ... started
DEBUG [WatchDog] - INPUT1 ... started
DEBUG [WatchDog] - TRASH1 ... started
DEBUG [WatchDog] - TRASH2 ... started
DEBUG [WatchDog] - TRASH3 ... started
INFO  [WatchDog] - Sucessfully started all nodes in phase!
INFO  [WatchDog] - Execution of phase [0] successfully finished - elapsed time(sec): 1
INFO  [WatchDog] - ---------------------** Start of tracking Log for phase [0] **-------------------
INFO  [WatchDog] - Time: 15/03/07 15:22:29
INFO  [WatchDog] - Node                   Status     Port      #Records         #KB  Rec/s   KB/s
INFO  [WatchDog] - ----------------------------------------------------------------------------------
INFO  [WatchDog] - BROADCAST              FINISHED_O
INFO  [WatchDog] - %CPU:-1.9                          In:0           12           8      0       0
INFO  [WatchDog] -                                   Out:0           12           8      0       0
INFO  [WatchDog] -                                   Out:1           12           8      0       0
INFO  [WatchDog] -                                   Out:2           12           8      0       0
INFO  [WatchDog] - BROADCAST2             FINISHED_O
INFO  [WatchDog] - %CPU:-1.9                          In:0           12           8      0       0
INFO  [WatchDog] -                                   Out:0           12           8      0       0
INFO  [WatchDog] - INPUT1                 FINISHED_O
INFO  [WatchDog] - %CPU:-1.9                         Out:0           12           8      0       0
INFO  [WatchDog] - TRASH1                 FINISHED_O
INFO  [WatchDog] - %CPU:-1.9                          In:0           12           8      0       0
INFO  [WatchDog] - TRASH2                 FINISHED_O
INFO  [WatchDog] - %CPU:-1.9                          In:0           12           8      0       0
INFO  [WatchDog] - TRASH3                 FINISHED_O
INFO  [WatchDog] - %CPU:-1.9                          In:0           12           8      0       0
INFO  [WatchDog] - ---------------------------------** End of Log **--------------------------------
INFO  [WatchDog] - Forcing garbage collection ...
INFO  [WatchDog] - -----------------------** Summary of Phases execution **---------------------
INFO  [WatchDog] - Phase#            Finished Status         RunTime(sec)    MemoryAllocation(KB)
INFO  [WatchDog] - 0                 0                                  1              1007
INFO  [WatchDog] - ------------------------------** End of Summary **---------------------------
INFO  [main] - WatchDog thread finished - total execution time: 1 (sec)
INFO  [main] - Graph execution finished successfully
Execution of graph successful !

Reformat example

Reformat component changes format and/or content of data records by applying “reformat” method on all data records flowing through it. For every record it receives through input port it calls transform() method of specified class (name defined through transformClass attribute of Reformat component), which has to implement org.jetel.component.RecordTransform interface. The transformation class has to be compiled separately prior to running Reformat component and it must reside in directory accessed by Java when searching for classes (i.e. either included in CLASSPATH or defined using ”-cp” parameter when starting java VM).

Executing:
Prepare reformat class:
javac -cp "../../lib/cloveretl.engine.jar" trans/reformatOrders.java
Execute graph:
java -cp "../../lib/cloveretl.engine.jar:../../lib/commons-logging.jar:../../lib/log4j-1.2.12.jar:../../lib/javolution.jar:trans" org.jetel.main.runGraph -plugins ../../plugins graph/graphOrdersReformat.grf

Reformat example with transformation class source embedded

This example is similar to previous one except transformation class (its source code) is directly embedded into graph definition file. The source code is automatically compiled when the graph is started and then transformation object is created to be executed for each record.
When the graph is ran subsequently, the source is recompiled only if it changes between two runs.
In order to run successfully, environment variable $JAVA_HOME have to be set correctly.

Executing:
java -cp "../../lib/cloveretl.engine.jar:../../lib/commons-logging.jar:../../lib/log4j-1.2.12.jar:../../lib/javolution.jar:trans:../../lib/janino.jar" org.jetel.main.runGraph -plugins ../../plugins graph/graphOrdersReformatInline.grf

Reformat example with Transformation Language Lite

This example is similar to previous ones except transformation is given in special form. If you need simple transformation, when output fields are combinations of inputs fields and some parameters, this is the simplest way how to do it.

Executing:
java -cp "../../lib/cloveretl.engine.jar:../../lib/commons-logging.jar:../../lib/log4j-1.2.12.jar:../../lib/javolution.jar:trans:../../lib/janino.jar" org.jetel.main.runGraph -plugins ../../plugins graph/graphOrdersReformatUsingTransform.grf

Reformat example with Transformation Language

This example is similar to previous ones except transformation is given in Etl transform language. In this language you can write complicated transformations in relatively easy way.

Executing:
java -cp "../../lib/cloveretl.engine.jar:../../lib/commons-logging.jar:../../lib/log4j-1.2.12.jar:../../lib/javolution.jar:trans" org.jetel.main.runGraph -plugins ../../plugins graph/graphOrdersTLReformat.grf

Merge example

Merge component combines several sorted input data flows into one based on specified key. The resulting data flow is sorted according to specified key.
This component can be successfully used when you have several sorted data files and you need to combine them into one big sorted file, but you don't want to waste resources on resorting the big file.
The testing graph reads data form orders.sorted file (actually three readers access the file at the same time) and stores the result into orders.merged file.

Executing:
java -cp "../../lib/cloveretl.engine.jar:../../lib/commons-logging.jar:../../lib/log4j-1.2.12.jar:../../lib/javolution.jar:trans" org.jetel.main.runGraph -plugins ../../plugins graph/graphMergeData.grf

Filter example

Filter component allows filtering any input data records based on specified filter condition. It can split flow of data into two. One with records which fulfill the condition and the other with those which don't.
Filter condition is composed of one or more simple comparisons <,>,<=,>=,==,!=. Individual comparisons can be combined using logical operators “AND” and “OR”. Parenthesis can be used to group comparisons. There are some logical functions (see description of Extended Filter Component) you can use too.
This graph is a bit more complex and illustrates usage of Filter and HashJoin components. It produces four output files:

  • all employees who were hired before 1993-12-31
  • all employees who have in their memo that they are “fluent” in some language
  • all employees who have in their memo that they are “fluent” in some language and were hired before 1993-12-31
  • all employees who were hired after 1993-12-31

Executing:
java -cp "../../lib/cloveretl.engine.jar:../../lib/commons-logging.jar:../../lib/log4j-1.2.12.jar:../../lib/javolution.jar:trans:../../lib/janino.jar" org.jetel.main.runGraph -plugins ../../plugins graph/graphExtFilter.grf

JMS transfer of XMl file and XPath parser example

since 2.4

Unlike the other examples on this page, this example must be run within the examples/ExtExamples directory. This graph shows:

  • in phase 0 shows how to use JMS Reader/Writer for transfer one single field containing XML
  • in phase 1 shows how to parse this XML with XPathParser and create complex DataRecords

Nodes description

Input

Input reads XML from file data-in/other/customers.xml. Puts one record with one field to output port. Xml for this example contains data with one customer element and one related order element.

JmsWriter

JmsWriter sends XML as bodyField of single message to specified connection.

JmsReader

JmsReader reads message from specified connection. Creates one record with one field from message's bodyField. Puts created record to output port.

XPathTransform (Reformat)

XPathTransform contains inline java code which uses XPathParser to apply specified XPath mappings on input XML. XPath mappings are specified like this:

<Context xpath="/records/customer" outPort="0">
	<Context xpath="/records/customer/order" outPort="1" />
</Context>
This means that record from content of element customer will be created and put on port 0 and record from its child elemennt order will be created and put on port 1 Both ports should have connected edges with proper metadata.

OutCustomer

OutCustomer writes customer record into file data-out/customers.out

OutOrder

OutCustomer writes order record into file data-out/orders.out

Execution

It's necessary to have accessible JMS server and JMS connection parameters set properly. All libraries required by your JMS connector implementation have to be in classpath or in libraries attribute of JMS connection.

Join example

Join component joins two or more sorted input data flows into one based on specified key. It can produce classical inner join or left outer join (based on parameters).
This component can be useful when you have main data set - for example with orders (driver) containing part number and you need to join it with other data set containing price for each part (slave).
The testing graph reads data form orders.dat file (driver) and employees.dat and produces records with combined information from both files.
Each pair of corresponding records is transformed by reformatJoinTest.class. The source code for this class is attached as well.

Executing:
Prepare reformat class:
javac -cp "../../lib/cloveretl.engine.jar" trans/reformatJoinTest.java

Execute graph:
java -cp "../lib/cloveretl.engine.jar:../../lib/commons-logging.jar:../../lib/log4j-1.2.12.jar:../../lib/javolution.jar:trans" org.jetel.main.runGraph -plugins ../../plugins graph/graphJoinData.grf

Hash Join example

Hash Join component joins two or more input data flows into one based on specified key. It can produce classical inner join or left outer join (based on parameters).
Functionality of this component is similar to previous one, but it doesn't require input data to be sorted. It is most suited to join data sets when one is relatively large (master) and the others are small (slaves) - typically when using lookup table to convert key into corresponding value.
The testing graph reads data form orders.dat file (driver) and employees.dat and produces records with combined information from both files.
Each pair of corresponding records is transformed by reformatJoinTest.class. The source code for this class is attached as well.

Executing:
java -cp "../../lib/cloveretl.engine.jar:../../lib/commons-logging.jar:../../lib/log4j-1.2.12.jar:../../lib/javolution.jar:trans" org.jetel.main.runGraph -plugins ../../plugins graph/graphJoinHash.grf

Hash Join with transformation class source embedded

As with Reformat component, transformation class which is executed for input records can be embedded directly into graph. It is again dynamically compiled at run-time.
In this example you can see joining more then two input sets. Master data from orders.dat are joined with two slave inputs: first employees.dat due to key field EmployeeID and with the second, customers_delimited.txt, due to key field CustomerID. So in output record there are information from all three input files.

Executing:
java -cp "../../lib/cloveretl.engine.jar:../../lib/commons-logging.jar:../../lib/log4j-1.2.12.jar:../../lib/javolution.jar:trans" org.jetel.main.runGraph -plugins ../../plugins graph/graphJoinHashInline.grf

Hash Join with Transformation Language Lite

As with Reformat component, transformation can be written in summary form.

Executing:
java -cp "../../lib/cloveretl.engine.jar:../../lib/commons-logging.jar:../../lib/log4j-1.2.12.jar:../../lib/javolution.jar:trans" org.jetel.main.runGraph -plugins ../../plugins graph/graphJoinHashUsingTransform.grf

Hash Join with Transformation Language

As with Reformat component, transformation can be written in Etl transform language.

Executing:
java -cp "../../lib/cloveretl.engine.jar:../../lib/commons-logging.jar:../../lib/log4j-1.2.12.jar:../../lib/javolution.jar:trans" org.jetel.main.runGraph -plugins ../../plugins graph/graphJoinHashUsingTransformLanguage.grf

Sort example

Sort component sorts input data flow based on specified key. You can use more than one field name for key.
Sorted data flow is required by several components (Merge, Join, etc..) so this is very useful component.
The testing graph reads data form orders.dat file, sorts them and stores the result into orders.sorted file.

Executing:
java -cp "../../cloveretl.engine.jar:../../lib/commons-logging.jar:../../lib/log4j-1.2.12.jar:../../lib/javolution.jar:trans" org.jetel.main.runGraph -plugins ../../plugins graph/graphSortData.grf

Sort Universal example

Similar to previous example, this one sorts input data. The difference is that the exact datafile to be sorted is not hardcoded within graph definition, it is determined based on supplied parameters.
CloverETL allows global graph properties to be defined and these properties can be reference from within the graph. Each reference is replaced by property value at run-time. In our example parameter's definitions are embedded in graph code but it is possible to define parameters in-line.

Graph can be executed by the following command:
java -cp "../../lib/cloveretl.engine.jar:../../lib/commons-logging.jar:../../lib/log4j-1.2.12.jar:../../lib/javolution.jar:trans" org.jetel.main.runGraph -plugins ../../plugins graph/graphSortUniversal.grf)

Or if you want to specify the parameters in-line:
java -cp "../../lib/cloveretl.engine.jar:../../lib/commons-logging.jar:../../lib/log4j-1.2.12.jar:../../lib/javolution.jar:trans" org.jetel.main.runGraph -plugins ../../plugins -P:sortKey=<..your.fieldName..> -P:metadata=<..metadata filename..> -P:data=<..datafile..> -P:outputData=<..datafile..> graph/graphSortUniversal.grf

The resulting command with parameters specified in-line would look like this:
java -cp "../../lib/cloveretl.engine.jar:../../lib/commons-logging.jar:../../lib/log4j-1.2.12.jar:../../lib/javolution.jar:trans" org.jetel.main.runGraph -plugins ../../plugins -P:sortKey="CompanyName" -P:metadata="meta/delimited/customers.fmt" -P:data="data-in/delimited/customers_delimited.txt" -P:outputData="data-out/output.txt" graph/graphSortUniversal.grf

Phases example

Components can be grouped into individual phases. Each phase has assigned a unique number. In each transformation graph, there must exists at least one phase (with any number assigned to it). Execution of graph is done by phases. It starts with phase with the lowest number and continues in ascending order. All components in particular phase are initialized and run. Then components from next phase are started and so on. If there are any connections (Edges) between components from different phases, these connections are buffered so the first component (from preceding phase) can finish its job and the second component (from following - not necessarily next) can read the data produced by the fist component.

Example presented here is divided into three phases. First,data is read and sorted, in second,data is copied into three similar flows. In third, data is again merged into one flow and saved to file.

Executing:
java -cp "../../lib/cloveretl.engine.jar:../../lib/commons-logging.jar:../../lib/log4j-1.2.12.jar:../../lib/javolution.jar" org.jetel.main.runGraph -plugins ../../plugins graph/graphPhasesDemo.grf

Partition data example

Partition component offers various ways of splitting input data flow into several output flows. The exact number of output flows depends on how many output ports are connected.
There are currently three partition algorithms implemented:

  • Round-Robin - first record goes into first port, second into second…etc. When the last port was used, it starts again with the first one.
  • Hash - based on specified key, a hash value is calculated and based on it the output port is selected. It should be secured that for two different records with same keys, the exact same output port is selected.
  • Range - based on specified key and range array, records are sent to output port based on value present in key-fields.

It is possible to write your own partition function. It has to implement org.jetel.component.partition.PartitionFunction and can be write as java source or in Etl transform language.

Executing example:
java -cp "../../lib/cloveretl.engine.jar:../../lib/commons-logging.jar:../../lib/log4j-1.2.12.jar:../../lib/javolution.jar" org.jetel.main.runGraph -plugins ../../plugins graph/graphPartition.grf

System call example

It is possible to call system command from CloverETL framework. System execute component can be standalone component or can work with records flowing form other component. Output from executing system command can be sent to other component or can be logged to file. If there is not given interpreter, “command” attribute is treated as simple system command with parameters, else “command” attribute is saved in temporary batch file and there is called interpreter to execute this script

Executing:
java -cp "../../lib/cloveretl.engine.jar:../../lib/commons-logging.jar:../../lib/log4j-1.2.12.jar:../../lib/javolution.jar" org.jetel.main.runGraph -plugins ../../plugins graph/graphSystemExecute.grf

Working with Excel files example

CloverETL can read and write data from/to xls files. In this example data are read from ORDERS.xls file and writer partitions them due to values on ShipCountry field. For each different value there is created new xls sheet. Partition due to kety field value works since 2.4 CloverETL version.

Important: 1.If you want to write to different sheets from different writer component each writer has to run in different phase
2. jxl.jar (for older - till 2.2 CloverETL poi-2.5.1.jar) - reading/writing to excel files library, has to be in execution path.

Executing:
java -cp "../../lib/cloveretl.engine.jar:../../lib/commons-logging.jar:../../lib/log4j-1.2.12.jar:../../lib/javolution.jar:../../lib/jxl.jar" org.jetel.main.runGraph -plugins ../../plugins graph/graphXLSReadWrite.grf

XML extract example

CloverETL can read data from xml files. In this example data are read from employees.xml file. “Mapping” attribute describes dependencies between xml nodes and metadata fields.

Important: tools.jar has to be in execution path - till 2.4 version, when Clover uses janino compiler.

Executing:
java -cp "../../lib/cloveretl.engine.jar:../../lib/commons-logging.jar:../../lib/log4j-1.2.12.jar:../../lib/javolution.jar:trans:../../lib/janino.jar" org.jetel.main.runGraph -plugins ../../plugins graph/graphXMLExtract.grf

XML XPath reader example

CloverETL can read data from xml files. In this example data are read from employees.xml file. “Mapping” attribute describes dependencies between xml nodes and metadata fields.

Important: tools.jar has to be in execution path - till 2.4 version, when Clover uses janino compiler.

Executing:
java -cp "../../lib/cloveretl.engine.jar:../../lib/commons-logging.jar:../../lib/log4j-1.2.12.jar:../../lib/javolution.jar:trans:../../lib/janino.jar" org.jetel.main.runGraph -plugins ../../plugins graph/graphXPathReader.grf

Different ways of reading files

There are few components for reading data from flat file: Delimited Data Reader – reads delimited data, Fixlen Data Reader – reads data of fixed length and Universal Data Reader, which can read delimited or fixed length data depending on metadata on output port. There are others reader components for reading data in different formats: DBFDataReader – reads dBase tables, XMLExtract – reads xml files, XLSDataReader – reads excel files, Clover Data Reader – reads data saved in clover internal format. Most of thees readers can read data from “MultiFile” - it means that in fileURL attribute can be wildcards as * or ?. (For more information see Readers.) This is illustrated by following example:

Executing:
java -cp "../../lib/cloveretl.engine.jar:../../lib/commons-logging.jar:../../lib/log4j-1.2.12.jar:../../lib/javolution.jar:trans:../../lib/janino.jar" org.jetel.main.runGraph -plugins ../../plugins graph/graphCloverData.grf



Fix lenght data reader and byte mode using. This graph demonstrates difference between byte mode and char mode for FixLenDataReader.


Executing:
java -cp "../../lib/cloveretl.engine.jar:../../lib/commons-logging.jar:../../lib/log4j-1.2.12.jar:../../lib/javolution.jar:trans"" org.jetel.main.runGraph -plugins ../../plugins graph/graphFixLenByteMode.grf

This file is an input file.

BLOB reformat

This example covers BLOB related issues. It shows you how to convert Clover string field to Clover byte and save it into BLOB field of IBM Db2 database.

To run this graph, you need to have a database table with following structure:

CREATE TABLE EMP_BLOB (
  EMPID INTEGER,
  LASTNAME BLOB(100) 
); 

Executing:
java -cp "../../lib/cloveretl.engine.jar:../../lib/commons-logging.jar:../../lib/log4j-1.2.12.jar:../../lib/javolution.jar:trans:../../lib/janino.jar" org.jetel.main.runGraph -plugins ../../plugins graph/graphBlobReformat.grf

If you have longer fields then in this example, you need to increase following values in file cloveretl.engine.jar\org\jetel\data\defaultProperties (default values are optimized for speed):

Record.MAX_RECORD_SIZE = 8192\\
DataParser.FIELD_BUFFER_LENGTH = 512\\
DataFormatter.FIELD_BUFFER_LENGTH = 512

Data policy

For identifying errors in input data file there can be used dataPolicy attribute on Data Reader component. This attribute value is strict on default.
Let's examinate following graph with different values of dataPolicy attribute.

For dataPolicy=strict graph finishes with error:

.
.
INFO  [WatchDog] - Sucessfully started all nodes in phase!
FATAL [WatchDog] - !!! Fatal Error !!! - graph execution is aborting
ERROR [WatchDog] - Node DATA_READER0 finished with status: ERROR caused by: account_num cannot be set to 8747577056D : "8747577056D" in record # 3 in field # 1
DEBUG [WatchDog] - Node DATA_READER0 error details:
org.jetel.exception.BadDataFormatException: account_num cannot be set to 8747577056D : "8747577056D" in record # 3 in field # 1
	at org.jetel.data.LongDataField.fromString(LongDataField.java:389)
	at org.jetel.data.parser.DataParser.populateField(DataParser.java:517)
	at org.jetel.data.parser.DataParser.parseNext(DataParser.java:410)
	at org.jetel.data.parser.DataParser.getNext(DataParser.java:129)
	at org.jetel.util.MultiFileReader.getNext(MultiFileReader.java:235)
	at org.jetel.component.DataReader.execute(DataReader.java:181)
	at org.jetel.graph.Node.run(Node.java:366)
	at java.lang.Thread.run(Thread.java:595)
WARN  [WatchDog] - Interrupted node: DATA_READER0
WARN  [WatchDog] - Interrupted node: DATA_WRITER0
WARN  [WatchDog] - Interrupted node: TRASH0
INFO  [WatchDog] - ----------------------** Final tracking Log for phase [0] **---------------------
INFO  [WatchDog] - Time: 17/01/08 12:34:07
INFO  [WatchDog] - Node                   Status     Port      #Records         #KB  Rec/s    KB/s
INFO  [WatchDog] - ----------------------------------------------------------------------------------
INFO  [WatchDog] - ---------------------------------** End of Log **--------------------------------
ERROR [WatchDog] - !!! Phase finished with error - stopping graph run !!!

INFO  [WatchDog] - -----------------------** Summary of Phases execution **---------------------
INFO  [WatchDog] - Phase#            Finished Status         RunTime(sec)    MemoryAllocation(KB)
INFO  [WatchDog] - 0                 ERROR                                                 
INFO  [WatchDog] - ------------------------------** End of Summary **---------------------------
Execution of graph failed !

With dataPolicy=controlled graph execution finishes successfully. All parsed records are processed and info about unparseable records is logged or sent to log port:

.
.
INFO  [WatchDog] - Starting up all nodes in phase [0]
DEBUG [WatchDog] - DATA_READER0 ... started
DEBUG [WatchDog] - DATA_WRITER0 ... started
DEBUG [WatchDog] - TRASH0 ... started
INFO  [WatchDog] - Sucessfully started all nodes in phase!
+-------+--------+-----------------+--------------------------------------------------------------------------------------+
|Record |recNo   |offendindValue   |errmes                                                                                |
+-------+--------+-----------------+--------------------------------------------------------------------------------------+
|# 1    |3       |8747577056D      |account_num cannot be set to 8747577056D : "8747577056D" in record # 3 in field # 1   |
+-------+--------+-----------------+--------------------------------------------------------------------------------------+

When dataPolicy is set to lenient only parseable records are processed and there is no info about incorrect records:

.
.
INFO  [WatchDog] - Starting up all nodes in phase [0]
DEBUG [WatchDog] - DATA_READER0 ... started
DEBUG [WatchDog] - DATA_WRITER0 ... started
DEBUG [WatchDog] - TRASH0 ... started
INFO  [WatchDog] - Sucessfully started all nodes in phase!
INFO  [WatchDog] - Execution of phase [0] successfully finished - elapsed time(sec): 1
INFO  [WatchDog] - ----------------------** Final tracking Log for phase [0] **---------------------
INFO  [WatchDog] - Time: 17/01/08 12:43:43
INFO  [WatchDog] - Node                   Status     Port      #Records         #KB  Rec/s    KB/s
INFO  [WatchDog] - ----------------------------------------------------------------------------------
INFO  [WatchDog] - DATA_READER0           FINISHED_OK    
INFO  [WatchDog] -  %cpu:..                          Out:0           11           3      0       0
INFO  [WatchDog] -                                   Out:1            0           0      0       0
INFO  [WatchDog] - DATA_WRITER0           FINISHED_OK    
INFO  [WatchDog] -  %cpu:..                           In:0           11           3      0       0
INFO  [WatchDog] - TRASH0                 FINISHED_OK    
INFO  [WatchDog] -  %cpu:..                           In:0            0           0      0       0
INFO  [WatchDog] - ---------------------------------** End of Log **--------------------------------
INFO  [WatchDog] - Forcing garbage collection ...
INFO  [WatchDog] - -----------------------** Summary of Phases execution **---------------------
INFO  [WatchDog] - Phase#            Finished Status         RunTime(sec)    MemoryAllocation(KB)
INFO  [WatchDog] - 0                 FINISHED_OK                        1              2631
INFO  [WatchDog] - ------------------------------** End of Summary **---------------------------
Execution of graph successful !

simple_examples.txt · Last modified: 2009/09/16 16:00 by jausperger
Back to top
chimeric.de = chi`s home Valid CSS Driven by DokuWiki do yourself a favour and use a real browser - get firefox!! Recent changes RSS feed Valid XHTML 1.0