CloverETL Transformation Language concept

Program structure

Program in Transform Language consist of two parts. The first part is declarations of variables, function declarations and statements. They don't have to be in this order, but variables and functions are visible after their declaration. You can interspersed these parts. In the second part, there is a mapping, where you can assign values to output data records. After mapping there are not allowed declarations nor statements.

The syntax of Transform Language is described by BNF grammar. The BNF diagram of this grammar you can see here.

So schema of example programme could look like that:

  ImportStatement;
  VariableDeclaration1;
  FunctionDeclaration1
  Statement1
  FunctionDeclaration2
  FunctionDeclaration3
  Statement2
  VariableDeclaration3;
  Mapping1;
  Mapping2;
  Mapping3;

Data Types

Type name Description Default value Declaration syntax
int simple type of width 32-bit with range from -2147483648 to 2147483647 (in relation to Java type: Integer.MIN_VALUE to Integer.MAX_VALUE). Integer.MIN_VALUE is interpreted as null 0 int identifier;
long simple type of width 64-bit with range from -9223372036854775807 to 9223372036854775807 ( in relation to Java type: Long.MIN_VALUE+1 to Long.MAX_VALUE). You can cast int number to long by adding l on the end of number e.g. long l; l=214748364l + 2147483647l 0 long identifier;
decimal type representing arbitrary-precision decimal numbers. Default values of DECIMAL_LENGTH and DECIMAL_SCALE are in file org.jetel.data.defaultProperties (cloveretl.engine) and are, by default, set to 8 and 2 respectively. If you declare variable as decimal type; it is not possible to assign to the variable any value with larger length or scale. You can cast float number/literal to decimal by adding d at the end of number e.g. decimal dec=2148.34d decimal with default length and scale, with undefined value. decimal [(length,scale)] identifier;
number or double type representing double numbers (64 bits to store a value). It has three special values: NaN, Infinity and -Infinity. 0.0 number identifier;
string type for storing sequences of characters. Sequence of characters has to be in single or double quotes. For characters that are impossible to enter directly, there are several escape sequences:
 Escape sequence  | Description                     
 \\ddd            | octal character (ddd)           
 \\uxxxx          | hexadecimal (UNICODE) character (xxxx) 
 \\'              | single quote          
 \\”              | double quote
 \\\\             | backslash
 \\r              | carriage return 
 \\n              | new line (line feed) 
 \\f              | form feed
 \\t              | tab        
 \\b              | backspace
empty string string identifier;
date type for storing date and time in format yyyy-[M]M-[d]d [[H]H:[m]m:[s]s] , e.g.: 2001-01-02 2002-02-03 12:30:02 2003-03-04 2:15:7 current date and time date identifier;
boolean simple type with two values true and false. false boolean identifier;
bytearray array of bytes with unlimited length - behaves similarly to list empty array bytearray identifier;
bytearray (size) identifier;
list container for storing list of elements of any type indexed by numbers. Example:
list l;int i; for(i=0;i<20;i++) { l[]=i; print_err(l[i]; }
l[4]=99;
Assignment : list1=list2 means that both lists reference the same elements, assignment: list2[]=list1 causes coping of all list1 elements to the end of list2
list2[]='abc' adds string 'abc' as the last element of list2
list2[]=null removes last element of list2, trunc(list1) will delete all members from list1
empty list list identifier;
map container for storing elements of any type indexed by string-key. Example:
string key='abc'; map m; m['f1']='hello'; m['f2']='xxx'; m[key]='123';
empty map map identifier;
record record (set of fields) with structure based on existing metadata - behaves similarly to map However, can also be indexed by numbers starting from 0. Examples:
record(MetdataId) my_record; - declares record with structure defined by MetadataId
record(@1) another_record; - declares record with the structure defined by 1st input port (0 based)
my_record[3] - refers to the 3rd field (0 based) of my_record
n/a record(metadataID) identifier;
record(@inputRecordName) identifier record(@inputRecordNumber) identifier

You can see numeric type in text below, this is label for int, long , decimal, number and double types.

Literals

Type Description Declaration syntax Example
integer digits representing integer number [0-9]+ 95623
long integer digits representing integer number with value greater than 232 [0-9]+ 956230781312312331287L
hexadecimal integer digits and letters representing integer number in hexadecimal form 0x[0-9A-F]+ 0xA7B0
octal integer digits representing integer number in octal form 0[0-7]* 0644
numeric floating point number represented by 64bits in double precision format [0-9]+.[0-9]+ 456.123
decimal decimal number with fixed precision [0-9]+.[0-9]+D 123.456D
double quoted string string value/literal enclosed in double quotes; escaped characters [\n,\r,\t, \\, \”] get translated into corresponding control chars ..anything except [].. hello\tworld\n\r
single quoted string string value/literal enclosed in single quotes; only one escaped character [\'] gets translated into corresponding char ['] '..anything except [']..' 'hello\tworld\n\r'
list of literals List of literals where individual literals can be also other lists/maps/records [ <any literal> (, <any literal>)* ] [10, 'hello', “world”, 0x1A, 2008-01-01 ]
[ [ 1 , 2 ] ] , [ 3 , 4 ] ]
date Date value mask: yyyy-MM-dd 2008-01-01
datetime Datetime value mask: yyyy-MM-dd HH:mm:ss 2008-01-01 18:55:00

Variables

Variables must be declared before they can be used. Declaration form:

  type identifier;
  type identifier = ..initializer..;

Each variable has to be declared separately. Initialization of variable can be done as separate expression or directly as a part of declaration statement. Variable is visible after its declaration. It can't be declared in a block.

Operators

Arithmetic operators

Arithmetic operations are not symmetrical. Result of an operation depends on the type of left argument and the precision can't be greater than precision of left argument.

Operator Result
+ Addition operator can be used between variables of the same type except boolean and date types. Usage of addition operator between different numeric types results theirs sum (adding two int variables or int and long can cause over-flow if you don't use type casting) . Addition for strings results its concatenation. You can use addition with string and other types (not vice versa) resulting original string with attached string representation of the variable. Function concat() works faster then addition, so if it is possible it is better to use it instead of addition operator. For date variable it is possible to use addition with numeric type. The result is date with added number of whole days from numeric variable, e.g.: 2004-01-30 + 1.5 = 2004-01-31
- Subtraction (also unary minus). Subtraction operator can be used between numeric variables or for date with numeric.
* Multiplication (can be used between numeric types only)
/ Division (can be used between numeric types only). Dividing by zero throws TransformLangExecutorRuntimeException: Interpreter runtime exception on line … column … : div - arithmetic exception - / by zero (all numeric types except number) or gives Infinity (number type).
% Modulus (can be applied to floating-point types as well as integer types)
++ Increment operator increases its operand by one. When used in prefix mode value of the variable is incremented and then used (i=1;j=++i; gives j=2 and i=2), when used in postfix mode increment is made as last operation (i=1;j=i++; gives j=1 and i=2). It can be applied to floating-point types as well as integer types.
-- Decrement operator decreases its operand by one. When used in prefix mode value of the variable is decremented and then used (i=1;j=–i; gives j=0 and i=0), when used in postfix mode increment is made as last operation (i=1;j=i–; gives j=1 and i=0). It can be applied to floating-point types as well as integer types.

Examples:

Declaration and initialization Operation Result
int i; i=10; number n1; n1=0.1; number n2; n2=i+n1
n2=n1+i
10
10.1
decimal d1;d1=0.1d; decimal(10,4) d2; d2=0.001d; decimal(10,4) result; result=d1+d2
result=d2+d1
0.1000
0.1010
int i;i=10; number n1;n1=0.1; number result; result=i/n1

result=n1/i
TransformLangExecutorRuntimeException: Interpreter runtime exception on line … column … : div - arithmetic exception - / by zero arg[0] CloverInteger “10” arg[1] CloverDouble “0.1”
0.01

Relational operators

Operator result
==
.eq.
Equal to. Can be used for comparison of compatible variables of any type.
!=
<>
.ne.
Not equal to. Can be used for comparison of compatible variables of any type.
>
.gt.
Greater then. Can be used for comparison of compatible variables of numeric, date or string types.
<
.lt.
Less then. Can be used for comparison of compatible variables of numeric, date or string types.
>=
=>
.ge.
Greater then or equal to. Can be used for comparison of compatible variables of numeric, date or string types.
<=
=<
.le.
Less then or equal to. Can be used for comparison of compatible variables of numeric, date or string types.
~=
.regex.
Checks if given string matches against the regular expression. Examples: * s~="[^abc].*" means that any character (.) except (^) a or b or c ([abc]) may occur zero or more times * s.regex.'[a-zA-Z]{3}' means that characters a through z or A through Z, inclusive (range) occur exactly 3 times For complete rules see documentation of java.util.regex.Pattern
.in. Is contained in list. Can be used for checking whether specified value is in list of other values (represented by list, map, list of literals).

Operators in form .operator. should be led and trailed by space.

Note: result of comparison can depend on order of elements if they have different types eg.:

decimal var1=8.89;
int var2=8;
bool result1=var1 .eq. var2;
bool result2=var2 .eq. var1;
result1 is false while result2 is true.

Example:

  list listvar=['A','B','C','D','E'];
  bool_var= int_var1 .lt. $fieldName
  bool_var= "A" .in. listvar;
  bool_var= value .in. [1, 2, 3 ,10, 50, 60];

Boolean logical operators

operator result
||
or
logical OR
&&
and
logical AND
!
not
logical NOT
==
.eq.
Equal to
!=
<>
.ne.
Not equal to

Operators in form .operator. should be led and trailed by space.

Example:

  bool_var= int_var1 .eq. $fieldName

Control Statements

Selection statements

IF statement

If statement is conditional branch statement. It can be used to route program execution through two different paths. If statement general form:

if (Expression) Statement else Statement1

Statement may be a single statement or a compound statement enclosed in curly braces. The Expression is any expression that returns a boolean value. The else clause is optional. It is possible to use ladder if statement:

  if (Expression1) Statement1
  else if (Expression2) Statement2
  else if (Expression3) Statement3
  else Statement-n

where it is executed from top down. As soon as one of the Expression is true, the Statement associated with its if is executed, and rest of the ladder is bypassed. If none of the condition is true, then the final else Statement is executed. This final else acts as a default condition. If there is no final else and all other conditions are false, then no action will take place.

Examples:

  if (d==0.1) n=0;
 
  if (d==0.1 || l<=1) 
     n=0;        
  else
     {n=-1;print_err('n equal -1')}
 
  if (date1<=date2) {
     print_err('before if (i<j)');
 
     if (i<j)
        print_err('date1<today and i<j') 
     else
        print_err('date1<date2 only')
 
     result=true;
  }

Switch statement

Switch statement is multi-way branch statement. It dispatches program execution to different parts of code based on the value of an expression. Switch statement form:

  switch (Expression) {
    case Expression1:Statement1
    case Expression2:Statement2
    default: Statement-n
  }

First Expression can be of any type and the type of the other Expressions has to be compatible with this type. It means that following example is correct and for n=2 will be executed Statement2:

  int n;
  number m; m=2.0;
 
  switch (n) {
    case 1:Statement1
    case m:Statement2
    case 2:Statement3
    default:Statement4
  }

In case that case expressions have the same value, all statements will be executed (in example above for n=2 Statement3 as well as Statement2 will be executed).

The switch statement works like this: the value of the Expression is compared with each of the expressions in the case statements. If a match is found there is executed the statement from corresponding case statement and comparing continues. If none of the expressions matches the value of the Expression, then the default statement is executed. Default statement is optional, so if no Expression1 till Expression-n-1 matches and no default is present , then no further action is taken.

Iteration statements

While loop

While loop repeats a statement while its controlling expression is true. While loop form:

while (Expression) Statement

The Expression can be any boolean expression. The Statement will be executed as long as the conditional expression is true. When Expression becomes false, control passes to the next line of code immediately following the loop. Since the while loop evaluates its conditional expression at the top of the loop, it can happen that Statement will not be executed at all.

Example:

  while (born<now) {
    born=dateadd(born,1,year);
    while (yer<5) yer=yer+1;
    yer=yer+1;
  }

Do-while loop

Do-while loop is similar to while loop, but the Statement is executed always at least once, even if the Expression is false from the beginning. Form of the do-while loop:

do Statement while (Expression)

Each iteration of the do-while loop first executes Statement and then evaluates the Expression. If this Expression is true, the loop will repeat, otherwise the loop terminates.

Example:

  do {
    born=dateadd(born,1,year);
    do yer=yer+1; while (yer<5);
    yer=yer+1;
  } while (born<now)

For loop

For loop is usually used , when there is known number of iteration. For loop form:

  for (initializationExpression;conditionExpression;iterationExpression) Statement

The for loop operates as follows: when the loop first starts initializationExpression is executed (it can be assignment or only name of variable controlling the loop). InitializationExpression is executed only once. Next conditionExpression is evaluated (it has to be boolean expression). It usually tests the loop control variable against the end's condition. If this expression is true Statement is executed else the loop terminates. Next the iterationExpression is executed. This is usually expression, which increments or decrements loop control variable. The loop then iterates, first evaluating conditionExpression, executing Statement, then iterationExpression and so on until the conditionExpression is false.

Examples:

  for (born;born<now;born=dateadd(born,1,year)) ++yer;
  for (born;!b;++yer) if (yer==100) b=true;

For each loop

For each loop is used for iteration over the list, map or record. All elements of the iterable must be of the same type. For each loop form:

  foreach (variable : iterableVariable) Statement

The Statement is executed once for each element from the iterableVariable, i.e. depending of the type of iterable variable:

  • for each element of the list
  • for each field of data record
  • for each map value

Examples:

list keyList = [1,4,6,14,16];
int i;
foreach (i : keyList) {
   tmp = lookup(LookupTable0,i).value;
   my_function(tmp);
}
string value;
string newValue = '';
record(@0) rec = @0;
foreach(value : rec){
  newValue = newValue + value;
}
foreach(value : my_map){
  newValue = newValue + value;
}

Jump statements

Break statement

Break statement is used for force termination of loop. It consist of one word: break. When a break statement is encountered inside a loop, the loop is terminated and program control resumes at the next statement following the loop. When it is used in nested loop only the innermost loop is terminated. For example:

  int year;year=0;
  int i;
 
  while (year<33) {
    year=year+1;
    for (i=0;i<20;i=i+1)
    if (i==10) break
  }

after the while loop we have year=34, i=10.

Continue statement

Continue statement breaks the current iteration of loop and control of program continues with next iteration or iterationExpression in for loop. Syntax: continue

Return statement

Return statement is used to explicitly return from a function. It means that execution of program is transferred back to the place the function was called and to the caller is assigned function's return value. The return statement is the last statement executed in function. If in a function there is no return statement, execution of the functions ends on last statement and the return value of function is null. Return statement used in block causes skipping all statements after it till the end of block. General form of return statement:

return Expression

Expression is non-obligatory. Return statement without Expression returns null value.

Example:

  function year_before(now) {
    return dateadd(now,-1,year)
  }

Error handling

The try-catch statement is used for error handling. Syntax of this statement is following:

try Statement1 catch (stringVariable) Statement2

The program tries to execute Statement1. If the execution is successful, the program continues with the next statement after the whole try-catch statement. If an exception occurs, when trying to execute the Statement1, program doesn't fail, but executes Statement2, which can correct the previously unsuccessfully executed Statement1. The reason of the failing is saved to stringVariable.

Example:

string errmes;
int i;
.
.
.
try {
  i = "some string"; //Statement 1 - unsuccessful as string cannot be assigned to integer
} catch (errmes) { 
  print_log(error, errmes); //Statement 2 - error message is written to logger
  i = 12345; //Statement 2 - unsuccessfully executed Statement1 is corrected with a new assignment 
             //of integer to the integer variable
}
After executing the above code, the following message is visible in logger:
ERROR [DATA_GENERATOR0] - class org.jetel.interpreter.TransformLangExecutorRuntimeException#Interpreter runtime exception on line 27 column 13 : invalid assignment of "some string" [STRING] to variable "i" [INTEGER] " - incompatible data types

Functions

In Transform Language it is possible to create own functions. Function declaration in general has following form:

  function functionName(arg1,arg2,...) {
    variableDeclarations;
    Statements
    Mapping
    [return [Expression]]
  }

Variable declarations and statements must be in this order. All variables must be declared before any statement. Mapping must follow the last statement and only one return statement can follow the mapping. Variables declared in function are local and are visible only inside function. Function can return a value; to do it use return statement in form return Expression. If there is no return statement used in function it returns null value and works as procedure.

Examples:

  function year_before(now) {
    return dateadd(now,-1,year)
  }
 
  function mapName(){
    string result;
    print_err('function');
    result='result';
    $Name:=result;
  }

Calling functions:

  date yearBefore;
  yearBefore=year_before(2000-01-01);
  mapName;

Function for Error Messages

In Transform Language it is also possible to create a function for your own error messages. Function declaration has the following form:

  function getMessage() {
    //some code
    return expressionOrVariableWithYourOwnErrorMessage;
  }

In this function you need to define your own error message and return its text. You can return any expression or string variable whose value is the error message of your own.

Import

It is possible to import code from other file to current program. Instruction

import "<fileName_URL>";
is replaced by the code from specified external file. The import statement has to be the first statement in the scrip !

Example:

   import '/home/usr/cloveretl.engine/data/tlExample.ctl';
   import 'http://myserver/coderepos/tlFunctions.ctl';

See appendix_1_-_attributes for <fileName_URL> options.

Eval

It is possible to evaluate (parse&execute) string which contains CTL expression (or piece of program/code).
There exist two variants of eval command:

  • eval() - evaluate expression, make it part of the AST - you may, for example, add whole new function declaration this way
  • eval_exp() - evaluate expression, return whatever was the result

Example:

string str='print_err("eval test OK")';
eval(str); 
print_err(eval_exp('"ahoj"'));

The main difference between eval() and eval_exp() is that the latter parses & executes the expression and then cleans up everything. This is good if you just need to evaluate mathematical expression or perform simple task. If you need to add new function definition to the existing script, then you can use eval() which adds the parsed expression to the script within which is the eval() executed.

Data flows (records)

Input records can be accessed with following syntax:

@inputRecordName or @inputRecordNumber

Then it is possible to manipulate with such record as with any other record, e.g.

record (metadataId) my_record = @0;
my_value = @inputRecord[1]

The above syntax can serve as a metadata provider also, i.e. when you declare a new record, you can define it as follows:

record(@0) rec;
rec[0] = "test";

You can refer to the individual fields in the data flows (records) entering and leaving a component. You can refer to the flows either by their sequential order numbers counting from 0 or by a metadata name. Fields in the flows are referenced by their name or by their sequential order number. Flow reference and a field reference shall be separated by a dot (”.”). Flow reference can be omitted (if the field is referenced by its name), in which case it defaults to zero.
If a field is referenced by its number, flow must also be referenced, either by its number or name.
References must be prefixed with $.

$[<flow number>|<flow name>.]<field name>
$<flow number>.<field number>|$<flow name>.<field number>

Examples:

example description
print_err($1.City) prints out on the error output value of “City” field from 2nd (0-based) input flow
print_err($1.3) prints out on the error output value of “City” field (4th field) from 2nd (0-based) input flow
datediff($Born, 2005-1-1,month) computes month's difference between value of “Born” field from 0th input flow and 2005-1-1
$results.Field5:=5 assigns constant value 5 to field called “Field5” of the output flow described by metadata named “results”

Parameters

You can also use parameters in your CTL definition. If you have defined any parameter (MyParameter), you can use it in any transformation by typing '${MyParameter}'. It is better if you use only single quotes in the parameters that are used in CTL. Otherwise, some problem with escape sequences may arise when using double quotes.
You can also use all environment variables.

Sequences

Transformation language allows using of sequences defined in graph. Sequences are accessed through their IDs. The syntax for sequence-related operations is as follows:

sequence(<id>).[next|current|reset]

  • next - casts next value of sequence and returns it
  • current - returns current value of sequence
  • reset - resets sequence to its starting value

Standard return value type is int, if needed, it may be specified that either long or string should be returned.

sequence(<id>,[int|long|string]).[next|current|reset]

Example:

int i; for(i=0;i<10;++i) print_err(sequence(test).next);
int value; value=sequence(myseq).current;

Lookup tables

There is possibility of using records from graph lookup tables. Function lookup(lookupTableId,key) returns record from lookupTableId lookup table stored under the key key. When lookup table contains duplicated records you can use lookup_found returning number of records stored after the last introduced key and lookup_next returning next record stored after the last used key.

  • lookup(<id>,<key_part1>[,key_part2, …]).<field_name> - performs lookup of record from lookup table id based on specified key, returns value of field_name if found or NULL
  • lookup_next(<id>).<field_name> - returns next value of field if such exists or NULL; lookup() must be called first
  • lookup_found(<id>) - returns #of records found by previous lookup() call
  • lookup_admin(<id>,[init|free]) - allows to initialize lookup table of free/destroy it

Example:

lookup_admin(LKP, init);
print_err(lookup(LKP,'HELLO').Age); 
print_err(lookup_found(LKP));
print_err(lookup_next(LKP).Age);
lookup_admin(KPP, init);
print_err(lookup(KPP,'a','b','c').Id);

Mapping

Mapping can be done in the end of the program or in a second part of function body. In general it has form:

  recordField:=Expression [: Expression1 [:Expression2 .....]..... ]

where recordField can be accessed in the way described above. Interpreter tries to assign Expression to recordField, if it does not finish successfully, it tries to assign Expression1 and so on.

Last mapping must be followed by semicolon.

Examples:

example description
$Name:=result; assigns value of “result” variable to “Name” field of 0th output record
$0.Age:=$Age; assigns “Age” field's value of 0th input record to “Age” field of 0th output record
$out.City:=concat("My City ",$City); assigns string, which is concatenation of “My City “ and “City” field's value of 0th input record, to “City” field of output record called “out”
$out1.Value:= $in.Value; assigns value of “Value” field of input record called “in” to “Value” field of output record called “out1”

Comments

Comments are used for entering a remarks into a program's source. The contents of comment are ignored by the compiler. It describes or explains fragments of code to readers. In Transform Language there are two kinds of comments:

  • multi-line comment: /* - starts the comment,*/ -ends the comment. Everything between these marks is ignored
  • one-line comment: // - starts the comment. Everything after this mark till the end of line is ignored

Dictionary values

CTL language supports interaction with dictionary object related to the current graph. Only string-type dictionary values are supported (i.e. complex Java values cannot be accessed by CTL). There are three internal functions to manipulate the dictionary contents. Arguments of these functions can be any expressions returning a string value.

  • read_dict(name): reads content of dictionary value stored under specified name
  • write_dict(name,value): overwrites dictionary value stored under specified name
  • delete_dict(name): removes value stored under specified name from the dictionary
transf_lang_syntax.txt · Last modified: 2010/03/23 11:14 by avackova
Back to top
chimeric.de = chi`s home Valid CSS Driven by DokuWiki do yourself a favour and use a real browser - get firefox!! Recent changes RSS feed Valid XHTML 1.0