Program in Transform Language consist of two parts. The first part is declarations of variables, function declarations and statements. They don't have to be in this order, but variables and functions are visible after their declaration. You can interspersed these parts. In the second part, there is a mapping, where you can assign values to output data records. After mapping there are not allowed declarations nor statements.
The syntax of Transform Language is described by BNF grammar. The BNF diagram of this grammar you can see here.
So schema of example programme could look like that:
ImportStatement; VariableDeclaration1; FunctionDeclaration1 Statement1 FunctionDeclaration2 FunctionDeclaration3 Statement2 VariableDeclaration3; Mapping1; Mapping2; Mapping3;
| Type name | Description | Default value | Declaration syntax |
|---|---|---|---|
int | simple type of width 32-bit with range from -2147483648 to 2147483647 (in relation to Java type: Integer.MIN_VALUE to Integer.MAX_VALUE). Integer.MIN_VALUE is interpreted as null | 0 | int identifier; |
long | simple type of width 64-bit with range from -9223372036854775807 to 9223372036854775807 ( in relation to Java type: Long.MIN_VALUE+1 to Long.MAX_VALUE). You can cast int number to long by adding l on the end of number e.g. long l; l=214748364l + 2147483647l | 0 | long identifier; |
decimal | type representing arbitrary-precision decimal numbers. Default values of DECIMAL_LENGTH and DECIMAL_SCALE are in file org.jetel.data.defaultProperties (cloveretl.engine) and are, by default, set to 8 and 2 respectively. If you declare variable as decimal type; it is not possible to assign to the variable any value with larger length or scale. You can cast float number/literal to decimal by adding d at the end of number e.g. decimal dec=2148.34d | decimal with default length and scale, with undefined value. | decimal [(length,scale)] identifier; |
number or double | type representing double numbers (64 bits to store a value). It has three special values: NaN, Infinity and -Infinity. | 0.0 | number identifier; |
string | type for storing sequences of characters. Sequence of characters has to be in single or double quotes. For characters that are impossible to enter directly, there are several escape sequences:Escape sequence | Description \\ddd | octal character (ddd) \\uxxxx | hexadecimal (UNICODE) character (xxxx) \\' | single quote \\” | double quote \\\\ | backslash \\r | carriage return \\n | new line (line feed) \\f | form feed \\t | tab \\b | backspace | empty string | string identifier; |
date | type for storing date and time in format yyyy-[M]M-[d]d [[H]H:[m]m:[s]s] , e.g.: 2001-01-02 2002-02-03 12:30:02 2003-03-04 2:15:7 | current date and time | date identifier; |
boolean | simple type with two values true and false. | false | boolean identifier; |
bytearray | array of bytes with unlimited length - behaves similarly to list | empty array | bytearray identifier; bytearray (size) identifier; |
list | container for storing list of elements of any type indexed by numbers. Example:list l;int i; for(i=0;i<20;i++) { l[]=i; print_err(l[i]; }
l[4]=99;
Assignment : list1=list2 means that both lists reference the same elements, assignment: list2[]=list1 causes coping of all list1 elements to the end of list2list2[]='abc' adds string 'abc' as the last element of list2list2[]=null removes last element of list2, trunc(list1) will delete all members from list1 | empty list | list identifier; |
map | container for storing elements of any type indexed by string-key. Example:string key='abc'; map m; m['f1']='hello'; m['f2']='xxx'; m[key]='123'; | empty map | map identifier; |
record | record (set of fields) with structure based on existing metadata - behaves similarly to map However, can also be indexed by numbers starting from 0. Examples:record(MetdataId) my_record; - declares record with structure defined by MetadataId | n/a | record(metadataID) identifier; record(@inputRecordName) identifier record(@inputRecordNumber) identifier |
You can see numeric type in text below, this is label for int, long , decimal, number and double types.
| Type | Description | Declaration syntax | Example |
|---|---|---|---|
| integer | digits representing integer number | [0-9]+ | 95623 |
| long integer | digits representing integer number with value greater than 232 | [0-9]+ | 956230781312312331287L |
| hexadecimal integer | digits and letters representing integer number in hexadecimal form | 0x[0-9A-F]+ | 0xA7B0 |
| octal integer | digits representing integer number in octal form | 0[0-7]* | 0644 |
| numeric | floating point number represented by 64bits in double precision format | [0-9]+.[0-9]+ | 456.123 |
| decimal | decimal number with fixed precision | [0-9]+.[0-9]+D | 123.456D |
| double quoted string | string value/literal enclosed in double quotes; escaped characters [\n,\r,\t, \\, \”] get translated into corresponding control chars | ”..anything except [“]..“ | “hello\tworld\n\r“ |
| single quoted string | string value/literal enclosed in single quotes; only one escaped character [\'] gets translated into corresponding char ['] | '..anything except [']..' | 'hello\tworld\n\r' |
| list of literals | List of literals where individual literals can be also other lists/maps/records | [ <any literal> (, <any literal>)* ] | [10, 'hello', “world”, 0x1A, 2008-01-01 ] [ [ 1 , 2 ] ] , [ 3 , 4 ] ] |
| date | Date value | mask: yyyy-MM-dd | 2008-01-01 |
| datetime | Datetime value | mask: yyyy-MM-dd HH:mm:ss | 2008-01-01 18:55:00 |
Variables must be declared before they can be used. Declaration form:
type identifier; type identifier = ..initializer..;
Each variable has to be declared separately. Initialization of variable can be done as separate expression or directly as a part of declaration statement. Variable is visible after its declaration. It can't be declared in a block.
Arithmetic operations are not symmetrical. Result of an operation depends on the type of left argument and the precision can't be greater than precision of left argument.
| Operator | Result |
|---|---|
| + | Addition operator can be used between variables of the same type except boolean and date types. Usage of addition operator between different numeric types results theirs sum (adding two int variables or int and long can cause over-flow if you don't use type casting) . Addition for strings results its concatenation. You can use addition with string and other types (not vice versa) resulting original string with attached string representation of the variable. Function concat() works faster then addition, so if it is possible it is better to use it instead of addition operator. For date variable it is possible to use addition with numeric type. The result is date with added number of whole days from numeric variable, e.g.: 2004-01-30 + 1.5 = 2004-01-31 |
| - | Subtraction (also unary minus). Subtraction operator can be used between numeric variables or for date with numeric. |
| * | Multiplication (can be used between numeric types only) |
| / | Division (can be used between numeric types only). Dividing by zero throws TransformLangExecutorRuntimeException: Interpreter runtime exception on line … column … : div - arithmetic exception - / by zero (all numeric types except number) or gives Infinity (number type). |
| % | Modulus (can be applied to floating-point types as well as integer types) |
| ++ | Increment operator increases its operand by one. When used in prefix mode value of the variable is incremented and then used (i=1;j=++i; gives j=2 and i=2), when used in postfix mode increment is made as last operation (i=1;j=i++; gives j=1 and i=2). It can be applied to floating-point types as well as integer types. |
| -- | Decrement operator decreases its operand by one. When used in prefix mode value of the variable is decremented and then used (i=1;j=–i; gives j=0 and i=0), when used in postfix mode increment is made as last operation (i=1;j=i–; gives j=1 and i=0). It can be applied to floating-point types as well as integer types. |
Examples:
| Declaration and initialization | Operation | Result |
|---|---|---|
| int i; i=10; number n1; n1=0.1; number n2; | n2=i+n1 n2=n1+i | 10 10.1 |
| decimal d1;d1=0.1d; decimal(10,4) d2; d2=0.001d; decimal(10,4) result; | result=d1+d2 result=d2+d1 | 0.1000 0.1010 |
| int i;i=10; number n1;n1=0.1; number result; | result=i/n1 result=n1/i | TransformLangExecutorRuntimeException: Interpreter runtime exception on line … column … : div - arithmetic exception - / by zero arg[0] CloverInteger “10” arg[1] CloverDouble “0.1” 0.01 |
| Operator | result |
|---|---|
| == .eq. | Equal to. Can be used for comparison of compatible variables of any type. |
| != <> .ne. | Not equal to. Can be used for comparison of compatible variables of any type. |
| > .gt. | Greater then. Can be used for comparison of compatible variables of numeric, date or string types. |
| < .lt. | Less then. Can be used for comparison of compatible variables of numeric, date or string types. |
| >= => .ge. | Greater then or equal to. Can be used for comparison of compatible variables of numeric, date or string types. |
| <= =< .le. | Less then or equal to. Can be used for comparison of compatible variables of numeric, date or string types. |
| ~= .regex. | Checks if given string matches against the regular expression. Examples: * s~="[^abc].*" means that any character (.) except (^) a or b or c ([abc]) may occur zero or more times * s.regex.'[a-zA-Z]{3}' means that characters a through z or A through Z, inclusive (range) occur exactly 3 times For complete rules see documentation of java.util.regex.Pattern |
| .in. | Is contained in list. Can be used for checking whether specified value is in list of other values (represented by list, map, list of literals). |
Operators in form .operator. should be led and trailed by space.
Note: result of comparison can depend on order of elements if they have different types eg.:
decimal var1=8.89; int var2=8; bool result1=var1 .eq. var2; bool result2=var2 .eq. var1;
result1 is false while result2 is true.
list listvar=['A','B','C','D','E']; bool_var= int_var1 .lt. $fieldName bool_var= "A" .in. listvar; bool_var= value .in. [1, 2, 3 ,10, 50, 60];
| operator | result |
|---|---|
| || or | logical OR |
| && and | logical AND |
| ! not | logical NOT |
| == .eq. | Equal to |
| != <> .ne. | Not equal to |
Operators in form .operator. should be led and trailed by space.
Example:
bool_var= int_var1 .eq. $fieldName
If statement is conditional branch statement. It can be used to route program execution through two different paths. If statement general form:
if (Expression) Statement else Statement1
Statement may be a single statement or a compound statement enclosed in curly braces. The Expression is any expression that returns a boolean value. The else clause is optional. It is possible to use ladder if statement:
if (Expression1) Statement1 else if (Expression2) Statement2 else if (Expression3) Statement3 else Statement-n
where it is executed from top down. As soon as one of the Expression is true, the Statement associated with its if is executed, and rest of the ladder is bypassed. If none of the condition is true, then the final else Statement is executed. This final else acts as a default condition. If there is no final else and all other conditions are false, then no action will take place.
Examples:
if (d==0.1) n=0; if (d==0.1 || l<=1) n=0; else {n=-1;print_err('n equal -1')} if (date1<=date2) { print_err('before if (i<j)'); if (i<j) print_err('date1<today and i<j') else print_err('date1<date2 only') result=true; }
Switch statement is multi-way branch statement. It dispatches program execution to different parts of code based on the value of an expression. Switch statement form:
switch (Expression) { case Expression1:Statement1 case Expression2:Statement2 default: Statement-n }
First Expression can be of any type and the type of the other Expressions has to be compatible with this type. It means that following example is correct and for n=2 will be executed Statement2:
int n; number m; m=2.0; switch (n) { case 1:Statement1 case m:Statement2 case 2:Statement3 default:Statement4 }
In case that case expressions have the same value, all statements will be executed (in example above for n=2 Statement3 as well as Statement2 will be executed).
The switch statement works like this: the value of the Expression is compared with each of the expressions in the case statements. If a match is found there is executed the statement from corresponding case statement and comparing continues. If none of the expressions matches the value of the Expression, then the default statement is executed. Default statement is optional, so if no Expression1 till Expression-n-1 matches and no default is present , then no further action is taken.
While loop repeats a statement while its controlling expression is true. While loop form:
while (Expression) Statement
The Expression can be any boolean expression. The Statement will be executed as long as the conditional expression is true. When Expression becomes false, control passes to the next line of code immediately following the loop. Since the while loop evaluates its conditional expression at the top of the loop, it can happen that Statement will not be executed at all.
Example:
while (born<now) { born=dateadd(born,1,year); while (yer<5) yer=yer+1; yer=yer+1; }
Do-while loop is similar to while loop, but the Statement is executed always at least once, even if the Expression is false from the beginning. Form of the do-while loop:
do Statement while (Expression)
Each iteration of the do-while loop first executes Statement and then evaluates the Expression. If this Expression is true, the loop will repeat, otherwise the loop terminates.
Example:
do { born=dateadd(born,1,year); do yer=yer+1; while (yer<5); yer=yer+1; } while (born<now)
For loop is usually used , when there is known number of iteration. For loop form:
for (initializationExpression;conditionExpression;iterationExpression) Statement
The for loop operates as follows: when the loop first starts initializationExpression is executed (it can be assignment or only name of variable controlling the loop). InitializationExpression is executed only once. Next conditionExpression is evaluated (it has to be boolean expression). It usually tests the loop control variable against the end's condition. If this expression is true Statement is executed else the loop terminates. Next the iterationExpression is executed. This is usually expression, which increments or decrements loop control variable. The loop then iterates, first evaluating conditionExpression, executing Statement, then iterationExpression and so on until the conditionExpression is false.
Examples:
for (born;born<now;born=dateadd(born,1,year)) ++yer; for (born;!b;++yer) if (yer==100) b=true;
For each loop is used for iteration over the list, map or record. All elements of the iterable must be of the same type. For each loop form:
foreach (variable : iterableVariable) Statement
The Statement is executed once for each element from the iterableVariable, i.e. depending of the type of iterable variable:
Examples:
list keyList = [1,4,6,14,16]; int i; foreach (i : keyList) { tmp = lookup(LookupTable0,i).value; my_function(tmp); }
string value; string newValue = ''; record(@0) rec = @0; foreach(value : rec){ newValue = newValue + value; }
foreach(value : my_map){ newValue = newValue + value; }
Break statement is used for force termination of loop. It consist of one word: break. When a break statement is encountered inside a loop, the loop is terminated and program control resumes at the next statement following the loop. When it is used in nested loop only the innermost loop is terminated. For example:
int year;year=0; int i; while (year<33) { year=year+1; for (i=0;i<20;i=i+1) if (i==10) break }
after the while loop we have year=34, i=10.
Continue statement breaks the current iteration of loop and control of program continues with next iteration or iterationExpression in for loop. Syntax: continue
Return statement is used to explicitly return from a function. It means that execution of program is transferred back to the place the function was called and to the caller is assigned function's return value. The return statement is the last statement executed in function. If in a function there is no return statement, execution of the functions ends on last statement and the return value of function is null. Return statement used in block causes skipping all statements after it till the end of block. General form of return statement:
return Expression
Expression is non-obligatory. Return statement without Expression returns null value.
Example:
function year_before(now) { return dateadd(now,-1,year) }
The try-catch statement is used for error handling. Syntax of this statement is following:
try Statement1 catch (stringVariable) Statement2
The program tries to execute Statement1. If the execution is successful, the program continues with the next statement after the whole try-catch statement. If an exception occurs, when trying to execute the Statement1, program doesn't fail, but executes Statement2, which can correct the previously unsuccessfully executed Statement1. The reason of the failing is saved to stringVariable.
Example:
string errmes; int i; . . . try { i = "some string"; //Statement 1 - unsuccessful as string cannot be assigned to integer } catch (errmes) { print_log(error, errmes); //Statement 2 - error message is written to logger i = 12345; //Statement 2 - unsuccessfully executed Statement1 is corrected with a new assignment //of integer to the integer variable }After executing the above code, the following message is visible in logger:
ERROR [DATA_GENERATOR0] - class org.jetel.interpreter.TransformLangExecutorRuntimeException#Interpreter runtime exception on line 27 column 13 : invalid assignment of "some string" [STRING] to variable "i" [INTEGER] " - incompatible data types
In Transform Language it is possible to create own functions. Function declaration in general has following form:
function functionName(arg1,arg2,...) { variableDeclarations; Statements Mapping [return [Expression]] }
Variable declarations and statements must be in this order. All variables must be declared before any statement. Mapping must follow the last statement and only one return statement can follow the mapping. Variables declared in function are local and are visible only inside function. Function can return a value; to do it use return statement in form return Expression. If there is no return statement used in function it returns null value and works as procedure.
Examples:
function year_before(now) { return dateadd(now,-1,year) } function mapName(){ string result; print_err('function'); result='result'; $Name:=result; }
Calling functions:
date yearBefore; yearBefore=year_before(2000-01-01); mapName;
In Transform Language it is also possible to create a function for your own error messages. Function declaration has the following form:
function getMessage() { //some code return expressionOrVariableWithYourOwnErrorMessage; }
In this function you need to define your own error message and return its text. You can return any expression or string variable whose value is the error message of your own.
It is possible to import code from other file to current program. Instruction
import "<fileName_URL>";is replaced by the code from specified external file. The
import statement has to be the first statement in the scrip !
Example:
import '/home/usr/cloveretl.engine/data/tlExample.ctl'; import 'http://myserver/coderepos/tlFunctions.ctl';
See appendix_1_-_attributes for <fileName_URL> options.
It is possible to evaluate (parse&execute) string which contains CTL expression (or piece of program/code).
There exist two variants of eval command:
Example:
string str='print_err("eval test OK")'; eval(str); print_err(eval_exp('"ahoj"'));
The main difference between eval() and eval_exp() is that the latter parses & executes the expression and then cleans up everything. This is good if you just need to evaluate mathematical expression or perform simple task. If you need to add new function definition to the existing script, then you can use eval() which adds the parsed expression to the script within which is the eval() executed.
Input records can be accessed with following syntax:
@inputRecordName or @inputRecordNumber
Then it is possible to manipulate with such record as with any other record, e.g.
record (metadataId) my_record = @0; my_value = @inputRecord[1]
The above syntax can serve as a metadata provider also, i.e. when you declare a new record, you can define it as follows:
record(@0) rec; rec[0] = "test";
You can refer to the individual fields in the data flows (records) entering and leaving a component. You can refer to the flows either by their sequential order numbers counting from 0 or by a metadata name. Fields in the flows are referenced by their name or by their sequential order number. Flow reference and a field reference shall be separated by a dot (”.”). Flow reference can be omitted (if the field is referenced by its name), in which case it defaults to zero.
If a field is referenced by its number, flow must also be referenced, either by its number or name.
References must be prefixed with $.
$[<flow number>|<flow name>.]<field name>
$<flow number>.<field number>|$<flow name>.<field number>
Examples:
| example | description |
|---|---|
| print_err($1.City) | prints out on the error output value of “City” field from 2nd (0-based) input flow |
| print_err($1.3) | prints out on the error output value of “City” field (4th field) from 2nd (0-based) input flow |
| datediff($Born, 2005-1-1,month) | computes month's difference between value of “Born” field from 0th input flow and 2005-1-1 |
| $results.Field5:=5 | assigns constant value 5 to field called “Field5” of the output flow described by metadata named “results” |
You can also use parameters in your CTL definition. If you have defined any parameter (MyParameter), you can use it in any transformation by typing '${MyParameter}'. It is better if you use only single quotes in the parameters that are used in CTL. Otherwise, some problem with escape sequences may arise when using double quotes.
You can also use all environment variables.
Transformation language allows using of sequences defined in graph. Sequences are accessed through their IDs. The syntax for sequence-related operations is as follows:
sequence(<id>).[next|current|reset]
next - casts next value of sequence and returns itcurrent - returns current value of sequencereset - resets sequence to its starting value
Standard return value type is int, if needed, it may be specified that either long or string should be returned.
sequence(<id>,[int|long|string]).[next|current|reset]
Example:
int i; for(i=0;i<10;++i) print_err(sequence(test).next); int value; value=sequence(myseq).current;
There is possibility of using records from graph lookup tables. Function lookup(lookupTableId,key) returns record from lookupTableId lookup table stored under the key key. When lookup table contains duplicated records you can use lookup_found returning number of records stored after the last introduced key and lookup_next returning next record stored after the last used key.
lookup(<id>,<key_part1>[,key_part2, …]).<field_name> - performs lookup of record from lookup table id based on specified key, returns value of field_name if found or NULLlookup_next(<id>).<field_name> - returns next value of field if such exists or NULL; lookup() must be called firstlookup_found(<id>) - returns #of records found by previous lookup() calllookup_admin(<id>,[init|free]) - allows to initialize lookup table of free/destroy itExample:
lookup_admin(LKP, init); print_err(lookup(LKP,'HELLO').Age); print_err(lookup_found(LKP)); print_err(lookup_next(LKP).Age); lookup_admin(KPP, init); print_err(lookup(KPP,'a','b','c').Id);
Mapping can be done in the end of the program or in a second part of function body. In general it has form:
recordField:=Expression [: Expression1 [:Expression2 .....]..... ]
where recordField can be accessed in the way described above. Interpreter tries to assign Expression to recordField, if it does not finish successfully, it tries to assign Expression1 and so on.
Last mapping must be followed by semicolon.
Examples:
| example | description |
|---|---|
| $Name:=result; | assigns value of “result” variable to “Name” field of 0th output record |
| $0.Age:=$Age; | assigns “Age” field's value of 0th input record to “Age” field of 0th output record |
| $out.City:=concat("My City ",$City); | assigns string, which is concatenation of “My City “ and “City” field's value of 0th input record, to “City” field of output record called “out” |
| $out1.Value:= $in.Value; | assigns value of “Value” field of input record called “in” to “Value” field of output record called “out1” |
Comments are used for entering a remarks into a program's source. The contents of comment are ignored by the compiler. It describes or explains fragments of code to readers. In Transform Language there are two kinds of comments:
CTL language supports interaction with dictionary object related to the current graph. Only string-type dictionary values are supported (i.e. complex Java values cannot be accessed by CTL). There are three internal functions to manipulate the dictionary contents. Arguments of these functions can be any expressions returning a string value.
read_dict(name): reads content of dictionary value stored under specified namewrite_dict(name,value): overwrites dictionary value stored under specified namedelete_dict(name): removes value stored under specified name from the dictionary