This component is deprecated. Use DataReader instead.
Parses specified input data file and broadcasts the records to all connected out ports.
Input ports: * one optional input port defined/connected (port protocol see fileURL).
Output ports:
Xml attributes:
| Attribute | Mandatory | Description | Default | ETL Version since |
|---|---|---|---|---|
id | yes | component identification | ||
type | yes | component type | DELIMITED_DATA_READER | |
fileURL | yes | path to the input files | ||
charset | no | character encoding of the input file | ISO-8859-1 | |
dataPolicy | no | specifies how to handle misformatted or incorrect data. 'Strict' aborts processing, 'Controlled' logs the entire record while processing continues, and 'Lenient' attempts to set incorrect data to default values while processing continues. | 'Strict' | |
skipFirstLine | no | specifies whether first record/line should be skipped. If record delimiter is specified than skip one record else first line of flat file. | false | |
skipRows | no | specifies how many records/rows should be skipped from the source file. Good for handling files where first rows is a header not a real data. | 0 | |
numRecords | no | specifies how many records/rows should be read from the source. | ∞ | |
skipLeadingBlanks | no | specifies whether to skip leading blanks before setting string to data fields. When not set, there is used value of “trim” attribute of metadata. | 2.6 | |
skipTrailingBlanks | no | specifies whether to skip trailing blanks before setting string to data fields. When not set, there is used value of “trim” attribute of metadata. | 2.6 | |
trim | no | specifies whether to trim strings before setting them to data fields. When not set, strings are trimmed depending on “trim” attribute of metadata. Note: if this option is ON (true), then field composed of all blanks/spaces is transformed to null (zero length string). | ||
incrementalFile | incrementalKey | property file used for incremental reading | ||
incrementalKey | incrementalFile | property name stored in property file carries last reading position |
Note:
This reader can't read multiple character delimiters if an initial part of the delimiter is contained in data before this delimiter. Ie: Delimiter is 'delim'. Data are 'xxxd!xxxdelim'. Please use DataReader instead DelimitedDataReader while the problem will be resolved.
Example:
<Node id="InputFile" type="DELIMITED_DATA_READER" fileURL="/tmp/mydata.dat" /> <Node id="InputFile" type="DELIMITED_DATA_READER" fileURL="/tmp/mydata.dat" charset="ISO-8859-15" dataPolicy="Strict" skipFirstLine="true" skipRows="10" numRecords="100" trim="true" />
This component is deprecated. Use DataReader instead.
Parses specified fixed-length record, input data file and broadcasts the records to all connected out ports. The length of record is fixed either in bytes or in chars. Byte mode is somewhat stricter than char mode. It doesn't support record delimiters, incomplete records and skipping of leading/trailing blanks in fields.
If The Byte mode is true
Caution:
If you use FixLenDataReader for reading textfiles, consider the presence of the line terminators. Line terminator is one byte long in Unix-like systems and Macintosh and two bytes long under Windows. If, for example, each line contains one record, then the last field of the record must include also the line terminator, otherwise your data cannot be read correctly.
Input ports: * one optional input port defined/connected (port protocol see fileURL).
Output ports:
Xml attributes:
| Attribute | Mandatory | Description | Default | ETL Version since |
|---|---|---|---|---|
id | yes | component identification | ||
type | yes | component type | FIXLEN_DATA_READER | |
fileURL | yes | path to the input files | ||
charset | no | character encoding of the input file | ISO-8859-1 | |
dataPolicy | no | specifies how to handle misformatted or incorrect data. 'Strict' aborts processing, 'Controlled' logs the entire record while processing continues, and 'Lenient' attempts to set incorrect data to default values while processing continues. | 'Strict' | |
byteMode | no | specifies parsing mode. true for byte mode, false for char mode. Byte mode reads data into byte buffer and then decodes into clover field and char mode reads date into char buffer and then decodes into clover field as well. Using of byte mode is relevant for clover 'byte' and 'cbyte' (compressed array) of bytes field. The difference is showed in examples folder. | false | |
skipLeadingBlanks | no | values true/false. Sets on/off skipping of leading blanks in fields. It doesn't have any effect in byte mode. | true | |
skipTrailingBlanks | no | values true/false. Sets on/off skipping of trailing blanks in fields. It doesn't have any effect in byte mode. | true | |
trim | no | specifies whether to trim strings before setting them to data fields. When not set, strings are trimmed depending on “trim” attribute of metadata. | ||
skipEmpty | no | values true/false. Specifies whether empty records are to be ignored. It doesn't have any effect in byte mode. | true | |
skipFirstLine | no | specifies whether first record/line should be skipped. If record delimiter is specified than skip one record else first line of flat file. | false | |
skipRows | no | specifies how many records/rows should be skipped from the source file. Good for handling files where first rows is a header not a real data. | 0 | |
numRecords | no | specifies how many records/rows should be read from the source. | ∞ | |
enableIncomplete | no | Sets on/off support for incomplete records, this attribute is permanently true for char mode and false for byte mode. | ||
incrementalFile | incrementalKey | property file used for incremental reading | ||
incrementalKey | incrementalFile | property name stored in property file carries last reading position |
Example:
<Node id="InputFile" type="FIXED_DATA_READER" fileURL="/tmp/mydata.dat" /> <Node id="InputFile" type="FIXED_DATA_READER" fileURL="/tmp/mydata.dat" charset="UTF-8" dataPolicy="Controlled" byteMode="false" skipLeadingBlanks="true" skipTrailingBlanks="true" skipEmpty="false" enableIncomplete="false" />
This component is deprecated. Use DataWriter instead.
All records from input port [0] are formatted with delimiter and written to specified file. Delimiters are taken from metadata specified for port[0] data flow.
Input ports:
Output ports:
* one optional output port defined/connected (port protocol see fileURL).
Xml attributes:
| Attribute | Mandatory | Description | Default | ETL Version since |
|---|---|---|---|---|
id | yes | component identification | ||
type | yes | component type | DELIMITED_DATA_WRITER | |
fileURL | yes | path to the output file | ||
append | no | whether to append data at the end if output file exists or replace it (true/false) | false | |
charset | no | character encoding of the output file (if not specified, then ISO-8859-1 is used) | ISO-8859-1 | |
outputFieldNames | no | print names of individual fields into output file - as a first row (true/false) | false | |
bytesPerFile | no | max size of output files. To avoid splitting a record to two files, max size could be slightly overreached. | ∞ | |
recordsPerFile | no | max number of records in one output file | ∞ | |
recordSkip | no | specifies how many records/rows should be skipped before of writting the first record to the output file. | 0 | |
recordCount | no | specifies how many records/rows should be written to the output file. | ∞ | |
partitionKey | no | sequence of field names from incoming data flow (separated by semicolon). Data records with different partitionKey values are saved to different output files. The names of these different output files differ from each other by additional numbers or distinguishing names depending on the value of partitionFileTag attribute. Example of partitionKey: FirstName;SecondName. | ||
partition | if and only if partitionOutFields is set | name of the lookup table that will serve to group output data records and provide distinguishing names for output file names. | ||
partitionFileTag | no | possible values: numberFileTag or keyNameFileTag. If set to numberFileTag, output file names accept additional numbers appended to the base file name specified in the fileURL attribute (“out”, in the example mentioned above). If set to keyNameFileTag, output file names accept distinguishing names appended to the base file name specified in the fileURL attribute (the mentioned “out”). In case of the partitionKey mentioned above, all “John Smiths” are saved to one file, all “Peter Browns” to another. The names can be either “path/out27” and “path/out34”, for example, (partitionFileTag is set to numberFileTag), respectively, or they can be “path/outJohnSmith” and “path/outPeterBrown” (partitionFileTag is set to keyNameFiletag), respectively. | numberFileTag | |
partitionOutFields | if and only if partition is set | sequence of field names from the lookup table specified in the partition attribute (separated by semicolon). If user wants to group data records and/or give the output files different names than those specified in the partitionKey attribute, she needs to use a lookup table whose key values can be equal to the mentioned partitionKey. All records whose partitionKey values are not equal to the lookup table key values are saved together to a single file named “unassignedout” (the “unassigned” word is added to the beginning of the base file name, which is “out”, in our example). The other records are distributed to different output files according to different values of partitionKey (which are equal to the values of the lookup table key) and the values of the specified partitionOutFields serve as distinguishing names. Thus, the names of the other output files (those assigned) are as follows: “path/outpartitionOutFieldsvalues”. The values of individual fields of the partitionOutFields attribute are converted to strings and concatenated automatically to create the distinguishing names. | ||
partitionUnassignedFileName | no | user defined name of new file. The file will be created if a lookup table is specified and there is at least one record for which doesn't exist a key in the lookup table. These type of records will be written to the file. If the partitionUnassignedFileName is empty, the records will be discarded. | 2.8 |
Example:
<Node id="Writer" type="DELIMITED_DATA_WRITER" fileURL="/tmp/transfor.out"/> <Node id="Writer" type="DELIMITED_DATA_WRITER" fileURL="/tmp/transfor.out" append="true" charset="UTF-8" outputFieldNames="true" bytesPerFile="1000000" recordsPerFile="10000" recordSkip="10" recordCount="100" />
This component is deprecated. Use DataWriter instead.
All records from input port [0] are formatted with sizes specified in metadata and written to specified file. Sizes are taken from metadata specified for port[0] data flow.
Input ports:
Output ports:
Xml attributes:
| Attribute | Mandatory | Description | Default | ETL Version since |
|---|---|---|---|---|
id | yes | component identification | ||
type | yes | component type | FIXLEN_DATA_WRITER | |
fileURL | yes | path to the output file | ||
append | no | whether to append data at the end if output file exists or replace it (true/false) | false | |
charset | no | character encoding of the output file (if not specified, then ISO-8859-1 is used) | ISO-8859-1 | |
outputFieldNames | no | print names of individual fields into output file - as a first row (true/false) | false | |
fieldFiller | no | allows specifying what character will be used for padding output fields. | ” ” (space) | |
recordFiller | no | allows specifying what character will be used for padding gaps between fields in output records. | = | |
leftAlign | no | specify if values should be align left or right. It is used when filler is defined. | true | |
bytesPerFile | no | max size of output files. To avoid splitting a record to two files, max size could be slightly overreached. | ∞ | |
recordsPerFile | no | max number of records in one output file | ∞ | |
recordSkip | no | specifies how many records/rows should be skipped before of writting the first record to the output file. | 0 | |
recordCount | no | specifies how many records/rows should be written to the output file. | ∞ | |
partitionKey | no | sequence of field names from incoming data flow (separated by semicolon). Data records with different partitionKey values are saved to different output files. The names of these different output files differ from each other by additional numbers or distinguishing names depending on the value of partitionFileTag attribute. Example of partitionKey: FirstName;SecondName. | ||
partition | if and only if partitionOutFields is set | name of the lookup table that will serve to group output data records and provide distinguishing names for output file names. | ||
partitionFileTag | no | possible values: numberFileTag or keyNameFileTag. If set to numberFileTag, output file names accept additional numbers appended to the base file name specified in the fileURL attribute (“out”, in the example mentioned above). If set to keyNameFileTag, output file names accept distinguishing names appended to the base file name specified in the fileURL attribute (the mentioned “out”). In case of the partitionKey mentioned above, all “John Smiths” are saved to one file, all “Peter Browns” to another. The names can be either “path/out27” and “path/out34”, for example, (partitionFileTag is set to numberFileTag), respectively, or they can be “path/outJohnSmith” and “path/outPeterBrown” (partitionFileTag is set to keyNameFiletag), respectively. | numberFileTag | |
partitionOutFields | if and only if partition is set | sequence of field names from the lookup table specified in the partition attribute (separated by semicolon). If user wants to group data records and/or give the output files different names than those specified in the partitionKey attribute, she needs to use a lookup table whose key values can be equal to the mentioned partitionKey. All records whose partitionKey values are not equal to the lookup table key values are saved together to a single file named “unassignedout” (the “unassigned” word is added to the beginning of the base file name, which is “out”, in our example). The other records are distributed to different output files according to different values of partitionKey (which are equal to the values of the lookup table key) and the values of the specified partitionOutFields serve as distinguishing names. Thus, the names of the other output files (those assigned) are as follows: “path/outpartitionOutFieldsvalues”. The values of individual fields of the partitionOutFields attribute are converted to strings and concatenated automatically to create the distinguishing names. | ||
partitionUnassignedFileName | no | user defined name of new file. The file will be created if a lookup table is specified and there is at least one record for which doesn't exist a key in the lookup table. These type of records will be written to the file. If the partitionUnassignedFileName is empty, the records will be discarded. | 2.8 |
Example:
<Node id="Writer" type="FIXLEN_DATA_WRITER" fileURL="/tmp/transfor.out"/> <Node id="Writer" type="FIXLEN_DATA_WRITER" fileURL="/tmp/transfor.out" append="true" charset="UTF-8" outputFieldNames="false" fieldFiller=" " recordFiller="." />
This component is deprecated. Use Filter (Ext) instead.
All records for which the filterExpression evaluates TRUE are copied from input port:0 onto output port:0. Rejected records are copied onto output port:1 (if it is connected).
It can filter on text, date, integer, numeric fields with comparison [>, <, ==, =<, >=, !=] Text fields can also be compared to a Java regexp. using ~ (tilda) character. A filter can be made of different parts separated by a ”;”. If one of the parts is verified, the record pases the filter (it's an OR combination of the parts, and AND can be achieved by several filters). Date format used for comparison is the same as defined for the field which is used for filtration.
Input ports: * 0 - input records
Output ports:
Xml attributes:
| Attribute | Mandatory | Description | Default | ETL Version since |
|---|---|---|---|---|
id | yes | component identification | ||
type | yes | component type | FILTER | |
filterExpression | yes | expression used for filtering records. See above. |
Example:
<Node id="FILTEREMPL1" type="FILTER" filterExpression="HireDate&lt;1993-12-31"/> <Node id="FILTEREMPL1" type="FILTER" filterExpression="Name~^A.*;Age>25"/>
This component is deprecated. Use Reformat instead.
This component creates key which is constructed as combination of chars from given data fields.It is possible to make some operation on original strings as removing blank space, removing diacritic, getting only alpha signs, etc.
Input ports: * one input port defined/connected.
Output ports:
Xml attributes:
| Attribute | Mandatory | Description | Default | ETL Version since |
|---|---|---|---|---|
id | yes | component identification | ||
type | yes | component type | KEY_GEN | |
keyExpression | yes | field names with the way of generating the key from them, separated by :;| {colon, semicolon, pipe}. |
keyExpression - String, which describes creating of key for each field has form: [from][-]how many[d|n|a|s|u|l]. When there is no “from” it means key is created from chars from the beginning of string or if there is ”-from” the end of string
Example:
<Node id="KEY_GEN0" type="KEY_GEN"> <attr name="keyExpression">lname 2d;fname -2ad</attr> </Node>
This component is deprecated. Use Sort (Ext) instead.
Sorts the incoming records based on specified key. The key is name (or combination of names) of field(s) from input record.
Input ports: * one input port defined/connected.
Output ports:
Xml attributes:
| Attribute | Mandatory | Description | Default | ETL Version since |
|---|---|---|---|---|
id | yes | component identification | ||
type | yes | component type | SORT | |
sortKey | yes | sequence of field names separated by :;| {colon, semicolon, pipe}. | ||
sortOrder | no | One of “Ascending|Descending” {the fist letter is sufficient}. | Ascending | |
useI18N | no | true/false perform sorting according to national rules - e.g. Czech or German handling of characters like “i”,”í”. Use it only if you are sorting data according to key which can contain accented characters or you want sorter to follow certain locale specific rules. | false | |
locale | no | locale to be used when sorting using I18N rules. | system default |
Example:
<Node id="SORT_CUSTOMER" type="SORT" sortKey="Name:Address"/> <Node id="SORT_CUSTOMER" type="SORT" sortKey="Name:Address" sortOrder="A" useI18N="true" locale="fr" />
This component is deprecated. Use HashJoin (Ext) instead.
Joins two records from two different input flows based on specified key. The flow on port 0 is the driver, the flow on port 1 is the slave. First, all records from slave flow are read and stored in hash table. Then for every record from driver flow, corresponding record from slave flow is looked up (if it exists)
Description:
Joins records on input ports. It expects driver stream at port [0] and slave streams at other input ports. Slave streams are expected to be small enough to store all the slave records in hashtable.
Each driver record is joined with corresponding slave records according to join keys specification.
There are three join modes available: inner, left outer, full outer.
In case you use outer mode, till 2.4 version be sure your transformation code is able to handle null input records. From 2.5 version it doesn't happen any more, because for lacking record there is used special null record - all getValue methods returns null instead of throwing exception. If you want to realise some action for such record you can compare it with NullRecord.NULL_RECORD (see example below). Hash join does not require input data to be sorted. But it spends some time at the beginning initializing hashtable of slave records. It is generally good idea to specify how many records are expected to be stored in each hashtable (there is one hashtable per each slave input), especially when you expect the number to be really great. It is better to specify slightly greater number to ensure that rehashing won't occure. For small record sets - up to 512 records, there is no need to specify the size.
Input ports: * 0 - driver record input
Output ports:
Xml attributes:
| Attribute | Mandatory | Description | Default | ETL Version since |
|---|---|---|---|---|
id | yes | component identification | ||
type | yes | component type | HASH_JOIN | |
joinKey | yes | join key specification in format driver_field1;driver_field2,… | ||
transformClass | if no transform | name of the class to be used for transforming joined data. If no class name is specified then it is expected that the transformation Java source code is embedded in XML | ||
transform | if no transformClass | contains definition of transformation | ||
leftOuterJoin | no | true/false - allow to create left outer join. | false | |
hashTableSize | no | how many records are expected (roughly) to be in hashtable. | 512 | |
slaveDuplicates | no | true/false - allow records on slave port with duplicate keys. False - multiple duplicate records are discarded - only the first one is used for join. | false | |
slaveOverrideKey | no | slave key specification in format driver_field1;driver_field2,… There is used if you want to assign some slave field name with different name from master field name. | joinKey values |
Example:
<Node id="JOIN" type="HASH_JOIN" joinKey="CustomerID" transformClass="org.jetel.test.reformatOrders"/> <Node id="JOIN" type="HASH_JOIN" joinKey="EmployeeID"> <attr name="transform">//#TL function transform(){ $OrderID := $OrderID; $CustomerID := $CustomerID; $EmployeeID := $EmployeeID; $EmployeeID2 := num2str($1.EmployeeID); $LastName := concat($1.LastName,":",$1.FirstName); } </attr> </Node> In-line transformation for 2.4 and lower ver.: <Node id="JOIN" type="HASH_JOIN" joinKey="EmployeeID;EmployeeType" leftOuterJoin="true"> <attr name="transform"> import org.jetel.component.DataRecordTransform; import org.jetel.data.*; public class reformatJoinTest extends DataRecordTransform{ int recNo = 0; public boolean transform(DataRecord[] source, DataRecord[] target){ target[0].getField(0).setValue(source[0].getField(0).getValue()); target[0].getField(1).setValue(source[0].getField(1).getValue()); target[0].getField(2).setValue(source[0].getField(2).getValue()); if (source[1]!=null){ target[0].getField(3).setValue(source[1].getField(0).getValue()); target[0].getField(4).setValue(source[1].getField(1).getValue()); }else{ System.out.println("Slave record not found for " + recNo + "th record."); } recNo++; return true; } } </attr> </Node> In-line transformation since 2.5 ver.: <Node id="JOIN" type="HASH_JOIN" joinKey="EmployeeID;EmployeeType" leftOuterJoin="true"> <attr name="transform"> import org.jetel.component.DataRecordTransform; import org.jetel.data.*; public class reformatJoinTest extends DataRecordTransform{ int recNo = 0; public boolean transform(DataRecord[] source, DataRecord[] target){ if (source[1] == NullRecord.NULL_RECORD) { System.out.println("Slave record not found for " + recNo + "th record."); } target[0].getField(0).setValue(source[0].getField(0).getValue()); target[0].getField(1).setValue(source[0].getField(1).getValue()); target[0].getField(2).setValue(source[0].getField(2).getValue()); target[0].getField(3).setValue(source[1].getField(0).getValue()); target[0].getField(4).setValue(source[1].getField(1).getValue()); recNo++; return true; } } </attr> </Node>
This component is deprecated. Use MergeJoin (Ext) instead.
Joins sorted records on input ports. It expects driver stream at port [0] and slave streams at other input ports. Each driver record is joined with corresponding slave records according to join keys specification. The method transform is every composed of driver and corresponding slaves. There are three join modes available: inner, left outer, full outer.
Inner mode processess only driver records for which all associated slaves are available.
Left outer mode furthermore processes driver records with missing slaves.
Full outer mode additionally calls transformation method for slaves without driver.
In case you use outer mode, till 2.4 version be sure your transformation code is able to handle null input records. From 2.5 version it doesn't happen any more, because for lacking record there is used special null record - all getValue methods returns null instead of throwing exception. If you want to realise some action for such record you can compare it with NullRecord.NULL_RECORD (see example below).
Input ports: * 0 - sorted driver record input.
Output ports:
Xml attributes:
| Attribute | Mandatory | Description | Default | ETL Version since |
|---|---|---|---|---|
id | yes | component identification | ||
type | yes | component type | MERGE_JOIN | |
joinKey | yes | join key specification in format driver_field1;driver_field2,… | ||
transformClass | if no transform | name of the class to be used for transforming joined data. If no class name is specified then it is expected that the transformation Java source code is embedded in XML | ||
transform | if no transformClass | contains definition of transformation | ||
fullOuterJoin | no | true/false - uses full outer join. | false | |
leftOuterJoin | no | true/false - uses left outer join. If fullOuterJoin and leftOuterJoin are false, inner join is used. | false | |
slaveOverrideKey | no | slave key specification in format driver_field1;driver_field2,… It is used if you want to assign some slave field name with different name from master field name. | joinKey values |
Example:
<Node id="JOIN" type="MERGE_JOIN" joinKey="CustomerID" transformClass="org.jetel.test.reformatOrders"/> <Node id="MERGE_JOIN0" joinKey="seqKey" slaveOverrideKey="seqKey" type="MERGE_JOIN"> <attr name="transform"> ${out.0.car} = ${in.0.car}; ${out.0.seqKey} = ${in.0.seqKey}; ${out.0.mobilephone} = ${in.0.mobilephone}; ${out.0.financial_bonus} = ((${in.1.monthly_bonus})*12)+(${in.1.yearly_bonus}); </attr> </Node> In-line transformation for 2.4 and lower ver.: <Node id="JOIN" type="MERGE_JOIN" joinKey="EmployeeID;EmployeeType" leftOuterJoin="true"> <attr name="transform"> import org.jetel.component.DataRecordTransform; import org.jetel.data.*; public class reformatJoinTest extends DataRecordTransform{ int recNo = 0; public boolean transform(DataRecord[] source, DataRecord[] target){ target[0].getField(0).setValue(source[0].getField(0).getValue()); target[0].getField(1).setValue(source[0].getField(1).getValue()); target[0].getField(2).setValue(source[0].getField(2).getValue()); if (source[1]!=null){ target[0].getField(3).setValue(source[1].getField(0).getValue()); target[0].getField(4).setValue(source[1].getField(1).getValue()); }else{ System.out.println("Slave record not found for " + recNo + "th record."); } recNo++; return true; } } </attr> </Node> In-line transformation since 2.5 ver.: <Node id="JOIN" type="MERGE_JOIN" joinKey="EmployeeID;EmployeeType" leftOuterJoin="true"> <attr name="transform"> import org.jetel.component.DataRecordTransform; import org.jetel.data.*; public class reformatJoinTest extends DataRecordTransform{ int recNo = 0; public boolean transform(DataRecord[] source, DataRecord[] target){ if (source[1] == NullRecord.NULL_RECORD) { System.out.println("Slave record not found for " + recNo + "th record."); } target[0].getField(0).setValue(source[0].getField(0).getValue()); target[0].getField(1).setValue(source[0].getField(1).getValue()); target[0].getField(2).setValue(source[0].getField(2).getValue()); target[0].getField(3).setValue(source[1].getField(0).getValue()); target[0].getField(4).setValue(source[1].getField(1).getValue()); recNo++; return true; } } </attr> </Node>