Readers are the initial components of a graph; they read data from an input source. The source can be, for example, a file on a local disk or an FTP server, an LDAP directory, a JMS queue, or a database table. A graph must contain at least one reader.
Common attributes
| Attribute | Description | Expected value |
|---|---|---|
| id | component identification | string |
| type | component type; generated automatically by the GUI | string |
File readers
| Attribute | Description | Expected value |
|---|---|---|
| charset | character encoding of the input file | charset name, e.g. ISO-8859-1 or UTF-8 |
| dataPolicy | specifies how to handle misformatted or incorrect data | Strict, Controlled or Lenient |
| fileURL | path to the input data file | ([zip: \| gzip: \| tar:] [path/] filename) \| (http[s]://[user:password@]server [/path] [/filename]) \| ([s]ftp://user:password@server [/path] /filename) \| - \| .. |
| numRecords | maximum number of records/rows to read from the source file | number |
| skipFirstLine | specifies whether the first record/line should be skipped. If a record delimiter is specified, one record is skipped; otherwise, the first line of the flat file is skipped. | false or true |
| skipRows | number of records/rows to skip from the beginning of the source file; useful when the first rows are a header rather than data | number |
| trim | specifies whether strings are trimmed before they are set to data fields. When not set, trimming depends on the "trim" attribute of the metadata. | false or true |
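The following sketch shows how skipRows and numRecords interact. It is a hypothetical illustration, and the class SkipAndLimit and its method select are not part of the CloverETL API. It assumes, as the descriptions above suggest, that skipped records are discarded first and that numRecords then caps how many of the remaining records are read.

```java
import java.util.List;

// Hypothetical illustration of skipRows/numRecords; not CloverETL code.
public class SkipAndLimit {

    /**
     * Returns the records a reader would emit.
     * skipRows   - number of leading records to discard (e.g. a header line)
     * numRecords - maximum number of records to emit; a negative value stands for "unlimited"
     */
    public static List<String> select(List<String> records, int skipRows, int numRecords) {
        int from = Math.min(Math.max(skipRows, 0), records.size());
        int remaining = records.size() - from;
        int to = numRecords < 0 ? records.size() : from + Math.min(remaining, numRecords);
        return records.subList(from, to);
    }

    public static void main(String[] args) {
        List<String> lines = List.of("id;name", "1;Alice", "2;Bob", "3;Carol");
        // Skip the header line, then read at most two data records.
        System.out.println(select(lines, 1, 2)); // prints [1;Alice, 2;Bob]
    }
}
```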
Database readers
| Attribute | Description | Exp. |
|---|---|---|
| sqlQuery | query to be sent to the database | SQL statement |
| fetchSize | number of records fetched from the database at once | number |
Reads data saved in the Clover internal format and sends the records to the output ports.
Input ports: none
Output ports:
Xml attributes:
| Attribute | Mandatory | Description | Default | ETL Version since |
|---|---|---|---|---|
| id | yes | component identification | | |
| type | yes | component type | CLOVER_READER | |
| fileURL | yes | path to the data file: either an archive storing the data, data indexes and metadata description, or a binary file with data saved in the Clover internal format | | |
| indexFileURL | no | path to the index file; needed only if the index file is not in the same directory as the data file or does not have the expected name (fileURL.idx) | | |
| skipRows | no | number of records/rows to skip from the beginning of the source file; useful when the first rows are a header rather than data | 0 | |
| numRecords | no | maximum number of records/rows to read from the source | ∞ | |
| startRecord | no | index of the first parsed record | 0 | |
| finalRecord | no | index of the last parsed record | ∞ | |
Both startRecord and finalRecord attributes are deprecated and should not be used.
Example:
<Node id="CLOVER_READER0" type="CLOVER_READER" fileURL="zip:customers.clv.zip"/>
<Node id="CLOVER_READER0" type="CLOVER_READER" fileURL="customers.clv" finalRecord="2" startRecord="1" />
Generates data according to a pattern. Record fields can be filled with constants, random or sequence values, values from lookup tables, or results of CTL functions. You can use either the enhanced generator (the generate, generateClass or generateURL attribute) or the simple generator (the pattern, randomFields and sequenceFields attributes).
Input ports: none
Output ports:
Xml attributes:
| Attribute | Mandatory | Description | Default | ETL Version since |
|---|---|---|---|---|
| id | yes | component identification | | |
| type | yes | component type | DATA_GENERATOR | |
| generate | if no generateClass or generateURL | code of the generator. The attribute does not resolve escape characters; escape resolving will be added in the newest 2.8.x version | | 2.7 |
| generateClass | if no generate or generateURL | name of the class used to generate data | | 2.7 |
| generateURL | if no generateClass or generate | path to the file containing the generator code | | 2.7 |
| charset | no | character encoding of the external generator file (generateURL) | ISO-8859-1 | 2.9 |
| pattern | no | pattern for filling a new record: a string containing values for all fields that are not set by random or sequence values. The field values in this string must match the metadata format (appropriate length, or separated by the appropriate delimiter). | | |
| randomFields | no | names of fields to be filled with random values (optionally with ranges), separated by semicolons. If a random range (or one of its bounds) is not given, the minimum and maximum possible values of the field type are used (e.g. Long.MIN_VALUE and Long.MAX_VALUE for a LongDataField). Random strings are generated from the characters 'a' to 'z'. For numeric fields, the range is from the min value (inclusive) to the max value (exclusive); for byte or string fields, it gives the minimum and maximum field length (if the length is not fixed). For example, field1=random(0,51) yields a numeric value in [0, 51), i.e. 0 to 50 for an integer field, or a random string of 0 to 51 characters; field2=random(10) is allowed only for string or byte fields and specifies the field length. Since version 2.5, the standard mapping syntax should be used: field names are preceded by $, mappings are separated by a colon, semicolon or pipe, and the assignment operator is :=, e.g. $field1:=random(0,51);$field2:=random(10) | | |
| randomSeed | no | seed (a single long value) for the random number generator | | |
| sequenceFields | no | names of fields to be filled with values from a sequence (optionally with the sequence name: fieldName=sequenceName), separated by semicolons. Since version 2.5, the standard mapping syntax should be used: field names are preceded by $, mappings are separated by a colon, semicolon or pipe, and the assignment operator is :=, e.g. $field1:=sequenceName | | |
| recordsNumber | yes | number of records to generate | | |
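The mapping syntax used by randomFields can be illustrated with a small parser. This is a hypothetical sketch, not the component's own parser, and the class RandomFieldsSketch with its methods is illustrative only. It splits mappings on semicolons, pipes, and colons that are not part of :=, then collects the raw random(...) arguments. It also applies the documented numeric rule, where min is inclusive and max is exclusive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the randomFields mapping syntax; not CloverETL's own parser.
public class RandomFieldsSketch {

    /** One mapping such as $field1:=random(0,51); args holds the raw, trimmed arguments. */
    public record Mapping(String field, List<String> args) {}

    // Mappings are separated by ';', '|' or a ':' that does not start the ":=" operator.
    private static final Pattern SEPARATOR = Pattern.compile("[;|]|:(?!=)");
    // $name := random(arg[,arg])
    private static final Pattern MAPPING =
            Pattern.compile("\\s*\\$(\\w+)\\s*:=\\s*random\\(([^)]*)\\)\\s*");

    public static List<Mapping> parse(String spec) {
        List<Mapping> result = new ArrayList<>();
        for (String part : SEPARATOR.split(spec)) {
            if (part.isBlank()) continue; // tolerate a leading or doubled separator
            Matcher m = MAPPING.matcher(part);
            if (!m.matches()) throw new IllegalArgumentException("Unsupported mapping: " + part);
            List<String> args = new ArrayList<>();
            for (String a : m.group(2).split(",")) args.add(a.trim());
            result.add(new Mapping(m.group(1), args));
        }
        return result;
    }

    /** Numeric rule from the table: min is inclusive, max is exclusive (requires max > min). */
    public static int randomInt(Random rnd, int min, int maxExclusive) {
        return min + rnd.nextInt(maxExclusive - min);
    }
}
```

For example, parsing the randomFields value from the example below yields four mappings (ShipAddress, EmployeeID, Freight, ShippedDate). The graph-parameter bound ${EMPLOYEE_NUMBER} is kept as raw text.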
Example:
<Node id="DATA_GENERATOR0" type="DATA_GENERATOR" recordsNumber="10">
<attr name="generate">//#TL
int i;
function generate() {
    i = 2; // a key
    $0.RandomName := random_string(0,5)+random_string(5,5);
    $0.RandomDate := random_date("2009.01.01","2009.12.31","yyyy.MM.dd");
    $0.Random := random();
    $0.RandomInt := random()*100;
    $0.Composite := random_string(3,5)+" - " + round(random()*100);
    $0.Sequence := sequence(Sequence0).next;
    $0.LookupTableV1 := lookup(LookupTable0,i).field2;
    $0.LookupTableV2 := lookup(LookupTable0,i).field1;
}
function init() {
    lookup_admin(LookupTable0, init);
}
function finished() {
}
</attr>
</Node>
<Node id="DATA_GENERATOR0" type="DATA_GENERATOR">
<attr name="randomFields">$ShipAddress :=random(1,777);$EmployeeID:=random( 1,${EMPLOYEE_NUMBER});$Freight:=random(1,51);$ShippedDate:=random(20.10.2005,30.10.2005)</attr>
<attr name="recordsNumber">10000</attr>
<attr name="sequenceFields">OrderID</attr>
<attr name="pattern">agata|20.10.2005|30.10.2005|1|test|Prague|EU|000000|CZ </attr>
</Node>
Parses the specified input data file and sends the records to the first output port. The embedded parser supports both fixed-length and delimited data formats.
The logging port must define the following metadata structure:
| Field | Type | Description |
|---|---|---|
| 0 | integer | record number |
| 1 | integer | field number (1 means the first field, i.e. the field with index 0) |
| 2 | string | the incorrect record in raw form |
| 3 | string | error message with details about the error |
Metadata for the logging port:
<Metadata id="Metadata0">
  <Record name="errorPort" type="delimited">
    <Field delimiter=";" name="RecNumber" nullable="true" type="integer"/>
    <Field delimiter=";" name="FieldNumber" nullable="true" type="integer"/>
    <Field delimiter=";" name="RawRecord" nullable="true" type="string"/>
    <Field delimiter="\r\n" name="ErrorMessage" nullable="true" type="string"/>
  </Record>
</Metadata>
Note: the logging port is used only with the Controlled data policy.
Input ports:
Output ports:
Xml attributes:
| Attribute | Mandatory | Description | Default | ETL Version since |
|---|---|---|---|---|
| id | yes | component identification | | |
| type | yes | component type | DATA_READER | |
| fileURL | yes | path to the input data file | | |
| charset | no | character encoding of the input file | ISO-8859-1 | |
| dataPolicy | no | specifies how to handle misformatted or incorrect data: 'Strict' aborts processing, 'Controlled' logs the entire record while processing continues, and 'Lenient' attempts to set incorrect data to default values while processing continues | Strict | |
| skipLeadingBlanks | no | specifies whether leading blanks are skipped before strings are set to data fields. When not set, the value of the metadata "trim" attribute is used. | | |
| skipTrailingBlanks | no | specifies whether trailing blanks are skipped before strings are set to data fields. When not set, the value of the metadata "trim" attribute is used. | | 2.6 |
| trim | no | specifies whether strings are trimmed before they are set to data fields. When not set, trimming depends on the metadata "trim" attribute. Note: if this option is true, a field consisting only of blanks is converted to null (a zero-length string). | | |
| skipFirstLine | no | deprecated; use skipSourceRows instead. Specifies whether the first record/line should be skipped. If a record delimiter is specified, one record is skipped; otherwise, the first line of the flat file is skipped. | false | |
| skipRows | no | number of records/rows to skip from the beginning of the source; useful when the first rows are a header rather than data | 0 | |
| numRecords | no | maximum number of records/rows to read from the source | ∞ | |
| skipSourceRows | no | number of records/rows to skip from the beginning of every source file; useful when the first rows are a header rather than data | 0 | 2.7 |
| numSourceRecords | no | maximum number of records/rows to read from every source | ∞ | 2.7 |
| maxErrorCount | no | number of tolerated error records in the input file (applies only to the Controlled data policy) | 0 | |
| quotedStrings | no | fields can be enclosed in single (') or double (") quotes | false | |
| treatMultipleDelimitersAsOne | no | if true, consecutive delimiters are treated as a single delimiter | false | |
| incrementalFile | if incrementalKey is set | property file used for incremental reading | | |
| incrementalKey | if incrementalFile is set | name of the property in the property file that stores the last reading position | | |
| verbose | no | provides more detailed error reporting at the cost of somewhat lower performance (a few percent) | true | 2.8 |
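The three data policies differ in what happens to a bad record. The sketch below contrasts them by parsing one integer field per record. It is a hypothetical model, not the reader's implementation, and DataPolicySketch, Policy and ErrorRecord are illustrative names:

- Strict throws an exception and stops.
- Controlled adds an entry shaped like the logging-port metadata (record number, 1-based field number, raw record, message) and skips the record.
- Lenient substitutes a default value.

Two details are assumptions. Record numbers are taken to be 1-based, and the Lenient default is 0.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the Strict / Controlled / Lenient data policies; not CloverETL code.
public class DataPolicySketch {

    public enum Policy { STRICT, CONTROLLED, LENIENT }

    /** Mirrors the logging-port metadata: RecNumber, FieldNumber, RawRecord, ErrorMessage. */
    public record ErrorRecord(int recNumber, int fieldNumber, String rawRecord, String errorMessage) {}

    /** Parses one integer field per raw record; error entries are appended to errorPort. */
    public static List<Integer> read(List<String> rawRecords, Policy policy, List<ErrorRecord> errorPort) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < rawRecords.size(); i++) {
            String raw = rawRecords.get(i);
            try {
                out.add(Integer.parseInt(raw.trim()));
            } catch (NumberFormatException e) {
                switch (policy) {
                    // Strict: abort the whole processing.
                    case STRICT -> throw new IllegalStateException("Record " + (i + 1) + ": " + e.getMessage(), e);
                    // Controlled: log the record (assumed 1-based numbering) and continue.
                    case CONTROLLED -> errorPort.add(new ErrorRecord(i + 1, 1, raw, e.getMessage()));
                    // Lenient: use a default value (0 is an assumption) and continue.
                    case LENIENT -> out.add(0);
                }
            }
        }
        return out;
    }
}
```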
Example:
<Node id="InputFile" type="DATA_READER" fileURL="data.txt"/>
<Node id="InputFile" type="DATA_READER" fileURL="zip:http://www.store.com/data.zip#data.txt" charset="ISO-8859-15" dataPolicy="Controlled" skipLeadingBlanks="false" trim="false" skipFirstLine="true" skipRows="1" numRecords="100" maxErrorCount="0" quotedStrings="false" treatMultipleDelimitersAsOne="false" />
This component reads data from a database. It first executes the specified query and then extracts all returned rows.
The sqlQuery and url attributes are mutually exclusive. url takes precedence: if it is set, sqlQuery is not used.
When connecting to MS SQL Server, it is convenient to use the jTDS driver (http://jtds.sourceforge.net). It is an open-source, 100% pure Java JDBC driver for Microsoft SQL Server and Sybase, and it is faster than the Microsoft driver.
Input ports: none
Output ports:
Xml attributes:
| Attribute | Mandatory | Description | Default | ETL Version since |
|---|---|---|---|---|
| id | yes | component identification | | |
| type | yes | component type | DB_INPUT_TABLE | |
| dbConnection | yes | id of the Database Connection object used to access the database | | |
| sqlQuery | if no url | query to be sent to the database. Since version 2.4, the query can contain a mapping between Clover and database fields. For example, the query select $field1:=dbField1, $field2:=dbField2 from mytable is interpreted as select dbField1, dbField2 from mytable, and the output field field1 is filled with the value of dbField1 and field2 with the value of dbField2. The query can also be written without mapping; the output fields are then filled in order, starting with the first one. For incremental reading, the query must contain a where clause that selects the new records (see the incrementalKey and incrementalFile attributes), e.g. select $f1:=db1, $f2:=db2, … from myTable where dbX > #myKey1 and dbY <= #myKey2, where myKey1 and myKey2 must be defined in the incrementalKey attribute. Either sqlQuery or url must be defined. | | |
| url | if no sqlQuery | URL of the query; the query is loaded from the file referenced by the URL. The query syntax is the same as described above. | | |
| fetchSize | no | number of records fetched from the database at once; see JDBC's java.sql.Statement.setFetchSize(). The value MIN_INT is also accepted and is resolved to Integer.MIN_VALUE (useful for the MySQL JDBC driver). | 20 | |
| SQLCode | no | XML tag that allows a large SQL statement to be embedded directly in the graph | | |
| dataPolicy | no | specifies how to handle misformatted or incorrect data: 'Strict' aborts processing, 'Controlled' logs the entire record while processing continues, and 'Lenient' attempts to set incorrect data to default values while processing continues | 'Strict' | |
| incrementalFile | if incrementalKey is set | URL of the file where key values are stored. The user must set the values before the first reading (e.g. myKey1=0); after that, they are updated automatically (see the sqlQuery and incrementalKey attributes). Dates, times and timestamps must be written in the formats defined in Defaults.DEFAULT_DATE_FORMAT, Defaults.DEFAULT_TIME_FORMAT and Defaults.DEFAULT_DATETIME_FORMAT. | | |
| incrementalKey | if incrementalFile is set | defines the database fields on which incremental values are based and which record from the result set is stored (last, first, min or max). Key parts must be separated by a colon, semicolon or pipe, e.g. myKey1=first(dbX);myKey2=min(dbY) (see the sqlQuery attribute) | | |
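The field mapping in sqlQuery can be pictured as a rewrite step. Each $cloverField:=dbField item is reduced to dbField before the query is sent, and each pair is remembered so the output record can be filled later. The sketch below is a hypothetical, regex-based illustration of that rewrite that handles only simple column names. SqlMappingSketch and its method are illustrative names, not the component's parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of the "$field:=dbField" mapping in sqlQuery; not CloverETL's parser.
public class SqlMappingSketch {

    // $cloverField := dbColumn (simple identifiers only)
    private static final Pattern ITEM = Pattern.compile("\\$(\\w+)\\s*:=\\s*(\\w+)");

    /** The plain SQL to execute plus the clover-field -> db-column mapping, in query order. */
    public record Rewritten(String sql, Map<String, String> mapping) {}

    public static Rewritten rewrite(String query) {
        Map<String, String> mapping = new LinkedHashMap<>();
        Matcher m = ITEM.matcher(query);
        StringBuilder sql = new StringBuilder();
        while (m.find()) {
            mapping.put(m.group(1), m.group(2));
            // Keep only the database column in the SQL sent to the server.
            m.appendReplacement(sql, Matcher.quoteReplacement(m.group(2)));
        }
        m.appendTail(sql);
        return new Rewritten(sql.toString(), mapping);
    }
}
```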
Examples:
<Node id="INPUT" type="DB_INPUT_TABLE" dbConnection="NorthwindDB" sqlQuery="select * from employee_z"/>
<Node id="INPUT" type="DB_INPUT_TABLE" dbConnection="NorthwindDB" url="c:/temp/test.sql"/>
<Node id="INPUT" type="DB_INPUT_TABLE" dbConnection="NorthwindDB" dataPolicy="Strict" fetchSize="1000">
<attr name="SQLCode">
select * from employee_z
</attr>
</Node>
<Property id="GraphParameter0" name="param1" value="A%"/>
<Node id="INPUT" type="DB_INPUT_TABLE" dbConnection="NorthwindDB" dataPolicy="Strict" fetchSize="1000">
<attr name="SQLCode">
select * from employee_z where last_name like '${param1}'
</attr>
</Node>
<Node dbConnection="DBConnection0" id="INPUT" sqlQuery="select $last_name:=last_name,$full_name:=full_name from employee" type="DB_INPUT_TABLE"/>
Example for incremental reading:
<Node dbConnection="DBConnection0" id="INPUT" incrementalFile="dbInc.txt" incrementalKey="key1=last(id);key2=max(last_update)" sqlQuery="select * from employee where id > #key1 or last_update>#key2" type="DB_INPUT_TABLE"/>
Starting content of dbInc.txt:
key1=0
key2=1999-12-31
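Incremental reading combines two steps. First, the #key placeholders in the query are filled from the incrementalFile. Second, after the read, each key is updated from the result set according to its function (first, last, min or max). The following hypothetical sketch models both steps with java.util.Properties and is not the component's code. IncrementalSketch and its methods are illustrative names, and three points are assumptions:

- Placeholders become JDBC-style ? parameters. Whether the component binds parameters or substitutes text is not documented here.
- For min and max, values are compared as numbers when both parse as integers and as strings otherwise. This works for the ISO dates in the example.
- Writing the updated values back to dbInc.txt is left out.

```java
import java.util.List;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of incrementalKey/incrementalFile handling; not CloverETL code.
public class IncrementalSketch {

    private static final Pattern PLACEHOLDER = Pattern.compile("#(\\w+)");

    /** Replaces each #key with a "?" parameter and appends the values to bind, in order. */
    public static String prepare(String sql, Properties keys, List<String> params) {
        Matcher m = PLACEHOLDER.matcher(sql);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = keys.getProperty(m.group(1));
            if (value == null) throw new IllegalArgumentException("No value for key " + m.group(1));
            params.add(value);
            m.appendReplacement(out, "?");
        }
        m.appendTail(out);
        return out.toString();
    }

    /** New key value from one result-set column; null means nothing was read (keep the old value). */
    public static String update(String function, List<String> column) {
        if (column.isEmpty()) return null;
        return switch (function) {
            case "first" -> column.get(0);
            case "last" -> column.get(column.size() - 1);
            case "min" -> column.stream().min(IncrementalSketch::compare).get();
            case "max" -> column.stream().max(IncrementalSketch::compare).get();
            default -> throw new IllegalArgumentException("Unknown function " + function);
        };
    }

    // Numeric comparison when both values are integers, otherwise plain string comparison.
    private static int compare(String a, String b) {
        try {
            return Long.compare(Long.parseLong(a), Long.parseLong(b));
        } catch (NumberFormatException e) {
            return a.compareTo(b);
        }
    }
}
```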
Reads records from the specified dBase data file and broadcasts them to all connected output ports. This component requires fixed-length metadata (type="fixed"). In addition, the first metadata field must be a string field of length 1, which is used as the indicator of deleted records in the DBF file. Such metadata can be generated automatically by Clover's DBFAnalyzer utility. Its main class can be run as 'java -cp "clover.engine.jar" org.jetel.database.dbf.DBFAnalyzer'.
Note: DBFAnalyzer also extracts additional information from the DBF file (dataOffset and recordSize), but this information is not necessary.
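In the dBase format, every record starts with a one-byte deletion flag. A space marks an active record and an asterisk (*) marks a deleted one, which is why the first metadata field must be a one-character string. The hypothetical sketch below walks raw record bytes and drops deleted records. DbfDeletedFlagSketch is an illustrative name, not CloverETL code. It does not parse the DBF header; instead, the dataOffset and recordSize values (the ones DBFAnalyzer reports) are passed in directly.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: filter deleted records from raw dBase record data; not CloverETL code.
public class DbfDeletedFlagSketch {

    /**
     * data       - raw file bytes
     * dataOffset - offset of the first record (taken from the DBF header)
     * recordSize - length of one record in bytes, including the deletion flag
     * count      - number of records to examine
     */
    public static List<String> activeRecords(byte[] data, int dataOffset, int recordSize, int count) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            int start = dataOffset + i * recordSize;
            if (start + recordSize > data.length) break; // truncated data: stop
            if (data[start] == '*') continue;           // '*' marks a deleted record
            // Drop the flag byte; the remaining bytes hold the fixed-length fields.
            result.add(new String(data, start + 1, recordSize - 1, StandardCharsets.ISO_8859_1));
        }
        return result;
    }
}
```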
Input ports: one optional input port (for the port protocol, see fileURL).
Output ports:
Xml attributes:
| Attribute | Mandatory | Description | Default | ETL Version since |
|---|---|---|---|---|
| id | yes | component identification | | |
| type | yes | component type | DBF_DATA_READER | |
| fileURL | yes | path to the input files | | |
| dataPolicy | no | specifies how to handle misformatted or incorrect data: 'Strict' aborts processing, 'Controlled' logs the entire record while processing continues, and 'Lenient' attempts to set incorrect data to default values while processing continues | 'Strict' | |
| charset | no | character set used to decode field data. By default it is deduced from the DBF table header; if a charset is specified in the record-level metadata, that charset is used. | | |
| skipRows | no | number of records/rows to skip from the beginning of the source file | 0 | |
| numRecords | no | maximum number of records/rows to read from the source | ∞ | |
| skipSourceRows | no | number of records/rows to skip from the beginning of every source file | 0 | 2.7 |
| numSourceRecords | no | maximum number of records/rows to read from every source | ∞ | 2.7 |
| incrementalFile | if incrementalKey is set | property file used for incremental reading | | |
| incrementalKey | if incrementalFile is set | name of the property in the property file that stores the last reading position | | |
Example:
<Node id="InputFile" type="DBF_DATA_READER" fileURL="/tmp/customers.dbf"/>
<Node id="InputFile" type="DBF_DATA_READER" fileURL="/tmp/customers.dbf" dataPolicy="Strict" charset="UTF-8" />
Receives JMS messages and transforms them into data records using a user-specified transformation class (the so-called processor). The processor implements the JmsMsg2DataRecord interface or extends the JmsMsg2DataRecordBase class. It can be specified by class name, by inline Java code, or by a URL of its Java source.
The default processor implementation, org.jetel.component.jms.JmsMsg2DataRecordProperties, is sufficient in most cases.
The body of each incoming message is stored in the field specified by the bodyField attribute. Message properties are stored in fields with the same names, if such fields exist in the output record metadata. Both javax.jms.TextMessage and javax.jms.BytesMessage (since 2.8) can be processed.
Input ports: none
Output ports:
Xml attributes:
| Attribute | Mandatory | Description | Default | ETL Version since |
|---|---|---|---|---|
| id | yes | component identification | | |
| type | yes | component type | JMS_READER | |
| connection | yes | JMS connection ID | | |
| processorClass | no | name of the processor class. The default value is applied only if neither processorCode nor processorURL is specified. | org.jetel.component.jms.JmsMsg2DataRecordProperties | |
| processorCode | no | inline Java code defining the processor class; applied only if processorClass is not specified | | |
| processorURL | no | URL of a file containing the Java source of the processor class; applied only if neither processorClass nor processorCode is specified | | |
| charset | no | charset of the processor code, if it is specified by the processorURL attribute | taken from CloverETL engine defaults | |
| selector | no | JMS selector specifying the messages to be processed | | |
| maxMsgCount | no | maximum number of messages to process; 0 means no limit | 0 | |
| timeout | no | maximum time (in milliseconds) to wait for the next message; 0 means wait forever | 0 | |
| bodyField | no | name of the output metadata field to be filled with the body of the incoming JMS message. This attribute is used by the default processor implementation (JmsMsg2DataRecordProperties). If bodyField is specified, the field must exist in the metadata. If it is not specified, the processor tries to set a field named "bodyField" and silently skips it if no such field exists in the output metadata. | bodyField (since 2.8; older versions have no default) | |
| msgCharset | no | charset of the message content; used only for javax.jms.BytesMessage. This attribute is used by the default processor implementation (JmsMsg2DataRecordProperties). | taken from CloverETL engine defaults | 2.8 |
Conditions for stopping message reading:
| Attribute maxMsgCount | Attribute timeout | Description |
|---|---|---|
| 0 | 0 | The node keeps waiting for new messages, so the phase containing this node never finishes. |
| greater than 0 | 0 | The node reads new messages until their count reaches maxMsgCount, regardless of how long it takes. |
| 0 | greater than 0 | The node reads new messages for the specified number of milliseconds, regardless of how many messages it reads. |
| greater than 0 | greater than 0 | The reader stops when the number of read messages reaches maxMsgCount or the timeout expires. |
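The stopping rules above can be summarized as a receive loop. This is a hypothetical sketch, not the component's code, and JmsStopSketch is an illustrative name. A java.util.concurrent.BlockingQueue stands in for the JMS consumer. timeout is modeled as the maximum wait for the next message, following the attribute description, with 0 meaning wait forever. The table's wording for the third row could also be read as a total reading time, which this sketch does not model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical model of the maxMsgCount/timeout rules; a BlockingQueue stands in for a JMS consumer.
public class JmsStopSketch {

    /**
     * maxMsgCount - stop after this many messages; 0 means no limit
     * timeoutMs   - maximum wait for the next message; 0 means wait forever
     */
    public static List<String> receive(BlockingQueue<String> source, int maxMsgCount, long timeoutMs) {
        List<String> received = new ArrayList<>();
        try {
            while (maxMsgCount == 0 || received.size() < maxMsgCount) {
                String msg = (timeoutMs == 0)
                        ? source.take()                                  // block until a message arrives
                        : source.poll(timeoutMs, TimeUnit.MILLISECONDS); // null once the wait times out
                if (msg == null) break;
                received.add(msg);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // treat interruption as a request to stop
        }
        return received;
    }
}
```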
Example:
<Node id="JmsReader" type="JMS_READER" connection="dest"/>
<Node id="JmsReader" type="JMS_READER" connection="dest" timeout="4000" maxMsgCount="0"/>
This component reads information from an LDAP directory. It extracts the results of an LDAP search and transforms them into Jetel data records. The metadata on the output port/edge must precisely describe the structure of the objects read.
All search results must have the same objectClass.
Input ports: none
Output ports:
NOTE: only string and byte Clover data fields are supported. The string type is compatible with most common LDAP types; the byte type is necessary, for example, for reading the userPassword LDAP attribute.
Xml attributes:
| Attribute | Mandatory | Description | Default | ETL Version since |
|---|---|---|---|---|
| id | yes | component identification | | |
| type | yes | component type | LDAP_READER | |
| ldapUrl | yes | LDAP URL of the directory, in the form "ldap://host:port/" | | |
| base | yes | base DN used for the LDAP search | | |
| filter | yes | filter used for the LDAP search | | |
| scope | no | scope of the search request; must be one of object, onelevel or subtree | 'OBJECT' | |
| user | no | user DN used when connecting to the directory | | |
| password | no | password used when connecting to the directory | | |
| multiValueSeparator | no | LDAP attributes can hold multiple values; this string or character delimits them in the output field. The special value __none__ turns this off, and only the first value is read. | | |
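The effect of multiValueSeparator on a multi-valued LDAP attribute can be shown with a small helper. The class LdapMultiValueSketch and its method are hypothetical and only mimic the rule described above: values are joined with the separator, and the special value __none__ keeps only the first value. The behavior for an absent attribute (returning null) is an assumption.

```java
import java.util.List;

// Hypothetical sketch of multiValueSeparator handling; not CloverETL code.
public class LdapMultiValueSketch {

    public static final String NONE = "__none__";

    /** Turns all values of one LDAP attribute into a single string field value. */
    public static String toFieldValue(List<String> values, String separator) {
        if (values.isEmpty()) return null;                 // assumed: absent attribute -> null
        if (NONE.equals(separator)) return values.get(0);  // feature turned off: first value only
        return String.join(separator, values);
    }
}
```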
Example:
<Node id="INPUT1" type="LDAP_READER" ldapUrl="ldap://ldap.uninett.no:389/" base="ou=people,dc=uninett,dc=no" filter="uid=*" scope="SUBTREE"> </Node>
<Node id="INPUT1" type="LDAP_READER" ldapUrl="ldap://foobar.com:389/" base="ou=people,dc=foo,dc=bar" filter="uid=*" scope="subtree" user="uid=Manager,dc=foo,dc=bar" password="manager_pass"> </Node>
since 2.7.0
This is a universal reader framework for flat files with a heterogeneous structure. Such files can contain a mix of fixed-length and delimited data records along with other, non-record data.
Input ports:
Output ports:
The logic that parses the file into records is outside the scope of this reader. It is delegated to a user implementation of the MultiLevelSelector interface (the "selector" below), which is the key part of a working multi-level reader. There is no default mode of operation, since the underlying files can have virtually any structure. Selectors are plugged into the reader via the selectorClass, selectorCode or selectorURL attributes. A set of built-in implementations for common file formats, such as COBOL Copybook, is planned.
MultiLevelReader uses the selector to identify data of potentially different types (different metadata). It then parses each record and sends it to the one connected output port whose metadata matches. It works in a character-based loop. First, it lets the selector look ahead at the data at (or after) the current position and decide which type (metadata) the next record has. Then it parses the record with a standard DataParser, using the metadata proposed by the selector. Finally, it sends the record to the corresponding output port. The loop runs until the end of the file is reached or no further records can be identified.
Each MultiLevelSelector implementation must be written with all possible formats and caveats of its target files in mind. Without a properly working selector, the component is likely to fail. Selectors often work as state machines driven by input characters.
Xml attributes:
| Attribute | Mandatory | Description | Default | ETL Version since |
|---|---|---|---|---|
| id | yes | component identification | | |
| type | yes | component type | MULTI_LEVEL_READER | |
| fileURL | yes | path to the input files | | |
| charset | no | character encoding of the input file | ISO-8859-1 | |
| dataPolicy | no | specifies how to handle misformatted or incorrect data: 'Strict' aborts processing, and 'Lenient' attempts to skip incorrect data and continue | 'Strict' | |
| skipRows | no | number of records to skip from the beginning | | |
| numRecords | no | maximum number of records/rows to read from the source | | |
| skipSourceRows | no | number of records to skip from the beginning of each source | | |
| numSourceRecords | no | maximum number of records/rows to read from each source | | |
| selectorCode | no | inline Java code of a class implementing the MultiLevelSelector interface | | |
| selectorURL | no | URL of a Java class implementing the MultiLevelSelector interface | | |
| selectorClass | no | fully qualified name of a Java class implementing the MultiLevelSelector interface; the class must be on the classpath of the running JVM | PrefixMultiLevelSelector | |
| selectorProperties | no | properties (key-value pairs) for the particular selector, if applicable | | |
Note:
The default selector is PrefixMultiLevelSelector, the default value of the selectorClass attribute. To use a custom selector, specify one of the three attributes selectorCode, selectorURL or selectorClass. Only one of them may be specified at a time.
MultiLevelSelector - under the hood
Here is a brief overview of the MultiLevelSelector interface and how it should behave. It is a basic interface, yet very powerful.
Methods of MultiLevelSelector
| Method | Description |
|---|---|
| void init(DataRecordMetadata[], Properties) | Initializes the selector with the pool of metadata available on the output ports. The selector chooses record types from this pool, so each selector must store it. |
| int choose(CharBuffer) | The main method. It looks into the CharBuffer and reads until it can decide where the next record begins and what type it is. It returns an index into the metadata pool (see the init(DataRecordMetadata[], Properties) method above). |
| int nextRecordOffset() | Always called after a preceding call to choose(). It must report the number of characters to skip before the start of the identified record. |
| int lookAheadCharacters() | Reports how many characters the selector needs to identify the next record. The value is only informative; an implementation may return 0 or a negative number. |
| void reset() | Resets the internal state of the selector, if it has any. |
An implementation often works as a parser state machine: it reads one character after another and advances its state until it reaches a decision. The reset() method is called each time a new record is to be identified. choose() may be called several times without an intervening reset() in case of buffer underruns.
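The sketch below makes the reset()/choose()/nextRecordOffset() cycle concrete. It declares a simplified, hypothetical stand-in for the interface. The names SimpleSelector and FirstCharSelector and the typeChars property are all illustrative, and two parts differ from the real org.jetel API. An int metadata count replaces DataRecordMetadata[]. A return value of -1 from choose() is assumed to mean "need more data". The small state machine skips whitespace and '#' comment lines, then picks the record type from the first character. Because its state survives between calls until reset(), it also tolerates buffer underruns.

```java
import java.nio.CharBuffer;
import java.util.Properties;

// Hypothetical, simplified stand-in for MultiLevelSelector (not the real org.jetel interface):
// an int count replaces DataRecordMetadata[], and choose() returns -1 when it needs more data.
interface SimpleSelector {
    void init(int metadataCount, Properties properties);
    int choose(CharBuffer data);
    int nextRecordOffset();
    int lookAheadCharacters();
    void reset();
}

// State machine: skips whitespace and '#' comments, then picks the type by the first character.
public class FirstCharSelector implements SimpleSelector {
    private enum State { SCANNING, IN_COMMENT }

    private String typeChars = "";      // hypothetical property, e.g. "HC": 'H' -> 0, 'C' -> 1
    private State state = State.SCANNING;
    private int offset;                 // characters consumed before the record start

    @Override
    public void init(int metadataCount, Properties properties) {
        typeChars = properties.getProperty("typeChars", "");
        if (typeChars.length() > metadataCount) {
            throw new IllegalArgumentException("More record types than metadata");
        }
        reset();
    }

    @Override
    public int choose(CharBuffer data) {
        // Continue where a previous call stopped (buffer underrun), relative to data.position().
        while (data.position() + offset < data.limit()) {
            char c = data.get(data.position() + offset);
            if (state == State.IN_COMMENT) {
                if (c == '\n') state = State.SCANNING; // a comment ends at the line break
                offset++;
            } else if (c == '#') {
                state = State.IN_COMMENT;
                offset++;
            } else if (Character.isWhitespace(c)) {
                offset++;                              // blank space between records
            } else {
                int type = typeChars.indexOf(c);
                if (type < 0) throw new IllegalStateException("Unknown record type: " + c);
                return type;                           // the record starts at 'offset'
            }
        }
        return -1; // no decision yet: the caller must supply more data
    }

    @Override
    public int nextRecordOffset() { return offset; }

    @Override
    public int lookAheadCharacters() { return 0; } // informative only

    @Override
    public void reset() {
        state = State.SCANNING;
        offset = 0;
    }
}
```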
PrefixMultiLevelSelector
The default implementation of MultiLevelSelector identifies records by their prefixes (in the character-wise sense). Any number of prefixes can be specified, and each prefix defines the output port the record is sent to. The prefix-to-port-number table can be specified in the selectorProperties attribute.
Example of a simple mixed flat file:
1,a,b,c,10,20,30
2,a,apple,30,20
2,a,orange,34,56
2,b,carrot,129
3,1,2,3,4,5,6,7
In this example, the record type is determined by the first one or two fields. PrefixMultiLevelSelector can parse this file with the following selectorProperties table:
| Key | Value |
|---|---|
| 1 | 0 |
| 2,a | 1 |
| 2,b | 2 |
| 3 | 3 |
The keys on the left are prefix strings, and the values are output port numbers.
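A longest-prefix match applies the table above so that 2,a and 2,b are told apart even though both start with 2. The sketch below is hypothetical and is not the PrefixMultiLevelSelector source. PrefixLookupSketch is an illustrative name, it works on whole lines rather than a CharBuffer, and it is an assumption that the real selector prefers the longest matching prefix.

```java
import java.util.Properties;

// Hypothetical longest-prefix lookup modelled on the selectorProperties table; not CloverETL code.
public class PrefixLookupSketch {

    /** Returns the output port for the line, or -1 when no prefix matches. */
    public static int portFor(String line, Properties table) {
        String best = null;
        for (String prefix : table.stringPropertyNames()) {
            // Prefer the longest matching prefix, e.g. "2,a" over a shorter "2".
            if (line.startsWith(prefix) && (best == null || prefix.length() > best.length())) {
                best = prefix;
            }
        }
        return best == null ? -1 : Integer.parseInt(table.getProperty(best).trim());
    }
}
```

Applied to the five lines of the simple example, this lookup yields ports 0, 1, 1, 2 and 3.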
A more advanced example of a mixed flat file:
This file cannot be parsed with the default PrefixMultiLevelSelector. A custom selector is needed, but it is quite easy to implement.
# This is an example file for MultiLevelReader.
# This is a flat file that contains mix of delimited and fixed-length
# data records along with comments and blank spaces.
#
# An example implementation of MultiLevelSelector interface is responsible for all the logic here
# e.g. all comments and blank spaces are ignored by this implementation
# In this example, data types are determined by first character on a line
# next line is fixed-length data
H1953JOHN
# these lines are mixed types delimited data
1,a,b,c,10,20,30
2,a,b,30,20
/* another form of comment */
# next two lines are again fixed-length data
CMARY SMITH 1992F
CJANE SMITH 1990F # note that previous newline is technically an error in fixed-length data
CJACOB SMITH 1993MCPETER SMITH 1996M # but newlines, as these comments, are skipped by the selector
# yet more delimited data
2,x,z,34,56
2,z,y,129,345
3,1,2,3,4,5,6,7
/*
* This is a multiline
* comment
/* Even nested comments can be allowed */
But must be nested properly.
*/
# rest of data follow right after this indented comment
2,john,smith,3,1954
F2008SMITH
Output from the example above
The advanced example above uses a total of five output ports, with the following data on each of them:
Port 0
1,a,b,c,10,20,30
Port 1
2,a,b,30,20
2,x,z,34,56
2,z,y,129,345
2,john,smith,3,1954
Port 2
3,1,2,3,4,5,6,7
Port 3
H1953JOHN
F2008SMITH
Port 4
CMARY SMITH 1992F
CJANE SMITH 1990F
CJACOB SMITH 1993M
CPETER SMITH 1996M
since 2.8.1
Parses the specified input data file and sends the records to the output port. The embedded parser currently supports only the delimited data format (fixed-length data will be supported in a future release).
The goal of this component is very similar to that of the Universal Data Reader: reading CSV files. It was developed to maximize reading performance, which was improved on several levels. First, reading is parallelized across a set of reading threads. The input file is divided into chunks, and each reading thread parses only the records in its own part of the file. This approach exploits the very fast hard drives that are now commonly available. The number of reading threads is set by the levelOfParallelism attribute. Second, the component uses a simplistic data parser that is as simple as possible (limited validation, error handling and functionality) but very fast.
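The chunking idea can be sketched in a few lines. The byte range of the file is split into levelOfParallelism roughly equal parts, and every split point is then moved forward to just after the next newline so that no record is cut in half. This is a hypothetical illustration, not the ParallelReader source, and ChunkSplitSketch is an illustrative name. It assumes records end with '\n' and that fields contain no embedded newlines, which quoted strings could violate.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: divide a delimited file into record-aligned chunks, one per worker.
public class ChunkSplitSketch {

    /** Returns {start, end} byte ranges (end exclusive); each range starts at a record boundary. */
    public static List<int[]> split(byte[] data, int levelOfParallelism) {
        List<int[]> chunks = new ArrayList<>();
        int chunkSize = Math.max(1, data.length / levelOfParallelism);
        int start = 0;
        for (int w = 0; w < levelOfParallelism && start < data.length; w++) {
            boolean last = (w == levelOfParallelism - 1);
            int end = last ? data.length : Math.min(data.length, start + chunkSize);
            // Move the split point to just past the next '\n' so no record is cut in two.
            while (end < data.length && data[end - 1] != '\n') end++;
            chunks.add(new int[] {start, end});
            start = end;
        }
        return chunks;
    }
}
```

Each worker would then parse only the bytes of its own range. If a range absorbs the rest of the file, fewer chunks than workers may be returned.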
List of limitations:
Input ports:
Output ports:
Xml attributes:
| Attribute | Mandatory | Description | Default | ETL Version since |
|---|---|---|---|---|
| id | yes | component identification | | |
| type | yes | component type | PARALLEL_READER | |
| fileURL | yes | path to the input data file; the URL is limited to real files on a local hard drive, and no other protocols are supported | | |
| charset | no | character encoding of the input file | ISO-8859-1 | |
| dataPolicy | no | specifies how to handle misformatted or incorrect data: 'Strict' aborts processing, 'Controlled' logs the entire record while processing continues, and 'Lenient' attempts to set incorrect data to default values while processing continues | Strict | |
| levelOfParallelism | no | number of parallel workers | 2 | |
| quotedStrings | no | fields can be enclosed in single (') or double (") quotes | false | |
| segmentReading | no | when the graph runs in a Clover Server environment, the reader processes only its own part of the file: the server divides the data file into segments, and each cluster worker processes exactly one segment | false | |
Example:
<Node id="PARALLEL_READER0" type="PARALLEL_READER" fileURL="${DATAIN_DIR}/data.txt" levelOfParallelism="3" quotedStrings="true"/>
The XML Extract component parses an XML data file and sends its content to one or more outputs. It has exactly one input and 1..n outputs.
Description:
The input must be an .xml file or another text file with XML structure. Elements and their child elements are parsed as follows:
The outputs depend on the mapping definition. Only one level of nested elements can be sent to a single output port. If an element contains nested elements, a new output port must be created for this element and its children; the same applies to every deeper level of nesting.
For example, if you have a file with this structure:
<?xml version="1.0" encoding="ISO-8859-1"?> <BOOK> <ID>11</ID> <NAME>Western</NAME> <AUTHOR>John Wayne</AUTHOR> <CHAPTER> <CHNAME>In desert</CHNAME> <SECTION> <PARA>paragraph1</PARA> </SECTION> <SECTION> <PARA>paragraph2</PARA> </SECTION> </CHAPTER> <CHAPTER> <CHNAME>Back in the pub</CHNAME> <SECTION> <PARA>paragraph3</PARA> </SECTION> </CHAPTER> </BOOK>
For this file you need 3 output ports (and 3 data writers): the first for the element BOOK and its child elements without children (ID, NAME, AUTHOR), the second for the element CHAPTER (with CHNAME), and the last for the element SECTION (with the element PARA). The nesting depth of this document (root element BOOK) is three.
The mapping used in the XML Extract component is:
<Mapping element="BOOK" outPort="0"> <Mapping element="CHAPTER" outPort="1" parentKey="ID" generatedKey="ID"> <Mapping element="SECTION" outPort="2" parentKey="CHNAME" generatedKey="CHNAME"/> </Mapping> </Mapping>
Output 1: 11;Western;John Wayne
Output 2: 11;In desert ("11" is the parent element identifier)
11;Back in the pub
Output 3: In desert;paragraph1 ("In desert" is the parent element identifier)
In desert;paragraph2
Back in the pub;paragraph3 ("Back in the pub" is the parent element identifier)
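To make the parentKey/generatedKey semantics concrete, the following Python sketch emulates the BOOK mapping with the standard library's ElementTree and reproduces the three outputs listed above. It is not the component's code: the MAPPING dictionary and the extract function are hypothetical, and unlike the SAX-based component it loads the whole document into memory.

```python
import xml.etree.ElementTree as ET

# The BOOK document from the example (XML declaration omitted).
BOOK_XML = """<BOOK><ID>11</ID><NAME>Western</NAME><AUTHOR>John Wayne</AUTHOR>
<CHAPTER><CHNAME>In desert</CHNAME>
<SECTION><PARA>paragraph1</PARA></SECTION>
<SECTION><PARA>paragraph2</PARA></SECTION></CHAPTER>
<CHAPTER><CHNAME>Back in the pub</CHNAME>
<SECTION><PARA>paragraph3</PARA></SECTION></CHAPTER></BOOK>"""

# Hypothetical in-memory form of the <Mapping> hierarchy shown above.
MAPPING = {
    "element": "BOOK", "outPort": 0, "children": [{
        "element": "CHAPTER", "outPort": 1,
        "parentKey": "ID", "generatedKey": "ID", "children": [{
            "element": "SECTION", "outPort": 2,
            "parentKey": "CHNAME", "generatedKey": "CHNAME", "children": [],
        }],
    }],
}


def extract(elem, mapping, ports, parent=None):
    """Emit one ';'-joined record per mapped element, then recurse."""
    record = {}
    if parent is not None and "parentKey" in mapping:
        # Copy the parent's key field into this record's generatedKey field.
        record[mapping["generatedKey"]] = parent[mapping["parentKey"]]
    for child in elem:
        if len(child) == 0:  # only elements without nested elements become fields
            record[child.tag] = (child.text or "").strip()
    ports.setdefault(mapping["outPort"], []).append(";".join(record.values()))
    for sub in mapping["children"]:
        for child in elem.findall(sub["element"]):
            extract(child, sub, ports, record)
    return ports


ports = extract(ET.fromstring(BOOK_XML), MAPPING, {})
```

Each nested mapping sees the finished parent record, which is why the parent key value ("11", "In desert") can be copied into every child record.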
<!ELEMENT Mappings (Mapping*)>
<!ELEMENT Mapping (Mapping*)>
<!ATTLIST Mapping
 element NMTOKEN #REQUIRED //name of the bound XML element
 outPort NMTOKEN #IMPLIED //name of the output port for this mapped XML element
 parentKey NMTOKEN #IMPLIED //field name of the parent record, which is copied into the field of the current record
 //named in the generatedKey attribute
 generatedKey NMTOKEN #IMPLIED //see parentKey comment
 sequenceField NMTOKEN #IMPLIED //field name which will be filled by a value from a sequence
 //(can be used to generate a new key field for related records)
 sequenceId NMTOKEN #IMPLIED //id of the sequence used to fill the field defined in the sequenceField attribute
 //(if this attribute is omitted, a non-persistent PrimitiveSequence is used)
 xmlFields NMTOKEN #IMPLIED //comma-separated XML element names, which are mapped onto the record fields
 //defined in the cloverFields attribute
 cloverFields NMTOKEN #IMPLIED //see xmlFields comment
 skipRows NMTOKEN #IMPLIED //number of elements skipped for a mapping
 numRecords NMTOKEN #IMPLIED //number of elements processed for a mapping
>
Output ports:
Xml attributes:
| Attribute | Mandatory | Description | Default | ETL Version since |
|---|---|---|---|---|
id | yes | component identification | ||
type | yes | component type | XML_EXTRACT | |
mappingURL | !mapping | file containing a mapping between xml elements or attributes and clover fields | | 2.8 |
mapping | !mappingURL | mapping between xml elements or attributes and clover fields | ||
sourceUri | yes | location of source XML data to process | ||
useNestedNodes | no | whether nested unmapped XML elements are used as a data source; if false, they are ignored | true | |
xmlFeatures | no | defines how to handle the input XML file. See features | | 2.7 |
skipRows | no | specifies how many records/rows should be skipped from the source file; useful for files whose first rows are a header rather than real data. | 0 | |
numRecords | no | specifies how many records/rows should be read from the source. | ∞ | |
Example:
<myXML> <phrase> <text>hello</text> <localization> <chinese>how allo yee dew ying</chinese> <german>wie gehts</german> </localization> </phrase> <locations> <location> <name>Stormwind</name> <description>Beautiful European architecture with a scenic canal system.</description> </location> <location> <name>Ironforge</name> <description>Economic capital of the region with a high population density.</description> </location> </locations> <someUselessElement>...</someUselessElement> <someOtherUselessElement/> <phrase> <text>bye</text> <localization> <chinese>she yee lai ta</chinese> <german>aufwiedersehen</german> </localization> </phrase> </myXML>
Suppose we want to extract "phrase" as one data record, "localization" as another data record, and "location" as the final data record, and ignore the useless elements. First define the metadata for the records, then create the following mapping in the graph:
<node id="myId" type="com.lrn.etl.job.component.XMLExtract"> <attr name="mapping"><![CDATA[ <Mapping element="phrase" outPort="0" sequenceField="id"> <Mapping element="localization" outPort="1" parentKey="id" generatedKey="parent_id"/> </Mapping> <Mapping element="location" outPort="2"/> ]]> </attr> </node>
Port 0 will get the data records:
1) id=1, text=hello
2) id=2, text=bye
Port 1 will get:
1) parent_id=1, chinese=how allo yee dew ying, german=wie gehts
2) parent_id=2, chinese=she yee lai ta, german=aufwiedersehen
Port 2 will get:
1) name=Stormwind, description=Beautiful European architecture with a scenic canal system.
2) name=Ironforge, description=Economic capital of the region with a high population density.
Example 2 (mixed content): <x> <y>z</y> xValue </x>
There will be no field x with the value xValue; text mixed with child elements is not extracted.
Known issue: namespaces are not considered. <ns1:x>xValue</ns1:x> and <ns2:x>xValue2</ns2:x> are treated as the same element x.
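The effect of this mapping, including the sequence-generated id and the copied parent_id, can be sketched in Python. Assumptions: the extraction is hard-coded for this one mapping, a simple counter starting at 1 stands in for the non-persistent PrimitiveSequence (matching the listed output), and elements are matched by local name only, which mirrors the namespace issue noted above. All names in the sketch are hypothetical.

```python
import itertools
import xml.etree.ElementTree as ET

MY_XML = """<myXML>
<phrase><text>hello</text><localization><chinese>how allo yee dew ying</chinese>
<german>wie gehts</german></localization></phrase>
<locations>
<location><name>Stormwind</name><description>Beautiful European architecture with a scenic canal system.</description></location>
<location><name>Ironforge</name><description>Economic capital of the region with a high population density.</description></location>
</locations>
<someUselessElement>...</someUselessElement><someOtherUselessElement/>
<phrase><text>bye</text><localization><chinese>she yee lai ta</chinese>
<german>aufwiedersehen</german></localization></phrase>
</myXML>"""


def local_name(tag: str) -> str:
    # ElementTree writes namespaced tags as "{uri}name"; keeping only "name"
    # makes ns1:x and ns2:x indistinguishable, as in the known issue.
    return tag.rsplit("}", 1)[-1]


def leaf_fields(elem) -> dict:
    """Map each child element without nested elements to a field."""
    return {local_name(c.tag): (c.text or "").strip() for c in elem if len(c) == 0}


def extract_records(root) -> dict:
    sequence = itertools.count(1)  # stand-in for a non-persistent sequence
    ports = {0: [], 1: [], 2: []}
    for phrase in root.iter("phrase"):
        rec = {"id": next(sequence), **leaf_fields(phrase)}  # sequenceField="id"
        ports[0].append(rec)
        for loc in phrase.findall("localization"):
            # parentKey="id" is copied into generatedKey="parent_id".
            ports[1].append({"parent_id": rec["id"], **leaf_fields(loc)})
    for location in root.iter("location"):
        ports[2].append(leaf_fields(location))
    return ports


ports = extract_records(ET.fromstring(MY_XML))
```

The useless elements are never visited because no mapping names them.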
Technical information:
The component is based on SAX and uses the standard JRE SAX parser, which reads nodes from the XML input sequentially during processing.
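As a hedged illustration of why a SAX-based reader can keep memory use flat, here is a small handler built on Python's standard xml.sax module. It streams the document, buffers only the text of the element currently being read, and completes a record at each closing record element. The class RecordHandler and its record_tag parameter are hypothetical, it handles only flat records, and it is not the component's parser.

```python
import xml.sax


class RecordHandler(xml.sax.ContentHandler):
    """Collect leaf-element text for each `record_tag` element while streaming."""

    def __init__(self, record_tag: str):
        super().__init__()
        self.record_tag = record_tag
        self.records = []    # finished records (a real reader would emit them)
        self.current = None  # record being built, or None outside a record
        self.text = []       # character data of the element being read

    def startElement(self, name, attrs):
        if name == self.record_tag:
            self.current = {}
        self.text = []

    def characters(self, content):
        self.text.append(content)

    def endElement(self, name):
        if name == self.record_tag:
            self.records.append(self.current)
            self.current = None
        elif self.current is not None:
            self.current[name] = "".join(self.text).strip()
        self.text = []


handler = RecordHandler("location")
xml.sax.parseString(
    b"<locations><location><name>Stormwind</name></location>"
    b"<location><name>Ironforge</name></location></locations>",
    handler,
)
```

Because each record is finished as soon as its closing tag is seen, the state held at any moment is bounded by one record, not by the document size.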
Performance test for the component:
| xml size | memory allocation | working time |
|---|---|---|
| 100kB | 0.2MB | 1s |
| 1MB | 0.2MB | 1.6s |
| 10MB | 0.2MB | 3.7s |
HW: AMD Athlon™ 64 Processor 3200+; SW: SUSE 10.2, x86_64 architecture; test graph: Xml extract example
Parses an XML input data file based on XPath queries and sends the records to the specific connected output ports.
Description:
Each Context element in the context hierarchy of the mapping attribute iterates over all matched XML nodes (the results of its XPath query). The query of a nested Context element is evaluated on each result of the parent context. Mapping elements of the appropriate context translate XML nodes into clover data records: every xpath or nodeName defined in a Mapping element binds its result to a clover field. XML elements and clover fields with the same names are mapped onto each other automatically. The xpath attribute can map an arbitrary node value, whereas nodeName can map only an element from the query result. Mapping via nodeName is faster, so prefer nodeName over xpath where possible.
A record from a nested Context element can be connected via key fields to the parent record produced by the parent Context element (see the parentKey and generatedKey attribute notes). If the retrieved values cannot form a unique key, the reader can fill one or more fields with values coming from a sequence (see the sequenceField and sequenceId attributes).
If the XML document being read defines XML namespaces, you have to specify the namespacePaths attribute in the mapping. See the description of namespacePaths in the DTD of the mapping below. Child elements of the mapping inherit the namespacePaths definition from their parent element.
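The namespacePaths value is a ';'-separated list of prefix="uri" bindings, with an unprefixed entry for the default namespace. The Python sketch below turns such a value into the prefix map accepted by ElementTree's find/findall; it shows that a query prefix only has to resolve to the same URI, not match the prefix used in the document. The function parse_namespace_paths is hypothetical, and storing the default namespace under the key "" is a choice of this sketch, not documented component behaviour.

```python
import re

# One binding: optional prefix and '=', then a URI in single or double quotes.
_BINDING = re.compile(r"""^\s*(?:(?P<prefix>[\w.-]+)\s*=\s*)?(?P<q>["'])(?P<uri>.*?)(?P=q)\s*$""")


def parse_namespace_paths(value: str) -> dict[str, str]:
    """Turn a namespacePaths value into a {prefix: uri} map.

    The default (unprefixed) namespace is stored under the key "", which
    ElementTree (Python 3.8+) treats as the default namespace in queries.
    Splitting on ';' assumes that no URI contains a semicolon.
    """
    namespaces = {}
    for part in value.split(";"):
        if not part.strip():
            continue
        m = _BINDING.match(part)
        if m is None:
            raise ValueError(f"malformed namespace binding: {part!r}")
        namespaces[m.group("prefix") or ""] = m.group("uri")
    return namespaces
```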
The mapping attribute contains the mapping hierarchy in XML form. DTD of the mapping:
<!ELEMENT Context (Context* | Mapping*)>
<!ATTLIST Context
 xpath NMTOKEN #REQUIRED //xpath query to the XML node
 outPort NMTOKEN #IMPLIED //name of the output port for this mapped XML element
 parentKey NMTOKEN #IMPLIED //field name of the parent record, which is copied into the field of
 //the current record named in the generatedKey attribute
 generatedKey NMTOKEN #IMPLIED //see parentKey comment
 sequenceField NMTOKEN #IMPLIED //field name which will be filled by a value from a sequence
 //(can be used to generate a new key field for related records)
 sequenceId NMTOKEN #IMPLIED //id of the sequence used to fill the field
 //defined in the sequenceField attribute (if this attribute is omitted,
 //a non-persistent PrimitiveSequence is used)
 namespacePaths NMTOKEN #IMPLIED //list of namespaces delimited by ';' used for an xpath attribute
 //example: namespacePaths='n1="http://www.w3.org/TR/html4/";n2="http://ops.com/"'
 //example for default namespace: namespacePaths='"http://www.w3.org/TR/html4/";n2="http://ops.com/"'
>
<!ELEMENT Mapping EMPTY>
<!ATTLIST Mapping
 cloverField NMTOKEN #REQUIRED //name of the metadata field
 xpath NMTOKEN #REQUIRED if no nodeName //xpath query to the XML value
 nodeName NMTOKEN #REQUIRED if no xpath //XML node whose text is taken directly; quicker than xpath
 trim NMTOKEN #IMPLIED //trims leading and trailing whitespace (true by default)
 namespacePaths NMTOKEN #IMPLIED //list of namespaces delimited by ';' used for an xpath attribute
 //example: namespacePaths='n1="http://www.w3.org/TR/html4/";n2="http://ops.com/"'
 //example for default namespace: namespacePaths='"http://www.w3.org/TR/html4/"'
>
Input ports:
Output ports:
Xml attributes:
| Attribute | Mandatory | Description | Default | ETL Version since |
|---|---|---|---|---|
id | yes | component identification | ||
type | yes | component type | XML_XPATH_READER | |
fileURL | yes | location of source XML data to process | ||
mappingURL | !mapping | file containing a mapping between xml elements or attributes and clover fields | | 2.8 |
mapping | !mappingURL | mapping between xml elements or attributes and clover fields | ||
dataPolicy | no | specifies how to handle misformatted or incorrect data. 'Strict' aborts processing, 'Controlled' logs the entire record while processing continues, and 'Lenient' attempts to set incorrect data to default values while processing continues. | Strict | |
xmlFeatures | no | defines how to handle the input XML file. See features | | 2.7 |
skipRows | no | specifies how many records/rows should be skipped from the source file. | 0 | |
numRecords | no | maximum number of parsed records | ∞ | |
Example:
<myXML xmlns:n2="http://ops.com/" xmlns:n1="http://www.w3.org/TR/html4/"> <phrase> <text>hello</text> <localization aid="100"> <chinese>how allo yee dew ying</chinese> <german>wie gehts</german> </localization> </phrase> <locations> <n1:location> <name>Stormwind</name> <description>Beautiful European architecture with a scenic canal system.</description> </n1:location> <n2:location> <name>Ironforge</name> <description>Economic capital of the region with a high population density.</description> </n2:location> </locations> <someUselessElement>...</someUselessElement> <someOtherUselessElement/> <phrase> <text>bye</text> <localization aid="101"> <chinese>she yee lai ta</chinese> <german>aufwiedersehen</german> </localization> </phrase> </myXML>
Suppose we want to extract "phrase" as one data record, "localization" as another data record, and "location" as the final data record, and ignore the useless elements. First define the metadata for the records, then create the following mapping in the graph:
<node id="myId" type="XML_XPATH_READER"> <attr name="mapping"><![CDATA[ <Context xpath="/myXML" > <Context xpath="phrase" outPort="0" sequenceField="id"> <Context xpath="localization" outPort="1" parentKey="id" generatedKey="parent_id"> <Mapping xpath="./@aid" cloverField="aid"/> </Context> </Context> <Context xpath="locations/n2:location" outPort="2" namespacePaths="n1='http://www.w3.org/TR/html4/';n2='http://ops.com/'"> </Context> </Context> ]]> </attr> </node>
Alternatively, the same mapping with explicit Mapping elements; except for the aid attribute mapping, they can be omitted, because XML elements and clover fields with the same names are mapped automatically:
<node id="myId" type="XML_XPATH_READER"> <attr name="mapping"><![CDATA[ <Context xpath="/myXML" > <Context xpath="phrase" outPort="0" sequenceField="id"> <Mapping nodeName="text" cloverField="text"/> <Context xpath="localization" outPort="1" parentKey="id" generatedKey="parent_id"> <Mapping nodeName="chinese" cloverField="chinese"/> <Mapping nodeName="german" cloverField="german"/> <Mapping xpath="./@aid" cloverField="aid"/> </Context> </Context> <Context xpath="locations/n2:location" outPort="2" namespacePaths="n1='http://www.w3.org/TR/html4/';n2='http://ops.com/'"> <Mapping xpath="name/text()" cloverField="name"/> <Mapping xpath="description" cloverField="description"/> </Context> </Context> ]]> </attr> </node>
Port 0 will get the data records:
1) id=1, text=hello
2) id=2, text=bye
Port 1 will get:
1) parent_id=1, chinese=how allo yee dew ying, german=wie gehts, aid=100
2) parent_id=2, chinese=she yee lai ta, german=aufwiedersehen, aid=101
Port 2 will get (only n2:location matches the context query, so Stormwind is not read):
1) name=Ironforge, description=Economic capital of the region with a high population density.
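To check the expected port contents, this Python sketch evaluates the same context queries with ElementTree's limited XPath support. Assumptions: ElementTree cannot select an attribute with ./@aid, so the attribute is read with Element.get; a simple counter starting at 1 stands in for the sequence; the function read_ports is hypothetical and hard-coded for this one mapping.

```python
import itertools
import xml.etree.ElementTree as ET

DOC = """<myXML xmlns:n2="http://ops.com/" xmlns:n1="http://www.w3.org/TR/html4/">
<phrase><text>hello</text><localization aid="100"><chinese>how allo yee dew ying</chinese><german>wie gehts</german></localization></phrase>
<locations>
<n1:location><name>Stormwind</name><description>Beautiful European architecture with a scenic canal system.</description></n1:location>
<n2:location><name>Ironforge</name><description>Economic capital of the region with a high population density.</description></n2:location>
</locations>
<phrase><text>bye</text><localization aid="101"><chinese>she yee lai ta</chinese><german>aufwiedersehen</german></localization></phrase>
</myXML>"""

# Same bindings as the namespacePaths attribute of the location context.
NS = {"n1": "http://www.w3.org/TR/html4/", "n2": "http://ops.com/"}


def read_ports(xml_text: str) -> dict[int, list[dict]]:
    root = ET.fromstring(xml_text)         # context xpath="/myXML"
    seq = itertools.count(1)               # sequenceField="id"
    ports = {0: [], 1: [], 2: []}
    for phrase in root.findall("phrase"):  # context xpath="phrase"
        rec = {"id": next(seq), "text": phrase.findtext("text")}
        ports[0].append(rec)
        for loc in phrase.findall("localization"):
            ports[1].append({
                "parent_id": rec["id"],              # parentKey -> generatedKey
                "chinese": loc.findtext("chinese"),  # nodeName="chinese"
                "german": loc.findtext("german"),    # nodeName="german"
                "aid": loc.get("aid"),               # xpath="./@aid"
            })
    for loc in root.findall("locations/n2:location", NS):
        ports[2].append({"name": loc.findtext("name"),
                         "description": loc.findtext("description")})
    return ports


ports = read_ports(DOC)
```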
| xml size | memory allocation | working time |
|---|---|---|
| 100kB | 0.45MB | 1.5s |
| 1MB | 2.8MB | 2.9s |
| 10MB | 24.6MB | 6.5s |
HW: AMD Athlon™ 64 Processor 3200+; SW: SUSE 10.2, x86_64 architecture; test graph: Xml xpath reader example
The test used the same data and produced the same output as the XMLExtract performance test.
Parses data from an XLS file and sends the records to the output ports. JExcel can handle XLS files of up to about 8.1 MB (about 4.9 MB of data as a flat file); for more data, allocate more memory to the JVM.
Input ports: one optional input port (for the port protocol, see fileURL).
Output ports:
Xml attributes:
| Attribute | Mandatory | Description | Default | ETL Version since |
|---|---|---|---|---|
id | yes | component identification | ||
type | yes | component type | XLS_READER | |
parser | no | The type of XLS(X) parser. Possible values: 'auto' for automatic selection of a parser based on the file extension, 'XLS' for the classic XLS parser, 'XLSX' for the XLSX parser. | auto | |
fileURL | yes | path to the input file | ||
dataPolicy | no | specifies how to handle misformatted or incorrect data. 'Strict' aborts processing, 'Controlled' logs the entire record while processing continues, and 'Lenient' attempts to set incorrect data to default values while processing continues. | 'Strict' | |
maxErrorCount | no | count of tolerated error records in input file (applicable only for controlled data policy) | 0 | |
sheetName | no | name of the sheet to read data from. The wildcards '?' and '*' can be used. | ||
sheetNumber | no | number of the sheet to read data from (starting from 0). Can be set as a mask: {number; minNumber-maxNumber; *-maxNumber; minNumber-*; or a combination of these separated by commas, e.g. 1,3,5-7,9-*}. This attribute has higher priority than sheetName. One of these attributes has to be set. | ||
metadataRow | no | number of the row that contains the column names | 0 | |
fieldMap | no | Pairs of clover fields and xls columns (cloverField=xlsColumn) separated by :;| {colon, semicolon, pipe}. Can be used to map clover fields to xls fields or to define the order in which columns are read from the xls sheet. Xls columns can be given as names from the row specified by the metadataRow attribute, or as column codes preceded by $. Xls fields may be omitted; columns are then read in the order they appear in the xls sheet and assigned to the corresponding metadata fields. Since version 2.5, the standard mapping syntax should be used: clover fields are preceded by $, xls cell codes by #, mappings are separated by :;| {colon, semicolon, pipe}, and the assignment sign is :=, e.g. $Freight:=FREIGHT or $Freight:=#C | when metadataRow>0, the default mapping is by column name (if metadataRow>0 and fieldMap is not set, clover fields whose names differ from the xls column names will be empty or filled with 0); when metadataRow=0, the default mapping is by column index (if metadataRow=0, fieldMap is not set, and clover fields have a different data type than the xls columns at the same index, the graph will fail). | |
charset | no | character encoding of the input file. Do not set it if XLSReader uses the POI library (which recognizes the encoding automatically); it applies only when XLSReader uses JExcelAPI. | ISO-8859-1 | |
skipRows | no | specifies how many records/rows should be skipped from the source file; useful for files whose first rows are a header rather than real data. It also depends on the metadataRow number. | 0 | |
numRecords | no | specifies how many records/rows should be read from the source. It also depends on the metadataRow number. | ∞ | |
skipSourceRows | no | specifies how many records/rows should be skipped from every source file; useful for files whose first rows are a header rather than real data. | 0 | 2.8 |
numSourceRecords | no | specifies how many records/rows should be read from every source. | ∞ | 2.8 |
startRow | no | index of first parsed record | 0 | |
finalRow | no | index of final parsed record | ∞ | |
incrementalFile | incrementalKey | property file used for incremental reading | ||
incrementalKey | incrementalFile | name of the property in the property file that stores the last reading position | | |
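The sheetNumber mask and sheetName wildcards can be illustrated with a short Python sketch that decides which sheets would be read. Assumptions: mask parts are combined with commas as in the attribute's example 1,3,5-7,9-*, ranges are inclusive, a lone * selects every sheet (as used in the examples), and sheetName matching behaves like shell-style wildcards via fnmatch. Both function names are hypothetical.

```python
import fnmatch


def sheet_number_matches(mask: str, index: int) -> bool:
    """Return True if the zero-based sheet `index` is selected by `mask`."""
    for part in mask.split(","):
        part = part.strip()
        if part == "*":
            return True  # every sheet
        if "-" in part:
            low, high = part.split("-", 1)
            lo = 0 if low == "*" else int(low)        # *-max: open lower end
            hi = index if high == "*" else int(high)  # min-*: open upper end
            if lo <= index <= hi:
                return True
        elif part and int(part) == index:
            return True
    return False


def sheet_name_matches(pattern: str, name: str) -> bool:
    """'?' matches one character and '*' any run of characters.

    Note: fnmatch also treats [...] as a character set, which the reader
    may not do; sheet names with brackets are outside this sketch.
    """
    return fnmatch.fnmatchcase(name, pattern)
```

For example, [i for i in range(12) if sheet_number_matches("1,3,5-7,9-*", i)] selects sheets 1, 3, 5, 6, 7, 9, 10 and 11.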
Both startRow and finalRow are deprecated and should not be used.
Example:
<Node id="XLS_READER1" type="XLS_READER" fileURL="ORDERS.xls"/>
<Node id="XLS_READER1" type="XLS_READER" fieldMap="ORDER=ORDERID,N,20,5;CUSTOMERID=CUSTOMERID,C,5;EMPLOYEEID=EMPLOYEEID,N,20,5;ORDERDATE=ORDERDATE,D;REQUIREDDA=REQUIREDDA,D;SHIPCOUNTR=SHIPCOUNTR,C,15" fileURL="ORDERS.xls" metadataRow="1" startRow="2"/>
<Node id="XLS_READER1" type="XLS_READER" fieldMap="ORDER=$a;CUSTOMERID=$b;EMPLOYEEID=$c;ORDERDATE=$d;REQUIREDDA=$e;SHIPPEDDAT=$f;SHIPVIA=$g;FREIGHT=$h;SHIPNAME=$i;SHIPADDRES=$j;SHIPCITY=$k;SHIPREGION=$l;SHIPPOSTAL=$m;SHIPCOUNTR=$n" fileURL="ORDERS.xls" metadataRow="1"/>
<Node id="XLS_READER1" type="XLS_READER" fieldMap="ORDER;CUSTOMERID;EMPLOYEEID;ORDERDATE;SHIPCOUNTR" fileURL="*.xls" sheetNumber="*"/>
<Node id="XLS_READER0" type="XLS_READER" dataPolicy="strict" fileURL="example.xls" metadataRow="1" startRow="2" sheetName="Sheet?"/>
<Node id="XLS_READER1" type="XLS_READER" fileURL="${DATAIN_DIR}/other/O*.xls" metadataRow="1" sheetNumber="*" fieldMap="$OrderDate:=ORDERDATE,D;$EmployeeID:=EMPLOYEEID,N,20,5;$Freight:=FREIGHT,N,20,5;$ShipCountry:=SHIPCOUNTR,C,15;"/>
<Node id="XLS_READER1" type="XLS_READER" fileURL="${DATAIN_DIR}/other/O*.xls" metadataRow="1" sheetNumber="*" fieldMap="$OrderDate:=#D;$ShipAddress:=#J;$ShipPostalCode:=#M;$ShipName:=#I;$CustomerID:=#B;$ShipCity:=#K;"/>
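The post-2.5 fieldMap syntax ($cloverField:=xlsColumn or $cloverField:=#cellCode, with pairs separated by ':', ';' or '|') can be tokenised as in the Python sketch below. It is a hypothetical parser written from the attribute description, not the reader's own; it keeps the column part verbatim (so a header such as FREIGHT,N,20,5 stays one column name) and takes care that the ':' of the ':=' sign is not treated as a separator.

```python
import re

_PAIR = re.compile(r"^\$(?P<field>\w+):=(?P<column>.+)$")


def parse_field_map(value: str) -> list[tuple[str, str, bool]]:
    """Return (cloverField, xlsColumn, isCellCode) triples in mapping order."""
    result = []
    # Pairs are separated by ';', '|' or ':', but a ':' that starts ':='
    # belongs to the assignment sign, so it must not split.
    for part in re.split(r"[;|]|:(?!=)", value):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing separator, as in the examples
        m = _PAIR.match(part)
        if m is None:
            raise ValueError(f"not in $field:=column form: {part!r}")
        column = m.group("column")
        if column.startswith("#"):
            result.append((m.group("field"), column[1:], True))  # cell code
        else:
            result.append((m.group("field"), column, False))     # column name
    return result
```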
dataPolicy specifies what should happen if a BadDataFormatException is thrown. This can happen, for example, if:
There are three different data policies defined:
| Value | Description |
|---|---|
| Strict | any BadDataFormatException aborts processing of graph. This is default value for specific readers. |
| Controlled | every BadDataFormatException is logged together with the entire record, and processing continues with the next record. |
| Lenient | every BadDataFormatException is skipped, and processing continues with the next record. |
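The difference between the three policies can be sketched in Python for a single integer field. Assumptions: under Controlled the offending record is logged and not passed on; under Lenient the bad value is replaced by a default (None here), following the dataPolicy descriptions in the attribute tables; policy names are compared case-insensitively. BadDataFormatError and read_with_policy are hypothetical stand-ins for BadDataFormatException and the reader's internals.

```python
import logging
from typing import List, Optional


class BadDataFormatError(ValueError):
    """Hypothetical stand-in for BadDataFormatException."""


def parse_int(raw: str) -> int:
    try:
        return int(raw)
    except ValueError as exc:
        raise BadDataFormatError(f"cannot parse {raw!r} as an integer") from exc


def read_with_policy(rows: List[str], policy: str = "Strict") -> List[Optional[int]]:
    """Parse one integer field per row, reacting to bad data according to policy."""
    policy = policy.lower()  # assumption: names are case-insensitive
    if policy not in ("strict", "controlled", "lenient"):
        raise ValueError(f"unknown data policy: {policy!r}")
    out: List[Optional[int]] = []
    for row in rows:
        try:
            out.append(parse_int(row))
        except BadDataFormatError as exc:
            if policy == "strict":
                raise  # abort processing
            if policy == "controlled":
                logging.warning("bad record %r: %s", row, exc)
                continue  # log the record and go on with the next one
            out.append(None)  # lenient: fall back to a default value
    return out
```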
| Value | Description |
|---|---|
| /path/filename.txt | path to the data local input file. |
| /path/filename1.txt;/path/filename2.txt | path to two data local input files. |
| /path/* | path to the data local input files. Component reads all files in directory. |
| /path/file00?.txt | path to the data local input files. Component reads all files matched by the wildcard. |
| /path/file.txt;/path/file2.txt | path to the data local input files. Component reads all files in the ';'-delimited list. |
| zip:/path/filename.zip | path to the data zip input file. Component reads first file in zip file. |
| zip:/path/filename.zip#name.txt | path to the data zip input file. Component reads one file marked after '#'. |
| gzip:/path/filename.gz | path to the data gzip input file. |
| ftp://user:password@server/path/name.txt | ftp address to the data input file. |
| ftp://user:password@server/path/name*.txt | ftp address to the data input file with a wild card. |
| sftp://user:password@server/path/name.txt | sftp address to the data input file. |
| sftp://user:password@server/path/name*.txt | sftp address to the data input file with a wild card. |
| http://server/path/name.txt | http address to the data input file. |
| https://server/path/name.txt | https address to the data input file. |
| zip:(http://server/path/name.zip)#filename.txt | path to the data zip input file via http. |
| zip:(ftp://user:password@server/path/name.zip)#filename.txt | path to the data zip input file via ftp. |
| zip:(zip:(http://server/path/name.zip)#filename.zip)#name.txt | path to the data zip inner input file via http. |
| zip:(/path/filename?.zip)#name.* | path to the data zip inner input file with wild cards. since 2.7 |
| zip:/path/filename?.zip#name.* | path to the data zip inner input file with wild cards. since 2.7 |
| gzip:(http://server/path/name.gz) | path to the data gzip input file via http. |
| gzip:(ftp://user:password@server/path/name.gz) | path to the data gzip input file via ftp. |
| tar:(path/name.tar)#path/filename.txt | path to the data tar input file. |
| tar:(gzip:path/name.tar.gz)#path/filename.txt | path to the data tar input file that is gzipped. |
| tar:(ftp://user:password@server/path/name.tar)#filename.txt | path to the data tar input file via ftp. |
| tar:((gzip:/path/name?.gz)#filename?.tar)#name.??? | path to the data tar/gzip inner input file with wild cards. since 2.7 |
| port:$0.fieldName:source | each data record field from the input port represents a URL to be loaded and parsed. *1) |
| port:$0.fieldName:discrete | each data record field from the input port represents one particular data source. *1) |
| port:$0.fieldName:stream | all data fields from the input port are concatenated and represent one particular data source (since version 2.10, fields are concatenated until a field containing a null value is reached). *1) |
| dict:keyName:discrete | reads data from the dictionary. *2) |
| dict:keyName:source | reads data from the dictionary like the discrete type, but expects the dictionary value to be an input URL/file; the data from this input are passed to the reader. *2) |
| - | stdin (console) is the data input file. |
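Archive URLs nest: an inner URL is wrapped in zip:( ... ), gzip:( ... ) or tar:( ... ), optionally followed by #entry. The Python sketch below peels those layers off one at a time, outermost first, so the chain can be read from the innermost URL outwards. It is a hypothetical helper written from the examples in the table, not the engine's URL parser; it covers the parenthesised form and the simple zip:/path#entry form, but not port:, dict:, proxy syntax, or the doubly parenthesised tar example.

```python
import re
from typing import List, Optional, Tuple

_KIND = re.compile(r"^(zip|gzip|tar):")


def unwrap(url: str) -> Tuple[str, List[Tuple[str, Optional[str]]]]:
    """Split an archive URL into its innermost URL and its archive layers.

    Layers are listed outermost first as (archiveType, entryName) pairs;
    entryName is None when no '#entry' part is given.
    """
    layers: List[Tuple[str, Optional[str]]] = []
    while True:
        m = _KIND.match(url)
        if m is None:
            return url, layers  # innermost, non-archive URL
        kind, rest = m.group(1), url[m.end():]
        if rest.startswith("("):
            # Find the parenthesis that closes the wrapped inner URL.
            depth = 0
            for i, ch in enumerate(rest):
                if ch == "(":
                    depth += 1
                elif ch == ")":
                    depth -= 1
                    if depth == 0:
                        break
            else:
                raise ValueError(f"unbalanced parentheses in {url!r}")
            inner, tail = rest[1:i], rest[i + 1:]
            if tail and not tail.startswith("#"):
                raise ValueError(f"unexpected text after ')': {tail!r}")
            layers.append((kind, tail[1:] if tail else None))
        else:
            # Simple form such as zip:/path/file.zip#entry.
            inner, sep, entry = rest.partition("#")
            layers.append((kind, entry if sep else None))
        url = inner
```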
Proxy specification for a URL in the fileURL attribute. A proxy can be specified for three protocols (http, ftp and sftp); direct: explicitly disables the proxy:
| Value | Description |
|---|---|
| http:(direct:)//seznam.cz/ | no proxy used |
| http:(proxy://user:password@212.93.193.82:443)//seznam.cz/ | proxy for http protocol |
| ftp:(proxy://user:password@proxyserver:1234)//seznam.cz/ | proxy for ftp protocol |
| sftp:(proxy://66.11.122.193:443)//user:password@server/path/file.dat | proxy for sftp protocol |
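In these URLs the proxy is written in parentheses directly after the protocol name, followed by the rest of the target URL. The hypothetical Python helper below separates the two parts, based only on the examples in the table above (it is not the engine's parser); the proxy string can then be split into host, port and credentials with urllib.parse.urlsplit.

```python
import re
from typing import NamedTuple, Optional

# scheme:(proxy)//rest, as in http:(proxy://host:port)//server/path
_PROXY_URL = re.compile(r"^(?P<scheme>[a-z]+):\((?P<proxy>[^()]*)\)(?P<rest>//.*)$")


class ProxiedUrl(NamedTuple):
    url: str              # target URL with the proxy part removed
    proxy: Optional[str]  # proxy URL, or None when no proxy is used


def split_proxy(url: str) -> ProxiedUrl:
    """Separate the proxy part from a fileURL-style URL (hypothetical helper)."""
    m = _PROXY_URL.match(url)
    if m is None:
        return ProxiedUrl(url, None)  # plain URL without a proxy specification
    proxy = m.group("proxy")
    target = f"{m.group('scheme')}:{m.group('rest')}"
    # "direct:" explicitly requests a direct connection.
    return ProxiedUrl(target, None if proxy == "direct:" else proxy)
```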
| Component | bytes/records per file | skip/num records | charset | zip | gzip | tar *1) | ftp | sftp | http | https | stdin | autofilling |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CloverDataReader | - | yes | - | yes | yes | yes | yes | yes | yes | yes | yes | yes |
| DataGenerator | - | - | - | - | - | - | - | - | - | - | - | yes |
| DataReader | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes |
| DBFDataReader | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes |
| DBInputTable | - | - | - | - | - | - | - | - | - | - | - | yes |
| DelimitedDataReader | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes |
| FixLenDataReader | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes |
| JmsReader | - | - | - | - | - | - | - | - | - | - | - | yes |
| LdapReader | - | - | - | - | - | - | - | - | - | - | - | yes |
| LookupTableReaderWriter | - | - | - | - | - | - | - | - | - | - | - | no |
| MultiLevelReader | - | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes |
| XLSReader | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes |
| XMLExtract | yes | yes | - | yes | yes | yes | yes | yes | yes | yes | yes | yes |
| XMLXPathReader | yes | yes | - | yes | yes | yes | yes | yes | yes | yes | yes | yes |