To simplify things, CloverETL uses JDBC to work (talk to) with databases. If your database of heart has JDBC driver, CloverETL can be used to unload data stored within database table or can populate database table with internal data.
Before any attempt to connect to database can be made, the way of connecting to the database has to be described. For this purpose, DBConnection must be specified first. Withing graph definition, it can be done following way:
<DBConnection id="InterbaseDB" dbConfig="Interbase.cfg"/>
| Parameter name | Description of parameter | Example of parameter's value |
|---|---|---|
dbDriver | Specifies name of class containing JDBC driver for your database. This class must be visible to Java (i.e. be part of CLASSPATH) | org.postgresql.Driver |
dbURL | URL for connecting to database - the name of JDBC driver to use, IP address where the server listens, name of database instance, port, etc. | jdbc:postgresql://192.168.1.100/mydb |
user | Username under which to connect to database | admin |
password | Password to be used | free |
driverLibrary | Optional parameter. Where to look for JDBC driver class. | c:\Oracle\product\10.1.0\Client_1\jdbc\lib\ojdbc14.jar |
| ..other specific parameter.. | Optional parameters specific for your JDBC driver | Oracle example: defaultRowPrefetch=10 |
Sample listing of Postgres.cfg file with definition of connection to PostgreSQL database:
dbDriver=org.postgresql.Driver dbURL=jdbc:postgresql://192.168.1.100/mydb user=david password=unknown
All parameters can be also directly specified when defining connection:
<DBConnection id="InterbaseDB" dbDriver="org.postgresql.Driver" dbURL="jdbc:postgresql://192.168.1.100/mydb" user="david" password="unknown" />
If you use the dbConfig parameter, it has the precedence and all the connection parameters will be sought in specified properties file !
When working with database through JDBC drivers, CloverETL needs to map its internal data types onto JDBC data types. The variety of DB (JDBC) field types is huge but most of them (with exception of BLOBs) can be mapped onto Clover internal types without loosing any information.
Following table lists JDBC data types and corresponding CloverETL data types. The conversion is done automatically by CloverETL when analyzing DB tables using org.jetel.database.AnalyzeDB utility. This conversion can also be made manually using presented table.
| JDBC (DB) data type | CloverETL data type |
|---|---|
| INTEGER, SMALLINT, TINYINT | INTEGER |
| BIGINT | LONG |
| DECIMAL, DOUBLE, FLOAT, NUMERIC, REAL | NUMERIC |
| CHAR, LONGVARCHAR, VARCHAR, OTHER | STRING |
| DATE, TIME, TIMESTAMP | DATE |
| BOOLEAN, BIT | STRING (true value coded as “T” false value coded as “F”) |
Following example illustrates the conversion. First, the DDL (Oracle DB) definition of database table is presented and then Clover's version of the same using its internal datatypes.
CREATE TABLE MYEMPLOYEE ( EMP_NO NUMBER NOT NULL, FIRST_NAME VARCHAR2(15) NOT NULL, LAST_NAME VARCHAR2(20) NOT NULL, PHONE_EXT VARCHAR2(4), HIRE_DATE DATE NOT NULL, DEPT_NO CHAR(3) NOT NULL, JOB_CODE VARCHAR2(5) NOT NULL, JOB_GRADE NUMBER(4,2) NOT NULL, JOB_COUNTRY VARCHAR2(15) NOT NULL, SALARY NUMBER(15,2) NOT NULL, FULL_NAME VARCHAR2(35) );
<?xml version="1.0" encoding="UTF-8"?> <Record name="EMPLOYEE" type="delimited"> <Field name="EMP_NO" type="numeric" delimiter="," format="#"/> <Field name="FIRST_NAME" type="string" delimiter="," /> <Field name="LAST_NAME" type="string" delimiter="," /> <Field name="PHONE_EXT" type="string" nullable="yes" delimiter="," /> <Field name="HIRE_DATE" type="date" delimiter="," format="dd/MM/yyyy" /> <Field name="DEPT_NO" type="string" delimiter="," /> <Field name="JOB_CODE" type="string" delimiter="," /> <Field name="JOB_GRADE" type="numeric" delimiter="," /> <Field name="JOB_COUNTRY" type="string" delimiter="," /> <Field name="SALARY" type="numeric" delimiter="," /> <Field name="FULL_NAME" type="string" nullable="yes" delimiter="\n" /> </Record>
The reverse conversion from CloverETL to JDBC data type (usually done when populating target DB table) is again driven by JDBC datatypes. There are some exceptions caused by non existence of certain field types on CloverETL's side. These exceptions are handled automatically by CloverETL. Internally it is done by calling different than standard JDBC methods for populating DB fields with values. See following table for explanation. See source code (org.jetel.database.CopySQLData) to get complete insight.
| JDCB type | CloverETL type | Conversion performed |
|---|---|---|
Timestamp | Date | Date is converted to Timestamp and the target is set using setTimestamp() method |
Boolean, Bit | String | If string contains “T” or “t” the target is set to be True, otherwise False using setBoolean() |
Decimal, Double,Numeric, Real | Integer | Conversion from Integer to Decimal is made, the target is set using setDouble() method |
Other (includes | String | The target is set using setString() method |
CloverETL package contains simple utility which can analyse source or target database table and produce Clover's metadata description file. This metadata can be later on used by any DB related component.
Running AnalyzeDB utility is simple, use command like this:
“java -cp cloverETL.rel-1-x.zip org.jetel.database.AnalyzeDB”
AnalyzeDB needs several parameters to be specified. At least it must know how to connect to database and which DB table to analyze. For specifying database connection, the same DBConnection parameter file can be used (see text above).
For specifying which table to analyze, SQL query must be specified which is executed against DB and the returned result set is examined for field types. This way, only portion of table can be extracted/analyzed.
See following table for complete list of options/parameters:
| Parameter | Meaning |
|---|---|
-dbDriver | JDBC driver to use |
-dbURL | Database name (URL) |
-config | Config/Property file containing parameters |
-user | User name |
-password | User's password |
-d | Delimiter to use (standard is [,]) |
-o | Output file to use (standard is stdout) |
-f | Read SQL query from filename |
-q | SQL query on command line |
-info | Displays list of driver's properties |
Example of using AnalyzeDB to get field types of employee DB table:
java -cp cloverETL.rel-1-x.zip org.jetel.database.AnalyzeDB -config postgres.sql -q "select * from employees where 1=0"
Using such a command, all the data fields will be examined. When only some of the fields should be extracted, specify them in the SQL query:
java -cp cloverETL.rel-1-x.zip org.jetel.database.AnalyzeDB -config postgres.sql -q "select emp_no,full_name from employees where 1=0"
For unloading data from database table, use DBInputTable component. It requires DBConnection to be specified (dbConnection parameter) and SQL command (sqlQuery parameter), which will be executed against database specified by DBConnection.
Individual fields fetched from database are mappend onto Clover data record/fields (see JDBC to CloverETL table) - the structure of Clover record is determined by specified Clover metadata (metadata is assigned to Edge which connects DBInputTable with other components connected to DBInputTable).
Example of transformation graph which uses DBInputTable component:
<?xml version="1.0" encoding="UTF-8"?> <Graph name="TestingDB"> <Global> <Metadata id="InMetadata" fileURL="metadata/employee.fmt"/> <DBConnection id="PosgressDB" dbConfig="Posgress.cfg"/> </Global> <Phase number="0"> <Node id="INPUT" type="DB_INPUT_TABLE" dbConnection="PosgressDB" sqlQuery="select * from employee" /> <Node id="OUTPUT" type="DELIMITED_DATA_WRITER_NIO" append="false" fileURL="employees2.list.out"/> <Edge id="INEDGE" fromNode="INPUT:0" toNode="OUTPUT:0" metadata="InMetadata"/> </Phase> </Graph>
SQL command (sqlQuery) can be more complicated than the example above suggests. You can use any valid SQL construct but make sure the metadata corresponds to number and types of returned data fields. See CloverETL examples for more variations of DBInputTable usages.
When there is a need to populate DB table with data comming from CloverETL transformation graph, the DBOutputTable component can be used to fulfill it. It is complement to DBInputTable. It maps CloverETL data records/individual fields onto target DB table fields. It can perform simple data conversions to successfully map CloverETL basic data types on to target DB variants - see CloverETL to JDBC table above.
Following example illustrates usage of DBOutputTable:
<?xml version="1.0" encoding="UTF-8"?> <Graph name="TestingDB2"> <Global> <Metadata id="InMetadata" fileURL="metadata/myemployee.fmt"/> <DBConnection id="PosgressDB" dbConfig="posgress.cfg"/> </Global> <Phase number="0"> <Node id="INPUT" type="DELIMITED_DATA_READER_NIO" fileURL="employees.list.dat" /> <Node id="OUTPUT" type="DB_OUTPUT_TABLE" dbConnection="PosgressDB" dbTable="myemployee" /> <Edge id="INEDGE" fromNode="INPUT:0" toNode="OUTPUT:0" metadata="InMetadata"/> </Phase> </Graph>
Should you need to populate only certain fields of target DB table (when for instance one fields is automatically populated from DB sequence), dbFields parameter of DBOutputTable can be used:
<Node id="OUTPUT2" type="DB_OUTPUT_TABLE" dbConnection="PosgressDB" dbTable="myemployee" dbFields="FIRST_NAME;LAST_NAME" />
<?xml version="1.0" encoding="UTF-8"?> <Graph name="TestingDB3"> <Global> <Metadata id="InMetadata" fileURL="metadata/myemployee.fmt"/> <DBConnection id="PosgressDB" dbConfig="posgress.cfg"/> </Global> <Phase number="1"> <Node id="INPUT" type="DELIMITED_DATA_READER_NIO" fileURL="employees2.list.tmp" /> <Node id="OUTPUT" type="DB_OUTPUT_TABLE" dbConnection="InterbaseDB" dbTable="myemployee" dbFields="FIRST_NAME;LAST_NAME" cloverFields="LAST_NAME;FIRST_NAME" /> <Edge id="INEDGE" fromNode="INPUT:0" toNode="OUTPUT:0" metadata="InMetadata"/> </Phase> </Graph>
| Source field (CloverETL) | Target field (DB table) |
|---|---|
| LAST_NAME | FIRST_NAME |
| FIRST_NAME | LAST_NAME |
Sometimes you need to execute single (or multiple) commands against DB which does not require any input. For example create new table, add data partition, drop index or something totally different. For this, CloverETL offers DBExecute component which takes specified commands and executes them one by one against DB. You may define whether all commands form one transaction or whether the should be DB commit after each command.
Following is a simple example of DBExecute:
<?xml version="1.0" encoding="UTF-8"?> <Graph name="TestingExecute"> <Global> <DBConnection id="InterbaseDB" dbConfig="interbase.cfg"/> </Global> <Phase number="0"> <Node id="DBEXEC" type="DB_EXECUTE" dbConnection="InterbaseDB" inTransaction="N"> <SQLCode> create table EMPLOYEE ( EMP_NO NUMBER not null, FIRST_NAME VARCHAR2(15) not null, LAST_NAME VARCHAR2(20) not null, PHONE_EXT VARCHAR2(4), HIRE_DATE DATE not null, DEPT_NO CHAR(3) not null, JOB_CODE VARCHAR2(5) not null, JOB_GRADE NUMBER(4,2) not null, JOB_COUNTRY VARCHAR2(15) not null, SALARY NUMBER(15,2) not null, FULL_NAME VARCHAR2(35) ); insert into employee values(2,'Robert','Nelson','250',28/12/1988,'600','VP',2.0,'USA',105900.0,'Nelson, Robert'); insert into employee values(4,'Bruce','Young','233',28/12/1988,'621','Eng',2.0,'USA',97500.0,'Young, Bruce'); insert into employee values(5,'Kim','Lambert','22',06/02/1989,'130','Eng',2.0,'USA',102750.0,'Lambert, Kim'); insert into employee values(8,'Leslie','Johnson','410',05/04/1989,'180','Mktg',3.0,'USA',64635.0,'Johnson, Leslie'); insert into employee values(9,'Phil','Forest','229',17/04/1989,'622','Mngr',3.0,'USA',75060.0,'Forest, Phil'); </SQLCode> </Node> </Phase> </Graph>