CloverETL how-to Load/Unload data from database

How CloverETL works with databases

To simplify things, CloverETL uses JDBC to work (talk to) with databases. If your database of heart has JDBC driver, CloverETL can be used to unload data stored within database table or can populate database table with internal data.

DBConnection

Before any attempt to connect to database can be made, the way of connecting to the database has to be described. For this purpose, DBConnection must be specified first. Withing graph definition, it can be done following way:

   <DBConnection id="InterbaseDB" dbConfig="Interbase.cfg"/>

It specifies that CloverETL should set up database connection called InterbaseDB. All required parameters (JDBC driver name, DB connect string, user name &amp; password) can be found in config file called Interbase.cfg

The content of dbConfig file is standard Java preferences file. It contains names of parameters with values for parameters. The possible parameters lists following table:

Parameter name Description of parameter Example of parameter's value
dbDriver Specifies name of class containing JDBC driver for your database. This class must be visible to Java (i.e. be part of CLASSPATH)
org.postgresql.Driver
dbURL URL for connecting to database - the name of JDBC driver to use, IP address where the server listens, name of database instance, port, etc. jdbc:postgresql://192.168.1.100/mydb
user Username under which to connect to database
admin
password Password to be used
free
driverLibrary Optional parameter. Where to look for JDBC driver class. c:\Oracle\product\10.1.0\Client_1\jdbc\lib\ojdbc14.jar
..other specific parameter.. Optional parameters specific for your JDBC driver Oracle example: defaultRowPrefetch=10


Sample listing of Postgres.cfg file with definition of connection to PostgreSQL database:

   dbDriver=org.postgresql.Driver
   dbURL=jdbc:postgresql://192.168.1.100/mydb
   user=david
   password=unknown


All parameters can be also directly specified when defining connection:

   <DBConnection id="InterbaseDB" 
                 dbDriver="org.postgresql.Driver"
                 dbURL="jdbc:postgresql://192.168.1.100/mydb"
                 user="david"
                 password="unknown"
   />


If you use the dbConfig parameter, it has the precedence and all the connection parameters will be sought in specified properties file !

Mapping JDBC data types onto Clover types

When working with database through JDBC drivers, CloverETL needs to map its internal data types onto JDBC data types. The variety of DB (JDBC) field types is huge but most of them (with exception of BLOBs) can be mapped onto Clover internal types without loosing any information.

JDBC to CloverETL

Following table lists JDBC data types and corresponding CloverETL data types. The conversion is done automatically by CloverETL when analyzing DB tables using org.jetel.database.AnalyzeDB utility. This conversion can also be made manually using presented table.

JDBC (DB) data type CloverETL data type
INTEGER, SMALLINT, TINYINT INTEGER
BIGINT LONG
DECIMAL, DOUBLE, FLOAT, NUMERIC, REAL NUMERIC
CHAR, LONGVARCHAR, VARCHAR, OTHER STRING
DATE, TIME, TIMESTAMP DATE
BOOLEAN, BIT STRING
(true value coded as “T”
false value coded as “F”)


Following example illustrates the conversion. First, the DDL (Oracle DB) definition of database table is presented and then Clover's version of the same using its internal datatypes.

   CREATE TABLE MYEMPLOYEE
      (
         EMP_NO      NUMBER NOT NULL,
         FIRST_NAME  VARCHAR2(15) NOT NULL,
         LAST_NAME   VARCHAR2(20) NOT NULL,
         PHONE_EXT   VARCHAR2(4),
         HIRE_DATE   DATE NOT NULL,
         DEPT_NO     CHAR(3) NOT NULL,
         JOB_CODE    VARCHAR2(5) NOT NULL,
         JOB_GRADE   NUMBER(4,2) NOT NULL,
         JOB_COUNTRY VARCHAR2(15) NOT NULL,
         SALARY      NUMBER(15,2) NOT NULL,
         FULL_NAME   VARCHAR2(35)
      );

   <?xml version="1.0" encoding="UTF-8"?>
   <Record name="EMPLOYEE" type="delimited">
      <Field name="EMP_NO" type="numeric" delimiter="," format="#"/>
      <Field name="FIRST_NAME" type="string" delimiter="," />
      <Field name="LAST_NAME" type="string" delimiter="," />
      <Field name="PHONE_EXT" type="string" nullable="yes" delimiter="," />
      <Field name="HIRE_DATE" type="date" delimiter="," format="dd/MM/yyyy" />
      <Field name="DEPT_NO" type="string" delimiter="," />
      <Field name="JOB_CODE" type="string" delimiter="," />
      <Field name="JOB_GRADE" type="numeric" delimiter="," />
      <Field name="JOB_COUNTRY" type="string" delimiter="," />
      <Field name="SALARY" type="numeric" delimiter="," />
      <Field name="FULL_NAME" type="string" nullable="yes" delimiter="\n" />
  </Record>

CloverETL to JDBC

The reverse conversion from CloverETL to JDBC data type (usually done when populating target DB table) is again driven by JDBC datatypes. There are some exceptions caused by non existence of certain field types on CloverETL's side. These exceptions are handled automatically by CloverETL. Internally it is done by calling different than standard JDBC methods for populating DB fields with values. See following table for explanation. See source code (org.jetel.database.CopySQLData) to get complete insight.

JDCB type CloverETL type Conversion performed
Timestamp Date Date is converted to Timestamp and the target is set using setTimestamp() method
Boolean, Bit String If string contains “T” or “t” the target is set to be True, otherwise False using setBoolean()
Decimal, Double,
Numeric, Real
Integer Conversion from Integer to Decimal is made, the target is set using setDouble() method
Other (includes
NVARCHAR & NCHAR)
String The target is set using setString() method


Using AnalyzeDB utility

CloverETL package contains simple utility which can analyse source or target database table and produce Clover's metadata description file. This metadata can be later on used by any DB related component.

Running AnalyzeDB utility is simple, use command like this:
“java -cp cloverETL.rel-1-x.zip org.jetel.database.AnalyzeDB”

AnalyzeDB needs several parameters to be specified. At least it must know how to connect to database and which DB table to analyze. For specifying database connection, the same DBConnection parameter file can be used (see text above).
For specifying which table to analyze, SQL query must be specified which is executed against DB and the returned result set is examined for field types. This way, only portion of table can be extracted/analyzed.

See following table for complete list of options/parameters:

Parameter Meaning
-dbDriver JDBC driver to use
-dbURL Database name (URL)
-config Config/Property file containing parameters
-user User name
-password User's password
-d Delimiter to use (standard is [,])
-o Output file to use (standard is stdout)
-f Read SQL query from filename
-q SQL query on command line
-info Displays list of driver's properties


Example of using AnalyzeDB to get field types of employee DB table:

java -cp cloverETL.rel-1-x.zip org.jetel.database.AnalyzeDB -config postgres.sql -q "select * from employees where 1=0"

Using such a command, all the data fields will be examined. When only some of the fields should be extracted, specify them in the SQL query:

java -cp cloverETL.rel-1-x.zip org.jetel.database.AnalyzeDB -config postgres.sql -q "select emp_no,full_name from employees where 1=0"

Reading data from DB

DBInputTable component

For unloading data from database table, use DBInputTable component. It requires DBConnection to be specified (dbConnection parameter) and SQL command (sqlQuery parameter), which will be executed against database specified by DBConnection.
Individual fields fetched from database are mappend onto Clover data record/fields (see JDBC to CloverETL table) - the structure of Clover record is determined by specified Clover metadata (metadata is assigned to Edge which connects DBInputTable with other components connected to DBInputTable).

Example of transformation graph which uses DBInputTable component:

  <?xml version="1.0" encoding="UTF-8"?>
  <Graph name="TestingDB">
    <Global>
      <Metadata id="InMetadata" fileURL="metadata/employee.fmt"/>
      <DBConnection id="PosgressDB" dbConfig="Posgress.cfg"/>
    </Global>
    <Phase number="0">
      <Node id="INPUT" 
            type="DB_INPUT_TABLE"
            dbConnection="PosgressDB"
            sqlQuery="select * from employee"
      />
      <Node id="OUTPUT" type="DELIMITED_DATA_WRITER_NIO" append="false" fileURL="employees2.list.out"/>
      <Edge id="INEDGE" fromNode="INPUT:0" toNode="OUTPUT:0" metadata="InMetadata"/>
    </Phase>
  </Graph>


SQL command (sqlQuery) can be more complicated than the example above suggests. You can use any valid SQL construct but make sure the metadata corresponds to number and types of returned data fields. See CloverETL examples for more variations of DBInputTable usages.

Pupulating DB with CloverETL data records

DBOutputTable component

When there is a need to populate DB table with data comming from CloverETL transformation graph, the DBOutputTable component can be used to fulfill it. It is complement to DBInputTable. It maps CloverETL data records/individual fields onto target DB table fields. It can perform simple data conversions to successfully map CloverETL basic data types on to target DB variants - see CloverETL to JDBC table above.

Following example illustrates usage of DBOutputTable:

  <?xml version="1.0" encoding="UTF-8"?>
  <Graph name="TestingDB2">
    <Global>
      <Metadata id="InMetadata" fileURL="metadata/myemployee.fmt"/>
      <DBConnection id="PosgressDB" dbConfig="posgress.cfg"/>
    </Global>
    <Phase number="0">
      <Node id="INPUT" type="DELIMITED_DATA_READER_NIO" fileURL="employees.list.dat" />
      <Node id="OUTPUT" type="DB_OUTPUT_TABLE" dbConnection="PosgressDB" dbTable="myemployee" />
      <Edge id="INEDGE" fromNode="INPUT:0" toNode="OUTPUT:0" metadata="InMetadata"/>
    </Phase>
  </Graph>


Should you need to populate only certain fields of target DB table (when for instance one fields is automatically populated from DB sequence), dbFields parameter of DBOutputTable can be used:

  <Node id="OUTPUT2"
        type="DB_OUTPUT_TABLE"
        dbConnection="PosgressDB"
        dbTable="myemployee"
        dbFields="FIRST_NAME;LAST_NAME"
  />

One more parameter of DBOutputTable can be used to precisely specify mapping from CloverETL data record to DB table record. It allows for specifying which source (Clover) field is mappend onto which target DB table field. The parameter name is cloverFields and contains list of source fileds (from source) record which should be considered for populating target DB table.
Coupled with dbFields, it specifies 1:1 mapping. Individual fields are mapped according to the order in which they apper in dbFields &amp; cloverFields respectively. The driving side which determines how many fields will be populated is always dbFields parameter. When there is no dbFields parameter present, CloverETL assumes that all target fields should be populated in the order in which they appear in the target DB table.
Following examples illustrates how to pick certain fields from source data record (CloverETL record) regardless their order and map them onto target DB table fields (again, regardless their order):

<?xml version="1.0" encoding="UTF-8"?>
  <Graph name="TestingDB3">
    <Global>
      <Metadata id="InMetadata" fileURL="metadata/myemployee.fmt"/>
      <DBConnection id="PosgressDB" dbConfig="posgress.cfg"/>
    </Global>
    <Phase number="1">
      <Node id="INPUT" type="DELIMITED_DATA_READER_NIO" fileURL="employees2.list.tmp" />
      <Node id="OUTPUT"
            type="DB_OUTPUT_TABLE"
            dbConnection="InterbaseDB"
            dbTable="myemployee"
            dbFields="FIRST_NAME;LAST_NAME"
            cloverFields="LAST_NAME;FIRST_NAME" 
      />
      <Edge id="INEDGE" fromNode="INPUT:0" toNode="OUTPUT:0" metadata="InMetadata"/>
    </Phase>
  </Graph>

The resulting mapping between fields specified in example above is:

Source field (CloverETL) Target field (DB table)
LAST_NAME FIRST_NAME
FIRST_NAME LAST_NAME


Executing SQL/DML/DDL statements against DB

DBExecute component

Sometimes you need to execute single (or multiple) commands against DB which does not require any input. For example create new table, add data partition, drop index or something totally different. For this, CloverETL offers DBExecute component which takes specified commands and executes them one by one against DB. You may define whether all commands form one transaction or whether the should be DB commit after each command.
Following is a simple example of DBExecute:

  <?xml version="1.0" encoding="UTF-8"?>
  <Graph name="TestingExecute">
    <Global>
      <DBConnection id="InterbaseDB" dbConfig="interbase.cfg"/>
    </Global>
    <Phase number="0">
      <Node id="DBEXEC" type="DB_EXECUTE" dbConnection="InterbaseDB" inTransaction="N">
        <SQLCode>
          create table EMPLOYEE
          (
            EMP_NO      NUMBER not null,
            FIRST_NAME  VARCHAR2(15) not null,
            LAST_NAME   VARCHAR2(20) not null,
            PHONE_EXT   VARCHAR2(4),
            HIRE_DATE   DATE not null,
            DEPT_NO     CHAR(3) not null,
            JOB_CODE    VARCHAR2(5) not null,
            JOB_GRADE   NUMBER(4,2) not null,
            JOB_COUNTRY VARCHAR2(15) not null,
            SALARY      NUMBER(15,2) not null,
            FULL_NAME   VARCHAR2(35)
          );
          insert into employee 
            values(2,'Robert','Nelson','250',28/12/1988,'600','VP',2.0,'USA',105900.0,'Nelson, Robert');
          insert into employee 
            values(4,'Bruce','Young','233',28/12/1988,'621','Eng',2.0,'USA',97500.0,'Young, Bruce');
          insert into employee 
            values(5,'Kim','Lambert','22',06/02/1989,'130','Eng',2.0,'USA',102750.0,'Lambert, Kim');
          insert into employee 
            values(8,'Leslie','Johnson','410',05/04/1989,'180','Mktg',3.0,'USA',64635.0,'Johnson, Leslie');
          insert into employee 
            values(9,'Phil','Forest','229',17/04/1989,'622','Mngr',3.0,'USA',75060.0,'Forest, Phil');
        </SQLCode>
      </Node>
    </Phase>
  </Graph>

database_using_jdbc_how-to.txt · Last modified: 2009/09/16 12:27 (external edit)
Back to top
chimeric.de = chi`s home Valid CSS Driven by DokuWiki do yourself a favour and use a real browser - get firefox!! Recent changes RSS feed Valid XHTML 1.0