Updated at: August 19, 2017
kettle is a open source ETL tool which coded by Java, (support Windows & Unix/Linux)
Currently kettle was acquired by Pentaho and named as PDI(pentaho data intergration), there are both community(free) and enterprise version.
To install kettle, http://wiki.pentaho.com/display/EAI/01.+Installing+Kettle
data-integration
directory hierarchy looks like:
I have highlighted 4 main components of kettle in the screenshot above, there are two executable files (.bat / .sh, run on Windows / Unix-like) for each one.
After you run ./spoon.sh
,
the first thing you should do, create a repository, kettle currently support 3 kinds of repository:
My suggestion is database repo, you can find details for any kind of repository here.
After you create a repo, connect in Spoon by admin/admin (default password: admin).
To open a transformation/job from repo, ‘Tools’ -> ‘Repository’ -> ‘Explore…’ ( Ctrl + E )
# This example runs a transformation from file on a windows platform:
pan.bat /file:"D:\Transformations\Customer Dimension.ktr" /level:Basic
# This example runs a transformation from file on a Linux box:
pan.sh -file="/PRD/Customer Dimension.ktr" -level=Minimal
# This example runs a transformation from the repository on a windows platform: (Enter on a single line without returns...)
pan.bat /rep:"Production Repository"
/trans:"update Customer Dimension"
/dir:/Dimensions/
/user:matt
/pass:somepassword123
/level:Basic
# This example runs a job from file on a windows platform:
kitchen.bat /file:D:\Jobs\updateWarehouse.kjb /level:Basic
# This example runs a job from file on a Linux box:
kitchen.sh -file=/PRD/updateWarehouse.kjb -level=Minimal
# This example runs a job from the repository on a windows platform:
kitchen.bat
/rep:"Production Repository"
/job:"Update dimensions"
/dir:/Dimensions
/user:matt
/pass:somepassword123
/level:Basic
enable carte on your remote host:
carte.sh 127.0.0.1 8081
add new slave server in Spoon:
run options: select the slave sever that you create
view execution result @
Go in deeper, Execute Scheduled Jobs on a Remote Carte Server
Use MySQL as an example, if you try to add a MySQL database connection without a mysql jdbc driver, you will meet this kind of error message:
Error connecting to database [Local MySQL DB] : org.pentaho.di.core.exception.KettleDatabaseException: Error occured while trying to connect to the database
Exception while loading class org.gjt.mm.mysql.Driver
Solution:
kettle will create several tables in your specified database for a database repository,
just copy $HOME/.kettle/repositories.xml
to different host’s same location
Notice: you may need to change the ip in
<server></server>
.
PLS-00222: no function with name ‘PROC_XXX’ exists in this scope
The mistake i had done was i had left the Result name populated thus the component presumed the call was for a function. Once i had removed the value from this field the procedure worked.
e.g. insert into an Oracle table you need to configure the oracle connection like this:
currently, I am planing to create a ETL automation tool with kettle, feel free to star this repo on github.