Open Source Decision Time for Pentaho BI

Posted by Seth Grimes
Monday, October 12, 2009
5:21 PM

Companies adapt their business models to changing business conditions and emerging opportunities. For BI software publisher Pentaho, the demise of as-a-service BI provider LucidEra created an opportunity that was too good to pass up. LucidEra’s Clearview interface, acquired and rebranded Pentaho Analyzer, fills a product-line gap by providing pivot analysis for non-technical business users. That this centerpiece Enterprise Edition component was not and is not open source invites a question. Is Pentaho, founded as a “commercial open source” BI vendor, still defined by open source? Pentaho itself seems unsure.

Pentaho’s brand of commercial open source involves devoting professional resources to shepherd community developed of core BI components: the Mondrian OLAP server, the JPivot JSP (Java Server Page) tag library, JFreeReports, Kettle ETL, etc. The company has built a number of open source elements such as a workflow engine and infrastructure services that bind these components into a suite, available as an open source Community Edition, and has supplemented these tools with additional, closed source components in a commercially licensed Enterprise Edition.

Analyst Merv Adrian characterizes Pentaho as “open core,” which seems like a very apt description. The Pentaho BI Suite’s basic components are open source, and non-open source elements are based on open standards. Neither reliance on an open core nor past exclusive use of open source components, however, is not sufficient for Pentaho to continue to call itself, at this juncture, a “commercial open source” company.

Pentaho, the company, seems itself of two minds about its status.

The main page of Pentaho’s Web site contains only two uses of the words “open source.” The first, “open source business intelligence,” is in the company logo. The other is in a bullet point, “Find out why commercial open source BI from Pentaho is changing the IT landscape.” This limited usage is justified. Pentaho does, after all, commercially develop and support open-source BI, which is indeed changing the IT landscape by lowering the cost of BI, even if the company is apparently increasingly making money from closed source technology to the point where its product pages are careful in how they use the term “open source.”

Further, Julian Hyde’s Pentaho Analyzer blog article — Julian founded the Mondrian project and works part-time for Pentaho — seems to acknowledge the evolution of Pentaho’s self conception: “Releasing ClearView as part of Enterprise Edition is perfectly in sync with Pentaho’s business model.” The Pentaho BI Suite used to be all-open source; today, software that “puts Pentaho at the top of the heap, in competition with best-of-breed” is not.

By contrast, I found Pentaho’s Clearview-acquisition announcement to be confused even though deception certainly wasn’t the company’s intent. The press release is titled “Open Source BI Leader Expands SaaS-Ready Product Portfolio, Adds Best-in-Class Business User Analytics.” It couples “open source BI” with the new tool, which is inappropriate in this instance. Similarly, the release’s first sentence describes the company as “the commercial open source alternative for business intelligence (BI),” which is not false but is again contextually inappropriate.

You have to understand code words: “enterprise edition” means a closed-source (and also supported) extension of core open-source tools. I do, but does Joann Reader of any of the many articles spun out of the press release? It’s not until the fifth PR paragraph — the second from last sentence of the PR — that we learn that Analyzer will be available for “existing and new subscription customers,” and the one example I have looked at shows that not every trade-press reporter notes details that appear that far down in the text.

Doesn’t this instance of the “commercial open source” formulation confuse rather than inform?

The body of the Pentaho press release uses “open source” four times. It even includes an analyst quotation from the estimable Claudia Imhoff, “Attractive, intuitive business user interfaces are as mandatory in open source BI as they are in other forms of BI.” Yet this quotation isn’t apt, is it, given that this particular “intuitive business user interface” isn’t open source BI?

The boilerplate About Pentaho Corporation PR footer, in a release about a “strategic,” closed source component, uses “open source” three more times.

I believe that this confusion doesn’t help the market understand Pentaho and also that it is not necessary. Last March, I wrote that “OS-BI capabilities, reliability, and support have matured. Commercial OS-BI vendors now compete with BI market leaders.” This statement is just as true for Pentaho as an “open core” vendor that offers customers both lower baseline costs due to reliance on free, open-source components and also high-value, strategic, enterprise software tools such as Analyzer.

EnterpriseDB made the jump early last year and now calls itself “the leading provider of enterprise-class products and services.” The company clearly believes that product and service quality matter more to customers than the open-source status of the PostgreSQL DBMS that the company builds on. EnterpriseDB was on the same confusing path that Pentaho is now on but has clarified its messaging and appears to be thriving.

Pentaho thrived as a commercial open source BI vendor and will do well as a core-open BI vendor, even more when the company unifies its messaging to reinforce, rather than the company’s product origins, today’s enterprise-ready capabilities.


I plan to post more on this topic in a subsequent blog article.

Modification, October 13, 8:14 AM: I modified “still an open source company” to “still defined by open source?,” which is closer to what I intended to ask. The word “defined” isn’t perfect here. The word I really want is a combination of (best) “defined,” “constrained,” “distinguished,” “described,” maybe like the French “cerner.”

OSBI » Blog Archive » How to manually install & configure Pentaho

There has been a mayor change in the way Pentaho looks and some changes on the configuration. There is no documentation on the pentaho forum or wiki to get a working manual installation. Al documentation is based on the PCI version. I couldn’t figure out why the Pentaho guys didn’t put a simple readme or something in the archive…. This howto is for the manual installation of Pentaho and meant for a installation on a already working tomcat server. After this howto you will have a working Pentaho BI server based on Tomcat 5.5 or 6, MySQL 5.0 or 5.1, Pentaho BI server 3.5 and Ubuntu 9.04 Jaunty (or another debian based distro).

Make sure you have a Java virtual machine (openjdk or sun) and Apache ant (both auto installed with Tomcat in Ubuntu)

Download the pentaho manual installation archive in a temp folder via the console and extract (with unzip) the archive somewhere at the Desktop or /tmp

wget http://sourceforge.net/projects/pentaho/files/Business%20Intelligence%20Server/3.5.0-RC2/biserver-manual-ce-3.5.0.RC2.zip/download

We want Tomcat to communicate with MySQL. Pentaho is delivered with a HSQLDB demo database so want that communication also available. Copy the needed connectors to the Tomcat installation folder:

sudo cp /tmp/biserver-manual-ce/pentaho-third-party/mysql-connector-java-5.0.7.jar /usr/share/tomcat5.5/common/lib/
sudo cp /tmp/biserver-manual-ce/pentaho-third-party/hsqldb-1.8.0.jar /usr/share/tomcat5.5/common/lib/

…and make sure the user tomcat (or whatever user runs tomcat) have the necessary rights to this jars.

Build and deploy the Pentaho WAR

open a console and go to the folder where you unpacked the archive and build the war files:

cd /tmp/biserver-manual-ce
ant war-pentaho-tomcat
ant zip-pentaho-style-war
ant zip-pentaho-steel-wheels-style-war
ant zip-pentaho-portal-layout-war

or

ant build-all

These commands should return BUILD SUCCESSFUL. The builded wars could be find in the /build/pentaho-wars folder. Copy the wars to tomcat webapps folder:
cd /build/pentaho-wars
sudo cp *.war /usr/share/tomcat5.5/webapps/
sudo cp tomcat/*.war /usr/share/tomcat5.5/webapps/

When copied this files Tomcat will deploy these wars automatically and create the pentaho directories:

ls -l /usr/share/tomcat5.5/webapps/ |grep ^d
/webapps/pentaho/
/webapps/sw-style/
/webapps/pentaho-portal-layout/
/webapps/pentaho-style/

Configure the Pentaho Solution folder

Next thing to do is to copy the Pentaho solutions directory. This directory is used for all reporting solutions Pentaho offers and is the place where all reporting files are stored fysically. We can use the /opt directory and we need to give the tomcat user read access to this directory:

sudo mkdir /opt/pentaho/
sudo cp -r /tmp/biserver-ce-3.5.0-STABLE/biserver-ce/pentaho-solutions /opt/pentaho/

and give ownership to the Tomcat user:

sudo chown -R tomcat5.5:tomcat5.5 /opt/pentaho/pentaho-solutions
sudo chmod -R 775 /opt/pentaho/pentaho-solutions

Pentaho configuration

First we have to tell pentaho where it can find the solution directory by editing the web.xml file and stop Tomcat, if you already didn’t:

sudo /etc/init.d/tomcat5.5 stop
sudo nano /usr/share/tomcat5.5/pentaho/WEB-INF/web.xml

Search for the solution-path entry and make sure that it looks like this:

<context-param>
<param-name>solution-path</param-name>
<param-value>/opt/pentaho/pentaho-solutions</param-value>
</context-param>

Also make sure that this entry points to the port Tomcat is listening at:

<context-param>
<param-name>base-url</param-name>
<param-value>http://localhost:8080/pentaho/</param-value>
</context-param>

If you want to make Pentaho available in your network or from the internet, you must replace localhost by the computer hostname or local IP, for the internet use the IP address provided by your ISP. Or you can use apache as reverse proxy…

Next to do is disable Tomcat security. For security reasons Tomcat forbids the execution of some jar files needed by Pentaho. It can be disabled by editing the Tomcat startup script and set security off:

sudo nano /etc/init.d/tomcat5.5

# Use the Java security manager? (yes/no)
TOMCAT5.5_SECURITY=no

Here you can read more about the Tomcat security manager…

Configure the Pentaho database connections and data

The Pentaho PCI and the manual installation comes with HSQLDB databases, we have to load that databases into MySQL:
– Quartz (Used for scheduling)
– Sampledata (the data used by the sample solutions)
– Hibernate (used for security and other settings)
You can find the datafiles in /tmp/biserver-manual-ce/pentaho-data/hsqldb

The archive delivers also datafiles in MySQL, Oracle10g and PostgreSQL format. But not all the data is completed, we miss all the sample data and some of the hibernate tables are incorrect for use by MySQL. So first we adjust the hibernate table which misses the user authentication tables:

cd /tmp/biserver-manual-ce/pentaho-data/mysql5
sudo nano create_repository_mysql.sql

and add at the end:

CREATE TABLE IF NOT EXISTS `GRANTED_AUTHORITIES` (
`USERNAME` varchar(50) collate latin1_general_ci NOT NULL,
`AUTHORITY` varchar(50) collate latin1_general_ci NOT NULL,
KEY `FK_GRANTED_AUTHORITIES_USERS` (`USERNAME`),
KEY `FK_GRANTED_AUTHORITIES_AUTHORITIES` (`AUTHORITY`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci;

CREATE TABLE IF NOT EXISTS `AUTHORITIES` (
`AUTHORITY` varchar(50) collate latin1_general_ci NOT NULL,
`DESCRIPTION` varchar(100) collate latin1_general_ci default NULL,
PRIMARY KEY (`AUTHORITY`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci;

CREATE TABLE IF NOT EXISTS `USERS` (
`USERNAME` varchar(50) collate latin1_general_ci NOT NULL,
`PASSWORD` varchar(50) collate latin1_general_ci NOT NULL,
`ENABLED` tinyint(1) NOT NULL,
`DESCRIPTION` varchar(100) collate latin1_general_ci default NULL,
PRIMARY KEY (`USERNAME`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci;

import in MySQL (in this order):

mysql -u root -p
source create_repository_mysql.sql
source create_quartz_mysql.sql
source create_sample_datasource_mysql.sql
source load_sample_users_mysql.sql

Check if all queries succeeded without errors…
show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| hibernate          |
| mysql              |
| quartz             |
+--------------------+
6 rows in set (0.00 sec)
exit

Load the sampledata

The Pentaho sampledata for MySQL is missing for this version. You can find an older version somewhere on sourceforge but when I used that Pentaho gives some errors when trying the report samples. So I created a mysql script, download it here, and load this file with the source statement. You can check if the data loaded succesfully with this script:

+---------------------+------+
| table_name          | rows |
+---------------------+------+
| CUSTOMERS           |  122 |
| CUSTOMER_W_TER      |  122 |
| DEPARTMENT_MANAGERS |    4 |
| DIM_TIME            |  265 |
| EMPLOYEES           |   23 |
| OFFICES             |    7 |
| ORDERDETAILS        | 2996 |
| ORDERFACT           | 2996 |
| ORDERS              |  326 |
| PAYMENTS            |  272 |
| PRODUCTS            |  110 |
| QUADRANT_ACTUALS    |  148 |
| TRIAL_BALANCE       |   22 |
+---------------------+------+

Next thing todo is configure the datasources:

sudo nano /usr/share/tomcat5.5/webapps/pentaho/META-INF/context.xml

and adjust to settings to your MySQL settings:

<Resource name="jdbc/Hibernate" auth="Container" type="javax.sql.DataSource"
factory="org.apache.commons.dbcp.BasicDataSourceFactory" maxActive="20" maxIdle="5"
maxWait="10000" username="hibuser" password="password"
driverClassName="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/hibernate"
validationQuery="select 1" />

<Resource name="jdbc/Quartz" auth="Container" type="javax.sql.DataSource"
factory="org.apache.commons.dbcp.BasicDataSourceFactory" maxActive="20" maxIdle="5"
maxWait="10000" username="pentaho_user" password="password"
driverClassName="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/quartz"
validationQuery="select 1"/>

Make sure that the validation query above is right.

In older pentaho versions we had to define each datasource systemwide, but that’s reduced to one:
edit server.xml and add this lines before </host>:

sudo nano  /etc/tomcat5.5/server.xml


<Context path="/pentaho" docbase="webapps/pentaho/">
<Manager className="org.apache.catalina.session.PersistentManager" saveOnRestart="false"/>
<Resource name="jdbc/Quartz" auth="Container" type="javax.sql.DataSource" maxActive="20"
maxIdle="5" maxWait="10000" username="pentaho_user" password="password" validationQuery="select 1"
factory="org.apache.commons.dbcp.BasicDataSourceFactory" driverClassName="com.mysql.jdbc.Driver"
url="jdbc:mysql://localhost/quartz" testOnBorrow="true" testWhileIdle="true" timeBetweenEvictionRunsMillis="10000"
maxEvictableIdleTimeMillis="60000"/>
</Context>

Security, changing to JDBC Security DAO

Out of the box Pentaho does security with Hibernate-based security data access object (DAO). In earlier steps we made the hibernate tables and we want to use them.
The information on the Pentaho wiki is outdated for 3.5. Follow this steps:

1. Edit the Spring XML files to use the JDBC DAOs instead of the Hibernate ones. Change every bean resource from hibernate to jdbc

cd /opt/pentaho/pentaho-solutions/system/
sudo nano pentaho-spring-beans.xml


<beans>
<import resource="pentahoSystemConfig.xml" />
<import resource="adminPlugins.xml" />
<import resource="systemListeners.xml" />
<import resource="sessionStartupActions.xml" />
<import resource="applicationContext-spring-security.xml" />
<import resource="applicationContext-common-authorization.xml" />
<import resource="applicationContext-spring-security-jdbc.xml" />
<import resource="applicationContext-pentaho-security-jdbc.xml" />

<import resource="pentahoObjects.spring.xml" />
</beans>

2. Edit applicationContext-spring-security-jdbc.xml:

sudo nano applicationContext-spring-security-jdbc.xml

At the end of the file you must change to this:

<bean id="dataSource">
<property name="driverClassName" value="com.mysql.jdbc.Driver" />
<property name="url" value="jdbc:mysql://localhost/hibernate" />
<property name="username" value="hibuser" />
<property name="password" value="password" />
</bean>

3. Next thing is to edit the hibernate setting file:

sudo nano hibernate/hibernate-settings.xml

and change config-file tho:
<config-file>system/hibernate/mysql5.hibernate.cfg.xml</config-file>

sudo nano hibernate/mysql5.hibernate.cfg.xml

look for the lines beneath and change it to your settings:

<!--  MySQL Configuration -->
<property name="connection.driver_class">com.mysql.jdbc.Driver</property>
<property name="connection.url">jdbc:mysql://localhost:3306/hibernate</property>
<property name="dialect">org.hibernate.dialect.MySQL5InnoDBDialect</property>
<property name="connection.username">hibuser</property>
<property name="connection.password">password</property>
....

4. Correct the password in the user table

We must realize that the sample users passwords are encrypted and needed to be known by you. We can store and read the password in plain text or use an encryption method. This can be changed by adjusting the spring configuration file.

sudo nano /opt/pentaho/pentaho-solutions/system/applicationContext-spring-security-jdbc.xml

If you want to use MD5 hashing look for<bean id=”passwordEncoder”  and change class to:

<bean id="passwordEncoder" />

Make sure that your user password in your database is also encoded by MD5.

Don’t ever use plain text passwords for production…

5. Change the datasource for the sample solutions based on the sampledata.

The datasource for the sampledata database is by default set to the HSQL database. We must update it by edit the table DATASOURCE in the database HIBERNATE:

DRIVERCLASS: com.mysql.jdbc.Driver
URL: jdbc:mysql://localhost:3306/sampledata
QUERY: SELECT 1

or by running this sql (you can use phpmyadmin or SQuirreL):

UPDATE DATASOURCE SET DRIVERCLASS = 'com.mysql.jdbc.Driver', URL = 'jdbc:mysql://localhost:3306/sampledata' , QUERY = 'SELECT 1' WHERE NAME = 'SampleData'

Finish the installation

Just in case we forget setting some rights:

sudo chown -R tomcat5.5:tomcat5.5 /opt/pentaho/pentaho-solutions
sudo chown -R tomcat5.5:tomcat5.5 /usr/share/tomcat5.5/webapps/pentaho$

Realize that the security level used here isn’t enough for a production environment. There are several weak spots.
First weakness is the users table in the Hibernate database. Passwords are stored there in a way that an user with read permissions can see if some passwords are the same by looking at the hash text. Instead use a salt or another more secure messure…
Second is all the default passwords are stored in the XML files we edited before. Change this passwords and maybe change even the usernames in the files and in the dabases.
Third is the file ownership and permissions. File permissions must be set to read and write only for the tomcat user and never for some other user.

Now restart tomcat:

sudo /etc/init.d/tomcat5.5 restart

And browse to your fresh installed Pentaho BI server

Original Post : http://osbi.nl/2009/09/how-to-install-configure-pentaho-bi-server-3-5-on-ubuntu/