DataCleaner 1.5.3 released

After much waiting, we are finally ready to release DataCleaner 1.5.3. Here's the wrap-up on what's been going on:

The MetaModel dependency has been upgraded to version 1.1.8, which means:
Improved Excel spreadsheet support
Improved SQL Server support
Improved performance for CSV files
Fixed a bug that caused certain database connection errors to be ignored in terms of user feedback.
Fixed a bug that caused re-opening of database dictionaries to throw a NullPointerException.
Fixed a bug related to dictionary lookups of null values.
Added support for Teradata databases.
Added connection templates for SQL Server connections.
Added support for selection of custom encodings when reading CSV files.
Fixed a minor bug relating to reading files on the classpath when running in Java WebStart mode (which manifested in an exception thrown when clicking on "About DataCleaner").

So as you can see, it's been a mix of minor bugfixes and a couple of improvements to compatibility and performance regarding certain datastores. We hope you enjoy this new release of DataCleaner. As always, you can ...

Get your copy at the /downloads page.
Run it directly from the internet using Java WebStart.

Let us know what you think!

MetaModel 1.1.8 adds better SQL Server support

I'm happy to announce the release of MetaModel 1.1.8.

This release is a minor release with updates only relating to MS SQL Server. The changes are, however, profound in this regard. Microsoft SQL Server JDBC drivers are known to be quirky when it comes to metadata exploration and we are happy to say that MetaModel now addresses these issues. So if you're an MS SQL Server user, be sure to get the latest version of MetaModel!

MetaModel is as always available at the following locations:

Downloadables at google code.
Javadocs available online.
Maven-support out of the box:
<dependency>
<groupId>dk.eobjects.metamodel</groupId>
<artifactId>MetaModel-full</artifactId>
<version>1.1.8</version>
</dependency>

We hope you're all satisfied with the improvements in this release, and don't hesitate to give us any feedback.

New book on Open Source Business Intelligence tells the DataCleaner-story

About half a year ago we received an exciting inquiry from Jos van Dongen on behalf of himself and his co-author Roland Bouman, telling us that they were writing a new book about Open Source Business Intelligence and in particular Pentaho-based solutions. And for this they were looking into DataCleaner for the data profiling section of the book!

The book is now out! It's called "Pentaho Solutions" and it's published by Wiley Publishing. You can read about it and buy it on their website as well.

The book contains a walkthrough for building a data warehouse using Open Source tools and, in doing so, applying DataCleaner for the important job of profiling and validation.

We congratulate Roland Bouman and Jos van Dongen for their great work to promote Open Source Business Intelligence and thank them for mentioning DataCleaner while they're at it!

Explore and query all your datastores with MetaModel 1.1.7

We're pleased to announce the release of MetaModel 1.1.7. The major changes since our last release are two important improvements:

Microsoft SQL Server is finally supported and integration tests have been added to our portfolio of tests of supported databases. Thank you to Asbjørn Leeth for the major contributions to this feature.
We've added an option to configure the character encoding for opening CSV files.
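
To illustrate why a configurable encoding matters, here is a minimal sketch in plain Java. It does not use the MetaModel API: the class `CsvEncodingSketch` and its method `parseRows` are hypothetical names made up for this example, and the comma splitting is deliberately naive (no quoting or escaping).

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of MetaModel.
final class CsvEncodingSketch {

    // Decodes the raw bytes with the caller-supplied charset, then splits
    // the text into lines and each line into comma-separated cells.
    static List<String[]> parseRows(byte[] content, Charset encoding) {
        String text = new String(content, encoding);
        List<String[]> rows = new ArrayList<>();
        for (String line : text.split("\r?\n")) {
            if (!line.isEmpty()) {
                rows.add(line.split(",", -1));
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Write a small file in ISO-8859-1, then read it back with the same encoding.
        Path file = Files.createTempFile("people", ".csv");
        byte[] bytes = "name,city\nS\u00f8ren,\u00c5rhus\n".getBytes(StandardCharsets.ISO_8859_1);
        Files.write(file, bytes);

        List<String[]> rows = parseRows(Files.readAllBytes(file), StandardCharsets.ISO_8859_1);
        System.out.println(rows.get(1)[0] + " lives in " + rows.get(1)[1]);

        Files.delete(file);
    }
}
```

Decoding the same bytes with a different charset (for example UTF-8) would not reproduce the non-ASCII characters, which is the kind of problem a per-file encoding option avoids.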

With the addition of these two improvements we think that we've added some significant "drops in the ocean" on our way to becoming the most comprehensive and advanced framework for object-oriented querying and datastore-independent schema exploration.

If you use Maven, update your dependencies to the following:

<dependency>
<groupId>dk.eobjects.metamodel</groupId>
<artifactId>MetaModel-full</artifactId>
<version>1.1.7</version>
</dependency>

... or if you don't, head on over to our download site at Google Code and download a copy of the release.

eobjects.org announces Open Source data quality with DataCleaner 1.5.2

Dear DataCleaner users,

We are happy to announce the release of DataCleaner 1.5.2. Users of DataCleaner 1.5.0 or 1.5.1 won't be able to see a lot of changes in the user interface, but this release actually holds quite a lot of improvements “beneath the surface”:

The most notable improvement is in the Value Distribution Profile. Previously this profile consumed quite a lot of memory, which could lead to out-of-memory errors in extreme cases. This has been fixed by using on-disk caching with Berkeley DB when necessary.
Another notable feature is that we can now distribute DataCleaner as a single JAR file. This means that we will be serving the application as a Java WebStart application (i.e. you can run it as if it were an online application) and we are also considering other distribution options.
When starting the application, it automatically downloads regular expressions from the RegexSwap.
A bug in regards to matching number-based columns in dictionaries was reported and fixed.
A bug in regards to invalid characters in XML-export formats was reported and fixed.
When opening files, we are now ignoring suffix case so that .CSV files can be opened as well as .csv.
The number of columns shown in the preview window is automatically restricted if there are too many to show on a single screen.
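
The suffix handling mentioned in the list above can be sketched in a few lines of Java. This is only an illustration of the idea, not DataCleaner's actual code: the class `FileSuffixSketch` and the method `hasSuffix` are hypothetical names.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not taken from DataCleaner.
final class FileSuffixSketch {

    // Returns true when fileName ends with suffix, ignoring upper/lower case.
    // Locale.ROOT keeps the lower-casing independent of the user's locale.
    static boolean hasSuffix(String fileName, String suffix) {
        return fileName.toLowerCase(Locale.ROOT)
                .endsWith(suffix.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(hasSuffix("CUSTOMERS.CSV", ".csv"));
        System.out.println(hasSuffix("customers.csv", ".csv"));
        System.out.println(hasSuffix("customers.csv.bak", ".csv"));
    }
}
```

With a check like this, both `CUSTOMERS.CSV` and `customers.csv` are recognized as CSV files, while `customers.csv.bak` is not.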

You can download DataCleaner from the downloads page or you can use our new feature: Get it via Java WebStart!

This release underlines the ongoing evolution of DataCleaner into a more and more professionally capable data profiler and data quality tool. Seeing that DataCleaner is being used in large corporations worldwide, I wish to address a question that I have been thinking about and that I know users are pondering: How do you best combine the low adoption cost of Open Source applications like DataCleaner with the high flexibility that most commercial business software provides? To service this need we've opened up a new division of the company that I work with, Lund&Bendsen. Whether you need to deploy DataCleaner to high-scale installations, integrate the application with your existing systems, develop customized profiles or validation rules, or satisfy other enterprise needs, we offer you first-class services and in-depth expertise you won't find anywhere else.

To cut to the chase: DataCleaner 1.5.2 is here and we wish to extend the community development with a professional effort. So don't hesitate to let us know if you see an opportunity to invest. Adding value by targeting your use of the product is in the interest of customer, developer and community alike, and this is the reason our business is there.

To all you non-business users out there: Sorry for the obvious commercial rant and we hope you all enjoy the newest DataCleaner release.

Best regards,
Kasper Sørensen
Founder of eobjects.org and the DataCleaner project

MetaModel 1.1.6 released: Small changes, a bug fixed

We've released yet another version of MetaModel, namely version 1.1.6.

This release contains very few changes to the 1.1.5 release:

A convenience method was added to the Query class: select(FunctionType, Column).
Upgrading the Apache POI version in MetaModel introduced a few bugs that we did not discover in the 1.1.5 milestone. In 1.1.6 we fixed these bugs, and unit testing was significantly improved for this part of the code to prevent any new bugs from emerging.

We hope you enjoy this release and excuse the hectic release schedule - the aforementioned bug fixes were critical and we hope that you appreciate the quick response from the community.

eobjects.org announces the release of MetaModel 1.1.5

We have just released the newest version of MetaModel, 1.1.5. This release is a minor release which means no API changes, but a few upgrades in terms of performance, flexibility and ease of distribution (full list):

The most important upgrade has been to CSV performance. We encountered a bug when querying this type of datastore that meant that the whole DataSet was stored in memory while using it. This has undergone quite some refactoring so that it will now stream through memory as expected, thus keeping the door open for very large CSV files.
A minor change in the column naming scheme has been implemented for the Excel-based DataContexts. This means that if the first row of a spreadsheet contains only blank fields, we will automatically assign the names "[column 1]", "[column 2]" etc. accordingly.
The downloadable zip or tar.gz file will now contain a "MetaModel-1.1.5-all.jar" file, which is an assembled jar file containing the classes of all MetaModel modules (core, csv, jdbc, excel etc.), which should substantially ease deployment of the framework.
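
The column naming rule for spreadsheets with a blank first row can be sketched as follows. This is an illustration of the rule as described above, not MetaModel's implementation: the class `ColumnNamingSketch` and the method `columnNames` are hypothetical, and treating whitespace-only or null cells as blank is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of MetaModel.
final class ColumnNamingSketch {

    // If every cell in the first row is blank, generate "[column N]" names
    // (1-based). Otherwise the first row is used as the column names as-is.
    static List<String> columnNames(List<String> firstRow) {
        boolean allBlank = true;
        for (String cell : firstRow) {
            // Assumption: null and whitespace-only cells count as blank.
            if (cell != null && !cell.trim().isEmpty()) {
                allBlank = false;
                break;
            }
        }
        List<String> names = new ArrayList<>();
        for (int i = 0; i < firstRow.size(); i++) {
            names.add(allBlank ? "[column " + (i + 1) + "]" : firstRow.get(i));
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(columnNames(Arrays.asList("", "", "")));
        System.out.println(columnNames(Arrays.asList("id", "name")));
    }
}
```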

We hope you enjoy the new release of MetaModel and keep up the good work of providing the valuable feedback that drives development of it.

DataCleaner 1.5.1 released

We're happy to announce the release of DataCleaner version 1.5.1. This release is a minor release, nevertheless containing a few nice features - especially for the users who are enjoying the exporting features that were introduced in 1.5:

An additional HTML export format has been added to the built-in export formats (usable when exporting Profiler results in the desktop app and when executing the runjob command-line tool).
The export format is now choosable directly in the desktop app.
Four new measures were added to the String Analysis profile: avg. chars and max/min/avg white spaces.

The new version of DataCleaner is (as always) downloadable for free on the downloads page and feedback from users is also greatly appreciated, i.e.:

Fill out our online user survey, or
Post your comments and questions at our discussion forum.

We hope that you all enjoy DataCleaner 1.5.1.

DataCleaner 1.5 released!

"Finally!" one might say. And this is definitely what is going through my head right as I write this news item. Finally, DataCleaner 1.5 has been released! Once again the effort to bring about the best open source data quality solution is bearing fruit.

The new release is definitely one of the most significant ones in the history of DataCleaner. The overall goal of the release has been to step up from the shadows of the "small tools" pool and mark DataCleaner as an enterprise-ready application for profiling and validating datastores of all kinds - both in scheduled mode, on servers and in an intuitive desktop environment.

For those of you with an interest in every little detail about this release, please feel free to review the complete list of changes - for everyone else, here's the recap:

Change of license to LGPL.
Multi-threaded execution of Profiler and Validator.
Command line (batch) execution of DataCleaner tasks.
More elaborate status information during profiler and validator execution.
New profile: Date mask matcher.
New profile: Regex matcher.
Load regex from the online RegexSwap repository.
Automatic download and install of popular database drivers.
More file types supported (.dat, .txt)
XML file support improved (.xml)
Memory improvements in Time analysis profile.
Improved logging when running profiling and validation.
Information schema provided for file-based datastores.
Lazy-loading of columns in datastore-tree.

We hope you enjoy the new DataCleaner 1.5! Now go over and download it right away.

Data quality pro launches DataCleaner articles

Things are starting to shape up for the big release of DataCleaner 1.5. We are starting off with a bit of excitement around in the data quality community.

Probably the most dedicated online magazine about data quality, data quality pro, has launched a series of articles about profiling, validating and comparing data with DataCleaner. So far an introductory tutorial (including a complete and realistic example data-set) and a background article/interview have been published:

Learn how to profile and validate data (for free) using DataCleaner
Interview with Kasper Sørensen, creator of DataCleaner

We hope that you will enjoy the articles and we thank data quality pro for their great interest in our community.