News

Posted 7 months ago by kasper
Who will post the best content for use in DataCleaner?

Human Inference is announcing a competition for the DataCleaner community. The goal is to provide the best contribution for our favourite open source data quality tool.

What kind of contributions?
Submitted content can be of many forms:

Educational content like tutorials, videos etc.
Regular Expressions for the RegexSwap.
DataCleaner extensions for the ExtensionSwap.
Reference data for inclusion in the tool.
Use case descriptions – tell the community about your experiences.
Third party tool integration.

Prize
We do cherish everything in the community being free. But we will also be giving a nice prize to the winner with the best submission. The exact prize is to be announced shortly. All submissions will be reviewed and mentioned on the DataCleaner website.

Participating
Content must be submitted before Christmas (December 24) 2012. Post a comment on this discussion topic to tell the community where and how to retrieve your submitted content. We also encourage people to join our Google+ community hangouts where authors will be invited to present their contributions.

Submitted contributions (so far)
Here's a list of the submitted contributions in the contest so far:

Pentaho Data Integration auto-profiling generator, by Alex Meadows.
Posted 8 months ago by kasper
Dear DataCleaner users and developers,

We have a new release for you today, version 3.0.3 of DataCleaner. Grab it before your neighbor at the download page.

The focus of this release has been stability, performance and convenience for monitoring repository maintenance. Here is the list of what's new and improved:

We've added a service for renaming jobs in the monitoring repository. You can access this as a RESTful web service or interactively in the UI.

A web service was added for changing the historic date of an analysis result in the monitoring repository. This is convenient if you have historic dumps of data that you wish to include in a timeline.
The documentation has been updated with more elaborate descriptions of the web services available for repository navigation, job invocation and more.
The login dialog in the desktop application had a low-level version conflict, which caused it to be unusable. This has been fixed.
The web application has been made compatible with legacy JSF containers, making the range of applicable Java Webservers wider.
Caching of configuration in the web application was greatly improved, leading to faster page load and job initialization times.

We hope you enjoy this release. It should be 100% backwards compatible with other 3.x releases, so we encourage everyone to upgrade.
Posted 8 months ago by kasper
We are happy to invite everyone to a new initiative: The DataCleaner community hangout. The community hangout is a chance for users and developers of DataCleaner to meet face-to-face online every once in a while.

The last couple of weeks we've been trying out the new concept with a limited number of people, and we are now ready to extend the invitation to everyone with an interest!

The date of the next hangout is Tuesday the 6th of November at 10:00 CET. Please be aware of any timezone differences.

The hangouts are happening on Google+ on a semi-weekly basis. The frequency will be adjusted according to the interest in the community. To kick it off, we at Human Inference will provide some presentations and discussion topics for the first couple of sessions. But the idea is also for users and friends to join the hangouts with their own input.

For the next hangout, project founder Kasper Sørensen will be demoing the new monitoring web application, and how it relates to the traditional desktop application.

For more information, go to our Google+ page and sign up to the next hangout.
Posted 8 months ago by kasper
We've released version 3.2.1 of MetaModel. This release is a minor feature enhancement and bugfix release. Here's the list of changes:

We've drastically improved the performance of "DELETE FROM" statements on CSV files.
We've added automatic substitution of data types when a "CREATE TABLE" statement uses a type that is unavailable on the target database, e.g. DB2 or PostgreSQL. In these cases a suitable data type is applied automatically, e.g. SMALLINT instead of BOOLEAN on DB2, or BYTEA instead of BLOB on PostgreSQL.
A bug pertaining to multithreaded execution of compiled JDBC queries was fixed. We've created a pool of prepared statements to ensure parallel execution of compiled queries.
A bug pertaining to proper quoting of HAVING clause operands was fixed. When the data type of an aggregate function is different from the data type of the function's argument, the quoting would not be correct under certain circumstances.
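
The data type substitution described in the list above can be pictured with a small lookup table. This is only an illustrative Python sketch, not MetaModel's actual (Java) implementation: the names `FALLBACK_TYPES`, `resolve_column_type` and `create_table_sql` are hypothetical, and only the two substitutions named in the release notes are included.

```python
# Hypothetical fallback table: (dialect, requested type) -> substitute type.
# Only the two pairs mentioned in the release notes are listed here.
FALLBACK_TYPES = {
    ("db2", "BOOLEAN"): "SMALLINT",
    ("postgresql", "BLOB"): "BYTEA",
}


def resolve_column_type(dialect: str, requested: str) -> str:
    """Return the type to emit in CREATE TABLE for the given dialect."""
    key = (dialect.lower(), requested.upper())
    # Keep the requested type when no substitution is known.
    return FALLBACK_TYPES.get(key, requested.upper())


def create_table_sql(dialect: str, table: str, columns: list) -> str:
    """Build a CREATE TABLE statement from (name, type) pairs."""
    parts = [f"{name} {resolve_column_type(dialect, ctype)}" for name, ctype in columns]
    return f"CREATE TABLE {table} ({', '.join(parts)})"


print(create_table_sql("db2", "flags", [("id", "INTEGER"), ("active", "BOOLEAN")]))
```

Types without an entry in the table pass through unchanged, so the sketch only alters statements that would otherwise fail on the target database.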

Refer to the roadmap milestone for more details.

MetaModel 3.2.1 is available for direct download or as a Maven dependency.
Posted 8 months ago by kasper
It's Friday afternoon and we have a little weekend gift to share with everyone. The last couple of weeks we've been working on a number of small but nice feature improvements and minor bugfixes in DataCleaner. These are now all available in DataCleaner version 3.0.2 - go grab it at the downloads page.

Here's a wrap-up of the work that we've done:

When triggering a job in the monitoring web application, the panel auto-refreshes every second to get the latest state of the execution.
File-based datastores (such as CSV or Excel spreadsheets) with absolute paths are now correctly resolved in the monitoring web application.
The "Select from key/value map" transformer now supports nested select expressions like "Address.Street" or "orderlines[0].product.name".
The table lookup mechanism has been optimized for performance, using prepared statements when running against JDBC databases.
Administrators can now download file-based datastores directly from the "Datastores" page.
Exception handling in the monitoring web application has been improved a bit, making the error messages more precise and intuitive.
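
To give a feel for the nested select expressions mentioned in the list above, here is a rough Python sketch of how a path such as "Address.Street" or "orderlines[0].product.name" could be resolved against a nested key/value structure. It is not DataCleaner's implementation; the `select` function and the sample record are made up for illustration, and returning `None` for a missing step is an assumption.

```python
import re

# One path segment: a key name optionally followed by list indexes, e.g. "orderlines[0]".
_SEGMENT = re.compile(r"^([^\[\]]+)((?:\[\d+\])*)$")


def select(data, expression):
    """Resolve a dotted expression against nested dicts and lists.

    Returns None when any step of the path is missing.
    """
    current = data
    for segment in expression.split("."):
        match = _SEGMENT.match(segment)
        if match is None:
            raise ValueError(f"Unsupported segment: {segment!r}")
        key, indexes = match.groups()
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
        # Apply each [n] index in turn, e.g. the "[0]" in "orderlines[0]".
        for index in re.findall(r"\[(\d+)\]", indexes):
            if not isinstance(current, list) or int(index) >= len(current):
                return None
            current = current[int(index)]
    return current


record = {
    "Address": {"Street": "Main St"},
    "orderlines": [{"product": {"name": "Widget"}}],
}
print(select(record, "Address.Street"))
print(select(record, "orderlines[0].product.name"))
```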

We hope you enjoy the new version. It should be a drop-in replacement of previous DataCleaner 3 releases, so no need to wait, upgrade now.

If you're using DataCleaner and think it would be fun to meet up with team members from Human Inference who work on the product, as well as consultants and other users of it - join our new Google+ page from where we will start doing community hangouts and thereby invite you to share ideas, questions and good vibes.
Posted 9 months ago by kasper
Thank you to all for the positive attention about our recent DataCleaner 3 release. With this feedback we've been able to quickly and effectively identify a few minor improvements and have introduced these in a new release: Version 3.0.1.

The primary bugfix in this release was about restoring the mapping of columns and specific enumerable categorizations. For instance in the new Completeness analyzer, we found that after reloading a saved job, the mapping was not always correct.

Furthermore a few internal improvements have been made, making it easier to deploy the DataCleaner monitor web application in environments using the Spring Framework.

Last but not least, the visualization settings in the desktop application have been improved by automatically taking a look at the job being visualized and toggling displayed artifacts based on the screen size and amount of details needed to show it nicely.

DataCleaner 3.0.1 is available for download on our downloads page. We wish you good luck cleaning your data, and enjoy the software.
Posted 9 months ago by kasper
Dear friends, users, customers, developers, analysts, partners and more!

After an intense period of development and a long wait, it is our pleasure to finally announce that DataCleaner 3 is available. We at Human Inference invite you all to our celebration! Impatient to try it out? Go download it right now!

So what is all the fuss about? Well, in all modesty, we think that with DataCleaner 3 we are redefining 'the premier open source data quality solution'. With DataCleaner 3 we've embraced a whole new functional area of data quality, namely data monitoring.

Traditionally, DataCleaner has its roots in data profiling. Over the years, we've added several related functions: transformations, data cleansing, duplicate detection and more. With data monitoring we basically deliver all of the above, but in a continuous environment for analyzing, improving and reporting on your data. Furthermore, we will deliver these functions in a centralized web-based system.

So how will the users benefit from this new data monitoring environment? We've tried to answer this question using a series of images:

Monitor the evolution of your data:

Share your data quality analysis with everyone:

Continuously monitor and improve your data's quality:

Connect DataCleaner to your infrastructure using web services:

The monitoring web application is a fully fledged environment for data quality, covering several functional and non-functional areas:

Display of timeline and trends of data quality metrics
Centralized repository for managing and containing jobs, results, timelines etc.
Scheduling and auditing of DataCleaner jobs
Providing web services for invoking DataCleaner transformations
Security and multi-tenancy
Alerts and notifications when data quality metrics are out of their expected comfort zones.

Naturally, the traditional desktop application of DataCleaner continues to be the tool of choice for expert users and one-time data quality efforts. We've even enhanced the desktop experience quite substantially:

There is a new Completeness analyzer which is very useful for simply identifying records that have incomplete fields.
You can now export DataCleaner results to nice-looking HTML reports that you can give to your manager, or send to your XML parser!
The new monitoring environment is also closely integrated with the desktop application. Thus, the desktop application now has the ability to publish jobs and results to the monitor repository, and to be used as an interactive editor for content already in the repository.
New date-oriented transformations are now available: Date range filter, which allows you to subset datasets based on date ranges, and Format date, which allows you to format a date using a date mask.
The Regex Parser (which was previously only available through the ExtensionSwap) has now been included in DataCleaner. This makes it very convenient to parse and standardize rich text fields using regular expressions.
There's a new Text case transformer available. With this transformation you can easily convert between upper/lower case and proper capitalization of sentences and words.
Two new search/replace transformations have been added: Plain search/replace and Regex search/replace.
The user experience of the desktop application has been improved. We've added several in-application help messages, made the colors look brighter and clearer and improved the font handling.
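
As a rough idea of what a completeness check like the one in the list above does, the following Python sketch flags records with incomplete fields. It is not the Completeness analyzer's code: the function name, the sample data and the rule that missing, `None` or blank values count as incomplete are all assumptions made for this example.

```python
def incomplete_records(records, required_fields):
    """Return the records in which any required field is missing, None or blank."""

    def is_blank(value):
        return value is None or (isinstance(value, str) and value.strip() == "")

    return [
        record
        for record in records
        if any(is_blank(record.get(field)) for field in required_fields)
    ]


customers = [
    {"name": "Ann", "email": "ann@example.com"},
    {"name": "Bob", "email": "  "},
    {"name": "Cid"},
]
print(incomplete_records(customers, ["name", "email"]))
```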

More than 50 features and enhancements were implemented in this release, in addition to incorporating several hundred upstream improvements from dependent projects.

We hope you will enjoy everything that is new about DataCleaner 3. And do watch out for follow-up material in the coming weeks and months. We will be posting more and more online material and examples to demonstrate the wonderful new features that we are very proud of.
Posted 11 months ago by kasper
Today we've released version 3.0.1 of MetaModel. This is a minor point release which contains the following bugfixes and improvements:

Fixed a bug pertaining to "first row" semantics in the JDBC module. The issue occurred when both "first row" and "max rows" were specified: one more row than desired would be produced.
The toSql() method of Table Creation builders now includes NOT NULL and PRIMARY KEY tokens in the ANSI SQL statement.
The documentation for POJO datastores has been updated since it contained a minor compilation issue due to an ambiguous constructor.
A bug in the IBM DB2 support, related to handling of BLOBs, was fixed.
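
The "first row" and "max rows" fix in the list above concerns a classic off-by-one. The Python sketch below shows the intended semantics, assuming a 1-based "first row" (an assumption for this example); `limit_rows` is a hypothetical helper, not part of MetaModel.

```python
from itertools import islice


def limit_rows(rows, first_row=1, max_rows=None):
    """Return at most max_rows rows, starting at the 1-based first_row."""
    if first_row < 1:
        raise ValueError("first_row is 1-based and must be >= 1")
    start = first_row - 1
    # The end bound is exclusive; the bug described above corresponds to
    # an end bound that is one too large, yielding max_rows + 1 rows.
    stop = None if max_rows is None else start + max_rows
    return list(islice(rows, start, stop))


print(limit_rows(range(1, 11), first_row=3, max_rows=4))
```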

This should be a drop-in replacement for version 3.0, so we encourage everyone to upgrade.
Posted 11 months ago by kasper
We've finally come to the day where we get to push the big red RELEASE button on the MetaModel 3.0 project!

This release is very significant since it marks the point where MetaModel is for the first time able to call itself a full CRUD capable API for practically any data format.

Go to the MetaModel website to read all about what a nice release this is:

 What's new in MetaModel 3.0?
 Using the new POJO based datastore
 Check the full CRUD example on the frontpage.

Congratulations to everyone involved in this release. We hope you will all appreciate this major achievement and help us spread the word about MetaModel even more.
Posted about 1 year ago by manuel
We are celebrating the plans to build a version 3.0 of DataCleaner, where we hope to be pushing the limits of what you can expect from your open source data quality applications. A few big themes for version 3.0 have already been decided:

A data quality monitoring web application.
A multi-tenant repository for data quality artifacts (jobs, profiling results, configurations, datastore definitions etc.)
Being able to edit data (in the desktop application).
Wizards to guide users through their first-time user experience with DataCleaner.

Go read Kasper Sørensen's blog post about the data quality monitoring application, which underlines the general direction and scope of the release!
 

 
 
