News

Posted about 1 month ago by CUBRID
Today I am excited to announce an update to the 9.x branch: CUBRID 9.3.0 has been released. CUBRID 9.3 stabilizes and improves upon CUBRID 9.2.

CUBRID 9.3's database volumes are compatible with 9.2's. Users of CUBRID 9.1 and earlier versions should migrate their databases.

There are a number of new features and performance improvements in 9.3.0 that I will describe below.

You can download the latest version of CUBRID 9.3.0 for Windows and Linux from http://www.cubrid.org/?mid=downloads&item=cubrid&os=any&cubrid=9.3.0

New Features
The new features of CUBRID 9.3.0 span several database areas: SQL, HA, administrative convenience, utilities, and drivers.

SQL New Features
Support SELECT FOR UPDATE Statement
The FOR UPDATE clause can be used in SELECT statements to lock the rows returned by the statement for a later UPDATE or DELETE.

SELECT * FROM t1 INNER JOIN t2 ON t1.i=t2.i ORDER BY 1 FOR UPDATE OF t1, t2;
Support Schema Lock
The following two schema locks are added.

SCH-S: Schema stability lock. This lock is obtained while a query is being compiled and guarantees that the schema referenced by the query is not modified by other transactions.
SCH-M: Schema modification lock. This lock is obtained while DDL (ALTER/CREATE/DROP) is being executed and prevents the modified schema from being accessed by other transactions.

With schema locks in place, the deadlock that could occur when multiple DELETE statements were executed concurrently on a table without indexes is resolved.

Support various SHOW statements for checking internal database information

Add the SHOW ACCESS STATUS statement

The SHOW ACCESS STATUS statement is added to display the login information of database accounts. This statement can only be used by a "DBA" account.

Add the SHOW VOLUME HEADER statement

The "SHOW VOLUME HEADER" is added to display the specific volume header's information.

Add the SHOW LOG HEADER statement and the SHOW ARCHIVE LOG HEADER statement

Add the SHOW HEAP HEADER/CAPACITY statement

The "SHOW HEAP HEADER" statement displays the header page of the specified heap. The "SHOW HEAP CAPACITY" statement displays the capacity of the specified heap.

Add the SHOW INDEX HEADER/CAPACITY statement

The "SHOW INDEX HEADER" statement displays the header page of the specified index. The "SHOW INDEX CAPACITY" statement displays the capacity of the specified index.

Add the SHOW SLOTTED PAGE HEADER/SLOTS statement

The "SHOW SLOTTED PAGE HEADER OF VOLUME=vol_num AND PAGE=page_num" statement displays the header information of the specified slot page. The "SHOW SLOTTED PAGE SLOTS OF VOLUME=vol_num AND PAGE=page_num" statement displays all slot information of the specified slot page.

Support SHA-1, SHA-2, TO_BASE64, and FROM_BASE64 functions.

Add SHA-1 and SHA-2 functions
Add the TO_BASE64 function and the FROM_BASE64 function.
Support the "IF EXISTS" clause for the DROP SERIAL statement.
Add a SKIP_UPDATE_NULL hint that allows changing a column's constraint from NULL to NOT NULL without data verification.
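As a quick sketch of how these additions can be used (the serial name order_no is hypothetical, and SHA2 is assumed to follow the MySQL-style signature with the digest length as its second argument):

csql> SELECT SHA1('cubrid');
csql> SELECT SHA2('cubrid', 256);
csql> SELECT TO_BASE64('cubrid'), FROM_BASE64(TO_BASE64('cubrid'));
csql> DROP SERIAL IF EXISTS order_no;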

HA New Features

CUBRID 9.3.0 contains the following improvements for HA environments.

CUBRID allows a broker to connect only to a replica

Add a parameter that restricts a broker to accessing only replica nodes.

CSQL can perform write operations on a slave or replica node with a special option

Add a CSQL option that allows writing to the slave DB or the replica DB.
Statistics information can be updated on the slave node or on the replica node with CSQL.

A utility for reducing replication delay is added

Add a utility that shortens the time it takes for replicated changes to be applied.

Other HA features 

Add functionality to reconnect to another node when replication between the CAS and the DB server is delayed.
Add replication latency to the output of "cubrid statdump".
Add logging of the SQL received by the applylogdb process.
Add functionality which detects a log flush delay caused by a copylogdb delay and writes it to the database server log file.

Improvement of Administrative Convenience
Internal status information of the SHARD proxy is enhanced
New SHARD proxy status information items are added, and the number of SHARD proxy errors can now be identified.

A number of options are added or changed in the "cubrid broker status" command.

A command to test whether a specified query executes successfully against the DB is added
The command executes a test query against the DB and reports whether the execution succeeded or failed. After the query is executed, the transaction is rolled back. If SHARD is configured, the query can be executed against all SHARD DBs. With this command, the connection status to the DB, the user's query permissions, and the hashing result of the SHARD DB can be checked.

Configuration

Add the standalone section to cubrid.conf.
Add a parameter that defines the number of replication-delayed hosts, among the hosts defined as db-host in databases.txt, to which a connection will still be attempted.

Utility

Add the cubrid_replay utility to replay the SQL log in order to reconfirm the query execution time and query plan.
Add the --check-prev-link option and the --repair-prev-link option to the "cubrid checkdb". 
Add functionality to display, for each IP registered in the broker's ACL list, the last time a client connected from it, and to display the number of ACL REJECTs.
Add a broker parameter and a CSQL option that allow trigger execution to be turned off.
Add query plan cache information to the output of the "cubrid statdump" command.
Add logging of the results of cubrid utility executions.

Drivers

[JDBC] Add the functionality to wait for a broker connection, rather than failing immediately, while a connection object is created.
[CCI] Add a function that changes the login timeout value and a function that changes the datasource property.
[CCI] Add cci_bind_param_ex(), which can bind even when '\0' exists in the middle of a string.

Improvement of Performance
SQL Performance Improvement

A sampling method is adopted for updating statistics information, shortening the statistics update time.
Improve the performance of INSERT operations.
Improve the performance of SELECT and SCAN operations by improving the page replacement policy of the buffer manager.
Support loose index scan (see the sketch after this list).
Improve the performance of queries which include aggregate or analytic functions.
Improve the performance of LIKE queries.
Fix the problem of INSERT operations taking a long time after many DELETE operations.
Improve the performance of inserting TIMESTAMP type data.
Reduce the time spent creating multiple connections at once.
Improve the performance of replicating UPDATE statements.
Reduce the time spent creating indexes in standalone mode (--SA-mode).
Reduce the time spent executing many FULL SCAN operations on the same table concurrently.
Improve the issue of queries in an application running relatively slower on Windows than on Linux.
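As a sketch of the kind of query that benefits from a loose index scan, assume a hypothetical table tbl with a multi-column index on (col1, col2); a grouped query that touches only the leading index column can skip through the index instead of reading it fully:

csql> CREATE INDEX idx_tbl_col1_col2 ON tbl (col1, col2);
csql> SELECT col1, MIN(col2) FROM tbl GROUP BY col1;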

Sharding Performance Improvement

Improve performance when the statement is reused in the SHARD statement pool.

Other Improvements

Reduce the time required to create several connections in the application at once. 
Improve the relatively slow query execution speed of applications in the Windows environment compared to the Linux environment.

Behavior Changes
Some of the functionality in CUBRID 9.3.0 has been changed.

SQL

Change the behavior when a column in a SELECT-list expression, or the alias of the expression, is identical to a column name in the GROUP BY clause.
Modified so that sort merge join execution plans are not created.

HA

Modify to "reload" only the node information when "cubrid heartbeat reload" command is executed.
Add the "cubrid heartbeat replication start/stop " command.

Driver

[JDBC][CCI] Changed the scope of application and the default values of CCI loginTimeout and JDBC connectTimeout.
[JDBC][CCI] Queries executed without an explicit commit or rollback after the server process restarts during a transaction, or after a failover occurs in an HA environment, are now treated as errors.
[JDBC] The JDBC connection object now uses the user name or password from the URL when they are given as "" in the connection object.

Configuration

The broker parameters LONG_QUERY_TIME and LONG_TRANSACTION_TIME are no longer applied when their values are 0.
Modify the printed information and related parameters of the broker access log (ACCESS_LOG) file.
The minimum value of log_buffer_size in cubrid.conf is changed from 48KB (3 pages, at 16KB per page) to 2MB (128 pages).

Utility

Displayed values are truncated when the "-b" or "-f" option is used in the "cubrid broker status" command.

Others

Replace Vietnamese and Khmer error messages and utility messages with English ones.

Besides these new features, performance improvements and behavioral changes, there are many bug fixes and minor improvements in resource management. For the full list refer to CUBRID 9.3.0 Release Notes.

If you would like to migrate from a previous version to the latest CUBRID 9.3.0, please refer to Upgrading to CUBRID 9.3.0.

If you have any questions, feel free to leave your comment below.
Posted 3 months ago by newpcraft
We are releasing a Heartbleed vulnerability patch for CUBRID Manager Server.

You have probably already seen the news about the Heartbleed bug, a vulnerability affecting many web sites that use OpenSSL to provide the HTTPS protocol. CUBRID Manager Server uses OpenSSL in its HTTPS API, and if it is accessible from outside your network zone, it may be affected by this vulnerability. Because CUBRID Manager Server uses OpenSSL 1.0.1c to provide the HTTPS CM API, we highly recommend applying this Heartbleed vulnerability patch.

For additional help with applying the patch, visit http://www.cubrid.org/wiki_tools/entry/2014-04-16-cubrid-manager-server-openssl-patch.
Posted 9 months ago by Esen Sagynov
Last month we announced CUBRID 8.4.4, the latest version of 8.4.x branch, with new SQL Profiling feature and numerous stability improvements. Today I am excited to announce an update to the 9.x branch - all new CUBRID 9.2.0.

Most new features and stability improvements of the latest 8.4.4 version have been applied to CUBRID 9.2.0 as well. However, there are several more new features and performance improvements in 9.2.0 that I will describe below.

You can download the latest version of CUBRID 9.2.0 for Windows, Linux and AIX from http://www.cubrid.org/?mid=downloads&item=cubrid&os=detect&cubrid=9.2.0.

New Features
SQL profiling
Prior to 8.4.4 and 9.2.0, users could obtain SQL profiling information to analyze query performance only in desktop GUI applications like CUBRID Manager or CUBRID Query Browser. Now CUBRID allows SQL profiling information to be retrieved directly from the drivers. To enable SQL profiling, execute SET TRACE ON before executing the query, then retrieve the query plan and statistics by executing the SHOW TRACE query. See the following example:

csql> SET TRACE ON;
csql> SELECT /*+ RECOMPILE */ o.host_year, o.host_nation, o.host_city, n.name, SUM(p.gold), SUM(p.silver), SUM(p.bronze)
        FROM OLYMPIC o, PARTICIPANT p, NATION n
        WHERE o.host_year = p.host_year AND p.nation_code = n.code AND p.gold > 10
        GROUP BY o.host_nation;
csql> SHOW TRACE;

  trace
======================
  '
Query Plan:
  SORT (group by)
    NESTED LOOPS (inner join)
      NESTED LOOPS (inner join)
        TABLE SCAN (o)
        INDEX SCAN (p.fk_participant_host_year) (key range: (o.host_year=p.host_year))
      INDEX SCAN (n.pk_nation_code) (key range: p.nation_code=n.code)

  rewritten query: select o.host_year, o.host_nation, o.host_city, n.[name], sum(p.gold), sum(p.silver), sum(p.bronze) from OLYMPIC o, PARTICIPANT p, NATION n where (o.host_year=p.host_year and p.nation_code=n.code and (p.gold> ?:0 )) group by o.host_nation

Trace Statistics:
  SELECT (time: 1, fetch: 1059, ioread: 2)
    SCAN (table: olympic), (heap time: 0, fetch: 26, ioread: 0, readrows: 25, rows: 25)
      SCAN (index: participant.fk_participant_host_year), (btree time: 1, fetch: 945, ioread: 2, readkeys: 5, filteredkeys: 5, rows: 916) (lookup time: 0, rows: 38)
        SCAN (index: nation.pk_nation_code), (btree time: 0, fetch: 76, ioread: 0, readkeys: 38, filteredkeys: 38, rows: 38) (lookup time: 0, rows: 38)
    GROUPBY (time: 0, sort: true, page: 0, ioread: 0, rows: 5)
New SQL syntax
In CUBRID 9.2.0 we have added new SQL functions and statements.

Analytic functions

FIRST_VALUE, LAST_VALUE and NTH_VALUE functions

Aggregate-analytic functions

CUME_DIST, PERCENT_RANK and MEDIAN functions.

Support NULLS FIRST, NULLS LAST syntax in the ORDER BY clause.
Support DROP VIEW IF EXISTS statement.
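The sketch below illustrates the shape of these additions; the table and column names (score, player, points) and the view name score_view are hypothetical:

csql> SELECT player, points, FIRST_VALUE(points) OVER (ORDER BY points DESC) AS top_points FROM score;
csql> SELECT MEDIAN(points) FROM score;
csql> SELECT player, points FROM score ORDER BY points DESC NULLS LAST;
csql> DROP VIEW IF EXISTS score_view;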

AIX OS is now supported
CUBRID 9.2.0 now supports the AIX operating system. Refer to Recommendations for installing the CUBRID .sh package on AIX.

Improvements
In CUBRID 9.2.0 there are over 140 stability and usability improvements which include the following:

Administrative Convenience

One of the major improvements is the SQL profiling which I've already explained above.
Sort the output of the tranlist command by a specified column.
Added an additional event log file to record query status information such as SLOW_QUERY, MANY_IOREADS, LOCK_TIMEOUT, DEADLOCK, and TEMP_VOLUME_EXPAND which affect query performance.
The cubrid_master log file now includes each node's information in the HA status output.

HA

Added two new parameters to configure the replication delay interval for the replica node and to specify the time for stopping replication.
Added an option to change the failover method in an HA environment.

Sharding

The cci_set_db_parameter function can be used in a SHARD environment.
The password of a shard DB can also be specified with an environment variable.

Configuration

Added a parameter to adjust the maximum number of replication archive logs.
Added a system parameter to specify transaction rollback when lock escalation occurs.
Added the CONNECT_ORDER broker parameter which allows specifying the order in which brokers access DB hosts when multiple HA/REPLICA DBs are configured.
Added a generic_vol_prealloc_size parameter that maintains a certain amount of free space in the GENERIC volume to prevent the performance degradation caused by sudden growth of a GENERIC volume.

Globalization

Support Romanian ro_RO locale.
Hash partitioning is now available for all collations.

Performance

The in-memory sort optimization is added to process queries by saving the records that match the ORDER BY ... LIMIT condition in the sort buffer (see the example after this list).
Query performance is improved by applying the SORT-LIMIT optimization when executing an ORDER BY ... LIMIT clause in a join query.
The query plan is rewritten when the expected data volume is exceeded during the re-execution of a prepared statement for the same query.
Only the statistical information of a newly added index is updated.
Fixed to use an index when a subquery is given as a condition in the START WITH clause of a hierarchical query.
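As an illustration of the query shape the in-memory sort and SORT-LIMIT optimizations target (the orders and customers tables are hypothetical):

csql> SELECT order_id, amount FROM orders ORDER BY amount DESC LIMIT 10;
csql> SELECT c.name, o.amount FROM customers c INNER JOIN orders o ON c.id = o.customer_id ORDER BY o.amount DESC LIMIT 10;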

As a result of these improvements, the performance of SELECT operations in CUBRID 9.2.0 has improved by 23%.

Figure 1: Result of SELECT Operation of YCSB Benchmark (Master Server).

For more performance and QA results, refer to CUBRID 9.2.0 QA Completion Report.

Behavioral changes
Some of the functionality in CUBRID 9.2.0 has been changed.

SQL

When a column without a default value is added by using the ALTER ... ADD COLUMN statement, the PRIMARY KEY or NOT NULL constraint was violated as all values of the added columns became NULL. This problem has been fixed.

Globalization

CUBRID's default date/time format is used when the TO_CHAR, TO_DATE, TO_DATETIME, or TO_TIMESTAMP SQL functions are called without specifying a date/time format (see the example after this list).
The collation and charset of an ENUM type attribute are no longer propagated from its components.
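For example, calls such as the following now work without an explicit format argument (a minimal sketch; the string literal passed to TO_DATE must match the default date format of the current locale):

csql> SELECT TO_CHAR(SYS_DATETIME);
csql> SELECT TO_DATE('10/24/2013');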

HA

The master with lower priority is changed to a slave when a split-brain occurs.

Sharding

The "cubrid shard" command that controls the SHARD is removed; it is now controlled by the "cubrid broker" command.
SHARD-related parameters are configured in cubrid_broker.conf instead of shard.conf.
Some SHARD-related parameter names are changed. Refer to Changing SHARD-related parameter names.

Driver

In the JDBC driver, the zero date of TIMESTAMP is changed from '0001-01-01 00:00:00' to '1970-01-01 00:00:00' (GMT) when the value of zeroDateTimeBehavior in the connection URL is "round".
In the JDBC and CCI drivers, the query timeout is now applied to the whole batch processing function, not per query, when executing multiple queries at once.

Utility

Locale should be specified when creating a database.
The decimal values output by some utilities are now the same in Linux and Windows.

Configurations

Changed time- and capacity-related parameter names so that a time unit or capacity unit can be specified next to the parameter value.
PHRO is removed from the ACCESS_MODE of the broker.
System parameters of the "client" or "client/server" type can now be changed dynamically by a user without DBA permission.

Other

Asynchronous query mode is no longer supported.

Besides these improvements and behavioral changes, there are many bug fixes and minor improvements in resource management. For the full list refer to CUBRID 9.2.0 Release Notes.

If you would like to migrate from a previous version to the latest CUBRID 9.2.0, please refer to Upgrading to CUBRID 9.2.0.

If you have any questions, feel free to leave your comment below.
Posted 10 months ago by Esen Sagynov
Today we are releasing a new version of the CUBRID driver for Node.js. node-cubrid 2.1.0 has a few API improvements which aim to provide more convenient APIs with full backward compatibility. However, the main objective of this release is to refactor the driver to achieve significant performance improvement by leveraging the characteristics of the V8 and Node.js platforms. Continue reading to learn more about the changes.

Improvements
I will explain the changes chronologically.

We have added Travis Continuous Integration support to the node-cubrid project. Now all 268K assertions are automatically performed on Travis servers upon every commit. Check out the latest node-cubrid build status at https://travis-ci.org/CUBRID/node-cubrid/builds.
Through Travis we can now guarantee that node-cubrid works on all major Node.js releases as well as CUBRID Server releases. This node-cubrid 2.1.0 is even compatible with the latest CUBRID 8.4.4 version we released last week.
To install CUBRID on Travis servers we use CUBRID Chef Cookbook. 
Besides the Travis CI integration, we have added Coveralls Code Coverage support. Now whenever someone pushes a commit, Coveralls will report the percentage change in code coverage, showing whether the change increased or decreased coverage. It is a very convenient and encouraging platform for writing more tests. To see the current code coverage status of the node-cubrid project, visit https://coveralls.io/r/CUBRID/node-cubrid. At this moment we have 89% code coverage. Now we know exactly how many more tests we need to add and which lines of code we need to test.
For those users who come from MySQL, we have added alias functions like end() in node-mysql for node-cubrid's close() function. Same goes for createConnection() alias for createCUBRIDConnection(). Now if you migrate to node-cubrid, there is less code you need to change in your application. 
Now createConnection() accepts an object of connection parameters. After all, JavaScript is all about objects.
Connection timeout parameter can now be supplied to createConnection().
query() function now accepts params object which you can use to pass an array of values to bind to ? placeholders in the provided SQL query.
_sqlFormat() function in Helpers object, which is used to format an SQL query by replacing ? placeholder with respective bind values, is now smarter.
Now numbers are passed to the query as they are without being wrapped in single quotes as if they were strings, though users can wrap them if necessary.
If you pass a Date object, it will correctly be converted into CUBRID compatible DATETIME string format which can be stored in DATE, DATETIME, and TIMESTAMP columns.

Performance Improvement through Major Code Refactoring
As I mentioned at the beginning of this post, the core objective of this release is to refactor the driver to improve the performance of the driver. There are several refactoring works we have performed in this release.

Major refactoring of buffer parsers which handle server responses.
Major refactoring of protocol packet writers to optimize the work with the Buffer.
Complete replacement of the Queries Queueing logic.

Buffer parsers refactoring
Prior to this 2.1.0 release, node-cubrid had a lot of duplicate code. All functions which initiate network communication with the CUBRID Server used to implement almost the same functionality to read data from the socket and prepare it for parsing. Though each function does need separate logic to parse the buffer data, some functionality can be abstracted, like reading bytes from the socket and performing the basic preparation of the data before it is passed to the parser.

There is one more thing we have improved in buffer parsers: it is the work with the instances of Node.js Buffer class. The idea is that memory allocation through Buffer is not as fast as through a local heap of V8 (the JavaScript engine Node.js runs on top of). Moreover, resizing an existing buffer is quite expensive. There is a great inspirational video by Trevor Norris from Mozilla Working with Node.js Buffers and Streams [Youtube] which cover this topic. I highly recommend watching it. Prior to 2.1.0, whenever node-cubrid received a chunk of data from the server, it concatenated this chunk to the main buffer object of this particular request. Since we do not know the potential size of the incoming stream of data, we cannot preallocate enough buffer memory to avoid buffer resizing. This resulted in constant creation, resizing and copying of buffers upon every arrival of the data from the server.

In node-cubrid 2.1.0 we have resolved both of these issues: refactored buffer parsers completely to remove any code duplication and improved the work with the Buffer. Now, all functions which initiate network communication use the same API to read data from the socket and prepare it for parsing.

To resolve the second issue, we started leveraging the features of the CUBRID Communication Protocol. In the CUBRID protocol, when a server sends a data packet to a client, the first four bytes (the length of an integer type) represent the length of the incoming data packet. Thus, after receiving the first piece of the data packet, we can learn how many bytes will be necessary to keep the entire packet in a single Buffer instance. Knowing this value, we can create a Buffer instance with enough memory and start reading the remaining data from the pipe into this Buffer instance. This way we avoid buffer resizing completely.

Refactoring the protocol packet writers
Just like with Buffer Readers (or Packet Readers), node-cubrid has Buffer Writers which we call Packet Writers. Since the Packet Writers also write into a buffer and send it over the wire, the same rule applies: we needed to optimize how writers work with the Buffer. Unlike when reading data sent by the server, when writing data into the buffer to send to the server, we know the exact length of this data. So, why not create the buffer with enough memory in advance? That's what we did. In 2.1.0, we create a Buffer instance only once for a request and write the entire payload into it, thus avoiding buffer resizing.

Refactoring Queries Queueing logic
The third major refactoring affected how query queueing works in node-cubrid. We introduced query queueing in version 2.0.0. At the time it was implemented via the setTimer() function. Every X period the internal logic would check if there is a query the user wants to run. Obviously this creates three problems for us:

the first forces us to manage the queue and check whether there is anything in it;
the second is a potential loss of time between queries. Imagine the case when query A starts at time X and completes at X + 60, while the timer will check the queue only at X + 100. Why lose these precious 40 units of time?
the third issue is that a user could get confused and call the original query()/execute() functions instead of the dedicated addQuery()/addNonQuery() queue-related functions, which would result in an error saying that another query is already running.

To address these issues, we have completely replaced the old queueing logic with a new, more efficient and lightweight one which does not incur any delay. To leverage the new queueing mechanism we had to process all user requests which initiate network communication with the server through a queue. Prior to 2.1.0, only those queries which were added by the addQuery()/addNonQuery() functions were processed by the queue. Now everything is processed by the queue, even requests to close a query, commit a transaction, or fetch more data.

Processing all requests through one common queueing mechanism allows us to avoid "another query is already running" errors altogether. Since everything goes through the queue, there is no way two requests can run at the same time within one connection. The new queue processes the pending request the moment the previous one is completed, so there is no delay in executing the query. Moreover, this removes the headache of managing and checking the timer. The query queueing logic is now very efficient: if you check the source of the Queue.js module, you will notice that it's only 48 lines of code including comments and empty lines.

Other minor refactoring
There is some other minor refactoring work we have done as well. For example, JavaScript provides two very convenient functions, call() and apply(), which you can use to alter the reference to the current context of a function. However, if you use them too often, they affect performance. Previously, node-cubrid made quite heavy use of these functions. Now we do not call them at all for the sake of better performance.

We have also refactored most functions by removing unnecessary and repetitive logic. For the project we have also refactored the test suites by gathering all of them in a single test directory.

Soon we will publish the performance test results which will compare the previous 2.0.2 release with this new 2.1.0 release. To see the full list of changes, refer to CHANGELOG.md file.

If you have any questions or feature requests, feel free to leave your comments below or create an issue on GitHub.

Posted 11 months ago by Esen Sagynov
On behalf of the CUBRID Team I am happy to announce the new 8.4.4 version of the CUBRID open source relational database. This release is mostly dedicated to stability improvements of the 8.4 branch and has some minor behavioral changes which I will explain below. Thus, CUBRID 8.4.4 is fully compatible with the previous 8.4.3 version.

If you have been following our latest releases, you may have noticed that we maintain two branches: CUBRID 8.4.x and 9.x. As the 8.4.0 release introduced major changes and extended support for MySQL-compatible SQL, many users have started using it or migrated to it. In fact, there are more 8.4.x users than 9.x users at this moment. For this reason, we bring common bug fixes applied to the successor 9.x branch down to the 8.4.x branch. This way we make sure users of the 8.4.x branch have the latest improvements.

New Features
Since this is a stability improvement release, there are no new features other than SQL profiling, which is brought down from the upcoming 9.2 release.

SQL profiling
Prior to 8.4.4, users could obtain SQL profiling information to analyze query performance only in desktop GUI applications like CUBRID Manager or CUBRID Query Browser. Now CUBRID allows SQL profiling information to be retrieved directly from the drivers. To enable SQL profiling, execute SET TRACE ON before executing the query, then retrieve the query plan and statistics by executing the SHOW TRACE query. See the following example:

csql> SET TRACE ON;
csql> SELECT /*+ RECOMPILE */ o.host_year, o.host_nation, o.host_city, n.name, SUM(p.gold), SUM(p.silver), SUM(p.bronze)
        FROM OLYMPIC o, PARTICIPANT p, NATION n
        WHERE o.host_year = p.host_year AND p.nation_code = n.code AND p.gold > 10
        GROUP BY o.host_nation;
csql> SHOW TRACE;

  trace
======================
  '
Query Plan:
  SORT (group by)
    NESTED LOOPS (inner join)
      NESTED LOOPS (inner join)
        TABLE SCAN (o)
        INDEX SCAN (p.fk_participant_host_year) (key range: (o.host_year=p.host_year))
      INDEX SCAN (n.pk_nation_code) (key range: p.nation_code=n.code)

  rewritten query: select o.host_year, o.host_nation, o.host_city, n.[name], sum(p.gold), sum(p.silver), sum(p.bronze) from OLYMPIC o, PARTICIPANT p, NATION n where (o.host_year=p.host_year and p.nation_code=n.code and (p.gold> ?:0 )) group by o.host_nation

Trace Statistics:
  SELECT (time: 1, fetch: 1059, ioread: 2)
    SCAN (table: olympic), (heap time: 0, fetch: 26, ioread: 0, readrows: 25, rows: 25)
      SCAN (index: participant.fk_participant_host_year), (btree time: 1, fetch: 945, ioread: 2, readkeys: 5, filteredkeys: 5, rows: 916) (lookup time: 0, rows: 38)
        SCAN (index: nation.pk_nation_code), (btree time: 0, fetch: 76, ioread: 0, readkeys: 38, filteredkeys: 38, rows: 38) (lookup time: 0, rows: 38)
    GROUPBY (time: 0, sort: true, page: 0, ioread: 0, rows: 5)
Improvements
In CUBRID 8.4.4 there are numerous stability and usability improvements, including the following.

Administrative Convenience

One of the major improvements is the SQL profiling which I've already explained above.
Added a command line utility to output the parameter information of running brokers or shards.
Added the tranlist utility to display a list of active transactions across all CAS processes.
Sort the output of the tranlist command by a specified column.
The killtran utility can now also remove transactions by SQL ID.
To remove multiple transactions at once, the -i option of the killtran utility can be used.
Added an additional event log file to record query status information such as SLOW_QUERY, MANY_IOREADS, LOCK_TIMEOUT, DEADLOCK, and TEMP_VOLUME_EXPAND which affect query performance.
The cubrid_master log file now includes each node's information in the HA status output.

HA

Added an option to change the failover method in an HA environment.

Sharding

It is now possible to print the shard ID for a shard key on the command line by running the cubrid shard getid command.
The cci_set_db_parameter function can be used in a SHARD environment.
The password of a shard DB can also be specified with an environment variable.

Configuration

Shard-related processes are started or terminated by using the "cubrid service" command.
Added a parameter to adjust the maximum number of replication archive logs.
Added a system parameter to specify transaction rollback when lock escalation occurs.

Performance

The query plan is rewritten when the expected data volume is exceeded during the re-execution of a prepared statement for the same query.
Only the statistical information of a newly added index is updated.
Reduced time to fork new CAS processes.

Behavioral changes
Some of the functionality in CUBRID 8.4.4 has been changed.

SQL

When a column without a default value is added by using the ALTER ... ADD COLUMN statement, the PRIMARY KEY or NOT NULL constraint was violated as all values of the added columns became NULL. This problem has been fixed.
Fixed the failure to repeat INSERT in the array/batch execution function after executing PREPARE on a newly created table without committing it.

Driver

In the JDBC driver, the zero date of TIMESTAMP is changed from '0001-01-01 00:00:00' to '1970-01-01 00:00:00' (GMT) when the value of zeroDateTimeBehavior in the connection URL is "round".
In the JDBC and CCI drivers, the query timeout is now applied to the whole batch processing function, not per query, when executing multiple queries at once.

Utility

Made changes to the "cubrid broker status" command.

Configurations

Changed time- and capacity-related parameter names so that a time unit or capacity unit can be specified next to the parameter value.
Changed the upper limit of the sort_buffer_size system parameter to 2G.
Removed OFF option from KEEP_CONNECTION broker parameter.
Changed APPL_SERVER_MAX_SIZE_HARD_LIMIT broker parameter.
Changed the default value of SQL_LOG_MAX_SIZE broker parameter from 100MB to 10MB.

Other

Asynchronous query mode is no longer supported.

Besides these improvements and behavioral changes, there are many bug fixes and minor improvements. For the full list refer to CUBRID 8.4.4 Release Notes.

What's next
In a week or so we will release a new version of 9.x branch which will carry the 9.2 label. We will explain about the new features of 9.2 in a separate post.

If you have any questions, feel free to leave your comment below.
Posted 12 months ago by Esen Sagynov
We are happy to announce that we have updated our CUBRID PHP and PDO drivers to roll out loads of improvements and bug fixes. The latest releases can be installed from the CUBRID and PDO_CUBRID PECL packages. Below is a list of improvements which landed in the cubrid-9.1.0.0004 and pdo_cubrid-9.1.0.0003 versions.

Connection parameters support.
Improved CUBRID database sharding support.
Collection data types support.
Removed dynamic CUBRID CCI API dependency.
Improved LOB data support.
Improved documentation.
Other minor improvements.

Connection parameters support
[APIS-580] Now both the PHP and PDO drivers support all connection parameters supported by the underlying CCI API. When connecting to a CUBRID Broker, you can pass various options, such as a list of alternative hosts, which can be used either in an HA environment to provide seamless failover functionality or in a non-HA environment to provide driver-level load balancing. These are especially useful in a CUBRID SHARD environment.

Here is an example in PHP:

$conn = cubrid_connect($host, $port, $db, $user, $password, $createNewLink, "altHosts=192.168.0.2:33000,192.168.0.3:33000&rcTime=600");
Alternatively, connect with a URL:

$conn = cubrid_connect_with_url("CUBRID:localhost:33000:demodb:user:password:?login_timeout=100&altHosts=192.168.0.2:33000,192.168.0.3:33000&rcTime=600");
Or in PDO:

$db = new PDO("cubrid:dbname=demodb;host=localhost;port=33000", $user, $password, array("altHosts"=>"192.168.0.2:33000","rcTime"=>"600"));
Improved CUBRID database sharding support
We fixed a bug in PHP driver [APIS-562] which prevented DELETE queries from being executed by cubrid_execute() function in CUBRID SHARD environment.

Collection data types support
[APIS-555] Now PHP and PDO drivers provide native support for CUBRID collection data types like SET, MULTISET, LIST and SEQUENCE. There are multiple ways to insert an array of values into a collection data type column. 

Inserting an array of integers
$sql_stmt_insert = "INSERT INTO set_tbl_int VALUES (?);";
$data = array(123,456,789);

$stmt = $dbh->prepare($sql_stmt_insert);

// Provide the data type of elements as a third parameter.
$ret = $stmt->bindParam(1, $data, PDO::PARAM_INT);

$ret = $stmt->execute();
Inserting an array of strings
To insert an array of strings, set the PDO data type to NULL. This will tell PDO not to perform any conversion.

$sql_stmt_insert = "INSERT INTO set_tbl_str VALUES (?);";
$data = array("abc","def","ghi");

$stmt = $cubrid_pdo->prepare($sql_stmt_insert);

// When inserting an array of string, set the data type to NULL.
// This will tell PDO not to perform any conversion.
$ret = $stmt->bindParam(1, $data, PDO::PARAM_NULL);

$ret = $stmt->execute();
Alternatively, setting integer as a PDO data type will work as well. PDO will still keep the values as strings.

$sql_stmt_insert = "INSERT INTO set_tbl_str VALUES (?);";
$data = array("abc","def","ghi");

$stmt = $cubrid_pdo->prepare($sql_stmt_insert);

$ret = $stmt->bindParam(1, $data, PDO::PARAM_INT);

$ret = $stmt->execute();
The third way to insert an array of strings is to specify the CUBRID data type in the fifth parameter as shown below.

$sql_stmt_insert = "INSERT INTO set_tbl_str VALUES (?);";
$data = array("abc","def","ghi");

$stmt = $cubrid_pdo->prepare($sql_stmt_insert);

$ret = $stmt->bindParam(1, $data, 0, 0, "varchar");

$ret = $stmt->execute();

Notice that PDO::PARAM_ARRAY cannot be used because PDO does not support it.
PDO::PARAM_STR also should not be used because PDO will throw an error that an array to string conversion is not allowed.

Removed dynamic CUBRID CCI API dependency
[APIS-559] As of this release, the CUBRID PHP and PDO drivers no longer have a dynamic dependency on the CUBRID CCI driver. Instead, we now compile the PHP and PDO drivers together with the CCI source code. This allows the PHP and PDO drivers to be installed in environments where the CUBRID Server is not installed. This is one more step toward providing platform-independent PHP and PDO drivers.

Improved LOB data support

[APIS-556] - We fixed a LOB related bug in cubrid_fetch() function which tried to allocate 4 billion bytes.
[APIS-364] - Also fixed a bug which affected Windows users when a user tried to export LOB data from a database which was successfully imported from a file prior to export.

Documentation improvements

[APIS-553] - Manual page for cubrid_close_request() should be edited.
[APIS-554] - cubrid_connect_with_url() example does not have updated connection url.
[APIS-557] - cubrid_lob2_seek64 example does not work properly.
[APIS-558] - incorrect documentation.
[APIS-584] - Some 'Example' codes about the driver in php.net are somewhat confused.
[APIS-596] - The 'test_lob' table which is not created initially may lead some misunderstandings.

Other minor improvements

[APIS-120] - Return value is not correct when passing a date parameter to cubrid_bind method.
[APIS-121] - Return value is not false when passing a invalid time to cubrid_bind method.
[APIS-358] - Create table failed when using default timestamp '2038-01-19 12:14:07'.
[APIS-513] - CAS error code is not expected in Windows.
[APIS-560] - Cannot load PHP API 9.1.0.0003 in CentOS 5.6: undefined symbol: pthread_create in Unknown on line 0.
[APIS-583] - The cubrid_get_class_name() function throws a warning if select data from a system table.
[APIS-586] - cubrid_bind() doesn't work properly when insert bit data.
[APIS-587] - cubrid_bind() doesn't support select bit data from a prepared statement.
[APIS-549] - Warnings during PHP/PDO API build via PECL.

What's next
In the next version we will focus on improving Mac OS X support. The current release was a preparation for this. After removing the dynamic CCI dependency we can easily compile our PHP and PDO drivers on Mac OS X as well.

Stay tuned for further updates!
Posted 12 months ago by Esen Sagynov
On behalf of the CUBRID open source database project I want to take this opportunity to make a public announcement that we are looking for a Fedora Package Sponsor who would mentor us on submitting the CUBRID package to the Fedora repository.

It has been three years since we created a Review Request at https://bugzilla.redhat.com/show_bug.cgi?id=658754. Recently we have received quite a few reviews, and now we are at the point where we need a sponsor to move on to the next stage.

So, if you know someone (or someone who knows someone) who is a sponsor at Fedora and would be willing to sponsor us, we would appreciate it very much if you let us know by commenting on this post.

For reference, the Packager sponsor responsibilities state:

Packager Sponsors are maintainers that have a good record of maintaining packages, doing reviews and assisting others with the processes and procedures of Fedora. Sponsors act as mentors for new contributors to help point them to areas they would like to contribute, assist them with processes and procedures and assist them when they need general guidance.
Posted 12 months ago by Choo Yun-cheol
Today I would like to talk about Spanner, a NewSQL distributed relational database by Google. It can distribute and store data in data centers across the world, providing consistency as strong as an RDBMS while storing an amount of data that exceeds the capacity of a single data center.

In this article I will briefly explain when the NewSQL trend has begun, then will introduce Spanner, its features and architecture, how it performs data distribution and rebalancing, how it actually stores the data, and finally how it provides data consistency. You will also learn about Google's TrueTime API which is at the core of Spanner distributed relational database.

NoSQL and NewSQL
NewSQL is on the rise. A wholly different database architecture, differentiated from NoSQL, is beginning to emerge. There are many reasons why NoSQL products have been popular. As there are a variety of NoSQL products, and they have been developed to serve different purposes, it is not easy to list their common features. However, as you can see in Hadoop or Cassandra, one of the main advantages of NoSQL is its horizontal scalability. As these NoSQL products don't provide Strong Consistency, they cannot be used where high-level data consistency is required. 

NewSQL offers scalability as excellent as NoSQL's, while at the same time guaranteeing ACID properties like an RDBMS running on a single node. The term NewSQL was first used in 2011 by Matthew Aslett at the 451 Group, a corporate business IT research group. Figure 1 below shows the classification of NewSQL made by the 451 Group (it does not include Spanner, as it was drawn up in 2011).

Figure 1: Classification of RDBMS, NoSQL and NewSQL Made by the 451 Group.
(Source http://blogs.the451group.com/information_management/2011/04/15/nosql-newsql-and-beyond/)

Of course, HBase also provides transactions in a limited form (transaction for a single row). However, not all business requirements can be met with such a limited transaction rule.

nBase, which is developed by the Storage System Dev. Team here at NHN, is also a NewSQL database. Currently NAVER Mail, probably the most popular email service in Korea, uses nBase. nBase supports Strong Consistency from version 1.3.

What is Spanner?
Debuted [video] at the OSDI conference in 2012, Spanner is a NewSQL database created by Google. It is a distributed relational database that can distribute and store data in Google's BigTable storage system across multiple data centers. Spanner meets ACID requirements (it supports transactions, of course) and supports SQL. Currently, F1, Google's advertisement platform, uses Spanner. Gmail and Google Search will also use it soon. F1 keeps 5 data replicas in the U.S. to prevent the service from stopping even in the event of a natural disaster such as an earthquake, or when one or two data centers have a failure.

Spanner provides the scalability that enables you to store a few trillion database rows in millions of nodes distributed to hundreds of data centers. When you read data, Spanner connects you to the data center that is geographically closest to you, and when you write data, it distributes and stores it to multiple data centers. If the data center you try to access has a failure, of course, you can read the data from another data center that has a replica of the data. 

The Spanner client automatically performs failover between replicas. When the number of servers storing data changes or a machine fails, Spanner automatically redistributes data by transferring it between machines.

The following information on Spanner was obtained from Google's Spanner Paper. In addition, the figures below are also excerpted from the paper.

The Data Model of Spanner as a Semi-relational Data Model 
Spanner's data model is semi-relational. It is called semi-relational because it has some features that differentiate it from normal relational databases.

In Spanner, all tables should have at least one primary key.
Spanner is a multi-version database that uses a version when storing data in a column. It has evolved from the key-value store that maintains a version like BigTable (and like HBase, too, which was also influenced by BigTable).
Data is stored in a semi-relational table that has a schema. This means that data has a version, which is given in the form of a time stamp when committing. In applications that use Spanner, you can also read data of past versions.
Spanner supports transactions for general use and supports SQL as well.
Spanner provides Strong Consistency. It can read and write data consistently and provides globally consistent read operations for a specific time stamp. This functionality enables you to carry out 'consistent backup', 'MapReduce operation' and 'atomic schema change', even with ongoing transactions. This is possible because Spanner issues a serialized commit time stamp to distributed transactions by using TrueTime API.

Spanner Server Configuration
A Spanner deployment is called a universe. A universe consists of multiple zones. A zone is a unit that can run with physical independence. A data center may have one or more zones. If you want to store data separately in different server groups, you should create two or more zones in a single data center. Zones can also be added to or removed from a running system.

Figure 2: Spanner Server Configuration.

Figure 2 above shows the configuration of servers in a universe. One zone consists of one zonemaster and hundreds or thousands of spanservers. The zonemaster allocates data to spanservers, while spanservers actually store and process the data. The location proxies are called by clients to locate the spanservers that store the target data.

The universe master provides the status of all zones or debugging information, and the placement driver automatically transfers data between zones, and inspects to determine whether there is any data to be moved due to a change in replication settings or load balancing by communicating with spanservers periodically.

Spanserver Software Stack

Figure 3: Spanserver Software Stack.

Each spanserver manages 10 to 1000 data structures called tablets. A tablet is similar in concept to a BigTable tablet. It stores multiple mappings of the form (key:string, timestamp:int64) → string.

The difference between the tablet of Spanner and the tablet of BigTable is that Spanner stores a time stamp together with data. This means Spanner is not a simple key-value store but has the characteristics of a multi-version database.

The status of a tablet is stored in Colossus Distributed File System (the successor of Google File System) in the form of B-tree file and write-ahead log (WAL).

Spanner uses Paxos state machine to support data replication among spanservers.

Paxos is a protocol set created for reliable operation in distributed environments. In a distributed environment, a failure can happen at any time, and you are also not guaranteed to receive events from distributed nodes in the order they occurred. Paxos is used to resolve this type of reliability issue. One of the main issues handled by Paxos is deciding which node is the leader. This is for consistency in the process of data replication.

A spanserver has a transaction manager to support distributed transactions. The transaction manager is not involved in a transaction performed within a single Paxos group, but when a transaction spans multiple Paxos groups, one of the participant leaders is selected as the coordinator leader and coordinates a two-phase commit.

Directory
A directory is a set of continuous keys that use the same prefix (you can think of it as a bucket). The Spanner paper says that "bucket" is a more appropriate term, and that the term "directory" is used as they intended to keep using the old term.

A directory is a unit of data allocation. All the data in a directory have identical replication settings, and the transfer of data between Paxos groups is also conducted, as shown in Figure 4 below, in the unit of a directory.

Figure 4: Transfer of a Directory between Paxos Groups.

Spanner moves a directory to reduce the load on a certain Paxos group, to group directories that are frequently accessed together, or to move a directory into a group that is geographically closer to the clients that access it. A directory can be transferred even while client operations are in progress.

Transfer of directories between Paxos groups is conducted in the background. Through this transfer work, you can add or delete replicas in a Paxos group. To avoid blocking in-progress read/write tasks during the transfer of a large amount of data, a directory transfer is not performed as a single transaction. Instead, the transfer is started and carried out in the background, and a transaction is used only to move the remaining data once the bulk of it has been copied.

A directory is the minimum unit of geographical replica placement. The administrator can specify the number and type of replicas and their geographical placement for each application. For example, you can configure settings to store 3 replicas of User A's data in Europe, and 5 replicas of User B's data in North America.

If the size of a directory is too big, you can split a single directory into multiple fragments. In this case, the unit of directory transfer or allocation among groups also becomes a fragment.

Data Model
Spanner's data model features a semi-relational table with a schema, a query language that is expanded from SQL, and transactions for general purpose.

An application can create one or more databases in a universe, and one database can have an unlimited number of tables. A table has rows and columns like an RDBMS table. But unlike an RDBMS table, each piece of data has version information.

Figure 5: An Example of Spanner Schema.

Figure 5 above shows an example of a Spanner schema. In Spanner's schema definition language, you can express the hierarchical relationship among tables using the INTERLEAVE IN declaration. The top-level table in the hierarchical relationship is a directory table. According to the key defined in the directory table, the rows of its sub-tables are arranged in dictionary order and make up a directory. The ON DELETE CASCADE statement is used to delete the data of related sub-tables together when a row of the directory table is deleted.

In the example, the Users table is specified as a directory table, and according to the value of the uid column, which is the primary key, data is divided and stored into different directories. As the client specifies the hierarchical relationship among multiple tables, a database can have better performance when data is divided and distributed.
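A sketch of what such a schema definition looks like, loosely following the example from the Spanner paper (the Albums child table and its columns are illustrative):

CREATE TABLE Users {
  uid INT64 NOT NULL, email STRING
} PRIMARY KEY (uid), DIRECTORY;

CREATE TABLE Albums {
  uid INT64 NOT NULL, aid INT64 NOT NULL, name STRING
} PRIMARY KEY (uid, aid),
  INTERLEAVE IN PARENT Users ON DELETE CASCADE;

Here Users is the directory table: all Albums rows that share a given uid are stored in the same directory as the Users row with that uid, and deleting that Users row removes the related Albums rows as well.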

TrueTime API
TrueTime is an API that provides time-related information, which consists of the following methods:

TT.now()
TT.now() method is used to get the current time. This method returns the current time in the form of TTinterval:[earliest, latest] which takes the inaccuracy of time into account. TTinterval returned by TT.now() method guarantees that the absolute time when TT.now() was called is within TTinterval.

TT.after(t)
The TT.after(t) method returns true if the time is surely after the specified time t, and returns false if not.

TT.before(t)
In contrast, the TT.before(t) method returns true if the time is surely before the specified time t, and returns false if not.

TrueTime gets time information from GPS and the atomic clock. It gets time information from two different sources because one of them could have a failure and be unable to provide time information. When you get time information through GPS, you may fail to receive it due to problems such as an antenna or reception problem and radio wave interference. In addition, the time you get from an atomic clock may have an error due to frequency errors.

TrueTime consists of multiple time master devices in each data center. Most masters have a GPS receiver equipped with an antenna. Other masters without GPS are called Armageddon master. These masters are equipped with an atomic clock. All masters check the status of time information by comparing their information with each other periodically. In this process, each master synchronizes the time by checking for any error in their clock. To reduce risks from errors by masters, the timeslave daemon that runs for each time master gets information from multiple time masters and identifies any master that provides incorrect information and adjusts its own clock.

For this reason, the inaccuracy of the clock tends to increase from one synchronization until the next. The time inaccuracy of Spanner traces a sawtooth line going up and down between 1 ms and 7 ms over each synchronization period. As the synchronization period is 30 seconds, and the time error increases by 200 μsec per second, the sawtooth ranges over 0-6 ms, and the remaining 1 ms results from communication latency with the time master.

Concurrency Control
Spanner provides three types of operations: read/write transaction, read transaction and snapshot read operation. A single write operation is performed through a read/write transaction, while a single read operation, not a snapshot read, is performed through a read transaction.

The Process of Read/Write Transaction
A write operation executed in a read/write transaction is buffered on the client until the commit. A read operation in a transaction, therefore, is not affected by the result of a write operation.

Read in a read/write transaction uses a wound-wait method to avoid deadlock. The client obtains a read lock from the leader replica of the appropriate group and reads the latest data. To prevent a timeout while a transaction is open, the client sends keepalive messages to the participant leaders. When the client has completed all read tasks and buffered all writes, the two-phase commit is started. The client selects a coordinator group and sends a commit message to all participant leaders. The commit message contains the information on the coordinator and the buffered writes.

Leaders other than the coordinator hold a write lock, prepare a time stamp value bigger than all the time stamps allocated to the transaction, log records through Paxos, and then send the time stamp value to the coordinator.

The coordinator leader holds a write lock, and then skips the process of preparing a time stamp. The coordinator receives time stamps from all the participant leaders, and then selects a time stamp for the transaction. The commit time stamp should be equal to, or bigger than, the value of all the time stamps received from the participant leaders, bigger than the TT.now().latest of the time when the coordinator received the commit message, and bigger than the value of any time stamps leaders allocated to the previous transaction. After that, the coordinator leader logs the commit record through Paxos.

Before applying the commit record to the coordinator replica, the coordinator leader waits until the value of TT.after (commit time stamp) becomes true to ensure that the time specified by the time stamp has passed. After that, the coordinator sends the commit time stamp to the client and all the participant leaders. The participant leaders that received the commit time stamp log the result of the transaction through Paxos. All participants apply the same time stamp and then release the lock.

Figure 6: The Process of the Two-Phase Commit of a Read/Write Transaction.

The Process of a Read Transaction
As a read transaction is executed after a read time stamp is determined without locking, a write task coming in the course of a read task is not blocked. A read transaction is executed in two phases. First, a read time stamp is determined and then a read is executed from the snapshot of the read time stamp.

To execute a read transaction, you need a scope expression that summarizes the scope of keys to be read by the transaction. If the scope in the scope expression can be executed within a single Paxos group, the client will request the group leader for a read transaction. Paxos leader determines a time stamp for the read transaction, and executes a read operation. If it is a read in a single group, the value of LastTS(), which is the time stamp value of the last committed write in the Paxos group, is used for a read time stamp instead of the value of TT.now().latest.

If the scope spans multiple Paxos groups, the client uses TT.now().latest as the read timestamp. The client waits until TT.after(read timestamp) becomes true, confirming that the timestamp has passed, and then sends the reads in the transaction to the replicas.
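
In short, the read timestamp is chosen as follows (simply a restatement of the two cases above):

\[
s_{\mathrm{read}} =
\begin{cases}
\mathrm{LastTS()} & \text{if the scope fits in a single Paxos group} \\
TT.now().latest & \text{if the scope spans multiple Paxos groups}
\end{cases}
\]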

Schema Change Transaction
Spanner also supports atomic schema changes by using TrueTime. Since a schema change may involve millions of groups, it is practically impossible to perform it with an ordinary transaction. BigTable supports atomic schema changes within a data center, but all operations are blocked while the change is in progress.

Spanner, however, can change schemas without blocking by using a special transaction. First, a timestamp in the future is explicitly chosen and registered for the schema change. This allows the schema change to proceed without affecting ongoing work.

Reads and writes that depend on the schema synchronize with the registered schema-change timestamp. If an operation's timestamp is earlier than the schema-change timestamp, it may proceed; if it is later, it is blocked behind the schema change.

Conclusion
Spanner has blended and developed ideas from two different research communities. From the database community it adopted a familiar, easy-to-use semi-relational interface, transactions, and a SQL-based query language. From the distributed systems community it adopted scalability, automatic sharding, fault tolerance, consistent replication, and wide-area distribution. Thanks to five years of development effort, Spanner has gained the critical functionality of a database that had been impossible to provide in BigTable under a globally distributed environment.

Another key component of Spanner is TrueTime. By exposing clock uncertainty explicitly in its time API, TrueTime provides functionality built on accurately synchronized time across a distributed system.

By Choo Yun-cheol, Senior Software Engineer at Storage System Dev. Team, NHN Corporation.
Posted about 1 year ago by Kwon Donghun
In this article I would like to introduce you to Cloudera Impala, an open source system which provides real-time SQL querying functionality on top of Hadoop. I will quickly go over when and how Impala was created, then explain Impala's architecture and its advantages and drawbacks in more detail, and compare it to Pig and Hive. You will also learn how to install, configure and run Impala. At the end of this article I will show the performance test results I obtained when comparing Impala with Hive, Infobright, infiniDB, and Vertica.

Analyzing results in real time through Hadoop
With the advent of Hadoop in 2006, big data analytics was no longer a task that could be performed by only a few companies or groups. This is because Hadoop was an open-source framework, and thus many companies and groups that needed big data analytics could use Hadoop easily at low cost. In other words, big data analytics became a universal technology.

The core of Hadoop is the Hadoop Distributed File System (HDFS) and MapReduce. Hadoop stores data in HDFS, a distributed file system whose capacity can be expanded by adding nodes, runs MapReduce jobs over the stored data, and thereby produces the required results.

User needs, however, have no limit. The Hadoop community tried to overcome Hadoop's limits in functionality and performance and to develop it further. Complaints were focused on MapReduce, which has two main disadvantages:

It is very inconvenient to use.
Its processing is slow.

To resolve the inconvenience of using MapReduce, platforms such as Pig and Hive appeared in 2008. Pig and Hive are sub-projects of Hadoop (Hadoop is also an ecosystem of multiple platforms; a variety of products based on Hadoop have been created). Both Pig and Hive provide a high-level language, but Pig's is procedural while Hive's is declarative and similar to SQL. With the advent of Pig and Hive, Hadoop users could conduct big data analytics more easily.

However, as Hive and Pig only improve the data retrieval interface, they do not accelerate the analytics work itself: internally, both still use MapReduce.

This is why HBase, a column-oriented NoSQL database, appeared. HBase, which enables faster input/output of key/value data, finally provided Hadoop-based systems with an environment in which data could be processed in real time.

This progress of Hadoop (the Hadoop ecosystem) was greatly influenced by Google. HDFS was implemented on the basis of Google's GFS paper, and HBase appears to have been based on Google's BigTable paper. Table 1 below shows this relationship.

Table 1: Google Gives Us A Map (source: Strata + Hadoop World 2012 Keynote: Beyond Batch - Doug Cutting)

Google Publication      | Hadoop                   | Characteristics
GFS & MapReduce (2004)  | HDFS & MapReduce (2006)  | Batch Programs
Sawzall (2005)          | Pig & Hive (2008)        | Batch Queries
BigTable (2006)         | HBase (2008)             | Online key/value
Dremel (2010)           | Impala (2012)            | Online Queries
Spanner (2012)          | ????                     | Transactions, Etc.

Cloudera's Impala, introduced in this article, was also created under Google's influence. It is based on Google's Dremel paper, published back in 2010. Impala is an open-source, interactive/real-time SQL query system that runs on HDFS and is released under an Apache license.

SQL is very familiar to many developers and can express data manipulation and retrieval concisely.

As Impala supports SQL and provides real-time big data processing functionality, it has the potential to be utilized as a business intelligence (BI) system. For this reason, some BI vendors are said to have already launched BI system development projects using Impala. The ability to get real-time analytics results by using SQL makes the prospect of big data brighter and also extends the application scope of Hadoop.

Cloudera Impala
Cloudera, which created Impala, said they had been technically inspired by Google's Dremel paper, which made them think that it would be possible to perform real-time, ad-hoc queries in Apache Hadoop.

In October 2012, when announcing Impala, Cloudera introduced it as follows:

“Real-Time Queries in Apache Hadoop, For Real”

Impala adopted Hive-SQL as an interface. As mentioned above, Hive-SQL is similar in terms of syntax to SQL, a popularly used query language. For this reason, users can access data stored in HDFS through a very familiar method.

Because Impala uses Hive's metadata, you can access the same data in the same way as in Hive. However, not every Hive-SQL statement is supported by Impala, so it is better to think of the Hive-SQL accepted by Impala as a subset: statements that work in Impala also work in Hive, but not necessarily the other way around.

The key difference between Impala and Hive is real-time capability. While Hive uses MapReduce for data access, Impala uses its own distributed query engine, installed on every data node in the cluster, to minimize response time.

This is why Impala and Hive show distinctly different response times for the same data. Cloudera mentions the following three reasons for Impala's good performance:

Impala puts less CPU load on the nodes than Hive does, and the saved CPU capacity translates into higher usable I/O bandwidth. This is why Impala shows 3-4 times better performance than Hive on purely I/O-bound queries.
For complex queries, Hive must run multi-stage MapReduce jobs or reduce-side joins. On queries that a MapReduce framework cannot process efficiently (for example, a query with at least one join), Impala is 7 to 45 times faster than Hive.
If the data blocks being analyzed are already in the file cache, Impala is faster still, performing 20 to 90 times faster than Hive.

Real-time in Data Analytics
The term "real time" is emphasized with the introduction of Impala. But one may naturally ask, "How much time is real time?" On this question, Doug Cutting (the person who created Hadoop) and Cloudera's senior architect gave their opinions on what real time is.

"When you sit and wait for it to finish, that is real time. When you go for a cup of coffee or even let it run overnight, that's not real time."
"'real-time' in data analytics is better framed as 'waiting less.'

Although it would be difficult to define a numeric threshold for real time, if you can sit at your monitor and wait for the result, that may be called real time.

Impala Architecture
Impala is composed largely of two components: impalad and the Impala state store.

Impalad is the process that acts as the distributed query engine: it plans queries and executes them on the data nodes in the Hadoop cluster. The Impala state store process maintains metadata about the impalads running on each data node; when an impalad is added to or removed from the cluster, the metadata is updated through the state store.

Figure 1:  Impala High-level Architectural View.

Data Locality Tracking and Direct Reads
In Impala, queries are processed by the impalad processes on the data nodes instead of by MapReduce, Hadoop's traditional analysis framework. Two advantages of this design are data locality and direct reads: each impalad processes only the data blocks stored on its own data node and reads them directly from the local directory. This minimizes network load, and Impala can also benefit from the file cache.

Scale Out
Impala scales out horizontally just like a Hadoop cluster. In general, you expand Impala whenever the cluster is expanded: all you have to do is run the impalad process on the new server when a data node is added (the metadata for the added impalad is updated through the Impala state store).

This is very similar to databases based on massively parallel processing (MPP).

Failover
Impala analyzes data stored in Hive and HBase, and the HDFS underneath them provides a certain level of failover through replication. Impala can therefore still execute queries as long as a replica of each required data block and at least one impalad process are available.

Single Point of Failure (SPOF)
One of the big concerns about all systems that use HDFS as their storage medium is that the NameNode is a single point of failure (we discussed this in detail when comparing HDFS with other distributed file systems in our previous article, Overview and Recommendations for Distributed File Systems). Some solutions to mitigate this have recently been released, but a fundamental fix is still a distant goal.

In Impala, the NameNode is a SPOF as well, because queries cannot be executed without knowing the locations of the data blocks.

Query Execution Procedure
The following is a brief account of the query execution procedure in Impala:

The user selects an impalad in the cluster and submits a query through the impala shell or ODBC.
The impalad that received the query carries out the following pre-tasks:
It fetches the table schema from the Hive metastore and checks that the query statement is valid.
It collects the data blocks and their location information required for the query from the HDFS NameNode.
Based on the most recent Impala metadata, it sends the information required to execute the query to all impalads in the cluster.

All the impalads that received the query and metadata read the data blocks they are responsible for from their local directories and execute the query.
When all impalads have completed their tasks, the impalad that originally received the query collects the results and returns them to the user.

Hive-SQL Supported by Impala
Not all of Hive-SQL is supported by Impala. Since Impala supports only a subset, you need to know which statements are available.

SELECT QUERY
Impala supports most of the SELECT-related statements of Hive SQL.

Data Definition Language
In Impala, you cannot create or modify a table. As shown below, you can only retrieve databases and table schemas; tables can be created and modified only through Hive. A brief illustrative session follows the list.

SHOW TABLES
SHOW DATABASES
SHOW SCHEMAS
DESCRIBE TABLE
USE DATABASE
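
For illustration only, a minimal impala-shell session that uses just these retrieval statements might look as follows (output omitted; the custom_url table is the one used in the test later in this article, and the database name is an assumption):

-- Metadata retrieval only; tables must be created and altered through Hive.
SHOW DATABASES;
USE default;            -- assumes the table lives in the "default" database
SHOW TABLES;
DESCRIBE custom_url;    -- inspect the schema of an existing Hive table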

Data Manipulation
Impala provides only the ability to add data to an already created table or partition; a short example follows the list below.

INSERT INTO
INSERT OVERWRITE
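
As a sketch of these two statements, assuming a hypothetical staging table custom_url_staging with the same schema as custom_url:

-- Append rows to an existing table created through Hive.
INSERT INTO TABLE custom_url
SELECT * FROM custom_url_staging;

-- Replace the current contents of the table instead of appending.
INSERT OVERWRITE TABLE custom_url
SELECT * FROM custom_url_staging;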

Unsupported Language Elements
There are still many Hive-SQL features not supported by Impala. Therefore, you are advised to consult the Impala Language Reference before using Impala.

Data Definition Language (DDL) such as CREATE, ALTER, DROP
Non-scalar data types: maps, arrays, structs
LOAD DATA to load raw files
Extensibility mechanisms such as TRANSFORM, custom User Defined Functions (UDFs), custom file formats, custom SerDes
XML and JSON functions
User Defined Aggregate Functions (UDAFs)
User Defined Table Generating Functions (UDTFs)
Lateral Views
etc.

Data Model
Since dropping its beta label, Impala 1.0 supports a variety of file formats: Hadoop-native formats (Apache Avro, SequenceFile, RCFile with Snappy, GZIP, BZIP, or uncompressed); text (uncompressed or LZO-compressed); and Parquet (Snappy or uncompressed), the new state-of-the-art columnar storage format.

The highest interest, however, lies in whether Impala will support Trevni, a project currently led by Doug Cutting. Trevni is a file format that stores a table record comprising rows and columns in the column-major format instead of the existing row-major format. Why is support for Trevni a matter of keen interest? It is because Impala could provide better performance with Trevni.

As Trevni is still under development, I will only give a brief account of the column file format described in the Dremel paper.

Dremel highlights the column file format as one of the factors affecting performance. What benefits most from the column file format is disk I/O. Because the column file format stores each record split into its columns, it is very effective when you retrieve only some of the columns of a record. With the existing row-oriented storage method, the same disk I/O occurs whether you read a single column or all columns; with the column file format, disk I/O occurs only for the specific columns you access, so the disk is used far more efficiently.
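
As a rough illustration of the saving, assuming all columns are about the same width (which is rarely exact in practice), reading k of a table's n columns touches roughly k/n of the data:

\[
\frac{\text{disk I/O (column format)}}{\text{disk I/O (row format)}} \approx \frac{k}{n}, \qquad \text{e.g. } \frac{2}{14} \approx 14\% \ \text{for the 14-column test table used later in this article.}
\]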

Figure 2: Comparison of Row-unit Storage and Column-unit base (source: Dremel: Interactive Analysis of Web-Scale Datasets).

Looking at the result of the two tests conducted with regard to the column file format in Dremel, you can estimate the degree of contribution of the column file format to performance.

Figure 3: Comparison of the Performance of the Column-unit Storage and the Row-unit Storage (source: Dremel: Interactive Analysis of Web-Scale Datasets).

Figure 4:  Comparison of the Performance of MapReduce and Dremel in the Column-unit Storage and the Row-unit Storage (300 nodes, 85 billion Records) (source: Dremel: Interactive Analysis of Web-Scale Datasets).

In Figure 3, (a), (b) and (c) show the execution time by the number of columns randomly selected in the column file format, while (d) and (e) show the reading time with the existing record-oriented format. According to the test results, the column file format executes faster, and the gap widens as fewer columns are accessed. With the record-oriented method, reading even a single column took almost as long as reading all columns, whereas with the column file format (a, b and c), the fewer the selected columns, the better the performance.

Figure 4 compares the execution time of MapReduce jobs over column file format data and over row-oriented data (although this is a comparison of MapReduce and Dremel, here we only look at the effect of applying the column file format to MapReduce). In this way, the column file format improves performance significantly by reducing disk I/O.

Of course, as these test results are for the column file format implemented in Dremel, the results with Trevni, which will be applied to Impala, may differ. Nevertheless, Trevni is being developed with the same goal as Dremel's column file format, so a similar outcome can be expected.

Installing, Configuring and Running Impala
Installing
To install Impala, you must install the following software (for detailed installation information, visit the Impala website):

Red Hat Enterprise Linux (RHEL) / CentOS 6.2 (64 bit) or above.
Hadoop 2.0
Hive
MySQL

MySQL is required for the Hive metastore. Hive supports a variety of databases as its metastore, but currently you must use MySQL for Impala and Hive to work together.

If interworking with HBase is required, you should install HBase as well. This article will not discuss the installation procedure for each software solution in detail.

Once you have installed all the required software, install the Impala package on all the data nodes in the cluster, and then install the impala shell on the client host.

Configuring
For impalad to read HDFS file blocks directly from the local directory and to track data locality, you need to add some settings to Hadoop's core-site.xml and hdfs-site.xml. As this configuration is critical to Impala's performance, these settings are mandatory, and HDFS must be restarted after they are changed.

Refer to Configuring Impala for Performance without Cloudera Manager for more information on configuration.

Running
Before running Impala, you must carry out some pre-tasks through the Hive client, such as table creation and data loading, as Impala does not currently support them.
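
For example, the pre-tasks for the custom_url table used in the test below could be performed in the Hive CLI roughly as follows; the field delimiter and input path are hypothetical, so the actual table may have been created differently:

-- Run in the Hive CLI; the Impala beta cannot create tables or load data itself.
CREATE TABLE custom_url (
  sd_uid BIGINT, type INT, freq INT, locale INT, os INT, sr INT, browser INT,
  gender INT, age INT, url INT, visitor INT, visit INT, pv INT, dt INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';   -- hypothetical delimiter

-- LOAD DATA is not supported by Impala, but works through Hive.
LOAD DATA INPATH '/user/hive/staging/custom_url' INTO TABLE custom_url;   -- hypothetical path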

If you need to analyze an existing or a new table in HBase, you can expose it as an external table through Hive; refer to https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration. A rough sketch is shown below.
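
This hedged sketch uses a hypothetical HBase table name, column family, and column, so consult the linked Hive wiki page for the authoritative syntax:

-- Expose an existing HBase table to Hive (and therefore to Impala) as an external table.
CREATE EXTERNAL TABLE hbase_custom_stats (rowkey STRING, pv INT)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,stats:pv")
TBLPROPERTIES ("hbase.table.name" = "custom_stats");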

To enter a query in Impala, you can use the Impala shell, Cloudera Beeswax (an application that helps users work with Hive easily), or ODBC. Here we will use the Impala shell.

$ sudo -u impala impala-shell
Welcome to the Impala shell. Press TAB twice to see a list of available commands.
Copyright (c) 2012 Cloudera, Inc. All rights reserved.
(Build version: Impala v0.1 (1fafe67) built on Mon Oct 22 13:06:45 PDT 2012)

[Not connected] > connect impalad-host:21000
[impalad-host:21000] > show tables
custom_url
[impalad-host:21000] > select sum(pv) from custom_url group by url limit 50
20390
34001
3049203
[impalad-host:21000] >
Before entering a query, you should select an impalad to be the main server from the impalads in the cluster. Once the connection is complete, enter a query and see the result.

Impala Functionality/Performance Test
In this article I want to share the test results I obtained when I first tried the Impala beta. Though many things have changed since the beta release, they still show the trend in Impala's performance. I conducted a simple test to see how well Impala performs within the scope available to me.

Table 2 below shows the equipment and software versions used. The Hadoop cluster consisted of one NameNode/JobTracker, 3 to 5 data/task nodes, and one commander node. HDFS replication was set to 2, and the map/reduce task capacities of each node were also 2. The test data was approximately 1.3 billion records with 14 columns, totaling approximately 65 GB.

Table 2: Hardware and Software Versions.

Classification | Item   | Description
Equipment      | CPU    | Intel Xeon 2.00GHz
Equipment      | Memory | 16GB
Software       | OS     | CentOS release 6.3 (Final)
Software       | Hadoop | 2.0.0-cdh4.1.2
Software       | HBase  | 0.92.1-cdh4.1.2
Software       | Hive   | 0.9.0-cdh4.1.2
Software       | Impala | 0.1.0-beta

The table schema and query created for the test are as follows:

Code 1: Hive Table Schema.

hive> describe custom_url;
OK
sd_uid bigint
type int
freq int
locale int
os int
sr int
browser int
gender int
age int
url int
visitor int
visit int
pv int
dt int
Code 2: Test Query.

select sr, url, sum(pv) as sumvalue from custom_url
WHERE sr != -1 and url != -1 and (sd_uid=690291 or sd_uid=692758)
group by sr, url order by sumvalue desc limit 50
The query is composed of frequently used clauses: SELECT, FROM, WHERE, GROUP BY and ORDER BY.

Test Result
With cluster sizes of 3 and 5 nodes, the same query on the same table was executed through both Impala and Hive, and the average response time was measured.

Table 3: Comparison of the Time to Execute the Same Query in Impala and Hive, by Cluster Size.

Cluster size | Impala | Hive   | Hive/Impala
3 nodes      | 265s   | 3,688s | 13.96
5 nodes      | 187s   | 2,377s | 13.71

As mentioned above, Impala performs far better than MapReduce-based analysis. In addition, Impala's response time improves almost linearly as the cluster grows. (After expanding the cluster, the test was run only after evenly redistributing the data blocks through rebalancing.)

Figure 5: Impala's Response Time According to Cluster Size.

In addition, to compare Impala's performance with that of column-oriented commercial databases such as Infobright, infiniDB and Vertica, which share an MPP architecture with Impala, a test was also conducted with the same data and query (the open-source versions of Infobright and infiniDB and the Community Edition of Vertica were used). These three databases were tested on a single server.

Table 4: Execution Time of Commercial Column-based Databases.

Database   | Execution time
Infobright | 200s
infiniDB   | 32s
Vertica    | 15s

As the results in Table 4 show, Infobright, infiniDB and Vertica, which are commercial enterprise products, performed much better than Impala. This is because Impala is still at an early stage of development and may have structural and technical shortcomings. Vertica, which showed the best performance of the four, performed about 30% better in a three-server environment than the single-server result given in Table 4.

However, as Vertica is a commercial product, configuring a cluster of two or more servers comes at additional cost.

Conclusion
Less than a year has passed since the Impala beta was released, and two months since the production 1.0 version was released. Since Impala 1.0, a variety of file formats are supported, as well as a subset of ANSI-92 SQL including CREATE, ALTER, SELECT, INSERT, JOIN, and subqueries. In addition, it supports partitioned joins, fully distributed aggregations, and fully distributed top-n queries.

Apart from this, MapR, which, like Cloudera, builds its business on Hadoop, has proposed Drill, another system based on the Dremel paper, to the Apache Incubator. It is still just a proposal, but Drill is expected to make fast progress once it is launched, as it is being driven mainly by MapR developers.

By Kwon Donghun, Software Engineer at Data Infra Lab, NHN Corporation.
Posted about 1 year ago by Esen Sagynov
Along with the latest CUBRID Migration Toolkit release we are now announcing new 2013.05 versions (build 0120) of the CUBRID Manager and CUBRID Query Browser tools, which allow users to administer their CUBRID Database Servers and execute queries in a friendly desktop GUI application.

You can download CM and CQB by following these links:

CM: http://www.cubrid.org/?mid=downloads&item=cubrid_manager&os=detect&cubrid=any
CQB: http://www.cubrid.org/?mid=downloads&item=cubrid_query_browser&os=detect&cubrid=any

As CQB is a subset of the CM tool, the key features of this release are the same for both. They are:

All versions of CUBRID Server since 8.2.2 are supported.
Improved Query Tuning Mode feature: allows users to see the results of query tuning and compare the before/after results of query statistics and execution plan.
Improved results output when concurrently executing queries: allows users to compare the output results (data/columns/records), highlighted in different colors.
Added search functionality to query results: now users can search within the results set in the Results Tab.
Improved Import Wizard: added support to export the failed data and display the error message for each failed query.

A number of bugs have also been fixed in this release. For the full list of changes, refer to the CUBRID Manager Release Notes.

How to upgrade
Autoupgrade
Users of CM/CQB version 1.2 and above will be automatically notified about the new updates via the built-in update feature. Users of older versions need to manually uninstall the existing version of CM or CQB, then download and install the latest version from the links provided above.

CUBRID Server compatibility
CUBRID Server versions 8.2.2+ are supported. When establishing a connection with a CUBRID Server, users can indicate the version of the server, but CM and CQB will usually detect the server version automatically.

Requirements
CM and CQB tools require JRE 1.6 or later.

Feature Walkthrough with Screenshots
Query Tuning Mode
The Tuning Mode is designed to allow users to compare the statistics and execution plan of two queries side by side. Sometimes you want to tweak the query and see how it compares to its original version. You want to see the differences between the number of fetched results, the number of dirty pages created, whether or not I/O reads and writes occur, and finally the total cost of executing the queries. These are the features you can obtain via the Tuning Mode.

To enable the Tuning Mode, toggle on the Tuning Mode button as shown in the Figure 1 below. When the Tuning Mode is enabled, CM/CQB will notify you that the tuning mode has been started. However, nothing else special happens at this moment.

Figure 1: Enable Tuning Mode.

Once you execute a query in tuning mode, a new window will pop up (Figure 2 below) which displays the statistics and execution plan cache for this query on the left panel.

Figure 2: Tuning Mode window (click to enlarge).

To run the second query and compare its results on the right panel, you need to check the "Display on the right panel (R)" checkbox as shown below in Figure 3.

Figure 3: Change the panel to run the second query.

Once the right panel is selected, run the modified version of the original query. The right side of the Tuning Mode window will automatically update and display its statistics and execution plan (Figure 4).

Figure 4: Compare two queries in Tuning Mode.

This is one of the great features we are very excited about in this new release.

Multiple SQL execution on multiple databases in parallel
CM and CQB allow you to execute multiple queries at once, not only on a single connected database but also on multiple databases. Imagine you want to query a master and one or more of the slave nodes to check whether the data on the selected nodes is identical, or you want to query the production server and compare it against the staging server, or during a migration you want to query the destination database and compare the results with those of the source database. Concurrent SQL execution on multiple databases is a really useful feature for all these tasks.

In this release we have improved how results are displayed when executing multiple queries on multiple databases. You can now compare the results of executed queries side by side in a convenient popup window (Figure 5).

Figure 5: Comparison of the results of multiple queries.

Search within the Results Set
Starting with this release, CM and CQB provide search functionality in the Results Tab (Figure 6). It allows users to filter the fetched result set in real time while the matched column values are highlighted.

Figure 6: Search within Results Set.

Improved Import Wizard
In CM and CQB users can import data into a database from various sources (SQL/TXT/CSV/XLS). All import operations are logged and kept in the import history so that users can rerun them later if necessary (Figure 7).

Figure 7: Import Wizard.

In this new release we have added a feature that exports a list of erroneous queries into an external SQL file (Figure 8). If an error occurs while importing data, you can click the "Browse error log..." button, which will take you to a file where all problematic queries are logged.

Figure 8: Browse error logs.

Alternatively, you can double-click on the import row (highlighted in red upon failure), and a popup window will appear showing the list of failed queries along with an error message for each query (Figure 9).

Figure 9: View failed queries.

This is a list of prominent features we wanted to share with you today. There are many more improvements (36 improvements, 66 bug fixes) we have made in this release. Refer to CUBRID Manager Release Notes to see the full list.

If you have any comments or questions, please use the comment form below. We always look forward to receiving your feedback.