|
Posted
7 months
ago
by
Sean Hull
Read the original article at Thoughts on Upcoming MySQL 5.6 DefaultsDuring Oracle Open World 2012 and the parallel MySQL Connect conference, the new 5.6 version was announced. It’s only release candidate right now, but that means the GA
... [More]
release is just around the corner.With that James Day has posted changes to various of the new parameter defaults. Many of them you may not run into on day-to-day production systems. However there are a few which I do see a lot.Welcome changes to defaultsinnodb_file_per_table = 1 (formerly 0 or off)This is a parameter which really needs to be set on most systems. It tells InnoDB not to use just one large tablespace for all tables, but rather to create an individual tablespace for each InnoDB table. If you come from the Oracle world you might wonder why this is necessary, but due to the evolution of InnoDB itself, this having lots of individual tablespaces helps a lot with concurrency. And that’s something you want for scalability of those web applications.What made the default really troublesome before is that if you forgot to set it, all your existing tables would be created in the single large tablespace. That would leave you only one option. Set the parameter, and then REBUILD all those previously created tables. The syntax to do rebuild wasn’t difficult, but the load it would generate on your systems surely was.innodb_log_file_size = 48m (formerly 5m)Here’s another parameter that didn’t have a great default previously. The more transactional activity you have in your database, the quicker these files fill up. As one fills up the database must switch to the next. You want to keep this switching to a minimum just like switching redologs in Oracle. By resizing these files larger, you make your database faster.With the previous default of 5m, one had to jump through a number of hoops to remedy the default. First you would stop the database, then delete or move the files, then change the parameter in the my.cnf file and start MySQL again. If you started just by changing the value in my.cnf, you’d get various errors and the database wouldn’t start. So all and all a larger default avoids all these problems. Bravo!Possibly unwelcome changes to defaultssync_master_info=10000 (formerly 0) sync_relay_log=10000 (formerly 0) sync_relay_log_info=10000 (formerly 0)I first discovered these three parameters reading the most recent printing of High Performance MySQL by Baron Schwartz, Peter Zaitsev & Vadim Tkachenko. I highly recommend the book if you haven’t picked up a copy. It is an in-depth technical tomb of MySQL information, a real tour-de-force that every database administrator should own.I mention these parameters elsewhere in my article on bulletproofing MySQL replication with integrity checking.The master.info and relay-log.info files on MySQL slaves are not by themselves crash safe. So if your slave crashes, it can fail to update these files with the servers current position. So upon restart the slave can fail.I decided to put those parameters into use on a production slave server. I set each of them to a value of 1. This tells the server to sync after ever 1 event or 1 transaction respectively. It didn’t occur to me to do some performance testing with the parameters on. Mistake, big mistake.Although the change made the slave crash safe, it also made the slave much slower. It began to lag behind the master significantly until it was showing old data. Since our Drupal setup used served most of the site off of the read-only slaves, we began to see old pages.Although these new defaults set the value to 10,000 I would not recommend setting these parameters on by default. If you do use them on production boxes I would recommend doing performance tests to make sure they don’t cause unnecessary lag to your replication setup. Perhaps the new multi-threaded replication will help somewhat with this in 5.6.Read this far? Grab our newsletter scalable startups!Related posts:5 Ways to Boost MySQL Scalability5 Ways to fortify MySQL replication3 Biggest MySQL Migration SurprisesEasy MySQL replication with hotbackups5 Things You Overlooked with MySQL DumpsFor more articles like these go to iHeavy, Inc +1-212-533-6828 [Less]
|
||||||
|
Posted
7 months
ago
by
Anders Karlsson
MySQL tries to be smart and to cram as much performance out of available hardware, and one thing that most MySQLers knows is that opening a table (ie. opening the frm file, and depending on the storage engine, any additional files related to the
... [More]
table) is expensive from a performance point of view. When we see the opened_tables status variable increasing, we tend to increase the table_open_cache setting and keep as many tables as possible open and avoid reopening them.When it comes to MyISAM though, this has a nasty side-effect. When the server crashes and there are MyISAM tables open, these might well need to be REPAIRed. And if you have big MyISAM tables, this is an issue. Let's say that your application use several MyISAM tables, with the same content, and that you create new tables after a certain time, to keep the size of each individual table down? There are other reasons why you have this effect also, say a large MyISAM table that this rarely updated or read from, but suddenly it is, and then it is kept alone again? When a crash occurs, you might be stuck with several MyISAM tables that are open, but have not been accessed in a long time so might not need to be open, but still there is a risk that these will require a long and boring REPAIR.There is a Feature request for this; 67142, but I was thinking that instead of having low level server code do this, this would be a good thing to implement using EVENTs. So let's give it a shot.To being with, I need to figure out when a table was last touch. MySQL doesn't record when a table was last read from, but if we are OK with flushing tables that haven't been written to in a specific time, then the update_time column in the information_schema table does the trick. To figure out what tables to flush, I need to select table and database names from this table for all tables that haven't been modified within a certain time. Also, I must filter so I only get MyISAM tables, and make sure that I don't hit the MyISAM tables in the mysql database. An appropriate SELECT may look like this:SELECT table_schema, table_name FROM information_schema.tables WHERE engine = 'MyISAM' AND table_schema != 'mysql' AND UNIX_TIMESTAMP(NOW()) - UNIX_TIMESTAMP(update_time) > <seconds>That's not enough of course, all this gets me are the table and database names, we need to do something more than this. But let's start from the beginning. An EVENT will run an SQL statement, and in my world, there is mostly one statement it should run, which is a CALL to a stored procedure. And in this case, this procedure should use the SQL above to figure out what tables to flush, and then do the actual flushing. As the FLUSH command, when using inside a MySQL Routine, will not take any parameters, we have to run this as a PREPARED statement. All in all, when we end up with is something like this:DELIMITER //DROP PROCEDURE IF EXISTS myisamflush;CREATE PROCEDURE myisamflush(p_sec INTEGER)BEGIN DECLARE v_no_data BOOL DEFAULT FALSE; DECLARE v_database VARCHAR(64); DECLARE v_table VARCHAR(64); DECLARE v_dummy INTEGER; DECLARE cur1 CURSOR FOR SELECT table_schema, table_name FROM information_schema.tables WHERE engine = 'MyISAM' AND table_schema != 'mysql' AND UNIX_TIMESTAMP(NOW()) - UNIX_TIMESTAMP(update_time) > p_sec; DECLARE CONTINUE HANDLER FOR NOT FOUND SET v_no_data = TRUE; OPEN cur1;-- Loop for all MyISAM-tables that haven't been updated-- for p_sec seconds. FETCH cur1 INTO v_database, v_table; WHILE NOT v_no_data DO-- Create the FLUSH statement. SET @tmpstmt = CONCAT('FLUSH TABLES ', v_database, '.', v_table);-- Prepare and execute the statement. PREPARE stmt1 FROM @tmpstmt; EXECUTE stmt1; DEALLOCATE PREPARE stmt1;-- Get the next table / database. FETCH cur1 INTO v_database, v_table; END WHILE; CLOSE cur1;-- This is to overcome a bug that causes a warning,-- even though the warning was caught. SELECT COUNT(*) INTO v_dummy FROM mysql.user;END;//DELIMITER ;Note that noop SELECT at the end? In some versions of MySQL, when a handler is run, like in this case the CONTINUE HANDLER FOR NOT FOUND warnings will still be around here, which means the execution of the procedure will finish with a warning. To get rid of that, I issue an SQL statement that always runs.I'm not sure why this is happeniing, and for it might be that I have a mistake somewhere, but I do not think so.So far so good, now all we need is an event, in this case I will run every 15 minutes and flush tables that hasn't been used in the last 15 minutes, but you can set this to anything:DROP EVENT IF EXISTS myisamflush;CREATE EVENT myisamflush ON SCHEDULE EVERY 15 MINUTE DO CALL myisamflush(900);Was this enough? Maybe, but to get this working check if the event scheduler is running:mysql> SHOW VARIABLES LIKE 'event%';+-----------------+-------+| Variable_name | Value |+-----------------+-------+| event_scheduler | OFF |+-----------------+-------+In this case, it wasn't running so we should start it (and we should also modify the config file so that the event scheduler is running when the server is restarted, but that is a different story):mysql> set global event_scheduler=1;Before we finish up, note that everything above assumes that you are running with root privileges or similar.Cheers/Karlsson [Less]
|
||||||
|
Posted
7 months
ago
by
Tokuview Blog
Recently, our CTO, Martín Farach-Colton had a chance to talk about scaling MySQL and MariaDB with Roberto Zicari of ODBMS.
In the article, Martin states “While I believe that one size fits most, claims that RDBMS can no longer keep up with ... [More] modern workloads come in from all directions. When people talk about performance of databases on large systems, the root cause of their concerns is often the performance of the underlying B-tree index.” He also notes how “Fractal Tree Indexes put you on a higher-performing tradeoff curve. Query-optimal write-optimized indexing is all about making general-purpose databases faster. For some of our customers’ workloads, it’s as much as two orders of magnitude faster.” To read the full article, and to see how Tokutek is helping companies scale MySQL, see here. [Less] |
||||||
|
Posted
7 months
ago
by
Oracle MySQL Group
Headquartered in Finland, EmblaCom Oy
provides turnkey and cloud-hosted voice solutions to mobile operators around the globe. Since launching the original mobile private branch exchange (PBX) in 1998, the company has focused on helping ... [More] its partners provide efficient voice communications to their key business customers. The company’s voice solutions are used by millions of subscribers, worldwide. EmblaCom Oy needed to replace several database engines with a standardized, scalable, development-friendly database solution to maximize availability and cut costs. The company chose MySQL Cluster Carrier Grade Edition, which has maximized accessibility to EmblaCom’s services for its clients and their hundreds of thousands of subscribers. The initiative has also reduced, by half, the cost of the database solution installation for customers, as well as lowered maintenance and customer service costs. Read the entire case study here. [Less] |
||||||
|
Posted
7 months
ago
by
Shlomi Noach
If you use RANGE (or RANGE COLUMNS) partitioning, and in particular when partitioning by date/time, then your are subject to the following questions: how and when do you create the "next" partition? How and when do you drop your older
... [More]
partitions?
Many people use in-house scripting to do that, and Giuseppe Maxia wrote Partition Helper. But I would like to take you one step forward, and provide with a query (based on views) which automagically understands which new partition you want to create, and provides you with the statement to do so. It looks somewhat like this (a demo follows later on): mysql> SELECT * FROM sql_range_partitions \G *************************** 1. row *************************** table_schema: test table_name: city sql_drop_first_partition: alter table `test`.`city` drop partition `p3` sql_add_next_partition: alter table `test`.`city` add partition (partition `p_20160101000000` values less than (736329) /* 2016-01-01 00:00:00 */ ) *************************** 2. row *************************** table_schema: test table_name: quarterly_report_status sql_drop_first_partition: alter table `test`.`quarterly_report_status` drop partition `p3` sql_add_next_partition: alter table `test`.`quarterly_report_status` reorganize partition `p_maxvalue` into (partition `p_20110401000000` values less than (1301608800) /* 2011-04-01 00:00:00 */ , partition p_maxvalue values less than MAXVALUE) A closer look at why this is magic This query just gave you the DROP PARTITION and ADD PARTITION for all tables in your databases that use a RANGE partitioning scheme. But, consider: The query automatically deduces the LESS THAN value of the new partition. It looks for a constant interval, time-based or integer-based, and keeps this interval onward. It understands that 5.1 does not allow you to partition by DATETIME, only via integers. It understands your integer may sometimes stand for TO_DAYS(), and sometimes for UNIX_TIMESTAMP() of your datetime. It auto-detects that. The query recognizes a MAXVALUE partition, and if such partition exists, it provides with a REORGANIZE PARTITION statement rather than ADD PARTITION statement. It suggests names for your partitions which give you a clue on what the partition contains. p_20160101000000 (can you splot the date/time?) tells me a lot more than some arbitrary p17. It recognizes the common case of using a LESS THAN (0) as first partition, to take care of NULLs. It skips this partition: the query does not offer to drop it, not does it consider it while examining the interval. So I never have to tell the query "I want a 3 month interval between partitions, and these are implemented using TO_DAYS()". I let it understand it on its own. It is just a view Which means you can SELECT sql_drop_first_partition, or you can SELECT sql_add_next_partition, or you can only SELECT ... WHERE table_schema='your_schema'. Or you can select them all. You can eval() it This view will be released with common_schema's next version. common_schema has a lot of these views which generate SQL statements. And it provides with the means to evaluate them: the eval() routine. So you don't need to export the text INTO OUTFILE and execute it via SOURCE. You can simply: call common_schema.eval("SELECT sql_add_next_partition FROM sql_range_partitions WHERE table_name='quarterly_report_status'"); And it is done. Get it The sql_range_partitions view will be included in common_schema 1.2, schedules to be released soon. Meanwhile, you can import this file: sql_range_partitions_addon.sql onto your existing common_schema 1.1 install (what? You don't already have common_schema installed? You should know it's packed with lots of stuff like this one!) If, by the time you read this, common_schema 1.2 is already out, you don't need to add anything. In action Consider the following table on a MySQL 5.1 server: CREATE TABLE test.quarterly_report_status ( report_id INT NOT NULL, report_status VARCHAR(20) NOT NULL, report_updated TIMESTAMP NOT NULL ) PARTITION BY RANGE (UNIX_TIMESTAMP(report_updated)) ( PARTITION p0 VALUES LESS THAN (UNIX_TIMESTAMP('2008-01-01 00:00:00')), PARTITION p1 VALUES LESS THAN (UNIX_TIMESTAMP('2008-04-01 00:00:00')), PARTITION p2 VALUES LESS THAN (UNIX_TIMESTAMP('2008-07-01 00:00:00')), PARTITION p3 VALUES LESS THAN (UNIX_TIMESTAMP('2008-10-01 00:00:00')), PARTITION p4 VALUES LESS THAN (UNIX_TIMESTAMP('2009-01-01 00:00:00')), PARTITION p5 VALUES LESS THAN (UNIX_TIMESTAMP('2009-04-01 00:00:00')), PARTITION p6 VALUES LESS THAN (MAXVALUE) ); Unfortunately MySQL does not remember the definition expressions, so: mysql> SHOW CREATE TABLE test.quarterly_report_status \G Create Table: CREATE TABLE `quarterly_report_status` ( `report_id` int(11) NOT NULL, `report_status` varchar(20) NOT NULL, `report_updated` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP ) ENGINE=MyISAM DEFAULT CHARSET=latin1 /*!50100 PARTITION BY RANGE (UNIX_TIMESTAMP(report_updated)) (PARTITION p0 VALUES LESS THAN (1199138400) ENGINE = MyISAM, PARTITION p1 VALUES LESS THAN (1206997200) ENGINE = MyISAM, PARTITION p2 VALUES LESS THAN (1214859600) ENGINE = MyISAM, PARTITION p3 VALUES LESS THAN (1222808400) ENGINE = MyISAM, PARTITION p4 VALUES LESS THAN (1230760800) ENGINE = MyISAM, PARTITION p5 VALUES LESS THAN (1238533200) ENGINE = MyISAM, PARTITION p6 VALUES LESS THAN MAXVALUE ENGINE = MyISAM) */ Yikes! What does 1238533200 stand for? No worries, let's eval(): mysql> call common_schema.eval("SELECT sql_add_next_partition FROM sql_range_partitions WHERE table_name='quarterly_report_status'"); mysql> SHOW CREATE TABLE test.quarterly_report_status \G CREATE TABLE `quarterly_report_status` ( `report_id` int(11) NOT NULL, `report_status` varchar(20) NOT NULL, `report_updated` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP ) ENGINE=MyISAM DEFAULT CHARSET=latin1 /*!50100 PARTITION BY RANGE (UNIX_TIMESTAMP(report_updated)) (PARTITION p0 VALUES LESS THAN (1199138400) ENGINE = MyISAM, PARTITION p1 VALUES LESS THAN (1206997200) ENGINE = MyISAM, PARTITION p2 VALUES LESS THAN (1214859600) ENGINE = MyISAM, PARTITION p3 VALUES LESS THAN (1222808400) ENGINE = MyISAM, PARTITION p4 VALUES LESS THAN (1230760800) ENGINE = MyISAM, PARTITION p5 VALUES LESS THAN (1238533200) ENGINE = MyISAM, PARTITION p_20090701000000 VALUES LESS THAN (1246395600) ENGINE = MyISAM, PARTITION p_maxvalue VALUES LESS THAN MAXVALUE ENGINE = MyISAM) */ Our query auto-detected the meaning of those numbers like 1238533200, and has found the next partition to be created: p_20090701000000. That's '2009-07-01 00:00:00', and now we know what the partition stands for. A new MAXVALUE partition called p_maxvalue is created. Just for the fun of it, let's issue the same a few more times and see what comes out: mysql> call common_schema.eval("SELECT sql_add_next_partition FROM sql_range_partitions WHERE table_name='quarterly_report_status'"); mysql> call common_schema.eval("SELECT sql_add_next_partition FROM sql_range_partitions WHERE table_name='quarterly_report_status'"); mysql> call common_schema.eval("SELECT sql_add_next_partition FROM sql_range_partitions WHERE table_name='quarterly_report_status'"); mysql> call common_schema.eval("SELECT sql_add_next_partition FROM sql_range_partitions WHERE table_name='quarterly_report_status'"); mysql> SHOW CREATE TABLE test.quarterly_report_status \G Create Table: CREATE TABLE `quarterly_report_status` ( `report_id` int(11) NOT NULL, `report_status` varchar(20) NOT NULL, `report_updated` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP ) ENGINE=MyISAM DEFAULT CHARSET=latin1 /*!50100 PARTITION BY RANGE (UNIX_TIMESTAMP(report_updated)) (PARTITION p0 VALUES LESS THAN (1199138400) ENGINE = MyISAM, PARTITION p1 VALUES LESS THAN (1206997200) ENGINE = MyISAM, PARTITION p2 VALUES LESS THAN (1214859600) ENGINE = MyISAM, PARTITION p3 VALUES LESS THAN (1222808400) ENGINE = MyISAM, PARTITION p4 VALUES LESS THAN (1230760800) ENGINE = MyISAM, PARTITION p5 VALUES LESS THAN (1238533200) ENGINE = MyISAM, PARTITION p_20090701000000 VALUES LESS THAN (1246395600) ENGINE = MyISAM, PARTITION p_20091001000000 VALUES LESS THAN (1254348000) ENGINE = MyISAM, PARTITION p_20100101000000 VALUES LESS THAN (1262296800) ENGINE = MyISAM, PARTITION p_20100401000000 VALUES LESS THAN (1270069200) ENGINE = MyISAM, PARTITION p_20100701000000 VALUES LESS THAN (1277931600) ENGINE = MyISAM, PARTITION p_maxvalue VALUES LESS THAN MAXVALUE ENGINE = MyISAM) */ Notice the number interval is not constant: we have different number of days in different quarters. We have Feb 29th every 4 years. Yet we get the right LESS THAN value. Now isn't this cool? [Less] |
||||||
|
Posted
7 months
ago
by
MySQL Performance Blog
MySQL 5.6.7 RC is there, so I decided to test how it performs in tpcc-mysql workload from both performance and stability standpoints.
I can’t say that my experience was totally flawless, I bumped into two bugs: MySQL 5.6.7 locks ... [More] itself on CREATE INDEX MySQL 5.6.7-rc crashed under tpcc-mysql workload But at the end, is not this why RC for? And Oracle asked for a feedback, so I do my part. Benchmark date: Oct-2012 Benchmark goal: Test how MySQL 5.6.7 performs Hardware specification Server: Dell PowerEdge R710 CPU: 2x Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz Memory: 192GB Storage: Very Fast PCIe Flash Card Filesystem: ext4 Software OS: CentOS 6.3 MySQL Version: 5.6.7-RC Benchmark specification Benchmark name: tpcc-mysql Scale factor: 2500W (~250GB of data) Benchmark length: 4000 sec but the result is taken only for last 2000 sec to remove warm-up phase. Measurements are taken every second. Parameters to vary: we vary innodb_buffer_pool_size:13, 25, 50, 75, 100, 125GB to have different memory/data ration. We vary innodb_buffer_pool_instances: 1 and 8, and innodb_log_file_size: 2x4GB and 2x8GB. Results The first result is 2x4GB innodb logfiles. We can see that innodb_buffer_pool_instances=8 makes a big difference on small buffer_pool sizes, while on bigger buffer_pool, innodb_buffer_pool_instances=1 is more preferable. The the results on big buffer_pool is quite unstable, and the reason is that InnoDB falls into asynchronous flushing mode, the problem which was supposed to be fixed in new InnoDB flushing mechanism. However Dimitry told me that we may need a bigger innodb logfiles to get more stable results. So there it is with 2x4GB vs 2x8GB innodb logfiles. Obviously the result is quite better with bigger logs, so size does matter. Conclusion innodb_buffer_pool_instances parameter may change the result significantly, especially in intensive IO workloads. In MySQL 5.6 it is finally possible to achieve stable throughput without dips, but an adaptive flushing still requires big log files. MySQL configuration: [mysqld] gdb innodb_file_per_table = true innodb_data_file_path = ibdata1:100M:autoextend innodb_flush_method = O_DIRECT innodb_log_buffer_size = 256M innodb_flush_log_at_trx_commit = 1 innodb_buffer_pool_size = 125G innodb_buffer_pool_instances=8 innodb_log_file_size = 4G innodb_log_files_in_group = 2 #####plugin options innodb_read_io_threads = 16 innodb_write_io_threads = 16 innodb_io_capacity = 20000 innodb_io_capacity_max = 40000 #not innodb options (fixed) port = 3306 back_log = 50 max_connections = 2000 max_prepared_stmt_count=500000 max_connect_errors = 10 table_open_cache = 2048 max_allowed_packet = 16M binlog_cache_size = 16M max_heap_table_size = 64M sort_buffer_size = 4M join_buffer_size = 4M thread_cache_size = 1000 query_cache_size = 0 query_cache_type = 0 ft_min_word_len = 4 thread_stack = 192K tmp_table_size = 64M server-id = 10 #*** MyISAM Specific options key_buffer_size = 8M read_buffer_size = 1M read_rnd_buffer_size = 4M bulk_insert_buffer_size = 8M myisam_sort_buffer_size = 8M myisam_max_sort_file_size = 10G myisam_repair_threads = 1 myisam_recover user=root skip-grant-tables [Less] |
||||||
|
Posted
7 months
ago
by
MySQL Performance Blog
James Day just posted the great summary of defaults changes in MySQL 5.6 compared to MySQL 5.5
In general there are a lot of good changes and many defaults are now computed instead of hardcoded. Though some of changes are rather puzzling for me. ... [More] Lets go over them: back_log = 50 + ( max_connections / 5 ) capped at 900 – The large backlog is needed when there is a high rate (spikes) of connections which is rather unrelated to number of connections. You can get 1000s of connections a second even from as little as 100 connections, hence max_connections would be small. I think this is the case where somewhat higher fixed default (even 900) would be better than trying to be overly smart. host_cache_size = 128 + 1 for each of the first 500 max_connections + 1 for every 20 max_connections over 500, capped at 2000 Same thing. What number of max connections has to do with number of hosts which can potentially connect to the server ? We also speak about just pennies in terms of memory usage so why do not just default at 2000 ? innodb_file_per_table =1 Great! We found it is better choice in 95%+ of installations innodb_log_file_size = 48M This is still too small for any real load but a lot better than 5MB. Good to see innodb log re-sizing is made automatic now. innodb_old_blocks_time = 1000 Great. This is exactly the value I’ve been recommending for a while. innodb_stats_on_metadata = off Yet another change I actively recommended max_connect_errors = 100 10 or 100 does not make much difference if you get some sporadic network errors both can give problems. I would recommend adding support for value 0 which would mean limit disabled. Very few users actually need DOS prevention this variable strives to provide as MySQL is typically protected by Firewall to begin with. open_files_limit = 5000 strangely enough this is where the smart selection of variable default seems to be removed, while it could be needed as it is easy enough to set max_connections to over 5000 or use more than 5000 tables. There is also little “savings” in keeping this value low – 64K or even more would be good default. query_cache_size = 1M Query cache remains off by default but default size is raised to 1M which is probably well too small to be useful but leaves impression it does not need to be configured. sort_buffer_size = 256K Good choice for very small sorts but more complicated queries will suffer causing more frequent file based sorts. This means it makes it even more important to set this variable to higher values for sessions executing complicated queries. I think this piece of code needs general re-factoring to be able to deal with simple and complicated queries efficiently automatically. join_buffer_size = 256k The default is now double of what it was previously. I wonder why if there are any specific benchmarks which point to this as an optimal value or is it just looking to reduce number of different buffer sizes used. sync_master_info = 10000 Quite useless change in my opinion. Syncing master info every 10000 events is no better than not syncing it at all as replication will be badly out of sync if slave is to crash. The recommendation to use master_info_repository=table is a good one though default could just be left alone. Same applies to sync_relay_log and sync_relay_log_info settings. James also promises updated sample my.cnf and getting rid of old my-huge.cnf which are great. Perhaps my jokes about my-huge my-large etc referring to amounts of memory we have in the phones these days, I made in my Optimizing MySQL Configuration Webinar were heard [Less] |
||||||
|
Posted
7 months
ago
by
Cedric PEINTRE
For the first time since the beginning of my long long career, I decided today to use an advice tool to check my MySQL configuration.
I decided to launch pt-variable-advisor and mysqltuner V2 on one of my MySQL server. Here are the result ... [More] outputs : Then I ask myself what to do with these results, just raw results without much explanations, these tools are supposed to be done for beginners… Just one example from mysqltuner : Too many temporary tables are being written to disk. Increase max_heap_table_size and tmp_table_size If I set the size of my temporary tables at 16Mb, what that’s means, I have to increase to 32Mb, 64Mb or more ? What is the impact of this change to the memory of my server ? How can I be sure that the new value that I set is not really too big ? And another one from pt-variable-advisor : innodb_flush_log_at_trx_commit-1: InnoDB is not configured in strictly ACID mode Really ? Find me a beginner who knows the ACID properties… If I know ACID properties, do I really need these advice ? Two other elements have excited my mind because there is a contradiction between the two tools : sort_buffer_size parameter : pt-variable-advisor : The sort_buffer_size variable should generally be left at its default unless an expert determines it is necessary to change it mysqltuner : consider increasing sort_buffer_size (sorting) query_cache_size parameter : pt-variable-advisor : Nothing about query cache mysqltuner : Increase query_cache_size — there are too many low memory prunes I don’t know what you think but if you read some news for the last few years, you know that there are several algorithms to sorting in MySQL and you know that the best query cache tuner is here (even if I’m not to agree with the latter) So what ? I am not trying to denigrate these tools, these advice are certainly justified in one way or another. This is obviously very very complex to achieve this kind of tool. I just wonder what a beginner can do with such informations ? And I also wonder if these tools are really useful (for beginners) ? What’s your thoughts about that ? Related Posts :Why a new memory engine may change everything ?How to quickly identify queries with pt-query-digest and pt-query-advisor from rules ?Vote for MySQL[plus] awards 2011 !Monitoring your monitoring tools (MONyog inside) !Fatal timeout ! [Less] |
||||||
|
Posted
7 months
ago
by
Sergey Petrunia
MySQL 5.6 is now RC, I suppose this means that all features that were intended to be in release are pushed, so it’s time to take a look and see what’s going to be in MySQL 5.6 GA.
I decided to look at subquery optimizations and ... [More] compare them to what we’ve got in MariaDB 5.3/5.5. In case you don’t know, subquery optimizations in MySQL 5.6 and MariaDB come from a common ancestor - MySQL 6.0 alpha, which was released in spring 2009 and then abandoned because there were too many unstable features pushed into it. Then, both MariaDB team and Oracle salvaged subquery code out of the 6.0 mess, and worked to get it in shape for release. MariaDB released its results in GA quality in April 2012 as MariaDB 5.3, which was quickly followed by MariaDB 5.5. Inside MariaDB, we’ve considered 6.0-alpha’s feature set to be incomplete, and released only after we’ve added these three features: Cost-based choice between Materialization and IN->EXISTS strategies Semi-join Materialization support for subqueries with grouping Subquery predicate cache We’ve even had a subquery optimizations map to make sure we cover all kinds of subqueries. Now, I’m looking through MySQL 5.6.7 and 5.6.x changelogs, and the only improvement over original set of MySQL 6.0’s optimizations seems to be this part in 5.6.4 changelog: The optimizer detects and optimizes away these useless query parts within IN/ALL/SOME/EXISTS subqueries: DISTINCT GROUP BY, if there is no HAVING clause and no aggregate functions ORDER BY, which has no effect because LIMIT is not supported in these subqueries This is a nice-to-have feature. The problem is that this feature does not cover the gaps in the set of MySQL 5.6’s subquery optimizations. To see what I’m talking about, let’s take the “Cost-based choice between Materialization and IN->EXISTS strategies” feature, and see why we absolutely had to have it in MariaDB 5.3. Let’s consider a query with an uncorrelated subquery: select col1, col1 in (select col2 from one_k_rows_tbl) from ten_rows_tbl In MySQL 5.1, the only way to execute it was to use IN->EXISTS conversion. EXPLAIN in MySQL 5.1 looks like this: +------+--------------------+----------------+------+---------------+------+---------+------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+--------------------+----------------+------+---------------+------+---------+------+------+-------------+ | 1 | PRIMARY | ten_rows_tbl | ALL | NULL | NULL | NULL | NULL | 10 | | | 2 | DEPENDENT SUBQUERY | one_k_rows_tbl | ALL | NULL | NULL | NULL | NULL | 1344 | Using where | +------+--------------------+----------------+------+---------------+------+---------+------+------+-------------+ “DEPENDENT SUBQUERY” means that the subselect is re-executed for each record of the parent select. The original subquery is not correlated, but IN->EXISTS transformation converts the subquery from col1 in (select col2 from one_k_rows_table) into exists (select 1 from one_k_rows_table where col2=ten_rows_table.col1 LIMIT 1) The conversion makes the subquery correlated, but in return the subquery gets the “col2=col1″ IN-equality in the WHERE, which can make a big difference. MySQL 6.0 has added Materialization strategy, and both MySQL 5.6 and MariaDB have got it from there. In MySQL 5.6, the EXPLAIN looks like this: MySQL [test]> explain select col1, col1 in (select col2 from one_k_rows_tbl) from ten_rows_tbl; +----+-------------+----------------+------+---------------+------+---------+------+------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+----------------+------+---------------+------+---------+------+------+-------+ | 1 | PRIMARY | ten_rows_tbl | ALL | NULL | NULL | NULL | NULL | 10 | NULL | | 2 | SUBQUERY | one_k_rows_tbl | ALL | NULL | NULL | NULL | NULL | 1000 | NULL | +----+-------------+----------------+------+---------------+------+---------+------+------+-------+ It looks like uncorrelated subquery, but actually it is Materialization. MariaDB’s EXPLAIN output is very similar: MariaDB [test]> explain select col1, col1 in (select col2 from one_k_rows_tbl) from ten_rows_tbl; +------+--------------+----------------+------+---------------+------+---------+------+------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+--------------+----------------+------+---------------+------+---------+------+------+-------+ | 1 | PRIMARY | ten_rows_tbl | ALL | NULL | NULL | NULL | NULL | 10 | | | 2 | MATERIALIZED | one_k_rows_tbl | ALL | NULL | NULL | NULL | NULL | 1344 | | +------+--------------+----------------+------+---------------+------+---------+------+------+-------+ except that we have decided to print MATERIALIZED in select_type column to avoid possible confusion with regular uncorrelated subqueries. Now, let’s see the gap. MySQL 6.0 had support for subquery Materialization, but it lacked ability to make a cost-based choice whether to use Materialization. It used it whenever possible. Let’s change our query so that the outer select expects to read only one row: MariaDB [test]> explain select col1, col1 in (select col2 from one_k_rows_tbl) from one_row_tbl; +------+--------------------+----------------+------+---------------+------+---------+------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+--------------------+----------------+------+---------------+------+---------+------+------+-------------+ | 1 | PRIMARY | one_row_tbl | ALL | NULL | NULL | NULL | NULL | 1 | | | 2 | DEPENDENT SUBQUERY | one_k_rows_tbl | ALL | NULL | NULL | NULL | NULL | 1344 | Using where | +------+--------------------+----------------+------+---------------+------+---------+------+------+-------------+ As you can see, MariaDB has decided to use the good oldDEPENDENT SUBQUERY, that is, IN->EXISTS strategy. Why? It figured that the upper select expects to read only one row, which means the subquery is expected to be evaluated only once. With Materialization strategy, one needs to pay a big upfront cost (materialization) in order to have cheap subquery re-execution. This pays off when the subquery is executed many times. This doesn’t pay off when you need to execute the subquery once, or just a few times. MariaDB was able to recognize this and made a cost-based decision. Now, let’s run this query on MySQL 5.6: MySQL [test]> explain select col1, col1 in (select col2 from one_k_rows_tbl) from one_row_tbl; +----+-------------+----------------+------+---------------+------+---------+------+------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+----------------+------+---------------+------+---------+------+------+-------+ | 1 | PRIMARY | one_row_tbl | ALL | NULL | NULL | NULL | NULL | 1 | NULL | | 2 | SUBQUERY | one_k_rows_tbl | ALL | NULL | NULL | NULL | NULL | 1000 | NULL | +----+-------------+----------------+------+---------------+------+---------+------+------+-------+ It still uses Materialization. Neither changelogs, nor the source code have any mention of cost-based choice, so I assume they haven’t made any improvements over MySQL 6.0 code. As I’ve mentioned, Materialization is a startup-heavy strategy, so one can expect MySQL 5.6 to show regressions for queries where the subquery is executed only a few times. [Less] |
||||||
|
Posted
7 months
ago
by
Keith Larson
I wanted to post a big thanks to the Chinese MySQL Bloggers. Almost a month ago we published the Chinese version of Planet ( my previous blog post ). In the last month, we have collected 68 different blog posts about MySQL across 8 different MySQL Bloggers. Thanks for the support, this is an encouraging start.
|
||||||
Copyright
©
2013
Black Duck Software, Inc.
and its contributors, Some Rights Reserved. Unless otherwise marked, this work is licensed under a
Creative Commons Attribution 3.0 Unported License
. Ohloh
®
and the Ohloh logo are trademarks of
Black Duck Software, Inc.
in the United States and/or other jurisdictions. All other trademarks are the property of their respective holders.