News

  Analyzed 9 days ago based on code collected 9 days ago.
 
Posted about 13 hours ago
I recently mentioned the possibility of tables being drawn from a single Redis object, so you would get one row per key/value pair in the named hash, or one row per element in the named scalar, set, list or zset. This has now been committed, for ... [More] use with Release 9.2 and later. There are examples in the regression test files. This is going to be particularly important when we get to writable tables, which is the next order of business.

For those of you who were in my talk yesterday, the breakage I mentioned has now been fixed. [Less]
Posted about 15 hours ago
One tiny little feature many users of PostgreSQL have often forgotten is the ability to create similar tables. It happens quite frequently that you want to create a table, which is just like some other one. To achieve that most people will do … CREATE TABLE x AS SELECT … LIMIT 0; This works nicely, [...]
Posted 1 day ago
A while back I posted some SQL which helps track of changes to the PostgreSQL settings file. I've found it useful when benchmarking tests with different settings, but unfortunately the pg_settings_log() function needs to be run manually after each ... [More] setting change. However that sounds like something which a custom background worker (new in 9.3) could handle - basically all the putative background worker would need to do is execute the pg_settings_log() function whenever the server starts (or restarts) or receives SIGHUP .

This turned out to be surprisingly easy to implement. Based off the example contrib module and Michael Paquier's excellent posts , this is the code . Basically all it does is check for the presence of the required database objects (a function and a table) on startup, executes  pg_settings_log() on startup, and adds a signal handler for SIGHUP which also calls pg_settings_log() .

more... [Less]
Posted 1 day ago
Here are the slides from my talk on PostgreSQL and Redis.
Posted 1 day ago
This post was originally posted on Medium, a new blogging platform made up mostly of people who aren’t necessarily subscribed to Planet. So, please forgive the obvious statements, as the target audience are people who don’t know very much ... [More] about Postgres.

Wednesday May 23, with no fanfare, Tom Lane’s move to Salesforce.com was made public on the Postgres developer wiki.

For 15 years, Tom has contributed code to Postgres, an advanced open source relational database that started development around the same time as MySQL but has lagged behind it in adoption amongst web developers. Tom’s move is part of a significant pattern of investment by large corporations in the future of Postgres.

For the past few years, Postgres development has accelerated. Built with developer addons in mind, things like PLV8 and an extensible replication system have held the interest of companies like NTT and captured the imagination of Heroku.

Tom has acted as a tireless sentry for this community. His role for many years, in addition to hacking on the most important core bits, was to defend quality and a “policy of least surprise” when implementing new features.

Development for this community is done primarily on a mailing list. Tom responds to so many contributor discussions that he’s been the top overall poster on those mailing lists since 2000, with over 85k messages.

Really, he’s a cultural touchstone for a community of developers that loves beautiful, correct code.

Someone asked: “What does [Tom’s move] mean for Postgres?”

You probably don’t remember this:

Salesforce.com bases its entire cloud on Oracle database,” Ellison said, “but its database platform offering is PostgreSQL. I find that interesting.

When I read that last October, I was filled with glee, quickly followed by terror. I love my small database community, my friends and my job. What if Oracle shifted it’s attention to our community and attacked it, directly? So far, that hasn’t happened.

Instead, Salesforce advertised they were hiring “5 new engineers…and 40 to 50 more people next year” for a “huge PostgreSQL project.”

Tom’s move probably won’t change much for the day-to-day operation of Postgres itself. Hopefully, things are about to get real at Salesforce.

I’m a major contributor to Postgres. I started in 2006, learning about relational databases through work at a small bike parts manufacturer and ERP. My contributions include code, starting conferences, encouraging user group leaders and introducing Postgres to communities that otherwise would never hear from us. I’m a data architect at Mozilla. [Less]
Posted 2 days ago
Over the past few years, I've been making notes on pgsql-performance postings, specifically those postings which relate to query performance issues.  Today, I gave a talk at PGCon on the data I've been able to gather.

If you attended ... [More] the talk, please leave feedback through the PGCon web site or feel free to leave a comment below with your thoughts.  If not, you can find the slides on my Presentations web page.  A few people asked me to post the raw data on which the talk was based, including links to the original threads.  I have created a Query Performance section on my Google Site and posted the information there.

The version posted on the web site incorporates a few minor corrections as compared to what I presented in the talk; and I have left out (for the sake of politeness) the cases I attributed to user error.  There were actually only 2 such cases, not 3 as I said in the talk, but either way it seems more polite not to post specific links.  Please contact me if you find other mistakes in what I have posted and I will correct them.

Many thanks to all those who said nice things about my talk! [Less]
Posted 2 days ago
I delivered my presentation "Nulls Make Things Easier?" today at
PGCon, so I have placed my slides online.
The presentation is based on a series of eleven blog posts about
NULLs I did a few months ago.
Posted 2 days ago
Running accurate database benchmark tests is hard.  I’ve managed to publish a good number of them without being embarrassed by errors in procedure or results, but today I have a retraction to make.  Last year I did a conference talk called ... [More] “Seeking PostgreSQL” that focused on worst case situations for storage.  And that, it turns out, had a giant error.  The results for the Intel 320 Series SSD were much lower in some cases than they should have been, because the drive’s NCQ feature wasn’t working properly.  When presenting this talk I had a few people push back that the results looked weird, and I was suspicious too.  I have a correction to publish now, and I think the way this slipped by me is itself interesting.  The full updated SeekingPostgres talk is also available, with all of the original graphs followed by an “Oops!” section showing the next data.

Native Command Queueing is an important optimization for seek heavy workloads.  When trying to optimize work for a mechanical disk drive, it’s very important to know where the drive is currently at when deciding where to go next.  If you have a read for that same area of the drive in the queue, you want to read that one now, get the I/O out of the way while you’re nearby, and then move to another physical area of the disk.

However, on a SSD, you might think that re-ordering commands isn’t that important.  If reads are always inexpensive, taking a constant and small period of time on a flash device, their order doesn’t matter, right?  Well, that’s wrong on a few counts.  The idea that reads always take the same amount of time on SSD is a popular misconception.  There’s a bit of uncertainty around what else is happening in the drive.  Flash cells are made of blocks larger than a single database read.  What happens if you are reading 8K of a cell that is being rewritten right now, because someone is updating another 8K section?  Coordinating that is likely to pause your read for a moment.  It doesn’t take much lag at SSD speeds to result in a noticable slowdown.  Partially due to contention concerns, and partially due to nature of I/O, keeping the command queue full is still very important to keeping the drive usefully busy all of the time.

On the 120GB Intel 320 Series drive I used for testing, the drive tops out at around 28MB/s of transfers if you’re not pipelining requests via NCQ.  It goes a whole lot faster than that once the queue is full:

You might think such a huge difference would be immediately obvious in all test results, right?  It’s not though, and that’s how the error slipped by me.  Normally all of my tests are done by two similar machines, and then I validate they match.  I did that for some of the Seeking Postgres results, such as the write heavy tests.  For comparison, here are results from database’s pgbench tool executing its standard, TPC-B-like write test:

The write rate test is barely impacted by whether NCQ is turned on or off, so it wasn’t obvious that one drive had the feature enabled while the other didn’t.  I was using this to validate my test server was operating similar to a second system with one of these drives.  But I picked the one test here where NCQ doesn’t really matter.

The general conclusion of the original presentation is that the Intel SSDs are much faster than regular disk, but still a good bit slower than the more expensive FusionIO flash.  That I knew to be true from real-world workloads, so I’d have been surprised if things didn’t turn out that way.  But it turns out that is true whether or not NCQ is working.  The Intel 320 line in these results is better with NCQ than without, but the relative ranking isn’t any different now.  It’s just the case that the Intel SSD is more competative in some tests than I gave it credit for.

The seeking read results show a much large gap with NCQ enabled:

You might notice a small drop in TPS on that brown line at low scales.  That’s a test error I can’t correct for at this point.  The original server I used for these tests was gone before I figured out what was wrong.  The replacement has the same type of CPU chip, but it’s clocked a bit slower.  (Was an Intel i7 870, now is an Intel i7 860)  That’s why the CPU limited results at low scales dropped.  On any of the I/O limited tests, that original CPU and the slower new one are almost identical, so I still think I’m being fair here.

Finally, I turned the random seek throughput into a business oriented question by asking how long it would take to refill all of RAM after something like a server reboot.  My original test placed the Intel drive as taking 5 minutes to read 16GB of random data with 32 clients reading.  This is exactly what NCQ helps with, and the correctly working drive only takes 1 minute to refill cache:

Thankfully, I don’t have to say I was completely wrong before.  The relative ranking of the various storage options is still the same.  The FusionIO drive I tested was and still is at the top of heap, especially if you need high write throughput.  But the worst case for reading on the Intel 320 series drives (and the very similar 710 series) is much closer to specifications than my tests showed.

With this old territory sorted out, next up I’m testing Intel’s latest enterprise drive, the DC S3700, which replaces the 710 drives in their lineup.  Initial test results look great so far; detailed ones are coming soon. [Less]
Posted 2 days ago
My Blackhole FDW talk seemed to go well. The line about no unintentional data loss got the best laugh. Here are the slides.

Besides being a bit of fun, this did have a serious purpose - creating a skeleton for building an FDW, including the ... [More] writable API. The code has the contents of the docs on writing an FDW as comments in the appropriate functions, to help a new FDW writer.

The code is on bitbucket. [Less]
Posted 2 days ago
Thanks to Shaun M. Thomas, I have been offered a numeric copy of the “Instant PostgreSQL Backup” book from Packt publishing, and was provided with the “Instant PostgreSQL Starter” book to review. Considering my current ... [More] work-situation, doing a lot of PostgreSQL advertising and basic teaching, I was interested in reviewing this one…

Like the Instant collection ditto says, it’s short and fast. I kind of disagree with the “focused” for this one, but it’s perfectly fine considering the aim of that book.

Years ago, when I was a kid, I discovered databases with a tiny MySQL-oriented book. It teaches you the basis : how to install, basic SQL queries, some rudimentary PHP integration. This book looks a bit like its PostgreSQL-based counterpart. It’s a quick travel through installation, basic manipulation, and the (controversy) “Top 9 features you need to know about”. And that’s exactly the kind of book we need.

So, what’s inside ? I’d say what you need to kick-start with PostgreSQL.

The installation part is straight forward : download, click, done. Now you can launch pgadmin, create an user, a database, and you’re done. Next time someone tells you PostgreSQL ain’t easy to install, show him that book.

The second part is a fast SQL discovery, covering a few PostgreSQL niceties. It’s damn simple : Create, Read, Update, Delete. You won’t learn about indexes, functions, advanced queries here. For someone discovering SQL, it’s what needs to be known to just start…

The last part, “Top 9 features you need to know about”, is a bit more hard to describe. PostgreSQL is a RDBMS with included batteries, choosing 9 features must have been a really hard time for the author, and I think nobody can be blamed for not choosing that or that feature you like : too much choice… The author spends some time on pg_crypto, the RETURNING clause with serial, hstore, XML, even recursive queries… This is, from my point of view, the troublesome part of the book : mentioning all these features means introducing complicated SQL queries. I would never teach someone how to do recursive queries before teaching him joins, it’s like going from elementary school to university in fourty pages. But the positive part is that an open-minded and curious reader will have a great teaser and nice tracks to follow to increase his knowledge of PostgreSQL. Mentioning hstore is really cool, that’s one of the PostgreSQL feature one have to know…

 

To sum up my point of view about this book : it’s a nice book for beginners, especially considering the current NoSQL movement and people forgetting about SQL and databases. It’s a bit sad we don’t have more books like this one about PostgreSQL. I really hope Packt publishing will try to have a complete collection, from introduction (this book) to really advanced needs (PostgreSQL High Performance comes to mind) through advanced SQL queries, administration tips and so on… They have a book about PostgreSQL Server Programming planned next month, I’m really looking forward to this one. [Less]
 

 
 

Creative Commons License Copyright © 2013 Black Duck Software, Inc. and its contributors, Some Rights Reserved. Unless otherwise marked, this work is licensed under a Creative Commons Attribution 3.0 Unported License . Ohloh ® and the Ohloh logo are trademarks of Black Duck Software, Inc. in the United States and/or other jurisdictions. All other trademarks are the property of their respective holders.