Ulf Wendel

PHP mysqlnd memory optimizations: from 49MB to 2MB

2014/04/10 - by admin - 0 comments

Inspired by Antony, Andrey has implemented a memory optimization for the PHP mysqlnd library. Depending on your usage pattern and the actual query, memory used for result sets is less and free’d earlier to be reused by the PHP engine. In other cases, the optimization will consume about the same or even more memory. The additional choice is currently available with mysqli only.

From the network line into your script

Many wheels start spinning when mysqli_query() is called. All the PHP MySQL APIs/extensions (mysqli, PDO_MySQL, mysql) use a client library that handles the networking details and provides a C API to the C extensions. Any recent PHP will default to use the mysqlnd library. The library speaks the MySQL Client Server protocol and handles the communication with the MySQL server. The actions behind a users mysqli_query() are sketched below.

The memory story begins in the C world when mysqlnd fetches query results from MySQL. It ends with passing those results to the PHP.

PHP script mysqli extension/API mysqlnd library MySQL
*.php *.c
  return result set  

The new memory optimization is for buffered result sets as you get them from mysqli_query() or a sequence of mysqli_real_query(), mysqli_store_result(). With a buffered result set, a client fetches all query results into a local buffer as soon as they become available from MySQL. In most cases, this is the desired behaviour. The network line with MySQL becomes ready for a new command quickly. And, the hard to scale servers is offloaded from the duty to keep all results in memory until a potentially slow client has fetched and released them.

The result buffering happens first at the C level inside the mysqlnd library. The buffer holds zvals. A zval is internal presentation structure for a plain PHP variable. Hence, think of the mysqlnd result buffer as a list of anonymous PHP variables.

PHP script mysqli extension/API mysqlnd library MySQL
*.php *.c
  Buffer with zvals (MYSQLND_PACKET_ROW),
think: PHP variables

The default: reference and copy-on-write

When results are to be fetched from the mysqlnd internal buffers to a PHP script, the default behaviour of mysqlnd is to reference the internal buffer from the PHP script. When code like $rows = mysqli_fetch_all($res) is executed, first $rows gets created. Then, mysqlnd makes $rows reference the mysqlnd internal result buffers. MySQL results are not copied initially. Result set data is kept only once in memory.

PHP script mysqli extension/API mysqlnd library MySQL
*.php *.c
  Buffer with zvals (MYSQLND_PACKET_ROW),
think: PHP variables
$rows = mysqli_fetch_all($res)  
  rows[n] &= result_buffer_zval[n]  

The reference approach works fine if you have code that follows the general pattern: query(), fetch() followed by implicit or explicit unset(), free().

A call to query() fills the internal buffer. Then, all rows are fetched into PHP variables. You may fetch all of them at once or read them row by row into the same array using a pattern such as while ($row = $res->fetch_assoc()) { ... }. As long as you do not modify $row and free $row explicitly (may happen implicitly by overwriting them in a loop), no data will be copied. Then, as the last step you call free() to dispose the buffered result set.

$res = mysqli_query(...);               /* create internal buffer */
$rows = $res->fetch_all(MYSQLI_ASSOC);  /* rows &= internal buffer */
unset($rows);                           /* remove references */
$res->free();                          /* free internal buffer */   

The memory saving by using references is gone, if mysqlnd is forced to perform a copy-on-write. If you free the result set prior to freeing $rows, then mysqlnd is forced to copy the data into $rows before it can free the internal buffer. Otherwise, $rows points to nowhere.

$res = mysqli_query(...)     /* buffered results */
$rows = $res->fetch_all(MYSQLI_ASSOC)  /* rows &= internal buffer */
$res->free();                /* free internal buffer: copy to rows */
unset($rows)                 /* free copied results */

The copy_on_write_saved and copy_on_write_performed statistics tell you what your code does.

When I say that memory is free’d I mean that it is given back to the PHP engine. The PHP engine may or may not release it to the system immediately. If you are just after monitoring how mysqlnd behaves, check the statistics, or go for huge result sets to ensure the garbage collection kicks in to make effects visible.

The price of copy-on-write: management overhead

Copy-on-write does not come for free: mysqlnd must track variables that reference the internal buffer. This adds some memory overhead (zval**). And, the internal tracking list must not point to nowhere . As long as the tracking list exists, the referenced user variable – here: the elements of $rows – must not be released. There is kind of a circular reference. The resulting overhead is two-folded: there is an additional tracking list and the references from elements of $rows might occupy memory until free() is called. But, you save holding query results twice in memory. So the question is what takes more memory, and when?

PHP script mysqli extension/API mysqlnd library MySQL
*.php *.c
  Buffer with zvals (MYSQLND_PACKET_ROW)  
$rows = mysqli_fetch_all($res)  
  assign rows[n] &= result_buffer_zval[n]  
  remember rows[n] belongs to result_buffer_zval[n]  
  increase rows[n] reference counter  

The answer depends on your code and the size of the result sets. If you fetch some few thousand rows at most on your one PHP server, it does not matter much how mysqlnd manages the result sets. The implementation detail impact is not worth optimizing: the impact is small and it takes time consuming fine tuning to find the optimum. But for Antony it mattered: likely, he handles PHP at scale…

New: classical copy as an alternative

At times it is more efficient to bite the bullet and to copy data from the internal buffer into user variables. It may sound counter intuitive but this may help to ‘save’ memory. ‘Save’ must not only be measured in absolute terms at a given point in time. This is what the figures below will show as it is easy to grasp. When done with reading ask yourself what it means to release memory early during the execution of one or many scripts, think of memory usage over time.

PHP script mysqli extension/API mysqlnd library MySQL
*.php *.c
  Buffer with zvals (MYSQLND_PACKET_ROW)  
$rows = mysqli_fetch_all($res)  
  assign/copy rows[n] = result_buffer_zval[n]  

Every optimization is a trade: the COPY don’t do

The new always copy policy can be enabled using mysqli_store_result(MYSQLI_STORE_RESULT_COPY_DATA). Let’s consider a case that seems to prove one cannot ‘save’ memory by duplicating data. The code runs a single SELECT that returns a result set of 200.000 rows. Anything big enough to make the basic principles discussed visible through memory_get_usage(true) works. Then, all rows are fetched into using fetch_all() and the result set is released.

$link = new mysqli("", "root", "", "test", 3307);
$link->real_query("SELECT * FROM test LIMIT 200000");
$res = $link->store_result(MYSQLI_STORE_RESULT_COPY_DATA);
$rows = $res->fetch_all(MYSQLI_ASSOC);

The diagram shows the memory usage reported by memory_get_usage(true) immediately after the function calls. The default behaviour is to make $rows reference (red, NO COPY) the internal buffer. The blue line shows the effect when $rows gets populated with copies. The initial memory usage of the reference approach is lower until free() is called. Now copy-on-write must be done and either approach uses the same amount of memory.

BTW, don’t get nervous about the huge figures. This is what happens if you create large PHP arrays…

fetch_all() + STORE_COPY: memory

Function runtimes measured with microtime(true) are as expected. The reference approach is a tad faster when $rows is to be populated but has some hidden costs at free() when the internal buffer reference lists are to be checked.

For this very script, with this very result set, version of MySQL and PHP and notebook there was no significant difference in total runtime. Only a minor tendency of MYSQLI_STORE_RESULT_COPY_DATA being ever so slighty slower became visible. This meets expectations due to the additional memory copies and allocations.

fetch_all() + STORE_COPY: runtime

Counter example done, time to show the potential win

Copying result sets is no significant win when you have many rows and use fetch_all(), when you have few rows at all (point SELECT, paging with LIMIT, few hundret rows overall). Or, if you have few large rows with, for example, BLOB columns. In those cases, there is no need to bother about the flavour of the mysqlnd result set handling.

The code to try the copy optimization with should:

  • fetch many, rather small rows
  • and, should not use fetch_all()
  • or… is sloppy with free() (bad style, use mysqlnd statistics to identify such scripts)

Replace the fetch_all() with while ($row = $res->fetch_assoc()) in the 200k rows result set example and you immediately see the potential of Antony’s proposal:

$link->real_query("SELECT * FROM test LIMIT 200000");
$res = $link->store_result(MYSQLI_STORE_RESULT_COPY_DATA);
while ($row = $res->fetch_all(MYSQLI_ASSOC)) {

Peak memory reported for this code is 1808 KB (~2MB) versus 50368 KB (~49 MB) for the referencing logic! But the runtime is about 10% slower. As with all those benchmarks: run your own! I am showing trivial microbenchmark observations only to highlight the principles behind. I did not even bother to align the negative values reported by memory_get_usage()

fetch_all() + STORE_COPY: proper usage, memory

The lower peak memory usage is because the memory used for $row can now be released as the result set is being iterated.

Pseudo code to illustrate copy logic during fetch loop
PHP call PHP internals memory objects
store_result() Create internal_row_buffer[]
  • internal_row_buffer: n rows
$row = fetch_assoc() create $row, copy internal_row_buffer[0] into $row
  • internal_row_buffer: n rows
  • + copy of internal_row_buffer[0]
  • + zval $row
$row = fetch_assoc() free $row, copy internal_row_buffer[1] into $row
  • internal_row_buffer: n rows
  • - copy of internal_row_buffer[0] (previous contents of $row)
  • + copy of internal_row_buffer[1]

Obviously, with the copy technique, there is no reference from the mysqlnd library to $row because mysqlnd does not have to bother about copy-on-write and the like. $row contains a copy of the actual data from MySQL and no longer points to the mysqlnd internal result buffer. The table below tries to illustrate the situation when data copies are avoided and references are used:

Pseudo code to illustrate reference logic during fetch loop
PHP call PHP internals memory objects
store_result() Create internal_row_buffer[], internal_reference_list[]
  • internal_row_buffer: n rows
  • internal_reference_list: 0 entries
$row = fetch_assoc() create $row, $row = &internal_row_buffer[0]; internal_reference_list[0] = $row;
  • internal_row_buffer: n rows
  • internal_reference_list: + 1 entry (first $row)
  • + zval $row
$row = fetch_assoc() $row = &internal_row_buffer[1]; internal_reference_list[1] = $row;
  • internal_row_buffer: n rows
  • internal_reference_list: 1 + 1 entries (first, second $row)
  • zval $row (first) + zval $row (second)
n-th $row = fetch_assoc()
  • internal_row_buffer: n rows
  • internal_reference_list: n entries (all $row)
  • n * zval $row (all)
unset($row); free_result($res) free(internal_row_buffer); free(internal_reference_list) empty

Don’t get too excited. The copy approach is not an ultimate solution. There are many factors to consider: actual code, actual statements and the size of their result sets, your servers demands. And, don’t even try to count bytes: some hundret or even thousand bytes here and there may not matter much. Write proper code (explicitly calling free()) and check your servers reaction.


The key takeaway is: try MYSQLI_STORE_RESULT_COPY_DATA if you have result sets with many rows. The copy approach will release memory faster to PHP for reuse and thus peak memory usage may be significantly lower. Again, be warned that I have choosen my examples carefully to show a hefty difference when doing point in time memory measurements. There may be a broad grey area for which you need to consider memory usage over time and possibly over many scripts to decide whether copy or reference fits your bill.

Ready for use?

Yes, the code is ready for use! The patch was running in production for a while and I failed to break it with tests too.

A new PHP configuration setting mysqlnd.fetch_data_copy has been introduced. Setting mysqlnd.fetch_data_copy=1 makes mysqlnd use and enforce the copy approach for all buffered result sets.

Not for PDO yet

The copy approach is not compatible with PDO_MySQL, though. PDO_MySQL relies on the zval reference logic and must be updated first.

Credits to A + A = A+ ?!

The manual has got a few words on it too but not substantially more than is above. The changes should bubble through build, staging and mirror servers within a few days.

Happy hacking!

@Ulf_Wendel Follow me on Twitter

The performance penalty of the early MySQL Fabric support for PHP

2014/03/13 - by admin - 1 Comment

PECL/mysqlnd_ms 1.6 is currently being modified to support sharding and fully automatic server and client failover when using MySQL Fabric (slides) to manage a farm of MySQL servers. PECL/mysqlnd_ms is a mostly transparent load balancer that works with all PHP MySQL APIs (PDO_MySQL, mysqli, …). The idea is, that if, for example, a MySQL server fails, the plugin talks to MySQL Fabric to learn about alternative servers that Fabric has provisioned automatically. This “talks to” gives implies a performance penalty for applications. One worth looking at, to understand it. One worth looking at, to have a success story once the early implementation is gone and replaced with a proper one ;-) .

Behind the scenes…

How exactly a “Fabric aware” driver or application talks to Fabric is implementation dependent. Figures given for PHP must not be used to extrapolate behaviour of Connector/J or Connector/Python. Only remarks about Fabric itself apply to all.

Let’s assume you want to use MySQL Fabric and PHP for sharding. Fabric takes care of all the server side stuff: splitting, merging, monitoring shards and so forth. The PHP application ensures that queries end up on the appropriate shards by hinting the driver which shard to use. In the PHP case, the “driver” is the PECL/mysqlnd_ms plugin for mysqlnd. The plugin exports a function mysqlnd_ms_select_shard() for hinting it.

$mysqli = new mysqli("myapp", "user", "password", "database");
mysqlnd_ms_select_shard($mysqli, "mydb.mytable", "key");
$mysql->query("INSERT INTO mytable(col1) VALUES ('abc')");

This tiny snippet triggers a huge machinerie: from you application it goes to the plugin. Then, the plugin calls Fabric via XML RPC over HTTP and waits for a reply. Once Fabric has replied, the plugin makes the connection handle point to the shard.

Client   Fabric
mysqlnd_ms_select_shard(link, table, key)  
PECL/mysqlnd_ms *.c XML RPC over HTTP: sharding.lookup_server(table, key, LOCAL) ->  
  HTTP worker thread
  <- XML reply HTTP worker thread
PECL/mysqlnd_ms: make $link use shard
mysqli_query($link, …)

The hotspots

Switching a connection from one server to another takes some effort. The current implementation will simply replace an plugin internal list of servers. This is a very fast operation. No new MySQL connection is opened yet. By default, lazy connections are used and the connect to the shard is delayed until the application issues a query. Let’s consider this a cheap step, that can be marked green below.

Client   Fabric
mysqlnd_ms_select_shard(link, table, key)  
PECL/mysqlnd_ms *.c XML RPC over HTTP: sharding.lookup_server(table, key, LOCAL) ->  
  HTTP worker thread

Asking Fabric for a set of one master any number of additional slaves that make a shard is expensive. Upon every call to mysqlnd_ms_select_shard(), PECL/mysqlnd_ms opens a HTTP connection to Fabric to make an XML remote procedure call. Future version of the mysql plugin will do less calls, but that’s not the point. The expensive operation is the HTTP connection established using PHP streams. Say, you do one INSERT on a shard, then the INSERT carries significant overhead. Recall, I am discussing implementation details that must and will be tuned…

  • mysqlnd_ms_select_shard()
    • HTTP connect and round-trip to Fabric for XML RPC
    • switch user connection handle to shard(cheap)
  • mysqli_query()
    • connect MySQL shard
    • if needed and possible align connection state

The overhead can be measured using the plugins performance statistics. Set the PHP configuration directive mysqlnd_ms.collect_statistics=1 to enable the collection of statistics. Here’s the outline of a script that helps you detect what the plugin does and where it spents time in a MySQL Fabric sharding scenario. Please, see the manual for to setup the plugin to use Fabric.

$stats = mysqlnd_ms_get_stats();
$start = microtime(true);

$link = mst_mysqli_connect("myapp", $user, $passwd, $db, $port, $socket);
mysqlnd_ms_fabric_select_shard($link, "fabric_sharding.test", 1));
$now = mysqlnd_ms_get_stats();
foreach ($stats as $k => $v) {
 if ($now[$k] != $v) {
   printf("%s: %s -> %s\n", $k, $v, $now[$k]);
$stats = $now;

var_dump($link->query("UPDATE test SET id = 1 WHERE id = 1"));
$now = mysqlnd_ms_get_stats();
foreach ($stats as $k => $v) {
  if ($now[$k] != $v) {
    printf("%s: %s -> %s\n", $k, $v, $now[$k]);

printf("Runtime %.fs\n", microtime(true) - $start);

When run in a VM on my notebook it will print something like this. The UPDATE on a shard took in total some 0.038 seconds. Asking Fabric which shard to use took 31535 / 1000000 = 0.0315 seconds. 450 bytes have been transferred from Fabric to the plugin to learn about the shard.

fabric_sharding_lookup_servers_success: 0 -> 1
fabric_sharding_lookup_servers_time_total: 0 -> 31535
fabric_sharding_lookup_servers_bytes_total: 0 -> 450
use_master: 0 -> 1
use_master_guess: 0 -> 1
lazy_connections_master_success: 0 -> 1
Runtime 0.038586s

Any slowdown you like can be provoked: 83x slower!

On my VM, a loop that executes mysqlnd_ms_select_shard() followed by an UPDATE 2000x times runs 83x slower than UPDATE taken when connecting to the shard without Fabric (but with the plugin loaded). The Fabric code takes 23.4 seconds in total and wastes 22.9 seconds on XML RPC. A plain UPDATE on non Fabric connection to the shard takes 0.28 seconds only! The difference between 23.4 – 22.9 = 0.5 and 0.28 seconds is down to the 2000x connects done by the plugin as part of the server/shard switch.

The Fabric hotspot

It certainly does not take 22.9 seconds to send 2000 HTTP replies of 450 bytes. Let’s ask Fabric to show what it does by setting the debug logging level to DEBUG in the Fabric configuration file.

level = DEBUG

This is the the result. For every call to mysqlnd_ms_select_shard(), PHP performs one XML RPC and each XML RPC triggers more than 10 SQL queries within Fabric!

[DEBUG] 1394661746.143667 - XML-RPC-Server - Enqueuing request (<socket._socketobject object at 0x1e718a0>) from (('', 53951)) through thread (<SessionThread(XML-RPC-Session-4, started daemon 140595192383232)>).
[DEBUG] 1394661746.143837 - XML-RPC-Session-4 - Processing request (<socket._socketobject object at 0x1e718a0>) from (('', 53951)) through thread (<SessionThread(XML-RPC-Session-4, started daemon 140595192383232)>).
[DEBUG] 1394661746.144319 - XML-RPC-Session-4 - Started command (LookupShardServers).
[DEBUG] 1394661746.144816 - XML-RPC-Session-4 - Statement (SELECT sm.shard_mapping_id, table_name, column_name, type_name, global_group FROM shard_tables as sm, shard_maps as smd WHERE sm.shard_mapping_id = smd.shard_mapping_id AND table_name = %s), Params(('fabric_sharding.test',)).
[DEBUG] 1394661746.146071 - XML-RPC-Session-4 - Statement (SELECT sr.shard_mapping_id, sr.lower_bound, s.shard_id FROM shard_ranges AS sr, shards AS s WHERE %s >= CAST(lower_bound AS SIGNED) AND sr.shard_mapping_id = %s AND s.shard_id = sr.shard_id ORDER BY CAST(lower_bound AS SIGNED) DESC LIMIT 1), Params(('1', 1)).
[DEBUG] 1394661746.147233 - XML-RPC-Session-4 - Statement (SELECT shard_id, group_id, state FROM shards WHERE shard_id = %s), Params(('1',)).
[DEBUG] 1394661746.148331 - XML-RPC-Session-4 - Statement (SELECT group_id, description, master_uuid, status FROM groups WHERE group_id = %s), Params(('sharding1_shard1',)).
[DEBUG] 1394661746.149338 - XML-RPC-Session-4 - Statement (SELECT server_uuid, server_address, user, passwd, mode, status, weight FROM servers WHERE group_id = %s), Params(('sharding1_shard1',)).
[DEBUG] 1394661746.150462 - XML-RPC-Session-4 - Statement (SELECT @@GLOBAL.SERVER_UUID as SERVER_UUID), Params(()).
[DEBUG] 1394661746.151100 - XML-RPC-Session-4 - Statement (SELECT @@GLOBAL.SERVER_ID as SERVER_ID), Params(()).
[DEBUG] 1394661746.151648 - XML-RPC-Session-4 - Statement (SELECT @@GLOBAL.VERSION as VERSION), Params(()).
[DEBUG] 1394661746.152203 - XML-RPC-Session-4 - Statement (SELECT @@GLOBAL.GTID_MODE as GTID_MODE), Params(()).
[DEBUG] 1394661746.152757 - XML-RPC-Session-4 - Statement (SELECT @@GLOBAL.LOG_BIN as LOG_BIN), Params(()).
[DEBUG] 1394661746.153374 - XML-RPC-Session-4 - Statement (SELECT @@GLOBAL.READ_ONLY as READ_ONLY), Params(()).
[DEBUG] 1394661746.153820 - XML-RPC-Session-4 - Connected to server with uuid (80716d72-9302-11e3-817c-000c299b2a06), server_id (3307), version (5.6.16-log), gtid (True), binlog (True), read_only (False).
[DEBUG] 1394661746.153932 - XML-RPC-Session-4 - Disconnecting from server with uuid (80716d72-9302-11e3-817c-000c299b2a06), server_id (3307), version (5.6.16-log), gtid (True), binlog (True), read_only (False).
[DEBUG] 1394661746.154039 - XML-RPC-Session-4 - Finished command (LookupShardServers).
[DEBUG] 1394661746.154313 - XML-RPC-Session-4 - Finishing request (<socket._socketobject object at 0x1e718a0>) from (('', 53951)) through thread (<SessionThread(XML-RPC-Session-4, started daemon 140595192383232)>).

Better do not compare that with the tiny PHP snippet used to select a shard…

Client   Fabric
mysqlnd_ms_select_shard(link, table, key)  
PECL/mysqlnd_ms *.c XML RPC over HTTP: 22.9 seconds  
  > 10 SQL queries against Fabric backing store
Connect to shard: 0.2s
UPDATE: 0.3s

Ghosts from the past

Of course, we will fix that for PHP even before releasing an alpha! Of course, that will make a nice story with a catchy “80x faster” title!

It is not necessary for a client to contact every time a shard server needs to be identified. There are other XML remote procedure calls that can be used. The reason why PHP has gone for this way initially is simple: the other XML RPCs have not been ready when the code was first written. Future versions (and the other drivers) do ask Fabric only once for a complete listing of all shards. The RPC overhead will then vanish in the overall runtime.

Client   Fabric
mysqli_connect(“fabric”, …)  
PECL/mysqlnd_ms *.c XML RPC over HTTP: get all shards ->  
  HTTP worker thread
  <- XML reply HTTP worker thread
mysqlnd_ms_select_shard(link, table, key)  
PECL/mysqlnd_ms: make $link use shard
mysqli_query($link, …)
mysqlnd_ms_select_shard(link, table, key)  
PECL/mysqlnd_ms: make $link use shard
mysqli_query($link, …)

How big the performance impact of such a driver is, cannot be answered in general. It depends mainly on the question how often the once fetched shard information can be reused and how expensive the one RPC is in comparison to the normal queries issues. Tipps for benchmarking PHP have been given – please, run your own tests.

Happy hacking!

@Ulf_Wendel Follow me on Twitter

PS: We recently moved some Connector manuals out of the main MySQL reference manual to their own places. For PHP we mirror the MySQL bits from the PHP, as ever. The mirrored PHP MySQL manual can be downloaded as a PDF from dev.mysql.com. It already shows some PECL/mysqlnd_ms Fabric notes whereas the PHP mirrors have not yet caught up to the PHP documentation staging server – http://docs.php.net/manual/en/book.mysqlnd-ms.php.

The early MySQL Fabric sharding support for PHP

2014/03/08 - by admin - 0 comments

The MySQL Fabric framework brings two major features: automatic client- and server-side failover and sharding. The manual hints, you need a “Fabric aware driver” for this but it does not list one for PHP. First, you don’t necessarily need new drivers! Second, the "driver" for PHP is the current development version of the PECL/mysqlnd_ms replication and load balancing plugin. The plugin covers the sharding but not the failover/HA quickstart example of Fabric: how the plugin works and why you should not use the plugin – yet.

Partial replication: the unlimited mode

At some point divide-and-conquer is the only know strategy to scale database replication. A huge data set must be devided into smaller sets that are distributed over many servers. There are middleware, in-core and hybrid architectures for such a system. The Fabric approach could be described as a middleware approach: a middleware manages servers and data distribution, clients talk to the middleware to learn about servers. Such an architecture minimizes the dependencies on the database. Power users can easily adapt the middleware to their needs.

PECL/mysqlnd_ms now also handles the "learn about servers/shards" part in the background. It talks to Fabric through XML RPC over HTTP.

MySQL Fabric PHP application
<-> learn about servers/shards PECL/mysqlnd_ms
| |
  • Provision: masters and slaves
  • Monitor: load, health/failover
  • Balance: clone, merge, split shards
  • Connect and query
| |
MySQL servers

The client view: all the standard MySQL APIs

PECL/mysqlnd_ms is a client-side proxy which tries to hide as much of the complexity of using any kind of MySQL cluster (MySQL Replication, MySQL Cluster, 3rd party, MySQL Fabric sharding/HA) from the developer. This includes tasks such as load balancing, read-write splitting, failover and so forth. In the most basic case it is entirely transparent on the API level (mysqli, PDO_MySQL). Given the appropriate configuration, this is a load balanced connection, the SELECT goes to the slaves, the DROP to the master…

$link = new mysqli("myapp", "user", "password", "db");
$link->query("SELECT 1");
$link->query("SELECT 2");
$link->query("DROP TABLE IF EXISTS test");

What happens is that mysqli respectively PDO_MySQL extensions call functions in the mysqlnd library. PECL/mysqlnd_ms plugs in to the mysqlnd library to hooks these calls. If, for example, mysqli_connect() tries to open a connection to the host myapp, PECL/mysqlnd_ms captures the call and checks it config for an entry named myapp. Let the entry be for a MySQL Replication cluster. Then, later when mysqli_query() is executed, the plugin inspects the query and picks the a master or slave from the config to run the query on. Connecting the acutal servers is (mostly) transparent from an application user perspective as you can see from the code example.

userspace – *.php mysqli_connect(…)
inside PHP – ext/mysqlnd/*.c mysqlnd_connect(…)
inside PHP – ext/mysqlnd_ms/*.c connect_hook(…) 1) read server list from file: fopen(file://…)

The PHP manual has all the details, including the pitfalls and why you can overrule any automatic decision the plugin makes.

Fabric: similar, yet very different

The plugin works much the same with MySQL Fabric. The major difference is that instead of reading a list of servers from a local configuration file, the plugin now connects to a Fabric host to get the list. This makes no difference with regards to the application code itself. When exactly the plugin calls Fabric is work in progress.

userspace – *.php mysqli_connect(…)
inside PHP – ext/mysqlnd/*.c mysqlnd_connect(…)
inside PHP – ext/mysqlnd_ms/*.c connect_hook(…) 1) read config for Fabric host list: fopen(file://…)
2) read server list from Fabric: fopen(http://…)

Because Fabric monitors and manages the MySQL servers in a farm, it always reports a fresh snapshot of the servers available. In case a server fails and Fabric decides to replace it, the client (here: PECL/mysqlnd_ms) will learn with the next RPC. When using local configuration files, the client will not recognize new or failed servers in the cluster until the configuration file has been deployed. The XML RPC over HTTP call from the client to Fabric to fetch the server adds overhead but it ensures fresh information. Choose between runtime overhead and deployment complexity as you like…

Measuring and taming the XML RPC overhead

To help with the choice, we’ve begun to add new performance statistics, which are subject to change.

Related to Fabric command sharding.lookup_servers
fabric_sharding_lookup_servers_success Number of successful RPC calls. A call is considered succesful if any Fabric host could be reached, a message has been sent and a reply has been received.
fabric_sharding_lookup_servers_failure Number of failed RPC calls.
fabric_sharding_lookup_servers_time_total Total time spent (ms).
fabric_sharding_lookup_servers_bytes_total Total bytes received.
fabric_sharding_lookup_servers_xml_failure How often the plugin failed to parse the reply from Fabric. Currently, the plugin
cannot distinguish between an valid error reply and an erroneous one.

To avoid long wait periods for Fabric, there is also a new timeout setting for the plugin configuration. If case of a timeout, you end up with no server list. Then you cannot do much but the most important: tell the impatient user about the problem before he begins hitting the Browsers reload button like crazy.


Setting a timeout has the same effect as setting a timeout for a PHP stream in userland. Judging from a quick glance of the streams code, my uneducated guess is that it sets a timeout for connect and read but not for write. Whatever it does is beyond the control of PECL/mysqlnd_ms.

Use case supported today: sharding

Out of the two major use cases for Fabric, the PHP plugin so far covers the sharding one only (in parts). Fabric has opted for a design where the clients have to give keys ex-bound. The client has to ask Fabric for a list of servers responsible for a given table and key. Then, Fabric replies and the client picks the appropriate server to run the query on. The MySQL manual and various blog postings over the details of the sharding logic. Johannes gives it a quick walk through and shows the following PHP code (comments removed):

$c = new mysqli("test", "root", "", "test");

mysqlnd_ms_fabric_select_global($c, "test.fabrictest");
var_dump($c->query("CREATE TABLE fabrictest (id INT NOT NULL)"));

mysqlnd_ms_fabric_select_shard($c, "test.fabrictest", 10);
var_dump($c->query("INSERT INTO fabrictest VALUES (10)"));

mysqlnd_ms_fabric_select_shard($c, "test.fabrictest", 10010);
$r = $c->query("SELECT * FROM fabrictest WHERE id = 10");

The two functions mysqlnd_ms_fabric_select_global() and mysqlnd_ms_fabric_select_shard() encapsulate the procedure of asking Fabric for the shards to use for the table test.fabrictest. They also tell Fabric what you want to do: perform DDL operations (here: CREATE TABLE) or manipulate data asociated with a shard key.

This short snippet calls Fabric three times. It is not hard to predict that this is a potential bottleneck – use the statistics to measure it. Caching is planned for the future.

Open ends, pitfalls, plugin logic breaks

The Fabric code in the plugin is almost untested. It was developed against a pre-pre-release of Fabric. Statistics are bells and whistles compared to – long ago – frequently changing XML formats. Meanwhile Fabric is stabilizing and we can look into a better integration with the rest of the 25k lines of the plugins C code.

Here are two examples of the details not taken care of. Fabric itself does not (yet) bother much about transactions but PECL/mysqlnd_ms has a transaction stickiness feature that will prevent switching servers in the middle of a transaction. The plugin tries to detect transaction boundaries and will not load balance between servers before the current transaction has ended. What should the plugin do in this case?

mysqlnd_ms_fabric_select_shard($c, "test.fabrictest", 10);
var_dump($c->query("INSERT INTO fabrictest VALUES (10)"));

mysqlnd_ms_fabric_select_shard($c, "test.fabrictest", 10010);
$r = $c->query("SELECT * FROM fabrictest WHERE id = 10");

It is clearly a user error to start a transaction and then switch between shards. What should the plugin do if transaction stickiness is set? Shouldn’t we at least warn the user, if we can – I’ve pushed an first attempt for a warning mode today.


The Quality-of-Service filter likely needs a couple of tweaks too.

There are just too many cases not tested yet. For example, I would not be surprised if the following code failed without a useful message. In the current implementation mysqli_connect() will not do an actual connect to Fabric, the server lists used by the plugin will be empty and it will bark…

$c = new mysqli("fabric", "root", "", "test");
$c->query("SELECT 1");
mysqlnd_ms_fabric_select_global($c, "test.fabrictest");

Certainly, this is a minor issue. Be warned that there could be more and be warned about the RPC overhead.

The plugin has 26752 lines of C code and 47481 lines of .phpt test code. I guess we have to push it to 50kloc until we can release an PECL/mysqlnd_ms 1.6 alpha. Not sure if we ever got a bug report between alpha and GA, in any case, I would not want to change that…

Happy hacking!

@Ulf_Wendel Follow me on Twitter

PS: I’ve done a mysqlfabric help today using the latest (internal) development version. There was a proper help message no error telling me my cry for help was not understood! Everything around Fabric is a fast moving target.

Using MySQL Fabric from any programming language

2014/03/04 - by admin - 1 Comment

MySQL Fabric is a framework for MySQL Replication high availability, automatic failover and sharding. Technically, a MySQL Fabric daemon monitors a set of MySQL servers and takes appropriate actions upon failure. Clients use Fabric aware drivers to learn about failed servers and shards to distribute queries accordingly. Simple to understand, simple to sell, simple to raise false expectations and simple to fail [, dear Sales]. With the usual blog posts telling only the story of the first three sentences, major parts of the story are covered in silence.

Development preview = announcement of a vision != ready

You first challenge will be to find the documentation for the MySQL Fabric development preview. From the documentation overview page it takes three clicks down to the server side documentation for Fabric:

  1. MySQL Workbench
  2. MySQL Utilities 1.4+ (not the older one!)
  3. MySQL Fabric

You better do not start your search in the MySQL Reference Manual under High Availability and Scalability. Fabric, a command line utility, is well hidden inside the documentation of a GUI tool. Eventually, you may find a download and attempt to install Fabric using an install prefix.

~/mysql-utilities-1.4.1/install/bin # ./mysqlfabric manage start
~/mysql-utilities-1.4.1/install/bin # Traceback (most recent call last):
  File "./mysqlfabric", line 23, in <module>
    from mysql.fabric.services import (
ImportError: No module named mysql.fabric.services
~/mysql-utilities-1.4.1/install/bin # export PYTHONPATH=/home/nixnutz/ftp/mysql-utilities-1.4.1/install/lib/python2.7/site-packages/

80% of the development community is out, true?

Pretty much all blogs and tutorials claim that Fabric aware drivers must be used. According to the manual, such drivers exist for Python and Java only. This covers an estimated 20% of the software development communities. Let’s cry for help:

~#47;mysql-utilities-1.4.1/install/bin # ./mysqlfabric help
Usage: %fabric <group> <cmd> [<option> ...] arg ...

mysqlfabric: error: Error (dispatch() takes exactly 3 arguments (1 given)).
Wrong number of parameters were provided for command (manage help).

~#47;mysql-utilities-1.4.1/install/bin # ./mysqlfabric help help help
Command (help, help) was not found.

Being among the 20% of priviledged Python or Java users you may not bother until you browse the etc/ directory for configuration files. Here you will find something seemingly related to PHP’s Doctrine

~/mysql-utilities-1.4.1/install/bin # ls -la ../etc/mysql/
insgesamt 32
drwxr-xr-x 2 nixnutz users  4096 11. Feb 16:34 .
drwxr-xr-x 3 nixnutz users  4096 11. Feb 16:33 ..
-rw-r----- 1 nixnutz users   119 11. Feb 16:34 default_plugins.cnf
-rw-r--r-- 1 nixnutz users   527 11. Feb 16:34 fabric.cfg
-rw-r--r-- 1 nixnutz users 13132 11. Feb 16:34 mysql-fabric-doctrine-1.4.0.zip

There is something for the PHP driver but it is not documented. Why there is something for a PHP application before the driver part is ‘done done’ may make some scratch their heads.

Putting expectations straight and low

Any project should state what is included and what is not. Fabrics’ vision should be clear and bright. Please, see the slides.

But, Fabric is available as a development preview release only. Core functionality exists. The examples from the manual work for me. But, no other scenarios do. For example, mysqlfabric group import_topology lacks documentation and mysqlfabric group promote blog_repl fails for me. Likely, the latter cannot parse the GTID histories from my replication setup:

[DEBUG] 1393921605.845434 - Executor-3 - Start executing function: get_num_gtid(('0d4adf79-9329-11e3-8278-000c299b2a06:1-2,\n1113b4b6-9326-11e3-8264-000c299b2a06:8-9:13:40-45:48-54:56-58:60-62:64-66:69-71:73-74:76-78:80-82:84-85:88-90:92-93:95-98:100-101:103-106:109:111:115-118:121:123:127-130:132-135:138-140:143-149:151-153:155-157:159-161:164-166:168-169:171-173:175-177:179-180:183-185:187-188:190-193:195-196:198-201:204:206:210-213:216:218:222-225:227-230:232-234:236-238:241-247:249-251:253-255:257-259:262-264:266-267:269-271:273-275:277-278:281-283:285-286:288-291:293-294:296-299:302:304:308-311:314:316:320-323:325-328:331-337:339-341:343-345:347-349:352-354:356-357:359-361:363-365:367-368:371-373:375-376:378-381:383-384:386-389:392:394:398-401:404:406:410-413:415-418:420-422:424-426:428-430:432-434:436-438:441-447:449-451:453-455:457-459:462-464:466-467:469-471:473-475:477-478:481-483:485-486:488-491:493-494:496-499:502:504:508-511:514:516:520-523:525-528:531-536:539:541:545-549:551-553:555-557:559-561:564-566:568-569:571-573:575-577:579-580:583-584:586-587:589-592:594-595:597-600:603:605:609-612:615:617:621-624:626-629:632-637:640:642:646-650:652-654:656-658:660-662:665-667:669-670:672-674:676-678:680-681:684-685:687-688:690-693:695-696:698-701:704:706:710-713:716:718:722-725:727-732:735-740:743:745:749-753:755-757:759-761:763-765:768-770:772-773:775-777:779-781:783-784:787-789:791-792:794-797:799-800:802-805:808:810:814-817:820:822:826-829:831-834:836-838:840', None), {}).
[DEBUG] 1393921605.845557 - Executor-3 - Error executing function: get_num_gtid.

Developers life without Fabric aware drivers

Fabric aware drivers are part of the vision, they contribute to the developer experience. As you are waiting for things to mature and Fabric aware drivers to enter ‘done done’ state, you can try to write a Fabric aware application without a special driver. This may be required on the long run as well: MySQL does not provide a driver for each and every language. Also, if you want to use an old version of a driver or gather some background information about how to debug a driver, knowing what “Fabric aware” means is a must.

A Fabric aware PHP [C/C++, Perl, Ruby, ...] application

The Fabric daemon and Fabric aware drivers/applications communicate with each others through XML RPC over HTTP, just like the mysqlfabric command line utility does. The RPC interface can be used, for example, to learn about running and failed servers and their roles. You can look up (logical) server groups, shard mappings, individual shards and so forth. Fabric aware drivers try to do these operations in the background in a way that is mostly transparent for the application. A Fabric aware application needs to the the XML RPC itself.

Fabric aware driver/application communicates with (XML RPC over HTTP) -> Fabric
mysqlfabric command line utility
Farms of MySQL Servers

The list of RPC commands is not documented, but you can ask the mysqlfabric utility for a list of commands:

# ./mysqlfabric list-commands
group activate                   Activate a group.
group import_topology            Try to figure out the replication topology and import it into the state store.
group deactivate                 Deactivate a group.
group create                     Create a group.
group remove                     Remove a server from a group.
group add                        Add a server into group.
group lookup_servers             Return information on existing server(s) in a group.
group check_group_availability   Check if any server within a group has failed and report health information.
group destroy                    Remove a group.
group demote                     Demote the current master if there is one.
group promote                    Promote a server into master.
group lookup_groups              Return information on existing group(s).
group description                Update group's description.
manage list-commands             List the possible commands.

Generally speaking each command listed maps to one XML RPC call. The XML RPC call parameters and – possibly – some help can be obtained using mysqlfabric help <command>.

# ./mysqlfabric help group lookup_groups 
group lookup_groups(group_id): Return information on existing group(s).

Often, this is enough input to craft the XML required for the corresponding XML RPC call:

#  cat lookup_groups.xml
<?xml version="1.0" encoding="iso-8859-1"?>

Give it a try and POST the XML to the Fabric server, which listens on port 8080 by default, or whatever port you configured during the installation:

# curl  -H 'Content-Type: text/xml' -d @./lookup_groups.xml
<?xml version='1.0'?>

The XML RPC was successful and Fabric has returned a list of all configured high availability groups (see also the Fabric quickstart). Compare the XML reply with the reply from the mysqlfabric command line utility. The information is identical but presented differently.

# ./mysqlfabric  group lookup_groups 
Command :
{ success     = True
  return      = [['blog_repl']]
  activities  = 

Teach your application to issue an XML RPC call, parse the result and you have a Fabric aware application…

Fabric aware application: HA group support

Assume you are using Fabric for nothing but monitoring a simplistic group of three MySQL Replication servers, the case sketched in the Fabric quickstart. There is one master and two slaves, no further constraints or rules exist.

monitors, handles failover/switchover, …
Fabric high availability group ‘blog_repl’
| |
Slave/Secondary Slave/Secondary

If an application wants to connect to such a logical Fabric high availability group it first needs to know the name of the group. Assume you have configured only one group and the groups name ‘blog_repl’ is known to the application. Then, all a most basic Fabric aware application would want to know from Fabric is:

  • Which servers belong to the high availability group ‘blog_reply’?
  • Which one is the master (writes), which ones are the slaves (reads)?

The Fabric command which answer these questions is:

 # ./mysqlfabric  group lookup_servers blog_repl
Command :
{ success     = True
  return      = [['28acd755-a37b-11e3-ace4-000c299b2a06', '', False, 'SECONDARY'], ['c3d24cc0-a37a-11e3-ace1-000c299b2a06', '', False, 'SECONDARY'], ['f159bc2e-a37a-11e3-ace2-000c299b2a06', '', True, 'PRIMARY']]
  activities  = 

Using the help output of the command, you can construct the XML request that a Fabric aware application has to send to get the questions answered:

# cat lookup_servers.xml
<?xml version="1.0" encoding="iso-8859-1"?>

The verbose answer from Fabric is below. Your application must parse the XML and extract the relevant information to learn about the servers and their roles in the high availability group ‘blog_repl’:

# curl  -H 'Content-Type: text/xml' -d @./lookup_servers.xml
<?xml version='1.0'?>

XML RPC from PHP and parsing the reply

Ignoring all the glory details, the code – here PHP – to fetch the list of servers in a group from Fabric is rather trivial. The example implements a class with three methods to fetch a list of all servers, a list of the read-only slaves and a list of the read-write master. From here on, the classic pattern of using one connection for read-only requests and one connection for writes could be used. Of course, this is not very comfy. This is where the various Fabric aware drivers try to deliver better options…

class fabric_ha {
  protected $server;
  protected $group_id;
  protected $xml = '<?xml version="1.0" 

  public function __construct($server, $group_id) {
   $this->server = sprintf("http://%s/",$server);
   $this->group_id = $group_id;

  public function getServerList($role = NULL) {
    $xml = sprintf($this->xml, $this->group_id);

    $opts = array('http' =>
        'method'  => 'POST',
        'header'  => 'Content-type: text/xml',
        'content' => $xml
    $context = stream_context_create($opts);

    $ret = array();
    $result = file_get_contents($this->server, false, $context);
    $xml = new SimpleXMLElement($result);
    $server_list = $xml->xpath('/methodResponse/params/param/value/array/data/value[3]/array/data/value');
    foreach ($server_list as $server_entry) {
      $server = $server_entry->xpath('array/data/value');
      if (!is_null($role) && ($server[3]->xpath('string')[0]->__toString() != $role)) {
      $ret[] = array(
                  'server_uuid' => $server[0]->xpath('string')[0]->__toString(),
                  'server'      => $server[1]->xpath('string')[0]->__toString(),
                  'master'      => $server[2]->xpath('boolean')[0]->__toString(),
    return $ret;

  public function getReadWriteServer() {
    return $this->getServerList('SECONDARY');

  public function getReadOnlyServer() {
    return $this->getServerList('PRIMARY');

$f = new fabric_ha("", 'blog_repl');

How do you get to know the XPath expressions, I’ve used? That is an excellent question to ask at the next conference when you meet the designers of Fabric. Whatever, there’s the script in action:

# php fabric.php 
array(2) {
  array(3) {
    string(36) "28acd755-a37b-11e3-ace4-000c299b2a06"
    string(14) ""
    string(1) "0"
  array(3) {
    string(36) "c3d24cc0-a37a-11e3-ace1-000c299b2a06"
    string(14) ""
    string(1) "0"
array(1) {
  array(3) {
    string(36) "f159bc2e-a37a-11e3-ace2-000c299b2a06"
    string(14) ""
    string(1) "1"

Failover: faster than I can type…

If now the master of your MySQL Replication setup fails, Fabric may detect this and promote a slave to become a master. I’ve shutdown my master on and Fabric has instantly reconfigured the replication setup as illustrated below. On my unloaded notebook the entire server-side failover takes less time than it takes me to write this sentence.

server side failover performed
Fabric high availability group ‘blog_repl’

As soon as the master becomes unavailable, an application will see transaction aborts and connection errors. A Fabric aware application should now breath for a second and ask Fabric for a fresh list of servers. Then, the Fabric aware application shall begin using the updated list of servers. I’ve omitted the error handling code and show the result of fetching a fresh server list only.

nixnutz@linux-dstv:~/src/php-src> php fabric.php 
array(1) {
  array(3) {
    string(36) "c3d24cc0-a37a-11e3-ace1-000c299b2a06"
    string(14) ""
    string(1) "0"
array(1) {
  array(3) {
    string(36) "28acd755-a37b-11e3-ace4-000c299b2a06"
    string(14) ""
    string(1) "1"

Note that I say your application shall take a breath: throttle Fabric requests. Fabric’s slow Python HTTP XML RPC – cannot handle tousands of concurrent requests from all your PHP webservers (development preview…). There’s a simple proof for this overload prediction that you will see if you put Fabric in DEBUG mode and analyze what it does (it runsn some 20 SQL statements per XML RPC).

As you can see, Fabric may be of use to you even if there is no Fabric aware driver for your preferred programming language be it PHP, C/C++, Perl, Ruby or whatever else…

Happy hacking!

@Ulf_Wendel Follow me on Twitter

Novice user experience – ramblings

In discussions about Fabric, people sometimes express the hope using MySQL Replication would become easier. Easier for common, small installations with, say, one master and two slaves. Good news is, that Fabric can be used for such tasks, if using MySQL 5.6 and newer only. Fabric depends on MySQL Replication with Global Transaction IDs introduced with MySQL 5.6:

# ./mysqlfabric group add blog_repl root ''
Procedure :
{ uuid        = f417492c-1a11-460b-87a5-fd8df9d811d8,
  finished    = True,
  success     = False,
  return      = ServerError: Server (0d4adf79-9329-11e3-8278-000c299b2a06) does not have the binary log or gtid enabled.,
  activities  = 

This is no big deal, configure MySQL to use GTIDs! The MySQL Utilities will not do this for you, all of them implicitly require some MySQL configuration. But as a first time user who does not want to invest more than five minutes in his experiement, how will you know which of the 440 server configuration options (MySQL 5.7) so set?

The PDF version of the MySQL manual counts 3700+ pages. In order to profit from the comfort offered by the MySQL Replication Utilities you should preconfigure your MySQL server with the following settings:

  • Page 2054 – log_bin
  • Page 2055 – server_id, innodb_flush_log_at_trx_commit, sync_binlog
  • Page 2073 – gtid_mode, log-slave-updates, enforce-gtid-consistency

But do not miss the following, otherwise you end up in a hen-egg game because some Utilities, including Fabric, use SHOW SLAVE HOSTS on occasion.

  • Page 1606 – SHOW SLAVE HOSTS
  • Page 2101 – report_host
  • Page 2101 – report_port

Creating a HA setup with Fabric is peanuts and be done in one minute. But how long does it take to get there?

PHP Memcache access to MySQL 5.7, faster? Redis?

2013/12/12 - by admin - 4 Comments

PHP users can use two client protocols to query MySQL 5.6 and later. Not only standard SQL access but also faster key-value access to InnoDB tables is possible using the Memcache protocol. The MySQL benchmark team reports crazy figures. Of course, on hardware that makes the average PHP meetup visitor roll his eyes and say “yeah, Oracle, *yawn*…”. I’ve repeated my plain PHP benchmarks on an i3 desktop. And, I’ve added Redis to the game.

Short recap

Some ten years ago, colleaguages teached me that some 50% of the time processing a simple SELECT column_a, column_b FROM table WHERE pk = <x> is spent on parsing and processing SQL. The actual data access only contributes to 50% of the overall run time. Thus, MySQL had always offered some lower storage layer API access through the HANDLER command. Fast forward ten years, you can query MySQL either through SQL, as ever since, or through the Memcache protocol. The latter will bypass the SQL layers, get you directly to the storage and thus be faster. The presentation has the details.

Given how popular read only query like SELECT column_a, column_b FROM table WHERE pk = <key> can be, it may be worth scanning your code for them and replacing them with ~2x faster Memcache accesses. PECL/mysqlnd_memcached tries to do this in an automated fashion. It will match your queries against a pattern and try to replace a key-value style MySQL SQL access to a MySQL Memcache access. Nice in theory but the pattern matching may take that much time that is it not worth it (see also here). Please, run your own tests.

ext/mysqli, PDO_MYSQL, … PECL/memcached
SELECT column_a, column_b FROM table WHERE pk = <key> GET <key>
InnoDB SQL table

If automatic mapping from SQL to Memcache protocol may be too slow, manually replacing some calls is certainly not.

MySQL 5.6 Memcache vs. MySQL 5.7 Memcache vs. Memcache vs. SQL

To see whether MySQL 5.7 Memcache access is faster than MySQL 5.6 Memcache access, I’ve rewritten my PHP benchmark script. The basic procedures are exactly as before. I’m using PHP 5.7.0-dev with PECL/memcached 2.2.0b1 (libmemcached 1.0.16), mysqli, phpiredis. MySQL 5.7.3-m13, MySQL 5.6.15, Memcache 1.4.15, Redis 2.8.2 and PHP have been compiled from source using all defaults. The computer being tortured is a budget i3-2120T CPU @ 2.60GHz, 8GB RAM, Linux RAID-0, ext4, OpenSuse 12.1 desktop PC.

Here are the results for random key/value accesses with a key length of 100 bytes, values of 400 bytes lenght and 25.000 values in total. All reads have been repeated a couple of times to ensure that runtimes are not too short. PHP, Memcache and MySQL all run on one box. PHP uses heavyweight fork() to span workers for concurrent access to the data stores. The graph shows the average number of operations per second as observed by a single PHP worker. If running on the box 1x MySQL 5.7 w. Memcache and…

  • 1x PHP process, PHP script does 21556 ops
  • 2x PHP processes, each PHP does on average 21538 ops
  • 3x PHP processes, each PHP does on average 17816 ops

As expected the small CPU cannot handle more than 4-8 concurrent PHP processes sending queries as fast as they can: beyond that point each PHP processes will observer a significantly lower number of operations per second. Please, note the graph does not show the data stores’ view which would be total number of the answers to any client per second.

If you prefer a different graph: the benchmark script is at the end of the blog posting :-) .

MySQL 5.6 vs. MySQL 5.7 Memcache results

The results confirm prior findings:

  • MySQL 5.7 Memcache InnoDB access to SQL table reaches ~75% of the performance of a cache
  • MySQL 5.7 Memcache access is ~2.2x to ~2.5x faster than SQL
  • MySQL 5.7 seems ~ 20% faster than MySQL 5.6

It is cool to see a SQL system come close to the performance of a cache but the rest of the story is boring: new version, seems a bit faster on small boxes, could be way faster on Facebook-style boxes.

MySQL vs. Memcache vs. Redis

Comparing with Redis is hard but fun. The REmote DIctionary Server is more than a cache storing BLOBs. It handles complex data structures, such as lists or sorted maps. Redis offers neat publish-subscribe messages queues and much more. Still, it can be used as a cache. So can MySQL. Any quick benchmark is doomed to ignore this. So do I.

I am also ignoring the asynchronous API of Redis, which may bear significant tuning potential. Our MySQL Cluster folks love the asynchronous API for their benchmarks…

Redis is single threaded. One Redis instance cannot utilize more than one CPU thread. Redis likes fast CPUs but does not know what to do with a multi-core monster. This effect does not kick in on crazy 48-way boxes only but already my i3 desktop. MySQL uses all cores and CPU thread it gets, Redis is limited to a single CPU thread. Thus, for a fair comparison, I have to start multiple Redis instances. It is then the application developers task to pick the appropriate Redis instance.

Core 1
CPU Thread 1 – Redis CPU Thread 2
Core 2
CPU Thread 3 CPU Thread 4

An i3 has two cores and counting in hyper-threading 4 CPU cores. Therefore, I’ve compared MySQL and Memcache with 1, 2 and 3 Redis instances running. PHP workers are assigned in a round robin fashion to the Redis instances based on the workers process id. All instances are loaded with the same data set for the read-only test.

Core 1
CPU Thread 1 – Redis 1 CPU Thread 2 – Redis 2
Core 2
CPU Thread 3 – Redis 3 CPU Thread 4

Here’s the result: MySQL 5.7 InnoDB Memcache beats a single Redis 2.8.2 on read performance. If and only if, you use Redis inapproriately. If you bite the bullet and, for example, you partition your data across many Redis instances, then Redis is faster. However, at least on my desktop the difference is not in the order of magnitudes which is quite amazing for a SQL database.

MySQL 5.7 vs. Memcache vs. Redis

Closing ramblings

In the past decade MySQL has constantly managed to utilize latest generation of commodity hardware CPUs efficiently. Whether it was 4-core, 8-core or now 48-core (or more CPU’s), we’ve been there. In the past three years, since Handlersocket appeared, there is a hunt for higher and higher benchmark results in terms of queries per second. I understand that MySQL flagship customers demand such performance.

However, when speaking at a local PHP meetup a 48-core benchmark is worth little more than a good laugh. First, the machines are out of reach. The majority of the audience will use 8-core/32GB max. Some have less than three such machines in total. Second, as exciting it is to learn MySQL can do 1,000,000 queries/s, the PHP meetup visitor sometimes wants to hear about how his job as a developer becomes easier when using this or that data store.

Again, MySQL read performance is close to cache performance. Wow! Maybe writes could be even faster – please run your own tests. My benchmark script is below. It can do writes as well.

But, what about usability and features that make application development easier? Did we forget about that?

Happy hacking!

@Ulf_Wendel Follow me on Twitter

Benchmark script used

Download Benchmark script (PHP, 21k)

Data massage: how databases have been scaled from one to one million nodes

2013/12/03 - by admin - 0 comments

Despite the CAP theorem, databases have been scaled from one to one million nodes. Globally distributed, highly available and fast. Databases supporting ACID transactions. How? A story and slides from a three hours workshop at the PHP Summit 2013 (Berlin)…

Please note: long, deep dive, quite some theory. And, neither funny NoSQL bashing nor a practical guide to MySQL scaling. Only 10 MySQL slides or so…

As we all know, traditional relational databases have seen massive pressure from NoSQL in recent time. They have been considered unflexible, slow and not been designed for the Cloud. Or, just too expensive, if massive scalability was needed. It was argued, the CAP theorem basically means pick “two out of three” of consistency, availability and partition tolerance. Not long ago Brewer himself has said a word or two on the abuse of CAP for fuzz.

The talk starts in the 60th and 70th with relational databases before it gets to explore CAP and early NoSQL systems. Two early systems at the oposite ends of the design space CAP allows are Amazon Dynamo and Bigtable. They became blueprints for so many distributed NoSQL systems: Riak, HBase, BigCouch, … Then came Google Spanner (ACID transactions), which is highly available, replicates synchronously between data centers and is fast – how?

Some distributed NoSQL systems innovate at the borderline between distributed systems theory and the theory of parallel and distributed databases. Unless you work with a system at one of the extremes of the CAP design space, such as Riak, this is never much visible to you. Time to look at their algorithms and poke around the question what their findings may mean to relational systems that support ACID transactions.

It turns out that it could be fruitful to learn from each other. Some NoSQL folks seem to dig into distributed SQL query execution and on-disk storage. Topics from the theory of (parallel) databases. RDBMS can NoSQL can intersect in topics such as co-location in the presence of partitioning to avoid distributed queries/transactions. RDBMS can try to move towards microkernels and loosely coupled modules – something no classic RDBMS developer would ever do as it may mean longer paths (= slower)… – and so forth.

Whatever. You can look at NoSQL vs. RDBMS from a “relation vs. JSON”, “joins are evil” perspective. Or, you dig deeper to see how they scale from one to one million node. Unfortunately, you may come across topics that may be new to you as a (PHP) application developer:

  • Overlay networks, distributed hash tables
  • Vector clocks, quorums, CRDT’s
  • FLP impossibility theorem, Paxos protocol family
  • Virtual Synchrony, Atomic Broadcast

If you don’t mind that, you may answer the question whether Spanner proves CAP wrong yourself. The talk will also help you to judge whether the CAP theorem matters at all in practice!

Happy hacking!

@Ulf_Wendel Follow me on Twitter

MySQL Fabric’s initial decision on data models and transactions put into perspective

2013/10/17 - by admin - 0 comments

When you design a distributed database you are faced with a variety of choices each having its well known pro’s and con’s. MySQL Fabric made an intial decision for one design. Let me try to put in context of NoSQL approaches to degree I understand the matter. Understanding the data model is key to understanding Fabric. There are limits: by design.


MySQL Fabric is an administration tool to build large “farms” of MySQL servers. In its most basic form, a farm is a collection of MySQL Replication clusters. In its most advanced form, a farm is a collection of MySQL Replication clusters with sharding on top.

There are plenty of presentations on slideshare and blog postings on PlanetMySQL to get you started. If you don’t know Fabric yet, please, consult them first. I’ll not present a practical view on Fabric but theoretical ramblings on the data model.

Recap: desire vs. reality

Scaling data stores beyond a single machine is a challenge. He hope is that the more servers work together in a distributed database, the higher the throughput. Distributed databases shall be always available, be scalable and offer distribution transparency.

From an application developers point of view distribution transparency is probably the most important one. Developers would love to use synchronous (eager) clusters. All servers shall show all the updates immediately. Developers would love to be able to send all queries to any of the servers around without bothering about read or write (update anywhere). Developers would love see a sytem where every server stores all data to answer any question (full replication).

Sorry, Fabric will disappoint. Why? … be warned, I take an even more theoretical route to argue than usual. And, I usually already get to hear I need to take a more practical view…

Replication has hard scalability limits. If all servers shall be able to answer all read questions, then all your data must be on every server of the cluster. If – sooner (eager/synchronous) or later (lazy/asynchronous) – all updates shall appear on all servers, then the updates must be applied to all servers. Classically, such clusters follow a ROWA approach: Read One, Write All. A ROWA system can run all reads locally one a single server without interaction with other systems. Whereas all write operations require coordination with all the other systems.

(Tilt?! Work through slides 25 and following. NoSQL systems sometimes offer tuning options but no ultimate solution. Not tilt?! Enjoy the precise differentiation between replica and node that Mats applied to the Fabric manual versus my writing.)

Simplified ROWA scalability approximation

Every server (replica or node, not making a differentiation here for simplicitly) in a ROWA cluster has to handle:

  • L = its own local work (reads and writes) – productive work
  • R = remote work (because all writes affect all servers) – unproductive work

The processing capacity of every server i in a ROWA cluster is Ci = Li + Wi. The server has to handle productive local work and unproductive remote work. First, simple lesson: a server i of a ROWA cluster will always be “slower” than a standalone server. The standalone server has no extra unproductive work to do.

We can put the server in a ROWA cluster in relation to a standalone server to get an estimation about the scaleout capabilities of the system: scaleout = ∑i=1nLi / C. Looks worse than it is. It only sums up the values for all servers in the cluster.

Next, we need an estimate for the remote work. That’s the write load that affects all the servers in the cluster. There are two ways of to apply the writes on remote servers:

  • symetric update processing = all local writes are replayed at the remote servers
  • asymmetric update processing = remote servers apply writesets (the actual changes) instead of the operations

Reminds you remotely of MySQL Replication formats: statement-based vs. row-based vs. mixed? Yes, that’s roughly the real life counterpart to the theortical model.

For symetric update processing, we can assume: Ri = w * (n – 1) * Li. Skipping a few steps and cheating with using a nice book, we end up with: scaleout = n / (1 + w * (n – 1)). For asynmetric processing we shall make a guess how efficient it is compared to symetric. If, for example, we assume applying writesets takes 1/4 or fully executing write operations, then wo = 0.25. One ends up with scaleout = n / (1 + w * wo * (n -1)).

(You did not believe I would understand this, did you? That’s Mats’ or MySQL Cluster folks skill level not mine! I need books to cheat.)

The hard earned, easy to grasp graphics

Armed with those formulas you can make a rough estimation of replication scaleout limits. Below is the plot for scaleout = n / (1 + w * (n – 1)). That’s ROWA, symetric update, write ratio from w=0.0 (0%) to w=1.0 (100%) and 1 to 16 servers. Read loads scale perfectly. With only 10% write load, you reach a scaleout factor of 4 with some six machines. It requires six machines to handle the load four standalone servers managed to handle. If you double the number of servers to 12, you are not even reaching the processing capability of six standalone servers.

Scaleout symetic update ROWA

I was too lazy to plot a graph for asymmetric update processing. However, it only shifts the picture a bit as you already can tell from the formula: scaleout = n / (1 + w * wo * (n -1)).

Before you ask: the above hints the theoretical scaleout potential of Galera/Percona Cluster – strong on distribution transparency, not so strong on scaleout.

Partial replication

Sorry, dear developers, nobody can give you the best possible distribution transparency together with the best possible scaleout. However, if users are willing to trade in a bit of the distribution transparency, scaleout capabilities become dramatically better!

The solution is to allow that not every server must have all the data. If a data item has copies at only r<=n servers (nodes), then the remote work does not comes from n-1 but only r-1 servers. The formula for partial replication and asymmetric update processing becomes: scaleout = n / (1 + w * wo * (r – 1)). Here’s the resulting graph for w = 0.2 (20% writes) and wo = 0.25 (asymmetric update four times more efficient than symmetric update). Stunning, ain’t it?

Scaleout symetic update ROWA

Welcome to MySQL Fabric (limit: thousands of nodes) and sharding! Or, MySQL Cluster at a smaller scale (limit: tens of nodes but millions of transactions per second). MySQL Cluster is also using sharding. Horizontal partitioning (sharding) is one approach for partial replication. There are two major reasons for using sharding:

  1. single server cannot handle the amount of data
  2. single server cannot handle the write load

The price you, as a user, have to pay is on distribution transparency:

  1. you may have to choose a specific server using some rules
  2. some questions cannot be answered by a single server, you have to collect answers from many

The first one is cheap with MySQL Fabric: provide a shard key, compute which server to use, send your queries there. The second one is expensive. Very expensive, depending on your needs. Again, depending on your needs, this (2) is where NoSQL systems may beat Fabric big times. Stay tuned, I am not talking MapReduce here! Data model, queries and scaleout capability go hand in hand.

Quick recap on MySQL Fabric

As said, if you are new to MySQL Fabric, please read some introduction first. For example, MySQL 5.7 Fabric: Introduction to High Availability and Sharding. Below is a quick recap on the data model and the replication model employed by Fabric. As you can see, it is optimized to scale writes on sharded tables. Reads are only fast if they cover no more than one partition of a sharded table plus any number of global tables.

MySQL Fabric data model
  Global Group  
  All unsharded tables g1…gn  
Shard Group for tpartition:1   Shard Group for tpartition:n
g1…gn   g1…gn
tpartition:1   tpartition:n

Replication based on primary copy for High Availability and read scale-out. Write scale-out is added by using horizontal partitioning. Clients query a distributed (future) middleware (Fabric) for catalog information. Clients pick servers depending on user/application provided shard keys and catalog information.

MySQL Fabric replication details
  Global Group  
  Primary Copy cluster  
Shard Group 1   Shard Group n
Primary Copy cluster   Primary Copy cluster
Copy of global group   Copy of global group
Primary of tpartition:1   Primary of tpartition:n

Fabrics weak spot

In the worst case a Fabric user asks a question that covers all partitions of a sharded table. No single MySQL server in the farm can answer the question, no single server stores all the data. Worse, there is no single server in the farm that can understand your intention behind question and warn you that it can’t give you a meaningful reply! There’s no way for a server to know whether you intended to query all shards or the partition on one shard.

SELECT shard_key, other_column FROM t1 ORDER BY other_column
|   |   |
Shard 1   Shard 2   Shard 3
t1(0..1000)   t1(1001..2000)   t1(2001..3000)

Database theory knows about the challenge and there are several solutions to problem. The below list is roughly ordered by distribution transparency. Highest distribution transparency (“user-friendliness”) first, lowest last:

From a users perspective it would be best if there was a server that could answer all questions because it would have all data from all partial replication servers in one logical data model. It would be then the servers task to translate queries that spawn multiple partial replication servers (here: shards) in appropriate access pattern. This is more or less what MySQL Cluster does behind the scenes (docs, presentation). When using MySQL Cluster, sharding is 100% transparent on the SQL level. The big advantages of transparency is the biggest drawback. Users and sales forget all to easy that a lot of networking is going on that will slow things down.

Full replication query server using remote database links
(no local data materialization)
Logical t0..3000
CREATE TABLE t1all(...) ENGINE=SHARDING PARTITION_CONNECT=user:password@Shard1 PARTITION_CONNECT=user:password@Shard2 PARTITION_CONNECT=user:password@Shard3

(Using UNION something similar might be possible with ENGINE=FEDERATED.)
| | |
Shard1 Shard2 Shard3
t1(0..1000) t1(1001..2000) t1(2001..3000)

If done properly, this is a very nice approach. The server takes care of distributed transactions (MySQL Cluster, Spider: pessimistic, 2PC), the server takes care of all SQL and tries to use “push down conditions” to minimize the amount of data sent around. There are no major negatives but, maybe, MySQL lacks it (= time to develop) and product differentiation with MySQL Cluster.

Please note, this is not the best approach if the majority of your queries is accessing one partition only. For example, Spider, adds latency to queries that access individual partitions only. But that’s not the question here. Anyway, if I was a power-user considering the use of MySQL Fabric, I’d evaluate this option very thoroughly. Maybe, I would even extend Fabric to setup and manage such full replication servers and teach my clients to use them.

Fictional hybrid replication approach on top of Fabric
Global Group Full Replication Server
Queries on unsharded tables Queries spawning multiple partitions
Primary One or more servers
Copy1 Copyn
Queries on global tables and one partition per shard
Shard1 Shardn
Copy1 Copyn

Another hybrid solution that works out-of-the box would be using MySQL 5.7 multi-source replication to merge shards on server. Although this is easy to setup it has one obvious killer-disadvantage: data is physically materialized on one server.

Full replication query server
(using local data materialization)
t1(0..1000) t1(1001..2000) t1(2001..3000)
Shard1 Shard2 Shard3
| | |
Multi-Source Replication
Full replication query server

As said above, sharding is applied for two reasons: either size of an entity or write scalability limits. If sharding is applied to reduce the volumne of data a server can handle, how could one build one server that handles all the data… If sharding is applied mostly because of write scalability reasons, this is a low hanging fruit to solve the distributed query problem. Ignoring all questions about the lazy (asynchronous) nature of MySQL Replication and stale data, this is a very sexy approach: no expensive distributed transactions! It even fits the marketing bill. Use slave for OLAP is no new story.

SELECT shard_key, other_column FROM t1 ORDER BY other_column
Middleware or client library handling cross-partition queries
(local data materialization, not SQL feature complete)
|   |   |
Shard 1   Shard 2   Shard 3
t1(0..1000)   t1(1001..2000)   t1(2001..3000)

Finally, one could try to use a middleware or client library for gathering data from multiple partitions. This seems to be the road that Fabric plans to use. This approach will scale by client/middleware but there is no code one could reuse, for example, to process the ORDER BY clause in the example. Furthermore, the problem of materializing will remain. Problems have to solved for each and every driver again (read: different programming languages, no way to use one library for all). Most likely, none of the drivers will ever become as intelligent as a servers’ optimizer. I have no clue what the strategic goal is here.

Distributed transaction control is likely to be pessimistic including distributed locks and potential for deadlocks (a 2PC/XA classic, see slide 30 and following for details).

Are NoSQL data models any better?

There are many reasons why NoSQL solutions became popular. Considering the aspect of cloud and web-scale only, their biggest achievement might be reminding us of, or even inventing, data models that scale virtually indefinitely. Of course, partial replication is used… divide and conquer rules.

The first wave of major NoSQL systems was designed around key-value data models and sharding. The leading systems are BigTable, Dynamo and PNUTS with many open source projects that followed their ideas: HBase, Cassandra, Voldemort, Riak, [Big]CouchDB, MongoDB and many more.

Key-value table variations
Name Value structure Notes
Blob data model BLOB Value is uninterpreted binary
Relational data model Fixed set of columns. MySQL/SQL-92: Flat (scalar) values only (related: Fabric)
Column family data model Multiple sets of columns Flat values, wide columns no problem (related: MariaDB Dynamic Columns)
Document data model No column set restrictions. Values can be nested (related: N1NF/NF2), wide and sparse columns no problem.

Key-value systems restrict users to atomic key based access. Early systems offered no guarantees when a query spawned multiple values. Take a second and compare with Fabric. How weak is Fabrics’ weak spot?

What makes the query limitations in key-value systems a little less restricting from an application developers point of view is the logical entity a value can hold. Particularily a nested value, as in the document model, can hold multiple logical rows from several flat relational tables. Translated in Fabric speech: document = shard + relevant global tables + x. All MySQL has to offer here are some, limited JSON functions. On an aside: this is only one of many ways to argue why MySQL should have strong JSON/BSON support.

Simplified applications view on logical data model (see text)
  Fabric Document
Client protocol Binary (MySQL Client Server), Memcache (to some degree) Binary (vendor specific), Memcache (sometimes), HTTP (sometimes)
(Read) Question SELECT * FROM tn, gn
WHERE t.shard_key = <key>
(Write) Query UPDATE tn SET ... WHERE t.shard_key = <key>,
(requires use of distributed transaction to be atomic)
update(<key>, <doc>)
(single atomic operation)
Logical entity tn
+ global table gn

+ global table gn
+ any other table un respectively column

If you are not willing to give up a millimeter on ACID and other RDBMS strengths, stop reading. This is simply not for you. Listing RDBMS strengths for comparison is out of scope for this (already too long) blog post.

Second generation NoSQL NewSQL: co-location

There is a good number of NoSQL solution users today. In 2012 some market researchers predicted upto 25% market loss for MySQL within five years. I doubt those reasearchers included the second generation of NoSQL stores from around 2010 in their results but rather based their prediction on the then popular open source siblings of the 2007 systems. Thus, the 25% is about the simplistic key value model.

The second generation of NoSQL stores continues to aim keeping accesses local to a single node. I am speaking of node here, as partial replication continues to be a must. Second generation systems include but are not limited to ElasTras, H-Store, Megastore, Spanner and – yes – Cloud SQL Server (Microsoft SQL Azure backend). In general, I believe there is some swing back to stricter schemas and declarative SQL-like query languages, however, let’s look at data models only. Let’s consider only data models that are statically defined but none that adapt dynamically to the actual access pattern. To me, a dynamical data model seems out of reach with regards to Fabric.

  • hierarchical: tree schema
  • hierarchical: entity groups
  • [keyed] table group

There are three kinds of (logical) tables in a tree schema: primary tables, secondary tables and global tables. The primary key of the primary table acts as a partitioning key. Secondary tables reference the primary tables using foreign keys. Global tables are additional read-mostly tables available on all nodes.

Hierarchical: tree schema data model
Primary table
|   |
Secondary table
s1(kp, ks1)
  Secondary table
s2(kp, ks2)
Global table

A partition on a node stores all matching records of the primary (sharded) table and the corresponding records in all secondary tables that reference the primary through a foreign key constraint. The records of the tables that are frequently joined are stored together on one node. Plus, additional global tables. It may be possible to use Fabric in a similar way, however, it would be very uncomfortable, possibly complex, manual operation. What’s hot about this approach is that need for distributed transactions for write queries is likely reduced. Its likely that updates spawn primary and secondary tables on one node only.

Hierarchical: entity group data model
Root table
|   |
Child table
c1(keg, kc1)
  Child table
c2(keg, kc2)

The basic idea with entity groups is similar. There are root tables/entities and child tables that reference the root tables by foreign keys. Records that reference each other belong together and shall form an entity group. There’s no counterpart to global tables in this model. Again, there is no way to formulate the “those tables belong together” with Fabric.

The table groups data model allows users to define sets of tables that shall be co-located on a node. In their simplest form those sets may consist of arbitrary tables. The tables of a table group may or may relate to each others through foreign keys.

Keyed table group data model
Row group
t1(kt1, partition_keyc)
t1(kt1, partition_keyc)

Partitioning is applied to keyed tables groups. All tables in a keyed table group have a column that acts as a partitioning key. The partitioning key does not have to be the primary key unlike as in the two models above. All rows in a keyed table group that have the same partition key form a row group. Partitions contain sets of row groups.

The keyed table group model allows a bit more flexibility as neither foreign keys not primary keys have to be taken into account when grouping tables and creating partitions.

Pfft rambling… what’s the moral?

Having written all this, what’s the moral? Well, if you didn’t know before you should know now why partital replication is the key to massive scaleout. Adding partial replication to MySQL means opening Pandora’s box. Accesses spawning multiple nodes become extremly expensive. Fabric has no answer yet how this is to be adressed. One proposal is to move parts of the task towards the drivers. I’m working for the Connectors team and my take is “ouch, really!?”. There are alternatives: adding full replication servers to the farm is one.

As ever, when you hit a problem its good to look around and ask how competition solves it. NoSQL and NewSQL systems know about the issues of distributed transactions. To some degree they work around them and try to reduce the cases when distributed transactions are required. If anyhow possible, queries shall run on individual nodes only. The document model is one prime example. MySQL could bend itself towards the model with relatively low investments.

Second generation NoSQL stores suggest hierarchical models. Again, Fabric could learn a lesson or two – long term. Again, its about avoiding accesses that spawn multiple nodes.

Speaking of distributed transactions: this would make a great topic for an endless follow-up posting titled “Fabric transactions and data model…”.

Happy hacking!

@Ulf_Wendel Follow me on Twitter

In all fairness…

When I say anything that sounds like “MySQL designed”, I refer to the guys that made MySQL Fabric happen. But, there are countless customers that introduced sharding to the MySQL world long before. It is hard to find adequate wording – the end result is what counts! Replicated databases have been an active area of research since the 1980′s. What’s rather new are the data models that NoSQL systems use to attack the challenges of distributed transactions. What’s also rather new are the stunning WAN/VPN and routing options really big players like Google have.

MySQL 5.7: SQL functions for JSON

2013/10/09 - by admin - 4 Comments

For decades MySQL has considered BLOB a sink for data junk that didn’t fit the strictly typed SQL bill. As the outside world began voting for JavaScript, JSON and key-value/key-document stores, TEXT/BLOB was rediscovered as the only way to store JSON inside MySQL. But having no SQL support for the JSON serialization format, JSON remained junk for MySQL. Community developed SQL functions could not cure the issue. That said, MySQL 5.7 introduces SQL functions to work on JSON documents! MySQL slowly takes a different position on JSON.

From the labyrinth: labs.mysql.com

In the recent past, MySQL has developed a talent hiding feature previews at http://labs.mysql.com/. This is where you find MySQL Fabric, and this is where you find the JSON UDFs (user defined functions, pluggable SQL functions). Please keep in mind, all labyrinth treasures are of pre-production quality.

Visit Labs, click to download and install the JSON functions. If there is no binary for you, try a source build. The README guides you through the installation. Basically, it means copying the library that contains the pluggable SQL functions into MySQL’s plugin directory and loading the functions into MySQL.

CREATE FUNCTION json_valid RETURNS integer SONAME 'libmy_json_udf.so';
CREATE FUNCTION json_search RETURNS string SONAME 'libmy_json_udf.so';
CREATE FUNCTION json_extract RETURNS string SONAME 'libmy_json_udf.so';
CREATE FUNCTION json_replace RETURNS string SONAME 'libmy_json_udf.so';
CREATE FUNCTION json_append RETURNS string SONAME 'libmy_json_udf.so';
CREATE FUNCTION json_remove RETURNS string SONAME 'libmy_json_udf.so';
CREATE FUNCTION json_set RETURNS string SONAME 'libmy_json_udf.so';
CREATE FUNCTION json_merge RETURNS string SONAME 'libmy_json_udf.so';
CREATE FUNCTION json_contains_key RETURNS integer SONAME 'libmy_json_udf.so';

(Note: With my source build the resulting .so file got a name – libmy_json_udf.so – different from the one – libmy_json.so – assumed in the README.)

Putting expectations: writing about pre-production materials…

Let’s put expectations straight first. Whoever claims all downloads from labs.mysql.com are of highest, near production quality has probably never worked himself with the materials. Labs is for early feature previews. Quality can be accordingly. I think, this is fair. Any developer pushing work out as early as possible, given the constraints of a big US-based corporate company, deserves a “hurray!”. I can only ask you for providing user feedback as early as possible. Even if a quick test for one function shows the below result. Someone has been brave enough to push out code before the maximum amount of internal QA has been applied.

JSON JSON_VALID() JSLint Validator Comment
"a" Invalid Invalid Invalid. Must be object or array.
[1] Invalid Valid Valid. Array can be top level element.
["a"] Invalid Valid Valid. Follow-up error.
{"a":true} Valid Valid Object with pair.
{"a":false,} Valid Invalid Pair may only be followed by comma, if another pair follows.
{"a":null} Valid Valid Object with pair.
{"a":nuLL} Valid Invalid There is exactly one way to spell null: ‘null’.
{"a":True} Valid Invalid Follow-up error.
{"a":FaLSE} Valid Invalid Follow-up error.
{"a":12345678901234567890} Valid Valid Any number of digits allowed.
{"a":-0.12} Valid Valid Negative number.
{"a":-0.1234E+39} Valid Valid US national finance…
{"a":1.23e0001} Valid Valid
[1,{"a":"bc"}] Invalid Valid Follow-up error.
{"1":"d","2":{"a":"bc"}} Valid Valid Follow-up error.
{"a":"\z"} Valid Invalid Requires special handling.
{"a":"\u01"} Valid Invalid Must be four hexadecimal digits
{"a":[1,]} Invalid Invalid Unlike with objects, the wrong syntax is detected.

If you glance over the results, you may come to the conclusion:

  • This is pre-production with glitches
  • This is pre-production with smelly array handling

Putting expectations: TEXT/BLOB handling

As hinted in the introduction, MySQL’s support for TEXT/BLOB columns is not the best possible, but those are exactly the SQL column types one would use for storing JSON inside MySQL. Whoever says MySQL in 2013 likely speaks InnoDB when it comes to storage. InnoDB, for example, stores only a 768 bytes prefix of a variable-length column in the index. The rest goes off-page causing extra I/O. Please note, I/O could be pretty cheap once the working set has been loaded into memory.

If your JSON document requires more than 768 bytes, storage is not optimal. 768 bytes do not mean 768 characters. Possibly, you will be using CHARSET=utf8, means a character takes 1 to 4 bytes to store. However, you may also go for latin1 and have all unicode encoded upfront as \xxxx. In the worst case, 768 bytes means less than 200 characters before off-page storage happens.

Indexing is limited to the prefix. There is no function based index in MySQL to speed up searches on individual JSON document keys.

The PBXT storage engine from MySQL 5.1 times had pretty fancy BLOB streaming capabilities. If you wanted to do something like MapReduce in MySQL, you would probably love to see streaming supported to avoid materialization of intermediate results and to keep processing buffers small. Another use case for streaming, which is unrelated to JSON, are media files (video, audio).

All this does not mean MySQL becomes useless! Performance could still be pretty solid… – give it a try. I rarely show any figures because I have only one respectively two core notebooks available for testing.

Introduction to the MySQL JSON functions

The JSON UDF download contains SQL functions to search JSON documents and functions to modify JSON document. If a function wants you to define on which subset of the entire document it shall work, it is listed under “by key” in the below table. Otherwise, it is listed as driven “by value” (not very precise, however, I had no better idea).

by value JSON_SEARCH()
by value JSON_MERGE()

A function that works “by key” on a subset of a JSON document, usually has the prototype: JSON_SOMETHING(string json_document, string key [, string key...], other_parameter). The first argument passed to the functions is always the JSON document to work on. It follows a variable number of arguments that describe on which subpart of the entire document the function shall work. Having this string key [, string key...] arguments in the middle of the function signature is a bit odd. Some programming languages may forbid this for style considerations. However, depending on the function, further arguments may follow, such as the value to search for or to add.

The key syntax is the key to understanding…

Let’s use JSON_CONTAINS_KEY(string document, string key[, string key...]) to explore the syntax at the example of a most basic JSON document. The JSON document consists of an object with one member called “key”.


To check whether the document has a member called “key” you call JSON_CONTAINS_KEY('{"key":"value"}', "key"). Easy! Albeit: not beautiful.

The deeper your JSON document is nested, the longer the variable length string key [, string key...] part in your function call gets. To get down to the nested object from the document below you write:


mysql> select json_contains_key(
    -> "other_key_level_1", "another_key_level_2")\G
*************************** 1. row ***************************
"other_key_level_1", "another_key_level_2"): 1

From my short experience with the functions, nesting of arbitrary depth is supported. Means, the limit should be in the tens of tousands or the available amount of memory.

Keys and arrays

Keys and arrays work as expected. Array keys are zero-based. The first element in an array is accessed through the key “0″, the second element through key “1″ and so forth. Note, however, that you have to use strings.

mysql> select json_contains_key('{"key":[1]}', "key", "0")\G
*************************** 1. row ***************************
json_contains_key('{"key":[1]}', "key", "0"): 1

No secret key to candies…

It may sound a bit though but this is a case where you see the difference between a MeeTooSQL and a system that has had the time and resources to add syntactic sugar. With a bit of syntactic sugar, say “SELECT document.member[offset]” this could look much more appealing. On the contrary, I am talking syntaxtic sugar only! Syntactic sugar is really hard to add with todays MySQL. The MySQL parser is not modular, and at the hearth of the entire system, which forbids icing on the cake in the making. Not to speak of ignoring the SQL standard and the option to support such a notation in UDFs at the cost of parsing string arguments (over processing a **va_args list). Still, as a developer, … See also,
Searching document stores in 2013: from 1983 to SQL:2003 in a blink?

Searching for values

Searching JSON documents is most limited to exact match. There is no support for wildcards. Neither are regular expressions supported not fuzzy matching as with SQL LIKE. I am not aware of a straight forward way to do a case insensitive search. Converting the entire JSON document to upper or lower case and comparing with a correspondingly modified search value is no option as the conversion would also affect JSON object member names (your “keys”).

Here’s what the README, the only documentation available apart from the *.c[omment]/*.h[elp] files, says about JSON_SEARCH(): Searches for specified value in the document. Returns key path of the element which contains the value in reverse order or NULL if parsing failed.. I assume that there is no way to search for “keys” with a specific name.

Whatever, here’s my attempt to find “value” in the most basic document…


mysql> select json_search('{"key":"value"}', "value" )\G
*************************** 1. row ***************************
json_search('{"key":"value"}', "value" ): NULL

I don’t quite buy that there is no “value” in my writing. Maybe I got the syntax wrong?

mysql> select json_search('{"a":{"b":"c"}}', "c" )\G
*************************** 1. row ***************************
json_search('{"a":{"b":"c"}}', "c" ): NULL

Unfortunately, the result does not look any better. I must be missing something fundamental not only documentation, because:

Document JSON_SEARCH(what) Comment
{"key":"value"} value Not found
{"a":{"b":"c"}} c Not found
{"a":"b", "c": "d"} d Not found
{"a":"b","c":["2", "3"]} 2 Not found
{"a":[1]} 1 Found
Exact string match Supported
Regular expression Unsupported
Fuzzy wildcard Unsupported
User defined comparsion expression, e.g. for case-insensitive search Unsupported

Fast forward to a successful search. Upon success, the function returns a [...] key path of the element which contains the value in reverse order [...].

mysql> select json_search('{"key":[1]}', "1" )\G
*************************** 1. row ***************************
json_search('{"key":[1]}', "1" ): 0:key::

Note the key path returned: 0:key::. The interesting bit here is the double double dot ::. JSON_CONTAINS_KEY unveils that no magic key “::” exists. But, if it does not exist, why does JSON_SEARCH report it? There is no single root element in JSON documents, because JSON documents do not form trees, and there seems to be no need for a prefix.

mysql> select json_contains_key('{"a":[1]}', "::" )\G
*************************** 1. row ***************************
json_contains_key('{"a":[1]}', "::" ): 0

mysql> select json_contains_key('{"a":[1]}', ":" )\G
*************************** 1. row ***************************
json_contains_key('{"a":[1]}', ":" ): 0

Modifying data

It may be entertaining to point out bugs and quirks of a pre-production version, but what really matters is the feature set and direction. Search works, albeit limited to most basic exact string match. What about modifying data? Again, there are functions that work on parts of a document and one that does not require a key path.


Merges two or more documents into one. Returns first document with following documents appended.

But what exactly is a document? A document is anything that JSON_VALID considers valid. As argued above, JSON_VALID should consider both an object and an array as a valid document. That is {"key":"value"} respectively [1].

mysql> select json_merge('{"a":"b"}', '{"c":"d"}' )\G
*************************** 1. row ***************************
json_merge('{"a":"b"}', '{"c":"d"}' ): {"a":"b", "c":"d"}

Simple cases work flawless. But then, there are the more tricky ones such as merging and object with an array. This may be actually undefined and there may be no solution: under which member name should the array appear? Or, what if two objects are merged that have members of the same name at the same level (see table below).

The README continues stating about JSON_MERGE: If one of following documents does not contain an opening curly bracket first documents merged are returned and warning is generated. Reasonable, but…

mysql> select json_merge('{"a":"b"}', '"c":"d"}' )\G
*************************** 1. row ***************************
json_merge('{"a":"b"}', '"c":"d"}' ): {"a":"b"}
1 row in set (0,00 sec)

mysql> show warnings;
Empty set (0,00 sec)

And the last sentence from the README is: [returns] NULL if first document does not contain an opening curly bracket. Which tells me that “document” is a quite fuzzy description for the functions parameters. Assuming the function would work on documents (objects or arrays), a missing opening curly bracket (applies to object only) would be just one of many possible ways of passing invalid “documents” as input values. Whatever, test and fail:

mysql> select json_merge('"a":"b"}', '{"c":"d"}' )\G
*************************** 1. row ***************************
json_merge('"a":"b"}', '{"c":"d"}' ): "a":"b", "c":"d"}

JSON 1 JSON 2 Result Comment
{"a":"b"} {"c":"d"} {"a":"b","c":"d"} OK
{"a":"b"} {"c":{"d":"e"}} {"a":"b","c":{"d":"e"}} OK
{"a":"b"} {} {"a":"b", } Input: two valid objects, Result: invalid object
{"a":"b"} {"a":"c"} {"a":"b", "a":"c" } OK, but for serialization into native JavaScript variable questionable
{"a":"b"} {"a":"b"} {"a":"b", "a":"b" } OK, but again seems a bit questionable considering serialization into any native programming language object
{"a":"b"} [1] {"a":"b"} Somewhat undefinable result, error/warning missing
[1,2] [3,4,5] [1,2] Questionable, error/warning missing
{"a":"b"} "c":"d"} {"a":"b"} OK, but error/warning missing
{"a":"b"} {true: "c"} {"a":"b", true:"c"} Wrong, returns invalid JSON, error/warning missing
"a":"b"} {"c":"d"} "a":"b","c":"d"} Wrong, README says NULL shall be returned

The remainder

I’ll iterate over the remaining functions only very briefly. The story pretty much continues as it has begun.

It seems that JSON_APPEND can be used inject new data into an existing document: Inserts new element into JSON document. Returns document with appended element or NULL if parsing failed.. The syntax documented is json_append(doc, keypart1, keypart2, ..., new_element). When I first time read this, I looked pretty puzzled when I tried to append an element to the end of an array:

mysql> select json_append('{"a":[1]}', "a", "2" )\G
*************************** 1. row ***************************
json_append('{"a":[1]}', "a", "2" ): {"a":[1]}

The trick is that here new_value should read new_key, new_value. The correct syntax seems to be:

mysql> select json_append('{"a":[1]}', "a", "99", "3" )\G
*************************** 1. row ***************************
json_append('{"a":[1]}', "a", "99", "3" ): {"a":[1, 3]}

Note, that I used offset “99″. Any offset greater or equal to “1″ worked for me. Next try: appending pairs to a nested object:

mysql> select json_append('{"a":{"b":"c"}}', "a", "d", "e" )\G
*************************** 1. row ***************************
json_append('{"a":{"b":"c"}}', "a", "d", "e" ): {"a":{"b":"c", "d": e}}

Have you noticed? Invalid JSON returned…

Modifying: JSON_APPEND
JSON key part new_key new_value Result Comment
{"a":"b"} "a" "c" "d" {“a”:”b”} Nothing inserted, no error/warning
{"a":"b"} "a" "c" {“a”:”b”} Nothing inserted, no error/warning, unclear syntax description. Same with arrays.
{"a":[1]} "a" 99 2 {“a”:[1, 2]} OK, nice offset handling
"a":{"b":"c"}} "a" "d" "e" {“a”:{“b”:”c”, “d”: e}} Invalid JSON returned

All in all, it smells as if it was too early to blog about it. Instead of demoing how to use JSON with MySQL, I ended up in a life debug session for PlanetMySQL. It is not really hard to do the testing. JSON has a pretty simple gramma. Go and play with the productions of the gramma, add a bit of “first user, no docs”, and its easy to find yet another “gotcha”. Invalid JSON returned:

mysql> select json_replace('{"a":[1]}', "a", "b")\G
*************************** 1. row ***************************
json_replace('{"a":[1]}', "a", "b"): {"a":b

Taken from the README. For most functions manipulating or generating JSON documents is true: Warning! This version does not check whole document for validity. Hey, it is a labs.mysql.com pre-production release :-) .

  • json_append(doc, keypart1, keypart2, ..., new_element)
    Inserts new element into JSON document. Returns document with appended element or NULL if parsing failed.

    Note: json_append(doc, keypart1, keypart2, ..., new_pair_key, new_pair_value) might be a better description, see above.
  • json_contains_key(doc, keypart1, keypart2, ...)
    Checks if documents contains specified key. Returns TRUE if key exists, FALSE if not exists, NULL if parsing failed.
  • json_extract(doc, keypart1, keypart2, ...)
    Extracts value of the specified key. Returns value of the key specified, NULL if parsing failed.
  • json_merge(doc1, doc2, ...)
    Merges two or more documents into one. Returns first document with following documents appended. If one of following documents does not contain an opening curly bracket first documents merged are returned and warning is generated NULL if first document does not contain an opening curly bracket.
  • json_remove(doc, keypart1, keypart2, ...)
    Removes element specified by the key. Returns document without the element or NULL if parsing failed.
  • json_replace(doc, keypart1, keypart2, ..., new_value)
    Updates value of specified key. Returns document with replaced key or original document if no such an element found, NULL if parsing failed.
  • json_search(doc, value)
    Searches for specified value in the document. Returns key path of the element which contains the value in reverse order or NULL if parsing failed.
  • json_set(doc, keypart1, keypart2, ..., new_value)
    Performs kind of INSERT ... ON DUPLICATE KEY UPDATE operation.
    Returns document with updated or inserted element or NULL if parsing failed.
  • json_test_parser(doc)
    Returns text representation of parse tree of the JSON document, empty string if document is invalid. This function is supposed to use for tests only and should not be used in production.
  • json_valid(doc)
    Checks if doc is valid JSON document. Returns TRUE if document is valid, FALSE is document is invalid.

My initial take

The new set of SQL functions to work on JSON is a most welcome addition. BUT there are issues with the very first pre-production version published on MySQL Labs.

The list of functions available covers only most basic search and modification. Check the PHP manual array functions to get an idea of what developers can envision to do with hashes and have readily available as a built-in function! For example, there is no way to recursively search a document/hash using a user-supplied comparison function. This, however, does not bother me much: once there is a place for dumping new functions, it is only a matter of demand until someone adds them.

TEXT/BLOB columns holding JSON documents remain second class citizen to MySQL. Think off page storage, think indexing. Given MySQLs’ current pace, are you pessimistic about the future…?

Happy hacking!

@Ulf_Wendel Follow me on Twitter

Mir unbekannte Welten: DOAG SIG Development/Tools

2013/09/26 - by admin - 0 comments

Beim gestrigen Event DB-Programmierung: Oracle 12c für Entwickler der Deutsche ORACLE-Anwendergruppe (DOAG) SIG Development/Tools erfolgte ein Blick über den Tellerrand für Referent und Teilnehmer. Ich durfte über die verschiedenen Cluster Optionen für MySQL sprechen und die Sicht des PHP-Entwicklers andeuten.

Im Bereich verteilte Datenbanken ist MySQL mittlerweile sehr breit aufgestellt. Wer sich als DBA einen ersten Überblick verschaffen möchte, der kann die verschiedene Systeme mittels zweier Fragen grob einordnen (Folie 1-16):

  1. Können alle Transaktionen auf allen Replikas ausgeführt werden?
  2. Erfolgt die Konfliktlösung zwischen Transaktionen verschiedener Replikas sofort oder zeitversetzt?

Die Antworten bestimmen die grundsätzlichen Vor- und Nachteile der vier sich ergebenden Kombinationsmöglichkeiten. Es zeigt sich, daß MySQL in mindestens drei Feldern vertreten ist:

  • Lazy (wann: asynchron) Primary (wo: Master) Copy
  • Lazy (wann: asynchron) Update Anywhere (wo: “Multi-Master”)
  • Eager (wann: synchron) Update Anywhere (wo: “Multi-Master”)

Innerhalb dieser ersten Einordnung ist weiterhin nach vollständiger und partieller Replikation zu unterscheiden. Damit weitet sich die Auswahl bei MySQL auf:

  • Lazy (wann: asynchron) Primary (wo: Master) Copy
    • Vollständig
    • Partiell
  • Lazy (wann: asynchron) Update Anywhere (wo: “Multi-Master”)
  • Eager (wann: synchron) Update Anywhere (wo: “Multi-Master”)
    • Vollständig
    • Partiell

Wer hätte das von "der kleinen Webdatenbank" gedacht?

Je nach System ergeben prinzipbedingte Vor- und Nachteile. Insbesonders bei der beliebten MySQL Replikation kommen neue Aufgaben auf die Applikation zu (Folien 27 – 30). Der PHP MySQL Treiber versucht viele der Aufgaben zu abstrahieren und zu übernehmen, um die Entwicklung von Applikationen zu vereinfachen.

Im Rahmen eines DOAG SIG Development/Tools Vortrags sind die Details für den Oracle DBA eher unwichtig. Entsprechend lag der Fokus während des Vortrags auf den grundsätzlichen Fragen nach den verfügbaren Systemen und der Frage auf welcher Ebene ein Load Balancer eingesetzt werden sollte (Folien 31-45).

Wer MySQL als Cluster einsetzt, der findet hoffentlich in der Präsentation bei der Darstellung des PHP Treibers (Folien 45-85) eine Checkliste für die wichtigsten Aspekte beim Umgang mit einem beliebigen MySQL Cluster. Das Wort “PHP” sollte nicht abschrecken. Die Herausforderungen und Lösungsansätze sind sprachübergreifend. Und, natürlich führen viele Wege zum Ziel. Wer seit Jahren eine funktionierende Lösung hat, sollte sie weiterverwenden.

85 Folien?! Jede zweite ist eine Kommentierung dessen, was während des Vortrags sichtbar war. So wird der Vortrag auch ohne Referent online lebendig…

PHP Unconference Hamburg 2013: hungry for more

2013/09/23 - by admin - 0 comments

A day before the 7th edition of the PHP Unconference Hamburg the caretake of the venue called the organizers to inform them that their event cannot take place. The space would be needed as a polling station for the german federal elections… Eventually, the polling station was moved to the 7th floor and, well, business as usual: 330+ visitors, ~70 session proposals, 32 high quality sessions run, especially my own on MySQL 5.7 with built-in sharding ;-) , ~40 Euros entry fee for what costs 400+ Euros elsewhere!

Great, WOW, please more, but…

I am a huge fan of the PHP Unconference. Ever since the first edition the who-is-who of the national PHP community gathers at this family stlyle event. Very few events made it into this league from the start. Having a big name or being a sponsor, however, does not mean you make it on the session table as its the people that vote for sessions every morning. Only topics that really matter to the audience have a chance to make it. You can find photos of the session table at http://www.php-unconference.de/.


(See the end of the posting for a slideshow. Stefan has a really cool picture showing how crowded it is, if you make it to the event on a saturday morning – *yawn* – at: http://album.yabba.net/v/Stefan/PHPUnconference/2013/)

There are long breaks between the sessions to force people to discuss and exchange ideas instead of hunting from one session to another session in a huge conference hotel. The building of the University of Hamburg, where the unconference takes place, is just about perfect for the event. There’s a “bath tub” where 200+ people can sit surrounded by six lecture auditoriums.

The breaks are not the opportunity for discussions. When I came there on saturday, some looked amazingly tired and spoke about intensive discussions held in pubs at the Reeperbahn the night before. Well, I hope, they all have been fit enough to join the official, meanwhile traditional free drinks PHPProst event that took place on saturday evening.

Did we all get a bit lazy?

All in all the event does not need to hide. The unconferences continues to compare well with much more expensive business conferences and – due to the low ticket price – is very attractive to students.

But over the years, some bad habits crept in as well. Tickets are rare and very hard to get. The event is usually sold out in days or even hours, because its such high quality. If you know you cannot make it, please give your ticket to someone else. I wish the organizers had a place for controlled ticket trading on the events web site.

In the early years of the event it was common to run a feedback session at the end of the unconference. Everybody was asked to write down positive as well as negative impressions. Probably, many of you do retrospectives on a regular basis as you know the value of them. This year there was only one lonely “well done” or the like on the cupboard. With all the retrospectives experts around we should be able to do better.

The sessions are hot. This year its been a very, very close race among the top proposals. You had to get like 30+ votes to make it (each participant has four votes per day). I missed some fun sessions. Sure, with all the experts being around to present the cutting edge topcis and all the presentations suggested on stuff that matters to solve pressing day to day problems, its hard to get through with fun sessions or excotic topics. Having Thomas talk about Arduino with PHP seemed the maximum possible.

I wish there had been more. Unfortunately I have nobody but myself to blame. Unconferences are driven by their audience. Keep going there, keep the standards high for next years 8th edition.

Maybe I come up with a nice idea for PHP Unconference Europe 2014/ myself. The place where it takes place deserves a special show. #PHPucEU will take place on the island of Majorca – probably in May 2014. Can you think of a better place? Majorca is affordable, sunny, easy to travel to, …

2013 Sponsors & Supporters

No such community event is possible without sponsors and organizers (BOOT – Best Of Open Technologies e.V.). Thanks, everybody!

Thanks, Galileo computing: I won a book: “CouchDB – Das Praxisbuch für Entwickler und Administratoren“. Next time I meet Till in Berlin, I’ll ask him to sign it.

2014 – looking for help

The organizers are looking for helping hands to make the event happen again in 2014. If you like Open Source, you happen to be in Hamburg and you have some spare time, please, drop them an email: orga@php-unconference.de.

Happy gathering! Your Obi-Wan Kenobi*

* Upon arrival, my badge seemed lost. Instead of searching for a long time they gave me one with the name Obi-Wan Kenobi on it. Well, why not. The wrong date on it allows me to claim one day after the event that this is live blogging…