Ulf Wendel

PECL/mysqlnd_ms: how failed XA transactions get fixed

2014/07/29 - by admin - 0 comments

XA transactions are an open standard for distributed transactions. A distributed or global transaction can spawn multiple databases. XA transactions come handy, for example, when data sets are sharded over multiple servers and a business transaction affects multiple shards. The free MySQL Fabric tools help to setup and manage a sharded MySQL cluster. The development version of PECL/mysqlnd_ms 1.6 helps with XA transactions. It abstracts SQL details and acts as a transaction manager. The PHP MySQL driver kicks in when things go wrong and XA transactions show their nasty side: blocked servers. Good news: this is a rare case. Bad news: a deep dive below.

Grant all Ulf’s a 10% discount (or none of them = use a transaction)
MySQL shard A: EMEA customers MySQL shard B: US customers MySQL shard C: APAC customers

/* Note: Fabric and XA don't go together - yet... it's coming! */
$link = 
new mysqli("mysql_sharding_cluster", "user", "password");

mysqlnd_ms_xa_begin($link, 1);

mysqlnd_ms_fabric_select_shard($link, "shop.customer", "EMEA");
$link->query("UPDATE discount = 10 WHERE first_name = 'Ulf');

mysqlnd_ms_fabric_select_shard($link, "shop.customer", "US");
$link->query("UPDATE discount = 10 WHERE first_name = 'Ulf');

mysqlnd_ms_fabric_select_shard($link, "shop.customer", "APAC");
$link->query("UPDATE discount = 10 WHERE first_name = 'Ulf');

mysqlnd_ms_xa_commit($link, 1);

PECL/mysqlnd_ms as a transaction manager

XA transactions use the two-phase commit protocol, which is a blocking protocol. Please, see also my previous blog post on the nature of the protocol and MySQL implementation limitation. If the client that drives the XA transaction, your PHP script, crashes at a certain point, some XA participants (MySQL servers) cannot make any progress. In the worst case, they end up waiting for a decision on the global transactions outcome endlessly. No, there is no timeout. As they wait, they block resources. That can be memory used for the transaction or some lock on some table.

Blocked during the second phase of the 2PC/XA protocol
PHP (coordinator) MySQL (participant) MySQL MySQL
–> Global commit  
Crash  
  Comitted Uncomitted: waiting for global commit or rollback

Any serious user of XA transactions will therefore have to implement some mechanism that ensures progress in case of crashes. After a crash, it must be possible to learn which participant is blocked, connect to the participant and tell it to either commit or roll back the open transaction. This housekeeping job is rather annoying, yet important. PECL/mysqlnd_ms can do it for you, it can act as a transaction manager. (On an aside: the academic world does distinguish between a transaction manager and coordinator. I am using the terms interchangeably here.)

MySQL as a state store to track XA transactions

Upon request, PECL/mysqlnd_ms can record the state of each global transaction in a MySQL database. Should your PHP script (and with it PECL/mysqlnd_ms) crash or be interrupted in another way with an XA transaction being unfinished, then the next PHP script that runs can check the database and “garbage collect” the unfinished global transaction. The “next” PHP script could be run on the same server or another one, as long as all servers use the same MySQL database to track XA transactions, the garbage collection will work flawless.

The illustration below shows a scenario with two PHP servers that use PECL/mysqlnd_ms. A script on one of the servers runs and begins a distributed (XA) transactions that involves two MySQL servers M1, M2. As you can see, a simple sequence of mysqlnd_ms_xa_begin(); ... ; mysqli_query(); mysqli_query() causes a lot of background activity inside PECL/mysqlnd_ms.

Tracking XA transactions for “garbage collection” to avoid blocking situations
PHP server A PHP server B
Your code Inside PECL/mysqlnd_ms
mysqlnd_ms_xa_begin(id = 1)
  • ensure there is no open transaction
  • optional: record new XA trx in state store
… /* select some server M1 */ …
mysqli_query(query)
  • optional: if new participant, record participant in state store
  • inject XA BEGIN on server M1
  • update participant state to XA BEGIN sent
  • optional: record new state in state store
  • execute query on server M1
… /* select some server M2 */ …
mysqli_query(query)
  • optional: if new participant, record participant in state store
  • inject XA BEGIN on server M2
  • update participant state to XA BEGIN sent
  • optional: record new state in state store
  • execute query on server M2

PECL/mysqlnd_ms does all the things that you would have to do in your script manually otherwise. It issues the appropriate SQL commands for XA transactions on the participating MySQL servers. And, optionally, it can record the participants and their state in a state store.

PHP server A PHP server B
Your code Inside PECL/mysqlnd_ms
mysqlnd_ms_xa_commit(1)
  • optional: record intent to commit in state store
  • inject XA END on server M1
  • optional: record new state in state store
  • inject XA PREPARE on server M1
  • optional: record new state in state store
  • inject XA COMMIT on server M1
  • optional: record new state in state store
  • inject XA END on server M2
  • optional: record new state in state store
  • inject XA PREPARE on server M2
  • optional: record new state in state store
  • unexpected crash

In case of an interruption, it is possible to reconstruct the state of the XA transaction based on the records from the state store. Using a state store is optional, however, if you don’t configure PECL/mysqlnd_ms to use one, it is your responsibility to ensure that no MySQL server ever gets blocked. Please, note how often the state store gets queried. Using a state store adds quite some work to the system but it is necessary to play safe.

PHP server A PHP server B
Your code Inside PECL/mysqlnd_ms
Script ends
  • Rollback open XA transactions, if any and rollback_on_close is set (default)
PHP internal shutdown: RSHUTDOWN
  • Decide whether to do background garbage collection based on probability setting
  • Search for up to max_transactions_per_run unfinished XA transactions in state store
  • Do at most max_retries GC runs attempting to fix a certain failed XA transaction

Issues that require no garbage collection

Most errors around XA transactions do not require special garbage collection. Simple cases are handled immediately by PECL/mysqlnd_ms and end up with the transaction being rolled back. Errors during the first phase of the XA two phase commit protocol are uncritical. No matter whether the error is caused by a server crash, a client/PHP crash, the client loosing its connection to one of the MySQL servers participating in the global transaction, or a participant reports a simple SQL error, the XA transaction will be properly rolled back.

Participants that got disconnected or recover from a crash forget about XA transactions that have been in their first phase at the time of the error. Assume, you have started an XA transaction in which three MySQL servers A, B, and C participate. During mysqlnd_ms_xa_commit(), PECL/mysqlnd_ms steps the participating servers through the commit protocol and issues the SQL commands: XA END (XA_ACTIVE state), XA PREPARE (XA_PREPARED state), XA COMMIT (XA_COMMIT state). For each step, the list of participants is iterated.

Uncritical, first phase errors during mysqlnd_ms_commit()
MySQL A MySQL B MySQL C PHP using PECL/mysqlnd_ms
XA_IDLE mysqli_query(<some query>)

  • Pick appropriate server for some query/code>
  • Inject XA BEGIN on server
  • Run some query on server
XA_IDLE XA_IDLE XA_IDLE Assorted queries run on all participants
XA_ACTIVE XA_ACTIVE XA_IDLE (XA END failed) mysqlnd_ms_xa_commit()

  • On all participants: change participant state from XA_IDLE to XA_ACTIVE (= run XA END)
  • If error, try rollback on all participants, including failed one
  • Reason of the error on participant 3 (or during rollback on other participants…
    • Client crashed or got disconnected: safe to ignore, server has dropped XA transaction
    • Server crashed: safe to ignore, XA transaction has been dropped (2PC: first phase)
    • SQL error: rollback performed

Let there be an error with some participant around XA END. Upon the error, the plugin rolls back the global transactions. Effectively, it continues stepping the servers through the states but this time with the goal to send XA ROLLBACK. No matter what made XA END fail in the first place, the overall issue will most likely be solved. Should PHP have crashed, during XA END, all the servers notice it and drop the XA transaction. Should PHP have lost its connection to a server, it is the same: the server drops the XA transaction. Should one MySQL have gone away, it will drop the XA transaction upon recovery. Because PHP is still working, PECL/mysqlnd_ms tells the other servers to do the same and issues a rollback.

Whatever goes wrong prior to any server reaching XA_PREPARED (= second phase of the two phase protocol), a rollback happens and no garbage collection is required. You are safe even without configuring a state store for garbage collection!

When garbage collection becomes necessary

Any troubles that PECL/mysqlnd_ms has committing a global transaction after any participant entered the XA_PREPARED state, may require garbage collection. But, there are sill cases that can be ignored and require no attention.

Assume the below case where the global transaction has been prepared on servers A and B but XA PREPARE fails on the third participant, server C. Should PHP crash, it’s safe to ignore the case. A and B will see the client disconnect and rollback the transaction, so does C.

Should PHP have lost a connection to any of the servers, the disconnected MySQL server forgets about the XA transaction, even if it is in XA_PREPARED state. And, any participant that can still be reached will receive the rollback command from PECL/mysqlnd_ms.

Second phase errors during mysqlnd_ms_commit()
MySQL A MySQL B MySQL C PHP using PECL/mysqlnd_ms
XA_PREPARED XA_PREPARED XA_ACTIVE (XA PREPARE failed) mysqlnd_ms_xa_commit()

  • On all participants: change participant state from XA_IDLE to XA_ACTIVE, then to XA_PREPARED
  • If error, try rollback on all participants, including failed one
  • Reason of the error on participant 3 (or during rollback on other participants…)
    • Client crashed or got disconnected: safe to ignore, server has dropped XA transaction
    • SQL error (but no server crash): rollback performed

However, should a participating server crash when it is in XA_PREPARED state or failed to switch to XA_COMMIT after the global transaction has been committed on any of the other participants, PECL/mysqlnd_ms leaves the issue to be solved by the garbage collection. The plugin will neither wait until a crashed server possibly returns nor will it retry a failed XA COMMIT command.

A case during mysqlnd_ms_commit() that needs garbage collection
MySQL A MySQL B MySQL C PHP using PECL/mysqlnd_ms
XA_COMMIT XA_COMMIT XA_PREPARED mysqlnd_ms_xa_commit()

  • On all participants: change state to XA_PREPARED (success)
  • On all participants: change state to XA_COMMIT (fails on C either due to SQL error or server crashed)
  • If error (as its the case on C), wait for garbage collection

Please note, that by default the plugin takes no immediate action. It leaves you with a yet to be completed global transaction. In the example, the failed transaction would already be committed on participants A and B but C is blocked waiting for a decision. Should the MySQL server C be online, it can, of course, accept clients and run transactions but all resources associated with the failed transaction are still held and blocked.

Transactional databases are designed with the assumption that transactions rarely fail. To get into a situation with a blocked server (above example), a rare transaction failure must happen. And, it must happen at a certain point. Means, the example is a rare case. If you are still concerned about this rare case and want to see it solved immediately – if anyhow possible – do not wait for automatic background solution but invoke the garbage collection manually.

[...]
if (!mysqlnd_ms_xa_commit($link, $xa_id)) {
  /* optional: trigger GC immediately */
  mysqlnd_ms_xa_gc($link, $xa_id);
}

A MySQL limitation: replication troubles

MySQL has some XA restrictions but one really hurts. Assume the example with three participants A, B and C. MySQL servers A and B have committed a XA transaction, C crashes in XA_PREPARED state. Upon recovery, the PECL/mysqlnd_ms garbage collection will commit the XA transaction on C because the transaction is already partially committed on A and B.

Replication troubles
MySQL A MySQL B MySQL C
XA_COMMIT XA_COMMIT XA_PREPARED
Changes in binary log Changes in binary log Crash
Recovery
PECL/mysqlnd_ms GC: XA_COMMIT
Changes not in binary log
| | |
Slave of A gets changes Slave of B gets changes Slave of C looses changes

Trouble is, C does not log the changes made by the XA transaction in the binary log. A slave of C will not receive the updates. The master and the slave will go out of sync. With MySQL Fabric and sharding being a prime use case for XA transactions this is a severe server limitation. MySQL Fabric makes heavy use of MySQL Replication to achieve high availability of shards.

To play really safe: use XA only for read only transactions

The lost change set problem leads to the recommendation to use XA transactions for read-only operations only, if you want to be 100% safe. Should you not be using MySQL Replication, write operations are safe too.

Configuring a state store for garbage collection

For PECL/mysqlnd_ms to be able to solve failed XA transactions that cannot be simply rolled back immediately, you must configure a state store. Version 1.6 of the plugin supports only one state store: MySQL. The code has been layed out to allow for other state store, but hey, why look further? MySQL is readily available (otherwise you would not use a MySQL driver…) and it will not loose data during a crash.

{
  "myapp": {
    "xa": {
      "rollback_on_close": 1,
      "state_store": {
        "participant_localhost_ip": "192.168.7.13",
        "mysql": {
          "host": "192.168.7.12",
          "user": "root",
          "password": "",
          "db": "test",
          "port": "3312",
          "socket": null
        }
      }
    },
    [...]
  }
}

Configuring the state store in the plugins configuration file is straight forward, see above. Note the participant_localhost_ip setting. As explained, the state store will hold records of all participants. If a participant is to be recorded that has a host name of 127.0.0.1, then 127.0.0.1 has a special meaning to the host that writes the record. Should, later on, garbage collection be required and the garbage collection be run on a different host than the one who wrote the 127.0.0.1 record, the garbage collection needs to know the actual IP associated with the original 127.0.0.1.

{
  "myapp": {
    "xa": {
      "rollback_on_close": 1,
      "state_store": {
        "participant_localhost_ip": "192.168.7.13",
        "mysql": {
          "host": "192.168.7.12",
          "user": "root",
          "password": "",
          "db": "test",
          "port": "3312",
          "socket": null
        }
      },
      "garbage_collection": {
        "max_retries": 3,
        "probability": 0,
        "max_transactions_per_run": 100
      }
    },
    [...]
  }
}

With a state store configured, you can also change the defaults for the background garbage collection. The probability setting determines how often the garbage collection will be called during RSHUTDOWN. Allowed values range from 0...1000. During RSHUTDOWN, the plugin calcualtes a random value between 1...1000. If the probability setting is higher or equal to the random value, the background garbage collection will start. max_retries and max_transactions_per_run have been explained above and are also listed in the manual.

Testers wanted

The XA related code has been lab tested, however, I do not plan to call it anything more than beta prior to hearing back from actual users. As you saw, the garbage collection is a rather tricky piece of code. It should work, give it a try… and let us know whether it works.

Happy hacking!

@Ulf_Wendel Follow me on Twitter

PECL/mysqlnd_ms: Distributed Transaction/XA support coming, e.g. for MySQL Fabric?

2014/06/13 - by admin - 3 Comments

The development version of PECL/mysqlnd_ms now has an very early version of distributed (2PC/XA) transaction support tailored for use with clusters of MySQL servers. XA transactions can span multiple servers in a cluster to ensure transaction guarantees among them. In real life, there is always, at some point, a sequence of work on your data for which atomicity, consistency, isolation, and durability must be given. The early generation of NoSQL sharding solutions has a weak spot here. How do you, as a developer, ensure a logical operation affecting two shards is either applied to both or none? Either you don’t at all, or you hack and pray, or you start reading lecture books on transactions, or you grow your documents to avoid cross-shard operations. MySQL Fabric, our own sharding solution has a gap here too. However, let three new API calls do the troublesome part for you: mysqlnd_ms_xa_begin(), mysqlnd_ms_xa_commit() and mysqlnd_ms_xa_rollback().

Any sharding cluster: consistent work across shards?
Any task that involving more than one shard, for example: copy customer information into order and update order counter stored with customer in one step. Or: update the email of a user in all places – forum users, orders, customers.
Shard 1 – Primary   Shard 2 – Primary
Customers: US residents only Customers: rest of the world
Forum users: all Orders: all
| | | |
Copy Copy Copy Copy

Massive news but where are we on distributed transactions?

The current XA support of the PHP Replication and Load Balacing plugin for mysqlnd is hardly any good – yet! Still, we sort of have a tradition of explaining pain point and feature gaps early. MySQL 5.6/5.7 is on the track learning NoSQL lessons (protocol and query language flexibility, nested data, replication experience), MySQL Fabric is now GA and brings us Sharding and automated High Availability outside MySQL Cluster. Fabric abstracts managing shards, shuffling data around during resharding and telling a client where to find data. Also, the core server improves and some massive pain points from when we started (properly detecting transaction boundaries, reliably finding up-to date replicas) are being addressed.

But: distributed transactions… Lets keep the focus narrow and talk XA only, let’s forget about cross shard joins here. Even if the PHP MySQL APIs had proper XA support (and, of course, it will be so in no more than two hours ;-) ), we would hit MySQL limitations. Hence, time to stark the barking.

How it shall look in code… one day

That warning given, let’s talk code. For the “copy customer information into order and update order counter stored with customer in one step” task from the introduction, your code looks like this, if you want transaction guarantees:

/* Connect to cluster: mysqli, PDO_MySQL, ... whatever */
$link = mysqli_connect('my_sharding_cluster', ...);

/* Begin distributed transaction */
mysqlnd_ms_xa_begin($link, $xa_id);

  /* Do something on your shards */
  mysqlnd_ms_fabric_select_shard($link, "shop.customers", "us_123");
  $link->query("SELECT * FROM customer 
                        WHERE key = 'us_123'");
  ...
  $link->query("UPDATE customer SET 
                  order_counter = order_counter + 1 
                   WHERE key = 'us_123'");
  mysqlnd_ms_fabric_select_global($link, "shop.orders");
  $link->query("INSERT INTO order(...) VALUES (...)");

/* End distributed transaction */
mysqlnd_ms_xa_commit($link, $xa_id);

Remove mysqlnd_ms_xa_begin(), mysqlnd_ms_xa_commit() and you go without distributed transactions. Your choice! Should there be a series of failures at the right time, you get logically inconsistent data without the distributed transaction. With MySQL, invidiual actions will still be consistent as we default to InnoDB these days. InnoDB is nowadays faster than the non-transactional MyISAM from the anchient days of MySQL, and very much so under load.

A word on XA/2PC to illustrate the convenience PECL/mysqlnd_ms brings

The XA specification has 94 pages. The (My)SQL world folds this into six XA related SQL commands. XA follows the two-phase commit protocol. The protocol operates in two rounds and has two distinct players. The players are a coordinator and the participants. In the first phase the coordinator asks all participants whether they are ready to commit some previous work carried out on them.

Two-phase commit (2PC) in XA: first phase
Coordinator Participant Participant Participant
–> Vote request  
–> Vote request  
–> Vote request
<– Pre commit  
<– Pre commit
<– Pre commit

A participant that replies to the coordinators vote request with a pre commit makes the firm promise that the work done will not be lost even the case of a temporary failure of itself, e.g. when it crashes. But, the pre committed work does not yet become visible to others. Should all participants reply positively to a vote request, the coordinator sends a global commit, which makes the work visible to everybody.

Two-phase commit (2PC) in XA: second phase
Coordinator Participant Participant Participant
–> Global commit  
–> Global commit  
–> Global commit

Failures of participants are no problem for the protocol. The coordinator can use timeouts to detect unresponsive participants and either retry after the participant is recovered or send a global rollback to the rest. But should the coordinator crash in the course of informing participants of a global rollback or global commit decision, the protocol becomes a blocking one. The participants wait for instructions.

Two-phase commit (2PC) in XA: blocking protocol
Coordinator Participant Participant Participant
–> Global commit  
Crash  
  Comitted Uncomitted: waiting for global commit or rollback

If you call mysqlnd_ms_xa_begin(), PECL/mysqlnd_ms acts as a coordinator. It must handle all the state transitions of the participants, the possible failures of participants and its own failure as a coordinator.

Should you have become curious about 2PC or the challenges of distributed transactions… MySQL Cluster is using two-phase commit internally (2PC), see also the presentation DIY: A distributed database cluster, or: MySQL Cluster or the really in-depth presentation Data massage: How databases have been scaled from one to one million nodes, which takes you from early distributed NoSQL to the very latest developments.

What PECL/mysqlnd_ms does for you, what it hide, where its limited

As said, there are six SQL commands related to XA/2PC:

  • Phase 1 related
    • XA BEGIN – mark the beginning of a unit of work on a participant, telling the participant what to vote about
    • XA END – mark the end of the unit of work on a participant
    • XA PREPARE – the pre commit
  • Phase 2 related
    • XA COMMIT – global commit
    • XA ROLLBACK – global rollback
    • XA RECOVER – error handling: list pre commits

PECL/mysqlnd_ms further compresses this to three API calls. The calls hide issuing the SQL commands on the participants and all the steps a coordinator has to perform. Some features are lost on the way, but convenience is won.

The loss of features is not critical: as a general rule we strive to provide you with a way to overrule any automatic actions, go down to the lower layers and handle anything on your own. The features you loose are choice when participant failures happen and support for other XA participants but MySQL servers. Upon any participant error the automatic action is rollback. Future versions may lift this limitation, I am blogging about pre-alpha development version. At this point, I also have not bothered about any other XA participants but MySQL servers. The XA specification describes distributed transactions for any system implementing a certain API. That could be a database server, a web service or a moon rocket. Other RDBMS also feature SQL commands for XA so that you could have Microsoft SQL Server, IBM’s DB2, Oracle, Postgres and MySQL jointly working on a distributed transactions. If you need that, use the SQL commands directly.

I have had MySQL Fabric in mind, the sharding scenario. Fabric manages farms or clusters of MySQL servers, nothing else. And, MySQL should not loose one of its strength when combined with Fabric: transactions, should you need them.

PECL/mysqlnd_ms as a transaction coordinator

Should you not fear the risk of blocked servers when the coordinator crashes, you can use the new functions straight away.

On mysqlnd_ms_xa_begin(), PECL/mysqlnd_ms will first try to find out whether you are in the middle of a local transaction. Local transactions and global transactions (XA) are mutually exclusive. To detect local transaction boundaries, PECL/mysqlnd_ms monitors all API calls related to them, e.g. mysqli_begin_transaction(), mysqli_autocommit(). It does not monitor SQL commands, such BEGIN TRANSACTION, though. Details are described in the manual. Good news is: mid term the MySQL server will announce transaction boundaries and we can make things bullet proof. Should you not be in the middle of a transaction, it remembers the so-called gtrid (global transaction identified) given in the second parameter: bool mysqlnd_ms_xa_begin(mixed connection, int gtrid). At the time of writing, gtrid is massively limited – I haven’t finished my SQL related C code. That’s for sure a temporary limitation.

Then, as you continue PECL/mysqlnd_ms transparently injects XA BEGIN gtrid on every shard/node that you run a query on. When doing so it also calculates a bqual (branch qualifier) but does not use it yet – for the same reason as above: lazy me, early blogging…

Built-in garbage collection to cover coordinator crashes

I assume most users talking transactions want to see a coordinator crash covered too. This is done by optionally recording all state changing actions in a persistent store that survives a crash. Should the coordinator – your PHP process/script – crash, another process/script can do a garbage collection run to get the participants into a defined state: rollback, clean up.

Garbage collection is built-in. It is done the traditional way: wait for the next PHP process/script to begin its work and run the garbage collection based on a probability value. Very much as PHP sessions or PECL/mysqlnd_qc (query cache plugin) does it. An additional timeout can be configured to decide whether a recorded global transaction is and its coordinator are still active or they should be considered crashed. Had I invested more brain-power already, maybe, the timeout would not be need. Key is: its automatic.

Note that for the garbage collection to work, one needs to record information how to connect to the participants. Recording of user name and password has been made optional and the special meaning of localhost has been thought of. Should the GC be run on a different host but the one who wrote a record with host=localhost, we need to know to which IP localhost refers. Some things are covered and tested – but, this is pre-alpha…

The plugin supports using any data store as a backing store, but only one storage backend is implemented: MySQL. MySQL is transactional, survives crashes and after all we are talking about managing MySQL farms here.

mysqlnd_ms_xa_commit(), mysqlnd_ms_xa_rollback() – note the limitation

Upon mysqlnd_ms_xa_commit(), the plugin becomes chatty. It tells all participants that the work is done (XA END) and gathers pre commit replies (XA PREPARE) to decide on global commit or global rollback. Should all participants give a positive reply, it does send a global commit (XA COMMIT) as requested and mysqlnd_ms_xa_commit() indicates success.

The emergency break is pulled in case of any error with the participants: rollback as much as we can, leave any possible rest to the garbage collection. There is currently no way for the user to react to low-level errors and, for example, do a reconnect to a participant. To handle such cases, the user would need to start handling the connections to participants itself. Any API for doing this would be more complex than just doing everything yourself using SQL and plain vanilla connect()/query() calls! (These lines are being written in early June 2014 – check the manual for updates should you read this later. Some tweaks are likely to happen.)

The function mysqlnd_ms_rollback() also ends the XA transaction on the participants but always sends out a global rollback.

The sad truth: not yet with Fabric, plugin refactoring needed…

Before I can close and move a black jack to the overloaded server folks, well, yeah: it doesn’t work together with Fabric yet. Our fault.

The reason is an implementation detail of PECL/mysqlnd_ms, which is not visible to you. The plugin does not have any dedicated connection management module. We keep a list of all servers of a cluster. When we choose a server for you, say the read-write splitter tells us to use a master, we open a connection and associate it with that server list. When you use Fabric and call mysqlnd_ms_fabric_select_shard()/mysqlnd_ms_fabric_select_global(), as in the example from the beginning, the whole list is replaced. With it, the connections are gone. If there was a participant connection in the list… *ouch*. This is also the reason why I think the state alignment feature is broken when using Fabric.

Fixing this, using a simple option, takes days. However, Andrey, Johannes and I need to find a time to discuss our options. Documenting XA, which I have not done yet, also takes days.

The sad truth: our server has limitations too…

Any server change take way more time for various reasons. The server is more complex, server code plays in a different league, there is little chance of saying we just did it that way, server developers are a rare species and the server has a different release cycle. The server manual reads:

If an XA transaction has reached the PREPARED state and the MySQL server is killed (for example, with kill -9 on Unix) or shuts down abnormally, the transaction can be continued after the server restarts. However, if the client reconnects and commits the transaction, the transaction will be absent from the binary log even though it has been committed. This means the data and the binary log have gone out of synchrony. An implication is that XA cannot be used safely together with replication.

In other words: if you don’t mess around, if you make the emergency break – rollback – approach, as PECL/mysqlnd_ms does, over any attempt to recover, there is a chance it works depite not being bullet proof. I can only hope any server plugin toying with XA keeps an eye on this. The manual then goes on:

It is possible that the server will roll back a pending XA transaction, even one that has reached the PREPARED state. This happens if a client connection terminates and the server continues to run, or if clients are connected and the server shuts down gracefully. (In the latter case, the server marks each connection to be terminated, and then rolls back the PREPARED XA transaction associated with it.) It should be possible to commit or roll back a PREPARED XA transaction, but this cannot be done without changes to the binary logging mechanism.

Again, make the emergency break policy your default, and it may become acceptable. At least for coordinators and implementations tailored for MySQL Fabric. If that’s all the restrictions, and no major bug is lurking around then its time for: how to use distributed transactions with MySQL Fabric sharding ;-) .

For proper XA support, as it may be needed by JDBC, these restrictions could be a deal breaker.

Happy PHP hacking!

@Ulf_Wendel Follow me on Twitter

Background: The PHP replication and load balancing plugin

Happy birthday! Three years ago we pimped all the PHP MySQL APIs to make using any sort of MySQL cluster easier. We began developing a plugin which does all the annoying jobs that clusters may burden developers with: transaction aware read/write splitting if needed, load balancing with the option to assign weight to a replica, connection state alignment when switching servers, automatic client failover, user defined filters/pipes for replica selection, an abstraction for requesting eventual consistency, session consistency (defined as read-your-writes) and strong consistency with a single API call. That is 90% of the infrastructure one needs to get all out of MySQL Fabric. But, MySQL Fabric support is still in the works.

All these things are done on the client side to scale by client. Unlike with basic proxy solutions there is no single point of failure as a single clients failure does not affect the rest.

Yet, if used for nothing but read-write splitting, the ease of use is not compromised. It boils down to installing the plugin and replacing the host name in your APIs connect call mysqli_connect("my_mysql_server", ...) with the name of an entry from the plugins configuration file: new PDO("mysql:section_from_ms_config", ...). And, as you dig deeper into massive database clustering, you will soon realize how powerful it is to have clients supporting a specific cluster design, for which you just don’t want an all-automatic solution.