Filtering can be done on the
Master or on the Slave. Filtering on the Slave is normally preferred as in this
case you get full binary log on the Master which is not only helpful for
replication needs but also if you ever need to do roll forward recovery while
recovering from backup. There are however some cases when you really do not
need events in binary log – even for backup recovery, for example dealing with
temporary tables (unless they are used to populate normal tables using SQL).
On the Master you can do filtering using binlog-do-db and binlog-ignore-db commands.
In
reality however these filters are very fragile – you need to always have
default database specified and you only need to update this default database
for filtering to work normally. If you’re using this rules you have to be both
sure your application(s) only write to default database and everyone who has
access to the database always keeps this in mind. In quite a lot of cases I’ve
seen such filtering being broken by developer forgetting this restriction and
running one time update query without selecting proper default database.
Using
this option takes a lot of discipline and so do not work well for a lot of
installations, especially as MySQL Server do not offer any enforcements – ie
there is no way to forbid updating not default database so you can ensure your
application and staff members really follow these rules.
Another
option to filter writes on the Master is of course manual filtering – using SET
SQL_LOG_BIN=0 if
you need to skip some statements from binary logging. This command requires
SUPER Privilege and extreme care in use but it can be very helpful in advanced
cases allowing replication to control what will be logged to binary log and so
replicated.
On the Slave you have more options to filter the data. There is replicate-do-db and replicate-ignore-db options which closely mirror behavior
of their binlog-do-* counterparts and which are as easy to break. So I
generally would not recommend using these.
There are
also replicate-do-table, replicate-ignore-table, replicate-wild-do-table and replicate-wild-ignore-table options which act differently. Instead
of looking at the query and analyzing which tables this query is using.
This
works in much larger amount of cases, but not all of them. For example stored
procedures are handled (because binary logging is done on statements as
executed inside stored procedures), however Stored Functions are not – so if
you have updates done by stored functions the tables are not taken into
account.
The
traditional pre MySQL 5.0 way to break such replication is also using
multi-table update or delete statements, if you happen to replicate one table
or another this would not work. Though this is rather easy to avoid, but it is
still remains a way how one can break filtered replication by executing some
statements on the master.
Even though
replicate-*-table options also have some issues they are much safer than
database based counterparts.
In some
cases however you can’t really do filtering on the slave or it gets really
inefficient – what if you’re replicating over long distance and would like to
save on traffic or if write load is high and it would be waste to pull binary
logs to large variety of slaves?
This is
the case when BLACKHOLE storage engine can come into the play. In the
nutshell you can create intermediately slave which has BLACKHOLE as
default-storage engine which would fetch all binary logs from the master,
filter out only what you need replicated and pass it to the group of slaves. If
you have several group of slaves which need different data sets replicated to
them – you can use number of such filtering servers.
The good
thing about BLACKHOLE filtered server is – it is generally rather lightweight
as it only needs to fetch logs filter and write filtered logs – the queries are
formally executed but it is rather fast as there is no any real data to modify.
Daniel
Schneller wrote pretty good tutorial on creation for setting up BLACKHOLE Based
Replication Filtering so I will not go into much of details but just note
couple of issues you need to watch out for.
First
having BLACKHOLE Slave in the middle you get into classical issues of chain
replication – the latency will increase a bit (though not much because query
execution on such slave is very fast) and what is more important you will get
into more complicated math of dealing with binary log positions. For example
cloning Master (and dropping not replicated tables) to set up replication
becomes more tricky because you need to perform mapping between master
positions which you get with backup to distribution slave position. You can
also clone other slaves which are getting the same filtered stream of data
though it does not help if you would like to add more tables to be replicated
to the slaves.
Another
issue is of course creating and altering tables. If you have Innodb tables
you’re normally OK because you can use –skip-innodb on the filtering slave and default-storage-engine=BLACKHOLE to ensure all Innodb tables are
created as BLACKHOLE. However you can’t do the same with MyISAM tables, because
MyISAM can’t be disabled and so if you create tables as MYISAM or ALTER them to
MyISAM you will get them as MyISAM on BLACKHOLE server as well. So you need to
watch out for this one as well.
As you
can see no matter which road you take with partial MySQL Replication there are
things to be careful with, so if you’re considering to implement it make sure
there are skills and discipline in your team to make sure you do not shot
yourself in the foot.
If you’re not doing things which do not work Filtered Replication can work pretty well for you.
If you’re not doing things which do not work Filtered Replication can work pretty well for you.
P.S It is in my todo to see how well
MySQL 5.1 row based replication works and performs and I will be testing
filtered scenarios as well. Hopefully everything is taken care of in this case.
0 comments:
Post a Comment