-
Notifications
You must be signed in to change notification settings - Fork 7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Convert MergeTree tables to replicated on server restart if flag is set #57798
Conversation
src/Interpreters/loadMetadata.cpp
Outdated
String get_attach_queries_query = fmt::format("SELECT DISTINCT partition_id FROM system.parts WHERE table = '{}' AND database = '{}' AND active;", id.table_name, database_name); | ||
WriteBufferFromOwnString buffer2; | ||
ReadBufferFromOwnString buffer3 {std::move(get_attach_queries_query)}; | ||
auto select_query_context = Context::createCopy(context); | ||
select_query_context->makeQueryContext(); | ||
select_query_context->setCurrentQueryId(""); | ||
|
||
executeQuery(buffer3, buffer2, false, select_query_context, {}, {.internal=true}); | ||
|
||
std::stringstream partition_ids_string{buffer2.str()}; | ||
std::string line; | ||
|
||
/// Attach partitions | ||
while (std::getline(partition_ids_string, line, '\n')) | ||
{ | ||
String query3 = fmt::format("ALTER TABLE {} ATTACH PARTITION ID '{}' FROM {};", qualified_quoted_name, line, tmp_qualified_quoted_name); | ||
executeQuery(query3, select_query_context, {.internal=true}); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI there's SYSTEM RESTORE REPLICA
query and StorageReplicatedMergeTree::restoreMetadataInZooKeeper()
, please consider using it (just update the engine and restore)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this won't help attach partitions from the old table since there is nothing inside new table's directory even after name exchange
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean that you don't need a new table and you don't need to exchange them. You can just update the engine of the old table, load it as ReplicatedMergeTree (it will be in readonly mode due to missing metadata in zk) and restore it.
This is an automated comment for commit c5ab189 with description of existing statuses. It's updated for the latest CI running ❌ Click here to open a full report in a separate page Successful checks
|
Looks like failed tests are not related to the pr |
@@ -304,6 +304,12 @@ We use the term `MergeTree` to refer to all table engines in the `MergeTree fami | |||
|
|||
If you had a `MergeTree` table that was manually replicated, you can convert it to a replicated table. You might need to do this if you have already collected a large amount of data in a `MergeTree` table and now you want to enable replication. | |||
|
|||
`MergeTree` table can be automatically converted on server restart if `convert_to_replicated` flag is set at the table's flags directory (`/var/lib/clickhouse/data/db_name/table_name/flags/`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like the idea of table-local flags, maybe we can use MergeTreeSettings
instead? (but I'm not sure)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this approach better, but the disadvantage is that user can't remove the flag without a working server.
For example, if you set convert_to_replicated
setting for a table with an Ordinary
database, the NOT_IMPLEMENTED
exception is raised when the server is loaded and it becomes impossible to start the server.
In this case, you can either prohibit setting this setting in the ALTER
query interpreter (but it helps only in this case), or remove the setting when an exception occurs and load without conversion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, yep, it's a good point. But I don't like table_name/flags/
dir either because we have some code that doesn't expect to find extra subdirs in the table dir, for example
ClickHouse/src/Storages/MergeTree/MergeTreeData.cpp
Lines 2779 to 2800 in 852f397
LOG_INFO(log, "dropAllData: remove format_version.txt, detached, moving and write ahead logs"); | |
disk->removeFileIfExists(fs::path(relative_data_path) / FORMAT_VERSION_FILE_NAME); | |
if (disk->exists(fs::path(relative_data_path) / DETACHED_DIR_NAME)) | |
disk->removeSharedRecursive(fs::path(relative_data_path) / DETACHED_DIR_NAME, /*keep_all_shared_data*/ true, {}); | |
if (disk->exists(fs::path(relative_data_path) / MOVING_DIR_NAME)) | |
disk->removeRecursive(fs::path(relative_data_path) / MOVING_DIR_NAME); | |
try | |
{ | |
if (!disk->isDirectoryEmpty(relative_data_path) && | |
supportsReplication() && disk->supportZeroCopyReplication() | |
&& settings_ptr->allow_remote_fs_zero_copy_replication) | |
{ | |
std::vector<std::string> files_left; | |
disk->listFiles(relative_data_path, files_left); | |
throw Exception( | |
ErrorCodes::ZERO_COPY_REPLICATION_ERROR, | |
"Directory {} with table {} not empty (files [{}]) after drop. Will not drop.", | |
relative_data_path, getStorageID().getNameForLogs(), fmt::join(files_left, ", ")); |
But maybe it's okay if we ensure that it's removed on server start after the coversion. Also, I would prefer a single flag file, without the subdir
src/Databases/DatabaseOrdinary.cpp
Outdated
if (create_query->storage && create_query->storage->engine->name.ends_with("MergeTree") && !create_query->storage->engine->name.starts_with("Replicated")) | ||
{ | ||
auto convert_to_replicated_flag_path = fs::path(getContext()->getPath()) / "data" / qualified_name.database / qualified_name.table / "flags" / "convert_to_replicated"; | ||
|
||
LOG_INFO(log, "Searching for convert_to_replicated flag at {}.", backQuote(convert_to_replicated_flag_path.string())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move the conversion code to a separate method
src/Databases/DatabaseOrdinary.cpp
Outdated
if (fs::exists(convert_to_replicated_flag_path)) | ||
{ | ||
LOG_INFO(log, "Found convert_to_replicated flag for table {}. Will try to load it as replicated table.", backQuote(qualified_name.getFullName())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, having a separate method for this would allow using "early exit" and avoiding too many nested if
s
src/Databases/DatabaseOrdinary.cpp
Outdated
auto * storage = create_query->storage; | ||
|
||
String replica_path = getContext()->getConfigRef().getString("default_replica_path", "/clickhouse/tables/{uuid}/{shard}"); | ||
replica_path = boost::algorithm::replace_all_copy(replica_path, "{uuid}", fmt::format("{}", create_query->uuid)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
create_query->uuid
is always Nil
for tables in Ordinary
databases. However, the Ordinary
engine is deprecated, so it's fine to just throw NOT_IMPLEMENTED
in this case
src/Databases/DatabaseOrdinary.cpp
Outdated
/// Set uuid explicitly, because it is forbidden to use the 'uuid' macro without ON CLUSTER | ||
auto * storage = create_query->storage; | ||
|
||
String replica_path = getContext()->getConfigRef().getString("default_replica_path", "/clickhouse/tables/{uuid}/{shard}"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use getDefaultZooKeeperPath
. Copy-pasting default values is error-prone (if we update it, we may forget to update it everywhere)
src/Databases/DatabaseOrdinary.cpp
Outdated
String replica_name = getContext()->getConfigRef().getString("default_replica_name", "{replica}"); | ||
String replicated_args = fmt::format("('{}', '{}')", replica_path, replica_name); | ||
String replicated_engine = "ENGINE = Replicated" + storage->engine->name + replicated_args; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
localhost.localdomain :) create table kek (n int, primary key n) engine=ReplicatedMergeTree('/test/kek/''/lol', '1')
CREATE TABLE kek
(
`n` int
)
ENGINE = ReplicatedMergeTree('/test/kek/\'/lol', '1')
PRIMARY KEY n
localhost.localdomain :) select * from system.zookeeper where path='/test/kek'
SELECT *
FROM system.zookeeper
WHERE path = '/test/kek'
Query id: 0dbff469-7865-4b14-895a-68e1d7814c7d
┌─name─┬─value─┬─path──────┐
│ ' │ │ /test/kek │
└──────┴───────┴───────────┘
1 row in set. Elapsed: 0.034 sec.
(I doubt someone will ever specify a funny path like this as the default path, but it demonstrates the poor quality of the code that formats the engine clause)
src/Databases/DatabaseOrdinary.cpp
Outdated
LOG_INFO | ||
( | ||
log, | ||
"Table {} is loaded as replicated. Not removing convert_to_replicated flag until metadata in zookeeper is restored.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not loaded yet at this point
src/Databases/DatabaseOrdinary.cpp
Outdated
auto convert_to_replicated_flag_path = fs::path(getContext()->getPath()) / "data" / name.database / name.table / "flags" / "convert_to_replicated"; | ||
if (fs::exists(convert_to_replicated_flag_path)) | ||
{ | ||
if (rmt->isTableReadOnly()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be readonly for a different reason (although it's unlikely)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what to do in such cases. Should we just ignore the exceptions from restoreMetadataInZooKeeper
in case table is readonly not for missing metadata reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's better check has_metadata_in_zookeeper
instead of is_readonly
src/Databases/DatabaseOrdinary.cpp
Outdated
|
||
static fs::path getConvertToReplicatedFlagPath(ContextPtr context, const QualifiedTableName & name) | ||
{ | ||
return fs::path(context->getPath()) / "data" / name.database / name.table / CONVERT_TO_REPLICATED_FLAG_NAME; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use getTableDataPath
(it works differently with DatabaseAtomic
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You suggest to set the flag at /clickhouse/store/xxx/...
directory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest using getTableDataPath
which will provide the right location to create the flag.
src/Databases/DatabaseOrdinary.cpp
Outdated
|
||
auto convert_to_replicated_flag_path = getConvertToReplicatedFlagPath(getContext(), qualified_name); | ||
|
||
LOG_DEBUG(log, "Searching for {} flag at {}.", CONVERT_TO_REPLICATED_FLAG_NAME, backQuote(convert_to_replicated_flag_path.string())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This message will be logged every time the server starts, for every MergeTree table even if you don't want to convert anything. Also, this message doesn't look really useful. So consider removing it
src/Databases/DatabaseOrdinary.cpp
Outdated
catch (...) | ||
{ | ||
/// Something unpredicted happened | ||
/// Remove flag to allow server to restart | ||
fs::remove(convert_to_replicated_flag_path); | ||
throw; | ||
} | ||
} | ||
} | ||
fs::remove(convert_to_replicated_flag_path); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we remove the flag anyway, then we can remove it before restoring metadata (just after checking if the flag exists), and remove the try-catch
src/Databases/DatabaseOrdinary.cpp
Outdated
LOG_INFO | ||
( | ||
log, | ||
"No connection to ZooKeeper, can't restore metadata for {} in ZooKeeper after conversion. Run SYSTEM RESTORE REPLICA while connected to ZooKeeper.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be hard to notice this message, let's at least change it to LOG_WARNING
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JFYI It could be a .sh
functional test with DETACH/ATTACH DATABASE instead of restarting the server, but it's okay to keep it as integration test
@kirillgarbar I'm trying to use this for a table which uses JBOD policy and has multiple data_paths and it doesn't seem to work:
I've tried creating the flag in the first or all directories and it is seemingly ignored. It worked just fine on a default policy table which has one storage directory. |
@cthu1hoo Currently the path in which the flag is being searched leads to default disk. |
Done similarly to conversion of database engine from
Ordinary
toAtomic
.Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Implemented automatic conversion of merge tree tables of different kinds to replicated engine. Create empty
convert_to_replicated
file in table's data directory (/clickhouse/store/xxx/xxxyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy/
) and that table will be converted automatically on next server start.Documentation entry for user-facing changes