Skip to content

Conversation

@Thirunarayanan
Copy link
Member

  • The Jira issue number for this PR is: MDEV-36733

Description

MDEV-36733 Clone: Implement support for local cloning

This commit introduces the CLONE LOCAL DATA DIRECTORY command, enabling the creation of a
physical copy of a MariaDB instance on the same server.

  • Adds the clone plugin, which must be installed to enable the functionality.
  • Requires a new RELOAD_ACL & LOCK_TABLES privilege for the user to perform the clone operation.
  • Clones all InnoDB and ARIA tables, including system tables.
  • Supports cloning while concurrent DML operations are active.
    This new functionality provides a fast and efficient way to duplicate a MariaDB data directory.

Limitation:

  1. Single threaded
  2. Block DDL operation
  3. Avoids encrypted and compressed table
  4. Binary log position and GTID are not cloned

How can this PR be tested?

Added --suite=clone

Basing the PR against the correct MariaDB version

  • This is a new feature or a refactoring, and the PR is based against the main branch.
  • This is a bug fix, and the PR is based against the earliest maintained branch in which the bug can be reproduced.

PR quality check

  • I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
  • For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.

INSTALL PLUGIN clone SONAME 'mysql_clone.so';

CLONE LOCAL DATA DIRECTORY [=] 'clone_dir';

CLONE INSTANCE FROM 'user'@'host':port
IDENTIFIED BY 'password'
[DATA DIRECTORY [=] 'clone_dir']
[REQUIRE [NO] SSL];

UNINSTALL PLUGIN CLONE;
Copy link
Contributor

@dr-m dr-m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still a draft, so I am posting just some superficial comments.

Comment on lines +449 to +450
case DB_ABORT_INCOMPLETE_CLONE:
return("Incomplete cloned data directory.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make use of DB_INTERRUPTED instead of introducing a new, rarely used, error code?

Comment on lines 289 to 296
void ut_format_byte_value(uint64_t data_bytes, std::string &data_str)
{
int unit_sz= 1024;
auto exp = static_cast<int>(
(data_bytes == 0) ? 0 : std::log(data_bytes) / std::log(unit_sz));
auto data_value=static_cast<double>(data_bytes) / std::pow(unit_sz, exp);

char unit[]= " KMGTPE";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be duplicating d9f7a6b (#1957). Can we make use of ib::bytes_iec instead of adding this function?

Comment on lines 30 to 41
/*************************************************************//**
Report a failed assertion. */

static std::function<void()> assert_callback;

void ut_set_assert_callback(std::function<void()> &callback) {
assert_callback = callback;
}

ATTRIBUTE_NORETURN
void
ut_dbg_assertion_failed(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is "hijacking" the Doxygen comment of ut_dbg_assertion_failed(). Furthermore, we do not seem to need this indirection.

Instead, we could just invoke clone_files_fatal_error() whenever needed. What is the purpose of that function? Why would we want to invoke it on ut_a or ut_error, but not on any fatal signal? Could clone_files_fatal_error be registered as part of the signal handler? What is its purpose?

Comment on lines +649 to +651
Clone_notify notifier(Clone_notify::Type::SPACE_UNDO_TRUNCATE,
UINT32_MAX, true);
if (notifier.failed()) return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we reduce the scope of this object?

    if (Clone_notify(Clone_notify::Type::SPACE_UNDO_TRUNCATE,
                     UINT32_MAX, true).failed())
      // some comment that explains what is going on
      return;

Copy link
Contributor

@dr-m dr-m Sep 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that Clone_notify is a RAII object and that we probably want to block concurrent clone activity while the undo log truncation is in progress. Some comments and debug assertions would be helpful:

  /* Prevent innodb_undo_log_truncate=ON conflict with clone activity. */
  Clone_notify notifier(Clone_notify::Type::SPACE_UNDO_TRUNCATE,
                        UINT32_MAX, true);
  if (ut_d(int err=) notifier.get_error())
  {
    ut_ad(err == ER_CLONE_IN_PROGRESS);
    return;
  }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Come to think of it, do we actually need to block a concurrent clone operation? I understand that in MySQL 5.7 and possibly also later versions, innodb_undo_log_truncate=ON will execute a log checkpoint and create a separate recovery log file. The undo tablespace truncation is just using regular write-ahead logging and some special logic in recv_sys_t::trim(). Is there any reason why this would not work when applying the log on the cloned files? Can we remove this code altogether here?

Copy link
Contributor

@dr-m dr-m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be quite a bit of dead code included. I think that we need to remove or at least #if 0 it out.
https://www.theregister.com/2025/08/30/programmers_watch_your_weight/
https://www.nongnu.org/lzip/the_emperors_old_clothes.txt

You include only those features which you know to be needed

Comment on lines +28 to +29
/** @file include/clone0api.h
Innodb Clone Interface
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It’s usually spelled InnoDB.

Comment on lines +33 to +34
#ifndef CLONE_API_INCLUDE
#define CLONE_API_INCLUDE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, let us use #pragma once.

Comment on lines +36 to +37
#include "univ.i"
#ifndef UNIV_HOTBACKUP
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UNIV_HOTBACKUP should have been removed in 63574f1 (MDEV-11690).
Do we really have to include univ.i in this code?

Comment on lines +38 to +42
#include "handler.h"
#include <fil0fil.h>

using space_id_t = decltype(fil_space_t::id);
using page_no_t = uint32_t;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really have to include both headers in this code? The fil0fil.h only seems to be needed for this definition of space_id_t. Starting with ca501ff we can simply use uint32_t.

Comment on lines 168 to 193
/** Notification type. Currently used by various DDL commands. */
enum class Type {
/* Space is being created. */
SPACE_CREATE,
/* Space is being dropped. */
SPACE_DROP,
/* Space is being renamed. */
SPACE_RENAME,
/* Space is being discarded or imported. */
SPACE_IMPORT,
/* Space encryption property is altered. */
SPACE_ALTER_ENCRYPT,
/* Space encryption property of general tablespace is altered. */
SPACE_ALTER_ENCRYPT_GENERAL,
/* Space encryption flags of general tablespace are altered. */
SPACE_ALTER_ENCRYPT_GENERAL_FLAGS,
/* In place Alter general notification. */
SPACE_ALTER_INPLACE,
/* Inplace Alter bulk operation. */
SPACE_ALTER_INPLACE_BULK,
/* Special consideration is needed for UNDO as these DDLs
don't use DDL log and needs special consideration during recovery. */
SPACE_UNDO_TRUNCATE,
/* Redo log resizing */
SYSTEM_REDO_RESIZE
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The enum constants are missing Doxygen-style comments (which start with /** instead of /*). Furthermore, only the last 2 types are actually being used at the moment. I would suggest to add #if 0 around any other constants, or removing this parameter for now altogether.

I would actually suggest removing or #if 0’ing this enum class Type altogether for now because the SYSTEM_REDO_RESIZE can be quite different from anything else, not related to any particular tablespace. I see that there is quite a bit of dead code in Clone_notify::Clone_notify(). Anything after the following is currently unreachable:

  if (type == Type::SYSTEM_REDO_RESIZE || type == Type::SPACE_IMPORT ||
      type == Type::SPACE_UNDO_TRUNCATE)
  {
    //
    return;
  }

All parameters of Clone_notify::Clone_notify() as well as the related data members can (and should) be removed for now.

Comment on lines +227 to +237
/** Notification wait type set. */
enum class Wait_at {
/* Clone doesn't need to wait. */
NONE,
/* Clone needs to wait before entering. */
ENTER,
/* Clone needs to wait before state change. */
STATE_CHANGE,
/* Clone needs to abort. */
ABORT
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each value is missing Doxygen comments. Are any other values than ABORT actually assigned for now? If not, the Wait_at as well as m_wait need to be #if 0’d or removed.

Comment on lines +240 to +250
/** Tablespace ID for which notification is sent. */
space_id_t m_space_id;

/** Notification type. */
Type m_type;

/** Wait type set. */
Wait_at m_wait;

/** Blocked clone state if clone is blocked. */
uint32_t m_blocked_state;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like all these can be #if 0’d out. m_blocked_state is being never assigned, because Clone_notify is only being constructed with type == Type::SYSTEM_REDO_RESIZE or type == Type::SPACE_UNDO_TRUNCATE, and that case will assign m_wait= Wait_at::ABORT (unless the first component returned by clone_sys->check_active_clone() holds).

Comment on lines +418 to +435
std::tuple<bool, Clone_Handle *> Clone_Sys::check_active_clone() {
mysql_mutex_assert_owner(&m_clone_sys_mutex);

bool active_clone = false;
Clone_Handle *active_handle = nullptr;

/* Check for active clone operations. */
for (int idx = 0; idx < CLONE_ARR_SIZE; idx++) {
auto clone_hdl = m_clone_arr[idx];

if (clone_hdl != nullptr && clone_hdl->is_copy_clone()) {
active_clone = true;
active_handle = clone_hdl;
break;
}
}
return std::make_tuple(active_clone, active_handle);
}
Copy link
Contributor

@dr-m dr-m Sep 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first component of the tuple is completely redundant, and it will incur runtime overhead. Please let us just return a pointer:

Clone_handle *Clone_sys::check_active_clone()
{
  mysql_mutex_assert_owner(&m_clone_sys_mutex);
  for (Clone_handle *h: m_clone_arr)
    if (h && h->is_copy_clone())
      return h;
  return nullptr;
}

Comment on lines +406 to +416
bool Clone_Sys::check_active_clone(bool print_alert) {
bool active_clone = false;
std::tie(active_clone, std::ignore) = check_active_clone();

if (active_clone && print_alert) {
/* purecov: begin inspected */
ib::info() << "DDL waiting for CLONE to abort";
/* purecov: end */
}
return (active_clone);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like mostly dead code; we’re always invoking it with print_alert=false.

Comment on lines +1292 to +1299
/** Check if any active clone is running.
@param[in] print_alert print alert message
@return true, if concurrent clone in progress */
bool check_active_clone(bool print_alert);

/** Check if any active clone is running.
@return (true, handle) if concurrent clone in progress */
std::tuple<bool, Clone_Handle *> check_active_clone();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the first function is only being invoked with print_alert=false, we can remove this function altogether, and just invoke the more complex check_active_clone(), which must be revised:

/**
@return handle of active clone
@retval nullptr if no clone is in progress */
Clone_handle *check_active_clone();

Copy link
Contributor

@dr-m dr-m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that part of the code that is currently in storage/innobase/include/chunk* and storage/innobase/chunkcould be better be a part ofplugin/clone. I still don’t have a good overview of everything. I think that storage/maria/ma_clone.cc`, which I did not read yet, should give an idea of what to except from a minimal storage engine implementation.

Comment on lines 50 to 54
/** Directory under data directory for all clone status files. */
#define CLONE_FILES_DIR OS_FILE_PREFIX "clone" OS_PATH_SEPARATOR_STR

/** Clone in progress file name length. */
const size_t CLONE_INNODB_FILE_LEN = 64;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These feel like unnecessary obfuscation. The const should also cause run-time overhead; constexpr would be better. It is not clear which file name is being referred to, and whether the name would include a CLONE_FILES_DIR prefix.

The definition of CLONE_FILES_DIR is overly complex. This patch is introducing OS_FILE_PREFIX in a different header file, as well as reintroducing OS_PATH_SEPARATOR and OS_PATH_SEPARATOR_STR, which had been removed for a good reason in cf552f5.

It would be much shorter and more obvious to just write something like

extern const char clone_files_dir[];

and define it in clone/clone0api.cc (which has 2 references to the constant) or clone/clone0clone.cc (1 reference):

const char clone_files_dir[]{"#clone/"}

I have a feeling that Clone_Handle should actually belong to the clone plugin, and not to InnoDB. We would want to minimize the overhead when the clone plugin is not being loaded.

Comment on lines +56 to +101
#ifdef UNIV_DEBUG
/** Clone simulate recovery error file name. */
const char CLONE_INNODB_RECOVERY_CRASH_POINT[] =
CLONE_FILES_DIR OS_FILE_PREFIX "status_crash_point";
#endif

/** Clone in progress file name. */
const char CLONE_INNODB_IN_PROGRESS_FILE[] =
CLONE_FILES_DIR OS_FILE_PREFIX "status_in_progress";

/** Clone error file name. */
const char CLONE_INNODB_ERROR_FILE[] =
CLONE_FILES_DIR OS_FILE_PREFIX "status_error";

/** Clone fix up file name. Present when clone needs table fix up. */
const char CLONE_INNODB_FIXUP_FILE[] =
CLONE_FILES_DIR OS_FILE_PREFIX "status_fix";

/** Clone recovery status. */
const char CLONE_INNODB_RECOVERY_FILE[] =
CLONE_FILES_DIR OS_FILE_PREFIX "status_recovery";

/** Clone file name for list of files cloned in place. */
const char CLONE_INNODB_NEW_FILES[] =
CLONE_FILES_DIR OS_FILE_PREFIX "new_files";

/** Clone file name for list of files to be replaced. */
const char CLONE_INNODB_REPLACED_FILES[] =
CLONE_FILES_DIR OS_FILE_PREFIX "replace_files";

/** Clone file name for list of old files to be removed. */
const char CLONE_INNODB_OLD_FILES[] =
CLONE_FILES_DIR OS_FILE_PREFIX "old_files";

/** Clone file name for list of temp files renamed by ddl. */
const char CLONE_INNODB_DDL_FILES[] =
CLONE_FILES_DIR OS_FILE_PREFIX "ddl_files";

/** Clone file extension for files to be replaced. */
const char CLONE_INNODB_REPLACED_FILE_EXTN[] = "." OS_FILE_PREFIX "clone";

/** Clone file extension for saved old files. */
const char CLONE_INNODB_SAVED_FILE_EXTN[] = "." OS_FILE_PREFIX "clone_save";

/** Clone file extension for temporary renamed file. */
const char CLONE_INNODB_DDL_FILE_EXTN[] = "." OS_FILE_PREFIX "clone_ddl";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these had better be declared extern here and defined somewhere else. Most of these seem to be specific to clone0api.cc. It would be cleaner to declare these as static data members of an object, say, Clone_Sys or Clone_Handle.

Even better, if clone0api.cc exposed some interfaces to clone0apply.cc and clone0snapshot.cc, then only clone0api.cc would have to know about those file names.

For the _EXTN constants, it could be cleaner to try to use suffixes that are already in use by mariadb-backup: .new, .del, .ren. Those suffixes should avoid a risk of exceeding the maximum file name length limit, compared to the .ibd suffix that the being-copied server is using. The limit is only 255 bytes on Linux extfs and its derivatives.

Comment on lines +103 to +105
using Clone_Msec = std::chrono::milliseconds;
using Clone_Sec = std::chrono::seconds;
using Clone_Min = std::chrono::minutes;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like unnecessary obfuscation. Why can’t we just use those longer names in place of these aliases?

Comment on lines +107 to +114
/** Default sleep time while waiting: 100 ms */
const Clone_Msec CLONE_DEF_SLEEP{100};

/** Default alert interval in multiple of sleep time: 5 seconds */
const Clone_Sec CLONE_DEF_ALERT_INTERVAL{5};

/** Default timeout in multiple of sleep time: 30 minutes */
const Clone_Min CLONE_DEF_TIMEOUT{30};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of these are only used in the wrapper Clone_Sys::wait_default. I think that it would be clearer to just use the literals directly.

Comment on lines +1357 to +1358
/** Clone system global */
extern Clone_Sys *clone_sys;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Singleton objects are best allocated statically, like we do for trx_sys, dict_sys, lock_sys, buf_pool, recv_sys.

The functions clone_init() and clone_free() should be replaced with Clone_Sys::init() and Clone_Sys::free().

Comment on lines +1350 to +1351
/** If all innodb tablespaces are initialized. */
std::atomic<bool> m_space_initialized;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is unclear why this member is needed. Can innodb_clone_begin() be invoked several times, "incrementally"?

Comment on lines +1344 to +1345
/** Clone system mutex */
mysql_mutex_t m_clone_sys_mutex;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which data members or operations does this protect?

Comment on lines +188 to +190
/** Map for current block number for unfinished chunks. Used during
restart from incomplete clone operation. */
using Chunk_Map = std::map<uint32_t, uint32_t>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is an incomplete clone operation and how does the restart happen? Could the comment mention some relevant function names or SQL statements? What is protecting the Chunk_Info::m_incomplete_chunks? Why do we have this alias at all?

Comment on lines +192 to +193
/** Bitmap for completed chunks in current state */
class Chnunk_Bitmap {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is a chunk in this context? The class name is mistyped: it should be Chunk_Bitmap.

Comment on lines +224 to +232
operator bool() const {
auto &val = m_bitmap_ref[m_map_index];

if ((val & m_bit_mask) == 0) {
return (false);
}

return (true);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a rather complex way of writing

operator bool() const { return m_bitmap_ref[m_map_index] & m_bit_mask; }

The entire Bitmap_Operator_Impl seems to a potential pessimization. I didn’t check the generated code yet. But, I think that we should consider using the existing template<uint> class Bitmap, instead of introducing a reimplementation of something similar.

Comment on lines +123 to +124
/** Clone plugin name */
std::string m_plugin_name;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see any reason for this to be anything else than const char *const. It is being copied from compile-time constant strings. Why don’t we just have a pointer to the compile-time constant?

Is this the name of plugin/clone? There is only one implementation, right? Do we really need this data member at this point of time?

Comment on lines +117 to +121
/** True if clone provisioning in progress. */
static std::atomic<int> s_provision_in_progress;

/** True, if any user data is dropped by clone. */
static std::atomic<bool> s_is_data_dropped;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could these be behind the m_plugin_handle interface? That is, can we have as little code in the server core when the plugin has not been loaded?

Comment on lines +32 to +35
#include <atomic>
#include <string>
#include <set>
#include <tuple>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need every single one of these headers here?

usr/lib/mysql/plugin/sql_errlog.so
usr/lib/mysql/plugin/type_mysql_json.so
usr/lib/mysql/plugin/wsrep_info.so
usr/lib/mysql/plugin/mariadb_clone.so

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sigabrt found on debug build whilee testing main-MDEV-36733


main-MDEV-36733, origin/main-MDEV-36733 7cc113f555cecc6ade9cadd3e3ebae3c9129e0aa 2025-09-02T18:48:45+05:30

#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=139922541356608) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=139922541356608) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=139922541356608, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  __GI_abort () at ./stdlib/abort.c:79
#5  safe_mutex_destroy (mp=<myclone::Client::s_table_mutex>, file=<optimized out>, line=<optimized out>) at /data/Server/MDEV-36733/mysys/thr_mutex.c:597
#6  inline_mysql_mutex_destroy (src_line=712, src_file="/data/Server/MDEV-36733/plugin/clone/src/clone_client.cc", that=<optimized out>)
    at /data/Server/MDEV-36733/include/mysql/psi/mysql_thread.h:724
#7  myclone::Client::uninit_pfs () at /data/Server/MDEV-36733/plugin/clone/src/clone_client.cc:712
#8  myclone::Table_pfs::drop_proxy_tables () at /data/Server/MDEV-36733/plugin/clone/src/clone_status.cc:139
#9  myclone::Table_pfs::release_services () at /data/Server/MDEV-36733/plugin/clone/src/clone_status.cc:225
#10 plugin_clone_deinit (plugin_info=<optimized out>) at /data/Server/MDEV-36733/plugin/clone/src/clone_plugin.cc:408
#11 plugin_deinitialize (plugin=, ref_check=ref_check@entry=true) at /data/Server/MDEV-36733/sql/sql_plugin.cc:1279
#12 reap_plugins () at /data/Server/MDEV-36733/sql/sql_plugin.cc:1350
#13 mysql_uninstall_plugin (thd=thd@entry=, name=<optimized out>, dl_arg=<optimized out>) at /data/Server/MDEV-36733/sql/sql_plugin.cc:2494
#14 mysql_execute_command (thd=thd@entry=, is_called_from_prepared_stmt=is_called_from_prepared_stmt@entry=false) at /data/Server/MDEV-36733/sql/sql_parse.cc:5790
#15 mysql_parse (thd=thd@entry=, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=) at /data/Server/MDEV-36733/sql/sql_parse.cc:7914
#16 dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=, packet=packet@entry="UNINSTALL PLUGIN IF EXISTS clone /* E_R Thread6 QNO 2774 CON_ID 23 */ ", 
    packet_length=packet_length@entry=70, blocking=blocking@entry=true) at /data/Server/MDEV-36733/sql/sql_parse.cc:1896
#17 do_command (thd=thd@entry=, blocking=blocking@entry=true) at /data/Server/MDEV-36733/sql/sql_parse.cc:1419
#18 do_handle_one_connection (connect=<optimized out>, connect@entry=, put_in_cache=put_in_cache@entry=true) at /data/Server/MDEV-36733/sql/sql_connect.cc:1414
#19 handle_one_connection (arg=) at /data/Server/MDEV-36733/sql/sql_connect.cc:1326
#20 start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#21 clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100

RR trace is present at pluto:-
/data/results/1756833677/TBR-2326
Raised the bug:-
https://jira.mariadb.org/browse/MDEV-37551

partition.m_index_file, &m_cap, block, copy_buffer) ==
HA_ERR_END_OF_FILE))
break;
error= aria_read_index(partition.m_index_file,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assertion found on main-MDEV-36733 on debug build


main-MDEV-36733, origin/main-MDEV-36733 7cc113f555cecc6ade9cadd3e3ebae3c9129e0aa 2025-09-02T18:48:45+05:30

# 2025-09-02T17:29:54 [2496309] | mariadbd: /data/Server/MDEV-36733/sql/sql_plugin.cc:1396: void intern_plugin_unlock(LEX*, plugin_ref): Assertion `pi->ref_count' failed.


Stacktrace

#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140442117543488) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140442117543488) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140442117543488, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  __GI_abort () at ./stdlib/abort.c:79
#5  __assert_fail_base (fmt="%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion="pi->ref_count", file="/data/Server/MDEV-36733/sql/sql_plugin.cc", line=1396, 
    function=<optimized out>) at ./assert/assert.c:92
#6  __GI___assert_fail (assertion="pi->ref_count", file="/data/Server/MDEV-36733/sql/sql_plugin.cc", line=1396, function="void intern_plugin_unlock(LEX*, plugin_ref)")
    at ./assert/assert.c:101
#7  intern_plugin_unlock (lex=lex@entry=, plugin=) at /data/Server/MDEV-36733/sql/sql_plugin.cc:1396
#8  plugin_unlock_list (thd=thd@entry=, list=, list@entry=, count=0, count@entry=86) at /data/Server/MDEV-36733/sql/sql_plugin.cc:1438
#9  plugin_foreach_with_mask (thd=<optimized out>, func=func@entry=<show_plugins(THD*, plugin_ref, void*)>, type=type@entry=-1, state_mask=state_mask@entry=4294967294, 
    arg=<optimized out>) at /data/Server/MDEV-36733/sql/sql_plugin.cc:2555
#10 fill_plugins (thd=<optimized out>, tables=<optimized out>, cond=<optimized out>) at /data/Server/MDEV-36733/sql/sql_show.cc:346
#11 get_schema_tables_result (join=join@entry=, executed_place=executed_place@entry=PROCESSED_BY_JOIN_EXEC) at /data/Server/MDEV-36733/sql/sql_show.cc:9740
#12 JOIN::exec_inner (this=this@entry=) at /data/Server/MDEV-36733/sql/sql_select.cc:5047
#13 JOIN::exec (this=this@entry=) at /data/Server/MDEV-36733/sql/sql_select.cc:4874
#14 mysql_select (thd=thd@entry=, tables=, fields=..., conds=, og_num=0, order=, group=, having=, proc_param=, select_options=2701396736, result=, unit=, select_lex=)
    at /data/Server/MDEV-36733/sql/sql_select.cc:5402
#15 handle_select (thd=thd@entry=, lex=lex@entry=, result=result@entry=, setup_tables_done_option=setup_tables_done_option@entry=0) at /data/Server/MDEV-36733/sql/sql_select.cc:634
#16 execute_sqlcom_select (thd=thd@entry=, all_tables=) at /data/Server/MDEV-36733/sql/sql_parse.cc:6198
#17 mysql_execute_command (thd=thd@entry=, is_called_from_prepared_stmt=is_called_from_prepared_stmt@entry=false) at /data/Server/MDEV-36733/sql/sql_parse.cc:3975
#18 mysql_parse (thd=thd@entry=, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=) at /data/Server/MDEV-36733/sql/sql_parse.cc:7914
#19 dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=, 
    packet=packet@entry="SELECT PLUGIN_NAME, PLUGIN_STATUS FROM INFORMATION_SCHEMA.PLUGINS WHERE PLUGIN_NAME = 'clone' /* E_R Thread10 QNO 469 CON_ID 26 */ ", 
    packet_length=packet_length@entry=131, blocking=blocking@entry=true) at /data/Server/MDEV-36733/sql/sql_parse.cc:1896
#20 do_command (thd=thd@entry=, blocking=blocking@entry=true) at /data/Server/MDEV-36733/sql/sql_parse.cc:1419
#21 do_handle_one_connection (connect=<optimized out>, connect@entry=, put_in_cache=put_in_cache@entry=true) at /data/Server/MDEV-36733/sql/sql_connect.cc:1414
#22 handle_one_connection (arg=) at /data/Server/MDEV-36733/sql/sql_connect.cc:1326
#23 start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#24 clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100

RR trace is present on pluto:
/data/results/1756833677/TBR-2327

Raised the bug:-
https://jira.mariadb.org/browse/MDEV-37552

Copy link
Contributor

@dr-m dr-m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still not a complete review.

It seems to me that we should refactor the log archiving. I am not happy to see a large number of include/arch* and arch/* files. Do we really need a new file format for the archived log? Also, I do not understand what a doublewrite buffer would be useful (even in the case that the archiving covers actual persistent data files and not only the write-ahead log).

Comment on lines +649 to +654
#endif // NEVER

Progress_pfs::Progress_pfs() : Table_pfs(S_NUM_ROWS)
{
// auto table = get_proxy_share();
// table->m_table_name = "clone_progress";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that this logic would be needed in MDEV-36742 Clone: Support Remote Clone. I would suggest to remove all #ifdef NEVER and commented-out lines, to make this reviewable. It’s easier to review and understand just the bare minimum code that is needed.

Comment on lines +789 to +792
status_file << state << " " << m_threads[cur_index] << " "
<< m_start_time[cur_index] << " " << m_end_time[cur_index]
<< " " << m_estimate[cur_index] << " " << m_complete[cur_index]
<< " " << m_network[cur_index] << std::endl;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this code (which is not enclosed in #ifdef NEVER) really used or reachable? It seems that m_network is irrelevant for this task (local clone).

Comment on lines +8967 to +9009
| CLONE_SYM INSTANCE_SYM FROM user ':' ulong_num
IDENTIFIED_SYM BY TEXT_STRING
opt_datadir_ssl
{
Lex->sql_command= SQLCOM_CLONE;
/* TODO: Reject space characters around ':' */
$4->auth= new (thd->mem_root) USER_AUTH();
$4->auth->pwtext= $9;

Lex->m_sql_cmd= new (thd->mem_root)
Sql_cmd_clone($4, $6, $10);

if (Lex->m_sql_cmd == nullptr)
MYSQL_YYABORT;
}
;

opt_datadir_ssl:
opt_ssl
{
$$= null_clex_str;
}
| DATA_SYM DIRECTORY_SYM opt_equal TEXT_STRING_filesystem opt_ssl
{
$$= $4;
}
;

opt_ssl:
/* empty */
{
Lex->account_options.ssl_type= SSL_TYPE_NOT_SPECIFIED;
}
| REQUIRE_SYM SSL_SYM
{
Lex->account_options.ssl_type= SSL_TYPE_SPECIFIED;
}
| REQUIRE_SYM NO_SYM SSL_SYM
{
Lex->account_options.ssl_type= SSL_TYPE_NONE;
}
;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This (as well as the introduction of the terminal symbol INSTANCE_SYM) seems to be needed for a future task MDEV-36742 (remote clone), not this one.

Comment on lines +68 to +70
while (arch_sys && arch_sys->signal_archiver())
{
std::this_thread::sleep_for(sleep_time);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of sleeping, can we wait for a condition variable, possibly with a timeout? There seems to be a function Arch_Sys::archiver_wait(), which could be extended with a timeout argument.

Comment on lines +104 to +110
mysql_mutex_lock(&m_mutex);
if (!m_archiver_active)
{
try
{
std::thread(Arch_Sys::archiver).detach();
m_archiver_active= true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be cleaner to set m_archiver_active before creating the thread. Also, m_archiver_active could be Atomic_relaxed<bool>.

Furthermore, I think that it would be better to use the tpool subsystem for scheduling these tasks, like we do for the purge and most other InnoDB tasks.

Comment on lines +125 to +127
bool alive= false;
mysql_mutex_lock(&m_mutex);
if (m_archiver_active)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simpler:

mysql_mutex_lock(&m_mutex);
const bool alive{m_archiver_active};
if (alive)

Comment on lines +246 to +247
/* First file in the archive group. */
ut_ad(m_file_ctx.get_count() == 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is only one ib_logfile0 in MariaDB, and this is about log archiving. What is an "archive group"? The parameter innodb_log_files_in_group was removed from MariaDB.

Also, if this subsystem is only about log archiving, why don’t we just append to the archived log file in log_t::write_buf(), similar to how we maintain an extra copy re_write_buf for ib_logfile101 during log resizing?

Comment on lines +279 to +280
Arch_Group::write_to_doublewrite_file(from_file, from_buffer, write_size,
dblwr_offset);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have to write any doublewrite buffer? mariadb-backup does not deal with any doublewrite buffer either.

Comment on lines +325 to +330
bool Arch_File_Ctx::delete_file(uint file_index, lsn_t begin_lsn)
{
bool success;
char file_name[MAX_ARCH_PAGE_FILE_NAME_LEN];

build_name(file_index, begin_lsn, file_name, MAX_ARCH_PAGE_FILE_NAME_LEN);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no comment of the purpose of the begin_lsn (not even in arch0arch.h). Apparently it may form a part of the file name. It is also completely unclear what a "group" is, in the context of begin_lsn. There are no high-level source code comments about this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, Arch_File_Ctx::delete_file() is not being invoked at all while executing the regression tests.

Comment on lines +18786 to +18789
#ifndef EMBEDDED_LIBRARY
Clone_notify *notifier= new Clone_notify(
Clone_notify::Type::SYSTEM_REDO_RESIZE, UINT32_MAX, true);
if (notifier->failed())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is completely unclear to me why log resizing would interfere with the clone functionality. If we replicated the log in log_t::write_buf(), then we should be completely indifferent to any log resizing.

Log resizing currently interferes with mariadb-backup --backup because once the log resizing completes, the old ib_logfile0 (which will be replaced by the ib_logfile101 that was created for the resized log) will cease to receive updates. So, from the point of view of mariadb-backup --backup which is holding a file handle to the old ib_logfile0, the server will appear to hang (stop writing any log to the old file).

In the clone, as far as I understand, we could just create an append-only archive log file, starting from the latest checkpoint LSN or the current LSN. It should basically be just like the ib_logfile101 that we create for log resizing, only an independent copy, using its own log_sys.clone_lsn instead of log_sys.resize_lsn. It should not matter if the server is concurrently resizing its main log file; we can just keep appending to the clone/ib_logfile0 or whatever we call that file.

Comment on lines +45 to +54
typedef struct st_net_server NET_SERVER;
typedef struct st_mysql MYSQL;
typedef struct st_mysql_socket MYSQL_SOCKET;

// #include "mysql_com_server.h"

/** Connection parameters including SSL */
typedef struct mysql_clone_ssl_context_t {
/** Clone ssl mode. Same as mysql client --ssl-mode */
int m_ssl_mode;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not seem to need any of this for the local clone functionality. Do we need this file at all? Also, I see that many unrlated .h.pp files for other plugins were changed. That does not seem right to me.

../sql/opt_hints.cc ../sql/opt_hints.h
../sql/opt_trace_ddl_info.cc ../sql/opt_trace_ddl_info.h
${GEN_SOURCES}
${GEN_SOURCES}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be only this white-space-only change to this file. I would suggest to revert it.

Comment on lines -1957 to +1958
performance-schema-max-stage-classes 160
performance-schema-max-statement-classes 227
performance-schema-max-stage-classes 175
performance-schema-max-statement-classes 229
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need all these classes for the local clone functionality?

Comment on lines +9 to +14
# 2. Test Remote Clone command. Default is local clone.
# --let remote_clone = 1
#
# 3. Remote Clone command is expected to return error and the error number
# is different from local clone.
# --let clone_remote_err = <error number>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are not relevant here. I would suggest to omit the numbering "1.", "2.", "3." as well, to avoid the need of any future merge conflicts or adjustments.

Comment on lines +16 to +29
# 4. Skip clone_valid_donor_list configuration for testing error cases
# --let skip_donor_config = 1
#
# 5. Test clone automatic tuning of threads
# --let clone_auto_tune = 1
#
# 6. Test clone command forcing SSL [REQUIRES SSL]
# --let clone_require_ssl = 1
#
# 7. Test clone command forcing insecure connection [REQUIRES NO SSL]
# --let clone_require_no_ssl = 1
#
# 8. Test clone command forcing SSL certificate validation
# --let clone_require_ssl_certificate = 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about 4. and 5., but the SSL related options definitely do not seem to be applicable to the local clone. I would suggest to trim this file to the minimum.

Comment on lines +33 to +35
if ($ddl_encryption) {
SET GLOBAL DEBUG = '+d,log_redo_with_invalid_master_key';
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no matching DBUG_EXECUTE_IF instrumentation.

Comment on lines +37 to +39
if ($ddl_redo_encrypt) {
--let $skip_ddl = 1
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would we need this? The log_t::write_buf(), which in my opinion is the correct place for replicating or "archiving" the log, would just see and copy the encrypted data as is if innodb_encrypt_log=ON.

Comment on lines +13 to +14
--source include/big_test.inc
--source include/not_parallel.inc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are these for? Are the tests that include this file being run on buildbot at all?

--source include/count_sessions.inc

## Get rid of the binary logs from previous run.
RESET BINARY LOGS AND GTIDS;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This syntax seems to be specific to MySQL and should throw a syntax error on MariaDB. I did not find any .result file with this output, or any reference to this file ddl_common.inc in any other file.

Comment on lines +25 to +31
if ($remote_clone) {
SET GLOBAL DEBUG = '+d,remote_release_clone_file_pin';
}

if (!$remote_clone) {
SET GLOBAL DEBUG = '+d,local_release_clone_file_pin';
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DBUG_EXECUTE_IF for both these instrumentations exist in the source code, but they are not covered by any other than this one (which apparently is not being included in any test).

Comment on lines 925 to 930
ut_ad(latch_have_wr());

if (arch_sys)
arch_sys->log_sys()->wait_archiver(lsn);

lsn_t old= flushed_to_disk_lsn.load(std::memory_order_relaxed);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be a race condition here, because Arch_Log_Sys::wait_archiver() may release log_sys.latch. I think that it would be easiest to just not support the clone function with memory-mapped log writes (PMEM). An easy way to do this might be to have log_mmap() return MAP_FAILED when the clone plugin is loaded.

Comment on lines +1106 to 1110
if (arch_sys)
arch_sys->log_sys()->wait_archiver(lsn);

if (resizing != RETAIN_LATCH)
latch.wr_unlock();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arch_Sys::archiver_wait() may release and reacquire log_sys.latch. This would break the assumption that is being made in log_write_and_flush().

I think that it would be cleaner to introduce data members to log_t related to archiving the log, such as clone_lsn and clone_log (similar to resize_lsn and resize_log), and introduce a log_t::clone_write_buf() as well. In that way, cloning the log would simply rely on the existing group commit mechanism and there would be fewer context switches and no potential race condition scenarios.

Comment on lines 227 to +228
/** Last written LSN; protected by latch */
lsn_t write_lsn;
Atomic_relaxed<lsn_t> write_lsn;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this change made? I think that it would be cleanest to have log_sys.latch cover also the cloning of the log, which would be reflected by new data members clone_lsn and clone_log, similar to resize_lsn and resize_log.

MYSQL_SYSVAR(delay_after_data_drop),
#endif
nullptr};

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assertion found on debug build


main-MDEV-36733, origin/main-MDEV-36733 7cc113f555cecc6ade9cadd3e3ebae3c9129e0aa 2025-09-02T18:48:45+05:30

# 2025-09-02T20:19:12 [2626295] | mariadbd: /data/Server/MDEV-36733/plugin/clone/src/clone_client.cc:625: int myclone::Client::pfs_begin_state(): Assertion `s_num_clones == 1' failed.


Stack

# 2025-09-02T20:27:48 [2626295] | Thread 3 (Thread 2635011.2671291 (one_connection)):
# 2025-09-02T20:27:48 [2626295] | #0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140496569935424) at ./nptl/pthread_kill.c:44
# 2025-09-02T20:27:48 [2626295] | #1  __pthread_kill_internal (signo=6, threadid=140496569935424) at ./nptl/pthread_kill.c:78
# 2025-09-02T20:27:48 [2626295] | #2  __GI___pthread_kill (threadid=140496569935424, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
# 2025-09-02T20:27:48 [2626295] | #3  0x000060bc66901476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
# 2025-09-02T20:27:48 [2626295] | #4  0x000060bc668e77f3 in __GI_abort () at ./stdlib/abort.c:79
# 2025-09-02T20:27:48 [2626295] | #5  0x000060bc668e771b in __assert_fail_base (fmt=0x60bc66a9c130 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x1e99687ea0de "s_num_clones == 1", file=0x1e99687ea4a8 "/data/Server/MDEV-36733/plugin/clone/src/clone_client.cc", line=625, function=<optimized out>) at ./assert/assert.c:92
# 2025-09-02T20:27:48 [2626295] | #6  0x000060bc668f8e96 in __GI___assert_fail (assertion=0x1e99687ea0de "s_num_clones == 1", file=0x1e99687ea4a8 "/data/Server/MDEV-36733/plugin/clone/src/clone_client.cc", line=625, function=0x1e99687ea798 "int myclone::Client::pfs_begin_state()") at ./assert/assert.c:101
# 2025-09-02T20:27:48 [2626295] | #7  0x00001e99687b6092 in myclone::Client::pfs_begin_state (this=this@entry=0x7fc7e82481a8) at /data/Server/MDEV-36733/plugin/clone/src/clone_client.cc:625
# 2025-09-02T20:27:48 [2626295] | #8  0x00001e99687d2337 in myclone::Local::clone (this=this@entry=0x7fc7e82481a0) at /data/Server/MDEV-36733/plugin/clone/src/clone_local.cc:64
# 2025-09-02T20:27:48 [2626295] | #9  0x00001e99687b34a6 in plugin_clone_local (thd=0x631360020158, data_dir=<optimized out>) at /data/Server/MDEV-36733/plugin/clone/src/clone_plugin.cc:429
# 2025-09-02T20:27:48 [2626295] | #10 0x000055a258a17842 in Clone_handler::clone_local (this=0x1359200d7010, thd=thd@entry=0x631360020158, data_dir=<optimized out>) at /data/Server/MDEV-36733/sql/clone_handler.cc:124
# 2025-09-02T20:27:48 [2626295] | #11 0x000055a25888128c in Sql_cmd_clone::execute (this=0x6313600356f8, thd=0x631360020158) at /data/Server/MDEV-36733/sql/sql_admin.cc:1800
# 2025-09-02T20:27:48 [2626295] | #12 0x000055a2586e2b61 in mysql_execute_command (thd=thd@entry=0x631360020158, is_called_from_prepared_stmt=is_called_from_prepared_stmt@entry=false) at /data/Server/MDEV-36733/sql/sql_parse.cc:5889
# 2025-09-02T20:27:48 [2626295] | #13 0x000055a2586e3871 in mysql_parse (thd=thd@entry=0x631360020158, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x7fc7e8249320) at /data/Server/MDEV-36733/sql/sql_parse.cc:7914
# 2025-09-02T20:27:48 [2626295] | #14 0x000055a2586e59ed in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x631360020158, packet=packet@entry=0x63136002a7b9 "CLONE LOCAL DATA DIRECTORY = '/invalid/nonexistent/path' /* E_R Thread13 QNO 631 CON_ID 27 */ ", packet_length=packet_length@entry=94, blocking=blocking@entry=true) at /data/Server/MDEV-36733/sql/sql_parse.cc:1896
# 2025-09-02T20:27:48 [2626295] | #15 0x000055a2586e7a3e in do_command (thd=thd@entry=0x631360020158, blocking=blocking@entry=true) at /data/Server/MDEV-36733/sql/sql_parse.cc:1419
# 2025-09-02T20:27:48 [2626295] | #16 0x000055a25886ea17 in do_handle_one_connection (connect=<optimized out>, connect@entry=0x55a25c6186a8, put_in_cache=put_in_cache@entry=true) at /data/Server/MDEV-36733/sql/sql_connect.cc:1414
# 2025-09-02T20:27:48 [2626295] | #17 0x000055a25886ec57 in handle_one_connection (arg=0x55a25c6186a8) at /data/Server/MDEV-36733/sql/sql_connect.cc:1326
# 2025-09-02T20:27:48 [2626295] | #18 0x000060bc66953ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
# 2025-09-02T20:27:48 [2626295] | #19 0x000060bc669e4a04 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100

RR trace is present on pluto:-
/data/results/1756833677/TBR-2328

Bug:-https://jira.mariadb.org/browse/MDEV-37561



@dr-m
Copy link
Contributor

dr-m commented Sep 4, 2025

I compiled both 4fd2564 and the baseline e02f4d7 with cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo for AMD64 and ran strip on the main build artefacts to assess the size impact. I guess that cmake -DCMAKE_BUILD_TYPE=Release would have achieved the same:

build mariadbd mariadb-backup difference mariadb_clone.so
e02f4d7 29644656 30165488 520832
4fd2564 30026480 30547376 520896 223872
overhead 381824 381888 223872

The mariadbd-backup executable is basically a proper superset of mariadbd, about 509 KiB of additional code. (Including the entire server is an overkill, and the goal is that the mariadb_clone.so would supercede the mariadb-backup entirely.)

We can see that 4fd2564 is actually adding much more code to the server (373 KiB) rather than the clone plugin (219 KiB). In an earlier review comment I expressed the impression that some code in storage/innobase/clone had better be moved to the clone plugin itself.

Because the local clone implements a small subset of what mariadb-backup does, a fair goal would be that the added size (mariadb_clone.so and the growth of mariadbd) would be smaller than the size difference between mariadbd and mariadbd-backup, and that most of it would be in the mariadbd_clone.so. My build of 4fd2564 satisfies neither goal: 223872<381824 and 223872+381824>520832.

Copy link
Contributor

@dr-m dr-m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started to look at code coverage, as well as some things that I noticed while checking reports of code that is not being covered by the tests. This report only covers the storage/innobase/arch subdirectory. I will check storage/innobase/clone later.

Comment on lines +1 to +10
# Monitor clone operations using performance schema's stage and statement events.
--source include/have_innodb.inc
--source include/have_debug.inc
--source include/have_debug_sync.inc
--source include/count_sessions.inc
--source include/not_embedded.inc

# Disable PFS monitoring for threads by default
connect (con1,localhost,root,,);
CALL sys.ps_setup_disable_thread(CONNECTION_ID());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--source include/have_perfschema.inc

is missing here, causing a failure for cmake -DPLUGIN_PERFSCHEMA=NO builds:

CURRENT_TEST: clone.monitor_progress
mysqltest: At line 10: query 'CALL sys.ps_setup_disable_thread(CONNECTION_ID())' failed: ER_SP_DOES_NOT_EXIST (1305): PROCEDURE sys.ps_setup_disable_thread does not exist

Comment on lines +71 to +82
if (++sleep_count == 10 && backoff_count < 3)
{
sleep_time*= 10;
sleep_count= 0;
++backoff_count;
continue;
}
if (sleep_count == 30 )
ib::warn() << "Archiver still running: Waited 30 seconds.";

else if (sleep_count >= 600)
ib::fatal() << "Archiver still running: Waited for 10 minutes.";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These conditions do not hold in the regression test suite, in a cmake -DCMAKE_BUILD_TYPE=Debug -DENABLE_GCOV=ON build.

Comment on lines +97 to +101
if (!os_file_create_directory(ARCH_DIR, false))
{
my_error(ER_CANT_CREATE_FILE, MYF(0), ARCH_DIR, errno);
return ER_CANT_CREATE_FILE;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The condition is not being covered. It would seem to be relatively simple to cover, for example, by creating a read-only parent directory and asking the directory to be created there.

Comment on lines +112 to +117
catch (...)
{
my_error(ER_CANT_CREATE_THREAD, MYF(0), errno);
m_archiver_active= false;
err= ER_CANT_CREATE_THREAD;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not reachable. (But, we should use tpool anyway.)

Comment on lines +189 to +191
void Arch_Sys::remove_dir(const char *dir_path, const char *dir_name)
{
char path[MAX_ARCH_DIR_NAME_LEN];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is not being invoked when running the regression tests. Why would we need this code at all?

Comment on lines +1179 to +1180
/** @return stop block position of the group. */
Arch_Page_Pos get_stop_pos() const { return (m_stop_pos); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unreachable

Comment on lines +1208 to +1216
static lsn_t align_lsn(lsn_t lsn, lsn_t ref_lsn)
{
lsn_t lsn_diff = (ref_lsn <= lsn) ? (lsn - ref_lsn) :
(ref_lsn - lsn + OS_FILE_LOG_BLOCK_SIZE);
lsn_diff = ut_uint64_align_down(lsn_diff, OS_FILE_LOG_BLOCK_SIZE);
lsn_t r = (ref_lsn <= lsn) ? (ref_lsn + lsn_diff) : (ref_lsn - lsn_diff);
ut_ad(r <= lsn);
return r;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would we reintroduce OS_FILE_LOG_BLOCK_SIZE and this complex math? Thankfully, this function appears to be unreacahble (at least in some expansions of this inline function).

Comment on lines +1218 to +1226
/** Align down LSN value to multiples of OS_FILE_LOG_BLOCK_SIZE with respect
to groups first LSN value.
@param[in] lsn LSN value to align
@return aligned LSN value. */
lsn_t align_lsn(lsn_t lsn) const
{
ut_ad(m_first_lsn <= lsn);
return Arch_Group::align_lsn(lsn, m_first_lsn);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we do not want to reintroduce the 512-byte log blocks after 685d958 got rid of them.

Comment on lines +1249 to +1252
void get_dir_name(char *name_buf, uint buf_len)
{
m_file_ctx.build_dir_name(m_begin_lsn, name_buf, buf_len);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unreachable

Comment on lines +1777 to +1787
/** Get the mutex protecting concurrent start, stop operations required
for initialising group during recovery.
@return mutex */
mysql_mutex_t *get_mutex() { return (&m_mutex); }

/** @return operation mutex */
mysql_mutex_t *get_oper_mutex() { return (&m_oper_mutex); }

/** Fetch the system client context.
@return system client context. */
Page_Arch_Client_Ctx *get_sys_client() const { return (m_ctx); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The get_mutex() appears to be unreachable. What is the point of these? We could declare each data member public. If m_ctx really needs the pointer indirection, then it could be declared as Page_Arch_Client_Ctx *const m_ctx; and be public.

Copy link
Contributor

@dr-m dr-m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked clone0api, clone0apply, clone0clone. Lots of unreachable code. I will abort the code coverage analysis for now.

Comment on lines +71 to +78
/** Rename clone status file. The operation is expected to be atomic
when the files belong to same directory.
@param[in] from_file name of current file
@param[in] to_file name of new file */
static void rename_file(const std::string &from_file,
const std::string &to_file)
{
auto ret= std::rename(from_file.c_str(), to_file.c_str());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused function. This is also duplicating os_file_rename_func().

Comment on lines +90 to +100
static void create_file(std::string &file_name)
{
std::ofstream file(file_name.c_str());

if (file.is_open())
{
file.close();
return;
}
ib::error() << "Error creating file : " << file_name.c_str();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is duplicating os_file_create(), and it is hiding the status from the caller. The error output is unreachable.

Comment on lines +102 to +121
/** Delete clone status file or directory.
@param[in] file name of file */
static void remove_file(const std::string &file)
{
os_file_type_t file_type;
bool exists;

if (!os_file_status(file.c_str(), &exists, &file_type))
{
ib::error() << "Error checking a file to remove : " << file.c_str();
return;
}
/* Allow non existent file, as the server could have crashed or returned
with error before creating the file. This is needed during error cleanup. */
if (!exists)
return;

/* In C++17 there will be std::filesystem::remove_all and the
code below will no longer be required. */
if (file_type == OS_FILE_TYPE_DIR)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is quite a bit of unnecessary logic here. Why do we invoke stat before unlink? Also, we already have os_file_delete_func(). Also, we do depend on C++17 and std::filesystem elsewhere. If this is essentially rm -fr (removing a directory tree), the function name remove_file is misleading. I think that the code around this would have to be refactored.

Thankfully, the file_type == OS_FILE_TYPE_DIR code path seems to be unreachable. Maybe this entire function could omitted and the caller could invoke os_file_delete() or even better, IF_WIN(DeleteFile(file.c_str()), unlink(file.c_str())) if all we want is "delete if exists" semantics?

Comment on lines +150 to +156
if (clone->replace_datadir())
{
/* Create error file for rollback. */
file_name.assign(CLONE_INNODB_ERROR_FILE);
create_file(file_name);
return;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The condition is unreachable. But, I think that replacing the target directory is something that would have to be explicitly asked for in the syntax. mariadb-backup --backup would fail if --target-dir already exists.

Comment on lines +158 to +163
file_name.assign(path);
/* Add path separator if needed. */
if (file_name.back() != OS_PATH_SEPARATOR)
file_name.append(OS_PATH_SEPARATOR_STR);

file_name.append(CLONE_INNODB_IN_PROGRESS_FILE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The if condition is unreachable. We might just unconditionally do some snprintf() with "%s/%s"; an extra / will not actually hurt. We might also use openat and friends.

Comment on lines +2238 to +2247
auto err_exit= [&](int in_err)
{
char errbuf[MYSYS_STRERROR_SIZE];
my_error(in_err, MYF(0), file_name.c_str(), errno,
my_strerror(errbuf, sizeof(errbuf), errno));
return in_err;
};

if (!status)
return err_exit(ER_CANT_OPEN_FILE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All err_exit are unreachable. A simple goto err_exit should produce less code.

Comment on lines +2266 to +2272
bool success= false;
auto handle= os_file_create(innodb_clone_file_key, file_name.c_str(), option,
file_type, read_only, &success);

int err= 0;
if (!success)
err= (option == OS_FILE_OPEN) ? ER_CANT_OPEN_FILE : ER_CANT_CREATE_FILE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unreachable error path

Comment on lines +2276 to +2282
auto db_err= init_cbk(handle);

if (db_err != DB_SUCCESS)
{
os_file_close(handle);
err= ER_ERROR_ON_WRITE;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unreachable condition

Comment on lines +2320 to +2337
int Clone_Handle::close_file(Clone_Task *task) {
bool success = true;

/* Close file, if opened. */
if (task->m_current_file_des.m_file != OS_FILE_CLOSED) {
success = os_file_close(task->m_current_file_des);
}

task->m_current_file_des.m_file = OS_FILE_CLOSED;
task->m_file_cache = true;

if (!success) {
my_error(ER_INTERNAL_ERROR, MYF(0), "Innodb error while closing file");
return (ER_INTERNAL_ERROR);
}

return (0);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need this much code? Why not just unconditionally invoke os_file_close() and let it do any error reporting? The my_error is unreachable here.

Comment on lines +2376 to +2384
} else if (buf_cbk) {
unsigned char *data_buf = nullptr;
uint32_t data_len = 0;
/* Get data buffer */
err = cbk->apply_buffer_cbk(data_buf, data_len);
if (err == 0) {
/* Modify and write data buffer to file. */
err = modify_and_write(task, offset, data_buf, data_len);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unreachable condition. Can the bool buf_cbk=false parameter be removed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

7 participants