MySQL技术内幕InnoDB存储引擎
总结
本书较通俗易懂的讲述了mysql的特性, 评分4星
1. MySQL体系结构和存储引擎
1.3 MySQL存储引擎
存储引擎是基于表的, 而不是数据库
2. InnoDB存储引擎
2.6 InnoDB关键特性
2.6.1 插入缓存
辅助索引不能是唯一的, 因为在插入缓存时, 数据库并不回去查找索引页来判断插入的记录的唯一性
如果去查找肯定又会有离散读取的情况发生, 从而导致Insert Buffer失去了意义
3. 文件
- 参数文件: mysql启动的配置文件
- 日志文件: 错误日志, 二进制日志文件, 慢查询日志文件, 查询日志文件等
- socket文件: 用unix域嵌套字方式进行连接时需要的文件
- pid文件: mysql实例的进程id文件
- mysql表结构文件: 用来存放mysql表结构定义文件
- 存储引擎文件: 存储记录和索引的文件
3.1 参数文件
- 启动时参数指定参数文件
查询方式:
# 根据端口查询占用的pid
[root@x aaa]# lsof -i:3308
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
mysqld 2924 mysql 13u IPv6 18684 0t0 TCP *:tns-server (LISTEN)
# 根据pid=2924查询启动命令
[root@aaa 2924]# cat /proc/2924/cmdline
/usr/local/mysql-5.7.17/bin/mysqld--defaults-file=/etc/my.cnf.3308
- 查看mysql的默认查找参数日志顺序
[root@aaa bin]# ./mysql --help | grep my.cnf
order of preference, my.cnf, $MYSQL_TCP_PORT,
/etc/my.cnf /etc/mysql/my.cnf /usr/local/mysql/etc/my.cnf ~/.my.cnf
- 无配置文件找到
无找到时, 会按mysql的默认配置启动
3.2 日志文件
3.2.1 错误日志
mysql> show variables like 'log_error';
+---------------+--------------------------------------------+
| Variable_name | Value |
+---------------+--------------------------------------------+
| log_error | /usr/local/mysql/data_3307/st01023vm45.err |
+---------------+--------------------------------------------+
1 row in set (0.00 sec)
3.2.2 慢查询日志
查询慢日志相关开关配置
# 查询是否开启慢日志记录
mysql> show variables like 'log_slow_queries';
+------------------+-------+
| Variable_name | Value |
+------------------+-------+
| log_slow_queries | ON |
+------------------+-------+
1 row in set (0.00 sec)
# 查询慢日志的耗时阈值, 大于该值输出慢日志
mysql> show variables like 'long_query_time';,
+-----------------+----------+
| Variable_name | Value |
+-----------------+----------+
| long_query_time | 1.000000 |
+-----------------+----------+
# 查询未使用索引的日志
mysql> show variables like 'log_queries_not_using_indexes';
+-------------------------------+-------+
| Variable_name | Value |
+-------------------------------+-------+
| log_queries_not_using_indexes | OFF |
+-------------------------------+-------+
# 每分钟允许未使用索引的日志写入文件个最大次数
mysql> show variables like 'log_throttle_queries_not_using_indexes';
+----------------------------------------+-------+
| Variable_name | Value |
+----------------------------------------+-------+
| log_throttle_queries_not_using_indexes | 0 |
+----------------------------------------+-------+
查询慢日志路径
mysql> show variables like 'slow_query_log_file';
+---------------------+------------------+
| Variable_name | Value |
+---------------------+------------------+
| slow_query_log_file | slow_queries.log |
+---------------------+------------------+
mysql> show variables like 'datadir';
+---------------+-----------------------------+
| Variable_name | Value |
+---------------+-----------------------------+
| datadir | /usr/local/mysql/data_3307/ |
+---------------+-----------------------------+
[root@st01023vm45 data_3307]# cat /usr/local/mysql/data_3307/slow_queries.log
# User@Host: root[root] @ [172.30.43.69]
# Query_time: 7.214444 Lock_time: 0.001474 Rows_sent: 0 Rows_examined: 728249
SET timestamp=1688711829;
UPDATE `component_entry_configs` SET `deleted_at`='2023-07-07 14:36:46.131707' WHERE key_name in ('cdtid9tqdmgdm34nng50','catbmmp0ep6np4pt61n0','c6559fqqj8qtlq7ep2h0','cdqvd8118hu96ddfnb80','ccmjau1inmkroeob6h20','cd8dbj9inmkmskps1jdg','cdq4rj118hu96ddfn6j0','cdupjkpr0p470mov751g','ce8d97ih6lsdjo3aali0','c5a2f22qj8qoh2hgu3bg','cd8obgpinmkmskps9os0','chjlhbcsqei01766houg','ca2ds7iqj8qpmbs4qda0','cac7nf2qj8qqnksl7uog','cfpir2hr0p4c18b4d7s0','cdtid9tqdmgdm34nng6g','c6fl8jaqj8qtfpahtln0','cdtf8elqdmgdm34nn2n0','ccq19ae25sus2gs6hjm0'
3.2.3 查询日志
mysql> SHOW VARIABLES LIKE 'general_log';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| general_log | OFF |
+---------------+-------+
1 row in set (0.00 sec)
3.2.4 二进制日志
二进制日志记录了对mysql数据执行更改的所有操作
查看二进制文件路径
# 查询bin_log 是否开启
mysql> show variables like 'log_bin';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| log_bin | ON |
+---------------+-------+
1 row in set (0.00 sec)
# 查询datadir bin_log就在这
mysql> show variables like 'datadir';
+---------------+-----------------------------+
| Variable_name | Value |
+---------------+-----------------------------+
| datadir | /usr/local/mysql/data_3307/ |
+---------------+-----------------------------+
# 查看bin-log文件
mysql> SHOW BINARY LOGS;
+------------------+------------+
| Log_name | File_size |
+------------------+------------+
| mysql-bin.000233 | 1073745120 |
| mysql-bin.000234 | 1073742062 |
| mysql-bin.000235 | 152539169 |
+------------------+------------+
#查看binlog当前位点
mysql> show master status \G;
*************************** 1. row ***************************
File: mysql-bin.000235
Position: 159895850
Binlog_Do_DB:
Binlog_Ignore_DB:
1 row in set (0.00 sec)
# 查看bin_log文件, 文件名默认为主机名, 可以通过配置参数log-bin=[name]制定
-rw-rw---- 1 mysql mysql 1073745604 2024-08-20 11:13:41 mysql-bin.000231
-rw-rw---- 1 mysql mysql 353864634 2024-09-13 03:28:59 mysql-bin.000232
-rw-rw----. 1 mysql mysql 38 2024-08-20 11:13:41 mysql-bin.index
# 查看binlog索引文件
[root@st01023vm45 data_3307]# cat mysql-bin.index;
./mysql-bin.000231
./mysql-bin.000232
以下几个参数与binlog文件有关
参数 | 作用 | 查询方式 |
---|---|---|
max_binlog_size | 单个二进制日志文件的最大值, 默认为1G | SHOW VARIABLES LIKE 'max_binlog_size'; |
binlog_cache_size | binlog缓存池的大小 | show variables like 'binlog_cache_size'; |
sync_binlog | 二进制日志文件写入缓冲池的频率, 可以为0, 1, 大于1的整数 | show variables like 'sync_binlog'; |
binlog-do-db | /usr/local/mysql/data_3307/ | |
binlog-ignore-db | /usr/local/mysql/data_3307/ | |
log-slave-update | /usr/local/mysql/data_3307/ | |
binlog-format | /usr/local/mysql/data_3307/ |
- binlog_cache_size的写入规则
默认为32K, 是每个会话创建一个事务时, mysql会自动分配一个binlog_cache_size的缓存, 所以不能设置过大.
也不能设置过小, 因为会频繁写入二进制临时文件
mysql> show variables like 'binlog_cache_size';
+-------------------+-------+
| Variable_name | Value |
+-------------------+-------+
| binlog_cache_size | 32768 |
+-------------------+-------+
# 查看binlog_cache_use和disk_use的次数, 注意这里是次数, 不是cache空间的大小
mysql> show global status like 'binlog_cache%';
+-----------------------+--------+
| Variable_name | Value |
+-----------------------+--------+
| Binlog_cache_disk_use | 5741 |
| Binlog_cache_use | 458770 |
+-----------------------+--------+
当Binlog_cache_disk_use出现比较多次时,就应该考虑增加binlog_cache_size
- sync_binlog
sync_binlog=1为采用同步写磁盘的方式写二进制
当is_innodb_support_xa为0时, binlog和redo log互不影响, 会出现主从数据不一致的情况
flowchart LR
创建事务 --> 执行update-sql
执行update-sql -->
is_innodb_support_xa{是否开启innodb_support_xa}
subgraph writeLog[write log并行]
direction LR
redoLog[write redo log]
binLog[write bin log]
end
is_innodb_support_xa -->|否| writeLog
当is_innodb_support_xa为1时, 当写入binlog后, mysql重启会检查prepare的事务是否已提交了binlog, 提交了的话, prepare的会执行commit环节
@startuml
|innodb_server|
start
:创建事务;
:执行update-sql;
if (是否开启innodb_support_xa?) is (是) then
:write redo log, prepare阶段;
:提交事务;
:写binlog;
|机器|
:故障重启;
|innodb_server|
:检查prepare的redo log完成了binlog提交;
:提交prepare阶段的redo log;
stop;
|innodb_server|
else
:此处省略...;
detach
@enduml
- binlog-format
查询binlog格式
mysql> select @@session.binlog_format;
+-------------------------+
| @@session.binlog_format |
+-------------------------+
| ROW |
+-------------------------+
1 row in set (0.01 sec)
更改binlog格式
mysql>set global binlog_format='ROW'
格式有三种
- Row
- Statement
- MIXED
设置了binlog_format为ROW, 则可以将innodb事务隔离级别设置为READ COMMTTED, 以获得更好的并发性
MIXED模式下, 默认情况下采取STATEMENT格式存储二进制日志, 但是有些情况会使用ROW格式
- 使用了uuid()、user()等不确定函数
- 使用了临时表
3.6 Innodb存储引擎文件
3.6.1 表空间文件
Innodb将存储的数据按表空间(tablespace)进行存放
查看innodb的文件配置
mysql> show variables like "innodb_data_file_path";
+-----------------------+------------------------+
| Variable_name | Value |
+-----------------------+------------------------+
| innodb_data_file_path | ibdata1:12M:autoextend |
+-----------------------+------------------------+
这个表示ibdata1大小为12M, 当用完了,autoextend表示会自动增长
也可设置为/db1/ibdata1:12M;/db2/ibdata1:12M:autoextend
表示两个文件共同承担表空间, 如果两个文件在不同磁盘里, 磁盘负载会被平均
如果设置了innodb_file_per_table, 每个表都会有单独的表空间文件
mysql> show variables like 'innodb_file_per_table';
+-----------------------+-------+
| Variable_name | Value |
+-----------------------+-------+
| innodb_file_per_table | ON |
+-----------------------+-------+
1 row in set (0.02 sec)
文件的命令是,数据库为目录名,文件名问表名.ibd, 例如我有个表是study_mysql_002_go_mysql_canal.canal_test
bash-4.2$ ls -llah
total 136K
drwxr-x--- 2 mysql mysql 4.0K Dec 9 14:29 .
drwxrwxr-x 1 mysql root 4.0K Dec 5 15:44 ..
-rw-r----- 1 mysql mysql 8.5K Dec 9 14:29 canal_test.frm
-rw-r----- 1 mysql mysql 112K Dec 9 14:29 canal_test.ibd
-rw-r----- 1 mysql mysql 65 Dec 5 15:44 db.opt
bash-4.2$ pwd
/var/lib/mysql/data/study_mysql_002_go_mysql_canal
独立的表空间只存储数据、索引、插入缓存BITMAP等信息, 其他信息还是存放在默认表空间
4 表
主要讲述和表在磁盘的存储结构.
4.1 索引组织表
innodb中, 表是根据主键顺序存放的, 如果表没有主键, 会按以下方式选择主键
- 首先选择表中第一个有唯一索引的列(顺序为定义索引的顺序、而不是建表时列的顺序), 如果有则该列为主键
- 如果没有, 则自动创建一个6字节的指针
4.2 Innodb逻辑存储结构
表空间由段、区、页组成
4.2.1 表空间
设置了innodb_file_per_table为on时, 每个表都会自己一个独立的表空间, 但还是会有部分数据内容存储在共享表空间, 以下为单独表空间和共享表空间存储的数据内容
可以使用py_innodb_page_info.py进行表空间的分析
下载地址: https://github.com/qingdengyue/david-mysql-tools/tree/master/py_innodb_page_type
使用方式
> python2 py_innodb_page_info.py /tmp/ibdata1
# 总页数
Total number of page: 768:
# 插入缓存的空闲列表的页数
Insert Buffer Bitmap: 8
#
System Page: 136
Transaction system Page: 2
# 可用页
Freshly Allocated Page: 468
# undo页
Undo Log Page: 110
File Segment inode: 13
# 数据页
B-tree Node: 23
File Space Header: 8
4.2.2 段
表空间有叶子节点段、非叶子节点段、回滚段组成
叶子节点段其实就是数据段
非叶子节点段就是索引段
4.2.3 区
区是由连续的页组成的, 每个区的大小为1MB, 为了保证区中页的连续性, Innodb存储引擎一次从磁盘申请4~5个区
页的大小为16KB, 一个区一共有64个(1024/16)连续的页面
# 查看页大小
mysql > SHOW VARIABLES LIKE 'innodb_page_size';
+-----------------+-------+
| Variable_name | Value |
+-----------------+-------+
| innodb_page_size| 16384 |
+-----------------+-------+
# 创建mysql表
mysql > create table read_innodb_4_2_3_t1 (
col1 int not null auto_increment,
col2 varchar(7000),
primary key (col1)) ENGINE=InnoDB character set utf8;
);
# 查看刚创建时私有表的大小
bash-4.2$ ls -llh
total 132K
-rw-r----- 1 mysql mysql 65 Dec 25 09:31 db.opt
-rw-r----- 1 mysql mysql 29K Dec 25 09:38 read_innodb_4_2_3_t1.frm
-rw-r----- 1 mysql mysql 96K Dec 25 09:38 read_innodb_4_2_3_t1.ibd
bash-4.2$ pwd
/var/lib/mysql/data/read_innodb
# 查看刚创建表时的页分析
➜ py_innodb_page_type git:(master) ✗ python2 ./py_innodb_page_info.py -v /tmp/read_innodb_4_2_3_t1.ibd
page offset 00000000, page type <File Space Header>
page offset 00000001, page type <Insert Buffer Bitmap>
page offset 00000002, page type <File Segment inode>
page offset 00000003, page type <B-tree Node>, page level <0000>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
Total number of page: 6:
Freshly Allocated Page: 2
Insert Buffer Bitmap: 1
File Space Header: 1
B-tree Node: 1
File Segment inode: 1
# 插入记录
mysql > insert read_innodb_4_2_3_t1 select null, repeat('a', 7000);
# 分析页数量
➜ py_innodb_page_type git:(master) ✗ python2 ./py_innodb_page_info.py -v /tmp/read_innodb_4_2_3_t1.ibd
page offset 00000000, page type <File Space Header>
page offset 00000001, page type <Insert Buffer Bitmap>
page offset 00000002, page type <File Segment inode>
page offset 00000003, page type <B-tree Node>, page level <0000>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
Total number of page: 6:
Freshly Allocated Page: 2
Insert Buffer Bitmap: 1
File Space Header: 1
B-tree Node: 1
File Segment inode: 1
# 这里其实会比较奇怪, 因为我的col2是utf8, 所有应该占据了7000 * 3 / 1024 = 20kb 大小, 超出我页的限制大小16K, 但是b+树还是只有一个页节点
# 继续插入记录
mysql > insert read_innodb_4_2_3_t1 select null, repeat('a', 7000);
➜ py_innodb_page_type git:(master) ✗ python2 ./py_innodb_page_info.py -v /tmp/read_innodb_4_2_3_t1.ibd
page offset 00000000, page type <File Space Header>
page offset 00000001, page type <Insert Buffer Bitmap>
page offset 00000002, page type <File Segment inode>
page offset 00000003, page type <B-tree Node>, page level <0000>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
Total number of page: 6:
Freshly Allocated Page: 2
Insert Buffer Bitmap: 1
File Space Header: 1
B-tree Node: 1
File Segment inode: 1
# 继续插入记录
mysql> insert read_innodb_4_2_3_t1 select null, repeat('a', 7000);
➜ py_innodb_page_type git:(master) ✗ python2 ./py_innodb_page_info.py -v /tmp/read_innodb_4_2_3_t1.ibd
page offset 00000000, page type <File Space Header>
page offset 00000001, page type <Insert Buffer Bitmap>
page offset 00000002, page type <File Segment inode>
page offset 00000003, page type <B-tree Node>, page level <0001>
page offset 00000004, page type <B-tree Node>, page level <0000>
page offset 00000005, page type <B-tree Node>, page level <0000>
Total number of page: 6:
Insert Buffer Bitmap: 1
File Space Header: 1
B-tree Node: 3
File Segment inode: 1
插入了3条记录后, 开始分裂成3个节点, 其中 00000003还是非叶子节点, 因为page level 为1(0则为叶子节点)
批量插入60条一样的记录
#插入后, 文件从96K大小上升为592K
➜ py_innodb_page_type git:(master) ✗ ls -lhla /tmp/read_innodb_4_2_3_t1.ibd
-rw-r----- 1 maizhikun wheel 592K 12 30 09:41 /tmp/read_innodb_4_2_3_t1.ibd
# 页分析
python2 ./py_innodb_page_info.py -v /tmp/read_innodb_4_2_3_t1.ibd
page offset 00000000, page type <File Space Header>
page offset 00000001, page type <Insert Buffer Bitmap>
page offset 00000002, page type <File Segment inode>
page offset 00000003, page type <B-tree Node>, page level <0001>
page offset 00000004, page type <B-tree Node>, page level <0000>
page offset 00000005, page type <B-tree Node>, page level <0000>
page offset 00000006, page type <B-tree Node>, page level <0000>
page offset 00000007, page type <B-tree Node>, page level <0000>
page offset 00000008, page type <B-tree Node>, page level <0000>
page offset 00000009, page type <B-tree Node>, page level <0000>
page offset 0000000a, page type <B-tree Node>, page level <0000>
page offset 0000000b, page type <B-tree Node>, page level <0000>
page offset 0000000c, page type <B-tree Node>, page level <0000>
page offset 0000000d, page type <B-tree Node>, page level <0000>
page offset 0000000e, page type <B-tree Node>, page level <0000>
page offset 0000000f, page type <B-tree Node>, page level <0000>
page offset 00000010, page type <B-tree Node>, page level <0000>
page offset 00000011, page type <B-tree Node>, page level <0000>
page offset 00000012, page type <B-tree Node>, page level <0000>
page offset 00000013, page type <B-tree Node>, page level <0000>
page offset 00000014, page type <B-tree Node>, page level <0000>
page offset 00000015, page type <B-tree Node>, page level <0000>
page offset 00000016, page type <B-tree Node>, page level <0000>
page offset 00000017, page type <B-tree Node>, page level <0000>
page offset 00000018, page type <B-tree Node>, page level <0000>
page offset 00000019, page type <B-tree Node>, page level <0000>
page offset 0000001a, page type <B-tree Node>, page level <0000>
page offset 0000001b, page type <B-tree Node>, page level <0000>
page offset 0000001c, page type <B-tree Node>, page level <0000>
page offset 0000001d, page type <B-tree Node>, page level <0000>
page offset 0000001e, page type <B-tree Node>, page level <0000>
page offset 0000001f, page type <B-tree Node>, page level <0000>
page offset 00000020, page type <B-tree Node>, page level <0000>
page offset 00000021, page type <B-tree Node>, page level <0000>
page offset 00000022, page type <B-tree Node>, page level <0000>
page offset 00000023, page type <B-tree Node>, page level <0000>
page offset 00000000, page type <Freshly Allocated Page>
Total number of page: 37:
Freshly Allocated Page: 1
Insert Buffer Bitmap: 1
File Space Header: 1
B-tree Node: 33
File Segment inode: 1
再增加一条数据
mysql> insert read_innodb_4_2_3_t1 select null, repeat('a', 7000);
Query OK, 1 row affected (0.02 sec)
Records: 1 Duplicates: 0 Warnings: 0
# 查看ibd文件大小
bash-4.2$ ll -ah
total 2.1M
drwxr-x--- 2 mysql mysql 4.0K Dec 25 09:38 .
drwxrwxr-x 1 mysql root 4.0K Dec 25 09:31 ..
-rw-r----- 1 mysql mysql 65 Dec 25 09:31 db.opt
-rw-r----- 1 mysql mysql 29K Dec 25 09:38 read_innodb_4_2_3_t1.frm
-rw-r----- 1 mysql mysql 2.0M Dec 30 09:57 read_innodb_4_2_3_t1.ibd
bash-4.2$
# 页分析
➜ py_innodb_page_type git:(master) ✗ python2 ./py_innodb_page_info.py -v /tmp/read_innodb_4_2_3_t12.ibd
page offset 00000000, page type <File Space Header>
page offset 00000001, page type <Insert Buffer Bitmap>
page offset 00000002, page type <File Segment inode>
page offset 00000003, page type <B-tree Node>, page level <0001>
page offset 00000004, page type <B-tree Node>, page level <0000>
page offset 00000005, page type <B-tree Node>, page level <0000>
page offset 00000006, page type <B-tree Node>, page level <0000>
page offset 00000007, page type <B-tree Node>, page level <0000>
page offset 00000008, page type <B-tree Node>, page level <0000>
page offset 00000009, page type <B-tree Node>, page level <0000>
page offset 0000000a, page type <B-tree Node>, page level <0000>
page offset 0000000b, page type <B-tree Node>, page level <0000>
page offset 0000000c, page type <B-tree Node>, page level <0000>
page offset 0000000d, page type <B-tree Node>, page level <0000>
page offset 0000000e, page type <B-tree Node>, page level <0000>
page offset 0000000f, page type <B-tree Node>, page level <0000>
page offset 00000010, page type <B-tree Node>, page level <0000>
page offset 00000011, page type <B-tree Node>, page level <0000>
page offset 00000012, page type <B-tree Node>, page level <0000>
page offset 00000013, page type <B-tree Node>, page level <0000>
page offset 00000014, page type <B-tree Node>, page level <0000>
page offset 00000015, page type <B-tree Node>, page level <0000>
page offset 00000016, page type <B-tree Node>, page level <0000>
page offset 00000017, page type <B-tree Node>, page level <0000>
page offset 00000018, page type <B-tree Node>, page level <0000>
page offset 00000019, page type <B-tree Node>, page level <0000>
page offset 0000001a, page type <B-tree Node>, page level <0000>
page offset 0000001b, page type <B-tree Node>, page level <0000>
page offset 0000001c, page type <B-tree Node>, page level <0000>
page offset 0000001d, page type <B-tree Node>, page level <0000>
page offset 0000001e, page type <B-tree Node>, page level <0000>
page offset 0000001f, page type <B-tree Node>, page level <0000>
page offset 00000020, page type <B-tree Node>, page level <0000>
page offset 00000021, page type <B-tree Node>, page level <0000>
page offset 00000022, page type <B-tree Node>, page level <0000>
page offset 00000023, page type <B-tree Node>, page level <0000>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000040, page type <B-tree Node>, page level <0000>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
Total number of page: 128:
Freshly Allocated Page: 91
Insert Buffer Bitmap: 1
File Space Header: 1
B-tree Node: 34
File Segment inode: 1
4.2.4 页
页场景的类型有
- 数据页(B-tree Node)
- undo页(unodo log page)
- 系统页(system page)
- 事务数据页(Transaction system page)
- 插入缓冲位图页(insert buffer bitmap)
- 插入缓冲空闲列表页(inser buffer free list)
- 未压缩的二进制大对象页(uncompressed blob page)
- 压缩的二进制大对象页(compressed blob page)
4.2.5 行
innodb是按行存储的, 页存放的记录最多允许存放 16KB / 2 -200 行的记录, 既7992行
4.3 Innodb 行记录格式
查看行存储格式: show table status like '表名'
mysql> show table status like 'read_innodb_4_2_3_t1' \G;
*************************** 1. row ***************************
Name: read_innodb_4_2_3_t1
Engine: InnoDB
Version: 10
Row_format: Dynamic
Rows: 31
Avg_row_length: 17441
Data_length: 540672
Max_data_length: 0
Index_length: 0
Data_free: 0
Auto_increment: 65
Create_time: 2024-12-25 09:38:35
Update_time: 2024-12-30 09:57:31
Check_time: NULL
Collation: utf8_general_ci
Checksum: NULL
Create_options:
Comment:
1 row in set (0.00 sec)
ERROR:
No query specified
常见的格式有
格式 | 特点 |
---|---|
Dynamic | mysql5.7后默认, 对于 TEXT, BLOB 以及很长的 VARCHAR 类型列,Dynamic 行格式会将数据的 一部分 存储在页的行记录部分,其余部分存储在溢出页中 |
Compact | 所有列尽量存储在主存储页中,溢出页使用较少 |
Compressed | 类似于 Dynamic,但对数据页进行压缩 |
4.3.1 Compact 行记录格式
- 变长字段字段: 按照字段的倒序放置, 列的长度小于等于255, 则用一个字节表示, 大于255, 则用两个字节表示. 不可以超过两字节, 因为varchar的最大长度为65535
- Null标记位: 该行中是否有NULL值, 转为二进制, 对应的位数为1,则表示对应的行数为NULL值
- 事务ID: 6字节
- 回滚指针列: 7字节
- rowid: 没有定义主键则会增加这一列, 长度为6字节
- 记录头信息: 固定占用5字节(40bit)
Compact记录头信息
名称 | 大小(bit) | 描述 |
---|---|---|
() | 1 | 未知 |
() | 1 | 未知 |
deleted_flag | 1 | 该行是否已删除 |
min_rec_flag | 1 | 为1, 如果该行时预先被定义的最小记录 |
n_owned | 4 | 该记录拥有的记录数 |
heap_no | 13 | 索引堆中该条记录的排序书序 |
record_type | 3 | 记录类型, 000表示普通、001表示B+树节点指针、010表示Infimum, 011表示Supremum, 1xx表示保留 |
next_record | 16 | 页中下一条记录的相对位置 |
Total | 40 |
# 创建demo表
mysql > CREATE TABLE read_innodb_4_3_1_mytest(
t1 VARCHAR(10),
t2 VARCHAR(10),
t3 CHAR(10),
t4 VARCHAR(10)
) ENGINE=INNODB CHARSET=latin1 ROW_FORMAT=COMPACT;
mysql> insert into read_innodb_4_3_1_mytest values ('a', 'bb', 'bb', 'ccc');
mysql> insert into read_innodb_4_3_1_mytest values ('d', 'ee', 'ee', 'fff');
mysql> insert into read_innodb_4_3_1_mytest values ('d', NULL, NULL, 'fff');
# 输出
hexdump -C -v /tmp/read_innodb_4_3_1_2.ibd > a.log
03 02 01 // 变长字段长度列表, 逆序
00 // NULL标志位, 第一行没有NULL
00 00 10 00 2C // Record Header 固定5字节长度
00 00 00 00 02 03 // RowId Innodb自动创建 6字节
00 00 00 00 07 30 //TransactionID
A5 00 00 01 19 01 10 //Roll Pointer
61 // 列1 数据a
62 62 // 列2 数据bb
62 62 20 20 20 20 20 20 20 20 //列3 数据bb
63 63 63 列4 数据ccc
- char用0x20填充直到占满10个字节
- Record Header最后两个字节为next_recorder, 就是2C就是下一个记录的偏移量(这里没弄明白是怎么找到下条记录的)
再看看第三行记录
03 01 // 变长字段长度列表, 逆序
06 // NULL标志位, 转为二进制位00000110, 为1的代表第二列和第三列的数据为NULL
00 00 20 FF 98 // Record Header 固定5字节长度
00 00 00 00 02 05 // RowId Innodb自动创建 6字节
00 00 00 00 07 36 //TransactionID
A9 00 00 01 1D 01 10 //Roll Pointer
64 // 列1 数据d
66 66 66 66 列4 数据fff
4.3.3 行溢出数据
- 最高变长65535字节是指所有varchar列的长度总和
- 当发生行溢出时, 数据存放在类型业务 Uncompress BLOB页中
create table read_innodb_4_3_3_t (
a varchar(65532)
) engine=innodb charset=latin1;
insert into read_innodb_4_3_3_t select repeat('a', 65532);
> python2 /Users/maizhikun/project/38200-david-mysql-tools/py_innodb_page_type/py_innodb_page_info.py /tmp/read_innodb_4_3_3.ibd
Total number of page: 9:
Insert Buffer Bitmap: 1
Uncompressed BLOB Page: 5
File Space Header: 1
B-tree Node: 1
File Segment inode: 1
开始
结束
从00010028到00013FF7
4.4 InnoDB数据页结构
4.4.1 File Header
总共38字节
名称 | 大小(字节) | 说明 |
---|---|---|
FIL_PAGE_SPACE_OR_CHKSUM | 4 | 该页的checksum值 |
FIL_PAGE_OFFSET | 4 | 表空间中页的偏移量, 例如.ibd占用1GB, 页大小为16K, 那么总共有65536页, 该字段表明该页在所有页的位置, 若表空间id为10, 那么搜索页(10,1)就是查表中第二个页 |
FIL_PAGE_PREV | 4 | 当前页的上一个页, B+tree特性决定了叶子节点必须是双向列表 |
FIL_PAGE_NEXT | 4 | 当前页的下一个页, B+tree特性决定了叶子节点必须是双向列表 |
FIL_PAGE_LSN | 8 | 该页最后被修改的日志序列位置LSN(log sequence number) |
FIL_PAGE_TYPE | 2 | innodb页的类型 |
FIL_PAGE_FILE_FLUSH_LSN | 8 | 系统表空间的一个页中定义, 代表文件至少被刷新到了该LSN值, 独立表空间值为0 |
FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID | 4 | 页属于哪个表空间 |
FIL_PAGE_TYPE类型枚举
名称 | 十六进制 | 说明 |
---|---|---|
FIL_PAGE_INDEX | 0x45BF | B+树叶节点 |
FIL_PAGE_UNDO_LOG | 0x0002 | Undo log页 |
FIL_PAGE_INODE | 0x0003 | 索引节点 |
FIL_PAGE_IBUF_FREE_LIST | 0x0004 | insert buffer空闲列表 |
FIL_PAGE_TYPE_ALLOCATED | 0x0000 | 该页为最新分配 |
FIL_PAGE_IBUF_BITMAP | 0x0005 | insert buffer 位图 |
FIL_PAGE_TYPE_SYS | 0x0006 | 系统页 |
FIL_PAGE_TYPE_TRX_SYS | 0x0007 | 事务系统数据 |
FIL_PAGE_TYPE_ESP_HDR | 0x0008 | File Space Header |
FIL_PAGE_TYPE_XDES | 0x0009 | 扩展描述页 |
FIL_PAGE_UNDO_BLOB | 0x000A | BLOB页 |
4.4.2 Page Header
该部分用来记录数据页的状态信息, 总共56字节
名称 | 大小(字节) | 说明 | |
---|---|---|---|
PAGE_N_DIR_SLOTS | 2 | 在Page Directory中的Slot数 | |
PAGE_HEAP_TOP | 2 | 堆中第一个记录的指针 | |
PAGE_N_HEAP | 2 | 堆中的记录数, 占用2字节, 第15位表示行记录格式 | |
PAGE_FREE | 2 | 指向可重用空间的首指针 | |
PAGE_GARBAGE | 2 | 已删除记录的字节数, 既delete flag为1的记录大小总数 | |
PAGE_LAST_INSERT | 2 | 最后插入记录的位置 | |
PAGE_DIRECTION | 2 | 最后插入的方向 | |
PAGE_N_DIRECTION | 2 | 一个方向连续插入记录的数量 | |
PAGE_N_RECS | 2 | 该页中记录的数量 | |
PAGE_MAX_TRX_ID | 8 | 修改当前页的最大事务ID, 注意该值仅在Secondary Index中定义 | |
PAGE_LEVEL | 2 | 当前页在索引树中的位置, 0x00代表叶节点, 既叶子点总是在第0层 | |
PAGE_INDEX_ID | 8 | 索引ID | 表示当前页属于哪个索引 |
PAGE_BTR_SEG_LEAF | 10 | B+树数据页非叶子节点所在段的segment header. 该值仅在b+树的Root页中定义 | |
PAGE_BTR_SEG_TOP | 10 | B+树数据页所在段的segment header. 该值仅在B+树的Root页定义 |
PAGE_DIRECTION的枚举
名称 | 16进制 |
---|---|
PAGE_LEFT | 0x01 |
PAGE_RIGHT | 0x02 |
PAGE_SAME_REC | 0x03 |
PAGE_SAME_PAGE | 0x04 |
PAGE_NO_DIRECTION | 0x05 |
4.4.3 Infimum和Supremum Record
这两个是数据页中两个虚拟的行记录
Infimum记录比该页任何主键都要小的值
Supremum Record记录比该页任何主键都要大的值
4.4.4 User Record和Free Space
在一条记录被删除后, 该空间会被加入到空闲链表中
4.4.5 Page Directory
记录的是记录的相对位置(相对于页)
4.4.6 File Trailer
为了检查页是否已经完整写入磁盘, 只有FIL_PAGE_END_LSN部分, 占用8字节
前4字节是该页的checksum值, 最后4字节和File Header中中的FIL_PAGE_LSN相同
将这两个值于File Header的FIL_PAGE_SPACE_OR_CHECKSUM和FIL_PAGE_LSN值进行比较, 看是否一致
4.4.7 InnoDB数据页结构示例分析
插入表
mysql > CREATE TABLE read_innodb.read_innodb_4_4_7_t (
a int unsigned not null auto_increment,
b char(10),
primary key(a)
) engine=Innodb charset=utf8;
插入数据100条数据, eg: insert read_innodb.read_innodb_4_2_3_t1 select null, repeat(char(97 + RAND() * 26), 10);
python2 /Users/maizhikun/project/38200-david-mysql-tools/py_innodb_page_type/py_innodb_page_info.py -v /tmp/read_innodb_4_4_7_t.ibd
page offset 00000000, page type <File Space Header>
page offset 00000001, page type <Insert Buffer Bitmap>
page offset 00000002, page type <File Segment inode>
page offset 00000003, page type <B-tree Node>, page level <0000>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
Total number of page: 6:
Freshly Allocated Page: 2
Insert Buffer Bitmap: 1
File Space Header: 1
B-tree Node: 1
File Segment inode: 1
看B+树的数据,是第三页, 从0x0000c000开始看(16K*3=0xc000)
File Header的38字节
- A1 66 AE 46 数据页的checksum值(4)
- 00 00 00 03 页的偏移量(4)
- FF FF FF FF 前一页, 因为只有一个数据页,所以是 0xffffffff(4)
- FF FF FF FF 后一页, 因为只有一个数据页,所以是 0xffffffff(4)
- 00 00 00 00 00 32 D6 CC, 页的LSN(8)
- 45 BF, 页类型 为数据页(2)
- 00 00 00 00 00 00 00 00, 独立表空间都为0(8)
- 00 00 00 00 2A 表空间的SPACE ID(4)
Page Header的56字节
- 00 1A, Page Directory中的Slot数, 为26个, 每个槽占2字节
- 0D C0, 空闲空间开始的位置偏移量, 0xc000+0xcdc0=0xcdc0
- 80 66, 堆中记录数, 当行记录为compact时, 初始值为0x0802, 0x8066-0x8002=0x64, 代表页中实际有一百条记录
- 00 00, 指向可重用空间的首指针, 没有删除记录, 所以为0
- 00 00 已删除记录的字节数, 没有删除记录, 所以为0
- 0D A5, 最后插入记录的位置, 应该在0xc0000+0xc0da5=0xcda5
- 00 02, 最后插入的方向是通过自增方式增加记录的, 所以向右 0x0002
- 00 63, 一个方向连续插入记录的数量, 我们插入了100条记录, 所以这里是99
- 00 64, 该页中记录的数量 100
- 00 00 00 00 00 00 00 00, 修改当前页的最大事务ID, 注意该值仅在Secondary Index中定义
- 00 00, 代表是叶子节点
- 00 00 00 00 00 00 00 3E, 索引ID
- 00 00 00 2A 00 00 00 02 00 F2
- 00 00 00 2A 00 00 00 02 00 32
- Infimum伪行记录
- 01 00 02 00 1C recorder header, 00 01代表下一个记录的偏移量, 当前位置(0x063)+偏移量(0x001c)=0xc07f
- 69 6E 66 69 6D 75 6D 00 只有一个列的伪行记录, 记录内网就是 Infimum, 多一个0x00字节
- Supermum伪行记录
- 05 00 0B 00 00
- 73 75 70 72 65 6D 75 6D
Infimum指引的下一个记录为, 0xc07f的数据为
- 00 00 00 01: 主键
- 00 00 00 00 07 5C: Transaction ID
- C5 00 00 01 4C 01 10: Roll Pointer
- 79 79 79 79 79 79 79 79 79 79: b列yyyyyyyy
mysql> mysql> select * from read_innodb_4_4_7_t limit 1;
+---+------------+
| a | b |
+---+------------+
| 1 | yyyyyyyyyy |
+---+------------+
通过recorder header 最后两个字节就可以找到下一行的记录偏移量, 得到该页中所有的行记录 通过Page header里的PAGE_PREV和PAGE_NEXT就知道上一页和下一页, 找到Innodb存储所有行的记录
5. 索引算法
5.2 数据结构和算法
二叉查找树: 左子树的键值必须比根节点的键值小, 而右子树必须比根节点的键值大
平衡二叉树: 符合二叉查找树的定义, 并且任何节点的两科子树高度最大差为1
5.2 B+数
基本概念
概念 | 说明 |
---|---|
最大度数 | 每个非叶子节点最多可以拥有的子节点数 |
5.3.1 B+树的插入操作
Leaf Page满 | Index Page满 | 操作 |
---|---|---|
No | No | 直接插入到叶子节点 |
Yes | No | 1. 拆分LeafPage 2. 将中间的节点放入到IndexPage中 3. 小于中间节点的放左边 4. 大于或等于中间节点的放右边 |
Yes | Yes | 1. 拆分LeafePage 2. 小于中间节点的放左边 3. 大于或等于中间节点的放右边 4. 拆分IndexPage 5. 小于中间节点的放左边 6. 大于或等于中间节点的放右边 7. 将中间的节点放入到上层IndexPage中 |
7. 事务
事务是数据库区别于文件系统的重要特性
Innodb的存储引擎的事务完全符合ACID特性
- 原子性
- 一致性
- 隔离性
- 持久性