本文转自oschina

一、介绍 
 首先,Toast是一个名字缩写,全写是The OverSized Attribute Storage Technique,即超尺寸字段存储技术,顾名思义,是说超长字段在Postgres的一个存储方式。Postgres采用的存储默认是每个页面存储固定8Kb大小的数据,并且元组不允许跨页面存储,所以并不能直接存储大字段数据。Toast就是为此应运而生,它会将大字段值压缩或者分散为多个物理行来存储。对于用户来说完全不用关注这一技术实现,完全是透明的。 

二、TOAST的存储方式 
       Postgres的部分类型数据支持toast,不是全部类型是因为有些字段类型是不会产生大字段数据的,完全没必要用到Toast技术(比如date,time,boolean等)。支持Toast的数据类型应当时变长的(variable-length),变长字段最多可选择32bit的列头(header),Toast占用两个变长的bit位来作为FLAG,故Toast的逻辑尺寸限制是(2^30-1)~1GB,当两个bit都是0是,这个数据类型的值就是非Toast的(untoasted)。当表中字段任何一个有Toast,那这个表都会有这一个相关联的Toast表,OID被存储在pg_class.reltoastrelid里面。超出的的数值将会被分割成chunks,并最多toast_max_chunk_size 个byte(缺省是2Kb),当存储的行数据超过toast_tuple_threshold值(通常是2kB),就会触发toast存储,这时toast将会压缩或者移动字段值直到超出部分比toast_tuple_targer值小(这个值通常也是2KB)。 

相比较普通表(MAIN TABLE),TOAST有额外的三个字段

chunk_id   :标识TOAST表的OID字段 chunk_seq  :chunk的序列号,与chunk_id的组合唯一索引可以加速访问 chunk_data :存储TOAST表的实际数据

Toast有识别4种不同可存储toast的策略:

--plain避免压缩或行外存储 PLAIN prevents either compression or out-of-line storage; furthermore it disables use of single-byte headers for varlena types. This is the only possible strategy for columns of non-TOAST-able data types --extended允许压缩和行外存储(默认toast存储) EXTENDED allows both compression and out-of-line storage. This is the default for most TOASTable data types. Compression will be attempted first, then out-of-line storage if the row is still too big --external允许行外但不允许压缩 EXTERNAL allows out-of-line storage but not compression. Use of EXTERNAL will make substring operations on wide text and bytea columns faster(at the penalty of increased storage space) because these operations are optimized to fetch only the required parts of the out-of-line value when it is not compressed --main允许压缩但不允许行外存储 MAIN allows compression but not out-of-line storage. (Actually, out-of-line storage will still be performed for such columns, but only as a last resort when there is no other way to make the row small enough to fit on a page

上述压缩采用的是LZ compression技术, 
源码见: postgresql-9.2.3/src/backend/utils/adt/pg_lzcompress.c 

TOAST可以通过SQL来更改存储方式,示例:

ALTER TABLE table_name ALTER COLUMN  column_name SET STORAGE {PLAIN | EXTENDED | MAIN | EXTERNAL};

postgres=# \d+ t_kenyon
                            Table "public.t_kenyon"
Column | Type | Modifiers | Storage | Stats target | Description
--------+-------------------+-----------+----------+--------------+-------------
dd | character varying | | extended | |
Has OIDs: no

postgres=# alter table t_kenyon alter column dd set storage main;
ALTER TABLE
postgres=# \d+ t_kenyon
                            Table "public.t_kenyon"
Column | Type | Modifiers | Storage | Stats target | Description
--------+-------------------+-----------+---------+--------------+-------------
dd | character varying | | main    | |
Has OIDs: no

三、TOAST表的计算 
 计算一个表的大小时要注意统计Toast的大小,因为对超长字段存储时,在基础表上可能只存了20%,另外的数据都存到了Toast里面去了,计算大小时要结合起来看,索引也是一样,对于表里有main或者extended类型的会创建Toast表,两者的关联是通过pg_class里的OID去关联的。下面举例子: 
1.TOAST表关联查询 
 Example a:

[postgres@localhost ~]$ psql
psql (9.2.3)
Type "help" for help.

postgres=# create table t_kenyon(id int); CREATE TABLE

postgres=# select relname,reltoastrelid from pg_class where relname = 't_kenyon'; relname | reltoastrelid
----------+---------------
t_kenyon | 0 (1 row)

postgres=# \d+ t_kenyon Table "public.t_kenyon" Column |  Type   | Modifiers | Storage | Stats target | Description
--------+---------+-----------+---------+--------------+-------------
id     | integer |           | plain |              | Has OIDs: no

上面的字段没有toast表,因为字段int是定长的。 
 Example b:

postgres=# select relname,reltoastrelid from pg_class where relname = 't_kenyon';
relname  | reltoastrelid ----------+--------------- t_kenyon | 16411 (1 row)

postgres=# select relname from pg_class where oid = 16411;
    relname ---------------- pg_toast_16408
(1 row)

2.TOAST表计算大小

postgres=# drop table t_kenyon;
DROP TABLE
postgres=# create table t_kenyon(id int,vname varchar(48),remark text);
CREATE TABLE
postgres=# \d+ t_kenyon
                              Table "public.t_kenyon" Column |         Type          | Modifiers | Storage  | Stats target | Description --------+-----------------------+-----------+----------+--------------+------------- id     | integer               |           | plain    |              |
vname  | character varying(48) |           | extended |              |
remark | text                  |           | extended |              |
Has OIDs: no

postgres=# select oid,relname,reltoastrelid from pg_class where relname = 't_kenyon';
  oid   | relname  | reltoastrelid --------+----------+--------------- 121174 | t_kenyon | 121177 (1 row)

postgres=# insert into t_kenyon select generate_series(1,2000),repeat('kenyon here'||'^_^',2),repeat('^_^ Kenyon is not God',500);
INSERT 0 2000 postgres=# insert into t_kenyon select generate_series(1,2),repeat('kenyon here'||'^_^',2),repeat('^_^ Kenyon is not God,Remark here!!',2000);
INSERT 0 2 postgres=# select pg_column_size(id),pg_column_size(vname),pg_column_size(remark) from t_kenyon limit 2;
pg_column_size | pg_column_size | pg_column_size ----------------+----------------+---------------- 4 | 29 | 851 4 | 29 | 851 (2 rows) --查看基础表和Toast的大小 postgres=# select pg_relation_size(121174);
pg_relation_size ------------------ 8192 (1 row)

postgres=# select pg_relation_size(121177);
pg_relation_size ------------------ 0 (1 row) --文本数据量增多,这时可以看到toast表字段大小在2kb左右时有大小了 postgres=# insert into t_kenyon select generate_series(3,4),repeat('kenyon here'||'^_^',2),repeat('^_^ Kenyon is not God,Remark here!!',4000);
INSERT 0 2 postgres=# select pg_relation_size(121174);
pg_relation_size ------------------ 8192 (1 row)

postgres=# select pg_relation_size(121177);
pg_relation_size ------------------ 0 (1 row)


postgres=# insert into t_kenyon select generate_series(5,6),repeat('kenyon here'||'^_^',2),repeat('^_^ Kenyon is not God,Remark here!!',5500);
INSERT 0 2 postgres=# select pg_column_size(id),pg_column_size(vname),pg_column_size(remark) from t_kenyon;
pg_column_size | pg_column_size | pg_column_size ----------------+----------------+---------------- 4 | 29 | 851 4 | 29 | 851 4 | 29 | 1651 4 | 29 | 1651 4 | 29 | 2247 4 | 29 | 2247 (6 rows)

postgres=# insert into t_kenyon select generate_series(1,2),repeat('kenyon here'||'^_^',2),repeat('^_^ Kenyon is not God,Remark here!!',10000);
INSERT 0 2 postgres=# select pg_relation_size(121174);
pg_relation_size ------------------ 8192 (1 row)

postgres=# select pg_relation_size(121177);
pg_relation_size ------------------ 16384 (1 row)

postgres=# insert into t_kenyon select generate_series(7,8),repeat('kenyon here'||'^_^',2),repeat('^_^ Kenyon is not God,Remark here!!',20000);
INSERT 0 2 postgres=# select id,pg_column_size(id),pg_column_size(vname),pg_column_size(remark) from t_kenyon;
id | pg_column_size | pg_column_size | pg_column_size ----+----------------+----------------+---------------- 1 | 4 | 29 | 851 2 | 4 | 29 | 851 3 | 4 | 29 | 1651 4 | 4 | 29 | 1651 5 | 4 | 29 | 2247 6 | 4 | 29 | 2247 7 | 4 | 29 | 8056 8 | 4 | 29 | 8056 (8 rows)

postgres=# select pg_relation_size(121174);
pg_relation_size ------------------ 8192 (1 row)

postgres=# select pg_relation_size(121177);
pg_relation_size ------------------ 24576 (1 row)

可以看到后插入的数据随着字段内容的增多,toast段一直在变大。这个和Oracle存储的大字段内容比较像,Oracle存储Blob类的数据时也是指定另外的segment来存储,而不是在原表中存储,当然可以设置enable storage in row来指定表中存储,所以Oracle里的表异常大时一般不是水位线过高就是表字段里存了大数据类型了。 

四、TOAST的优缺点 
1.可以存储超长超大字段,避免之前不能直接存储的限制 
2.物理上与普通表是分离的,检索查询时不检索到该字段会极大地加快速度 
3.更新普通表时,该表的Toast数据没有被更新时,不用去更新Toast表 

Toast的劣势: 
 1.对大字段的索引创建是一个问题,有可能会失败,其实通常也不建议在大字段上创建,全文检索倒是一个解决方案。 
2.大字段的更新会有点慢,其它DB也存在,通病 

五、其他 
Toast的源码: 
postgresql-9.2.3/src/backend/access/heap/tuptoaster.c 
postgresql-9.2.3/src/backend/catalog/toasting.c

发表评论

电子邮件地址不会被公开。 必填项已用*标注