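While using find to look for large files on a system, I noticed that for some files under /var/log, the disk space they actually occupy did not match the size filter I had given find with its -size option. For example: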
```
$ sudo find /var/log -type f -size +200M | xargs -i{} ls -sh {}
# ... other output omitted ...
47M /var/log/rpmdbdata.mdb-20201214
47M /var/log/rpmdbdata.mdb
47M /var/log/rpmdbdata.mdb-20201129
47M /var/log/rpmdbdata.mdb-20201206
47M /var/log/rpmdbdata.mdb-20201220
200K /var/log/lastlog
$ ls -l /var/log/lastlog
-rw-r--r-- 1 root root 393861864 Dec 27 23:39 /var/log/lastlog
$ ls -l /var/log/rpmdbdata.mdb
-rw-r--r-- 1 root root 268435456 Dec 27 03:37 /var/log/rpmdbdata.mdb
$ sudo du -s /var/log/lastlog /var/log/rpmdbdata.mdb
200	/var/log/lastlog        # units are 1 KiB blocks
47596	/var/log/rpmdbdata.mdb
```
Although find selected these files as larger than 200MB, /var/log/rpmdbdata.mdb actually occupies only 47MB of disk space, and /var/log/lastlog even less: just 200KB.
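It turns out that find's -size test looks at the file size (the logical length of the file), whereas the -s option of ls shows the disk space actually allocated to the file, and du likewise reports the disk space a file occupies. (Plain ls -l without -s shows the file size, not the disk usage.)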
When du reports less disk usage than the file size shown by ls, the file is usually a sparse file.
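To reproduce the mismatch without touching /var/log, here is a minimal sketch assuming Linux with GNU coreutils and findutils (truncate, du --apparent-size). The temporary directory and the file name sparse.img are hypothetical. The allocated size reported by ls -s and du depends on the filesystem, but is normally 0, because truncate only sets the file length and writes no data.

```shell
# Hypothetical scratch directory and file name.
dir=$(mktemp -d)
f="$dir/sparse.img"

# truncate sets the length to 300 MiB without writing any data,
# so the whole file is one hole.
truncate -s 300M "$f"

ls -l "$f"                    # logical size: 314572800 bytes
ls -sh "$f"                   # allocated size, usually 0
du -k "$f"                    # allocated size in KiB, usually 0
du -k --apparent-size "$f"    # logical size in KiB

# find -size compares against the logical size, so the file still matches.
matched=$(find "$dir" -type f -size +200M)
echo "find -size +200M matched: $matched"

apparent_kb=$(du -k --apparent-size "$f" | cut -f1)
allocated_kb=$(du -k "$f" | cut -f1)

rm -rf "$dir"
```

find still lists the file under -size +200M even though du reports little or no disk usage, which is the same situation as rpmdbdata.mdb and lastlog above.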
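Many modern filesystems support sparse files. A sparse file contains holes: ranges of the file that read back as zero bytes but for which no data is stored. Instead of writing those zeros to disk, the filesystem keeps only metadata that records where the holes are, which saves disk space. Disk images in virtualization are often sparse files (both qcow2 and raw images can be stored sparsely). A schematic of a sparse file:
[Figure: schematic diagram of a sparse file]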
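The difference between a hole and a range of real zero bytes can be seen by copying a zero-filled file with GNU cp's --sparse=always option, which turns runs of zeros into holes in the copy. A sketch, assuming GNU coreutils on a filesystem that supports sparse files; the file names are hypothetical:

```shell
# Hypothetical scratch directory.
dir=$(mktemp -d)

# 64 MiB of real zero bytes: every block is written and allocated.
dd if=/dev/zero of="$dir/zeros.img" bs=1M count=64 status=none

# With --sparse=always, cp skips over runs of zeros instead of writing
# them, so they become holes in the destination file.
cp --sparse=always "$dir/zeros.img" "$dir/holes.img"

# Same logical size ...
du -k --apparent-size "$dir/zeros.img" "$dir/holes.img"
# ... but very different allocated size.
du -k "$dir/zeros.img" "$dir/holes.img"

zeros_kb=$(du -k "$dir/zeros.img" | cut -f1)
holes_kb=$(du -k "$dir/holes.img" | cut -f1)

# Holes read back as zeros, so the contents are byte-for-byte identical.
same_content=no
cmp -s "$dir/zeros.img" "$dir/holes.img" && same_content=yes

rm -rf "$dir"
```

Both files have the same apparent size and identical contents, but the copy typically allocates little or no disk space.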
find can also print a file's sparseness directly with -printf "%S". Sparse files normally have a sparseness below 1. For example:
```
$ sudo find /var/log -type f -printf "%S\t%p\n" | awk '$1 < 1.0 {print}'
0.180618	/var/log/rpmdbdata.mdb-20201214
0.181564	/var/log/rpmdbdata.mdb
0.180618	/var/log/rpmdbdata.mdb-20201129
0.180618	/var/log/rpmdbdata.mdb-20201206
0.181091	/var/log/rpmdbdata.mdb-20201220
0.000519979	/var/log/lastlog
```
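The %S value can also be recomputed by hand from stat(1), following the formula quoted in the find man page excerpt below. A sketch assuming GNU stat, where %B is the size of the units counted by %b (usually 512 bytes), %b is st_blocks and %s is st_size. The 10 MiB test file with a single 1 MiB data region is a hypothetical example:

```shell
# Hypothetical scratch directory and file name.
dir=$(mktemp -d)
f="$dir/demo.img"

# A 10 MiB file whose only data is 1 MiB written at offset 4 MiB;
# everything else is holes.
truncate -s 10M "$f"
dd if=/dev/urandom of="$f" bs=1M count=1 seek=4 conv=notrunc status=none

# Sparseness as reported by find.
find "$f" -printf '%S\t%p\n'

# Same quantity computed by hand: BLOCKSIZE * st_blocks / st_size.
stat -c '%B %b %s' "$f" | awk '{ printf "%g\n", $1 * $2 / $3 }'

by_find=$(find "$f" -printf '%S')
by_stat=$(stat -c '%B %b %s' "$f" | awk '{ printf "%g", $1 * $2 / $3 }')

rm -rf "$dir"
```

Both commands should print the same number, typically close to 0.1 here (about 1 MiB allocated out of a 10 MiB file); the exact value depends on the filesystem.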
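As the output shows, /var/log/rpmdbdata.mdb has a sparseness of 0.181564, and /var/log/lastlog is smaller still. The values satisfy: file size × sparseness = disk space actually allocated. This follows directly from how find defines %S (BLOCKSIZE × st_blocks / st_size).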
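As a worked check of this relation, plug in the numbers from the ls -l and du -s output at the top of this post. This assumes BLOCKSIZE is 512 bytes (the usual value on Linux) and that du -s reported 1 KiB units; %S is printed to six significant digits, so the match is approximate:

```latex
\begin{aligned}
\text{sparseness} &= \frac{512 \cdot \mathrm{st\_blocks}}{\mathrm{st\_size}} \\
\text{rpmdbdata.mdb:}\quad 268435456 \times 0.181564 &\approx 48738215\ \text{B} \approx 47596\ \text{KiB} \\
\text{lastlog:}\quad 393861864 \times 0.000519979 &\approx 204800\ \text{B} = 200\ \text{KiB}
\end{aligned}
```

Both results agree with the 47596 and 200 reported by du.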
Key excerpts from the man pages:
```
man ls

  -h, --human-readable
         with -l, print sizes in human readable format (e.g., 1K 234M 2G)
  -s, --size
         print the allocated size of each file, in blocks
  -S     sort by file size (largest first)

man find

  -size n[cwbkMG]
         File uses n units of space. The following suffixes can be used:
         `b'  for 512-byte blocks (this is the default if no suffix is used)
         `c'  for bytes
         `w'  for two-byte words
         `k'  for Kilobytes (units of 1024 bytes)
         `M'  for Megabytes (units of 1048576 bytes)
         `G'  for Gigabytes (units of 1073741824 bytes)
         The size does not count indirect blocks, but it does count blocks
         in sparse files that are not actually allocated.

  -printf format : print format on the standard output
         %k  The amount of disk space used for this file in 1K blocks. Since disk space is allocated in multiples of the filesystem block size this is usually greater than %s/1024, but it can also be smaller if the file is a sparse file.
         %p  File's name.
         %s  File's size in bytes.
         %S  File's sparseness. This is calculated as (BLOCKSIZE*st_blocks / st_size). The exact value you will get for an ordinary file of a certain length is system-dependent. However, normally sparse files will have values less than 1.0, and files which use indirect blocks may have a value which is greater than 1.0. The value used for BLOCKSIZE is system-dependent, but is usually 512 bytes. If the file size is zero, the value printed is undefined. On systems which lack support for st_blocks, a file's sparseness is assumed to be 1.0.
```
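The %s and %k directives from the excerpt above show the two notions of size side by side: %s is the logical size that -size tests, and %k is the allocated space in 1 KiB units, which should match du -k for a single file. A sketch assuming GNU findutils and coreutils; the file name data.bin is hypothetical:

```shell
# Hypothetical scratch directory and file name.
dir=$(mktemp -d)
f="$dir/data.bin"

# 1 MiB of real data, then extend the length to 8 MiB with truncate,
# which leaves a 7 MiB hole at the end of the file.
dd if=/dev/urandom of="$f" bs=1M count=1 status=none
truncate -s 8M "$f"

# %s: logical size in bytes; %k: allocated space in KiB.
find "$f" -printf '%s bytes\t%k KiB\t%p\n'
du -k "$f"

size=$(find "$f" -printf '%s')
find_k=$(find "$f" -printf '%k')
du_k=$(du -k "$f" | cut -f1)

rm -rf "$dir"
```

Here %k ends up far below %s/1024, which is the "smaller if the file is a sparse file" case described in the man page.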
References:
https://wiki.archlinux.org/index.php/sparse_file
https://www.lisenet.com/2014/so-what-is-the-size-of-that-file/
man ls, man find