Split on Linux
split即一款分割文件的小工具,可以根据设定的大小(如行数、字节数等)将一个文件等分成更小的文件。若文件大小超出文件系统支持的单文件最大值,或由于网络传输的限制,此时将大文件切分成同等大小的小文件,则可以很好的解决这些问题。
split help
[[email protected] ~]$ split --help
Usage: split [OPTION]... [INPUT [PREFIX]]
Output fixed-size pieces of INPUT to PREFIXaa, PREFIXab, ...; default
size is 1000 lines, and default PREFIX is `x'. With no INPUT, or when INPUT
is -, read standard input.
Mandatory arguments to long options are mandatory for short options too.
-a, --suffix-length=N use suffixes of length N (default 2)
-b, --bytes=SIZE put SIZE bytes per output file
-C, --line-bytes=SIZE put at most SIZE bytes of lines per output file
-d, --numeric-suffixes use numeric suffixes instead of alphabetic
-l, --lines=NUMBER put NUMBER lines per output file
--verbose print a diagnostic just before each
output file is opened
--help display this help and exit
--version output version information and exit
SIZE may be (or may be an integer optionally followed by) one of following:
KB 1000, K 1024, MB 1000*1000, M 1024*1024, and so on for G, T, P, E, Z, Y.
Report split bugs to [email protected]
GNU coreutils home page: <http://www.gnu.org/software/coreutils/>
General help using GNU software: <http://www.gnu.org/gethelp/>
For complete documentation, run: info coreutils 'split invocation'
使用
查看文件大小
[[email protected] ~]# ll -h
total 4.0K
-rw-r--r-- 1 root root 1.5K Mar 12 10:19 netstat.log.bz2
按bytes分割文件
[[email protected] ~]# split -d -b 1K netstat.log.bz2 netstat.log.bz2.
[[email protected] ~]# ll -h
total 12K
-rw-r--r-- 1 root root 1.5K Mar 12 10:19 netstat.log.bz2
-rw-r--r-- 1 root root 1.0K Mar 12 10:22 netstat.log.bz2.00
-rw-r--r-- 1 root root 500 Mar 12 10:22 netstat.log.bz2.01
测试分割后文件的完整性
[[email protected] ~]# bzip2 -v -t netstat.log.bz2.00
netstat.log.bz2.00: file ends unexpectedly
You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.
[[email protected] ~]# bzip2 -v -t netstat.log.bz2.01
netstat.log.bz2.01: bad magic number (file not created by bzip2)
You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.
合并分割后的文件
[[email protected] ~]# cat netstat.log.bz2.0[0-1] > netstat.log.recover.bz2
测试合并后文件的完整性
[[email protected] ~]# bzip2 -v -t netstat.log.recover.bz2
netstat.log.recover.bz2: ok
注意
-a 指定后缀名的长度。根据数字或字母,可以确定分割后的最大文件数
如果后缀为数字[0-9],则分割后最多有 10 ** ${suffix_length};
如果后缀为字母[a-z],则分割后最多有 26 ** ${suffix_length};
通常在使用split分割文件前,根据原文件大小,分割后大小,估算下分割后文件数量,以此确定合适的分割分割后缀和后缀长度,否则可能出现后缀不够用的情况。
查看要分割的文件大小
[[email protected] ~]$ wc netstat.log
302 1918 23945 netstat.log
按行分割文件
[[email protected] ~]$ split -a 2 -d -l 2 --verbose netstat.log netstat.log.
creating file `netstat.log.00'
... ...
creating file `netstat.log.99'
split: output file suffixes exhausted
该文件有302行,按2行一个文件进行分割,则会产生151(302/2)个文件。但在分割时,使用数字为后缀,长度为2,则最多能够产生 10 ** 2 = 100个文件,显然不够用。
参考
blog comments powered by Disqus