split即一款分割文件的小工具,可以根据设定的大小(如行数、字节数等)将一个文件等分成更小的文件。若文件大小超出文件系统支持的单文件最大值,或由于网络传输的限制,此时将大文件切分成同等大小的小文件,则可以很好的解决这些问题。

split help

[test@server ~]$ split --help
Usage: split [OPTION]... [INPUT [PREFIX]]
Output fixed-size pieces of INPUT to PREFIXaa, PREFIXab, ...; default
size is 1000 lines, and default PREFIX is `x'.  With no INPUT, or when INPUT
is -, read standard input.

Mandatory arguments to long options are mandatory for short options too.
  -a, --suffix-length=N   use suffixes of length N (default 2)
  -b, --bytes=SIZE        put SIZE bytes per output file
  -C, --line-bytes=SIZE   put at most SIZE bytes of lines per output file
  -d, --numeric-suffixes  use numeric suffixes instead of alphabetic
  -l, --lines=NUMBER      put NUMBER lines per output file
      --verbose           print a diagnostic just before each
                            output file is opened
      --help     display this help and exit
      --version  output version information and exit

SIZE may be (or may be an integer optionally followed by) one of following:
KB 1000, K 1024, MB 1000*1000, M 1024*1024, and so on for G, T, P, E, Z, Y.

Report split bugs to [email protected]
GNU coreutils home page: <http://www.gnu.org/software/coreutils/>
General help using GNU software: <http://www.gnu.org/gethelp/>
For complete documentation, run: info coreutils 'split invocation'

使用

查看文件大小

[root@server ~]# ll -h
total 4.0K
-rw-r--r-- 1 root root 1.5K Mar 12 10:19 netstat.log.bz2

按bytes分割文件

[root@server ~]# split -d -b 1K netstat.log.bz2 netstat.log.bz2.
[root@server ~]# ll -h
total 12K
-rw-r--r-- 1 root root 1.5K Mar 12 10:19 netstat.log.bz2
-rw-r--r-- 1 root root 1.0K Mar 12 10:22 netstat.log.bz2.00
-rw-r--r-- 1 root root  500 Mar 12 10:22 netstat.log.bz2.01

测试分割后文件的完整性

[root@server ~]# bzip2 -v -t netstat.log.bz2.00 
  netstat.log.bz2.00: file ends unexpectedly

You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.

[root@server ~]# bzip2 -v -t netstat.log.bz2.01 
  netstat.log.bz2.01: bad magic number (file not created by bzip2)

You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.

合并分割后的文件

[root@server ~]# cat netstat.log.bz2.0[0-1] > netstat.log.recover.bz2

测试合并后文件的完整性

[root@server ~]# bzip2 -v -t netstat.log.recover.bz2 
  netstat.log.recover.bz2: ok

注意

-a  指定后缀名的长度。根据数字或字母,可以确定分割后的最大文件数
	如果后缀为数字[0-9],则分割后最多有 10 ** ${suffix_length};
	如果后缀为字母[a-z],则分割后最多有 26 ** ${suffix_length};

通常在使用split分割文件前,根据原文件大小,分割后大小,估算下分割后文件数量,以此确定合适的分割分割后缀和后缀长度,否则可能出现后缀不够用的情况。

查看要分割的文件大小

[test@server ~]$ wc netstat.log 
  302  1918 23945 netstat.log

按行分割文件

[test@server ~]$ split -a 2 -d -l 2 --verbose netstat.log netstat.log.
creating file `netstat.log.00'
... ...
creating file `netstat.log.99'
split: output file suffixes exhausted

该文件有302行,按2行一个文件进行分割,则会产生151(302/2)个文件。但在分割时,使用数字为后缀,长度为2,则最多能够产生 10 ** 2 = 100个文件,显然不够用。

参考



blog comments powered by Disqus

Published

29 March 2013

Categories

Tags

Github