For example, consider the following data, from a file called "data.txt", which is filled with the output of "ps aux":
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 23824 1828 ? Ss Oct24 0:01 /sbin/init
root 2 0.0 0.0 0 0 ? S Oct24 0:00 [kthreadd]
root 3 0.0 0.0 0 0 ? S Oct24 0:00 [migration/0]
root 4 0.0 0.0 0 0 ? S Oct24 0:53 [ksoftirqd/0]
root 5 0.0 0.0 0 0 ? S Oct24 0:00 [watchdog/0]
www-data 982 0.0 0.0 142728 5840 ? S Nov01 0:00 /usr/sbin/apache2 root 989 0.0 0.0 21268 5616 ? Ss Oct24 0:13 /sbin/mount.ntfs root 1000 0.0 0.0 0 0 ? S Oct24 0:00 [kjournald]
root 1003 0.0 0.0 0 0 ? S Oct24 0:00 [kjournald]
syslog 1027 0.0 0.0 192284 1336 ? Sl Oct24 0:01 rsyslogd -c4
106 1038 0.0 0.0 24644 2112 ? Ss Oct24 0:04 dbus-daemon root 1046 0.0 0.0 28804 924 ? Ss Oct24 0:00 kdm
root 1057 0.0 0.0 84776 3712 ? Ss Oct24 0:00 NetworkManager
avahi 1058 0.0 0.0 34060 1468 ? S Oct24 0:03 avahi-daemon: avahi 1059 0.0 0.0 33928 312 ? Ss Oct24 0:00 avahi-daemon: root 1065 0.0 0.0 57868 2004 ? S Oct24 0:00 /usr/sbin/modem
root 1 0.0 0.0 23824 1828 ? Ss Oct24 0:01 /sbin/init
root 2 0.0 0.0 0 0 ? S Oct24 0:00 [kthreadd]
root 3 0.0 0.0 0 0 ? S Oct24 0:00 [migration/0]
root 4 0.0 0.0 0 0 ? S Oct24 0:53 [ksoftirqd/0]
root 5 0.0 0.0 0 0 ? S Oct24 0:00 [watchdog/0]
www-data 982 0.0 0.0 142728 5840 ? S Nov01 0:00 /usr/sbin/apache2 root 989 0.0 0.0 21268 5616 ? Ss Oct24 0:13 /sbin/mount.ntfs root 1000 0.0 0.0 0 0 ? S Oct24 0:00 [kjournald]
root 1003 0.0 0.0 0 0 ? S Oct24 0:00 [kjournald]
syslog 1027 0.0 0.0 192284 1336 ? Sl Oct24 0:01 rsyslogd -c4
106 1038 0.0 0.0 24644 2112 ? Ss Oct24 0:04 dbus-daemon root 1046 0.0 0.0 28804 924 ? Ss Oct24 0:00 kdm
root 1057 0.0 0.0 84776 3712 ? Ss Oct24 0:00 NetworkManager
avahi 1058 0.0 0.0 34060 1468 ? S Oct24 0:03 avahi-daemon: avahi 1059 0.0 0.0 33928 312 ? Ss Oct24 0:00 avahi-daemon: root 1065 0.0 0.0 57868 2004 ? S Oct24 0:00 /usr/sbin/modem
daemon 1220 0.0 0.0 18884 308 ? Ss Oct24 0:00 atd
root 1221 0.0 0.0 21076 852 ? Ss Oct24 0:00 cron
mysql 1283 0.0 0.2 178828 13420 ? Ssl Oct24 3:26 /usr/sbin/mysqld
proxy 1461 0.0 0.0 6552 784 ? Ss Oct24 0:00 /usr/bin/polipo root 1472 0.0 0.0 97040 1032 ? Ss Oct24 0:00 /usr/bin/svnserve 120 1480 0.0 0.3 51336 21180 ? S Oct24 1:01 /usr/sbin/tor
yossarian 1505 0.1 0.3 400928 23180 ? Ssl Oct24 29:06 /usr/bin/mpd
root 1221 0.0 0.0 21076 852 ? Ss Oct24 0:00 cron
mysql 1283 0.0 0.2 178828 13420 ? Ssl Oct24 3:26 /usr/sbin/mysqld
proxy 1461 0.0 0.0 6552 784 ? Ss Oct24 0:00 /usr/bin/polipo root 1472 0.0 0.0 97040 1032 ? Ss Oct24 0:00 /usr/bin/svnserve 120 1480 0.0 0.3 51336 21180 ? S Oct24 1:01 /usr/sbin/tor
yossarian 1505 0.1 0.3 400928 23180 ? Ssl Oct24 29:06 /usr/bin/mpd
Now suppose that you wanted to put each user's processes into a separate file. This example is a little contrived, but works just as well when you have a file with tons of entries with the first column as the key and you wish to split them into multiple files based on the key.
The first row is the header, and we can eliminate that with sed. The splitting part is the coll part is accomplished with awk's field variables:
awk '{print > $1".txt"}
All this says is print the line a text file named after the value of the first column, effectively splitting the original text file into multiple text files, each named after the column entry and with all rows that start with that column entry.
A quick ls shows us that exactly the files we wanted were created:
-rw-r--r-- 1 yossarian yossarian 235 2010-11-04 19:30 yossarian.txt
-rw-r--r-- 1 yossarian yossarian 92 2010-11-04 19:30 www-data.txt
-rw-r--r-- 1 yossarian yossarian 78 2010-11-04 19:30 syslog.txt
-rw-r--r-- 1 yossarian yossarian 1178 2010-11-04 19:30 root.txt
-rw-r--r-- 1 yossarian yossarian 135 2010-11-04 19:30 proxy.txt
-rw-r--r-- 1 yossarian yossarian 82 2010-11-04 19:30 mysql.txt
-rw-r--r-- 1 yossarian yossarian 69 2010-11-04 19:30 daemon.txt
-rw-r--r-- 1 yossarian yossarian 206 2010-11-04 19:30 avahi.txt
-rw-r--r-- 1 yossarian yossarian 79 2010-11-04 19:30 120.txt
-rw-r--r-- 1 yossarian yossarian 93 2010-11-04 19:30 106.txt
-rw-r--r-- 1 yossarian yossarian 92 2010-11-04 19:30 www-data.txt
-rw-r--r-- 1 yossarian yossarian 78 2010-11-04 19:30 syslog.txt
-rw-r--r-- 1 yossarian yossarian 1178 2010-11-04 19:30 root.txt
-rw-r--r-- 1 yossarian yossarian 135 2010-11-04 19:30 proxy.txt
-rw-r--r-- 1 yossarian yossarian 82 2010-11-04 19:30 mysql.txt
-rw-r--r-- 1 yossarian yossarian 69 2010-11-04 19:30 daemon.txt
-rw-r--r-- 1 yossarian yossarian 206 2010-11-04 19:30 avahi.txt
-rw-r--r-- 1 yossarian yossarian 79 2010-11-04 19:30 120.txt
-rw-r--r-- 1 yossarian yossarian 93 2010-11-04 19:30 106.txt
Combining the two commands into one gives us:
sed 1d data.txt | awk '{print > $1".txt"}'
That's some catch indeed. Enjoy your one-liner text processing.
No comments:
Post a Comment