Lucid Dreaming - Dream Views

    1. #1 Ynot

      Bash Script Fu....

      I know this maybe isn't the best forum to ask,
      but I know most people here, so I'd rather ask than post somewhere else and get no answers

      anyway, I have this:
      Code:
      #!/bin/sh
      
      # Build the weekly report in a temp file, then mail it at the end.
      # Note the blank line after the Subject: header -- sendmail needs it
      # to separate the mail headers from the body.
      echo "To: [email protected]
      Subject: JG Server - Weekly Checkup
      
      ***** JG Server - Weekly Checkup *****
      " > /tmp/jg_weekly.txt
      
      uptime >> /tmp/jg_weekly.txt
      
      echo -n "
      Hostname:        " >> /tmp/jg_weekly.txt
      hostname >> /tmp/jg_weekly.txt
      
      echo -n "Hostname (FQDN):    " >> /tmp/jg_weekly.txt
      hostname -f >> /tmp/jg_weekly.txt
      
      echo "" >> /tmp/jg_weekly.txt
      
      # Filesystem-level usage for the ext3 partitions
      df -h --type=ext3 >> /tmp/jg_weekly.txt
      
      echo "
      Location    Size    Mounted on" >> /tmp/jg_weekly.txt
      
      echo -n "E: Drive:    " >> /tmp/jg_weekly.txt
      du -hs /media/data/data >> /tmp/jg_weekly.txt
      
      # Per-extension totals: each line below walks the whole tree with find
      # and forks du once per file, then awk sums the KB figures and scales
      # them to K/M/G/T -- this is the slow part
      echo -n "    *.doc =    " >> /tmp/jg_weekly.txt
      find /media/data/data -type f -iname '*.doc' -exec du -k {} \; | awk '{sum+=$1} END {split("K,M,G,T", Units, ",");u = 1;while (sum >= 1024){sum = sum / 1024;u += 1}sum = sprintf("%.1f%s", sum, Units[u]);print sum;}' >> /tmp/jg_weekly.txt
      echo -n "    *.xls =    " >> /tmp/jg_weekly.txt
      find /media/data/data -type f -iname '*.xls' -exec du -k {} \; | awk '{sum+=$1} END {split("K,M,G,T", Units, ",");u = 1;while (sum >= 1024){sum = sum / 1024;u += 1}sum = sprintf("%.1f%s", sum, Units[u]);print sum;}' >> /tmp/jg_weekly.txt
      echo -n "    *.jpg =    " >> /tmp/jg_weekly.txt
      find /media/data/data -type f -iname '*.jpg' -exec du -k {} \; | awk '{sum+=$1} END {split("K,M,G,T", Units, ",");u = 1;while (sum >= 1024){sum = sum / 1024;u += 1}sum = sprintf("%.1f%s", sum, Units[u]);print sum;}' >> /tmp/jg_weekly.txt
      echo -n "    *.dwg =    " >> /tmp/jg_weekly.txt
      find /media/data/data -type f -iname '*.dwg' -exec du -k {} \; | awk '{sum+=$1} END {split("K,M,G,T", Units, ",");u = 1;while (sum >= 1024){sum = sum / 1024;u += 1}sum = sprintf("%.1f%s", sum, Units[u]);print sum;}' >> /tmp/jg_weekly.txt
      echo -n "    *.pdf =    " >> /tmp/jg_weekly.txt
      find /media/data/data -type f -iname '*.pdf' -exec du -k {} \; | awk '{sum+=$1} END {split("K,M,G,T", Units, ",");u = 1;while (sum >= 1024){sum = sum / 1024;u += 1}sum = sprintf("%.1f%s", sum, Units[u]);print sum;}' >> /tmp/jg_weekly.txt
      
      echo -n "Mailboxes:    " >> /tmp/jg_weekly.txt
      du -hs /var/mail/virtual >> /tmp/jg_weekly.txt
      
      echo -n "Web Pages:    " >> /tmp/jg_weekly.txt
      du -hs /var/www >> /tmp/jg_weekly.txt
      
      # Mail the report and clean up
      /usr/sbin/sendmail [email protected] < /tmp/jg_weekly.txt
      rm /tmp/jg_weekly.txt
      which outputs:
      Code:
      ***** JG Server - Weekly Checkup *****
      
       16:01:59 up 7 days, 22:13,  1 user,  load average: 0.93, 1.02, 0.81
      
      Hostname:        server1
      Hostname (FQDN):    server1.blah.co.uk
      
      Filesystem            Size  Used Avail Use% Mounted on
      /dev/sda2              18G  3.1G   14G  19% /
      /dev/sdb1              57G   29G   26G  53% /media/data
      
      Location    Size    Mounted on
      E: Drive:    25G    /media/data/data
          *.doc =    2.5G
          *.xls =    24.9M
          *.jpg =    16.2G
          *.dwg =    688.6M
          *.pdf =    1.1G
      Mailboxes:    3.1G    /var/mail/virtual
      Web Pages:    119M    /var/www
      and the thing takes 10+ minutes to run
      and while it's only going to run once a week (midnight on Sundays), I'd like to speed it up if I can
      I know I'm being horribly inefficient calculating the different file-type totals
      does anyone have any advice for speeding it up?

      Thanks

    2. #2 ninja9578
      You're summing everything up, so why aren't you using the -s parameter instead of -k?

      Also, the -h parameter will put it into MB and GB for you (won't it?)


      Wait for confirmation from somebody else before quoting me on that. I'm not much of a UNIX or script guy. I may have misunderstood my UNIX book. Want gfx help instead?
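
      For reference, a quick illustration of what those two flags do (assuming GNU du; the outputs in the comments are only examples, chosen to match the report above):
      Code:
      # -s prints one summary line per argument instead of one per subdirectory;
      # -k forces units of 1K blocks; -h scales the figure to K/M/G for display
      du -sk /media/data/data    # e.g. "26214400   /media/data/data"
      du -sh /media/data/data    # e.g. "25G        /media/data/data"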

    3. #3 Ynot
      Quote Originally Posted by ninja9578 View Post
      You're summing everything up, why aren't you using the -s parameter instead of -k?
      I'm summing the main file types across our network share

      du -s will give me an overall total of disk-space usage, but for all files and directories (not broken down by extension)

      Quote Originally Posted by ninja9578 View Post
      Also, the -h parameter will put it into MB and GB for you (won't it?)
      Yes, -h will, but I need everything in a standard unit (KB) so I can sum the values properly

      (indeed, I'm using -s & -h to do the "E: Drive" summary before the file-type breakdowns)

      I was just wondering if there was a better way to do it?
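
      Structurally, at least, the repeated pipeline can be factored into a shell function. This is no faster, just tidier; sum_ext is a hypothetical name, and the path and awk body match the script above:
      Code:
      # Total disk usage (KB) of files matching one pattern, printed
      # human-readably -- same find/du/awk pipeline as the original script
      sum_ext() {
          find /media/data/data -type f -iname "$1" -exec du -k {} \; |
          awk '{ sum += $1 }
               END {
                   split("K,M,G,T", u, ",")
                   i = 1
                   while (sum >= 1024) { sum /= 1024; i++ }
                   printf "%.1f%s\n", sum, u[i]
               }'
      }
      
      echo -n "    *.doc =    " >> /tmp/jg_weekly.txt
      sum_ext '*.doc' >> /tmp/jg_weekly.txt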

    4. #4 Replicon
      What's taking longer, the du or the finds? One obvious optimization is to run find only once, though I would think that after the first run it would go faster due to caching. That's something to check into. You should also echo something to stdout between operations, just to get a feel for what's taking so long. If it's the repeated finds, then you will definitely benefit from collapsing them into one.
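
      A minimal sketch of that kind of check, using time on each step rather than echoed markers (run interactively rather than from cron; the paths are the ones from the original script):
      Code:
      # time each expensive step separately to see which one dominates
      time du -hs /media/data/data > /dev/null
      time find /media/data/data -type f -iname '*.doc' -exec du -k {} \; > /dev/null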

      If it's the 'du' that's causing problems (and I have no idea whether you have a large number of little files or a small number of large files), then two possible solutions to consider are:

      1) mount /media/data/data separately, and then you can just run df, which is quick (this requires a separate partition though...)
      2) Depending on how your stuff is laid out, you can run 'du' on everything in /media/data EXCEPT /media/data/data and subtract that from your df results above. Of course, you'll have to rerun df to get block-level sizes rather than human-readable ones, and then muck with the results to format them the right way, but that's no longer a performance limitation (see the sketch below).
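
      A sketch of option 2, assuming GNU find/du and that df's "Used" column sits in field 3 of its second output line (1K blocks throughout):
      Code:
      # used blocks on the whole /media/data filesystem, from df (fast)
      used=$(df -k /media/data | awk 'NR==2 { print $3 }')
      
      # du everything directly under /media/data except the data tree
      rest=$(find /media/data -mindepth 1 -maxdepth 1 ! -name data \
              -exec du -sk {} + | awk '{ sum += $1 } END { print sum + 0 }')
      
      echo "data tree: $(( used - rest )) KB"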

      More sophisticated options might involve keeping track of the size dynamically, but it may not be worth it. Are you expecting this thing to scale to a point where it takes hours to run? A few minutes may not be the end of the world. Maybe if you install disk quotas and enforce a quota on the directory (with no maximum), the size will be tracked for you and you can just run the disk-quota utilities to see how much space is being used, but I'm not certain whether quotas work on a directory basis or a user basis.

    5. #5 Ynot
      Quote Originally Posted by Replicon View Post
      What's taking longer, the du or the finds? [...]
      thanks,
      lots (and lots) of small files - almost everything is under 10 MB in size
      it's the finds that take the time
      but again, using df wouldn't give me the breakdown by extension
      
      I think multiple finds are the only way to do this by extension
      (short of a daemon monitoring writes & updating totals throughout the week - which I really don't want to do)
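
      For what it's worth, a single pass can cover all five extensions at once. A sketch assuming GNU find's -printf (%k prints a file's disk usage in 1K blocks); this also avoids forking du once per file, which, with lots of small files, is likely where most of the ten minutes goes (totals printed in MB for simplicity):
      Code:
      # one walk of the tree; awk buckets the KB totals by extension
      find /media/data/data -type f \( -iname '*.doc' -o -iname '*.xls' \
          -o -iname '*.jpg' -o -iname '*.dwg' -o -iname '*.pdf' \) \
          -printf '%k %p\n' | awk '
          {
              kb = $1
              n = split($0, p, ".")        # extension = text after the last dot
              sum[tolower(p[n])] += kb     # safe even if the path has spaces
          }
          END {
              for (e in sum) printf "    *.%s =    %.1fM\n", e, sum[e] / 1024
          }'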

      Age-old curse:
      apps programmers make poor sysadmins
      and sysadmins make poor application programmers

      thanks both for having a look

    6. #6 Replicon
      Oh, whoops, I completely missed the du -k and awk-fu at the end of each find.

      Have you considered using "ls -lR" instead of "find + du"?
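
      A sketch of that approach. Two caveats: ls reports apparent byte sizes rather than disk blocks like du, and parsing ls output breaks on filenames containing spaces (totals printed in MB):
      Code:
      # recursive listing; regular-file lines start with "-";
      # column 5 is the size in bytes, the last field is the name
      ls -lR /media/data/data | awk '
          /^-/ {
              n = split($NF, p, ".")
              if (n > 1) sum[tolower(p[n])] += $5
          }
          END {
              for (e in sum) printf "*.%s = %.1fM\n", e, sum[e] / 1048576
          }'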

    7. #7 dsr
      It really seems like there should be a way to eliminate all the awk stuff, but I can't help with the actual scripting. However, you might want to run the script in verbose mode (sh -v [filename]) so you can spot the bottleneck. I think we already know where it is, but you never know for sure.
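
      Usage would look something like this (the script path is hypothetical); -x, which traces commands as they execute, is often more telling than -v, and time gives a baseline to compare against after any change:
      Code:
      sh -v /usr/local/bin/jg_weekly.sh     # print each line as it is read
      sh -x /usr/local/bin/jg_weekly.sh     # trace each command as it runs
      time sh /usr/local/bin/jg_weekly.sh   # overall wall-clock baseline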
