Bashing my way through shell scripting
Posted on Tuesday, December 18th
As I work my way through migrating a huge web application from Microsoft Technologies to open source equivalents, there are of course some processes that are uglier to tackle than others.
One of the major headaches up to this point has been dealing with the IDX FTP data downloads. On my IIS server (Windows), downloading the files, extracting them, and rebuilding the database is a breeze. It only takes 5 minutes of my time, even doing it completely by hand. Using a Windows server had spoiled me in a couple of ways, though.
Let me explain.
In Windows, a lowercase “a” is treated the same as an uppercase “A” in a file name. They’re both the letter A. Pretty simple. Not in the Unix environment. Not in Debian Linux, which is the operating system on my newer ‘nix server. There, “A” and “a” are two separate and unequal things. This shouldn’t present a big problem, right? Wrong.
As I was building out the core engine of the search system, I noticed almost half of the property photos weren’t showing up. What’s going on? I scan the file system - there they are? Hmmm… I notice that some of the photos are not named identically.
Some are named “capemay123456.jpg”. Some are named “CapeMay123456.jpg”. Some are even named “CAPEMAY123456.jpg”. To complicate matters further, some of the 3 naming conventions listed above use an uppercase file extension like “capemay123456.JPG”. That makes 6 naming possibilities for any given photo out of the 50,000+/- being requested at any time.
My first thought was to write a PHP script that checks for the existence of each of the 6 file names, then stops once it finds a match or gives up if none exists. That’s up to 6 lookups per photo against a directory of over 50,000 photos. Not exactly efficient.
I didn’t notice a particular slowdown as lists of homes and photos appeared, but I knew that behind the scenes I was seriously pissing off both the server’s processor and hard drives.
If it doesn’t work, bash it
Okay, coming from a Windows environment means that you know nothing about shell scripting. It’s like some alien language from years past. I knew that to solve this conundrum, I’d have to dive in. I first tried renaming everything to lowercase using a command like:
rename 'y/A-Z/a-z/' *
Nope. Way too many files for this type of operation: the shell expands * into every file name in the directory at once. Okay, obviously I have to whack away at the photos a little at a time, say 5000 at a time:
rename 'y/A-Z/a-z/' `ls | head -5000`
It works! After running this command 10 times or so, every photo is renamed and my life is easier. One more challenge overcome towards having a fancy new open source system that kicks ass.
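For anyone facing the same pile of files, here is a minimal sketch of a one-pass alternative to running the batch by hand. It is not what I actually ran. It assumes bash plus the standard find, tr and mv tools, and lowercase_dir is a name made up for this example. It walks the directory one file at a time, so the shell never has to hold every file name in a single argument list, and it refuses to overwrite a file whose lowercase name already exists.

```bash
#!/usr/bin/env bash
# Sketch only: lowercase_dir is a hypothetical helper, not part of any tool.
# Lowercases every file name directly inside the given directory.
lowercase_dir() {
    local dir=$1 path name lower
    # find emits one path per line, so no giant argument list is built.
    # Assumption: file names do not contain newline characters.
    find "$dir" -maxdepth 1 -type f | while IFS= read -r path; do
        name=${path##*/}    # strip the directory part
        lower=$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')
        # Already lowercase: nothing to do.
        [ "$name" = "$lower" ] && continue
        # Two names that differ only by case would collide; keep both and warn.
        if [ -e "$dir/$lower" ]; then
            echo "skipping $name: $lower already exists" >&2
            continue
        fi
        mv -- "$path" "$dir/$lower"
    done
}

# Usage (the path is a placeholder):
#   lowercase_dir /path/to/photos
```

The collision check matters for exactly the mess described above: if both “CapeMay123456.jpg” and “capemay123456.jpg” exist, a blind rename would clobber one of them.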
About Joseph R. B. Taylor
Joseph R. B. Taylor is a designer/developer who makes stuff for screens of all shapes and sizes. He currently works at Edvisors, Inc. where he creates screen-based experiences used by millions of college students every year.