Over the many years I’ve been a WordPress engineer I’ve performed countless launches and migrations. It’s actually one of my favourite things to do. Often times that involves manipulating large amounts of data and as such I’ve learnt about a bunch of command line tools to help me process and manipulate data and assets.
My skills on the command line are still pretty rough though. I’ve learnt what I need to know and most of the time it takes me a little while, including searching the web, to figure out exactly what command(s) will get me the end result. Sometimes this is hours of figuring things out. Each time I learn something new and I’m quicker the next time, which is great.
Which is where AI comes in…
Now amongst the hype about AI tools like ChatGPT taking our jobs I wanted to see if it could help me be more efficient. I was pleasantly surprised with how ChatGPT not only provides you with answers but explains them. In practice I found that I could both save myself time searching through slightly-related Stack Overflow answers and have a better learning experience at the same time.
To demonstrate what I mean I’ll take you through a real life example of how I used ChatGPT to help me with a migration task. I’m moving a WordPress site with 7 years of posts and over 800,000 files in the uploads folder that I need to copy to the new host (in this case, WordPress.com VIP).
VIP accepts an archived uploads folder which is limited to 60GB in size, so we want to create an archive that includes only the original media and not the intermediate sizes that WordPress automatically creates for us. On VIP, these intermediate sizes are generated dynamically.
Now I know how to use find
and I know how to create folder archives with tar
but I’m not sure how best to combine these to create an archive without the intermediate images. This is where I find AI tools really helpful, but you also have to know how to ask the question so that ChatGPT can understand what you want. This is what I went with:
I have a folder named “uploads”. Within that folder a year folders (e.g. “uploads/2023/”). Within each year folder are month folders (e.g. “uploads/2023/05/”). Within each month folder are media files such as images, PDFs etc. For each image file (e.g. “Wizards.jpg”) there are additional, resized, copies with a naming convention that includes the image dimensions. For example, Wizards.jpg is copied to Wizards-768×283.jpg, Wizards-319×179.jpg and so on. There are thousands of such files, we do not know the name of each file or the sizes, only the filename format. I need an archive (.zip or .tar.gz) of the entire uploads folder but I want to exclude all of the resized images and only include the originals. For example, I want Wizards.jpg but not Wizards-319×179.jpg. When extracted, the archive must create the uploads folder. I need to do this using the linux command line (bash). What command(s) will accomplish this?
Note how I’m very concise and specific. I want to make sure to avoid any ambiguity that could be misinterpreted or misunderstood. Like any piece of software the output is only ever going to be as good as the input.
The initial response is good, but I see an obvious issue:
I understand that the -not -name '-.*'
portion will exclude any media files with a hyphen in, not just those with intermediate sizes. My question wasn’t clear enough! I raise the issue and ask for a refined command:
I have a file at uploads/2023/05/White_House-1.jpg that I believe would be excluded by the proposed command. Refine it to ensure such files are included.
I got refined command back, but see if you can spot the issue:
find uploads/ -type f | grep -vE '[-][0-9]+x[0-9]+[.][^/]*$' | tar czvf originals.tar.gz -T -
Now we have grep
with a regex too – fun! But it’s focusing too much on those intermediate file sizes. Again, my question wasn’t specific enough, so I asked:
Will this file be included?: uploads/wpforms/cache/index.html
Sure enough, it recognised it wouldn’t and provided an update. After some back and forth it reintroduced the problematic ! -name '*-*.*'
clause so I realised I needed to be much clearer about my expectations:
Nearly there… to test the command I created the following directory structure and files:
uploads/2021/01/White_House-1-150×150.jpg
uploads/2021/01/White_House-1.jpg
uploads/2023/05/Wizards-768×283.jpg
uploads/2023/05/Wizards-319×179.jpg
uploads/2023/05/Wizards.jpg
uploads/blog-1-ads.txt
uploads/wpforms/cache/index.htmlThis is what I expect the archive to include:
uploads/2021/01/White_House-1.jpg
uploads/2023/05/Wizards.jpg
uploads/blog-1-ads.txt
uploads/wpforms/cache/index.htmlHowever, the command produces an archive with only:
uploads/2023/05/Wizards.jpg
uploads/wpforms/cache/index.htmlCan you refine the command to match my expectation?
As you can see I’m now giving it very good test data and very clear expectations of output. However, it got stuck and gave me this:
find uploads/ -type f ! -name '*-*.*' ! -name '*[0-9]x[0-9]*.*' | tar czvf originals.tar.gz -T -
It really likes that regex!
But I knew this wasn’t going to work, but I used my existing knowledge to fix the command:
find uploads/ -type f ! -name '*[0-9]x[0-9]*.*' | tar czvf originals.tar.gz -T -
So you see, AI tools like ChatGPT can be incredibly useful when you have incomplete knowledge and need some help figuring out the rest. With good questions and input AI can help you speed up figuring out tasks like this.
Of course, with it being so polite I had to let ChatGPT know the command wasn’t quite right and what the correct command is. Perhaps it’ll be able to learn from that, just like it helped me learn today.
So polite
Leave a Reply