Simple script to minify web resources

A couple of years ago, I blogged about a way to minify HTML with sed. I have now streamlined that approach into a script.

How it works

The script copies the original directory (or single file) and leaves it untouched, then removes unnecessary whitespace and comments from every .htm, .html, .php, .css and .js file in the copy.
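To make the copy-then-select step concrete, here is a minimal Python sketch. It is not the actual script (which is shell and Perl); the function name `copy_and_select` and the case-insensitive extension check are assumptions made for the illustration.

```python
import shutil
from pathlib import Path

# Extensions named in the post; matching them case-insensitively is an assumption.
MINIFIABLE = {".htm", ".html", ".php", ".css", ".js"}


def copy_and_select(source: Path, target: Path) -> list:
    """Copy source (file or directory) to target; return the copied files to minify."""
    if source.is_dir():
        # copytree raises FileExistsError if target already exists,
        # so the original is never mixed with an older copy.
        shutil.copytree(source, target)
        candidates = [p for p in target.rglob("*") if p.is_file()]
    else:
        shutil.copy2(source, target)
        candidates = [target]
    # Only files in the copy are returned; the original tree stays untouched.
    return sorted(p for p in candidates if p.suffix.lower() in MINIFIABLE)
```

Everything else (images, fonts and so on) is still copied, it is simply not selected for minification.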

Running the script

Easy one:

If the target already exists, it will be removed first (you will need to confirm).

The code

The basics

Most of it is self-explanatory or commented. The minification is the only really interesting part, so let’s get to it!

The minification

I used the trick I wrote about yesterday.

Basically, the minification is all done here:

The code is commented and should need no further explanation.

The essential improvements since the sed era are the removal of all comments and a fix to my code so that it removes all newlines.

I abandoned sed and went to Perl because sed does not support non-greedy quantifiers. Having them greatly simplifies removing delimited patterns (typically multiline comments).
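A small illustration of why the non-greedy quantifier matters for delimited patterns. It uses Python's `re` module rather than Perl, and the sample input is made up, but the `.*` versus `.*?` behaviour is the same in both.

```python
import re

# Made-up input with two comments, the second spanning two lines.
source = "a = 1; /* first */ b = 2; /* second,\n spans lines */ c = 3;"

# Greedy: .* runs to the LAST "*/", so the code between the comments is lost.
greedy = re.sub(r"/\*.*\*/", "", source, flags=re.DOTALL)

# Non-greedy: .*? stops at the FIRST "*/", so each comment is removed on its own.
lazy = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)

print(greedy)  # a = 1;  c = 3;
print(lazy)    # a = 1;  b = 2;  c = 3;
```

The `re.DOTALL` flag lets `.` match newlines, which is what allows a single pattern to cover multiline comments.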

Getting the full script

It is available on GitHub. Don’t hesitate to fork it and make it better.

Note: the competition

There is none: this is not a tool I would recommend in a production environment (it has no tests, for instance).

Why this script is good

It is a quick fix for simple and mostly static websites. It removes potentially sensitive information: many websites have their .git repository deployed, with everything it contains, and comments can be a source of leaks too, if only by advertising the technology behind the scenes.

Why this script is bad

And yet, many tools were designed specifically for this purpose. See the Grunt or Gulp plugins for JavaScript projects, or wro4j for Java projects. I imagine similar tools also exist for ASP or PHP projects.

I needed something working quickly and without effort. This is it.

Combine it

You can use it on your server to automatically deploy the latest version of your website. Imagine:

How to make a regex pattern that matches except if…

Short answer

Let's take an example: match all /* ... */-type comments except when they appear inside a string (the example targets JavaScript, so both single-quoted and double-quoted strings have to be excluded).

  1. Write the regex for what you want to match (later called m; our example: /\*.*?\*/).
  2. Write the regular expressions for matching your exclusions (later called e1, e2, e3; our example: '[^']+' and "[^"]+").

Regex #1

  1. Write your final regex: e1|e2|(m) (our example: '[^']+'|"[^"]+"|(/\*.*?\*/)).
  2. Match group 1 instead of matching group 0.
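As a sketch of what "match group 1" means in practice, here is the example pattern used to strip comments in Python. The helper name `strip_comments` and the `re.DOTALL` flag (so that comments may span lines) are additions for this illustration, not part of the original recipe.

```python
import re

# e1|e2|(m): the string alternatives come first, so a quoted string is
# consumed whole before the comment pattern gets a chance to match inside it.
PATTERN = re.compile(r"""'[^']+'|"[^"]+"|(/\*.*?\*/)""", re.DOTALL)


def strip_comments(code: str) -> str:
    """Remove /* ... */ comments while leaving quoted strings untouched."""
    def replace(match):
        # Group 1 is set only when the comment alternative (m) matched.
        if match.group(1) is not None:
            return ""
        # Otherwise an exclusion (a quoted string) matched: put it back as-is.
        return match.group(0)

    return PATTERN.sub(replace, code)


sample = """var s = "/* not a comment */"; /* real comment */ var t = 'x';"""
print(strip_comments(sample))
```

The callback is the "code to parse the result": the regex alone matches both strings and comments, and only the check on group 1 tells them apart.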

Note: This requires code to inspect the result. If you need something simpler (a one-liner where you would have used sed and can use a Perl command instead, or a pattern you can pass directly to a match function), see below.

Regex #2: alternative for Perl and PCRE (PHP)

  1. Write your final regex: (?:e1|e2)(*SKIP)(?!)|m (our example: (?:'[^']+'|"[^"]+")(*SKIP)(?!)|/\*.*?\*/).
  2. Match group 0 as usual.

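Note: You can use this expression in a Perl one-liner, for example to delete the matching comments from a file read on standard input: perl -0777 -pe 's{(?:\x27[^\x27]+\x27|"[^"]+")(*SKIP)(?!)|/\*.*?\*/}{}gs'. Here \x27 stands for the single quote, which cannot appear inside a single-quoted shell string, and -0777 reads the whole input at once so that multiline comments can match.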

You can test the regex and have it explained at this link.

Something more detailed
