Subtitle Script for I to l Conversions

Posted by JD 11/30/2009 at 18:45

A quick script to change a capital I (eye) in the middle of a word into a lowercase l (el). If you like Asian films, you understand why I wrote this script. I had an itch. It needed to be scratched. This is useful for the .srt files used for movie subtitles.

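#!/usr/bin/perl
# Perl script to change a capital 'I' in the middle of a word into an 'l'
# input is stdin and output is to stdout; redirection is your friend
use strict;
use warnings;

while (my $line = <>) {
    # Only touch runs of capital I that directly follow a lowercase letter,
    # so the pronoun "I", a word-initial I ("It") and all-caps words are kept.
    # A plain tr/I/l/ on the whole line would also turn "I am" into "l am".
    # An I that follows a capital (as in "AIso") is not caught by this rule.
    $line =~ s/(?<=[a-z])(I+)/'l' x length($1)/ge;
    print $line;
}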

It is very common for subtitle files in SRT format to have a capital I in the middle of words, since bitmap patterns are used to create the files and a capital I and a lowercase l look nearly identical in them. For native speakers of English, this is HIGHLY distracting, to the point that the subtitles must be fixed before a movie can be enjoyed.

I tried a few other methods before deciding that this simple character translation was needed.

  1. ispell – Too many words were not in the dictionary, and the spacing in these files often groups words in strange ways.
  2. replacement dictionary – I created a sed script with a replacement dictionary of about a hundred words. Every new SRT file turned up more words that needed to be added.
  3. Manual editing – yep, I spent a few hours manually editing files. This wasn’t very efficient, and it ruined the movie plot since I’d already read it before viewing it.
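
For comparison, the replacement-dictionary idea from item 2 might look roughly like this in Perl rather than sed. This is only a sketch: the three entries in %FIXES and the sub name apply_fixes are made up for illustration, and a real list would be much longer and different for every file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample entries; a real list grows with every SRT file.
my %FIXES = (
    'AII'     => 'All',
    'AIso'    => 'Also',
    'AIready' => 'Already',
);

# One alternation of all the known bad spellings, escaped for regex use.
my $PATTERN = join '|', map { quotemeta($_) } sort keys %FIXES;

# Replace whole words found in %FIXES; everything else passes through.
sub apply_fixes {
    my ($line) = @_;
    $line =~ s/\b($PATTERN)\b/$FIXES{$1}/g;
    return $line;
}

# Demo on two sample lines; swap this for a while (<>) loop to filter a file.
print apply_fixes($_), "\n" for ('AII right, AIso fine.', 'AIready gone.');
```

The weakness is the same one described above: any word missing from the list passes through untouched.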

Some combination of methods will probably be necessary. I intend to merge them into a single Perl script and perform them in the most efficient order. It will begin with the I -> l translation.
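
One possible shape for such a merged script is a list of passes applied in order. This is a sketch under assumptions, not the finished script: the sub names, the @PASSES list and the %LEFTOVERS entries are all hypothetical, and the first pass is a narrower variant of the translation (only runs of capital I after a lowercase letter) so that the pronoun "I" is left alone.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pass 1: the I -> l translation, limited to runs of capital I that
# directly follow a lowercase letter.
sub fix_mid_word_i {
    my ($line) = @_;
    $line =~ s/(?<=[a-z])(I+)/'l' x length($1)/ge;
    return $line;
}

# Pass 2: hypothetical dictionary entries for words pass 1 cannot catch.
my %LEFTOVERS = ('AII' => 'All', 'AIso' => 'Also');

sub fix_leftovers {
    my ($line) = @_;
    $line =~ s/\b(\w+)\b/exists $LEFTOVERS{$1} ? $LEFTOVERS{$1} : $1/ge;
    return $line;
}

# Passes run in list order; further passes can be appended here.
my @PASSES = (\&fix_mid_word_i, \&fix_leftovers);

# Run one line through every pass in order.
sub clean_line {
    my ($line) = @_;
    $line = $_->($line) for @PASSES;
    return $line;
}

# Demo on two sample lines; swap this for a while (<>) loop to filter a file.
print clean_line($_), "\n" for ('AIso, I wiII teII him.', 'It is stiII early.');
```

On the two sample lines this prints "Also, I will tell him." and "It is still early.", with the pronoun and the leading "It" unchanged.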
