Saturday, July 4, 2009

Accented and other UTF Characters

I've worked up a test version of the morse code generator that handles UTF characters (accented characters and letters that exist in Western languages other than English). However, I haven't made it the default version of the script yet, for a couple of reasons. The first and foremost is that programs at the command-line level have trouble truncating a string by the number of UTF characters, since a single character may be several bytes long rather than just one. It's a little complex to describe, but for the purposes of naming the files I had to resort to substituting x for the UTF characters, which is less than ideal.
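To make the truncation problem a bit more concrete, here is a minimal sketch in Python. It is not the actual script behind the generator; the function names and the 20-character filename limit are made up for illustration. It shows the difference between cutting a string by characters and by bytes, plus the "substitute x" workaround for file names.

```python
def truncate_chars(text: str, limit: int) -> str:
    """Truncate by characters (code points), which is what we actually want."""
    return text[:limit]


def truncate_bytes_safely(text: str, limit: int) -> str:
    """Truncate to at most `limit` UTF-8 bytes without leaving half a character.

    Slicing the raw bytes can cut a multi-byte character in the middle;
    errors="ignore" drops that incomplete trailing piece when decoding.
    """
    return text.encode("utf-8")[:limit].decode("utf-8", errors="ignore")


def ascii_filename(text: str, limit: int = 20) -> str:
    """Hypothetical file-naming workaround: replace non-ASCII characters with 'x'."""
    return "".join(c if c.isascii() else "x" for c in text[:limit])


if __name__ == "__main__":
    word = "señor"
    print(len(word), len(word.encode("utf-8")))
    print(truncate_chars(word, 3))
    print(truncate_bytes_safely(word, 3))
    print(ascii_filename(word))
```

The word "señor" is five characters but six bytes in UTF-8, so a tool that counts bytes will cut it in a different place than one that counts characters.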

So, for now I've left the live version of the script as it is: it simply strips out UTF-encoded characters to make life simpler.

The software that I use on the backend to create the mp3 files fully supports UTF-encoded characters and has the correct morse code for them (as well as for punctuation). The nasty problem is that the input has to be sanitized before the script parses it, because I don't want just ANY characters to end up in the variable that becomes the morse text. If the input isn't sanitized, "that would be bad," as they say: it might be possible to run commands on the system other than the ones the script intends.
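For anyone curious what that kind of sanitizing can look like, here is a rough sketch in Python. It is an assumption-laden example, not the code running on the site: the function name, the allowed punctuation set, and the 100-character limit are all hypothetical. The idea is to keep only characters on an allow list (letters, digits, spaces, and a few punctuation marks) and throw away everything else, rather than trying to list every dangerous character.

```python
import unicodedata

# Hypothetical allow list; the real set of supported punctuation may differ.
ALLOWED_PUNCTUATION = set(".,?")


def is_allowed(c: str) -> bool:
    """Allow spaces, listed punctuation, ASCII letters/digits, and accented letters."""
    if c == " " or c in ALLOWED_PUNCTUATION:
        return True
    if c.isascii():
        return c.isalnum()
    return c.isalpha()


def sanitize_morse_text(raw: str, max_len: int = 100) -> str:
    """Return only the allowed characters of `raw`, cut to `max_len` characters."""
    # NFC joins a base letter and a combining accent into a single character,
    # so "e" followed by a combining acute accent becomes "é".
    text = unicodedata.normalize("NFC", raw)
    return "".join(c for c in text if is_allowed(c))[:max_len]


if __name__ == "__main__":
    print(sanitize_morse_text("café?"))
    print(sanitize_morse_text("cq; rm -rf /"))
    print(sanitize_morse_text("$(reboot)"))
```

Shell metacharacters such as `;`, `$`, backticks, and parentheses never make it through, so they cannot be interpreted as commands later on. Passing the cleaned text to the backend program as a separate argument, instead of pasting it into a shell command string, adds a second layer of protection.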

I am pleased to have seen the volume of traffic we've received in recent weeks. I appreciate everyone stopping by and taking a look at the morse code ringtones project. Other work has been busy enough in the last few weeks that I haven't devoted much time to furthering the idea. That's probably just as well....

I have been drilling myself on recognizing the prosigns and punctuation that I was always a bit fuzzy on before. I'm getting a bit better with them, but the punctuation is slow going.

Thanks for stopping by!
