In part 1, I described a solution for processing the CW audio and extracting the carrier level changes that represent the dots and dashes. The next step was to develop the software to translate this into characters and word breaks.
Morse Code
Morse code is unusual in that there is no fixed length for a character: the number of symbols (dots or dashes) ranges from 1 to 7. If we represent a dot by a binary 0 and a dash by a 1, every combination fits into 7 bits of a byte, giving 128 values (0 to 127). However, these values would not be unique. For instance, E, I, S, H and 5 consist only of dots, so they would all have the value 0. Therefore the number of symbols also needs to be taken into account.
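A minimal Python sketch of this encoding, showing why the value alone is ambiguous. It assumes each new symbol is shifted in from the right, so the first symbol ends up as the most significant bit; the post does not say which bit order the decoder uses, and `encode` is a hypothetical name.

```python
def encode(pattern: str) -> tuple[int, int]:
    """Encode a dot/dash pattern such as ".-" as (value, length).

    Dot = 0, dash = 1. Each symbol is shifted in on the right, so the
    first symbol becomes the most significant bit (an assumption).
    """
    value = 0
    for symbol in pattern:
        if symbol not in ".-":
            raise ValueError(f"unexpected symbol {symbol!r}")
        value = (value << 1) | (1 if symbol == "-" else 0)
    return value, len(pattern)


# E, I, S, H and 5 are all dots: the values collide at 0 and only the
# length tells them apart.
for name, pattern in [("E", "."), ("I", ".."), ("S", "..."),
                      ("H", "...."), ("5", ".....")]:
    print(name, encode(pattern))
```

Keeping the pair (value, length) rather than the value alone is what makes each character unique.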
I created a two-dimensional array to map the dot/dash combinations to characters. I started with a 128 x 7 array, which caters for every character of up to 7 symbols. This would reserve 896 bytes of memory (128 x 7), much of which is unused: the 1-symbol characters need only 2 entries (E and T), the 2-symbol characters need 4, and so on.
As a compromise between memory usage and processing speed, I mapped the 6- and 7-symbol characters into column zero of the array. Fortunately, this could be done without any clash of values, as shown in the table. The result is a 32 x 6 array (32 rows cover every 5-bit value, and columns 1 to 5 hold the 1- to 5-symbol characters). At 192 bytes, it saves over 700 bytes but is still simple to process.
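The 32 x 6 layout can be sketched in Python as below. This assumes the same bit order as before (first symbol most significant), fills columns 1 to 5 with the letters, digits and a few punctuation marks from the international Morse table, and leaves column 0 empty because the post doesn't describe how the 6- and 7-symbol characters are folded into it. `BY_LENGTH`, `build_table` and `lookup` are illustrative names, and `"*"` is a placeholder for unused combinations. On a microcontroller this would be a 32 x 6 byte array; Python lists are used here only for readability.

```python
# Characters of each length, listed in order of their binary value
# (dot = 0, dash = 1, first symbol most significant). "*" = unused.
BY_LENGTH = {
    1: "ET",
    2: "IANM",
    3: "SURWDKGO",
    4: "HVF*L*PJBXCYZQ**",
    5: "54*3***2**+****16=/*****7***8*90",
}

ROWS, COLS = 32, 6  # 5 bits -> 32 rows; column 0 plus lengths 1..5


def build_table() -> list[list[str]]:
    table = [["*"] * COLS for _ in range(ROWS)]
    # Column 0 would hold the 6- and 7-symbol characters; their mapping
    # is not described in the post, so this sketch leaves it empty.
    for length, chars in BY_LENGTH.items():
        for value, ch in enumerate(chars):
            table[value][length] = ch
    return table


TABLE = build_table()


def lookup(value: int, length: int) -> str:
    """Return the character for (value, length), or "*" if unknown."""
    if 1 <= length <= 5 and 0 <= value < (1 << length):
        return TABLE[value][length]
    return "*"


print(lookup(1, 2), lookup(10, 4), lookup(31, 5))  # A C 0
```

Indexing by value first and length second means a decoded character costs a single array access once the symbols have been counted.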
Timing
To determine whether the signal is a dot or a dash, the decoding algorithm needs to know how long the signal is high. To determine when a character or a word ends, it also needs to know how long the signal is low.
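The unit of time used for these measurements is the length of a dot, which I will call 1 bit time. All other Morse code timings are multiples of this bit time, as listed in the example below: a dash lasts 3 bit times, the space between the elements of a character 1 bit time, the space between characters 3 bit times and the space between words 7 bit times.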
As there is no timing signal in Morse code, the bit time needs to be determined before the signal can be decoded. The first version of my decoder monitored the signal for a few seconds before starting to decode, found the shortest high or low state and took this to be 1 bit time. The disadvantage of this approach is that the timing changes, both during a transmission and when switching to another signal, so an additional timing algorithm was needed to adapt dynamically when the timing varied.
The result was a sampling algorithm that detected high-to-low and low-to-high changes of state and measured the time between them. These measurements were used to maintain a moving-average bit time, against which each high or low time was compared:
High time < 2 x bit time = dot
High time > 2 x bit time = dash
Low time < 2 x bit time = element space
Low time > 5 x bit time = word space
Low time between 2 x bit time and 5 x bit time = character space.
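The post doesn't say how the moving average is computed, so the sketch below makes two labelled assumptions: an exponential moving average with weight `ALPHA = 0.25` (an arbitrary choice), and each measured duration divided by its nominal length in bit times before it is averaged, so dashes and character spaces contribute as well as dots. Word spaces are classified but not used for timing. `TimingClassifier` and its methods are hypothetical names; the thresholds are the ones listed above.

```python
class TimingClassifier:
    """Classify high/low durations against a moving-average bit time.

    Assumptions (not from the original decoder): an exponential moving
    average with weight ALPHA, and each duration divided by its nominal
    length (1 bit for dots and element spaces, 3 bits for dashes and
    character spaces) before averaging. Word spaces do not update it.
    """

    ALPHA = 0.25  # arbitrary smoothing weight for this sketch

    def __init__(self, initial_bit_ms: float) -> None:
        self.bit_ms = initial_bit_ms

    def _update(self, duration_ms: float, units: int) -> None:
        # Nudge the average towards this element's one-bit estimate.
        estimate = duration_ms / units
        self.bit_ms += self.ALPHA * (estimate - self.bit_ms)

    def classify_high(self, duration_ms: float) -> str:
        if duration_ms < 2 * self.bit_ms:
            kind, units = "dot", 1
        else:
            kind, units = "dash", 3
        self._update(duration_ms, units)
        return kind

    def classify_low(self, duration_ms: float) -> str:
        if duration_ms < 2 * self.bit_ms:
            kind, units = "element space", 1
        elif duration_ms <= 5 * self.bit_ms:
            kind, units = "character space", 3
        else:
            return "word space"
        self._update(duration_ms, units)
        return kind


# 20 wpm (60 ms bits), then the sender speeds up to 30 wpm (40 ms bits).
clf = TimingClassifier(initial_bit_ms=60)
print(clf.classify_high(60), clf.classify_high(180), clf.classify_low(420))
for _ in range(20):
    clf.classify_high(40)
print(round(clf.bit_ms, 1))
```

Because every update is a small step towards the latest estimate, a gradual change of speed is tracked smoothly, while a single badly timed element only moves the average a little.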
The accuracy of the bit timing was limited by the sampling interval of 3.3 ms. At 20 words per minute the nominal bit time is 60 ms, so each edge detection has an uncertainty of about 5% (3.3/60 = 5.5%). This is not a problem, but it becomes more significant at higher speeds: at 40 words per minute the bit time halves to 30 ms and the uncertainty doubles to about 11%.
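The arithmetic behind these figures as a small Python sketch. It assumes the usual PARIS convention, in which one word is 50 bit times, so the bit time is 1200 ms divided by the speed in words per minute; this reproduces the 60 ms quoted for 20 wpm. The function names are illustrative.

```python
SAMPLE_MS = 3.3  # sampling interval quoted above


def bit_time_ms(wpm: float) -> float:
    # PARIS convention: one word = 50 bit times, so
    # bit time = 60 000 ms / (50 * wpm) = 1200 ms / wpm.
    return 1200.0 / wpm


def edge_uncertainty_pct(wpm: float) -> float:
    """Sampling interval as a percentage of one bit time."""
    return 100.0 * SAMPLE_MS / bit_time_ms(wpm)


for wpm in (10, 20, 40):
    print(f"{wpm:>2} wpm: bit {bit_time_ms(wpm):5.1f} ms, "
          f"edge uncertainty {edge_uncertainty_pct(wpm):4.1f} %")
```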
The sampling could not run continuously because I needed processing time for other activities. I therefore waited half a bit time after the last transition before starting to sample for the next one; since no element or space is nominally shorter than 1 bit time, this should not miss a transition. The resulting high-level software flow chart is shown here.
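A sketch of the half-bit hold-off, simulated over a pre-recorded list of 0/1 samples taken every 3.3 ms rather than a live input. The bit time is passed in as a fixed number for simplicity; in the decoder described above it would come from the moving average. `measure_states` is a hypothetical name.

```python
SAMPLE_MS = 3.3  # sampling interval quoted in the post


def measure_states(samples: list[int], bit_ms: float) -> list[tuple[int, float]]:
    """Return (level, duration_ms) for each completed high/low state.

    After each detected transition, sampling is suspended for half a
    bit time to free the processor for other work.
    """
    if not samples:
        return []
    holdoff = int(bit_ms / 2 / SAMPLE_MS)  # samples skipped after an edge
    states = []
    level, last_edge = samples[0], 0
    i = 1
    while i < len(samples):
        if samples[i] != level:
            # Edge found: record the state that has just ended.
            states.append((level, (i - last_edge) * SAMPLE_MS))
            level, last_edge = samples[i], i
            # Half a bit time is free for other processing; no element
            # or space is nominally shorter than one bit, so the next
            # edge is still ahead of us.
            i += holdoff
        i += 1
    return states


# Letter "A" (dot, element space, dash) at roughly 20 wpm: 18 samples per bit.
BIT = 18
signal = [0] * 3 * BIT + [1] * BIT + [0] * BIT + [1] * 3 * BIT + [0] * 3 * BIT
for level, ms in measure_states(signal, bit_ms=60):
    print("high" if level else "low", round(ms, 1), "ms")
```

The trailing low state is not reported because no further transition ends it; in a real decoder it would be resolved by a timeout.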
This solution worked well on Morse code training videos, where the recording was made in ideal conditions, but it struggled with real-life audio and had a number of limitations. The next update will look at these issues and how I overcame them.
Reference
Most of the Morse code information is taken from Wikipedia