miércoles, 17 de abril de 2013

Extra points: Huffman coding

2. Table4.1 gives the relative frequencies,in English prose minus punctuation and blanks, ignoring capitalization, of the alphabetic characters a, b, . . . , z, estimated by examination of a large block of English prose, believed to be typical. This table is copied, with one small change, from [6, Appendix 1]. Find an optimal (with respect to average code word length) prefix-condition encoding scheme for S = {a, b, ... , z} if


(a) A={0,1};
(b) A={0,1,∗}.
(c) What are the lengths of the shortest fixed-length encoding schemes, resulting in uniquely decodable codes, for S, in cases (a) and (b), above?

For do this exercise I made a program that are pretty similar that my last homework where by doing the huffman tree we have the optimal prefix-condition, this is the code.



So we have like result for the question a:

  • 'a': '0000', 
  • 'c': '010001', 
  • 'b': '0111000',
  •  'e': '1', 
  • 'd': '00011', 
  • 'g': '0111011',
  •  'f': '001101', 
  • 'i': '01101', 
  • 'h': '01010', 
  • 'k': '00110011', 
  • 'j': '0011001011',
  •  'm': '001111', 
  • 'l': '00010', 
  • 'o': '01111',
  • 'n': '01100', 
  • 'q': '0011001001', 
  • 'p': '0111001', 
  • 's': '01011',
  •  'r': '01001', 
  • 'u': '010000', 
  • 't': '0010', 
  • 'w': '001110',
  •  'v': '0011000', 
  • 'y': '0111010', 
  • 'x': '0011001010', 
  • 'z': '0011001000'
Then I modified the code to have another node, so in the alphabet we have a new character and also a new coding data.


  • 'a': '11', 
  • 'c': '*01',
  •  'b': '*0**', 
  • 'e': '1'
  • 'd': '*11', 
  • 'g': '*1**', 
  • 'f': '0*0',
  •  'i': '01', 
  • 'h': '**1', 
  • 'k': '*0*1*', 
  • 'j': '*0*10', 
  • 'm': '0**', 
  • 'l': '*10',
  •  'o': '10', 
  • 'n': '00', 
  • 'q': '*0*111', 
  • 'p': '*1*0', 
  • 's': '***', 
  • 'r': '**0', 
  • 'u': '*00', 
  • 't': '1*', 
  • 'w': '0*1', 
  • 'v': '*0*0', 
  • 'y': '*1*1', 
  • 'x': '*0*11*', 
  • 'z': '*0*110'

For the next question we have that e are the last in the list and only have a number 1 as a bit of codification and for the question b also have e with one bit, but in general we have a code more shorter in the second one than in the first one.

1 comentario: