Huffman encoding: how to write binary data in Python


I have tried methods using the struct module, as shown by the lines commented out in my code, but it didn't work out. Basically I have two options: I can either write the binary data code by code (my codes are sequences of bits with lengths varying from 3 to 13 bits), or convert the whole string of n characters (n = 25000+ in my case) to binary data. But I don't know how to implement either method. Code:

import heapq
import binascii
import struct

def createfrequencytuplelist(inputfile):
    frequencydic = {}

    intputfile = open(inputfile, 'r')
    for line in intputfile:
        for char in line:
            if char in frequencydic.keys():
                frequencydic[char] += 1
            else:
                frequencydic[char] = 1

    intputfile.close()
    tuplelist = []
    for mykey in frequencydic:
        tuplelist.append((frequencydic[mykey],mykey))
    return tuplelist

def createhuffmantree(frequencylist):
    heapq.heapify(frequencylist)
    n = len(frequencylist)
    for i in range(1,n):
        left = heapq.heappop(frequencylist)
        right = heapq.heappop(frequencylist)
        newnode = (left[0] + right[0], left, right)
        heapq.heappush(frequencylist, newnode)
    return frequencylist[0]

def printhuffmantree(mytree, somecode,prefix=''):
    if len(mytree) == 2:
        somecode.append((mytree[1] + "@" + prefix))
    else:
        printhuffmantree(mytree[1], somecode,prefix + '0')
        printhuffmantree(mytree[2], somecode,prefix + '1')

def parsecode(char, mycode):
    for k in mycode:
        if char == k[0]:
            return k[2:]


if __name__ == '__main__':
    mylist = createfrequencytuplelist('input')
    myhtree = createhuffmantree(mylist)
    mycode = []
    printhuffmantree(myhtree, mycode)
    inputfile = open('input', 'r')
    outputfile = open('encoded_file2', "w+b")
    asciistring = ''
    n=0
    for line in inputfile:
        for char in line:
            #outputfile.write(parsecode(char, mycode))
            asciistring += parsecode(char, mycode)
            n += len(parsecode(char, mycode))
    #values = asciistring
    #print n
    #s = struct.Struct('25216s')
    #packed_data = s.pack(values)
    #print packed_data
    inputfile.close()
    #outputfile.write(packed_data)
    outputfile.close()

I think you're looking for this:

packed_data = ''.join(chr(int(asciistring[i:i+8], 2))
                      for i in range(0, len(asciistring), 8))

It takes 8 bits at a time from asciistring, interprets them as an integer, and outputs the corresponding byte.
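The line above relies on Python 2, where chr() produces a one-byte string. On Python 3, chr() returns text rather than a byte, so the same packing could be sketched with bytes() instead. This is only an illustration under that assumption; pack_bits is a hypothetical helper name, not something from the question's code.

```python
def pack_bits(bitstring):
    """Pack a string of '0'/'1' characters into bytes, 8 bits per byte."""
    # Each 8-character slice is parsed as a base-2 integer (0..255);
    # bytes() turns the resulting integers into a bytes object.
    return bytes(int(bitstring[i:i + 8], 2)
                 for i in range(0, len(bitstring), 8))


packed = pack_bits('0100000101000010')   # two full chunks: 0x41 and 0x42
```

The result can be written directly to a file opened in binary mode, such as the "w+b" handle in the question.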

Your problem here is that this requires the length of asciistring to be a multiple of 8 bits to work correctly. If it isn't, you'll insert zero bits before the last few real bits.
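To make the ambiguity concrete, here is a small sketch (Python 3 assumed, variable names made up for illustration) showing that a short final chunk and a full chunk with leading zeros end up as the same byte:

```python
short_chunk = '101'        # a 3-bit final chunk
full_chunk = '00000101'    # a genuine 8-bit chunk

# Both strings parse to the same integer...
short_value = int(short_chunk, 2)
full_value = int(full_chunk, 2)

# ...so once packed, the two cases are indistinguishable.
same_byte = bytes([short_value]) == bytes([full_value])
```

Without extra information, a decoder cannot tell whether the last byte carried 3 real bits or 8.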

So you need to store the number of bits in the last byte somewhere, so that you know to ignore the padding bits when you read them back, instead of interpreting them as zeros. You could try:

packed_data = chr(len(asciistring) % 8) + packed_data 

Then, when you read it back:

from itertools import chain

packed_input = coded_file.read()
# The first byte holds the bit count of the last byte; convert it to an int.
last_byte_length, packed_input, last_byte = (ord(packed_input[0]),
                                             packed_input[1:-1],
                                             packed_input[-1])
# A stored length of 0 means the last byte is completely full.
if not last_byte_length:
    last_byte_length = 8
ascii_input = ''.join(chain((bin(ord(byte))[2:].zfill(8) for byte in packed_input),
                            (bin(ord(last_byte))[2:].zfill(last_byte_length),)))
# or
# ascii_input = ''.join(chain(('{0:08b}'.format(ord(byte)) for byte in packed_input),
#                             ('{0:0{1}b}'.format(ord(last_byte), last_byte_length),)))

Edit: You either need to strip the '0b' from the strings returned by bin() or, on 2.6 or newer, preferably use the new, alternate versions I added, which use string formatting instead of bin(), slicing, and zfill().

Edit: Thanks to eryksun for the suggestion to use chain to avoid making a copy of the ASCII string. Also, you need to call ord(byte) in the bin() version.
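For anyone on Python 3, where indexing a bytes object already yields integers and ord() is not needed, the whole round trip might look roughly like the sketch below. The function names pack_with_header and unpack_with_header are hypothetical, and the layout (one length byte followed by the packed bits) simply mirrors the scheme described above.

```python
def pack_with_header(bitstring):
    """Return one header byte (len % 8) followed by the packed bits."""
    body = bytes(int(bitstring[i:i + 8], 2)
                 for i in range(0, len(bitstring), 8))
    return bytes([len(bitstring) % 8]) + body


def unpack_with_header(data):
    """Invert pack_with_header, returning the original '0'/'1' string."""
    if len(data) < 2:
        return ''                    # header only: nothing was encoded
    last_len = data[0] or 8          # a header of 0 means a full last byte
    full_bytes, last_byte = data[1:-1], data[-1]
    # Every byte except the last is expanded to exactly 8 bits.
    pieces = ['{0:08b}'.format(b) for b in full_bytes]
    # The last byte is expanded only to the recorded number of bits.
    pieces.append('{0:0{1}b}'.format(last_byte, last_len))
    return ''.join(pieces)


bits = '1101000111010'               # 13 bits: one full byte plus 5 bits
restored = unpack_with_header(pack_with_header(bits))
```

Here restored should equal bits, including any leading zeros in the final partial chunk, because the header records how many bits of the last byte are real.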

