utf 16 - UTF-16 file seeking in Python. How?


For some reason I cannot seek in my UTF-16 file. It produces 'UnicodeException: UTF-16 stream does not start with BOM'. My code:

    f = codecs.open(ai_file, 'r', 'utf-16')
    seek = self.ai_map[self._cbclass.text]  # seek is a valid int
    f.seek(seek)
    while True:
        ln = f.readline().strip()

I tried some random things, like reading from the stream first, but that didn't help. I checked the offset being seeked to with a hex editor: the string starts at a character, not a null byte (I guess that's a good sign, right?). So how do I seek in a UTF-16 file in Python?

Well, the error message is telling you why: it's not reading the byte order mark. The byte order mark is at the beginning of the file. Without having read the byte order mark, the UTF-16 decoder can't know what order the bytes are in. Apparently it reads the mark lazily, the first time you read, instead of when you open the file, or else it assumes that seek() starts a new UTF-16 stream.
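To make the failure concrete, here is a minimal sketch that reproduces it in memory. The sample data, the io.BytesIO stand-in for the file, and the helper name try_read_from are assumptions made for illustration; they are not from the original question.

```python
import codecs
import io

# Sample little-endian UTF-16 data with a BOM, standing in for the real file.
data = codecs.BOM_UTF16_LE + "first\nsecond\n".encode("utf-16-le")


def try_read_from(offset):
    """Seek to a byte offset, then read one line with the generic 'utf-16' codec."""
    raw = io.BytesIO(data)
    raw.seek(offset)
    # A fresh 'utf-16' stream reader expects a BOM at the point where it starts decoding.
    reader = codecs.getreader("utf-16")(raw)
    try:
        return reader.readline()
    except UnicodeError as exc:
        return "error: %s" % exc


print(repr(try_read_from(0)))   # starts at the BOM, so decoding works
print(repr(try_read_from(14)))  # starts at "second", past the BOM, so decoding fails
```

Offset 14 is the 2-byte BOM plus the 12 bytes of "first\n", so the reader begins on a character boundary and still fails, which matches what the hex editor showed in the question.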

If your file doesn't have a BOM, that's the problem, and you should specify the byte order when opening the file (see #2 below). Otherwise, I see two potential solutions:

  1. Read the first 2 bytes of the file, which hold the BOM, before you seek. You seem to say this didn't work, which suggests the decoder is expecting a fresh UTF-16 stream after the seek, so:

  2. Specify the byte order explicitly by using the utf-16-le or utf-16-be encoding when you open the file.
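The second option could look roughly like the sketch below. It assumes the stored offsets are byte offsets from the start of the file. The helper names detect_utf16_encoding and read_line_at and the little-endian default are hypothetical choices, not part of the original code. The file is opened in binary mode and wrapped with codecs.getreader, which here plays the role of the question's codecs.open call.

```python
import codecs


def detect_utf16_encoding(path, default="utf-16-le"):
    """Pick an explicit byte order by peeking at the first two bytes of the file."""
    with open(path, "rb") as raw:
        bom = raw.read(2)
    if bom == codecs.BOM_UTF16_LE:
        return "utf-16-le"
    if bom == codecs.BOM_UTF16_BE:
        return "utf-16-be"
    # No BOM found: fall back to an assumed byte order.
    return default


def read_line_at(path, byte_offset, encoding=None):
    """Seek to byte_offset and return the next line, stripped."""
    if encoding is None:
        encoding = detect_utf16_encoding(path)
    with open(path, "rb") as raw:
        raw.seek(byte_offset)
        # With an explicit byte order the decoder does not look for a BOM.
        reader = codecs.getreader(encoding)(raw)
        return reader.readline().strip()
```

One caveat: with utf-16-le or utf-16-be, a read that starts at offset 0 returns the BOM as a leading U+FEFF character, so either start reading at offset 2 or strip that character yourself.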

