UTF-16 file seeking in Python. How?
For some reason I cannot seek in a UTF-16 file. It produces 'UnicodeError: UTF-16 stream does not start with BOM'. My code:
    import codecs

    f = codecs.open(ai_file, 'r', 'utf-16')
    seek = self.ai_map[self._cbclass.text]  # seek is a valid int
    f.seek(seek)
    while True:
        ln = f.readline().strip()
I tried random stuff, like reading from the stream first, but it didn't help. I also checked the offset I'm seeking to with a hex editor: the string starts at a character, not at a null byte (I guess that's a good sign, right?). So how do I seek in a UTF-16 file in Python?
Well, the error message is telling you why: it's not reading a byte order mark (BOM). The byte order mark is at the beginning of the file, and without having read it, the UTF-16 decoder can't know what order the bytes are in. Apparently it does this lazily, the first time you read, instead of when you open the file; or else it assumes that a seek() is starting a new UTF-16 stream.
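A small reproduction, assuming Python 3's codecs module; the temporary file and the sample text are made up for illustration. As far as I can tell from CPython's utf-16 StreamReader, seek() calls reset() on the reader, which throws away the byte order it detected earlier, so the next read looks for a BOM again at the new position. That matches the second explanation, and it is why reading the BOM first does not help:

```python
import codecs
import os
import tempfile


def seek_after_read_fails():
    """Reproduce the error even after a first read has consumed the BOM."""
    # Hypothetical sample file; the "utf-16" codec writes a BOM first.
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as out:
        out.write("hello\nworld\n".encode("utf-16"))
    try:
        with codecs.open(path, "r", "utf-16") as f:
            f.readline()                    # BOM is detected here
            f.seek(2 + 2 * len("hello\n"))  # byte offset where 'world' starts
            try:
                f.readline()                # seek() reset the decoder
            except UnicodeError as exc:
                return str(exc)
        return None
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(seek_after_read_fails())
```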
If your file doesn't have a BOM, that's the problem, and you should specify the byte order when opening the file (see #2 below). Otherwise, I see two potential solutions:
1. Read the first two bytes of the file to get the BOM before you seek. You seem to have tried this and it didn't work, which suggests that perhaps it's expecting a fresh UTF-16 stream after the seek, so:
2. Specify the byte order explicitly by using the utf-16-le or utf-16-be encoding when you open the file.
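A minimal sketch of option 2, assuming the file does start with a BOM and that the stored offsets are even byte positions counted from the start of the file (BOM included). The helper names encoding_from_bom and read_line_at are hypothetical. The BOM is read in binary mode only to choose between utf-16-le and utf-16-be; the file is then reopened with that explicit codec, which never looks for a BOM, so seek() followed by readline() works:

```python
import codecs
import os
import tempfile


def encoding_from_bom(path):
    """Pick an explicit-endian UTF-16 codec by sniffing the file's BOM."""
    with open(path, "rb") as raw:
        head = raw.read(2)
    if head == codecs.BOM_UTF16_LE:
        return "utf-16-le"
    if head == codecs.BOM_UTF16_BE:
        return "utf-16-be"
    # No BOM: the byte order has to be known some other way.
    raise ValueError("file has no UTF-16 BOM")


def read_line_at(path, offset):
    """Return the line starting at byte `offset` (even, BOM included)."""
    encoding = encoding_from_bom(path)
    # utf-16-le / utf-16-be do no BOM check, so seeking is safe.
    with codecs.open(path, "r", encoding) as f:
        f.seek(offset)
        return f.readline()


if __name__ == "__main__":
    # Sample file: "utf-16" prepends a BOM; each character here is 2 bytes.
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as out:
        out.write("hello\nworld\n".encode("utf-16"))
    offset = 2 + 2 * len("hello\n")  # skip the BOM and the first line
    print(repr(read_line_at(path, offset)))  # 'world\n'
    os.remove(path)
```

Note that with an explicit-endian codec, seeking to offset 0 would return the BOM itself as a U+FEFF character, so offsets should point past it.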