string - Forcing a pass-by-reference in Python
Boy, do I not understand Python pass-by-reference issues... I have created an extremely useful "unpacker" class that I pass around to various objects that need to unpack from it. Yet given how extraordinarily slow it is, I can tell it's making a copy of binarystr each time I pass a BU object. I know this because if I break the BU into smaller chunks, it runs, literally, 100x faster (I was using it to hold a 16 MB file I/O buffer).
So my question is: why is that member not getting passed by reference, and is there a way to force it to? I am pretty sure the BU object itself is passed by reference (since my code works), but the speed suggests that the .binarystr object is copied. Is there something more subtle that I'm missing?
    from struct import unpack

    class binaryunpacker(object):
        def __init__(self, binarystr):
            self.binarystr = binarystr
            self.pos = 0

        def get(self, vartype, sz=0):
            pos = self.pos
            if vartype == uint32:
                # '<I' is a little-endian unsigned 32-bit integer
                value = unpack('<I', self.binarystr[pos:pos+4])[0]
                self.pos += 4
                return value
            elif vartype == uint64:
                # '<Q' is a little-endian unsigned 64-bit integer
                value = unpack('<Q', self.binarystr[pos:pos+8])[0]
                self.pos += 8
                return value
            elif vartype == var_int:
                [value, nbytes] = unpackvarint(self.binarystr[pos:])
                self.pos += nbytes
            ....
The use case is along the lines of:
    def unserialize(self, tounpack):
        if isinstance(tounpack, binaryunpacker):
            budata = tounpack
        else:  # assume string
            budata = binaryunpacker(tounpack)

        self.var1 = budata.get(var_int)
        self.var2 = budata.get(binary_chunk, 64)
        self.var3 = budata.get(uint64)
        self.var4obj = anotherclass().unserialize(budata)
Thanks for the help.
The copies are made when you slice a string to get a substring. For example:
    [value, nbytes] = unpackvarint(self.binarystr[pos:])
This creates a copy of the string from index pos to the end, which can take a long time for a long string. It will be faster if you can determine the number of bytes you need before taking the substring, and then use self.binarystr[pos:pos+nbytes], since taking a small substring is relatively fast.
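One way to avoid the slice altogether is to read values in place with struct.unpack_from, which takes an offset into the buffer. The following is only a sketch, assuming Python 3 and a bytes buffer. The helper names unpack_varint_at and read_uint32 are hypothetical, and the varint layout shown (a prefix byte that selects the width) is an assumption for illustration; the real unpackvarint from the question may use a different encoding.

```python
import struct

def unpack_varint_at(buf, offset):
    """Hypothetical varint reader: returns (value, nbytes) without slicing buf.

    Assumed encoding (not confirmed by the question): a first byte below 0xfd
    is the value itself; 0xfd, 0xfe and 0xff announce a 2-, 4- or 8-byte
    little-endian unsigned integer that follows.
    """
    first = buf[offset]  # indexing a bytes object yields an int in Python 3
    if first < 0xfd:
        return first, 1
    if first == 0xfd:
        return struct.unpack_from('<H', buf, offset + 1)[0], 3
    if first == 0xfe:
        return struct.unpack_from('<I', buf, offset + 1)[0], 5
    return struct.unpack_from('<Q', buf, offset + 1)[0], 9

def read_uint32(buf, offset):
    # unpack_from reads at `offset` directly, so no substring is created
    return struct.unpack_from('<I', buf, offset)[0]

# Small demonstration buffer: a 1-byte varint, a uint32, then a 3-byte varint
data = b'\x05' + struct.pack('<I', 1234) + b'\xfd' + struct.pack('<H', 300)
value, nbytes = unpack_varint_at(data, 0)
number = read_uint32(data, nbytes)
```

With this approach the cost of each read depends only on the width of the value being decoded, not on how much of the buffer remains after pos.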
Note that the time depends only on the length of the substring, so self.binarystr[pos:pos+4] should take about the same amount of time regardless of the length of self.binarystr.
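The difference between the two kinds of slice can be seen in a short sketch, assuming Python 3 and a bytes buffer of the same size as the one in the question. It also shows memoryview, a built-in alternative not mentioned above, whose slices refer to the original memory instead of copying it. The variable names are illustrative only.

```python
big = bytes(16 * 1024 * 1024)  # 16 MB buffer of zero bytes
pos = 8

tail = big[pos:]           # new bytes object: copies everything from pos to the end
small = big[pos:pos + 4]   # new bytes object: copies only 4 bytes

view = memoryview(big)
tail_view = view[pos:]     # a view over the same memory: no bytes are copied

# The view still refers to the original buffer object
same_buffer = tail_view.obj is big
```

struct.unpack and struct.unpack_from accept a memoryview in Python 3, so a view can usually be passed where the sliced string was passed before.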