python - Organizing XML data into dictionaries
I'm trying to organize data from XML into a dictionary format. The data will be used to run Monte Carlo simulations.
Here is an example of what a couple of entries in the XML look like:
    <retirement>
        <item>
            <low>-0.34</low>
            <high>-0.32</high>
            <freq>0.0294117647058824</freq>
            <variable>stock</variable>
            <type>historic</type>
        </item>
        <item>
            <low>-0.32</low>
            <high>-0.29</high>
            <freq>0</freq>
            <variable>stock</variable>
            <type>historic</type>
        </item>
    </retirement>
My current data sets have only two variables, and the type can be one of three or possibly four discrete types. Hard-coding two variables isn't a problem, but I would like to start working with data that has many more variables and automate this process. My goal is to automatically import the XML data into a dictionary so that I can manipulate it further later without having to hard-code the array titles and variables.
Here is what I have:
    # Import the XML parser
    import xml.etree.ElementTree as ET

    # Parse the XML directly from the file path
    tree = ET.parse('xmlfile')

    # Create an iterable item list
    items = tree.findall('item')

    # Create the master dictionary
    masterdictionary = {}

    # Assign variables to the dictionary
    for Item in items:
        thiskey = Item.find('variable').text
        if thiskey in masterdictionary == False:
            masterdictionary[thiskey] = []
        else:
            pass
    thislist = masterdictionary[thiskey]
    newdatapoint = datapoint(float(Item.find('low').text), float(Item.find('high').text), float(Item.find('freq').text))
    thissublist.append(newdatapoint)
I'm getting a KeyError at thislist = masterdictionary[thiskey]
I am also trying to create a class to deal with some of the other elements of the XML:
    # Define a class for each data point that contains low, high and freq attributes
    class datapoint:
        def __init__(self, low, high, freq):
            self.low = low
            self.high = high
            self.freq = freq
I would then like to be able to check a value with something like:
    masterdictionary['stock'][0].freq
Any and all help is appreciated.
Update
Thanks for the help, John. The indentation issues were sloppiness on my part. It's my first time posting on Stack Overflow and I didn't copy/paste it right. The part after the else: is in fact indented as part of the for loop, and the class is indented with four spaces in my code; it was just bad posting here. I'll keep the capitalization convention in mind. Your suggestion did indeed work, and now this code:
    print masterdictionary.keys()
    print masterdictionary['stock'][0].low
yields:
    ['inflation', 'stock']
    -0.34
Those are indeed my two variables, and the value matches the XML listed at the top.
Update 2
Well, I thought I had figured this one out, but I was careless again and it turns out that I hadn't quite fixed the issue. The previous solution ended up writing all of the data to both dictionary keys, so I have two equal lists of all the data assigned to two different dictionary keys. The idea is to have distinct sets of data assigned from the XML to the matching dictionary key. Here is the current code:
    # Import the XML parser
    import xml.etree.ElementTree as ET

    # Parse the XML directly from the file path
    tree = ET.parse('xmlfile')

    # Create an iterable item list
    items = tree.findall('item')

    # Create a class for historic variables
    class datapoint:
        def __init__(self, low, high, freq):
            self.low = low
            self.high = high
            self.freq = freq

    # Create the master dictionary and variable list for historic variables
    masterdictionary = {}
    thislist = []

    # Loop to assign variables as dictionary keys and associate their values with them
    for item in items:
        thiskey = item.find('variable').text
        masterdictionary[thiskey] = thislist
        if thiskey not in masterdictionary:
            masterdictionary[thiskey] = []
        newdatapoint = datapoint(float(item.find('low').text), float(item.find('high').text), float(item.find('freq').text))
        thislist.append(newdatapoint)
When I input:
    print masterdictionary['stock'][5].low
    print masterdictionary['inflation'][5].low
    print len(masterdictionary['stock'])
    print len(masterdictionary['inflation'])
The results are identical for both keys ('stock' and 'inflation'):
    -.22
    -.22
    56
    56
There are 27 items with the stock tag in the XML file and 29 tagged with inflation. How can I make each list assigned to a dictionary key pull only its own data in the loop?
Update 3
It seems to work with two loops, but I have no idea how, or why it won't work in one single loop. I managed this accidentally:
    # Import the XML parser
    import xml.etree.ElementTree as ET

    # Parse the XML directly from the file path
    tree = ET.parse('xmlfile')

    # Create an iterable item list
    items = tree.findall('item')

    # Create a class for historic variables
    class datapoint:
        def __init__(self, low, high, freq):
            self.low = low
            self.high = high
            self.freq = freq

    # Create the master dictionary and variable list for historic variables
    masterdictionary = {}

    # Loop to assign variables as dictionary keys and associate their values with them
    for item in items:
        thiskey = item.find('variable').text
        thislist = []
        masterdictionary[thiskey] = thislist

    for item in items:
        thiskey = item.find('variable').text
        newdatapoint = datapoint(float(item.find('low').text), float(item.find('high').text), float(item.find('freq').text))
        masterdictionary[thiskey].append(newdatapoint)
I have tried a large number of permutations to make this happen in one single loop, but with no luck. I can get all of the data listed under both keys, as identical arrays of all the data (not very helpful), or the data sorted correctly into two distinct arrays for both keys, but with only the last single data entry (the loop overwrites itself each time, leaving only one entry in the array).
You have a serious indentation problem after the (unnecessary) else: pass. Fix that and try again. Does the problem occur with your sample input data? With other data? The first time around the loop? What is the value of thiskey that is causing the problem [hint: it's reported in the KeyError error message]? What are the contents of masterdictionary just before the error happens [hint: sprinkle a few print statements around your code]?
Other remarks not relevant to your problem:
Instead of if thiskey in masterdictionary == False: consider using if thiskey not in masterdictionary: ... comparisons against True or False are always redundant and/or a bit of a "code smell".
The Python convention is to reserve names with an initial capital letter (like Item) for classes.
Using only one space per indentation level makes code almost illegible and is severely deprecated. Always use 4 (unless you have a very good reason, though I've never heard of one).
Update: I was wrong. thiskey in masterdictionary == False is worse than I thought; because in is a relational operator, chained evaluation is used (like a <= b < c), so you have (thiskey in masterdictionary) and (masterdictionary == False), which will always evaluate to False, and so the dictionary is never updated. The fix is as I suggested: use if thiskey not in masterdictionary:
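To make the chained-comparison point concrete, here is a small sketch. The dictionary contents, the keys, and the two helper functions are made up purely for illustration. It shows that the original test comes out False whether or not the key is present, while not in behaves as intended:

```python
# Hypothetical data, purely for illustration.
masterdictionary = {'stock': []}


def chained_test(key, d):
    # Python parses this as: (key in d) and (d == False)
    return key in d == False


def intended_test(key, d):
    # The test the original code meant to perform.
    return key not in d


# Key present: (True) and (dict == False) is False.
chained_present = chained_test('stock', masterdictionary)
# Key missing: (False) and (...) short-circuits to False as well.
chained_missing = chained_test('inflation', masterdictionary)

fixed_present = intended_test('stock', masterdictionary)
fixed_missing = intended_test('inflation', masterdictionary)
```

Since the chained form is never true, the branch that creates the empty list never runs, which is what leads to the KeyError on the following lookup.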
Also, it looks like thislist (initialised but not used) should be thissublist (used but not initialised).
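On the follow-up in Update 2 and Update 3: the single-loop version stores the same list object, thislist, under every key, so every key ends up showing all of the data. Creating a fresh list the first time each key is seen avoids that, and one loop is then enough. A minimal sketch, assuming the XML layout from the question; the inline XML string, the inflation entry, the group_items helper, and the namedtuple stand-in for the datapoint class are illustrative assumptions, not the asker's actual file or code:

```python
import xml.etree.ElementTree as ET
from collections import namedtuple

# Stand-in for the question's datapoint class (an assumption, for brevity).
DataPoint = namedtuple('DataPoint', ['low', 'high', 'freq'])

# Made-up sample in the same shape as the question's XML.
XML_TEXT = """
<retirement>
  <item><low>-0.34</low><high>-0.32</high><freq>0.0294117647058824</freq>
    <variable>stock</variable><type>historic</type></item>
  <item><low>-0.32</low><high>-0.29</high><freq>0</freq>
    <variable>stock</variable><type>historic</type></item>
  <item><low>0.01</low><high>0.02</high><freq>0.5</freq>
    <variable>inflation</variable><type>historic</type></item>
</retirement>
""".strip()


def group_items(xml_text):
    """Group <item> entries by the text of their <variable> element."""
    root = ET.fromstring(xml_text)
    grouped = {}
    for item in root.findall('item'):
        key = item.find('variable').text
        point = DataPoint(
            float(item.find('low').text),
            float(item.find('high').text),
            float(item.find('freq').text),
        )
        # setdefault stores a new empty list only when the key is missing,
        # then returns the list stored for that key, so each key gets its
        # own list object.
        grouped.setdefault(key, []).append(point)
    return grouped


masterdictionary = group_items(XML_TEXT)
```

With this, masterdictionary['stock'][0].freq reads the first stock entry's frequency, and no variable names are hard-coded. collections.defaultdict(list) would be an equivalent alternative to setdefault.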