compression - Trying to figure out a basic pattern finding algorithm -

- September 15, 2012

long story short, have data need find patterns in. data (each character represents immutable chunk): dabababacdacdacdab

i want able break data following kinds of chunks:

d (repeated 1x) ab (repeated 3x) (1x) cda (3x) b (1x)

i'm familiar basic run length encoding, don't know how when length of "thing" may variable (in example, cda 3 chunks of data, d 1 chunk). making sense? appreciated.

the main difficulty here "greediness" of algorithm , ambiguity can produce. given example, rule tell program not represent string as:

d x 1 ababab x 1 <<<<<----- here's problem cda x 1 b x 1

or worse yet: d x 1 abababcdab x 1

you see mean. you need establish set of rules algo follow, , code kinda write itself. t gain insight on this, might try looking @ of regular expression parsing code in grep.c, although may more advanced need.

as start, consider algorithm that: 1. limits how far ahead scan (i.e. set maximum substring length) 2. favors longer substrings, subject #1

for example, take "aaaaaaaaaaaaaaaa" (which 16 a's). be: x 16 aa x 8 aaaa x 4 aaaaaaaa x 2 aaaaaaaaaaaaaaaa x 1

if set maximum match length of 4, , favor longest match, answer unambiguous: aaaa x 4. can write algorithm.

Search This Blog

C A N B

compression - Trying to figure out a basic pattern finding algorithm -

Comments

Post a Comment

Popular posts from this blog

php - How can I edit my code to echo the data of child's element where my search term was found in, in XMLReader? -

iphone - Using nested NSDictionary with Picker -

javascript - Iterate over array and calculate average values of array-parts -