C# - How to locate and store character positions in a text file


I am trying to create a lexicographically sorted index of words along with their positions in a text file.

With the help of experts in this forum I was able to create the lexicographically sorted index of words. I now need help storing the position of each entry in that index.

This is what I have so far. The text file (sometextfile.txt) contains the following data: "this sample text file"

        private const string filePath = @"d:\sometextfile.txt";

        using (StreamReader sr = File.OpenText(filePath))
        {
            string input;
            // Dictionary to store the position of characters in the file as a long
            // and the lexicographically sorted value as a string
            var parts = new Dictionary<long, string>();

            while ((input = sr.ReadLine()) != null)
            {
                string[] words = input.Split(' ');
                foreach (var word in words)
                {
                    var sortedSubstrings =
                        Enumerable.Range(0, word.Length)
                            .Select(i => word.Substring(i))
                            .OrderBy(s => s);
                    parts.AddRange(<store position of character>, sortedSubstrings);
                }
            }
        }

Using ReadLine loses critical information about position in the file, if you intend the position to be a byte position that you can seek to. The end of a line may be marked by a carriage return (\r), a line feed (\n), or both, so you need to know how many bytes are at the end of each line. It's also possible (depending on the encoding of the text file) that characters are represented by varying numbers of bytes, which you may need to handle. I suggest reading the file at a lower level so you can track the position.
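To make the variable-width point concrete, here is a minimal sketch showing how many bytes a few characters occupy. The class and method names (`ByteCountDemo`, `Utf8Bytes`) are made up for illustration, and the file is assumed to be UTF-8 unless noted.

```csharp
using System;
using System.Text;

static class ByteCountDemo
{
    // Hypothetical helper: number of bytes the string occupies in UTF-8.
    public static int Utf8Bytes(string s)
    {
        return Encoding.UTF8.GetByteCount(s);
    }

    static void Main()
    {
        Console.WriteLine(Utf8Bytes("a"));       // 1
        Console.WriteLine(Utf8Bytes("\u00e9"));  // 2 (e with acute accent)
        Console.WriteLine(Utf8Bytes("\u20ac"));  // 3 (euro sign)
        Console.WriteLine(Utf8Bytes("\r\n"));    // 2 (Windows line ending)

        // The same letter takes 2 bytes in UTF-16 (Encoding.Unicode).
        Console.WriteLine(Encoding.Unicode.GetByteCount("a")); // 2
    }
}
```

This is why a character count taken from ReadLine cannot be used directly as a seek offset unless every character, including the line ending, is known to be one byte.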

// myfile is the path to the text file
var parts = new Dictionary<long, string>();
using (System.IO.StreamReader sr = new System.IO.StreamReader(myfile))
{
   var sb = new System.Text.StringBuilder();
   long currentPosition = 0;   // byte offset of the character being examined
   long wordPosition = 0;      // byte offset where the current word starts
   bool wordStarted = false;
   int nextCharNum = sr.Read();
   while (nextCharNum >= 0)
   {
      char nextChar = (char)nextCharNum;
      switch (nextChar)
      {
         case ' ':
         case '\r':
         case '\n':
            // A separator ends the current word, if any
            if (wordStarted)
            {
               parts[wordPosition] = sb.ToString();
               sb.Clear();
               wordStarted = false;
            }
            break;
         default:
            sb.Append(nextChar);
            if (!wordStarted)
            {
               wordPosition = currentPosition;
               wordStarted = true;
            }
            break;
      }
      // Advance by the encoded size of this character.
      // Note: a byte order mark at the start of the file is not counted here.
      currentPosition += sr.CurrentEncoding.GetByteCount(nextChar.ToString());
      nextCharNum = sr.Read();
   }
   // Store the last word if the file does not end with a separator
   if (wordStarted)
      parts[wordPosition] = sb.ToString();
}
foreach (var de in parts)
{
   Console.WriteLine("{0} {1}", de.Key, de.Value);
}
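Once byte positions are stored, they can be used to jump straight back to a word. The following is only a sketch under some assumptions: the helper `ReadWordAt` and the class `SeekDemo` are hypothetical names, the sample file is written to the temp directory as UTF-8 without a byte order mark, and the offset 5 is the position of "sample" in the question's sample text ("this " occupies bytes 0 to 4).

```csharp
using System;
using System.IO;
using System.Text;

static class SeekDemo
{
    // Hypothetical helper: reads byteLength bytes starting at the given
    // byte offset and decodes them with the supplied encoding.
    public static string ReadWordAt(string path, long position, int byteLength, Encoding encoding)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            fs.Seek(position, SeekOrigin.Begin);
            var buffer = new byte[byteLength];
            int total = 0;
            while (total < byteLength)
            {
                int n = fs.Read(buffer, total, byteLength - total);
                if (n == 0) break; // reached end of file early
                total += n;
            }
            return encoding.GetString(buffer, 0, total);
        }
    }

    static void Main()
    {
        // Assumption: UTF-8 without a BOM, so the first character is at offset 0.
        var utf8NoBom = new UTF8Encoding(false);
        string path = Path.Combine(Path.GetTempPath(), "sometextfile.txt");
        File.WriteAllText(path, "this sample text file", utf8NoBom);

        int length = utf8NoBom.GetByteCount("sample");
        Console.WriteLine(ReadWordAt(path, 5, length, utf8NoBom)); // prints "sample"
    }
}
```

If the file does start with a byte order mark, the stored offsets would need to be shifted by the length of that preamble before seeking.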
