c# - Sorting algorithm for a large set of server and path combinations
Working in C#, I would like to write an efficient sorting algorithm that takes as input a text file containing an unsorted list of server and path combinations and outputs a sorted file.
As an exercise, I am working under the assumption that the size of the input data will exceed available memory, so I am thinking of reading the file into memory one chunk at a time, doing a quicksort (or a heap sort, maybe?), writing the sorted chunks to temporary files, and then doing a merge sort to produce the final output.
The format of the input file is at my discretion. It could be just a list of UNC paths (server and path as one string), or it could be a CSV file with server and path as separate fields.
My question is: is there any benefit to having the server and path be distinct entities in my data structure and evaluating them separately?
By separating server and path I would eliminate the comparison of server names while running a path comparison, but I would need additional runs to sort by server and, given the available memory constraints, I would need to cache the sorted server lists, increasing disk I/O overhead.
Is there a technique that takes advantage of having server and path provided as separate fields in my input to optimize the performance of such an application?
Are there any other optimization techniques that I might consider, given the nature of the dataset?
EDIT: This is a one-time task. I do not need to look up entries later.
"I am thinking of reading the file into memory one chunk at a time, doing a quicksort (or a heap sort, maybe?), writing the sorted chunks to temporary files, and then doing a merge sort to produce the final output."
This is a perfectly reasonable plan.
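The plan quoted above can be sketched roughly as follows. This is a minimal illustration, not a tuned implementation: the class name `ExternalSort`, the `maxLinesPerChunk` parameter, and the choice of a case-insensitive ordinal comparison are all assumptions made for the example, and the merge step uses a simple linear scan over the chunk heads rather than a heap.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ExternalSort
{
    // Assumption: UNC paths are compared case-insensitively, character by character.
    private static readonly StringComparer LineComparer = StringComparer.OrdinalIgnoreCase;

    // Sorts inputPath into outputPath while holding at most maxLinesPerChunk lines in memory.
    public static void Sort(string inputPath, string outputPath, int maxLinesPerChunk)
    {
        List<string> chunks = WriteSortedChunks(inputPath, maxLinesPerChunk);
        MergeChunks(chunks, outputPath);
    }

    // Phase 1: read a chunk, sort it in memory, write it to its own temporary file.
    private static List<string> WriteSortedChunks(string inputPath, int maxLinesPerChunk)
    {
        var chunkFiles = new List<string>();
        var buffer = new List<string>(maxLinesPerChunk);
        foreach (string line in File.ReadLines(inputPath)) // streams the file lazily
        {
            buffer.Add(line);
            if (buffer.Count == maxLinesPerChunk)
                chunkFiles.Add(FlushChunk(buffer));
        }
        if (buffer.Count > 0)
            chunkFiles.Add(FlushChunk(buffer));
        return chunkFiles;
    }

    private static string FlushChunk(List<string> buffer)
    {
        buffer.Sort(LineComparer);
        string chunkPath = Path.GetTempFileName();
        File.WriteAllLines(chunkPath, buffer);
        buffer.Clear();
        return chunkPath;
    }

    // Phase 2: k-way merge. Only one line per chunk is held in memory at a time.
    private static void MergeChunks(List<string> chunkFiles, string outputPath)
    {
        var readers = new StreamReader[chunkFiles.Count];
        var heads = new string[chunkFiles.Count];
        try
        {
            for (int i = 0; i < readers.Length; i++)
            {
                readers[i] = new StreamReader(chunkFiles[i]);
                heads[i] = readers[i].ReadLine(); // null once a chunk is exhausted
            }
            using (var writer = new StreamWriter(outputPath))
            {
                while (true)
                {
                    // Pick the smallest head line among the chunks that still have data.
                    int min = -1;
                    for (int i = 0; i < heads.Length; i++)
                    {
                        if (heads[i] == null) continue;
                        if (min < 0 || LineComparer.Compare(heads[i], heads[min]) < 0)
                            min = i;
                    }
                    if (min < 0) break; // every chunk is exhausted
                    writer.WriteLine(heads[min]);
                    heads[min] = readers[min].ReadLine();
                }
            }
        }
        finally
        {
            foreach (StreamReader reader in readers)
                if (reader != null) reader.Dispose();
            foreach (string chunkFile in chunkFiles)
                File.Delete(chunkFile);
        }
    }
}
```

A call such as `ExternalSort.Sort("input.txt", "sorted.txt", 1000000)` would be one way to use it; the file names and chunk size here are placeholders. With many chunks, replacing the linear scan with a priority queue would likely reduce the per-line merge cost.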
An alternative solution would be: create an on-disk B-tree and insert all your data into it, one record at a time. You never need more than a few pages of the B-tree in memory, and you can read the records from the unsorted list one at a time. Once everything is in the B-tree, read it back out in order.
"By separating server and path I would eliminate the comparison of server names while running a path comparison, but I would need additional runs to sort by server and, given the available memory constraints, I would need to cache the sorted server lists, increasing disk I/O overhead."
OK.
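For illustration, keeping server and path as separate fields does not by itself require separate sorting runs: a single comparison can order by server first and fall back to the path only when the servers are equal. The sketch below assumes a hypothetical `ServerPath` type, a hypothetical `ParseUnc` helper, and case-insensitive ordinal comparison; none of these come from the question.

```csharp
using System;

public struct ServerPath : IComparable<ServerPath>
{
    public string Server { get; }
    public string RelativePath { get; }

    public ServerPath(string server, string relativePath)
    {
        Server = server;
        RelativePath = relativePath;
    }

    // Splits "\\server\share\dir" into Server = "server" and RelativePath = "share\dir".
    public static ServerPath ParseUnc(string unc)
    {
        if (unc == null || !unc.StartsWith(@"\\", StringComparison.Ordinal))
            throw new FormatException("Expected a UNC path starting with two backslashes.");
        int split = unc.IndexOf('\\', 2); // first separator after the server name
        return split < 0
            ? new ServerPath(unc.Substring(2), string.Empty)
            : new ServerPath(unc.Substring(2, split - 2), unc.Substring(split + 1));
    }

    // Orders by server, then by path; the path is only examined when the servers match.
    public int CompareTo(ServerPath other)
    {
        int byServer = string.Compare(Server, other.Server, StringComparison.OrdinalIgnoreCase);
        return byServer != 0
            ? byServer
            : string.Compare(RelativePath, other.RelativePath, StringComparison.OrdinalIgnoreCase);
    }
}
```

One design point to be aware of: this field-wise order is not always identical to an ordinal comparison of the whole UNC string. A server name that is a prefix of another (for example `a` and `ab`) sorts first here, whereas in the single-string form the backslash after `a` is compared against `b`.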
"My question is: is there any benefit to having the server and path be distinct entities in my data structure and evaluating them separately?"
You have just said what the pros and cons are; you have already listed them. If you already know the answer, why are you asking the question?
"Is there a technique that takes advantage of having server and path provided as separate fields in my input to optimize the performance of such an application?"
Probably, yes.
How would you know?
Write the code both ways and run it.
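As a rough sketch of what "both ways" could look like, the following times an in-memory sort of the same entries as single strings and as separate server and path fields. Everything here is an assumption for illustration: the `SortTiming` class, the synthetic data produced by `MakeUncPaths`, and the case-insensitive ordinal comparison. Real measurements should use the real data and include the disk I/O of the full program.

```csharp
using System;
using System.Diagnostics;

public static class SortTiming
{
    // Runs the action once and returns how long it took.
    public static TimeSpan Time(Action action)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    // Builds synthetic UNC paths; server and directory names are made up.
    public static string[] MakeUncPaths(int count, int serverCount, int seed)
    {
        var random = new Random(seed);
        var paths = new string[count];
        for (int i = 0; i < count; i++)
            paths[i] = @"\\server" + random.Next(serverCount) + @"\share\dir" + random.Next(count);
        return paths;
    }

    // Splits each "\\server\rest" entry into { server, rest }.
    public static string[][] Split(string[] paths)
    {
        var fields = new string[paths.Length][];
        for (int i = 0; i < paths.Length; i++)
        {
            int split = paths[i].IndexOf('\\', 2);
            fields[i] = split < 0
                ? new[] { paths[i].Substring(2), string.Empty }
                : new[] { paths[i].Substring(2, split - 2), paths[i].Substring(split + 1) };
        }
        return fields;
    }

    // Variant 1: each entry is compared as one string.
    public static void SortAsSingleString(string[] paths)
    {
        Array.Sort(paths, StringComparer.OrdinalIgnoreCase);
    }

    // Variant 2: compare the server field first, then the path field.
    public static void SortAsSeparateFields(string[][] fields)
    {
        Array.Sort(fields, (x, y) =>
        {
            int byServer = string.Compare(x[0], y[0], StringComparison.OrdinalIgnoreCase);
            return byServer != 0
                ? byServer
                : string.Compare(x[1], y[1], StringComparison.OrdinalIgnoreCase);
        });
    }

    // Prints the elapsed time of each variant for the same synthetic input.
    public static void Run(int count, int serverCount)
    {
        string[] paths = MakeUncPaths(count, serverCount, 1);
        string[][] fields = Split(paths);
        Console.WriteLine("single string:   " + Time(() => SortAsSingleString(paths)));
        Console.WriteLine("separate fields: " + Time(() => SortAsSeparateFields(fields)));
    }
}
```

Calling `SortTiming.Run(200000, 50)` would print one timing per variant; the counts are arbitrary. A single run is noisy, so repeating the measurement several times and on realistic input gives a more trustworthy comparison.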
"Are there any other optimization techniques that I might consider, given the nature of the dataset?"
Your question and speculation are premature.
Start by defining a performance goal.
Then implement the code as clearly and correctly as you can.
Then measure carefully to see whether you have met your goal.
If you have, knock off early and go to the beach.
If you have not, get a profiler and use it to analyze your program and find the worst-performing part. Then optimize that part.
Keep doing that until you either meet your goal or give up.