 --Comparing Text Strings--
Posted on 08-14-09 2:13 PM
 

I have two text files to compare: file A and  file B

file A is nicely formatted with sections, headings etc, for better visibility

file B is a linear raw list of strings (really a bunch of machine names)

I am trying to compare file A and file B and locate the strings in each file that don't exist in the other; in other words, identify unique strings in each file.

The UNIX utility *diff* works great, as do Windows tools like 'ExamDiff' and 'CompareIt!', but they only match a single occurrence of each string pair and ignore the rest.

For instance, I have

List A        List B
------        ------
abc           bcd
def           def
def           ijk
ghi           jkl

The result will be:

List A        List B
------        ------
abc           bcd
def           ijk
ghi           jkl

(Note that only the strings with a one-to-one match were eliminated.)

While the expected result is:

List A        List B
------        ------
abc           bcd
ghi           ijk
              jkl

With both occurrences of 'def' being eliminated, i.e. a one-to-many comparison.

Can anyone suggest a solution? A tool, or some script logic?

~@~

 
Posted on 08-14-09 4:37 PM
 

try http://www.scootersoftware.com/download.php
Last edited: 14-Aug-09 04:42 PM

 
Posted on 08-14-09 4:59 PM
 

Just installed and tried it. Still the same issue: it does the comparison, but only for a single occurrence. I haven't had a chance to look at the options yet though.

Thanks!

~@~

 
Posted on 08-14-09 7:07 PM
 

Download the 30-day free trial of Araxis Merge. This tool works great.
http://www.araxis.com/merge/

 
Posted on 08-18-09 12:19 PM
 

Have you tried this tool for a similar purpose?
Will take a look. Thanks!

BTW, I was able to get it done on Excel (underrated, but worked great) by playing around with logical comparison formulas.

~@~

 
Posted on 08-18-09 1:34 PM
 

Create two text files.
x.txt with
abc
def
def
ghi

and y.txt with
bcd
def
ijk
jkl

Issue the following commands (at a Cygwin prompt):
# sort and copy unique elements of x to x1
sort -u x.txt > x1.txt
# sort and copy unique elements of y to y1
sort -u y.txt > y1.txt

# copy lines that appear in both x1 and y1 to z
comm -1 -2 x1.txt y1.txt > z.txt

# output lines that appear in x1 only
comm -2 -3 x1.txt z.txt

# output lines that appear in y1 only
comm -2 -3 y1.txt z.txt

Works for this limited dataset.
Give it a try.
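
The same result can probably be had without the intermediate z.txt, since comm can suppress the unwanted columns directly. A minimal sketch, assuming the x.txt and y.txt contents listed above; the only_in_first helper and the scratch directories are made up for illustration. Like the commands above, it collapses duplicates via sort -u.

```shell
#!/bin/sh
# Sketch only: only_in_first is a hypothetical helper, not from the post above.
# Prints the lines that occur in file $1 but not in file $2 (duplicates collapsed).
only_in_first() {
    tmp=$(mktemp -d) || return 1
    sort -u "$1" > "$tmp/a"
    sort -u "$2" > "$tmp/b"
    # -2 drops lines found only in the second file, -3 drops lines common to both
    comm -2 -3 "$tmp/a" "$tmp/b"
    rm -rf "$tmp"
}

# Recreate the sample data in a scratch directory
work=$(mktemp -d)
printf '%s\n' abc def def ghi > "$work/x.txt"
printf '%s\n' bcd def ijk jkl > "$work/y.txt"

only_in_first "$work/x.txt" "$work/y.txt"   # prints abc, ghi
only_in_first "$work/y.txt" "$work/x.txt"   # prints bcd, ijk, jkl

rm -rf "$work"
```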




 
Posted on 08-18-09 4:37 PM
 

Thanks @gidilat, for the advice, and welcome to the forum - if you are a new member :-)

Just looked through the commands and spotted a minor gotcha..

When you run sort -u, you sort the list and keep only the unique elements, so you actually get rid of the multiple occurrences of each element. Once that's done, it's really a one-to-one comparison, no?

In the above scenario, ALL occurrences of def on list A were eliminated - as they matched def on list B.
But if I had, say, abc listed twice in list A and not at all in list B, sort -u would remove the second instance of abc in list A, correct? But the goal is to have every single occurrence of abc in list A if it's not present in list B.
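
One way to get that behaviour might be to skip sorting altogether and filter each list against the other with grep. This is only a sketch under some assumptions: each string sits on its own line, matches must be exact (case and whitespace included), and the not_in_other helper and the scratch files are invented for illustration. Any headings in the formatted file would simply show up as lines unique to it.

```shell
#!/bin/sh
# Sketch only: not_in_other is a hypothetical helper name.
# Prints every line of file $1 that does not appear as a whole line in file $2.
# Repeated lines are kept, and the original order of $1 is preserved.
not_in_other() {
    # -F fixed strings, -x whole-line match, -v invert, -f read patterns from a file
    grep -F -x -v -f "$2" "$1"
}

# List A with abc listed twice, list B as in the original example
work=$(mktemp -d)
printf '%s\n' abc abc def def ghi > "$work/a.txt"
printf '%s\n' bcd def ijk jkl > "$work/b.txt"

not_in_other "$work/a.txt" "$work/b.txt"   # prints abc, abc, ghi
not_in_other "$work/b.txt" "$work/a.txt"   # prints bcd, ijk, jkl

rm -rf "$work"
```

Both occurrences of def drop out because a single def in list B is enough to exclude every def in list A, while both copies of abc survive.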

Thanks again for chiming in.
~@~

 
Posted on 08-18-09 7:02 PM
 

You are changing the goal from
"in other words, identify unique strings in each file"
to
"goal is to have every single occurrence of abc in list A if it's not present in list B"

A problem needs to be defined properly before it can be solved.

 
Posted on 08-19-09 11:08 AM
 

Hmm, so - I said:
"...the goal is to have every single occurrence of abc in list A if it's not present in list B"

Doesn't that leave each list's strings/elements unique to those of the other list (and hence "...unique strings in each file")?

When we compare two files and speak about 'uniqueness', I'd think the reference would be toward uniqueness with respect to each other, and not within oneself.

Regardless, I'll buy the fact that 'unique strings in each file with respect to each other' or something similar would've made it a little more descriptive. Sorry -- Thanks!

So, just for the heck of it, I tried comm without sorting and isolating the unique elements. It works just as well as the other alternatives (except Excel) that I tried in the past: it does the comparison, but only gets rid of a single occurrence of the match.
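
That fits how comm works: it walks two sorted files in step, so the single def in list B can only pair off with one def in list A. A small sketch that reproduces this with the sample lists (the scratch file names are made up); sorting without -u keeps the duplicate in place:

```shell
#!/bin/sh
work=$(mktemp -d)
# Plain sort (no -u), so list A still contains def twice
printf '%s\n' abc def def ghi | sort > "$work/a.sorted"
printf '%s\n' bcd def ijk jkl | sort > "$work/b.sorted"

# Lines left over in list A after comm pairs off the matches one-to-one
left_over=$(comm -2 -3 "$work/a.sorted" "$work/b.sorted")
printf '%s\n' "$left_over"   # prints abc, def, ghi: the second def survives

rm -rf "$work"
```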

~@~

 

