Tuesday, December 30, 2008

How Similar Are Two Text Files?

Suppose you are on a job hunt (as I am right now), and after applying for quite a few jobs you find you are losing track of the ones you have already applied for. The company name is often not shown, since most recruiting/staffing agencies don't reveal it in their job postings. The job titles are often very similar. After asking a recruiting company two or three times about a job you have already been submitted for by another staffing agency (and after wasting your time on a duplicate application process), you just wish you could quickly detect that the job description you are looking at is very similar to one you applied to 3 weeks earlier. What do you do? Using diff isn't very useful, since the job descriptions are often slightly modified by the staffing agency before posting. You want to search for similar text rather than identical text.

It turns out there is research on this topic. The best short-and-sweet summary I have found is on Y! Answers. From the tools mentioned there, I chose the SIM tool by Dick Grune. A DOS binary is available there, but the trick was to select the command-line parameters that work best for comparing two HTML files. I have found that the following combination gives the most to-the-point answer: sim_text.exe -nT -r 100 job1.html job2.html. It shows only relatively large common sequences (over 100 chars), and if it does show any of those, you had better check whether the two files correspond to the same job opening.
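For readers without the SIM binary, the same idea (flag two files when they share a long common run) can be roughly approximated with Python's standard difflib module. This is only a sketch and not SIM's algorithm: the function names and the 100-character threshold are my own assumptions, and difflib reports aligned matching blocks rather than every common substring.

```python
from difflib import SequenceMatcher


def long_common_runs(text_a, text_b, min_len=100):
    """Return matching blocks of at least min_len characters (hypothetical helper)."""
    # autojunk=False disables difflib's heuristic that ignores very frequent characters
    matcher = SequenceMatcher(None, text_a, text_b, autojunk=False)
    runs = []
    for block in matcher.get_matching_blocks():
        # The final block is a zero-length sentinel, so it is filtered out here too
        if block.size >= min_len:
            runs.append(text_a[block.a:block.a + block.size])
    return runs


def looks_like_same_job(path_a, path_b, min_len=100):
    """Read two files and report whether they share any long run (assumed UTF-8)."""
    with open(path_a, encoding="utf-8", errors="replace") as f:
        text_a = f.read()
    with open(path_b, encoding="utf-8", errors="replace") as f:
        text_b = f.read()
    return bool(long_common_runs(text_a, text_b, min_len))
```

Calling looks_like_same_job("job1.html", "job2.html") returns True when at least one shared run of 100 or more characters is found. The comparison is slow on large files, but job postings are usually small enough for that not to matter.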

I have also tried stripping the HTML tags from these files using the Lynx browser with the -dump option. After some digging, I was able to find DOS binaries for Lynx here. I also had to create the following lynx.bat:

set home=c:\bin\lynx\temp
set temp=c:\bin\lynx\temp
set lynx_cfg=c:\bin\lynx\lynx.cfg
set lynx_save_space=c:\bin\lynx\temp
c:\bin\lynx\lynx.exe %1 %2 %3 %4 %5

The results were not convincing. For some reason, the file similarity was actually less obvious with the stripped files than with the original HTML files. I'm not sure why; I haven't analyzed this issue.
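An alternative to Lynx for stripping the tags is Python's standard html.parser module. The sketch below is an assumption-laden illustration, not something I have run against real job postings: the class and function names are made up, and it collapses all whitespace so that line wrapping cannot break up a common run. Whether it would give clearer results than the Lynx output is untested.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, skipping script and style bodies (hypothetical helper)."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a script or style element
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Joining with a space keeps adjacent blocks apart; it can also split a word
    # that is interrupted by an inline tag, which is acceptable for a rough check
    return " ".join(" ".join(parser.parts).split())
```

The resulting plain text can then be compared in the same way as the original files.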

Monday, December 29, 2008

Struts, Hibernate Development with Eclipse

I'm developing a web application to test algorithms for Roomba-like devices. I have some algorithms in mind that I want to test, so I decided to develop an application for testing these algorithms, or any others.

I wanted to use standard Java frameworks, for now Struts and Hibernate, with Eclipse as the IDE. It took me some time to figure out which free plugins to use to add Struts and Hibernate support to Eclipse. I looked for some time and finally found the following set of plugins:
For DB development support I have used three:
  • DB Development plugin (from Eclipse.org)
  • QuantumDB
  • Derby plugin (from Eclipse.org)
Probably the first of these would be enough.

A few notes:
  • The Tomcat plugin doesn't come with documentation, but the website has a decent one. The plugin works nicely. It starts Tomcat in debug mode unless you disable this feature in the plugin preferences. Switch to the Debug perspective and you can debug your application.
  • QuantumDB is a well-known plugin, but I had problems with it. I couldn't find the tables that were in the database. Possibly I'll learn how to use it. The DB Development plugin is easier to learn.
I'm still in an early phase of the development. I'll update this blog when I have more experience with these plugins.

Sunday, December 28, 2008

How to dry wet shoes or boots

Materials needed:
  • wet shoes
  • old newspaper
Do you have shoes that are wet and you need to dry them as quickly as possible? Unfortunately, putting them in a warm place, even near the heating stove, isn't a very quick way to dry them. I have often left shoes overnight in a warm place like this and they were still wet in the morning. So, here's what you do:
  • Tear a piece of the newspaper and make a ball out of it.
  • Push the paper ball into the shoe, as far as you can.
  • Put a few more paper balls (say, 2-3 per shoe) into your shoes.
  • Place the shoes in a warm place.
Even if the only warm place is your living room, your shoes should be dry after 4-6 hours. Definitely by the next morning.

Enjoy your dry shoes!