Algorithm Design and Problem Solving

 

Project 6: Kill the plagiarists!

 

Hand-out: 2.11., deadline: 22.11.

 

Story: You are an editor of a scientific and technical journal. There appeared a complaint on one of your articles – it is claimed that the author already published a substantial part of this text under a different name elsewhere. You have both articles on your desk now and you are to decide whether it is true. As the topic of the article is not your cup of tea, you would rather do it by computer than by hand. How to do it?

 

The task based on this story is as follows:

 

  1. Given two text documents T and T‘, find the longest common substring, i.e., the longest character string from the 1st document such that it appears also in the 2nd document. Position of both strings in the document can be arbitrary. Implementation, demonstration on examples and description of the used method in documentation will be awarded by 5 points in case of brute force approach and 10 points in case of some more effective method. If you were able to use your program on a real text from some article for a journal and the program will be able to skip the figures, you will get extra 5 points.
  2. If your solution will be usable also for the task where character insertion, deletion and change is allowed, you will get other 3 points.