Abstract
The reliability of plagiarism detection systems, which try to
identify similar programs in large populations, is critically
dependent on the choice of program representation. Software
metrics conventionally used as representations are described,
and the limitations of metrics adapted from software complexity
measures are outlined. An application-specific metric is proposed,
one that represents the structure of a program as a variable-
length profile. Its constituent terms, each recording the control
structures in a program fragment, are ordered for efficient
comparison. The superior performance of a plagiarism detection
system based on this profile is reported, and the derivation of
complexity measures from the profile is discussed.
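A minimal Python sketch of why ordering the profile's terms makes comparison efficient. It makes assumptions the abstract does not state. Each term is modeled as a tuple of control-structure keywords for one fragment, which is a hypothetical encoding. Two profiles are compared by counting shared terms in a single merge pass over the sorted lists. The shared count is scored with the Dice coefficient, which is a swapped-in similarity measure and not necessarily the one the system uses. The names `make_profile`, `common_terms` and `similarity` are illustrative only.

```python
# Hypothetical term encoding (an assumption, not the paper's format):
# each term is a tuple of control-structure keywords found in one
# program fragment, e.g. ("WHILE", "IF").
Term = tuple[str, ...]


def make_profile(fragments: list[Term]) -> list[Term]:
    """Build a variable-length profile with one term per fragment.

    The terms are sorted so that two profiles can be compared with a
    single linear merge instead of an all-pairs search.
    """
    return sorted(fragments)


def common_terms(a: list[Term], b: list[Term]) -> int:
    """Count terms shared by two sorted profiles (multiset intersection).

    Walks both lists once, so the cost is O(len(a) + len(b)).
    """
    i = j = shared = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            shared += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # a[i] cannot appear later in b; skip it
        else:
            j += 1  # b[j] cannot appear later in a; skip it
    return shared


def similarity(a: list[Term], b: list[Term]) -> float:
    """Dice coefficient over shared terms; 1.0 means identical profiles."""
    if not a and not b:
        return 1.0
    return 2 * common_terms(a, b) / (len(a) + len(b))


if __name__ == "__main__":
    p = make_profile([("WHILE", "IF"), ("IF",), ("FOR",)])
    q = make_profile([("IF",), ("FOR",), ("FOR", "IF")])
    print(common_terms(p, q), similarity(p, q))
```

Sorting once per program is what lets every later pairwise comparison run in linear time, which matters when one submission is checked against a large population.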