Although the technique is no good for tackling larger sections of text, it does show that officials need to be more careful with their sensitive documents. Naccache argues that the most important conclusion of this work "is that censoring text by blotting out words and re-scanning is not a secure practice". . . .. An Irish graduate student has uncovered words blacked-out of declassified US military documents using nothing more than a dictionary and text analysis software. Claire Whelan, a computer science student at Dublin City University was given the problems by her PhD supervisor as a diversion. David Naccache, a cryptographer with Gemplus, challenged her to discover the words missing from two documents: one was a memo to George Bush, and another concerned military modifications to civilian helicopters. The process is quite straightforward, and according to Naccache, Whelan's success proves that merely blotting words out of declassified documents will not keep the contents secret. The link for this article located at TheRegister is no longer available. . An American postgraduate researcher uncovered hidden phrases in released UK intelligence files using linguistic and analytical methods.. Text Analysis, Military Secrets, Data Security, Document Censorship, Sensitive Information. . LinuxSecurity.com Team
I was reading through a bioinformatics book the other day, and was reminded of a useful shortcut for comparing a text against various corpora. A number of researchers have simply fed DNA sequence data into the popular Ziv-Lempel compression algorithm, to . . . . I was reading through a bioinformatics book the other day, and was reminded of a useful shortcut for comparing a text against various corpora. A number of researchers have simply fed DNA sequence data into the popular Ziv-Lempel compression algorithm, to see how much redundancy it contains. Loosely speaking, the LZ (Zip) and the related gzip compression algorithms look for repeated strings within a text, and replace each repeat with a reference to the first occurrence. The compression ratio achieved therefore measures how many repeated fragments, words or phrases occur in the text. A related technique allows us to measure how much a given, "test" text has in common with a corpus of possibly similar documents. If we concatenate the corpus and the test text, and gzip them together, the test text will get a better compression ratio if it has more fragments, words, or phrases in common with the corpus, and a worse ratio if it is dissimilar. Since the LZ algorithm scans the entire input for repetitions, it tends to map pieces of the test text to previous occurrences in the corpus, thereby achieving a high "appended compression ratio" if the test text is similar to what it's appended to. . I was reading through a bioinformatics book the other day, and was reminded of a useful shortcut for. reading, through, bioinformatics, other, reminded, useful, shortcut. . LinuxSecurity.com Team
Get the latest Linux and open source security news straight to your inbox.