Apache SpamAssassin celebrates its 18th birthday this year, a huge accomplishment for everyone who has contributed to the open-source project over nearly two decades. SpamAssassin, a renowned and respected open-source anti-spam platform, provides a secure, reliable framework upon which companies can build highly effective spam filtering and email security solutions.

The project is the epitome of an open source success story: expert engineers and developers volunteered their time to combat the unsolicited email problem. The team demonstrated innovation, leadership and perseverance in the face of both success and adversity. Along the way, they incorporated enterprise functionality into the platform they had created as a means to solve real-world issues. 

Kevin McGrail, a cyber security and privacy expert and one of the lead developers for the SpamAssassin project since 1996, also considers SpamAssassin an open source success story, stating in a recent conversation with the LinuxSecurity team, “It protects millions of users every day and provides the inspiration if not the foundation of numerous commercial solutions for battling spammers.” Over the years McGrail has served as a developer, administrator, project chair and release manager for the SpamAssassin project. He is still involved with the project to this day. McGrail is also Director of Business Growth at InfraShield.com and serves as a Top Contributor, Developer Expert and Evangelist for Google G Suite.

The History of SpamAssassin: How an Ingenious Idea Evolved into a World-Renowned Anti-Spam Platform

SpamAssassin was created by Justin Mason, a software engineer who had maintained a number of patches against an earlier program named filter.plx by Mark Jeftovic. Mason rewrote all of Jeftovic’s code and uploaded the rewritten codebase to SourceForge on April 20, 2001. At the time, spam email was becoming increasingly problematic and no real tools existed to effectively combat it. Bill Cole, one of the lead developers involved in the SpamAssassin project, recalls, “2001 was a low point in the ‘arms race’ against spam and new tools were needed.” Engineers and developers saw potential in the SpamAssassin project and began to get involved. In the summer of 2004, SpamAssassin became an Apache Software Foundation project and was officially renamed Apache SpamAssassin.

Support and critique provided by the open source community drove rapid innovation and notable improvements during the project’s initial years. In an interview with the LinuxSecurity team, Bill Cole explains that he was impressed by the project’s rapid evolution, and that his outlook on the project changed drastically as he got involved. Cole was initially highly skeptical of the core mechanisms of SpamAssassin on both ethical and technical grounds. He admits that he was not an early fan due to “some ill-considered rules and sarcastic commentary.” However, by 2004 a combination of Cole’s experience with other tools and techniques to fight spam in corporate environments and improvements that had been made to the SpamAssassin project converted him from a heckler to a user. In 2018, Cole was invited to join the Apache PMC and has served there ever since.

Over the past decade, SpamAssassin has evolved into a well-known anti-spam platform utilized by companies worldwide. The project now has 32 committers and 13 PMC members, and the radical transparency required of ASF projects gives it a reputation for trustworthiness that the pre-ASF SpamAssassin had a hard time earning. Throughout these changes, SpamAssassin has continued to rely on the scoring and rule framework that has made it successful and future-proof.

SpamAssassin: A Highly Effective Open-Source Scoring Framework with Enterprise Functionality

SpamAssassin does not simply block or accept mail; it analyzes it. Each message is given both a binary spam/not-spam decision and a simple numeric score indicating how strongly it looks like spam or ham (a.k.a. non-spam). The program operates on the principle that there is no single definitive mechanism to identify spam. Rather, it has a modular plugin architecture that supports a wide range of independent operations that can be correlated to the spam/ham classification. These operations include Bayesian classification (which utilizes Artificial Intelligence and Machine Learning), local history of similar messages, querying of shared reputation systems such as traditional DNSBLs and databases of URLs seen in spam, and identification of patterns in message headers, MIME structure, raw data and rendered content. These mechanisms are used to define "rules," each for a specific characteristic of a message. Each rule has its own score value (positive or negative) and messages are classified as spam or ham based on the sum of the scores of all rules that they match. “Mass-check” is a tool that SpamAssassin uses to maintain the quality and scoring of its default ruleset. It determines which rules are worth promoting as active.
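The additive scoring idea described above can be illustrated with a minimal Python sketch. This is not SpamAssassin's code (SpamAssassin itself is written in Perl), and everything in it is an assumption made for illustration: the Rule class, the four rule names, their score values, the sample messages and the threshold of 5.0 are all hypothetical. It only shows how independent rules with positive or negative scores can be summed into a single number and a binary decision.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Rule:
    """One hypothetical rule: a name, a score, and a test on the raw message."""
    name: str
    score: float  # positive pushes toward spam, negative toward ham
    matches: Callable[[str], bool]


# Illustrative rules and scores only; these are not SpamAssassin's real rules.
RULES: List[Rule] = [
    Rule("SUBJECT_ALL_CAPS", 1.5,
         lambda m: bool(re.search(r"^Subject: [A-Z !]{10,}$", m, re.M))),
    Rule("MENTIONS_FREE_MONEY", 2.5, lambda m: "free money" in m.lower()),
    Rule("MANY_EXCLAMATIONS", 1.5, lambda m: m.count("!") >= 3),
    Rule("KNOWN_SENDER", -2.0, lambda m: "From: alice@example.com" in m),
]

# Assumed cutoff for this sketch; a real deployment would tune this value.
THRESHOLD = 5.0


def classify(message: str,
             rules: List[Rule] = RULES,
             threshold: float = THRESHOLD) -> Tuple[bool, float, List[str]]:
    """Return (is_spam, total_score, names_of_matching_rules)."""
    hits = [rule for rule in rules if rule.matches(message)]
    total = sum(rule.score for rule in hits)
    return total >= threshold, total, [rule.name for rule in hits]


if __name__ == "__main__":
    spam = ("From: bob@example.net\n"
            "Subject: FREE MONEY NOW!!!\n\n"
            "Claim your free money today!")
    ham = ("From: alice@example.com\n"
           "Subject: Lunch tomorrow?\n\n"
           "Are you free at noon?")
    for label, msg in (("spam sample", spam), ("ham sample", ham)):
        print(label, classify(msg))
```

Because the decision depends on the sum rather than on any single test, one weak signal is not enough to condemn a message, and a negative-scoring rule can offset several positive ones.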

Open-source development has had a significant influence on SpamAssassin’s ability to provide companies with a highly flexible, scalable and effective framework for filtering spam. Unlike proprietary anti-spam platforms, SpamAssassin’s open-source, enterprise-grade code is available at no charge. Moreover, the scoring framework that SpamAssassin offers is supported by a knowledgeable, passionate community of mail server experts that help the developers in creating new rules and in developing new ideas that could improve the platform. McGrail summarizes the benefits of open-source development: “Open Source is about controlling your destiny and limiting risk. SpamAssassin is always available and the source code is there for anyone to modify.”

ISPs and email security providers recognize and respect SpamAssassin’s transparency and effectiveness. However, it is important to note that while SpamAssassin is a great piece of software, it must be implemented as part of a comprehensive email security gateway solution in order to effectively mitigate the risk and aggravation associated with spam email in the enterprise.

Guardian Digital uses SpamAssassin’s framework as an element of its multi-tiered, open-source EnGarde Email Security Gateway. SpamAssassin’s scoring platform is a critical part of EnGarde’s spam filtering method. If SpamAssassin’s software indicates that a message resembles spam, EnGarde quarantines the email, preventing it from reaching the inbox. SpamAssassin works in conjunction with multiple other advanced security features to make EnGarde Email Security Gateway highly effective at identifying and blocking spam email, while keeping the rate of false positives impressively low. Guardian Digital CEO and lead architect Dave Wreski states, “Email security is all about defense in depth. No one feature or piece of software alone is enough to protect against sophisticated threats that constitute today’s email threat landscape. However, SpamAssassin’s scoring platform is definitely a key element of our EnGarde Email Security Gateway.” Wreski, who was working as a security engineer at UPS at the time, founded Guardian Digital in 1999 as a means of solving real-world digital problems with open-source software at a level capable of supporting the most intensive enterprise security demands. The company has since narrowed its focus from Internet security to email security, and has evolved into the premier open-source email security provider, successfully meeting the security needs of businesses worldwide.

The Future of SpamAssassin: Upcoming Releases, Exciting New Features and Impressive Performance Improvements 

The future is bright for SpamAssassin, as well as for the providers and customers who benefit from the project’s valuable technology. Currently, SpamAssassin developers are working hard to finalize v3.4.3, which consists mostly of bug and security fixes along with a few new features and performance improvements. However, v4.0.0 is where the team is putting the majority of the new features that they are in the process of developing. These features and improvements include:

  • Comprehensive Unicode and IDN support
  • Unified common interface to all supported "GeoIP" backends 
  • More consistent logging format
  • Asynchronous calls to remote services (Razor, Pyzor, DCC) 
  • Additional filtering plugin using "AI" principles
  • Automated rule generation subsystem revived

There is no set date or feature set defined for the 4.0.0 release; however, members of the project’s PMC indicate that it is approaching and will be well worth the wait. Giovanni Bechis, a security expert, OpenBSD enthusiast, international speaker and one of the lead developers for the SpamAssassin project, elaborates, “Both our last and our upcoming release have lots of improvements and new features, including antiphishing and antimalware technologies. SpamAssassin is a R&D project, so a lot of the technologies that are used to improve and become more efficient with every release.” 

The past two years or so have been a period of renewed forward movement for the Apache SpamAssassin project. And, with SpamAssassin’s 20th anniversary on the horizon, this momentum shows no signs of slowing down. McGrail reflects, “I’m proud that we have a stable and mature project that still helps people every day!”