On Lisa Rein's Radar: Spam Tech Archives


January 12, 2004

Alright, Alright, The Comments Will Stay

After receiving numerous letters and offers of solutions and technical expertise, I've decided to implement a few technical solutions rather than taking comments down completely.

Apparently, my blog without comments is a completely unacceptable scenario for most of you. This warms my heart, for it is indeed a cornerstone of this community that anyone can post and contribute to the discussion.

So this week I'll be implementing the MT-Blacklist module, among other techniques that have been passed on to me. I'm going to document the process from beginning to end -- even though doing so may help the spammers try to beat it. We've all got to work together and learn from each other on this one.

Talk soon...

lisa

ps. I'm still getting my act together in general this month, but the Daily Show Clips, Bill Moyers Clips, and other goodies will be up soon!

Posted by Lisa at 08:10 AM

August 05, 2003

Lou Katz, Cindy Cohn, Craig Newmark And A Ton O' Spam Tech Vendors At The Hillside Club's CyberSalon On Spam

Jeff Ubois and Sylvia Paull put together a CyberSalon On Spam on June 15, 2003 at Berkeley's Hillside Club. Craig and Cindy's presentations and the discussion that follows are of particular interest.

Lou Katz On Spam (Small - 27 MB)

Craig Newmark On Spam (Small - 16 MB)

The EFF's Cindy Cohn On Spam (Small - 16 MB)

Follow up w/Craig and Cindy (Small - 7 MB)

Spam Filter Vendor Talks:

IronPort On Spam (Small - 24 MB)

Enrique Salem, CEO of Brightmail, On Spam (Small - 20 MB)

Pavri Diwariji from MailFrontier, On Spam (Small - 9 MB)

Jordan Ritter of Cloudmark, On Spam (Small - 10 MB)

Doug McLean of Postini, On Spam (Small - 12 MB)

Lou Katz (below)

Craig Newmark (below)

Cindy Cohn, EFF (below)

Craig Newmark, Cindy Cohn (below)

Posted by Lisa at 10:25 AM

June 13, 2003

Uber-Spammers and Anti-spam Super Heroes Duke It Out In Berkeley This Sunday

There's a cool panel I'll be going to this Sunday in Berkeley at the Hillside Club.

I don't mean to make the event sound confrontational in my headline. The goal of this panel is to get everybody in one room so we can hear all of the different viewpoints on these issues. Hopefully we'll be willing to listen to one another. It should be pretty interesting.

Members of the press: this would be a good chance for you to spend an hour or so of your time and learn everything you ever wanted to know about spam tech and collect a round of business cards from the participating parties for quotes in the future when this stuff hits the mainstream media over the next few months.

CAN WE STOP SPAM?
A Panel of Spammers, Anti-Spammers, and the Spam-Inflicted Duke It Out

Here's the official description:

We'll hear all sides - including your own - at a revolving panel, which
includes antispam developers Brightmail, Postini, Mail Frontier,
Cloudmark, and ActiveState (from Canada); Internet entrepreneur Gary
Kremen, founder of Match.com and sex.com, who argues that spam is
ineradicable; Paul Goldman, CEO of Markado, an "intelligent" etailer;
EFF Chair Brad Templeton; Craig Newmark, founder of Craigslist; and PC
World's Harry McCracken, whose team has just completed an exhaustive
round-up of anti-spam legislation.

What: Sylvia's and Jeff's CyberSalon
When: SUNDAY, June 15, 2003
Time: 5:30-8:00 p.m.
Where: Hillside Club, 2286 Cedar St. Berkeley

Directions are at the bottom of the full invite.

See you there.

Sylvia's and Jeff's CyberSalon
SUNDAY, June 15, 2003
5:30-8:00 p.m.
Hillside Club, 2286 Cedar St.*
Berkeley

CAN WE STOP SPAM?
A Panel of Spammers, Anti-Spammers, and the Spam-Inflicted Duke It Out

Come join an exhilarating panel discussion covering all aspects of a
growing Internet irritant: spam. Most people treat it like a weed, while
some consider it their bread and butter. Others think a cure is
impossible or potentially - like Agent Orange -- worse than the problem.

We'll hear all sides - including your own - at a revolving panel, which
includes antispam developers Brightmail, Postini, Mail Frontier,
Cloudmark, and ActiveState (from Canada); Internet entrepreneur Gary
Kremen, founder of Match.com and sex.com, who argues that spam is
ineradicable; Paul Goldman, CEO of Markado, an "intelligent" etailer;
EFF Chair Brad Templeton; Craig Newmark, founder of Craigslist; and PC
World's Harry McCracken, whose team has just completed an exhaustive
round-up of anti-spam legislation.

Jeff Ubois, my partner in this event and a member of the Hillside Club -
which he hopes will become a forum for ideas about technology and
society -- will archive the discussion online and also tally votes at
the end when we see which solution you all prefer.

The Cybersalon is open to all, including friends and family. We start at
5:30 with food and drinks (we ask for a $15 donation), and because it's
also Father's Day, anyone who brings a father gets dad in for half
price. The panel discussion starts around 6:15. RSVP to
whoisylvia@aol.com by June 13.

Sylvia (Paull)
Founder, Gracenet, Cybersalons, Nerd Walks

*Directions:

From San Francisco, take Bay Bridge, and merge onto I-80 East.
Exit University Ave. and bear RIGHT UNDER the freeway toward 4th St.
Continue STRAIGHT on frontage road for half a mile
RIGHT onto Cedar
STRAIGHT on Cedar for a couple of miles, past Shattuck and Spruce.

From North Bay, take Richmond Bridge to I-80 and Berkeley.
Exit Gilman St., go straight up past San Pablo and bear LEFT onto Cedar
at fork in road
Straight on Cedar for a couple of miles, past Shattuck and Spruce.

Parking: yes.

Public transport: Get off at Downtown Berkeley BART - you can get a bus,
cab, or hitch/walk/bicycle one mile along Shattuck, then make a right
onto Cedar.

Posted by Lisa at 02:56 PM

May 03, 2003

Lessig On The New Anti-Spam Bill He's Betting His Job On

This is a clip from my local San Francisco news station KTVU, from last Monday night.

(Many apologies for the late turnaround - I am swamped, people! Swamped!)

Lawrence Lessig is helping Congresswoman Zoe Lofgren to spread the word about the launch of her new anti-spam bill, The Reduced Spam Act of 2003.

The bill would protect a business's right to send correspondence to its existing customer base, while providing a cash bounty to users-at-large who are the first to report spam.

More on this in the days to come -- as you can see I've started a category to start tracking spam legislation as it inches along through the various State Legislatures.

I think this kind of legislation is very important. We need it, but we need to protect the right to send unsolicited email. (I get great unsolicited email all the time, and I probably send even more of it.)

Lofgren's Anti-Spam Bill, KTVU - Channel 2 News 4-28-03 (Small - 6 MB)
Lofgren's Anti-Spam Bill, KTVU - Channel 2 News 4-28-03 (Hi-Res - 97 MB)
Audio - Lofgren's Anti-Spam Bill, KTVU - Channel 2 News 4-28-03 (MP3 - 4 MB)

Posted by Lisa at 01:53 PM

December 06, 2002

Spam Justice

Mike Wendland interviewed bragging spam king Alan Ralsky for the Detroit Free Press. The article got Slashdotted, and then someone on the discussion list got the idea to spam back. (I can't find the exact thread, but would love to link to it from here.)

Now the Spam King is complaining that...well...that it really sucks to be spammed.

This is a stretch on "spam tech" -- more like good old fashioned grass roots organizing in action, but I think it's interesting that Ralsky doesn't seem to grasp the irony of the situation.

"They've signed me up for every advertising campaign and mailing list there is," he told me. "These people are out of their minds. They're harassing me."

That they are. Gleefully. Almost 300 anti-Ralsky posts were made on the Slashdot.org Web site, where the plan was hatched after spam haters posted his address, even an aerial view of his neighborhood.

"Several tons of snail mail spam every day might just annoy him as much as his spam annoys me," wrote one of the anti-spammers.

Ralsky is indeed annoyed. He says he's asked Bloomfield Hills attorney Robert Harrison to sue the anti-spammers.

Here is the full text of the first article (1 of 2) in case the link goes bad:

http://www.freep.com/money/tech/mwend22_20021122.htm

MIKE WENDLAND: Spam king lives large off others' e-mail troubles

West Bloomfield computer empire helped by foreign Internet servers

November 22, 2002

BY MIKE WENDLAND
FREE PRESS COLUMNIST

You might call it the house that spam built.

Alan Ralsky's brand new 8,000-square-foot luxury home near Halsted and Maple in West Bloomfield has been a busy place this month. Outside, landscapers worked against the November cold to get a sprinkler system installed before the ground freezes. Inside, painters prepared to hang wallpaper.

Meanwhile, delivery trucks pulled into the bricked circular driveway with computers, routers, servers and other high-tech gear that will hook up to the high-speed T1 line installed a few weeks ago.

In the lower level of the home, tucked away in a still-unfinished room, will soon be an array of 20 different computers -- the control center of what many believe is the largest single bulk e-mailing operation in the world.

It's an operation still very much in business, despite last month's much-hyped settlement of a lawsuit against Ralsky by Verizon Internet Services. The suit used Virginia's tough anti-spam laws to get Ralsky to promise to stop using Verizon servers and pay an undisclosed fee for sending out millions of unsolicited e-mails to its customers.

Anti-spam groups and Verizon hailed the settlement as a major victory in the war against spam. But that war still feels far away, down on the lower level of Ralsky's home, where racks of computers instruct scores of other computers halfway around the world to fire off millions of e-mails every day.

Ralsky said the legal fuss and settlement costs were a big hit and that things slowed down for a while. But now, after moving a few weeks ago into his new $740,000 house, he claims he's back in business.

"I've gone overseas," he said. "I now send most of my mail from other countries. And that's a shame. I pay a fortune to providers to do this, and I'd much rather have it go to American companies. But I have to stay in business, and if I have to go out of the country, then so be it."

The computers in Ralsky's basement control 190 e-mail servers -- 110 located in Southfield, 50 in Dallas and 30 more in Canada, China, Russia and India. Each computer, he said, is capable of sending out 650,000 messages every hour -- more than a billion a day -- routed through overseas Internet companies Ralsky said are eager to sell him bandwidth.

All this is bad news to the anti-spam movement.

"He's very sophisticated in his activities," said John Mozena of Grosse Pointe Woods, a founder of the Coalition Against Unsolicited Commercial E-Mail (www.cauce.org), a national spam-fighting organization. "He uses hundreds of domains (Internet addresses) to send his spams."

In London, Steve Linford of the Spamhaus Project (www.spamhaus.org) has monitored Ralsky for several years.

"There are probably about 150 major spammers who are responsible for 90 percent of all the spam everyone gets," said Linford. "Ralsky has been the biggest of them, and is certainly still in the top five."

Ralsky used to be easy to locate, with a listed address and phone number. But his attorney, Robert Harrison of Bloomfield Hills, said Ralsky is so hated by anti-spammers that he's had to be less visible.

"There were threats against him, cars driving by and people checking out his house," Harrison said. "Someone even left a package of what appeared to be dog feces."

Today, Ralsky says he is trying to keep a lower profile, operating through cell phones and unlisted numbers. Ralsky agreed to this interview and the tour of his operation only if I promised not to print the address of his new home, which I found in Oakland County real estate records.

Ralsky admits to using lots of different domain names and Internet providers, but said he does nothing illegal. He prefers to call his e-mails marketing messages instead of spam.

Whatever you call it, unsolicited messages now account for 36 percent of all e-mail, up from just 8 percent a year ago, according to Brightmail, a leading anti-spam software maker.

Ralsky has done his share to account for the increase.

"I'll never quit," said the 57-year-old master of spam. "I like what I do. This is the greatest business in the world."

It's made him a millionaire, he said, seated in the wood-paneled first floor library of his new house. "In fact," he added, "this wing was probably paid for by an e-mail I sent out for a couple of years promoting a weight-loss plan."

Ralsky said he turns down many who want his services.

"I don't do any porn or sexual messages," he said, citing a promise he made to his wife, Irmengard. Instead, he sends e-mail come-ons for things like online casinos, vacation promotions, mortgage refinancing and Internet pharmacies.

Ralsky acknowledges that his success with spam arose out of a less-than-impressive business background. In 1992, while in the insurance business, he served a 50-day jail term for a charge arising out of the sale of unregistered securities. And in 1994, he was convicted of falsifying documents that defrauded financial institutions in Michigan and Ohio and ordered to pay $74,000 in restitution.

He lost his license to sell insurance and he declared personal bankruptcy. But in 1997, he sold a late model green Toyota and used the money to pay back taxes on his house and buy two computers.

A friend had told him about mass marketing on the Internet, and he thought it made sense. He bought a couple of mailing lists from advertising brokers and, with the help of the computers, launched a new career that soon was making him $6,000 a week.

In the lower level of his house, working around a half-dozen computers sitting atop temporary tables, two of Ralsky's associates monitored the operation.

One of them, Ralsky's list man, concentrated on finding new names to add to the 250 million e-mail addresses in his database and weeding out canceled accounts.

The other kept track of current campaigns, connecting with the bank of e-mail servers in Southfield and watching as e-mails scrolled line-by-line in rapid fire down the screen.

"There is no way this can be stopped," Ralsky said. "It's a perfectly legal business that has allowed anybody to compete with the Fortune 500 companies."

Ralsky said he includes a link on each e-mail he sends that lets the recipient opt out of any future mailings. He said 89 million people have done just that over the past five years, and he keeps a list of them that grows by about 1,000 every day. That list is constantly run against his master list of 250 million valid addresses.

Ralsky's list man is named Charlie Brown. That's his real name, he said, describing himself as a native of Louisiana who travels the country working as a consultant to bulk e-mailers, developing custom software called harvesting programs that constantly scour the Internet, gaining access to millions of Web sites and mailing lists every day in search of any and all e-mail addresses.

The response rate is the key to the whole operation, said Ralsky. These days, it's about one-quarter of 1 percent.

"But you figure it out," said Ralsky. "When you're sending out 250 million e-mails, even a blind squirrel will find a nut."

Ralsky makes his money by charging the companies that hire him to send bulk e-mail a commission on sales. He sometimes charges just a flat fee, up to $22,000, for a single mailing to his entire database.

Ralsky has other ways to monitor the success of his campaigns. Buried in every e-mail he sends is a hidden code that sends back a message every time the e-mail is opened. About three-quarters of 1 percent of all the messages are opened by their recipients, he said. The rest are deleted.

From that response, Ralsky can monitor the effectiveness of his pitch and the subject line on the e-mail to make sure he's getting maximum return. He said he spends 18 hours a day on the job.

Ralsky said he's frustrated by attacks on his character by the anti-spammers. Linford said his organization has been getting Internet networks around the world to block mail from any Chinese provider that sends Ralsky e-mail.

"When the Chinese providers contact us to ask why their outgoing mail is blocked, we tell them because of Ralsky, and they pull his plug," said Linford. "He moves on to another provider and it starts all over again."

Earlier this month, said Ralsky, somebody told the Chinese government that a Web company from which he leases e-mail servers in Beijing was sending messages critical of Chinese policy.

Police promptly raided the business and confiscated Ralsky's servers. Although they were returned a few days later, Ralsky now tries to cover his tracks better, so opponents won't know what companies and servers he's using.

Linford said he heard of the raid. "It wasn't us that caused it," he said. "But there are a lot of anti-spam activists, and apparently some of them on their own started organizing a campaign to get the Chinese government to think that Ralsky was supporting" the Falun Gong, an outlawed spiritual group the Chinese government considers subversive. "We didn't endorse that, but it shows you how deep the anti-Ralsky feelings are."

Ralsky, meanwhile, is looking at new technology. Recently he's been talking to two computer programmers in Romania who have developed what could be called stealth spam.

It is intricate computer software, said Ralsky, that can detect computers that are online and then be programmed to flash them a pop-up ad, much like the kind that display whenever a particular Web site is opened.

"This is even better," he said. "You don't have to be on a Web site at all. You can just have your computer on, connected to the Internet, reading e-mail or just idling and, bam, this program detects your presence and up pops the message on your screen, past firewalls, past anti-spam programs, past anything.

"Isn't technology great?"

Contact MIKE WENDLAND at 313-222-8861 or mwendland@freepress.com.

Here is the full text of the second article (2 of 2) in case the link goes bad:

http://www.freep.com/money/tech/mwend6_20021206.htm

MIKE WENDLAND: Internet spammer can't take what he dishes out

December 6, 2002

BY MIKE WENDLAND
FREE PRESS COLUMNIST

West Bloomfield bulk e-mailer Alan Ralsky, who just may be the world's biggest sender of Internet spam, is getting a taste of his own medicine.

Ever since I wrote a story on him a couple of weeks ago (www.freep.com/money/tech/mwend22_20021122.htm), he says he's been inundated with ads, catalogs and brochures delivered by the U.S. Postal Service to his brand-new $740,000 home.

It's all the result of a well-organized campaign by the anti-spam community, and Ralsky doesn't find it funny.

"They've signed me up for every advertising campaign and mailing list there is," he told me. "These people are out of their minds. They're harassing me."

That they are. Gleefully. Almost 300 anti-Ralsky posts were made on the Slashdot.org Web site, where the plan was hatched after spam haters posted his address, even an aerial view of his neighborhood.

"Several tons of snail mail spam every day might just annoy him as much as his spam annoys me," wrote one of the anti-spammers.

Ralsky is indeed annoyed. He says he's asked Bloomfield Hills attorney Robert Harrison to sue the anti-spammers.

Contact MIKE WENDLAND at 313-222-8861 or mwendland@freepress.com.

Posted by Lisa at 12:37 PM

November 24, 2002

Useful Backgrounder On Spam Filtering Techniques

By David Mertz, Ph.D., for IBM DeveloperWorks:
Six approaches to eliminating unwanted e-mail
(Thanks, Cory.)

For purposes of my testing, I developed two collections of messages: spam and legitimate. Both collections were taken from mail I actually received in the last couple of months, but I added a significant subset of messages up to several years old to broaden the test. I cannot know exactly what will be contained in next month's e-mails, but the past provides the best clue to what the future holds. That sounds cryptic, but all I mean is that I do not want to limit the patterns to a few words, phrases, regular expressions, etc. that might characterize the very latest e-mails but fail to generalize to the two types.

In addition to the collections of e-mail, I developed training message sets for those tools that "learn" about spam and non-spam messages. The training sets are both larger and partially disjoint from the testing collections. The testing collections consist of slightly fewer than 2000 spam messages, and about the same number of good messages. The training sets are about twice as large.

A general comment on testing is worth emphasizing. False negatives in spam filters just mean that some unwanted messages make it to your inbox. Not a good thing, but not horrible in itself. False positives are cases where legitimate messages are misidentified as spam. This can potentially be very bad, as some legitimate messages are important, even urgent, in nature, and even those that are merely conversational are ones we do not want to lose. Most filtering software allows you to save rejected messages in temporary folders pending review -- but if you need to review a folder full of spam, the usefulness of the software is thereby reduced.

Here is the full text of the article in case the link goes bad:

http://www-106.ibm.com/developerworks/linux/library/l-spamf.html

Spam filtering techniques
Contents:
Hiding contact information
Looking at filtering software
1. Basic structured text filters
2. Whitelist/verification filters
3. Distributed adaptive blacklists
4. Rule-based rankings
5. Bayesian word distribution filters
6. Bayesian trigram filters
Summary
Resources
About the author

Six approaches to eliminating unwanted e-mail

David Mertz, Ph.D. (mertz@gnosis.cx)
Analyzer, Gnosis Software, Inc.
September 2002
The problem of unsolicited e-mail has been increasing for years, but help has arrived. In this article, David discusses and compares several broad approaches to the automatic elimination of unwanted e-mail while introducing and testing some popular tools that follow these approaches.

Unethical e-mail senders bear little or no cost for mass distribution of messages, yet normal e-mail users are forced to spend time and effort purging fraudulent and otherwise unwanted mail from their mailboxes. In this article, I describe ways that computer code can help eliminate unsolicited commercial e-mail, viruses, trojans, and worms, as well as frauds perpetrated electronically and other undesired and troublesome e-mail. In some sense, the final and best solution for eliminating spam will probably take place on a legal level. In the meantime, however, you can do some things from a code perspective that can serve as an interim solution to the problem, until (if ever) the laws begin to evolve at the same rate as public frustration.

Considering matters technically -- but also with common sense -- what is generally called "spam" is somewhat broader than the category "unsolicited commercial e-mail"; spam encompasses all the e-mail that we do not want and that is only very loosely directed at us. Such messages are not always commercial per se, and some push the limits of what it means to be solicited. For example, we do not want to get viruses (even from our unwary friends); nor do we generally want chain letters, even if they don't ask for money; nor proselytizing messages from strangers; nor outright attempts to defraud us. In any case, it is usually unambiguous whether a message is spam, and many, many people get the same such e-mails.

The problem with spam is that it tends to swamp desirable e-mail. In my own experience, a few years ago I occasionally received an inappropriate message, perhaps one or two each day. Every day of this month, in contrast, I received many times more spams than I did legitimate correspondences. On average, I probably get 10 spams for every appropriate e-mail. In some ways I am unusual -- as a public writer, I maintain a widely published e-mail address; moreover, I both welcome and receive frequent correspondence from strangers related to my published writing and to my software libraries. Unfortunately, a letter from a stranger -- with who-knows-which e-mail application, OS, native natural language, and so on, is not immediately obvious in its purpose; and spammers try to slip their messages underneath such ambiguities. My seconds are valuable to me, especially when they are claimed many times during every hour of a day.

Hiding contact information
For some e-mail users, a reasonable, sufficient, and very simple approach to avoiding spam is simply to guard e-mail addresses closely. For these people, an e-mail address is something to be revealed only to selected, trusted parties. As extra precautions, an e-mail address can be chosen to avoid easily guessed names and dictionary words, and addresses can be disguised when posting to public areas. We have all seen e-mail addresses cutely encoded in forms like "echo zregm@tabfvf.pk | tr A-Za-z N-ZA-Mn-za-m".
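The tr pipeline quoted above is ROT13: every letter is swapped with the one 13 places further along the alphabet, so applying it twice returns the original text. As a rough illustration (the function name rot13 is mine, not something from the article), the same transformation can be sketched in Python:

```python
import codecs


def rot13(text: str) -> str:
    """Rotate ASCII letters by 13 places; digits and punctuation pass through."""
    return codecs.encode(text, "rot_13")


# The obscured string from the paragraph above.
obscured = "zregm@tabfvf.pk"
print(rot13(obscured))
```

Because ROT13 is its own inverse, the same function both disguises and recovers an address. It only deters the simplest harvesting programs; anything that bothers to run the transformation will recover the address.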

In addition to hiding addresses, a secretive e-mailer often uses one or more of the free e-mail services for "throwaway" addresses. If you need to transact e-mail with a semi-trusted party, a temporary address can be used for a few days, then abandoned along with any spam it might thereafter accumulate. The real "confidantes only" address is kept protected.

In my informal survey of discussions of spam on Web-boards, mailing lists, the Usenet, and so on, I've found that a category of e-mail users gains sufficient protection from these basic precautions.

For me, however -- and for many other people -- these approaches are simply not possible. I have a publicly available e-mail address, and have good reasons why it needs to remain so. I do utilize a variety of addresses within the domain I control to detect the source of spam leaks; but the unfortunate truth is that most spammers get my e-mail address the same way my legitimate correspondents do: from the listing at the top of articles like this, and other public disclosures of my address.

Looking at filtering software
This article looks at filtering software from a particular perspective. I want to know how well different approaches work in correctly identifying spam as spam and desirable messages as legitimate. For purposes of answering this question, I am not particularly interested in the details of configuring filter applications to work with various Mail Transfer Agents (MTAs). There is certainly a great deal of arcana surrounding the best configuration of MTAs such as Sendmail, QMail, Procmail, Fetchmail, and others. Further, many e-mail clients have their own filtering options and plug-in APIs. Fortunately, most of the filters I look at come with pretty good documentation covering how to configure them with various MTAs.

For purposes of my testing, I developed two collections of messages: spam and legitimate. Both collections were taken from mail I actually received in the last couple of months, but I added a significant subset of messages up to several years old to broaden the test. I cannot know exactly what will be contained in next month's e-mails, but the past provides the best clue to what the future holds. That sounds cryptic, but all I mean is that I do not want to limit the patterns to a few words, phrases, regular expressions, etc. that might characterize the very latest e-mails but fail to generalize to the two types.

In addition to the collections of e-mail, I developed training message sets for those tools that "learn" about spam and non-spam messages. The training sets are both larger and partially disjoint from the testing collections. The testing collections consist of slightly fewer than 2000 spam messages, and about the same number of good messages. The training sets are about twice as large.

A general comment on testing is worth emphasizing. False negatives in spam filters just mean that some unwanted messages make it to your inbox. Not a good thing, but not horrible in itself. False positives are cases where legitimate messages are misidentified as spam. This can potentially be very bad, as some legitimate messages are important, even urgent, in nature, and even those that are merely conversational are ones we do not want to lose. Most filtering software allows you to save rejected messages in temporary folders pending review -- but if you need to review a folder full of spam, the usefulness of the software is thereby reduced.
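To make the two kinds of error concrete, here is a small sketch of how they could be tallied for any filter. The function name, the input shape, and the sample outcomes are all hypothetical; none of the numbers come from the article's test collections.

```python
def error_rates(results):
    """Compute error rates from (is_spam, flagged_as_spam) boolean pairs."""
    false_pos = false_neg = spam_total = ham_total = 0
    for is_spam, flagged in results:
        if is_spam:
            spam_total += 1
            if not flagged:
                false_neg += 1  # spam that reached the inbox
        else:
            ham_total += 1
            if flagged:
                false_pos += 1  # legitimate mail misidentified as spam
    return {
        "false_positive_rate": false_pos / ham_total if ham_total else 0.0,
        "false_negative_rate": false_neg / spam_total if spam_total else 0.0,
    }


# Made-up outcomes for illustration only: 10 spam and 10 legitimate messages.
sample = (
    [(True, True)] * 8      # spam caught
    + [(True, False)] * 2   # spam missed (false negatives)
    + [(False, False)] * 9  # legitimate mail delivered
    + [(False, True)] * 1   # legitimate mail flagged (false positive)
)
rates = error_rates(sample)
print(rates)
```

Keeping the two rates separate matters because, as the paragraph above argues, a single false positive can cost far more than a single false negative.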

1. Basic structured text filters
The e-mail client I use has the capability to sort incoming e-mail based on simple strings found in specific header fields, the header in general, and/or in the body. Its capability is very simple and does not even include regular expression matching. Almost all e-mail clients have this much filtering capability.

Over the last few months, I have developed a fairly small number of text filters. These few simple filters correctly catch about 80% of the spam I receive. Unfortunately, they also have a relatively high false positive rate -- enough that I need to manually examine some of the spam folders from time to time. (I sort probable spam into several different folders, and I save them all to develop message corpora.) Although exact details will differ among users, a general pattern will be useful to most readers:

* Set 1: A few people or mailing lists do funny things with their headers that get them flagged on other rules. I catch something in the header (usually the From:) and whitelist it (either to INBOX or somewhere else).

* Set 2: In no particular order, I run the following spam filters:
    o Identify a specific bad sender.
    o Look for "<>" as the From: header.
    o Look for "@<" in the header (lots of spam has this for some reason).
    o Look for "Content-Type: audio". Nothing I want has this, only virii (your mileage may vary).
    o Look for "euc-kr" and "ks_c_5601-1987" in the headers. I can't read that language, but for some reason I get a huge volume of Korean spam (of course, for an actual Korean reader, this isn't a good rule).

* Set 3: Store messages to known legitimate addresses. I have several such rules, but they all just match a literal To: field.

* Set 4: Look for messages that have a legit address in the header, but that weren't caught by the previous To: filters. I find that when I am only in the Bcc: field, it's almost always an unsolicited mailing to a list of alphabetically sequential addresses (mertz1@..., mertz37@..., etc).

* Set 5: Anything left at this point is probably spam (it probably has forged headers to avoid identification of the sender).
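The five rule sets above could be chained roughly as follows. This is only a sketch of the ordering, not the author's actual client configuration: the addresses are placeholders, the folder names are invented, and treating Set 4 matches as probably legitimate is my reading of the description.

```python
from email import message_from_string

# Placeholder addresses for illustration; none come from the article.
WHITELIST_FROM = ("friend@example.org",)
BAD_SENDERS = ("bulk@example.net",)
MY_ADDRESSES = ("me@example.com",)
SPAM_HEADER_STRINGS = ("@<", "euc-kr", "ks_c_5601-1987")


def classify(raw: str) -> str:
    """Return a folder name for a raw message using plain substring tests."""
    msg = message_from_string(raw)
    sender = (msg.get("From") or "").lower()
    to = (msg.get("To") or "").lower()
    header_text = "\n".join(f"{k}: {v}" for k, v in msg.items()).lower()

    # Set 1: whitelist senders whose headers would trip later rules.
    if any(addr in sender for addr in WHITELIST_FROM):
        return "inbox"
    # Set 2: simple spam markers, in no particular order.
    if any(addr in sender for addr in BAD_SENDERS) or sender.strip() == "<>":
        return "spam"
    if any(marker in header_text for marker in SPAM_HEADER_STRINGS):
        return "spam"
    if "content-type: audio" in raw.lower():  # also catches MIME part headers
        return "spam"
    # Set 3: a literal match on the To: field.
    if any(addr in to for addr in MY_ADDRESSES):
        return "inbox"
    # Set 4: my address appears elsewhere in the header (for example Cc:).
    if any(addr in header_text for addr in MY_ADDRESSES):
        return "inbox-other"
    # Set 5: whatever is left is probably spam (for example Bcc-only mail).
    return "spam"


print(classify("From: <>\nTo: me@example.com\nSubject: hi\n\nbody"))
```

The order matters: the whitelist runs first so that trusted senders with odd headers are never caught by the Set 2 markers.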

2. Whitelist/verification filters
A fairly aggressive technique for spam filtering is what I would call the "whitelist plus automated verification" approach. There are several tools that implement a whitelist with verification: TMDA is a popular multi-platform open source tool; ChoiceMail is a commercial tool for Windows; most others seem more preliminary. (See Resources later in this article for links.)

A whitelist filter connects to an MTA and passes mail only from explicitly approved senders on to the inbox. Other messages generate a special challenge response to the sender. The whitelist filter's response contains some kind of unique code that identifies the original message, such as a hash or sequential ID. This challenge message contains instructions for the sender to reply in order to be added to the whitelist (the response message must contain the code generated by the whitelist filter). Almost all spam messages contain forged return address information, so the challenge usually does not even arrive anywhere; but even those spammers who provide usable return addresses are unlikely to respond to a challenge. When a legitimate sender answers a challenge, her/his address is added to the whitelist so that any future messages from the same address are passed through automatically.
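The challenge/response cycle just described can be sketched in a few lines. The class and method names are hypothetical and do not reflect the interface of any of the tools named above; a random token stands in for the "hash or sequential ID", and delivery of the challenge e-mail itself is left out.

```python
import secrets


class WhitelistFilter:
    """Toy whitelist with challenge codes; not the API of any real tool."""

    def __init__(self, approved=()):
        self.approved = {addr.lower() for addr in approved}
        self.pending = {}  # challenge code -> (sender, held message)

    def receive(self, sender, message):
        """Return ('deliver', message) or ('challenge', code)."""
        sender = sender.lower()
        if sender in self.approved:
            return ("deliver", message)
        code = secrets.token_hex(8)  # unique code identifying the held message
        self.pending[code] = (sender, message)
        return ("challenge", code)

    def confirm(self, sender, code):
        """Process a challenge reply; return the released message or None."""
        entry = self.pending.get(code)
        if entry is None or entry[0] != sender.lower():
            return None
        del self.pending[code]
        self.approved.add(entry[0])  # future mail passes automatically
        return entry[1]


mail_filter = WhitelistFilter(["alice@example.org"])
action, code = mail_filter.receive("bob@example.org", "hello")
print(action)
```

A message from an unknown sender is held until that same sender returns the code, after which the address joins the whitelist and the held message is released.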

Although I have not used any of these tools more than experimentally myself, I would expect whitelist/verification filters to be very nearly 100% effective in blocking spam messages. It is conceivable that spammers will start adding challenge responses to their systems, but this could be countered by making challenges slightly more sophisticated (for example, by requiring small human modification to a code). Spammers who respond, moreover, make themselves more easily traceable for people seeking legal remedies against them.

The problem with whitelist/verification filters is the extra burden they place on legitimate senders. Inasmuch as some correspondents may fail to respond to challenges -- for any reason -- this makes for a type of false positive. In the best case, a slight extra effort is required for legitimate senders. But senders who have unreliable ISPs, picky firewalls, multiple e-mail addresses, non-native understanding of English (or whatever language the challenge is written in), or who simply overlook or cannot be bothered with challenges, may not have their legitimate messages delivered. Moreover, sometimes legitimate "correspondents" are not people at all, but automated response systems with no capability of challenge response. Whitelist/verification filters are likely to require extra efforts to deal with mailing-list signups, online purchases, Web site registrations, and other "robot correspondences".

3. Distributed adaptive blacklists
Spam is almost by definition delivered to a large number of recipients. And as a matter of practice, there is little if any customization of spam messages to individual recipients. Each recipient of a spam, however, in the absence of prior filtering, must press his own "Delete" button to get rid of the message. Distributed blacklist filters let one user's Delete button warn millions of other users as to the spamminess of the message.

Tools such as Razor and Pyzor (see Resources) operate around servers that store digests of known spams. When a message is received by an MTA, a distributed blacklist filter is called to determine whether the message is a known spam. These tools use clever statistical techniques for creating digests, so that spams with minor or automated mutations (or just different headers resulting from transport routes) do not prevent recognition of message identity. In addition, maintainers of distributed blacklist servers frequently create "honey-pot" addresses specifically for the purpose of attracting spam (but never for any legitimate correspondences). In my testing, I found zero false positive spam categorizations by Pyzor. I would not expect any to occur using other similar tools, such as Razor.
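
To make the digest idea concrete, here is a deliberately crude sketch. It is not the algorithm Pyzor or Razor uses. The normalization steps (lowercasing, dropping digits, collapsing whitespace) and the DigestCatalog class, which stands in for the network server, are assumptions chosen only to show how minor mutations can map to the same digest.

```python
import hashlib
import re


def body_digest(body):
    """Hash a normalized message body so trivial variations share one digest."""
    text = body.lower()
    text = re.sub(r"\d+", "", text)           # drop digits such as serial numbers
    text = re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


class DigestCatalog:
    """In-memory stand-in for a shared server of known-spam digests."""

    def __init__(self):
        self.known_spam = set()

    def report(self, body):
        """Record a message that some user (or honey-pot address) flagged as spam."""
        self.known_spam.add(body_digest(body))

    def check(self, body):
        """Return True if this message matches a previously reported spam."""
        return body_digest(body) in self.known_spam
```

Because only digests of reported messages are stored, a private legitimate message can match only if someone else reported an essentially identical one, which is consistent with the low false-positive rate described above.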

There is some common sense to this. Even those ill-intentioned enough to taint legitimate messages would not have samples of my good messages to report to the servers -- it is generally only the spam messages that are widely distributed. It is conceivable that a widely sent, but legitimate message such as the developerWorks newsletter could be misreported, but the maintainers of distributed blacklist servers would almost certainly detect this and quickly correct such problems.

As the summary table below shows, however, false negatives are far more common using distributed blacklists than with any of the other techniques I tested. The authors of Pyzor recommend using the tool in conjunction with other techniques rather than as a single line of defense. While this seems reasonable, it is not clear that such combined filtering will actually produce many more spam identifications than the other techniques by themselves.

In addition, since distributed blacklists require talking to a server to perform verification, Pyzor performed far more slowly against my test corpora than did any other techniques. For testing a trickle of messages, this is no big deal, but for a high-volume ISP, it could be a problem. I also found that I experienced a couple of network timeouts for each thousand queries, so my results have a handful of "errors" in place of "spam" or "good" identifications.

4. Rule-based rankings
The most popular tool for rule-based spam filtering, by a good margin, is SpamAssassin. There are other tools, but they are not as widely used or actively maintained. SpamAssassin (and similar tools) evaluate a large number of patterns -- mostly regular expressions -- against a candidate message. Some matched patterns add to a message score, while others subtract from it. If a message's score exceeds a certain threshold, it is filtered as spam; otherwise it is considered legitimate.
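
The scoring scheme can be sketched as follows. The patterns, weights, and threshold here are invented for illustration and are not taken from SpamAssassin's rule set.

```python
import re

# Hypothetical rules: (compiled pattern, score). Negative scores suggest legitimate mail.
RULES = [
    (re.compile(r"herbal viagra", re.I), 2.5),
    (re.compile(r"<script", re.I), 3.0),
    (re.compile(r"100% free", re.I), 1.5),
    (re.compile(r"^In-Reply-To:", re.I | re.M), -2.0),
]
THRESHOLD = 5.0  # assumed cutoff for this sketch


def score(message):
    """Sum the scores of every rule whose pattern matches the message."""
    return sum(weight for pattern, weight in RULES if pattern.search(message))


def is_spam(message, threshold=THRESHOLD):
    """A message is filtered as spam only if its score exceeds the threshold."""
    return score(message) > threshold
```

Keeping the rules in a plain list makes the maintenance burden visible: every new scam needs a new pattern and a hand-chosen weight.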

Some ranking rules are fairly constant over time -- forged headers and auto-executing JavaScript, for example, almost timelessly mark spam. Other rules need to be updated as the products and scams advanced by spammers evolve. Herbal Viagra and heirs of African dictators might be the rage today, but tomorrow they might be edged out by some brand new snake-oil drug or pornographic theme. As spam evolves, SpamAssassin must evolve to keep up with it.

The README for SpamAssassin makes some very strong claims:
In its most recent test, SpamAssassin differentiated between spam and non-spam mail correctly in 99.94% of cases. Since then, it's just been getting better and better!

My testing showed nowhere near this level of success. Against my corpora, SpamAssassin had about 0.3% false positives and a whopping 19% false negatives. In fairness, this only evaluated the rule-based filters, not the optional checks against distributed blacklists. Additionally, my spam corpus is not purely spam -- it also includes a large collection of what are probably virus attachments (I do not open them to check for sure, but I know they are not messages I authorized). SpamAssassin's FAQ disclaims responsibility for finding viruses; on the other hand, the below techniques do much better in finding them, so the disclaimer is not all that compelling.

SpamAssassin runs much more quickly than distributed blacklists, which need to query network servers. But it also runs much more slowly than even non-optimized versions of the below statistical models (written in interpreted Python using naive data structures).

5. Bayesian word distribution filters
Paul Graham wrote a provocative essay in August 2002. In "A Plan for Spam" (see Resources later in this article), Graham suggested building Bayesian probability models of spam and non-spam words. Graham's essay, or any general text on statistics and probability, can provide more mathematical background than I will here.

The general idea is that some words occur more frequently in known spam, and other words occur more frequently in legitimate messages. Using well-known mathematics, it is possible to generate a "spam-indicative probability" for each word. Another simple mathematical formula can be used to determine the overall "spam probability" of a novel message based on the collection of words it contains.
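
A simplified sketch of both steps follows. It is not Barham's implementation, and it omits refinements from Graham's essay such as weighting the good corpus more heavily. The function names, the token pattern, counting each word once per message, the 0.01 to 0.99 clamp, the 0.4 default for unseen words, and the limit of 15 words are choices made for this example.

```python
import re
from math import prod  # requires Python 3.8 or later

TOKEN = re.compile(r"[A-Za-z0-9$'-]+")


def tokens(text):
    """Return the set of distinct word-like tokens in a message."""
    return set(TOKEN.findall(text.lower()))


def train(spam_msgs, good_msgs):
    """Build a per-word spam probability from two categorized corpora."""
    spam_counts, good_counts = {}, {}
    for msgs, counts in ((spam_msgs, spam_counts), (good_msgs, good_counts)):
        for msg in msgs:
            for tok in tokens(msg):
                counts[tok] = counts.get(tok, 0) + 1
    n_spam, n_good = max(len(spam_msgs), 1), max(len(good_msgs), 1)
    probs = {}
    for tok in set(spam_counts) | set(good_counts):
        s = spam_counts.get(tok, 0) / n_spam  # fraction of spam containing the word
        g = good_counts.get(tok, 0) / n_good  # fraction of good mail containing it
        probs[tok] = min(0.99, max(0.01, s / (s + g)))  # clamp away from 0 and 1
    return probs


def spam_probability(text, probs, unknown=0.4, keep=15):
    """Combine the most extreme word probabilities into one message probability."""
    ps = [probs.get(tok, unknown) for tok in tokens(text)]
    ps.sort(key=lambda p: abs(p - 0.5), reverse=True)  # most "interesting" first
    ps = ps[:keep]
    spam_like = prod(ps)
    good_like = prod(1 - p for p in ps)
    return spam_like / (spam_like + good_like)
```

The combining step treats the words as independent evidence, which is the "naive" assumption behind this family of filters.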

Graham's idea has several noteworthy benefits:

1. It can generate a filter automatically from corpora of categorized messages rather than requiring human effort in rule development.
2. It can be customized to individual users' characteristic spam and legitimate messages.
3. It can be implemented in a very small number of lines of code.
4. It works surprisingly well.

At first blush, it would be reasonable to suppose that a set of hand-tuned and laboriously developed rules like those in SpamAssassin would predict spam more accurately than a scattershot automated approach. It turns out that this supposition is dead wrong. A statistical model basically just works better than a rule-based approach. As a side benefit, a Graham-style Bayesian filter is also simpler and faster than SpamAssassin.

Within days -- perhaps hours -- of Graham's article being published, many people simultaneously started working on implementing the system. For purposes of my testing, I used a Python implementation created by a correspondent of mine named John Barham. I thank him for providing his implementation. However, the mathematics are simple enough that every other implementation is largely equivalent.

There are some issues of data structures and storage techniques that will affect operating speed of different tools. But the actual predictive accuracy depends on very few factors -- the main significant factor is probably the word-lexing technique used, and this matters mostly for eliminating spurious random strings. Barham's implementation simply looks for relatively short, disjoint sequences of characters in a small set (alphanumeric plus a few others).

6. Bayesian trigram filters
Bayesian techniques built on a word model work rather well. One disadvantage of the word model is that the number of "words" in e-mail is virtually unbounded. This fact may be counterintuitive -- it seems reasonable to suppose that you would reach an asymptote once almost all the English words had been included. From my prior research into full text indexing, I know that this is simply not true; the number of "word-like" character sequences possible is nearly unlimited, and new text keeps producing new sequences. This fact is particularly true of e-mails, which contain random strings in Message-IDs, content separators, UU and base64 encodings, and so on. There are various ways to throw out words from the model (the easiest is just to discard the sufficiently infrequent ones).

I decided to look into how well a much more starkly limited model space would work for a Bayesian spam filter. Specifically, I decided to use trigrams for my probability model rather than "words". This idea was not invented out of whole cloth, of course; there is a variety of research into language recognition/differentiation, cryptographic unicity distances of English, pattern frequencies, and related areas, that strongly suggests trigrams are a good unit.

There were several decisions I made along the way. The biggest choice was deciding what a trigram is. While this is somewhat simpler than identifying a "word", the completely naive approach of looking at every (overlapping) sequence of three bytes is non-optimal. In particular, considering high-bit characters -- although occurring relatively frequently in multi-byte character sets (in other words, CJK) -- forces a much bigger trigram space on us than does looking only at the ASCII range. Limiting the trigram space even further than to low-bit characters produces a smaller space, but not better overall results.

For my trigram analysis, I utilized only the most highly differentiating trigrams as message categorizers. But I arrived at the chosen numbers of "spam" and "good" trigrams only by trial and error. I also picked the cutoff probability for spam rather arbitrarily: I made an interesting discovery that no message in the "good" corpus was assigned a spam probability above 0.0071 other than two false positives in the 0.99 range. Lowering my cutoff from an initial 0.9 to 0.1, however, allowed me to catch a few more messages in the "spam" corpus. For purposes of speed, I select no more than 100 "interesting" trigrams from each candidate message -- changing that 100 to something else can produce slight variations in the results (but not in an obvious direction).
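
The extraction and selection steps might look roughly like this. It is a sketch rather than the prototype tools mentioned in Resources: it works on Python strings instead of raw bytes, and the function names and the "distance from 0.5" measure of interest are assumptions.

```python
def trigrams(text):
    """Yield every overlapping three-character sequence made only of ASCII characters."""
    for i in range(len(text) - 2):
        tri = text[i:i + 3]
        if tri.isascii():  # skip trigrams containing high-bit characters (Python 3.7+)
            yield tri


def interesting(text, probs, limit=100):
    """Pick up to `limit` known trigrams whose spam probability is farthest from 0.5."""
    seen = {tri for tri in trigrams(text) if tri in probs}
    return sorted(seen, key=lambda tri: abs(probs[tri] - 0.5), reverse=True)[:limit]
```

Here probs is assumed to be a dictionary mapping each retained trigram to its spam probability, built from the two corpora in the same way as for a word model.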

Summary
Given the testing methodology described earlier, let's look at the concrete testing results. While I do not present any quantitative data on speed, the chart is arranged in order of speed, from fastest to slowest. Trigrams are fast, Pyzor (network lookup) is slow. In evaluating techniques, as I stated, I consider false positives very bad, and false negatives only slightly bad. The quantities in each cell represent the number of correctly identified messages vs. incorrectly identified messages for each technique tested against each body of e-mail, good and spam.

Table 1. Quantitative accuracy of spam filtering techniques
(each cell: correctly identified vs. incorrectly identified)

Technique        Good corpus            Spam corpus
"The Truth"      1851 vs. 0             1916 vs. 0
Trigram model    1849 vs. 2             1774 vs. 142
Word model       1847 vs. 4             1819 vs. 97
SpamAssassin     1846 vs. 5             1558 vs. 358
Pyzor            1847 vs. 0 (4 err)     943 vs. 971 (2 err)
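
The counts in Table 1 convert to error rates with simple division. The short script below does that arithmetic. Dividing by the full corpus sizes (1851 good, 1916 spam), so that Pyzor's timeout errors count as neither right nor wrong, is a choice made for this sketch.

```python
# Counts copied from Table 1: (good correct, good wrong, spam correct, spam wrong).
RESULTS = {
    "Trigram model": (1849, 2, 1774, 142),
    "Word model": (1847, 4, 1819, 97),
    "SpamAssassin": (1846, 5, 1558, 358),
    "Pyzor": (1847, 0, 943, 971),
}
GOOD_TOTAL, SPAM_TOTAL = 1851, 1916  # corpus sizes from "The Truth" row


def error_rates(name):
    """Return (false positive %, false negative %) for one technique."""
    _good_ok, good_wrong, _spam_ok, spam_wrong = RESULTS[name]
    return 100 * good_wrong / GOOD_TOTAL, 100 * spam_wrong / SPAM_TOTAL


if __name__ == "__main__":
    for name in RESULTS:
        fp, fn = error_rates(name)
        print(f"{name:14s} false positives {fp:5.2f}%   false negatives {fn:5.2f}%")
```

For SpamAssassin this gives roughly 0.3% false positives and 19% false negatives, the figures quoted in the rule-based section.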

Resources

* The TMDA home page provides more information about the Tagged Message Delivery Agent.

* You can get more information about ChoiceMail from DigitalPortal Software.

* Pyzor is a Python-based distributed spam catalog/filter.

* Vipul's Razor is a very popular distributed spam catalog/filter. Razor is optionally called by a number of other filter tools, such as SpamAssassin.

* Read Paul Graham's essay "A Plan for Spam."

* Eric Raymond has created a fast implementation of Paul Graham's idea under the name "bogofilter." In addition to using some efficient data representation and storage strategies, bogofilter tries to be smart about identifying what makes a meaningful word.

* My own trigram-based categorization tools are still at an early alpha or prototype level. However, you are welcome to use them as a basis for development. They are public domain, like all the tools I write for developerWorks articles.

* Lawrence Lessig has written a number of books and articles that insightfully contrast what he metonymically calls "west-coast code" and "east-coast code," in other words, the laws passed in Washington D.C. (and elsewhere) versus the software written in Silicon Valley (and elsewhere). I've written a short review of Lessig's Code and Other Laws of Cyberspace. See Lessig's Web site for more to think about.

* Find more Linux articles in the developerWorks Linux zone.

About the author
David Mertz dislikes spam. He wishes to thank Andrew Blais for assistance in this article's testing, as well as for listening to David's peculiar fascination with trigrams and their distributions. David may be reached at mertz@gnosis.cx; his life pored over at http://gnosis.cx/publish/. Suggestions and recommendations on this, past, or future articles are welcome.

Posted by Lisa at 07:45 PM

August 21, 2002

Using Bad Poetry To Fight Spam?

If this isn't a sure sign of desperation, I don't know what is.
(Also, this year's rash of cyber-haiku is officially being taken a little too far, don't cha think?)

See the story by Michelle Delio for Wired News:
Haiku'da Been a Spam Filter.

Posted by Lisa at 05:00 PM

August 07, 2002

New Salon Story By Katharine Mieszkowski

The latest wave of spam bots have their own love-starved human
knowledge workers:

The bot who loved me.

here's the article in case the link goes bad:

page 1

The bot who loved me
Are those secret-admirer e-mails real -- or just the latest excrescence of an Internet marketing machine grown unfathomably sleazy?

- - - - - - - - - - - -
By Katharine Mieszkowski

Aug. 7, 2002 | He has blond hair, blue eyes and a sarcastic sense of humor. He's an artist, writer or musician, between the ages of 28 and 32. His idea of fun on a first date is a walk in the park, but he hankers to go on an African safari.

And this man -- whoever he is -- likes me. The Internet told me so.

Just a few walks in the park from now, I could be on the savanna in Zaire with Mr. X, trading acerbic remarks about the redoubtable mating habits of wildebeests.

There's just one hitch: I'm not convinced that this secret admirer actually exists. He may just be the bot who loved me.

A flirty e-mail from matchmaker@someonelikesyou.com tipped me off to this mystery man's tender crush. "You have a secret admirer!" gushed the message. Like half a dozen similar Web sites -- eCrush, Crushlink and SecretAdmirer.com among them -- SomeoneLikesYou plays Internet go-between. The gimmick: An anonymous e-mail crush notification service can pave the way for romance without the risk of rejection.

But while most of these "crush" sites operate above-board, proudly listing the founders' names and e-mail addresses, the cupids behind SomeoneLikesYou and its corporate sister site, Crushlink, play hard to get. The sites conceal the identities not only of the source of your crush note, but also of the people who run the services. Even some of the publicly available domain-name registration information about the sites is fake.

This secrecy, along with the sheer volume of admiring messages spewing from crushmaster@crushlink.com and matchmaker@someonelikesyou.com, has raised speculation that there's less romance than savvy marketing going on here. Competitors accuse Crushlink and SomeoneLikesYou of spamming any old e-mail address they can scrape off the Net with love notes, building membership by preying on sad-sack lonely-hearts -- then peddling affiliate programs to those members to bring in some cash.

"My dog has gotten 'someone has a crush on you' e-mail messages -- she's a cute dog, but no one has a crush on her," says Karen Demars, co-founder of eCrush. "My belief is that they are sending 'someone has a crush on you' messages to people who have not been legitimately crushed."

One consumer advocacy group in California is even threatening a lawsuit against Crushlink for misleading consumers about their love lives. And vigilant webmasters and anti-spam crusaders, suspicious that the sites are simply cynical e-mail harvesters, charge "spam!"

Forget "Who's my crush?" The more interesting question is: who's the crushmaster?

Is Mr. Crush really Mr. Spammer in a cupid's costume, breeding false hopes among the lovelorn with fake messages about nascent crushes that don't really exist? Or could the crushmaster be a scorned lover turning his vindictive rage on the Net's lonely millions in a frenzy of mixed messages? Or, maybe, just maybe, there's actually this much latent love out there on the Web, just waiting for the right database to come along and play yenta.

All the accusations of nefarious behavior and the secrecy surrounding these sites have made unmasking the identities of the frenzied cupids behind them a true Internet whodunit. After all, for geeks, speculating about the identity of a mysterious webmaster is as captivating as thinking about who might have a crush on you.

By following the geeks' trail in the ether, I found out who the crushmaster is -- and just like Mr. Right, he's the kind of guy you'd least expect.

page 2

SomeoneLikesYou and Crushlink represent a more extreme version of what all crush sites do. They inspire you to reveal your own crushes' e-mail addresses by dangling the lure that they know who wants you.

To find out what guy would be such a fourth-grader as to reveal his interest in me in this cheesy way, I first registered at SomeoneLikesYou, giving away a bevy of valuable demographic facts about myself in the process, like my date of birth and my ZIP code. Then I filled out a profile from a fixed menu of canned choices, indicating my hair color, eye color and ideal first date.

Finally, I was invited to offer all my own crushes' e-mail addresses up for sacrifice.

If I guess who my secret admirer is and turn over his e-mail address to the site, our identities will be revealed to each other, and we could be pricing safaris before the week is out!

But if there's no love connection, every address I've given to the site will get a message announcing "You have a secret admirer!" and the whirlwind of anonymous, crazy-making romantic madness just spreads.

What makes SomeoneLikesYou and Crushlink different from the rest of the sites in the genre is this: they bait hopeful visitors to hand over as many e-mail addresses as possible by trading clues for e-mail addresses.

The more e-mail addresses you reveal to SomeoneLikesYou, the more hints you get about your admirer's identity, like his hair color and his approximate age. Five e-mail addresses generate one clue. I gave away more than two dozen e-mail addresses before the system ran out of hints about my admirer. Not even the most love-sick puppy has that many real crushes.

So, what's stronger -- the hunger for any clue that might unmask your own admirer, or the desire to protect the in-boxes of your friends, loved ones and colleagues from random romance spam, which could potentially embarrass you in the process? "She has a crush on me! Yikes!" And is it really spam if friends or colleagues have sold out your address in their own search for romance?

I elected to take a middle road, which wouldn't embarrass me or abuse my friends' trust, but might turn up enough hints to reveal my crush. I gamed the system by entering random, made-up e-mail addresses, potentially muddling the in-boxes (and sanity) of total strangers in pursuit of my own love interest.

Crushes -- they make people do crazy things.

But the system anticipates this simple ploy. If a made-up e-mail address I turned over bounced, SomeoneLikesYou just demanded another one.

This clues system helps explain why the SomeoneLikesYou and Crushlink romance virus has spread so far. A single wistful crushee hankering to know who likes her can generate dozens of "crush" messages to people she doesn't even know, which will likely spur some percentage of those suckers to spread the love as well.

That's got the California Consumer Action Network, a nonprofit consumer advocacy group, considering filing a lawsuit against these online cupids, according to the group's attorney, Joe Hughes. He charges that the site is violating the state's laws against unfair and deceptive advertising.

"We're concerned about the fact that it's a spam generator. They're implying to the user that they're going to find out if the e-mail address they enter is someone who has a crush on them, although it's probably more likely that someone is doing just what they're doing, which is guessing who had a crush on them." Could a class action lawsuit of lovelorn crushees hurt by messages about fake admirers be far behind?

The more I learned about the "someone" who likes me, the less real he seemed.

The e-mail that I got from this "secret admirer" came to an official corporate address that no friend would use. Besides, the "hints" I received about my admirer bore an uncanny resemblance to what I told the system about myself when I registered.

Maybe my account had just become a bit of currency to buy someone else a "hint." But the competitors to SomeoneLikesYou and Crushlink in the online crush space say that it's more than just this hints system that's generating all those befuddling crush messages.

Clark Benson, the co-founder of eCrush, says: "Crushlink must have bought tons of spam lists. The site went from nothing to a million visitors in no time. In about two weeks, everybody's accounts here were getting Crushlink e-mails." Among the addresses at eCrush that have gotten "crush" messages from Crushlink and SomeoneLikesYou: webmaster@ecrush.com, bizdev@ecrush.com, jobs@ecrush.com and Maggie@ecrush.com, a joke account for his co-founder's dog, which is published on the eCrush site.

Demars, the eCrush co-founder who owns Maggie, charges: "They're obtaining e-mail addresses in a way that is either technically generated or generated out of a hostage marketing situation (want a hint? Just give us five e-mail addresses!) that are just not truly the product of someone having a crush on you."

Miles Kronby, the founder of SecretAdmirer -- the grandfather of the concept, launched in 1997 -- won't name names, but says that he's watched the e-mail crush concept take a hurtful, debauched turn: "The problem is, some unscrupulous people running these things decided to abuse this system as a kind of spam generator," he sighs.

Perhaps the most extreme is the Crush007 site. (Note: Clicking on the link will open a lot of advertising windows.) Based in Malaysia, it sends a fake crush e-mail to an unsuspecting stooge. The site then goads the sucker to reveal all kinds of personal facts, including "how many times does she/he masturbate a week?" and "names of his/her biggest crush." The homepage makes no secret about its motives: "We have developed this website just to help you find out who your friend's crushes are, and also not to mention, their biggest, most well kept secrets." Fear for the dorkiest kid in the class, thrilled that someone actually has a crush on him, who is about to be the victim of an Internet humiliation machine.

But carping competitors aren't the only ones who think that all these anonymous romance e-mails have taken a sick and twisted turn. Several geeks, webmasters and spam fighters have put these love messages to the spam test and gone on a Web vigilante mission to find out who's behind them. If they couldn't find out who had crushes on them, at least they could figure out who was generating all those love notes!

page 3

"Warning: crushlink is a spam scam," warns "Steve," a geek who refuses to reveal his real identity for fear of being sued, on a Web page set up to discuss his experience with the site. After he received a "crush" message, he became convinced that Crushlink was a system for harvesting e-mail addresses, so he registered for the site with an account at his own domain that he'd never used for anything else. Several months later, this account got a message from something called "Jennyslist."

Justin Beech, the webmaster behind Broadbandreports.com, went on his own sleuthing mission to unmask Mr. Crush after webmasters on his site groused that Crushlink and SomeoneLikesYou were fomenting spam, not romance. Although the WHOIS records for both sites are at least partially fake -- for instance, the phone number for Crushlink is listed as 800-000-0000 -- their Web server IP addresses don't lie. Beech linked both sites to Jumpstart Technologies LLC, a "direct-marketing" company. His research led him to finger Johann Schleier-Smith, a Harvard graduate and currently a physics grad student at Stanford, as Mr. Crush.

But it was Rob Whelan, a 40-year-old CIO for a retailing company in Tennessee, who finally turned up the guy who will admit to being the president and co-owner of Crushlink, Mr. Crush himself.

When Whelan got his "crush" message from Crushlink, he was immediately suspicious: "I'm not 12, so it seemed odd that I would get a message like this," he says. He contacted anti-spam organizations, the Federal Trade Commission and CyberAngels, a group that protects children online. After a few weeks of mucking around, threats to sue prompted a nervous phone call from one Greg Tseng, another Stanford physics grad student, who also went to Harvard as an undergrad.

As a sophomore in college, Tseng started a dot-com called flyingchickens.com, which sought to take on Harvard's Coop by selling textbooks. (Johann Schleier-Smith, also then a student at Harvard, co-founded the site.) Flyingchickens soon merged with something called Limespot.com, a college-event listing site.

In short, these two embodied the late-'90s, dot-com poster-boy ideal -- techie, entrepreneurial undergrads so brimming with Big Ideas that they couldn't wait for graduation to start launching companies.

These weren't the stereotypical lowlife spammers that Whelan expected to find on the other end of his Crushmail. "Greg Tseng is a very bright young man, and unfortunately he's chosen this vocation for himself," sighs Whelan. "He does have a good entrepreneurial spirit, but I think that he's just misguided."

Whelan worries about the hurt feelings of kids who won't think twice about dumping their friends' e-mail addresses into a system that will send anonymous messages misleading them that romance is just at the other end of an "@" sign. "These guys think they're going to make a lot of money and not hurt anybody, but they're really just going to make a lot of money," says Whelan. "And they're not going to ever know or see or hear from the people who are hurt by this."

But worse than teenage false hopes, Whelan is concerned that parents have no way to opt their kids out. And he charges that the system lures kids to lie about their ages to Crushlink's and SomeoneLikesYou's marketing partners, who don't want 12-year-olds as customers. That's because one way to get "hints" to your admirer's identity on Crushlink is to register for an affiliated site's marketing program, like Netflix, which pays Crushlink a bounty for every person who signs up. SomeoneLikesYou takes this scheme even further. Even if you guess your crush correctly, you either have to sign up for an affiliate's program or pay $14.90 to find out who your admirer actually is.

After much stalking, both online and off, I finally tracked Tseng down. Although he demurely refused to speak to me on the phone or answer any specific questions about the charges leveled against his online love-note machines, he did send a few comments in one e-mail.

He maintained that the secrecy surrounding who's involved in the company is simply because they're in "stealth mode." But he outright denied spamming anyone with missives that might breed romantic delusions: "We do not sell or rent our user list to third parties (a.k.a. 'spam')," he wrote. "We do not purchase lists or harvest e-mail addresses. All of our outbound e-mails are either user-generated notices or communications with our registered users. We send precisely zero e-mail advertisements."

At least in one limited instance, this statement appears false. Remember Jennyslist, which messaged "Steve," after he registered for Crushlink with an address that he'd used for nothing else? A business acquaintance of Tseng's reveals that Jennyslist.com is a project of Jumpstart Technologies. Isn't this advertising? Tseng declined to comment.

Oh, maybe we're all just such doubting Thomases about the idea that anyone might actually like us that we can't face the possibility of new romance, even when it shows up right in our in-boxes. Tseng seems to think so: "Some people may be confused about the origin of the 'Someone has a crush on you' notices but actually every single person that receives such a notice was listed as a crush by a registered user (and they should come to CrushLink to find out who!)."

Really? Then, prove to me that this person who claimed to admire me really exists, I demanded. But Tseng stayed mum. He had the perfect excuse, not that he bothered to offer it: Selling out the guy who likes me (if he exists) would violate the site's whole premise -- crush notification without the risk of rejection.

So maybe the evil genius of SomeoneLikesYou isn't that it's a love machine at all, but that it's an Internet Narcissus' pool. In this scenario, the love automaton feeds you hints about your "secret admirer," based on the profile you entered about yourself. You have so much in common!

More likely, the messages I got from SomeoneLikesYou came from someone who offered up my e-mail address when he or she tried to game the system to find out who likes them -- just as I did.

Or maybe there really is some blond-haired, blue-eyed, sarcastic guy biding his time surfing African safari Web sites, while he nurtures his fervent hope that the Internet will be our go-between.

Only the matchmaker knows for sure, and that pathological flirt's not telling.

Posted by Lisa at 08:24 AM