chairs message

Subject: Re: [chairs] SPAM
From: "Karl F. Best" <karl.best@oasis-open.org>
To: Duane Nickull <dnickull@adobe.com>, chairs@lists.oasis-open.org
Date: Tue, 13 Apr 2004 11:41:19 -0400
Chairs:

I'll open another can of worms and jump into this :-)

I agree with you wholeheartedly, Duane, that this is a problem. I'll bet 
that I get more spam than you do (a few hundred a day). And I have no 
doubt that all this is because of spammers harvesting addresses from our 
list archives.

Of course a knee-jerk reaction would be to close off the archives so 
that nobody can get to them, but given that the OASIS philosophy is 
openness and accountability we need to keep things open and accessible.

There seem to be two possible solutions: either disguise the addresses 
stored in the archives, or somehow block access so that only a human 
can get through. (I don't think that we want to go down the path of an 
offensive strategy such as what Duane suggests.)

Lacking a foolproof Turing test to allow only human access to the 
archives, I think the best and easiest solution will probably be to 
disguise the email addresses attached to each message so that whatever 
is harvested is unusable by spammers. The disguise would have to be such 
that the harvester would not be able to accurately or easily recreate 
the address. Obviously substituting the word "at" for the @ sign isn't 
going to fool anybody for very long. But whatever we do must not disguise 
the actual identity of the sender; we need to know who sent the message.

A final question is whether it is necessary for a person to be able to 
respond to a message he found in the archives; i.e. does the guy on the 
street need to be able to figure out how to respond to Duane when he 
reads something that Duane wrote? Perhaps this requirement is not so 
important, as TC members already know how to respond to the TC list, and 
the guy on the street is already given instructions for sending a 
comment to the TC.

If the above is acceptable then perhaps I could suggest (and please 
note, this is just a strawman for discussion, not an official OASIS 
proposal) that we delete some portion of the address after the @ sign. 
We could delete all of it, leaving just "duane@", for example, but then 
we lose any idea about what company Duane was at, whether Yellow Dragon 
or Adobe (and it may be important for IPR reasons to know). So maybe we 
could leave the first couple of characters after the @ sign, resulting 
in "duane@ye" or "duane@ad". If we left three characters then we'd get 
"sun" and "ibm" etc. which would make it possible to reconstruct the 
address. But then again with only two we would get "hp".
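
To make the strawman concrete, here is a minimal sketch of the truncation 
in Python. The function name disguise_address and the keep parameter are 
made up for illustration; this is not an existing OASIS or list-archive 
tool, just one way the idea could be expressed.

```python
def disguise_address(address: str, keep: int = 2) -> str:
    """Keep the local part and only the first `keep` characters after the @ sign.

    Hypothetical helper for discussion; `keep` is an assumed parameter name.
    """
    # Split on the last "@" so anything before it is treated as the local part.
    local, sep, domain = address.rpartition("@")
    if not sep:
        # No "@" found: not an email address, so leave the text unchanged.
        return address
    return f"{local}@{domain[:keep]}"


if __name__ == "__main__":
    print(disguise_address("duane@adobe.com"))          # two characters kept
    print(disguise_address("duane@adobe.com", keep=0))  # everything after @ removed
    print(disguise_address("someone@sun.com", keep=3))  # short domain name survives whole
```

With keep=3 a name such as "sun" comes through intact, and with keep=2 so 
does "hp", which is the weakness noted above: for short company names the 
remainder of the address is easy to guess.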

So, any comments on whether it should be a requirement for a human to 
still be able to figure out the email address? And, if that's not a 
requirement, what do you think of my above suggestion?

-Karl

p.s. Duane, I hope you don't mind me using you as the example :-)





Duane Nickull wrote:
> I am getting ruthlessly spammed and every day it increases.
> 
> After careful analysis, I have deduced that my email address is most 
> often harvested from OASIS list archives.
> I would favor setting up a system that makes it harder for spammers to 
> harvest email addresses from this list by confusing the heuristic filters.
> 
> Others have done something like this to fight it
> 
> dnickull(at)adobe.com - replace the (at) with the "@" sign to email.
> 
> but this is too easy to program around.
> 
> I couldn't sleep last night and came up with a more devious plot to foil 
> the spammers.  What if we adopted both a defensive and offensive 
> strategy?  First of all, if we defensively replaced all the email 
> addresses in the archives with something that confused the spam 
> harvesters like
> 
> "dnickull" + [some_randomness_here] + domainname + {something else to 
> hide the domain suffix - .com, .org, .gov}
> 
> that would potentially cut down email addresses getting harvested.
> 
> Second, as an offensive weapon, make some dynamic pages that detect 
> patterns in the log files of a bot looking for email addresses 
> (such as a repeated get() for more than 10 archive pages within a 
> certain timeframe) and then generate hundreds of email addresses 
> that are invisible to the human eye, but based on the address the 
> get originated from. 
> For example, if I send a get() request for the OASIS archives 
> from IP address 216.154.143.253, the page would generate 100's of hidden 
> email addresses, all   @216.154.143.253.  The IP address is a readily 
> available environment variable within an HTTP request scenario.
> 
> To the casual observer, there would be no difference in the page display 
> but to a spam email harvester, this would add 100's (perhaps 1,000's) of 
> emails that would end up with the spam harvester being the victim of 
> their own spam.
> 
> This could be both funny and help solve the problem.  This would also 
> not be too hard IMO to implement.
> 
> Thoughts?
> 
> Duane
> 


-- 
=================================================================
Karl F. Best
Vice President, OASIS
office  +1 978.667.5115 x206     mobile +1 978.761.1648
karl.best@oasis-open.org      http://www.oasis-open.org
Follow-Ups:
- Re: [chairs] SPAM
  - From: Duane Nickull <dnickull@adobe.com>
- Re: [chairs] SPAM
  - From: "David RR Webber" <david@drrw.info>
- Re: [chairs] SPAM
  - From: "Eve L. Maler" <Eve.Maler@Sun.COM>
References:
- SPAM
  - From: Duane Nickull <dnickull@adobe.com>