MailScanner Deficiency: Multi-Ruleset Processing per Email Recipient

Discussion:

Sam Gelbart

2014-07-11 08:51:06 UTC

Hi All,

We at SYNAQ have used MailScanner for many years, and as an email hygiene provider it has served us very well.
However, as we have grown (very rapidly in the past 6 months, to many more customer domains), we have noticed some deficiencies in MailScanner.

Below is a brief description covering our problem areas:

Overview
The issue has arisen because SYNAQ's client base keeps growing: we are provisioning more and more customers (and email domains) on our hygiene platform, and increasingly a single email is addressed to recipients in more than one of these customer domains, each with its own ruleset.

Problem 1
1) abc.co.za and xyz.co.za are both provisioned on our platform.
2) xyz.co.za has quarantining of SPAM configured, while abc.co.za does not (its spam action is to delete).
3) MailScanner accepts a message addressed to both domains for processing, but "chooses" user@abc.co.za and abc.co.za as the message's "to_address" and "to_domain".
4) MailScanner determines that the message is SPAM and, because it has "chosen" @abc.co.za as the email domain, it deletes the message, as the configured spam action for @abc.co.za is to delete.
5) However, the rule for xyz.co.za is to store/quarantine spam. Because of the actions above this does not happen, and no data for xyz.co.za is logged via MailWatch either.
6) The example above is based on a very simple scenario; as you are aware, the same applies to many more complex rulesets (size, file type, etc.) across the system.
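The gap between what Problem 1 describes and what each customer expects can be modelled in a few lines. This is a minimal Python sketch, not MailScanner code: `SPAM_ACTIONS`, `action_for_message` and `actions_per_recipient` are hypothetical names, and treating the first recipient as the "chosen" one is an assumption made only for illustration (the post does not say how MailScanner picks it).

```python
# Hypothetical per-domain spam actions mirroring Problem 1:
# abc.co.za deletes spam, xyz.co.za stores (quarantines) it.
SPAM_ACTIONS = {"abc.co.za": "delete", "xyz.co.za": "store"}


def domain_of(address: str) -> str:
    """Return the lower-cased domain part of an e-mail address."""
    return address.rsplit("@", 1)[1].lower()


def action_for_message(recipients: list[str]) -> str:
    """One action for the whole message, taken from a single "chosen"
    recipient (assumed here to be the first one)."""
    return SPAM_ACTIONS[domain_of(recipients[0])]


def actions_per_recipient(recipients: list[str]) -> dict[str, str]:
    """What the customers expect: each recipient gets its own domain's action."""
    return {rcpt: SPAM_ACTIONS[domain_of(rcpt)] for rcpt in recipients}


if __name__ == "__main__":
    rcpts = ["user@abc.co.za", "user@xyz.co.za"]
    print(action_for_message(rcpts))     # delete
    print(actions_per_recipient(rcpts))  # {'user@abc.co.za': 'delete', 'user@xyz.co.za': 'store'}
```

With a single action per message, the xyz.co.za recipient inherits abc.co.za's "delete" instead of its own "store".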

Problem 2
1) abc.co.za and xyz.co.za are both provisioned on our platform.
2) A third party emails both user@abc.co.za and user@xyz.co.za in a single email message.
3) MailScanner accepts the message for processing but "chooses" user@abc.co.za and abc.co.za as the message's "to_address" and "to_domain".
4) When the message is processed, the MailWatch.pm script receives a message object for SQL logging that contains data only for user@abc.co.za and abc.co.za; xyz.co.za is never logged.
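Problem 2 is the logging side of the same behaviour. The hypothetical sketch below contrasts one log row per message with one row per recipient; the `LogRow` type and both functions are illustrative only and are not part of MailWatch.pm, and the first recipient is again assumed to be the "chosen" one.

```python
from dataclasses import dataclass


@dataclass
class LogRow:
    # Hypothetical, trimmed-down stand-in for one MailWatch log record.
    message_id: str
    to_address: str
    to_domain: str


def _row(message_id: str, address: str) -> LogRow:
    """Build a row for one recipient, deriving to_domain from the address."""
    return LogRow(message_id, address, address.rsplit("@", 1)[1].lower())


def rows_single(message_id: str, recipients: list[str]) -> list[LogRow]:
    # Problem 2 as described: only the "chosen" (here: first) recipient is logged.
    return [_row(message_id, recipients[0])]


def rows_per_recipient(message_id: str, recipients: list[str]) -> list[LogRow]:
    # Desired behaviour: one row per recipient, so every domain sees the message.
    return [_row(message_id, rcpt) for rcpt in recipients]


if __name__ == "__main__":
    rcpts = ["user@abc.co.za", "user@xyz.co.za"]
    print([r.to_domain for r in rows_single("m1", rcpts)])         # ['abc.co.za']
    print([r.to_domain for r in rows_per_recipient("m1", rcpts)])  # ['abc.co.za', 'xyz.co.za']
```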

Finally, we have considered splitting incoming messages by recipient at the MTA level to address this problem, but our calculations show that it would require 3.5x more hardware to handle the increased mail load. So for us, a MailScanner solution is ideal.
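Splitting at the MTA works because each resulting copy carries recipients from only one domain, so only one domain's ruleset can apply to it. A minimal sketch of the grouping step, assuming (as in the examples above) that rulesets are defined per domain; per-user rules would need a split per recipient instead. `split_by_domain` is a hypothetical helper, not a feature of any particular MTA.

```python
from collections import defaultdict


def split_by_domain(recipients: list[str]) -> dict[str, list[str]]:
    """Group envelope recipients by (lower-cased) domain.

    Each group would be relayed as a separate copy of the message, so every
    copy matches exactly one domain's ruleset. First-seen order is preserved.
    """
    groups: defaultdict[str, list[str]] = defaultdict(list)
    for rcpt in recipients:
        groups[rcpt.rsplit("@", 1)[1].lower()].append(rcpt)
    return dict(groups)


if __name__ == "__main__":
    print(split_by_domain(["a@abc.co.za", "b@xyz.co.za", "c@abc.co.za"]))
    # {'abc.co.za': ['a@abc.co.za', 'c@abc.co.za'], 'xyz.co.za': ['b@xyz.co.za']}
```

Grouping by domain rather than by individual recipient keeps the number of extra copies down to the number of distinct domains per message.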

Based on the above, is there anything that can be done from the MailScanner community's side to develop MailScanner functionality that addresses these issues?
We would be very happy to make a donation towards a fix or patch.

Also, if the community has any other ideas on how we can remedy this problem, we welcome your feedback.

Thanks and regards,

Sam Gelbart
SYNAQ

Alex Neuman

2014-07-11 12:37:36 UTC

This has been discussed many times before. You need to split recipients at
the MTA level to accomplish this, unless you can contribute the resources
to modify the code. You could use less than 3.5x the hardware by using a
gateway machine to do the splitting for you. I don't know what your mail
volume is, but I'm guessing that, just for splitting and relaying, you
could get by with SSDs for the relay machine and for the MailScanner
incoming and processing queue folders, with the end result probably being
faster than what you have now, at least in my experience.

*Alex Neuman van der Hans*
Reliant Technologies / Vida Digital
http://vidadigital.com.pa/

Mobile: +507-6781-9505
Work: +507-832-6725
Work (USA): +1-440-253-9789
Skype: AlexNeuman

Don't miss Vida Digital on LiveStream
<http://new.livestream.com/accounts/5061819>!
Saturdays 8am-10am on 104.3FM Panama

Follow *@AlexNeuman <https://twitter.com/alexneuman>* on Twitter
Like Vida Digital <https://facebook.com/vidadigital/> on Facebook
Follow VidaDigital <http://instagram.com/vidadigital> on Instagram
Subscribe to Vida Digital <https://youtube.com/reliantpty> on Youtube

--
MailScanner mailing list
mailscanner at lists.mailscanner.info
http://lists.mailscanner.info/mailman/listinfo/mailscanner
Before posting, read http://wiki.mailscanner.info/posting
Support MailScanner development - buy the book off the website!

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.mailscanner.info/pipermail/mailscanner/attachments/20140711/db696b3a/attachment.html

Martin Hepworth

2014-07-11 13:49:04 UTC

You might also want to consider a more flexible approach, as Alex
mentioned.
It will also help with some of the hardware requirements, as you can
reject invalid recipients at the MTA as well as split the emails up, so
the core MailScanner farm has less to do.
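Rejecting invalid recipients at the MTA means such mail is never accepted, queued or scanned. A minimal sketch of that decision at RCPT TO time; `VALID_RECIPIENTS` and `rcpt_response` are hypothetical, and a real MTA would consult its own recipient lookup table rather than a Python set.

```python
# Hypothetical table of accepted recipients; a real MTA would query its own
# recipient maps or directory instead of a hard-coded set.
VALID_RECIPIENTS = {"user@abc.co.za", "user@xyz.co.za"}


def rcpt_response(address: str) -> str:
    """Return the SMTP reply to RCPT TO:<address> (simplified).

    Compares case-insensitively, as most sites treat local parts that way.
    """
    if address.lower() in VALID_RECIPIENTS:
        return "250 2.1.5 Ok"
    return "550 5.1.1 Recipient address rejected: user unknown"


if __name__ == "__main__":
    print(rcpt_response("User@abc.co.za"))    # 250 2.1.5 Ok
    print(rcpt_response("nobody@abc.co.za"))  # 550 5.1.1 Recipient address rejected: user unknown
```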
--
Martin Hepworth, CISSP
Oxford, UK

Glenn Steen

2014-08-05 13:50:33 UTC

Can only agree with Martin and Alex: there is no way around either
splitting mails per recipient (very feasible) or some major rework of both
the MailScanner and MailWatch code (very infeasible).
But I also have to agree that the increase in hardware seems quite
excessive... I suppose you arrived at that figure by analysing the number
of recipients per mail (and the frequency of multi-recipient emails)? Well,
the number isn't everything:-)
Provided you use the usual caching DNS resolver and also set "Cache
SpamAssassin Results = yes" (so identical split copies need not each be
rescanned by SpamAssassin), the actual processing time and resource use
will be minimized (not to mention that MailScanner's normal
batch-processing style will ... help...:-).
Introducing a "splitting MX" between the internet and your regular
MailScanner hosts should be rather simple, as should adjusting which
Received: lines your MailScanner hosts ignore (otherwise they will
perceive all messages as originating from the "splitting MX" host)...
So why not try that with the gear you have ATM, and see where that leads
you? Depending on which mailstore hosts you eventually deliver to, the
storage impact should be minimal or even non-existent, since even
MS Exchange abandoned "single store" ... way back... so every recipient
would eventually have their own copy in their own mailbox anyway;-).
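Glenn's point that the raw copy count overstates the extra work can be illustrated with a rough model. In this hypothetical sketch, each message is a (body digest, recipients) pair; copies are counted per distinct recipient domain, and the number of distinct bodies stands in for the SpamAssassin runs left over when results are cached by content. That is a simplification for illustration, not MailScanner's actual cache logic.

```python
def split_load(messages: list[tuple[str, list[str]]]) -> tuple[int, int, int]:
    """Return (messages_in, copies_after_split, distinct_bodies).

    messages: hypothetical (body_digest, recipients) pairs.
    copies_after_split: one copy per distinct recipient domain per message.
    distinct_bodies: rough proxy for full scans still needed when results are
    cached by message content (simplified model, not MailScanner's code).
    """
    copies = 0
    bodies: set[str] = set()
    for digest, rcpts in messages:
        copies += len({r.rsplit("@", 1)[1].lower() for r in rcpts})
        bodies.add(digest)
    return len(messages), copies, len(bodies)


if __name__ == "__main__":
    sample = [
        ("d1", ["a@abc.co.za", "b@xyz.co.za"]),  # two domains -> two copies
        ("d2", ["c@abc.co.za", "d@abc.co.za"]),  # one domain  -> one copy
    ]
    print(split_load(sample))  # (2, 3, 2)
```

In this toy sample, splitting turns 2 messages into 3 copies, but only 2 distinct bodies would need a full scan.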

As Alex says, we know nothing about your actual mail volume, but my money
is on there being much less of a problem than you think, even if you do
have ... serious traffic... (more than a few thousand mails/hour). The
likeliest bottleneck is your MailWatch database, so... keep an eye on that
one, and make sure you run it as InnoDB etc.

Cheers!
--
-- Glenn

email: glenn < dot > steen < at > gmail < dot > com
work: glenn < dot > steen < at > ap1 < dot > se

Randal, Phil

2014-08-05 14:23:52 UTC

Does converting the MailWatch databases to InnoDB make a big difference in MailWatch performance?

Just curious.

Phil

Hoople Ltd, Registered in England and Wales No. 7556595
Registered office: Plough Lane, Hereford, HR4 0LE

"Any opinion expressed in this e-mail or any attached files are those of the individual and not necessarily those of Hoople Ltd. You should be aware that Hoople Ltd. monitors its email service. This e-mail and any attached files are confidential and intended solely for the use of the addressee. This communication may contain material protected by law from being passed on. If you are not the intended recipient and have received this e-mail in error, you are advised that any use, dissemination, forwarding, printing or copying of this e-mail is strictly prohibited. If you have received this e-mail in error please contact the sender immediately and destroy all copies of it."

Jerry Benton

2014-08-05 15:16:36 UTC

Based on Mailborder's design and testing (Mailborder's database structure is very similar to MailWatch's), MyISAM performs better once you start hitting millions of records.

-
Jerry Benton
www.mailborder.com

Jerry Benton

2014-08-05 15:27:27 UTC

Caveat: you should partition the table by time. This is the Mailborder cp_maillog table, which is slightly different from MailWatch's, but the bit near the end is what you are looking for. You can adapt it for your own table with an ALTER TABLE statement.

CREATE TABLE IF NOT EXISTS `cp_maillog` (
`db_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`timestamp` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`id` varchar(30) NOT NULL,
`size` bigint(20) DEFAULT '0',
`from_address` varchar(255) DEFAULT NULL,
`from_domain` varchar(255) DEFAULT NULL,
`to_address` varchar(255) DEFAULT NULL,
`to_domain` varchar(255) DEFAULT NULL,
`subject` text CHARACTER SET utf8 COLLATE utf8_unicode_ci,
`clientip` varchar(15) DEFAULT NULL,
`archive` varchar(100) DEFAULT NULL,
`isspam` tinyint(1) DEFAULT '0',
`ishighspam` tinyint(1) DEFAULT '0',
`issaspam` tinyint(1) DEFAULT '0',
`isrblspam` tinyint(1) DEFAULT '0',
`spamwhitelisted` tinyint(1) DEFAULT '0',
`spamblacklisted` tinyint(1) DEFAULT '0',
`sascore` decimal(7,2) DEFAULT '0.00',
`spamreport` text,
`virusinfected` tinyint(1) DEFAULT '0',
`nameinfected` tinyint(1) DEFAULT '0',
`sizeinfected` tinyint(1) DEFAULT '0',
`otherinfected` tinyint(1) DEFAULT '0',
`report` text,
`ismcp` tinyint(1) DEFAULT '0',
`ishighmcp` tinyint(1) DEFAULT '0',
`issamcp` tinyint(1) DEFAULT '0',
`mcpwhitelisted` tinyint(1) DEFAULT '0',
`mcpblacklisted` tinyint(1) DEFAULT '0',
`mcpsascore` decimal(7,2) DEFAULT '0.00',
`mcpreport` text,
`hostname` varchar(100) DEFAULT NULL,
`date` date NOT NULL DEFAULT '0000-00-00',
`time` time DEFAULT NULL,
`headers` text,
`quarantined` tinyint(1) DEFAULT '0',
`released` tinyint(1) DEFAULT '0',
`guid` varchar(40) NOT NULL,
PRIMARY KEY (`db_id`,`date`),
KEY `id` (`id`),
KEY `timestamp` (`timestamp`),
KEY `from_address` (`from_address`),
KEY `from_domain` (`from_domain`),
KEY `to_address` (`to_address`),
KEY `to_domain` (`to_domain`),
KEY `guid` (`guid`),
KEY `isspam` (`isspam`),
KEY `ishighspam` (`ishighspam`),
KEY `issaspam` (`issaspam`),
KEY `isrblspam` (`isrblspam`),
KEY `spamwhitelisted` (`spamwhitelisted`),
KEY `spamblacklisted` (`spamblacklisted`),
KEY `virusinfected` (`virusinfected`),
KEY `nameinfected` (`nameinfected`),
KEY `otherinfected` (`otherinfected`),
KEY `quarantined` (`quarantined`),
KEY `sizeinfected` (`sizeinfected`),
KEY `ismcp` (`ismcp`),
KEY `ishighmcp` (`ishighmcp`),
KEY `issamcp` (`issamcp`),
KEY `mcpwhitelisted` (`mcpwhitelisted`),
KEY `mcpblacklisted` (`mcpblacklisted`),
KEY `released` (`released`),
KEY `size` (`size`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 PARTITION BY HASH (( YEAR(`date`) + MONTH(`date`) )) PARTITIONS 70;

-
Jerry Benton
www.mailborder.com
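For non-LINEAR HASH partitioning, MySQL places a row in partition MOD(expr, N). With the expression above, different months that give the same YEAR + MONTH value share a partition (August 2014 and July 2015 both give 2022), and three years of data (2014 to 2016) land in only 14 of the 70 partitions. The arithmetic is sketched below in Python; `month_index_partition` (YEAR * 12 + MONTH) is a hypothetical alternative expression shown only for comparison, not something proposed in the thread.

```python
def hash_partition(year: int, month: int, partitions: int = 70) -> int:
    """Partition chosen by non-LINEAR HASH partitioning: MOD(expr, partitions),
    with expr = YEAR(date) + MONTH(date) as in the schema above."""
    return (year + month) % partitions


def month_index_partition(year: int, month: int, partitions: int = 70) -> int:
    """Hypothetical alternative expr = YEAR * 12 + MONTH: consecutive months
    map to distinct partitions until it wraps after `partitions` months."""
    return (year * 12 + month) % partitions


if __name__ == "__main__":
    months = [(y, m) for y in range(2014, 2017) for m in range(1, 13)]
    print(hash_partition(2014, 8), hash_partition(2015, 7))   # 62 62
    print(len({hash_partition(y, m) for y, m in months}))     # 14
    print(len({month_index_partition(y, m) for y, m in months}))  # 36
```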

Glenn Steen

2014-08-05 17:43:24 UTC

Ah, that explains it! In effect, you work with a set of rather small
database files, which of course has a big impact on time-based (and
indeed most!:-) queries... A very sensible design (one that wasn't possible
for Steve F at the inception of even MailWatch 1.0:-) and probably not that
much work to implement in my old setup (I confess, I've been... slow... in
adapting to the latest/greatest:-).
If/when time permits experimentation...:-)

Cheers!
--
-- Glenn

Post by Jerry Benton
Caveat: You should partition the database by time. This is the Mailborder
cp_maillog, which is slightly different than MailWatch, but the bit near
the end is what you are looking for. You can adapt it for your table with
an alter statement.
CREATE TABLE IF NOT EXISTS `cp_maillog` (
`db_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`timestamp` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`id` varchar(30) NOT NULL,
`size` bigint(20) DEFAULT '0',
`from_address` varchar(255) DEFAULT NULL,
`from_domain` varchar(255) DEFAULT NULL,
`to_address` varchar(255) DEFAULT NULL,
`to_domain` varchar(255) DEFAULT NULL,
`subject` text CHARACTER SET utf8 COLLATE utf8_unicode_ci,
`clientip` varchar(15) DEFAULT NULL,
`archive` varchar(100) DEFAULT NULL,
`isspam` tinyint(1) DEFAULT '0',
`ishighspam` tinyint(1) DEFAULT '0',
`issaspam` tinyint(1) DEFAULT '0',
`isrblspam` tinyint(1) DEFAULT '0',
`spamwhitelisted` tinyint(1) DEFAULT '0',
`spamblacklisted` tinyint(1) DEFAULT '0',
`sascore` decimal(7,2) DEFAULT '0.00',
`spamreport` text,
`virusinfected` tinyint(1) DEFAULT '0',
`nameinfected` tinyint(1) DEFAULT '0',
`sizeinfected` tinyint(1) DEFAULT '0',
`otherinfected` tinyint(1) DEFAULT '0',
`report` text,
`ismcp` tinyint(1) DEFAULT '0',
`ishighmcp` tinyint(1) DEFAULT '0',
`issamcp` tinyint(1) DEFAULT '0',
`mcpwhitelisted` tinyint(1) DEFAULT '0',
`mcpblacklisted` tinyint(1) DEFAULT '0',
`mcpsascore` decimal(7,2) DEFAULT '0.00',
`mcpreport` text,
`hostname` varchar(100) DEFAULT NULL,
`date` date NOT NULL DEFAULT '0000-00-00',
`time` time DEFAULT NULL,
`headers` text,
`quarantined` tinyint(1) DEFAULT '0',
`released` tinyint(1) DEFAULT '0',
`guid` varchar(40) NOT NULL,
PRIMARY KEY (`db_id`,`date`),
KEY `id` (`id`),
KEY `timestamp` (`timestamp`),
KEY `from_address` (`from_address`),
KEY `from_domain` (`from_domain`),
KEY `to_address` (`to_address`),
KEY `to_domain` (`to_domain`),
KEY `guid` (`guid`),
KEY `isspam` (`isspam`),
KEY `ishighspam` (`ishighspam`),
KEY `issaspam` (`issaspam`),
KEY `isrblspam` (`isrblspam`),
KEY `spamwhitelisted` (`spamwhitelisted`),
KEY `spamblacklisted` (`spamblacklisted`),
KEY `virusinfected` (`virusinfected`),
KEY `nameinfected` (`nameinfected`),
KEY `otherinfected` (`otherinfected`),
KEY `quarantined` (`quarantined`),
KEY `sizeinfected` (`sizeinfected`),
KEY `ismcp` (`ismcp`),
KEY `ishighmcp` (`ishighmcp`),
KEY `issamcp` (`issamcp`),
KEY `mcpwhitelisted` (`mcpwhitelisted`),
KEY `mcpblacklisted` (`mcpblacklisted`),
KEY `released` (`released`),
KEY `size` (`size`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 PARTITION BY HASH (( YEAR(`date`) +
MONTH(`date`) )) PARTITIONS 70;
-
Jerry Benton
www.mailborder.com
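As a rough illustration of how the `PARTITION BY HASH` clause above spreads rows: for plain (non-LINEAR) HASH partitioning, MySQL places a row in partition number `MOD(expr, num)`, where `num` is the partition count. The Python sketch below reproduces only that arithmetic for `YEAR(date) + MONTH(date)` with 70 partitions; the helper names are made up for illustration. It shows that the sum is not unique per month (February 2014 and January 2015 both give 2016), so one partition can hold several months, and any single year uses only 12 of the 70 partitions.

```python
from datetime import date

NUM_PARTITIONS = 70  # same as "PARTITIONS 70" in the statement above


def partition_for(d: date, num_partitions: int = NUM_PARTITIONS) -> int:
    """Model MySQL's HASH partition choice: MOD(YEAR(date) + MONTH(date), num)."""
    expr = d.year + d.month
    return expr % num_partitions


def months_by_partition(start_year: int, end_year: int) -> dict[int, list[str]]:
    """Group every month from start_year to end_year by the partition it maps to."""
    groups: dict[int, list[str]] = {}
    for year in range(start_year, end_year + 1):
        for month in range(1, 13):
            p = partition_for(date(year, month, 1))
            groups.setdefault(p, []).append(f"{year}-{month:02d}")
    return groups


if __name__ == "__main__":
    for p, months in sorted(months_by_partition(2014, 2015).items()):
        print(p, months)
```

Run over 2014-2015, the 24 months land in 13 partitions. If one month per partition is the goal, an expression such as `YEAR(date) * 12 + MONTH(date)` gives each month a distinct value (with 70 partitions, months 70 apart would still share one); whether this matters depends on how the partitions are pruned and purged.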
Based on Mailborder design and testing (Mailborder's DB structure is very
similar to MailWatch's), MyISAM performs better once you start
hitting millions of records.
-
Jerry Benton
www.mailborder.com
Does converting the MailWatch databases to InnoDB make a big difference in
MailWatch performance?
Just curious.
Phil
*From:* mailscanner-bounces at lists.mailscanner.info *On Behalf Of* Glenn Steen
*Sent:* 05 August 2014 14:51
*To:* MailScanner discussion
*Subject:* Re: MailScanner Deficiency: Multi-Ruleset Processing per Email Recipient
Can only agree with Martin and Alex: there is no way around either
splitting mails per recipient (very feasible) or some major rework of both
the MailScanner and MailWatch code (very infeasible).
But I also have to agree that the increase in hardware seems quite
excessive... I suppose you arrived at that figure by analysing the number
of recipients per mail (and the frequency of multi-recipient emails)? Well, the
number isn't everything:-)
Provided you use the normal caching-dns-thingy and also use "Cache
SpamAssassin Results = yes", the actual processing time and resource use
will be minimized (not to mention that the normal batch-processing style of
MailScanner will ... help...:-).
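Glenn's point about "Cache SpamAssassin Results = yes" is that splitting per recipient multiplies deliveries, but not necessarily SpamAssassin runs, because the per-recipient copies carry the same content. The sketch below models that idea with an in-memory dictionary keyed on an MD5 digest of the message content. `ScanCache` and `fake_spamassassin` are hypothetical stand-ins; this is not MailScanner's cache implementation, and its real keying, storage and expiry are not modelled here.

```python
import hashlib
from typing import Callable


class ScanCache:
    """Toy result cache keyed on an MD5 digest of the message content (hypothetical)."""

    def __init__(self, scanner: Callable[[bytes], float]) -> None:
        self.scanner = scanner                # the expensive scoring step
        self.results: dict[str, float] = {}   # digest -> cached score
        self.scans_run = 0                    # how often the expensive step really ran

    def score(self, message: bytes) -> float:
        key = hashlib.md5(message).hexdigest()
        if key not in self.results:
            self.scans_run += 1
            self.results[key] = self.scanner(message)
        return self.results[key]


def fake_spamassassin(message: bytes) -> float:
    """Stand-in scorer for this example only."""
    return 7.5 if b"lottery" in message.lower() else 0.5


if __name__ == "__main__":
    cache = ScanCache(fake_spamassassin)
    message = b"Subject: You won the LOTTERY\r\n\r\nClaim now"
    # One split copy per recipient: same content, so same digest.
    for rcpt in ["user@abc.co.za", "user@xyz.co.za", "other@xyz.co.za"]:
        print(rcpt, cache.score(message))
    print("scans run:", cache.scans_run)
```

Three split copies of one message trigger a single scan; the other two scores come from the cache.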
Introducing a "splitting MX" between the internet and your regular
MailScanner hosts should be rather simple, as well as adjusting which
Received: lines your MailScanner hosts should ignore (since they otherwise
will perceive all messages as originating from the "splitting MX" host)...
So why not try that, with the gear you have ATM, and see where that leads
you? Depending on what mailstore hosts you eventually deliver to, the
storage impact should be minimal or even non-existent, since even
MS Exchange abandoned "single store" way back... so every
recipient would eventually have their own copy in their own mailbox
anyway;-).
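To make the "splitting MX" idea concrete: the front-end MTA accepts one message with several envelope recipients and relays one copy per recipient, so each copy reaching MailScanner has a single recipient and therefore a single applicable ruleset. In practice this is done in MTA configuration (with Postfix, for instance, a `transport_destination_recipient_limit` of 1 on the relay transport is one commonly cited way), not custom code. The Python sketch below only models the fan-out step; `Envelope` and `split_per_recipient` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """Hypothetical model of one SMTP transaction: sender, recipients, raw message."""
    sender: str
    recipients: tuple[str, ...]
    data: bytes


def split_per_recipient(env: Envelope) -> list[Envelope]:
    """Fan one multi-recipient envelope out into single-recipient copies.

    The message data is shared unchanged; only the envelope recipient differs,
    so each downstream copy maps to exactly one recipient's ruleset.
    """
    return [Envelope(env.sender, (rcpt,), env.data) for rcpt in env.recipients]


if __name__ == "__main__":
    original = Envelope(
        sender="someone@example.net",
        recipients=("user@abc.co.za", "user@xyz.co.za"),
        data=b"Subject: hello\r\n\r\nbody",
    )
    for copy in split_per_recipient(original):
        print(copy.recipients)
```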
As Alex says, we know nothing about your actual mail volume, but my money
is on there being much less of a problem than you think, even if you do
have ... serious traffic... (more than a few thousand mails/hour). The
likeliest bottleneck is your MailWatch database, so...
keep an eye on that one, make sure you run it as InnoDB etc.
Cheers!
--
-- Glenn
You might also want to consider a more flexible approach, as Alex mentioned.
It will also help with the hardware requirements, since you can
reject invalid recipients at the MTA as well as split the emails up, so
the core MailScanner farm has less to do.
--
Martin Hepworth, CISSP
Oxford, UK
Hi All,
We at SYNAQ use and have used MailScanner for many years. As an email
hygiene provider, we have been very well served by MailScanner.
However, as we have grown (very rapidly in the past 6 months, to many more
customer domains) we have noticed some deficiencies in MailScanner.
Overview
The issue has arisen due to SYNAQ's ever-growing client base and the fact
that we're provisioning more and more customers (and email domains) on our
hygiene platform, and that more than one of these customer
recipients/domains (and their applicable rulesets) are being addressed in
the same email.
Problem 1
1) abc.co.za and xyz.co.za are both provisioned on our platform.
2) xyz.co.za has quarantining of SPAM configured, while abc.co.za does not.
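3) MailScanner accepts the message for processing but "chooses"
user at abc.co.za and abc.co.za as the message's "to_address" and "to_domain".
4) MailScanner determines that the message is SPAM and, because it has
"chosen" @abc.co.za as the email domain, it deletes the message, as the
configured spam action for @abc.co.za is to delete.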
5) However, the rule for xyz.co.za is to store/quarantine spam. This does
not happen because of the actions above, and nothing is logged via
MailWatch for xyz.co.za either.
6) The example above is based on a very simple scenario, and as you are
aware, this applies to many more complex rulesets (size, file type, etc.)
across the system.
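Problem 1 reduces to a small model: a per-domain spam action table, with the action looked up once from the first recipient's domain versus once per recipient. The rule table, action names and function names below are hypothetical (MailScanner's real rules live in its own configuration and ruleset files, not Python); the sketch only shows why picking a single recipient loses the other domain's rule. Following items 4 and 5, abc.co.za deletes spam and xyz.co.za stores it.

```python
# Hypothetical per-domain spam actions, following items 4 and 5 above.
SPAM_ACTIONS = {"abc.co.za": "delete", "xyz.co.za": "store"}
DEFAULT_ACTION = "deliver"


def domain_of(address: str) -> str:
    return address.rsplit("@", 1)[-1].lower()


def action_first_recipient(recipients: list[str]) -> dict[str, str]:
    """Behaviour as described in the post: one recipient's rule applies to everyone."""
    action = SPAM_ACTIONS.get(domain_of(recipients[0]), DEFAULT_ACTION)
    return {rcpt: action for rcpt in recipients}


def action_per_recipient(recipients: list[str]) -> dict[str, str]:
    """Desired behaviour: each recipient's own domain rule decides."""
    return {rcpt: SPAM_ACTIONS.get(domain_of(rcpt), DEFAULT_ACTION) for rcpt in recipients}


if __name__ == "__main__":
    rcpts = ["user@abc.co.za", "user@xyz.co.za"]
    print(action_first_recipient(rcpts))
    print(action_per_recipient(rcpts))
```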
Problem 2
1) abc.co.za and xyz.co.za are both provisioned on our platform.
2) A third party emails both user at abc.co.za and user at xyz.co.za in a
single email message.
3) MailScanner accepts the message for processing but "chooses"
user at abc.co.za and abc.co.za as the message's "to_address" and "to_domain".
4) When the message is processed, the MailWatch.pm script receives a
message object for SQL logging with data only for user at abc.co.za and
abc.co.za; xyz.co.za is never logged.
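Problem 2 is the logging counterpart: one message object produces one row carrying a single `to_address`/`to_domain`. A minimal sketch of the alternative, one row per envelope recipient, is below. The keys reuse column names from the cp_maillog table quoted above (which Jerry describes as similar to MailWatch's); the function is hypothetical and is not MailWatch.pm's actual code, which is Perl.

```python
def log_rows(message_id: str, recipients: list[str], is_spam: bool) -> list[dict[str, object]]:
    """Build one log row per envelope recipient rather than one row per message.

    Column names follow the cp_maillog table quoted above; everything else is hypothetical.
    """
    rows: list[dict[str, object]] = []
    for rcpt in recipients:
        rows.append({
            "id": message_id,
            "to_address": rcpt,
            "to_domain": rcpt.rsplit("@", 1)[-1].lower(),
            "isspam": int(is_spam),
        })
    return rows


if __name__ == "__main__":
    for row in log_rows("1A2B3C", ["user@abc.co.za", "user@xyz.co.za"], is_spam=True):
        print(row)
```

With this layout the message `id` repeats across rows, so anything that assumes one row per `id` would need adjusting; in the quoted table `id` is a plain (non-unique) key.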
Finally, we have considered splitting incoming messages by recipient at the
MTA level to address this problem, but our calculations show that it would
require 3.5x more hardware to process the increased mail load. So for us, a
MailScanner solution would be ideal.
Based on the above, could you tell me if there is anything that can be
done from a MailScanner community point of view to help develop MailScanner
functionality to address these issues?
We'd be very happy to give a nice donation for a fix or patch.
Also, if the community has any other ideas for remedying this
problem, we welcome your feedback.
Thanks and regards,
Sam Gelbart
SYNAQ
--
MailScanner mailing list
mailscanner at lists.mailscanner.info
http://lists.mailscanner.info/mailman/listinfo/mailscanner
Before posting, read http://wiki.mailscanner.info/posting
Support MailScanner development - buy the book off the website!
--
-- Glenn
email: glenn < dot > steen < at > gmail < dot > com
work: glenn < dot > steen < at > ap1 < dot > se
Hoople Ltd, Registered in England and Wales No. 7556595
Registered office: Plough Lane, Hereford, HR4 0LE
"Any opinion expressed in this e-mail or any attached files are those of
the individual and not necessarily those of Hoople Ltd. You should be aware
that Hoople Ltd. monitors its email service. This e-mail and any attached
files are confidential and intended solely for the use of the addressee.
This communication may contain material protected by law from being passed
on. If you are not the intended recipient and have received this e-mail in
error, you are advised that any use, dissemination, forwarding, printing or
copying of this e-mail is strictly prohibited. If you have received this
e-mail in error please contact the sender immediately and destroy all
copies of it."

Jerry Benton

2014-08-05 18:06:40 UTC

From Baron Schwartz, a co-author of High Performance MySQL:

"The reason is very simple. When you insert a row into MyISAM, it just puts it into the server's memory and hopes that the server will flush it to disk at some point in the future. Good luck if the server crashes.

When you insert a row into InnoDB it syncs the transaction durably to disk, and that requires it to wait for the disk to spin. Do the math on your system and see how long that takes.

You can improve this by relaxing innodb_flush_log_at_trx_commit or by batching rows within a transaction instead of doing one transaction per row."

In short, MyISAM is faster for inserts, but InnoDB is more reliable. All of that ACID compliance and transaction rollback comes with an overhead cost. InnoDB also provides row-level locking instead of MyISAM's table-level locking, and InnoDB can recover automatically from crashes. So if you want reliability over performance, go with InnoDB. If you want faster inserts and, quite often, faster search results, go with MyISAM.

These are mail logs, not bank records. But I suppose the level of importance is relative.
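The suggestion in the quote to batch rows inside one transaction, rather than committing each row separately, looks roughly like the sketch below. To stay self-contained it uses Python's built-in `sqlite3` module as a stand-in for MySQL/InnoDB, with a cut-down, hypothetical `maillog` table; the same pattern applies through a MySQL driver, but the cost being amortised there is InnoDB's per-commit log flush (with `innodb_flush_log_at_trx_commit = 1`), which SQLite only loosely resembles.

```python
import sqlite3

Row = tuple[str, str, str, int]  # (id, to_address, to_domain, isspam)


def create_table(conn: sqlite3.Connection) -> None:
    # Cut-down, hypothetical log table; the real MailWatch table has many more columns.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS maillog ("
        "id TEXT NOT NULL, to_address TEXT, to_domain TEXT, isspam INTEGER DEFAULT 0)"
    )


def insert_one_per_transaction(conn: sqlite3.Connection, rows: list[Row]) -> None:
    """One commit per row: on InnoDB with innodb_flush_log_at_trx_commit=1, one log flush per row."""
    for row in rows:
        conn.execute("INSERT INTO maillog VALUES (?, ?, ?, ?)", row)
        conn.commit()


def insert_batched(conn: sqlite3.Connection, rows: list[Row]) -> None:
    """All rows in one transaction, committed once."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        conn.executemany("INSERT INTO maillog VALUES (?, ?, ?, ?)", rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_table(conn)
    rows = [(f"msg{i}", "user@abc.co.za", "abc.co.za", i % 2) for i in range(1000)]
    insert_batched(conn, rows)
    print(conn.execute("SELECT COUNT(*) FROM maillog").fetchone()[0])
```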

-
Jerry Benton
www.mailborder.com

Glenn Steen

2014-08-06 09:06:40 UTC

Yes, Jerry, quite true.
I thought I went InnoDB for performance, but it may well have been for
stability... As I said, this was done quite a few years ago.
For the discussion at hand, though, this minor point has taken far too large
a place... The dominant factor, vis-a-vis performance, is most likely
the execution of SpamAssassin on the message batch (and possibly the
AV-scanning, if you use something reeeaaally slow/cumbersome), and that
will be dealt with nicely by the SA results cache.
I'll still nick your partitioning idea (and revert to MyISAM) and see what
that gives me, when I get the time:-).

Cheers
--
-- Glenn

Glenn Steen

2014-08-05 17:33:50 UTC

Really? Perhaps time for a revision of my strategy then:-)
Anyway, it is a minor point... The main one being to keep track of the
performance of the one thing where splitting mails per recipient will
definitely have an impact... Then again, even that is not an issue if the
volume is low (say 10-20000/day), in which case the OP should simply go
ahead (unless the MS host(s) are horrendously underpowered;-).

But good to know, thank you Jerry!

Cheers!
--
-- Glenn

Glenn Steen

2014-08-05 15:19:01 UTC

IMO yes:-)
I suppose it all depends... For most, who keep their databases nice and
small and ... tidy..., it likely doesn't matter that much. I did have some
figures from way back, but can't seem to find them (not that surprising,
we're talking years:-). But if you have a vast number of messages needing
to be logged every day... well, I've been told that it can make all the
difference in the world. Then again, if you're in that kind of situation,
centralized database logging is probably not a good idea anyway;-).

Cheers
--
-- Glenn

Post by Randal, Phil
Does converting the MailWatch databases to InnoDB make a big difference
in MailWatch performance?
Just curious.
Phil
mailscanner-bounces at lists.mailscanner.info] *On Behalf Of *Glenn Steen
*Sent:* 05 August 2014 14:51
*To:* MailScanner discussion
*Subject:* Re: MailScanner Deficiency: Multi-Ruleset Processing per Email
Recipient
Can only agree with Martin and Alex, there is no way around either
splitting mails per recipient (very feasible), or som major rework of both
the MailScanner and mailWatch code (very infeasible).
But I also have to agree that the increase in hardware seem quite
excessive... i suppose you arrived at that figure by analysing the number
of recipients per mail (and frequency of multi-recipient emails)? Well, the
number isn?t everything:-)
Provided you use the normal caching-dns-thingy and also use "Cache
SpamAssassin Results = yes", the actual processing time and resource use
will be minimized (not to mention that the normal batch-processing style of
MailScanner will ... help...:-).
Introducing a "splitting MX" between the internet and your regular
MailScanner hosts should be rather simple, as well as adjusting which
Received: lines your MailScanner hosts should ignore (since they otherwise
will perceive all messages as originating from the "splitting MX" host)...
So why not try that, with the gear you have ATM, and see where that leads
you? Depending on what mailstore hosts you eventually deliver to, the
storage impact should be minimal or even non-existant, since even
M-Sexchange has abandioned "single store" since ... way back... so every
recipient would eventually have their own copy in their own mailbox
anyway;-).
As Alex says, we know nothing about your actual mail volume, but my money
is on there being much less of a problem than you think, even if you do
have ... serious traffic... (more than a few thousand mails/hour). the
likeliest problem point/bottleneck is likely your MailWatch database so...
keep an eye on that one, make sure you run it as InnoDB etc.
Cheers!
--
-- Glenn
Might want to also consider having a more flexible approach as Alex had mentioned.
Will also help with some of the hardware requirements as you can also
reject non-valid recipients at MTA as well as splitting the emails up, so
the core MailScanner farm has less to do.
--
Martin Hepworth, CISSP
Oxford, UK
Hi All,
We at SYNAQ use and have used Mailscanner for many years. As an Email
Hygiene provider MailScanner has served us very well.
However, as we have grown (very rapidly in the past 6 months, to many more
customer domains) we have noticed some deficiencies in MailScanner.
Overview
The issue has arisen due to SYNAQ's ever growing client base and the fact
that we're provisioning more and more customers (and email domains) on our
hygiene platform, and that more than one of these customer
recipients/domains (and their applicable rulesets) are being addressed in
the same email.
Problem 1
1) abc.co.za and xyz.co.za are both provisioned on our platform.
2) xyz.co.za has quarantining of SPAM configured, while abc.co.za does not.
3) MailScanner accepts the message for processing but "chooses"
user at abc.co.za and abc.co.za as the message's "to_address" and
"to_domain".
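4) MailScanner determines that the message is SPAM and, because it has
"chosen" @abc.co.za as the email domain, it deletes the message, as the
configured spam action for @abc.co.za is to delete.
5) However, the rule for xyz.co.za is to store/quarantine spam. This does
not happen because of the actions above, and the data is also never logged
via MailWatch.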
6) The example above is based on a very simple scenario; as you are
aware, the same applies to many more complex rulesets (size, file type,
etc.) across the system.
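The gap between what Problem 1 describes and the desired behaviour fits in a few lines of Python. This is a hypothetical model, not MailScanner's code. `SPAM_ACTIONS` follows items 4 and 5 (delete for abc.co.za, store for xyz.co.za). The "chosen" recipient is modelled as the first one, purely as a stand-in for however MailScanner actually picks it.

```python
# Hypothetical per-domain spam actions, following items 4 and 5 above.
SPAM_ACTIONS = {"abc.co.za": "delete", "xyz.co.za": "store"}


def domain_of(address):
    return address.rpartition("@")[2].lower()


def single_ruleset_actions(recipients, actions=SPAM_ACTIONS):
    """One "chosen" recipient (modelled here as the first) decides for all."""
    chosen = actions[domain_of(recipients[0])]
    return {rcpt: chosen for rcpt in recipients}


def per_recipient_actions(recipients, actions=SPAM_ACTIONS):
    """Desired behaviour: each recipient gets its own domain's action."""
    return {rcpt: actions[domain_of(rcpt)] for rcpt in recipients}


if __name__ == "__main__":
    rcpts = ["user@abc.co.za", "user@xyz.co.za"]
    print("single:       ", single_ruleset_actions(rcpts))
    print("per-recipient:", per_recipient_actions(rcpts))
```

With a single ruleset, the copy for user@xyz.co.za is deleted along with abc.co.za's. Evaluated per recipient, it would be stored instead. Actions that alter the message body or headers cannot differ between recipients who share one copy, so a per-recipient model like this only works cleanly for disposition-style actions such as delete, store or deliver.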
Problem 2
1) abc.co.za and xyz.co.za are both provisioned on our platform.
2) A third party emails both user at abc.co.za and user at xyz.co.za in a
single email message.
3) MailScanner accepts the message for processing but "chooses"
user at abc.co.za and abc.co.za as the message's "to_address" and
"to_domain".
4) When the message is processed, the MailWatch.pm script receives a
message object for SQL logging with data only for user at abc.co.za and
abc.co.za; xyz.co.za is never logged.
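Problem 2 comes down to the log holding one row per message rather than one per recipient. Below is a minimal Python sketch of the per-recipient alternative, using an in-memory SQLite database. The `recipient_log` table and its columns are hypothetical and are not MailWatch's actual schema. In a real deployment the equivalent change would presumably live in the MailWatch.pm logging code.

```python
import sqlite3


def make_log_db():
    """In-memory stand-in for a logging database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE recipient_log ("
        " msg_id TEXT, to_address TEXT, to_domain TEXT, is_spam INTEGER)"
    )
    return conn


def log_per_recipient(conn, msg_id, recipients, is_spam):
    """Insert one row per recipient, so every domain appears in the log."""
    rows = [
        (msg_id, rcpt, rcpt.rpartition("@")[2].lower(), int(is_spam))
        for rcpt in recipients
    ]
    conn.executemany(
        "INSERT INTO recipient_log (msg_id, to_address, to_domain, is_spam)"
        " VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()


if __name__ == "__main__":
    db = make_log_db()
    log_per_recipient(db, "msg-0001", ["user@abc.co.za", "user@xyz.co.za"], True)
    for row in db.execute(
        "SELECT to_domain, COUNT(*) FROM recipient_log GROUP BY to_domain"
    ):
        print(row)
```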
Finally, we have considered splitting incoming messages by recipient at the
MTA level to address this problem, but our calculations show that it would
require 3.5x more hardware to process the increased mail load. So for us, a
MailScanner solution is ideal.
Based on the above, could you tell me if there is anything that can be
done from a MailScanner community point of view to help develop MailScanner
functionality to address these issues?
We'd be very happy to give a nice donation for a fix or patch.
Also, if the community has any ideas on other ways we can remedy this
problem, we welcome your feedback.
Thanks and regards,
Sam Gelbart
SYNAQ
--
MailScanner mailing list
mailscanner at lists.mailscanner.info
http://lists.mailscanner.info/mailman/listinfo/mailscanner
Before posting, read http://wiki.mailscanner.info/posting
Support MailScanner development - buy the book off the website!
--
-- Glenn
email: glenn < dot > steen < at > gmail < dot > com
work: glenn < dot > steen < at > ap1 < dot > se
Hoople Ltd, Registered in England and Wales No. 7556595
Registered office: Plough Lane, Hereford, HR4 0LE
"Any opinion expressed in this e-mail or any attached files are those of
the individual and not necessarily those of Hoople Ltd. You should be aware
that Hoople Ltd. monitors its email service. This e-mail and any attached
files are confidential and intended solely for the use of the addressee.
This communication may contain material protected by law from being passed
on. If you are not the intended recipient and have received this e-mail in
error, you are advised that any use, dissemination, forwarding, printing or
copying of this e-mail is strictly prohibited. If you have received this
e-mail in error please contact the sender immediately and destroy all
copies of it."