NOTE: The first step in setting up ArcTitan should be enabling Journaling on your mail server. Once enabled, all mail received will be sent to the ArcTitan server. The process detailed in this article is used to import mail that was stored in your mail server before Journaling was enabled.
Once Journaling is enabled, the next step is to import all of your old mail into the archive. This can be done in a number of ways, the easiest of which is the Mailbox Reader, which connects to your mail server and downloads mail directly from it. NOTE: The Mailbox Reader feature is not enabled by default; please contact ArcTitan support (firstname.lastname@example.org) to have it enabled on your account. Once enabled, it will appear in the menus for your account.
The Mailbox Reader can:
- Collect from IMAP or POP3 or EWS (Exchange Web Service) mailbox sources.
- Use secure connections (TLS or SSL or HTTPS).
- Backfill: Collect up to a specified date in order to backfill an ArcTitan archive with data up to the date/time at which the email server started Journaling.
- Infill: Collect between a date range in order to fill in any gaps caused by some issue.
- Live Collect: In Polling mode, the reader continues to collect all recent mail. Use this if your mail server does not support Journaling. Most useful for Hotmail-type accounts.
The protocol you wish to use for accessing and reading from the mailboxes will depend on the mail server. We suggest the following choices:
- For Exchange 2007 onwards, use EWS (Exchange Web Services). This is a powerful facility and is becoming more efficient and effective with later Exchange releases.
- For most other mail sources, use IMAP (Exchange 2003 / Gmail / Hotmail / etc.)
- Only use POP3 as a last resort
Next, you will need to identify the server from which the mailboxes will be accessed.
For Exchange, the CAS server is usually preferred, as it offers both IMAP (if enabled) and the EWS web services. For EWS you MUST enter the correct server host name: it must match the service's certificate and standard URL. Please note: if the host name is not correct, EWS will not authorise the connection and the connection will fail with an error.
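To check a host name before entering it, you can reproduce the certificate check yourself. The Python sketch below is not part of ArcTitan; it assumes the conventional EWS path `/EWS/Exchange.asmx` on port 443, and `mail.example.com` is only a placeholder host. It performs a TLS handshake with hostname checking enabled, which fails in much the same way as an EWS connection made to a name that does not match the certificate.

```python
import socket
import ssl


def ews_url(host: str) -> str:
    """Build the conventional EWS endpoint URL for a CAS host name (assumed path)."""
    return f"https://{host.strip().lower()}/EWS/Exchange.asmx"


def certificate_matches(host: str, port: int = 443, timeout: float = 10.0) -> bool:
    """Return True if the certificate served by host:port is valid for `host`.

    ssl.create_default_context() enables certificate verification and
    hostname checking, so the handshake fails if the name does not match
    the certificate. Network errors (DNS failure, refused connection) are
    not caught and propagate to the caller.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        try:
            with context.wrap_socket(sock, server_hostname=host):
                return True
        except ssl.SSLCertVerificationError:
            return False
```

For example, `certificate_matches("mail.example.com")` returns False when the server's certificate was issued for a different name. The name you use to reach OWA is usually the right one to try.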
For IMAP/POP3, you will generally use the service names that are well documented by the various mail vendors.
There are two main steps to configuring the Mailbox Reader:
- Configure the connection to your mail server
- Import the list of user accounts
Configure the connection to your mail server
Mail can be downloaded via POP3, IMAP or EWS. To begin, click Mailbox Reader > Connection Settings and then click "Create Connection".
POP3 & IMAP
In order to be able to download mail via POP3 or IMAP the ArcTitan server will need to be populated with a list of valid users and passwords. These can be added manually or imported from an LDAP server. The LDAP server configuration is added in Basic Configuration > LDAP Servers.
Note: Using IMAP import from an Exchange server will also allow you to use an impersonation account as described below.
EWS & IMAP with Exchange
Importing via EWS or IMAP from an Exchange server allows you to use an account that is capable of impersonation. This allows ArcTitan to import data without having the passwords for all of the mailboxes. If you select EWS or IMAP as the import method and select "Use impersonation", a link will appear which, when clicked, opens a pop-up window from which you can download PowerShell scripts to run on your Exchange server to:
- List impersonation users
- Assign impersonation
- Remove impersonation
If using impersonation the "User Id" should be a user principal name (email address format). For example, if your user name is John and your Active Directory domain name is company.local, your user principal name is email@example.com.
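As a minimal illustration of the UPN format, the hypothetical helper below joins an account name and a domain with "@". It assumes the UPN suffix is the Active Directory domain name; if your directory uses alternative UPN suffixes, use the suffix shown on the user's account instead. The names in the usage are placeholders.

```python
def user_principal_name(account: str, ad_domain: str) -> str:
    """Join an account name and a domain into UPN (user@domain) form.

    Assumption: the UPN suffix equals the AD domain name.
    """
    account = account.strip()
    ad_domain = ad_domain.strip().lstrip("@")
    if not account or not ad_domain or "@" in account:
        raise ValueError("expected a bare account name and a domain name")
    return f"{account}@{ad_domain}"
```

For example, `user_principal_name("jsmith", "corp.example.com")` gives `jsmith@corp.example.com`.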
You only need to import the mails that are not currently in the archive, so ensure you set the Selection range. There are a few options you can use:
- Mails between (two dates) - Recommended option
- All mails up-to - Recommended option
- Mails From - Not recommended as the import will never end. If this option is selected it will be disabled by the ArcTitan admins
- All Mails - Not recommended as the import will never end. If this option is selected it will be disabled by the ArcTitan admins
The recommended methods are "Mails between" or "All mails up-to". For both methods, the end date should be set to the date and time at which you enabled Journaling. You can overlap by a few hours to be safe; ArcTitan will de-duplicate any message that already exists in the archive.
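The date arithmetic behind these recommendations is sketched below in Python. The three-hour default overlap is only one example of "a few hours", and the function names are hypothetical; ArcTitan performs this selection itself from the dates you enter.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def import_window(journaling_enabled: datetime,
                  overlap_hours: float = 3,
                  start: Optional[datetime] = None) -> Tuple[Optional[datetime], datetime]:
    """Return (start, end) for a backfill import.

    start=None corresponds to "All mails up-to"; passing a start date
    corresponds to "Mails between". The end is pushed a few hours past the
    moment Journaling was enabled, relying on de-duplication to drop repeats.
    """
    end = journaling_enabled + timedelta(hours=overlap_hours)
    if start is not None and start >= end:
        raise ValueError("start must be earlier than the end of the window")
    return start, end


def in_window(sent: datetime, start: Optional[datetime], end: datetime) -> bool:
    """True if a message dated `sent` falls inside [start, end)."""
    return (start is None or sent >= start) and sent < end
```

With Journaling enabled at 09:00 on 1 March 2024, `import_window(datetime(2024, 3, 1, 9, 0))` returns an open start and an end of 12:00 the same day.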
Connection Settings for on-premise Exchange
We recommend EWS with Impersonation for Exchange. Connect to your CAS server rather than directly to any single mailbox server, even if that one server holds the accounts to extract from.
Server: <fully qualified DNS name for the CAS server>
Domain: <your network domain may be provided or left blank>
Connection Type: HTTPS
Autodiscovery Mode: <use this option if manual server/domain settings fail>
Include Folders: *
Exclude Folders: drafts,calendar,contacts,outbox,tasks,suggested contacts
Connection settings for Office365
You must use EWS with Impersonation for Office365. With Office365, the user’s login Username is normally the same as their primary Email Address.
Connection Type: HTTPS
Autodiscovery Mode: <only use this option if you have configured autodiscover for your domain>
Include Folders: *
Exclude Folders: drafts,calendar,contacts,outbox,tasks,suggested contacts
Impersonation: Office365 offers a limited web-based PowerShell feature, with which it is possible to enable Impersonation.
Hybrid Deployments: A mixture of on-premise Exchange and Office365. This should not affect the Mailbox Reader requirements.
Connection Settings for GMAIL
The connection settings are published by Google. Please note the Include Folders setting: it is recommended because some [gmail]/ subfolders simply contain subsets of the inbox that have been filtered in some way.
Connection Type: SSL
Include Folders: inbox,[gmail]/sent mail
Exclude Folders: <blank>
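The Gmail folder selection above can be tried outside ArcTitan with Python's standard `imaplib` module. This sketch rests on several assumptions: Gmail's IMAP endpoint `imap.gmail.com` on port 993 over SSL, an English-language account where the sent folder is named `[Gmail]/Sent Mail`, and an app password (Google may require an app password or OAuth rather than the normal account password). It only lists matching message numbers and does not reproduce how the Mailbox Reader itself downloads mail.

```python
import imaplib
from datetime import date
from typing import List, Optional

# IMAP dates use English month abbreviations regardless of locale (RFC 3501).
_MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def imap_date(d: date) -> str:
    """Format a date the way IMAP SEARCH expects, e.g. 05-Mar-2024."""
    return f"{d.day:02d}-{_MONTHS[d.month - 1]}-{d.year}"


def search_criteria(start: Optional[date], end: date) -> str:
    """SINCE is inclusive; BEFORE excludes messages dated on `end` itself."""
    if start is None:
        return f"(BEFORE {imap_date(end)})"
    return f"(SINCE {imap_date(start)} BEFORE {imap_date(end)})"


def search_folder(user: str, app_password: str, folder: str,
                  start: Optional[date], end: date) -> List[bytes]:
    """Return the message sequence numbers in `folder` within the date range."""
    conn = imaplib.IMAP4_SSL("imap.gmail.com", 993)
    try:
        conn.login(user, app_password)
        # Folder names containing spaces must be quoted; open read-only.
        status, _ = conn.select(f'"{folder}"', readonly=True)
        if status != "OK":
            raise ValueError(f"cannot open folder {folder!r}")
        status, data = conn.search(None, search_criteria(start, end))
        return data[0].split() if status == "OK" else []
    finally:
        conn.logout()
```

A call such as `search_folder("user@gmail.com", "app-password", "[Gmail]/Sent Mail", None, date(2024, 3, 15))` would list the sent items dated before 15 March 2024.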
Connection settings for Hotmail / Live mail
The connection settings are published by Microsoft.
Connection Type: SSL
Include Folders: *
Exclude Folders: <blank>
Mailbox Reader Connection settings explained
Server: Enter the server’s URL host name that corresponds to the Exchange server certificate, i.e. the host name you would use for OWA. In this example, we access our own mailboxes in OWA with the URL https://mail.domain.com/owa, so we use the host name from that URL: mail.domain.com.
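Taking the host name out of the OWA address is a single URL parse. The Python sketch below uses the standard `urllib.parse.urlsplit`; the function name is hypothetical and the URL is the example from the text.

```python
from urllib.parse import urlsplit


def server_from_owa_url(owa_url: str) -> str:
    """Return only the host name from an OWA URL, for the Server field.

    The URL must include its scheme (https://) for urlsplit to find the host.
    """
    host = urlsplit(owa_url.strip()).hostname  # drops scheme, port and path
    if not host:
        raise ValueError(f"no host name found in {owa_url!r}")
    return host
```

`server_from_owa_url("https://mail.domain.com/owa")` returns `mail.domain.com`.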
Domain: Dependent on network requirements, this may or may not be needed.
Port: For EWS this will always be the standard https port, which is 443.
Idle Alert Period: This causes alert emails to be sent by ArcTitan to the alert recipients if no mail is collected by this connection over the specified period.
Use Autodiscovery Mode: This initialises the connection details from the email addresses of the user mailboxes that are to be collected from. In essence, Autodiscovery is another web-based service (which also requires a valid web certificate) that returns all of the server, domain, URL and other details for a given email address.
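Autodiscovery works from the domain part of each email address. The Python sketch below lists the two HTTPS locations that Exchange Autodiscover clients conventionally try first. It is an illustration only: the exact lookup order ArcTitan uses is not described here, and clients may also use DNS SRV records or other endpoints. Each of these hosts must present a valid certificate for the lookup to succeed.

```python
from typing import List


def autodiscover_candidates(email: str) -> List[str]:
    """Return the two common HTTPS Autodiscover URLs for an address's domain."""
    local, sep, domain = email.strip().rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {email!r}")
    domain = domain.lower()
    return [
        f"https://{domain}/autodiscover/autodiscover.xml",
        f"https://autodiscover.{domain}/autodiscover/autodiscover.xml",
    ]
```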
Connectivity Type: For EWS this will always be HTTPS, on port 443.
Include Folders: This is the set of Outlook folders that you require email to be downloaded from. Generally, this will be ALL folders, so the * wildcard can be used. Otherwise, a comma separated set of folder names can be provided. For subfolders, you will need to enter the full path, with each part separated by a forward slash. For example: inbox/archive mail/*,sent mail
PLEASE NOTE that * means that non-email folders will also be accessed.
Exclude Folders: If you wish to exclude specific folders that would otherwise be included, enter a comma separated set of folders here.
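A minimal Python sketch of how Include and Exclude lists like these can be evaluated against a folder path. It assumes case-insensitive matching, with * as the only wildcard (matching any characters, including "/", so inbox/archive mail/* also covers deeper subfolders). ArcTitan's own matching rules may differ in detail.

```python
import re
from typing import List


def _compile(pattern: str):
    # Only "*" is a wildcard (matching any characters, "/" included); the rest,
    # such as the square brackets in "[gmail]/sent mail", is matched literally.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(regex, re.IGNORECASE)


def parse_folders(setting: str) -> List[str]:
    """Split a comma separated folder setting into trimmed patterns."""
    return [p.strip() for p in setting.split(",") if p.strip()]


def folder_selected(folder_path: str, include: str, exclude: str = "") -> bool:
    """True if the folder matches an Include pattern and no Exclude pattern."""
    path = folder_path.strip()

    def matches(setting: str) -> bool:
        return any(_compile(p).fullmatch(path) for p in parse_folders(setting))

    return matches(include) and not matches(exclude)
```

With Include set to `*` and the Exclude list recommended for Exchange, `folder_selected("Calendar", "*", "drafts,calendar,contacts")` is False while `folder_selected("Inbox", "*", "drafts,calendar,contacts")` is True.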
Concurrent Account Download Limit: The number of mailboxes that will be queried in parallel.
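Conceptually, this limit behaves like the worker count of a thread pool. The Python sketch below is an illustration, not ArcTitan code: `download_one` is a hypothetical stand-in for whatever fetches one mailbox, and `concurrent_limit` plays the role of this setting. A higher limit can finish sooner but places more simultaneous load on the mail server.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, TypeVar

R = TypeVar("R")


def collect_all(mailboxes: Iterable[str],
                download_one: Callable[[str], R],
                concurrent_limit: int = 4) -> Dict[str, R]:
    """Run download_one(mailbox) for each mailbox, at most `concurrent_limit` at a time."""
    if concurrent_limit < 1:
        raise ValueError("concurrent_limit must be at least 1")
    boxes = list(mailboxes)
    with ThreadPoolExecutor(max_workers=concurrent_limit) as pool:
        # map() preserves input order, so results line up with `boxes`.
        return dict(zip(boxes, pool.map(download_one, boxes)))
```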
Ignore Non Email Items: This tells the Collector to only attempt to download items that have recognised flags indicating the content is a standard email. Some imported emails or post-processed emails (like stubbed items) will have a different ‘item class’ flag, and this option ensures they are not collected. If the collector is ‘skipping’ items that you believe should be collected, try un-ticking this option and re-running the collection.
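In Exchange, the ‘item class’ is the message class string stored on each item, and ordinary email uses `IPM.Note`. The Python sketch below shows the kind of test this option implies. The set of accepted classes is an assumption, since ArcTitan's actual list is not documented here. Note that a strict rule like this one would also skip variants such as `IPM.Note.SMIME`, which illustrates why un-ticking the option can recover items that were being skipped.

```python
# Assumption: only the base Exchange message class counts as standard email.
STANDARD_EMAIL_CLASSES = {"ipm.note"}


def should_collect(item_class: str, ignore_non_email_items: bool = True) -> bool:
    """Decide whether an item is collected, given its Exchange item class."""
    if not ignore_non_email_items:
        return True  # option un-ticked: attempt every item
    return item_class.strip().lower() in STANDARD_EMAIL_CLASSES
```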
Use impersonation: This is a technique that allows a special user login account to have read/write access to all mailboxes in the Exchange organisation. Without Impersonation you would need to provide the password for every mailbox that you wish to collect mail from. Impersonation is needed when you wish to collect mail from more than one mailbox. See section Impersonation & Throttling below for more information.
Run Mode: This will say “Polling” if there is no END DATE for mail collection. Without an end date, the system needs to scan mailboxes repeatedly, a technique used to archive mail from systems that do not have a Journaling feature (like Hotmail / Gmail / Live mail / other IMAP or POP3 sources). If an end date is specified, the Run Mode will say “Date Limited”.
The summary information that is displayed during mailbox collection will be different between Polling mode vs Date Limited mode.
Selection Range / Start / End date & Time: This sets the period over which mail is to be collected. For most new archive setups for Exchange or Lotus Notes, we recommend that you use the “All mail up-to” option and set the end date/time to the time when Journaling was enabled.
Check Every (seconds): This option shows only for “Polling” connections. After each complete pass over every user account, the system will pause for the duration specified in Check Every. This allows you to scan mailboxes at hourly or daily intervals, if desired.
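The relationship between the end date, the Run Mode and Check Every can be sketched in Python as follows. The names are hypothetical, and `collect_new` stands in for collecting whatever is newer than the last pass; a real polling connection runs until it is stopped, so `max_passes` exists only to make the sketch testable.

```python
import time
from datetime import datetime
from typing import Callable, Iterable, Optional


def run_mode(end_date: Optional[datetime]) -> str:
    """No end date means repeated polling; an end date limits the collection."""
    return "Polling" if end_date is None else "Date Limited"


def poll(accounts: Iterable[str], collect_new: Callable[[str], object],
         check_every: float, max_passes: Optional[int] = None) -> int:
    """Sweep every account, pause `check_every` seconds, then sweep again.

    max_passes is a hypothetical stop condition used for testing.
    """
    accounts = list(accounts)
    passes = 0
    while max_passes is None or passes < max_passes:
        for account in accounts:
            collect_new(account)
        passes += 1
        if max_passes is None or passes < max_passes:
            time.sleep(check_every)  # e.g. 3600 for hourly sweeps
    return passes
```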
Queue Messages For Import Node: This tells the system to queue the imported mails into the “Import Node” feature of ArcTitan. This allows mails to be queued but not necessarily processed straight away.
Download Chunk Size: This tells the underlying system how many emails to transfer over the connection in each query request. A larger number will increase performance at the cost of greater memory and network usage. It is unlikely that you will need to alter this except on the advice of the Support team, for example following performance, memory or network issues.
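The effect of the chunk size is the same as batching any list of items into fixed-size requests. In the Python sketch below, each yielded list stands for one query request; the helper name is hypothetical and the sketch does not model ArcTitan's actual transfer protocol.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], chunk_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most chunk_size items (one request each)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    it = iter(items)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk
```

A larger `chunk_size` means fewer round trips, but each chunk must be held in memory while it is processed.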
Download MIME in Chunk: Size Limit: MB (<=0: No limit). This specifies whether the email content is to be transferred along with the ‘chunk’ of email headers. By default, the email contents will be transferred along with the list of mail headers, but only if the content is smaller than the provided size limit. If an email is larger, it will be transferred using a byte stream instead.
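The size rule can be written as a small decision function. In this Python sketch, the names are hypothetical and 1 MB is assumed to be 1024 × 1024 bytes; "inline" means sent with the chunk of headers and "stream" means fetched separately as a byte stream.

```python
def transfer_mode(message_size_bytes: int, size_limit_mb: float) -> str:
    """Return 'inline' (sent with the header chunk) or 'stream' (fetched separately)."""
    if size_limit_mb <= 0:  # <=0 means no limit
        return "inline"
    limit_bytes = size_limit_mb * 1024 * 1024  # assumes 1 MB = 1024 * 1024 bytes
    return "inline" if message_size_bytes < limit_bytes else "stream"
```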
Mailbox Reader De-duplication: These choices help to identify duplicate emails prior to downloading from user mailboxes. De-duplication is based upon the “MESSAGE-ID” value. Regardless of these settings, de-duplication may still be performed by ArcTitan as the mails are being processed into the archive repositories. Please check the Advanced Company Settings to see if de-duplication is applied [to ‘basic’ rfc822 mail].
No Deduplication: All mail will be downloaded. Repeated downloading will obtain the same emails again, and mails that appear in several user mailboxes will be downloaded regardless.
Mailbox Reader Downloaded Messages: Only mails that have not previously been downloaded will be chosen. ArcTitan will create a private database of message-ids to support this option.
WARNING: With very large data sets (i.e. over 10 million emails), the database can become significantly large, which can affect local disk usage and the system's internal nightly backup (where the databases are copied to the local disk, then transferred to the Mirror server, if used).
Downloaded messages AND ArcTitan repository: This checks for duplicates in the downloaded message-id database [see the previous description & warning] AND in the ArcTitan repositories as well. Use this option only if there is significant overlap between the collection source and the mail already in the archive. For example, during an “In-filling” process where only some mails were missing from the archive for some reason.
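The three choices can be sketched in Python as follows. This is an illustration under assumptions: the mode names, the `DownloadedIds` class and the `in_archive` lookup are hypothetical, and an SQLite table stands in for ArcTitan's private message-id database. It also shows why that database grows with the data set: every downloaded Message-ID gets a row.

```python
import sqlite3
from typing import Callable, Optional

MODES = {"none", "downloaded", "downloaded_and_archive"}  # hypothetical names


class DownloadedIds:
    """A small local database of Message-IDs that have already been downloaded."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute("CREATE TABLE IF NOT EXISTS seen (message_id TEXT PRIMARY KEY)")

    @staticmethod
    def normalise(message_id: str) -> str:
        # Treat "<id@host>" and "id@host" as the same Message-ID.
        return message_id.strip().strip("<>")

    def contains(self, message_id: str) -> bool:
        row = self.db.execute("SELECT 1 FROM seen WHERE message_id = ?",
                              (self.normalise(message_id),)).fetchone()
        return row is not None

    def record(self, message_id: str) -> None:
        self.db.execute("INSERT OR IGNORE INTO seen (message_id) VALUES (?)",
                        (self.normalise(message_id),))
        self.db.commit()


def should_download(message_id: str, mode: str, store: DownloadedIds,
                    in_archive: Optional[Callable[[str], bool]] = None) -> bool:
    """Apply one of the three de-duplication choices to a single message."""
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}")
    if mode == "none":
        return True  # everything is downloaded, repeats included
    if store.contains(message_id):
        return False
    if mode == "downloaded_and_archive" and in_archive is not None:
        return not in_archive(message_id)
    return True
```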
Process this Import data as normal spool mail? Yes/No: Mail that is collected by the reader should be marked as ‘Imported’. This has two main benefits:
1. The mails, when viewed in ArcTitan, will show that they were Imported (and thus their authenticity cannot be guaranteed).
2. The mail is placed into a separate data storage node from the ‘Live’ mail. This allows the imported mail to be removed en masse if there is any problem.
By de-selecting this option, the mails will NOT be marked as ‘Imported’ in the archive and will be processed into the same data files as ‘live’ mail, making it much harder to bulk remove only the Imported data set.
IMPORTANT NOTE: If a de-duplication option is used (this is both the default and recommended), a local database of message-ids will be created. Once collection has been fully completed, the Mailbox Reader connection should be DELETED. Doing so removes the message-id database, releasing disk space and speeding up the internal system backups.
Import the list of user accounts
After a connection is created, you will then need to specify which user accounts to collect mail from. There are two methods of adding user accounts for mail collection:
1. Create Users manually, by entering their account details
2. Add Users From LDAP directory searches
If you have more than one mailbox reader connection, then remember to select the required connection first!
After you have created or added accounts, start the download process.
Creating a User Account entry
Click the “Create User” button. You will see the “User Details” section at the top of the page becomes editable.
Fill in the account’s Username (used for the account login or access connection), primary email address and password.
For Office 365, the username is the same as their primary email address. If Impersonation is available, then the password can be left blank.
Adding users from LDAP
If ArcTitan has access to LDAP, you can search for and select accounts from this resource. Please note that Exchange 2013 adds a number of “health mailboxes”.
If your LDAP server has one or more “Search DNs” associated with it, you must select the required DNs to search under. Only users under the selected OU groups can be searched and listed.
You may also apply a “Search Filter”. This allows you to refine the LDAP search query with additional restrictions. By default, ArcTitan provides a simple filter that only returns user accounts (not distribution or security groups).
To search ALL accounts, simply leave the Search For box empty and click the Search button.
To search for specific accounts, enter part of a user's email address or account username, followed by a * (a wildcard), then press Enter.
You will see the LDAP search terms briefly displayed on screen while the results are being collected.
This will show all the accounts.
Tick the required accounts, or tick the topmost box to select ALL entries, then scroll down to the bottom and press the Add Users button.
The selected users will now show in the main Mailbox Reader – User Configuration panel.
If you have Active Directory sub-domains and users in those sub-domains, you may need to use a domain controller holding the Global Catalogue role in order to be able to find all users. The Global Catalogue can be queried using LDAP on TCP port 3268 or LDAPS on TCP port 3269.
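For reference, the Python sketch below shows the general shape of an LDAP query built from a "Search For" term: special characters are escaped as RFC 4515 requires, a typed * is kept as a wildcard, and an empty term falls back to a user-only filter so that all accounts are listed. The filter text, the attributes searched (`mail`, `sAMAccountName`) and the host name are assumptions for illustration, not ArcTitan's actual query. The last helper builds a Global Catalogue URL from the ports given above.

```python
# Assumption: a typical Active Directory "user accounts only" filter; the
# default filter ArcTitan supplies may differ.
USERS_ONLY = "(&(objectCategory=person)(objectClass=user))"


def escape_ldap_value(text: str, keep_wildcards: bool = True) -> str:
    """Escape special characters per RFC 4515, optionally keeping * as a wildcard."""
    out = []
    for ch in text:
        if ch == "*" and keep_wildcards:
            out.append(ch)
        elif ch in "\\*()\x00":
            out.append("\\%02x" % ord(ch))  # e.g. "(" becomes \28
        else:
            out.append(ch)
    return "".join(out)


def user_search_filter(search_for: str) -> str:
    """Match mail or sAMAccountName; an empty search lists all user accounts."""
    term = search_for.strip()
    if not term:
        return USERS_ONLY
    value = escape_ldap_value(term)
    return f"(&{USERS_ONLY}(|(mail={value})(sAMAccountName={value})))"


def global_catalog_url(dc_host: str, secure: bool = True) -> str:
    """Global Catalogue: LDAPS on TCP 3269, or plain LDAP on TCP 3268."""
    return f"ldaps://{dc_host}:3269" if secure else f"ldap://{dc_host}:3268"
```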
Testing & Starting Collection Downloading
The grid of configured users will be paged, showing the list in blocks of 10 / 20 / 50 or 100 accounts at a time.
By clicking the “Test” link against any single user entry, you can validate the Mailbox Reader connection as well as login to this user’s mailbox. If there are issues at this “Test” phase, they will be displayed in a message, and they should be resolved before attempting to start the Download process.
NOTE: If there are connection issues, the test may take up to 1 minute to return/timeout.
If the user account passes the test, you can select the “Start Download” button. The system will now select a number of accounts to access in parallel. You will see this reflected in each account’s “Current State”.
Mailbox Reader Option Buttons
The User Panel Buttons:
Create User – Manually add a user for Email Collection, where they cannot be selected from an LDAP source.
Add Users – Select one or more user mailbox accounts from an LDAP directory for which mail is to be collected.
Connection Settings – Switch the current view back to the Mailbox Reader Connection panel. It should switch so that the corresponding connection is selected (assuming that you have multiple collector connections).
Edit User / Delete User / Test Connection / Cancel – These options become visible only when a Mailbox user account is selected from the accounts Grid.
Start Download / Stop Download – The Mailbox Reader runs as a service independent of ArcTitan, and each collector connection can be stopped and started independently. Once a connection is stopped, other actions can be performed, such as adding, updating and removing the User Mailbox accounts from which mail is to be collected.
Start Error Mails Retry – If the main collector has completed or been stopped, but some emails were skipped due to errors, you can re-start the collector to re-attempt fetching just these problem emails. The error mails can most likely only be downloaded successfully once the cause of the error has been removed, which in some cases may require an update to ArcTitan to address the underlying issue. Please only use this option if you know that the Exchange server had problems during the Mailbox Reader run, or after an ArcTitan upgrade that specifically includes a fix for Mailbox Reader error cases.
Reset Download – When the Reader is first run against each mailbox, and after each sweep over a mailbox in “Polling” mode, the system records a “Read up to Date/Time” stamp. The system therefore only ever reads forwards from the last pass. If you wish to collect mail from an earlier date, or there were collection problems and you want to ensure a complete sweep across all data, press this “Reset Download” button and all accounts will start collection from the beginning again.
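The "Read up to Date/Time" behaviour is a per-mailbox watermark. The Python sketch below models it under the assumption that one timestamp is kept per mailbox; the class and method names are hypothetical.

```python
from datetime import datetime
from typing import Dict, Optional


class ReadUpTo:
    """Per-mailbox "Read up to Date/Time" stamps (a sketch of the idea)."""

    def __init__(self) -> None:
        self._stamps: Dict[str, datetime] = {}

    def since(self, mailbox: str) -> Optional[datetime]:
        """Where the next pass starts; None means from the beginning."""
        return self._stamps.get(mailbox)

    def advance(self, mailbox: str, read_up_to: datetime) -> None:
        """Record progress after a pass; stamps only ever move forwards."""
        current = self._stamps.get(mailbox)
        if current is None or read_up_to > current:
            self._stamps[mailbox] = read_up_to

    def reset(self) -> None:
        """Like Reset Download: every mailbox starts again from the beginning."""
        self._stamps.clear()
```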
Mailbox Reader – Grid of User Accounts
After account entries have been Created or Added [via LDAP selection], they will appear in the grid in the lower section of the User Configuration panel. The grid is now “Paged”, meaning that only a fixed number of accounts will be displayed at a time. Several operations can be performed on the grid, and actions can be applied to each account in it:
Refresh – This will refresh the data displayed anywhere in the visible grid area. Clicking this link repeatedly lets you follow the progress, as the download counts and Details display areas are updated.
Actions: [Page Size] Change the number of accounts to display per page (20, 50, etc.). [Page Number] Enter a page number to go directly to that page.
Search Filters: You can search by Mailbox or Email Address. As you type the grid will immediately locate the matching accounts.
NOTE: This will search entries in all pages of data. For long lists there may be some delay between keystrokes.
NOTE 2: It uses a wildcard search – the text you type can appear ANYWHERE in the Mailbox name or email address.
Current State: Filter the grid to only display the accounts with the matching state (Completed / Running / Stopped etc).
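The Search Filters behaviour described above (substring matching over every page, then paging the result) can be sketched in Python. Case-insensitive matching and the function names are assumptions; the grid's real implementation is not described here.

```python
import math
from typing import List, Sequence, Tuple

Account = Tuple[str, str]  # (mailbox name, email address)


def search_accounts(accounts: Sequence[Account], text: str) -> List[Account]:
    """Match `text` anywhere in the mailbox name or email address, across all pages."""
    needle = text.strip().lower()
    return [a for a in accounts if needle in a[0].lower() or needle in a[1].lower()]


def page_count(total: int, page_size: int) -> int:
    """Number of pages needed; an empty grid still shows one page."""
    return max(1, math.ceil(total / page_size))


def get_page(rows: Sequence[Account], page_size: int, page_number: int) -> List[Account]:
    """Return the rows for a 1-based page number."""
    if page_size < 1 or page_number < 1:
        raise ValueError("page_size and page_number must be at least 1")
    start = (page_number - 1) * page_size
    return list(rows[start:start + page_size])
```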
Within the User Accounts grid the following “Actions” links are available:
Mailbox Name link - To edit an existing Mailbox Account (to reset the registered user password, or to ‘disable’ the account to prevent further collection), click on the username link in the accounts grid. The buttons on the left will now show the Edit User / Delete User / Test Connection / Cancel options.
Test – Check that the account username and password (if Impersonation is not used) are valid by performing a login to that account.
Probe – The probe action allows you to view the folders within the user’s account. You can monitor the collection from each folder as it happens via this view.