Mail Archive app. Some advice please.

BramVanOosterhout
Dear Foswikiers,

I would like a considered opinion on how to proceed from where I am now.

I am creating an email archive. I have chosen to display search results in
the way this forum displays its emails.

Whenever I search for something or someone, I present the search results as:
1. a table of 6 (so far) years by 12 months;
2. a table of 5 weeks by 7 days;
3. a list of emails matching the search criteria on a day;
4. for each result, a count of "related" emails (usually a search for a
matching subject).

That is 6x12 + 5x7 + 1 + n (108 + n) searches to display a result, where n is the
number of results returned.
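The count above can be checked with a few lines of Python. The constant and function names are illustrative only and are not part of any Foswiki API.

```python
# Searches needed to render one result page, following the breakdown above.
YEARS, MONTHS = 6, 12  # year-by-month table: one search per cell
WEEKS, DAYS = 5, 7     # week-by-day table: one search per cell
DAY_LIST = 1           # one search for the emails on the selected day


def searches_per_page(n_results: int) -> int:
    """Cells in both tables, plus the day list, plus one "related" count per result."""
    return YEARS * MONTHS + WEEKS * DAYS + DAY_LIST + n_results


print(searches_per_page(0))   # 108
print(searches_per_page(10))  # 118
```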

So far Foswiki has handled this admirably. I use query search on topic fields
exclusively. Results are returned in about 60 seconds with 5000 topics to be
searched (roughly 0.5 seconds per search). Congratulations to those of you who
worked on the search!

However, this approach will become impractical once I fully populate the
archive with 50-100k topics. So how should I proceed?
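A rough extrapolation of why this becomes impractical, as a Python sketch. It assumes (not measured) that the cost of each query SEARCH grows linearly with the number of topics scanned, and it counts only the 108 fixed searches, ignoring the n "related" searches.

```python
# Rough extrapolation from the figures above: 0.5 s per search at 5000 topics.
# Assumption: per-search cost scales linearly with the number of topics scanned.
MEASURED_TOPICS = 5_000
MEASURED_SECONDS_PER_SEARCH = 0.5
FIXED_SEARCHES = 108  # table cells plus the day list, before the n "related" searches


def estimated_page_seconds(topics: int, searches: int = FIXED_SEARCHES) -> float:
    per_search = MEASURED_SECONDS_PER_SEARCH * topics / MEASURED_TOPICS
    return per_search * searches


for topics in (5_000, 50_000, 100_000):
    print(f"{topics:>7} topics: ~{estimated_page_seconds(topics):.0f} s")
```

Under that linear assumption, a page would take roughly 9 minutes at 50k topics and 18 minutes at 100k topics.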

I have tried to implement these tables with MULTISEARCH, but:
a. I don't believe I understand how to do that well;
b. my attempts ended up taking twice as long as my current implementation.

I have turned the cache on in the configuration (Tuning section), but that does
not seem to make much difference, probably because every search is different :-)
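Why a cache helps little here can be seen with a toy memoization model in Python: cached results are only reused when exactly the same request comes in again. This is not how Foswiki's cache is implemented, and the query strings are hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=1024)
def run_query(query: str) -> int:
    # Stand-in for an expensive SEARCH; returns a dummy "hit count".
    return len(query)


def render_year_month_table(author: str) -> None:
    # Every cell of the 6 x 12 table issues a different query string.
    for year in range(2010, 2016):
        for month in range(1, 13):
            run_query(f"author='{author}' AND year={year} AND month={month}")


render_year_month_table("Bram")   # first view: 72 misses
render_year_month_table("Bram")   # identical repeat view: 72 hits
render_year_month_table("Paul")   # new search criterion: 72 more misses
info = run_query.cache_info()
print(info.hits, info.misses)     # 72 144
```

Only the exact repeat benefits; a new search criterion starts from scratch.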

I have read the documentation on Solr. It may help, but I would have to
rewrite the searches in the Solr query language. Solr would build a text
index of the topics, but since I am using query SEARCH, I do not expect
a big improvement. Do you think the result would be much better?

I am considering writing a plugin that would return the whole page in the
desired format. The data could be collected in a single pass over the topics,
but I would need to replicate all the smarts implemented in SEARCH.
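A minimal sketch of that single pass, written in Python rather than as a Foswiki plugin (which would be Perl). It assumes each email topic has already been reduced to a record with hypothetical sender, date, and subject fields. It treats "related" as "same subject once Re:/Fw:/Fwd: prefixes are stripped", and it reproduces none of SEARCH's other features.

```python
import re
from collections import Counter
from datetime import date


def normalize_subject(subject: str) -> str:
    # Drop leading "Re:", "Fw:" and "Fwd:" prefixes so replies group with their thread.
    return re.sub(r"^(\s*(re|fwd?)\s*:\s*)+", "", subject, flags=re.IGNORECASE).strip().lower()


def summarize(messages, matches, year, month, day):
    """One pass over all messages, filling every part of the result page."""
    per_month = Counter()  # (year, month) -> matching emails: year-by-month table
    per_day = Counter()    # day of month -> matching emails: week-by-day table
    day_list = []          # matching emails on the selected day
    threads = Counter()    # normalized subject -> emails in the whole archive
    for msg in messages:
        d = msg["date"]
        threads[normalize_subject(msg["subject"])] += 1
        if not matches(msg):
            continue
        per_month[(d.year, d.month)] += 1
        if (d.year, d.month) == (year, month):
            per_day[d.day] += 1
            if d.day == day:
                day_list.append(msg)
    # "Related" count for each listed email, taken from the same pass.
    related = {m["subject"]: threads[normalize_subject(m["subject"])] for m in day_list}
    return per_month, per_day, day_list, related


archive = [
    {"sender": "Bram", "date": date(2016, 1, 12), "subject": "Mail Archive app"},
    {"sender": "Paul", "date": date(2016, 1, 12), "subject": "Re: Mail Archive app"},
    {"sender": "Bram", "date": date(2016, 1, 14), "subject": "Re: Mail Archive app"},
    {"sender": "Bram", "date": date(2015, 12, 3), "subject": "Season's greetings"},
]
per_month, per_day, day_list, related = summarize(
    archive, lambda m: m["sender"] == "Bram", 2016, 1, 12)
print(dict(per_month))  # {(2016, 1): 2, (2015, 12): 1}
print(dict(per_day))    # {12: 1, 14: 1}
print(related)          # {'Mail Archive app': 3}
```

Laying the per-day counts out as a 5-week by 7-day grid is then purely a formatting step.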

My questions:
1. Are there other options to implement this interface?
2. What would you do to address the looming issue?

All comments welcome.

By the way, it has taken me a while to come to grips with Foswiki
applications, but I am getting the hang of it and I believe it is a really
neat way to go!

Thanks in advance, and all the best for 2016.
--
Bram van Oosterhout


------------------------------------------------------------------------------
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
_______________________________________________
Foswiki-discuss mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/foswiki-discuss

Re: Mail Archive app. Some advice please.

Paul L. Merchant Jr.
Hi Bram, a while back someone on this list pointed me at the DBCachePlugin for some performance issues I was having. It is built on top of DBCacheContrib and offers a database backend for searching and including, which I've found works extremely well. Take a look at the DBQUERY section on http://foswiki.org/Extensions.DBCachePlugin#DBQUERY. I was very pleased with the results, and I think it will help you out immensely, even on your smaller test dataset. The query syntax is a little different from SEARCH, but I found it pretty easy to switch. (Hint: DBDUMP is very handy for looking at what's available in the database.) I haven't tried it on anything the size you described, but I know of no reason why it wouldn't work.

I haven't used it myself, but I wonder if DBRECURSE (described on that page) might also help to reduce the number of searches you have to perform.
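The general idea behind a cache such as DBCacheContrib can be illustrated with a toy Python model: parse each topic's metadata once, then answer many queries from memory instead of rereading the topics for every search. This is not DBCacheContrib's implementation or its query syntax; the class, the field names, and the in-memory store are all hypothetical.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]


class MetadataCache:
    """Toy cache: read each topic's form fields once, then answer queries in memory."""

    def __init__(self, names: List[str], read_topic: Callable[[str], Record]):
        self.reads = 0
        self.records: Dict[str, Record] = {}
        for name in names:
            self.records[name] = read_topic(name)  # the only "disk" access
            self.reads += 1

    def query(self, predicate: Callable[[Record], bool]) -> List[str]:
        return sorted(name for name, rec in self.records.items() if predicate(rec))


# Hypothetical stand-in for parsing topic files on disk.
FAKE_STORE = {
    "Email0001": {"Sender": "Bram", "Year": "2016", "Month": "1"},
    "Email0002": {"Sender": "Paul", "Year": "2016", "Month": "1"},
    "Email0003": {"Sender": "Bram", "Year": "2015", "Month": "12"},
}
cache = MetadataCache(list(FAKE_STORE), FAKE_STORE.__getitem__)

# 84 queries for a 7 x 12 year-by-month table, but still only 3 topic reads.
hits = 0
for year in range(2010, 2017):
    for month in range(1, 13):
        hits += len(cache.query(lambda r: r["Sender"] == "Bram"
                                and r["Year"] == str(year) and r["Month"] == str(month)))
print(cache.reads, hits)  # 3 2
```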

Good luck!

Paul Merchant, Jr.

> On Jan 12, 2016, at 7:27 AM, Bram van Oosterhout <[hidden email]> wrote:

Re: Mail Archive app. Some advice please.

Chris Hoefler
In reply to this post by BramVanOosterhout
DBCacheContrib/DBCachePlugin might be a good fit for your project. The query language is similar enough to query SEARCH that it would be easy to adopt. That said, Solr is lightning fast, so it might be worth investing the time to transition to it now, especially if you might need full-text searching at some point in the future.

For performance, I highly recommend asynchronous rendering of your results. There are several ways to approach it. JQGridPlugin is a drop-in solution, but your formatting is restricted to a grid. RenderPlugin is a flexible solution: you can wrap your formatted SEARCH, or you can use it to expand a custom TML template with your SEARCH results. For the highest flexibility you can use jsRender templates, but you will need to either use Solr or write a plugin that implements a JSON-RPC method (using JsonRpcContrib) for retrieving search results.
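For the JSON-RPC route, the message format is defined by the JSON-RPC 2.0 specification. The Python sketch below builds a request and unpacks a response, only to show the shape of the exchange. The method name and parameters are hypothetical, and how JsonRpcContrib maps methods to URLs and Perl handlers is not shown here.

```python
import json
from itertools import count

_next_id = count(1)


def make_request(method: str, params: dict) -> str:
    # JSON-RPC 2.0 request: version tag, method name, named params, and a request id.
    return json.dumps({"jsonrpc": "2.0", "method": method,
                       "params": params, "id": next(_next_id)})


def read_response(text: str):
    # A JSON-RPC 2.0 response carries either "result" or an "error" object.
    reply = json.loads(text)
    if "error" in reply:
        err = reply["error"]
        raise RuntimeError(f"JSON-RPC error {err['code']}: {err['message']}")
    return reply["result"]


# Hypothetical method name and parameters for an archive-search plugin.
req = make_request("MailArchivePlugin.search", {"sender": "Bram", "year": 2016, "month": 1})
print(req)
print(read_response('{"jsonrpc": "2.0", "result": [{"topic": "Email0001"}], "id": 1}'))
```

On the client side, the returned list would then be fed to a jsRender template.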

On Tue, Jan 12, 2016 at 6:27 AM, Bram van Oosterhout <[hidden email]> wrote:

--
Chris Hoefler, PhD
Postdoctoral Research Associate
Straight Lab
Texas A&M University
2128 TAMU
College Station, TX 77843-2128

Re: Mail Archive app. Some advice please.

Randy Kramer
In reply to this post by BramVanOosterhout
I will offer a poorly considered opinion :-) (and explain a little about my use
case).

I store TWiki (i.e., Foswiki) pages (and similar things, including "real
emails"; keep reading), thousands of them, in what I call my askRhk offline
free-format text database. It is essentially an mbox file used as the storage
medium: an mbox mail header (the From line plus, at a minimum, a Date: and a
Subject: header line) separates the individual pages (whether they are TWiki
pages, similar things that aren't quite TWiki pages, or emails).

As a full-text search engine I use recoll (well, I used it and plan to use it
again, but currently I'm focusing on other things and don't have it
implemented).

Recoll is a full-text search engine made up of multiple parts (e.g., the
pieces that constitute recoll itself and, somewhat integrated with it, nail,
which provides display of the found records).

Recoll can also search smaller files in plain text format (i.e., without the
mbox header).

Maybe recoll would be useful for you. (The author/maintainer of recoll has
been helpful to me in various ways, including preserving a "feature" that is
not strictly mbox-compliant but is very useful to me: the ability to have a
quoted phrase instead of just a single word as the first parameter on the
From line of the mbox header.)

I chose to require the From line as well as at least a Date: and a Subject:
header so that I can also read (and in some cases edit) individual records
(i.e., what would be considered individual emails in the mbox file) using some
standard mail clients, notably kmail and nail. (I've tried some other email
clients; some work, some don't, and I don't recall at the moment which did and
which did not.)
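A record layout like the one described here (a "From " separator line, then at least Date: and Subject: headers, a blank line, and the body) can be read with Python's standard mailbox module. The sample records below are made up. Note that mbox writers commonly escape body lines that begin with "From " as ">From ", which is why that prefix sometimes turns up in archived mail.

```python
import mailbox
import os
import tempfile

# Two made-up records: "From " separator line, Date: and Subject: headers, blank line, body.
SAMPLE = b"""From randy Tue Jan 12 07:27:34 2016
Date: Tue, 12 Jan 2016 07:27:34 -0500
Subject: First page

Body of the first record.

From randy Wed Jan 13 08:00:00 2016
Date: Wed, 13 Jan 2016 08:00:00 -0500
Subject: Second page

Body of the second record.
"""


def read_records(raw: bytes):
    """Return (From-line text, Date, Subject) for each record in an mbox file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "archive.mbox")
        with open(path, "wb") as f:
            f.write(raw)
        box = mailbox.mbox(path, create=False)
        try:
            return [(msg.get_from(), msg["Date"], msg["Subject"]) for msg in box]
        finally:
            box.close()


for from_line, when, subject in read_records(SAMPLE):
    print(subject, "|", when)
```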

Randy Kramer


On Tuesday, January 12, 2016 07:27:34 AM you wrote:


Re: Mail Archive app. Some advice please.

BramVanOosterhout
Hi All,
Thanks for the prompt responses.

Chris and Paul, thanks for the suggestion to consider DBQUERY. I will
investigate that option.

Chris and Randy, I am definitely not after a free-text search of the body of
the email. Searches on the extracted fields are more than adequate at this
point, so I will discount Solr for the time being.

Randy, your application sounds interesting. My archive is an attempt to build a
single archive from a multitude of different file formats, MS Outlook being
one. I am also particularly interested in attachments to the emails. I could
consider using the mbox format as the reference file, but I find Foswiki's
presentation superior to the email front ends.

Chris, I appreciate your suggestion of asynchronous rendering. I had not
considered it, in part because I do not really understand how it works. I
believe it is especially useful when there is a lot of content to render
and the page can be loaded in parts. Am I correct?
My immediate problem is that I do a lot of searching to create a compact
summary of about 10 lines.

All,
Thanks for your responses and also for letting me ask the question. After
asking it I have had several conversations with myself and come to the
conclusion that I can reframe my problem from a search problem into a
formatting problem. I will post a separate question about the formatting
options this reframing opens up. Thanks for listening.




--
Bram van Oosterhout



Re: Mail Archive app. Some advice please.

Chris Hoefler
> Chris, I appreciate your suggestion of asynchronous rendering. I had not
> considered that. In part because I do not really understand how that works. I
> believe that it is especially useful if there is a lot of content to render
> and one can load the page in parts.  Am I correct in that?
> My immediate problem is that I do a lot of searching to create a compact
> summary. It's about 10 lines.

That is correct. However, even with relatively little content, it has the advantage of letting time-consuming processes complete in the background while the rest of the page loads. For example, I have a macro that custom-formats some field data for my topics. When I do a formatted SEARCH of 1000+ topics using that macro, it takes a while to complete, blocks page loading, and sometimes times out. By rewriting the macro as a REST handler, I can call the formatting code asynchronously, which allows each row of the search results to be rendered independently.
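The timing benefit can be modelled in Python with asyncio: twenty row requests that each take about 0.1 s finish in roughly 0.1 s overall when issued concurrently, instead of about 2 s one after another. In a real page these would be browser-side requests to a REST handler; the render_row coroutine and the fixed delay are stand-ins, not Foswiki code.

```python
import asyncio
import time


async def render_row(topic: str, delay: float) -> str:
    # Stand-in for one asynchronous call to a (hypothetical) REST handler.
    await asyncio.sleep(delay)
    return f"<tr><td>{topic}</td></tr>"


async def render_rows(topics, delay=0.1):
    # Start every row request at once; gather() returns results in input order.
    return await asyncio.gather(*(render_row(t, delay) for t in topics))


topics = [f"Email{i:04d}" for i in range(20)]
start = time.perf_counter()
rows = asyncio.run(render_rows(topics))
elapsed = time.perf_counter() - start
print(len(rows), rows[0])
print(f"{elapsed:.2f} s concurrently vs ~{0.1 * len(topics):.1f} s sequentially")
```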


On Thu, Jan 14, 2016 at 5:57 AM, Bram van Oosterhout <[hidden email]> wrote: