Options for a new plugin summarising search results.


BramVanOosterhout
Hi all,
I am considering options for speeding up a topic that is currently
implemented with many (100+) searches in a 50,000 topic web.

After some discussion I have realised that I only need 1 search and the rest
is formatting of the results. My search produces the results in a table of
the form:
| Topicname1   | Thu | 1 | Jan | 2015 |
| Topicname10  | Sat | 1 | Feb | 2014 |
| Topicname100 | Fri | 1 | Mar | 2013 |
...

My extension would produce a pivot of the data:
|      | Jan | Feb | Mar | ...
| 2015 |  1  |     |     | ...
| 2014 |     |  1  |     | ...
| 2013 |     |     |  1  | ...
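
For illustration, the counting behind such a pivot can be sketched as follows. This is a minimal sketch in Python, not Foswiki plugin code; the function names pivot_counts and render_pivot, and the 1-based column arguments standing in for countby="column5,column4", are assumptions made for the example.

```python
from collections import Counter


def pivot_counts(table_text, row_col=5, col_col=4):
    """Count table rows by (row key, column key).

    row_col and col_col are 1-based column numbers (hypothetical
    stand-ins for countby="column5,column4").
    """
    counts = Counter()
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip anything that is not a table row
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) < max(row_col, col_col):
            continue  # row too short to carry both keys
        counts[(cells[row_col - 1], cells[col_col - 1])] += 1
    return counts


def render_pivot(counts, col_order):
    """Render the counts as a pipe-delimited table, newest row key first."""
    row_keys = sorted({r for r, _ in counts}, reverse=True)
    lines = ["|  | " + " | ".join(col_order) + " |"]
    for r in row_keys:
        # Cells with no matching rows are left empty rather than showing 0.
        cells = [str(counts[(r, c)]) if counts[(r, c)] else "" for c in col_order]
        lines.append("| " + r + " | " + " | ".join(cells) + " |")
    return "\n".join(lines)


sample = (
    "| Topicname1   | Thu | 1 | Jan | 2015 |\n"
    "| Topicname10  | Sat | 1 | Feb | 2014 |\n"
    "| Topicname100 | Fri | 1 | Mar | 2013 |"
)
print(render_pivot(pivot_counts(sample), ["Jan", "Feb", "Mar"]))
```

Run on the three sample rows above, this produces one count per year/month cell and leaves the remaining cells empty.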

I see three options:
1. Write my own extension. Suggested syntax:
COUNT{ text="%SEARCH{...}%" countby="column5,column4" format="..." ... }

2. Extend FORMAT. Suggested syntax:
FORMAT{ text="%SEARCH{...}%" countby="column5,column4" type="count" ... }

3. Extend FilterPlugin. Suggestion:
include the COUNT of option 1 as part of the FilterPlugin.

FORMAT appears to be part of core.
My inclination is option 3.

Does anyone have an opinion on the relative merit of these options? Are there
others that may be better?

Thanks for your input.

--
Bram van Oosterhout


_______________________________________________
Foswiki-discuss mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/foswiki-discuss
Re: Options for a new plugin summarising search results.

Chris Hoefler
I would go with your first option. You have the full flexibility of Perl by writing your own extension, and there is a Perl API for searching Foswiki topics. See, for example,
http://foswiki.org/System/PerlDoc?module=Foswiki::Func#query_40_36searchString_44_36topics_44_92_37options_41_45_62_iterator_40resultset_41

You pull your results into a Foswiki::Iterator object, and then format them in a while loop. You can just output TML directly, and it will be rendered as HTML by the Foswiki rendering engine.

You can also use DBCachePlugin. If you write your extension carefully, it should be able to handle both cases (Foswiki::Func::query and Foswiki::Plugins::DBCachePlugin::getDB) without too much trouble, and then you will have the flexibility of enabling/disabling DBCachePlugin depending on your needs.



On Thu, Jan 14, 2016 at 6:09 AM, Bram van Oosterhout <[hidden email]> wrote:
> [...]

Re: Options for a new plugin summarising search results.

Lynnwood Brown
In reply to this post by BramVanOosterhout
Bram - 
I sometimes use a strategy for parsing the results of a large search that may apply to your situation. What I do is run an initial search that gets all the raw data I need and then format it such that I can then slice & dice it using FilterPlugin. Here’s a summary of how I do it.

I have a top level INCLUDE that in turn contains a second INCLUDE, something like this:
%INCLUDE{"%WEB%.%TOPIC%"
  section="format_output"
  DATA="%INCLUDE{"%WEB%.%TOPIC%" section="search_data"}%"
}%

The important point is that the second INCLUDE (of search_data) is expanded first, so the variable DATA is passed to “format_output” as a simple string. In this way, the search is executed only once and I can further query that data set with modest server overhead.

The “search_data” section does the heavy lifting of running a SEARCH that gets all the data. It might look something like this:
%SEARCH{"form.name = 'MyDataForm'"
  type="query"
  nonoise="on"
  format="$topic~$formfield(Year)~$formfield(Month)~$formfield(Day)"
  separator="##"
}%
(Instead of using SEARCH here you could also use DBQUERY as suggested by other folks. I’ll have some more comments about DBCache below.)

The characters I use to separate fields (“~”) and records (“##”) are arbitrary. They just need to be characters or strings you would not find in the actual data.

The format_output section would then look something like this:
%FORMATLIST{"2015, 2014, 2013, etc…"
  split=", "
  separator=" $"
  format="| $1 | $percntFORMATLIST{\"Jan, Feb, Mar, etc…\"
                     split=\", \"
                     format=\"$dollarpercntFORMATLIST{\\"%DATA%\\"
                                 split=\\"##\\"
                                 include=\\".*?~$1~$dollar1~.*\\"
                                 format=\\"\\"
                                 hideempty=\\"on\\"
                                 null=\\"\\"
                                 footer=\\"$dollardollarcount\\"
                               }$dollarpercnt | \"
                 }$percnt"
}%

The key part of the above code is the innermost FORMATLIST, which parses the data set contained in the variable %DATA%. The “include” parameter is a regular expression that essentially says to include only records which match the particular year (provided by “$1”, defined in the outermost FORMATLIST) and month (provided by $dollar1, defined by the second-level FORMATLIST). It then renders the number of records found, resulting in a table exactly like the one you describe.
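
To make the logic of the nested FORMATLIST calls easier to follow, here is a rough Python equivalent of what they compute. It is only a sketch: it assumes records shaped like topic~Year~Month~Day joined by ##, as in the search_data section above, and it assumes the include expression has to match the whole record. The names count_matches and pivot_rows are made up for the illustration.

```python
import re


def count_matches(data, year, month, record_sep="##"):
    """Count records whose Year and Month fields equal the given values.

    Mirrors include=".*?~$1~$dollar1~.*" applied to each record
    (assumption: the expression must match the entire record).
    """
    pattern = re.compile(
        r".*?~" + re.escape(year) + r"~" + re.escape(month) + r"~.*"
    )
    return sum(1 for rec in data.split(record_sep) if pattern.fullmatch(rec))


def pivot_rows(data, years, months):
    """Build one table row per year with one count cell per month."""
    rows = []
    for year in years:
        cells = []
        for month in months:
            n = count_matches(data, year, month)
            cells.append(str(n) if n else "")  # empty cell when nothing matched
        rows.append("| " + year + " | " + " | ".join(cells) + " |")
    return rows


data = (
    "Topicname1~2015~Jan~1##Topicname10~2014~Feb~1##"
    "Topicname100~2013~Mar~1##Topicname2~2015~Jan~5"
)
for row in pivot_rows(data, ["2015", "2014", "2013"], ["Jan", "Feb", "Mar"]):
    print(row)
```

Note that every cell rescans the full data set, so the work grows with the number of years times the number of months times the number of records.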

I have used this strategy to search, slice, dice and render quite large data sets in the range you describe without bogging down the server. One additional note regarding DBCachePlugin (along with DBCacheContrib): you need to read carefully the documentation about the different store options as I have found some work better than others with large data sets such as you anticipate. 

I know this seems a bit complicated at first but if you study it carefully, I believe you will find it accomplishes exactly what you are requesting using existing plugins. Hope it helps.
Cheers,
Lynnwood Brown


                              
Re: Options for a new plugin summarising search results.

BramVanOosterhout
Hi Lynnwood and Chris (and Paul L. Merchant)

Chris, thanks for your suggestion to
a) consider my own plugin and
b) consider DBCachePlugin/Contrib. Paul and Lynnwood also suggested
DBCachePlugin.

Lynnwood, I was really pleased to get your example, using FORMATLIST to do
the processing. It looks so elegant and practical. I implemented it
yesterday.

The original implementation of the sample presented the results in 50 seconds.

The implementation with one SEARCH and the FORMATLISTs you suggested
presented the results in 30 seconds.

On reflection I thought that doing a search for each year, followed by a
FORMATLIST for the months, might be faster still, because the FORMATLIST will
have fewer "no match" records per year (linear vs heap search). That theory
proved correct. The results were returned in 24 seconds.

Just stretching the example a little, I decided to repeat the experiments
with DBQUERY.

One DBQUERY with the FORMATLISTs suggested by you returned the results in 26
seconds (4 seconds better than the SEARCH implementation).

The implementation with one DBQUERY for each year, with a FORMATLIST for each
month, returned the results in 15 seconds (9 seconds better than the SEARCH
implementation).

For completeness I also tested the original implementation with the SEARCHes
replaced by DBQUERYs. To my surprise and delight, the results were returned
in 13 seconds!!!

So in this case "brute" force wins over elegance! It may be because I do no
text processing, just counts. But the technique you suggested Lynnwood is
very nice.

Three cheers for DBCachePlugin!!! Kudos to the author.

Lynnwood, you wrote: "One additional note regarding DBCachePlugin (along with
DBCacheContrib): you need to read carefully the documentation about the
different store options as I have found some work better than others with
large data sets such as you anticipate."

I used the implementation that came with the installation I downloaded.
{DBCachePlugin}{MemoryCache} Foswiki 2.0.3. What others are there and where
do I find the documentation?
 
Thanks very much for your assistance. I will keep the option of writing my own
plugin for when the application becomes too slow.

--
 Bram van Oosterhout


Re: Options for a new plugin summarising search results.

Paul L. Merchant Jr.
In reply to this post by BramVanOosterhout
Hi Bram, DBCacheContrib is the extension that provides the actual database cache service for DBCachePlugin. If you turn on expert options and look under the DBCacheContrib tab, you'll see several different options for {DBCacheContrib}{Archivist}.

— Paul

> On Jan 17, 2016, at 3:38 AM, Bram van Oosterhout <[hidden email]> wrote:
>
> Lynnwood, you wrote: "One additional note regarding DBCachePlugin (along with
> DBCacheContrib): you need to read carefully the documentation about the
> different store options as I have found some work better than others with
> large data sets such as you anticipate."
>
> I used the implementation that came with the installation I downloaded.
> {DBCachePlugin}{MemoryCache} Foswiki 2.0.3. What others are there and where
> do I find the documentation?

Re: Options for a new plugin summarising

BramVanOosterhout
Thanks Paul,
I did not spot the DBCacheContrib tab on the configure page. All contribution
tabs are at the bottom, rather than in alphabetical order. It turns out I am
using "Segmentable".

Thanks for the response.

On Tue, 19 Jan 2016 13:50:03 +0000, Paul L. Merchant Jr. wrote
> Hi Bram, DBCacheContrib is the extension that provides the actual
> database cache service for DBCachePlugin. If you turn on expert
> options and look under the DBCacheContrib tab, you'll see several
> different options for {DBCacheContrib}{Archivist}.


--
Bram van Oosterhout

