Bob Balaban's Blog

     
    alt

    Bob Balaban

     

    GetNth Revisited - Helpful function? Or SPAWN OF THE DEVIL?

    Bob Balaban  August 16 2008 04:22:09 AM
    Greetings, Geeks!!

    I was browsing my feed reader this morning, and came across a couple of interesting posts on the TeamStudio site (see the posts titled "Show Time" and "Time Differences").

    Sigh. More discussion about NotesDocumentCollection.GetNth..... This topic has been a pimple on the butt of Notes developers for, let's see, EVER (ok, well, actually since Notes 4.0 shipped in January, 1996, so really only for 12 years....). I was moved to blog about it, one more (and I sincerely hope, LAST) time.

    I've written about it, talked about it at conferences and among friends, and it STILL comes up every now and then. So, ok. ONE MORE TIME!

    IF (you have a DocumentCollection object -- doesn't matter whether it's LotusScript, Java, COM, whatever language binding),
    AND (your DocumentCollection object is an UNsorted list -- i.e., the result of a Database.AllDocuments, or Database.Search operation, NOT an FTSearch)
    THEN iterating that collection using the DocumentCollection.GetFirst/GetNext pattern is MUCH (i.e., A LOT) faster than iterating using DocumentCollection.GetNth(index).

    CAVEATS:
     - For small collections the difference doesn't matter
     - I'm talking about navigation time, regardless of what happens to the documents themselves inside your iteration loop.
     - I'm ONLY talking about DocumentCollection navigation here, VIEW navigation is completely different (GetNth in a View is just as fast as GetFirst/GetNext)
     - Your results may well be version-dependent! (more on this later...)

    How do I know? How do i know for sure? I wrote an experiment, which I will now share with you, gentle geek readers. Here's what I did:
     - Using Notes/Designer v7.02,
     - Created a blank database
     - Wrote a LS agent to populate the database with 50,000 documents (see code, below)
     - Wrote another LS agent to iterate all the documents in the database (see code, below), first using GetFirst/Next, then using GetNth
     - The agent times each full iteration, and reports the elapsed time of each.

    Results? The envelope, please....

     - GetFirst/Next: 1 second

     - GetNth: 446 seconds


    You can't tell me that's not a significant difference. I wondered whether someone at Lotus had managed to fix this forever problem recently, so I asked my colleague Steve Mullen to run the test on Notes 8. Steve reported results almost identical to the ones I receved ("painfully slow"). Twelve years and counting, DocumentCollection.GetNth is still.....painful.

    So, What's Going on Here?

    Why so slow with GetNth? It's because unsorted DocumentCollections, internally, represent a data structure called an "IDTable", a compressed list of Note IDs (see the Lotus CAPI toolkit for documentation, this stuff has been around a long time). There's no way to find the Nth entry in the table other than by starting at the beginning and counting to N. So, to iterate the entire list of X entries will take (x*(x-1))/2 CAPI calls (which is Order(n-squared). In other words, for 50,000 documents, you're going to call the IDTable interface something like 1,249,975,000 times. As Kurt Vonnegut might have said, "That's a lot of times!" The shocker (to me) is that it ONLY takes 7 or 8 minutes (on my laptop).

    So, there you go. I think I'll write one more blog entry on this topic, as I recently came across a situation where I had no choice -- I HAD to use GetNth to iterate a document collection (first time in 12 years!). Stay tuned!

    Here's the code for my "PopulateDb" agent:
    Sub Initialize
        Dim s As New notessession
        Dim db As NotesDatabase
        Dim i As Long
        Dim doc As NotesDocument
        Set db = s.CurrentDatabase
       
        For i = 0 To 49999
                Set doc = db.CreateDocument()
                doc.somefield = i
                doc.Save True, False
        Next
    End Sub

    And here's the code for my test agent:
    Sub Initialize
        Dim s As New notessession
        Dim db As notesdatabase
        Dim dc As NotesDocumentCollection
        Dim doc As notesdocument
        Dim dtStart As NotesDateTime
        Dim dtEnd As NotesDateTime
        Dim elapsed As Long
        Dim i As Long
        Set db = s.CurrentDatabase
        Set dc = db.AllDocuments
       
        Set dtEnd = s.CreateDateTime("")
        Set dtStart = s.CreateDateTime("")

        dtStart.SetNow
        Set doc = dc.GetFirstDocument
        While Not (doc Is Nothing)
             Set doc = dc.GetNextDocument(doc)
        Wend
        dtEnd.SetNow
       
        elapsed = dtEnd.TimeDifference(dtStart)
        Msgbox "GetFirst/Next elapsed time = " & elapsed
       
        dtStart.SetNow
        For i = 0 To dc.Count
             Set doc = dc.GetNthDocument(i)
        Next
        dtEnd.SetNow
       
        elapsed = dtEnd.TimeDifference(dtStart)
        Msgbox "GetNth elapsed time = " & elapsed
       
    End Sub

    Try it yourself!

    (Need expert application development architecture/coding help? Contact me at: bbalaban, gmail.com)
    Follow me on Twitter @LooseleafLLC
    This article ┬ęCopyright 2009 by Looseleaf Software LLC, all rights reserved. You may link to this page, but may not copy without prior approval.
    Comments

    1Vitor Pereira  8/16/2008 9:41:23 AM  GetNth Revisited - Helpful function? Or SPAWN OF THE DEVIL?

    People seem to take this out of context every time thus making wrong assumptions.

    I'm sure this will not be the last time you blog about it. :-)

    2Nathan T. Freeman  8/16/2008 10:05:18 AM  GetNth Revisited - Helpful function? Or SPAWN OF THE DEVIL?

    I think you're reading Kingsley's original article wrong, Bob. My impression was that he understood GetNth to be massively slower, and was surprised to discover that it was actually quite fast when used against a View rather than an unsorted Collection. So he was writing about the fact that there was a special case where GetNth made sense, rather than absolutely NO case where GetNth made sense.

    3Kendall  8/16/2008 10:18:49 AM  GetNth Revisited - Helpful function? Or SPAWN OF THE DEVIL?

    I knew most of this except I never heard the "unsorted" part--I'd never heard that FTSearch results didn't go faster using GetNext.

    Actually, you don't say--is it just not as dramatically faster, or is it actually SLOWER, to use GetNext versus GetNth (when dealing with FTSearch results)?

    Yeah...I'm going to put this on my list to test...just thought I'd ask, since it sounds like you know the answer, having excluded it when saying which types of lists are walked faster. ;-)

    4Bob Balaban  8/16/2008 11:15:13 AM  GetNth Revisited - Helpful function? Or SPAWN OF THE DEVIL?

    @1 - I sure hope you're wrong! :-)

    Thuogh I did promise to post about the one case where I really needed GetNth

    @2 - I do believe I stated (I'll have to go look later, when I think I can stand reading about this topic again...) that View.GetNth is FINE. No proble with View.GetNth at all. Views don't use the IDTable object to store their indices, and there is (complicated, but fast) C API to jump to the Nth entry in a view.

    @3 - DocumentCollections which are created as the result of a full-text search (and possibly by some other techniques as well) do not use an IDTable structure to store the results, they use an array (because it has to be sorted...). DocumentColection.GetNth on one of these array-based thingies is perfectly OK, because arrays are quick to index into.

    5Randy Rempel  8/17/2008 8:14:09 AM  GetNth Revisited - Helpful function? Or SPAWN OF THE DEVIL?

    What about Select Case? I see this as being a poorly implemented statement in Lotus Notes applications.

    1. Which is faster to compare:

    - Integer

    - String

    - Variant

    If Integer is faster (I assume it is), why pass a String or Variant when you know what the Case values will be anyway?

    2. What if there are 100+ Case statements? Would the application not run faster if the Select Case statements are broken into subgroups. For example, have a Select Case statement to capture 1 through 10. Then have separate Select Case statements for 1, 2, 3, ... within that Case statement.

    This would reduce the number of Case statements that have to be processed when the code has to get to number 100. A difference of 20 comparisons versus 100 (depending how it is coded). Now imagine that the function is called 100 times during processing and it has several hundered Case statements all comparing lengthy String values.

    I usually see this issue in the GetString() function of Lotus Notes applications.

    6Bob Balaban  8/18/2008 3:39:14 AM  GetNth Revisited - Helpful function? Or SPAWN OF THE DEVIL?

    @5 - Yes, I have always wondered why the select/case construct in LotusScript has to execute every "case" statement until it finds the matching selection. It certainly does not work that way for "switch" statements in C/C++.

    Regarding which variable types compare fastest: of the 3 you mention (integer, string, variant), comparing 2 integers is definitely fastest. Comparing 2 Variants containing Integers is slightly slower (one extra pointer de-reference to figure out the real type of the value), but not significantly.

    String comparisons are, of course, slowest, because the interpreter has to look at every character. In fact, now that you mention it, that's why I never code tests like this (LotusScript)to test for a null string:

    if myString = "" then

    ....

    end if

    Instead, this construct is faster to execute (maybe not much, but still....):

    if len(myString) = 0 then

    ....

    end if

    7John Kingsley  8/18/2008 9:24:16 AM  GetNth Revisited - Helpful function? Or SPAWN OF THE DEVIL?

    @2 Thanks Nathan - the point is there is more than one way to do things in Notes, so don't pay any attention to absolutes like 'GetNth' should never be used.

    8Kendall  8/18/2008 9:18:37 PM  GetNth Revisited - Helpful function? Or SPAWN OF THE DEVIL?

    @4: Thanks, Bob!

    9Rock  8/19/2008 2:56:28 PM  GetNth Revisited - Helpful function? Or SPAWN OF THE DEVIL?

    Whew! I finally finished my post on GetNext vs GetNth { Link }

    For those of you who don't know, Bob and I spoke last weekend. He sent me a link to this post, and I screamed! I had just finished a bunch of testing of getNext vs getNth, and I was going to post it on Thursday but I had computer (ok, Notes beta client) problems!

    I told him I wouldn't read this until I had finished mine, so that anything he said wouldn't "tarnish" my thoughts; and we both agreed to respond to each other's posts with our thoughts - so here I am!

    Thankfully my findings back up Bob's statements (I had no doubt). The purpose of mine was to show that, in real-world situations using real data, GetNth is not a recommended choice unless you know you need it (as in Bob's case above). It is simply not usable on unsorted collections, and on sorted collections it isn't fast enough to warrant its use vs. GetNext. If you know you have a situation where GetNth is the right answer, then rock on; otherwise stick with GetNext with the utmost confidence that it will serve you well.