Bob Balaban's Blog

     
    alt

    Bob Balaban

     

    GetNth Revisited - Helpful function? Or SPAWN OF THE DEVIL?

    Bob Balaban  August 16 2008 04:22:09 AM
    Greetings, Geeks!!

    I was browsing my feed reader this morning, and came across a couple of interesting posts on the TeamStudio site (see the posts titled "Show Time" and "Time Differences").

    Sigh. More discussion about NotesDocumentCollection.GetNth..... This topic has been a pimple on the butt of Notes developers for, let's see, EVER (ok, well, actually since Notes 4.0 shipped in January, 1996, so really only for 12 years....). I was moved to blog about it, one more (and I sincerely hope, LAST) time.

    I've written about it, talked about it at conferences and among friends, and it STILL comes up every now and then. So, ok. ONE MORE TIME!

    IF (you have a DocumentCollection object -- doesn't matter whether it's LotusScript, Java, COM, whatever language binding),
    AND (your DocumentCollection object is an UNsorted list -- i.e., the result of a Database.AllDocuments, or Database.Search operation, NOT an FTSearch)
    THEN iterating that collection using the DocumentCollection.GetFirst/GetNext pattern is MUCH (i.e., A LOT) faster than iterating using DocumentCollection.GetNth(index).

    CAVEATS:
     - For small collections the difference doesn't matter
     - I'm talking about navigation time, regardless of what happens to the documents themselves inside your iteration loop.
     - I'm ONLY talking about DocumentCollection navigation here, VIEW navigation is completely different (GetNth in a View is just as fast as GetFirst/GetNext)
     - Your results may well be version-dependent! (more on this later...)

    How do I know? How do i know for sure? I wrote an experiment, which I will now share with you, gentle geek readers. Here's what I did:
     - Using Notes/Designer v7.02,
     - Created a blank database
     - Wrote a LS agent to populate the database with 50,000 documents (see code, below)
     - Wrote another LS agent to iterate all the documents in the database (see code, below), first using GetFirst/Next, then using GetNth
     - The agent times each full iteration, and reports the elapsed time of each.

    Results? The envelope, please....

     - GetFirst/Next: 1 second

     - GetNth: 446 seconds


    You can't tell me that's not a significant difference. I wondered whether someone at Lotus had managed to fix this forever problem recently, so I asked my colleague Steve Mullen to run the test on Notes 8. Steve reported results almost identical to the ones I receved ("painfully slow"). Twelve years and counting, DocumentCollection.GetNth is still.....painful.

    So, What's Going on Here?

    Why so slow with GetNth? It's because unsorted DocumentCollections, internally, represent a data structure called an "IDTable", a compressed list of Note IDs (see the Lotus CAPI toolkit for documentation, this stuff has been around a long time). There's no way to find the Nth entry in the table other than by starting at the beginning and counting to N. So, to iterate the entire list of X entries will take (x*(x-1))/2 CAPI calls (which is Order(n-squared). In other words, for 50,000 documents, you're going to call the IDTable interface something like 1,249,975,000 times. As Kurt Vonnegut might have said, "That's a lot of times!" The shocker (to me) is that it ONLY takes 7 or 8 minutes (on my laptop).

    So, there you go. I think I'll write one more blog entry on this topic, as I recently came across a situation where I had no choice -- I HAD to use GetNth to iterate a document collection (first time in 12 years!). Stay tuned!

    Here's the code for my "PopulateDb" agent:
    Sub Initialize
        Dim s As New notessession
        Dim db As NotesDatabase
        Dim i As Long
        Dim doc As NotesDocument
        Set db = s.CurrentDatabase
       
        For i = 0 To 49999
                Set doc = db.CreateDocument()
                doc.somefield = i
                doc.Save True, False
        Next
    End Sub

    And here's the code for my test agent:
    Sub Initialize
        Dim s As New notessession
        Dim db As notesdatabase
        Dim dc As NotesDocumentCollection
        Dim doc As notesdocument
        Dim dtStart As NotesDateTime
        Dim dtEnd As NotesDateTime
        Dim elapsed As Long
        Dim i As Long
        Set db = s.CurrentDatabase
        Set dc = db.AllDocuments
       
        Set dtEnd = s.CreateDateTime("")
        Set dtStart = s.CreateDateTime("")

        dtStart.SetNow
        Set doc = dc.GetFirstDocument
        While Not (doc Is Nothing)
             Set doc = dc.GetNextDocument(doc)
        Wend
        dtEnd.SetNow
       
        elapsed = dtEnd.TimeDifference(dtStart)
        Msgbox "GetFirst/Next elapsed time = " & elapsed
       
        dtStart.SetNow
        For i = 0 To dc.Count
             Set doc = dc.GetNthDocument(i)
        Next
        dtEnd.SetNow
       
        elapsed = dtEnd.TimeDifference(dtStart)
        Msgbox "GetNth elapsed time = " & elapsed
       
    End Sub

    Try it yourself!

    (Need expert application development architecture/coding help? Contact me at: bbalaban, gmail.com)
    Follow me on Twitter @LooseleafLLC
    This article ┬ęCopyright 2009 by Looseleaf Software LLC, all rights reserved. You may link to this page, but may not copy without prior approval.