Bob Balaban's Blog

     
    alt

    Bob Balaban

     

    How to get ALL of the documents: db.search() vs. db.AllDocuments vs. AllDocuments view

    Bob Balaban  October 25 2010 03:00:00 AM
    Greetings, Geeks!

    Have you ever had to write an agent that looks at ALL of the documents in a database? As is usual with LotusScript/Java and the back-end classes in Notes, there's more than one way to do that.

    Actually, there are (at least) 3 common techniques:
    1) Database.AllDocuments
    2) Get the "All Documents" view and iterate
    3) Database.search("@all")

    I used to always use (and recommend) Database.AllDocuments, which returns a DocumentCollection instance. It's very efficient, because it maps to an optimized C API call that retrieves a list of all NOTEIDs (as an IDTable) in the database. This translates to a LotusScript DocumentCollection instance very easily, and none of the document objects is actually "opened" until you access it from the DocumentCollection,

    Then, somewhere along the way, I discovered that a change had been made to the behavior of this property. Originally (Notes 4.x), the code in the back-end classes implementation filtered out any deletion stubs in the IDTable resulting from the C API call. Again, this was an efficient operation, made possible by the fact that the NOTEIDs of deletion stubs all have a high bit set (0x80000000L). So you could tell the API to remove all values in the IDTable greater than  0x80000000L, and what was left would be all valid NOTEIDs.

    At some point (not sure when, or why), this final filtering operation was removed. So now, if you use Database.AllDocuments, you might find that as you iterate through the document objects in the result set, some of those documents will be "invalid" or deleted. It can be hard to work around this in your code, as the DocumentCollection.getNextDocument() call does not usually work if the "current" document object is invalid. See here and here for some of the details.

    I recently worked on a project where the speed of acquiring all the document objects in the NSF and iterating through them was really important. The AllDocuments/GetNth technique was too slow. So I thought I'd try iterating through the "All Documents" view instead (this particular project was processing dbs derived from the standard mail template). I thought it would be pretty fast because all the indexing would have already been done, and it was fast. The problem, however, was that it didn't really retrieve ALL of the documents in the database. Why? Here's the mail template "AllDocuments" view selection formula (the "($All)" view):

         SELECT @IsNotMember("A"; ExcludeFromView) & IsMailStationery != 1 & Form != "Group" & Form != "Person"

    So any document that has a field "ExcludeFromView" containing an "A", or any document with a Form field containing "Group" or "Person", or any mail stationery document would never show up in the "All" view, and your agent would never see it. For some purposes this would be fine, but not for the project I was doing at the time, I really needed ALL of the documents.

    So I fell back on Plan C: do a Database.search using "@all" as the selection criterion. I thought it would be slow, because the search() function visits every document in the database to apply the @function selection formula. But I knew that it would also explicitly exclude any deletion stubs from the result set, and that was a major concern. It turned out that the search was not particularly slow, certainly it's faster than using DocumentCollection.GetNthDocument() on a very large (100K) result set. The performance overall was actually pretty reasonable, and since I know that the resulting DocumentCollection will not contain any deleted documents, I can avoid GetNthDocument and just use GetFirst/NextDocument.

    So, there you have it. All 3 techniques "work" (with variations in behavior), you need to think carefully about which one you should use.

    Geek ya later!

    (Need expert application development architecture/coding help?  Want me to help you invent directory services based on RDBMS?? Contact me at: bbalaban, gmail.com)
    Follow me on Twitter @LooseleafLLC
    This article ┬ęCopyright 2010 by Looseleaf Software LLC, all rights reserved. You may link to this page, but may not copy without prior approval.