Bob Balaban's Blog

     

    Bob Balaban

     

    How to get ALL of the documents: db.search() vs. db.AllDocuments vs. AllDocuments view

    Bob Balaban  October 25 2010 03:00:00 AM
    Greetings, Geeks!

    Have you ever had to write an agent that looks at ALL of the documents in a database? As is usual with LotusScript/Java and the back-end classes in Notes, there's more than one way to do that.

    Actually, there are (at least) 3 common techniques:
    1) Database.AllDocuments
    2) Get the "All Documents" view and iterate
    3) Database.search("@all")

    I used to always use (and recommend) Database.AllDocuments, which returns a DocumentCollection instance. It's very efficient, because it maps to an optimized C API call that retrieves a list of all NOTEIDs (as an IDTable) in the database. This translates to a LotusScript DocumentCollection instance very easily, and none of the document objects is actually "opened" until you access it from the DocumentCollection.

    Then, somewhere along the way, I discovered that a change had been made to the behavior of this property. Originally (Notes 4.x), the code in the back-end classes implementation filtered out any deletion stubs in the IDTable resulting from the C API call. Again, this was an efficient operation, made possible by the fact that the NOTEIDs of deletion stubs all have a high bit set (0x80000000L). So you could tell the API to remove all values in the IDTable greater than 0x80000000L, and what was left would be all valid NOTEIDs.
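
    To make that filtering step concrete, here is a minimal JavaScript sketch of the high-bit test. It is only an illustration: the helper names isDeletionStubId and dropDeletionStubs are made up, the sample IDs are invented, and the real filtering happened on an IDTable inside the C API, not in script.

```javascript
// Flag value taken from the description above: the top bit of a 32-bit NOTEID.
var DELETED_FLAG = 0x80000000;

// Hypothetical helper: true when the high bit of the NOTEID is set.
function isDeletionStubId(noteId) {
  // ">>> 0" converts the result to unsigned, because JavaScript bitwise
  // operators otherwise work on signed 32-bit integers.
  return ((noteId & DELETED_FLAG) >>> 0) !== 0;
}

// Hypothetical helper: keep only the IDs that do not carry the flag.
function dropDeletionStubs(noteIds) {
  return noteIds.filter(function (id) {
    return !isDeletionStubId(id);
  });
}

// Invented sample IDs; the second and fourth have the high bit set.
var sampleIds = [0x000008fa, 0x800008fe, 0x00000902, 0x80000906];
var liveIds = dropDeletionStubs(sampleIds);
console.log(liveIds.map(function (id) { return id.toString(16); }));
```

    With these sample values, only 8fa and 902 survive the filter.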

    At some point (not sure when, or why), this final filtering operation was removed. So now, if you use Database.AllDocuments, you might find that as you iterate through the document objects in the result set, some of those documents will be "invalid" or deleted. It can be hard to work around this in your code, as the DocumentCollection.getNextDocument() call does not usually work if the "current" document object is invalid. See here and here for some of the details.

    I recently worked on a project where the speed of acquiring all the document objects in the NSF and iterating through them was really important. The AllDocuments/GetNth technique was too slow. So I thought I'd try iterating through the "All Documents" view instead (this particular project was processing dbs derived from the standard mail template). I thought it would be pretty fast because all the indexing would have already been done, and it was fast. The problem, however, was that it didn't really retrieve ALL of the documents in the database. Why? Here's the mail template "AllDocuments" view selection formula (the "($All)" view):

         SELECT @IsNotMember("A"; ExcludeFromView) & IsMailStationery != 1 & Form != "Group" & Form != "Person"

    So any document that has a field "ExcludeFromView" containing an "A", or any document with a Form field containing "Group" or "Person", or any mail stationery document would never show up in the "All" view, and your agent would never see it. For some purposes this would be fine, but not for the project I was doing at the time: I really needed ALL of the documents.
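
    If it helps to see the effect of that selection formula, here is a rough JavaScript sketch that applies the same four conditions to a handful of made-up documents. The function name appearsInAllView and the sample data are hypothetical, plain objects stand in for real documents, and the sketch ignores formula-language details such as type coercion.

```javascript
// Hypothetical stand-in: each "document" is a plain object keyed by item name.
function appearsInAllView(doc) {
  // An item may be absent, a single value, or a list; normalize to an array.
  var excludeList = [].concat(doc.ExcludeFromView || []);
  return (
    excludeList.indexOf("A") === -1 && // @IsNotMember("A"; ExcludeFromView)
    doc.IsMailStationery !== 1 &&      // IsMailStationery != 1
    doc.Form !== "Group" &&            // Form != "Group"
    doc.Form !== "Person"              // Form != "Person"
  );
}

// Invented sample documents: one ordinary memo and four that the view excludes.
var sampleDocs = [
  { Form: "Memo", Subject: "Lunch?" },
  { Form: "Memo", ExcludeFromView: ["A", "S"] },
  { Form: "Memo", IsMailStationery: 1 },
  { Form: "Group" },
  { Form: "Person" }
];

var visible = sampleDocs.filter(appearsInAllView);
console.log(visible.length + " of " + sampleDocs.length + " documents appear in the view");
```

    Only the ordinary memo passes, so an agent walking the view would silently miss the other four documents.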

    So I fell back on Plan C: do a Database.search using "@all" as the selection criterion. I thought it would be slow, because the search() function visits every document in the database to apply the @function selection formula. But I knew that it would also explicitly exclude any deletion stubs from the result set, and that was a major concern. It turned out that the search was not particularly slow. It's certainly faster than using DocumentCollection.GetNthDocument() on a very large (100K) result set. The performance overall was actually pretty reasonable, and since I know that the resulting DocumentCollection will not contain any deleted documents, I can avoid GetNthDocument and just use GetFirst/NextDocument.
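
    Here is roughly what Plan C looks like as a loop, sketched in JavaScript. Treat it as an outline under assumptions, not the code from the project: forEachDocument and makeFakeDatabase are names invented for this example, and the fake database exists only so the sketch runs outside Domino. The calls it mimics (search, getFirstDocument, getNextDocument, recycle) are the ones the Java back-end classes expose.

```javascript
// Visit every document matched by database.search("@All"), using the
// getFirstDocument/getNextDocument pattern rather than getNthDocument.
function forEachDocument(database, handler) {
  var collection = database.search("@All");
  var count = 0;
  var doc = collection.getFirstDocument();
  while (doc !== null) {
    handler(doc);
    count++;
    // Fetch the successor before releasing the current handle.
    var next = collection.getNextDocument(doc);
    doc.recycle();
    doc = next;
  }
  return count;
}

// Stand-in objects (NOT the real back-end classes): they mimic only the
// few methods used above so the loop can be exercised anywhere.
function makeFakeDatabase(subjects) {
  var docs = subjects.map(function (s) {
    return { subject: s, recycle: function () {} };
  });
  var collection = {
    getFirstDocument: function () {
      return docs.length > 0 ? docs[0] : null;
    },
    getNextDocument: function (current) {
      var i = docs.indexOf(current);
      return i >= 0 && i + 1 < docs.length ? docs[i + 1] : null;
    }
  };
  return { search: function (formula) { return collection; } };
}

var seen = [];
var total = forEachDocument(makeFakeDatabase(["a", "b", "c"]), function (doc) {
  seen.push(doc.subject);
});
console.log(total, seen.join(","));
```

    In a real agent you would pass the current database instead of the fake one, and the handler would do the actual per-document work.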

    So, there you have it. All 3 techniques "work" (with variations in behavior), so you need to think carefully about which one you should use.

    Geek ya later!

    (Need expert application development architecture/coding help?  Want me to help you invent directory services based on RDBMS?? Contact me at: bbalaban, gmail.com)
    Follow me on Twitter @LooseleafLLC
    This article ©Copyright 2010 by Looseleaf Software LLC, all rights reserved. You may link to this page, but may not copy without prior approval.


    Comments

    1. Tim Tripcony  10/25/2010 7:28:11 AM

    I haven't benchmarked this on an extremely large database in quite a while, but in the past I've found using the NotesNoteCollection class to be far faster than any alternative when dealing with large data sets. And when you're just looking to grab all documents, it's easy:

        var allNotes = database.createNoteCollection(false); // empty collection
        allNotes.selectAllDataNotes(true);
        allNotes.setSelectProfiles(false); // unless you want profiles too...
        allNotes.buildCollection();
        var currentNoteId = allNotes.getFirstNoteID();
        while (currentNoteId) {
            var currentDoc = database.getDocumentByID(currentNoteId);
            if (currentDoc.isValid()) { // check for deletion stub
                // process the doc
            }
            currentNoteId = allNotes.getNextNoteID(currentNoteId);
        }

    As you might have guessed, this class is also handy for other things that you *can't* do with views, such as locating profile documents, getting a NotesDocument handle on design elements (including the ACL), etc.

    For instance, the following might be fun:

        var designNotes = database.createNoteCollection(false);
        designNotes.selectAllDesignElements(true);
        designNotes.setSelectionFormula("@Contains($Script;\"On Error Resume Next\")");
        designNotes.buildCollection(); // the collection stays empty until it is built
        var elementId = designNotes.getFirstNoteID();
        while (elementId) {
            var sloppyCode = database.getDocumentByID(elementId);
            elementId = designNotes.getNextNoteID(elementId);
            sloppyCode.removePermanently(true); // shouldn't have been there in the first place...
        }

    :)

    2. Bob Balaban  10/25/2010 9:10:19 AM

    @tim - Thanks for the comments. Using NotesNoteCollection is not something I've tried. Do you know whether it filters out deletion stubs when used the way you suggest?

    3. Victor  10/25/2010 9:35:57 PM

    A great job

    4. Tim Tripcony  10/26/2010 4:42:35 AM

    @Bob, I don't believe it excludes deletion stubs by default, hence the check for currentDoc.isValid(). Which might slow it down, of course, since you'd have to verify each doc handle after you've obtained it. On the other hand, if all deletion stubs do indeed have a minimum NoteID, you could use setSelectionFormula to filter based on that, e.g.:

        allNotes.setSelectionFormula("@TextToNumber(@Right(@NoteID;\"NT\")) < " + minimumStubNoteId);

    That would exclude any notes with a NoteID higher than the minimum value you specify, which should ensure that you only get valid note handles to begin with instead of having to verify each.

    5. Bob Balaban  10/26/2010 6:20:28 AM

    @Tim - Looks like NotesNoteCollection uses the same underlying C API as Database.Search(). So the relative performance of the two calls should be pretty much the same, and both will filter out deletion stubs. So, party on!