Welcome to divmod.org, a source of open software.

This page is no longer maintained. For a link to this project's current page, see http://divmod.org/trac/

Lupy

The Lupy project has been RETIRED! For full-text indexing and search we recommend Xapwrap or PyLucene instead.

Lupy is a full-text indexer and search engine written in Python. It is a port of Jakarta Lucene 1.2 and reads and writes indexes in Lucene's binary format. Like Lucene, it is sophisticated and scalable. Lucene is a polished and mature project, and you are encouraged to read the documentation on the Lucene home page.

Lupy requires Python 2.3 or greater.

Release 0.2.1

11 May 2004

Minor bugfixes and test improvements.

Release 0.2.0

20 Feb 2004

Reorganized code into modules; converted some iteration constructs to Python iterators and generators. All text processing internally is now handled as Unicode. Analyzers are back as generators of tokens.
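As an illustration of what "generators of tokens" means, here is a minimal sketch. It is not Lupy's analyzer code: the names `tokenize`, `lowercase`, and `analyze` are hypothetical, and it is written for Python 3, where every `str` is already Unicode.

```python
import re

# Hypothetical token pattern: runs of Unicode word characters.
WORD = re.compile(r"\w+")

def tokenize(text):
    """Lazily yield (term, start, end) tuples for each word in text."""
    for match in WORD.finditer(text):
        yield match.group(), match.start(), match.end()

def lowercase(tokens):
    """Lazily lowercase the term of every token passed in."""
    for term, start, end in tokens:
        yield term.lower(), start, end

def analyze(text):
    """Chain the stages; no work happens until the result is iterated."""
    return lowercase(tokenize(text))

for token in analyze("Lupy indexes Unicode text"):
    print(token)
```

Because each stage is a generator, tokens flow through the chain one at a time, so a long document never needs a complete token list in memory.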

The changes that made the code more Pythonic appear to have traded space for time: preliminary tests indicate about a 5% speedup on one dataset in exchange for a 20% increase in memory usage.

Release 0.1.6

23 Nov 2003

  • Replaced analysis framework with a regex-based tokenizer.
  • Cache the hash for a Term.

API Changes:
Creating an IndexWriter no longer takes an analyzer argument.
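Caching the hash for a Term matters because terms are typically used as dictionary keys, so the same term can be hashed many times. A minimal sketch of the idea, assuming a term is an immutable (field, text) pair; the class below is illustrative and is not Lupy's Term implementation.

```python
class Term:
    """Illustrative immutable (field, text) pair with a precomputed hash."""

    __slots__ = ("field", "text", "_hash")

    def __init__(self, field, text):
        self.field = field
        self.text = text
        # Compute the hash once; valid only if field and text never change.
        self._hash = hash((field, text))

    def __hash__(self):
        return self._hash

    def __eq__(self, other):
        if not isinstance(other, Term):
            return NotImplemented
        return (self.field, self.text) == (other.field, other.text)

# Hypothetical postings table keyed by Term.
postings = {Term("body", "python"): [1, 4, 7]}
print(postings[Term("body", "python")])  # prints [1, 4, 7]
```

The cached value is only safe while the term is never mutated after construction, which is why the sketch treats it as immutable.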

Release 0.1.5.5

17 Nov 2003

  • Changed default mergeFactor from 9 to 20 for better performance
  • Fixed the example in simple.py to use a Keyword field for the filename instead of a tokenized and stored Text field.
  • Tidied up SegmentInfos and FieldInfos to be more Pythonic.
  • Called close() on open searcher in indexer.Index.index()

Release 0.1.5.4

31 Aug 2003

Fixed a Windows-only bug in IndexWriter, which was catching IOError instead of OSError. The bug is triggered when Windows refuses to let Lupy delete a file because another process (e.g. the indexing service) is using it. Lupy keeps track of these files and deletes them when it can.
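The "keep track and delete later" approach can be sketched as follows. This is an illustration rather than Lupy's code: the `DeferredDeleter` class and its method names are hypothetical. It is written for Python 3, where IOError is an alias of OSError (since Python 3.3), so the original mismatch cannot occur there.

```python
import os

class DeferredDeleter:
    """Illustrative helper: retry deletions that the OS refused earlier."""

    def __init__(self):
        self.pending = set()

    def delete(self, path):
        """Try to delete path; remember it if the OS refuses."""
        try:
            os.remove(path)
        except FileNotFoundError:
            # Already gone, so there is nothing left to retry.
            self.pending.discard(path)
        except OSError:
            # For example, the file is locked by another process.
            self.pending.add(path)
        else:
            self.pending.discard(path)

    def retry_pending(self):
        """Attempt every remembered deletion again."""
        for path in list(self.pending):
            self.delete(path)
```

A caller would invoke `retry_pending()` at a convenient point, such as before the next merge, so that files released in the meantime are cleaned up.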

API Changes:
Added setMergeFactor to Index to allow for tuning.

Release 0.1.5.3

17 Aug 2003

  • The indexer wrapper now analyzes search queries using the same analyzer as the index writer. This yields more accurate search results.
  • Some cleanup to get rid of deprecation warnings issued by Python 2.3
  • API cleanups (move to more Pythonic constructs):
    • lupy.document.field -- moved Text etc. from class methods of Field to module level. Moved the incomplete DateField into the module.
    • lupy.document.document -- Document.field() returns a Pythonic iterator; internal storage is now a dictionary. Multiple fields with the same name are no longer supported.
    • lupy.document.documentfieldlist, documentfieldenumeration, datefield -- removed.

Release 0.1.5.2

10 Jun 2003

  • Fixed bug in BooleanQuery (thanks to Paul Jimenez for reporting this)
  • Flush the index to disk before doing searches in indexer.Index wrapper.
  • Converted .py to unix line endings.
  • Moved from alpha to beta.
  • Tweaked the __repr__ method of Document and Field.
  • No longer intern field names, since intern() does not work with Unicode strings.

Release 0.1.5.1a

29 Apr 2003

  • Call close() less frequently in lupy.indexer.Indexer. Useful speedup.
  • lupy.indexer.Indexer.optimize() now calls setupIndexer(). Tidier code.
  • analysis.splitter is a new tokenizer based on David Mertz's article on IBM developerWorks. It is faster, but it is NOT Unicode-aware.

Release 0.1.5a

25 Apr 2003

  • Fixed deletion bug in SegmentReader.doDelete()
  • Fixed index creation issues in indexer.Index
  • Added deletion to Index wrapper
  • Added "field search" to Index wrapper: findInField
  • Index.find() now searches across all fields
  • Guard against problems when attempting to search an empty index

Release 0.1.4a

24 Apr 2003

  • Fixed Unicode related bug in LowerCaseTokenizer.
  • Added lupy.indexer.Index() as a wrapper for indexing and search. It is a lot easier to get started with Lupy using the wrapper.
  • Added __str__ method to Hits.
  • Added __getitem__ method to Hits enabling indexed access and loops.
  • Added example of using the new Index wrapper.
  • Added example of indexing email.
  • Tweaked Unicode changes in IndexReader
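Why adding __getitem__ to Hits enables both indexed access and loops can be shown with a small sketch. The class below is illustrative and is not Lupy's Hits: when a class defines no __iter__, Python falls back to calling __getitem__ with 0, 1, 2, ... until IndexError is raised.

```python
class Hits:
    """Illustrative result container; stores documents in a plain list."""

    def __init__(self, docs):
        self._docs = list(docs)

    def __len__(self):
        return len(self._docs)

    def __getitem__(self, index):
        # The IndexError raised past the end is what terminates a for loop.
        return self._docs[index]

    def __str__(self):
        return "<Hits: %d documents>" % len(self)

hits = Hits(["a.txt", "b.txt"])
print(hits[0])        # indexed access
for name in hits:     # iteration via the __getitem__ fallback
    print(name)
print(hits)           # uses __str__
```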

Release 0.1.3a

15 Apr 2003

  • Incorporated Martin Eliasson's great Unicode changes.

Release 0.1.2a

23 Feb 2003

  • Replaced lupy.document.field TextWithString() and TextWithReader() class methods with simple Text() which is more like Lucene's API. The new method uses isinstance() to do some typed dispatch.
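Typed dispatch with isinstance() might look like the sketch below. It is an assumption-laden illustration, not Lupy's Text(): the function name `text_field` and its (name, contents) return value are hypothetical, and Python 3's `io.TextIOBase` stands in for a "reader".

```python
import io

def text_field(name, value):
    """Return (name, contents) from either a string or a text reader."""
    if isinstance(value, str):
        # Plain string: use it directly.
        return name, value
    if isinstance(value, io.TextIOBase):
        # File-like text reader: consume its contents.
        return name, value.read()
    raise TypeError("expected a string or a reader, got %r" % type(value))

print(text_field("body", "hello"))
print(text_field("body", io.StringIO("hello from a reader")))
```

One entry point replaces two separately named constructors, at the cost of a runtime type check on each call.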

Release 0.1.1a

23 Feb 2003

  • Tidied up examples and install.

Release 0.1

22 Feb 2003

  • Initial release.