Endeca Overview

Endeca Overview in outline form:
  • Data processing/merging
    • BIB and HOL data extracted from Aleph Oracle (z00) x 11
    • Merge routine
      • 'Endeca Field Mapping and Pipeline' shows the action that is taken during the merge routine for each MARC field
      • Deduping on OCLC number
      • HOL data written into the Union MARC
      • p_print_03 for formats and UTF8 encoding
      • The merge routine happens before the field mapping.
    • QF data and other data sources in the future (digital libraries)
      • Via DLU01
      • Direct
  • Endeca Forge and Dgidx
    • Forge is a data processing program
      • Endeca provided a custom MARC adapter to make MARC records readable by Forge
      • Transforms your source data into standardized, tagged Endeca records
      • Each record has a list of dimension (text) values tagged to it.
    • Dgidx is an indexing program
      • Reads the tagged Endeca records that were prepared by Forge
      • Creates the proprietary indices for the Endeca MDEX Engine
        • Dgraph: An Index for every N-value
        • Entire Endeca Database stored in memory
    • Indexing: program that analyzes and extracts indexable tokens from text in records
      • Output stored in directories on the file system (fclnx19).
      • Indexing Configuration in Endeca (pipeline) includes:
        • Stop words
        • Character normalization and ‘internationalization’
        • Thesaurus and Stemming (automatic)
        • Taxonomies ('hierarchies') e.g. LCC/NLM
        • Search Configuration ('interfaces')
        • Relevance ranking
        • DYM and Spell Correction
        • Truncation (not implemented)
    • MDEX Engine serves data in response to a query; the response includes all of the information needed to build an entire page
      • Queries include Search and Navigation
      • An entire page is returned in response to a query, constructed from a subset of the Dgraph
      • Subsequent navigation is applied to this object, not the entire Dgraph.
    • Documents – See https://sblogs.fcla.edu/index.php/endeca
      • 'Endeca Field Mapping and Pipeline' shows how MARC fields are mapped to Endeca record fields.
      • 'Endeca Dimensions (Facet) Mappings' shows how MARC fields are mapped to Endeca dimensions (facets).
      • 'Endeca Search Configuration' shows the search 'interfaces' and the Endeca record fields that are searched.
  • WebApp by FCLA
    • Apache Tomcat and JSP
      • maintains a connection state with Endeca Nav Engine
      • similarly, maintains a connection with ORACLE via JDBC
      • restarted every morning
    • Endeca custom API includes
      • Http Connections into the MDEX
      • Method to query via a URL
      • A result object that can be parsed and manipulated for display
      • A method to get the dimensions, dimension values, and corresponding IDs for a particular Navigation state.
      • Boolean query mode
        • interaction with other features (no stop words, stemming, spelling, thesaurus, ranking)
        • proximity searching (NEAR/n, ONEAR/n)
      • Statistical Report (See http://www.fcla.edu/FCLAinfo/stats/endeca_stat/endeca_stat.html and linked document that explains the reporting categories)
    • Hooks into Aleph
      • Availability Circ (item) status
      • Patron Empowerment
      • Loans list
      • Renewals
      • Requests and Holds
      • SFX Contextual links for Full Text
    • Custom Features
      • List Functions
      • RSS
      • Hooks to RefWorks
      • Permalink
      • Marc Views via SQL
      • Debug View
      • Book covers
    • Development Environment
      • subversion
      • svnlog (http://catalog.fcla.edu/svnlog.xml)
    • Circulation function of Webapp
      • Availability check
        • SQL query to Aleph
          • Checks if there is a due date in z36
          • Checks the z36_status (L or C)
          • Checks the z30 if not loaned
            • Gets info for item from local copy of tab15 (tabs loaded every Sun. via crontab)
            • Checks item status 90’s OR col. 6=N to display tab15 col. 5 text.
            • If col. 6=Y
              • Checks for reserve label (day loan/hour loan) and displays tab15.col6.text
              • Checks if there is an IPS and, if so, displays the IPS text
          • In all other cases “Available” will display
      • Item Requests
        • UF gets the label “Request”; all others get “Place a Hold”
        • Calls cgi program that interfaces with Aleph API for circ
          • Nothing is displayed if the cgi returns an error
        • Troubleshooting
          • Try to place a request in the circ client, and note any error messages and codes
          • tab_hold_request : CIRC only
            • note check_hold_request code in header
            • $aleph_error_eng : grep for code.
      • Display of item information
        • tab40.ene and tab_sub_library.ene are extracted daily via crontab
        • Library/Collection Facet
          • File ‘sublib_display’ generated from all tab_sub_library's/tab40’s
          • ‘sublib_display’ used to generate facet values during forge
        • Brief Record
          • ‘sublib_display’ used to map Library/collection text (can update w/o forge)
        • Full Record
          • Item information comes from SQL query into Aleph
          • ‘sublib_display’ used to map Library/collection text (summary hol and detailed hol, can update w/o forge)
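
The availability check outlined under 'Circulation function of Webapp' can be read as a small decision procedure. The sketch below is one possible reading of those bullets, not the webapp's actual code. The class names, field names, the function `availability`, and the "Due ..." wording for loaned items are all hypothetical. It also assumes that "item status 90's" means a two-digit status beginning with 9, and that the text shown for a reserve item is its reserve label.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Loan:
    """Hypothetical stand-in for a z36 row."""
    due_date: Optional[str]  # None when no due date is recorded
    status: str              # z36_status; the outline names "L" and "C"


@dataclass
class Item:
    """Hypothetical stand-in for a z30 row plus values looked up elsewhere."""
    item_status: str                     # assumed two-digit code, e.g. "01" or "92"
    reserve_label: Optional[str] = None  # e.g. "Day loan"; None if not on reserve
    ips_text: Optional[str] = None       # IPS display text; None if no IPS


@dataclass
class Tab15Row:
    """Hypothetical stand-in for one row of the local tab15 copy."""
    col5_text: str  # display text (col. 5)
    col6: str       # "Y" or "N" (col. 6)


def availability(item: Item, loan: Optional[Loan],
                 tab15: Dict[str, Tab15Row]) -> str:
    """Return the text to display for an item, following the outline's order."""
    # 1. Loan check: a due date in z36 with status L or C.
    #    The "Due ..." wording is an assumption; the outline does not give it.
    if loan is not None and loan.due_date and loan.status in ("L", "C"):
        return "Due " + loan.due_date

    # 2. Not loaned: consult the z30 item status via the local tab15 copy.
    row = tab15.get(item.item_status)
    if row is not None:
        # Item status in the 90s, or col. 6 = N: show the tab15 col. 5 text.
        if item.item_status.startswith("9") or row.col6 == "N":
            return row.col5_text
        # col. 6 = Y: reserve label first, then IPS text if present.
        if row.col6 == "Y":
            if item.reserve_label:
                return item.reserve_label
            if item.ips_text:
                return item.ips_text

    # 3. In all other cases.
    return "Available"


# Example tab15 contents are invented for illustration only.
TAB15 = {
    "01": Tab15Row("Regular loan", "Y"),
    "05": Tab15Row("Library use only", "N"),
    "92": Tab15Row("Missing", "Y"),
}
```

With this table, `availability(Item("01"), None, TAB15)` falls through to "Available", while an item with status "05" shows the col. 5 text. Keeping tab15 as a plain dictionary mirrors the outline's note that the tables are a local copy reloaded on a schedule rather than queried live.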

Last modified 24 January 2008 at 6:08 pm by jeanp