Brussels / 4 & 5 February 2023

Deploy an enterprise search server with Fess

Search GitLab, Redmine, and repositories with a single query


This talk will illustrate how organizations can configure and deploy the open-source enterprise search server Fess, based on our own experiences.

Fess is, by and large, a well-documented, user-friendly tool. Yet no matter how good the tool is, deploying a search server on your own brings a number of challenges. I will talk about the challenges we faced and how we overcame them. Our hope is that:

  1. You will have a good grasp of what it is like to configure and deploy an enterprise search server, whether or not you are considering deploying one at the moment.
  2. Should you decide to deploy one, you can (hopefully) do so at a significantly lower cost.

Today's technology companies often use multiple content management systems, so information (e.g. source code, document files, and wiki pages) ends up fragmented across many places. Consequently, engineers sometimes search multiple locations one by one, repeating the same query in each, to find the information they need.

A solution would be to build and deploy a search server which, given a query, fetches the relevant code and documents from all the content management systems, much like a web search engine such as Google does on the Internet.

Building this from scratch would be a daunting task, as it would essentially mean setting up and maintaining the whole end-to-end system of search as a service, from web crawlers to the search box UI. Fortunately, there is an OSS search server built for this exact purpose. We are here to share the knowledge, experiences, and insights we have gained so far, so that you can build your own enterprise search server using OSS.

In this talk, I will first explain what search-related problems we had to solve, and briefly touch on what enterprise search is in general.

Next, I'll go over what Fess is and explain its core features, as an introduction to this lesser-known OSS enterprise search server.

Moving on to the core part of the talk, I will dive deep into how we configured, customized, and deployed Fess for our specific purpose. Our objectives were:

  1. to index content in the issue-tracking and project-management tools we use (GitLab & Redmine) and in repositories (Git & Subversion)
  2. to enable users to search the indexed contents from a single search box.

We achieved both of these objectives, but we had to overcome several hurdles along the way. I will share which pitfalls and shortcomings Fess has and how to work around them.

I will also talk about what we have made and contributed: our dev team wrote several patches to fix and customize Fess. Three of the patches are bug fixes, and two of them have been merged into the mainline. Two other patches are customizations designed to meet our specific requirements. One of them allows the crawler to get past the custom authentication page of our GitLab, which is implemented with SAML and Keycloak. The other patch re-maps crawled filesystem paths to the corresponding web page URLs; this was crucial for cutting the time needed to index repository contents, because the crawler can read files from the local filesystem instead of following links on web pages.
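The path re-mapping idea can be illustrated with a minimal sketch. This is not the actual patch (which lives inside Fess); it only shows the general technique of translating a locally crawled file path into the URL a user would open in the browser. The function name `remap_path_to_url`, the local checkout directory, and the GitLab-style URL prefix are all hypothetical values chosen for illustration.

```python
from pathlib import PurePosixPath
from urllib.parse import quote

# Hypothetical mapping from local checkout directories to web URL prefixes.
# Neither the paths nor the URLs come from the talk; they are placeholders.
PATH_TO_URL_PREFIXES = {
    "/srv/repos/project-a": "https://gitlab.example.com/group/project-a/-/blob/main",
}


def remap_path_to_url(path, prefixes=PATH_TO_URL_PREFIXES):
    """Return the web URL for a crawled local file, or None if no prefix matches."""
    crawled = PurePosixPath(path)
    # Try the longest root first so that nested checkouts resolve correctly.
    for root in sorted(prefixes, key=len, reverse=True):
        try:
            relative = crawled.relative_to(root)
        except ValueError:
            continue  # path is not located under this root
        base = prefixes[root].rstrip("/")
        relative_str = relative.as_posix()
        if relative_str == ".":
            return base  # the path is the root itself
        # Percent-encode unsafe characters but keep "/" as the separator.
        return base + "/" + quote(relative_str)
    return None
```

With such a mapping, the crawler can index `/srv/repos/project-a/docs/design.md` from disk while search results still link to the page users would normally visit.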

The goal of this segment is to provide information that will allow future Fess users to deploy it with far less time and effort.

Then, I will share our experiences with our preliminary deployments inside TOSHIBA. I will explain resource requirements and performance, such as how long it takes to crawl and index a given volume of resources (mostly web pages and files) and how much computing resource that requires. This will help future Fess users estimate how much computing resource they need to secure in order to deploy it. Along with that, I will describe how Fess's ability to index the contents of Microsoft Office documents and PDF files helped our engineers.

Speakers

Takashi Kumagai
