Demystifying Enterprise Search
By Thom Wisinski, Director of Knowledge Management, Haynes And Boone, LLP
As the definition implies, enterprise search crawls identified silos of data and creates a searchable index for retrieving information. Careful consideration should be given to which silos of data to include, how the data sets relate to one another, and what might be missing that could create a relationship. As an example, a drug company may have a file share with a folder for each drug it manufactures. Here the folder name could be used to create a relationship (a relational link) to revenue data in an accounting system, assuming the accounting system refers to the same drug name. As data sets are added to the index, then, check whether some common data element already creates a relationship or whether one needs to be created. Creating a relationship could be as simple as renaming a folder to match the way the drug name is stored in other data sets, or creating a cross-reference table in a SQL database. For instance, the accounting system and the purchasing department may refer to the drug as ‘Acetaminophen C8H9NO2’ while the folder is labeled simply ‘Acetaminophen’.
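A cross-reference table of the kind described above might look like the following minimal sketch. It uses Python's built-in sqlite3 module with an in-memory database; the table names (drug_xref, revenue), column names, and revenue figures are invented for illustration and do not come from any real system.

```python
import sqlite3

# Hypothetical schema and made-up values, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE drug_xref (
        folder_name     TEXT PRIMARY KEY,  -- name used on the file share
        accounting_name TEXT NOT NULL      -- name used by accounting/purchasing
    );
    CREATE TABLE revenue (drug_name TEXT, quarter TEXT, amount REAL);
""")
conn.execute(
    "INSERT INTO drug_xref VALUES (?, ?)",
    ("Acetaminophen", "Acetaminophen C8H9NO2"),
)
conn.executemany(
    "INSERT INTO revenue VALUES (?, ?, ?)",
    [
        ("Acetaminophen C8H9NO2", "Q1", 1200.0),
        ("Acetaminophen C8H9NO2", "Q2", 1350.0),
    ],
)


def revenue_for_folder(conn, folder_name):
    """Return (quarter, amount) rows linked to a folder via the cross-reference table."""
    rows = conn.execute(
        """
        SELECT r.quarter, r.amount
        FROM drug_xref AS x
        JOIN revenue AS r ON r.drug_name = x.accounting_name
        WHERE x.folder_name = ?
        ORDER BY r.quarter
        """,
        (folder_name,),
    )
    return rows.fetchall()
```

The cross-reference table leaves both source systems untouched, which may be preferable when renaming folders would disrupt the people who use the file share.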
In thinking about types of data to consider, the typical low-hanging fruit is loose files (unstructured data) whose organization may lend itself to relational links. Sometimes, as in the example above, it helps to rename items or add information to the structure so that relational links are easier to create. Other data sets may include databases (structured data) that likely already contain information suited to these relational links. As an example, a products database may contain drug names or product numbers that create additional links to other information.
In the example above, the folder name may be matched to the drug name in a products database, which in turn identifies the product number. The product number can then be used to retrieve accounting information, because the accounting system is keyed by product number, not drug name.
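The chain of links just described (folder name to drug name to product number to accounting record) can be sketched in a few lines of Python. The record layouts, product numbers, and revenue value below are assumptions made up for the illustration.

```python
# All records below are invented for illustration.
products = [
    {"product_number": "P-1001", "drug_name": "Acetaminophen"},
    {"product_number": "P-1002", "drug_name": "Ibuprofen"},
]

# The accounting system is keyed by product number, not drug name.
accounting = {
    "P-1001": {"revenue": 2550.0},
}


def accounting_for_folder(folder_name, products, accounting):
    """Follow folder name -> drug name -> product number -> accounting record.

    Returns None when any link in the chain is missing.
    """
    wanted = folder_name.strip().casefold()
    for product in products:
        if product["drug_name"].casefold() == wanted:
            return accounting.get(product["product_number"])
    return None
```

A call such as accounting_for_folder("Acetaminophen", products, accounting) returns the accounting record for P-1001, while a folder with no matching product, or a product with no accounting entry, yields None and flags a relationship that still needs to be created.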
Another system that adds a level of complexity is email, which can be searched to find experts within an organization. This requires some care, since it can create a sense of ‘big brother’ watching employees, but there may be a very simple solution: rather than returning results, the system notifies the email owner that someone searched for a certain expertise and that their email identified them as a potential expert, and leaves it to them to respond to the requester if they agree.
The enterprise search system is typically delivered through an internal web application, but building the index typically depends on “crawlers”: computers that query the source systems and deliver the results in a unified manner to a central system to be indexed. In the delivery system, results can be ‘weighted’ so that the most relevant ones are displayed first. Weighting can combine information about the data (time, date, author, etc.) with attributes of the person searching (name, title, job function, etc.). As an example, a lab technician may want to search the notes of all the other lab technicians about a drug they’re interested in. The system may already know the searcher’s job function, and that most searches done by lab technicians were for lab notes, and can tailor the results accordingly. Another option is social weighting, where the person reviewing the results can mark a result or group of results as highly relevant to what they were looking for. The search system can then collect this information for future search weighting.
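One way to picture this kind of weighting is a score that adds up a recency signal, a boost based on the searcher's job function, and a social signal from past "highly relevant" marks. The sketch below is only one possible scheme: the Result fields, the PREFERRED_TYPE mapping, and the equal weighting of the three signals are all assumptions, not a description of any particular product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Result:
    title: str
    doc_type: str            # e.g. "lab_note" or "invoice" (hypothetical types)
    modified: date
    relevant_marks: int = 0  # times users marked this result highly relevant


# Hypothetical mapping from job function to the document type that role usually wants.
PREFERRED_TYPE = {"lab_technician": "lab_note"}


def score(result, job_function, today):
    """Combine recency, a job-function boost, and social marks into one score."""
    age_days = max((today - result.modified).days, 0)
    recency = 1.0 / (1.0 + age_days / 365.0)          # 1.0 today, decays with age
    role_boost = 1.0 if PREFERRED_TYPE.get(job_function) == result.doc_type else 0.0
    social = min(result.relevant_marks, 10) / 10.0    # capped at 1.0
    return recency + role_boost + social


def rank(results, job_function, today):
    """Return results ordered from highest to lowest score."""
    return sorted(results, key=lambda r: score(r, job_function, today), reverse=True)
```

With this scheme a four-year-old lab note outranks a brand-new invoice for a lab technician, while the order flips for a searcher whose job function has no preferred type. In practice the relative weights would need tuning against real search behavior.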
The real power of enterprise search is the ability to use data acquired during the gathering and indexing process to create filters that refine the results, much as Amazon does. On Amazon, a search for shoes returns thousands of results. However, alongside the results the site also displays filters on the side of the screen that can narrow the results to shoe size ‘10’, further to ‘brown’ shoes, and further still to a particular brand. All of this ‘metadata’ (data about data) is acquired during the gathering process as relational links.
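Filters of this kind are often called facets. A minimal sketch, assuming each search result carries its metadata as a plain dictionary, counts the values seen for each metadata field (to build the filter list) and then narrows the results by the values the user selects. The sample shoe records and brand names are made up.

```python
from collections import Counter


def facet_counts(results, fields):
    """Count how many results carry each value of each metadata field."""
    counts = {field: Counter() for field in fields}
    for item in results:
        for field in fields:
            if field in item:
                counts[field][item[field]] += 1
    return counts


def apply_filters(results, **selected):
    """Keep only results whose metadata matches every selected filter value."""
    return [
        item for item in results
        if all(item.get(field) == value for field, value in selected.items())
    ]


# Invented sample data, for illustration only.
shoes = [
    {"name": "Oxford", "size": 10, "color": "brown", "brand": "BrandA"},
    {"name": "Loafer", "size": 10, "color": "black", "brand": "BrandB"},
    {"name": "Boot", "size": 9, "color": "brown", "brand": "BrandA"},
]
```

Here facet_counts(shoes, ["size", "color", "brand"]) supplies the numbers shown next to each filter, and apply_filters(shoes, size=10, color="brown") narrows the list step by step, as in the shoe example.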
The next piece of the puzzle is to determine what to use for a search engine, which is really an indexing engine. Aside from straightforward Boolean engine technologies (using AND, OR, NEAR, etc.), there are engines that can also search by ‘concept’ and understand the gist of what a document is talking about. Using one of the many concept search technologies opens up a more ‘Google’-like search experience. As an example, a search on the word ‘antitrust’ will also bring back documents that talk about ‘The Sherman Act’ or ‘The Clayton Act’, as well as other documents that don’t mention the word ‘antitrust’ in the body of the document.
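The difference between the two approaches can be illustrated with a toy inverted index. The Boolean functions below match only the literal terms given. The "concept" function is a deliberately crude stand-in: it expands the query using a hand-written CONCEPTS table, whereas real concept engines derive such relationships from the documents themselves. The table, function names, and sample documents are all assumptions for the sake of the sketch.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_and(index, *terms):
    """Boolean AND: documents containing every term."""
    sets = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*sets) if sets else set()


def search_or(index, *terms):
    """Boolean OR: documents containing at least one term."""
    return set().union(*(index.get(term.lower(), set()) for term in terms))


# Hand-written stand-in for what a concept engine would learn on its own.
CONCEPTS = {"antitrust": ["sherman", "clayton"]}


def search_concept(index, term):
    """Expand the term with related terms, then run a Boolean OR."""
    related = CONCEPTS.get(term.lower(), [])
    return search_or(index, term, *related)


# Invented sample documents.
docs = {
    1: "Memo on The Sherman Act and price fixing",
    2: "Clayton Act merger review",
    3: "Quarterly antitrust training",
    4: "Cafeteria menu",
}
index = build_index(docs)
```

A plain Boolean search for ‘antitrust’ finds only document 3, while search_concept(index, "antitrust") also returns documents 1 and 2, which never use the word.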
Conclusion: Enterprise search isn’t for everyone, but if key knowledge is spread across multiple data silos, multiple offices, or even multiple countries, it could be a business differentiator by making employees more efficient and more productive. Many business decisions need to be vetted when considering which data to include in the enterprise search index, including whether to index data that may contain personally identifiable information (‘PII’) or information governed by the Health Insurance Portability and Accountability Act (‘HIPAA’). While the process of creating an enterprise search system isn’t terribly difficult, it is tedious and time-consuming to document how it will all ultimately be tied together for meaningful results.