Entity Confidence

Each extracted entity has an associated confidence value between 0 and 1. The value is the extraction algorithm's estimate of the likelihood that the entity is an actual entity. To reduce noise, Idyl E3 offers two methods for filtering out low-confidence entities. Filtered entities are simply not returned in the response to an extraction API call. Refer to Idyl E3’s Configuration for instructions on selecting which method to use.

Entities extracted by a plugin that performs regular expression-based or dictionary-based extraction are assigned a confidence value of 1.0 and are never filtered.
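The two rules above (confidence always lies in [0, 1]; regex- and dictionary-based entities get 1.0 and are exempt from filtering) can be pictured with a minimal Python sketch. The `Entity` class, the `"regex"` and `"dictionary"` source labels, and the helper functions are hypothetical illustrations, not Idyl E3's API.

```python
from dataclasses import dataclass

# Hypothetical labels for rule-based extractors; not Idyl E3 identifiers.
RULE_BASED_SOURCES = {"regex", "dictionary"}


@dataclass(frozen=True)
class Entity:
    text: str
    confidence: float  # estimated likelihood, in [0, 1], that this is a real entity
    source: str        # kind of extractor that produced the entity

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence must be in [0, 1], got {self.confidence}")


def make_rule_based_entity(text: str, source: str) -> Entity:
    """Regex- and dictionary-based matches are always assigned confidence 1.0."""
    if source not in RULE_BASED_SOURCES:
        raise ValueError(f"not a rule-based source: {source}")
    return Entity(text=text, confidence=1.0, source=source)


def is_filterable(entity: Entity) -> bool:
    """Rule-based entities are never subject to confidence filtering."""
    return entity.source not in RULE_BASED_SOURCES
```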

Entity Filtering Methods

Simple Confidence Filtering

The first method filters out entities whose confidence is lower than a threshold supplied with the request. The extraction API includes a confidence threshold parameter. (Refer to the API documentation to learn how to provide the parameter in an API call.) Each extracted entity whose confidence is lower than the threshold is not returned. This method is fast but rigid: an entity is dropped even if its confidence falls only slightly below the threshold.

To summarize this method:

  1. An entity extraction request is received containing a confidence threshold.
  2. If an extracted entity’s confidence is greater than or equal to the confidence threshold, the entity is returned in the response. Otherwise, the entity is filtered out of the response.
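The two steps above amount to one comparison per entity. A minimal Python sketch, assuming a hypothetical `Entity` type with `text` and `confidence` fields (not an Idyl E3 class) and a hypothetical `filter_by_threshold` helper:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str
    confidence: float  # in [0, 1]


def filter_by_threshold(entities, threshold):
    """Keep entities whose confidence is >= threshold; drop the rest.

    Rule-based entities carry confidence 1.0, so they pass any threshold <= 1.0.
    """
    return [e for e in entities if e.confidence >= threshold]


entities = [Entity("Alice", 0.92), Entity("Bob", 0.41), Entity("Acme Corp", 0.60)]
kept = filter_by_threshold(entities, 0.6)
print([e.text for e in kept])  # ['Alice', 'Acme Corp']
```

Note that an entity exactly at the threshold (here `Acme Corp` at 0.60) is returned, matching step 2.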

Heuristic Confidence Filtering

A second method of entity confidence filtering uses heuristics. As entities are extracted, Idyl E3 tracks the entity confidences per model. Once a large enough sample of confidences has been collected, Idyl E3 begins filtering entities based on the mean of the sampled confidences and on how significantly an entity’s confidence differs from that mean.
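One way to track confidences per model without storing every value is Welford's online algorithm for the running mean and variance. This is a sketch under assumptions: the class names, the `model_id` key, and the default `min_samples` of 30 are hypothetical, and the documentation does not say how Idyl E3 stores its samples or what sample size counts as "large enough".

```python
import math
from collections import defaultdict


class RunningStats:
    """Welford's online algorithm: running mean and variance without storing samples."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def add(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def stdev(self) -> float:
        """Sample standard deviation (n - 1 denominator); 0.0 until two samples exist."""
        if self.count < 2:
            return 0.0
        return math.sqrt(self._m2 / (self.count - 1))


class ConfidenceTracker:
    """Tracks entity confidences separately for each model (hypothetical helper)."""

    def __init__(self, min_samples: int = 30):  # hypothetical "large enough" size
        self.min_samples = min_samples
        self._stats = defaultdict(RunningStats)

    def record(self, model_id: str, confidence: float) -> None:
        self._stats[model_id].add(confidence)

    def has_enough_samples(self, model_id: str) -> bool:
        stats = self._stats.get(model_id)
        return stats is not None and stats.count >= self.min_samples

    def stats(self, model_id: str) -> RunningStats:
        return self._stats[model_id]
```

Welford's update is numerically more stable than accumulating a raw sum of squares, which matters when many confidences cluster near the same value.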

Until Idyl E3 has accumulated a large enough sample, entities are filtered simply by comparing their confidence against the confidence threshold. Even after the sample is large enough, an entity whose confidence is greater than or equal to the threshold is never filtered and is always returned.

Overall, this method is less rigid than simple confidence filtering because an entity whose confidence falls below the threshold can still be returned.

To summarize this method:

  1. An entity extraction request is received containing a confidence threshold.
  2. If an extracted entity’s confidence is greater than or equal to the confidence threshold, the entity is returned in the response and no further checks are made. Otherwise, processing continues with step 3.
  3. If Idyl E3 has not yet collected a large enough sample for the model, the entity is filtered out of the response and processing of the entity ends. Otherwise, the entity’s confidence is compared with the sampled mean.
  4. If the entity’s confidence is statistically significant relative to the sampled mean, the entity is returned in the response. Otherwise, the entity is filtered out of the response.
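The four steps can be sketched as one decision function. The documentation does not define the significance test, so this sketch assumes a z-score test: once `min_samples` confidences have been tracked for a model, an entity below the threshold is kept if its confidence is no more than `z_cutoff` sample standard deviations below that model's mean. `HeuristicFilter`, `min_samples`, and `z_cutoff` are hypothetical names and values, not Idyl E3 settings, and for simplicity every confidence is stored in a list.

```python
import statistics
from collections import defaultdict


class HeuristicFilter:
    """Sketch of the four-step heuristic filter; the cutoffs are assumptions."""

    def __init__(self, min_samples=30, z_cutoff=1.0):
        if min_samples < 2:
            raise ValueError("min_samples must be at least 2 to compute a stdev")
        self.min_samples = min_samples      # hypothetical "large enough sample"
        self.z_cutoff = z_cutoff            # hypothetical significance cutoff
        self._samples = defaultdict(list)   # model id -> observed confidences

    def keep(self, model_id, confidence, threshold):
        """Return True if the entity should be returned in the response."""
        samples = self._samples[model_id]
        decision = self._decide(samples, confidence, threshold)
        samples.append(confidence)  # track every extracted confidence per model
        return decision

    def _decide(self, samples, confidence, threshold):
        # Step 2: at or above the request threshold is always returned.
        if confidence >= threshold:
            return True
        # Step 3: without a large enough sample, the threshold alone decides.
        if len(samples) < self.min_samples:
            return False
        # Step 4: assumed z-score test against this model's sampled mean.
        mean = statistics.mean(samples)
        stdev = statistics.stdev(samples)
        if stdev == 0.0:
            return confidence >= mean
        return (confidence - mean) / stdev >= -self.z_cutoff
```

Recording the confidence after the decision keeps an entity from influencing its own comparison; whether Idyl E3 does the same is not stated in this documentation.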

Selecting the Confidence Filtering Method

Refer to the Configuration for details on selecting an entity confidence filtering method.