Feature extraction analyzer

The feature extraction analyzer creates attributes out of event data based on different extraction plugins.

Currently supported:

* Regular expression based extractions
    * Regex Extraction Plugin
* Plaso parsed Windows event logs
    * Winevt Extraction Plugin

Note: This analyzer does not extract IPv4 addresses, email addresses, and similar values from all events. It only extracts them from events that match the definitions configured for the plugins explained below.

Use case

This analyzer extracts additional data from events and stores it as separate attributes. The extracted attributes can then be used in searches, lookups, correlations, and aggregations, or by other analyzers.

For example, in the default configuration the analyzer extracts email_addresses from the message field of events with the source WEBHIST when that field matches the configured regular expression.

Regex Extraction Plugin

This feature extraction plugin uses a regular expression to extract matching strings from an existing event attribute (e.g. message) and adds them to the event as a new attribute.

Configuration

Features are defined in data/regex_features.yaml

A regex based feature extraction definition looks like this:

name:
       # Define either a query_string or query_dsl.
       query_string: "*"
       query_dsl:
       # Mandatory fields.
       attribute:
       store_as:
       re:
       # Optional fields.
       re_flags: []
       emojis: []
       tags: []
       create_view: False
       aggregate: False
       overwrite_store_as: True
       overwrite_and_merge_store_as: False
       store_type_list: False
       keep_multimatch: False

Each definition needs to define either a query_string or a query_dsl.
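
The rules above (one query field plus the three mandatory fields) can be checked mechanically before the analyzer runs. The following is a minimal sketch, not part of the analyzer itself: `validate_definition` is a hypothetical helper, the example values are illustrative only, and the definition is assumed to be already loaded from YAML into a Python dict.

```python
# Hypothetical helper, not part of the analyzer: checks one parsed definition.
MANDATORY_FIELDS = ("attribute", "store_as", "re")
QUERY_FIELDS = ("query_string", "query_dsl")


def validate_definition(name, definition):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    # At least one of the two query fields has to carry a value.
    if not any(definition.get(field) for field in QUERY_FIELDS):
        problems.append(f"{name}: define either query_string or query_dsl")
    for field in MANDATORY_FIELDS:
        if not definition.get(field):
            problems.append(f"{name}: missing mandatory field '{field}'")
    return problems


# Illustrative values only; they are not copied from data/regex_features.yaml.
example = {
    "query_string": 'source_short:"WEBHIST"',
    "attribute": "message",
    "store_as": "email_address",
    "re": r"[\w.+-]+@[\w-]+\.[\w.-]+",
}
print(validate_definition("email_addresses", example))
print(validate_definition("broken", {"attribute": "message"}))
```

The second call reports three problems: the missing query field and the two missing mandatory fields.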

re_flags is a list of flags as strings from the re module. These include:

- DEBUG
- DOTALL
- IGNORECASE
- LOCALE
- MULTILINE
- TEMPLATE
- UNICODE
- VERBOSE
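
To illustrate how a list of flag names can become a single flags value for Python's re module, here is a small sketch. `combine_re_flags` is a hypothetical helper and not necessarily how the analyzer does it; a name that the running Python version does not provide as a flag raises a ValueError.

```python
import re
from functools import reduce
from operator import or_


def combine_re_flags(flag_names):
    """Turn a list such as ["IGNORECASE", "MULTILINE"] into one flags value."""
    flags = []
    for flag_name in flag_names:
        flag = getattr(re, flag_name, None)
        # Reject anything in the re module that is not an actual regex flag.
        if not isinstance(flag, re.RegexFlag):
            raise ValueError(f"Unknown re flag: {flag_name}")
        flags.append(flag)
    # OR the individual flags together, starting from "no flags".
    return reduce(or_, flags, re.RegexFlag(0))


flags = combine_re_flags(["IGNORECASE", "MULTILINE"])
pattern = re.compile(r"^user: (\w+)$", flags)
print(pattern.findall("USER: alice\nuser: bob"))  # ['alice', 'bob']
```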

The fields tags and emojis are optional.

The field store_as defines the name of the attribute the feature is stored as.

The create_view field is an optional boolean that determines whether a view is created if there are hits.

The aggregate field is an optional boolean that determines whether an aggregation of the results is created and stored. It currently has no effect, but it will once aggregations are supported.

The overwrite_store_as field is an optional boolean that determines whether the store_as field is overwritten if it already exists.

The overwrite_and_merge_store_as field is an optional boolean that determines whether the store_as field is overwritten with the existing values merged in.

The store_type_list field is an optional boolean that determines whether the extracted data is stored as a list (the default is text).

The keep_multimatch field is an optional boolean that determines whether all matching results are stored (by default, only the first result is stored).
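
The interplay of the four storage options can be easier to follow in code. The sketch below is one possible reading of the descriptions above, not the analyzer's actual implementation. `resolve_value` is a hypothetical function, and two details are assumptions: multiple text values are joined with commas, and overwrite_and_merge_store_as takes precedence over overwrite_store_as.

```python
def resolve_value(existing, matches, *, overwrite_store_as=True,
                  overwrite_and_merge_store_as=False,
                  store_type_list=False, keep_multimatch=False):
    """Return the value to store, given the current value and regex matches."""
    if not matches:
        return existing
    # keep_multimatch keeps every match; otherwise only the first one.
    selected = list(matches) if keep_multimatch else [matches[0]]
    if existing is not None:
        if overwrite_and_merge_store_as:
            # Assumption: text values are comma-separated when merged.
            previous = existing if isinstance(existing, list) else str(existing).split(",")
            # dict.fromkeys drops duplicates while keeping first-seen order.
            selected = list(dict.fromkeys(previous + selected))
        elif not overwrite_store_as:
            return existing
    # store_type_list stores a list; the default is a single text value.
    return selected if store_type_list else ",".join(selected)


print(resolve_value(None, ["a", "b"]))
print(resolve_value(None, ["a", "b"], keep_multimatch=True, store_type_list=True))
print(resolve_value("x", ["a"], overwrite_store_as=False))
```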

Feature extraction works as follows: the query is run, and the regular expression is applied to the configured attribute of each matching event. By default, the first extracted value is stored in the store_as attribute. Any emojis or tags defined are also applied to that event. Finally, if create_view is set and there are results, a view that searches for the added tag is created.
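
The flow described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the analyzer's code. Events are plain dicts, `extract_feature` is a hypothetical function, the `matches_query` callable stands in for running query_string or query_dsl against the datastore, and the event keys `tags` and `emojis` are made up for the example.

```python
import re


def extract_feature(events, config, matches_query):
    """Apply one feature definition to already-loaded events (plain dicts)."""
    pattern = re.compile(config["re"])
    hits = 0
    for event in events:
        # Stand-in for the query step: skip events the query would not return.
        if not matches_query(event):
            continue
        value = event.get(config["attribute"])
        if not isinstance(value, str):
            continue
        found = pattern.findall(value)
        if not found:
            continue
        # Store the first extracted value under the configured attribute name.
        event[config["store_as"]] = found[0]
        if config.get("tags"):
            event.setdefault("tags", []).extend(config["tags"])
        if config.get("emojis"):
            event.setdefault("emojis", []).extend(config["emojis"])
        hits += 1
    # A view is only wanted when it was requested and there were results.
    create_view = bool(config.get("create_view")) and hits > 0
    return hits, create_view


events = [
    {"source_short": "WEBHIST", "message": "Mail for bob@example.com and eve@example.org"},
    {"source_short": "LOG", "message": "contact alice@example.com"},
]
config = {
    "attribute": "message",
    "store_as": "email_address",
    "re": r"[\w.+-]+@[\w-]+\.[a-z]+",
    "tags": ["email"],
    "create_view": True,
}
hits, create_view = extract_feature(
    events, config, lambda event: event.get("source_short") == "WEBHIST"
)
print(hits, create_view, events[0]["email_address"])
```

Only the WEBHIST event is changed. The second event contains an email address but is skipped because it does not match the query.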

Winevt Extraction Plugin

This feature extraction plugin uses configured mappings to create new attributes for Windows Event Log events that were parsed using Plaso.

The mapping is based on the strings array that Plaso generates for the event data entries.

Note: The winevt extraction plugin does not map all Windows Event Log fields. It only maps the ones configured in data/winevt_features.yaml.

Configuration

Features are defined in data/winevt_features.yaml

A mapping for a Windows event uses the YAML format and looks like this:

name:

    source_name:            Type: list[str] | REQUIRED | case-insensitive
                            A list of source names to match against. Multiple
                            entries will be checked with OR.

    provider_identifier:    Type: list[str] | OPTIONAL | case-insensitive
                            A list of provider identifiers to match against.
                            Multiple entries will be checked with OR.

    event_version:          Type: int | REQUIRED
                            The event version to match against.

    event_identifier:       Type: int | REQUIRED
                            The event identifier to match against.

    references:             Type: list[str] | OPTIONAL
                            A list of references to provide as context and
                            source for the event mapping. E.g. a URL to the
                            official Microsoft documentation on the event.

    mapping:                Type: list[dict] | REQUIRED
                            A list of dicts that define the new attribute name
                            and the string index of the event to extract the
                            value from. Additionally, it can also contain an
                            alias list to add multiple attributes with
                            the same value but different names.

        name:               Type: str | REQUIRED
                            The name of the new attribute to create.

        string_index:       Type: int | REQUIRED | Starting at index 0
                            The string index of the event to extract the
                            value from. Based on the plaso extracted "strings"
                            attribute with Windows eventlog entries.

        aliases:            Type: list[str] | OPTIONAL
                            A list of aliases to add in addition to the
                            official name of the attribute. This can be used
                            to add different field names matching individual
                            field name ontologies. E.g. srcIP, domain, etc.

Check out the preconfigured mappings for some examples: data/winevt_features.yaml
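
To make the matching and mapping rules concrete, the sketch below applies one mapping to one event. It is a minimal illustration, not the plugin's implementation. `apply_winevt_mapping` is a hypothetical function, the feature and event values are invented, and the event is assumed to be a dict that carries source_name, provider_identifier, event_version, event_identifier, and strings.

```python
def apply_winevt_mapping(event, feature):
    """Return new attributes for one event, or {} if the feature does not apply."""
    # source_name: case-insensitive, multiple entries are checked with OR.
    source_names = [name.lower() for name in feature["source_name"]]
    if str(event.get("source_name", "")).lower() not in source_names:
        return {}
    # provider_identifier is optional; it is only checked when configured.
    providers = [p.lower() for p in feature.get("provider_identifier", [])]
    if providers and str(event.get("provider_identifier", "")).lower() not in providers:
        return {}
    if event.get("event_version") != feature["event_version"]:
        return {}
    if event.get("event_identifier") != feature["event_identifier"]:
        return {}

    strings = event.get("strings", [])
    attributes = {}
    for entry in feature["mapping"]:
        index = entry["string_index"]  # zero-based index into strings
        if index >= len(strings):
            continue  # the event carries fewer strings than the mapping expects
        # The official name and every alias receive the same value.
        for attribute_name in [entry["name"], *entry.get("aliases", [])]:
            attributes[attribute_name] = strings[index]
    return attributes


# Invented feature and event, for illustration only.
feature = {
    "source_name": ["Example-Provider"],
    "event_version": 0,
    "event_identifier": 1000,
    "mapping": [
        {"name": "example_user", "string_index": 0, "aliases": ["username"]},
        {"name": "example_host", "string_index": 2},
    ],
}
event = {
    "source_name": "example-provider",
    "event_version": 0,
    "event_identifier": 1000,
    "strings": ["alice", "S-1-5-21", "WORKSTATION01"],
}
result = apply_winevt_mapping(event, feature)
print(result)
```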