
The package defines a set of functions of the form extract_feature_XXX where XXX stands for a particular feature to be extracted either directly from a set of aligned signatures or indirectly from peaks and valleys derived from a set of aligned signatures.

The function extract_features_all automatically calls all currently implemented functions of the form extract_feature_XXX and returns a single-row data frame for each set of aligned signatures.
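The "call every feature function, return one row" pattern can be sketched in base R. This is only an illustration under stated assumptions: the two feature functions and the name extract_all_demo are hypothetical and are not part of the package, and the package's own implementation may work differently.

```r
# Hypothetical feature functions, for illustration only
# (not the package's own extract_feature_XXX functions).
demo_features <- list(
  length  = function(s, t) length(s),
  maxdiff = function(s, t) max(abs(s - t), na.rm = TRUE)
)

# Apply every feature function to one pair of aligned signatures
# and collect the scalar results as a data frame with a single row.
extract_all_demo <- function(s, t, features = demo_features) {
  values <- lapply(features, function(f) f(s, t))
  as.data.frame(values)
}

s <- c(0.1, 0.4, NA, 0.3)
t <- c(0.2, 0.1, 0.5, 0.3)
res <- extract_all_demo(s, t)
```

Each list entry becomes one column, so adding a feature only requires adding one function to the list.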

Two types of features are currently implemented:

- features based only on the aligned signatures: D, ccf, rough cor, …
- features based on striae extracted from the aligned signatures: cms, matches, …

Distance $D$

The Euclidean distance between two aligned signatures $s$ and $t$ is defined as $d(s,t) = \sqrt{\sum_{i=1}^N\left( s_i - t_i\right)^2}$, where $i = 1, ..., N$ and $N$ is the length of the aligned signatures. Two signatures $s$ and $t$ of respective lengths $n_s$ and $n_t$ can be aligned by padding one or both of the signatures with missing values NA. The aligned form of signatures $s$ and $t$ then has length $N \ge n_s, n_t$. $d$ is then a measure of the distance between the two vectors. Note that this form of $d$ is not invariant to the resolution $r$ at which signatures $s$ and $t$ are collected. To make the distance invariant to the resolution, we could use $d \cdot r$ as an estimate for the area between the two signatures. However, for degraded signatures (i.e. cases in which, for some reason, a signature cannot be extracted from a whole land), we also want the distance to be invariant to the length of the signatures involved. We therefore define $D(s,t) = d(s,t)/N$ to be the average distance between aligned signatures $s$ and $t$.
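A minimal base R sketch of this definition follows. The function name avg_distance is hypothetical, and the sketch assumes that positions where either signature is NA (the padding) are skipped in the sum while N remains the full aligned length; the package function may treat missing values differently.

```r
# Sketch of D(s, t) = d(s, t) / N for two aligned signatures.
# Assumption: NA positions (padding) contribute nothing to the sum,
# and N is the full length of the aligned vectors.
avg_distance <- function(s, t) {
  stopifnot(length(s) == length(t))
  N <- length(s)
  d <- sqrt(sum((s - t)^2, na.rm = TRUE))  # Euclidean distance d(s, t)
  d / N
}

s <- c(NA, 1, 2, 3)
t <- c(0, 1, 4, NA)
D <- avg_distance(s, t)
```

In this example only the two middle positions are observed in both signatures, so d is 2 and D is 2 / 4.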

Consecutively matching striae (CMS)

Consecutively matching striae is a measure first established by Alfred Biasotti in 1959 (reference). The number of consecutively matching striae is the number of consecutive peaks that two signatures have in common, i.e. the valleys in between the peaks should not be counted. Note: currently, the function extract_feature_cms counts both peaks and valleys. Generally, a CMS of 6 or higher is considered to be strongly indicative of a match (need another citation for this).
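The counting step can be sketched as the longest run of matches along the aligned striae. This is an illustration, not the code of extract_feature_cms: the function name longest_match_run is hypothetical, and the input is assumed to be a logical vector without missing values, with one entry per stria in order along the signature (TRUE where the stria matches).

```r
# Longest run of consecutive TRUE values in a logical vector.
# Assumption: `match` has one entry per stria, ordered along the
# signature, and contains no NA.
longest_match_run <- function(match) {
  stopifnot(is.logical(match), !anyNA(match))
  runs <- rle(match)                       # run-length encoding
  true_runs <- runs$lengths[runs$values]   # lengths of the TRUE runs
  if (length(true_runs) == 0) 0L else max(true_runs)
}

striae <- c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE)
cms <- longest_match_run(striae)
```

To count peaks only, the same function could be applied to the subset of entries that correspond to peaks.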

Countable features

All features that return a count in one way or another, such as cms, noncms, matches, nonmatches, …, are accompanied by functions that scale these integers by the signature length, making the values independent of length and expressing them per millimeter. The corresponding variables carry the suffix _per_mm.
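The scaling can be sketched as follows. The function name count_per_mm and its arguments are hypothetical, and the sketch assumes the resolution is given in microns per data point, so that the signature length in millimeters is the number of points times the resolution divided by 1000.

```r
# Scale a count feature to a rate per millimeter of signature.
# Assumptions: `n_points` is the length of the aligned signature and
# `resolution` is the scan resolution in microns per point.
count_per_mm <- function(count, n_points, resolution) {
  length_mm <- n_points * resolution / 1000  # microns -> millimeters
  count / length_mm
}

# A count of 6 on a signature of 1600 points at 1.5625 microns per
# point (2.5 mm in total).
cms_per_mm <- count_per_mm(6, n_points = 1600, resolution = 1.5625)
```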