Amélie Cordier, Béatrice Fuchs, Laboratoire d'InfoRmatique en Image et Systèmes d'information (LIRIS), Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon, Centre National de la Recherche Scientifique (CNRS), Université Claude Bernard Lyon 1 (UCBL), École Centrale de Lyon (ECL), Université Lumière - Lyon 2 (UL2), Supporting Interaction and Learning by Experience (SILEX), AFIA, Nathalie Hernandez, Traces, Web, Education, Adaptation, Knowledge (TWEAK), Amedeo Napoli, Yannick Toussaint, Catherine Faron Zucker, and Chiara Ghidini
The context of this work is the study of sequential data that can be represented as sequences of timestamped events. The aim is to explore these sequences with sequence mining in order to discover serial episodes, which are event subsequences that occur frequently in the data (Mannila et al., 1997). The application domain studied here is melodic analysis: the goal is to highlight the structure of a musical piece by discovering its main melodic patterns. The episodes produced by the miner are examined by a user, generally a domain expert, who has to identify the relevant episodes and interpret them. During this interpretation step, however, the user faces a recurrent overabundance of mining results, which makes it difficult to identify the interesting ones. There is a real need for a rigorous approach to manage this step methodically and to assist the user's work. To this end, we propose a visual and interactive approach to assist the interpretation of serial episodes.

An interactive approach to the interpretation of serial episodes

We propose to assist the interpretation task by managing combinatorial redundancy in order to focus on relevant episodes. The assistance iteratively combines ranking the episodes and filtering out useless ones to help the user focus on the relevant ones. It has been demonstrated in the Transmute prototype, a web-based application that lets the user interact with event sequences and serial episodes, which are represented graphically on a timeline with customisable icons. The interpretation process consists of the following main iterative steps: ranking, selection and filtering. The user can choose measures to rank the episodes and then select some of them to display their occurrences in the sequence. When a choice is made, a filtering process is triggered to remove the other episodes that can no longer be selected given the user's previous selections.
Finally, the user can interpret the episodes by attaching annotations to them and record the model resulting from the interpretation in a knowledge base.

The ranking of episodes relies on several objective interestingness measures, which estimate the relative importance and compactness of the episodes in the sequence. The first measure is the event coverage indicator, which is the number of distinct events in the occurrences of an episode. The second measure is the spreading indicator, which is the number of events of the sequence that fall within the time intervals of the episode occurrences. The noise indicator is the difference between these two indicators and corresponds to the number of events of the sequence that fall within the time intervals of the episode occurrences without belonging to those occurrences. Temporal measures may also be used when event durations are known.

The selection of an episode by the user triggers the filtering process, which is based on the event coverage of the selected episode. The remaining episodes are examined, and their occurrences that have at least one event in common with the event coverage are discarded. The support is updated accordingly, and episodes whose support falls below the given frequency threshold are discarded. This removes the combinatorial redundancy around the chosen episode and gradually reduces the number of remaining episodes, allowing the user to focus better on the other episodes.
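
The indicators and the filtering step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the Transmute implementation: the function names, the data layout (an occurrence stored as a tuple of event positions in a time-ordered sequence) and the toy melody are all assumptions made for the example. It also assumes that the time interval of an occurrence is closed, running from its first to its last event, and that each event is counted once even when several intervals contain it.

```python
from typing import Dict, List, Set, Tuple

Event = Tuple[float, str]      # (timestamp, event type), hypothetical layout
Occurrence = Tuple[int, ...]   # positions of the occurrence's events in the sequence


def event_coverage(occurrences: List[Occurrence]) -> Set[int]:
    """Distinct events (by position) that belong to the occurrences of an episode."""
    return {i for occ in occurrences for i in occ}


def spreading(occurrences: List[Occurrence], sequence: List[Event]) -> Set[int]:
    """Events of the sequence lying in the time interval of at least one occurrence."""
    spread: Set[int] = set()
    for occ in occurrences:
        start = sequence[min(occ)][0]   # timestamp of the first event of the occurrence
        end = sequence[max(occ)][0]     # timestamp of its last event
        spread.update(i for i, (t, _) in enumerate(sequence) if start <= t <= end)
    return spread


def noise(occurrences: List[Occurrence], sequence: List[Event]) -> int:
    """Spreading indicator minus event coverage indicator."""
    return len(spreading(occurrences, sequence)) - len(event_coverage(occurrences))


def filter_episodes(
    episodes: Dict[str, List[Occurrence]], selected: str, min_support: int
) -> Dict[str, List[Occurrence]]:
    """Drop occurrences overlapping the selected episode, then infrequent episodes."""
    covered = event_coverage(episodes[selected])
    remaining: Dict[str, List[Occurrence]] = {}
    for name, occs in episodes.items():
        if name == selected:
            continue
        # Keep only the occurrences sharing no event with the selected episode.
        kept = [occ for occ in occs if covered.isdisjoint(occ)]
        # The support is taken here as the number of surviving occurrences.
        if len(kept) >= min_support:
            remaining[name] = kept
    return remaining


# Toy melody (invented data): one note per time unit.
sequence: List[Event] = [
    (0, "C"), (1, "D"), (2, "E"), (3, "C"), (4, "D"),
    (5, "E"), (6, "G"), (7, "C"), (8, "D"),
]
episodes: Dict[str, List[Occurrence]] = {
    "C->D->E": [(0, 1, 2), (3, 4, 5)],
    "C->D": [(0, 1), (3, 4), (7, 8)],
    "C->E": [(0, 2), (3, 5)],
}

coverage_ce = len(event_coverage(episodes["C->E"]))        # 4 events: positions 0, 2, 3, 5
spreading_ce = len(spreading(episodes["C->E"], sequence))  # 6 events: positions 0 to 5
noise_ce = noise(episodes["C->E"], sequence)               # 2 events: the two D notes
after_selection = filter_episodes(episodes, "C->D->E", min_support=2)
```

In this toy example, selecting "C->D->E" covers the first six notes, so "C->E" loses both of its occurrences and "C->D" keeps only one. With a frequency threshold of 2 neither episode survives, which illustrates how one selection removes the redundant sub-patterns around the chosen episode.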