Search Results

You are looking at 1 - 8 of 8 items for

  • Keyword: data discovery
Open access

Michał Rogalewicz and Robert Sika

Abstract

The paper reviews methodologies for knowledge discovery from data and the data exploration (Data Mining, DM) methods most frequently used in mechanical engineering. The methodologies comprise various data exploration scenarios, within which DM methods are applied. The paper presents the premises for using DM methods in industry, together with their advantages and disadvantages. It also traces the development of knowledge discovery methodologies and classifies the most widespread Data Mining methods by the type of task they perform. The paper concludes with selected Data Mining applications in mechanical engineering.

Open access

Yi Shen

Abstract

Currently, we are witnessing the emergence and abundance of many different data repositories and archival systems for scientific data discovery, use, and analysis. As available data-sharing platforms proliferate, this study addresses how scientists working in the fields of natural resources and environmental sciences navigate these diverse data sources, what their concerns and value propositions are toward multiple data discovery channels, and most importantly, how they perceive the characteristics and compare the functionalities of different types of data repository systems. Drawing on user community research into domain scientists' data use dynamics and insights, this research provides strategies and discusses ideas on how to leverage these different platforms. Furthermore, it proposes a novel top–down approach to searching, browsing, and visualizing for the dynamic exploration of environmental data.

Open access

Tomáš Kliment, Linda Gálová, Renata Ďuračiová, Róbert Fencík and Marcel Kliment

Abstract

Flood protection is one of several disciplines in which geospatial data is a crucial component. The management, processing and sharing of this data form the foundation for its efficient use; therefore, special attention is required in the development of effective, precise, standardized, and interoperable models for the discovery and publishing of data on the Web. This paper describes the design of a methodology to discover Open Geospatial Consortium (OGC) services on the Web and collect descriptive information, i.e., metadata, in a geocatalogue. A pilot implementation of the proposed methodology - Geocatalogue of geospatial information provided by OGC services discovered on Google (hereinafter “Geocatalogue”) - was used to search for available resources relevant to the area of flood protection. The result is an analysis of the availability of resources discovered through the metadata collected from the OGC services (WMS, WFS, etc.) and the resources they provide (WMS layers, WFS objects, etc.) within the domain of flood protection.
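
As a rough illustration of collecting metadata from an OGC service, the sketch below builds a GetCapabilities request URL and extracts layer names and titles from a WMS 1.3.0 capabilities document. It is a minimal sketch and not the authors' implementation: the function names, the endpoint `https://example.org/wms`, and the sample XML are all hypothetical, and a real harvester would also have to handle other service types and versions.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespace used by WMS 1.3.0 capabilities documents.
WMS_NS = {"wms": "http://www.opengis.net/wms"}


def capabilities_url(endpoint: str, service: str = "WMS") -> str:
    """Build a GetCapabilities request URL for an OGC service endpoint."""
    query = urlencode({"service": service, "request": "GetCapabilities"})
    separator = "&" if "?" in endpoint else "?"
    return f"{endpoint}{separator}{query}"


def layer_metadata(capabilities_xml: str) -> list[dict[str, str]]:
    """Collect the name and title of every named layer in a WMS 1.3.0 document."""
    root = ET.fromstring(capabilities_xml)
    records = []
    for layer in root.iter("{http://www.opengis.net/wms}Layer"):
        name = layer.findtext("wms:Name", namespaces=WMS_NS)
        title = layer.findtext("wms:Title", default="", namespaces=WMS_NS)
        if name:  # grouping layers without a Name cannot be requested; skip them
            records.append({"name": name, "title": title})
    return records


# Hypothetical capabilities document, trimmed to the elements read above.
SAMPLE = """<WMS_Capabilities xmlns="http://www.opengis.net/wms" version="1.3.0">
  <Capability>
    <Layer>
      <Title>Example root layer</Title>
      <Layer>
        <Name>flood_extent</Name>
        <Title>Flood extent (hypothetical)</Title>
      </Layer>
    </Layer>
  </Capability>
</WMS_Capabilities>"""

if __name__ == "__main__":
    print(capabilities_url("https://example.org/wms"))
    print(layer_metadata(SAMPLE))
```

The records returned by `layer_metadata` are the kind of descriptive information a geocatalogue could index; the sample document is parsed from a string so that the sketch runs without network access.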

Open access

Satoshi Tsutsui, Yi Bu and Ying Ding

Abstract

Purpose

This paper aims to better understand a large number of papers in the medical domain of Alzheimer’s disease (AD) and related diseases using the machine reading approach.

Design/methodology/approach

The study uses the topic modeling method to obtain an overview of the field, and employs open information extraction to further comprehend the field at a specific fact level.

Findings

Several topics within the AD research field are identified, such as the Human Immunodeficiency Virus (HIV)/Acquired Immune Deficiency Syndrome (AIDS), which can help answer the question of how AIDS/HIV and AD are very different yet related diseases.

Research limitations

Some manual data cleaning could improve the study, such as removing incorrect facts found by open information extraction.

Practical implications

This study uses the literature to answer specific questions about a scientific domain, which can help domain experts find interesting and meaningful relations among entities in a similar manner, for example relations between AD and AIDS/HIV.

Originality/value

Both the overview and specific information from the literature are obtained using two distinct methods in a complementary manner. This combination is novel because previous work has only focused on one of them, and thus provides a better way to understand an important scientific field using data-driven methods.

Open access

Adrian Besimi and Visar Shehu

Abstract

Surveys are popular, traditional tools for collecting data from users. They have become especially popular with the growth of convenient electronic delivery methods, such as email and electronic forms, and above all because they can be distributed quickly through social networks. In recent years, South East European University has relied heavily on surveys to evaluate the quality of service it offers to its students. Through these surveys, the university has obtained a large amount of data, which serves as an invaluable feedback tool from students and contributes to improving the university's quality of service. This paper investigates the possibility of applying advanced statistical methods to these datasets in order to uncover hidden information and provide the office of Quality Assurance with a variety of methods that will aid the process of evaluating staff members.

Open access

Adam Krasuski and Karol Kreński

Abstract

In this article we present the foundations of a decision support system for blockage management in the Fire Service. Blockage refers to the situation in which all fire units are out and a new incident occurs. The approach is based on two phases: off-line data preparation and online blockage estimation. The off-line phase applies methods from data mining and natural language processing and results in semantically coherent information granules. The online phase builds probabilistic models that estimate the blockage probability based on these granules. Finally, the selected classifier judges whether a blockage can occur and whether neighbouring fire stations should be asked for assistance.

Open access

Marcin Relich

Abstract

Nowadays, more and more enterprises are using Enterprise Resource Planning (ERP) systems that can also be used to plan and control the development of new products. In order to obtain a project schedule, certain parameters (e.g. duration) have to be specified in an ERP system. These parameters can be defined by the employees according to their knowledge, or can be estimated on the basis of data from previously completed projects. This paper investigates using an ERP database to identify those variables that have a significant influence on the duration of a project phase. In the paper, a model of knowledge discovery from an ERP database is proposed. The presented method contains four stages of the knowledge discovery process: data selection, data transformation, data mining, and interpretation of patterns in the context of new product development. Among data mining techniques, a fuzzy neural system is chosen to seek relationships on the basis of data from completed projects stored in an ERP system.

Open access

John P. McCrae and Paul Buitelaar

Abstract

Linked data has been widely recognized as an important paradigm for representing data, and one of the most important aspects of supporting its use is the discovery of links between datasets. For many datasets, there is a significant amount of textual information in the form of labels, descriptions and documentation about the elements of the dataset, and the foundation of precise linking lies in applying semantic textual similarity to link these datasets. However, most linking tools so far rely only on simple string similarity metrics such as Jaccard scores. We present an evaluation of some metrics that have performed well in recent semantic textual similarity evaluations and apply these to linking existing datasets.
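
To make the baseline concrete, the following sketch shows how a simple Jaccard score over word tokens could be used to link labels from two datasets. It is an illustrative sketch only, not the evaluation set-up from the paper: the tokenization (lowercasing and whitespace splitting), the `link` helper, the 0.5 threshold, and the example labels are all assumptions made here.

```python
def tokens(label: str) -> set[str]:
    """Lowercase a label and split it on whitespace into a set of word tokens."""
    return set(label.lower().split())


def jaccard(a: str, b: str) -> float:
    """Jaccard score: size of the token intersection divided by size of the union."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # convention chosen here: two empty labels count as identical
    return len(ta & tb) / len(ta | tb)


def link(source: list[str], target: list[str],
         threshold: float = 0.5) -> list[tuple[str, str, float]]:
    """Pair each source label with its best-scoring target label, if above threshold."""
    if not target:
        return []
    links = []
    for s in source:
        best = max(target, key=lambda t: jaccard(s, t))
        score = jaccard(s, best)
        if score >= threshold:
            links.append((s, best, score))
    return links


if __name__ == "__main__":
    # Hypothetical labels from two datasets.
    print(link(["river basin", "flood zone map"],
               ["River Basin District", "Map of flood zone"]))
    # Synonyms share no tokens, so a purely string-based score finds no link.
    print(jaccard("car", "automobile"))
```

Because the score depends only on shared tokens, labels that are synonyms but share no words score zero, which is the kind of case where semantic textual similarity metrics may be expected to help.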