Document clustering is the problem of automatically grouping similar documents into categories based on a similarity metric. Almost all available data, usually found on the web, are unclassified, so we need powerful clustering algorithms that can work with this type of data. All common search engines return a list of pages relevant to the user query, and this list needs to be generated quickly and as accurately as possible. In this paper we present a clustering algorithm called DBSCAN (Density-Based Spatial Clustering of Applications with Noise) and its limitations in clustering documents (or web pages). Documents are represented using the “bag-of-words” representation (word occurrence frequency), a representation on which many algorithms fail. We use Information Gain as the feature selection method and evaluate the DBSCAN algorithm by its capacity to integrate all the samples from the dataset into clusters.
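The evaluation idea described above (how many samples DBSCAN manages to place in a cluster rather than label as noise) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy documents, the cosine distance, and the values of `eps` and `min_pts` are assumptions chosen for illustration, and the Information Gain feature selection step is omitted because it requires class labels.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Word-occurrence counts for one document."""
    return Counter(text.lower().split())

def cosine_distance(a, b):
    """1 - cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0

def dbscan(vectors, eps, min_pts):
    """Return one label per vector: a cluster id >= 0, or -1 for noise."""
    n = len(vectors)
    # Neighbourhood of each point (the point itself is included).
    neighbours = [
        [j for j in range(n) if cosine_distance(vectors[i], vectors[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:  # only core points expand the cluster
                queue.extend(neighbours[j])
    return labels

# Hypothetical toy corpus: two topics plus one unrelated document.
docs = [
    "cat dog pet animal",
    "dog cat pet food",
    "pet animal cat dog",
    "stock market price trade",
    "market stock trade bank",
    "price stock market bank",
    "quantum galaxy telescope orbit",
]
vectors = [bag_of_words(d) for d in docs]
labels = dbscan(vectors, eps=0.5, min_pts=2)

# Share of samples that were integrated into some cluster (not noise).
coverage = sum(1 for label in labels if label != -1) / len(labels)
print(labels, round(coverage, 3))
```

In this toy run the unrelated document shares no words with the others, so it has no neighbours within `eps` and stays as noise; the coverage value measures exactly that shortfall.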
Digital technologies have changed every domain of social life: we now live in the digital era, in the information society, in an interconnected world. Besides improvements in everyday life, digital change has also brought inherent risks, from cyber-security threats, hacking, and cyber-bullying to the vulnerability of personal data and the mental health consequences of the information explosion.
This article reviews the risks that digital transformation poses for libraries in general, with examples from the LBUS Library, drawing on the last 10 years of experience in developing computerized book management systems (electronic catalogue) and the digital library system. A group of library experts took part in local cultural projects promoting the city of Sibiu, as well as in four major European projects focusing on “Europeana” and developing highly valuable cultural, historical, and scientific digital collections. Because these projects were relatively new and highly complex technical activities involving a high volume of new information, their management also posed risks related to decision-making and to choosing the best solutions to ensure their success. We have identified and highlighted the major risks.
The Europeana Collections, inaugurated in 2008, represent the completion of an ambitious project intended to be both a journey and an instrument for access to the culture, history, and identity of Europeans. Today, Europeana holds more than 58 million digital units, organized by domain and theme, from art works, artefacts, and books to movies and music. Its holdings grow steadily every year through the contributions of the European Member States, in response to the common aspiration to open access to knowledge beyond national or territorial borders. The Europeana Project implements a set of standards and a unified approach to enhancing the value of the heritage of the European states through digitization. The country reports reflect most accurately the measures and achievements of each state contributing to Europeana. The Romanian report, a document updated in January 2019, presents in detail the achievements of our country and of the Romanian institutions contributing to the enrichment of the Europeana collections. The list of contributors contains, next to the names of well-known libraries, the National Museum of the Union from Alba Iulia, with 986 digital units. In the field of printed works, the National Museum of the Union is not a direct contributor; its collections are uploaded by other contributors. The museum has a remarkable and extremely valuable collection of Transylvanian books, mainly printed in Cluj during the 18th century. There are 78 titles with predominantly religious, juridical, and educational content, representing an important segment of the national cultural heritage. The present paper examines these works and identifies them in the Europeana collections.
Aims: The paper focuses on the methodological frames of Library and Information Sciences vis-à-vis Web Science in the light of the OECD Fields of Science and Technology Classification. The roots of Library Science and Information Science in the Humanities and Social Sciences are described. The technological revolution which took place during and after World War II enabled the development of a new mathematics- and engineering-oriented environment for information. On this basis, new research areas such as Web Science emerged, which shifted Information Science towards an interdisciplinary character. Method: The OECD Fields of Science and Technology Classification was analysed from the point of view of the place of Library and Information Science in this classification.
Solutions: In the OECD Fields of Science and Technology Classification, Library Science has its independent place within Social Sciences, while Information Science is dispersed between three main sections. This confirms the interdisciplinary character of Information Science and establishes its name as a superordinate term covering traditional Information Science and all of the new mathematics- and engineering-based research areas dealing with information. Although the name Web Science is not mentioned in this classification, we can assume that, in the light of the OECD classification, it is a sub-discipline of Information Science. Implications for Poland are mentioned.
The Open Science concept represents a new approach to the way in which scientific research based on cooperation and new ways of knowledge dissemination is carried out and organized, using new digital technologies, new tools for collaboration, and R&D infrastructure to ensure open access to research data.
This study uses data collected from May to July 2018 in a survey that aimed to investigate the scientific data ecosystem in the Republic of Moldova. Findings show that, although there are some concerns about the loss of property rights and copyright infringement when research data are shared and made openly accessible, Moldovan academia is ready to provide access to research data. The research has highlighted a new challenge: solving scientific data issues by creating a new type of infrastructure that ensures data retention and broad access to research results for the purpose of their dissemination and use, and that creates new research opportunities based on research data.
The aim of this paper is to demonstrate the usefulness of graphs in solving an ever-present problem for library users: finding the books they like and are looking for. Graphs are known as an important tool for solving constrained optimization problems. We propose a graph-based recommendation system which can easily be used in a library to help users find, in real time, the books they like. The main advantage of the proposed graph-based approach lies in the ease with which new data, or even new entities from different sources, are added to the graph without disturbing the entire system. The system uses similarity scores to measure the similarity between objects and to obtain the best recommendation for a user’s request. Finally, we compare the results obtained with the formulas used.
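A minimal sketch of such a graph-based recommender is given below. It is only an illustration under stated assumptions: the abstract does not name the graph model or the similarity formulas, so the user-book graph, the `LibraryGraph` class, the Jaccard and cosine scores, and all sample loans are hypothetical.

```python
import math
from collections import defaultdict

class LibraryGraph:
    """Undirected graph whose nodes are ("user", name) or ("book", title)."""

    def __init__(self):
        self.edges = defaultdict(set)  # node -> set of neighbouring nodes

    def add_loan(self, user, book):
        # Adding an edge touches only the two nodes involved,
        # so new users or books never require rebuilding the graph.
        self.edges[("user", user)].add(("book", book))
        self.edges[("book", book)].add(("user", user))

    def readers(self, book):
        return self.edges.get(("book", book), set())

    def books_of(self, user):
        return {name for _kind, name in self.edges.get(("user", user), set())}

def jaccard(a, b):
    """Shared readers divided by all readers of either book."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cosine(a, b):
    """Set-based cosine similarity between two reader sets."""
    return len(a & b) / math.sqrt(len(a) * len(b)) if a and b else 0.0

def recommend(graph, user, similarity, top_n=3):
    """Rank unread books by their summed similarity to the user's books."""
    seen = graph.books_of(user)
    all_books = [name for kind, name in graph.edges if kind == "book"]
    scores = {}
    for candidate in all_books:
        if candidate in seen:
            continue
        score = sum(
            similarity(graph.readers(candidate), graph.readers(book))
            for book in seen
        )
        if score > 0:
            scores[candidate] = score
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda title: (-scores[title], title))[:top_n]

# Hypothetical loan records.
graph = LibraryGraph()
for user, book in [
    ("ana", "Dune"), ("ana", "Emma"),
    ("bob", "Dune"), ("bob", "Emma"), ("bob", "Ulysses"),
    ("cara", "Emma"), ("cara", "Hamlet"),
]:
    graph.add_loan(user, book)

by_jaccard = recommend(graph, "ana", jaccard)
by_cosine = recommend(graph, "ana", cosine)
print(by_jaccard, by_cosine)
```

Passing the similarity function as a parameter makes it straightforward to compare the rankings produced by different formulas on the same graph.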
In recent years, there has been increasing interest in the field of natural language processing. Determining the correct syntactic function of a specific word is an important task in this field, useful for a variety of applications such as text understanding, automatic translation, question answering, and even e-learning systems. In the Romanian language, this is an even harder task because of the complexity of the grammar. The present paper falls within the field of “Natural Language Processing”, but it also draws on other concepts such as “Gamification”, “Social Choice Theory” and “Wisdom of the Crowd”. The application presented in this paper was developed for two main purposes:
a) To give students a support tool through which they can deepen the knowledge about the syntactic functions of the parts of speech that they have acquired during lessons at school;
b) To collect data about how students make their choices and how they know which grammatical role is correct for a specific word, these data being essential for replicating the learning process.