10  Data Catalogue

The Open Music Observatory curates, maintains, and disseminates a data catalogue. The resources in the catalogue are individual datasets, dataset series, and API endpoints where the data can be queried in a custom format. In designing our data infrastructure, we considered the specifications of our dissemination nodes, which make our data widely interoperable and easy to access: the EU Open Data Portal, Europeana, Wikibase Cloud and Wikidata. Thematically, we relied on the definitions of the EMO feasibility study and created topical pillars (Section 10.2). The data curators of the Observatory Stakeholder Network (see Annex) and, for the duration of the Open Music Europe project, the work packages (WP1-4 each represent a “pillar”) can define and provide datasets or data series according to their topical collection guidelines (Section 10.1).

Not updated

Warning

This section was created at an early planning stage and has not yet been updated. Unfortunately, due to problems with the WP6 Data Management Plan task, we cannot yet show how we will fill the pillars of the observatory.

Formally, a data catalogue is a metadata dataset: a dataset containing information about our available datasets and their downloadable or queryable distributions. It follows the global DCAT standard of the World Wide Web Consortium (W3C).

DCAT is an RDF vocabulary designed to facilitate interoperability between data catalogues published on the Web; its specification defines the schema and provides examples for its use (Albertoni et al. 2020). It is a global standard, which was further extended and specified for the release of statistical datasets (StatDCAT-AP) and for the needs of the EU Open Data Portal (DCAT-AP) (Sofou and Dragan 2019; Fragkou 2023). These extensions provide further metadata and organisational standards, but they do not change the definitions of the global standard.

A data catalogue (dcat:Catalog) represents a catalogue, which is itself a dataset in which each individual item is a metadata record describing some resource: a description of a dataset, a data service, or other type of resource.

dcat:Dataset represents a collection of data, published or curated by a single agent or identifiable community. We currently support two types of datasets: statistical datasets that conform to the datacube definition of SDMX, and collection datasets for microdata, which contain non-aggregated, structured data selected according to some unity criteria, for example, music works and recordings that have been present in the official radio charts of a given country. The dataset, similar to a musical or literary work, is an abstract concept which can be used, downloaded, and stored only in one of its manifestations. For a musical work, a manifestation may be a sound recording or music sheet; for a dataset, it is a distribution. A URI identifies a dataset, but it does not allow the downloading of the dataset, because it refers to the abstract idea of the dataset. The URL for downloading the dataset belongs to the individual distributions.

dcat:Distribution represents an accessible form of a dataset, such as a downloadable file. When the same dataset is distributed in different file formats (for example, CSV and SPSS files), each distribution is listed in the catalogue separately with a separate download link. Each distribution has its own URL where the dataset can be downloaded.
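The relationship between an abstract dataset and its downloadable distributions can be illustrated with a small JSON-LD description. The following is a minimal sketch, not the Observatory's actual catalogue code: the `example.org` identifiers, the dataset title, and the helper name `make_dataset` are hypothetical, and the file formats are given as plain labels for brevity, where a production catalogue would more likely use controlled-vocabulary URIs.

```python
import json

# Namespaces of the DCAT and Dublin Core Terms vocabularies.
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"


def make_dataset(dataset_uri, title, distributions):
    """Describe one dcat:Dataset and its dcat:Distribution entries as JSON-LD.

    `distributions` is a list of (download_url, format_label) pairs.
    """
    return {
        "@context": {"dcat": DCAT, "dct": DCT},
        "@id": dataset_uri,  # identifies the abstract dataset, not a file
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                "dcat:downloadURL": {"@id": url},  # where the file is fetched
                "dct:format": fmt,  # simplified: a plain label
            }
            for url, fmt in distributions
        ],
    }


# Hypothetical identifiers, for illustration only.
dataset = make_dataset(
    "https://example.org/dataset/radio-charts",
    "Radio chart recordings (illustrative)",
    [
        ("https://example.org/download/radio-charts.csv", "CSV"),
        ("https://example.org/download/radio-charts.sav", "SPSS"),
    ],
)
print(json.dumps(dataset, indent=2))
```

The dataset URI appears only once, as the identifier of the abstract dataset, while each file format has its own distribution entry with its own download URL.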

The page music.dataobservatory.eu/tag/music-economy/ lists the downloadable datasets on the Open Music Observatory website. They can also be found on the EU Open Data Portal.

In the first days after the launch of our new service, around 1-10 June 2024, the datasets may be missing from the EU Open Data Portal, which is currently replacing its complete backend and may have a backlog in accepting our datasets.

dcat:DataService represents a collection of operations accessible through an interface (API) that provides access to one or more datasets or data processing functions. Our datasets are accessible on different platforms, each with its own API, and our internal data-sharing space also has its own API. As data is added to the different platforms (EU Open Data Portal for statistical and microdata datasets, Europeana for collections datasets, Wikibase Cloud for further microdata, metadata and collections, and Reprexbase for confidential microdata and collections), we are updating the catalogue with the DataService entries.

dcat:DatasetSeries is a dataset that represents a collection of datasets that are published separately but share some characteristics that group them; for example, a (play)list of sound recordings that were present in the weekly charts or the annual budget of an institution. A time series dataset is usually not defined as a data series, but the new time observations are added to an updated distribution of the time series dataset. Stakeholders who provide data to the Open Music Observatory can commit to making a data series; however, we only define a data series when we have at least two items available from the series.
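The rule that a data series is only defined once at least two of its items are available can be expressed as a simple check. This is a minimal sketch under stated assumptions: the names `SeriesCandidate`, `MIN_SERIES_ITEMS` and `can_register_series`, as well as the example URIs, are hypothetical and do not describe the Observatory's actual tooling.

```python
from dataclasses import dataclass, field

# Threshold taken from the text: at least two items must be available.
MIN_SERIES_ITEMS = 2


@dataclass
class SeriesCandidate:
    """A proposed dataset series and the URIs of its published member datasets."""

    title: str
    member_uris: list = field(default_factory=list)


def can_register_series(candidate):
    """Return True when the candidate has enough distinct members to be a series."""
    return len(set(candidate.member_uris)) >= MIN_SERIES_ITEMS


# Hypothetical example: weekly chart lists published as separate datasets.
charts = SeriesCandidate("Weekly radio charts (illustrative)")
charts.member_uris.append("https://example.org/dataset/charts-2024-w01")
print(can_register_series(charts))  # False: only one item so far

charts.member_uris.append("https://example.org/dataset/charts-2024-w02")
print(can_register_series(charts))  # True: two distinct items
```

Counting distinct URIs rather than list entries is a design choice of this sketch, so that the same dataset listed twice does not qualify as a series.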

dcat:CatalogRecord represents a metadata record in the catalogue, primarily concerning the registration information, such as who added the record and when.
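A catalogue record can be sketched in JSON-LD as a small description that points to the dataset it registers. The example below is illustrative only: the `example.org` URIs, the curator name and the helper `make_catalog_record` are hypothetical, and using `dct:creator` for the person who added the record is an assumption of this sketch rather than a documented choice of the Observatory.

```python
import json
from datetime import date


def make_catalog_record(record_uri, dataset_uri, added_by, added_on):
    """Describe the registration of one dataset as a dcat:CatalogRecord."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
            "foaf": "http://xmlns.com/foaf/0.1/",
        },
        "@id": record_uri,
        "@type": "dcat:CatalogRecord",
        # The resource that this record is about.
        "foaf:primaryTopic": {"@id": dataset_uri},
        # When the record was added to the catalogue (ISO 8601 date).
        "dct:issued": added_on.isoformat(),
        # Assumption: the registering agent is recorded with dct:creator.
        "dct:creator": added_by,
    }


# Hypothetical values, for illustration only.
record = make_catalog_record(
    "https://example.org/record/radio-charts",
    "https://example.org/dataset/radio-charts",
    "Example Curator",
    date(2024, 6, 1),
)
print(json.dumps(record, indent=2))
```

The record has its own identifier, separate from the dataset it describes, so registration information can change without touching the dataset description.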

10.1 Collection Guidelines

In short, we collect data about music. The initial data collection guidelines of the Open Music Observatory are derived from the EMO Feasibility study. We see them as a starting point for further discussion with the Observatory Stakeholder Network.

The EMO feasibility study

Curators form collections by applying unity criteria, which allow them to decide which musical work, sound recording, music enterprise or person is included in a collection list. The curators are responsible for the comprehensive application of the unity criteria and for ensuring that their collections are up-to-date (Wickett et al. 2013).

Some examples of music data curation

Hitlists use some kind of popularity metric, and they follow rigorous rules on which sound recordings are included every week or year.

Collective rights management organisations create comprehensive lists of works and sound recordings registered for rights protection and exploitation.

Statistical agencies create business registers to carry out data collection.
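A unity criterion can be thought of as a rule that is applied uniformly to every candidate item and decides whether it belongs to the collection. The following minimal sketch uses the radio chart example; the class `ChartEntry`, the functions `charted_in` and `build_collection`, and the recording identifiers are hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChartEntry:
    """One appearance of a sound recording in a national radio chart."""

    recording_id: str  # hypothetical identifier
    country: str       # two-letter country code
    week: str          # ISO week label, e.g. "2024-W05"


def charted_in(country):
    """Return a unity criterion: was the entry observed in this country's chart?"""
    def criterion(entry):
        return entry.country == country
    return criterion


def build_collection(entries, criterion):
    """Apply the criterion to all entries and return the sorted, distinct recordings."""
    return sorted({e.recording_id for e in entries if criterion(e)})


# Illustrative data only.
entries = [
    ChartEntry("rec-001", "SK", "2024-W05"),
    ChartEntry("rec-002", "CZ", "2024-W05"),
    ChartEntry("rec-001", "SK", "2024-W06"),
    ChartEntry("rec-003", "SK", "2024-W06"),
]
slovak_chart_collection = build_collection(entries, charted_in("SK"))
print(slovak_chart_collection)  # ['rec-001', 'rec-003']
```

Keeping the criterion separate from the collection-building step means the same rule is applied to every candidate, which is what makes the resulting list comprehensive and repeatable when new chart weeks are added.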

Who can curate our datasets? Any music professional or scholar can curate datasets in agreement with our Collection Guidelines. The quality review mechanisms will be set by the Observatory Stakeholder Network from a content point of view, and the Open Music Data Exchange from a technical point of view.

10.2 Topical Pillars

The extended five pillars, with sustainability added.
Note

The suggested four-pillar model would categorise data-collection and analysis along the following lines:

  • Measure the contribution of music to the EU’s economic and legal environment, from a systemic perspective (Pillar 1).

  • Monitor the cross-border flows of repertoire, the mobility of artists and diversity (national, linguistic, genre-based) (Pillar 2).

  • Assess music’s impact on society and citizenship: how audiences access and consume music; how citizens participate in professional and not-for-profit music activities; the scale, value and quality of music education and training (Pillar 3).

  • Provide a framework to develop prospective research on the future of the music sector, supporting innovation and developing understanding of emerging practices from various perspectives (business, tech, policy) (Pillar 4) (European Commission et al. 2020, p30).

10.2.1 Music Economy

Note

Main potential data-collection and research areas identified at this stage:

▷ Macro-economic patterns and trends (e.g. employment, revenue, competition)

▷ Value chain mapping and analysis (e.g. characteristics of music organisations, copyright collection, collective management, remuneration of artists, spill-over effects)

▷ Legal aspects (e.g. tax, labour laws, social security, contracts, case law)

▷ Business regulations (e.g. live music regulations, consumer protection, licensing, anti-piracy rules) (European Commission et al. 2020, p114)

According to the EMO feasibility study, one “of the key findings of the AB music working group report was a substantial appetite for cross-sectoral, neutral and comparable data on the music business at EU level. While recent studies (e.g. EY “Creating Growth” study) have attempted to measure the impact of music on the EU’s economy, systematic and comprehensive metrics do not exist at this stage.” (European Commission et al. 2020, p113)

Deliverable D1.1 of Open Music Europe, Economy of Music in Europe: Methods and Indicators, identifies critical research questions, data sources and gaps, and data collection methods regarding the economy of music in Europe (Antal, Kmety Barteková, and Remeňová 2023). The deliverable begins by reviewing definitions of “the music industry”, the categorisation of musical activities within the system of national accounts (SNA) and statistical classifications of economic activity (ISIC and NACE), and the three primary income streams within the music industry (the live music, author or publishing, and recording streams).

It then turns to the topic of value, first identifying the types of value created by musical activity and then considering legal and economic dimensions of valuation.

Our website documents each dataset with visualisations and provides links to the latest distribution downloads (in zip, with visualisations) or to the access points on the various dissemination nodes. The illustration is an experimental dataset from our background CEEMID catalogue.

After introducing the concept of mixed enterprise and personal surveying as a means of improving insight on informal economic activity in the sector, the deliverable identifies data gaps relevant to national policy in our pilot study target country of Slovakia, critically reviews the data gaps relevant to EU-level policy first identified in the EMO feasibility study, and proposes data collection methods appropriate to filling specified data gaps.

10.2.2 Music Diversity

According to the EMO feasibility study, “creating reliable tools to monitor what kind of repertoire circulates on digital platforms or via radio will require access to vast amounts of data from Digital Service providers (DSPs) or third party aggregators. The notion of European repertoire has to be clarified and very well defined; notion of language, of origin, of nationality, country of production, genres, and it should not be limited to the language sung in a given song. […] A European Music Observatory should also look into the possibility to collect regular data on the circulation of European repertoire at song and/or artist level, considering live performance/radio/ digital use, which will be available at a weekly/monthly/yearly basis to the music sector.” (European Commission et al. 2020, p34)

The Open Music Europe project is developing two tools for capturing the aforementioned data and turning it into informative indicators. In WP1, the project is developing big data statistical sampling algorithms to avoid the need for “access to vast amounts of data from Digital Service providers (DSPs)”. WP2 is working on a taxonomy and GDPR-compliant representation of the “notion of language, of origin, of nationality, country of production, genres” based on our background (Antal 2020). The result of this work will be the Slovak Comprehensive Music Database, which will create clear taxonomies and allow users and software applications to determine aspects of “Slovakness” for each sound recording.

The possibility “to collect regular data on the circulation of European repertoire at song and/or artist level” is an issue that must be addressed with strict adherence to GDPR. We are developing an opt-in, opt-out mechanism in Slovakia that will be replicable in all other jurisdictions in accordance with GDPR. (For potential non-European artists, we will apply GDPR, too.)

Note

Main data-collection and research areas identified at this stage:

▷ Cross-border circulation of works/repertoire (e.g. building common definition and indicators, mapping of cross-border access, sales and consumption flows)

▷ Cross-border mobility of artists and professionals (e.g. cross-border live performances, mobility of professionals, international music events)

▷ Cultural diversity aspects (e.g. languages, genres, types of productions)

▷ Legal aspects (freedom of movement, state aid, etc.) (European Commission et al. 2020, p115)

An interesting opportunity here is that many European countries already collect information on subsidised music operators (e.g. associations or not-for-profit projects), not to mention the wealth of information available through Creative Europe supported initiatives, which could provide this Pillar with interesting data. (European Commission et al. 2020, p116)

10.2.3 Music Society

Regarding music and society, D3.1 considers the reuse of various survey programs with retrospective survey harmonisation, and WP3 is planning to conduct surveys in 2025. Data will be added to the catalogue as it becomes available.

Note

Main data-collection and research areas identified at this stage:

▷ Education, training, personal development

▷ Audiences (music consumption, interaction, participation in music events, etc.)

▷ Music and society (not-for-profit sector, associations, social inclusion, amateur music, heritage, participation in music)

▷ Normative aspects (broadcasting quota rules, diversity promotion schemes, freedom of speech rules)

▷ Music and the environment (carbon footprint of venues, touring, festivals, merchandise manufacture, streaming services; issues around noise/neighbourhood impacts; good practice in these areas). (European Commission et al. 2020, p116)

10.2.4 Innovation

The definition of the Innovation pillar in the EMO feasibility study is more a topic to be covered than a data need description.

This pillar is less data-driven in that it will rely mostly on research conducted on topics relating to changes in the market place, new business models, disruptive technologies, etc. A European Music Observatory will have the latitude to pick certain topics based on priorities and input from sectoral stakeholders. An EMO should consider setting up an “innovation experts’ advisory committee,” constituted of respected professionals in their field who are known for their forward thinking views, to help identify key themes to be studied. (European Commission et al. 2020, p37)

We will initiate an informal music innovation experts’ roundtable to discuss potential data needs in this pillar.

10.2.5 Sustainability

In the EMO feasibility study, sustainability was mentioned among the innovation topics. The triple transition, together with the introduction of the Corporate Sustainability Reporting Directive and the European Sustainability Reporting Standards, has increased the interest in and the need for sustainability data. We therefore decided to create a separate topical pillar for environmental, social and governance (ESG) sustainability indicators. We will publish datasets that will be used in the value-added service described in Section 9.2.2.