Classical Music Index schema

This document describes the schema of the Classical Music Index (CMI).

The goal of CMI - and hence of this schema - is to help people find and discover classical music: old and new, well-known and obscure. CMI is not meant to store detailed musicological metadata beyond what might be useful for these purposes. When we discuss issues like what 'instrument' means (see below) our goal is not to make editorial judgements, but to anticipate the needs of people looking for music.

CMI is seeded with MediaWiki-format data from IMSLP, and its schema is influenced by the structure of IMSLP's data. But CMI doesn't try to include everything in IMSLP, and it also includes things like concerts that are not present in IMSLP.

The CMI schema involves a number of 'classes'. Each class has a set of 'attributes', some of which can refer to items (of the same class or other classes).

Classes have names like 'composition'. To avoid confusion, when we refer to a class we use capitalized boldface, e.g. Composition. Otherwise we just mean the (possibly vaguely defined) word, e.g. 'composition'.

What's a composition?

The central class in CMI is Composition. An entry can represent:

The 'parent' link allows potentially arbitrary levels of hierarchy. This raises the question of what should be considered a 'composition'. In general, our principle is:

A composition is something that could plausibly be played as a unit.

Let's look at some examples.

Sections of compositions

How should we handle subdivision? For example, if a composition consists of 30 variations, should we treat each one as a Composition?

In general, yes. Because:

The easiest way to accomplish these goals is to model each variation as a Composition.

On the other hand, a Minuet might contain a Trio. We don't represent the Trio as a Composition because it wouldn't be played separately from the Minuet.
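
As a minimal sketch of this modeling choice (not CMI's actual implementation), each variation can be its own record with a 'parent' link to the enclosing work. The class name, field names, and titles below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# eq=False keeps identity comparison, which avoids recursive comparisons
# through the parent/children links.
@dataclass(eq=False)
class Composition:
    """Hypothetical in-memory stand-in for a Composition record."""
    title: str
    parent: Optional["Composition"] = None
    children: List["Composition"] = field(default_factory=list)

    def add_child(self, title: str) -> "Composition":
        # A sub-composition is a Composition whose parent link points here.
        child = Composition(title=title, parent=self)
        self.children.append(child)
        return child

    def path(self) -> List[str]:
        # Titles from the top-level work down to this item.
        titles = []
        node: Optional["Composition"] = self
        while node is not None:
            titles.append(node.title)
            node = node.parent
        return list(reversed(titles))


# A work with 30 variations: each variation is its own Composition.
work = Composition("Theme and 30 Variations")
variations = [work.add_child(f"Variation {i}") for i in range(1, 31)]

# A Minuet containing a Trio stays a single Composition with no children.
minuet = Composition("Minuet")
```

With this shape, a performance or rating can point at a single variation, while `path()` recovers the work it belongs to.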

Some compositions have a more complex structure:

We could represent these structures as hierarchies of Compositions, and for some applications of CMI (like generating a program for a concert) this might be useful. But for simplicity we don't do this.

Collections of compositions

Compositions can be collected in various ways. Should a collection be treated as a Composition? Examples:

In summary:

Versions of compositions

Some compositions were significantly revised after their first publication. If this involved adding or removing sections, then the revision is a separate Composition. Otherwise whether to separate it is a judgement call: e.g. is the revision sufficiently different that someone might rate it differently than the original?

Relation to Scores

CMI has another class called Score, representing published scores (physical or digital). Compositions should not be confused with Scores:

What's a score?

A Score is a graphical representation of a collection of pieces (e.g. Beethoven's sonatas vol. 1) or a single piece, or a file from which such representations can be generated.

If the composition is for multiple instruments, the score might have separate parts for each instrument. The Score represents the whole collection.

A Score might be

Describing instrumentation

A composition's 'instrumentation' is the set of instruments or voices normally used to perform it. This is described by two classes: Instrument represents instruments (piano, violin, etc.) and Instrumentation represents a list of (count, instrument) pairs: for example, '1 piano and 3 violins'.

Each Composition has one or more Instrumentations. If a piece is for a solo instrument, the Instrumentation is a singleton, e.g. '1 piano'. There are infinitely many possible instrumentations; the Instrumentation class includes only those that are used in at least one Composition.

It's not always clear what should be considered an Instrument. The main criterion (see above) is: what will help musicians find music most easily? Some cases:

What if a piece is written for orchestra? It would be cumbersome to list the complete set of instruments. So instead, we treat 'orchestra' as an Instrument, along with some other non-specific combinations: 'female chorus', 'guitar ensemble', 'theater orchestra', etc.

Instrument hierarchy

There is an inclusion relationship ⊂ between Instruments:

CMI currently doesn't model this relationship. Doing so might be useful, but it would add complexity. In particular, we'd need to address the question: if a user searches for music for soprano, should we show them music for specializations (like coloratura soprano)? For generalizations (like female singer)?

Locations

In CMI composers can be linked to the countries where they lived, and concerts can be linked to the city where they took place. We want to let users search for music from, say, Southeast Asia, or from New York City.

Hence we need a way of representing a hierarchy of locations. This is done using two classes, Location_type and Location. Location_type has the items continent, subcontinent, country, province, and city. These have an implicit containment relationship: a city is in a province, and a province is in a country.

Entries can be entities that no longer exist (like Bohemia) or that used to be a different type (e.g. Venice used to be a country).

The Location class represents geopolitical entities. Each item can be linked to a 'parent' item at a higher level of the hierarchy. For example, the parent of 'Germany' (country) might be 'Europe' (continent).

Each item has a name, and the (name, parent) combination must be unique. This means, for example, that the parent of 'Springfield' must be e.g. the province 'Illinois', not the country 'United States', since there are many Springfields in the United States.

CMI's search features take hierarchy into account. For example, if you search for East Asia, you'll get items that are linked to Tokyo (because Tokyo is in Japan, and Japan is in East Asia).
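
The uniqueness rule and the hierarchy-aware search described above can be sketched as follows. This is an illustration under stated assumptions, not CMI's implementation: `LocationTable` and `is_within` are hypothetical names, and uniqueness is enforced on (name, parent) as described in this section.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Location:
    name: str
    type: str                          # e.g. 'continent', 'country', 'city'
    parent: Optional["Location"] = None


class LocationTable:
    """Hypothetical table enforcing a unique (name, parent) combination."""

    def __init__(self) -> None:
        self._by_key: Dict[Tuple[str, Optional[Location]], Location] = {}

    def add(self, name: str, type: str,
            parent: Optional[Location] = None) -> Location:
        key = (name, parent)
        if key in self._by_key:
            raise ValueError(f"duplicate location: {name!r} under same parent")
        loc = Location(name, type, parent)
        self._by_key[key] = loc
        return loc


def is_within(loc: Location, region: Location) -> bool:
    """True if region is loc itself or one of its ancestors."""
    node: Optional[Location] = loc
    while node is not None:
        if node == region:
            return True
        node = node.parent
    return False


table = LocationTable()
asia = table.add("Asia", "continent")
east_asia = table.add("East Asia", "subcontinent", asia)
japan = table.add("Japan", "country", east_asia)
tokyo = table.add("Tokyo", "city", japan)
```

A search for East Asia would then keep every item whose linked location satisfies `is_within(location, east_asia)`, which includes items linked to Tokyo.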

The attributes of Location include two versions of the location name:

It could also have a third version: resident name ('Dane' in this case). But in almost all cases except Denmark, this is the same as the adjective.

Note: you'd think that a location database of this sort already exists. I was unable to find one.

Period

IMSLP has a notion of 'period': Baroque, Classical, Romantic, etc. CMI includes this data, but de-emphasizes it. 'Period' is a misnomer: these terms are actually styles or genres, they're useful only for Western music, and they're subjective. On the other hand, users might want to include these as search criteria. It might be more useful to have a system of 'tags'.

Race and ethnicity

The idea of distinct races seems outdated. But it's likely that some users will want to search for music by black composers, or to study the racial breakdown of concerts.

To avoid going down rabbit holes, CMI uses the list on U.S. government forms: American Indian, Asian, Black, Hispanic, Pacific Islander, and White. This is a bit odd - 'Hispanic' is an ethnicity, not a race - but there it is. People can be linked to zero or more of these, and you can search on them.


CMI classes

Name Description Attributes Links to Linked from
Location_type See above name Location
Location See above
  • Name (e.g. Italy, France)
  • Name_adjective (e.g. Italian, French)
  • Name_native (e.g. Italia)
  • Type (link to Location_type)
  • Parent (smallest enclosing Location)
Unique (name, type, parent)
Location Location, Person, Ensemble, Organization, Venue
Sex Male, Female. Could potentially include trans female etc. name Person
Ethnicity see above name Person
Period See above; Renaissance, Baroque, Classical, Romantic, Early 20th Century, Modern. name Person, Composition
Person A person
  • Name (last, other) (not unique)
  • birth/death dates (year/month/day)
  • birth/death places (links to Location)
  • Locations (links to Location)
  • Periods (links to Period)
  • Sex (link to Sex)
  • Race (link to Race)
  • Ethnicity (link to Ethnicity)
Person_role
Language A written language name Composition, Score
Composition_type E.g. Operetta, Sonata, Minuet, Theme and Variations, etc.
Directed acyclic graph structure. A type can be included in one or more "parent" types.
  • Name
  • Parent (link to Composition_type)
  • Ancestors (links to Composition_type)
Composition_type, Composition
Ensemble_type Examples: Opera Company; Orchestra; String quartet
This overlaps Instrument, and maybe it shouldn't be a separate table.
name Ensemble
Ensemble e.g. NY Philharmonic.
  • Name (not unique)
  • Start/end dates
  • Type (link to Ensemble_type)
  • Location (link to Location)
  • period (link to Period)
  • Members (links to Person_role)
Performance
Organization_type Music publisher, record company, conservatory, concert series name Organization
Organization A company or nonprofit or informal organization
  • Name
  • Type (link to Organization_type)
  • start/end date
  • Location (link to Location; typically a city)
  • URL
Score, Release
Instrument see above name Instrumentation, Person_role
Instrumentation see above A list of (Instrument link, count) pairs. Composition, Score
Role Roles that a person might have in a composition or performance.

Examples: performer, composer, lyricist, conductor, narrator, arranger, member, musical director.

name Person_role
Person_role A person's role in a composition or performance

Example: (Grigory Sokolov, performer, piano). This is one record, referenced from all Performances in which he played piano.

  • Subject (link to Person)
  • Role (link to Role)
  • Instrument (link to Instrument, if role is performer)
Person, Role, Instrument Composition, Performance, Concert
Composition see above
  • Title
  • Opus or catalog#, e.g. 'Op. 57' or 'BWV 933'
  • Date first published
  • Date composed
  • Dedication
  • Tempo marking(s), e.g. 'Allegro Moderato; Andante'
  • Metronome marking(s), e.g. 'quarter=120; dotted eighth=96'
  • Key(s). The main key(s) of the piece, e.g. a, A, Ab, cs
  • Time signature(s). e.g. '4/4; 3/8; 4/4'
  • Types (links to Composition_type)
  • Creators (links to Person_role) e.g. could have multiple composers, or composer and lyricist
  • Parent (link to Composition; if movement)
  • Children (if compound composition; list of links to Composition)
  • Arrangement_of (link to Composition, if arrangement)
  • Language (link to Language; if has words)
  • Instruments (link to Instrumentation). If concerto, the solo instruments.
  • Period (link to Period)
  • Average duration (sec)
For sub-compositions and arrangements, most fields are unpopulated; values come from the parent. The title is that of the movement.
Composition_type, Person_role, Composition, Language, Instrumentation, Period Composition, Performance, Score
License e.g. 'Public domain', 'Creative Commons 3.0' name Score, Release
Score see above
  • Compositions (links to Composition)
  • Parent (if sub-score)
  • Instrumentation (link to Instrumentation; if sub-score)
  • Publisher (link to Organization)
  • License (link to License)
  • Languages (links to Language, if the score contains editorial text)
  • Date published
  • Edition number
  • Page count
  • File format (if virtual)
  • URLs (if virtual)
Composition, Organization, License, Language
Venue A physical performance venue: concert hall, house, etc.
  • Name
  • Audience capacity
  • Location (link to Location)
  • Address
  • Start/end dates
Location Concert
Performance A past or future performance of a composition, either in a concert or a studio. If the performance was recorded, includes information about the resulting audio files.
  • Composition (link to Composition)
  • Performers (links to Person_role)
  • Files (names and descriptions of files)
Composition, Person_role Concert
Concert A past or future concert
  • start date/time, duration
  • audience size
  • Venue (link to Venue)
  • Sponsor (link to Organization)
  • Program (links to Performance)
Venue, Organization, Performance Performance
Release A publicly accessible package of one or more recordings. Could be a CD, LP, DVD, YouTube video, etc.
  • title
  • release date
  • catalog #
  • URL (if relevant)
  • License (link to License)
  • Recordings (links to Recording)
  • Publisher (link to Organization, e.g. the record company)
License, Performance, Organization
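
The Composition_type entry above describes a directed acyclic graph in which each type stores both its direct parents and its full set of ancestors. A minimal sketch of how the ancestor set could be derived from the parent links follows; the type names and their relationships are made-up examples, and `ancestors` is a hypothetical helper rather than part of CMI.

```python
from typing import Dict, Set

# Hypothetical parent links: each type maps to the types that directly
# include it. A type may have more than one parent (hence a DAG, not a tree).
PARENTS: Dict[str, Set[str]] = {
    "Stage work": set(),
    "Light music": set(),
    "Opera": {"Stage work"},
    "Operetta": {"Opera", "Light music"},
}


def ancestors(name: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Collect every type reachable by following parent links from name."""
    found: Set[str] = set()
    stack = list(parents.get(name, ()))
    while stack:
        current = stack.pop()
        if current in found:
            continue  # already visited via another path through the DAG
        found.add(current)
        stack.extend(parents.get(current, ()))
    return found
```

Storing the result in an 'Ancestors' attribute would let a search for a general type match compositions of any more specific type without walking the graph at query time.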

Implementation notes


Picture

A diagram of some of the tables and their relationships. An arrow means items in one table link to items in another.

Copyright 2025 © David P. Anderson