Spinitron automated playlist bias

Below is an email to Spinitron tech support - identifying a consistent bias in Spinitron’s automated playlists - favoring bootleg/pirate/plagiarized reissues at the expense of legitimate reissues - posted here so “everyone can benefit”

==============

Hi Spinitron People

station: KPOO

username: McSchmormac (KPOO)

I’m not sure who to address this to. I’ve been using Spinitron’s automated playlist generator for a few months, and I have concerns about the automatically generated content - there is a pattern I’ve been observing, whereby a release is mis-identified as one of its own rip-offs.

Here are four examples:

  1. When I play tracks from “Anthology of American Folk Music”, edited by Harry Smith, on Smithsonian Folkways

https://folkways.si.edu/anthology-of-american-folk-music/african-american-music-blues-old-time/music/album/smithsonian

Spinitron, EVERY TIME, identifies these as being from a pirated knock-off version of this set on DOXY records called “Harry Smith’s Complete Anthology of American Folk Music”

  • they have taken the entire Folkways set, maintained the original sequence, duplicated the Folkways masters, which were restored at considerable expense, and re-packaged them for re-sale to an unsuspecting public

You can read further about DOXY here:

and here:

The consensus among music lovers and collectors seems to be that DOXY is a shabby rip-off label best avoided.

  2. I have played numerous selections from “Opika Pende - Africa at 78RPM” on the Dust-to-Digital label - a Grammy-nominated collection of African recordings never before released on CD, from the private collection of one person: https://latimesblogs.latimes.com/music_blog/2011/12/opika-pende-africa-at-78-rpm-resurrects-a-continents-music.html

Every time, Spinitron identifies these tracks as being from a more recent multi-volume collection of African 78RPM recordings taken directly from Opika Pende - called “Originated In Africa” on Count Records

  3. There is a Grammy-nominated 4CD set of Southeast Asian 78RPM recordings on Dust-to-Digital, “Longing For The Past - The 78rpm Era in Southeast Asia”: https://ethnomusicologyreview.ucla.edu/content/review-longing-past-78-rpm-era-southeast-asia

Again, this is a set of rare gramophone recordings never before released on CD, but whenever I play tracks from this set, Spinitron automatically records them as coming from this knock-off multi-volume set: “Forgotten Sounds of Southeast Asia”, again on Count Records

  4. “Greek Rhapsody” on Dust-to-Digital https://www.dust-digital.com/greek/

a collection of Greek instrumental records from the 78rpm era

On several occasions when I have played selections from this release, they have been mis-identified by Spinitron as being from a derivative pirate release called “Greek Instrumental” on Count Records

Besides Doxy, Count, Round, Hellencic - there are numerous labels who are unscrupulously pirating re-issues produced by other labels - sometimes duplicating with sub-standard quality - and they are making their pirate releases available on all the streaming platforms.

I am concerned that by automatically listing them on Spinitron, Spinitron is doing a disservice to music lovers, record collectors, and the labels who put so much care into presenting these almost-forgotten recordings to a new audience. Unless I manually correct the Spinitron entries for my program, I am perpetuating this proliferation of bad information, and helping perpetuate the enrichment of pirates and bootleggers at the expense of labels like Smithsonian Folkways and Dust To Digital who do such valuable work preserving recorded history.

Is there anything Spinitron can do to adjust its algorithms so the sketchy overseas labels are not being prioritized over the highly respected American labels?

I asked @McSchmormac to post this here since it allows us to explain a bit about how automatic music recognition works and discuss what might be done to improve it.

Background

ACRCloud

The recognition service is provided by ACRCloud who have partnered with streaming service providers, digital music distributors and others to obtain fingerprints of well over 65 million songs. The fingerprint database is built automatically by ingesting music available in the digital music supply chain.

ACRCloud provides the technology (software), the infrastructure for stream monitoring and data processing, and the database of fingerprints.

When a station uses recognition in Spinitron, we set up ACRCloud to listen to the station’s stream and send recognition results to Spinitron’s servers, which put them into playlists.

Psycho-acoustic fingerprint recognition

Fingerprint technology uses models of how humans hear and recognize music. It works independently of the digital file format or analog recording medium, the stream encoding, trans-coding and all the rest. As ACRCloud puts it: “A robust acoustic fingerprint algorithm must take into account the perceptual characteristics of the audio. If two files sound alike to the human ear, their acoustic fingerprints should match, even if their binary representations are quite different.”
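To make the "sounds alike but different bytes" idea concrete, here is a toy sketch. It is not ACRCloud's algorithm, which is not described in this thread: the fingerprint below (one bit per frame, set when loudness rises) and the frame size are assumptions made purely for illustration. It shows the same audio at half the volume producing completely different binary data but an identical toy fingerprint.

```python
import hashlib
import math
import struct

FRAME = 400  # samples per frame (arbitrary choice for this sketch)


def toy_fingerprint(samples):
    """Return one bit per pair of adjacent frames: 1 if energy rises, else 0."""
    energies = [
        sum(x * x for x in samples[i:i + FRAME])
        for i in range(0, len(samples) - FRAME + 1, FRAME)
    ]
    return [1 if later > earlier else 0
            for earlier, later in zip(energies, energies[1:])]


def raw_digest(samples):
    """Hash the exact binary representation of the samples."""
    packed = struct.pack(f"{len(samples)}d", *samples)
    return hashlib.sha256(packed).hexdigest()


# One second of a 440 Hz tone with slowly pulsing loudness, sampled at 8 kHz.
original = [
    math.sin(2 * math.pi * 440 * t / 8000)
    * (0.2 + 0.8 * abs(math.sin(2 * math.pi * 3 * t / 8000)))
    for t in range(8000)
]
quieter = [0.5 * x for x in original]  # same audio at half the volume

print(raw_digest(original) == raw_digest(quieter))            # binary data differs
print(toy_fingerprint(original) == toy_fingerprint(quieter))  # toy fingerprint agrees
```

A real fingerprint has to survive far more than a volume change (lossy encoding, noise, pitch and tempo drift), which is why production systems rely on perceptual models rather than anything this simple.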

Recording versus release

It’s normal that a given recording appears on a number of releases. Some recordings have been released hundreds of times. Given how the fingerprint recognition works, there’s no way for the system to distinguish which release the DJ played the song from.

Implications

The above leads us to two key insights:

  1. Spinitron does not and cannot know which release you are playing from among those that include the recording

  2. If a recording is available in a commercial release via ACRCloud’s upstream data providers, it will be associated with that release in the database.

In practice this means that when the recording you play on the radio was released more than once then Spinitron’s choice of release is pretty much arbitrary. Every station and DJ using recognition has noticed this already.
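A minimal sketch of why the choice ends up arbitrary, under stated assumptions: the `Release` class, the `fp-001` identifier and the "take the first entry" rule are all hypothetical and only stand in for whatever the real database does. The point is that the fingerprint identifies the recording, so the release that gets reported depends on how the database happens to be populated, not on what the DJ played.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    title: str
    label: str


def recognize(fingerprint_id, releases_by_fingerprint):
    """Return a release for the matched recording, or None if unknown.

    Assumption for this sketch: the first release on file wins.
    """
    releases = releases_by_fingerprint.get(fingerprint_id, [])
    return releases[0] if releases else None


folkways = Release("Anthology of American Folk Music", "Smithsonian Folkways")
doxy = Release("Harry Smith's Complete Anthology of American Folk Music", "Doxy")

# Same recording, same fingerprint; only the order of ingestion differs.
db_a = {"fp-001": [folkways, doxy]}
db_b = {"fp-001": [doxy, folkways]}

print(recognize("fp-001", db_a).label)
print(recognize("fp-001", db_b).label)
```

The audio the DJ played is identical in both cases, so nothing in the input can break the tie.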

Some DJs correct the release when needed while others let it stand. Some DJs want to log the exact release they played while others don’t mind so much, feeling that the artist and recording matter more. All are reasonable positions that Spinitron wants to support.

The ethics of release choice

@McSchmormac and colleagues bring up a very important point. They have shown that there is a legitimate ethical dimension to the choice of release. I would far prefer to see Smithsonian Folkways and Dust To Digital get the credit than other labels pointed out by the OP. (Some years ago, Spinitron worked with Smithsonian Folkways to get their catalog into the auto-complete database since their catalog wasn’t in the commercial upstream feed we were using.)

I’d like to do something about this but the approach to take isn’t obvious. We need to discuss the possibilities and maybe do some research to figure it out.

What to do?

Seems to me there’s a couple of approaches and we should investigate both.

  1. Influence the upstream data and hence what’s in ACRCloud’s fingerprint database

  2. Modify Spinitron’s software to influence the outcomes.

Is Smithsonian Folkways’ catalog (or others of interest) in ACRCloud’s fingerprint database?

If not then this needs to be fixed or there’s not much Spinitron can practically do. Let’s find out if the catalog is in the digital supply chain.
- If it is in Spotify, Deezer, Apple Music then yes, otherwise quite possibly not.
- Ask the labels in question what distribution channels they use.

If, as I suspect, Smithsonian Folkways is not distributed globally through the mainstream commercial supply chains then they could either work on that or they could work with ACRCloud to index their recordings.

Can the shady labels be removed from ACRCloud?

If what the shady labels are doing actually violates copyright law, i.e. they are pirate labels, then there’s a case to ask ACRCloud to remove their catalogs. If the catalogs we want are in and those of the pirates are purged then the problem will be solved. This is I think the best solution, if it can be accomplished. Otoh it’s not clear to me that what Doxy does is actually illegal, as opposed to unethical, and Spinitron is not going to get involved in arbitrating the difference.

What can Spinitron do?

If the catalogs of interest are in ACRCloud’s fingerprint database but are not getting into Spinitron playlists then we can do something in our software. There are a few things we could do such as more elaborate logging so we can see the raw data we get from ACRCloud or introducing user-configurable filters. But I don’t want to spend development effort on that until the details of the current situation are more clear.
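To give a feel for what a user-configurable filter might look like, here is a rough sketch. It assumes something Spinitron may not currently have: a list of all releases on which the recognized recording appears. The function name, the `album`/`label` keys and the preferred/blocked lists are hypothetical, not part of Spinitron or ACRCloud.

```python
def choose_release(candidates, preferred_labels=(), blocked_labels=()):
    """Pick a release for a recognized recording.

    candidates: list of dicts with 'album' and 'label' keys (hypothetical shape).
    Blocked labels are dropped, preferred labels win in the order listed,
    and otherwise the original order of the candidates is kept.
    """
    def norm(text):
        return text.strip().casefold()

    preferred = [norm(label) for label in preferred_labels]
    blocked = {norm(label) for label in blocked_labels}

    allowed = [c for c in candidates if norm(c["label"]) not in blocked]
    if not allowed:
        return None  # leave the release blank for the DJ to fill in

    def rank(candidate):
        label = norm(candidate["label"])
        return preferred.index(label) if label in preferred else len(preferred)

    return min(allowed, key=rank)  # min() keeps the first of equally ranked items


candidates = [
    {"album": "Harry Smith's Complete Anthology of American Folk Music", "label": "Doxy"},
    {"album": "Anthology of American Folk Music", "label": "Smithsonian Folkways"},
]
choice = choose_release(candidates, preferred_labels=["Smithsonian Folkways"])
print(choice["label"])
```

A station or DJ could maintain such lists themselves, which would keep Spinitron out of the business of deciding which labels are legitimate.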

Next steps

There are some things that people in the community could help with:

  • Are the catalogs of the labels we’re interested in available in the digital music supply chain? We can look to see (as mentioned above) or we can ask the labels directly.

  • If not, find out if there’s something preventing it.

  • If so, find out if the labels can work with ACRCloud so they will ingest and index their catalogs.

  • Spinitron can look into adding some server-side logging of ACRCloud recognition data so we can find out what was in specific callbacks of interest.
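The logging idea in the last item could be as simple as the sketch below. It is only an illustration: the function names, the record layout and the sample payload are made up, and the real ACRCloud callback format is not shown in this thread. Each raw callback is appended as one JSON line so it can be searched later.

```python
import io
import json
from datetime import datetime, timezone


def log_callback(station, payload, sink, now=None):
    """Append one recognition callback to `sink` as a single JSON line."""
    record = {
        "received_at": (now or datetime.now(timezone.utc)).isoformat(),
        "station": station,
        "payload": payload,  # stored untouched so nothing is lost
    }
    sink.write(json.dumps(record, sort_keys=True) + "\n")


def find_callbacks(lines, station):
    """Return the logged records for one station."""
    records = (json.loads(line) for line in lines if line.strip())
    return [r for r in records if r["station"] == station]


# Usage with an in-memory sink; a real server would write to a file or table.
sink = io.StringIO()
log_callback("KPOO", {"example_field": "raw data as received"}, sink)
log_callback("OTHER", {"example_field": "something else"}, sink)

for record in find_callbacks(sink.getvalue().splitlines(), "KPOO"):
    print(record["received_at"], record["payload"])
```

Keeping the payload unmodified matters here, since the open question is what the upstream data actually contained for a given spin.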

Thanks for your reply Tom.

Seeing as it looks like I’ll be posting here a lot, here are a few details about my programming:
For over a decade I have been focussing my radio programming around historic/vintage/archive recordings from the gramophone 78RPM era - covering a lot of different styles and periods up to and including 1950, I generally don’t go past 1950.
I have been presenting 2-3 hour weekly programs around this format on KPOO 89.5fm for 5+ years. Here are my past Spinitron log entries: https://spinitron.com/KPOO/dj/73454/McSchmormac

Part of my personal mission is to highlight the latest quality reissues of recordings from this era - and help support the people who are doing such valuable work in preserving and disseminating the recordings from this period, often for little or no financial gain, some of whom I consider to be friends of mine. The work they do is like manna from the heavens for me, it forms the lifeblood of my programming - and it pains me to see their works plagiarized by others who are misappropriating their work to make streaming revenue at others’ expense.

I transitioned to V.2 with the automated playlists in the past few months.

I want to reiterate that my problem is not that the Spinitron automated playlist mis-identifies the recording/release - it is that I am seeing a consistent bias in the mis-identification, favoring releases that are effectively pirated versions of those I am playing, which enables and empowers these Pirates at the expense of the curators of the reissues the Pirates are drawing from.
The ethical dimension is a very real thing for me, because if I don’t manually correct these incorrect listings, I myself become an accomplice to these Pirates, and I cannot and will not allow this. This “matters” to me very much, so much so, that I have decided going forward I will post here on a regular basis, listing those incorrect recording/release identifications that most cause me pain - and some explanation as to why this is the case.

Before I do this I just want to confirm that the Folk Anthology on Folkways is definitely in Spinitron’s system. I am not so sure about all of the Dust-To-Digital releases; Opika Pende definitely is, in its entirety. But as mentioned, every time I play something from the Smithsonian Folkways set it is listed as being from the DOXY set, and this does not seem arbitrary or random to me. So far the only time V.2 generated a listing for the Folkways set was when I played something from a completely different release that, ironically, was neither Smithsonian Folkways nor DOXY - it was a collection of 78rpm Mississippi String Band recordings on the highly-respected U.S.-based COUNTY label.
Here are some of this week’s:

For a bit of background - here is a story - once upon a time a guy in Greece decided that he would gather all the reissues in the world of 78rpm Greek music, from U.S. & U.K. reissue labels - and copy them and market them as his own productions, having achieved that he decided to do the same with reissues of everything Cuban and Argentinian - systematically duplicating the entire catalogues of Spanish-based labels El Bandoneon & Tumbao Cuban Classics, and he didn’t stop there, he created multiple labels copying not just every type of ethnic music, but branched into all sorts of American vernacular music and a wide range of popular musics. To protect his identity I will refer to him as “Greek Dick”.

This week’s program featured 2 tracks from Dust To Digital’s Opika Pende - 4CD of African 78s - mentioned in my previous post - these were incorrectly identified as being from two different releases on two different labels, both believed to be operated by Greek Dick - see image “Ethnic Music Classics on 78rpm - West Africa” on MUSICAL ARK, and “Vintage African Music” on Jober Entertainment. The recordings on Opika Pende, when released in 2011, were made available for the first time in digital format, from a private collection of rare original discs. It is very unlikely that any release containing large swathes of the selections on Opika Pende was sourced anywhere other than from Opika Pende itself.

Here is an interview with esteemed ethno-musicologist Dick Spottswood in which he mentions a series of releases he curated for the U.S. Rounder label, when he had access to the Library of Congress 78rpm archives:
http://soundamerican.org/sa_archive/sa4/sa4-ian-nagoski-interviews-dick-spottswood.html

These have been fair game for Mr. Greek Dick. Three selections on my program came from the Spottswood sets on Rounder, but have been identified as Greek Dick’s knock-offs:

Klezmer Pioneers on Rounder : https://www.discogs.com/Various-Klezmer-Pioneers-European-And-American-Recordings-1905-1952/release/7123007
repackaged as “Klezmorim (Early Klezmer Recordings)” on MUSICAL ARK (again!)

Great Voices of Constantinople on Rounder:


repackaged as Athaneiko Romeiko on Early Licence Group LLC

Calypso Pioneers on Rounder:


repackaged as Calypso Pioneers on BLACK ROUND RECORDS

There are three listings attributed to VINTAGE MASTERS INC - which, at the risk of sounding paranoid, seems to be using my playlists to source more recordings to pirate. I don’t know for sure if Greek Dick is behind this operation, but it’s consistent with his others.

That’s it for this week.


@tom just curious, when ACRCloud matches to a fingerprint, does it return a list of viable candidate matches? Or is that decision made on their end, before it ends up at Spinitron? If the former, a modal that allows the user to select the release to attribute to could be a solution here.


It does return a list of candidates but it wouldn’t help for this problem. It returns candidates that are different recordings, i.e. completely different pieces of music with fingerprints that match statistically to some degree. We always choose the one with the highest score, i.e. the best match. What you need, iiuc, is to select from all the albums that the correct recording appears on.
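A small sketch of the selection described above, with invented field names and placeholder data (the actual callback fields are not shown in this thread). Each candidate is a different recording with its own score, so picking among them chooses a piece of music, not an album.

```python
def best_match(candidates):
    """Return the candidate with the highest score, or None if there are none."""
    if not candidates:
        return None
    return max(candidates, key=lambda candidate: candidate["score"])


# Hypothetical candidates: different recordings that each matched to some degree.
candidates = [
    {"title": "Recording A", "album": "Album X", "score": 97},
    {"title": "Recording B", "album": "Album Y", "score": 41},
    {"title": "Recording C", "album": "Album Z", "score": 18},
]

match = best_match(candidates)
print(match["title"], "on", match["album"])
```

Offering this list in a modal would let a DJ swap Recording A for Recording B, which is rarely what they want. Choosing a release would need a separate lookup of every album that includes Recording A, and that data is not part of the candidate list.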