* feat(spotify-api): bulk endpoints
* feat(music-library): allow bulk operations
* feat(spotify): bulk track+album+artist+genre import
* feat(spotify): use bulk import api for user crawl
* feat(spotify): bulk listen insert
For the benchmark case of a new user where Listory imports 50 new listens along with all new tracks, artists, albums & genres, the bulk operations significantly reduce the work performed:
Spotify API Requests: 208 => 8
DB Inserts: 96 => 8
Tracing Spans: 1953 => 66
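For context, a minimal sketch of the bulk shape on both sides, assuming a TypeORM-style repository and Node's built-in fetch; the entity and helper names are illustrative, not Listory's actual code. Spotify's /v1/tracks endpoint accepts up to 50 IDs per request, which is what enables the 208 => 8 reduction.

```typescript
// Sketch only: illustrates the per-item vs. bulk shape, not Listory's code.
import { Repository } from "typeorm";

interface Track {
  spotifyId: string;
  name: string;
}

// Spotify's "Get Several Tracks" endpoint accepts up to 50 IDs per request;
// a real caller would chunk larger ID lists, the sketch just truncates.
async function fetchTracksBulk(ids: string[], token: string) {
  const res = await fetch(
    `https://api.spotify.com/v1/tracks?ids=${ids.slice(0, 50).join(",")}`,
    { headers: { Authorization: `Bearer ${token}` } },
  );
  return (await res.json()).tracks;
}

// One round-trip with all rows instead of one INSERT per track.
async function saveTracksBulk(repo: Repository<Track>, tracks: Track[]) {
  await repo.save(tracks); // TypeORM persists an array in batched inserts
}
```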
Previously the import only started once the normal polling loop picked the
user up, which could take up to a minute. Now we create an import job right
after the user logs in for the first time.
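A hypothetical sketch of the login hook; the queue shape and method names are assumptions, not Listory's actual API:

```typescript
// Sketch: enqueue an import immediately on first login instead of waiting
// for the next scheduled crawl. Names here are illustrative assumptions.
class AuthService {
  constructor(
    private readonly importQueue: { add(job: { userId: string }): Promise<void> },
  ) {}

  async onLogin(userId: string, isFirstLogin: boolean): Promise<void> {
    if (isFirstLogin) {
      // The regular polling loop would otherwise pick this user up only on
      // its next tick, up to a minute later.
      await this.importQueue.add({ userId });
    }
  }
}
```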
To save on Spotify API requests we have two classes of polling intervals
(see the sketch below):
- all users are polled at least every 10 minutes; this is a safe interval,
and no listens will ever be missed
- if a user listened to a song within the last 60 minutes, we poll every
minute to ensure that the UI shows new listens immediately
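A minimal sketch of the interval selection, assuming a per-user lastListenAt timestamp; the constants mirror the values above:

```typescript
// Sketch of the two polling classes; lastListenAt is an assumed field.
const SLOW_INTERVAL_MS = 10 * 60 * 1000; // every user, at least every 10 min
const FAST_INTERVAL_MS = 60 * 1000;      // recently active users, every minute
const ACTIVE_WINDOW_MS = 60 * 60 * 1000; // "listened within the last 60 min"

function pollingIntervalMs(lastListenAt: Date | null, now = new Date()): number {
  const recentlyActive =
    lastListenAt !== null &&
    now.getTime() - lastListenAt.getTime() < ACTIVE_WINDOW_MS;
  return recentlyActive ? FAST_INTERVAL_MS : SLOW_INTERVAL_MS;
}
```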
The current algorithm is CPU-intensive and blocks the event loop for
multiple seconds in my deployment. This is not acceptable, as no other
requests can be answered during that time.
I do not have time to fully fix the issue here, but I did implement an
optimization for ALL_TIME reports:
Before, the ALL_TIME report was generated for every timeFrame since 1970,
which iterated over the listens many hundreds of times. We can instead start
the interval at the day of the user's first listen, and therefore skip 50+
years of calculations.
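A sketch of the shortcut, with illustrative names; the point is only that the interval now starts at the first listen instead of the Unix epoch:

```typescript
// Sketch: pick the report start date from the data instead of 1970-01-01.
interface Listen {
  playedAt: Date;
}

function reportIntervalStart(listens: Listen[]): Date {
  if (listens.length === 0) return new Date(); // nothing to report on
  // Before: new Date(0) => hundreds of empty timeFrames to iterate over.
  // After: the day of the earliest listen.
  const firstMs = Math.min(...listens.map((l) => l.playedAt.getTime()));
  const start = new Date(firstMs);
  start.setHours(0, 0, 0, 0); // align to the start of that (local) day
  return start;
}
```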
This should help with failing health checks while the crawler is running.
Quick math: 10 users × 30 songs each × at least 3 queries per song
=> 900 DB queries every minute.
With the default of 10 pool connections, this saturates the available DB
bandwidth for some time, causing a slow UI and failing health checks.
Listory could miss some listens when the Spotify API misbehaves:
sometimes the listens added to the recently-played endpoint
show up out of order.
Because of our optimization to only retrieve listens newer than
the lastRefreshTime, we would skip those out-of-order listens.
By always retrieving the maximum of 50 listens we can be reasonably
sure that we got all of them.
We also had to improve the handling of duplicate listens, as we now
see a lot of them, courtesy of removing the lastRefreshTime
optimization.
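A minimal sketch of the fetch-and-dedup approach; the (user, played_at) unique key is an assumption about the schema, but the recently-played endpoint and its 50-item limit are Spotify's documented behavior:

```typescript
// Sketch of always-fetch-50 + dedup; entity shape is an assumption.
interface RecentlyPlayedItem {
  track: { id: string };
  played_at: string; // ISO timestamp from Spotify
}

// Spotify's recently-played endpoint returns at most 50 items.
async function fetchRecentlyPlayed(token: string): Promise<RecentlyPlayedItem[]> {
  const res = await fetch(
    "https://api.spotify.com/v1/me/player/recently-played?limit=50",
    { headers: { Authorization: `Bearer ${token}` } },
  );
  return (await res.json()).items;
}

// Deduplicate against listens we already stored: a listen is uniquely
// identified per user by its played_at timestamp, so re-fetching the full
// 50-item window is safe even when items arrive out of order.
function newListens(
  items: RecentlyPlayedItem[],
  knownPlayedAt: Set<string>,
): RecentlyPlayedItem[] {
  return items.filter((item) => !knownPlayedAt.has(item.played_at));
}
```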