Cloud Tracks

Save (your data from) Endomondo Month!


I hereby dub December 2020 “Save your data from Endomondo” month. Why?

Endomondo’s retiring from the game.

So, given this state of affairs, it would be wise to make sure your data on the Endomondo platform is exported somewhere. I made a request via their site to get all 789 of my workouts, and a few days later I got an archive that included this folder structure:

I wanted to do some analysis on my workout data, so I created a really simple ingestion tool that takes the data from the JSON documents in Workouts/ and inserts them into a SQL Server database.
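The core of that ingestion step can be sketched like this. The real tool targets SQL Server; sqlite3 stands in here so the sketch is self-contained, and the field names (`sport`, `duration_s`, `distance_km`) and the flat-object shape are my assumptions about the export, not confirmed from the repo.

```python
import json
import sqlite3
from pathlib import Path

def ingest_workouts(workouts_dir: str, conn: sqlite3.Connection) -> int:
    """Load every JSON workout document under workouts_dir into a table.

    Assumes each file holds one flat JSON object; sqlite3 is a local
    stand-in for the SQL Server target described in the post.
    """
    conn.execute(
        """CREATE TABLE IF NOT EXISTS workouts (
               source_file TEXT PRIMARY KEY,
               sport       TEXT,
               duration_s  REAL,
               distance_km REAL)"""
    )
    inserted = 0
    for path in sorted(Path(workouts_dir).glob("*.json")):
        doc = json.loads(path.read_text(encoding="utf-8"))
        # INSERT OR REPLACE keyed on the filename makes reruns idempotent.
        conn.execute(
            "INSERT OR REPLACE INTO workouts VALUES (?, ?, ?, ?)",
            (path.name, doc.get("sport"), doc.get("duration_s"),
             doc.get("distance_km")),
        )
        inserted += 1
    conn.commit()
    return inserted
```

Keying on the source filename means the tool can be re-run over the same archive without duplicating rows.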

The tool can be found in this repo.

The key thing about this tool is that I had to fiddle with Endomondo’s JSON output to get it to play nice with my approach to serialization:
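The exact fiddling isn't reproduced here, but one quirk of the export (an assumption on my part about the format) is that each workout arrives as a JSON array of single-key objects, which most serializers won't map onto a flat workout type directly. A minimal reshaping pass looks like:

```python
def flatten_workout(doc: list) -> dict:
    """Merge a list of single-key dicts into one flat dict.

    Assumes the Endomondo-style shape [{"sport": ...}, {"duration_s": ...}, ...].
    Nested values (e.g. a "points" array) are carried over untouched so a
    later pass can decide what to do with them.
    """
    flat = {}
    for entry in doc:
        if isinstance(entry, dict):
            flat.update(entry)
    return flat

# The array form on the left becomes a single flat object:
doc = [{"sport": "RUNNING"}, {"calories_kcal": 350}]
print(flatten_workout(doc))  # {'sport': 'RUNNING', 'calories_kcal': 350}
```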

I’m not super-proud of it, because it can be finicky, but it got the job done for my purposes. I deliberately skipped pulling in the available lat-lon data from the runs, because I wasn’t interested in it for the moment, but a slight modification to the approach I’ve taken would accommodate it.

So, I’m glad the data is ingestible now, and I hope to do some cool stuff with it soon.


Back to the Sky: Processing Satellite Data Using Cloud Computing

From time to time, I work with researchers on projects outside of the day-to-day, forms-over-databases stuff. Mind you, I like a nice form over a cool data repository as much as the next guy, but it’s definitely cool to stretch your arms and do something more.

So, when Dr. Ogundipe approached me about cloudifying her satellite data processing, I had to do a lot of research. She had a processing pipeline that featured ingesting satellite images, doing some data cleanup, then analyzing those results using machine learning. Everything ran on local machines in her lab, and she knew she would run into scaling problems.

Early on, I decided to use a containerized approach to some of the scripting that was performed. The Python scripts were meant to run in Windows, but I had an easier go at the time getting Linux containers up and running, so I went with that. Once the container was in good order, I stored the image in the Azure Container Registry and then fired it up using an Azure Container Instance.
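The registry-and-instance flow boils down to a couple of Azure CLI commands. The resource names below (`rg-satellite`, `acrsatdemo`, `sat-pipeline`) are hypothetical placeholders, not the project's real ones:

```shell
# Build the Linux image directly in Azure Container Registry
# (no local Docker daemon required).
az acr build --registry acrsatdemo --image sat-pipeline:v1 .

# Fire the image up as a one-off Azure Container Instance.
# (A private registry also needs credentials, omitted here.)
az container create \
  --resource-group rg-satellite \
  --name sat-pipeline-job \
  --image acrsatdemo.azurecr.io/sat-pipeline:v1 \
  --cpu 2 --memory 4 \
  --restart-policy Never
```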

Like a good story, I had started in the middle – with the data processing. I didn’t know how I would actually get non-test data into the container. Eventually, I settled on using Azure Files. Dr. Ogundipe would upload the satellite images via a network drive mapped to storage in the cloud. Since I got to have some fun with the fluent SDK in Azure a while back, I used it to build an orchestrator of sorts.
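The orchestrator's control flow can be sketched locally like this. In the real pipeline the input directory is the Azure Files share mounted as a network drive and the processing step launches the container; here both are plain parameters so the loop is visible. This is my own stand-in, not the actual fluent-SDK code:

```python
from pathlib import Path
from typing import Callable

def orchestrate(inbox: str, done: str, process: Callable[[Path], None]) -> list:
    """Hand every satellite image waiting in `inbox` to `process`, then archive it.

    `inbox` stands in for the mounted Azure Files share; `process` stands in
    for kicking off the container instance on one image.
    """
    inbox_dir, done_dir = Path(inbox), Path(done)
    done_dir.mkdir(parents=True, exist_ok=True)
    handled = []
    for image in sorted(inbox_dir.glob("*.tif")):
        process(image)                       # e.g. start the container on this file
        image.rename(done_dir / image.name)  # archive so reruns skip it
        handled.append(image.name)
    return handled
```

Moving each image into an archive folder after processing keeps the loop idempotent: a rerun only picks up whatever Dr. Ogundipe has uploaded since the last pass.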

Once the orchestrator ran, it fed the satellite images into the container. Output from the container was used to run models stored in Azure ML. Instead of detailing all the steps, this helpful diagram explains the process well:

Super simple.

No, not that diagram.

The various cloud resources used to process satellite data in Azure.

So, I shared some of this process at a seminar Dr. Ogundipe held to talk about the work she does, and how her company, Global Geo-Intelligence Solutions Ltd, uses a pipeline like this to detect locust movement in Kenya, assess the impact of natural disasters, and support a host of other applications of the data available from satellite images.