
Programming an Android DAW: Storage

    One of the projects I was interested in creating was a variation of a simple DAW (Digital Audio Workstation) for Android. Of course, DAWs already exist for Android (mostly desktop DAWs written in C++ and ported to Android), and there's the ten millisecond problem on Android. But neither of these deterred me from attempting such a feat. However, my lack of audio engineering knowledge and the limited resources on the subject are certainly slowing me down.

    Android provides basic classes for implementing audio recording and playback, but they are limited and inefficient for many more complex use cases. For instance, a complete cycle from record to playback on most devices takes hundreds of milliseconds to complete, way above the twenty millisecond maximum before our ears can perceive the desynchronization. There are a couple of (not well documented) libraries that attempt to fix this and other problems, such as SuperPowered (a C++ library that you need to bridge to Android using JNI), the Professional Audio SDK (which only works on Samsung devices, yet does seem to bring latency below twenty milliseconds), and TarsosDSP (a Java library with little documentation and no real time support). This problem can be traced back to the fact that the Android operating system runs on so many devices, making it hard to create a unified system since each device manufacturer uses different hardware and processor architectures. So, the solution to this specific issue depends on Android and the device manufacturers, leaving it out of the hands of developers. Therefore, true real time audio applications cannot be created for the Android system yet.

    Okay, so no real time audio manipulation, such as applying effects to audio as it is being recorded (ex: plugging in a guitar and running it through a virtual effect pedal) or instantly playing back the audio just captured (ex: playing a note and having that sound come out of a monitoring speaker without delay). But what about non-real time audio manipulation? Like a simple recorder? Surely we can record audio with the Android platform and play the recorded audio back once we are finished. That is possible, since no real time work is being done. And since a DAW is essentially a recorder with multi-track support, that should be possible too (without any real time effects).

    Well, now that I've reasoned that it is possible to create an Android DAW, where do I begin? Since DAWs have been around for quite some time now, I figured it would be easy to find some stepping stones explaining how they are programmed. I thought wrong. :( There seem to be few resources on the subject (or so my extensive and desperate Googling attempts have yielded). With no starting point or guidance, I had to create my own. This can be both good and bad. Good: I get to let my creativity shine. Bad: I can create an increasingly complex, hard to work with system that naively ignores important standards. Unfortunately, I have to hope for the best, as there really isn't any other option. So, I began breaking the project down into smaller tasks and outlining some possible scenarios to look out for. It gets pretty complex, and I'm still unsure how to go about much of it, but I did decide on a starting point: storage.


    When I record a piece of audio, I need to store that audio in a file somewhere so it can be accessed later. If there's multi-track support (which there should be), there needs to be a place to store all the related audio files. These tracks often start and stop at different times throughout the course of a song, so I need to store that timing information as well. Sometimes tracks can be split into "smaller tracks", or clips; once split, these clips reference different audio files, so each clip needs to be stored along with information about it. Each track, clip, and the song itself can have related metadata that needs to be stored (artists, file types, bit rate, channels, comments, etc.). And any extras, such as effects and MIDI, also need to be stored.

    Alright, so this got even more complex real fast. How do I structure this storage? There must be some kind of detailed (perhaps standardized) file structure that I can just use... No. No, there's not. Once again, a headache's worth of Googling returned nothing useful (perhaps I'm spoiled to assume the majority of the boilerplate work will be done for me). The problem is that every DAW manufacturer uses a proprietary format and folder structure. The closest thing to an open standard is OMF(I), but there are absolutely no examples or explanations of its file and folder structures (not to mention that it seems no one actually uses it). There are file formats like BWF, but those don't help with organizing a project's files. So, with no clear path ahead of me, I must make my own. And that's where I am now: creating a project that creates and defines a folder structure for my DAW project. It will be an open source project hosted on GitHub, which will hopefully be beneficial to anyone with a similar objective.
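As a first cut, each project could live in its own root folder with a subfolder for the audio files and a single metadata database next to them. A minimal sketch in plain Java; the folder and file names here are my own invention, not any standard:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of a hypothetical per-project folder layout:
//   <root>/<projectName>/audio/      : holds the recorded audio files
//   <root>/<projectName>/puzzle.db   : metadata database (e.g. SQLite)
public class ProjectLayout {

    public static Path create(Path root, String projectName) throws IOException {
        Path project = root.resolve(projectName);
        // Creates <root>/<projectName>/audio and all missing parents.
        Files.createDirectories(project.resolve("audio"));
        // An empty placeholder for the metadata database.
        Files.createFile(project.resolve("puzzle.db"));
        return project;
    }
}
```

The important property is that everything belonging to a project is consolidated under one folder, so the whole project can be copied, archived, or deleted as a unit.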


    Before I can actually start hacking away at code, or even outlining a system, I need to define the terms I will be using. (The following definitions are in my own words and relate to this specific topic.)

  • Digital Audio Workstation (DAW): A piece of software used to create, record, alter, and edit audio and music. 
  • Song: A completed piece of music created by an artist that can be distributed and played.
  • Project / Session: A single piece of work and all its parts that comprise it. The top of the software workstation object hierarchy.
  • File: A place that can store information. Many different types of information can be stored each in many different ways.
  • Track: A piece of a project usually referencing a single category of recording (ex: guitar track). Each track consists of at least one recorded file. 
  • Clip: A piece of a track, often created by splitting a single track into multiple parts to edit, remove, or alter it. Each clip references its own recorded file. A track consists of at least one clip and references its recorded files through its clips.
  • Consolidate: To bring multiple pieces together. All the clips in a track can be consolidated into a single track object by merging the clips' audio files into one file. Consolidation can also mean storing all related information and files together in a folder.

    There seem to be at least three objects that need to be created: Project, Track, and Clip. Metadata can either be defined as a separate object or, since each object has its own metadata, distributed as fields and properties within each object. The objects themselves can be stored in a database (SQLite on Android), and the files can be stored in a folder associated with the Project. There should at least be an option to expose the data through a ContentProvider (queried via a ContentResolver). Here's a visualization of the objects:
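As a rough plain-Java sketch, before any ORM annotations get involved, the hierarchy might look like this (field names are illustrative, not final):

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of the three core objects and their one-to-many links.
class Clip {
    String name;
    String fileLocation;   // path to this clip's audio file
    long startPositionMs;  // where the clip starts within the track
    long lengthMs;
}

class Track {
    String name;
    int sampleRate;
    int channels;
    final List<Clip> clips = new ArrayList<>();   // at least one clip
}

class Project {
    String name;
    String folderLocation; // root folder holding all the audio files
    final List<Track> tracks = new ArrayList<>();
}
```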



    To remove some of the boilerplate code (SQL queries and updates), I should use an Android ORM (Object Relational Mapping) library. There are many choices: greenDAO, DBFlow, Sugar ORM, ORMLite, and many more. I kind of like the way ORMLite models its entities, because it reminds me of JPA, but it seems too verbose for accessing and manipulating the entities. Sugar ORM seems very simple to use, which is desirable, but I don't like the fact that you have to extend their base object (what if you have a complex polymorphic inheritance hierarchy?). greenDAO and DBFlow boast speed as their advantage, but both seem somewhat pointlessly complex and verbose (ex: with DBFlow, you have to spawn a new thread in order to perform your transactions, something I was hoping would be handled by the library). Still, I must make a decision, so I think I'm going to use Sugar ORM because of its simplicity (no wasting time learning a complex library just so I can make my own).

    So, let's have a look at the relationships between the objects. A Project consists of multiple Tracks, and a Track belongs to only one Project, making it a unidirectional one-to-many relationship. A Track can consist of many Clips, and a Clip belongs to only one Track, also a unidirectional one-to-many relationship.

Project > Track > Clip
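That chain of unidirectional one-to-many relationships maps naturally onto foreign keys: the "many" side stores the id of its parent. A small in-memory sketch of the idea (the ORM would generate the equivalent SQL under the hood):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Sketch: in a relational store, the child row carries its parent's id,
// so a Project's Tracks are found by filtering on projectId. This is the
// in-memory equivalent of: SELECT * FROM track WHERE project_id = ?
class TrackRow {
    final long id;
    final long projectId; // foreign key to the owning Project
    TrackRow(long id, long projectId) { this.id = id; this.projectId = projectId; }
}

class TrackTable {
    final List<TrackRow> rows = new ArrayList<>();

    List<TrackRow> findByProject(long projectId) {
        return rows.stream()
                   .filter(r -> r.projectId == projectId)
                   .collect(Collectors.toList());
    }
}
```

The same pattern repeats one level down: a Clip row would carry a `trackId`.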


    Each object stores information about itself (ex: title, artist, channels, etc.). Luckily, I've found some resources that discuss what metadata to store and how to store it: iXML and articles on archiving multitrack recordings. iXML is a standard adopted by most DAW vendors and music providers; it embeds metadata as XML within a BWF file. I'm storing the information in entity objects that get mapped to a database, but I can still use the standard as a basis for what to store. Another question that comes to mind is how do I store effect information? And what if I leave out a feature but want to add it later? For now, I think I'll leave out effects to make this easier, and future iterations of the library will just have to be backwards compatible. So, I'll map out what I should store with each object.

Project:

  • Database ID
  • Project Name
  • Description
  • Length
  • Folder Location
  • Artists / Contributors
  • Location
  • Created Date
  • Comments / Notes
  • Pictures

Track:

  • Database ID
  • Track Name
  • Description
  • Created Date
  • Last Updated Date
  • Comments / Notes
  • Artists / Contributors 
  • Track Position
  • Is Muted
  • Sample Rate
  • Channels
  • Sample Format
  • Channel Index
  • Interleave Index
  • File Format

Clip:

  • Database ID
  • Clip Name
  • Description
  • File Location
  • Start Position
  • Length

     That looks like a good starting point. It does, however, introduce a somewhat more complex structure. For instance, in order to keep track of the Artists and Contributors, I need another object, Artist. This Artist object can contain a user ID field that can be application specific. Since Artist objects are referenced from both the Project and the Track objects, when a new Track is added or a Track is altered, the Project must be updated with the appropriate information. Also, all sample rate and channel information is specified on the Track object, which all its Clips must adhere to. This should be fine: since Clips belong to Tracks, you must first have the Track object in order to retrieve a Clip, so that information is still accessible. Finally, it might be convenient to add a Comment or Note object rather than just using a String; this way we can associate a particular user (Artist) with the comment or note. So, including our new objects, we should have something like the following:
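The two supporting objects might be sketched like this (fields are illustrative):

```java
// Sketch of the supporting objects: Artist carries an application-specific
// user id, and Comment ties a note to the Artist who wrote it.
class Artist {
    String userId; // application-specific, supplied by the app using the library
    String name;
}

class Comment {
    Artist author; // the user who wrote the note
    String text;
    long createdDate;
}
```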


    Puzzle is what I'm choosing to call this project, because it feels like I'm putting together the pieces of a puzzle, and because I'm consolidating numerous pieces into one whole. The Puzzle object will be the central part of the library. I want to follow a design pattern similar to that of the Picasso library (seemingly a mix of the Singleton and Builder design patterns). So, actually using the library would look something like this:

 Object > Action > Target
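With a Picasso-like design, that Object > Action > Target flow could read as a singleton entry point followed by a builder chain. A sketch with hypothetical method names (none of these are final API):

```java
// Hypothetical fluent API in the Picasso style:
//   Puzzle.get().create("MySong").into(target);
public class Puzzle {

    public interface Target {
        void onReady(String projectName);
    }

    private static Puzzle instance;

    private Puzzle() {}

    // Object: the singleton entry point.
    public static synchronized Puzzle get() {
        if (instance == null) instance = new Puzzle();
        return instance;
    }

    // Action: a builder step describing what to do.
    public ProjectRequest create(String name) {
        return new ProjectRequest(name);
    }

    public static class ProjectRequest {
        private final String name;
        ProjectRequest(String name) { this.name = name; }

        // Target: the terminal step that delivers the result.
        public void into(Target target) {
            target.onReady(name);
        }
    }
}
```

A real implementation would do its database and file work inside `into(...)`, off the main thread, before invoking the callback.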


    Most of the operations will be CRUD (create, read, update, delete) operations on the Project object, cascading through all its descendant objects. For example, the Puzzle object should provide a way to create a new Project, retrieve an already created Project, update a Project, and delete a Project. When adding new files, it should properly move them to the appropriate folder location. And it should be able to merge multiple Clip files into a single file for a Track, as well as split a single file into numerous files. As mentioned before, I also want a simple way (even for other apps) to query the files, so I'll need a ContentProvider and possibly a ContentResolver. It would also be convenient to have a way to send the files and information to a server, as well as a way to create a master track. A lot of these operations are focused on the central purpose of the library (storing and accessing files for a DAW), while others are more loosely related (consolidating multiple Clips into a single Track, splitting a Track into multiple Clips, and creating a master track out of the files).
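The merge operation, at least for raw PCM clips that share a sample rate and format, is essentially concatenation. A simplified sketch (it ignores container headers like WAV's, which a real implementation would have to rewrite):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Sketch: consolidate clip files into one track file by appending each
// clip's bytes in clip order. Assumes raw PCM with identical sample
// rate/format; real containers (WAV/BWF) also need their headers rewritten.
public class Consolidator {

    public static Path merge(List<Path> clipFiles, Path trackFile) throws IOException {
        Files.write(trackFile, new byte[0]); // create/truncate the output file
        for (Path clip : clipFiles) {
            Files.write(trackFile, Files.readAllBytes(clip), StandardOpenOption.APPEND);
        }
        return trackFile;
    }
}
```

Splitting is the inverse: copy byte ranges of a track file out into new clip files at the requested positions.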


    You may have noticed that this blog post reads like a rant, and that's because it is! I was detailing my thoughts as I reasoned my way to a result. I needed a way to store all the information associated with a multi-track DAW project, and facing this problem, I crafted a simple and efficient solution. The solution ended up being a typical storage structure, one that you would see in any app. However, due to the limited resources, the lack of standardization, and my own premature optimization, it was no trivial endeavor. You can follow and contribute to the project on GitHub. Note that the features and structure of the library may change over time. Hopefully, this blog post will be a useful resource for anyone with a similar objective.

