RNA Transcription with Swift

This week I decided to have a little fun and break out of the traditional CS algorithms so I went onto exercism.io and grabbed an exercise to work with. The exercism that I chose is called RNA Transcription. I chose it namely because it has a near and dear connection to my background in biology (just in case you didn’t know before I was a developer I was a science teacher).

The Challenge…

So let’s start by going over the challenge that we are trying to solve with the algorithm that we develop. In cell biology when a cell wants to transcribe a gene sequence into a protein it has to transcribe the matching section of DNA into an RNA segment which is then used to build the protein like a set of blue prints. DNA if you need a refresher is a ladder like structure made up of rungs which are built out of nucleotides. There are 4 primary nucleotides which can occur, for our purposes we will use their abbreviations: A, T, G, and C. When a section of DNA goes through transcription it transposes each nucleotide into its compliment so…

  • T -> A
  • G -> C
  • C -> G
  • A -> U (in the case of A the compliment in RNA is actually a fifth nucleotide called Uracil)

If you want to learn more about DNA transcription check out this great article from the Kahn Academy!

So, in terms of code our problem can be stated something like this:

Given a DNA string transpose it and return the complimentary RNA string.

For example: “GCTCATTA” -> “CGAGUAAU”

To swift…

If you download the exercise from exercism.io you will be provided with a set of tests. To summarize these our code must meet the following conditions…

  • We must correctly transpose a DNA string of any length to the complimentary RNA string using the method Nucleotide.complementOfDNA()
  • If for some reason the provided DNA string should contain a nucleotide (a letter) that is not a part of n = A || T || G || C then we should throw TranscriptionError.invalidNucleotide(“n is not a valid Nucleotide”)

To class or to struct…

Based on the above we see that we are going to need an object called Nucleotide and it needs to provide a method .complementOfDNA(). We could do this with either a class or a struct, but really we don’t have a reference type here, a Nucleotide is a value type. All of our tests seem to want to access Nucleotide directly not via reference. Each Nucleotide has its own value, so we don’t really want to be creating referenced instances; this would be like Bob’s DNA causing changes to Charlie’s face, YIKES! So let’s start with a struct which we will initialize with our DNA string…

We will also need to keep track of valid letters, so let’s also include a param to contain these called validNucleotides. It should be noted that we cannot use the default mem-wize initializer here because the test cases require that we be able to initialize Nucleotide and omit the dna parameter name like this Nucleotide(“G”).

method to our madness…

Next we need to provide a method .complementOfDNA() which will do the actual work of transcribing. We can assume a couple of things based on the requirements our tests provide. First, our method will need to return a string, and second our method will need to throw an error. Using a red-green-refactor approach we can simply use a basic forEach block and loop over each letter in our DNA string, test its value, and return the compliment. We will need to store the sequence we produce and we can do that with a variable.

We will start by creating a variable to track each transposition, we will call it sequence. Next we will loop through our dna sequence using a basic forEach block. We need to make sure that each letter n that we evaluate is an A, T, G, or C. We can do this by checking if our validNucleotides string contains the n value, if it doesn’t we will throw an error (we will take a look at this in a moment). Next, we can use a basic if / else test to determine which letter to add to our sequence variable. When we’re done just return the sequence.

If we find that the letter we are trying to evaluate does meet our limitations we need to be able to throw an error in the form of n is not a valid Nucleotide. We can do this with a simple enum.

This is great! It works, it passes all of the tests, and it meets our requirements. But it’s kinda lumpy. First we have this long if / else which is a bit hard to read and probably can be optimized a bit further for performance.

switch it up…

If we have a pretty short DNA sequence it might not matter that much, but DNA sequences can be millions of nucleotides long! For short sequences the performance gain between switch and if / else might be negligible, but over millions of evaluations it could really make a difference! Check it out on GeeksForGeeks. Also the syntax of a switch will allow us to clean this code up a bit.

This refactor allows us to do a couple of great things. First, we get to clean up and improve the readability of this code. Second, we can leverage the default case in switch to handle when to throw our error, this means we can get rid of the validNucleotides parameter from our struct. Finally, while we were at it we might have noticed that injecting the whole string “n is not a valid Nucleotide” feels a little nasty, so why not refactor this a bit too?

Adding a private variable to our struct allows us to clean this up.

Win, win and win! This looks pretty good now, but it’s still kinda long. The whole method is 16 lines if we skip the blank ones. If we think about the problem we are trying to solve it’s actually rather simple, take some letters, swap them for some other letters, and return a string. 16 lines seems like a lot of work just to swap one string for another especially when Swift provides us with some great enumeration tools like map.

maping up the mess…

In Swift, like with many other languages, we have access to some pre-optimized tools to handle enumeration. These help us to clean up our code and set a nice standard of implementation. By leveraging map we can not only help to improve our code performance, but we can also clean things up and reduce code overhead!

With this change we are able to not only utilize the pre-optimization, but we also can take advantage of the fact that map returns a collection. This means we no longer have to keep track of our sequence, and can instead let map do its thing and then use joined() to convert our collection back into a string. Now we’re getting somewhere!

BUT, we still have that darn switch statement hanging around! And while it’s faster than if / else I still feel like there might be a better way to structure our code. Now understand, that we probably can’t get rid of the switch entirely, but what we might be able to do is clean up our method body in a way that is a bit more Swifty and much nicer to read.

enum’anyone…

Check that out! Now, I understand that yes we still have a switch statement, having simply pushed it off into an enum. However, what we have gained in this should be considered. First, our method body now reads in a very simple context

Look at our DNA values, if we can transcribe a nucleotide do it, if not, throw an error.

That’s it! It’s so simple, it’s only 7 lines of code, and it reads like a sentence. Second, our code is very organized, one of the things I love about Swift is how neat and tidy code can be when the tools it offers are leveraged correctly. We now have a clear definition of our Nucleotide struct, we have a clear public interface, and we have a clear private interface. Easy to read, easy to understand, and easy to maintain.

Wrapping up…

I had a lot of fun with this algorithm, I know it’s not a CS classic, but that’s part of what I loved about it. I don’t have a degree in CS, which is fine, I thrive in the land of implementation. But this doesn’t have to mean that I don’t understand how to think in terms of algorithms. As development continues to evolve I think that this is a very important distinction to make. Changing the way that we view and approach the knowledge around algorithms is a critical part of growing the craft.

Until next time…

Check out the code on GitHub.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s