[Approaches]: Protein Translation (#352)
glaxxie authored Feb 15, 2024
1 parent 833da85 commit 561ebe0
Showing 6 changed files with 290 additions and 0 deletions.
28 changes: 28 additions & 0 deletions exercises/practice/protein-translation/.approaches/config.json
@@ -0,0 +1,28 @@
{
"introduction": {
"authors": [
"glaxxie"
],
"contributors": []
},
"approaches": [
{
"uuid": "11aa2b40-050c-433b-b3db-3e5d88e23826",
"slug": "regex-switch",
"title": "regex switch",
"blurb": "Approach using regex and switch statement",
"authors": [
"glaxxie"
]
},
{
"uuid": "05c3c31a-24cb-4696-9266-bc8b5ecae54e",
"slug": "substring-hashtable",
"title": "substring hashtable",
"blurb": "Approach using substring and hashtable",
"authors": [
"glaxxie"
]
}
]
}
86 changes: 86 additions & 0 deletions exercises/practice/protein-translation/.approaches/introduction.md
@@ -0,0 +1,86 @@
# Introduction

There are several idiomatic ways to solve the Protein Translation exercise.
The `Substring` method can be used in tandem with a `hashtable` to look up values.
Alternatively, `regex` can be combined with the flexibility of the `switch` statement to arrive at a neat solution.

## General guidance

The main objective of this exercise is to process the input string in chunks of three characters; each chunk (a codon) can be translated into a protein.
If a codon has a `STOP` value, the translation process halts.
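
As a minimal sketch of the chunking idea, the snippet below walks a sample strand three characters at a time. The strand value and the variable names are assumptions chosen for illustration; they are not part of the exercise.

```powershell
# Hypothetical sample strand made of four complete codons.
$strand = "AUGUUUUAAUGG"

# Step through the string three characters at a time and collect each codon.
$codons = for ($i = 0; $i -lt $strand.Length; $i += 3) {
    $strand.Substring($i, 3)
}

$codons -join ","   # AUG,UUU,UAA,UGG
```

Here the third codon (`UAA`) is a `STOP` codon, so a full translation would return only the proteins for `AUG` and `UUU`.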


## Approach: `substring` and `hashtable`

This approach uses the `Substring` method to extract a section of the input string, then uses that value to retrieve the result from a `hashtable`.

```powershell
Function ProteinTranslation() {
[CmdletBinding()]
Param(
[string]$Strand
)
if ($Strand.Length % 3) {Throw "Error: Invalid codon"}
$Proteins = @()
$codonsToProteins = @{
"AUG" = "Methionine"
"UUU" = "Phenylalanine"
"UUC" = "Phenylalanine"
"UUA" = "Leucine"
"UUG" = "Leucine"
"UCU" = "Serine"
"UCC" = "Serine"
"UCA" = "Serine"
"UCG" = "Serine"
"UAU" = "Tyrosine"
"UAC" = "Tyrosine"
"UGU" = "Cysteine"
"UGC" = "Cysteine"
"UGG" = "Tryptophan"
"UAA" = "STOP"
"UAG" = "STOP"
"UGA" = "STOP"
}
for ($i = 0; $i -lt $Strand.Length; $i+=3) {
$Protein = $codonsToProteins[$Strand.Substring($i, 3)]
if ("STOP" -eq $Protein) {break}
if ($null -eq $Protein) {Throw "error: Invalid codon"}
$Proteins += $Protein
}
$Proteins
}
```

For more information, check the [`substring` and `hashtable` approach][approach-substring-hashtable].


## Approach: `regex` and `switch` statement

This approach uses `regex` to split the input into codons, then uses a `switch` statement to get the result.

```powershell
function ProteinTranslation {
[CmdletBinding()]
Param(
[string]$Strand
)
$codons = $Strand -split "(\w{3})" -ne ""
switch -Regex ($codons) {
"AUG" { "Methionine" }
"UU[UC]" { "Phenylalanine" }
"UU[AG]" { "Leucine" }
"UC[UCAG]" { "Serine" }
"UA[UC]" { "Tyrosine" }
"UG[UC]" { "Cysteine" }
"UGG" { "Tryptophan" }
"(UAA|UAG|UGA)" { break }
Default {Throw "Error: Invalid codon"}
}
}
```

For more information, check the [`regex` and `switch` statement approach][approach-regex-switch].


[approach-regex-switch]: https://exercism.org/tracks/powershell/exercises/protein-translation/approaches/regex-switch
[approach-substring-hashtable]: https://exercism.org/tracks/powershell/exercises/protein-translation/approaches/substring-hashtable
@@ -0,0 +1,63 @@
# Using `regex` and `switch` statement

```powershell
function ProteinTranslation {
[CmdletBinding()]
Param(
[string]$Strand
)
$codons = $Strand -split "(\w{3})" -ne ""
switch -Regex ($codons) {
"AUG" { "Methionine" }
"UU[UC]" { "Phenylalanine" }
"UU[AG]" { "Leucine" }
"UC[UCAG]" { "Serine" }
"UA[UC]" { "Tyrosine" }
"UG[UC]" { "Cysteine" }
"UGG" { "Tryptophan" }
"(UAA|UAG|UGA)" { break }
Default {Throw "Error: Invalid codon"}
}
}
```

This approach uses `regex` and the `switch` statement to work with strings.

First, the string is split into an array of three-character strings.
When the string length is not divisible by 3, the last element is simply a string shorter than 3 characters, which matches none of the codon patterns and ends up in the `Default` branch.

```powershell
$codons = $Strand -split "(\w{3})" -ne ""
```
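
To see why the `-ne ""` filter is needed, here is a small sketch with a hypothetical seven-character strand. Because the pattern is wrapped in a capture group, `-split` keeps the matched codons in its output, but it also emits empty strings for the gaps between matches; filtering with `-ne ""` leaves only the codons and any leftover characters.

```powershell
# Hypothetical strand whose length is not divisible by 3.
$strand = "AUGUUUU"

# The capture group makes -split keep the delimiters (the codons themselves).
$parts = $strand -split "(\w{3})"

# Applied to an array, -ne acts as a filter and drops the empty strings.
$codons = $parts -ne ""

$codons -join ","   # AUG,UUU,U
```

The trailing `U` is the incomplete codon that later triggers the error.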

Next, we use the flexibility of the `switch` statement in PowerShell to translate these codons into the correct protein names.
We set the `-Regex` flag on the `switch` statement so it can match `regex` patterns of codons to their corresponding proteins.

```powershell
switch -Regex ($codons) {
"AUG" { "Methionine" }
"UU[UC]" { "Phenylalanine" }
"UU[AG]" { "Leucine" }
"UC[UCAG]" { "Serine" }
"UA[UC]" { "Tyrosine" }
"UG[UC]" { "Cysteine" }
"UGG" { "Tryptophan" }
```

If the codon matches any of the three terminating codons (`STOP` value), we simply `break` out of the `switch` statement and end the translation there.

```powershell
"(UAA|UAG|UGA)" { break }
```
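
The effect of `break` is easiest to see on a tiny input. In the sketch below (a reduced, hypothetical set of patterns, not the full solution), the `switch` receives three codons, but `break` stops processing as soon as the `STOP` codon is reached, so the codon after it is never translated.

```powershell
# Reduced illustration: only three patterns, fed with a hypothetical codon list.
$result = switch -Regex ("AUG", "UAA", "UGG") {
    "AUG" { "Methionine" }
    "UGG" { "Tryptophan" }
    "UAA" { break }   # leaves the switch, skipping the remaining elements
}

$result   # Methionine
```

Using `continue` instead of `break` would only skip to the next element, which is not what the exercise asks for.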

Anything else is an invalid codon and throws an error.

```powershell
Default {Throw "Error: Invalid codon"}
```

If no error is thrown, the proteins emitted by the `switch` statement are returned as the function's output.

[Regular expression.](https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_regular_expressions)

[Switch statement.](https://learn.microsoft.com/en-us/powershell/scripting/learn/deep-dives/everything-about-switch)
@@ -0,0 +1,8 @@
$codons = $Strand -split "(\w{3})" -ne ""
switch -Regex ($codons) {
"AUG" { "Methionine" }
"UU[UC]" { "Phenylalanine" }
"UU[AG]" { "Leucine" }
"UC[UCAG]" { "Serine" }
"UA[UC]" { "Tyrosine" }
"UG[UC]" { "Cysteine" }
@@ -0,0 +1,98 @@
# Using `substring` and `hashtable`

```powershell
Function ProteinTranslation() {
[CmdletBinding()]
Param(
[string]$Strand
)
if ($Strand.Length % 3) {Throw "Error: Invalid codon"}
$Proteins = @()
$codonsToProteins = @{
"AUG" = "Methionine"
"UUU" = "Phenylalanine"
"UUC" = "Phenylalanine"
"UUA" = "Leucine"
"UUG" = "Leucine"
"UCU" = "Serine"
"UCC" = "Serine"
"UCA" = "Serine"
"UCG" = "Serine"
"UAU" = "Tyrosine"
"UAC" = "Tyrosine"
"UGU" = "Cysteine"
"UGC" = "Cysteine"
"UGG" = "Tryptophan"
"UAA" = "STOP"
"UAG" = "STOP"
"UGA" = "STOP"
}
for ($i = 0; $i -lt $Strand.Length; $i+=3) {
$Protein = $codonsToProteins[$Strand.Substring($i, 3)]
if ("STOP" -eq $Protein) {break}
if ($null -eq $Protein) {Throw "error: Invalid codon"}
$Proteins += $Protein
}
$Proteins
}
```

This approach uses the `Substring` method to extract sections of a string, and a `hashtable` to translate the codons into proteins.

First, we check whether the string length is divisible by 3. If it isn't, we throw an error, because the strand must then contain an invalid codon: every codon has to be exactly three characters long.

```powershell
if ($Strand.Length % 3) {Throw "Error: Invalid codon"}
```
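
The condition works without an explicit comparison because PowerShell treats `0` as false and any non-zero number as true. A quick sketch with two hypothetical lengths:

```powershell
# 6 % 3 is 0 (treated as false); 7 % 3 is 1 (treated as true).
$checks = foreach ($length in 6, 7) {
    [bool]($length % 3)
}

$checks -join ","   # False,True
```

Writing `($Strand.Length % 3) -ne 0` would be an equivalent, more explicit form of the same check.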

Then we set up an empty array to collect all the proteins to be returned later, along with a `hashtable` with codons as keys and their protein names as values.

```powershell
$Proteins = @()
$codonsToProteins = @{
"AUG" = "Methionine"
"UUU" = "Phenylalanine"
"UUC" = "Phenylalanine"
"UUA" = "Leucine"
"UUG" = "Leucine"
"UCU" = "Serine"
"UCC" = "Serine"
"UCA" = "Serine"
"UCG" = "Serine"
"UAU" = "Tyrosine"
"UAC" = "Tyrosine"
"UGU" = "Cysteine"
"UGC" = "Cysteine"
"UGG" = "Tryptophan"
"UAA" = "STOP"
"UAG" = "STOP"
"UGA" = "STOP"
}
```

Next, we loop over the string indexes in steps of three, use each index to extract a substring as the codon, then use the codon as a key to retrieve the protein from the hashtable.

Normally, when the `Substring` method is called with an index or length that is out of range, it throws an error that we don't want.
However, the length check we did earlier eliminates that possibility.

```powershell
for ($i = 0; $i -lt $Strand.Length; $i+=3) {
$Protein = $codonsToProteins[$Strand.Substring($i, 3)]
```
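
As a small illustration of the failure that the length check prevents, the sketch below asks for three characters starting at index 3 of a hypothetical four-character string. Only one character is left at that position, so `Substring` throws.

```powershell
$threw = $false
try {
    # "AUGU" has only one character left at index 3, so a length of 3 is out of range.
    "AUGU".Substring(3, 3) | Out-Null
} catch {
    $threw = $true
}

$threw   # True
```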

After we get a value from the hashtable, we need to check it.
If the value is `STOP` (the codon is one of the three terminating codons), we simply break out of the loop and stop the translation process.
If the codon is an invalid one that doesn't exist in the hashtable (the lookup returns `$null`), we throw an error.
Otherwise, we add the protein to the proteins array.
When the loop has stopped, we simply return the proteins array.

```powershell
if ("STOP" -eq $Protein) {break}
if ($null -eq $Protein) {Throw "error: Invalid codon"}
$Proteins += $Protein
}
$Proteins
```
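
The `$null` check relies on the fact that indexing a hashtable with a key it does not contain returns `$null` rather than throwing. A minimal sketch, using a hypothetical one-entry table:

```powershell
# Hypothetical one-entry table, just for illustration.
$table = @{ "AUG" = "Methionine" }

$known   = $table["AUG"]   # Methionine
$unknown = $table["XYZ"]   # no such key, so the lookup yields $null

$null -eq $unknown         # True
```

Placing `$null` on the left-hand side is a common PowerShell habit: if the right-hand value were ever a collection, `-eq` would filter it instead of returning a single Boolean.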

[Hashtable.](https://learn.microsoft.com/en-us/powershell/scripting/learn/deep-dives/everything-about-hashtable)

[Substring.](https://ss64.com/ps/substring.html)
@@ -0,0 +1,7 @@
for ($i = 0; $i -lt $Strand.Length; $i+=3) {
$Protein = $codonsToProteins[$Strand.Substring($i, 3)]
if ("STOP" -eq $Protein) {break}
if ($null -eq $Protein) {Throw "error: Invalid codon"}
$Proteins += $Protein
}
$Proteins
