[Approaches]: Protein Translation (#352)
Showing 6 changed files with 290 additions and 0 deletions.
28 changes: 28 additions & 0 deletions
exercises/practice/protein-translation/.approaches/config.json
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
{ | ||
"introduction": { | ||
"authors": [ | ||
"glaxxie" | ||
], | ||
"contributors": [] | ||
}, | ||
"approaches": [ | ||
{ | ||
"uuid": "11aa2b40-050c-433b-b3db-3e5d88e23826", | ||
"slug": "regex-switch", | ||
"title": "regex switch", | ||
"blurb": "Approach using regex and switch statement", | ||
"authors": [ | ||
"glaxxie" | ||
] | ||
}, | ||
{ | ||
"uuid": "05c3c31a-24cb-4696-9266-bc8b5ecae54e", | ||
"slug": "substring-hashtable", | ||
"title": "substring hashtable", | ||
"blurb": "Approach using substring and hashtable", | ||
"authors": [ | ||
"glaxxie" | ||
] | ||
} | ||
] | ||
} |
86 changes: 86 additions & 0 deletions
exercises/practice/protein-translation/.approaches/introduction.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,86 @@ | ||
# Introduction | ||
|
||
There are many idiomatic approaches to solving the Protein Translation exercise. | ||
The `Substring` method can be used in tandem with a `hashtable` to look up values. | ||
Alternatively, `regex` can be combined with the flexibility of the `switch` statement to arrive at a neat approach. | ||
|
||
## General guidance | ||
|
||
The main objective of this exercise is to process the input string in chunks of three characters; each chunk (codon) can be translated into a protein. | ||
If a codon has a `STOP` value, the translation process is halted. | ||
|
||
|
||
## Approach: `substring` and `hashtable` | ||
|
||
This approach uses the `Substring` method to extract a section of the input string, then uses that value to retrieve the result from a `hashtable`. | ||
|
||
```powershell | ||
Function ProteinTranslation() { | ||
[CmdletBinding()] | ||
Param( | ||
[string]$Strand | ||
) | ||
if ($Strand.Length % 3) {Throw "Error: Invalid codon"} | ||
$Proteins = @() | ||
$codonsToProteins = @{ | ||
"AUG" = "Methionine" | ||
"UUU" = "Phenylalanine" | ||
"UUC" = "Phenylalanine" | ||
"UUA" = "Leucine" | ||
"UUG" = "Leucine" | ||
"UCU" = "Serine" | ||
"UCC" = "Serine" | ||
"UCA" = "Serine" | ||
"UCG" = "Serine" | ||
"UAU" = "Tyrosine" | ||
"UAC" = "Tyrosine" | ||
"UGU" = "Cysteine" | ||
"UGC" = "Cysteine" | ||
"UGG" = "Tryptophan" | ||
"UAA" = "STOP" | ||
"UAG" = "STOP" | ||
"UGA" = "STOP" | ||
} | ||
for ($i = 0; $i -lt $Strand.Length; $i+=3) { | ||
$Protein = $codonsToProteins[$Strand.Substring($i, 3)] | ||
if ("STOP" -eq $Protein) {break} | ||
if ($null -eq $Protein) {Throw "error: Invalid codon"} | ||
$Proteins += $Protein | ||
} | ||
$Proteins | ||
} | ||
``` | ||
|
||
For more information, check the [`substring` and `hashtable` approach][approach-substring-hashtable]. | ||
|
||
|
||
## Approach: `regex` and `switch` statement | ||
|
||
This approach uses `regex` to split the input into codons, then uses a `switch` statement to arrive at the result. | ||
|
||
```powershell | ||
function ProteinTranslation { | ||
[CmdletBinding()] | ||
Param( | ||
[string]$Strand | ||
) | ||
$codons = $Strand -split "(\w{3})" -ne "" | ||
switch -Regex ($codons) { | ||
"AUG" { "Methionine" } | ||
"UU[UC]" { "Phenylalanine" } | ||
"UU[AG]" { "Leucine" } | ||
"UC[UCAG]" { "Serine" } | ||
"UA[UC]" { "Tyrosine" } | ||
"UG[UC]" { "Cysteine" } | ||
"UGG" { "Tryptophan" } | ||
"(UAA|UAG|UGA)" { break } | ||
Default {Throw "Error: Invalid codon"} | ||
} | ||
} | ||
``` | ||
|
||
For more information, check the [`regex` and `switch` statement approach][approach-regex-switch]. | ||
|
||
|
||
[approach-regex-switch]: https://exercism.org/tracks/powershell/exercises/protein-translation/approaches/regex-switch | ||
[approach-substring-hashtable]: https://exercism.org/tracks/powershell/exercises/protein-translation/approaches/substring-hashtable |
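As a companion to the general guidance in `introduction.md`, here is a minimal sketch of the chunking step on its own, before any translation happens. The helper name `Split-Codon` is hypothetical and not part of this commit; it only illustrates how a strand breaks into three-character codons, including a shorter final chunk when the length is not a multiple of 3.

```powershell
# Hypothetical helper (not part of the commit): emit the strand in chunks of up to 3 characters.
function Split-Codon {
    param([string]$Strand)
    for ($i = 0; $i -lt $Strand.Length; $i += 3) {
        # Math.Min guards the final chunk when fewer than 3 characters remain.
        $Strand.Substring($i, [Math]::Min(3, $Strand.Length - $i))
    }
}

$full    = @(Split-Codon -Strand "AUGUUUUAA")   # AUG, UUU, UAA
$partial = @(Split-Codon -Strand "AUGUU")       # AUG, UU
```

Both approaches in this commit perform this chunking implicitly: one with `Substring` inside a `for` loop, the other with `-split` and a capture group.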
63 changes: 63 additions & 0 deletions
exercises/practice/protein-translation/.approaches/regex-switch/content.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,63 @@ | ||
# Using `regex` and `switch` statement | ||
|
||
```powershell | ||
function ProteinTranslation { | ||
[CmdletBinding()] | ||
Param( | ||
[string]$Strand | ||
) | ||
$codons = $Strand -split "(\w{3})" -ne "" | ||
switch -Regex ($codons) { | ||
"AUG" { "Methionine" } | ||
"UU[UC]" { "Phenylalanine" } | ||
"UU[AG]" { "Leucine" } | ||
"UC[UCAG]" { "Serine" } | ||
"UA[UC]" { "Tyrosine" } | ||
"UG[UC]" { "Cysteine" } | ||
"UGG" { "Tryptophan" } | ||
"(UAA|UAG|UGA)" { break } | ||
Default {Throw "Error: Invalid codon"} | ||
} | ||
} | ||
``` | ||
|
||
This approach uses `regex` and the `switch` statement to work with strings. | ||
|
||
First, the string is split into an array of three-character strings. | ||
When the string length is not divisible by 3, the last element will simply be a string of length less than 3. | ||
|
||
```powershell | ||
$codons = $Strand -split "(\w{3})" -ne "" | ||
``` | ||
|
||
Next, we use the flexibility of the `switch` statement in PowerShell to translate these codon strings into the correct protein names. | ||
We set the `-Regex` flag on the `switch` statement so it can match `regex` patterns of codons to the corresponding proteins. | ||
|
||
```powershell | ||
switch -Regex ($codons) { | ||
"AUG" { "Methionine" } | ||
"UU[UC]" { "Phenylalanine" } | ||
"UU[AG]" { "Leucine" } | ||
"UC[UCAG]" { "Serine" } | ||
"UA[UC]" { "Tyrosine" } | ||
"UG[UC]" { "Cysteine" } | ||
"UGG" { "Tryptophan" } | ||
``` | ||
|
||
If the codon matches any of the three terminating codons (`STOP` value), we simply `break` out of the `switch` statement and end the translation there. | ||
|
||
```powershell | ||
"(UAA|UAG|UGA)" { break } | ||
``` | ||
|
||
Anything else is an invalid codon and should throw an error. | ||
|
||
```powershell | ||
Default {Throw "Error: Invalid codon"} | ||
``` | ||
|
||
If no error was thrown, the function returns an array of proteins. | ||
|
||
[Regular expression.](https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_regular_expressions) | ||
|
||
[Switch statement.](https://learn.microsoft.com/en-us/powershell/scripting/learn/deep-dives/everything-about-switch) |
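Two details of the `regex` and `switch` approach are easy to miss, so here is a small sketch that isolates them. It is an illustration with made-up sample strings, not code from the commit. The first part shows what `-split` returns when the pattern contains a capture group; the second shows that a pipe inside a character class is a literal character rather than alternation.

```powershell
# A capture group in the -split pattern keeps each matched codon in the output.
# The raw result also contains empty strings between matches, plus any short remainder.
$raw    = "AUGUUUUC" -split "(\w{3})"
$codons = $raw -ne ""            # filtering the array leaves AUG, UUU, UC

# Inside [...] the pipe is an ordinary character, so [U|C] also accepts '|'.
$withPipe    = "UU|" -match "^UU[U|C]$"   # True
$withoutPipe = "UU|" -match "^UU[UC]$"    # False
```

The leftover `UC` element is what ends up in the `Default` branch and triggers the error for strands whose length is not divisible by 3.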
8 changes: 8 additions & 0 deletions
exercises/practice/protein-translation/.approaches/regex-switch/snippet.txt
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
$codons = $Strand -split "(\w{3})" -ne "" | ||
switch -Regex ($codons) { | ||
"AUG" { "Methionine" } | ||
"UU[UC]" { "Phenylalanine" } | ||
"UU[AG]" { "Leucine" } | ||
"UC[UCAG]" { "Serine" } | ||
"UA[UC]" { "Tyrosine" } | ||
"UG[UC]" { "Cysteine" } |
98 changes: 98 additions & 0 deletions
exercises/practice/protein-translation/.approaches/substring-hashtable/content.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,98 @@ | ||
# Using `substring` and `hashtable` | ||
|
||
```powershell | ||
Function ProteinTranslation() { | ||
[CmdletBinding()] | ||
Param( | ||
[string]$Strand | ||
) | ||
if ($Strand.Length % 3) {Throw "Error: Invalid codon"} | ||
$Proteins = @() | ||
$codonsToProteins = @{ | ||
"AUG" = "Methionine" | ||
"UUU" = "Phenylalanine" | ||
"UUC" = "Phenylalanine" | ||
"UUA" = "Leucine" | ||
"UUG" = "Leucine" | ||
"UCU" = "Serine" | ||
"UCC" = "Serine" | ||
"UCA" = "Serine" | ||
"UCG" = "Serine" | ||
"UAU" = "Tyrosine" | ||
"UAC" = "Tyrosine" | ||
"UGU" = "Cysteine" | ||
"UGC" = "Cysteine" | ||
"UGG" = "Tryptophan" | ||
"UAA" = "STOP" | ||
"UAG" = "STOP" | ||
"UGA" = "STOP" | ||
} | ||
for ($i = 0; $i -lt $Strand.Length; $i+=3) { | ||
$Protein = $codonsToProteins[$Strand.Substring($i, 3)] | ||
if ("STOP" -eq $Protein) {break} | ||
if ($null -eq $Protein) {Throw "error: Invalid codon"} | ||
$Proteins += $Protein | ||
} | ||
$Proteins | ||
} | ||
``` | ||
|
||
This approach uses the `Substring` method to extract sections of a string, and a `hashtable` to translate the codons into proteins. | ||
|
||
First, we check whether the string length is divisible by 3. | ||
If it isn't, we throw an error, because every codon must be a string of exactly 3 characters, so at least one codon would be invalid. | ||
|
||
```powershell | ||
if ($Strand.Length % 3) {Throw "Error: Invalid codon"} | ||
``` | ||
|
||
Then we set up an empty array to collect all the proteins to be returned later, along with a `hashtable` with codons as keys and their protein names as values. | ||
|
||
```powershell | ||
$Proteins = @() | ||
$codonsToProteins = @{ | ||
"AUG" = "Methionine" | ||
"UUU" = "Phenylalanine" | ||
"UUC" = "Phenylalanine" | ||
"UUA" = "Leucine" | ||
"UUG" = "Leucine" | ||
"UCU" = "Serine" | ||
"UCC" = "Serine" | ||
"UCA" = "Serine" | ||
"UCG" = "Serine" | ||
"UAU" = "Tyrosine" | ||
"UAC" = "Tyrosine" | ||
"UGU" = "Cysteine" | ||
"UGC" = "Cysteine" | ||
"UGG" = "Tryptophan" | ||
"UAA" = "STOP" | ||
"UAG" = "STOP" | ||
"UGA" = "STOP" | ||
} | ||
``` | ||
|
||
Next, we loop over the indexes of the string, use each index to extract a `substring` as the codon, then use the codon as the key to retrieve the protein from the hashtable. | ||
|
||
Normally, when the `Substring` method is called with an index or length that is out of range, it throws an error that we don't want. | ||
However, the length check we did previously eliminates that possibility. | ||
|
||
```powershell | ||
for ($i = 0; $i -lt $Strand.Length; $i+=3) { | ||
$Protein = $codonsToProteins[$Strand.Substring($i, 3)] | ||
``` | ||
|
||
After we get a protein, we need to check its value. | ||
If the value is `STOP` (the codon is one of the three terminating codons), we simply break out of the loop and stop the translation process. | ||
If the codon is invalid and doesn't exist in the hashtable (`$null`), we throw an error. | ||
Otherwise, we add the protein to the proteins array. | ||
When the loop has stopped, we simply return the proteins array. | ||
```powershell | ||
if ("STOP" -eq $Protein) {break} | ||
if ($null -eq $Protein) {Throw "error: Invalid codon"} | ||
$Proteins += $Protein | ||
} | ||
$Proteins | ||
``` | ||
|
||
[Hashtable.](https://learn.microsoft.com/en-us/powershell/scripting/learn/deep-dives/everything-about-hashtable) | ||
|
||
[Substring.](https://ss64.com/ps/substring.html) |
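The relationship between the up-front length check and `Substring` can be seen in isolation with a short sketch. The sample strand and variable names below are illustrative assumptions, not part of the commit.

```powershell
$strand = "AUGUU"   # length 5, so the last chunk has only 2 characters

# The up-front check from the approach: a non-zero remainder is truthy.
$hasPartialCodon = [bool]($strand.Length % 3)

# Without that check, asking for 3 characters at index 3 raises an exception.
try {
    $null = $strand.Substring(3, 3)
    $substringFailed = $false
}
catch {
    $substringFailed = $true
}
```

Because the function throws its own error as soon as the remainder is non-zero, the loop never reaches a `Substring` call that would run past the end of the string.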
7 changes: 7 additions & 0 deletions
exercises/practice/protein-translation/.approaches/substring-hashtable/snippet.txt
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
for ($i = 0; $i -lt $Strand.Length; $i+=3) { | ||
$Protein = $codonsToProteins[$Strand.Substring($i, 3)] | ||
if ("STOP" -eq $Protein) {break} | ||
if ($null -eq $Protein) {Throw "error: Invalid codon"} | ||
$Proteins += $Protein | ||
} | ||
$Proteins |
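A possible variation on the loop above, offered only as a sketch: `+=` on a PowerShell array creates a new array on every iteration, so for long strands a generic list may be preferable. The abbreviated lookup table and sample strand here are assumptions for illustration, and the commit itself uses a plain array.

```powershell
# Abbreviated lookup table for illustration; the approach uses the full table.
$codonsToProteins = @{ "AUG" = "Methionine"; "UUU" = "Phenylalanine"; "UAA" = "STOP" }
$Strand = "AUGUUUUAA"

# A generic list grows in place instead of being copied on each append.
$Proteins = [System.Collections.Generic.List[string]]::new()
for ($i = 0; $i -lt $Strand.Length; $i += 3) {
    $Protein = $codonsToProteins[$Strand.Substring($i, 3)]
    if ("STOP" -eq $Protein) { break }
    if ($null -eq $Protein) { throw "Error: Invalid codon" }
    $Proteins.Add($Protein)
}
$Proteins   # Methionine, Phenylalanine
```

For the short strands in the exercise tests the difference is negligible, so the simpler `+=` form in the commit is a reasonable choice.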