A library to encode text as DNA and decode DNA to text.
GeneSpeak allows you to encode regular text as DNA using base-pairs (A, T, G, C) and convert back to the original text. Text encoding is done for both ascii and utf-8 characters based on the strategy keyword argument.
The encoding scheme (the schema argument) can be any ordering of the four bases A, T, G, C, for example ATCG.
You can install the library via pip or conda.
Install with pip
pip install genespeak
Install with conda
conda install -c conda-forge genespeak
See the quickstart guide here.
Service | Link/Badge |
---|---|
Colab | |
Binder | |
SageMaker StudioLab | |
Deepnote | |
Kaggle | |
You can play around with GeneSpeak in this streamlit app: https://tinyurl.com/genespeak-demo
import genespeak as gp
print(f'{gp.__name__} version: {gp.__version__}')
schema = "ATCG"  # order of the bases used for encoding
strategy = "ascii"  # text encoding strategy: "ascii" or "utf-8"
text = "Hello World!"
dna = gp.text_to_dna(text, schema=schema, strategy=strategy)
text_from_dna = gp.dna_to_text(dna, schema=schema, strategy=strategy)
print(f'Text: {text}\nEncoded DNA: {dna}\nDecoded Text: {text_from_dna}\nSuccess: {text == text_from_dna}')
Output
genespeak version: 0.0.5
Text: Hello World!
Encoded DNA: TACATCTTTCGATCGATCGGACAATTTGTCGGTGACTCGATCTAACAT
Decoded Text: Hello World!
Success: True
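The sample output is consistent with a simple mapping: each character becomes its 8-bit ASCII code, and each pair of bits selects one base by its position in the schema (for ATCG: A=00, T=01, C=10, G=11). The following is a minimal pure-Python sketch of that mapping, inferred from the output shown here rather than taken from the genespeak source. The function name encode_sketch is hypothetical, and the library's actual implementation may differ.

```python
def encode_sketch(text: str, schema: str = "ATCG") -> str:
    """Hypothetical helper: encode ASCII text as DNA, 2 bits per base.

    This is an illustration inferred from the sample output,
    not the genespeak implementation.
    """
    if sorted(schema) != sorted("ATCG"):
        raise ValueError("schema must be an ordering of the bases A, T, C, G")
    bases = []
    for byte in text.encode("ascii"):
        # Read the byte two bits at a time, most significant pair first.
        for shift in (6, 4, 2, 0):
            bases.append(schema[(byte >> shift) & 0b11])
    return "".join(bases)


print(encode_sketch("Hello World!"))
# prints: TACATCTTTCGATCGATCGGACAATTTGTCGGTGACTCGATCTAACAT
```

Each character yields four bases, so the 12-character input produces a 48-base sequence.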
The genespeak docs are maintained here.
The library is available under the MIT license.
You may cite this library as follows.
@software{ray2022genespeak,
author = {Ray, Sugato},
title = {GeneSpeak - A library to encode text as DNA and decode DNA to text},
url = {https://github.com/sugatoray/genespeak},
doi = {10.5281/zenodo.5885777},
month = {1},
year = {2022}
}
Let's have some fun! ✨ The following is a GeneSpeak thumbprint of genespeak itself.
schema | strategy | thumbprint |
---|---|---|
ATCG | ascii | TCTGTCTTTCGCTCTTTGAGTGAATCTTTCATTCCG |
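The thumbprint can be read back by hand: with the ATCG schema and the ascii strategy, every group of four bases appears to correspond to one character. A minimal sketch of that reverse step, assuming the 2-bits-per-base mapping suggested by the table above; decode_sketch is a hypothetical name and not part of the genespeak API.

```python
def decode_sketch(dna: str, schema: str = "ATCG") -> str:
    """Hypothetical helper: decode DNA back to ASCII text, 4 bases per character.

    This is an illustration, not the genespeak implementation.
    """
    if len(dna) % 4 != 0:
        raise ValueError("expected four bases per character")
    code_points = []
    for start in range(0, len(dna), 4):
        value = 0
        for base in dna[start:start + 4]:
            # Each base contributes two bits: its index within the schema.
            value = (value << 2) | schema.index(base)
        code_points.append(value)
    return bytes(code_points).decode("ascii")


thumbprint = "TCTGTCTTTCGCTCTTTGAGTGAATCTTTCATTCCG"
print(decode_sketch(thumbprint))  # prints: genespeak
```

With the library installed, gp.dna_to_text(thumbprint, schema="ATCG") should perform the same round trip.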
Includes health and security badges from:
- Sonarcloud
- OSSF Code Quality