The problem
Bench biologists often need a complete summary of a FASTA sequence (nucleotide counts, GC content, restriction sites, and the encoded protein) without stitching together five separate tools.
What I built
A Python toolkit that parses multi-sequence FASTA files with Biopython's SeqIO and computes, for each sequence: nucleotide counts, GC content, the reverse complement, start-codon positions, restriction-enzyme cut sites (EcoRI, BamHI, HindIII, NotI, SpeI), and a protein translation under the standard genetic code that stops at the first stop codon. The results are written to a flat CSV, ready for downstream work.
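As a rough illustration of the per-sequence calculations listed above (not the toolkit's actual code), here is a minimal standard-library sketch. It assumes plain A/C/G/T input, computes GC content over the full sequence length, and reports 0-based start positions of each recognition sequence rather than exact cleavage positions. The helper names `find_all`, `translate_to_stop`, and `summarize` are hypothetical. With Biopython, the input strings would typically come from `str(record.seq)` for each record yielded by `SeqIO.parse(path, "fasta")`.

```python
from collections import Counter

# Recognition sequences for the five enzymes named above. All five are
# palindromic, so scanning a single strand is sufficient.
SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "NotI": "GCGGCCGC",
    "SpeI": "ACTAGT",
}

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def find_all(seq, motif):
    """Return 0-based start positions of every (overlapping) motif match."""
    hits, start = [], seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits


def translate_to_stop(seq):
    """Translate reading frame 0 and stop at the first stop codon."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X marks ambiguous codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def summarize(seq):
    """Hypothetical helper: build one summary dict for one sequence."""
    seq = seq.upper()
    counts = Counter(seq)
    gc = 100 * (counts["G"] + counts["C"]) / len(seq) if seq else 0.0
    complement = str.maketrans("ACGT", "TGCA")
    return {
        "counts": {base: counts[base] for base in "ATCG"},
        "gc_percent": round(gc, 2),
        "reverse_complement": seq.translate(complement)[::-1],
        "start_positions": find_all(seq, "ATG"),
        "sites": {name: find_all(seq, motif) for name, motif in SITES.items()},
        "protein": translate_to_stop(seq),
    }
```

For a toy input, `summarize("ATGGAATTCAAATGA")` finds one EcoRI site at position 3, start codons at positions 0 and 11, and the protein `MEFK`.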
Data preview
| Sequence | Nucleotide counts | GC (%) | Start codons | Cut sites | Protein (truncated, total length) |
|---|---|---|---|---|---|
| CDS-1 | A2938 T4245 C2008 G2734 | 39.77 | 338 | EcoRI, BamHI, HindIII×6, SpeI×2 | MANQYVLRVADCTNVYYTRLWSSREAVSVYGAAAACGF…3,974 aa |
| CDS-2 | A2129 T2736 C1410 G1741 | 39.31 | 236 | EcoRI×2, HindIII×4 | EPCSEHHVIRAFDIYNKDVACITKFPKINCVRFRNTGM…2,671 aa |
| CDS-3 | A894 T1147 C632 G708 | 39.63 | 96 | SpeI×2 | MALIFVLMLITLYRCPFVLCNFQVCTDQLRQQEVYLPN…1,126 aa |
| CDS-4 | A147 T235 C130 G133 | 40.78 | 10 | none | MIGGLFSVGFEQFIQHANVTTGGALTALAAQPLINYGT…214 aa |
| CDS-5 | A58 T98 C41 G40 | 34.18 | 4 | SpeI | MLPSFLRVFNDEGVVLSVLFWLLFIIILLLFSIAMLKT…78 aa |
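The preview figures can be cross-checked from the nucleotide counts alone. The short sketch below recomputes GC% as (G + C) / (A + T + C + G) and the expected protein length as length / 3 - 1. The second formula assumes each sequence is translated in frame along its full length up to a single terminal stop codon. That is my assumption for this check, and `recompute` is a hypothetical helper, not part of the toolkit.

```python
# Values copied from the preview table: (A, T, C, G, reported GC%, reported aa).
ROWS = {
    "CDS-1": (2938, 4245, 2008, 2734, 39.77, 3974),
    "CDS-2": (2129, 2736, 1410, 1741, 39.31, 2671),
    "CDS-3": (894, 1147, 632, 708, 39.63, 1126),
    "CDS-4": (147, 235, 130, 133, 40.78, 214),
    "CDS-5": (58, 98, 41, 40, 34.18, 78),
}


def recompute(a, t, c, g):
    """Return (GC percent, expected protein length) from raw base counts."""
    length = a + t + c + g
    gc_percent = round(100 * (g + c) / length, 2)
    # Assumption: in-frame translation ending at a single terminal stop codon.
    protein_length = length // 3 - 1
    return gc_percent, protein_length


for name, (a, t, c, g, gc_reported, aa_reported) in ROWS.items():
    gc, aa = recompute(a, t, c, g)
    print(f"{name}: GC {gc}% (table {gc_reported}), {aa} aa (table {aa_reported})")
```

For all five rows the recomputed GC% and protein length agree with the values in the table.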
The outcome
Validated on the SARS-related coronavirus reference genome (NC_034972.1). A single run produces the full per-sequence summary previewed above, ready for cell-line work, restriction cloning, or expression-construct design.