Hunter Eppley
03 / 06 Bioinformatics

DNA Sequence Analyzer

Python, Biopython, FASTA, CSV

The problem

Bench biologists often need a complete summary of a FASTA sequence (nucleotide counts, GC content, restriction sites, and the encoded protein) without stitching together five separate tools.

What I built

A Python toolkit that parses multi-sequence FASTA files with Biopython's SeqIO and computes, for each sequence: nucleotide counts, GC content, the reverse complement, start-codon positions, restriction-enzyme cut sites (EcoRI, BamHI, HindIII, NotI, SpeI), and a protein translation under the standard genetic code that stops at the first stop codon. The output is a flat CSV ready for downstream work.
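The per-sequence computations listed above can be sketched roughly as follows. This is a minimal illustration, not the toolkit's actual code: it uses only the standard library so it runs without Biopython, and the names `summarize`, `find_all`, `translate_to_first_stop`, and `summaries_to_csv` are hypothetical. It also assumes that translation starts at the first base in frame 0, that GC content is computed over A/T/C/G only, and that "cut sites" are reported as the start of each recognition sequence rather than the exact cleavage position.

```python
import csv
import io
import re

# Recognition sequences for the five enzymes named above.
RESTRICTION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "NotI": "GCGGCCGC",
    "SpeI": "ACTAGT",
}

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in _BASES for b in _BASES for c in _BASES), _AMINO)
)

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def find_all(pattern, seq):
    """Return 0-based start indices of every match, including overlapping ones."""
    return [m.start() for m in re.finditer(f"(?={pattern})", seq)]


def translate_to_first_stop(seq):
    """Translate frame 0 codon by codon and stop at the first stop codon."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        amino = CODON_TABLE.get(seq[i:i + 3], "X")  # "X" for ambiguous codons
        if amino == "*":
            break
        protein.append(amino)
    return "".join(protein)


def summarize(name, sequence):
    """Compute the per-sequence summary for one nucleotide sequence."""
    seq = sequence.upper()
    counts = {base: seq.count(base) for base in "ATCG"}
    total = sum(counts.values())
    gc = round(100 * (counts["G"] + counts["C"]) / total, 2) if total else 0.0
    cuts = {enz: find_all(site, seq) for enz, site in RESTRICTION_SITES.items()}
    return {
        "sequence": name,
        **counts,
        "gc_percent": gc,
        "reverse_complement": seq.translate(_COMPLEMENT)[::-1],
        "start_positions": find_all("ATG", seq),
        "cut_sites": {enz: pos for enz, pos in cuts.items() if pos},
        "protein": translate_to_first_stop(seq),
    }


def summaries_to_csv(summaries):
    """Flatten summaries into CSV text, one row per sequence."""
    fields = ["sequence", "A", "T", "C", "G", "gc_percent",
              "starts", "cut_sites", "protein"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for s in summaries:
        row = {k: s[k] for k in ("sequence", "A", "T", "C", "G",
                                 "gc_percent", "protein")}
        row["starts"] = len(s["start_positions"])
        row["cut_sites"] = ";".join(
            f"{enz}x{len(pos)}" for enz, pos in s["cut_sites"].items()
        ) or "none"
        writer.writerow(row)
    return buffer.getvalue()
```

With Biopython installed, the input could come from `SeqIO.parse("input.fasta", "fasta")` (the filename is a placeholder), passing each record's `record.id` and `str(record.seq)` to `summarize`. Biopython's `Seq.translate(to_stop=True)` gives the same stop-at-first-stop behavior as the hand-rolled translation here.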

Data preview

Sequence | Nucleotides | GC% | Starts | Cut sites | Protein
CDS-1 | A 2938, T 4245, C 2008, G 2734 | 39.77 | 338 | EcoRI, BamHI, HindIII×6, SpeI×2 | MANQYVLRVADCTNVYYTRLWSSREAVSVYGAAAACGF (3,974 aa)
CDS-2 | A 2129, T 2736, C 1410, G 1741 | 39.31 | 236 | EcoRI×2, HindIII×4 | EPCSEHHVIRAFDIYNKDVACITKFPKINCVRFRNTGM (2,671 aa)
CDS-3 | A 894, T 1147, C 632, G 708 | 39.63 | 96 | SpeI×2 | MALIFVLMLITLYRCPFVLCNFQVCTDQLRQQEVYLPN (1,126 aa)
CDS-4 | A 147, T 235, C 130, G 133 | 40.78 | 10 | none | MIGGLFSVGFEQFIQHANVTTGGALTALAAQPLINYGT (214 aa)
CDS-5 | A 58, T 98, C 41, G 40 | 34.18 | 4 | SpeI | MLPSFLRVFNDEGVVLSVLFWLLFIIILLLFSIAMLKT (78 aa)
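As a quick sanity check on the preview rows, the GC% column can be recomputed from the nucleotide counts as 100 × (G + C) / (A + T + C + G). The short sketch below does that for all five rows; the helper name `gc_percent` and the `PREVIEW` dictionary are made up for illustration, and the numbers are copied from the rows above. It also checks that each sequence length equals three bases per residue plus one extra codon, which would be consistent with a single terminal stop codon, though the page does not state this explicitly.

```python
# name: ((A, T, C, G) counts, reported GC%, reported protein length in aa)
PREVIEW = {
    "CDS-1": ((2938, 4245, 2008, 2734), 39.77, 3974),
    "CDS-2": ((2129, 2736, 1410, 1741), 39.31, 2671),
    "CDS-3": ((894, 1147, 632, 708), 39.63, 1126),
    "CDS-4": ((147, 235, 130, 133), 40.78, 214),
    "CDS-5": ((58, 98, 41, 40), 34.18, 78),
}


def gc_percent(a, t, c, g):
    """GC content as a percentage of all counted bases, rounded to 2 places."""
    return round(100 * (g + c) / (a + t + c + g), 2)


for name, ((a, t, c, g), reported_gc, protein_len) in PREVIEW.items():
    total = a + t + c + g
    # Recomputed GC% should match the value shown in the table.
    assert gc_percent(a, t, c, g) == reported_gc, name
    # One codon per residue plus one additional (presumably stop) codon.
    assert total == 3 * (protein_len + 1), name
```

For CDS-1, for example, this works out to (2008 + 2734) / 11925 ≈ 0.3977, matching the 39.77 in the table.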

The outcome

Validated on the SARS-related coronavirus reference genome (NC_034972.1). One run produces a full bioinformatics summary per sequence, ready for cell-line work, restriction cloning, or expression-construct design.
