of DNA/RNA and its quality scores. By default, the device saves up to 4000 reads in each FASTQ file. The size of a FASTQ file depends on the
number of reads it contains and the length of the DNA/RNA sequenced. See the table below for estimates of the storage space required for a typical
sequencing run.
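The FASTQ layout described above can be illustrated with a minimal Python sketch. It assumes the common four-line record layout (a header starting with `@`, the sequence, a `+` separator line and the quality string), Phred+33 quality encoding and no line wrapping within records. `read_fastq` and `fastq_files_needed` are hypothetical helper names, not part of any Oxford Nanopore tool.

```python
import io
from typing import Iterator, List, TextIO, Tuple

READS_PER_FASTQ_FILE = 4000  # default stated in the text above


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (read_id, sequence, phred_scores) for each four-line FASTQ record."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # Drop the leading '@' and any key=value tags that follow the read ID.
        read_id = header[1:].split()[0]
        # Phred+33: each quality character encodes (ASCII code - 33).
        yield read_id, seq, [ord(c) - 33 for c in qual]


def fastq_files_needed(n_reads: int, per_file: int = READS_PER_FASTQ_FILE) -> int:
    """Number of FASTQ files written if each file holds at most `per_file` reads."""
    return (n_reads + per_file - 1) // per_file  # integer ceiling division


if __name__ == "__main__":
    record = "@read1 runid=abc\nACGT\n+\n!5?I\n"
    for read_id, seq, quals in read_fastq(io.StringIO(record)):
        print(read_id, seq, quals)  # read1 ACGT [0, 20, 30, 40]
    print(fastq_files_needed(10_001))  # 3
```

Reading records lazily with a generator keeps memory use flat, which matters when a run produces many files of 4000 reads each.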
sequencing_summary.txt
contains metadata about all basecalled reads from an individual run, including read ID, sequence length, per-read
Q-score and duration. The size of the sequencing summary file depends on the number of reads sequenced.
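A sequencing summary file can be used to compute run statistics such as the read N50 quoted below. This sketch assumes a tab-separated file with a header row and a read-length column named `sequence_length_template`; that column name is an assumption (it can vary between basecaller versions), so check the header of your own file. `read_lengths` and `n50` are hypothetical helpers.

```python
import csv
import io
from typing import Iterable, List, TextIO


def read_lengths(summary: TextIO, length_col: str = "sequence_length_template") -> List[int]:
    """Per-read lengths from a tab-separated summary file with a header row.

    The default column name is an assumption; pass the one your file uses.
    """
    reader = csv.DictReader(summary, delimiter="\t")
    return [int(row[length_col]) for row in reader]


def n50(lengths: Iterable[int]) -> int:
    """Length L such that reads of length >= L contain at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length  # bases contained in the longest reads seen so far
        if running >= half:
            return length
    return 0  # no reads


if __name__ == "__main__":
    summary = io.StringIO(
        "read_id\tsequence_length_template\tmean_qscore_template\n"
        "r1\t25000\t12.1\n"
        "r2\t20000\t10.5\n"
        "r3\t10000\t9.8\n"
        "r4\t5000\t8.0\n"
    )
    print(n50(read_lengths(summary)))  # 20000
```

Because only one row is held at a time by `csv.DictReader`, the summary file's size (which grows with read count) is not a concern until the length list itself is built.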
Typically, 1 Gbase of sequence data takes up approximately 11 Gbytes of storage, of which roughly 90% is .fast5 files, 9% FASTQ files and 1% the
sequencing summary file.
The example file sizes below are for different throughputs from a single flow cell, for a run saving both .fast5 and FASTQ files with a read N50 of 25
kb. The GridION can run up to five flow cells simultaneously, so total storage requirements scale with the number of flow cells in use.
Output (Gbases)   .fast5 storage (Gbytes)   FASTQ storage (Gbytes)   .fast5 + FASTQ storage (Gbytes)
10                100                       10                       110
15                150                       15                       165
30                300                       30                       330
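Because the table scales linearly with output, its per-flow-cell figures can be turned into a quick estimate. This sketch hard-codes the ratios read off the table (about 10 Gbytes of .fast5 and 1 Gbyte of FASTQ per Gbase at a 25 kb read N50), which cover .fast5 and FASTQ only; runs with other read-length distributions will differ. `estimate_run` and `StorageEstimate` are hypothetical names.

```python
from dataclasses import dataclass

# Ratios read off the table above (single flow cell, read N50 of 25 kb).
FAST5_GB_PER_GBASE = 10
FASTQ_GB_PER_GBASE = 1
MAX_FLOW_CELLS = 5  # GridION limit stated in the text


@dataclass(frozen=True)
class StorageEstimate:
    fast5_gb: float
    fastq_gb: float

    @property
    def total_gb(self) -> float:
        return self.fast5_gb + self.fastq_gb


def estimate_run(gbases_per_flow_cell: float, flow_cells: int = 1) -> StorageEstimate:
    """Approximate .fast5 and FASTQ storage for a run across `flow_cells` flow cells."""
    if not 1 <= flow_cells <= MAX_FLOW_CELLS:
        raise ValueError(f"flow_cells must be between 1 and {MAX_FLOW_CELLS}")
    total_gbases = gbases_per_flow_cell * flow_cells
    return StorageEstimate(
        fast5_gb=total_gbases * FAST5_GB_PER_GBASE,
        fastq_gb=total_gbases * FASTQ_GB_PER_GBASE,
    )


if __name__ == "__main__":
    for gbases in (10, 15, 30):  # reproduces the three table rows
        est = estimate_run(gbases)
        print(gbases, est.fast5_gb, est.fastq_gb, est.total_gb)
    # Five flow cells, each yielding 30 Gbases:
    print(estimate_run(30, flow_cells=5).total_gb)  # 1650
```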
As an experiment progresses, .fast5 files are produced for all reads. If basecalling is enabled, the on-board software (more information below) uses
these reads to generate sequence data, which is then stored in FASTQ files.
Long-term storage
The GridION has sufficient SSD disk space to carry out multiple runs while storing both .fast5 and FASTQ data. However, it is imperative that this data store is
cleared regularly to prevent subsequent runs from terminating due to lack of storage space. For this, the site must provide storage to which data can be
transferred off the device.
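Since runs can terminate when local storage fills up, a pre-run free-space check is one simple safeguard. The sketch below uses the approximate figure of 11 Gbytes per Gbase stated above plus a hypothetical 20% safety margin; the function names and the margin are assumptions, and `path` should be whichever directory your runs write to (not specified here).

```python
import math
import shutil

BYTES_PER_GBASE = 11 * 10**9  # ~11 Gbytes per Gbase, from the text above


def required_bytes(expected_gbases: float, flow_cells: int = 1, headroom_pct: int = 20) -> int:
    """Estimated bytes a run needs, plus a hypothetical safety margin (default 20%)."""
    raw = expected_gbases * flow_cells * BYTES_PER_GBASE
    return math.ceil(raw * (100 + headroom_pct) / 100)


def has_space_for_run(free_bytes: int, expected_gbases: float, flow_cells: int = 1) -> bool:
    """True if `free_bytes` covers the estimated requirement."""
    return free_bytes >= required_bytes(expected_gbases, flow_cells)


def check_data_directory(path: str, expected_gbases: float, flow_cells: int = 1) -> bool:
    """Check free space on the filesystem that holds `path`."""
    free = shutil.disk_usage(path).free
    ok = has_space_for_run(free, expected_gbases, flow_cells)
    if not ok:
        print(f"Only {free / 1e9:.0f} GB free; transfer completed runs off the device first.")
    return ok
```

Separating the arithmetic (`has_space_for_run`) from the filesystem query (`check_data_directory`) keeps the estimate easy to test and to reuse in monitoring scripts.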
The GridION runs on Ubuntu and is able to mount multiple filesystem types. We recommend storage presented as NFS or CIFS. The form (and volume) of
data to be stored will depend on customer requirements:
- Storing the .fast5 files, which contain the raw signal data, permits re-basecalling when Oxford Nanopore releases new algorithms. New basecaller
releases have previously delivered significant improvements in the basecalling accuracy of existing datasets through re-basecalling. In addition, selected
Oxford Nanopore and third-party tools use the raw signal in the .fast5 files to extract additional information, e.g. modified base calling, reference-guided
SNP calling or polishing of data.
- Retaining only FASTQ files allows standard downstream analysis tools to be used on the DNA/RNA sequence, but the data cannot be re-basecalled
when improvements in basecalling become available.
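The recommendations above (network storage, and a choice between keeping raw .fast5 data or only FASTQ) can be combined in a pre-transfer check. This is a minimal sketch assuming a Linux host where `/proc/mounts` lists mounted filesystems. The policy names `raw_and_fastq` and `fastq_only`, the helper names, and matching the sequencing summary by its `sequencing_summary` filename prefix are all hypothetical conventions, not MinKNOW behaviour.

```python
import os
import re
from pathlib import Path
from typing import List, Optional

NETWORK_FSTYPES = {"nfs", "nfs4", "cifs"}
POLICIES = {"raw_and_fastq", "fastq_only"}  # hypothetical policy names


def _unescape(field: str) -> str:
    """Undo the octal escapes (e.g. \\040 for a space) used in /proc/mounts."""
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def fstype_for_path(path: str, mounts_text: str) -> Optional[str]:
    """Filesystem type of the deepest mount point that contains `path`."""
    target = os.path.abspath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fstype = _unescape(fields[1]), fields[2]
        prefix = mount_point.rstrip("/") + "/"
        contains = target == mount_point or target.startswith(prefix)
        # Prefer deeper mounts; on ties, later lines (newer mounts) win.
        if contains and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fstype
    return best_type


def is_network_storage(path: str, mounts_file: str = "/proc/mounts") -> bool:
    """True if `path` lives on an NFS or CIFS mount."""
    with open(mounts_file) as fh:
        return fstype_for_path(path, fh.read()) in NETWORK_FSTYPES


def files_to_archive(run_dir: str, policy: str) -> List[Path]:
    """Files from one run to copy off the device under the chosen policy."""
    if policy not in POLICIES:
        raise ValueError(f"unknown policy {policy!r}")
    keep: List[Path] = []
    for p in sorted(Path(run_dir).rglob("*")):
        if not p.is_file():
            continue
        name = p.name
        if name.startswith("sequencing_summary") or name.endswith((".fastq", ".fastq.gz")):
            keep.append(p)  # kept under either policy
        elif name.endswith(".fast5") and policy == "raw_and_fastq":
            keep.append(p)  # raw signal, needed for re-basecalling
    return keep
```

Under `fastq_only`, the table figures suggest the archived volume drops by roughly a factor of ten, at the cost of losing the ability to re-basecall.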
Included Software
Oxford Nanopore Technologies build and provide several software packages for data acquisition, orchestration and analysis:
MinKNOW
MinKNOW carries out several core tasks:
- Device control, including run parameter selection
- Data acquisition
- Real-time analysis and feedback
- Data streaming
- Basecalling (through integrated Guppy)
The MinKNOW software carries out several core tasks: data acquisition, real-time analysis and feedback, basecalling, data streaming, controlling the device,
and ensuring that the platform chemistry is performing correctly to run the samples. MinKNOW takes the raw data and converts it into reads by recognising
the distinctive change in current that occurs when a DNA strand enters and leaves the pore. MinKNOW then basecalls the reads and writes out the data into