MarpoDB - About

Description and Motivation

MarpoDB is a gene-centric database of genetic parts from the Marchantia polymorpha Cam-1 strain, designed for genetic engineering and synthetic biology. Our motivation for developing this database came from the need to handle annotated sequence data and make it accessible in the simplest, cleanest and most straightforward manner possible, while supporting user interaction and community features. The database is agnostic and modular, and it will be released to the wider community in the near future through a GitHub repository with setup instructions.

We strongly emphasise that MarpoDB is still at an early stage of development and that the dataset is likely to contain annotation errors. We therefore encourage users to cross-check predictions against the ongoing community effort to sequence Marchantia, which is currently at the early release stage and is hosted in Phytozome (which will become the primary source for Marchantia polymorpha reference sequences) or at marchantia.info.

Technical details

The datasets included in MarpoDB are derived from the Marchantia polymorpha Cam-1 and Cam-2 strains isolated by Prof. Jim Haseloff in Cambridge, UK.

DNA and RNA data were obtained by 100 bp paired-end Illumina sequencing. A de novo genome assembly was generated using the Meraculous 2.0 assembly software, and a de novo transcriptome assembly was generated using the Bridger assembler.

Downstream analyses for ORF prediction, domain identification and ortholog identification were performed using TransDecoder, InterProScan and BLAST against a Viridiplantae-filtered nr dataset.
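To give a rough idea of what ORF prediction involves, the sketch below scans a nucleotide sequence in all six reading frames for stretches that begin with ATG and end at the first in-frame stop codon. It is only an illustration: it is not TransDecoder's method (which also scores coding potential), and the names find_orfs, reverse_complement and min_codons are hypothetical and do not come from MarpoDB.

```python
# Illustrative sketch only: a naive six-frame ORF scan, NOT TransDecoder's algorithm.
# All names here (find_orfs, reverse_complement, min_codons) are hypothetical.

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=3):
    """Return (strand, start, end, orf) tuples for ATG...stop ORFs.

    Coordinates are 0-based and end-exclusive, measured on the strand
    that was scanned. min_codons counts the stop codon as well.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            # Walk the frame one codon at a time.
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3, s[start:i + 3]))
                    start = None
    return orfs


if __name__ == "__main__":
    for orf in find_orfs("CCATGAAATTTTAGGG"):
        print(orf)
```

A real pipeline would also translate each ORF and pass the peptide on to the domain and ortholog searches.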

The backend was developed using Python libraries including Flask, Psycopg2 and Biopython, and the database models were generated de novo using PostgreSQL. Custom scripts were written to obtain gene regions with mapped transcripts, to query the database and to produce filtered results.
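One plausible way to derive gene regions from mapped transcripts is to merge overlapping transcript alignments that lie on the same scaffold and strand. The sketch below shows that idea under stated assumptions: the input tuple layout, the 0-based half-open coordinates and the function name gene_regions are all hypothetical, and this is not the actual MarpoDB script.

```python
# Illustrative sketch only: merging overlapping transcript alignments into
# gene regions. The data layout and the name gene_regions are assumptions,
# not taken from the MarpoDB code base.
from collections import defaultdict


def gene_regions(alignments):
    """Merge overlapping alignments per (scaffold, strand).

    alignments: iterable of (transcript_id, scaffold, start, end, strand),
    with 0-based, end-exclusive coordinates.
    Returns a list of (scaffold, strand, start, end, [transcript_ids]).
    """
    grouped = defaultdict(list)
    for tid, scaffold, start, end, strand in alignments:
        grouped[(scaffold, strand)].append((start, end, tid))

    regions = []
    for (scaffold, strand), hits in sorted(grouped.items()):
        hits.sort()  # order by start, then end
        cur_start, cur_end, ids = hits[0][0], hits[0][1], [hits[0][2]]
        for start, end, tid in hits[1:]:
            if start < cur_end:
                # Overlaps the current region: extend it.
                cur_end = max(cur_end, end)
                ids.append(tid)
            else:
                regions.append((scaffold, strand, cur_start, cur_end, ids))
                cur_start, cur_end, ids = start, end, [tid]
        regions.append((scaffold, strand, cur_start, cur_end, ids))
    return regions


if __name__ == "__main__":
    example = [
        ("t1", "scaf1", 100, 500, "+"),
        ("t2", "scaf1", 400, 900, "+"),
        ("t3", "scaf1", 1500, 2000, "+"),
        ("t4", "scaf1", 450, 800, "-"),
    ]
    for region in gene_regions(example):
        print(region)
```

Keeping strands separate means that antisense transcripts are not folded into the same region. Whether that is desirable depends on the annotation policy, which the text above does not specify.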

The frontend was built using Scribl, SeqViewer and Clipboard, together with custom jQuery and JavaScript code that controls site behaviour.

MarpoDB was developed by Bernardo Pollak (BP)* and Mihails Delmans (MD)* under the supervision of Prof. Jim Haseloff at the OpenPlant Centre in the Department of Plant Sciences of the University of Cambridge. BP performed the extraction experiments, carried out the high-throughput sequencing and assembled the genome and transcriptome. BP and MD together carried out the downstream analyses, prepared the datasets and programmed the front-end. MD developed the back-end and the database models.

* Equal contribution.

Disclaimer

This software is provided by the authors "as is" and any express or implied warranties, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose are disclaimed. The entire risk as to the quality and performance of the program is with you. In no event shall the authors be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of data or data being rendered inaccurate; loss of use or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict liability, or tort (including negligence or otherwise) arising in any way out of the use of this software, even if advised of the possibility of such damage.