Nataraj S.

Lead programmer / bioinformatician

Seeking a position in software development supporting biotechnology research

Resume last updated 15.12.2016
Address Novosibirsk, Russian Federation
Email Blocked
Phone Blocked

Experience

New Computational Systems in Biology LLC (Новые вычислительные системы в биологии ООО)
Lead Software Developer
Jun 2004 - Nov 2016
Since 2004 I have worked as a developer, lead developer, and CEO of a small company (4-7
people). I built strong programming skills and background while still at school, which is where
my professional specialty was set. In 2003-2004, while studying at university, I started working
as a developer at a company producing an enterprise Java application.
Already knowing C++, I then switched to C++ work, leading a small group that worked
for Biobase GmbH. I worked for this company for 12 years, until 2016.
In 2004-2005 we started developing the GRESA (Gene Regulation and Sequence Analysis)
package. The package consisted of a library describing the required genetic structures: DNA
alphabet, DNA and RNA sequences, sequence lists, promoter region, gene, transcription factor, and
transcription factor binding site (TFBS). It also covered microarrays, gene expression, and single
nucleotide polymorphisms (SNPs). Each entity was represented as a class within a hierarchy, using
templates and standard library (std) classes. We built several command-line tools and several
stand-alone Windows applications. Some command-line tools, such as SNP analysis, were wrapped in a
web user interface. In particular, I built the visualization for Match and footprint (a method of
aligning promoter sequences that takes TFBS conservation into account).
Around 2005-2007 we were developing ExPlain and its basis, CMA. I will first describe the
development process and then the tools' technical details.
By that time we were a team of about 5 people. There was a very strong programmer, Tagir,
who contributed a lot to ExPlain and CMA and wrote his PhD thesis on CMA. There were two more
strong programmers, me, and an experienced scientific tester, Tanja. We tried different
software development approaches (XP, RUP, MSF) to find the most appropriate development model
for us. RUP was not my style (and the others were of the same mind). Being strong-minded
developers, we preferred flexibility over designing the structure from the very beginning. We
liked XP, but it had its disadvantages. For some time we worked strictly in pairs, two people at
one computer. This brought some improvement, because one makes fewer mistakes when working in a
pair, but it was difficult in other respects. So in the end each of us sat at his own computer,
but the computers were placed around the room and we sat close to each other with our backs to
the center, so that people could see a neighbor's monitor and consult each other easily. One of
the good approaches we used was MSF (Microsoft Solutions Framework). It is very promising. It has
two roles, project manager and product manager (plus developers and testers, of course). The
project manager is responsible for planning the project, scheduling tasks, and setting and
controlling milestones. The product manager is responsible for communicating with the customer
and conveying the customer's vision to the team. So there is supposed to be a constant struggle
between the project and product managers, and as a result a better development plan. I was the
project manager, Tanja was the product manager, and the work was successful.
CMA (Composite Module Analyst) is a tool that uses a genetic algorithm to search for
combinations of transcription factor binding sites that work together in a selected set of
promoters. The model space was too large to search by complete enumeration. We also tried
Metropolis sampling (a simulated annealing variation) and showed that the genetic algorithm was
the best choice for this problem. A model consisted of a number of transcription factors, each of
which could be represented by multiple variants. The window in which they are located may vary,
and their order may be fixed or may vary. One of the TFs can have a negative impact on
regulation. Some can be absent, and each can have a different weight. The software was based on
GRESA, developed earlier, and was implemented as a stand-alone C++ program. It takes a set of
parameters and the TRANSFAC matrix library from Biobase and produces a result.
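To illustrate the kind of search described above, here is a minimal sketch of a genetic algorithm that looks for a set of matrices co-occurring in foreground promoters more often than in background ones. It is not the CMA implementation: the fitness function is a deliberately simplified assumption (it ignores windows, order, weights, and negative factors), and all names (`fitness`, `evolve`, the toy data) are hypothetical.

```python
import random

def fitness(module, foreground, background):
    """Share of foreground promoters containing every matrix of the module,
    minus the same share in the background set (a simplifying assumption)."""
    def share(promoters):
        return sum(module <= hits for hits in promoters) / len(promoters)
    return share(foreground) - share(background)

def evolve(n_matrices, module_size, foreground, background,
           pop_size=30, generations=40, mutation_rate=0.3, seed=0):
    rng = random.Random(seed)

    def score(module):
        return fitness(module, foreground, background)

    # Each individual is a fixed-size set of matrix indices.
    population = [frozenset(rng.sample(range(n_matrices), module_size))
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[:pop_size // 2]      # truncation selection, keeps the best
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            pool = sorted(a | b)                  # crossover: draw matrices from both parents
            child = set(rng.sample(pool, module_size))
            if rng.random() < mutation_rate:      # mutation: replace one matrix at random
                child.remove(rng.choice(sorted(child)))
                candidates = [m for m in range(n_matrices) if m not in child]
                child.add(rng.choice(candidates))
            children.append(frozenset(child))
        population = parents + children
    return max(population, key=score)

# Toy data: each promoter is the set of matrix indices that have a hit in it.
foreground = [{0, 3, 5}, {0, 3, 7}, {0, 3}, {0, 1, 3}, {2, 4}]
background = [{0, 1}, {3, 4}, {2, 5}, {6, 7}, {1, 2}]
best = evolve(n_matrices=8, module_size=2,
              foreground=foreground, background=background)
```

In this toy data the pair `{0, 3}` has the highest possible score (0.8). The search is stochastic, so the sketch only promises a good module, not the optimum.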
ExPlain was a large web-based application in which hundreds of users could perform their
bioinformatics analyses and evaluate the results. ExPlain was written in Perl and JavaScript. It
contained around 100 different forms, tables, and reports. There were 3 workflows and several
databases on the backend. Users could upload their data: a set of regulatory sequences, a
microarray, a list of genes, a list of molecules, etc. This data could be sent to one of several
types of analysis: Match, prediction of transcription factor binding sites; Fmatch, prediction
of the set of transcription factor binding sites that are over- or underrepresented in a set of
target promoters compared to a background set; CMA, described earlier (prediction of combinations
of transcription factors regulating a set of genes compared to background); functional analysis
(FA), prediction of key transcription factors by walking the functional relationship graph;
network analysis, the same analysis as FA but on molecules, to reveal key molecules regulating a
given set of genes/factors/molecules, done by upstream/downstream analysis; microarray analysis,
which is CMA with each promoter passed together with its corresponding expression value. This
web tool was successfully used by a number of commercial customers from big pharma and by many
institutions.
User data was organized as a tree, with analysis results placed under each data item. The
backend database was fairly complicated: several databases were merged into one hub called
Genehub. The basis of Genehub was a unique numeric identifier composed of several ids. The first
3 digits are the species identifier, then one digit for the type, then some digits (I don't
remember exactly how many) for the item identifier, the database identifier, etc. In total it was
a 12-digit number for each record, identifying it completely. At that time I was not involved in
the development of Genehub, but a few years later I had to update it with new versions of the
databases. It was a real quest: we received the code of the Genehub build scripts, which we
needed to adapt to the new databases. The work consisted of analyzing the large ExPlain codebase
that refers to Genehub for database matching, search, and display, and of adapting the Genehub
build code to the new information. We solved about 95% of this task and had to cut out a few
deprecated sources.
An ExPlain instance is available for demo use. Please contact me to request a link.
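The composite Genehub identifier described above can be sketched as fixed-width decimal fields packed into one 12-digit number. Only the 3-digit species field, the 1-digit type field, and the 12-digit total come from the description. The item width (6), the database width (2), their order, and the names `FIELDS`, `pack_id`, and `unpack_id` are assumptions made for illustration.

```python
# Field layout: species (3) and type (1) follow the description above;
# the item (6) and database (2) widths and their order are ASSUMED so that
# the total comes to 12 digits.
FIELDS = (("species", 3), ("type", 1), ("item", 6), ("database", 2))
TOTAL_DIGITS = sum(width for _, width in FIELDS)

def pack_id(**values):
    """Concatenate zero-padded fields into a single integer identifier."""
    digits = ""
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < 10 ** width:
            raise ValueError(f"{name}={value} does not fit in {width} digits")
        digits += str(value).zfill(width)
    return int(digits)

def unpack_id(record_id):
    """Split an identifier back into its fields."""
    digits = str(record_id).zfill(TOTAL_DIGITS)
    if len(digits) != TOTAL_DIGITS:
        raise ValueError("identifier has more than 12 digits")
    fields, pos = {}, 0
    for name, width in FIELDS:
        fields[name] = int(digits[pos:pos + width])
        pos += width
    return fields

record_id = pack_id(species=101, type=2, item=4567, database=3)
fields = unpack_id(record_id)
```

The appeal of such a key is that any part of the system can recover the species or source database from the number alone, without a lookup.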
In 2009-2010 we worked within the scope of the COGANGS grant (FP7 programme) for the
same partner, Biobase GmbH. Biobase first asked us to update Genehub (described above) and then
to write a web interface for the Match tool that could handle large, chromosome-scale data
for visualization. We created Matchportal, a Vaadin-based web tool. It contained a
visualization component that could display TF binding sites for a whole chromosome, with the
option to zoom in down to individual sites.
For this tool we also rewrote Match and Fmatch in Java. Java has many advantages: it is
cross-platform, easy to maintain, and has many libraries that support development. At first
Match was significantly slower than the C++ version, but we managed to bring it to nearly the same
speed. We knew it was possible to make Java as fast as C++, and we used profilers to find
bottlenecks and remove them. These Match and Fmatch tools were later used in BKL, the Proteome
interface.
Proteome (BKL) development and support. In 2012-2016 our team developed and
supported the Proteome interface (formerly known as BKL). The code we received had a long
history. It had been written in Perl several years before us. We arranged a class structure to
display and operate on data, and added a user data management system. We used the Match and
Fmatch tools we had previously developed in Java, and added a process monitor to schedule and
control tasks.
At that time I was working separately, first remotely for Biobase in the team developing
GenomeTrax (the Biobase hub for various variation databases). GenomeTrax was very promising but,
unfortunately, after the acquisition of Biobase it was replaced by Ingenuity's tool. It merged
around 10 databases of genomic regions, mostly single nucleotide polymorphisms (SNPs). There was
also a very fast match engine that allowed large variation files provided by users to be matched
against the database. The results of this match went to another database and were displayed to
the user in the form of a very flexible table. One could resize it, reorder rows, filter by
conditions (numeric less than, greater than, string substring match), sort, etc.
I was also developing scientific algorithms for Biobase (while the others on our team were
working on Proteome). These algorithms included a method for predicting pairs of TF binding
sites in which one site is fixed (the anchor) and the other varies. The algorithm predicts
overrepresented or underrepresented pairs of this kind, calculating the p-value and FDR.
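As an illustration of scoring anchor-partner pairs, the sketch below computes over- and underrepresentation p-values and an FDR correction. The resume does not say which statistical test was used: the hypergeometric test and the Benjamini-Hochberg procedure here are assumptions, and the function names and counts are hypothetical.

```python
from math import comb

def hypergeom_tails(k, n_fg, K, N):
    """Return (p_over, p_under) for observing k foreground sequences with the pair.

    N: all sequences, K: sequences containing the anchor+partner pair,
    n_fg: foreground sequences, k: foreground sequences containing the pair.
    """
    total = comb(N, n_fg)

    def pmf(i):
        return comb(K, i) * comb(N - K, n_fg - i) / total

    p_over = sum(pmf(i) for i in range(k, min(K, n_fg) + 1))   # P(X >= k)
    p_under = sum(pmf(i) for i in range(0, k + 1))             # P(X <= k)
    return p_over, p_under

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):              # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical counts per partner matrix: (k in foreground, K in all sequences),
# with 30 foreground sequences out of 100 in total.
counts = {"partner_A": (18, 20), "partner_B": (5, 40), "partner_C": (9, 30)}
over = {name: hypergeom_tails(k, 30, K, 100)[0] for name, (k, K) in counts.items()}
fdr = dict(zip(over, benjamini_hochberg(list(over.values()))))
```

Correcting across all tested partners matters because one anchor is typically tested against a whole matrix library, so raw p-values alone would produce many false positives.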
For example, I implemented the Gibbs Sampler 2 algorithm; the sources are available for
evaluation. Please contact me to request a link.
This algorithm aligns a set of about a hundred sequences provided by the user. Each
sequence is up to a few hundred bp long. Each sequence contains a manually predicted binding site
of a special type, containing two core domains. The sequences are aligned to obtain the best
score of the weight matrices for those two cores. A GUI wrapper around this tool is available. It
requires a list of sequences in FASTA format. Each sequence contains an experimental TF binding
site with two cores.
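For readers unfamiliar with Gibbs sampling for motif alignment, here is a minimal textbook-style site sampler. It is not Gibbs Sampler 2: it finds a single ungapped motif rather than a site with two cores, it assumes a uniform background and sequences made only of A, C, G, T, and every name in it is hypothetical.

```python
import math
import random

ALPHABET = "ACGT"

def build_pwm(sites, pseudocount=1.0):
    """Column-wise base frequencies of aligned sites, smoothed with pseudocounts."""
    pwm = []
    for col in range(len(sites[0])):
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[col]] += 1
        total = sum(counts.values())
        pwm.append({base: counts[base] / total for base in ALPHABET})
    return pwm

def site_weight(pwm, window):
    """Likelihood ratio of a window under the PWM vs. a uniform background (assumed)."""
    return math.prod(pwm[i][base] / 0.25 for i, base in enumerate(window))

def gibbs_sample(seqs, width, iterations=200, seed=0):
    """Return one motif start position per sequence (needs at least two sequences)."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        for held in range(len(seqs)):
            # Build the matrix from every sequence except the held-out one.
            others = [seqs[j][starts[j]:starts[j] + width]
                      for j in range(len(seqs)) if j != held]
            pwm = build_pwm(others)
            seq = seqs[held]
            weights = [site_weight(pwm, seq[p:p + width])
                       for p in range(len(seq) - width + 1)]
            # Resample the held-out site in proportion to its weight.
            starts[held] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts

# Toy input with the motif TGACGTCA planted in each sequence.
seqs = ["ACGTTGACGTCAGGCT", "TTGACGTCAACGGTAC", "GGCATTTGACGTCATG", "CATGACGTCATTTGGA"]
starts = gibbs_sample(seqs, width=8)
sites = [s[p:p + 8] for s, p in zip(seqs, starts)]
```

A two-core variant would sample two coupled positions (or a position plus a spacer length) per sequence and score both matrices together.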
I have completed many software projects, alone and in teams, as a developer and as a project
manager. Projects were implemented in C++, Java, and Perl; web applications in JavaScript and HTML.

Education

Ershov Institute of Informatics Systems
PhD
Sep 2004 - Jun 2006
PhD in mathematics and physics, applied to computer science and bioinformatics

What are your strengths?

12+ years in software development for bioinformatics (worked for Biobase GmbH). Strong knowledge of Java, C++, Perl, code optimization, and algorithms. Intermediate: Linux, MySQL. Basic familiarity: JavaScript, HTML.

For 12 years I headed the Novosibirsk branch of Biobase. We were a small firm (about 5 people), so I did everything from design and coding to business planning, working with the customer, and recruiting.

Very good knowledge of and experience in programming, especially in algorithm development and optimization for speed. Experience optimizing Java code to reduce memory consumption and increase performance.

Tell us something more about yourself: publications, conferences, hobbies

H-index 3

Married, three children

I love dancing

A detailed resume is available here: https://www.dropbox.com/s/pyeav1i0gkodf8l/Nataraj%20CV%20nolinks.pdf?dl=0