<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<html>
<head>
</head>
<body text="#000000" bgcolor="#FFFFFF" link="#0000EE" vlink="#551A8B" alink="#FF0000">
<center><b><font color="#3366FF"><font size=+2>NCBI SOFTWARE DEVELOPMENT
TOOLKIT</font></font></b>
<br><b><font color="#3366FF"><font size=+2>National Center for Biotechnology
Information</font></font></b>
<br><b><font color="#3366FF"><font size=+2>Bldg 38A, NIH</font></font></b>
<br><b><font color="#3366FF"><font size=+2>8600 Rockville Pike</font></font></b>
<br><b><font color="#3366FF"><font size=+2>Bethesda, MD 20894</font></font></b></center>
<p>The NCBI Software Development Toolkit was developed for the production
and distribution of GenBank, Entrez, BLAST, and related services by NCBI.
We make it freely available to the public without restriction to facilitate
the use of NCBI by the scientific community. However, please understand
that while we feel we have done a high quality job, this is not commercial
software. The documentation lags considerably behind the software and we
must make any changes required by our data production needs. Nonetheless,
many people have found it a useful and stable basis for a number of tools
and applications.
<p>The toolkit is available by anonymous ftp from <a href="ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools/">ftp.ncbi.nih.gov</a>
<blockquote><tt>cd toolbox</tt>
<br><tt>cd ncbi_tools</tt>
<br><tt>bin</tt>
<br><tt>get ncbi.tar.Z (compressed UNIX tar file)</tt>
<br><tt>quit</tt></blockquote>
<p><br>In this same directory are also ncbiz.exe (DOS self-extracting archive)
and ncbi.hqx (Mac self-extracting archive). All three files contain the
same source code and will make the toolkit for all platforms.
<p>Please feel free to email questions/suggestions to: <a href="mailto:[email protected]">[email protected]</a>
<p>If you would like hardcopy of the current documentation, send your mailing
address with your request to the email address above.
<p>If you are considering a serious development project using this toolkit,
please contact us. We are happy to discuss compatible strategies and inform
you of our longer term plans. There is no limitation on the use of this
code, or on contacting us about its use, for commercial, academic, or government
groups.
<br>
<hr WIDTH="100%">
<center><b><font size=+1>Version 6.1</font></b>
<br><i> the date of release may be obtained from the file <b>ncbi/VERSION</b></i></center>
<hr WIDTH="100%">
<center><b>Summary</b></center>
<p>The procedure for building the toolkit on Unix has changed slightly.
You no longer need to download a binary NCBI product for your platform
to obtain the platform-specific ncbi.mk file.
<p>To build the NCBI toolkit you need to look for platform-dependent instructions:
<br>For UNIX (including Linux and Mac OS X):
<br> look at the file <b>make/readme.unx</b>
<br>For alternative Mac instructions (using CodeWarrior):
<br> look at the file <b>make/readme.mac</b>
<br>For Microsoft Windows95/98/NT:
<br> look at the file <b>make/readme.dos</b>
<br>There is some information that may be useful for building the NCBI toolkit
in the file <b>doc/FAQ.txt</b>
<p>Documentation relevant to BLAST may be found in the <b>doc/blast</b> subdirectory.
<p>The file <b>doc/sequin.htm</b> describes SEQUIN and its configuration.
<p>If you have problems configuring Entrez with a firewall, look at the
file <b>doc/firewall.txt</b>
<p>This file has a section called <b>CONFIGURATION OR SETTINGS FILES,</b>
which explains in detail how our configuration system works. The ncbi config
file (<b>.ncbirc </b>on UNIX, <b>ncbi.ini </b>on PC/Windows, and <b>ncbi.cnf
</b>on
Macintosh) is needed in order to find data files, such as <b>gc.val
</b>(the
genetic code table), provided in the toolkit or with programs like Sequin.
(The <b>asnload</b> files containing dynamic versions of the ASN.1 parse
tables are no longer needed, since all platforms can now have large static
data.)
<p>It has recently become possible to eliminate the need for the ncbi config
file by calling <b>UseLocalAsnloadDataAndErrMsg ()</b> at the beginning
of your program. This looks for the data directory in the same directory
as the running program. If it doesn't find it, it looks up one level,
in case you are compiling programs in the build directory of the toolkit.
If it finds the data directory in either of these places, it transiently
sets the location, so code that loads these files is given the correct
path.
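<p>The search order described above can be sketched in Python as follows.
This is only an illustration of the lookup logic, not the toolkit's C code:
the function name and the directory name "data" are assumptions made for
this example.
<pre>
```python
from pathlib import Path


def find_data_dir(program_path, dir_name="data"):
    """Return the data directory next to the program, or one level up.

    Follows the search order described in the text. Returns None when
    neither location holds the directory. The default directory name
    is an assumption of this sketch.
    """
    program_dir = Path(program_path).resolve().parent
    # First the program's own directory, then its parent (the case of a
    # program compiled inside the toolkit's build directory).
    for base in (program_dir, program_dir.parent):
        candidate = base / dir_name
        if candidate.is_dir():
            return candidate
    return None
```
</pre>
<p>A caller would pass the path of the running program and fall back to the
built-in defaults when the function returns None.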
<p>An even more recent change is that copies of several of our data files
(gc, seqcode, and featdef) are now built into the source code, so if the
data directory is not found, programs that require only these can still
run.
<p>One final improvement is that access to our network services is now
much simpler than before, so if you are not behind a firewall and have
a domain name server (DNS) available you can connect to our network without
needing any configuration information in the ncbi config file. Operation
behind a firewall, or with a proxy, requires very little in the ncbi config
file, and this is easily created by asking Sequin to configure for network
access.
<br>
<hr WIDTH="100%">
<center><b>Notes from Previous Releases</b>
<br><b><font size=+1>Version 6.0</font></b>
<br><i>the date of release may be obtained from the file <b>ncbi/VERSION</b></i></center>
<hr WIDTH="100%">
<br>This release includes source code for the new (2.0) version of BLAST.
Also included are a small number of incremental changes in the ASN.1 specification.
<p>BLAST 2.0 - BLAST 2.0 can produce gapped alignments and is capable of
position-specific-iterated BLASTp (PSI-BLAST). Compared to the 1.4
release of BLAST, there are also significant performance enhancements as
well as extensive changes to the text report and the format of the databases.
BLAST 2.0 uses threads for multi-processing, using the NCBI threads library.
Three BLAST programs may be compiled in the demo directory. They are:
<br>
<ul>
<li>
<b>formatdb</b>: formats FASTA files as BLAST databases for BLAST 2.0.</li>
<li>
<b>blastall</b>: performs all five flavors of BLAST comparison.
<ul>
<li>
<b>blastn</b> and <b>blastp</b> offer fully gapped alignments.</li>
<li>
<b>blastx</b> and <b>tblastn</b> have 'in-frame' gapped alignments and
use sum statistics to link alignments from different frames.</li>
<li>
<b>tblastx</b> provides only ungapped alignments.</li>
</ul>
</li>
<li>
<b>blastpgp</b>: performs gapped blastp searches and can be used to perform
iterative searches in psi-blast mode.</li>
</ul>
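<p>As a quick reference, the alignment support listed above for the five
blastall flavors can be written as a small Python table. The names
GAPPING and supports_gaps are hypothetical and exist only in this sketch;
they are not part of BLAST or the toolkit.
<pre>
```python
# Alignment support per blastall program, as stated in the list above.
GAPPING = {
    "blastn": "gapped",
    "blastp": "gapped",
    "blastx": "in-frame gapped",
    "tblastn": "in-frame gapped",
    "tblastx": "ungapped",
}


def supports_gaps(program):
    """True when the named program produces any kind of gapped alignment."""
    if program not in GAPPING:
        raise ValueError("not a blastall program: " + program)
    return GAPPING[program] != "ungapped"
```
</pre>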
Additional information may be obtained from the README in the BLAST
directory of the FTP site and from the NCBI BLAST pages.
<p><b>ASN.1 Spec Changes for 1997</b>
<p><tt>biblio.asn</tt>
<blockquote><tt>Cit-pat - some fields made optional to allow patent applications
to be legal</tt>
<blockquote><tt>Cit-pat.number OPTIONAL</tt>
<br><tt>Cit-pat.date-issue OPTIONAL</tt></blockquote>
<tt> -- Patent number and date-issue were made optional in 1997 to</tt>
<br><tt> -- support patent applications being issued from the
USPTO</tt>
<br><tt> -- Semantically a Cit-pat must have either a patent
number or</tt>
<br><tt> -- an application number (or both) to be valid</tt>
<br> </blockquote>
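<p>The semantic rule in the comment above (a patent number, an application
number, or both) could be checked along these lines. This is a minimal Python
sketch, not toolkit code: the class is a cut-down stand-in for Cit-pat, and
the attribute name app_number is an assumption.
<pre>
```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CitPat:
    """Only the fields relevant to the rule; the rest of Cit-pat is omitted."""
    number: Optional[str] = None       # Cit-pat.number, now OPTIONAL
    app_number: Optional[str] = None   # application number (name assumed)


def is_valid_cit_pat(cit):
    """True when a patent number, an application number, or both are set."""
    return bool(cit.number) or bool(cit.app_number)
```
</pre>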
<tt>medline.asn</tt>
<blockquote><tt>added ML-field to support other MEDLINE line types</tt></blockquote>
<p><br><tt>Medline-entry ::= SEQUENCE {</tt>
<br><tt> uid INTEGER OPTIONAL , -- MEDLINE UID, sometimes
not yet available if from PubMed</tt>
<br><tt> em Date ,
-- Entry Month</tt>
<br><tt> ... (not shown)</tt>
<br><tt> pmid PubMedId OPTIONAL , -- MEDLINE records may include
the PubMedId</tt>
<br><tt> pub-type SET OF VisibleString OPTIONAL, -- may show
publication types (review, etc)</tt>
<br><tt> mlfield SET OF Medline-field OPTIONAL } -- additional
Medline field types</tt>
<p><tt>Medline-field ::= SEQUENCE {</tt>
<br><tt> type INTEGER { -- Keyed type</tt>
<br><tt> other (0) ,
-- look in line code</tt>
<br><tt> comment (1) , -- comment
line</tt>
<br><tt> erratum (2) } , -- retracted, corrected,
etc</tt>
<br><tt> str VisibleString , -- the text</tt>
<br><tt> ids SEQUENCE OF DocRef OPTIONAL } -- pointers relevant
to this text</tt>
<p><tt>DocRef ::= SEQUENCE { -- reference to a document</tt>
<br><tt> type INTEGER {</tt>
<br><tt> medline (1) ,</tt>
<br><tt> pubmed (2) ,</tt>
<br><tt> ncbigi (3) } ,</tt>
<br><tt> uid INTEGER }</tt>
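<p>For readers who find the ASN.1 easier to follow next to an in-memory
shape, here is a rough Python mirror of Medline-field and DocRef. The enumerated
values are copied from the definitions above; the class names, the attribute
name text (for "str"), and the helper errata are assumptions of this sketch
and do not correspond to the toolkit's C structures.
<pre>
```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List


class FieldType(IntEnum):
    OTHER = 0      # look in line code
    COMMENT = 1    # comment line
    ERRATUM = 2    # retracted, corrected, etc


class DocRefType(IntEnum):
    MEDLINE = 1
    PUBMED = 2
    NCBIGI = 3


@dataclass
class DocRef:
    type: DocRefType
    uid: int


@dataclass
class MedlineField:
    type: FieldType
    text: str                                        # the "str" slot
    ids: List[DocRef] = field(default_factory=list)  # OPTIONAL in the spec


def errata(fields):
    """Return only the fields keyed as erratum."""
    return [f for f in fields if f.type == FieldType.ERRATUM]
```
</pre>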
<br>
<p><tt>seq.asn</tt>
<blockquote><tt>MolInfo.tech - added names for HTG classes already implemented</tt>
<br><tt>Annotdesc.region - added seqloc. If present, all annots in this
SeqAnnot are within this region. Optimization on big seqs.</tt></blockquote>
<p><br><tt>seqfeat.asn</tt>
<blockquote><tt>added OrgMod.specimen-voucher - new organism qualifier</tt>
<br><tt>added OrgMod.old-name - used internally at NCBI</tt>
<br><tt>added BioSource.is-focus - for distinguishing biological focus
of multiple source features.</tt>
<br><tt>added Seq-feat.pseudo so any feature can be flagged explicitly
as belonging to a pseudogene</tt>
<br><tt>added Seq-feat.except-text for an explanation of the exception
when Seq-feat.except is TRUE. Currently this text is in Seq-feat.comment
in backbone records and GBQuals in some other genbank records.</tt>
<br> </blockquote>
<p><br>
<hr WIDTH="100%">
<center><b>Notes from Previous Releases</b>
<br><b><font size=+1>Version 5.0</font></b>
<br>
<hr WIDTH="100%"><b>Summary</b></center>
<p>This release includes a small number of incremental changes in the ASN.1
specification. Most significant is the addition of the PubMedID, a bibliographic
citation identifier similar to a MEDLINE UID. PubMed is a new citation
database being developed at NCBI which is a superset of MEDLINE. It will
be an avenue by which publishers can deposit electronic versions of their
citations and abstracts, allowing timely linking to network Entrez
from the publishers' on-line services. PubMed will route these citations
to MEDLINE and they will appear in MEDLINE (and Entrez) after the usual
MEDLINE indexing. However, for some period of time, such articles will
have only a PubMedID. We would like to switch Entrez over to supporting
PubMedIDs as early as possible. WE STRONGLY ENCOURAGE DEVELOPERS TO RECOMPILE
AND RELINK WITH THIS VERSION OF THE TOOLKIT AS SOON AS POSSIBLE. The changes
in this specification should not cause problems with existing software,
so a simple compile and link should be enough to make you compatible. Details
of ASN.1 specification changes are listed below.
<p>There has been considerable development of the toolkit in other aspects
as well, many of which are embodied in Sequin, the new NCBI direct submission
tool, which is included in the toolkit as well. In the interest of getting
the PubMed changes into the specification and developers' hands promptly,
we have not included much on that aspect of this toolkit at this time.
<p>
<hr WIDTH="100%">
<center><b> Changes in the 1996 NCBI ASN.1 (version 5.0) specification</b></center>
<hr WIDTH="100%">
<br>Once again, there are very few changes to the NCBI ASN.1 specification
this year. The biggest change is the addition of the PubMed ID to support
the new NCBI PubMed database. There are also small additions to the
medline and organism specifications, detailed below. As usual, these
changes are also backward compatible with old data. However, you
should recompile and relink your applications as soon as possible, since
the old applications will not be compatible with the new datatypes.
<p>1) PubMed - NCBI is building a new citation database that is a superset
of MEDLINE and which will be linked to online journals from publishers.
The bibliographic components of the specification have had support for
PubMed IDs added. These include biblio.asn (objbibli.[ch]), pub.asn
(objpub.[ch]), medline.asn (objmedli.[ch]).
<p>2) pub-type - MEDLINE includes strings indicating the type of a publication.
The medline definition has had the attribute pub-type added to support
these strings.
<p>From the 1996 MeSH, here's the list.
<br>
<blockquote><tt>Abstract</tt>
<br><tt>Bibliography</tt>
<br><tt>Classical Article</tt>
<br><tt>Clinical Conference</tt>
<br><tt>Clinical Trial</tt>
<br><tt>Clinical Trial, Phase I</tt>
<br><tt>Clinical Trial, Phase II</tt>
<br><tt>Clinical Trial, Phase III</tt>
<br><tt>Clinical Trial, Phase IV</tt>
<br><tt>Comment</tt>
<br><tt>Consensus Development Conference</tt>
<br><tt>Consensus Development Conference, NIH</tt>
<br><tt>Controlled Clinical Trial</tt>
<br><tt>Corrected and Republished Article</tt>
<br><tt>Current Biog-Obit</tt>
<br><tt>Dictionary</tt>
<br><tt>Directory</tt>
<br><tt>Duplicate Publication</tt>
<br><tt>Editorial</tt>
<br><tt>Festschrift</tt>
<br><tt>Guideline</tt>
<br><tt>Historical Article</tt>
<br><tt>Historical Biography</tt>
<br><tt>Interview</tt>
<br><tt>Journal Article</tt>
<br><tt>Legal Brief</tt>
<br><tt>Letter</tt>
<br><tt>Meeting Report</tt>
<br><tt>Meta-Analysis</tt>
<br><tt>Monograph</tt>
<br><tt>Multicenter Study</tt>
<br><tt>News</tt>
<br><tt>Newspaper Article</tt>
<br><tt>Overall</tt>
<br><tt>Periodical Index</tt>
<br><tt>Practice Guideline</tt>
<br><tt>Published Erratum</tt>
<br><tt>Randomized Controlled Trial</tt>
<br><tt>Retracted Publication</tt>
<br><tt>Retraction of Publication</tt>
<br><tt>Review</tt>
<br><tt>Review Literature</tt>
<br><tt>Review of Reported Cases</tt>
<br><tt>Review, Academic</tt>
<br><tt>Review, Multicase</tt>
<br><tt>Review, Tutorial</tt>
<br><tt>Scientific Integrity Review</tt>
<br><tt>Technical Report</tt>
<br><tt>Twin Study</tt></blockquote>
<p><br>3) virion - the attribute virion has been added to BioSource.genome.
It just complements proviral which was already there. This will map
to a /virion qualifier in the new GenBank feature table definition.
<p>4) division - OrgName.div now (optionally) can contain the GenBank division
code (eg. PRI).
<p>5) signal-peptide, transit-peptide - were added to Prot-ref, to support
annotation of protein features on the protein sequence in a way that could
be mapped to a GenBank feature table.
<p>That's all. Relevant sections of the asn.1 specification are shown below.
<br>
<hr WIDTH="100%">
<br><tt>biblio.asn</tt>
<p><tt>PubMedId ::= INTEGER -- Id from the PubMed database at NCBI</tt>
<p><tt>and..</tt>
<br>
<p><tt>Cit-gen ::= SEQUENCE {
-- NOT from ANSI, this is a catchall</tt>
<br><tt> cit VisibleString OPTIONAL , -- anything, not parsable</tt>
<br><tt> authors Auth-list OPTIONAL ,</tt>
<br><tt> muid INTEGER OPTIONAL , --
medline uid</tt>
<br><tt> journal Title OPTIONAL ,</tt>
<br><tt> volume VisibleString OPTIONAL ,</tt>
<br><tt> issue VisibleString OPTIONAL ,</tt>
<br><tt> pages VisibleString OPTIONAL ,</tt>
<br><tt> date Date OPTIONAL ,</tt>
<br><tt> serial-number INTEGER OPTIONAL , -- for GenBank style
references</tt>
<br><tt> title VisibleString OPTIONAL , -- eg.
cit="unpublished",title="title"</tt>
<br><tt> pmid PubMedId OPTIONAL }
-- PubMed Id</tt>
<br> <tt></tt>
<p><tt>pub.asn</tt>
<p><tt>Pub ::= CHOICE {</tt>
<br><tt> gen Cit-gen ,
-- general or generic unparsed</tt>
<br><tt> sub Cit-sub ,
-- submission</tt>
<br><tt> medline Medline-entry ,</tt>
<br><tt> muid INTEGER ,
-- medline uid</tt>
<br><tt> article Cit-art ,</tt>
<br><tt> journal Cit-jour ,</tt>
<br><tt> book Cit-book ,</tt>
<br><tt> proc Cit-proc , -- proceedings
of a meeting</tt>
<br><tt> patent Cit-pat ,</tt>
<br><tt> pat-id Id-pat , -- identify
a patent</tt>
<br><tt> man Cit-let ,
-- manuscript, thesis, or letter</tt>
<br><tt> equiv Pub-equiv, -- to cite
a variety of ways</tt>
<br><tt> pmid PubMedId } -- PubMedId</tt>
<p><tt>medline.asn</tt>
<p><tt>
-- a MEDLINE or PubMed entry</tt>
<br><tt>Medline-entry ::= SEQUENCE {</tt>
<br><tt> uid INTEGER OPTIONAL , -- MEDLINE UID,
sometimes not yet available if from PubMed</tt>
<br><tt> em Date ,
-- Entry Month</tt>
<br><tt> cit Cit-art ,
-- article citation</tt>
<br><tt> abstract VisibleString OPTIONAL ,</tt>
<br><tt> mesh SET OF Medline-mesh OPTIONAL ,</tt>
<br><tt> substance SET OF Medline-rn OPTIONAL ,</tt>
<br><tt> xref SET OF Medline-si OPTIONAL ,</tt>
<br><tt> idnum SET OF VisibleString OPTIONAL ,
-- ID Number (grants, contracts)</tt>
<br><tt> gene SET OF VisibleString OPTIONAL ,</tt>
<br><tt> pmid PubMedId OPTIONAL ,
-- MEDLINE records may include the PubMedId</tt>
<br><tt> pub-type SET OF VisibleString OPTIONAL } -- may show publication
types (review, etc)</tt>
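<p>Because both uid and pmid are OPTIONAL, code that needs one identifier
per record has to handle either being absent. The Python sketch below shows
one way to do that, together with a pub-type check. The class is a small
subset of Medline-entry, the function names are hypothetical, and preferring
the PubMed ID when both are present is a choice made for this example only.
<pre>
```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MedlineEntry:
    """Subset of Medline-entry: only the fields this sketch needs."""
    uid: Optional[int] = None     # MEDLINE UID, may be absent
    pmid: Optional[int] = None    # PubMedId, may be absent
    pub_type: List[str] = field(default_factory=list)


def citation_id(entry):
    """Return ('pmid', n) when a PubMed ID exists, else ('muid', n).

    Returns None when the record carries neither identifier.
    """
    if entry.pmid is not None:
        return ("pmid", entry.pmid)
    if entry.uid is not None:
        return ("muid", entry.uid)
    return None


def has_pub_type(entry, wanted):
    """Case-insensitive exact match against the record's pub-type strings."""
    key = wanted.strip().lower()
    return any(p.strip().lower() == key for p in entry.pub_type)
```
</pre>
<p>The tags "pmid" and "muid" follow the names used in the Pub CHOICE above.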
<p><tt>seqfeat.asn</tt>
<p><tt>OrgName ::= SEQUENCE {</tt>
<br><tt> name CHOICE {</tt>
<br><tt> binomial BinomialOrgName ,
-- genus/species type name</tt>
<br><tt> virus VisibleString ,
-- virus names are different</tt>
<br><tt> hybrid MultiOrgName ,
-- hybrid between organisms</tt>
<br><tt> namedhybrid BinomialOrgName , --
some hybrids have genus x species name</tt>
<br><tt> partial PartialOrgName } OPTIONAL
, -- when genus not known</tt>
<br><tt> attrib VisibleString OPTIONAL , -- attribution
of name</tt>
<br><tt> mod SEQUENCE OF OrgMod OPTIONAL ,</tt>
<br><tt> lineage VisibleString OPTIONAL , -- lineage with semicolon
separators</tt>
<br><tt> gcode INTEGER OPTIONAL ,
-- genetic code (see CdRegion)</tt>
<br><tt> mgcode INTEGER OPTIONAL ,
-- mitochondrial genetic code</tt>
<br><tt> div VisibleString OPTIONAL }
-- GenBank division code</tt>
<p><tt>BioSource ::= SEQUENCE {</tt>
<br><tt> genome INTEGER { -- biological context</tt>
<br><tt> unknown (0) ,</tt>
<br><tt> genomic (1) ,</tt>
<br><tt> chloroplast (2) ,</tt>
<br><tt> chromoplast (3) ,</tt>
<br><tt> kinetoplast (4) ,</tt>
<br><tt> mitochondrion (5) ,</tt>
<br><tt> plastid (6) ,</tt>
<br><tt> macronuclear (7) ,</tt>
<br><tt> extrachrom (8) ,</tt>
<br><tt> plasmid (9) ,</tt>
<br><tt> transposon (10) ,</tt>
<br><tt> insertion-seq (11) ,</tt>
<br><tt> cyanelle (12) ,</tt>
<br><tt> proviral (13) ,</tt>
<br><tt> virion (14) } DEFAULT unknown ,</tt>
<br><tt> origin INTEGER {</tt>
<br><tt> unknown (0) ,</tt>
<br><tt> natural (1) ,
-- normal biological entity</tt>
<br><tt> natmut (2) ,
-- naturally occurring mutant</tt>
<br><tt> mut (3) ,
-- artificially mutagenized</tt>
<br><tt> artificial (4) ,
-- artificially engineered</tt>
<br><tt> synthetic (5) ,
-- purely synthetic</tt>
<br><tt> other (255) } DEFAULT unknown ,</tt>
<br><tt> org Org-ref ,</tt>
<br><tt> subtype SEQUENCE OF SubSource OPTIONAL }</tt>
<p><tt>Prot-ref ::= SEQUENCE {</tt>
<br><tt> name SET OF VisibleString OPTIONAL , -- protein name</tt>
<br><tt> desc VisibleString OPTIONAL ,
-- description (instead of name)</tt>
<br><tt> ec SET OF VisibleString OPTIONAL , --
E.C. number(s)</tt>
<br><tt> activity SET OF VisibleString OPTIONAL , -- activities</tt>
<br><tt> db SET OF Dbtag OPTIONAL ,
-- ids in other dbases</tt>
<br><tt> processed ENUMERATED {
-- processing status</tt>
<br><tt> not-set (0) ,</tt>
<br><tt> preprotein (1) ,</tt>
<br><tt> mature (2) ,</tt>
<br><tt> signal-peptide (3) ,</tt>
<br><tt> transit-peptide (4) } DEFAULT not-set
}</tt>
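<p>The effect of an added enumerated value such as virion (14) can be seen in
a small Python model of BioSource.genome. The values are copied from the
specification above; the class and function names are assumptions of this
sketch. A decoder that knows only the listed values rejects anything else,
which is one way to picture why older programs need to be rebuilt against
the new definitions.
<pre>
```python
from enum import IntEnum


class Genome(IntEnum):
    UNKNOWN = 0
    GENOMIC = 1
    CHLOROPLAST = 2
    CHROMOPLAST = 3
    KINETOPLAST = 4
    MITOCHONDRION = 5
    PLASTID = 6
    MACRONUCLEAR = 7
    EXTRACHROM = 8
    PLASMID = 9
    TRANSPOSON = 10
    INSERTION_SEQ = 11
    CYANELLE = 12
    PROVIRAL = 13
    VIRION = 14


def genome_from_value(value=None):
    """Decode BioSource.genome; an absent value takes DEFAULT unknown.

    Raises ValueError for a number outside the enumerated set.
    """
    if value is None:
        return Genome.UNKNOWN
    return Genome(value)
```
</pre>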
<p>
<hr WIDTH="100%">
<center><b>Notes from Previous Releases</b>
<br><b><font size=+1>New Functions in Version 4.0</font></b></center>
<hr WIDTH="100%">
<br>There are a host of new functions in this release, but as usual we
have not managed to make time to document them all. Large parts of Sequin
are present which will be announced and described more fully in the fall.
However, specific tools of immediate interest are:
<p>blast2 - this is the long awaited BLAST client/server which permits
structured interaction with BLAST over the internet. We have provided a
basic client that produces the traditional blast output. In addition, the
function call interface can be used in more elaborate clients. For more
information contact Tom Madden, <a href="mailto:[email protected]">[email protected]</a>
<p>WARNING!!! blast2 is the client we plan to support on the longer term.
The blast1 client we included for those of you who wanted a head start
will NOT be supported in future. Please shift any blast1 clients to the
(very similar) blast2 interface as soon as possible.
<p>sim, sim2 - protein and DNA sequence alignments in linear space. This
is the function call interface to these valuable tools. Applications have
been written which are available by ftp as are published papers. For more
information contact Jinghui Zhang, <a href="mailto:[email protected]">[email protected]</a>
<br>
<br>
<p><b>Changes in ASN.1 spec 4.0 from 3.0</b>
<p>Affil - biblio.asn
<br>added the field "postal-code" for Zip code finally.
<p>Contact-info - submit.asn
<br>added the field "contact" which is type "Author". The contact info
has evolved into a fully structured form, so I just took Author which has
structured names and structured address (Affil). We will eventually phase
out all the less structured ones in Contact-info.
<p>OrgName - seqfeat.asn
<br>added "lineage", "gcode", "mgcode" for the lineage, genetic code, and
mitochondrial genetic code. This is part of Org-ref, and consolidates all
the organism info (except original SOURCE line) out of the GenBank block...
and enables us to deliver it nicely from Taxon.
<p>Seq-descr - seq.asn
<br>removed the Seq-descr "neighbors" and replaced it with "dbxref", since
neighbors has never been used. This is used to add cross-references to
the whole entry.
<p>Pubdesc - seq.asn
<br>has an added slot, "reftype" which is an integer and is used to indicate
the GenBank usage of a reference.
<p>0 - seq - applies to the sequence. This is the default and the way it is
used now.
<br>1 - sites - applies to (unspecified) features. Equivalent to a GenBank
SITES feature. We could switch to this from using the Imp-feat we do now.
<br>2 - feats - applies to specific features. The idea here is to provide
a place for the full citation, so features need only reference it. If no
features reference it, it should be removed. This would work for checking content
when only a part of a sequence is copied or pasted. A "sites" ref could
not have this check since we do not know which features it goes to.
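<p>The check described for "feats" references might look like the following
Python sketch. The three reftype values come from the list above; the function
name and the shapes of its arguments (pairs of id and reftype, and a set of
cited ids) are assumptions made for illustration.
<pre>
```python
from enum import IntEnum


class RefType(IntEnum):
    SEQ = 0      # applies to the sequence (default)
    SITES = 1    # applies to unspecified features
    FEATS = 2    # applies to specific features


def prune_unreferenced(pubs, cited_ids):
    """Drop 'feats' references that no feature cites.

    pubs is a list of (pub_id, reftype) pairs and cited_ids is the set
    of pub ids that some feature references. 'seq' and 'sites' entries
    are always kept, since they cannot be checked this way.
    """
    kept = []
    for pub_id, reftype in pubs:
        if reftype == RefType.FEATS and pub_id not in cited_ids:
            continue
        kept.append((pub_id, reftype))
    return kept
```
</pre>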
<p>Seq-feat - seqfeat.asn
<br>added a slot called "dbxref" to Seq-feat. This is a SET OF Dbtag. It
will be for adding the new db_xref qualifiers to features. We already have
some of these in the xref slots of Gene-ref, Prot-ref, Org-ref. It means
we have to check two places in these cases. I do not want to retire the
slots since these were meant to be used in other contexts besides features..
and Org-ref already is.
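<p>Checking two places for cross-references can be pictured with a short
Python sketch. Plain dictionaries stand in for Seq-feat and for the Gene-ref,
Prot-ref, or Org-ref it carries; the key names and the function name are
hypothetical, and the sample values used to exercise it would be placeholders.
<pre>
```python
def collect_dbxrefs(feature):
    """Gather (db, tag) pairs from the feature and from its data block.

    feature is a dict with an optional 'dbxref' list (the new Seq-feat
    slot) and an optional 'data' dict with its own 'db' list (the older
    xref slot). Duplicates are reported once, in first-seen order.
    """
    found = []
    for pair in feature.get("dbxref", []):
        if pair not in found:
            found.append(pair)
    for pair in feature.get("data", {}).get("db", []):
        if pair not in found:
            found.append(pair)
    return found
```
</pre>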
<p>added a slot called "anticodon" to the tRNA extension of the RNA feature.
This is a Seq-loc that points to the location of the anticodon in a tRNA.
We have been populating this data in a User-object, and will have to do
a retro to convert it.
<p><b>EXPORTED Genetic-code</b>
<p>Seq-align - seqalign.asn
<br>added "bounds" to Seq-align so you can record the regions over which
an alignment was computed.. not always included in the resulting alignment
itself.
<p>added two new types:
<br> A) Packed-seg -- a denser representation from Colombe and Jinghui
<br> B) disc - discontinuous alignments as a SEQUENCE OF Seq-align
<p>Seq-annot - seq.asn
<p>added a field to Seq-annot, Align-def, to discriminate types of alignment
sets. This has the advantage of minimal changes as well as separating sets
of alignments from conceptually single alignments. I am not sure it is
necessary to distinguish "alt" from "blocks" though. Also it means you
can attach more info, with other Seq-annot fields and/or by expanding the
Align-def. I put in "ids" in Align-def specifically to put the one Seq-id
that is the "master" for type "ref". I made it a SET OF so we could use
it for other collections where we might want to list more than one.
<p>added "ids" and "locs" as allowed types within Seq-annot. This would
enable us to pass lists like this around between tools with all the additional
descriptive information in Annotdesc. I know this will be useful.
<p>added "general" to Annot-id for tracking 3rd party annotations.
<br>
<br>
<br>
<br>
<br>
<center>
<p><b><font size=+1>INTRODUCTION</font></b></center>
<p>This distribution is release 5.0 of the NCBI core library for building
portable software, and AsnLib, a collection of routines for handling ASN.1
data and developing ASN.1 software applications. AsnLib and the asntool
application are built using the CoreLib routines. In the ./doc directory
is an MS Word file which details the information given below. It is also
available as hardcopy. See the README in ./doc.
<p>The lowest layer of code is the CoreLib. These are multiplatform
functions for memory allocation (including byte stores), string manipulation,
file input and output, error and general messages, and time and date notification.
These functions have been written only where we found that the existing
ANSI functions were not sufficiently multi-platform or well-behaved among
all of the platforms that we support. For each platform (a combination
of processor, operating system, compiler, and windowing system), we supply
a specific ncbilcl.h file, which contains typedefs and defines for multi-platform
symbols, and includes a number of standard header files. (For example,
ncbilcl.msw is used for the Microsoft C compiler under Microsoft Windows
on the PC.)
<br>Use of these symbols, and of the functions in the CoreLib, allows us
to write multi-platform source code for a variety of disparate platforms.
<p>The next layer of code is the AsnLib stream reader. This is used
in conjunction with a header file and a parse table loader file, both of
which are produced by processing the formal ASN.1 specification with the
AsnTool application. The symbolic defines in the header file are pointers
into the parse table, in which the ASN.1 specification is represented.
To read at the stream reader level, a program alternates between calls
to AsnReadId and AsnReadVal. AsnReadId returns a pointer into the parse
table, which can be compared against the defines in the AsnTool-generated
header. For example, in the specification for MEDLINE records, the
Medline-entry section has an item called "uid", for the unique ID of the
record. This is symbolized in the header file as MEDLINE_ENTRY_uid.
When AsnReadId returns this symbol, the program calls AsnReadVal to obtain
the uid for that record. AsnKillValue is also needed to free any memory
allocated by AsnReadVal, which occurs when the value is a string and not
an integer. The entire set of records on the Entrez CD-ROM can be
read as a single stream with the AsnLib functions.
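<p>The alternating read pattern described above can be modelled with a toy
stream in Python. This is not the AsnLib API: ToyAsnStream, read_id, read_val
and collect_uids are names invented for this sketch, and strings stand in for
the parse-table pointers. It only shows the control flow of reading an
identifier, then its value, and acting when the identifier matches the symbol
of interest.
<pre>
```python
class ToyAsnStream:
    """Stand-in for an open ASN.1 stream; NOT the AsnLib API."""

    def __init__(self, items):
        self._items = list(items)   # (symbol, value) pairs in stream order
        self._pos = 0

    def read_id(self):
        """Return the next symbol, or None at end of stream."""
        if self._pos == len(self._items):
            return None
        return self._items[self._pos][0]

    def read_val(self):
        """Return the value for the symbol just read and advance."""
        value = self._items[self._pos][1]
        self._pos += 1
        return value


# Plays the role of the generated header symbol MEDLINE_ENTRY_uid.
MEDLINE_ENTRY_uid = "Medline-entry.uid"


def collect_uids(stream):
    """Alternate id and value reads, keeping the value of every uid."""
    uids = []
    while True:
        symbol = stream.read_id()
        if symbol is None:
            break
        value = stream.read_val()   # every id read is followed by a value read
        if symbol == MEDLINE_ENTRY_uid:
            uids.append(value)
        # In C, memory allocated for string values would be freed here.
    return uids
```
</pre>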
<p>The ASN.1 records may be accessed at a higher level through the object
loaders, which utilize the stream processing functions to load C memory
structures with the contents of the ASN.1 objects. For each ASN.1 object
we specify, we also define an equivalent C memory structure. The
object loader level of code contains functions to read and write each ASN.1
object. These are hierarchical, as are the ASN.1 specifications.
Calling the top level loader, SeqEntryAsnRead, will load an entire SeqEntry
from an open AsnIo channel, and will return a pointer to the loaded memory
structure. The read function for an AsnIo channel can be swapped
to refer to a normal disk file, a network socket, or to compressed data,
which it automatically decompresses. The object loader code can interconvert
between the highly-branched memory object and a linear ASN.1 message with
complete fidelity. The object loaders have additional functions,
including the ability to explore the structure and notify the program when
particular data elements are encountered. The entire contents of
the Entrez CD-ROM can also be streamed through the object loaders.
However, most calls to the object loaders for simply reading a particular
record are done via the data access functions (see below).
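<p>The idea of a loader that does not care where its bytes come from can be
shown in miniature. In the Python sketch below, gzip stands in for whatever
compression the channel handles, and a trivial "key = value" text format
stands in for ASN.1; open_channel and load_record are hypothetical names, not
toolkit functions.
<pre>
```python
import gzip
import io


def open_channel(raw_bytes):
    """Return a text reader over raw_bytes, decompressing gzip input.

    The caller reads the same way whether or not the source was
    compressed, which is the point of a swappable read function.
    """
    if raw_bytes[:2] == b"\x1f\x8b":   # gzip magic number
        raw_bytes = gzip.decompress(raw_bytes)
    return io.StringIO(raw_bytes.decode("ascii"))


def load_record(channel):
    """Toy 'object loader': parse 'key = value' lines into a dict."""
    record = {}
    for line in channel:
        key, sep, value = line.partition("=")
        if sep:
            record[key.strip()] = value.strip()
    return record
```
</pre>
<p>Loading the plain and the compressed form of the same bytes yields the same
in-memory record.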
<p>The data access functions allow a program to call the object loaders
on a sequence or MEDLINE record given the uid of the record. This will
get the data into memory regardless of whether the data are compressed
on the Entrez CD-ROM or are obtained through a service over the Internet.
This means that a detailed understanding of the files and formats on the
Entrez disc is not needed by application programmers. The function to load
a sequence record, SeqEntryGet, needs the uid to retrieve and a complexity
code parameter. A sequence record is in the form of a NucProt set.
This contains a nucleotide (which may itself be composed of segments) and
all of the proteins it is known to encode. The set of segments is called
a SegSet, and the individual sequences are called BioSeqs. We have
taken the liberty of producing this integrated view, but the complexity
code parameter allows the record to be easily loaded in a simpler, more
traditional form, if desired. The accession number term list is built
to supply the proper uids to support this facility. This access library
is compatible with Entrez release 1.0 or later only.
<p>The sequence utilities and application programmer interface layer allows
exploration of the loaded memory structures and generation of standard
literature or sequence reports from those objects. For example, a
BioSeq can be converted to FASTA or GenBank flat file formats and saved
to a file, and a MEDLINE record can be saved in MEDLARS format, which is
suitable for entry into personal bibliographic database programs.
A sequence port can be opened that gives a simple, linear view of a segmented
sequence, converting alphabets, merging exon segments, and dealing with
information on both strands of the DNA. This layer also includes
some functions to explore the NucProt set. The explore functions
visit each individual BioSeq in the set, calling a callback function for
each sequence node so that a program can examine feature tables and other
information that are associated with the NucProt or SegSets or with the
individual sequences.
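<p>The callback style of the explore functions can be illustrated with a small, self-contained sketch. The Entry type, the explore function and the count_seq callback are hypothetical names invented for this example; the toolkit's own structures and explore functions are declared in its headers and differ from what is shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type; NOT the toolkit's own structure. */
typedef struct Entry {
    const char *name;               /* label for a sequence or a set */
    int is_set;                     /* 1 for a set, 0 for a single sequence */
    const struct Entry *children;   /* members, when is_set is 1 */
    size_t nchildren;
} Entry;

typedef void (*ExploreFn)(const Entry *seq, int depth, void *userdata);

/* Visit every sequence (leaf) under root, depth first, in stored order. */
void explore(const Entry *root, int depth, ExploreFn cb, void *userdata)
{
    size_t i;
    assert(root != NULL && cb != NULL);
    if (!root->is_set) {
        cb(root, depth, userdata);   /* one call per individual sequence */
        return;
    }
    for (i = 0; i < root->nchildren; i++)
        explore(&root->children[i], depth + 1, cb, userdata);
}

/* Example callback: count the sequences seen. */
void count_seq(const Entry *seq, int depth, void *userdata)
{
    (void)seq;
    (void)depth;
    ++*(size_t *)userdata;
}
```

<p>For a set that holds a segmented nucleotide with two segments plus one protein, calling explore with count_seq and a counter initialized to zero leaves the counter at 3. The userdata pointer is how a callback carries its own state without global variables.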
<p>Vibrant is a multi-platform user interface development library that
runs on the Macintosh, Microsoft Windows on the PC, or X11 and OSF/Motif
on UNIX and VAX computers [separate documentation]. It is used to
build the graphical interface for the Entrez application (whose source
code is in the browser directory). The philosophy behind Vibrant is that
everything in the published user interface guidelines (the generic behavior
of windows, menus, buttons, etc.), as well as positioning and sizing of
graphical control objects, is taken care of automatically. The program
provides callback functions that are notified when the user has manipulated
an object. Vibrant and Entrez code are not supported, but are provided
on an as-is basis.
<p>The advantage of using AsnLib and the object loaders, as they are implemented,
is that when the ASN.1 specifications are extended, application program developers
merely need to recompile their programs with the new (AsnTool-generated) header
files and load the new parse tables (included with the Entrez software) in order
to be able to read the new data. This process is straightforward, and will not break existing
program code. The application is free to ignore new fields if it
does not choose to take advantage of the new kinds of information.
<p>When developing new ASN.1 specifications, as of June 1994 it is possible
to automatically generate the object loaders and header files for those
specifications, using the AsnCode utility. For some complex ASN.1
specifications, however, AsnCode may fail to generate the correct source
code.
<p>The documentation is currently being brought up to date. The programs
in the demo directory are designed to teach the proper use of many of the
functions discussed above. Many of these programs are not yet documented.
The simplest is testcore.c, which tests various functions in the CoreLib.
The most complex is getfeat.c, which takes an accession number or locus
name, determines the unique seq ID, retrieves the entry from the Entrez
CD-ROM using the data access library, locates all coding region features
using the explore functions, and prints the DNA sequences of all exons
using sequence port functions. If you cannot extract and print the
doc.tar.Z file, please send an email message with your land mailing address
and phone number to <a href="mailto:[email protected]">[email protected]</a>,
and we will mail a copy to you.
<p>The contents of the ncbi directory (the highest level, containing the
NCBI Software Development Kit source code in several subdirectories) are
shown below. The readme file contains instructions on copying the
appropriate make files to be built in the build directory. The makeall file
copies headers to the include directory and builds four libraries (ncbi, ncbiobj,
ncbicdr and vibrant), copying them to the lib directory. The makedemo file
builds the demo programs and the Entrez application:
<br>
<ul>
<li>
api Application Programmer Interface, Sequence Utilities</li>
<li>
asn ASN.1 specifications for publications and sequences</li>
<li>
asnlib Source code for AsnLib and asntool</li>
<li>
asnload AsnLib headers and dynamic parse tables (Mac and PC)</li>
<li>
asnstat AsnLib headers that use static memory (UNIX and VMS)</li>
<li>
bin Asntool executable copied here</li>
<li>
biostruc Source code for Molecular Modelling DataBase functions</li>
<li>
browser Source code for Entrez application</li>
<li>
build Empty directory for building tools and libraries</li>
<li>
cdromlib Access routines for data on the Entrez CD-ROM</li>
<li>
cn3d Source code for Vibrant-based 3D structure viewer</li>
<li>
config Configuration files for NCBI software:</li>
<ul>
<li>
dos</li>
<li>
mac</li>
<li>
unix</li>
<li>
vms</li>
<li>
win</li>
</ul>
<li>
corelib Source code for NCBI Core Software Library</li>
<li>
data Data files used for sequence conversion</li>
<li>
demo AsnLib and sequence utility demonstration programs</li>
<li>
desktop Source code for Vibrant-based viewers and editors</li>
<li>
doc Documentation in Microsoft Word file</li>
<li>
include Include files required by applications are copied here</li>
<li>
lib Libraries copied here</li>
<li>
link Contains several subdirectories with build accessory files:</li>
<ul>
<li>
macmet Macintosh Metrowerks/CodeWarrior</li>
<li>
macmpw Macintosh MPW C</li>
<li>
msdos Microsoft C and Borland C for DOS</li>
<li>
mswin Microsoft C and Borland C for Windows</li>
</ul>
<li>
make Make files for various systems</li>
<li>
network Network version of data access</li>
<ul>
<li>
apple</li>
<li>
blast2</li>
<li>
encrypt</li>
<li>
entrez</li>
<li>
netmanag</li>
<li>
nsclilib</li>
</ul>
<li>
object Functions for reading and writing complex objects</li>
<li>
sequin Source code for Sequin application</li>
<li>
tools Source code for alignment and other contributed utilities</li>
<li>
readme File that contains important building instructions</li>
<li>
vibrant Source code for Vibrant portable interface package</li>
</ul>
<p><br>The platforms that are supported (as indicated by the suffix on
the relevant ncbilcl.h file) are shown below. Those marked with an asterisk
(*) are available as-is:
<p>370* IBM 370
<br>acc SUN acc compiler
<br>alf DEC Alpha under OSF/1
<br>aov DEC Alpha under AXP/OpenVMS
<br>aux* Macintosh A/UX
<br>bor Borland for DOS
<br>bwn Borland for Microsoft Windows
<br>ccr CenterLine CodeCenter
<br>cpp SUN C++
<br>cra* Cray
<br>cvx* Convex
<br>gcc Gnu gcc (under SunOS, not Solaris)
<br>hp*  Hewlett Packard
<br>lna* Linux on DEC Alpha
<br>lnx Linux (Red Hat Linux release 5.2 with kernel 2.0.36)
<br>met Macintosh Metrowerks compiler
<br>mpw Macintosh Programmer's Workshop
<br>msc Microsoft C for DOS
<br>msw Microsoft for Windows
<br>nxt* NeXT
<br>r6k* IBM RS 6000
<br>scr CodeCenter under Sun Solaris
<br>sgi Silicon Graphics
<br>sin Sun Solaris on Intel processors
<br>sol Sun Solaris (for cc and gcc)
<br>thc THINK C on Macintosh
<br>ult DEC ULTRIX
<br>vms DEC VAX/VMS
<p>Questions or comments can be directed to <a href="mailto:[email protected]">[email protected]</a>.
<p><b>ANSI C:</b>
<p>This software requires an ANSI C compiler. This is no problem except
on Sun machines, where the bundled C compiler, cc, is non-ANSI. However,
you can use the Sun unbundled compiler, acc, or the Gnu compiler, gcc (which
is free), and either works just fine. If you have written applications on
the Sun with non-ANSI functions, the ANSI compilers will complain. See the
notes below if this is a problem.
<center>
<p><b><font size=+1>INSTALLATION</font></b></center>
<p>To build the NCBI toolkit you need to look for platform-dependent instructions:
<br>For UNIX:
<br> look at the file make/readme.unx
<br>For Mac:
<br> look at the file make/readme.mac
<br>For Microsoft Windows95/98/NT:
<br> look at the file make/readme.dos
<p>The file doc/FAQ.txt contains further information that may be useful
<br>when building the NCBI toolkit.
<p><b>ALL</b>
<br> change to the directory above the ncbi subdirectory
<p><b>Unix</b>
<br> tested on Sun Sparc (Solaris 2.6, SunOS 4.1.3),
<br> Silicon Graphics IRIX 5.* and 6.*, DEC Alpha with OSF/1 V4.0,
<br> Linux (Red Hat Linux release 5.2 with kernel 2.0.33) on Intel,
<br> Sun Solaris for Intel (Solaris 2.7).
<p> Run the script ncbi/make/makedis.csh, saving its output to a
<br> separate file:
<br> for sh or bash:
<blockquote><tt>ncbi/make/makedis.csh 2>&1 | tee out.makedis.csh</tt></blockquote>
for csh or tcsh:
<blockquote><tt>ncbi/make/makedis.csh |& tee out.makedis.csh</tt></blockquote>
If that script gives you an error like this:
<blockquote><tt>Your platform is not supported.</tt>
<br><tt>To port ncbi toolkit to your platform consult</tt>
<br><tt>the files platform/*.ncbi.mk</tt></blockquote>
then you should check the script ncbi/make/makedis.csh and
<br> add the proper platform-dependent ncbi.mk file to the ncbi/platform
<br> directory.
<p> Other UNIX (AIX, ULTRIX, NeXT, Sun acc):
<br> Follow the models above. Read the headers in makeall.unx and makedemo.unx
<br> for details.
<p> For all UNIX, edit .ncbirc as described in section "CONFIGURATION OR SETTINGS FILES".
<br> Optionally, edit .login to add "setenv NCBI [path to .ncbirc file]".
<br>
<p><b>MS-DOS</b>
<br>(Also see NEW MAKEFILES, below)
<br><u>Microsoft C version 7.00</u>
<blockquote><tt>copy ..\make\*.dos</tt>
<br><tt>ren makeall.dos makefile</tt>
<br><tt>nmake MSC=1 [note: nmake requires windows or DPMI]</tt>
<br><tt>copy ..\config\ncbi.dos ncbi.cfg</tt></blockquote>
check paths in ncbi.cfg file [see section on CONFIGURATION]
<p>Optional:
<br>edit AUTOEXEC.BAT with "set NCBI=[path to directory containing ncbi.cfg]".
<br>reboot to activate
<p> To make demo programs:
<blockquote><tt>nmake -f makedemo.dos MSC=1</tt></blockquote>
<u>Microsoft Windows version 7.00</u>
<blockquote><tt>copy ..\make\*.dos</tt>
<br><tt>ren makeall.dos makefile</tt>
<br><tt>nmake MSW=1 [note: nmake requires windows or DPMI]</tt></blockquote>
check paths in "ncbi.ini" as above
<br> copy ncbi.ini to your windows directory
<br> To make demos:
<blockquote><tt>nmake -f makedemo.dos MSW=1</tt></blockquote>
<u>Borland C++ 3.1</u>
<blockquote><tt>copy ..\make\*.dos</tt>
<br><tt>ren makeall.dos makefile</tt>
<br><tt>make -DBOR</tt></blockquote>
then set paths as in Microsoft C, above.
<p>To make demos:
<blockquote><tt>make -f makedemo.dos -DBOR</tt></blockquote>
<p><br><u>Borland C++ 3.1 for Windows</u>
<blockquote><tt>copy ..\make\*.dos</tt>
<br><tt>ren makeall.dos makefile</tt>
<br><tt>make -DBWN</tt></blockquote>
then set paths as in Microsoft Windows, above.
<br>To make demos:
<blockquote><tt>make -f makedemo.dos -DBWN</tt></blockquote>
<p><br><b>Mac</b>
<p>tested on <u>CodeWarrior IDE 2.1, MacOS 8.0</u>
<p><u>All</u>
<blockquote>copy <b>config:mac:ncbi.cnf </b>to your System Folder, or to
the <b>System Folder:Preferences</b> subfolder
<br>edit the "<b>ASNLOAD</b>" line in <i>"ncbi.cnf" </i>to point to the
<b>ncbi:asnload</b> directory in this release
<br>edit the "<b>DATA</b>" line to point to the <b>ncbi:data</b> directory
<br> </blockquote>
<u>CodeWarrior</u>
<blockquote>raise Preferred Size of Script Editor from 700 to 3000, and
raise Preferred Size of CodeWarrior IDE 2.1 by 2000 (e.g., from 8206 to
10206), using Get Info from the Finder.
<br>to compile for MC680x0 platform (default is PowerPC), change property
MASTER from "PPC" to "68K".
<br>run copyhdrs.met
<br>run makeall.met
<br>run makenet.met
<br>run makedemo.met</blockquote>
<u>Think C</u> - no longer supported
<br><u>MPW C</u> - no longer supported
<br>
<p><b>VMS</b>
<p><u>Changes to VMS make file naming conventions:</u>
<p> The old .dcl suffix (last character is a lower case L) was changed
to .dc1 (last character is the numeral 1) to allow for different make files
for DecWindows 1.1 and DecWindows 1.2. Several new .dc2 files were
contributed by David Mathog of CalTech. A synopsis of his additional
instructions:
<p> VAX C, DecWindows 1.1: use .dc1 files.
<br> DEC C, DecWindows 1.1: use .dc1 files, but change cc to cc/standard=vaxc.
<br> VAX C, DecWindows 1.2: this combination has not been tested.
<br> DEC C, DecWindows 1.2: use .dc2 files.
<p><u>VMS (without Vibrant) on VAX</u>
<br><tt> $set def [ncbi.build]</tt>
<br><tt> $copy [-.make]*.dc1 *.com</tt>
<br><tt> $@makeall</tt>
<p> check ncbi.cfg as described in section "CONFIGURATION OR SETTINGS
FILES".
<br> edit LOGIN.COM to "define NCBI [path to ncbi.cfg file]"
<p> To make demos:
<br><tt> $@makedemo</tt>
<p><u>VMS (with Vibrant) on VAX</u>
<br><tt> $set def [ncbi.build]</tt>
<br><tt> $copy [-.make]*.dc1 *.com</tt>
<br><tt> $@viball</tt>
<p> check ncbi.cfg as described in section "CONFIGURATION OR SETTINGS
FILES".
<br> edit LOGIN.COM to "define NCBI [path to ncbi.cfg file]"
<p> To make demos:
<br><tt> $@vibdemo</tt>
<p><b>Testing</b>
<p><u>VMS</u> only: look in rundemo.dc1 in [make] to see how to give
command line arguments. Not all demo programs are shown. Run at least testcore.
<p><u>All</u> else:
<br>The <b>build</b> directory should contain a program called <b>testcore</b>.
Type "<tt>testcore -</tt>" and it should show you some default arguments.
Type "<b>testcore</b>" and it will run through a variety of functions in
CoreLib, prompting you for responses along the way. It should run
without a crash or error report. If you made Vibrant versions, all demos
will have startup dialog boxes. If not, they take command line arguments.
<p>If testcore runs, read the documentation for CoreLib and for AsnLib.
The AsnLib documentation contains instructions for running asntool itself
and for running a few of the demo programs. There are a large number
of demo programs now (including Entrez itself, if you made the Vibrant
versions).
<br>
<br>
<br>
<br>
<center>
<p><b><font size=+1>CONFIGURATION OR SETTINGS FILES</font></b></center>
<p>One of the fundamental problems in writing portable software concerns
configuration issues. Each individual user's computer will have its
own particular hardware and software environment, and each machine will
have its disk file hierarchy set up in a unique manner. A program
that needs accessory information, such as help files, parse tables, or