Postgress database query question #634

genomicsguy · 2024-08-08T12:14:17Z

Could you provide an example query for how you create the GFF or GTF download for the human annotations.

Please could you also confirm that https://rnacentral.org/static/img/rnacentral_latest_schema.png is the latest schema. For example, the schema shows "RnaPrecomputed" as the main table, yet the table available through the postgress interface is called "rnc_rna_precomputed".

Thank you!

genomicsguy · 2024-08-09T09:14:04Z

Here another example. You have two tables "rnc_sequence_regions_active_mapped" and "rnc_sequence_regions_active_provided" within the rnacen schema. I am assuming one or both of these is related to "SequenceRegionActive" in the online schema, but it is not clear which, and the tables contain a column not listed online, "providing_databases."

blakesweeney · 2024-08-12T08:48:17Z

Hi, the query we use is:

SELECT
  json_build_object(
      'assembly_id', :'assembly_id',
      'region_id', max(regions.region_name),
      'rna_id', max(pre.id),
      'description', max(pre.short_description),
      'rna_type',  max(pre.rna_type),
      'databases', regexp_split_to_array(max(pre."databases"), ','),
      'providing_databases', array_agg(ac.database),
      'chromosome', max(regions.chromosome),
      'strand', max(regions.strand),
      'identity', max(regions.identity),
      'was_mapped', bool_or(regions.was_mapped),
      'exons', array_agg(distinct exons.*)
  )
FROM rnc_rna_precomputed pre
JOIN rnc_sequence_regions_active regions
ON
  regions.urs_taxid = pre.id
JOIN rnc_sequence_exons exons
ON
  exons.region_id = regions.id
join rnc_accession_sequence_region sra
	on sra.region_id = regions.id
join rnc_accessions ac
	on sra.accession = ac.accession
WHERE
  pre.is_active = true
  AND regions.assembly_id = :'assembly_id'
GROUP BY regions.id
ORDER BY max(regions.chromosome), max(regions.region_start), regions.id

The result of this query is turned into GFF/BED as needed. In the database we have a table of provided coordinates, and of mapped coordinates. We combine these to produce the final coordinates. Note that the table name in the schema diagram follows python naming conventions, while the table names do not. We probably should update the diagram to indicate table names, but they are generally a simple modification of the table names.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Postgress database query question #634

Postgress database query question #634

genomicsguy commented Aug 8, 2024

genomicsguy commented Aug 9, 2024

blakesweeney commented Aug 12, 2024

Postgress database query question #634

Postgress database query question #634

Comments

genomicsguy commented Aug 8, 2024

genomicsguy commented Aug 9, 2024

blakesweeney commented Aug 12, 2024