Open Canada Solr Search (OCS) is a Django 3.x application that uses Solr 8.x to provide a customizable search interface for the Open Canada data catalog and the proactive disclosure data.

Open Canada Search 2 (OCS2)

About

Open Canada Search (OCS2) is a Django 3.x application that uses Solr 8.x to provide a customizable search interface for the Open Canada data catalog and the proactive disclosure data. OCS provides a standard customizable web interface with a focus on searching Solr cores.

Installing OCS from Source

System Requirements

OCS is built with the Django 4.x framework and can run in any environment capable of supporting Django 4.x, which is built with Python 3. Python 3.9 or higher is recommended. For more details, see the Django project pages. OCS2 has been tested on Windows 10 and 11 and on RHEL 8. Basic familiarity with Django is highly recommended before installing OCS2.

OCS2 requires a database backend that is supported by Django such as PostgreSQL or MySQL. Initial development can be done with the SQLite engine that is included with Python.

OCS2 also requires access to a Solr v8.x server. For information on installing Solr, please visit the Apache Solr Reference Guide.

For background data processing, OCS2 uses Celery for Django.

Django Extensions

Django extensions are re-usable code modules provided by third party developers that provide additional functionality to Django. The Django core project comes with several contributed modules which are used by OCS2. It also uses several well-known plugins provided by third party developers. The python modules for these extensions are included in the project's requirements.txt file.

  1. Django CORS Headers: A Django app that adds Cross-Origin Resource Sharing (CORS) headers to responses. This allows in-browser requests to your Django application from other origins.
  2. Django Jazzmin Admin Theme: Provides a more modern UI for the Django admin interface.
  3. Django QUrl Template Tag: A Django template tag for modifying a URL's query string.
  4. Django Celery Beat: This extension enables you to store the periodic task schedule in the database. The periodic tasks can be managed from the Django admin interface, where you can create, edit, and delete periodic tasks and set how often they should run.
  5. Django Celery Results: This extension enables you to store Celery task results using the Django ORM.
  6. Django Smuggler: A pluggable application for the Django web framework that makes it easy to dump and load fixtures via the automatically generated administration interface.
  7. Django Timezone Field: A Django app providing DB, form, and REST framework fields for zoneinfo and pytz timezone objects.

These Django plugins are enabled in the Django application's settings.py file. Example configuration can be found in settings-sample.py
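
As a rough illustration of how these plugins are typically enabled, the sketch below shows a settings.py fragment. The app labels are the ones each package documents; the exact list and ordering used by OCS2 is an assumption here, so treat settings-sample.py as the authoritative reference.

```python
# Hypothetical settings.py fragment; compare against settings-sample.py before use.
INSTALLED_APPS = [
    "jazzmin",                      # admin theme, listed before django.contrib.admin
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",
    "corsheaders",                  # Django CORS Headers
    "qurl_templatetag",             # Django QUrl Template Tag
    "django_celery_beat",           # periodic task schedule stored in the database
    "django_celery_results",        # Celery task results stored via the Django ORM
    "smuggler",                     # Django Smuggler fixture dump/load
    "timezone_field",               # Django Timezone Field
    "search",                       # the OCS2 search app (name taken from the migration commands)
]

MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
    "django.contrib.sessions.middleware.SessionMiddleware",
    "corsheaders.middleware.CorsMiddleware",   # placed ahead of CommonMiddleware
    "django.middleware.common.CommonMiddleware",
    "django.middleware.csrf.CsrfViewMiddleware",
    "django.contrib.auth.middleware.AuthenticationMiddleware",
    "django.contrib.messages.middleware.MessageMiddleware",
]
```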

Before Installing

Before installing OCS2, set up the prerequisites:

  • Python 3.9+
  • PostgreSQL 13 (recommended) or other Django supported database
  • Apache Solr Search Server 8.x

For production instances you will want a WSGI server such as uWSGI or Gunicorn.

Steps

Before downloading code and setting up your virtual environment, choose an appropriate directory like /opt/tbs/search. Use of a dedicated non-privileged user is also recommended for running the server in production environments - no particular username is assumed. Change to your installation directory, optionally switch to the dedicated user, and follow these steps.

  1. Clone the OCS2 project from GitHub: https://github.com/open-data/oc_search

  2. Clone the SolrClient project from GitHub: https://github.com/open-data/SolrClient

  3. Clone the OCS2 custom searches from GitHub: https://github.com/open-data/oc_searches.git

  4. Create a Python virtual environment using Python 3.9 or higher.

    For example python -m venv venv.

  5. Activate the new virtual environment.

    On Linux, the command is source venv/bin/activate. On Windows, the command is venv\Scripts\activate, where venv is the name of the virtual environment.

  6. Install SolrClient library.

    Change into the SolrClient project directory and install the prerequisites from the requirements.txt file and then install the client project itself.

    pip install -r requirements.txt

    python setup.py develop

  7. Install the OCS2 python library prerequisites.

    Change to the directory where OCS2 project was cloned from GitHub, then install from the requirements.txt file

    pip install -r requirements.txt

  8. Create a Django project settings file.

    By default, Django will read project runtime settings from a settings.py file located in the application sub-directory. OCS2 provides an example settings file. Use the provided file settings-sample.py as a template for your own project.

    For more information on customizing the settings file, see the Django Project documentation.

  9. Create the Django, OCS2, and Celery database tables.

    In the settings.py file set the appropriate database settings and create the database tables. OCS2 has been tested with PostgreSQL 13.

    • python manage.py makemigrations search
    • python manage.py sqlmigrate search 0001
    • python manage.py migrate

    Downloading search results uses a Celery background worker, which takes the work of generating large CSV files of search results off the main Django web application. To set up Celery for Django, run the provided database migrations.

    python .\manage.py migrate django_celery_results
    python .\manage.py migrate django_celery_beat

  10. Start the Celery workers. Note, in production, the Celery workers should be daemonized.

    celery -A oc_search worker -l INFO --pool=solo [Windows]
    celery -A oc_search worker -l INFO [Linux]

    celery -A oc_search beat -l INFO --scheduler django_celery_beat.schedulers:DatabaseScheduler

  11. Create an admin user for Django.

    python manage.py createsuperuser

  12. Test your installation by running Django.

    python manage.py runserver
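
For step 9, the database section of settings.py might look like the sketch below. It assumes PostgreSQL and reads connection details from environment variables; the variable names and defaults are hypothetical and not part of OCS2. The Celery result backend line assumes the project loads Celery settings from Django settings with the CELERY_ prefix.

```python
import os

# Hypothetical environment variable names; adjust to your deployment.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": os.environ.get("OCS_DB_NAME", "oc_search"),
        "USER": os.environ.get("OCS_DB_USER", "oc_search"),
        "PASSWORD": os.environ.get("OCS_DB_PASSWORD", ""),
        "HOST": os.environ.get("OCS_DB_HOST", "localhost"),
        "PORT": os.environ.get("OCS_DB_PORT", "5432"),
    }
}

# Store task results through the Django ORM (django-celery-results).
# Assumes the Celery app reads settings using the CELERY_ namespace.
CELERY_RESULT_BACKEND = "django-db"
```

Keeping credentials in the environment rather than in settings.py makes it easier to share one settings file between development and production.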

Next Steps

The Search application is a blank framework. The next steps include making custom search plugins to create a custom interactive search application.

For production, Django should be installed as a WSGI application. For instructions on doing this with uWSGI, see the Django documentation.

Note on Logging

OCS2 has two logs, one for regular logging information and another for recording search activity. In the logging settings, be sure to set up your logging using a format similar to this:

    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
        },
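        'query_log': {
            # 'encoding' is not accepted by logging.StreamHandler, so a file handler is assumed here.
            'class': 'logging.FileHandler',
            'filename': 'search_query.log',  # hypothetical path, adjust for your deployment
            'formatter': 'search_term_formatter',
            'encoding': 'utf8',
        },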
    },
    'formatters': {
        'search_term_formatter': {
            'format': '%(asctime)s,%(message)s',
            'datefmt': '%Y-%m-%dT%H:%M:%SZ'
        }
    },
    'loggers': {
        'search_term_logger': {
            'handlers': ['query_log'],
            'level': 'INFO',
            'propagate': False,
        },
    },
    'root': {
        'handlers': ['console'],
        'level': 'INFO',
    },

The search query log needs to be in a specific format so that the custom import_query_logs command can load the log file into the database where it can be processed. Logs will accumulate over time, so be sure to set up an information management policy for managing the logs.
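
To see what the formatter above produces, the following minimal sketch configures only the search_term_logger and writes one entry to an in-memory buffer. The buffer and the comma-separated message fields (search ID, language, search text) are assumptions for illustration; the fields OCS2 actually writes may differ.

```python
import io
import logging
import logging.config

# In-memory stream standing in for the real query log destination (demo only).
buffer = io.StringIO()

LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "search_term_formatter": {
            "format": "%(asctime)s,%(message)s",
            "datefmt": "%Y-%m-%dT%H:%M:%SZ",
        },
    },
    "handlers": {
        "query_log": {
            "class": "logging.StreamHandler",
            "formatter": "search_term_formatter",
            "stream": buffer,
        },
    },
    "loggers": {
        "search_term_logger": {
            "handlers": ["query_log"],
            "level": "INFO",
            "propagate": False,
        },
    },
}
logging.config.dictConfig(LOGGING)

# Hypothetical message layout: search ID, language, search text.
logging.getLogger("search_term_logger").info("%s,%s,%s", "travela", "en", "hotel")

# Each entry is a single line of the form "<timestamp>,<message>".
log_line = buffer.getvalue().strip()
```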

Automated Testing

OCS2 comes with a basic end-to-end test suite that employs Playwright. See Tests for more information.


Overview

OCS2 is made of several components including:

  1. The Django web application that provides the search and administration web interfaces. The Django framework is a general purpose web application framework written in Python and is well supported.
  2. A relational database backend supported by Django. The database is used to hold routing, messaging, search definitions, and other permanent data. OCS2 has been tested with PostgreSQL 13.
  3. An Apache Solr text search engine that provides the semantic search engine. OCS2 uses the SolrClient library both to query Solr and to dynamically create search cores on the Solr server.
  4. A Celery background worker that generates the large CSV files used for downloading search results.

High Level Architecture Diagram

Database

Each search definition is made of three or four components:

  1. Search: General information about the search such as labels and Solr core name
  2. Fields: Each search consists of a number of individual fields. Each field record is associated with a single Search record and contains metadata describing the field such as the data type and labels.
  3. Codes and code values (optional): Structured data often contains code values or 'lookup' field values, where the field value must come from a predetermined list of values. For example, 'AB' may be selected from a list of Canadian provincial acronyms. Each row in the table represents a single code value and is associated with a single field.
  4. ChronologicCodes: These are similar to codes, but have a start and end date time associated with a code value. This permits the English and French values of the codes to be associated with a specific time range.

Combined, these three components, Search, Fields, and Codes, define a custom search application. Django provides an administrative user interface for editing the search definitions. To use it, create an admin account and log in to the admin system. The OC Search admin screens have been modified with helpful customizations to make it easier to customize a search.

The actual search data is not stored in the relational database; it is stored only in the Solr search engine. The database contains the metadata model of the search application, which describes the format of the data that is searched and the search interface.
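
The relationships between these components can be pictured with the plain Python sketch below. It is not the project's Django model code; all class and attribute names are hypothetical and chosen only to mirror the description above.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Code:
    code_id: str          # e.g. 'AB'
    label_en: str
    label_fr: str


@dataclass
class ChronologicCode(Code):
    # Labels are valid only from start (inclusive) to end (exclusive).
    start: datetime = datetime.min
    end: datetime = datetime.max


@dataclass
class Field:
    field_id: str
    label_en: str
    label_fr: str
    data_type: str
    codes: List[Code] = field(default_factory=list)   # optional lookup values


@dataclass
class Search:
    search_id: str
    label_en: str
    label_fr: str
    solr_core_name: str
    fields: List[Field] = field(default_factory=list)


def label_at(codes: List[ChronologicCode], code_id: str, when: datetime, lang: str = "en") -> Optional[str]:
    """Return the label of a chronologic code that was in effect at `when`."""
    for code in codes:
        if code.code_id == code_id and code.start <= when < code.end:
            return code.label_en if lang == "en" else code.label_fr
    return None


province = Field("province", "Province", "Province", "string",
                 codes=[Code("AB", "Alberta", "Alberta")])
travel = Search("travela", "Travel Expenses", "Dépenses de voyage gouvernementaux",
                "search_travela", fields=[province])
```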

Database Schema

Importing and exporting of search definitions is done using custom Django management commands.

Generating a Search from CKAN yaml

Creating a new search from scratch can be laborious. Two command line utilities can generate new search definitions from existing data sources. One works with the CKAN yaml files that are used by Open Canada's proactive disclosure system, and the other creates a simple search from a generic CSV file with a header row.

Use the custom import_schema_ckan_yaml Django command to create a new search definition based on a schema defined in a CKAN scheming yaml file.

For example:

python manage.py import_schema_ckan_yaml --yaml_file .\data\travela.yaml --search_id travela --title_en "Travel Expenses" --title_fr "Dépenses de voyage gouvernementaux"

Generating a Search from CSV

Use the custom generic_csv_schema Django command to create a simple search definition based on an existing CSV file with headers.

For example:

python manage.py generic_csv_schema --csv_file tpsgc-pwgsc_ao-t_a.csv --search_id tendernotices --title_en "Tender Notices" --title_fr "Appels d'offres"

OCS Commands

Several custom Django management commands are available:

create_solr_core

To run: python manage.py create_solr_core <search name>

<search name> is the name of a search that has been defined either by running a load script or through the Django admin UI.

import_schema_ckan_yaml

To run: python manage.py import_schema_ckan_yaml --yaml_file <yaml file> --search_id <unique search ID> --title_en <English Title> --title_fr <French Title> [--reset]

This command will parse the CKAN YAML file and load it into the search model database
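
When several searches need to be loaded, the command line can be assembled from a script. The sketch below builds the argument list for import_schema_ckan_yaml using only the options shown above; the helper name is hypothetical, and it assumes manage.py is in the current working directory.

```python
import sys
from typing import List


def import_schema_args(yaml_file: str, search_id: str, title_en: str,
                       title_fr: str, reset: bool = False) -> List[str]:
    """Build the argv list for the import_schema_ckan_yaml management command."""
    args = [
        sys.executable, "manage.py", "import_schema_ckan_yaml",
        "--yaml_file", yaml_file,
        "--search_id", search_id,
        "--title_en", title_en,
        "--title_fr", title_fr,
    ]
    if reset:
        args.append("--reset")
    return args


args = import_schema_args("data/travela.yaml", "travela",
                          "Travel Expenses", "Dépenses de voyage gouvernementaux")
# To execute: subprocess.run(args, check=True) from the OCS2 project directory.
```

Passing the arguments as a list avoids shell quoting problems with titles that contain spaces or apostrophes.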

import_data_csv

To run: python manage.py import_data_csv --csv <CSV file> --search <Unique search ID> --core <Solr Core Name> [--nothing_to_report]


For Open Maps, the viewer is using the remote configuration services (RCS). The local RCS definition files are also located in the ramp/viewer folder. The provided files are specific to Open Canada.

Viewer directory structure

Plugin API Changes

Version 1.1

Added two new API functions that are called just before the search page is rendered and just before the record page is rendered:

def pre_render_search(context: dict, template: str, request: HttpRequest, lang: str, search: Search, fields: dict, codes: dict):

def pre_render_record(context: dict, template: str, request: HttpRequest, lang: str, search: Search, fields: dict, codes: dict):

Version 1.2

The pre_render_search() function was updated to include a view_type parameter, which allows rendering to differentiate between views such as Search and More-Like-This.

def pre_render_search(context: dict, template: str, request: HttpRequest, lang: str, search: Search, fields: dict, codes: dict, view_type='search'):
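
A plugin might use the view_type parameter along the lines of the sketch below. The request and search arguments are typed as Any so the example runs without Django. The context key is hypothetical, and returning the context and template as a tuple is an assumption about the hook's contract, so check the plugin API in the repository before relying on it.

```python
from typing import Any, Tuple


def pre_render_search(context: dict, template: str, request: Any, lang: str,
                      search: Any, fields: dict, codes: dict,
                      view_type: str = 'search') -> Tuple[dict, str]:
    # Hypothetical context key: lets the template hide widgets that only
    # make sense on the default search view.
    context["is_default_search_view"] = (view_type == 'search')
    # Assumed contract: hand the (possibly modified) context and template back.
    return context, template
```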

Debugging Celery with PyCharm on Windows

When running Celery on Windows, use the following command to start the Celery worker

celery -A oc_search worker -l INFO --pool=solo

To run a separate Celery process to run scheduled tasks:

celery -A oc_search beat -l INFO --scheduler django_celery_beat.schedulers:DatabaseScheduler

To debug with PyCharm, add a new Python Run Configuration. Change the script path to a module path and enter celery. For parameters, use -A oc_search worker -l INFO --pool=solo
