Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Documenation on considerations for production adoption #426

Merged
merged 11 commits into from
Jan 9, 2024
Merged
77 changes: 77 additions & 0 deletions docs/deployment/considerations_production.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,77 @@
# Considerations For Adopting Into Production

This documentation outlines essential considerations for adopting the Information Assistant (IA) accelerator into a production environment. Focused on scalability, high availability, security, and proactive management, the recommendations cover various components, including App Server scaling, App Gateway deployment for handling increased traffic, Load Balancing strategies, and leveraging Azure Front Door for global-scale content delivery. Additional guidance is provided for GPT Model throttling, security enhancements through Private Endpoints, proactive monitoring with Azure Monitor, and ensuring redundancy through multiple OpenAI instances. Consider the importance of safeguarding against cyber threats, integrating seamlessly with existing ecosystems, and proactively managing the environment through comprehensive monitoring and alerting mechanisms. Overall, these considerations serve as a guide for teams deploying IA, ensuring a smooth transition from a Proof of Concept to a production-ready implementation.

## Scalability and High Availability

These recommendations offer options for load balancing and high availability, catering to the scalability needs and global distribution requirements of your IA application.

### App Server - Scaling

**Consideration:** As the load on the App Server increases, scaling becomes crucial to handle varying levels of traffic efficiently.

**Recommendation:** Refer to the Azure [Autoscaling documentation](/docs/deployment/autoscale_sku.md) to set up dynamic scaling based on demand. Configure autoscaling rules to automatically adjust the number of instances in response to changes in load. You can also adjust the sku to scale vertically and adjust the number of workers in the app server. Alternatively, you could consider deploying the App to a container orchestration platform like Azure Kubernetes Service (AKS) for management and scaling.

### App Gateway
**Consideration:** Ensure App Gateway handles increased traffic and maintains high availability. Learn more about App Gateway [here](https://learn.microsoft.com/en-us/azure/application-gateway/overview).

**Recommendation:**
Explore horizontal scaling with multiple instances.
Use Azure Traffic Manager for global distribution.

### Load Balancing - Distribution of Workload
**Consideration:** Efficient workload distribution for optimal performance.

**Recommendation:**
Utilize Azure Load Balancer for even traffic distribution.
Adjust load balancing rules and implement health probes.
Learn more about load balancing options in Azure [here](https://learn.microsoft.com/en-us/azure/architecture/guide/technology-choices/load-balancing-overview).

### Front Door - Global Scale and Content Delivery

**Consideration:** Address global-scale distribution and content delivery concerns.

**Recommendation:**
Implement Azure Front Door for global load balancing and improved content delivery.
Configure regional routing and caching for optimized user experience.
Learn more about Frontdoor [here](https://learn.microsoft.com/en-us/azure/frontdoor/front-door-overview).

## Security

### Private Endpoints

**Consideration:** Enhancing the security posture of your deployment is crucial, especially when dealing with sensitive data or services.

**Recommendation:** Consider implementing Private Endpoints to establish a private connection between public ingress points and related services, like storage accounts. By avoiding exposure to the public internet, you reduce the attack surface and enhance data privacy. Ensure that only necessary traffic is allowed through the Private Endpoint for increased security.
Learn more [here](https://learn.microsoft.com/en-us/azure/private-link/private-endpoint-overview).

### Web Application Firewall (WAF)

**Consideration:** Safeguarding your IA application from cyber threats is crucial for protecting sensitive data and maintaining user trust.

**Recommendation:** Implement Azure WAF to protect against common web vulnerabilities and attacks. Regularly update WAF rules and policies to stay resilient against evolving security threats. Integrate with Azure Security Center for comprehensive threat detection and response. Learn more [here](https://learn.microsoft.com/en-us/azure/web-application-firewall/overview).


## AOAI Instances - Redundancy and Reliability

**Consideration:** To enhance system reliability and ensure business continuity, consider implementing multiple instances of your Azure OpenAI (AOAI).

**Recommendation:** You could deploy AOAI instances across multiple Azure regions for geographic redundancy. Leverage Azure Traffic Manager for global distribution and failover capabilities, ensuring uninterrupted service even in the event of a regional failure. Learn more about potential strategies [here](https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/azure-openai-architecture-patterns-and-implementation-steps/ba-p/3979934).


## GPT Model - Throttling

**Consideration:** Throttling is essential to control the rate at which requests are sent to the GPT model, preventing overload and maintaining performance.

**Recommendation**: Microsoft offers Provisioned Throughput options, allowing you to specify the maximum request rate for your GPT model. Define appropriate throughput limits based on the model's capacity and resource availability. Adjust throttling settings as needed to ensure optimal performance without risking service degradation. Learn more [here](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/provisioned-throughput).


## Monitoring And Alerting

**Consideration:** Proactively monitoring the IA environment is crucial for identifying performance issues, bottlenecks, and potential security threats.

**Recommendation:** Leverage Azure Monitor to gain insights into the performance and health of your entire IA ecosystem. Set up custom alerts based on key performance indicators and metrics. Utilize Azure Log Analytics for in-depth analysis and troubleshooting. Integrate Azure Security Center to monitor and respond to security threats in real-time.
Learn more [here](https://azure.microsoft.com/en-us/products/monitor/?ef_id=_k_2bb24bd93ec91aeba1fe2e4c90190298_k_&OCID=AIDcmm5edswduu_SEM__k_2bb24bd93ec91aeba1fe2e4c90190298_k_&msclkid=2bb24bd93ec91aeba1fe2e4c90190298).

## In Summary
Ensure that your IA accelerator seamlessly integrates into your existing ecosystem, considering compatibility and interoperability. Use the IA accelerator as a blueprint to plan integration into your ecosystem.
2 changes: 2 additions & 0 deletions docs/deployment/deployment.md
Original file line number Diff line number Diff line change
Expand Up @@ -146,6 +146,8 @@ Once deployed, you can find the URL of your installation by:

At this point deployment is complete. Please go to the [Using the IA Accelerator for the first time](/docs/deployment/using_ia_first_time.md) section and complete the following steps.

Review Considerations for adopting and [migrating to a production environment](/docs/deployment/considerations_production.md).
ryonsteele marked this conversation as resolved.
Show resolved Hide resolved

## Need Help?

To review logs try [Using the Workbook Template](/docs/deployment/worbook_usage.md)
Expand Down
Loading