Difference between revisions of "CSEP Gitlab"

From SCECpedia
The CSEP2 software development and test environment uses the [https://about.gitlab.com/ Gitlab] integration software.

SCEC's CSEP project is coordinating an earthquake and ground motion forecasting project with national and international partners. Project members want to use GitLab to support the CSEP open-source scientific software development community, including code development, distributed version control, and an automated software testing environment. The expected system requirements and preliminary estimates of data creation and transfer are described below.

== CSEP CI/CD System Overview Diagram ==

[[Image:Csep gitlab roadmap.pdf|250px| "Gitlab Server and Runner"]]

 
 
*[[Media:Csep gitlab roadmap.pptx | Gitlab Server Configuration as ppt file]]
*[[Media:Csep gitlab roadmap.pdf | Gitlab Server Configuration as pdf file]]
 
 
== Data Usage Estimates ==
 
Data usage estimates combine the two use cases for the GitLab repository: (1) code storage and (2) storage of large data files and experiment results. They are based on data from the current CSEP project, such as the number of project members, current storage in current and old repositories, and rough estimates of data usage for past and future experiments.

# Amount generated internally by the system per year, most of which can be left on AWS storage: 5 TB/year
# Amount of data transferred into AWS: 1 TB/year
# Amount of data transferred out of AWS: 2.5 TB/year
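
The three figures above are yearly rates, so the volume held on AWS keeps growing if the generated data is retained. The Python sketch below projects that growth. The function name, the retention fraction, and the assumption that nothing is deleted later are illustrative assumptions, not part of the estimates.

```python
from __future__ import annotations


def projected_storage_tb(years: int,
                         generated_tb_per_year: float = 5.0,
                         retained_fraction: float = 1.0) -> list[float]:
    """Cumulative volume kept on AWS at the end of each year, in TB.

    generated_tb_per_year defaults to the 5 TB/year estimate.
    retained_fraction is a hypothetical parameter: the share of each
    year's generated data that is kept. Assumes nothing is deleted later.
    """
    if years < 0:
        raise ValueError("years must be non-negative")
    if not 0.0 <= retained_fraction <= 1.0:
        raise ValueError("retained_fraction must be between 0 and 1")
    kept_per_year = generated_tb_per_year * retained_fraction
    # Linear growth: after n years, n years' worth of retained data is stored.
    return [kept_per_year * n for n in range(1, years + 1)]


if __name__ == "__main__":
    # Upper bound: everything generated is kept.
    print(projected_storage_tb(3))  # [5.0, 10.0, 15.0]
    # Hypothetical: only half of each year's output is kept.
    print(projected_storage_tb(3, retained_fraction=0.5))  # [2.5, 5.0, 7.5]
```

Under the keep-everything assumption, three years of operation would accumulate about 15 TB, compared with the 5 TB of S3 storage used in the AWS pricing estimates.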
== Data Estimates Breakdown ==

Project Info:<br>
CSEP Project Members: 20<br>
CSEP1 Repo Size (in GB): 1.2<br>
Storage per experiment (in GB): 10<br>

(1) Code Storage<br>
Total data stored: 25 GB (high-end estimate including models, codes, and benchmark data sets)<br>
Data transferred: [50 GB - 500 GB]<br>

(2) Data/Catalog Storage<br>
Total storage: 250 GB<br>
Data transferred: [100 GB - 1 TB]<br>

(3) Experiment Results<br>
Total storage: 4 TB<br>
Data transferred: [100 GB - 1 TB]<br>

Overall Estimates:<br>
Total stored: ~4.5 TB<br>
Transferred: [250 GB - 2.5 TB]<br>

== AWS Estimation Webpage ==

*[https://aws.amazon.com/ec2/pricing/ EC2 Pricing]
*[https://calculator.aws/ Total Cost Estimator]
*[https://docs.gitlab.com/ee/install/aws/ AWS Gitlab Example Deployment]
 
== Gitlab Basic System Requirements ==

* Recent Linux distribution (Ubuntu, CentOS, ...)
* Generate SSH keys
* Configure an SMTP server
* 8 GB RAM is the recommended minimum memory size for all installations and supports up to 100 users; 16 GB RAM supports up to 500 users.
* Database: PostgreSQL
* Redis/Sidekiq: Redis stores all user sessions and the background task queue; Sidekiq processes the background jobs with a multithreaded process.
* Prometheus and its exporters
* Avoid installing GitLab Runner on the same machine where GitLab is installed.
* GitLab needs JavaScript enabled in browsers to support features such as Issue Boards.

== CSEP Computer and Storage Inventory ==

*[[CSEP_Computers]]
*[[CSEP_Hardware_Inventory]]
*[[Media:Computer Hardware and Storage Systems · SCECcode CSEP Wiki.pdf | Github Docs]]
*SCEC CSEP Storage Summaries:
** SCEC Storage Summary: [[Media:Disk Usage Jan2020.txt | Usage Report]]
** Sorted by Username: [[Media:Sorted_Storage_Jan2020.txt | Sorted Usage Report]]

== Overview and Gitlab Installation ==

*[https://about.gitlab.com/ GitLab] integration software
*[https://about.gitlab.com/pricing/#self-managed Gitlab Software Options]
*[https://about.gitlab.com/install/ GitLab and Runner Installation Information]
*[https://docs.gitlab.com/omnibus/docker/ GitLab Installation using Docker]
*[https://docs.gitlab.com/runner/install/docker.html GitLab Runner Installation with Docker]

== AWS Estimate ==

This follow-up information describes the setup, general information, and pricing of the services.

# '''Running GitLab on AWS:'''
#* GitLab has extensive documentation to get you set up and running. Using the Omnibus package or a Marketplace listing is a quick and easy way to get started. GitLab has different Marketplace listings depending on the license used; the listing for the community version is linked.
#** https://about.gitlab.com/install/#ubuntu
#** https://aws.amazon.com/marketplace/pp/B071RFCJZK
#* It does require setup and maintenance on your end; it is not a fully managed service. You would provision the GitLab instance and the Runner instance and complete their respective software installation and setup.
# '''Pricing GitLab on AWS:'''
#:'''Factors to consider:'''
#* Storage:
#** S3 (Simple Storage Service) is highly scalable, durable, and available object storage for which you do not need to provision capacity in advance. It is significantly cheaper than holding the data in volumes attached to your instances.
#** EBS (Elastic Block Store) volumes are charged based on their provisioned size and are resizable, so you can start small and resize if needed.
#* Compute:
#** EC2 Instance Savings Plans can greatly reduce your compute cost, saving roughly 30% to 60% compared with the on-demand instances used in the price estimates. The savings depend on the commitment term (1 or 3 years) and the payment option (from no upfront to all upfront).
#** GitLab has a guide on autoscaling Runners on AWS: https://docs.gitlab.com/runner/configuration/runner_autoscale_aws/
#:'''Pricing estimates:'''
#* Estimate for the setup described on this wiki: https://calculator.aws/#/estimate?id=a540551f0239aef761c25ec52c9b87f5c0571563
#** $8500/year
#** This has two instances: the GitLab instance with 2 vCPU and 8 GB of RAM, and the Runner instance with 4 vCPU and 16 GB of RAM. Each has 4 TB of hard-drive storage. This estimate does not include backups.
#** Most of the cost in this estimate comes from the 8 TB of provisioned storage attached to the instances, the majority of which could be held in S3 instead.
#* Moving storage to S3 estimate: https://calculator.aws/#/estimate?id=636f64ec1dff1f2f2e8aacfe50793480e3dc2160
#** $5700/year
#** The same setup as above, but with most of the storage (5 TB) offloaded to S3. The GitLab instance has 50 GB of SSD storage and the Runner instance has 250 GB of SSD storage, each with daily backups. Storage performance will be better because the instances use SSDs instead of hard drives.
#* Including EC2 Instance Savings Plans: https://calculator.aws/#/estimate?id=7f6a08fbe7e1feba574cd8daf15145773f6fcb27
#** $4500/year
#** Same as the previous estimate, but with one-year EC2 Savings Plans paid upfront.
# '''Serverless Managed CI/CD Solution, CodePipeline:'''
#* A fully managed CI/CD pipeline solution in which you would only manage the users.
#** Source control: AWS CodeCommit is a private Git service that can securely store your source code, binaries, and application assets.
#** Build: AWS CodeBuild builds and tests your application in preconfigured or custom environments.
#** Deploy: AWS CodeDeploy automates software deployments to your AWS or on-premises servers.
#* Usage-based pricing; there are no servers to provision or maintain.
#* Free-tier eligible, so you can test the services for free.
#* Natively integrates with other AWS services and the AWS control plane.
#* More information about CodePipeline: https://aws.amazon.com/codepipeline/
#:'''Pricing estimates:'''
#* $2500/year
#* I’ve attached a spreadsheet with modifiable red input numbers to estimate how much this solution will cost.
#* The 5 TB of storage on S3 with 2.5 TB of data transfer out per year ($1821/year) would be in addition to the CodePipeline costs. For example, if the spreadsheet outputs $650/year, the total with S3 data storage, transfer, and CI/CD would be $2471/year.

[[Media:CodePipeline_Cost_Estimation.xlsx|Configurable Excel Spreadsheet]]
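
The three GitLab-on-AWS estimates ($8500, $5700, and $4500 per year) can be compared directly. The Python sketch below computes each option's savings against the $8500/year setup. It also back-calculates what on-demand compute spend would be consistent with the $1200/year drop from the Savings Plans if those plans save 30% to 60%; that step assumes the entire difference between the last two estimates is compute savings, which the estimates do not state. The option labels and helper names are illustrative.

```python
from __future__ import annotations

# Annual totals (USD/year) quoted in the "Pricing GitLab on AWS" estimates.
ESTIMATES_USD = {
    "wiki setup (8 TB on instance volumes)": 8500,
    "most storage moved to S3": 5700,
    "S3 plus 1-year EC2 Savings Plans": 4500,
}
BASELINE = "wiki setup (8 TB on instance volumes)"


def savings_vs_baseline(estimates: dict[str, int],
                        baseline: str) -> dict[str, tuple[int, float]]:
    """Dollar and percentage savings of each option relative to the baseline."""
    base = estimates[baseline]
    return {
        name: (base - cost, round(100 * (base - cost) / base, 1))
        for name, cost in estimates.items()
    }


def implied_on_demand_compute(plan_savings_usd: float,
                              low_rate: float = 0.30,
                              high_rate: float = 0.60) -> tuple[int, int]:
    """Range of on-demand compute spend consistent with a given plan saving.

    Assumption: the whole saving comes from compute, and the plan discount
    lies between low_rate and high_rate of the on-demand price.
    """
    return round(plan_savings_usd / high_rate), round(plan_savings_usd / low_rate)


if __name__ == "__main__":
    for name, (usd, pct) in savings_vs_baseline(ESTIMATES_USD, BASELINE).items():
        print(f"{name}: saves ${usd}/year ({pct}%)")
    plan_saving = (ESTIMATES_USD["most storage moved to S3"]
                   - ESTIMATES_USD["S3 plus 1-year EC2 Savings Plans"])
    print(implied_on_demand_compute(plan_saving))  # (2000, 4000)
```

Moving storage to S3 accounts for most of the reduction ($2800 of $4000 per year). The implied $2000 to $4000 per year of on-demand compute is only a consistency check under the stated assumption, not a figure from the estimates.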
  
 
== Related Entries ==

*[[CSEP_Working_Group]]
*[[CSEP_Project]]

Latest revision as of 18:42, 25 June 2020

SCEC's CSEP project is coordinating an earthquake and ground motion forecasting project with national and international partners. Project members want to use GitLab to support the CSEP open-source scientific software development community, including code development, distributed version control, and an automated software testing environment. The expected system requirements and preliminary estimates of data creation and transfer are described below.

CSEP CI/CD System Overview Diagram

"Gitlab Server and Runner"

Data Usage Estimates

Data usage estimates combine the two use cases for the GitLab repository: (1) code storage and (2) storage of large data files and experiment results. They are based on data from the current CSEP project, such as the number of project members, current storage in current and old repositories, and rough estimates of data usage for past and future experiments.

  1. Amount generated internally by the system per year, most of which can be left on AWS storage: 5 TB/year
  2. Amount of data transferred into AWS: 1 TB/year
  3. Amount of data transferred out of AWS: 2.5 TB/year

Data Estimates Breakdown

Project Info:
CSEP Project Members: 20
CSEP1 Repo Size (in GB): 1.2
Storage per experiment (in GB): 10

(1) Code Storage
Total data stored: 25 GB (high-end estimate including models, codes, and benchmark data sets)
Data transferred: [50 GB - 500 GB]

(2) Data/Catalog Storage
Total storage: 250 GB
Data transferred: [100 GB - 1 TB]

(3) Experiment Results
Total storage: 4 TB
Data transferred: [100 GB - 1 TB]

Overall Estimates:
Total stored: ~4.5 TB
Transferred: [250 GB - 2.5 TB]
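
The category figures can be summed to check the overall estimates. The minimal Python sketch below assumes decimal units (1 TB = 1000 GB), which the page does not state, and the dictionary layout and function name are illustrative. It reproduces the transfer range of 250 GB to 2.5 TB exactly; the stored volume sums to about 4.3 TB, so the ~4.5 TB total appears to be a rounded-up figure.

```python
from __future__ import annotations

# Assumption: decimal units, 1 TB = 1000 GB (not stated on the page).
GB_PER_TB = 1000

# Per-category figures from the Data Estimates Breakdown, in GB.
BREAKDOWN_GB = {
    "(1) Code Storage": {"stored": 25, "transfer": (50, 500)},
    "(2) Data/Catalog Storage": {"stored": 250, "transfer": (100, 1000)},
    "(3) Experiment Results": {"stored": 4 * GB_PER_TB, "transfer": (100, 1000)},
}


def overall_totals(breakdown: dict) -> tuple[int, int, int]:
    """Return (total stored, minimum transferred, maximum transferred) in GB."""
    stored = sum(cat["stored"] for cat in breakdown.values())
    low = sum(cat["transfer"][0] for cat in breakdown.values())
    high = sum(cat["transfer"][1] for cat in breakdown.values())
    return stored, low, high


if __name__ == "__main__":
    stored, low, high = overall_totals(BREAKDOWN_GB)
    print(f"Stored: {stored / GB_PER_TB} TB")  # Stored: 4.275 TB
    print(f"Transferred: [{low} GB - {high / GB_PER_TB} TB]")  # Transferred: [250 GB - 2.5 TB]
```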

AWS Estimation Webpage

Gitlab Basic System Requirements

  • Recent Linux distribution (Ubuntu, CentOS, ...)
  • Generate SSH keys
  • Configure an SMTP server
  • 8 GB RAM is the recommended minimum memory size for all installations and supports up to 100 users; 16 GB RAM supports up to 500 users.
  • Database: PostgreSQL
  • Redis/Sidekiq: Redis stores all user sessions and the background task queue; Sidekiq processes the background jobs with a multithreaded process.
  • Prometheus and its exporters
  • Avoid installing GitLab Runner on the same machine where GitLab is installed.
  • GitLab needs JavaScript enabled in browsers to support features such as Issue Boards.
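
The memory guideline above is a simple tier lookup by user count. The Python sketch below encodes only the two tiers listed on this page; the function name is illustrative, and GitLab revises its requirements over time, so the current GitLab documentation should be checked before sizing a server. For CSEP's 20 project members it selects 8 GB, which is consistent with the 8 GB GitLab instance used in the AWS pricing estimates.

```python
def recommended_ram_gb(users: int) -> int:
    """Map an expected user count to the RAM tiers listed on this page.

    Only two tiers are encoded: 8 GB for up to 100 users and 16 GB for
    up to 500 users. Larger deployments are outside these guidelines.
    """
    if users < 0:
        raise ValueError("users must be non-negative")
    if users <= 100:
        return 8
    if users <= 500:
        return 16
    raise ValueError("more than 500 users is outside the tiers listed here")


if __name__ == "__main__":
    # CSEP currently has 20 project members.
    print(recommended_ram_gb(20))  # 8
```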

CSEP Computer and Storage Inventory

Overview and Gitlab Installation

AWS Estimate

This follow-up information describes the setup, general information, and pricing of the services.

  1. Running GitLab on AWS:
    • GitLab has extensive documentation to get you set up and running. Using the Omnibus package or a Marketplace listing is a quick and easy way to get started. GitLab has different Marketplace listings depending on the license used; the listing for the community version is linked.
    • It does require setup and maintenance on your end; it is not a fully managed service. You would provision the GitLab instance and the Runner instance and complete their respective software installation and setup.
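  2. Pricing GitLab on AWS:
    Factors to consider:
    • Storage:
      • S3 (Simple Storage Service) is highly scalable, durable, and available object storage for which you do not need to provision capacity in advance. It is significantly cheaper than holding the data in volumes attached to your instances.
      • EBS (Elastic Block Store) volumes are charged based on their provisioned size and are resizable, so you can start small and resize if needed.
    • Compute:
      • EC2 Instance Savings Plans can greatly reduce your compute cost, saving roughly 30% to 60% compared with the on-demand instances used in the price estimates. The savings depend on the commitment term (1 or 3 years) and the payment option (from no upfront to all upfront).
      • GitLab has a guide on autoscaling Runners on AWS: https://docs.gitlab.com/runner/configuration/runner_autoscale_aws/
    Pricing estimates:
    • Estimate for the setup described on this wiki:
      • $8500/year
      • This has two instances: the GitLab instance with 2 vCPU and 8 GB of RAM, and the Runner instance with 4 vCPU and 16 GB of RAM. Each has 4 TB of hard-drive storage. This estimate does not include backups.
      • Most of the cost in this estimate comes from the 8 TB of provisioned storage attached to the instances, the majority of which could be held in S3 instead.
    • Moving storage to S3:
      • $5700/year
      • The same setup as above, but with most of the storage (5 TB) offloaded to S3. The GitLab instance has 50 GB of SSD storage and the Runner instance has 250 GB of SSD storage, each with daily backups. Storage performance will be better because the instances use SSDs instead of hard drives.
    • Including EC2 Instance Savings Plans:
      • $4500/year
      • Same as the previous estimate, but with one-year EC2 Savings Plans paid upfront.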
  3. Serverless Managed CI/CD Solution, CodePipeline:
    • A fully managed CI/CD pipeline solution in which you would only manage the users.
      • Source control: AWS CodeCommit is a private Git service that can securely store your source code, binaries, and application assets.
      • Build: AWS CodeBuild builds and tests your application in preconfigured or custom environments.
      • Deploy: AWS CodeDeploy automates software deployments to your AWS or on-premises servers.
    • Usage-based pricing; there are no servers to provision or maintain.
    • Free-tier eligible, so you can test the services for free.
    • Natively integrates with other AWS services and the AWS control plane.
    • More information about CodePipeline: https://aws.amazon.com/codepipeline/
    Pricing estimates:
    • $2500/year
    • I’ve attached a spreadsheet with modifiable red input numbers to estimate how much this solution will cost.
    • The 5 TB of storage on S3 with 2.5 TB of data transfer out per year ($1821/year) would be in addition to the CodePipeline costs. For example, if the spreadsheet outputs $650/year, the total with S3 data storage, transfer, and CI/CD would be $2471/year.

Configurable Excel Spreadsheet
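
The CodePipeline total is the spreadsheet's CI/CD figure plus the fixed $1821/year for S3 storage and transfer. The Python sketch below writes that sum out; the function name is illustrative, and the spreadsheet output is whatever value the configurable spreadsheet produces for a given set of inputs.

```python
# Quoted yearly cost of 5 TB on S3 plus 2.5 TB of data transfer out (USD).
S3_STORAGE_AND_TRANSFER_USD = 1821


def codepipeline_total_usd(spreadsheet_output_usd: float,
                           s3_usd: float = S3_STORAGE_AND_TRANSFER_USD) -> float:
    """Yearly total: CI/CD cost from the spreadsheet plus S3 storage and transfer."""
    if spreadsheet_output_usd < 0:
        raise ValueError("spreadsheet output must be non-negative")
    return spreadsheet_output_usd + s3_usd


if __name__ == "__main__":
    # The worked example above: a $650/year spreadsheet output.
    print(codepipeline_total_usd(650))  # 2471
```

Because the S3 term is fixed, the total stays close to the quoted $2500/year as long as the spreadsheet output stays around $650/year.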

Related Entries