Last updated: 2022-02-01
Checks: 7 0
Knit directory: QuantitativeGen/
This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20220124)
was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 79038f7. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish
or wflow_git_commit
). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .Rhistory
Ignored: .Rproj.user/
Untracked files:
Untracked: output/QTLgenotypes.csv
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/Simulate.Rmd
) and HTML (docs/Simulate.html
) files. If you’ve configured a remote Git repository (see ?wflow_git_remote
), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
html | 79038f7 | LucianoRogerio | 2022-02-01 | Homework 2 |
html | 60fd515 | LucianoRogerio | 2022-01-24 | Build site. |
Rmd | aba23b1 | LucianoRogerio | 2022-01-24 | wflow_publish(files = c("analysis/index.Rmd", "analysis/about.Rmd", |
html | 4f7264e | LucianoRogerio | 2022-01-24 | Build site. |
Rmd | 6c15995 | LucianoRogerio | 2022-01-24 | wflow_publish(files = c("analysis/index.Rmd", "analysis/Simulate.Rmd", |
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
If you rate your familiarity with R as “low” please go to Data Carpentry: R for Genomics in advance of the first class and do the following:
AlphaSimR
, workflowr
, tidyverse
, and sommer
packages.workflowr
aims to make it easier to follow a coding workflow that will increase the communicability and reproducibility of your code, and constrain you somewhat to following that workflow. The package sets up a standard directory file structure and populates it with useful initial R markdown (.Rmd) files.workflowr
also sets up the directory as a git version control repository. We will (probably) not teach git, but we encourage you all delving into it. Assume for the purpose of this class that you can use workflowr
just to set up the file structure.There is more you can do with the package, and it’s worth checking it out (and its extensive documentation!)workflowr
-created directory, and hand in that zipped directory.There are three components to this document:
workflowr
directory. The questions to answer through the lab and homework will be indented text.workflowr
provides a code
directory for programs, and an analysis
directory for scripts. Of course, scripts can (should) refer to functions, etc., in the code
directory. Any raw data can go in data
. Script outputs can go in output
. Final figures and so on can go in docs
.workflowr
commands wrap a number of git commands to simplify the work flow, particularly if you are working in the RStudio environment.If your script depends on external packages, load them at the beginning. This shows users early on what the script dependencies are.
ip <- installed.packages()
packages_used <- c("tidyverse", "workflowr")
for (package in packages_used){
isInstalled <- package %in% rownames(ip)
if (!isInstalled){
stop(paste("ERROR: you need to install the", package, "package"))
} else{
library(package, character.only=T)
}
}#END packages_used
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
✓ ggplot2 3.3.5 ✓ purrr 0.3.4
✓ tibble 3.1.6 ✓ dplyr 1.0.7
✓ tidyr 1.1.4 ✓ stringr 1.4.0
✓ readr 2.1.1 ✓ forcats 0.5.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
This is workflowr version 1.7.0
Run ?workflowr for help getting started
Data simulation is an important tool for quantitative geneticists. It can be used for model testing and validation, exploring the impacts of certain assumptions, and in breeding can be used to optimize breeding approaches by simulating various breeding strategies and measuring performance In PLBRG simulation will be used throughout the course in the labs.
In subsequent labs you will be introduced to simulation tools that are particularly useful to plant and animal breeders, but to start we will simply use some random number generators to sample values from a normal distribution and calculate some simple statistics.
#The function rnorm returns simulated values from a normal distribution given input parameters for the mean and variance.
# here we will draw 1000 samples from a normal distribution with mean= 15 and standard deviation = 10
set.seed(12)
s=rnorm(n = 1000, mean = 15, sd = 10)
#The samples from rnorm are stored in the vector s
#Random number generators use seeds to generate a sequence of random numbers. If you want a simulation to be repeated using the same sequence of numbers to need to keep the same seed (workflowr does this by default)
hist(s)
Version | Author | Date |
---|---|---|
60fd515 | LucianoRogerio | 2022-01-24 |
Now that we have sampled the normal random variables, let’s calculate some simple statistics. The purpose of this exercise if to introduce some basic coding concepts: the use of looping statements and basic matrix/vector operations.
First we will use two types of loops the sum the values in the vector s
#Here we initialize a value to store the sum of the values in s
sumS = 0.0
#Here we initialize the loop
for(i in c(1:1000)){
#here we give a function to add the values of s to sumS
#since the for loop is set up with the indicator variable "i" which will count i = 1 to 1000 we access values in s using s[i]
sumS=sumS+s[i]
} # end of the for loop to sum values i s
#calculate the mean
meanS=sumS/1000
#this could also be coded as ms=sumS/length(s)
print(meanS)
[1] 14.73563
#Since there is not a predefined set of numbers to loop through we have to initialize the indicator variable.
i=1
sumS=0.0
#In this case the condition to stop the loop will be when i reaches 1000
while(i<=length(s)){
sumS=sumS+s[i]
#We increase the value of i by 1 for each loop
i=i+1
} #end the while loop to sum values in s
#calculate the mean
meanS=sumS/length(s)
print(meanS)
[1] 14.73563
In addition to loops we can also using simple matrix/vector operations to calculate a sum.
#creating a vector of 1s of length 1000
#rep is an R function to create a vector of identical elements of some specified length
b=rep(1,1000)
#by default vectors in R are treated as columns
#to calculate and inner product you need the first vector to be a row and the second vector to be a column
#to transpose a column vector into a row vector use the function t()
sumS=t(s)%*%b
#in R %*% is used for matrix/vector multiplication
meanS=sumS/length(s)
print(meanS)
[,1]
[1,] 14.73563
Now let’s apply the use of loops to calculate the variance
#To calculate the variance we will need to first get the sum of squares
#First lets create a vector of centered values (mean removed)
sc=s-meanS[1]
#because we just calculate meanS using matrix/vector operations R treats the object meanS as a vector with 1 element - if calculate using loops it would be a scalar and no need for [1] after meanS
#The above line of code subtracts the mean from each of the 1000 elements in the vector s
#now we initialize a varaible for the sum of squares
sumSQ = 0.0
#Here we initialize the loop
for(i in c(1:1000)){
sumSQ=sumSQ+sc[i]**2 #a**b is used to raise a to the power b
} # end of the for loop to calculate the sum of squares
#calculate the variance
varS=sumSQ/(length(sc)-1)
#we lose 1 degree of freedom when using an estimate of the mean
print(varS)
[1] 92.03544
workflowr
to set up the file structure for this lab (I will walk you through it during our first lab session)Luciano Solution Compress the full folder structure created by workflowr and upload to Canvas before the next lab period.
sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Big Sur 11.6.1
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] workflowr_1.7.0 forcats_0.5.1 stringr_1.4.0 dplyr_1.0.7
[5] purrr_0.3.4 readr_2.1.1 tidyr_1.1.4 tibble_3.1.6
[9] ggplot2_3.3.5 tidyverse_1.3.1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.8 lubridate_1.8.0 getPass_0.2-2 ps_1.6.0
[5] assertthat_0.2.1 rprojroot_2.0.2 digest_0.6.29 utf8_1.2.2
[9] R6_2.5.1 cellranger_1.1.0 backports_1.4.1 reprex_2.0.1
[13] evaluate_0.14 highr_0.9 httr_1.4.2 pillar_1.6.4
[17] rlang_0.4.12 readxl_1.3.1 rstudioapi_0.13 callr_3.7.0
[21] whisker_0.4 jquerylib_0.1.4 rmarkdown_2.11 munsell_0.5.0
[25] broom_0.7.11 compiler_4.1.1 httpuv_1.6.5 modelr_0.1.8
[29] xfun_0.29 pkgconfig_2.0.3 htmltools_0.5.2 tidyselect_1.1.1
[33] fansi_1.0.2 crayon_1.4.2 tzdb_0.2.0 dbplyr_2.1.1
[37] withr_2.4.3 later_1.3.0 grid_4.1.1 jsonlite_1.7.3
[41] gtable_0.3.0 lifecycle_1.0.1 DBI_1.1.2 git2r_0.29.0
[45] magrittr_2.0.1 scales_1.1.1 cli_3.1.0 stringi_1.7.6
[49] fs_1.5.2 promises_1.2.0.1 xml2_1.3.3 bslib_0.3.1
[53] ellipsis_0.3.2 generics_0.1.1 vctrs_0.3.8 tools_4.1.1
[57] glue_1.6.0 hms_1.1.1 processx_3.5.2 fastmap_1.1.0
[61] yaml_2.2.1 colorspace_2.0-2 rvest_1.0.2 knitr_1.37
[65] haven_2.4.3 sass_0.4.0