Genomic Data Processing 18: Installing and Using wgsim, a Sequence Read Simulator
Published: 2021-05-30 01:16:02  Category: Big Data  Source: compiled from the web
1. Download: https://github.com/lh3/wgsim (clone with git or download the zip archive).

2. Build:

    gcc -g -O2 -Wall -o wgsim wgsim.c -lz -lm

3. Reference data: the reference genome can be downloaded with bwakit (https://github.com/lh3/bwa/tree/master/bwakit):

    bwa.kit/run-gen-ref hs38DH

4. Usage and default settings:

    hadoop@Master:~/cloud/spark-1.5.2/examples/src/main/resources$ wgsim

    Program: wgsim (short read simulator)
    Version: 0.3.2
    Contact: Heng Li <lh3@sanger.ac.uk>

    Usage:   wgsim [options] <in.ref.fa> <out.read1.fq> <out.read2.fq>

    Options: -e FLOAT      base error rate [0.020]
             -d INT        outer distance between the two ends [500]
             -s INT        standard deviation [50]
             -N INT        number of read pairs [1000000]
             -1 INT        length of the first read [70]
             -2 INT        length of the second read [70]
             -r FLOAT      rate of mutations [0.0010]
             -R FLOAT      fraction of indels [0.15]
             -X FLOAT      probability an indel is extended [0.30]
             -S INT        seed for random generator [0,use the current time]
             -A FLOAT      discard if the fraction of ambiguous bases higher than FLOAT [0.05]
             -h            haplotype mode

5. Examples:

(1) Paired-end reads with default settings:

    wgsim hs38DH.fa PE/hs38DHPE1LallF1.fq PE/hs38DHPE1LallF2.fq

(2) Single-end reads with default settings (the second read file is discarded by writing it to /dev/null):

    hadoop@Mcnode1:~/cloud/adam/xubo/data/hs38DH$ wgsim hs38DH.fa hs38DHSELallF1V2.fq /dev/null

(3) -N: number of read pairs to generate, for example -N 10000:

    wgsim -N 10000 hs38DH.fa PE/hs38DHPE1L10000F1.fq PE/hs38DHPE1L10000F2.fq

Check the number of lines in the output file:

    hadoop@Mcnode1:~/cloud/adam/xubo/data/hs38DH$ cat PE/hs38DHPE1L10000F1.fq |wc -l
    39740

FASTQ stores each read in four lines, so 39740 lines correspond to 9935 reads.

File contents:

    hadoop@Mcnode1:~/cloud/adam/xubo/data/hs38DH$ cat PE/hs38DHPE1L10000F1.fq |head -20
    @chrUn_KN707606v1_decoy_29_523_2:0:0_1:0:0_0/1
    ATGCCCAGCTGGTTTCTGATACTTCTAATCAAATGTCTTATCCCCCAAATTAGCCCTGGGAGTGAGAATA
    +
    2222222222222222222222222222222222222222222222222222222222222222222222
    @chrUn_KN707606v1_decoy_657_1222_1:0:0_1:0:0_1/1
    GTGGTGCACACCTGTAGTGCCTGTTCCTTGGGAGGCTGAGGCCGGAGGATCCCTTGAGCCCAGGAGTTCA
    +
    2222222222222222222222222222222222222222222222222222222222222222222222
    @chrUn_KN707606v1_decoy_1052_1588_2:0:0_1:1:0_2/1
    GTCCAAACACCACGTGACAAGCCCATTCTTCCATTTTCTCAGACCATAAACTGCACTGTCCTCTAACTGC
    +
    2222222222222222222222222222222222222222222222222222222222222222222222
    @chrUn_KN707607v1_decoy_1123_1686_1:0:0_2:0:0_0/1
    GAGGATATTTTGTTTAGTCACTAGGATTTCTTAACATTCTGAAATTCTATTCACCTCTGATTTTGTCTAT
    +
    2222222222222222222222222222222222222222222222222222222222222222222222
    @chrUn_KN707607v1_decoy_877_1369_0:0:0_0:0:0_1/1
    TATAGTTAACATAACATGGTCTATCTTTAGATAATCTCCATGCACAGTAAGATAATATTTTTTCTAGGAC
    +
    2222222222222222222222222222222222222222222222222222222222222222222222

(4) -1: length of the first read. -1 10 sets the reads in the first FASTQ file to 10 bases:

    wgsim -N10000 -1 10 hs38DH.fa SE/hs38DHSE1N10000L10F1.fq /dev/null

Inspect the result:

    hadoop@Mcnode1:~/cloud/adam/xubo/data/hs38DH$ cat SE/hs38DHSE1N10000L10F1.fq |wc -l
    39740
    hadoop@Mcnode1:~/cloud/adam/xubo/data/hs38DH$ cat SE/hs38DHSE1N10000L10F1.fq |head -20
    @chrUn_KN707606v1_decoy_216_790_0:0:0_2:0:0_0/1
    CATGTCTTTC
    +
    2222222222
    @chrUn_KN707606v1_decoy_1191_1728_0:0:0_1:0:0_1/1
    TTAACCTTAA
    +
    2222222222
    @chrUn_KN707606v1_decoy_792_1284_1:0:0_0:0:0_2/1
    CAGAACAAAA
    +
    2222222222
    @chrUn_KN707607v1_decoy_1925_2441_0:0:0_1:0:0_0/1
    TGCAGGTTTG
    +
    2222222222
    @chrUn_KN707607v1_decoy_2305_2757_1:0:0_3:0:0_1/1
    GGACAAGGGA
    +
    2222222222

6. Other:

(1) Alignment: build an index with BWA:
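Going back to the checks in step 5: the `wc -l` and `head` inspections can be combined into one small helper. The sketch below is only an illustration. `fastq_summary` is a hypothetical function name, not part of wgsim, and it assumes a plain four-line-per-read FASTQ file like the ones wgsim writes. It counts the reads and reports the shortest and longest read length, which makes it easy to confirm the effect of `-N` and `-1`.

```shell
#!/bin/sh
# fastq_summary FILE
# Hypothetical helper (not part of wgsim): prints the number of reads and
# the minimum and maximum read length of a four-line-per-read FASTQ file.
# Returns a non-zero status if the line count is not a multiple of four.
fastq_summary() {
    awk '
        NR % 4 == 2 {                      # line 2 of each record is the sequence
            n++
            l = length($0)
            if (n == 1 || l < min) min = l
            if (n == 1 || l > max) max = l
        }
        END {
            if (NR % 4 != 0) exit 1        # truncated or malformed file
            printf "reads=%d min_len=%d max_len=%d\n", n, min, max
        }
    ' "$1"
}

# Demo on two records taken from the -1 10 output in step 5 (4).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
@chrUn_KN707606v1_decoy_216_790_0:0:0_2:0:0_0/1
CATGTCTTTC
+
2222222222
@chrUn_KN707606v1_decoy_1191_1728_0:0:0_1:0:0_1/1
TTAACCTTAA
+
2222222222
EOF
fastq_summary "$tmp"    # prints: reads=2 min_len=10 max_len=10
rm -f "$tmp"
```

Run against a full output file, for example `fastq_summary SE/hs38DHSE1N10000L10F1.fq`, the read count should equal the `wc -l` result divided by four.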