基因数据处理48之ART使用实例
发布时间:2021-03-12 08:49:50 所属栏目:大数据 来源:网络整理
导读:相关参数请见上一篇 1.使用实例1: hadoop @Master :~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem $ art_illumina -ss HS20 -i GRCH38chr1L3556522 .fna -l 100 -f 20 -o G38L100F20Nhs20 ==================== ART ==================== ART_Illumina ( 200
4.生成一条数据: hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ art_illumina -ss HS20 -i GRCH38chr1L3556522.fna -l 100 -c 1 -o G38L100c1Nhs20 ====================ART==================== ART_Illumina (2008-2016) Q Version 2.5.1 (Apr 17,2016) Contact: Weichun Huang <whduke@gmail.com> ------------------------------------------- Single-end Simulation Total CPU time used: 15.82 The random seed for the run: 1464918910 Parameters used during run Read Length: 100 Genome masking 'N' cutoff frequency: 1 in 100 Fold Coverage: 0X Profile Type: Combined ID Tag: Quality Profile(s) First Read: HiSeq 2000 Length 100 R1 (built-in profile) Output files FASTQ Sequence File: G38L100c1Nhs20.fq ALN Alignment File: G38L100c1Nhs20.aln hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ cat G38L100c1Nhs20. cat: G38L100c1Nhs20.: No such file or directory hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ cat G38L100c1Nhs20.fq @chr1-1 CATATTTACCAATTAAAGTCACAAAATATTTCTCATTATTTATTCATGCAGGTAACTGAGACAAAGATAGTGCAGAAATCAACTTTAAATAAAAAATTAT + @C@D@FFDFHHHHIJ.JBIJJGJGIJ:G47JHJ@IJJ91BJJIGHHHEIJDGD=IJJJBJJ'DG=3D)<D?HCHBFAE?GEDC5D5ECD<CD<DBADDBE hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ cat G38L100c1Nhs20. G38L100c1Nhs20.aln G38L100c1Nhs20.fq hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ cat G38L100c1Nhs20. G38L100c1Nhs20.aln G38L100c1Nhs20.fq hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ cat G38L100c1Nhs20.aln ##ART_Illumina read_length 100 @CM art_illumina -ss HS20 -i GRCH38chr1L3556522.fna -l 100 -c 1 -o G38L100c1Nhs20 -rs 1464918910 @SQ chr1 AC:CM000663.2 gi:568336023 LN:248956422 rl:Chromosome M5:6aef897c3d6ff0c78aff06ac189178dd AS:GRCh38 248956422 ##Header End >chr1 chr1-1 225496693 + CATATTTACCAATTAAAGTCACAAAATATTTCTCATTATTTATTCATGCAGGTAACTGAGAAAAAGATAGTGCAGAAATCAACTTTAAATAAAAAATTAT CATATTTACCAATTAAAGTCACAAAATATTTCTCATTATTTATTCATGCAGGTAACTGAGACAAAGATAGTGCAGAAATCAACTTTAAATAAAAAATTAT 5.使用bwa验证: hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ cat G38L100c1Nhs20.sam @SQ SN:chr1 LN:248956422 @PG ID:bwa PN:bwa VN:0.7.13-r1126 CL:bwa samse GRCH38chr1L3556522.fna G38L100c1Nhs20.sai G38L100c1Nhs20.fq chr1-1 0 chr1 225496694 37 100M * 0 0 CATATTTACCAATTAAAGTCACAAAATATTTCTCATTATTTATTCATGCAGGTAACTGAGACAAAGATAGTGCAGAAATCAACTTTAAATAAAAAATTAT @C@D@FFDFHHHHIJ.JBIJJGJGIJ:G47JHJ@IJJ91BJJIGHHHEIJDGD=IJJJBJJ'DG=3D)<D?HCHBFAE?GEDC5D5ECD<CD<DBADDBE XT:A:U NM:i:1 X0:i:1 X1:i:0 XM:i:1 XO:i:0 XG:i:0 MD:Z:61A38 hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ cat G38L100c1Nhs20.aln ##ART_Illumina read_length 100 @CM art_illumina -ss HS20 -i GRCH38chr1L3556522.fna -l 100 -c 1 -o G38L100c1Nhs20 -rs 1464918910 @SQ chr1 AC:CM000663.2 gi:568336023 LN:248956422 rl:Chromosome M5:6aef897c3d6ff0c78aff06ac189178dd AS:GRCh38 248956422 ##Header End >chr1 chr1-1 225496693 + CATATTTACCAATTAAAGTCACAAAATATTTCTCATTATTTATTCATGCAGGTAACTGAGAAAAAGATAGTGCAGAAATCAACTTTAAATAAAAAATTAT CATATTTACCAATTAAAGTCACAAAATATTTCTCATTATTTATTCATGCAGGTAACTGAGACAAAGATAGTGCAGAAATCAACTTTAAATAAAAAATTAT 可以发现art产生的数据是从位置0开始,跟Adam一致,bwa是从一开始 6.用snap验证: hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ cat G38L100c1Nhs20.snap.sam @HD VN:1.4 SO:unsorted @RG ID:FASTQ PL:Illumina PU:pu LB:lb SM:sm @PG ID:SNAP PN:SNAP CL:single index G38L100c1Nhs20.fq -o G38L100c1Nhs20.snap.sam VN:1.0beta.23 @SQ SN:chr1__AC:CM000663.2__gi:568336023__LN:248956422__rl:Chromosome__M5:6aef897c3d6ff0c78aff06ac189178dd__AS:GRCh38 LN:248956422 chr1-1 0 chr1__AC:CM000663.2__gi:568336023__LN:248956422__rl:Chromosome__M5:6aef897c3d6ff0c78aff06ac189178dd__AS:GRCh38 225496694 70 100M * 0 0 CATATTTACCAATTAAAGTCACAAAATATTTCTCATTATTTATTCATGCAGGTAACTGAGACAAAGATAGTGCAGAAATCAACTTTAAATAAAAAATTAT @C@D@FFDFHHHHIJ.JBIJJGJGIJ:G47JHJ@IJJ91BJJIGHHHEIJDGD=IJJJBJJ'DG=3D)<D?HCHBFAE?GEDC5D5ECD<CD<DBADDBE PG:Z:SNAP NM:i:1 RG:Z:FASTQ PL:Z:Illumina PU:Z:pu LB:Z:lb SM:Z:sm hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ cat G38L100c1Nhs20.aln ##ART_Illumina read_length 100 @CM art_illumina -ss HS20 -i GRCH38chr1L3556522.fna -l 100 -c 1 -o G38L100c1Nhs20 -rs 1464918910 @SQ chr1 AC:CM000663.2 gi:568336023 LN:248956422 rl:Chromosome M5:6aef897c3d6ff0c78aff06ac189178dd AS:GRCh38 248956422 ##Header End >chr1 chr1-1 225496693 + CATATTTACCAATTAAAGTCACAAAATATTTCTCATTATTTATTCATGCAGGTAACTGAGAAAAAGATAGTGCAGAAATCAACTTTAAATAAAAAATTAT CATATTTACCAATTAAAGTCACAAAATATTTCTCATTATTTATTCATGCAGGTAACTGAGACAAAGATAGTGCAGAAATCAACTTTAAATAAAAAATTAT (编辑:晋中站长网) 【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容! |