编程技术网 (Programming Technology Network)


Extracting BLAST output columns in CSV form with Python

kaladasan Python 2022-5-11 10:46



I have a CSV file in Excel which contains the output from a BLAST search in the following format:

    # BLASTN 2.2.29+
    # Query: Cryptocephalus androgyne
    # Database: SANdouble
    # Fields: query id subject id % identity alignment length mismatches gap opens q. start q. end s. start s. end evalue bit score
    # 1 hits found
    Cryptocephalus ctg7180000094003 79.59 637 110 9 38 655 1300 1935 1.00E-125 444
    # BLASTN 2.2.29+
    # Query: Cryptocephalus aureolus
    # Database: SANdouble
    # Fields: query id subject id % identity alignment length mismatches gap opens q. start q. end s. start s. end evalue bit score
    # 4 hits found
    Cryptocephalus ctg7180000093816 95.5 667 12 8 7 655 1269 1935 0 1051
    Cryptocephalus ctg7180000094021 88.01 667 62 8 7 655 1269 1935 0 780
    Cryptocephalus ctg7180000094015 81.26 667 105 13 7 654 1269 1934 2.00E-152 532
    Cryptocephalus ctg7180000093818 78.64 515 106 4 8 519 1270 1783 2.00E-94 340
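Because every block repeats the `# Fields:` comment, the twelve column names can be hard-coded once and zipped onto each data row. A minimal sketch, assuming the data rows are whitespace-separated exactly as shown above; `FIELDS` and `parse_hit` are hypothetical names, not part of BLAST or the csv module:

```python
# Column names copied from the "# Fields:" comment lines above
FIELDS = (
    "query id", "subject id", "% identity", "alignment length",
    "mismatches", "gap opens", "q. start", "q. end",
    "s. start", "s. end", "evalue", "bit score",
)


def parse_hit(line):
    """Map one whitespace-separated BLAST data row onto the field names."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(values)}")
    return dict(zip(FIELDS, values))


hit = parse_hit(
    "Cryptocephalus ctg7180000094003 79.59 637 110 9 38 655 1300 1935 1.00E-125 444"
)
print(hit["% identity"])            # → 79.59
print(FIELDS.index("% identity"))   # → 2
```

The values stay strings; convert with `float()` or `int()` where a numeric comparison is needed.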


I have imported this into Python with the csv module using:

    import csv

    with open('BLASToutput.csv', newline='') as csvfile:
        contents = csv.reader(csvfile, delimiter=' ', quotechar='|')  # space-delimited
        for row in contents:
            table = ', '.join(row)  # overwritten on every row; only the last row is kept


What I now want to do is extract columns of data as lists. My overall aim is to count all the matches that have over 98% identity (the third column).
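For the 98% count itself only the data rows matter: lines starting with `#` are BLAST comment lines and can be skipped. A sketch under the same whitespace-separated assumption; `count_high_identity` is a hypothetical helper, and it uses a strict `>` to match "over 98%":

```python
def count_high_identity(lines, threshold=98.0):
    """Count BLAST hits whose % identity (third column) exceeds threshold."""
    count = 0
    for line in lines:
        line = line.strip()
        # Skip blank lines and the "# ..." comment lines written for each query
        if not line or line.startswith("#"):
            continue
        identity = float(line.split()[2])
        if identity > threshold:
            count += 1
    return count


sample = """\
# BLASTN 2.2.29+
# Query: Cryptocephalus aureolus
# 4 hits found
Cryptocephalus ctg7180000093816 95.5 667 12 8 7 655 1269 1935 0 1051
Cryptocephalus ctg7180000094021 88.01 667 62 8 7 655 1269 1935 0 780
Cryptocephalus ctg7180000094015 81.26 667 105 13 7 654 1269 1934 2.00E-152 532
Cryptocephalus ctg7180000093818 78.64 515 106 4 8 519 1270 1783 2.00E-94 340
""".splitlines()

print(count_high_identity(sample))        # → 0
print(count_high_identity(sample, 90.0))  # → 1
```

Since iterating over an open file yields lines, the same function should work directly on the file object, e.g. `with open("BLASToutput.csv") as f: print(count_high_identity(f))`.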


The issue is that, since this is not in the typical CSV format, there are no headers at the top, so I can't extract a column based on its header. I was thinking that if I could extract the third column as a list, I could then use normal list tools in Python to pull out just the numbers I want, but I have never used Python's csv module and I'm struggling to find an appropriate command. Other questions on Stack Overflow are similar but don't cover my specific case, where there are no headers and there are empty cells. If you could help me I would be very grateful!
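With no header row, a column has to be picked by position. The sketch below keeps the `csv` module and the space delimiter from the question; runs of spaces produce empty cells, so those are dropped before indexing, and `#` comment rows are skipped. `column` is a hypothetical helper name, and `io.StringIO` stands in for the real file:

```python
import csv
import io


def column(csvfile, index, delimiter=" "):
    """Return column `index` (0-based) of every data row as a list of strings."""
    values = []
    for row in csv.reader(csvfile, delimiter=delimiter):
        cells = [cell for cell in row if cell.strip()]  # drop empty cells
        # Skip blank rows and BLAST "# ..." comment rows
        if not cells or cells[0].startswith("#"):
            continue
        values.append(cells[index])
    return values


data = io.StringIO(
    "# Query: Cryptocephalus aureolus\n"
    "Cryptocephalus  ctg7180000093816 95.5 667 12 8 7 655 1269 1935 0 1051\n"
    "Cryptocephalus ctg7180000094021  88.01 667 62 8 7 655 1269 1935 0 780\n"
)
identity = column(data, 2)
print(identity)  # → ['95.5', '88.01']
```

For the real file, something like `with open("BLASToutput.csv", newline="") as f: identity = column(f, 2)` should apply; if Excel saved it comma-separated, pass `delimiter=","` instead.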

Answer


I managed to find one way based on:

Python: split files using multiple split delimiters

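    import csv

    with open("SANDoubleSuperMatrix.csv", newline="") as csvfile:
        # Let the Sniffer guess the dialect (delimiter etc.) from the first 1024 characters
        dialect = csv.Sniffer().sniff(csvfile.read(1024))
        csvfile.seek(0)  # rewind before reading the rows
        reader = csv.reader(csvfile, dialect)
        identity = []
        for line in reader:
            identity.append(line[2])  # third column; '#' comment rows are not filtered out here
    print(identity)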
