release exercise-11
parent 8ad8a7b51a
commit 30f2420cdc
|
@ -44,6 +44,7 @@ http://pythontutor.com/
| exercise-08 | 2024-12-04 |
| exercise-09 | 2024-12-11 |
| exercise-10 | 2024-12-18 |
| exercise-11 | 2025-01-15 |

## Submitting assignments
@ -0,0 +1,393 @@
|
||||||
|
{
|
||||||
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {
|
||||||
|
"deletable": false,
|
||||||
|
"editable": false,
|
||||||
|
"nbgrader": {
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"checksum": "b2b503075dc9e28aef8d23c9500b378d",
|
||||||
|
"grade": false,
|
||||||
|
"grade_id": "cell-6eb17f6ff50e7d3d",
|
||||||
|
"locked": true,
|
||||||
|
"schema_version": 3,
|
||||||
|
"solution": false,
|
||||||
|
"task": false
|
||||||
|
},
|
||||||
|
"tags": []
|
||||||
|
},
|
||||||
|
"source": [
|
||||||
|
"# Bioinformatics introduction"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {
|
||||||
|
"deletable": false,
|
||||||
|
"editable": false,
|
||||||
|
"nbgrader": {
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"checksum": "ce08e4f1546ea2f35e2c82f4250ecd17",
|
||||||
|
"grade": false,
|
||||||
|
"grade_id": "cell-946d22188d87a0a5",
|
||||||
|
"locked": true,
|
||||||
|
"schema_version": 3,
|
||||||
|
"solution": false,
|
||||||
|
"task": false
|
||||||
|
},
|
||||||
|
"tags": []
|
||||||
|
},
|
||||||
|
"source": [
|
||||||
|
"In this exercise, you will perform some bioinformatic analysis. There is no specific correct answer here, but rather a series of tasks to complete. You are encouraged to use online resources and discussions with colleagues to help you at all stages.\n",
|
||||||
|
"\n",
|
||||||
|
"Most of the work in this exercise is interacting manually with websites, understanding what types of data they use, and taking first steps with the online tools. The Python here mainly serves as a path for you to follow through the steps one by one.\n",
|
||||||
|
"\n",
|
||||||
|
"Pick a protein-coding gene interesting to you. In case you cannot think of anything, I suggest: ACE2, TP53, TGFbeta1, Drosophila DopEcR, TAS1R2, Drosophila nAChRalpha1. Find the gene at [NCBI Gene](https://www.ncbi.nlm.nih.gov/gene).\n",
|
||||||
|
"\n",
|
||||||
|
"## Task 1: What is the full name of this gene? What is the NCBI GeneID?"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {
|
||||||
|
"deletable": false,
|
||||||
|
"nbgrader": {
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"checksum": "1bfc9d312a1c5a5db535b14165966b3f",
|
||||||
|
"grade": true,
|
||||||
|
"grade_id": "cell-f26307b555211aca",
|
||||||
|
"locked": false,
|
||||||
|
"points": 1,
|
||||||
|
"schema_version": 3,
|
||||||
|
"solution": true,
|
||||||
|
"task": false
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"source": [
|
||||||
|
"YOUR ANSWER HERE"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {
|
||||||
|
"deletable": false,
|
||||||
|
"editable": false,
|
||||||
|
"nbgrader": {
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"checksum": "b917e142873b71e75f2681c5b31f7607",
|
||||||
|
"grade": false,
|
||||||
|
"grade_id": "cell-b9777b8be7bc0ad5",
|
||||||
|
"locked": true,
|
||||||
|
"schema_version": 3,
|
||||||
|
"solution": false,
|
||||||
|
"task": false
|
||||||
|
},
|
||||||
|
"tags": []
|
||||||
|
},
|
||||||
|
"source": [
|
||||||
|
"Find the complete sequence for one protein isoform using the [NCBI protein database website](https://www.ncbi.nlm.nih.gov/protein/). Hint: it may be easier to find the protein sequence by first finding the gene at [NCBI Gene](https://www.ncbi.nlm.nih.gov/gene), going to the \"RefSeq\" sequences, and then clicking on the protein sequence, whose accession will probably start with `NP_`.\n",
|
||||||
|
"\n",
|
||||||
|
"Download a FASTA file with the sequence of this protein isoform.\n",
|
||||||
|
"\n",
|
||||||
|
"## Task 2: Make a variable named `original_fasta` which is a string containing the FASTA format protein sequence\n",
|
||||||
|
"\n",
|
||||||
|
"You can create a multi-line string in Python with triple quotes like this:\n",
|
||||||
|
"\n",
|
||||||
|
"```python\n",
|
||||||
|
"my_string = \"\"\"line 1\n",
|
||||||
|
"line 2\n",
|
||||||
|
"line 3\"\"\"\n",
|
||||||
|
"```\n",
|
||||||
|
"\n",
|
||||||
|
"Or like this:\n",
|
||||||
|
"\n",
|
||||||
|
"```python\n",
|
||||||
|
"my_string = '''line 1\n",
|
||||||
|
"line 2\n",
|
||||||
|
"line 3'''\n",
|
||||||
|
"```"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {
|
||||||
|
"deletable": false,
|
||||||
|
"nbgrader": {
|
||||||
|
"cell_type": "code",
|
||||||
|
"checksum": "139b223a2c36cdccb5bf7ee15a4b5395",
|
||||||
|
"grade": false,
|
||||||
|
"grade_id": "cell-c81dad73272cc1d7",
|
||||||
|
"locked": false,
|
||||||
|
"schema_version": 3,
|
||||||
|
"solution": true,
|
||||||
|
"task": false
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# YOUR CODE HERE\n",
|
||||||
|
"raise NotImplementedError()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Ensure that we have the biopython package installed\n",
|
||||||
|
"!pip install biopython"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {
|
||||||
|
"deletable": false,
|
||||||
|
"editable": false,
|
||||||
|
"nbgrader": {
|
||||||
|
"cell_type": "code",
|
||||||
|
"checksum": "41e299a9f67206d4ccb5db71286e78d3",
|
||||||
|
"grade": true,
|
||||||
|
"grade_id": "cell-7a1036cfda280feb",
|
||||||
|
"locked": true,
|
||||||
|
"points": 1,
|
||||||
|
"schema_version": 3,
|
||||||
|
"solution": false,
|
||||||
|
"task": false
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# This is a test of the above, do not change this code.\n",
|
||||||
|
"import io\n",
|
||||||
|
"import Bio.SeqIO\n",
|
||||||
|
"records = [record for record in Bio.SeqIO.parse(io.StringIO(original_fasta), \"fasta\")]\n",
|
||||||
|
"assert len(records)==1\n",
|
||||||
|
"assert isinstance(records[0], Bio.SeqRecord.SeqRecord)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {
|
||||||
|
"deletable": false,
|
||||||
|
"editable": false,
|
||||||
|
"nbgrader": {
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"checksum": "6cbd6076b4682544307e6c6d3a8e9f8c",
|
||||||
|
"grade": false,
|
||||||
|
"grade_id": "cell-96024e67e7e5036e",
|
||||||
|
"locked": true,
|
||||||
|
"schema_version": 3,
|
||||||
|
"solution": false,
|
||||||
|
"task": false
|
||||||
|
},
|
||||||
|
"tags": []
|
||||||
|
},
|
||||||
|
"source": [
|
||||||
|
"Now perform a protein BLAST search for homologous sequences using [the NCBI BLAST website](https://blast.ncbi.nlm.nih.gov/Blast.cgi).\n",
|
||||||
|
"\n",
|
||||||
|
"View the FASTA data for at least 5 of the resulting sequences. Do not take other isoforms of the same gene in the same species. Take either: A) other genes in the same species or B) potentially homologous genes in other species. Do not take both A and B. You may limit your search to specific species to fulfill these criteria or your own curiosity.\n",
|
||||||
|
"\n",
|
||||||
|
"## Task 3: Make a variable named `others` which is a list of strings containing the FASTA format protein sequences"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {
|
||||||
|
"deletable": false,
|
||||||
|
"nbgrader": {
|
||||||
|
"cell_type": "code",
|
||||||
|
"checksum": "0936cf6d534396b4994046c52d612b77",
|
||||||
|
"grade": false,
|
||||||
|
"grade_id": "cell-f4f84628865fc2ec",
|
||||||
|
"locked": false,
|
||||||
|
"schema_version": 3,
|
||||||
|
"solution": true,
|
||||||
|
"task": false
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# YOUR CODE HERE\n",
|
||||||
|
"raise NotImplementedError()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {
|
||||||
|
"deletable": false,
|
||||||
|
"editable": false,
|
||||||
|
"nbgrader": {
|
||||||
|
"cell_type": "code",
|
||||||
|
"checksum": "1826ab033792c96c815d9d0cf06f6a70",
|
||||||
|
"grade": true,
|
||||||
|
"grade_id": "cell-0ef639671dbc2b53",
|
||||||
|
"locked": true,
|
||||||
|
"points": 1,
|
||||||
|
"schema_version": 3,
|
||||||
|
"solution": false,
|
||||||
|
"task": false
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# This is a test of the above, do not change this code.\n",
|
||||||
|
"assert len(others)>=5\n",
|
||||||
|
"seen = [original_fasta]\n",
|
||||||
|
"for this_fasta in others:\n",
|
||||||
|
" assert type(this_fasta)==str\n",
|
||||||
|
" assert this_fasta not in seen \n",
|
||||||
|
" records = [record for record in Bio.SeqIO.parse(io.StringIO(this_fasta), \"fasta\")]\n",
|
||||||
|
" assert len(records)==1"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {
|
||||||
|
"deletable": false,
|
||||||
|
"editable": false,
|
||||||
|
"nbgrader": {
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"checksum": "cf5d4cd03f07da4637e933879f75df7f",
|
||||||
|
"grade": false,
|
||||||
|
"grade_id": "cell-e62c5138daab0413",
|
||||||
|
"locked": true,
|
||||||
|
"schema_version": 3,
|
||||||
|
"solution": false,
|
||||||
|
"task": false
|
||||||
|
},
|
||||||
|
"tags": []
|
||||||
|
},
|
||||||
|
"source": [
|
||||||
|
"Now, let's join your original fasta data and the data you found with BLAST all together in one big multi-sequence FASTA data string.\n",
|
||||||
|
"\n",
|
||||||
|
"FASTA files can have multiple sequences in one file just by *concatenating* (or \"joining\" or \"adding\") them together."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {
|
||||||
|
"deletable": false,
|
||||||
|
"editable": false,
|
||||||
|
"nbgrader": {
|
||||||
|
"cell_type": "code",
|
||||||
|
"checksum": "c6363b63187af4e61f317b93716bab40",
|
||||||
|
"grade": false,
|
||||||
|
"grade_id": "cell-bbfd775731cacd7d",
|
||||||
|
"locked": true,
|
||||||
|
"schema_version": 3,
|
||||||
|
"solution": false,
|
||||||
|
"task": false
|
||||||
|
},
|
||||||
|
"tags": []
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"all_list = [original_fasta] + others\n",
|
||||||
|
"all_string = '\\n'.join(all_list)\n",
|
||||||
|
"print(all_string)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {
|
||||||
|
"deletable": false,
|
||||||
|
"editable": false,
|
||||||
|
"nbgrader": {
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"checksum": "850e4a7e965af9c4af989252b6bfb020",
|
||||||
|
"grade": false,
|
||||||
|
"grade_id": "cell-5d4a627513248352",
|
||||||
|
"locked": true,
|
||||||
|
"schema_version": 3,
|
||||||
|
"solution": false,
|
||||||
|
"task": false
|
||||||
|
},
|
||||||
|
"tags": []
|
||||||
|
},
|
||||||
|
"source": [
|
||||||
|
"## Task 4: Perform multi-species alignment using [Clustal Omega at the EBI website](https://www.ebi.ac.uk/jdispatcher/msa/clustalo).\n",
|
||||||
|
"\n",
|
||||||
|
"This website runs multiple sequence alignment software. You can directly upload the multiple sequence FASTA file you generated above and let their computer do the alignment.\n",
|
||||||
|
"\n",
|
||||||
|
"Cut and paste the multiple sequence FASTA above into the Clustal Omega entry page at the EBI website. Keep all parameters at their default values (Protein, Output format ClustalW with character counts).\n",
|
||||||
|
"\n",
|
||||||
|
"Enter the multi-sequence alignment here below as a multi-line string called `msa`."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {
|
||||||
|
"deletable": false,
|
||||||
|
"nbgrader": {
|
||||||
|
"cell_type": "code",
|
||||||
|
"checksum": "fa6879fff242c381a336f85d69a96e17",
|
||||||
|
"grade": false,
|
||||||
|
"grade_id": "cell-6ffef2e19f573e15",
|
||||||
|
"locked": false,
|
||||||
|
"schema_version": 3,
|
||||||
|
"solution": true,
|
||||||
|
"task": false
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# YOUR CODE HERE\n",
|
||||||
|
"raise NotImplementedError()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {
|
||||||
|
"deletable": false,
|
||||||
|
"editable": false,
|
||||||
|
"nbgrader": {
|
||||||
|
"cell_type": "code",
|
||||||
|
"checksum": "eb77d54a873f1ca9890d783c8d6c202d",
|
||||||
|
"grade": true,
|
||||||
|
"grade_id": "cell-b39fdf740eda5d89",
|
||||||
|
"locked": true,
|
||||||
|
"points": 1,
|
||||||
|
"schema_version": 3,
|
||||||
|
"solution": false,
|
||||||
|
"task": false
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# This is a test of the above, do not change this code.\n",
|
||||||
|
"records = [record for record in Bio.SeqIO.parse(io.StringIO(msa), \"clustal\")]\n",
|
||||||
|
"assert len(records)>=6"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3 (ipykernel)",
|
||||||
|
"language": "python",
|
||||||
|
"name": "python3"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"file_extension": ".py",
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"name": "python",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"version": "3.11.7"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
|
"nbformat_minor": 4
|
||||||
|
}
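The notebook above checks the FASTA strings with Biopython. As a rough, dependency-free illustration of why joining single-record FASTA strings with newlines gives a multi-sequence FASTA, the sketch below counts header lines (those starting with `>`). The helper `count_fasta_records` and the toy sequences are made up for illustration; they are not part of the exercise and are no substitute for `Bio.SeqIO`.

```python
def count_fasta_records(fasta_text):
    """Count FASTA records by counting header lines, which start with '>'."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))


# Hypothetical toy records; real ones would be downloaded from NCBI.
original_fasta = """>toy_protein_1 hypothetical example
MKTAYIAKQR
QISFVKSHFS"""
others = [
    ">toy_protein_2 hypothetical example\nMKTAYLAKQR\n",
    ">toy_protein_3 hypothetical example\nMRTAYIAKQK",
]

# Same joining step as in the notebook: one record after another.
all_list = [original_fasta] + others
all_string = "\n".join(all_list)

print(count_fasta_records(all_string))
```

Counting headers only confirms that each record starts on its own line. It does not validate the sequences themselves, which is what the Biopython parser is for.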
|
473
exercises/release/exercise-11/2__making_HTTP_get_calls.ipynb
Normal file
|
@ -0,0 +1,473 @@
|
||||||
|
{
|
||||||
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Import libraries we need\n",
|
||||||
|
"import requests\n",
|
||||||
|
"import urllib.parse"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# You must run this cell, but you can ignore its contents.\n",
|
||||||
|
"import hashlib\n",
|
||||||
|
"\n",
|
||||||
|
"def ads_hash(ty):\n",
|
||||||
|
" \"\"\"Return a unique string for input\"\"\"\n",
|
||||||
|
" ty_str = str(ty).encode()\n",
|
||||||
|
" m = hashlib.sha256()\n",
|
||||||
|
" m.update(ty_str)\n",
|
||||||
|
" return m.hexdigest()[:10]"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"SWAPI_BASE='https://swapi.py4e.com/api/'\n",
|
||||||
|
"\n",
|
||||||
|
"def starwars_url(path):\n",
|
||||||
|
" '''return a URL to the Star Wars API using SWAPI_BASE\n",
|
||||||
|
" \n",
|
||||||
|
" For example, to get the URL for person with ID 10,\n",
|
||||||
|
" call:\n",
|
||||||
|
" \n",
|
||||||
|
" starwars_url('people/10/')\n",
|
||||||
|
" '''\n",
|
||||||
|
" return urllib.parse.urljoin(SWAPI_BASE,path)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"We need to be able to create an HTTP request to get a particular person from SWAPI. We will make HTTP GET requests, using the URL to request the information we want, so we need to create a string with the appropriate URL.\n",
|
||||||
|
"\n",
|
||||||
|
"### Q1 Define a function which will take an integer as an argument and return a string, which is the URL to get that particular person from the SWAPI. Call your function `get_person_url`.\n",
|
||||||
|
"\n",
|
||||||
|
"For example, for person 42, this function should return something like: `\"https://swapi.dev/api/people/42/\"`. However, the URL should be made with the `starwars_url()` function so that it starts with the value `SWAPI_BASE` (and not necessarily `https://swapi.dev/api`). This is useful in case the website changes location or in case we decide to start hosting our own copy of the website at a different URL."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {
|
||||||
|
"deletable": false,
|
||||||
|
"nbgrader": {
|
||||||
|
"cell_type": "code",
|
||||||
|
"checksum": "123cd45642aa84465efcf20b41a4bcd5",
|
||||||
|
"grade": false,
|
||||||
|
"grade_id": "cell-496b12c0d3e2489f",
|
||||||
|
"locked": false,
|
||||||
|
"schema_version": 3,
|
||||||
|
"solution": true,
|
||||||
|
"task": false
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# YOUR CODE HERE\n",
|
||||||
|
"raise NotImplementedError()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {
|
||||||
|
"deletable": false,
|
||||||
|
"editable": false,
|
||||||
|
"nbgrader": {
|
||||||
|
"cell_type": "code",
|
||||||
|
"checksum": "9c455b07be482a944e0dd36f29b4b8c1",
|
||||||
|
"grade": true,
|
||||||
|
"grade_id": "cell-b3a7ebab227918e4",
|
||||||
|
"locked": true,
|
||||||
|
"points": 1,
|
||||||
|
"schema_version": 3,
|
||||||
|
"solution": false,
|
||||||
|
"task": false
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# This checks that the above worked\n",
|
||||||
|
"person_10_url = get_person_url(10)\n",
|
||||||
|
"assert(type(person_10_url)==str)\n",
|
||||||
|
"assert(person_10_url.startswith(SWAPI_BASE))\n",
|
||||||
|
"assert(person_10_url.endswith('people/10') or person_10_url.endswith('people/10/'))"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Now, let's use the Python `requests` library to make an HTTP GET call to this URL. View its documentation at https://docs.python-requests.org/en/latest/.\n",
|
||||||
|
"\n",
|
||||||
|
"### Q2 Using your `get_person_url()` function, create a variable called `person_url` for person 10. Now, assign the result of `requests.get(person_url)` to a variable called `response`."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {
|
||||||
|
"deletable": false,
|
||||||
|
"nbgrader": {
|
||||||
|
"cell_type": "code",
|
||||||
|
"checksum": "0afcfa14509b7c7ef2df50711ba2588f",
|
||||||
|
"grade": false,
|
||||||
|
"grade_id": "cell-65e02d72fd413542",
|
||||||
|
"locked": false,
|
||||||
|
"schema_version": 3,
|
||||||
|
"solution": true,
|
||||||
|
"task": false
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# YOUR CODE HERE\n",
|
||||||
|
"raise NotImplementedError()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {
|
||||||
|
"deletable": false,
|
||||||
|
"editable": false,
|
||||||
|
"nbgrader": {
|
||||||
|
"cell_type": "code",
|
||||||
|
"checksum": "09bf79ea731a2c2fc7b762a7ab6f03fe",
|
||||||
|
"grade": true,
|
||||||
|
"grade_id": "cell-cbc105eff3987aa9",
|
||||||
|
"locked": true,
|
||||||
|
"points": 1,
|
||||||
|
"schema_version": 3,
|
||||||
|
"solution": false,
|
||||||
|
"task": false
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# This checks that the above worked\n",
|
||||||
|
"assert ads_hash(response.json()['name'])=='6e84377ee7'"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Now let's wrap this up in a function which takes a person ID number and returns a dictionary for that person using the `response.json()` method from `requests`.\n",
|
||||||
|
"\n",
|
||||||
|
"### Q3 Make a function called `get_person` which takes an integer as an argument and returns a dict with the result from the SWAPI"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {
|
||||||
|
"deletable": false,
|
||||||
|
"nbgrader": {
|
||||||
|
"cell_type": "code",
|
||||||
|
"checksum": "cde15d2b38e6db3c5da774b360ee97fc",
|
||||||
|
"grade": false,
|
||||||
|
"grade_id": "cell-11010cba389895bd",
|
||||||
|
"locked": false,
|
||||||
|
"schema_version": 3,
|
||||||
|
"solution": true,
|
||||||
|
"task": false
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# YOUR CODE HERE\n",
|
||||||
|
"raise NotImplementedError()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {
|
||||||
|
"deletable": false,
|
||||||
|
"editable": false,
|
||||||
|
"nbgrader": {
|
||||||
|
"cell_type": "code",
|
||||||
|
"checksum": "fe1396d2cb793b6bf338b633a7f4ff07",
|
||||||
|
"grade": true,
|
||||||
|
"grade_id": "cell-9c11e54ba69189a0",
|
||||||
|
"locked": true,
|
||||||
|
"points": 1,
|
||||||
|
"schema_version": 3,
|
||||||
|
"solution": false,
|
||||||
|
"task": false
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# This checks that the above worked\n",
|
||||||
|
"assert ads_hash(get_person(1)['name'])=='9d00804504'"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## HTTP POST requests\n",
|
||||||
|
"\n",
|
||||||
|
"Up to now, we have been using HTTP GET requests. These are what your browser does when you go to a webpage and are good for getting information.\n",
|
||||||
|
"\n",
|
||||||
|
"Sometimes, however, we want to upload more complicated data to another program. This is often done with the HTTP POST request.\n",
|
||||||
|
"\n",
|
||||||
|
"Here we are going to POST data to a server which will add a value to our input."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"ADDER_URL = 'http://http-demo-server.strawlab.org/'\n",
|
||||||
|
"response = requests.post(ADDER_URL+'add_to_value', json={'value':100})"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# A status code of 200 means success; 4xx and 5xx codes indicate errors\n",
|
||||||
|
"response.status_code"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"response.text"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"response.json()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Q4 What is the value added to the input by the HTTP server? Put your answer in the variable `added`."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {
|
||||||
|
"deletable": false,
|
||||||
|
"nbgrader": {
|
||||||
|
"cell_type": "code",
|
||||||
|
"checksum": "b776f1be5127a33abdd6f64bdb51ae5c",
|
||||||
|
"grade": false,
|
||||||
|
"grade_id": "cell-6f8a3c2b5f23135e",
|
||||||
|
"locked": false,
|
||||||
|
"schema_version": 3,
|
||||||
|
"solution": true,
|
||||||
|
"task": false
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# YOUR CODE HERE\n",
|
||||||
|
"raise NotImplementedError()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {
|
||||||
|
"deletable": false,
|
||||||
|
"editable": false,
|
||||||
|
"nbgrader": {
|
||||||
|
"cell_type": "code",
|
||||||
|
"checksum": "856250764f979ca97e3e124ffd37a325",
|
||||||
|
"grade": true,
|
||||||
|
"grade_id": "cell-60ba73a7486b8988",
|
||||||
|
"locked": true,
|
||||||
|
"points": 1,
|
||||||
|
"schema_version": 3,
|
||||||
|
"solution": false,
|
||||||
|
"task": false
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# This checks that the above worked\n",
|
||||||
|
"assert ads_hash(added)=='73475cb40a'"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Q5 There is another path on our server called `mystery`. It works similarly to `add_to_value`. What is the JSON `value` returned by this HTTP endpoint when called with an input value of 100? Put your answer in the variable `mystery100`."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {
|
||||||
|
"deletable": false,
|
||||||
|
"nbgrader": {
|
||||||
|
"cell_type": "code",
|
||||||
|
"checksum": "4bf4a4c1015892f6e367f9981a12c19b",
|
||||||
|
"grade": false,
|
||||||
|
"grade_id": "cell-51d14040f23943ad",
|
||||||
|
"locked": false,
|
||||||
|
"schema_version": 3,
|
||||||
|
"solution": true,
|
||||||
|
"task": false
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# YOUR CODE HERE\n",
|
||||||
|
"raise NotImplementedError()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {
|
||||||
|
"deletable": false,
|
||||||
|
"editable": false,
|
||||||
|
"nbgrader": {
|
||||||
|
"cell_type": "code",
|
||||||
|
"checksum": "c4c876385b03a639a16db8fc3e80e295",
|
||||||
|
"grade": true,
|
||||||
|
"grade_id": "cell-3392fa03f78e9c65",
|
||||||
|
"locked": true,
|
||||||
|
"points": 1,
|
||||||
|
"schema_version": 3,
|
||||||
|
"solution": false,
|
||||||
|
"task": false
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# This checks that the above worked\n",
|
||||||
|
"assert ads_hash(mystery100)=='26d228663f'"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Q6 Play around with this `mystery` until you think you understand what it is doing. Now make a function called `myfunc` which takes a single integer argument and returns an integer which should do the same thing as the mystery HTTP endpoint."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {
|
||||||
|
"deletable": false,
|
||||||
|
"nbgrader": {
|
||||||
|
"cell_type": "code",
|
||||||
|
"checksum": "fdcba7ad40518f19c8101cb09239198f",
|
||||||
|
"grade": false,
|
||||||
|
"grade_id": "cell-cc8f1c957dc44cd2",
|
||||||
|
"locked": false,
|
||||||
|
"schema_version": 3,
|
||||||
|
"solution": true,
|
||||||
|
"task": false
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# YOUR CODE HERE\n",
|
||||||
|
"raise NotImplementedError()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {
|
||||||
|
"deletable": false,
|
||||||
|
"editable": false,
|
||||||
|
"nbgrader": {
|
||||||
|
"cell_type": "code",
|
||||||
|
"checksum": "a1a1d95d00521db5416d1a53b6f978b9",
|
||||||
|
"grade": true,
|
||||||
|
"grade_id": "cell-b48cf735571f569b",
|
||||||
|
"locked": true,
|
||||||
|
"points": 1,
|
||||||
|
"schema_version": 3,
|
||||||
|
"solution": false,
|
||||||
|
"task": false
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# This checks that the above worked\n",
|
||||||
|
"assert ads_hash(type(myfunc))=='ac75372cfc' \n",
|
||||||
|
"assert ads_hash(myfunc(3))=='6b51d431df'\n",
|
||||||
|
"assert ads_hash(myfunc(5))=='f5ca38f748'"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# What have we learned here?\n",
|
||||||
|
"\n",
|
||||||
|
"HTTP is the \"language\" (protocol, to be technically correct) that your web browser uses to get webpages and images on the internet. More than that, it is also a useful language for computer programs to talk to each other on the internet.\n",
|
||||||
|
"\n",
|
||||||
|
"(HTTPS is just a secure version of HTTP. This means it is encrypted and there is a cryptographic \"chain of trust\" to the owner of the domain name.)\n",
|
||||||
|
"\n",
|
||||||
|
"GET requests typically just get a certain resource.\n",
|
||||||
|
"\n",
|
||||||
|
"POST requests often send more data to the server. The server then does something with this data.\n",
|
||||||
|
"\n",
|
||||||
|
"There are other HTTP request types, but GET and POST are the dominant ones.\n",
|
||||||
|
"\n",
|
||||||
|
"There are zillions of data sources on the internet that you can access with HTTP."
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3 (ipykernel)",
|
||||||
|
"language": "python",
|
||||||
|
"name": "python3"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"file_extension": ".py",
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"name": "python",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"version": "3.11.7"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
|
"nbformat_minor": 4
|
||||||
|
}
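The `starwars_url()` helper in the notebook above relies on `urllib.parse.urljoin`, whose result depends on slashes in both arguments. The sketch below, which makes no network calls, shows three cases using the same `SWAPI_BASE` value as the notebook; the variable names are chosen only for illustration.

```python
import urllib.parse

SWAPI_BASE = "https://swapi.py4e.com/api/"  # same value as in the notebook

# Base ends with a slash and the path is relative: the path is appended.
with_slash = urllib.parse.urljoin(SWAPI_BASE, "people/10/")

# Base lacks the trailing slash: the last segment ("api") is replaced.
without_slash = urllib.parse.urljoin("https://swapi.py4e.com/api", "people/10/")

# Path starts with a slash: it is taken from the site root, dropping "/api/".
leading_slash = urllib.parse.urljoin(SWAPI_BASE, "/people/10/")

print(with_slash)
print(without_slash)
print(leading_slash)
```

Only the first form keeps the `/api/` prefix, which is presumably why `SWAPI_BASE` is defined with a trailing slash and why the docstring example passes a path without a leading one.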
|
315
exercises/release/exercise-11/3__bioinformatics_with_HTTP.ipynb
Normal file
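The notebook in this file builds its NCBI efetch URL with `%` string formatting. As a hedged alternative sketch, the same query string can be assembled with `urllib.parse.urlencode`, which also escapes any characters that are not URL-safe. The function name `efetch_fasta_url` and the constant `EFETCH_BASE` are hypothetical; the endpoint and parameters are the ones the notebook uses, and nothing is fetched here.

```python
import urllib.parse

# Endpoint taken from the notebook; the constant name is an assumption.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession):
    """Build (but do not fetch) an efetch URL for a protein FASTA record."""
    params = {
        "db": "protein",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH_BASE + "?" + urllib.parse.urlencode(params)


url = efetch_fasta_url("NP_524481.2")
print(url)
```

Passing the resulting URL to `requests.get(url).text` would then behave like the notebook's `get_protein_fasta()`; `requests.get(EFETCH_BASE, params=params)` is another way to let the library do the encoding.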
|
@ -0,0 +1,315 @@
|
||||||
|
{
|
||||||
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Import biopython\n",
|
||||||
|
"import Bio\n",
|
||||||
|
"\n",
|
||||||
|
"import requests"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# You must run this cell, but you can ignore its contents.\n",
|
||||||
|
"\n",
|
||||||
|
"import hashlib\n",
|
||||||
|
"\n",
|
||||||
|
"def ads_hash(ty):\n",
|
||||||
|
" \"\"\"Return a unique string for input\"\"\"\n",
|
||||||
|
" ty_str = str(ty).encode()\n",
|
||||||
|
" m = hashlib.sha256()\n",
|
||||||
|
" m.update(ty_str)\n",
|
||||||
|
" return m.hexdigest()[:10]"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# Bioinformatics with HTTP\n",
|
||||||
|
"\n",
|
||||||
|
"Not only can we use the Star Wars API with HTTP, we can also access the NCBI's databases over HTTP. Here is [more information from the NCBI](https://www.ncbi.nlm.nih.gov/books/NBK25499/). Note that in this exercise, we will be doing low-volume queries without using a specialized software library. Libraries and other software are available to do this for you automatically. For example, below we use the `NCBIWWW` module from Biopython. Here we do it \"the hard way\" at a low level.\n",
|
||||||
|
"\n",
|
||||||
|
"If you start using the NCBI web resources extensively, please read the NCBI's documentation about [providing them with an email address to contact you](https://www.ncbi.nlm.nih.gov/books/NBK25497/).\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"def get_protein_fasta(accession):\n",
|
||||||
|
"    url = \"https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=%s&rettype=fasta&retmode=text\"%(accession,)\n",
"    return requests.get(url).text"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"da1 = get_protein_fasta('NP_524481.2')\n",
|
||||||
|
"da1"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Great, so we can get FASTA files directly from the NCBI using the accession.\n",
|
||||||
|
"\n",
|
||||||
|
"### Q1 Get the FASTA for accession `NP_733001.1`. Put the result in the variable `da2`, which should be a string."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {
|
||||||
|
"deletable": false,
|
||||||
|
"nbgrader": {
|
||||||
|
"cell_type": "code",
|
||||||
|
"checksum": "8f464e8eca2eef0890d54178a411489c",
|
||||||
|
"grade": false,
|
||||||
|
"grade_id": "cell-570ab47725befb18",
|
||||||
|
"locked": false,
|
||||||
|
"schema_version": 3,
|
||||||
|
"solution": true,
|
||||||
|
"task": false
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# YOUR CODE HERE\n",
|
||||||
|
"raise NotImplementedError()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {
|
||||||
|
"deletable": false,
|
||||||
|
"editable": false,
|
||||||
|
"nbgrader": {
|
||||||
|
"cell_type": "code",
|
||||||
|
"checksum": "4113595f17df588523d2cc0283756060",
|
||||||
|
"grade": true,
|
||||||
|
"grade_id": "cell-5c1c1943259aebbb",
|
||||||
|
"locked": true,
|
||||||
|
"points": 1,
|
||||||
|
"schema_version": 3,
|
||||||
|
"solution": false,
|
||||||
|
"task": false
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# This checks that the above worked\n",
|
||||||
|
"assert ads_hash(da2)=='16538bd802'"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# Using the biopython library for bioinformatics, including NCBI queries"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"import Bio\n",
|
||||||
|
"from Bio.Blast import NCBIWWW\n",
|
||||||
|
"from Bio.Blast import NCBIXML\n",
|
||||||
|
"from Bio import SeqIO\n",
|
||||||
|
"from io import StringIO\n",
|
||||||
|
"import os"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"We can work with FASTA sequences using the biopython library. A FASTA file can contain multiple sequences, so `SeqIO.parse` returns an iterator over the records and we loop over them.\n",
|
||||||
|
"\n",
|
||||||
|
"Each record here is a `SeqRecord`; its `.seq` attribute is an instance of the [Seq class](https://biopython.org/wiki/Seq).\n",
|
||||||
|
"\n",
|
||||||
|
"Let's copy the sequence to a raw python string called `da2_seq`:"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"da2_seq = None\n",
|
||||||
|
"for record in SeqIO.parse(StringIO(da2), \"fasta\"):\n",
|
||||||
|
"    print(record)\n",
"    assert da2_seq is None\n",
"    da2_seq = str(record.seq)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"da2_seq"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"In addition to making \"raw\" HTTP requests with the `requests` library, we can let biopython call the NCBI for us. It still uses HTTP to perform the call, but this is hidden from you. Below, we do a BLAST search based on the sequence we just downloaded."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"We can limit our search to just a few organisms using the NCBI taxon ID. The easiest way to find these is to start typing in the BLAST web search entry page and copy the taxon ID from there.\n",
|
||||||
|
"\n",
|
||||||
|
"Here are a few taxon IDs for some insects and then some code to limit our NCBI query just to these taxa."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Bombus terrestris 30195\n",
|
||||||
|
"# Apis mellifera 7460\n",
|
||||||
|
"# Locusta migratoria 7004\n",
|
||||||
|
"# Drosophila melanogaster 7227\n",
|
||||||
|
"# Tribolium castaneum 7070\n",
|
||||||
|
"taxids = (30195, 7460, 7004, 7227, 7070)\n",
|
||||||
|
"taxid_query = ' OR '.join(['txid%d[ORGN]'%taxid for taxid in taxids])\n",
|
||||||
|
"taxid_query"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Now with our query limited to these specific groups, we are going to run a BLAST search. As with the web browser interface, this can take some time, so the code below is written to only run the web search when the output file is not present. Therefore, once you run the web search the first time, it will not run again unless you delete the file.\n",
|
||||||
|
"\n",
|
||||||
|
"Furthermore, as [mentioned in the biopython tutorial](http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc92), we need to be careful with our result handle when we get it because it can be read only once. So, here we save the results of our search to a local file. Later, we can read this file as often as we want.\n",
|
||||||
|
"\n",
|
||||||
|
"**This may take some time as we are running a full BLAST search on the NCBI servers.**"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"fname = \"da2_blast.xml\"\n",
|
||||||
|
"if not os.path.exists(fname):\n",
|
||||||
|
"    result_handle = NCBIWWW.qblast(\"blastp\", \"nr\", da2_seq, entrez_query=taxid_query)\n",
"    with open(fname, \"w\") as out_handle:\n",
"        out_handle.write(result_handle.read())\n",
"else:\n",
"    print(\"not overwriting file %s\"%fname)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Read the saved BLAST results, closing the file when done\n",
"with open(fname) as result_file:\n",
"    blast_record = NCBIXML.read(result_file)\n",
"for alignment in blast_record.alignments:\n",
"    print(alignment)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Let's do another blast search for the first protein we had. Again, this can take a long period of time to run on the NCBI servers."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"da1_seq = None\n",
|
||||||
|
"for record in SeqIO.parse(StringIO(da1), \"fasta\"):\n",
|
||||||
|
"    da1_seq = str(record.seq)\n",
"\n",
"fname = \"da1_blast.xml\"\n",
"if not os.path.exists(fname):\n",
"    result_handle = NCBIWWW.qblast(\"blastp\", \"nr\", da1_seq, entrez_query=taxid_query)\n",
"    with open(fname, \"w\") as out_handle:\n",
"        out_handle.write(result_handle.read())\n",
"else:\n",
"    print(\"not overwriting file %s\"%fname)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"In the results, each alignment contains a sequence of HSPs (\"high-scoring segment pairs\").\n",
|
||||||
|
"\n",
|
||||||
|
"https://www.ncbi.nlm.nih.gov/books/NBK62051/"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Read the saved BLAST results, closing the file when done\n",
"with open(fname) as result_file:\n",
"    blast_record = NCBIXML.read(result_file)\n",
"for alignment in blast_record.alignments:\n",
"    print(alignment)\n",
"    print(\"%d HSPs\"%len(alignment.hsps))\n",
"    for hsp in alignment.hsps:\n",
"        print(hsp)\n",
"    print()"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3 (ipykernel)",
|
||||||
|
"language": "python",
|
||||||
|
"name": "python3"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"file_extension": ".py",
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"name": "python",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"version": "3.11.7"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
|
"nbformat_minor": 4
|
||||||
|
}
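The `get_protein_fasta` helper in the notebook builds its efetch URL with `%` string formatting. As a minimal sketch of an alternative, the query string can be assembled from a dictionary so that every value is URL-encoded. The function name `build_efetch_url` is hypothetical, and the `email` and `tool` arguments are an assumption based on the NCBI documentation linked in the notebook about identifying yourself; they are not part of the original exercise.

```python
from urllib.parse import urlencode

# Base address of the efetch endpoint used in the notebook.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, email=None, tool=None):
    """Return an efetch URL for a protein FASTA record (hypothetical helper)."""
    params = {
        "db": "protein",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
    }
    # Optional identification parameters (assumed from the NCBI usage guidelines).
    if email is not None:
        params["email"] = email
    if tool is not None:
        params["tool"] = tool
    # urlencode percent-escapes each value, e.g. the "@" in an email address.
    return EFETCH_URL + "?" + urlencode(params)


url = build_efetch_url("NP_524481.2", email="student@example.org", tool="exercise-11")
print(url)
```

The resulting string can be passed to `requests.get(url)` exactly as in `get_protein_fasta`. Without the optional arguments, the helper produces the same URL as the notebook's version.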
|