Most popular DB connections using Python!!

Raj Mishra
4 min read · Apr 4, 2021


Many companies are already in the process of outgrowing the database they started with. Eventually, you will need to manage a variety of databases, each with its own features, specialized syntax, and connection protocols.

We all know that, based on requirements, we often need to revamp our DB for different use cases. In today’s blog, we will examine connections to multiple databases and services — Oracle, Teradata, Postgres, MSSQL, MongoDB, Amazon Redshift, Snowflake, AWS S3 and GCP — using Python.

Oracle:
Establishing connections to multiple databases is not a trivial task because each database product implements its own connection parameters. For instance, some databases require a default database, whereas others do not.

This is the easiest approach, using the cx_Oracle package.
import cx_Oracle
import pandas as pd

conn = cx_Oracle.connect('username/pwd@host:port/service_name')
try:
    query = '''
    SELECT * FROM dual
    '''
    df = pd.read_sql(con=conn, sql=query)
finally:
    conn.close()
df.head()

We can also go through SQLAlchemy and read a whole table:
from sqlalchemy import create_engine

engine = create_engine('oracle://scott:tiger@127.0.0.1:1521/abc')
df = pd.read_sql("Customers", engine)
Is it enough? I don’t think so; there are other useful methods we can use based on requirements.

We can also pull Oracle data straight into a pandas dataframe: →
df = pd.read_sql_query(query, engine, chunksize=chunk_size)
df = pd.read_sql_table(table_name, engine)
df = pd.read_sql(query_or_table_name, engine)
When we have huge data, we can use the chunksize parameter of pandas read_sql_query to read the data in chunks.
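For instance, a minimal sketch of chunked reading using the engine from above; the table name and chunk size are placeholder values:
# Read a large table in chunks instead of loading everything into memory at once.
total_rows = 0
for chunk in pd.read_sql_query("SELECT * FROM my_large_table", engine, chunksize=50000):
    total_rows += len(chunk)  # each chunk is a regular pandas DataFrame
print(total_rows)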

Teradata:
Establish a connection to Teradata using the teradatasql connector and pass all the connection details to the teradatasql.connect method.
import teradatasql
import pandas as pd

with teradatasql.connect(host='host', user='username', password='password') as connect:
    data = pd.read_sql('select top 5 * from table_name;', connect)

Postgres:
We can use the SQLAlchemy Python module to connect to Postgres.
import sqlalchemy
db = sqlalchemy.create_engine('postgresql:///tutorial.db')  # local connection over the default socket

from sqlalchemy import create_engine, text

engine = create_engine('postgresql+psycopg2://user:password@hostname/database_name')
conn = engine.connect()
conn.execute(text("SELECT client_addr FROM pg_stat_activity WHERE pid = pg_backend_pid()")).fetchall()
“When you have multiple options it’s always good”
We can also connect to Oracle using this method by modifying the engine parameter with Oracle connection details, as sketched below.
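A minimal sketch of what that could look like; the host, service name and credentials are placeholders, and cx_Oracle must be installed for SQLAlchemy to use this dialect:
from sqlalchemy import create_engine
import pandas as pd

# placeholder credentials, host and service name
engine = create_engine('oracle+cx_oracle://scott:tiger@host:1521/?service_name=my_service')
df = pd.read_sql("SELECT * FROM dual", engine)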

MSSQL:
A little tweak to the engine parameter and it’s all done.
# the driver query parameter should match the ODBC driver installed on your machine
engine = create_engine('mssql+pyodbc://server_name/database_name?driver=ODBC+Driver+17+for+SQL+Server')

Note: We can connect to any other database, such as MSSQL, just by modifying the engine parameter, which is a combination of dialect and driver values in the create_engine method: create_engine('dialect+driver://username:password@host:port/database').
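For instance, a couple of hedged examples of that same pattern; credentials, hosts and database names are placeholders, and each dialect needs its driver package installed separately:
from sqlalchemy import create_engine

# MySQL through the pymysql driver (pip install pymysql)
engine = create_engine('mysql+pymysql://user:password@localhost:3306/mydb')

# SQLite needs no server at all; the "database" is just a file path
engine = create_engine('sqlite:///local_file.db')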

MongoDB:
Establish a connection to MongoDB using the pymongo package.
from pymongo import MongoClient

client = MongoClient()                              # defaults to localhost:27017
client = MongoClient('host', port_number)           # explicit host and port, e.g. MongoClient('localhost', 27017)
client = MongoClient("mongodb://localhost:27017/")  # or a full connection URI
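Once the client is connected, a minimal sketch of writing and reading documents; the database, collection and field names below are made up for illustration:
db = client['mydb']                 # databases and collections are created lazily
collection = db['employees']
collection.insert_one({'id': 1, 'name': 'Raj'})
for doc in collection.find({'id': 1}):
    print(doc)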

Amazon Redshift:
Amazon Redshift is based on PostgreSQL so most SQL client applications will work with only minimal changes.
Similar to the Postgres connection, modify the engine parameter and pass the Redshift connection details.
import sqlalchemy as sa
from sqlalchemy.orm import sessionmaker

DATABASE = "dwtest"
USER = "<youruser>"
PASSWORD = "<yourpassword>"
HOST = "dwtest.awsexample.com"
PORT = "5439"
SCHEMA = "public"

# the redshift+psycopg2 dialect is provided by the sqlalchemy-redshift package
connection_string = "redshift+psycopg2://%s:%s@%s:%s/%s" % (USER, PASSWORD, HOST, PORT, DATABASE)
engine = sa.create_engine(connection_string)
session = sessionmaker()
session.configure(bind=engine)
s = session()
SetPath = "SET search_path TO %s" % SCHEMA
s.execute(SetPath)
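With the session bound to the engine above, we can run queries as usual; a small sketch where the table name is a placeholder (on SQLAlchemy 2.x the raw query strings would need to be wrapped in sqlalchemy.text()):
result = s.execute("SELECT COUNT(*) FROM my_table")
print(result.fetchone())

# or pull the result straight into pandas through the same engine
import pandas as pd
df = pd.read_sql("SELECT * FROM my_table LIMIT 10", engine)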

Snowflake:
Connection Details for Snowflake:
SF_ACCOUNT = '<snowflake_account_name>'
SF_USER = '<snowflake_username>'
SF_WAREHOUSE = '<warehouse_name>'
SF_DATABASE = '<database_name>'
SF_SCHEMA = '<schema_name>'
SF_PASSWORD = '<snowflake_password>'
We can also store the above connection details in environment variables or a config file, as per the requirement; see the sketch below.
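For example, a minimal sketch of pulling the credentials from environment variables instead of hard-coding them (the variable names here are our own choice):
import os

SF_ACCOUNT = os.environ['SF_ACCOUNT']
SF_USER = os.environ['SF_USER']
SF_PASSWORD = os.environ['SF_PASSWORD']
SF_WAREHOUSE = os.environ.get('SF_WAREHOUSE')  # optional parameters can fall back to None
SF_DATABASE = os.environ.get('SF_DATABASE')
SF_SCHEMA = os.environ.get('SF_SCHEMA')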

Establish a connection to Snowflake using the snowflake-connector-python package. If it is not installed, use: → pip install snowflake-connector-python

We will create a Snowflake connection instance using the code below:
import snowflake.connector

connection = snowflake.connector.connect(
    account=SF_ACCOUNT,
    user=SF_USER,
    password=SF_PASSWORD,
    warehouse=SF_WAREHOUSE,
    database=SF_DATABASE,
    schema=SF_SCHEMA)
Note: user, password and account are mandatory parameters. The other parameters are optional and Snowflake uses default values if not specified.

Creating a Table in Snowflake: →
We are creating an emp_test table in Snowflake which will hold the id and name of the employees.
cs = connection.cursor()
try:
    cs.execute(
        "CREATE OR REPLACE TABLE "
        "emp_test(id integer, name string)")
except Exception as e:
    raise e
finally:
    cs.close()
    connection.close()
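Once the table exists, inserting and reading rows goes through the same cursor API; a small sketch that assumes the connection is still open (unlike the snippet above, which closes it at the end):
cs = connection.cursor()
try:
    cs.execute("INSERT INTO emp_test(id, name) VALUES (%s, %s)", (1, 'Raj'))
    cs.execute("SELECT id, name FROM emp_test")
    for (emp_id, emp_name) in cs:
        print(emp_id, emp_name)
finally:
    cs.close()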

AWS:
Prerequisites for AWS: we need an access key and secret key, plus the boto3 and pandas Python packages.
ACCESS KEY: “Your access key”
SECRET KEY: “Your secret key”
Please refer to this link for more details: https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html
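As an alternative to hard-coding keys, boto3 can also pick up credentials from its standard chain (environment variables, ~/.aws/credentials, IAM roles); a minimal sketch using a named profile, where the profile name is a placeholder:
import boto3

session = boto3.Session(profile_name='my-profile')
client = session.client('s3')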

Establishing a connection to AWS using the low-level functional client: →
import boto3

client = boto3.client(
    's3',
    aws_access_key_id = '<Your access key>',
    aws_secret_access_key = '<Your secret key>',
    region_name = '<Your region name>'
)

Establishing a connection to AWS using the high-level object-oriented interface: →
resource = boto3.resource(
    's3',
    aws_access_key_id = '<Your access key>',
    aws_secret_access_key = '<Your secret key>',
    region_name = '<Your region name>'
)

Let’s fetch the list of existing buckets: →
clientResponse = client.list_buckets()

Extracting bucket names one by one: →
print('Printing bucket names...')
for bucket in clientResponse['Buckets']:
    print(f'Bucket Name: {bucket["Name"]}')

Creating a bucket in AWS S3: →
location = {'LocationConstraint': '<Your region name>'}
client.create_bucket(
    Bucket='abc-demo-3',
    CreateBucketConfiguration=location
)

To fetch an object from S3 we need to pass the bucket name and key: →
obj = client.get_object(
    Bucket = 'abc-demo-1',
    Key = 'abc-demo.csv'
)

Read the data from S3 into a pandas dataframe: →
import pandas

data = pandas.read_csv(obj['Body'])
print(data)
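Going the other way, a hedged sketch of writing a dataframe back to S3 with the same client; the bucket and key names are placeholders:
csv_body = data.to_csv(index=False)
client.put_object(
    Bucket='abc-demo-1',
    Key='abc-demo-out.csv',
    Body=csv_body.encode('utf-8')
)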

For understanding pandas read_csv, please refer to this link: https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html

GCP:
To establish a connection to GCP we can use the google.cloud and gcloud Python packages.
Creating “mybucket” using gcloud: →
from gcloud import storage

client = storage.Client()
bucket = client.bucket('mybucket')
bucket.location = 'eu'
bucket.create()

Uploading a file to a GCP bucket: →
from google.cloud import storage

def upload_to_bucket(blob_name, path_to_file, bucket_name):
    """Upload data to a bucket."""
    # Explicitly use service account credentials by specifying the private key
    storage_client = storage.Client.from_service_account_json('creds.json')
    # buckets = list(storage_client.list_buckets())
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(blob_name)
    blob.upload_from_filename(path_to_file)
    # returns the public url
    return blob.public_url
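A quick usage example of the helper above; all three arguments are placeholder values:
url = upload_to_bucket('reports/abc-demo.csv', '/tmp/abc-demo.csv', 'mybucket')
print(url)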

Please use this link for generating a credential file: https://cloud.google.com/storage/docs/reference/libraries?authuser=1#client-libraries-install-python

This is a simple, basic approach to establishing a connection to a database and appending values to its tables. Depending on project requirements and further goals, we can choose the database accordingly.

Hope this helps!!


