Manmohan Mishra

Thursday, December 31, 2020

Happy New Year - 2021

Tuesday, November 20, 2018

Neuromorphic Vs Neural Network Vs Quantum Computing

Hi,

We often talk about parallel computing, or advanced computing beyond the classical way of computing. On a single classical processor core, nothing truly runs in parallel: everything that looks parallel shares the same CPU cycles in one way or another.

What is Neuromorphic Computing?
Computer hardware designed with inspiration from the human brain, so that it processes information in a similar fashion: it works in parallel and is more efficient for complex computations.

What are Neural Networks?
Software that simulates the processing style of our brain to perform complex computations. It can run on classical computers as well as on neuromorphic computers.
Many of the algorithms we use in Machine Learning/Deep Learning are based on neural networks.
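To make the idea concrete, here is a minimal sketch in Python (it assumes NumPy is installed) of a tiny two-layer neural network that computes XOR. Real networks learn their weights from data. Here the weights are picked by hand purely for illustration, and the function names are hypothetical.

```python
import numpy as np


def step(z):
    # Threshold activation: a "neuron" fires (1) when its weighted input is positive
    return (z > 0).astype(int)


# Hidden layer (hand-picked weights): neuron 1 behaves like OR, neuron 2 like AND
W_hidden = np.array([[1.0, 1.0],
                     [1.0, 1.0]])
b_hidden = np.array([-0.5, -1.5])

# Output neuron: fires when OR is true but AND is not, which is XOR
w_out = np.array([1.0, -1.0])
b_out = -0.5


def xor_network(x):
    hidden = step(W_hidden @ x + b_hidden)
    return int(step(np.array([w_out @ hidden + b_out]))[0])


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(np.array([a, b])))
```

A single neuron cannot compute XOR. The hidden layer is what makes it possible, and that layered structure is the core idea behind deep learning.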

What is Quantum Computing?
Quantum computing is fundamentally different from classical computing. In classical computing, a bit is either 0 or 1 at any given point in time. In quantum computing, a qubit can be in a superposition of 0 and 1 at the same time, until it is measured.
Quantum computing also builds on quantum entanglement, where the states of two or more qubits are linked so that measuring one tells you about the others.
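A small state-vector sketch in Python (it assumes NumPy) shows both ideas. First, a Hadamard gate applied to |0⟩ gives an equal superposition. Second, a Hadamard followed by a CNOT on two qubits gives an entangled Bell state. This is a classical simulation for illustration only, not a quantum program.

```python
import numpy as np

ket0 = np.array([1.0, 0.0])                      # |0>
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)     # Hadamard gate
I = np.eye(2)
# CNOT with the first qubit as control; basis order |00>, |01>, |10>, |11>
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

# Superposition: H|0> = (|0> + |1>) / sqrt(2)
plus = H @ ket0
print("P(0), P(1):", np.abs(plus) ** 2)          # both about 0.5

# Entanglement: CNOT (H x I) |00> = (|00> + |11>) / sqrt(2)
bell = CNOT @ np.kron(H, I) @ np.kron(ket0, ket0)
print("P(00), P(01), P(10), P(11):", np.abs(bell) ** 2)
```

Only the outcomes 00 and 11 have non-zero probability. The two qubits are always measured with the same value, even though each one on its own is a 50/50 coin.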

Thursday, January 11, 2018

POC with Blockchain

Hi Folks,

Based on my initial hands-on work with Blockchain technology, I am very optimistic about the overall architecture and the concepts behind this technology.

During my POC, I was able to achieve the following:

I managed to create a sample model of a digital bank using Ethereum, the open-source, blockchain-based distributed computing platform.

To achieve this, I created the genesis block and the rule engine/configuration for the blockchain.

I implemented functionality such as creating a new account (like a wallet), checking the number of blocks in the overall blockchain, checking an account's balance, and a miner for this new blockchain-based currency.
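The POC code is not shown here, so what follows is only a toy Python sketch of the underlying ideas. It is not Ethereum's implementation. It covers a genesis block, hash-linked blocks, a simple proof-of-work "miner", a block count, and balance checks. The class name, mining reward, and difficulty are hypothetical values chosen for illustration.

```python
import hashlib
import json

DIFFICULTY = 3       # hypothetical: number of leading zeros a mined block hash needs
MINING_REWARD = 10   # hypothetical: coins credited to the miner for each block


def block_hash(block):
    # Hash the block contents deterministically (sorted keys)
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()


class ToyChain:
    def __init__(self):
        # The genesis block is the first block and has no predecessor
        genesis = {"index": 0, "prev_hash": "0" * 64, "transactions": [], "nonce": 0}
        self.blocks = [genesis]
        self.pending = []

    def send(self, sender, recipient, amount):
        if self.balance(sender) < amount:
            raise ValueError("insufficient balance")
        self.pending.append({"from": sender, "to": recipient, "amount": amount})

    def mine(self, miner):
        # Include pending transactions plus the miner's reward, then search for a valid nonce
        txs = self.pending + [{"from": None, "to": miner, "amount": MINING_REWARD}]
        block = {"index": len(self.blocks), "prev_hash": block_hash(self.blocks[-1]),
                 "transactions": txs, "nonce": 0}
        while not block_hash(block).startswith("0" * DIFFICULTY):
            block["nonce"] += 1
        self.blocks.append(block)
        self.pending = []
        return block

    def block_count(self):
        return len(self.blocks)

    def balance(self, account):
        total = 0
        for block in self.blocks:
            for tx in block["transactions"]:
                if tx["to"] == account:
                    total += tx["amount"]
                if tx["from"] == account:
                    total -= tx["amount"]
        return total

    def is_valid(self):
        # Every block must point at the hash of the block before it
        return all(self.blocks[i]["prev_hash"] == block_hash(self.blocks[i - 1])
                   for i in range(1, len(self.blocks)))


chain = ToyChain()
chain.mine("alice")
chain.send("alice", "bob", 4)
chain.mine("alice")
print(chain.block_count(), chain.balance("alice"), chain.balance("bob"), chain.is_valid())
# 3 16 4 True
```

Changing any transaction in an earlier block changes that block's hash. That breaks the `prev_hash` link of the next block, which is why tampering is easy to detect.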

In addition, I tried the open-source EBE (Ether Block Explorer) and created a smart contract for the POC. Trust me, with such huge open-source support, it is much easier to implement all of this.

I'm still working on several other POC models for this technology; let's see how it goes.

I will be posting sample code to make this journey more interactive.

To be continued...

Sunday, December 17, 2017

Quantum Programming - Hello World

Hey There,

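A sample "hello world" program for quantum programming, written in Q# using the syntax of the 2017 Microsoft Quantum Development Kit preview. It uses quantum teleportation to send one classical bit from the "msg" qubit to the "there" qubit through an entangled pair. Newer Q# releases use different syntax (for example, Unit instead of ()).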

operation Teleport(msg : Qubit, there : Qubit) : () {
    body {
        using (register = Qubit[1]) {
            // Ask for an auxiliary qubit that we can use to prepare
            // for teleportation.
            let here = register[0];

            // Create some entanglement that we can use to send our message.
            H(here);
            CNOT(here, there);

            // Move our message into the entangled pair.
            CNOT(msg, here);
            H(msg);

            // Measure out the entanglement.
            if (M(msg) == One) { Z(there); }
            if (M(here) == One) { X(there); }

            // Reset our "here" qubit before releasing it.
            Reset(here);
        }
    }
}

operation TeleportClassicalMessage(message : Bool) : Bool {
    body {
        mutable measurement = false;

        using (register = Qubit[2]) {
            // Ask for some qubits that we can use to teleport.
            let msg = register[0];
            let there = register[1];

            // Encode the message we want to send.
            if (message) { X(msg); }

            // Use the operation we defined above.
            Teleport(msg, there);

            // Check what message was sent.
            if (M(there) == One) { set measurement = true; }

            // Reset all of the qubits that we used before releasing
            // them.
            ResetAll(register);
        }

        return measurement;
    }
}
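To see why the protocol works without a quantum machine, here is a rough state-vector simulation in Python (it assumes NumPy). It mirrors the Q# code step by step: encode the message, entangle "here" and "there", apply CNOT and H, measure, then apply the Z/X corrections. The helper names are hypothetical, and this is a classical illustration only.

```python
import numpy as np

N = 3  # qubits: 0 = msg, 1 = here, 2 = there
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])


def bit(index, qubit):
    # Value of `qubit` in basis state `index` (qubit 0 is the most significant bit)
    return (index >> (N - 1 - qubit)) & 1


def apply_gate(state, gate, target):
    # Tensor the gate on `target` with identities on the other qubits
    full = np.array([[1.0]])
    for q in range(N):
        full = np.kron(full, gate if q == target else np.eye(2))
    return full @ state


def apply_cnot(state, control, target):
    # Swap amplitudes of basis states that differ only in `target`, when control is 1
    new = state.copy()
    for i in range(len(state)):
        if bit(i, control) == 1:
            new[i ^ (1 << (N - 1 - target))] = state[i]
    return new


def measure(state, qubit, rng):
    # Sample an outcome, then collapse and renormalize the state
    p1 = sum(abs(state[i]) ** 2 for i in range(len(state)) if bit(i, qubit) == 1)
    outcome = int(rng.random() < p1)
    mask = np.array([bit(i, qubit) == outcome for i in range(len(state))])
    collapsed = np.where(mask, state, 0)
    return outcome, collapsed / np.linalg.norm(collapsed)


def teleport_classical_message(message, rng):
    state = np.zeros(2 ** N)
    state[0] = 1.0                          # |000>
    if message:
        state = apply_gate(state, X, 0)     # encode the bit on msg
    state = apply_gate(state, H, 1)         # H(here)
    state = apply_cnot(state, 1, 2)         # CNOT(here, there)
    state = apply_cnot(state, 0, 1)         # CNOT(msg, here)
    state = apply_gate(state, H, 0)         # H(msg)
    m_msg, state = measure(state, 0, rng)
    if m_msg == 1:
        state = apply_gate(state, Z, 2)     # Z(there)
    m_here, state = measure(state, 1, rng)
    if m_here == 1:
        state = apply_gate(state, X, 2)     # X(there)
    result, _ = measure(state, 2, rng)
    return bool(result)


rng = np.random.default_rng(7)
print([teleport_classical_message(m, rng) for m in (True, False, True)])
# [True, False, True]
```

The two intermediate measurements are random, but the corrections always restore the message on "there". That is why the printed list matches the input for every random seed.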

Monday, October 23, 2017

Setting up AWS - VPC Instance, Subnet, Security Group

Welcome,

You came here because you want to automate your AWS VPC/VPN setup.
Please find the code snippet below for your reference. Make sure you have Puppet installed, along with the puppetlabs-aws module, which provides these ec2_* and elb_* resource types.

The code below should help you with:
1. Setting up a VPC in AWS
2. Setting up a VPN gateway (plus a customer gateway and the VPN connection)
3. Setting up a subnet
4. Setting up a security group
5. Setting up an instance
6. Setting up a load balancer

# Setting up VPC using Puppet:
ec2_vpc { 'name-of-vpc':
  ensure     => present,
  region     => 'us-east-1',
  cidr_block => '10.0.0.0/24',
  tags       => {
    tag_name => 'value',
  },
}

# Set up a VPN gateway:

ec2_vpc_vpn_gateway { 'sample2-vgw':
  ensure => present,
  region => 'us-east-1',
  vpc    => 'name-of-vpc',  # must match the name of the ec2_vpc resource
  type   => 'ipsec.1',
}

# Set up a customer gateway:

ec2_vpc_customer_gateway { 'sample2-cgw':
  ensure     => present,
  region     => 'us-east-1',
  ip_address => '177.255.196.143',  # public IP of your on-premises VPN device
  bgp_asn    => 65000,
  type       => 'ipsec.1',
}

# Set up the VPC VPN connection:

ec2_vpc_vpn { 'sample2-vpn':
  ensure           => present,
  region           => 'us-east-1',
  vpn_gateway      => 'sample2-vgw',
  customer_gateway => 'sample2-cgw',
  type             => 'ipsec.1',
  routes           => ['0.0.0.0/0'],
  static_routes    => true,
}

# Set up a subnet:
ec2_vpc_subnet { 'name-of-subnet':
  ensure                  => present,
  region                  => 'us-east-1',
  cidr_block              => '10.0.0.0/24',
  availability_zone       => 'us-east-1a',
  map_public_ip_on_launch => true,
  vpc                     => 'name-of-vpc',
  tags                    => {
    tag_name => 'value',
  },
}

# Set up a security group:
ec2_securitygroup { 'name-of-security-group':
  ensure      => present,
  region      => 'us-east-1',
  vpc         => 'name-of-vpc',
  description => 'a description of the group',
  ingress     => [{
    protocol => 'tcp',
    port     => 22,
    cidr     => '0.0.0.0/0',  # opens SSH to any address; restrict this in production
  }],
  tags        => {
    tag_name => 'value',
  },
}

# Set up an instance:
ec2_instance { 'name-of-instance':
  ensure            => running,
  region            => 'us-east-1',
  availability_zone => 'us-east-1a',
  image_id          => 'ami-123456', # you need to select your own AMI
  instance_type     => 't2.micro',
  key_name          => 'name-of-existing-key',
  subnet            => 'name-of-subnet',
  security_groups   => ['name-of-security-group'],
  tags              => {
    tag_name => 'value',
  },
}

# Set up a load balancer:
elb_loadbalancer { 'name-of-load-balancer':
  ensure             => present,
  region             => 'us-east-1',
  availability_zones => ['us-east-1a', 'us-east-1b'],
  instances          => ['name-of-instance', 'another-instance'],
  security_groups    => ['name-of-security-group'],
  listeners          => [
    {
      protocol           => 'HTTP',
      load_balancer_port => 80,
      instance_protocol  => 'HTTP',
      instance_port      => 80,
    },
    {
      protocol           => 'HTTPS',
      load_balancer_port => 443,
      instance_protocol  => 'HTTPS',
      instance_port      => 8080,
      ssl_certificate_id => 'arn:aws:iam::123456789000:server-certificate/yourcert.com',
      policies           => [
        {
          'policy_type'       => 'SSLNegotiationPolicyType',
          'policy_attributes' => {
            'Protocol-TLSv1.1' => false,
            'Protocol-TLSv1.2' => true,
          },
        },
      ],
    },
  ],
  health_check       => {
    'healthy_threshold'   => '10',
    'interval'            => '30',
    'target'              => 'HTTP:80/health_check',
    'timeout'             => '5',
    'unhealthy_threshold' => '2',
  },
  tags               => {
    tag_name => 'value',
  },
}
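Before applying the manifest, it can help to sanity-check the CIDR plan. Here is a small Python sketch that uses the standard ipaddress module. It checks that a subnet fits inside the VPC block and counts the usable addresses, based on AWS's rule of reserving five IP addresses in every subnet (the first four and the last one). The function name is hypothetical.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # AWS reserves the first four addresses and the last one


def check_subnet(vpc_cidr, subnet_cidr):
    vpc = ipaddress.ip_network(vpc_cidr)
    subnet = ipaddress.ip_network(subnet_cidr)
    if not subnet.subnet_of(vpc):
        raise ValueError(f"{subnet} is not inside VPC block {vpc}")
    # Usable host addresses once AWS takes its reserved ones
    return subnet.num_addresses - AWS_RESERVED_PER_SUBNET


# The values used in the manifest: both the VPC and the subnet are 10.0.0.0/24
print(check_subnet('10.0.0.0/24', '10.0.0.0/24'))  # 251
```

The sample VPC and subnet share the same /24, so this one subnet takes the whole VPC range. A second subnet, for example one in us-east-1b for the load balancer, would need a larger VPC block such as a /16.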

Let me know if you have any questions. Thanks!

Friday, August 4, 2017

K-Means Clustering vs Hierarchical Clustering [Machine Learning]

Straight to the point:

K-Means Clustering:

In K-Means clustering, we tell the machine how many clusters we want (the value K).
We start by picking K random cluster centres and assigning every data point to its nearest centre. We then compute new cluster centres as the mean of the points assigned to each one. Repeating this process, we reach a stage where the "new cluster centres" and the "current cluster centres" are the same. At that point the algorithm has converged, and these are our final cluster centres.
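The steps above can be sketched in Python with NumPy as follows. This is a minimal illustration: the function name, the sample data, and the convergence check are my own choices. It is not scikit-learn's KMeans, which adds smarter initialization (k-means++) and optional restarts (n_init).

```python
import numpy as np


def k_means(points, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Start with k randomly chosen data points as the initial centres
    centres = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iter):
        # 2. Assign each point to its nearest centre
        distances = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # 3. New centre = mean of its assigned points (keep the old centre if a cluster is empty)
        new_centres = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                                else centres[j] for j in range(k)])
        # 4. Stop when the new centres equal the current centres
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return centres, labels


data = np.array([[1, 2], [1.5, 1.8], [1, 0.6], [8, 8], [9, 11], [8.5, 9]], dtype=float)
centres, labels = k_means(data, k=2)
print(centres)
print(labels)
```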

Hierarchical Clustering:

In this type of machine learning clustering, we feed data to the machine and let the machine decide how many clusters to group that data into.
Using the mean shift algorithm, we start by treating every data point as a cluster centre and moving each centre to the mean of all nearby data points (within the radius you have selected). This process continues until the centres stop moving (convergence), and the points where they converge are our cluster centres.
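Here is a minimal mean shift sketch in Python with NumPy. It uses a flat kernel, so every point within `radius` counts equally. The radius value, the merge tolerance, and the function name are illustrative assumptions. scikit-learn's MeanShift class offers a production implementation with a `bandwidth` parameter.

```python
import numpy as np


def mean_shift(points, radius, max_iter=100, tol=1e-6):
    # Every data point starts as a candidate cluster centre
    centres = points.astype(float).copy()
    for _ in range(max_iter):
        new_centres = np.empty_like(centres)
        for i, c in enumerate(centres):
            # Shift each centre to the mean of the data points within `radius` of it
            in_window = points[np.linalg.norm(points - c, axis=1) <= radius]
            new_centres[i] = in_window.mean(axis=0)
        converged = np.allclose(new_centres, centres, atol=tol)
        centres = new_centres
        if converged:
            break
    # Centres that converged to (almost) the same spot belong to one cluster
    unique = []
    for c in centres:
        if not any(np.linalg.norm(c - u) < radius / 2 for u in unique):
            unique.append(c)
    return np.array(unique)


data = np.array([[1, 2], [1.5, 1.8], [1, 0.6], [8, 8], [9, 11], [8.5, 9]], dtype=float)
print(mean_shift(data, radius=4.0))
```

We never passed a cluster count. With this radius, the two well-separated groups collapse to two centres. A much smaller radius would give more clusters, and a much larger one would merge everything into one.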

Tuesday, July 25, 2017

Algorithm KNN = ScikitLearn vs Actual

Using scikit-learn:

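import numpy as np
import pandas as pd
from sklearn import neighbors
# cross_validation was removed in scikit-learn 0.20; train_test_split now lives in model_selection
from sklearn.model_selection import train_test_split

# Assumes the CSV has a header row that includes 'id' and 'class' columns
df = pd.read_csv('breast-cancer-wisconsin.data.txt')
df.replace('?', -99999, inplace=True)   # treat missing values as extreme outliers
df.drop(['id'], axis=1, inplace=True)   # the id column carries no predictive signal

X = np.array(df.drop(['class'], axis=1)).astype(float)
y = np.array(df['class'])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

clf = neighbors.KNeighborsClassifier()
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print(accuracy)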

Build your own model:

import random
import warnings
from collections import Counter

import numpy as np
import pandas as pd


def k_nearest_neighbors(data, predict, k=3):
    # data maps each class label to a list of feature vectors
    if len(data) >= k:
        warnings.warn('K is set to a value less than total voting groups!')
    distances = []
    for group in data:
        for features in data[group]:
            euclidean_distance = np.linalg.norm(np.array(features) - np.array(predict))
            distances.append([euclidean_distance, group])
    # Majority vote among the k closest training points
    votes = [i[1] for i in sorted(distances)[:k]]
    vote_result = Counter(votes).most_common(1)[0][0]
    return vote_result


df = pd.read_csv('breast-cancer-wisconsin.data.txt')
df.replace('?', -99999, inplace=True)
df.drop(['id'], axis=1, inplace=True)
full_data = df.astype(float).values.tolist()
random.shuffle(full_data)

test_size = 0.2
train_set = {2: [], 4: []}   # class labels: 2 = benign, 4 = malignant
test_set = {2: [], 4: []}
train_data = full_data[:-int(test_size * len(full_data))]
test_data = full_data[-int(test_size * len(full_data)):]

# The last column is the class label; the remaining columns are features
for i in train_data:
    train_set[i[-1]].append(i[:-1])
for i in test_data:
    test_set[i[-1]].append(i[:-1])

correct = 0
total = 0
for group in test_set:
    for data in test_set[group]:
        vote = k_nearest_neighbors(train_set, data, k=5)
        if group == vote:
            correct += 1
        total += 1

print('Accuracy:', correct / total)

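Points to know before you consider KNN:

1. What is your data volume? (It should not be in terabytes: KNN keeps the whole training set and searches it for every prediction.)

2. What should the value of K be? (It depends on your requirements. A higher K value doesn't mean you will get better accuracy; in fact, I observed the opposite.)

3. Can you parallelize your algorithm? (scikit-learn's KNeighborsClassifier can run its neighbour searches in parallel if you set n_jobs=-1.)

4. The difference between accuracy and confidence

5. Do you need to define a radius? (scikit-learn also provides RadiusNeighborsClassifier, which votes among all neighbours within a fixed radius instead of the K nearest.)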
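To illustrate point 4, here is a small variation of the k_nearest_neighbors function that also returns a confidence: the fraction of the K nearest neighbours that voted for the winning class. Accuracy, by contrast, is measured across the whole test set. The function name and the toy data are hypothetical.

```python
from collections import Counter

import numpy as np


def knn_with_confidence(data, predict, k=3):
    # data maps each class label to a list of feature vectors
    distances = []
    for group, rows in data.items():
        for features in rows:
            distances.append((np.linalg.norm(np.array(features) - np.array(predict)), group))
    votes = [group for _, group in sorted(distances)[:k]]
    winner, count = Counter(votes).most_common(1)[0]
    # Confidence: share of the k nearest neighbours that agree with the winning class
    return winner, count / k


train = {'a': [[1, 2], [2, 3], [3, 1]], 'b': [[6, 5], [7, 7], [8, 6]]}
print(knn_with_confidence(train, [5, 5], k=3))  # ('b', 1.0)
print(knn_with_confidence(train, [5, 5], k=5))  # ('b', 0.6)
```

The prediction is the same in both cases, but with K=5 two of the five neighbours disagree, so the confidence drops. A model can have high accuracy overall while individual predictions have low confidence.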

Saturday, February 20, 2016

Datastage Job Optimization - (Using Config File)

The configuration file is one of the most important components in Datastage.
The configuration file in Datastage defines the following:

1. The degree of data partitioning used to scale processing
2. System resources such as temporary storage and scratch disk
3. Resources for databases and buffer storage

Tips for Job Optimization:

1. The number of nodes should be equal to the number of CPUs available
2. Use multiple configuration files:
a. For low-volume data, a single-node configuration
b. For high-volume data, a multi-node configuration
3. Span processing across multiple machines by adding nodes from different machines
a. Avoid re-partitioning data in such scenarios, because re-partitioning across the network is a costly operation
4. To maximize I/O, use multiple, separate “resource disk” locations (where datasets are stored) on each node

Sample configuration file:

{
    node "node1" {
        fastname "machine1"
        pools "" "oraclewrite"
        resource disk "/Local_1/mypath" {pools ""}
        resource disk "/Local_2/mypath" {pools ""}
        resource scratchdisk "/Local_3_1/mypath" {pools ""}
        resource scratchdisk "/Local_3_2/mypath" {pools ""}
    }
    node "node2" {
        fastname "machine2"
        pools "" "oraclewrite"
        resource disk "/Local_4/mypath" {pools ""}
        resource scratchdisk "/Local_5_1/mypath" {pools ""}
        resource scratchdisk "/Local_5_2/mypath" {pools ""}
    }
    node "node3" {
        fastname "machine2"
        pools "oraclewrite"
        resource disk "/Local_4/mypath" {pools ""}
        resource scratchdisk "/Local_5_1/mypath" {pools ""}
        resource scratchdisk "/Local_5_2/mypath" {pools ""}
    }
}

The config file explained:

1. We are running on two different machines, "machine1" and "machine2" (node2 and node3 both run on machine2).
2. Using multiple resource disk and scratchdisk locations gives more disk space and I/O bandwidth for processing.
3. "oraclewrite" is assigned to all three nodes, so stages that use this pool run in parallel on every node when writing to and reading from Oracle DB.
4. Other stages run only on node1 and node2, because the default pool ("") is defined only on those two nodes. If you use the Netezza Connector, enable “Partitioned Read”; the number of parallel reads equals the number of data partitions defined in the config file.
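To double-check which nodes belong to which pool (points 3 and 4), a hypothetical helper like the Python sketch below can parse a configuration file. It assumes the simple layout of the sample: straight double quotes, each node's pools line on its own line, and each node's closing brace on its own line. It is not an IBM tool and does not handle every configuration syntax.

```python
import re


def pool_membership(config_text):
    """Return {pool_name: [node names]} for a simple Datastage-style config file."""
    pools = {}
    # Each node block: node "name" { ... } with the node's closing brace on its own line
    for name, body in re.findall(r'node\s+"([^"]+)"\s*\{(.*?)\n\s*\}', config_text, re.S):
        # The node-level pools line starts a line; resource lines use {pools ...} instead
        line = re.search(r'^\s*pools\s+(.*)$', body, re.M)
        for pool in re.findall(r'"([^"]*)"', line.group(1) if line else ''):
            pools.setdefault(pool, []).append(name)
    return pools


sample = '''
{
    node "node1" {
        fastname "machine1"
        pools "" "oraclewrite"
        resource disk "/Local_1/mypath" {pools ""}
        resource scratchdisk "/Local_3_1/mypath" {pools ""}
    }
    node "node2" {
        fastname "machine2"
        pools "" "oraclewrite"
        resource disk "/Local_4/mypath" {pools ""}
    }
    node "node3" {
        fastname "machine2"
        pools "oraclewrite"
        resource disk "/Local_4/mypath" {pools ""}
    }
}
'''
membership = pool_membership(sample)
print(membership)
# {'': ['node1', 'node2'], 'oraclewrite': ['node1', 'node2', 'node3']}
```

The empty-string key is the default pool. Its two members are the nodes where stages without a pool constraint will run.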

Wednesday, February 17, 2016

Big Data File Stage in Datastage 11.x

Hi Mates, as we all know, in this era of Hadoop and big data, everyone is moving toward working with HDFS. IBM has also introduced a Datastage component, the Big Data File stage, that lets Datastage developers and designers access the Hadoop Distributed File System (HDFS) from IBM Datastage.

To access HDFS via InfoSphere, we first have to create an ishdfs.config file with the required classpath details. The HDFS client .jar files and configuration file directories must be accessible by the InfoSphere Information Server engine.
If you are using InfoSphere BigInsights HDFS and the syncbi.sh tool to obtain the .jar files, the ishdfs.config file is created for you automatically from the ishdfs.config.biginsights file.
This ishdfs.config file points to the .jar files that are downloaded and unpacked in the $DSHOME/../biginsights directory.

Contents of the ishdfs.config file:
CLASSPATH= $DSHOME/../../ASBNode/eclipse/plugins/com.ibm.iis.client/httpclient-4.2.1.jar:$DSHOME/../../ASBNode/eclipse/plugins/com.ibm.iis.client/httpcore-4.2.1.jar:$DSHOME/../PXEngine/java/biginsights-restfs-1.0.0.jar:$DSHOME/../PXEngine/java/cc-http-api.jar:$DSHOME/../PXEngine/java/cc-http-impl.jar:/opt/IBM/biginsights/IHC/lib/*:/opt/IBM/biginsights/IHC/*:/opt/IBM/biginsights/lib/JSON4J.jar:/opt/IBM/biginsights/hadoop-conf

Location to save the ishdfs.config file:
/opt/IBM/InformationServer/Server/DSEngine
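The HDFS client .jar files must be reachable by the engine, so a quick check of the CLASSPATH entries can save some debugging time. The Python sketch below is a hypothetical helper, not part of InfoSphere. It expands $DSHOME and treats entries ending in `*` as "all .jar files in this directory", which is Java's classpath wildcard convention. It then reports any entries that cannot be found.

```python
import glob
import os


def missing_classpath_entries(classpath, dshome):
    """Return the CLASSPATH entries that cannot be found on this machine."""
    missing = []
    for entry in classpath.split(':'):
        path = entry.strip().replace('$DSHOME', dshome)
        if not path:
            continue  # skip empty entries, e.g. from a trailing ':'
        if path.endswith('*'):
            # Java wildcard: every .jar file in the directory
            found = glob.glob(os.path.join(path[:-1], '*.jar'))
        else:
            found = os.path.exists(path)
        if not found:
            missing.append(entry.strip())
    return missing


# Example usage on the engine host (only runs if the file exists there)
CONFIG = '/opt/IBM/InformationServer/Server/DSEngine/ishdfs.config'
if os.path.exists(CONFIG):
    with open(CONFIG) as f:
        for line in f:
            if line.startswith('CLASSPATH='):
                print(missing_classpath_entries(line[len('CLASSPATH='):],
                                                os.environ.get('DSHOME', '')))
```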

Apart from the configuration, the other options and operations are almost the same as in a normal file stage in Datastage, where you select the partitioning method, the file delimiter, and everything else.

Monday, February 1, 2016

Data Science - Data Mining Algorithms

As we all know, R programming is spreading its reach in analytics, so why not talk about a few widely used data mining algorithms in R?

While working with R, I found the algorithms below very useful for data mining; it's a personal choice, though. There are plenty of tools available for mining data that produce respectable results, but as a programmer it's always great to design the algorithm the way you want. Let's not waste any more time and go through a few data mining algorithms that I found best while working on one of my data analytics projects.

1. Decision Tree

Decision tree builds classification or regression models in the form of a tree structure. It breaks down a dataset into smaller and smaller subsets while at the same time an associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes.

The core algorithm for building decision trees is called ID3, by J. R. Quinlan. It employs a top-down, greedy search through the space of possible branches with no backtracking. ID3 uses entropy and information gain to construct a decision tree.

A decision tree is built top-down from a root node and involves partitioning the data into subsets that contain instances with similar (homogeneous) values.
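ID3's two measures are easy to compute by hand. Here is a minimal Python sketch of entropy and of the information gain from splitting on one attribute. It is written in Python rather than R to match the other code on this blog, and the function names and toy data are my own.

```python
from collections import Counter
from math import log2


def entropy(labels):
    # H(S) = -sum(p * log2(p)) over the class proportions p
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def information_gain(rows, labels, attribute):
    # Gain = H(S) - weighted average entropy of the subsets after splitting on `attribute`
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute], []).append(label)
    weighted = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted


rows = [{'outlook': 'sunny', 'windy': False}, {'outlook': 'sunny', 'windy': True},
        {'outlook': 'rainy', 'windy': False}, {'outlook': 'rainy', 'windy': True}]
labels = ['no', 'no', 'yes', 'yes']
print(information_gain(rows, labels, 'outlook'))  # 1.0: a perfect split
print(information_gain(rows, labels, 'windy'))    # 0.0: tells us nothing
```

ID3 would choose 'outlook' for the root node because it has the highest information gain. It then repeats the same calculation within each branch.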

2. Random Forest

Random Forests are a combination of tree predictors where each tree depends on the values of a random vector sampled independently with the same distribution for all trees in the forest.

Single decision trees often have high variance or high bias. Random Forests attempt to mitigate the problems of high variance and high bias by averaging, which finds a natural balance between the two extremes.
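The "random vector" idea can be sketched in Python. Each tree is trained on a bootstrap sample of the rows, and each split considers only a random subset of the features. The trees then take a majority vote. This minimal illustration is built on scikit-learn's DecisionTreeClassifier and its bundled Iris dataset (both assumptions for the example). In practice, scikit-learn's RandomForestClassifier, or R's randomForest package, does this for you.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def fit_forest(X, y, n_trees=25, seed=0):
    rng = np.random.default_rng(seed)
    trees = []
    for i in range(n_trees):
        # Bootstrap sample: draw rows with replacement
        idx = rng.integers(0, len(X), size=len(X))
        # max_features='sqrt': each split only considers a random subset of features
        tree = DecisionTreeClassifier(max_features='sqrt', random_state=i)
        trees.append(tree.fit(X[idx], y[idx]))
    return trees


def predict_forest(trees, X):
    # Majority vote across trees (labels here are small non-negative integers)
    votes = np.array([tree.predict(X) for tree in trees])
    return np.array([np.bincount(column).argmax() for column in votes.T])


X, y = load_iris(return_X_y=True)
trees = fit_forest(X, y)
# Optimistic estimate: measured on the same data the forest was trained on
print("training accuracy:", (predict_forest(trees, X) == y).mean())
```

Each tree alone overfits its own bootstrap sample, but their errors are partly independent, so the vote averages much of that variance away.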

3. Association Rule Mining (most commonly used for Market Basket Analysis)

Association rule learning is a method for discovering interesting relations between variables in large databases. It is intended to identify strong rules discovered in databases using some measures of interestingness.
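The most common measures of interestingness are support, confidence, and lift. The sketch below computes them in plain Python for a hypothetical rule {bread} → {butter} over a made-up set of baskets. Algorithms such as Apriori (available in R's arules package) search for all rules that pass chosen thresholds on these measures.

```python
def support(transactions, itemset):
    # Fraction of transactions that contain every item in `itemset`
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_metrics(transactions, antecedent, consequent):
    both = support(transactions, antecedent | consequent)
    confidence = both / support(transactions, antecedent)    # P(consequent | antecedent)
    lift = confidence / support(transactions, consequent)    # > 1 means positive association
    return both, confidence, lift


baskets = [{'bread', 'butter', 'milk'}, {'bread', 'butter'}, {'bread', 'jam'},
           {'milk', 'jam'}, {'bread', 'butter', 'jam'}]
print(rule_metrics(baskets, {'bread'}, {'butter'}))
```

Here the support is 0.6, the confidence is 0.75, and the lift is about 1.25. Customers who buy bread are more likely to buy butter than customers in general.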

4. Regression Analysis – Linear Regression (Remember Ohm's Law: V = I × R is a linear relationship)

Linear regression attempts to model the relationship between two variables by fitting a linear equation to observed data. One variable is considered to be an explanatory variable, and the other is considered to be a dependent variable.

Regression analysis generates an equation to describe the statistical relationship between one or more predictor variables and the response variable.
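To tie this back to Ohm's law: if you measure voltage at several currents, the slope of the fitted line estimates the resistance. The following Python sketch fits ordinary least squares with NumPy on made-up, noise-free measurements. With real data, the points would scatter around the line.

```python
import numpy as np

current = np.array([0.1, 0.2, 0.3, 0.4, 0.5])  # explanatory variable (amperes)
voltage = 5.0 * current                        # dependent variable (volts), made-up data for R = 5 ohms

# Ordinary least squares for: voltage = slope * current + intercept
dx = current - current.mean()
dy = voltage - voltage.mean()
slope = np.sum(dx * dy) / np.sum(dx ** 2)
intercept = voltage.mean() - slope * current.mean()
print(slope, intercept)  # slope is approximately 5.0 (the resistance), intercept approximately 0
```

np.polyfit(current, voltage, 1) returns the same slope and intercept. In R, lm(voltage ~ current) fits the same model.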

5. K means Cluster

Clustering is the process of partitioning a group of data points into a small number of clusters. For example, when clustering products, a quantitative approach would be to measure certain features of each product. The goal is to assign a cluster to each data point. K-means is a clustering method that aims to find the positions $\mu_i, i = 1, \dots, k$ of the cluster centres that minimize the squared distance from the data points to their cluster centre. K-means clustering solves

$$\arg\min_{c} \sum_{i=1}^{k} \sum_{x \in c_i} \lVert x - \mu_i \rVert^2$$

where $c_i$ is the set of data points assigned to cluster $i$.
