Saturday, May 7, 2016

How to browse the "Activiti" H2 database of WSO2 BPS and the typical H2 databases in any WSO2 product


This guide shows how to browse the Activiti database in WSO2 BPS; the latter part also covers the typical WSO2CARBON_DB of any WSO2 product.

All WSO2 products ship with an embedded H2 database, so there is no need to download and run H2 separately to browse a WSO2 H2 database (although that is possible too, as described at the end of this post).

H2 comes with a web console application that lets you access and browse the database from a browser.

Browse the "Activity" H2 database of WSO2 BPS

WSO2 BPS uses a separate database for its Activiti engine (which handles BPMN process management). If you are working with BPMN and want to inspect the Activiti database, here is what to do.

1. Open carbon.xml located in <PRODUCT_HOME>/repository/conf

2. Uncomment the <H2DatabaseConfiguration> element, but only its first three property elements, as follows:

<H2DatabaseConfiguration>
        <property name="web" />
        <property name="webPort">8082</property>
        <property name="webAllowOthers" />
        <!--property name="webSSL" />
        <property name="tcp" />
        <property name="tcpPort">9092</property>
        <property name="tcpAllowOthers" />
        <property name="tcpSSL" />
        <property name="pg" />
        <property name="pgPort">5435</property>
        <property name="pgAllowOthers" />
        <property name="trace" />
        <property name="baseDir">${carbon.home}</property-->
</H2DatabaseConfiguration>



 3. Open the <WSO2_BPS_HOME>/repository/conf/datasources/activiti-datasources.xml file.

Copy the part of the <url> element's value that comes before the first semicolon (;). It will look like this:

jdbc:h2:repository/database/activiti
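
For reference, the whole <url> element in activiti-datasources.xml looks roughly like this (the parameter after the first semicolon is an assumption here and may differ between BPS versions):

<url>jdbc:h2:repository/database/activiti;DB_CLOSE_ON_EXIT=FALSE</url>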


4. Start the product server.
5. Open a browser and go to "http://localhost:8082/" (if you changed the webPort property in carbon.xml in step 2, adjust the URL accordingly).
6. You will now see the H2 database console in the browser.

7. Here you have to give the JDBC URL, user name and password.
  •  JDBC URL : jdbc:h2:repository/database/activiti  
        (the one you copied in step 3; note that "activiti" is the name of the H2 database you want to access)

  • User Name : wso2carbon
  • Password : wso2carbon
(The default user name and password are as above; if you have edited them in the activiti-datasources.xml file, change them accordingly.)

8. Now press "Connect" and you are done. If you have successfully connected "jdbc:h2:repository/database/activiti" database will be displayed and there will be a list of database tables such as, "ACT_EVT_LOG" , "ACT_GE_BYTEARRAY", "ACT_GE_PROPERTY", etc. which are created by the BPS activity engine. You can run queries there too.

In the terminal where you started the WSO2 BPS server, you may now see long log traces related to the H2 database. To get rid of them, just comment out or remove the <property name="trace" /> element in carbon.xml (see step 2). Restart the server and connect to the H2 console, and you will see they are gone.

Browse the typical H2 database of any WSO2 product

 All WSO2 products have the typical "WSO2CARBON_DB", and to connect to it the only difference is the JDBC URL given in step 7, which becomes:
     jdbc:h2:repository/database/WSO2CARBON_DB 

(You can skip step 3 here, since you already know the WSO2CARBON_DB JDBC URL. For the record, this URL is defined in the repository/conf/datasources/master-datasources.xml file; if its default configuration has been edited, check that file.)

Browse the H2 database of any WSO2 product using an external H2 client

 If you prefer not to edit any WSO2 product configuration file, you can browse the database with an external H2 client instead, so there is no need to edit carbon.xml as described in step 2.

But there is a trade-off with this method: the external client cannot access the H2 database while the WSO2 product is running, so we have to shut down the server first. The reverse fails too: if you try to start the WSO2 product server while the external client is connected, the product will fail to start, because the client holds a lock on the H2 database.

Keeping these facts in mind, let's see how to do the task.

1. Download the H2 database from https://code.google.com/archive/p/h2database/downloads
   (a recent version is preferable)
2. Extract the zip file and run h2.sh in the bin directory (open a terminal inside the bin directory and run sh h2.sh)
3. The H2 web console will be automatically started in your browser.
4. Then give the,
  •  JDBC URL : jdbc:h2:<WSO2_PRODUCT_HOME>/repository/database/WSO2CARBON_DB

    eg: jdbc:h2:/home/samithac/Repositories/product-bps-2/product-bps/modules/distribution/target/wso2bps-3.6.0-SNAPSHOT/repository/database/WSO2CARBON_DB

    Note that you must include the full path of the database directory here (not just a relative path as we did before).
    (If you want to connect to WSO2 BPS's Activiti database, just change the "WSO2CARBON_DB" part of the above URL to "activiti".)
  • User Name : wso2carbon
  • Password : wso2carbon
5. Then press "Connect" and you are done..!!!


Friday, February 5, 2016

WSO2 Data Analytics Server (DAS)- Explained Simply & Briefly


  • WSO2 DAS is there to collect and analyze real-time and persisted (stored, not temporary) data, even large volumes of it, and to produce (visualize/communicate) results.

  • It exposes a single API through which external data sources publish data to it.


    Data Collection

  • Event = Unit of data collection
  • Event Stream = Sequence of events of a single type.

  • An Event Stream should be created in DAS to provide the structure required to process events. [The DAS management console is well equipped for that :-) ]
  • The required and preferred attributes of the event stream can be defined there (id, values, properties); see the JSON sketch after this list.
  • If we want to analyze data in batch mode (not in real time) we have to persist the event stream information. (If indexing is required, we can configure that too :-) )

  • After creating the path (event stream) to receive data, we have to create an Event Receiver to catch the data from different sources.

  • A second Event Stream is required to publish the processed events of the first one. In this stream we can define the fields and analytic results that need to be output. (We may persist this stream too, for compatibility with batch processing.)

  • An Event Publisher needs to be created for an event-publishing event stream (the second stream above) to publish processed and analyzed results to external systems.
  • The Event Simulator can be used to submit events to an event stream (the first stream mentioned above).
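
    As an illustration, an event stream definition in DAS is a JSON document along the following lines (the stream name and attributes here are hypothetical, and optional fields such as metaData and correlationData are omitted):

{
  "name": "org.example.plug.usage",
  "version": "1.0.0",
  "payloadData": [
    {"name": "houseId", "type": "INT"},
    {"name": "power", "type": "FLOAT"}
  ]
}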

          Analyzing Data

  • We can configure any of the event streams in DAS to perform analytics (batch analytics / real-time analytics / interactive analytics)

    Batch analytics
  • We can perform batch analytics only on event streams that are configured to be persisted.
  • The WSO2 DAS batch analytics engine is powered by Apache Spark, so batch analytics scripts written as Spark SQL queries can be used to analyze the persisted data and get results; a sketch follows below.
  • The Spark console provided in WSO2 DAS can also be used to run Spark queries and get results.
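
    A batch analytics script is just Spark SQL over the persisted stream data. A minimal sketch, assuming a persisted stream whose data is stored in a table named PLUG_USAGE:

-- map the persisted event stream's table to a Spark temporary table
CREATE TEMPORARY TABLE plugUsage
USING CarbonAnalytics OPTIONS (tableName "PLUG_USAGE");

-- then analyze it with ordinary Spark SQL
SELECT houseId, AVG(power) AS avgPower FROM plugUsage GROUP BY houseId;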

    Real-time analytics

  • Real-time analytics can be done with a set of queries or rules written in the SQL-like Siddhi Query Language, defined in an Execution Plan under Streaming Analytics; a sketch follows below.
  • For this we must be publishing (submitting) events to DAS at that moment, because this is real-time analysis, not persisted-data analysis. (The Event Simulator provided in DAS can be used to simulate event publishing.)
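
    To give the flavour of Siddhi, here is a minimal sketch of a query in an execution plan (the stream names and attributes are hypothetical) that forwards high-power events in real time:

/* pick events with power > 100 off the input stream
   and push them to an output stream as they arrive */
from PlugUsageStream[power > 100]
select houseId, power
insert into HighPowerStream;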
     
    Interactive Analytics

  • Interactive analytics is used to retrieve fast results through ad hoc querying of received or already-processed data. For this, the related event stream attributes must have been indexed. The Data Explorer under Interactive Analytics in the DAS dashboard is used for this task.

         Visualizing (communicating) Data

  • WSO2 DAS provides a very easy-to-use, customizable dashboard mechanism to visualize the data and the results of analytics, via the Analytics Dashboard in the Dashboard menu.
  • Various types of gadgets (bar charts, line charts, arc charts, map charts) can be added to the dashboard to represent data from any preferred event stream.

    The following diagram illustrates the overall event flow of an example scenario in WSO2 DAS, as described above. (Note that the names of the event receivers, streams, etc. are chosen just for this example scenario.)



    So this is the overall mechanism of WSO2 DAS in brief, and anybody interested is welcome to try it out. ;-)
    http://wso2.com/products/data-analytics-server/

Saturday, August 1, 2015

#Web Scraping #Beginner #Python

Hi all, I'm back with another new area: #Web Scraping. This is actually not a new technology, but it is new to my blogging scope, and it was new to me too at the time. :-) 

The expected audience is beginners to web scraping, and if you are a beginner to Python too, the free ticket (to read this blog... ha ha...) will be all the more precious. I will explain how to scrape a web site using a typical example, and don't worry if you are not familiar with Python; believe me, I will teach the most basic Python needed here. Yes, it is simple. I am also a beginner to Python myself, so comments and suggestions are highly appreciated.

[Web scraping is extracting useful information from a web site.]
Below is the URL of the web page (on www.imdb.com) I am going to demonstrate with.
It was obtained by searching for the "Most Popular Feature Films Released 2005 to 2015 With User Rating Between 1.0 And 10" via IMDb's advanced search option. (To get to advanced search, click the small drop-down arrow to the left of the search button at the top of the www.imdb.com home page, then click the "Advanced Title Search" link on the right side of the page under the heading "Advanced Search". On the advanced title search page, set the search criteria Title Type = Feature Film, Release Date = 2005 to 2015, User Rating = 1 to 10, and hit Search. This brings you to the URL where we are going to execute our grand mission :-) )


The image above shows the web page we are going to scrape.
The source code of that page can be accessed by viewing the page source. (If you are a Chrome user, just right-click and select "View Page Source".)
OK. Now let's see how the scraping can be implemented in Python using the BeautifulSoup library.
The task is to get the details of all the movies, including their title, genres, year, run time and rating.
I am not describing here how to set up Python and BeautifulSoup, and I hope you have gotten that far successfully.

Now let's dig into the code, which does the task.

ScrapeImdb.py
from bs4 import BeautifulSoup
from urllib.request import urlopen

# open the URL
url = urlopen("http://www.imdb.com/search/title?release_date=2005,2015&title_type=feature&user_rating=1.0,10")
# read the HTML source from the URL
readHtml = url.read()
# close the connection
url.close()
# parse the HTML so we can scrape it
soup = BeautifulSoup(readHtml, 'html.parser')

# the search results are in the single <table> with class "results"
tableClassResults = soup.find("table", { "class" : "results" })
for row in tableClassResults.find_all('tr'):
    print("\n")
    # each movie's details are in a <td> with class "title"
    title = row.find_all('td', {"class": "title"})
    for titletd in title:
        print("Title:" + titletd.a.string)
        print("Year:" + titletd.find(class_="year_type").string)
        genreClass = titletd.find(class_="genre")
        print("Genres:")
        for eachGenre in genreClass.find_all('a'):
            print("\t" + eachGenre.string)
        print("Run Time:" + titletd.find(class_="runtime").string)
        rating_rating = titletd.find(class_="rating-rating")
        ratingValue = rating_rating.find(class_="value")
        print("Rating:" + ratingValue.string)

I will describe the code line by line. Note that in Python, blocks and statements are delimited just by indentation, which is an unusual approach among popular programming languages, so you won't see semicolons (;) or curly braces ({, }) as in most other languages. Also note that line comments start with # in Python.
 
from bs4 import BeautifulSoup
This imports the name 'BeautifulSoup' from the module bs4. A module in Python is essentially a file where definitions (functions and variables) are stored; another definition is "a module is a file containing Python definitions and statements". BeautifulSoup is a class, and it is defined in the bs4 module. From here on you can use the BeautifulSoup constructor directly.
from urllib.request import urlopen
Here 'urllib' is a package and 'request' (request.py) is a module inside it; 'urlopen' is a function in that module and is directly accessible now that it is imported. The urlopen() function opens the URL, which can be either a string or a Request object.
So two forms of Python importing are: 1. from moduleName import ClassName, and 2. from packageName.moduleName import functionName.
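Side by side, the two forms from our script look like this:

from bs4 import BeautifulSoup       # form 1: a class imported from a module
from urllib.request import urlopen  # form 2: a function imported from a module inside a package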
 
soup = BeautifulSoup(readHtml, 'html.parser')
Beautiful Soup supports the HTML parser included in Python's standard library, but it also supports a number of third-party Python parsers; here we use html.parser. The BeautifulSoup class constructor is called with the HTML read from the given URL and the name of the parser as a string, and it returns a BeautifulSoup object, which is assigned to the variable soup. Python is a dynamically typed language, so we do not have to declare a type for a variable; Python decides the type of a variable at runtime.
Now we have an object that holds the HTML document we are scraping.
tableClassResults = soup.find("table", { "class" : "results" })
So now the task is to find the elements we want and extract information from them sequentially. In the source of the web page you can notice that there is only one <table> element whose class is "results"; the IMDb search results are actually in this table. So the find function, with the above parameters, searches for a <table> tag having the attribute class with the value "results" and returns that <table> element.
for row in tableClassResults.find_all('tr'):
In the 'results' table, each movie is described in a separate <tr> element, so now we have to extract these <tr> elements.
The find_all() method looks through a tag's descendants and retrieves all descendants that match the given filter. So calling find_all on the tableClassResults object (which now carries a <table> element) with the argument 'tr' returns a list of all the <tr> elements in the <table> element.
We use a for loop to iterate through the returned <tr> elements, and you can see the syntax of the for loop is pretty simple; it is like the enhanced for loop (for-each loop) in Java. 'row' is the variable that catches the <tr> element in each iteration.
So now we can grab the <td> element of the movie (in the row of the current iteration) that has the class attribute 'title', which contains all the required details of the movie.
title=row.find_all('td',{"class":"title"})
That is how it is done with the find_all method. Although we use find_all, the variable title is assigned a list containing only one <td> element, because each row has only one <td> with class "title". We use find_all anyway, for the convenience of the next iteration.
for titletd in title:
This is how we iterate through the list title, even though it has only one item, for convenient access to the child tags inside it.
print("Title:"+titletd.a.string)
Now we can directly retrieve the title (name) of the movie. A child tag of a tag can be accessed directly using a period (.), and the text content of an element is given by its string attribute. So here, the text of the following <a> element is returned by titletd.a.string:
<a href="/title/tt0478970/">Ant-Man</a>
Note that writing to the standard output (console) is done with the print function in Python.
print("Year:"+titletd.find(class_="year_type").string)
The titletd.find(class_="year_type") part searches for an element whose class value is "year_type", and its text content is then retrieved via the string attribute (a Python object attribute). Note the underscore after class (class_) and please don't forget it; it is needed because class is a reserved word in Python.
 All the following statements are similar to the ones described above. You should peek into the source code of the web page to understand this well; for convenience I have added screenshots of the relevant parts. The first lines of the output are displayed below.
So that is it...!!! You have scraped a web site. Now try to use this in a more advanced application.  
Output:
Title:Mission: Impossible - Rogue Nation
Year:(2015)
Genres:
 Action
 Adventure
 Thriller
Run Time:131 mins.
Rating:8.0


Title:Southpaw
Year:(2015)
Genres:
 Action
 Drama
 Sport
 Thriller
Run Time:124 mins.
Rating:7.9


Title:Ant-Man
Year:(2015)
Genres:
 Action
 Adventure
 Sci-Fi
Run Time:117 mins.
Rating:7.8


Wednesday, July 8, 2015

More on semantic wiki...


              In my previous post, titled "Semantics into wiki", I discussed the major concepts behind the "semantic wiki". Today I am going to dive a bit deeper into it.
              A semantic wiki has the following basic features. First, it is still a wiki, with regular wiki features such as categories/tags, namespaces, titles, versioning, etc. The articles have typed content (built-in types plus user-created ones, e.g. categories), and types can be Page/Card, Date, Number, URL/Email, String, etc. The articles are connected by typed links (e.g. properties) such as "capital_of", "contains", "born_in". Some semantic wikis have querying interface support too.

            Annotations are used in a semantic wiki to make information more explicit, and they are actually the most important feature of semantic wikis. These annotations have a specific markup syntax, which is used when editing or adding articles in the wiki. The syntax may differ between semantic wikis; in this article I am focusing on Semantic MediaWiki. Categories, typed links and attributes are some of these annotations. "Category" is one kind that already exists in normal Wikipedia too.

Typed links are used instead of regular hyperlinks; here a hyperlink has a type. Links are arguably the most basic and also the most relevant markup within a wiki; their syntactic representation is ubiquitous in the source of any Wikipedia article. Semantic MediaWiki allows users to create new typed links freely, as they prefer. Existing link types should be used wherever applicable, but a new type can also be created simply by using it in a link. A typed link can be a property of the current article, and the syntax for inserting a property is,
[[Property::Value | Display]]

For example, [[is capital of::England]]. Here the property is "is capital of", and it is linked to the article named "England", which is the value. The Display part is optional; there we can specify what should be shown in the article if something other than the value is wanted.
 
Data values play a crucial role within an encyclopaedia, and machine access to this data enables numerous additional applications. These values are called attributes and have the common syntax
[[ attribute_name := value ]]
in Semantic MediaWiki. E.g.: [[ population := 7,421,328 ]]

An attribute value can also have a unit, e.g. [[area := 609 square miles]]. When several units exist for the same kind of value, the system provides automatic conversion of a value to the other units. To let users declare the data type of an attribute, Semantic MediaWiki introduces a new namespace, "Attribute:", that contains articles on attributes. Within these articles one can provide human-readable descriptions, as in the case of relations and categories, but one can also add semantic information that specifies the data type. Using a relation with built-in semantics, we can simply write [[hasType::Type:integer]] to denote that an attribute has this type.
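
Putting these annotations together, a small fragment of an annotated article about London might look like this (a hypothetical illustration combining the syntaxes above):

London is the [[is capital of::England|capital]] of England.
It has a population of [[population := 7,421,328]] and covers
an area of [[area := 609 square miles]].
[[Category:City]]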

            There are many more semantic wiki syntax types like these, and it is important to learn them if we want to add a new article to a semantic wiki or edit an existing one.

Advanced querying and searching are among the most important features of a semantic wiki. There is a feature to search a property by its value in the "Page property search": if we enter the property "Located in" and the value "England", all the cities/regions located in England are listed via this advanced searching option. For advanced querying there is a nice interface too. For example, if we enter [[Category:City]][[located in::Germany]] into the Query field and ?Population into the Additional data to display field, a list of all the cities in Germany is displayed with their population values. The following is the interface available for querying.

The results of the query are displayed as follows.
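Equivalently, such a query can be embedded inline in a wiki page using Semantic MediaWiki's #ask parser function, roughly like this (a sketch):

{{#ask: [[Category:City]] [[located in::Germany]]
 | ?Population
}}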
 

The following is the basic architecture of Semantic MediaWiki.

There are more applications of semantic wikis, such as:
  • Desktop applications
      o   AmaroK Media Player
      o   Movie reviewer
      o   Portals that aggregate data from various data sources (newsfeeds, blogs, online services)
  • Enhanced folksonomies
  • Creating domain ontologies
  • Creating multilingual dictionaries
  • New research opportunities

In this way, it is obvious that the semantic wiki concept is going to be a very interesting and valuable concept for the whole world, even though it is not yet well developed or popular. 


Wednesday, July 1, 2015

Create a git BitBucket/ Github repository from already locally existing project



  • This post originally targets BitBucket repositories, but the basic steps are common to Github too. 
  • This is a simple task, but it could take hours if you followed the official Bitbucket.com instructions. :P So I am posting this.
  • The case is that we have a project on our PC, almost complete (or partially done), and now we want to put it in a git repository and push it to a BitBucket repository. So note that we do not yet have a BitBucket/Github repository for our project, nor a local git repo on our PC. (But of course we have a BitBucket/Github account :) )
  • So, believe me, follow these simple steps.
Prerequisite: You should have git installed on your PC (PC = Personal Computer >> simply your computer).

1) Create a repository in Bitbucket.org / Github.com to contain our project.

  • For this just click Repositories tab> Create New Repository (or simply click this link- https://bitbucket.org/repo/create)
  • Simply fill in the details you want; suppose we set the name to "TestingGitRepo"
  • Click "Create Repository"

Now you are done creating the BitBucket repo.
(If you are dealing with Github instead of BitBucket, create the repository in the default way, and remember not to tick "Initialize this repository with a README", because that makes the initial commit a bit harder later.)

Just after creating the repo you will be redirected to a page as below.

Click "I have an existing project" and copy the command displayed below.
(It will be easier if you copy it here now)

The command we copied is,

git remote add origin https://Samitha@bitbucket.org/Samitha/testinggitrepo.git

(If you are using Github, a similar command starting with "git remote add origin"
will be displayed on the page shown after creating the repo; just copy it.)
2) Open a command prompt on your PC and go into the directory that you want to become the repository.
   For example, if you go into the directory "G:\AndroidStudioWorkspace", the contents of that directory will be sent to the BitBucket/Github repository you created.

3) Enter,
       git init

    This initializes the directory as a git repo.
4) Now paste the command we copied and press Enter.
  git remote add origin https://Samitha@bitbucket.org/Samitha/testinggitrepo.git

5) Enter,  
git add --all

This stages all the files and folders in this directory, so they will go into the git repository with the next commit.

6) Now you have to make the initial commit. So enter,  
git commit -m "Initial Commit"

At this point git will sometimes give an error message as follows, if you are using git on your PC for the first time and so have not yet configured your name and email address with git.



If you get this error, just do what it asks you to do.
Enter,
git config --global user.email "rmschathuranga@gmail.com"
git config --global user.name "Samitha"

Note that you have to use your own email address and BitBucket user name instead of 
rmschathuranga@gmail.com and Samitha (which are MINE)..!!!
7) Now enter,  
git push -u origin master
Enter the password of your BitBucket/Github account when prompted. All the files and folders in your local repo will then be pushed (uploaded) to the BitBucket repo, creating a new branch named "master". You will see messages as below,



     And that's all. You have done it.
Go check the BitBucket/Github repository you created: your project has been uploaded successfully, and the repository is ready. 

Important Notes:


  • Whenever you make changes to your local project files and want to push them to the remote BitBucket/Github repository, just repeat steps 5, 6 and 7 above (see the sketch after this list).
  • Note that if a directory is empty, it will not be added to git (nor to the remote repo).
  • For a deeper clarification: git doesn't so much ignore empty directories as ignore all directories. In git, directories exist only implicitly, through their contents; empty directories have no contents, therefore they don't exist in git repositories.
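
That routine is simply (a minimal sketch; use your own commit message):

git add --all
git commit -m "Describe what changed"
git push origin master

(After the first push with -u, a plain git push is enough, since origin/master is already set as the upstream.)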


--------------------------------------------------------------------------------------------------------------

For extra knowledge
---------------------------------------------------------------------------------------------------------------
git add command

For extra knowledge, I would like to go a bit deeper into the git add command.

git add has a number of options for various requirements. The following tables (based on http://certificationquestions.com/version-control-system/git/difference-git-add-git-add-git-add-u/ ) show the differences between them. Note that,
git add -A  = git add --all 

You can find your git version with the git version command.

For git versions 1.x:

    Command       New files    Modified files   Deleted files
    git add .     staged       staged           not staged
    git add -u    not staged   staged           staged
    git add -A    staged       staged           staged

For git version 2.x:

    Command       New files    Modified files   Deleted files
    git add .     staged       staged           staged
    git add -u    not staged   staged           staged
    git add -A    staged       staged           staged

So my recommendation is to use git add --all (which is the same as git add -A), as it covers the most common and general requirement. 

And here I am highlighting the difference between,
           git add . and git add --all 
in git version 1.x, which most of us still use now. It is that git add --all stages all the changes to the repository, while git add . does not stage deleted files. This means that if you had deleted a file in your local repository and you want it deleted from the remote repository too, you should use git add --all; git add . would not remove that file from the remote repository.

Comments and suggestions are highly appreciated if you found this post useful..!!! :-)