I love web scraping, especially in the beginning of learning how to code. It is fairly easy, you can learn a lot about how to handle data and you get immediate results!

But I see a lot of tutorials which get overly complicated and focus mainly on a framework called Beautiful Soup. It is a fantastic and mighty framework, but most of the time, and especially for a beginner, it is completely over the top. Let’s be honest, we don’t want to index a complete website; most of the time we just want to download images or ask for a few values.

This can be done a lot easier with one magic word: Regular Expressions.

Okay, okay, I hear you “whaaat? Regular expressions and easy? WTF?”

Yeah, you are not wrong, RegEx aren’t that easy. Personally I found it much easier to learn RegEx than an arbitrary framework which has only one use case.

What are regular expressions?

The concept of regular expressions emerged in the 1950s. In theoretical computer science, a regular expression is a sequence of characters that defines a search pattern. (Wikipedia)

So, what does a regular expression look like?

Imagine we have a string:

Hi, I'm KurzGedanke and www.kurzgedanke.de is my website.

Now we want to get the website with the specific domain. We can assume that every input looks like www.websiteName.tld.

One solution to match this with regular expressions might look like this:

www\.(.*)\.([a-zA-Z]*)\s
  • www as you might suspect, this matches exactly www
  • \. matches the . after the www. Because the dot has a function in RegEx we need to escape it with the \.
  • (.*) the . matches any single character. Whether it is an a or a tab, it will match it, except for newlines. With the asterisk we match zero or more occurrences of the expression before it. In this case zero or more of any single character. The parentheses () put the match in a group which can be accessed easily.
  • \. this dot matches the dot before the top level domain.
  • ([a-zA-Z]*) here we have the top level domain, which is put in a group again with the (). [] are used to match a single character. In this case a character between the lowercase a and the lowercase z or the capital A and the capital Z. To get more than a single character the * is used. (This is funny and a classic mistake: I didn’t think it through completely. A URL can of course contain a dash -, and I missed it.)
  • \s matches white space. In this case it is used to end the regular expression.
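
To see the breakdown above in action, here is a minimal sketch that runs the pattern against the example sentence with Python's re module. The variable names are only chosen for illustration.

```python
import re

# The example sentence and the pattern from the text above.
text = "Hi, I'm KurzGedanke and www.kurzgedanke.de is my website."
pattern = r"www\.(.*)\.([a-zA-Z]*)\s"

match = re.search(pattern, text)
if match:
    print(match.group(0))  # the whole match, including the trailing space
    print(match.group(1))  # first group: the website name
    print(match.group(2))  # second group: the top level domain
```

The two groups come back as kurzgedanke and de, which is exactly what the parentheses in the pattern were for.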

To be honest, I think there are smarter ways to do this, but this way it is easy to see what’s going on without getting overwhelmed by a 50 character long RegEx string.

To learn regular expressions I used an interactive tutorial like this: regexone.com. This is not the only one out there and you can look around for one that suits you.

Another great tip is to use sites like regex101.com. You can paste text into it and write your regex directly while you see in real time which parts are matched. I use it every time I write some regex.

Let’s write some Python

We can use this knowledge to scrape websites. And… to be honest… I will rely on a module called Requests: HTTP for Humans. But this module is so easy to handle and so pythonic that sometimes I have the feeling it is more pythonic than Python itself.

Our goal is to scrape my website and get every article title on my landing page, as well as the link to it.

A simplified version of the HTML looks like this:

<div class="post-list">
  <article class="post-preview">
    <div class="post-preview-heading">
      <h2><a href="https://kurzgedanke.de/post/headless-ssh-and-wifi-on-raspberrypi/">Headless ssh and Wifi on RaspberryPi</a></h2>
    </div>
    <hr>
  </article>
  <article class="post-preview">
    <div class="post-preview-heading">
      <h2><a href="https://kurzgedanke.de/post/how-to-encrypt-files-with-aes/">How to Encrypt Files with AES</a></h2>
    </div>
    <hr>
  </article>
  <article class="post-preview">
    <div class="post-preview-heading">
      <h2><a href="https://kurzgedanke.de/post/problems-with-flask-and-pycharm/">Problems with Flask and PyCharm</a></h2>
    </div>
    <hr>
  </article>
  <article class="post-preview">
    <div class="post-preview-heading">
      <h2><a href="https://kurzgedanke.de/post/decentraland-hot-to-mine-on-a-mac/">Decentraland | How to Mine on a Mac</a></h2>
    </div>
    <hr>
  </article>
  <article class="post-preview">
    <div class="post-preview-heading">
      <h2><a href="https://kurzgedanke.de/post/welcome/">Welcome</a></h2>
    </div>
    <hr>
  </article>
</div>

So, what are we looking for in this HTML? Let’s see, we want the titles and the URLs of all posts. We have a few articles with the class post-preview. And further down we have an h2 heading inside a div with the class post-preview-heading. The h2 contains an a element whose href holds the link. Might be good to go! Every h2 has the same structure and, we are lucky, this is the only h2 with this structure on the whole site. So we can assume, like above, that every input looks exactly like this:

<h2><a href="linkToPostTitle">postTitle</a></h2>

On other websites the h2 or the a href might have a dedicated class or id like

<h2 class="post-preview-title-header"></h2>

This is even better, because we would have a persistent pattern to match against.
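
As a rough sketch of that idea: if a page did carry such a class, the pattern could be anchored on it. The markup below is hypothetical; only the class name is taken from the example above, while the link and title are made up.

```python
import re

# Hypothetical markup: the class name comes from the example above,
# the link and the title are invented for illustration.
html = (
    '<h2 class="post-preview-title-header">'
    '<a href="https://example.com/post/welcome/">Welcome</a></h2>'
)

# Anchor the pattern on the class attribute instead of the bare <h2>.
pattern = r'<h2 class="post-preview-title-header"><a href="(.*?)">(.*?)</a></h2>'

matches = re.findall(pattern, html)
print(matches)
```

Each match is a tuple of the link and the title, so other h2 elements on the page without that class would simply be ignored.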

Now let us write our RegEx to search for.

<h2><a href=\"https:\/\/kurzgedanke\.de\/post\/

This simply represents the

<h2><a href="https://kurzgedanke.de/post/

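string. Again, the dots have to be escaped. The forward slashes and the double quotes are escaped with a backslash as well. Python’s re module does not strictly need that, but the extra backslashes are harmless, and regex tools that use / as a delimiter require it.

Note: You can leave the escaping of the double quotes out when you use single quotes in your Python code. But escaped double quotes are always the safe option.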

Now the rest:

<h2><a href="https:\/\/kurzgedanke\.de\/post\/(.*)\/">(.*)<\/a><\/h2>
  • The first (.*) selects everything after the post/ up to the /"> and puts it in a group.
  • The second (.*) matches the post title and puts it in a group as well.
  • <\/a><\/h2> matches the closing </a> tag and the closing </h2> tag.
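
Before making any request, the pattern can be tried offline against a piece of the simplified HTML. The sketch below does that, and also shows one caveat: the greedy (.*) only works here because every h2 sits on its own line. The lazy variant (.*?) and the one_line test string are my own additions for illustration, not part of the original script.

```python
import re

# Two entries copied from the simplified HTML above.
html = """
<h2><a href="https://kurzgedanke.de/post/how-to-encrypt-files-with-aes/">How to Encrypt Files with AES</a></h2>
<h2><a href="https://kurzgedanke.de/post/welcome/">Welcome</a></h2>
"""

regex = r'<h2><a href="https:\/\/kurzgedanke\.de\/post\/(.*)\/">(.*)<\/a><\/h2>'
posts = re.findall(regex, html)
for slug, title in posts:
    print(slug, "->", title)

# Made-up input with two headings on ONE line: the greedy (.*) swallows
# both into a single match, the lazy (.*?) keeps them apart.
one_line = (
    '<h2><a href="https://kurzgedanke.de/post/a/">A</a></h2>'
    '<h2><a href="https://kurzgedanke.de/post/b/">B</a></h2>'
)
lazy = r'<h2><a href="https://kurzgedanke\.de/post/(.*?)/">(.*?)</a></h2>'
print(len(re.findall(regex, one_line)))  # greedy: one merged match
print(re.findall(lazy, one_line))        # lazy: one match per heading
```

If a site delivers its HTML minified on a single line, the lazy version is probably the safer choice.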

Now that we’ve written the regular expression let’s take a look at the python code.

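import re
import requests


# Download the landing page; r.text holds the HTML as a string.
r = requests.get('https://kurzgedanke.de/')

# Group 1 is the part of the URL after /post/, group 2 is the post title.
regex = r'<h2><a href="https:\/\/kurzgedanke\.de\/post\/(.*)\/">(.*)<\/a><\/h2>'
titleURL = re.findall(regex, r.text)

# findall returns one tuple per match, with one entry per group.
for urlAndTitle in titleURL:
    print(f'Title:\t {urlAndTitle[1]}')
    print(f'URL:\t https://kurzgedanke.de/post/{urlAndTitle[0]}/')
    print('-------------------------------')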
  • import re imports the regular expression module from the standard library.
  • import requests imports the Requests module by Kenneth Reitz.
  • r = requests.get('https://kurzgedanke.de/') makes an HTTP request to kurzgedanke.de and saves the data in a response object.
  • regex = r'...' declares a variable with the regular expression as its value. The r prefix makes it a raw string, so Python passes the backslashes on to the regex module unchanged.
  • titleURL = re.findall(regex, r.text) uses the re module to find all matches of our regex in r.text, which contains the HTML of our HTTP response. The result is a list of all matches, which is assigned to titleURL.
  • for urlAndTitle in titleURL: we can easily iterate over the list and access the different groups of each match with index notation, because we grouped them in our regular expression with the ().

When you run the script it should look like this:

Title:   Headless ssh and Wifi on RaspberryPi
URL:     https://kurzgedanke.de/post/headless-ssh-and-wifi-on-raspberrypi/
-------------------------------
Title:   How to Encrypt Files with AES
URL:     https://kurzgedanke.de/post/how-to-encrypt-files-with-aes/
-------------------------------
Title:   Problems with Flask and PyCharm
URL:     https://kurzgedanke.de/post/problems-with-flask-and-pycharm/
-------------------------------
Title:   Decentraland | How to Mine on a Mac
URL:     https://kurzgedanke.de/post/decentraland-hot-to-mine-on-a-mac/
-------------------------------
Title:   Welcome
URL:     https://kurzgedanke.de/post/welcome/
-------------------------------

I hope you found this little write-up useful and learned a bit.

If you have any questions or remarks, please leave a comment, contact me on Twitter or write me a mail.


SSH:

To enable SSH on a RaspberryPi without a monitor, keyboard or mouse, put your SD-Card in a card reader and plug it into your main PC.

Open up a terminal and navigate to the SD Card.

# On Mac:
╭─loki@lokiTheGod ~
╰─$ cd /Volumes
╭─loki@lokiTheGod /Volumes
╰─$ ls
BOOTCAMP     MACINTOSH HD Untitled     boot

Now you are in the Volumes folder, which shows all drives connected to your Mac. You have to create an empty ssh file on the Pi SD-Card.

╭─loki@lokiTheGod /Volumes
╰─$ cd boot
╭─loki@lokiTheGod /Volumes/boot
╰─$ touch ssh

Enter cd boot to go into the Pi SD-Card and then type in touch ssh to create an empty ssh file.

You can verify this by typing ls in your terminal.
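
If you would rather script this step, here is a minimal Python sketch that does the same as touch ssh. The function name enable_ssh is made up for illustration, and the demo runs against a temporary directory; on a Mac the real target would presumably be /Volumes/boot as shown above.

```python
from pathlib import Path
import tempfile


def enable_ssh(boot_dir):
    """Create the empty 'ssh' marker file in the given boot directory."""
    marker = Path(boot_dir) / "ssh"
    marker.touch()  # same effect as `touch ssh`
    return marker


# Demo against a temporary directory instead of the real SD-Card.
with tempfile.TemporaryDirectory() as demo_boot:
    created = enable_ssh(demo_boot)
    print(created.name, created.exists(), created.stat().st_size)
```

The file stays empty; only its existence matters.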

WiFi:

Personally, this approach didn’t work for me… so the easiest way to do it is via Ethernet.

To enable WiFi directly on your headless Pi, place a file named wpa_supplicant.conf in the boot directory of your Pi.

When the Pi starts, the wpa_supplicant.conf is moved to /etc/wpa_supplicant/wpa_supplicant.conf, where the WPA configuration is located.

A simple and for most networks sufficient wpa_supplicant.conf looks like this:

WPA:
network={
    ssid="YOUR_SSID"
    psk="YOUR_PASSWORD"
    key_mgmt=WPA-PSK
}
WPA2:
network={
    ssid="YOUR_NETWORK_NAME"
    psk="YOUR_NETWORK_PASSWORD"
    proto=RSN
    key_mgmt=WPA-PSK
    pairwise=CCMP
    auth_alg=OPEN
}

For more information on this you can look at the Arch Wiki WPA supplicant site (Link).
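
As a small sketch, the WPA2 block from above could also be generated and written with Python. The function names render_network and write_wpa_config are hypothetical, the SSID and password are placeholders, and the demo writes into a temporary directory rather than the real boot volume.

```python
from pathlib import Path
import tempfile


def render_network(ssid, psk):
    """Return a WPA2 network block like the one shown above."""
    return (
        "network={\n"
        f'    ssid="{ssid}"\n'
        f'    psk="{psk}"\n'
        "    proto=RSN\n"
        "    key_mgmt=WPA-PSK\n"
        "    pairwise=CCMP\n"
        "    auth_alg=OPEN\n"
        "}\n"
    )


def write_wpa_config(boot_dir, ssid, psk):
    """Write wpa_supplicant.conf into boot_dir and return its path."""
    target = Path(boot_dir) / "wpa_supplicant.conf"
    target.write_text(render_network(ssid, psk))
    return target


# Demo against a temporary directory instead of the real SD-Card.
with tempfile.TemporaryDirectory() as demo_boot:
    config = write_wpa_config(demo_boot, "YOUR_NETWORK_NAME", "YOUR_NETWORK_PASSWORD")
    print(config.read_text())
```

Keep in mind that the password ends up in plain text in that file.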

Have a lot of fun with your Pi!


In times of mass surveillance, public Wi-Fis and a lot of bad people trying to steal your data, you want to encrypt your data before you send it over the internet.

This is possible with something called OpenSSL and AES. AES is a cipher to encrypt your data; OpenSSL is a suite of cryptographic tools for you to use. OpenSSL should be preinstalled on most *nix operating systems.

Let’s start:

On a Mac, open your terminal by opening the Spotlight search and entering terminal. Hit enter to start it up.

Terminal on a Mac

You need to navigate to the folder where the file you want to encrypt is located. In my case this is the Desktop. If you want to know more about navigating the terminal, here is a link to a tutorial.

Of course, you can just copy the commands, but without the $. It just marks a line you can enter in the terminal.

$ cd Desktop/

To show what files you have on your desktop you can use the ls command.

$ ls
very_important_file.txt

You can see, I have a very_important_file.txt on my desktop, which I want to encrypt before I send it to my friend. To encrypt this file you can use the following command:

$ openssl aes-256-cbc -a -salt -in very_important_file.txt -out someRandomName.enc

enter aes-256-cbc encryption password:
Verifying - enter aes-256-cbc encryption password:

It now asks you for a password. You should use a really strong and long password and communicate it over a safe channel. And in case you ask: the internet, even with a super fancy encrypted messenger, is not a safe channel. Let’s briefly break down the command.

  • openssl is the cipher suite I mentioned earlier.
  • aes-256-cbc is the encryption cipher: AES with a 256-bit key in CBC mode.
  • -a is optional and is used for a base64 encoding, which enables you to look at the file in a text editor.
  • -salt adds a random salt when the key is derived from your password, so encrypting with the same password twice does not produce the same key.
  • -in tells OpenSSL which file it should encrypt.
  • -out tells OpenSSL what the name of the output file should be. You should use a random name without an extension so no one can guess the underlying file type.

If your friend wants to decrypt the file he/she can use the following command:

$ openssl aes-256-cbc -d -a -in someRandomName.enc -out very_important_file.txt

enter aes-256-cbc decryption password:

Your friend is of course asked for the password to decrypt the file. But let’s break down this command as well.

  • openssl is the cipher suite I mentioned earlier.
  • aes-256-cbc is the encryption cipher: AES with a 256-bit key in CBC mode.
  • -d tells OpenSSL to use decryption, not encryption.
  • -a tells OpenSSL that the file was base64 encoded. If you left out the -a during encryption, you have to leave it out of the decryption as well.
  • -in tells OpenSSL which file it should decrypt.
  • -out tells OpenSSL the output name of the decrypted file.
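
If you want to call these two commands from a script, one possible sketch is to build them as argument lists. The helper names encrypt_command and decrypt_command are made up; the flags are exactly the ones explained above. The sketch only prints the commands and does not run them.

```python
import shlex


def encrypt_command(source, target):
    """Argument list for the encryption call described above."""
    return ["openssl", "aes-256-cbc", "-a", "-salt", "-in", source, "-out", target]


def decrypt_command(source, target):
    """Argument list for the matching decryption call."""
    return ["openssl", "aes-256-cbc", "-d", "-a", "-in", source, "-out", target]


enc = encrypt_command("very_important_file.txt", "someRandomName.enc")
dec = decrypt_command("someRandomName.enc", "very_important_file.txt")
print(shlex.join(enc))
print(shlex.join(dec))
# To actually run one of them you could pass the list to
# subprocess.run(enc, check=True); openssl then asks for the password.
```

Passing the arguments as a list avoids quoting problems when a file name contains spaces.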

Please keep in mind that this is just encryption. It does not protect the integrity of the file, so it could still be altered on its way through the internet by an attacker.
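
One common way to notice such changes is to send a keyed checksum (an HMAC) along with the file. This is not part of the workflow above, only a minimal sketch of the idea with Python's standard library. The second key, the stand-in bytes and the function names are assumptions for illustration.

```python
import hashlib
import hmac


def make_tag(data, key):
    """Return a hex HMAC-SHA256 tag for the given bytes."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def is_unaltered(data, key, tag):
    """Compare the expected tag with the received one in constant time."""
    return hmac.compare_digest(make_tag(data, key), tag)


# Stand-in bytes; in practice you would read someRandomName.enc in binary mode.
encrypted = b"stand-in for the encrypted file content"
mac_key = b"a second secret, shared over a safe channel"

tag = make_tag(encrypted, mac_key)
print(is_unaltered(encrypted, mac_key, tag))         # unchanged file
print(is_unaltered(encrypted + b"x", mac_key, tag))  # tampered file
```

Your friend would recompute the tag over the received file and only decrypt it if the check passes.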


I started a new Flask project and coded it in Visual Studio Code and a terminal. I set up my virtual environment, installed Flask and started to code following the Flask Quickstart.

The code looks like this:

from flask import Flask
app = Flask(__name__)

@app.route('/')
def hello_world():
    return 'Hello, World!'

Then I ran the app with:

$ export FLASK_APP=hello.py
$ flask run
 * Running on http://127.0.0.1:5000/

Because of its great support for web apps I decided to switch to PyCharm. I imported the project into PyCharm and set the interpreter to the virtualenv, but when I hit run nothing happened…

If you have the same problem: Feel welcome, I’ve got the solution!

Create a New Project, choose the Flask template and select your existing Flask project folder. Say yes to the pop-up which asks you to create a project from existing sources.

PyCharm Project Creation

Your project should open now and you can change your interpreter in the settings to your virtualenv or whatever you desire.

If you try to run the app now you should see something like this:

PyCharm Console with nothing in it

I tried everything but the solution is damn simple… add this at the end of your code:

if __name__ == '__main__':
    app.run()

and your console should output * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit).

If you want to know more about the if __name__ == '__main__': line, I can recommend this video by Corey Schafer.
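
A short sketch of what the guard does, using only the standard library. flask run imports your file and starts the server itself, while running the file directly (presumably what the run button does) only executes its top-level code. The tiny module below is a stand-in with no Flask in it: run_server plays the role of app.run(), and all names are made up for the demo.

```python
import runpy
import tempfile
import textwrap
from pathlib import Path

# A stand-in for hello.py: run_server() plays the role of app.run().
SOURCE = textwrap.dedent('''
    started = False

    def run_server():
        global started
        started = True

    if __name__ == '__main__':
        run_server()
''')


def load(run_name):
    """Execute the stand-in module under the given __name__."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "hello_demo.py"
        path.write_text(SOURCE)
        return runpy.run_path(str(path), run_name=run_name)


imported = load("hello_demo")  # like an import: the guard is skipped
executed = load("__main__")    # like running the file: the guard fires
print(imported["started"], executed["started"])
```

This prints False True: only when the file is run as the main script does the code under the guard execute.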

Thank you for reading!


Welcome! My name is KurzGedanke and today I’m going to show you how to set up your first node and miner for the Decentraland project! But first:

What is Decentraland?

Decentraland is a blockchain-based virtual reality, which means the land you own, you really own. No third party, no server shutdown, no government. If you have never heard of it, I suggest you visit decentraland.org.

Let’s start:

The first thing we need is NodeJS. Visit nodejs.org, click on Other Download, scroll down to Previous Releases and search for Node.js v7.0.0. Go to the download page and look for the node-v7.0.0.pkg (Direct Download Link). Download it and install it.

Now visit the Decentraland GitHub account at github.com/decentraland and navigate to the bronzeage-node repository.

Now press Command + Space, type in Terminal and open up your Terminal.

Copy the GitHub repository URL of bronzeage-node and type in your terminal:

git clone https://github.com/decentraland/bronzeage-node.git

Now we use the cd (Change Directory) command to get into the cloned bronzeage-node folder.

cd bronzeage-node

With the ls (List Directory) command we can get a list of the contents of the folder.

ls

Your terminal should show something like this:

Dockerfile         browser            index.js          start.sh
LICENSE            data               jsdoc.json         test
Makefile           db                 lib                vendor
README.md          docker-compose.yml migrate
bench              download.js        package.json
bin                etc                scripts

The next step is to open up the folder in your Finder. For this we use the open command. Type in your terminal:

open .

The dot is important!

Open up the lib folder, then the blockchain folder, and open the contentdb.js with a text editor of your choice.

You should find at the beginning of the file something like this:

/*!
 * chaindb.js - content data management for decentraland
 * Copyright (c) 2014-2015, Fedor Indutny (MIT License)
 * Copyright (c) 2014-2016, Christopher Jeffrey (MIT License).
 * Copyright (c) 2016-2017, Manuel Araoz (MIT License).
 * Copyright (c) 2016-2017, Esteban Ordano (MIT License).
 * Copyright (c) 2016-2017, Yemel Jardi (MIT License).
 * Copyright (c) 2016-2017, The Decentraland Development Team (MIT License).
 * https://github.com/decentraland/decentraland-node
 */

'use strict';

var fs = require('fs');
var mkdirp = require('mkdirp');
var path = require('path');
var WebTorrent = require('webtorrent-hybrid');
var EventEmitter = require('events').EventEmitter;
var pass = require('stream').PassThrough;
var fs = require('fs');
var createTorrent = require('create-torrent');
var parseTorrent = require('parse-torrent');

var constants = require('../protocol/constants');
var util = require('../utils/util');
var co = require('../utils/co');

In the line

var WebTorrent = require('webtorrent-hybrid');

remove the -hybrid, like this:

var WebTorrent = require('webtorrent');

Save and close the file.

Now we need to set an ApiKey. This step is very important, otherwise you will mine for nothing.

Use your Finder to navigate to the bin folder and open up the start file with a text editor of your choice. You might have to do a right-click and say Open with... and then Other.... There you can choose for example TextEdit.

You should see something like this:

#!/bin/bash

./bin/decentraland-node --fast --port=2301 --prefix="data" --httpport=8301 --n=testnet --apikey=$RPC_API_KEY --contentport=9301 --startminer

There you have to replace the

--apikey=$RPC_API_KEY

with your own ApiKey like this:

--apikey="yourSuperSaveApiKey"

The "" are very important!
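
If you prefer not to edit the file by hand, the replacement can be sketched in a few lines of Python. The function name set_api_key is made up, and the script text is copied from the start file shown above. The sketch works on a string and does not touch the real file.

```python
def set_api_key(script_text, api_key):
    """Replace the $RPC_API_KEY placeholder with a quoted key."""
    return script_text.replace("--apikey=$RPC_API_KEY", f'--apikey="{api_key}"')


# Content of bin/start as shown above.
start_script = (
    "#!/bin/bash\n"
    "\n"
    './bin/decentraland-node --fast --port=2301 --prefix="data" --httpport=8301 '
    "--n=testnet --apikey=$RPC_API_KEY --contentport=9301 --startminer\n"
)

patched = set_api_key(start_script, "yourSuperSaveApiKey")
print(patched)
```

To apply it for real you would read bin/start, pass its text through the function and write the result back.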

Now we have to install all the dependency packages, but this is as easy as one command. Go back to your terminal and type in:

npm install  

And wait till it is finished and looks something like this:

│   │   ├── sax@1.2.4
│   │   └── xmlbuilder@4.2.1
│   ├── clivas@0.2.0
│   ├─┬ dlnacasts@0.1.0
│   │ ├─┬ simple-get@2.6.0
│   │ │ └── unzip-response@2.0.1
│   │ ├── thunky@0.1.0
│   │ └─┬ upnp-mediarenderer-client@1.2.4
│   │   ├─┬ elementtree@0.1.7
│   │   │ └── sax@1.1.4
│   │   └── upnp-device-client@1.0.2
│   ├─┬ ecstatic@2.2.1
│   │ ├── he@1.1.1
│   │ └── url-join@2.0.2
│   ├─┬ executable@4.1.0
│   │ └── pify@2.3.0
│   ├── moment@2.18.1
│   ├── network-address@1.1.2
│   ├── nodebmc@0.0.7
│   ├── prettier-bytes@1.0.4
│   ├── vlc-command@1.1.1
│   └── winreg@1.2.4
└── whatwg-fetch@2.0.3

➜  bronzeage-node git:(master)

The last command you have to run is:

./bin/start

And your node and miner are up and running!

If you want to end your node and miner just hit ctrl + c on your keyboard while your Terminal is selected.

If you want to restart your miner, just open up a Terminal again, type cd bronzeage-node and use the ./bin/start command.

How much have I mined?

Let’s find out!

Start up your node and miner. Open another terminal beside it and cd into the bronzeage-node folder.

Use the

./bin/cli --apikey=$RPC_API_KEY rpc dumpblockchain true | node scripts/list.js

command, but replace the $RPC_API_KEY after --apikey= with your own key, this time without the "":

./bin/cli --apikey=yourSuperSaveApiKey rpc dumpblockchain true | node scripts/list.js

If this command throws a lot of errors, try this one:

First command: (Wait till it’s finished)

./bin/cli --apikey=yourSuperSaveApiKey rpc dumpblockchain true > tiles.json

Second command:

cat tiles.json | node scripts/list.js

Transfer Tiles

If you want to transfer a tile, you first need to get your account address, or the address of the receiver.

This can be achieved through:

./bin/cli --apikey=$RPC_API_KEY rpc getaccountaddress 0

Again, replace the --apikey= with your own ApiKey.

To send tiles use the transfertile command:

./bin/cli --apikey=$RPC_API_KEY rpc transfertile 0 -1 TeaZxyQATonFFFLCXZMydUfGGUWwBsg9Je

Replace the --apikey= with your own ApiKey and enter the coordinates of the tile you want to transfer. Here, for example, it is 0 -1. Then paste the address of the receiving person after the coordinates.