Channel: OpenEnergyMonitor aggregator

JeeLabs: Dive into Forth

In 1965, computing history was made when DEC introduced a new computer, called the PDP-8 - oh, wait, that was last week’s post.

But it applies here too: 1968 was the year when Charles Moore invented Forth, a stack-based programming language.

This week is about exploring the Mecrisp implementation of Forth, which runs on a range of ARM Cortex microcontrollers:

As a language it’s quite fascinating, but as a programming context, Forth is actually in a league of its own - an amazing fit for µCs!

There’s a lot to write about, let’s get started!

As you will see, Forth is a rabbit hole which comes with its own universe. Go down into it, and it’ll change your perspective of how software and hardware can work together!


JeeLabs: DSLs and expressiveness

The KEY design choice in Forth is its dual data- and return-stack. Forth is a stack-oriented programming language. Another characterisation is that Forth is a concatenative language. Here’s what this means:

Data is manipulated on a stack - this example pushes “1”, then “2” on the stack, then applies the “+” operation on those two values, replacing them with their sum, and ends by running the “.” operation which pops-and-prints the top of the stack, i.e. “3”:

1 2 + .

Suppose we have this sequence, using the “/” division operator:

10 2 / . 20 2 / . 30 2 / .

The output will be “5 10 15”. What about this, then?

10 2 / 20 2 / 30 2 / . . .

A little mental exercise will tell you that it will print out “15 10 5”. Now try this:

10 2 20 2 30 2 / / / . . .

In this case, the output will be: “0 2 10” (whoops, that was a division by zero!). One more:

10 2 20 2 30 2 / . / . / .

Output: “15 10 5”. Hey, it’s the same as two examples back! What happened?
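One way to see what happened is to trace the data stack step by step. The sketch below annotates that last example. Each comment shows the stack after the step, with the top of the stack on the right; the one-word-per-line layout is only for illustration:

```forth
\ trace of: 10 2 20 2 30 2 / . / . / .
10 2 20 2 30 2   \ stack: 10 2 20 2 30 2
/                \ stack: 10 2 20 2 15
.                \ stack: 10 2 20 2      (prints 15)
/                \ stack: 10 2 10
.                \ stack: 10 2           (prints 10)
/                \ stack: 5
.                \ stack: (empty)        (prints 5)
```

The numbers are simply parked on the stack until an operator picks them up, so it is the order of the “/” and “.” operators that decides what gets printed when.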

It may seem like silly mental gymnastics, but there’s something very deep going on here, with far-reaching implications. The first thing to note is that operations (they tend to be verbs) take input (if any) from the top of the data stack and leave output (if any) on the stack in return. Not just numbers: strings, pointers, objects, anything.

That’s where the concatenative aspect comes in: operators can be combined into new ones without having to know what they do to the stack! - we can define a “2/” operator for example:

: 2/ 2 / ;

Looks quirky, eh? What this means is: when you see a “2/” operator (yes: “2/” is a valid name in Forth, see below), execute “2” followed by “/”. Now that first sequence above can be written as:

10 2/ . 20 2/ . 30 2/ .

Not very useful, but now we could make it more wordy - for example:

: half 2 / ;
10 half . 20 half . 30 half .

We could even define things like “: 3div / / / ;”, or “: triple-dot 3 0 do . loop ;” !

Here’s another example (let’s assume that “delay” expects milliseconds as input):

: seconds 1000 * ;
12 seconds delay

You can see how carefully chosen (natural?) names can lead to fairly readable phrases.

It’s still not a very sophisticated example, but the point is that you can look at any Forth source code and simply scan for repetitions without having a clue about the inner details. Any sequence occurring more than once is a hint that there’s something more general going on. Turning that into a definition with a more meaningful name is a very Forth’ish thing to do. And guess what? As you implement more code in Forth, the process becomes second nature as you write!

And that’s really what Forth is about: creating a Domain-Specific Language (DSL) while writing code. As you formulate your logic in Forth words, you invent steps and come up with names for them. Don’t look back too much, just keep on writing. At some point, any (near-) repetition will show through. Then you can redefine things slightly to make the same phrases become more widely usable. And before you know it, you’re in a universe of your own. Writing in terms uniquely adapted to the task at hand, combining very small existing pieces into larger ones.

The big surprise is that this effort coincides with getting to grips with the logic of a problem.

It’s not uncommon to write dozens, or even hundreds, of new word definitions as you progress. One thing to keep in mind is that Forth is essentially bottom-up: a definition can only use words which have been defined before it (there are ways around this, and writing recursive calls is in fact easy).
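As a small illustration of recursion, here is a sketch of a word which calls itself. It assumes the standard “recurse” word is available, which compiles a call to the word currently being defined; “countdown” is a made-up name:

```forth
: countdown ( n -- )  \ print n, n-1, ... down to 0
  dup .               \ print the current value, keeping a copy
  dup 0 > if          \ more to go?
    1- recurse        \ yes: call countdown again with n-1
  else
    drop              \ no: discard the leftover 0
  then ;

3 countdown   \ prints 3 2 1 0
```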

There’s another intriguing property of this stack-based approach: there are no local variables! This might seem like a disadvantage, but it also means that you don’t have to invent little names all the time - the code becomes breath-takingly short because of this. All emphasis should go to defining lots of words with well-chosen action names, each operating on only a few stack entries.

As you can see, Forth code can look a bit unusual to the untrained - C/C++ influenced - eye:

These define the words used to perform low-level I/O pin operations. The line “: io ...” defines “io” as a new function, and text enclosed in parentheses is treated as a comment.

In Forth, there is only one syntax rule: a “word” is anything up to the next whitespace character. Some words can take over and read stuff after them, in which case the effects can be different. That’s why the “\” word can act as a comment: it eats up everything after it until the end-of-line.

These definitions are not so much an implementation (although, that too) as they are about defining a vocabulary and a notation which fits the task at hand - in this case defining I/O pins and getting/setting/flipping their value - “@” is a common idiom for fetches and “!” for stores.

And what to make of this implementation of bit-banged I2C, using the above?

Yes, this will look daunting at first, but keep in mind the concatenative nature of it all. You can essentially ignore most of these lower-level definitions. The example at the bottom illustrates this - all you need to look at, are these two lines, consisting mostly of comment text:

: rtc! ( v reg -- ) \ write v to RTC register
  ... ;
: rtc@ ( reg -- v ) \ read RTC register
  ... ;

The “( reg -- v )” comment documents the “stack effect”, i.e. what this function expects on the stack (reg) and what it puts back in return (v). Note that in a way, local variable names have crept back in as comments, but only for documentation purposes (and often as a data type, not a name). The code itself still runs off the stack, and is extremely concise because of it.

This is how to read out the seconds value from register 0 of the RTC over I2C and print it out:

0 rtc@ .

Everything else can be ignored - as long as the underlying code works properly, that is!

Is Forth low-level? Is it high-level? It definitely seems able to bridge everything from super-low bare silicon hardware access to the conceptually abstract application-level logic. You decide…

JeeLabs: I/O, ADCs, OLEDs, and RFM69s

Software development in Forth is about “growing” the tools you need as you go along. Over time, you will end up with a set of “words” that are tailored for a specific task. Some words will end up being more reusable than others - there’s no need to aim for generality: it’ll happen all by itself!

Digital I/O

Let’s start with some examples for controlling GPIO pins on an STM32F103:

  • Define Port B pin 5 as an I/O pin:

    1 5 io constant PB5

    Actually, it’s easy to pre-define all of PA0..15, PB0..15, etc - see this code.

  • Set up PB5 as an open-drain output pin:

    OMODE-OD PB5 io-mode!
  • Here’s how to set, clear, and toggle that output:

    PB5 io-1!   PB5 io-0!   PB5 iox!
  • To read out the current value of PB5 and print the result (0 or 1), we can do:

    PB5 io@ .

There are some naming conventions which are very common in Forth, such as “@” for accessing a value and “!” for setting a value. There are many words with those characters in them.
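Put together, these words are enough for a first blink test. The following is only a sketch: it assumes PB5 has been defined as above and drives an LED, that “OMODE-PP” (push-pull output) is among the available mode constants, and it uses a crude busy loop for timing. The names “pause” and “blink” are made up:

```forth
: pause ( -- )  \ crude busy-wait, duration depends on the clock speed
  1000000 0 do loop ;

: blink ( n -- )  \ toggle PB5 n times
  OMODE-PP PB5 io-mode!        \ configure the pin as push-pull output
  0 ?do  PB5 iox!  pause  loop ;

10 blink
```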

Here are all the public definitions from the io-stm32f103.fs source file on GitHub:

: io ( port# pin# -- pin )  \ combine port and pin into single int

: io-mode! ( mode pin -- )  \ set the CNF and MODE bits for a pin

: io@ ( pin -- u )  \ get pin value (0 or 1)
: io! ( f pin -- )  \ set pin value
: io-0! ( pin -- )  \ clear pin to low
: io-1! ( pin -- )  \ set pin to high
: iox! ( pin -- )  \ toggle pin

: io# ( pin -- u )  \ convert pin to bit position
: io-base ( pin -- addr )  \ convert pin to GPIO base address
: io-mask ( pin -- u )  \ convert pin to bit mask
: io-port ( pin -- u )  \ convert pin to port number (A=0, B=1, etc)

: io. ( pin -- )  \ display readable GPIO registers associated with a pin

Only the header of each word is shown, as produced with “grep '^: ' io-stm32f103.fs”.

Note that this API is just one of many we could have picked. The names were chosen for their mnemonic value and conciseness, so that small tasks can be written with only a few keystrokes.

Analog I/O

Here’s another “library”, to read out analog pins on the STM32F103 - see adc-stm32f103.fs:

: init-adc ( -- )  \ initialise ADC
: adc ( pin -- u )  \ read ADC value

Ah, now we’re cookin’ - only two simple words to remember in this case. Here’s an example:

init-adc   PB0 adc .

Not all pins support analog, but that’s a property of the underlying µC, not the code.

I2C and SPI

The implementation of a bit-banged I2C driver has already been presented in a previous article. Unlike the examples so far, the I2C code is platform-independent because it is built on top of the “io” vocabulary defined earlier. Yippie - we’re starting to move up in abstraction level a bit!

Here’s the API for a bit-banged SPI implementation:

: +spi ( -- ) ssel @ io-0! ;  \ select SPI
: -spi ( -- ) ssel @ io-1! ;  \ deselect SPI

: spi-init ( -- )  \ set up bit-banged SPI

: >spi> ( c -- c )  \ bit-banged SPI, 8 bits
: >spi ( c -- ) >spi> drop ;  \ write byte to SPI
: spi> ( -- c ) 0 >spi> ;  \ read byte from SPI

Some words are so simple that their code and comments will fit on a single line. That code can be very helpful to understand a word and should be included, as shown in these definitions.
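To give an idea of how these words combine, here is a sketch of a register read as many SPI peripherals expect it: select the chip, send the register number, clock in one byte, deselect. The word “reg@” is hypothetical, and the exact protocol depends on the attached chip:

```forth
: reg@ ( reg -- b )  \ read one byte from a device register over SPI
  +spi               \ pull the select pin low
  >spi               \ send the register number
  spi>               \ clock out a zero byte, keep the byte read back
  -spi ;             \ release the select pin
```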

Generality

You may be wondering which I/O pins are used for SPI and I2C. This is handled via naming: the above source code expects certain words to have been defined before it is loaded. For example:

PA4 variable ssel  \ can be changed at run time
PA5 constant SCLK
PA6 constant MISO
PA7 constant MOSI

The pattern emerging from all this, is that word definitions are grouped into logical units as source files, and that they each depend on other words to do their thing (and to load without errors, in fact). So the I2C code expects definitions for “SCL” + “SDA” and uses the “io” words.

It’s “turtles all the way down!”, as they say…

In Forth, you can define as many words as you like, and since a word can contain any characters (even UTF-8), there are a lot of opportunities to find nice mnemonics. When an existing word is re-defined, the new version will be used in every following reference to it. Re-definition will not affect the code already entered and saved in the Forth dictionary. Everything uses a stack, even word lookup.

If you need two bit-banged I2C interfaces, for example, you can redefine the SCL & SDA words and then include the I2C library a second time. This will generate some warnings, but it’ll work.
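The re-definition rule is easy to try out at the prompt. In this sketch, “greet” and “twice” are made-up words:

```forth
: greet ( -- ) ." hello " ;
: twice ( -- ) greet greet ;   \ compiled against the first greet

: greet ( -- ) ." goodbye " ;  \ re-definition, expect a warning

twice   \ still prints: hello hello
greet   \ now prints: goodbye
```

The word “twice” keeps calling the old “greet”, because the reference was resolved when “twice” was compiled.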

RFM69 driver

With the above words in our toolbelt, we’re finally able to build up something somewhat more substantial, i.e. a driver for the RFM69 wireless radio module, which is connected over SPI:

: rf-init ( group freq -- )  \ init the RFM69 radio module
: rf-freq ( u -- )  \ change the frequency, supports any input precision
: rf-group ( u -- ) RF:SYN2 rf! ;  \ change the net group (1..250)
: rf-power ( n -- )  \ change TX power level (0..31)

: rf-recv ( -- b )  \ check whether a packet has been received, return #bytes
: rf-send ( addr count hdr -- )  \ send out one packet

With some utility code and examples thrown in to try it out:

: rf. ( -- )  \ print out all the RF69 registers
: rfdemo ( -- )  \ display incoming packets in RF12demo format
: rfdemox ( -- )  \ display incoming packets in RF12demo HEX format

This code is platform independent, i.e. once “io” and “spi” have been loaded, all the information is present to load this driver. The driver itself is ≈ 150 lines of Forth and compiles to < 3 KB.

… and more

If you want to see more, check out this driver for a 128x64 pixel OLED via I2C, plus a graphics library with lines, circles, texts which can drive that OLED. Or have a look at the usart2 code for access to the second h/w serial port. There’s even a cooperative multi-tasker written in Forth.

Everything mentioned will fit in 32 KB of flash and 2 KB RAM - including Mecrisp Forth itself.

But to make it practical we’ll need some more conventions. Where to put files, how to organise and combine them, etc. Take a look at this area for some ideas on how to set up a workflow.

JeeLabs: Starting Forth on an STM32F1

Here is what we’re after, as Forth Development Environment (would that be an “FDE”?):

There are a number of steps needed to use Mecrisp-Stellaris Forth in your own projects (for these articles, we’ll be focusing on the STM32F103 µC series with 64..512 KB flash memory):

  1. getting the Mecrisp core “flashed” onto the microcontroller
  2. setting up a convenient connection between your laptop and the µC board
  3. streamlining the iterative coding cycle as much as possible

Note that this is in some way quite similar to hooking up an Arduino or JeeNode, and developing software for it through the Arduino IDE. But there are also some substantial differences:

  • pick your own editor, “whatever works for you” is by far the best choice
  • no compilers, no debuggers, no toolchain - just a simple way to talk to the µC
  • no binary code, no runtime libraries, just you, your code, and your terminal

The Arduino approach puts all complexity in the “host” laptop setup. The Mecrisp approach builds words in the µC, on the fly, when they’re typed in (or uploaded, i.e. “simulated typing”).

Installing Mecrisp

Step 1) is not Mecrisp-specific. It’s the same stumbling block with every µC setup which needs specific firmware. You need to download the latest Mecrisp-Stellaris release from SourceForge, and “get it onto that darn chip… somehow”!

Here are some ways to do this, depending on what interface tools you have and your O/S:

The firmware in the Mecrisp distribution is available in two versions, a “.bin” and a “.hex” file:

stm32f103/mecrisp-stellaris-stm32f103.bin
stm32f103/mecrisp-stellaris-stm32f103.hex

It depends on the upload mechanism as to which one you need. With a Black Magic Probe (BMP) and arm-none-eabi-gdb, for example, the following commands should do the trick:

% arm-none-eabi-gdb
[...]
(gdb) tar ext /dev/cu.usbmodemD5D1AAB1    (adjust as needed, of course)
(gdb) mon swdp
(gdb) at 1
(gdb) mon erase                                (essential for Mecrisp!)
(gdb) load mecrisp-stellaris-stm32f103.hex
(gdb) q

Then, again if you are using a BMP and running on Mac OSX or Linux:

% screen /dev/cu.usbmodemD5D1AAB3 115200
Mecrisp-Stellaris 2.2.1a for STM32F103 by Matthias Koch
  ok.
(quit with "ctrl-a ctrl-\" or "ctrl-a \" - depending on your setup)

The serial connection must be set up as 115200 Baud for Mecrisp - 8 bits, no parity, 1 stop bit.

If you’re using an ST-Link to upload the firmware, these two commands will do the trick:

st-flash erase                                # essential for Mecrisp!
st-flash write mecrisp-stellaris-stm32f103.bin 0x08000000

It’s very simple and quick, but only · a f t e r · you’ve got all those Pesky Little Details just right. Getting firmware onto a bare STM32F103 µC can still be a hit-and-miss affair. There are simply too many variables involved to come up with a procedure here which will work for everyone.

The good news is that with a little care, you will not have to repeat this step again. Mecrisp is quite good at keeping itself intact (it refuses to re-flash itself, for example).

Installing PicoCom

One of the things you’ll notice if you try out the above setup with screen, is that it doesn’t quite get the line endings right (which are bare LFs in Mecrisp, not CR+LF). It’s better to install a slightly more elaborate terminal emulator - and PicoCom is in fact a very good option for Mac OSX and Linux, as will become clear below. For Windows, there is TeraTerm.

To install PicoCom on Mac OSX with Homebrew, enter this in a command shell:

brew install picocom

To install PicoCom on Debian/Raspbian/Ubuntu Linux, type:

sudo apt-get install picocom

The benefit of PicoCom is that it allows specifying a program to use for uploads. We don’t just want to enter text manually, we also need to send entire source files to Mecrisp Forth over serial. The problem is that a bare Mecrisp installation only supports polled serial I/O without handshake. This can only handle text if it’s not coming in “too fast”. In Mecrisp, each word on a line needs to be looked up and compiled, and it all happens on a line-by-line basis. This means that you have to wait for its “ok.” prompt after each line, before sending more text.

One solution is to send all text · v e r y · s l o w l y · but that’ll make it extremely time-consuming.

Installing msend

A better solution is to send full speed and wait for that final prompt before sending the next line, to avoid input characters getting lost. This little utility has been created to do just that: msend.

If you have Go installed, getting msend (Mac OSX and Linux only, for now) is again a one-liner:

go get github.com/jeelabs/embello/tools/msend

Otherwise, you can get the latest binary release for a few platforms from GitHub.

With “msend” installed, PicoCom can now be started up as follows:

picocom -b 115200 --imap lfcrlf -s msend /dev/cu.usbmodemD5D1AAB3

Or even as “mcom /dev/cu.usbmodemD5D1AAB3” - if you add an alias to your .bashrc init file:

alias mcom='picocom -b 115200 --imap lfcrlf -s msend'

And now, not only do line endings work properly, you also get a very effective upload facility. This will be worth its own article, but you can see a transcript of an upload with includes over here.

Sending a file with PicoCom is triggered by typing “ctrl-a ctrl-s”.

To quit PicoCom, type “ctrl-a ctrl-x” - see also the manual page for further details.

Windows

Neither PicoCom nor msend is available for Windows, but there’s another solution:

  • install TeraTerm, which is a terminal emulator for Windows
  • look at this script file for TeraTerm, by Jean Jonethal

This combination should accomplish more or less the same as picocom + msend, i.e. terminal access, throttling text sends, and inserting “include” files.

Optimising workflow

Forth software development is about flow and insanely fast turnaround times between coming up with an idea and trying it out. There are no compilers or other tools to slow you down, and as a result you can type and try out an idea the moment it pops into your head. Total interactivity!

At the same time, the last thing we want, is to constantly re-enter code, let alone lose it for good if the µC crashes. The challenge is to find a proper balance between volatile commands (typed in directly at the Mecrisp prompt, on the µC) and re-sending lots of text from a laptop all the time.

Mecrisp has an elegant and simple approach to help with this:

  • when you power it up, Mecrisp remembers only what it had stored in flash memory
  • all new definitions (i.e. “: myword ... ;”) are added and compiled into RAM
  • stack underflows (a common mistake) clear the stack but won’t lose RAM
  • a reset (whether in hardware or using the “reset” word) will lose everything in RAM
  • you can save your next definitions to flash memory by typing “compiletoflash”
  • this will continue until you press reset or enter “compiletoram”
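
A session which switches between the two targets might look like this. It is only a sketch: “led-on”, “led-off”, and “flash-test” are made-up words, and PB5 is assumed to be already defined in flash and wired to an active-low LED:

```forth
compiletoflash               \ following definitions go to flash
: led-on  ( -- ) PB5 io-0! ;
: led-off ( -- ) PB5 io-1! ;
compiletoram                 \ back to RAM for experiments

: flash-test ( -- ) led-on led-off ;  \ in RAM, lost at the next reset
```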

The thing is that in Mecrisp Forth, a hard crash is no big deal - you should expect to run into stuck code, awful crashes, weird things happening, non-responsive terminal I/O, etc. There’s a reset button on the µC which will get you back to a working state the (sub-) second you use it.

It could be a typo. There could be a hint in what’s on the screen. But even if not, if you make your coding cycles short and frequent, then chances are that you’ll quickly discover what went wrong.

Otherwise… the interactive Forth prompt is your friend: examine the values of variables, or the settings in hardware registers, and invent whatever words you need to help figure out this issue. Words can be one-liners, written only for use in the next few minutes of your investigation!

The more loosely coupled your words are, i.e. called in sequence, not nested too deeply, the easier it will be to set up the stack and call any one of them, in isolation, from the prompt. If something fails, you can take over and repeat the rest of the words by hand, verifying that the stack is as expected (check out the “.s” word!), and peeking around to see what’s going on.

Looking at the diagram above, you’ll see that there are two kinds of permanence in this context: source code in files, and words defined in flash memory. The latter cannot easily be turned back into source, alas. That means they should be either one-offs or created by an earlier upload.

Although the best workflow has yet to be found, some comments on what is likely to work well:

  • new code, especially when it’s about getting the hardware interface right, needs to run on the µC and can be quickly explored ad-hoc - at the Forth prompt, no definitions needed
  • you can read / write to registers with “io@” / “io!” commands in a “peek and poke” style
  • lengthy setup code can be written in your editor, and then uploaded and saved to flash
  • hardware addresses are a lot easier to use as pre-defined Forth words (i.e. “constant”)
  • if you make uploaded code store itself in flash, you won’t have to re-upload it after a reset
  • the “cornerstone” word can partially unwind definitions from flash - great for uploads
  • make sure your terminal window keeps a lot of history - it’s a very effective historical log

Maybe the rlwrap tool can be made to work with PicoCom - for command history and editing.

There is a lot more to say about this. The “msend” utility recognizes lines starting with the word “include” as special requests to fetch a file from the system and send its contents (this can be nested). This allows keeping various word sets in their own files, and then selectively include them in each project. You can add “compiletoflash” to save the more stable words in flash.
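As a sketch of what such a project file could look like: the file name “main.fs” is hypothetical, and the two library files mentioned earlier are assumed to sit in the same directory:

```forth
\ main.fs - hypothetical project file, uploaded with msend
compiletoflash             \ keep the stable library words in flash
include io-stm32f103.fs
include adc-stm32f103.fs
compiletoram               \ later definitions go to RAM again
```

Each “include” has to start its own line, since that is what msend looks for.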

For more ideas on how to organise the code, see the README in the Embello area on GitHub.

There is no need for large nested source file trees. Forth source code tends to be very compact - a single page of code is usually more than enough to implement a complete well-defined module. One directory with a few dozen files is plenty. Put them under source code control in GitHub or elsewhere, and you’ll have your entire project under control for the long-term. Each project can contain all the files it needs to be re-created (i.e. re-uploaded to a µC running Mecrisp Forth).

Enough for now, this’ll get you started. Now go Forth, and create lots of embedded µC projects!

mharizanov: FOTA update without external flash for Atmega32U4 + RFM69

I’ve blogged before about why firmware-over-the-air (FOTA) updates are a must-have nowadays. However, my old AVR+RFM based projects were incapable of that, due to the lack of external flash to temporarily store the new firmware before it is flashed on chip. The Moteino project does have FOTA capabilities and Felix has created an excellent library for this, though it relies on an external SPI flash chip. David Berkeley has done similar work with an external I2C chip.

I was contacted a couple of days ago by Vitezslav Vlcek, who managed to do FOTA for the Atmega32U4 + RFM69 based Funky v3 without the use of an external flash module. The catch is that you can use only half of the 28K of internal flash available for custom applications. Quite a limitation, but an impressive achievement nonetheless. He modified the Caterina bootloader to expose a function that wraps the SPM instruction, and calls that function from the application code in order to re-flash.

Below is the available memory layout:

Start address  Stop address  Size           Description
0x0000         0x37FF        14KB = 14336B  Application code (app area)
0x3800         0x3801        2B             Length of data in temp area (length > 14334 means the temp area contains invalid data)
0x3802         0x6FFF        14334B         Temp area
0x7000         0x7FFF        4KB = 4096B    Bootloader area (bls)

Read all about it on Vitek’s GitHub page. Thanks to Vitek for documenting this in such detail!

 

 

JeeLabs: Dive into Forth, part 2

JeeLabs: Buffered serial port interrupts

Mecrisp only implements the minimal serial interface required, i.e. USART1 with polled I/O. This is very limited, because the serial port has no buffering capability: if we don’t poll it often enough (over 10,000x per second for 115200 baud!), we risk losing incoming input data.

The standard solution for this is interrupts: by enabling the RX interrupt, we can get the data out in time for the next one to be processed. Although this merely moves the problem around, we can then add a larger buffer in software to store that input data until it’s actually needed.

Let’s implement this - it’s a nice example of how to make hardware and software work together:

  • to avoid messing up the only communication we have to Forth, i.e. USART1, we’ll be much better off developing this first for USART2 - as changing the values to adapt it to USART1 will be trivial once everything works
  • we’re going to need some sort of buffer, implemented here as a generic “ring buffer”
  • we need to set up the USART2 hardware, the easiest way is to start off in polled mode
  • lastly, we’re going to add an interrupt-handling structure which ties everything together

Circular buffering

What we want for the incoming data is a FIFO queue, i.e. the incoming bytes are pushed in at one end of the buffer, and then pulled out in arrival order from the other end.

A ring buffer is really easy to implement - this Forth implementation is a mere 16 lines of code. Its public API is as follows - for initialisation, pushing a byte in, and pulling a byte out:

: init-ring ( addr size -- )  \ initialise a ring buffer
: >ring ( b ring -- )  \ save byte to end of ring buffer
: ring> ( ring -- b )  \ fetch byte from start of ring buffer

We also need to deal with “emptiness” and avoiding overrun:

: ring# ( ring -- u )  \ return current number of bytes in the ring buffer
: ring? ( ring -- f )  \ true if the ring can accept more data

Ring buffers are simplest when the size of the ring is a power of two (because modulo 2^N arithmetic can then be done using a bit mask). Setup requires a buffer with 4 extra bytes:

128 4 + buffer: myring
myring 128 init-ring

With this out of the way, we now have everything needed to buffer up to 127 bytes of input data.
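The bit-mask trick for power-of-two sizes is easy to check at the prompt. This sketch is not taken from the ring buffer source; “wrap” is a made-up word for a 128-entry ring:

```forth
: wrap ( index -- index' )  \ reduce an index modulo 128
  127 and ;                 \ 128 is 2^7, so the mask is 128 - 1

127 wrap .   \ prints 127
128 wrap .   \ prints 0, i.e. wraps around to the start
130 wrap .   \ prints 2
```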

USART hardware driver

Setting up a hardware driver is by definition going to be hardware-specific. Here is a complete implementation for the STM32F103 µC series:

$40004400 constant USART2
   USART2 $00 + constant USART2-SR
   USART2 $04 + constant USART2-DR
   USART2 $08 + constant USART2-BRR
   USART2 $0C + constant USART2-CR1

: uart-init ( -- )
  OMODE-AF-PP OMODE-FAST + PA2 io-mode!
  OMODE-AF-PP PA3 io-mode!
  17 bit RCC-APB1ENR bis!  \ set USART2EN
  $138 USART2-BRR ! \ set baud rate divider for 115200 Baud at PCLK1=36MHz
  %0010000000001100 USART2-CR1 ! ;

: uart-key? ( -- f ) 1 5 lshift USART2-SR bit@ ;
: uart-key ( -- c ) begin uart-key? until  USART2-DR @ ;
: uart-emit? ( -- f ) 1 7 lshift USART2-SR bit@ ;
: uart-emit ( c -- ) begin uart-emit? until  USART2-DR ! ;

Some constant definitions to access real hardware inside the STM32F103 chip, as gleaned from the datasheet, some tricky initialisation code, and then the four standard routines in Forth to check and actually read or write bytes.
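The baud rate divider in “uart-init” can be double-checked with integer arithmetic at the prompt, taking the 36 MHz peripheral clock from the comment in the code as given:

```forth
decimal
36000000 115200 / .   \ prints 312 (the exact ratio is 312.5)
$138 .                \ prints 312, the value written to USART2-BRR
```

Because of the truncation, the actual rate works out to 36000000 / 312, i.e. about 115385 baud, which is less than 0.2% off the nominal value.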

It’s fairly tricky to get this going, but a test setup is extremely simple: just connect PA2 and PA3 to create a “loopback” test, i.e. all data sent out will be echoed back as new input.

During development, it’s useful if we can quickly inspect the values of all the hardware registers. Here’s a simple way to do that:

: uart. ( -- )
  cr ." SR " USART2-SR @ h.4
  ."  BRR " USART2-BRR @ h.4
  ."  CR1 " USART2-CR1 @ h.4 ;

Now, all we need to do to see the registers is to enter “uart.”:

uart.
SR 00C0 BRR 0138 CR1 200C ok.

That’s after calling uart-init. Right after reset, the output would look like this instead:

SR 0000 BRR 0000 CR1 0000 ok.

To test this new serial port with the loopback wire inserted, we can now enter:

uart-init uart-key? . 33 uart-emit uart-key? . uart-key . uart-key? .

The output will be (note that in Forth, false = 0 and true = -1):

0 -1 33 0  ok.

I.e. no input, send one byte, now there is input, get it & print it, and then again there is no input.

Enabling input interrupts

So far so good, but there is no interrupt handling yet. We now have a second serial port, but unless we poll it constantly, it’ll still “overrun” and lose characters. Let’s fix that next.

Here is the implementation of an extra layer around the above ring and uart code:

128 4 + buffer: uart-ring

: uart-irq-handler ( -- )  \ handle the USART receive interrupt
  USART2-DR @  \ will drop input when there is no room left
  uart-ring dup ring? if >ring else 2drop then ;

$E000E104 constant NVIC-EN1R \ IRQ 32 to 63 Set Enable Register

: uart-irq-init ( -- )  \ initialise the USART2 using a receive ring buffer
  uart-init
  uart-ring 128 init-ring
  ['] uart-irq-handler irq-usart2 !
  6 bit NVIC-EN1R !  \ enable USART2 interrupt 38
  5 bit USART2-CR1 bis!  \ set RXNEIE
;

: uart-irq-key? ( -- f )  \ input check for interrupt-driven ring buffer
  uart-ring ring# 0<> ;
: uart-irq-key ( -- c )  \ input read from interrupt-driven ring buffer
  begin uart-irq-key? until  uart-ring ring> ;

This sets up a 128-byte ring buffer and initialises USART2 as before.

Then, we set up an “interrupt handler” and tie it to the USART2 interrupt (this requires Mecrisp 2.2.2, which is currently still in beta).

The rest is automatic: as if by magic, every new input character will end up being placed in the ring buffer, and so our key? and key code no longer accesses the USART itself - instead, we now treat the ring buffer as the source of our input data.
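Application code can now pick up whatever has accumulated, at its own pace. A sketch, using only the words defined above plus the standard “emit”; the name “uart-drain” is made up:

```forth
: uart-drain ( -- )  \ echo everything buffered so far to the console
  begin
    uart-irq-key?        \ anything waiting in the ring buffer?
  while
    uart-irq-key emit    \ fetch one byte and print it
  repeat ;
```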

Interrupts require great care in terms of timing, because interrupt code can run at any time - including exactly while we’re checking for new input in our application code! In this case, it’s all handled by the ring buffer code, which has been carefully written to avoid any race conditions.

Note that interrupts are only used for incoming data, the outgoing side continues to operate in polled mode. The reason is that we cannot control when new data comes in, whereas slow output will simply throttle our data send code. If we don’t deal with input quickly, we lose it - whereas if we don’t keep the output stream going full speed, it’ll merely come out of the chip a little later.

What’s the point?

You might wonder what we’ve actually gained with these few dozen lines of code.

Without interrupts, at 115200 baud, there’s potentially one byte of data coming in every 86.8 µs. If we don’t read it out of the USART hardware before the next data byte is ready, it will be lost.

With a 128-byte ring buffer, the data will be saved up, and even with a full-speed input stream, we only need to check for data and read it (all!) out within 11 milliseconds. Note that - in terms of throughput - nothing has changed: if we want to be able to process a continuous stream of input, we’re going to have to deal with 11,520 bytes of data every second. But in terms of response time, we can now spend up to 11 ms processing the previous data, without worrying about new input.
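These figures follow from simple arithmetic, assuming the usual 10 bits on the wire per byte (start bit, 8 data bits, stop bit). Integer division truncates, so the results are rounded down:

```forth
decimal
115200 10 / .              \ prints 11520 (bytes per second)
1000000 11520 / .          \ prints 86 (microseconds per byte, 86.8 before truncation)
128 1000000 * 11520 / .    \ prints 11111 (microseconds to fill 128 bytes, about 11 ms)
```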

For a protocol based on text lines for example, with no more than 80..120 characters each, this means our code can now operate in line-by-line mode without data loss.

One use for this is the Mecrisp Forth command line. The built-in polled-only mode is not able to keep up with new input, which is why msend needs to carefully throttle itself to avoid overruns. With interrupts and a ring buffer, this could be adjusted to handle a higher-rate input stream.

JeeLabs: Much faster SPI with hardware

Unlike a USART-based serial port, SPI communication is not timing-critical, at least not on the SPI master side. Since the data clock is also sent as a separate signal, slowdowns only change the communication rate. That’s why SPI is so easy to implement in bit-banged mode, as shown here.

But software implementations are always going to be slower than dedicated hardware. So here’s a hardware version which drives the clock at 9 MHz, 1/8th the CPU’s 72 MHz master clock:

$40013000 constant SPI1
     SPI1 $0 + constant SPI1-CR1
     SPI1 $4 + constant SPI1-CR2
     SPI1 $8 + constant SPI1-SR
     SPI1 $C + constant SPI1-DR

: +spi ( -- ) ssel @ io-0! ;  \ select SPI
: -spi ( -- ) ssel @ io-1! ;  \ deselect SPI

: >spi> ( c -- c )  \ hardware SPI, 8 bits
  SPI1-DR !  begin SPI1-SR @ 1 and until  SPI1-DR @ ;

\ single byte transfers
: spi> ( -- c ) 0 >spi> ;  \ read byte from SPI
: >spi ( c -- ) >spi> drop ;  \ write byte to SPI

: spi-init ( -- )  \ set up hardware SPI
  12 bit RCC-APB2ENR bis!  \ set SPI1EN
  %0000000001010100 SPI1-CR1 !  \ clk/8, i.e. 9 MHz, master
  2 bit SPI1-CR2 bis!  \ SS output enable
  OMODE-PP ssel @ io-mode! -spi
  OMODE-AF-PP PA5 io-mode!
  IMODE-FLOAT PA6 io-mode!
  OMODE-AF-PP PA7 io-mode! ;

Note the special hardware pin settings using the STM32’s “alternate function” mode.

The select I/O pin is configured in the ssel variable. Everything else is similar to the USART2 hardware: initialisation using lots of magic bit settings gleaned from the datasheet, and then a single “>spi>” primitive which transfers a single byte out and back in via the SPI registers.

At 9 MHz, this takes under 1 microsecond per byte. These high rates can only be used across short wires, but are nevertheless perfect to interface with a large variety of SPI-based chips.

Here’s a convenient utility to inspect the SPI hardware registers with a simple “spi.” word:

: spi. ( -- )  \ display SPI hardware registers
  cr ." CR1 " SPI1-CR1 @ h.4
    ."  CR2 " SPI1-CR2 @ h.4
     ."  SR " SPI1-SR @ h.4 ;

This driver is plug-compatible with the bit-banged one presented earlier. One or the other can be loaded and used with the RFM69 driver, for example.


JeeLabs: Talking to a 320x240 colour LCD

Now that we have a fast SPI driver, we can tackle the more ambitious task of driving a 320x240 colour LCD display. In this example, we’ll use the HyTiny board with this 3.2” display, because the two can be connected via a simple 12-pin FPC cable (included with the display).

When you do the math, you can see that there’s a lot of data involved: 240 x 320 x 16-bit colour (max 18-bit) requires 153,600 bytes of memory (172,800 bytes in 18-bit mode). And to refresh that entire screen, we’ll have to send all those pixels to the display.

Note that although SPI-connected LCD displays are fine for many purposes, they cannot handle video or moving images - you’ll need to use a faster parallel-mode connection for that (with a much higher wire count). At 10 MHz - the maximum specified rate for the ILI9325 LCD driver - each individual pixel takes 1.6 µs to send, i.e. almost a quarter second for the entire image.
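
Here is the same math as a small Python sketch. It only counts the raw pixel bits at the quoted 10 MHz clock; register addressing and other protocol overhead are not modelled, so the frame time it prints is a lower bound rather than a measured figure.

```python
WIDTH, HEIGHT = 240, 320
pixels = WIDTH * HEIGHT

bytes_16bpp = pixels * 16 // 8  # memory needed for 16-bit colour
bytes_18bpp = pixels * 18 // 8  # memory needed for 18-bit colour

SPI_HZ = 10_000_000  # maximum rate quoted for the ILI9325

pixel_time_us = 16 / SPI_HZ * 1e6           # one 16-bit pixel on the wire
frame_time_ms = pixels * 16 / SPI_HZ * 1e3  # pixel payload of one full frame

print(pixels, bytes_16bpp, bytes_18bpp)
print(f"{pixel_time_us:.1f} us per pixel, {frame_time_ms:.1f} ms per frame (payload only)")
```

The pixel payload by itself comes to roughly 123 ms per frame under these assumptions; whatever the controller needs in terms of commands and addressing comes on top of that.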

Still, with only a few dozen lines of Forth, we can tie Mecrisp’s graphics library to such a display:

Here are some excerpts from this code, which is available in full on GitHub, as usual:

$0000 variable tft-bg
$FC00 variable tft-fg

: tft-init ( -- )
  PB0 ssel !  \ use PB0 to select the TFT display
  spi-init
\ switch to alternate SPI pins, PB3..5 iso PA5..7
  $03000001 AFIO-MAPR !  \ also disable JTAG & SWD to free PB3 PB4 PA15
  IMODE-FLOAT PA5 io-mode!
  IMODE-FLOAT PA6 io-mode!
  IMODE-FLOAT PA7 io-mode!
  OMODE-AF-PP PB3 io-mode!
  IMODE-FLOAT PB4 io-mode!
  OMODE-AF-PP PB5 io-mode!
  OMODE-PP PB2 io-mode!  PB2 io-1!
  %0000000001010110 SPI1-CR1 !  \ clk/8, i.e. 9 MHz, master, CPOL=1 (!)
  tft-config ;

\ clear, putpixel, and display are used by the graphics.fs code

: clear ( -- )  \ clear display memory
  0 $21 tft! 0 $20 tft!
  tft-bg @ 320 240 * 0 do dup $22 tft! loop drop ;
: putpixel ( x y -- )  \ set a pixel in display memory
  $21 tft! $20 tft! tft-fg @ $22 tft! ;
: display ( -- ) ;

We have to tinker a bit more with the hardware I/O settings to switch to a different set of pins, matching the HyTiny’s LCD connector. The way it’s done here is to initialise hardware SPI as before, and then undo those I/O pin configurations and redo a few others instead.

The call to tft-config sends a whole slew of little commands to the ILI9325, which needs quite some configuration before it can actually be used after reset.

A common trick to keep the colour details out of the drawing code is to keep two colour values in variables, used as “background” and “foreground” colour, respectively - with the background used for clearing and filling areas, and the foreground used for lines and individual pixels. By changing these variables before calling a graphics command, you can draw with any colour.

One surprise with this particular ILI9325 chip was that its SPI mode needed CPOL=1. Subtle “gotchas” like this can eat up a lot of debug time!

Unlike the OLED driver presented earlier, we don’t have enough RAM to keep a full image buffer in memory. The clear and putpixel primitives defined above will need to immediately send their data to the display hardware. And because of this, the display code used to update what is shown on the screen is now a dummy call.

It takes almost 2 seconds to clear the entire screen with the implementation shown above. This could be optimized quite a bit further by sending all data as one long stream instead of pixel-by-pixel. But hey, as proof-of-concept, it’s fine!

For even more performance, the SPI hardware could be driven from DMA. While this requires some memory to transfer from, it can be useful to “fill” rectangles to a fixed colour by keeping the input address fixed. Still, the upper limit is 10 MHz serial, limiting frame rates to 4 Hz max.

mharizanov: Auto DST adjustment

We are close to that time of the year when Daylight Saving Time (DST) starts. It is intended to save energy, but studies have found this questionable - in some cases, energy use even increases. The change itself causes a number of inconveniences.

For me, the start and end of DST mean adjusting a number of clocks around the house, office, and cars, which takes about 20-30 minutes twice a year. Many of the devices with clocks that I use run on a schedule, so failing to update the time for DST means things don’t work as planned. For example, the hot water tank heater schedule is set to account for the (cheaper) night electricity tariff and my usual wake-up time. I’ve come to the conclusion that having the user deal with DST manually means bad product design. So how can I improve the situation? I could adjust the firmware on the devices I designed to automatically account for DST.

Making a universal solution is quite hard – it turns out that only 32% of the countries observe DST, and the rules for those that observe it vary widely:

[Image: daylight saving change dates]

Source: http://gcrinstitute.org/time-zones/

Furthermore, these rules seem to change from time to time, so hard-coding may not be a good idea (still, I went for it). Below is an example of the US DST rule changing over time, the last change happening in 2007:

[Image: US DST rules in tz]

I decided to only support EU/North America rules in my projects, hardcoding the rules. Should a change be needed, a FOTA update will be pushed. The code (source) I use goes like this (DoW Sunday = 0 .. Saturday = 6):

    bool IsDST_NA(int day, int month, int dow)
    {
        //January, February, and December are out.
        if (month < 3 || month > 11) { return false; }
        //April to October are in
        if (month > 3 && month < 11) { return true; }
        int previousSunday = day - dow;
        //In March, we are DST if our previous Sunday was on or after the 8th.
        if (month == 3) { return previousSunday >= 8; }
        //In November we must be before the first Sunday to be DST.
        //That means the previous Sunday must be before the 1st.
        return previousSunday <= 0;
    }

..and the EU version

    bool IsDST_EU(int day, int month, int dow)
    {
        if (month < 3 || month > 10)  return false;
        if (month > 3 && month < 10)  return true;

        int previousSunday = day - dow;

        if (month == 3) return previousSunday >= 25;
        if (month == 10) return previousSunday < 25;

        return false; // never reached
    }
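
For desk-checking these rules against a calendar, here is a Python port of both functions. It is only a sketch for testing on a PC, not the firmware code: the function names and the use of `datetime.date` are my own choices. The day of week is derived from the date and mapped to Sunday = 0 .. Saturday = 6, and the November test uses `<= 0` so that a 1 November which falls on a Sunday already counts as standard time. Like the C code, it works with whole days: the switch day itself is treated as already switched.

```python
from datetime import date


def _dow(d):
    """Day of week with Sunday = 0 .. Saturday = 6."""
    return d.isoweekday() % 7


def is_dst_na(d):
    """North America: second Sunday in March to first Sunday in November."""
    if d.month < 3 or d.month > 11:
        return False
    if 3 < d.month < 11:
        return True
    previous_sunday = d.day - _dow(d)
    if d.month == 3:
        return previous_sunday >= 8
    return previous_sunday <= 0  # November: before the first Sunday


def is_dst_eu(d):
    """EU: last Sunday in March to last Sunday in October."""
    if d.month < 3 or d.month > 10:
        return False
    if 3 < d.month < 10:
        return True
    previous_sunday = d.day - _dow(d)
    if d.month == 3:
        return previous_sunday >= 25
    return previous_sunday < 25  # October: before the last Sunday
```

For example, `is_dst_na(date(2016, 3, 13))` is true (the second Sunday of March 2016), while `is_dst_eu(date(2016, 3, 13))` is still false.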

The configuration screen allows choosing EU, NA, or no auto-DST adjustment rules. Problem solved.

JeeLabs: The Dime-A-Dozen collection

One attraction of the STM32F103 series microcontrollers is that there are lots of them available on eBay at ridiculously low prices. There are many variants of this µC, with flash memory sizes from 64K to 512K (and beyond, even), and with anything from 36 pins to 144 pins.

If you search on eBay for “stm32f103 board”, the first one that might pop up is perhaps this one:

Here are a few more, all running Mecrisp Forth 2.2.2:

There is no USB driver support in Mecrisp at the moment, so these have each been wired up with USB-serial interfaces. This will be needed as a first step anyway, to flash Mecrisp onto the boards.

The procedure to upload Forth to such “Dime-A-Dozen” STM32F103 boards is always similar (although there are several alternatives):

  • set the BOOT0 jumper to “1” (i.e. VCC)
  • reset the board to put it into ROM-based serial boot mode
  • get the latest mecrisp-stellaris-stm32f103.bin from SourceForge

And, lastly, run a command such as this to perform the upload:

python stm32loader.py -ewv -b 115200 -a 0x08000000 \
    -p /dev/<your-tty-port> mecrisp-stellaris-stm32f103.bin

(or use one of the alternative tools listed in the above article, such as BMP or ST-Link)

Once loaded, restore the BOOT0 jumper to “0” (i.e. GND) and then press reset. You should now see a prompt such as this show up on the serial port (USART1 is on PA9 and PA10):

Mecrisp-Stellaris 2.2.2 for STM32F103 by Matthias Koch

Press return and you’ll get Mecrisp’s standard “ok.” prompt. You’re in business!

Something to keep in mind is that there is a single STM32F103 firmware image on SourceForge, which has been built for 64 KB flash and 20 KB RAM. Chips with more memory will work just fine, but Mecrisp won’t be aware of it - flash memory beyond 64K won’t be used for compiled code storage, and RAM beyond 20 KB won’t be allocated or used by Mecrisp (which could actually be an advantage if you want to manually allocate some large buffers).

This is just the tip of Mecrisp’s iceberg, though: there are over a dozen different builds for STM32 chips, including STM’s F3, F4, and F7 series. Each build makes assumptions about the serial port it starts up on, and may depend on having a crystal of a specific frequency installed - but these settings are fairly easily changed in the source code (even though it’s in assembler!).

Some other boards which have been verified to work are:

The above boards are particularly convenient since they include a serial port to USB interface (all Nucleo boards also have ST-Link support for uploading).

JeeLabs: Dive into Forth, part 3

The Forth adventure continues… this is part 3 of a series about Mecrisp Forth on ARM STM32F103 µCs - an amazing environment for interactively trying out the hardware in this well-established chip series.

As you’ll see, there’s quite a bit to explore…

This week highlights the capabilities and performance levels achievable with such a (fairly low-end) microcontroller, especially once you start enabling things like hardware interrupts, ADCs, DACs, timers, and DMA.

As always, there will be one article each day, as I prepare all the information I have been able to collect and figure out lately:

Here’s a little teaser for what will be presented in this week’s closing article:

Latest source code is available on GitHub.

If you’ve been following along: the API of this code is still in flux at the moment (i.e. “io-0!” and “io-1!” have been changed to “ioc!” and “ios!”, and other small details).

Next week, I’ll go into the big picture behind all this Forth stuff and JET. Stay tuned!

Update - the last article has been split in two, due to its length.

JeeLabs: LCDs need a lot of bandwidth

So far, we have created two display implementations for Mecrisp Forth: a 128x64 OLED display, connected via (overclocked) I2C, and a 320x240 colour LCD, connected via hardware SPI clocked to 9 MHz. While quite usable, these displays are not terribly snappy:

  • the OLED display driver uses a 1 KB RAM buffer, which it sends in full to the OLED whenever “display” is called - this currently requires about 60 milliseconds

  • the TFT display uses a much faster connection, but it also needs to handle a lot more data: 320x240 pixels at 16 bits per pixel is 150 KB of data - changes are written directly into the display controller, but this means that it now takes over 1.4 seconds to clear the entire screen!

Fortunately, there are much faster options available, even on low-end STM32F103 chips. They are based on STM’s Flexible Static Memory Controller (FSMC), a hardware peripheral which can map various types of external memory into the ARM’s address space. This requires a lot of pins, because such interfaces to external memory will be either 8 or 16 bits wide.

But the results can be quite impressive. To access an LCD controller connected in this way, you can now simply write to specific memory addresses in code.

Let’s try it out, using the Hy-MiniSTM32V board from Haoyu. It has an STM32F103VC µC on board, i.e. 100 pins, 256K flash, 48K RAM. Still not enough to keep a complete display copy in RAM, but as you’ll see, this no longer matters. The implementation is available on GitHub.

The code is just under 100 lines, a bit lengthy for inclusion in this article. Some of the highlights:

: tft-pins ( -- )
  8 bit RCC-AHBENR bis!  \ enable FSMC clock

  OMODE-AF-PP OMODE-FAST +
  dup PE7  io-mode!  dup PE8  io-mode!  dup PE9  io-mode!  dup PE10 io-mode!
  dup PE11 io-mode!  dup PE12 io-mode!  dup PE13 io-mode!  dup PE14 io-mode!
  dup PE15 io-mode!  dup PD0  io-mode!  dup PD1  io-mode!  dup PD4  io-mode!
  dup PD5  io-mode!  dup PD7  io-mode!  dup PD8  io-mode!  dup PD9  io-mode!
  dup PD10 io-mode!  dup PD11 io-mode!  dup PD14 io-mode!  dup PD15 io-mode!
  drop ;

As mentioned, we need to set up a lot of GPIO pins for this, and of course they have to match the actual connections on this particular board.

Next, we need to set up three registers in the FSMC hardware (that last write enables the FSMC):

: tft-fsmc ( -- )
  [...] FSMC-BCR1 !
  [...] FSMC-BTR1 !
  [...] FSMC-BWTR1 !
  1 FSMC-BCR1 bis! ;

For full details, see GitHub and the - 1,100-page - STM32F103 Reference Manual (RM0008).

So much for the FSMC. We also need to initialise this particular “R61505U” LCD controller on our board, which requires sending it just the right magic mix of config settings on startup:

create tft:R61505U
hex
    E5 h, 8000 h,  00 h, 0001 h,  2B h, 0010 h,  01 h, 0100 h,  [...]
decimal align

: tft-init ( -- )
  tft-pins tft-fsmc
  tft:R61505U begin
    dup h@ dup $200 < while  ( addr reg )
    over 2+ h@ swap  ( addr val reg )
    dup $100 = if drop ms else tft! then
  4 + repeat 2drop ;
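
The table-driven loop in “tft-init” may be easier to follow in a more conventional notation. The Python sketch below mirrors its logic as I read it: each entry is a (register, value) pair, register number $100 is a pseudo-entry meaning “wait this many milliseconds”, and any register number of $200 or above ends the table. The function and constant names are made up, and the delay and end-marker entries in the example table are hypothetical - only the first four pairs come from the excerpt above.

```python
import time

DELAY = 0x100  # pseudo-register: the value is a delay in milliseconds
END = 0x200    # any register number >= END terminates the table


def run_init_table(table, write_reg, sleep_ms=lambda ms: time.sleep(ms / 1000)):
    """Walk (reg, val) pairs, mirroring the begin..while..repeat loop."""
    for reg, val in table:
        if reg >= END:
            break
        if reg == DELAY:
            sleep_ms(val)
        else:
            write_reg(val, reg)  # same argument order as tft! ( val reg -- )


# First four pairs are from the excerpt; the delay and end marker are made up.
EXAMPLE_TABLE = [
    (0xE5, 0x8000), (0x00, 0x0001), (0x2B, 0x0010), (0x01, 0x0100),
    (DELAY, 50),
    (END, 0),
]
```

Passing a logging function as `write_reg` is a handy way to check a table on a PC before trying it on real hardware.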

And that’s about it. But here is the interesting bit with respect to the FSMC:

: tft! ( val reg -- )  LCD-REG h! LCD-RAM h! ;

That little definition is our sole interface to the LCD, and it just writes two values to two different memory addresses, now mapped by the FSMC.

This same approach can probably be used with a huge variety of LCD displays out there, as long as they are connected via a parallel bus and the µC has support for FSMC. You “just” need to connect the LCD properly, set up all the GPIO pins and the FSMC to match (including proper read/write timing), and initialise the LCD controller with its matching power-up sequence.

The rest is mostly boilerplate to provide the 3 definitions needed by the display-independent graphics.fs library from Mecrisp:

$0000 variable tft-bg
$FFFF variable tft-fg

: clear ( -- )
  0 $20 tft!  0 $21 tft!  $22 LCD-REG h!
  tft-bg @  320 240 * 0 do dup LCD-RAM h! loop  drop ;

: putpixel ( x y -- )  \ set a pixel in display memory
  $21 tft! $20 tft! tft-fg @ $22 tft! ;

: display ( -- ) ;  \ update tft from display memory (ignored)

And here’s the result of running all this code with the Mecrisp graphics demo:

(with apologies for the low image quality of this snapshot)

So now we’re back to displaying stuff on the screen, just like the previous two display implementations. But with the above FSMC-based code, a clear screen takes just 30 ms!

As you can see, the “clear” word above simply brute-forces its way through, by setting each screen pixel in a big loop. That’s 76,800 16-bit writes in 30 ms, i.e. about 2,560 writes per millisecond, or a cycle time of roughly 390 ns.
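
As a quick sanity check of the throughput, here is the arithmetic in Python, starting from the 30 ms clear time and the 320x240 screen size. The 72 MHz CPU clock is the figure used elsewhere in this series; the variable names are just for this example.

```python
pixels = 320 * 240
clear_ms = 30        # measured time to clear the screen
CPU_HZ = 72_000_000  # CPU clock of the STM32F103

writes_per_ms = pixels / clear_ms               # 16-bit writes per millisecond
cycle_ns = clear_ms * 1e6 / pixels              # time per loop iteration
cpu_cycles_per_write = cycle_ns * CPU_HZ / 1e9  # CPU clocks per iteration

print(f"{writes_per_ms:.0f} writes/ms, {cycle_ns:.0f} ns each, "
      f"about {cpu_cycles_per_write:.0f} CPU cycles per write")
```

So each pass through the loop, including the Forth loop overhead and the FSMC bus access, costs a few dozen CPU clocks.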

Which goes to show that performance is the result of optimising (only) the right things!

JeeLabs: The amazing world of DMA

There are a lot of features hiding in today’s microcontrollers - even the STM32F103 series includes some very nice peripherals:

  • 2 to 3 A-to-D converters, sampling up to a million times per second
  • on the larger devices: dual D-to-A converters, with 3 µs rise times
  • 2 to 3 hardware SPI interfaces, supporting up to 18 Mbit/s
  • 2 to 5 universal serial ports, some of them supporting up to 4.5 Mbit/s

That’s a lot of data, once you start using these peripherals.

With polling, it would be very hard to sustain any substantial data rates, let alone handle I/O from several peripherals all going on at the same time.

With interrupts, it becomes easier to deal with timing from different sources, but you also need to be extra careful to avoid race conditions - which can be very hard to debug and get 100% right.

But there’s also another problem with interrupts: overhead.

To “service” an interrupt, the CPU must stop what it’s doing, save the state, and switch to the interrupt handler. And when the handler returns, it must restore the state before the original code can be resumed. This can eat up quite a few clock cycles, if only to get that saved state in and out of memory. And it leads to latency, before the interrupt handler can perform its task.

In many situations, the sustained data rates are not actually that high. We may be receiving the bytes of a packet, or lines from a serial link, or sending out a reply to an earlier request. Even at top speed, all we really need is to efficiently collect (or emit) a certain number of bytes, and then we can deal with them all at once at a considerably slower pace.

One solution for this is to add FIFOs to each peripheral: that way they can collect all incoming bytes without losing any, even if the CPU isn’t using that data right away. Likewise for output: the CPU can fill an outbound FIFO as soon as it likes, and then move on to other tasks while the hardware clocks all those bytes out at the configured rate. But it’s expensive in terms of silicon.

Meet the Direct Memory Access controller: another brilliant hardware peripheral, whose only task is to move data around. In a way, it’s like a little CPU without computational capability - all it can do is fetch, store, count, and increment its internal address registers.

The DMA “engine” of an STM32F103 chip has 7 to 12 channels depending on chip model, which can each move data around independently. These can be set up to either send or receive data from an ADC, DAC, SPI, USART, etc.

As with interrupts, DMA performs data transfers without having to continuously poll. The code which is currently running need not be aware of it. The difference from interrupts is that even the CPU is not aware of these data transfers: DMA operates next to the CPU, grabbing its own access to peripherals and memory, and “stealing” memory cycles to perform its transfers. There’s “arbitration” involved, to keep all these cats, eh, bus masters out of each other’s way.

Here is an overview from the STM32F103 Reference Manual:

Similar to the FSMC in the previous article, it takes a bit of tinkering to set up a DMA stream, but the gains can be substantial. Imagine pushing 1 KB of data from RAM to a Digital-to-Analog converter (present on higher-end chip models):

  • with DMA, the transfer of each 12-bit value will take one memory bus cycle
  • with interrupts, it’s more like 20..50 CPU and memory cycles, from interrupt begin to end

If you’re feeding the DAC with values at 1 million samples per second, then this overhead will add up - to the point that an interrupt-based implementation might not even be fast enough!
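
To put a number on “this overhead will add up”, here is a rough Python estimate of the CPU share eaten by an interrupt-per-sample approach. It assumes a 72 MHz CPU clock and simply multiplies the 20..50 cycle range from the list above by the sample rate; it ignores everything else the CPU might want to do, so treat it as a back-of-the-envelope figure.

```python
CPU_HZ = 72_000_000      # assumption: STM32F103 running at 72 MHz
SAMPLE_RATE = 1_000_000  # DAC updates per second


def cpu_load(cycles_per_sample, sample_rate=SAMPLE_RATE, cpu_hz=CPU_HZ):
    """Fraction of all CPU cycles spent servicing one interrupt per sample."""
    return cycles_per_sample * sample_rate / cpu_hz


def max_sample_rate(cycles_per_sample, cpu_hz=CPU_HZ):
    """Highest rate at which the CPU could keep up, doing nothing else."""
    return cpu_hz / cycles_per_sample


low, high = cpu_load(20), cpu_load(50)
print(f"interrupt overhead: {low:.0%} to {high:.0%} of the CPU")
print(f"ceiling at 50 cycles/sample: {max_sample_rate(50) / 1e6:.2f} Msps")
```

Under these assumptions, somewhere between a quarter and two thirds of the CPU would be gone at 1 Msps, and the ceiling is not far above that rate.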

Let’s try this. We’re going to use the same Hy-MiniSTM32V as with the FSMC. We’ll set up DMA in circular mode, causing it to send out values to the DAC from a fixed-size buffer over and over again. And to get a bit fancy, we’ll store the values of a sine wave in that buffer, so that a real (analog!) sine wave should come out once this all starts running. Code on GitHub, as usual.

First some basic non-DMA code to initialise and send values to both DACs:

: 2dac! ( u1 u2 -- )  \ send values to each of the DACs
  16 lshift or DAC-DHR12RD ! ;

: +dac ( -- )  \ initialise the two D/A converters on PA4 and PA5
  29 bit RCC-APB1ENR bis!  \ DACEN clock enable
  IMODE-ADC PA4 io-mode!
  IMODE-ADC PA5 io-mode!
  $00010001 DAC-CR !  \ enable channel 1 and 2
  0 0 2dac!  ;

That’s the basic DAC peripheral. Fairly simple to set up and use from code.

Here’s the gist of the DMA setup code (details omitted for brevity):

: dac1-dma ( addr count -- )  \ feed DAC1 from wave table at given address
  1 bit RCC-AHBENR bis!  \ DMA2EN clock enable
  [...] DMA2-CNDTR3 !
  [...] DMA2-CMAR3 !
  [...] DMA2-CPAR3 !
  [...] DMA2-CCR3 !
\ set up DAC1 to convert on each write from DMA2
  12 bit DAC-CR bis! ;

But we also need to use a timer, to drive this process, since there is no incoming event to trigger this stream. The timer period determines how fast new values will be sent to the DAC:

: dac1-awg ( u -- )  \ generate on DAC1 via DMA with given timer period
  6 +timer  +dac  wavetable 8192 dac1-dma  fill-sinewave ;

This, and the code to fill a wavetable with sine values can be found here.

And that’s it. If we enter “12 dac1-awg”, then the DAC will start producing a really nice and well-formed 4096-sample sine wave, as can be seen in this oscilloscope capture from pin PA4:

The resulting 675.67 Hz output frequency matches this calculation:

36 MHz <APB1-bus-freq> / 4096 <samples> / (12 <timer-limit> + 1)

In case you’re wondering: DMA is now driving our DAC at over 2.7 million samples per second.
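
The calculation above is simple enough to wrap in two small Python helpers, which makes it easy to try other timer settings. This is just the formula from the text, under the same assumption that the timer counts at the 36 MHz APB1 rate; the function names are made up.

```python
APB1_HZ = 36_000_000  # timer clock assumed in the calculation above


def sample_rate(timer_limit, clock_hz=APB1_HZ):
    """DAC updates per second for a given timer limit."""
    return clock_hz / (timer_limit + 1)


def output_freq(timer_limit, samples=4096, clock_hz=APB1_HZ):
    """Frequency of a waveform that spans `samples` table entries."""
    return sample_rate(timer_limit, clock_hz) / samples


print(f"{sample_rate(12) / 1e6:.2f} Msps")  # DAC update rate for "12 dac1-awg"
print(f"{output_freq(12):.2f} Hz")          # resulting sine wave frequency
```

For a timer limit of 12, the formula gives about 676 Hz, within half a hertz of the frequency quoted above.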

The DAC actually has several other intriguing capabilities, such as generating triangle waves and even mixing pseudo-random noise into its output. See the code on GitHub for some examples.

But the most impressive part perhaps, is that all this is happening in the background. The µC continues to run Mecrisp Forth, and remains as responsive to our typed-in commands as before. The DAC has become totally autonomous: there is not even a single interrupt involved here!

Next up: let’s find out what DMA can do for us on the Analog-to-Digital side…

JeeLabs: Reading ADC samples via DMA

Now that we have seen how to push out values to the DAC without CPU intervention… can we do the same for acquiring ADC sample data? The answer is a resounding “yes, of course!”

And it’s not even hard, requiring fewer than two dozen lines of code (full details on GitHub):

: adc1-dma ( addr count pin rate -- )  \ continuous DMA-based conversion
  3 +timer
  +adc  adc drop  \ perform one conversion to set up the ADC
  2dup 0 fill  \ clear sampling buffer

    0 bit RCC-AHBENR bis!  \ DMA1EN clock enable
      2/ DMA1-CNDTR1 !     \ 2-byte entries
          DMA1-CMAR1 !     \ write to address passed as input
  ADC1-DR DMA1-CPAR1 !     \ read from ADC1

  [...] DMA1-CCR1 !
  [...] ADC1-CR2 ! ;

The setup calls the “+adc” and “adc” words, defined earlier for simple polled use of the ADC, and also sets up a timer (again, to define the sampling rate) and the relevant DMA channel.

Let’s have some fun. Let’s first start the DAC via DMA to generate a sine wave, and let’s then also set up the ADC to read and sample this signal back into memory. As set up here, the ADC’s DMA channel saves its data in a circular fashion and keeps on overwriting old data until reconfigured.

And while we’re at it, let’s also plot that acquired data on the Hy-MiniSTM32V’s LCD screen - to create a little one-channel scope (but without triggering, so the screen won’t show a stable image while this code is running). Here is the main logic (see GitHub for the whole story, as usual):

602 buffer: trace

: scope ( -- )  \ very crude continuous ADC capture w/ graph plot
  tft-init  clear border grid  TFT-BL ios!

  11 dac1-awg
  adc-buffer PB0 501 adc1-dma

  begin
    \ grab and draw the trace
    301 0 do
      adc-buffer drop i 2* + h@ 20 / 1+
      dup trace i 2* + h!  \ also save a copy in a buffer
      i pp
    loop
    40 ms  \ leave the trace on the screen for a while
    \ bail out on key press, with the trace still showing
    key? 0= while
    \ clear the trace again
    tft-bg @ tft-fg !
    301 0 do
      trace i 2* + h@
      i pp
    loop
    $FFFF tft-fg !
    grid  \ redraw the grid
  repeat ;

(where “pp” is shorthand, defined as “: pp ( x y ) 10 + swap 20 + swap putpixel ;”)

The DAC is fed sine wave samples at 3 MHz (36 MHz divided by a timer period of 12 cycles), and the ADC is driven by a timer running at 502 cycles, i.e. about 71.7 KHz (just because that gave a reasonably stable display - there’s clearly aliasing involved at these two rates). The DAC has a 4096-sample buffer, while the ADC has only 301.

The “begin ... key? 0= while ... repeat” loop then produces an oscilloscope-like result on the screen, continuously refreshed at about 20 frames per second. By tinkering a bit with the “border” and “grid” code, we can actually add a pretty neat graticule to this screen as well:

The ADC clock is set to 12 MHz (72 MHz on APB2 with prescaler 6), i.e. under the 14 MHz limit. The max sample rate for this setup is ≈ 833 KHz (each measurement needs 14 clock cycles). This corresponds to a minimum timer value of 43 (timers are on APB1, which is clocked at 36 MHz).

Let’s examine the logic of the above main loop a bit further:

  1. the “301 0 do .. loop” code displays 301 samples from the ADC acquisition buffer
  2. we also save a copy of these displayed values in a secondary trace buffer
  3. do nothing for 40 milliseconds - this is to leave the image on the screen for a while
  4. rewrite the trace onto the screen once more, but now using the background colour (black)
  5. redraw the dotted grid inside the box, since some of the dots may have been overwritten
  6. rinse and repeat

The logic behind this approach is that clearing the entire display on every pass produces a highly flickering result, as display updates are not synchronised to what the LCD controller is doing. With 30 ms to clear the screen, we’d see part of the screen blanked out, and that at every pass.

So instead, we write the pixels of the trace as we capture them, leave them on the screen for a while, and then clear those (and only those!) pixels again. That’s 250x fewer pixels to update.

Bingo - a crude-and-simple (but pretty!) capture of analog data, constantly updated on the LCD. When a key is pressed, the loop exits and leaves the last trace on the screen. As mentioned before, there’s no triggering, no config, no scaling, no line-drawing interpolation in this demo.

With a loop which only takes 8 ms (plus the 40 ms wait), there is ample processor “headroom” for all kinds of improvements. Filtering, decimation, smoothing, sinx/x traces? Go for it …
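
The two figures quoted for this loop - the pixel savings and the refresh rate - follow directly from the numbers in the text, as this small Python calculation shows (constant names are made up for the example).

```python
SCREEN_PIXELS = 320 * 240  # pixels touched by a full clear
TRACE_PIXELS = 301         # pixels touched when erasing only the trace
LOOP_MS = 8                # time spent drawing and erasing
HOLD_MS = 40               # time the trace is left on screen

pixel_ratio = SCREEN_PIXELS / TRACE_PIXELS
frames_per_second = 1000 / (LOOP_MS + HOLD_MS)
cpu_busy = LOOP_MS / (LOOP_MS + HOLD_MS)

print(f"{pixel_ratio:.0f}x fewer pixels, {frames_per_second:.1f} fps, "
      f"{cpu_busy:.0%} of each frame spent drawing")
```

That leaves roughly five sixths of every frame period available for extra processing.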

Note how the DAC and ADC hardware is driven entirely by the two DMA engines, with the CPU free to perform the main logic and rendering. All this took under 500 lines of Forth code.

DMA is like having a multi-processor under the hood, all inside that one little STM32F103!


JeeLabs: Where is this going?

The last three weeks have been a deep dive into the programming language Forth on STM32 µCs. Results so far have been very encouraging, in terms of using Forth for programming ARM µCs in general.

But what about JET, HouseMon, wireless nodes, ultra low-power, and all that jazz?

Well, I think there’s a very interesting opportunity here to bring several of these technologies together in the world of Physical Computing, and home monitoring & automation. But it’s going to take some time and effort (and focus!) to get there.

Let me reflect a bit on what has been accomplished in these past three weeks, some of the challenges lying ahead, and how everything could be brought together in the JET project:

As I see it, Forth has earned a new place in my tool chest. Re-born from a world when computers were limited, slow, and very clunky, with a new life on tiny little µC-based boards like these:

From left to right, that’s a Base Board, an Olimexino-STM32, a Yellow Blue board, and the RF Node Watcher. The shields have RFM69 (868 MHz) and RFM73 (2.4 GHz) radios, respectively.

All ready for some soldering and tinkering!

JeeLabs: Diving deep into STM32F103's

Mecrisp Forth 2.2.2 has been flashed onto a new series of boards here at JeeLabs, all with an STM32F103 µC, but of different sizes and with different features on-board.

Haoyu Core Board One

Well, it’s called a HY-STM32F1xxCore144 Core/Dev Board, but “Core Board One” sounds nicer:

This board has a µC with a huge number of pins in a 144-TQFP package, most of which are brought out on 0.1” headers (two dual-row 30-pin, for a total of 120).

Not that all of them can be used freely, but that’s because the board is covered on both sides with some massive memory chips:

  • 128 MByte NAND flash memory (multiplexed over an 8-bit bus)
  • 8 MByte PSRAM (PS = pseudo-static), with 16-bit wide data access
  • 16 MByte NOR flash memory, also supporting 16-bit wide data access

That’s a lot of memory, compared to most little µC boards out there.

The reason to use this board was to learn more about the “FSMC” controller (the same as used in a previous article for fast TFT LCD access). It takes a few dozen lines of Forth code to set up, but once done, all those 8 MB of PSRAM memory become standard RAM in terms of software, all mapped into addresses 0x60000000 to 0x607FFFFF. It’s not quite as fast as the built-in 64 KB SRAM, but pretty close - more than enough for data storage (and only a fraction slower than built-in flash memory for program execution). Also great for DMA-based massive data capture.

Apart from setup (“psram-init”), there is no API: PSRAM simply looks like extra memory.
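
The address range quoted above follows from the base address and the 8 MB size. This short Python sketch checks the arithmetic and adds a small (hypothetical) helper for testing whether an address falls inside the mapped PSRAM window; none of these names exist in the Forth code.

```python
PSRAM_BASE = 0x60000000        # start of the mapped PSRAM, as quoted above
PSRAM_SIZE = 8 * 1024 * 1024   # 8 MB

PSRAM_LAST = PSRAM_BASE + PSRAM_SIZE - 1  # last valid byte address


def in_psram(addr):
    """True if addr lies inside the memory-mapped PSRAM window."""
    return PSRAM_BASE <= addr <= PSRAM_LAST


print(hex(PSRAM_BASE), "to", hex(PSRAM_LAST))
```

Anything from `PSRAM_BASE` up to and including `PSRAM_LAST` can then be read and written like ordinary RAM.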

The second type of memory is NAND flash. It too needs very little code, but behaves differently: more like the TFT LCD access mode, in that you get two addresses in memory to talk to the chip: one to send commands to, the other to read and write data. NAND flash is accessed in pages, and is very fast to read, but somewhat slower to write - very much like an SD card, in fact.

The API for this NAND flash memory is:

: nand-init ( -- u )  \ init NAND flash access, return chip info: $409500F1
: nand-erase ( page -- f )  \ erase a block of 256 flash pages
: nand-write ( page addr -- f )  \ write one 512-byte flash page
: nand-read ( page addr -- )  \ read one 512-byte flash page

As with built-in flash memory, pages have to be erased before they can be re-written.
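Because erasing works on blocks of 256 pages while reading and writing work on single pages, it helps to know which block a page belongs to. A small sketch of that arithmetic follows; the block size is taken from the “nand-erase” comment above, the helper names are hypothetical, and the exact argument and flag conventions of the real words have not been verified here:

```forth
\ Sketch only - helper names are made up for illustration.
256 constant PAGES/BLOCK   \ pages erased by one nand-erase

: block-start ( page -- page0 )  \ first page of the erase block holding page
  dup PAGES/BLOCK mod - ;

: page>block ( page -- block )   \ index of that erase block
  PAGES/BLOCK / ;

\ Possible use on the board (flag meanings not checked here):
\   300 block-start nand-erase drop   \ would erase pages 256..511
\   300 buf nand-write drop           \ buf = address of a 512-byte buffer
```

Note that erasing for the sake of page 300 also wipes the other 255 pages of its block, so anything worth keeping there has to be read out first.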

NOR flash hasn’t been tried yet. It’s different from NAND flash in that the entire memory also gets mapped into the µC’s address space, like SRAM, and offers fast random read & exec access. Writing and erasing require special code, which works in pages - so NOR flash is the middle ground between SRAM / PSRAM on the one hand, and NAND flash / SD cards on the other.

Olimexino-STM32

This board from Olimex has several nice features:

  • there’s an STM32F103RB on it, i.e. 64-pin chip with 128 KB flash and 20 KB RAM
  • it’s Arduino-like (it was modeled after the old “Maple” board from LeafLabs)
  • there is room for adding extra headers on the inside to support proper 0.1” spacing
  • it includes a LiPo connector and charger, and supports very low power sleep
  • it has a CAN bus driver and connector (CAN and USB are exclusive on these F103’s)
  • there’s a µSD card slot on the back

That last one was the reason to try this board. Here is a first version of some code to initialise an SD card (in bit-banged SPI mode), and read data off it. And this is a first test to mount a FAT16-formatted card, read its root directory, and access data in one of the files.

Here is a transcript of a quick test, with a 2 GB µSD card and some files:

  ok.
sdtry #0 1 #55 1 #41 1 #55 1 #41 0
17 0 23
17 0 14
17 0 12
20004C50   60 02 00 00 40 00 00 00   84 00 00 00 41 2E 00 5F   `...@... ....A.._
20004C60   00 2E 00 54 00 72 00 0F   00 7F 61 00 73 00 68 00   ...T.r.. ..a.s.h.
20004C70   65 00 73 00 00 00 00 00   FF FF FF FF 7E 31 20 20   e.s..... ....~1
20004C80   20 20 20 20 54 52 41 22   00 C0 89 23 6E 48 6E 48       TRA" ...#nHnH
20004C90   00 00 89 23 6E 48 03 00   00 10 00 00 41 42 43 44   ...#nH.. ....ABCD

LFN: ._.Trashes. #1 64
     ~1      .TRA at: 3
     ABCDEFGH.TXT at: 14
LFN: .Trashes. #1 64
     TRASHE~1.    at: 2
LFN: 00. #2 64
LFN: .Spotlight-V1 #1 0
     SPOTLI~1.    at: 4  ok.
14 x 20004C5C
17 0 12
20004C50   60 02 00 00 40 00 00 00   84 00 00 00 4D 6F 6E 20   `...@... ....Mon
20004C60   4D 61 72 20 31 34 20 31   31 3A 31 32 3A 31 34 20   Mar 14 1 1:12:14
20004C70   43 45 54 20 32 30 31 36   0A 00 00 00 00 00 00 00   CET 2016 ........
20004C80   00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00   ........ ........
20004C90   00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00   ........ ........
 ok.

The “Trashes” and “Spotlight” files are hidden stuff Mac OSX insists on putting on everything it touches. The only non-hidden file in there is “ABCDEFGH.TXT”.

The code figures out where the FAT and root directory blocks are on the µSD card, shows the first 64 bytes of the disk definition block, some filename entries it found (including shreds of “Long FileName” entries, intermixed with the rest), and then reads and dumps some bytes from cluster 14, which corresponds to that 29-byte “ABCDEFGH.TXT” file on the card.

Note that the dump is aligned on a 16-byte boundary, but the read buffer starts at 0x20004C5C.
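The directory entries in that dump follow the standard FAT layout: 32 bytes each, with the attribute byte at offset 11, the first cluster at offset 26, and the file size at offset 28, all little-endian. The following is a sketch of how such an entry can be picked apart; it is not the code used above, and all word names are hypothetical. Byte-wise access is used so that alignment does not matter:

```forth
\ Sketch only - decodes fields of one 32-byte FAT directory entry.
: le16@ ( addr -- u )  \ fetch a little-endian 16-bit value
  dup c@ swap 1+ c@ 8 lshift or ;

: le32@ ( addr -- u )  \ fetch a little-endian 32-bit value
  dup le16@ swap 2 + le16@ 16 lshift or ;

: dir-attr    ( entry -- u )  11 + c@ ;        \ attribute byte
: dir-lfn?    ( entry -- f )  dir-attr $0F = ; \ part of a long filename?
: dir-cluster ( entry -- u )  26 + le16@ ;     \ first cluster of the file
: dir-size    ( entry -- u )  28 + le32@ ;     \ file size in bytes
```

Applied to the entry starting at 0x20004C7C in the first dump, this gives cluster 3 (matching the “at: 3” in the listing) and a size of 4096 bytes, whereas the entry at 0x20004C5C has attribute $0F and is therefore a long-filename fragment.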

WaveShare Port103Z

This is one of many development boards available from WaveShare:

It has the same 144-pin STM32F103ZE µC as the Yellow Blue Board above, with 512 KB flash and 64 KB RAM memory. This one was loaded up with Mecrisp Forth mostly because it brings out every single pin, so it’s very easy to try out all the hardware functions available on the STM32F103 “High Density” devices.

There’s a 32,768 Hz crystal on the back of the board, so we can try out the Real-Time Clock (RTC) functionality of this chip. Here is some code for this, and this is the API it exposes:

: +rtc ( -- )  \ restart internal RTC using attached 32,768 Hz crystal
: now! ( u -- )  \ set current time
: now ( -- u )  \ return current time in seconds

There’s no calendar functionality, the built-in hardware simply counts seconds. Since it’s a 32-bit counter, it could easily track “Unix time”, i.e. seconds since Jan 1st, 1970, 0:00 UTC.
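Getting a time of day out of such a seconds counter takes only a few divisions. A minimal sketch, assuming the counter was set to Unix time in UTC with “now!”; the names udivmod and time-of-day are invented for this example, while “um/mod” is the standard unsigned division word:

```forth
\ Sketch only - splits a seconds count into seconds, minutes, hours.
: udivmod ( u1 u2 -- rem quot )  \ unsigned single-cell divide
  0 swap um/mod ;

: time-of-day ( u -- sec min hour )
  86400 udivmod drop   \ seconds since midnight
  60 udivmod           \ -- sec minutes
  60 udivmod ;         \ -- sec min hour
```

With this, “now time-of-day . . .” prints the hour, minute, and second, in that order. Turning the day count into a calendar date needs leap year handling and is left out here.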

If you attach a 3V coin cell between the µC’s “Vbat” and “GND” (and remove a jumper that ties it to Vcc), the internal clock will continue ticking when power to the rest of the µC is off. To regain access, just call “+rtc” after power-up. The counter will then become readable with “now” again.

So many boards…

These are just a few examples of the things you can do with Mecrisp Forth on a large range of ARM boards. They illustrate that the amount of Forth code required to access fairly complex hardware peripherals inside the µC is often surprisingly small. But note also that once such code has been written, the API exposed by those newly-defined “words” can be extremely simple.

The code areas for each of the above boards are all in the Embello repository on GitHub, and are called cbo, oxs, and wpz, respectively. Most of the common code can be found in the flib area.

FairTradeElectronics: Avail Enormous Health Benefits With The Help Of Rowing Machines

Exercise is the best way to stay fit and healthy. At the gym, you can find various machines for different types of workout: to reduce fat in different parts of the body, build muscular strength and stamina, improve your metabolic rate, and more. Many workout machines target specific areas of the body. The rowing machine is one of the best for increasing upper-body strength.

As the name suggests, using a rowing machine is like rowing a boat. Rowing takes a lot of effort, and the upper body has to work hard. There are various types of rowing machines. For details, you can visit http://www.homerower.com/.

Benefits of using the rowing machine

A person who uses the rowing machine on a regular basis can expect the following benefits:

  • It is helpful in losing your weight by burning lots of calories and melting down fat deposited in your body.
  • Muscles get toned up when you use rowing machine for workout.
  • Body becomes more flexible.
  • Stress level from the muscles also gets reduced.
  • There is less risk of injury while using rowing machine.
  • It is the perfect machine which works on your abs, arms, chest and legs all together. You do not have to do separate workout for different body parts.
  • It focuses on the cardiovascular health of the person and uses carbohydrates to provide energy to the person.

One of the great advantages of owning a rowing machine at home is that you do not have to spend money on a gym where you may not get enough time on the rowing machine. You also do not have to go anywhere, and you can work out on it at any time.

Advanced rowing machines

Rowing machines are classified into three categories, based on the type of resistance they use:

  • Magnetic rower machine: In this type of rowing machine the magnetic drum is used to resist the movement of the pulley. It is the most silent type of rowing machine which is popular these days.
  • Water resistant rowing machine: In this type of rower, the paddle is revolved in an enclosed water tank. Water is used as the resistance in this type of machine.
  • Wind or Air resistant rowing machine: This type of rowing machine uses the hydraulic piston to resist the working of the handles and pullers.

Modern rowing machines have various features which make them easy and comfortable to use. Advanced machines have a monitor screen attached, showing the difficulty level and the intensity of your workout. Some machines are even able to monitor and record your progress. There are adjustment keys for setting the difficulty level.

JeeLabs: The lack of USB support

Those pictures you’ve been seeing in recent articles, with over a dozen boards by now, all have the same configuration in common: boards with a USB port on them, connected and powered through anything but that USB port…

There is some value in this hookup - in fact, either this or an SWD-based setup using a Black Magic Probe or an ST-Link is required to be able to upload the core Mecrisp Forth 2.2.2 image onto the board (once). All the STM32F103 µC models support only the serial port in “ROM boot loader mode”, which is all you’ve got as starting point on a blank chip.

But after that, not having the USB port as a “normal” serial device is a major inconvenience.

This highlights what is probably the main drawback of using Mecrisp Forth on microcontrollers such as the STM32 series: Forth cannot be combined with C or C++ - not easily, anyway.

On the one hand, this is no big deal: as the recent “Dive into Forth” article series has shown, it has been surprisingly easy to implement most of the features offered by C runtimes such as the Wiring/Arduino library. Digital I/O, ADCs, DACs, PWM, I2C, SPI were all created with little effort. Even advanced DMA and LCD & memory chip interfaces were added with relative ease.

But USB is a completely different creature: it’s the combination of a fairly complex protocol, with “enumeration”, “endpoints”, and a hefty specification guide, intended to support a huge range of hardware, with lots of different USB modes and speeds. On top of all that, there’s STM32F103’s own USB hardware implementation, which appears to be a fairly early version of what has been greatly enhanced and improved in later STM32 chip series. Things like a 512-byte buffer which has to be accessed as 2-byte words while their addresses start on 4-byte boundaries don’t make the task particularly straightforward. STM32F1’s USB hardware looks like one big kludge…

There is a lot of sample code in C to use as guideline, but it tends to consist of layer upon layer of definitions, headers, and “low-level” vs “high-level” API calls, spread out over dozens of files. It looks more like an example of how to bury the logic of an implementation than anything else!

To put this in perspective: USB is not essential for remote wireless sensor nodes, since they are going to be used un-tethered anyway, but for development it sure would be convenient to just plug that board in, for both power and serial communication. Especially since Mecrisp Forth is entirely driven and uploaded via a serial connection. With USB in Full Speed (FS) mode, i.e. 12 Mbit/sec, a serial connection could also be faster than the current 115,200 baud serial link.

Nevertheless, the plan here at JeeLabs is to work on getting a USB device implementation in Forth working. This might take a while - the C-based open source examples out there are all too large to make this task simple. Fortunately, the latest beta version of the Saleae Logic Analyser supports USB-FS decoding - this is going to be a huge help during debugging. Another option is to use Linux on the host side: it supports extensive logging and the USB traffic can be examined with WireShark.

A similar situation will arise with Ethernet. There are several examples of a “TCP/IP stack” in Forth, but getting it to work on STM32F107’s and STM32F407’s will probably require some serious time investment and sleuthing…

The good news: these issues do not reduce Forth’s usability for the JET project - stay tuned…

JeeLabs: JET and Forth, some thoughts

The JET project is about “creating an infrastructure for home monitoring and automation” (it’s actually considerably more, but this is a big-enough bone to chew on already…).

Note that JET is not about individual µCs or boards, it’s about managing an entire set of nodes, warts and all, and heterogeneous from the start. JET is about bringing together (and bridging) lots of technologies, and about interfacing to existing ones as well. It’s also about evolution and long-term use - a decade or more - because redoing everything all the time is wasteful.

The JET infrastructure is not necessarily centralised, although it will be in the first iterations: a “hub”, a variety of remote “nodes”, and browser access to use and administer it all. This is easy to map onto actual hardware, at least for a simple setup:

  • the hub can be a Raspberry Pi or compatible, i.e. a Linux board
  • the nodes will be JeeNodes, both AVR- and (in the future) ARM-based
  • most communication will be wireless (sub-GHz, WiFi, whatever)
  • most sensor nodes will be ultra low-power and battery-powered
  • then again, control nodes could also run off USB chargers or similar

Nothing new so far, this story has not changed much in the past years, other than exploring different µC options and trying out some self-powered Micro Power Snitch ideas.

The hub is a recent introduction: a portable application written in Go, in combination with the Mosquitto MQTT server. It has been running here at JeeLabs for a few months now, dutifully collecting home monitoring data, mostly from room nodes, the smart meter, and the solar inverter. The hub itself does very little, but it provides a way to add “Jet Packs” to run arbitrary processes which can tie into the system.

The browser side of JET has not changed: it’ll continue to be written in JavaScript (ES6) and will most likely use ReactJS and PureCSS as foundations. A lot of software development time will end up going there - this is not different from any other modern web-based development.

The nodes have all been running Arduino IDE based C/C++ code; most of it is available in JeeLib on GitHub, as far as ATmega- and ATtiny-based JeeNodes are concerned. Some newer experimental code for ARM has been presented in the past year on the weblog, some for LPC8xx µCs, but recently more for STM32 µCs. That code can be found in the Embello repository, see for example the RF Node Watcher.

But that’s where things are about to change - drastically!

A new beginning

From now on, new work on remote nodes will be done in Forth. Since Mecrisp Forth has proven itself to be very stable and flexible, it’ll be flashed onto every node - very much like a fancy boot loader. This is equivalent to making each node speak Forth on its serial port, once and for all.

This approach has been chosen because Forth (in particular Mecrisp Forth):

  • … is an interactive language
  • … can compile code to flash on the fly
  • … can clear (parts of) its flash memory again
  • … can run “risky” code in RAM, with simply a reset to restore its previous state
  • … could set up a watchdog to force such a reset, even on remote nodes
  • … has very little overhead, the code is incredibly efficient
  • … provides access to every feature available in hardware
  • … can be configured to run arbitrary code after power-up or a reset
  • … will fit in chips with as little as 32 KB flash, and just a few KB RAM
  • … works on several different ARM families, not just the STM32 series chips
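A few of these points can be illustrated with a short terminal session. This is only a sketch: the words compiletoram, compiletoflash, eraseflash, and reset are part of Mecrisp Forth, as is the convention of running a word called “init” at startup, but the two definitions are placeholders made up for this example, and a board may already have its own “init” which this would override:

```forth
\ Sketch only - to be typed on a board running Mecrisp Forth.
compiletoram                 \ new definitions go to RAM, gone after a reset
: experiment ( -- ) ." trying something risky" cr ;
experiment
reset                        \ restores whatever was in flash before

compiletoflash               \ new definitions are now written to flash
: init ( -- ) ." node ready" cr ;   \ a word named init runs after reset
compiletoram

\ eraseflash                 \ would wipe all user definitions from flash
```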

Do I have to learn Forth?

There are several possible answers to that question:

  • if you only care about working code, the answer is “no” (just install firmware images)
  • if you want to play with projects published on the weblog, the answer is “a little”
  • if you want to dive in and explore everything, or change the code, the answer is “yes”

To explain that second answer: for trying out things written in Forth and made available on GitHub, you don’t need to program in Forth, you can just enter commands like “8686 42 rf-init”, “somevar @ .”, and such - easy stuff (once properly documented and explained!).

Is everything going to be in Forth?

Nooooo! - Forth is still merely an implementation language for little µCs. The plan is to use it to implement a compact dataflow engine for remote nodes, which will then present a “gadgets and circuits” model, somewhat like NoFlo, Node-RED, and Pure Data. All data-driven.

Once such a basic dataflow engine exists, we will have a considerably more abstract conceptual framework to build with and on. There will be gadgets to tie into actual hardware (pins, digital & analog I/O, timers, but also the RF driver), and gadgets to perform generic tasks (periodic execution, filtering, arithmetic, range checking, conditional execution, etc). These can then be combined into larger circuits, and sent to a node as definition of the behaviour of that node.

The reason why Forth looks like a perfect fit for this task is that it allows growing a node’s functionality in small steps, once the Mecrisp core has been flashed into its flash memory. There will need to be a first layer to tie into the RFM69 radio modules (the RF69 driver already exists), and a way to robustly add and remove additional Forth code over the air. After that, we’ll need a dataflow core. Then, tons of “gadgets”, either coded in Forth or combined from existing ones.

At the end of the day/road/tunnel, Forth will end up being simply a low-level implementation language for manually coding only the bottom layers, with everything else generated from a visual diagram editor in the browser. The long-term goal is not to expose, but to bury Forth!

Yes, it will take a lot of work to get there - JET was never meant to be built in a day…
