View Full Version : C++ word count program


canninglc
05-09-02, 03:56 AM
HELP! I need to get this done by 4pm today (09:30 now)!!!!
The program below reads a textfile & counts the amount of times words appear in it. It needs to be modified so that it:

returns the words in sorted order, according to the lexicographical (think this means alphabetical) order of the words.

Here's the program: (you need a txt file as well with a couple of words in it to read in)

I would be eternally greatful if someone could help me with this!
:D
================================================== ==============
#include <stdio.h>
#include <string.h>

class word_freq{
public:
char* word;
int count;
word_freq* next;
word_freq(char* w) {word=w; count=0; next=NULL;}
};

class freq_count{
private:
word_freq* freq_list;

public:
freq_count() {freq_list=NULL;}
void add(char* word);
void print_out();
};

void freq_count::add(char* word)
{
word_freq* ptr=freq_list;
bool found=false;

while (ptr!=NULL && !found){
found=(strcmp(word, ptr->word)==0);
if (!found) ptr=ptr->next;
}

if (found) (ptr->count)++;
else{
ptr=freq_list;
int word_size=strlen(word) +1;
char* dest=new char[word_size];
strcpy(dest, word);
word_freq* new_wf=new word_freq(dest);
new_wf->next=freq_list;
freq_list=new_wf;
(new_wf->count)++;
}
}

void freq_count::print_out(){
word_freq* ptr = freq_list;

printf("Word Count\n");
while (ptr!=NULL){
printf("%-12s %3d\n", ptr->word, ptr->count);
ptr=ptr->next;
}
}

main(){
freq_count c;
char word[80];
while (scanf("%s", word)==1){
c.add(word);
}
c.print_out();
}
================================================== ================== :D

Adam
05-09-02, 04:06 AM
Sorry, I only know plain old C, and not enought o help you. I'm desperately trying to make a programme to hand in tomorrow.

canninglc
05-09-02, 04:18 AM
Cheers anyway Adam, looks like we're in the same boat then, eh? Lets just hope someone wakes up and feels like helping us out today! Good luck:)

Rick
05-09-02, 01:19 PM
You made such a big program of word count?

gee...
set no.of words=0 initially,then upto strlen(string entered)apply for loop,then use if(str==' '),do no.of words++;

like this...
got it?


bye!

Reid
05-10-02, 04:08 PM
maybe its to late but here is one of many solutions, havent compiled it so not sure its working

struct SWord
{
string word;
int n;
};

vector <SWord> wordlist;

void readfile(string filename)
{
ifstream infile(filename.c_str);
string tempfile;
int add;

while(infile >> tempstring)
{
add = 1;
for( int i=0; wordlist.size();i++)
{
if( tmpstring == wordlist[i].word)
{
wordlist[i].n++;
add = 0;
break;
}
if(add)
{
SWord tmp;
tmp.word = tempstring;
tmp.n = 1;
wordlist.push_back( tmp );
}
}
}
}

after this you got all words in a stl vector, then to sort them use the stl sort, not sure about parameters to this function atm
*added a break in the for loop

naknak
10-19-06, 08:35 AM
Introduction

This assignment requires you to develop solutions to the given problem using several different approaches (which actually involves using three different STL containers). You will implement all three techniques as programs. In these programs, as well as solving the problem, you will also measure how long the program takes to run. The programs are worth 80% of the total mark. The final 20% of the marks are awarded for a brief report explaining why the different algorithms performed in the way they did.


The Problem – text analysis/word counting

Computer analysis of texts, which involves making statistical measurements on the text, is used in a variety of situations, including authorship attribution (checking a piece of text by an unknown author against texts by others to try and work out who wrote it) and plagiarism detection (shudder at the thought). One of the simplest measures is to take a piece of text and count how often each word used in the text occurs. Obviously, in a large piece of text many words are going to be used. We are going to look at several ways of do this counting. We are not going to worry about analysing the data - that is the problem for a different program (or different part of the program) - we are just looking at the word counting phase. We are going to run various programs (six in all) which all using STL containers (some of the programs use the same containers but in different ways, so there are only three different containers used). We are going to time how long each algorithm takes (don’t worry, the first appendix shows you how to do this). To enable meaningful measurements to be made, a fairly large piece of text is required. A file called Frank.txt has been put on BrightSpark for you to use. It contains the first three chapters of ‘Frankenstein’ by Mary Shelley. The text has been ready prepared for analysis – all punctuation and capitalisation has been removed.

Each program must read the file, and count how often each of the words occurs. The program then prints out a list of the words (the order of the words depends on which part you are doing) and how many times they are present in the text. There are a lot of ways that this could be done. Several approaches/algorithms are given below. You should measure the time taken to complete each algorithm. Outputting the text to the screen will be a slower process than doing the analysis, so you should take a time measurement at three points – before starting the analysis, after the analysis but before the output and after the output. You can then say how long the analysis took, how long the printing took and how long the overall analysis took. Remember, do not take a single value, but run several (say, ten times) and take the average.




Note

Most of the programs (all except 3) will need you to use a class to hold a word/word count pair. The following header file defines a class which (with the addition of the matching body) is sufficient for parts and 2.:






class WordCount
{
private:
string word;
int count;
public:
WordCount();
WordCount(const string& word);
WordCount(const WordCount& wc);

WordCount operator++(int);
bool operator==(const WordCount& rhs);

friend
ostream& operator<<(ostream& os, const WordCount& rhs);
};

As you can see, there are two attributes, a word and the number of times that word occurs in the text.

Here are some details of the operators:
WordCount();
This is the default constructor. You will not create objects directly with it, but it is required for creating some of the initial STL containers. It should set the count to zero and the word to an empty string.
WordCount(const string& word);
This is the constructor that you will use to create objects in your program). You supply the word as a parameter and set the count to 1 (since it is used when you find a new word for the first time).
WordCount(const WordCount& wc);
This is a standard copy constructor which will be needed at some stages. You do not explicitly call it.

Here are some details of the operators:
operator++
This is the increment operator (actually a post-increment) which increments the count field (only). Post increment operators are slightly strange to write, the stages are:
a) take a copy of the object
b) increment the count attribute
c) return the copy of the object
operator==
This only compares the word fields, it returns true if they are the same and false if they are different. In parts 2 and 3 you will also need a less than comparison. It is very similar to this operator.
operator<<
The overloaded output operator simplifies the output stage of the program and should give the word followed by its count – put the count in brackets to make it easier to follow.

Depending on how you write the code you may need some additional operators or methods. Hopefully you won’t, but if you do then you may add them.






Part 1

The programs

In this assignment, the words are output in any order, i.e. no sorting is done. There are three programs to write (although the second is only a fairly minor modification of the first):

The first program (1)
Use the vector container from STL. As you read in each word, check to see if it is already stored in the vector. If it is, increment that word’s count. If it isn’t, add the word to the end of the vector. It is easiest if you create a temporary word count object as soon as you read each word from the file and use that for the comparisons etc.

The second program (2)
This should be a very similar program to the first program, but should use a list container instead of a vector.

The third program (3)
This should use a hash map. The approach of the third program is similar to the first two, but you will not use the WordCount class because the hash map expects pairs of values. You will not have met hash maps before, so there are notes on using the hash map in appendix 2. The hash maps work on the concept of pairs of values, one being a key which is used to locate the data and the other being the data value. If we let the word be the key and the count of that word be the data then the hash map will work for solving this problem. Because hash maps automatically work with pairs you will not use the WordCount class in this section.

When you have read a word, you should use the find method to see if it is already in the hash map. The find method returns an iterator. If the iterator is set to the end of the map (i.e. is equal to wordlist.end()) then the word is not in the map. You can use the insert method to insert this as a new word. Appendix 2 shows gow to use the template pair to do this.

If the iterator is not set to the end then the word has been found. The iterator allows you to access two fields, first (the word) and second (the count). You can increment the count directly.

To output the result you can declare an iterator and step through the hash map. You can use the first and second fields to mimic the output of the other programs.

Hash maps are very efficient at insertion and retrieval, and both operators work in almost constant time.

Pete
10-19-06, 09:07 AM
Hi naknak and welcome to Sciforums!

Is this an assignment you have to complete?
Are you having difficulty?

Kunax
10-19-06, 01:58 PM
necromancers....

happyuk
12-05-11, 03:19 PM
Hi

I have previously tackled this problem using standard STL maps.

Sample as follows:


#include <iostream>
#include <fstream>
#include <sstream>
#include <map>

using namespace std;

class WordCounter
{
public:
int value;
WordCounter() : value( 0 ) {}

void operator++ (int) { value++; }
};

ostream& operator<<(ostream& st, WordCounter& wc )
{
return st << wc.value;
}

const string path = "C:\\Dump\\Hamlet.txt";

int main()
{
map<string, WordCounter> counter;

ifstream input;
input.open( path );

if ( !input )
{
cout << "Error in opening file\n";
return 0;
}

string tok;
while ( true )
{
input >> tok;

if ( input )
{
counter[ tok ]++;
}
else break;
}

map<string, WordCounter,less<string>>::iterator it;

for ( it = counter.begin();
it != counter.end();
it++ )
{
cout << (*it).first
<< ", "
<< (*it).second
<< endl;
}

return 0;
}

Pete
12-05-11, 04:08 PM
While your solution is appreciated, I'm afraid you've missed the 0930 deadline by 9 and a half years.
Late penalties may apply.

Welcome to Sciforums, happyuk :)

river-wind
12-08-11, 10:39 AM
interesting that this thread has been brought back from the dead twice now...

Pete
12-08-11, 03:56 PM
Must be a common assignment

Crunchy Cat
12-08-11, 05:20 PM
While your solution is appreciated, I'm afraid you've missed the 0930 deadline by 9 and a half years.
Late penalties may apply.

Welcome to Sciforums, happyuk :)

rofl