
Sinclair using IBM Watson Captioning for live news


Creating closed captioning for live local news is error-prone and often lags far behind the speaker. To improve accuracy and speed, Sinclair is using IBM Watson Captioning across its local channels.

IBM Watson Media announced that Sinclair Broadcast Group is rolling out Watson Captioning across its local TV stations. The first channels go live today, with the rest following by the end of the month. The technology will handle the closed captioning tasks for the stations’ live news broadcasts, including breaking news, weather, and live sports segments.

Live news is challenging

Live news programming is particularly difficult for captioners to translate accurately. The many references to local names, locations, and events may be unfamiliar to a human captioner who isn’t located in the region. Local accents and idioms can be problematic. These challenges can frequently result in errors in translating the spoken word and can yield closed captions that lag far behind the speaker.

[Update: Several captioners have reached out – see comments below – and are unhappy with the characterization of their abilities given above. I have moderated the language as indicated. They have also posted examples of some inaccurate IBM Captioning translations occurring on KSNV, which they say is using the IBM system. I will look for confirmation that KSNV is using the system and update this piece accordingly.]


According to David Kulczar, Senior Product Manager, Watson Video Analytics at IBM Watson Media, IBM worked with Sinclair to enhance Watson Captioning from its VOD roots to take on the live local news challenge. He says that much is known about the news broadcast before it airs and that this information can be used to train the AI in advance:

David Kulczar, IBM Cloud Video

“It’s well-understood content that they <the broadcaster> generally know in advance. Often, we know what’s going to be said, and it’s delivered by a clear speaker with an accent that’s very easy to understand.”

Training the system

There are a variety of ways to train Watson Captioning. Local stations can feed the system news copy, and type in or batch-process specific words and phrases. Mr. Kulczar says the station can also leverage past broadcasts:

“If I had stories from the past ten years for my local station in Austin, Texas, I could feed all of that in beforehand. So, Watson Captioning would have local celebrities, local politicians, all that stuff ahead of time.”
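
That batch-training workflow resembles the language-model customization IBM exposes in its public Watson Speech to Text service, where a station could create a custom model on top of a base model, feed it corpora such as past scripts, add individual words, and retrain. The article does not confirm that Watson Captioning uses this exact API, so the Python sketch below, including the API key, endpoint, file name, and sample word, is purely illustrative.

# Illustrative sketch only: vocabulary training with the public IBM Watson Speech to
# Text customization API (ibm-watson Python SDK). The API key, endpoint, file name,
# and sample word are hypothetical placeholders.
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

stt = SpeechToTextV1(authenticator=IAMAuthenticator('YOUR_API_KEY'))
stt.set_service_url('https://api.us-south.speech-to-text.watson.cloud.ibm.com')

# Create a custom language model on top of a US English broadband base model.
model = stt.create_language_model(
    name='local-news-vocabulary',
    base_model_name='en-US_BroadbandModel',
    description='Local names, places, and events for a station newscast'
).get_result()
custom_id = model['customization_id']

# Batch-process past news scripts as a corpus so the model sees local terms in context.
with open('past_news_scripts.txt', 'rb') as corpus_file:
    stt.add_corpus(custom_id, 'news-scripts', corpus_file, allow_overwrite=True)

# Type in an individual word, with hints for pronunciation and on-screen display.
stt.add_word(custom_id, 'Kulczar', sounds_like=['cool char'], display_as='Kulczar')

# Retrain; subsequent recognition requests can then reference custom_id.
stt.train_language_model(custom_id)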

If the system makes a mistake, a human operator can correct it, and the AI learns from the mistake. However, Mr. Kulczar doesn’t expect somebody to monitor the captions all the time. Many stations will monitor the system carefully to start, but as errors get corrected, he believes that after a week or so they will gain confidence in the system and allow it to operate autonomously. In trials, he says, stations are already seeing considerably more accurate results than they get with human captioners.
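
Watson Captioning’s operator-correction interface is not public, but the learn-from-a-mistake idea can be pictured as a small post-processing loop: each fix an operator makes is remembered and replayed against later transcripts. Everything in the sketch below, including the class name and the sample correction, is a hypothetical illustration rather than IBM’s implementation.

# Hypothetical sketch of a learn-from-corrections loop; not IBM's implementation.
import re

class CorrectionMemory:
    """Remembers operator fixes and replays them on later transcripts."""

    def __init__(self):
        self.fixes = {}  # misrecognized phrase -> corrected phrase

    def record(self, heard, corrected):
        self.fixes[heard] = corrected

    def apply(self, transcript):
        fixed = transcript
        for heard, corrected in self.fixes.items():
            # Case-insensitive replacement of a known mistake.
            fixed = re.sub(re.escape(heard), corrected, fixed, flags=re.IGNORECASE)
        return fixed

memory = CorrectionMemory()
# An operator corrects one error during a broadcast...
memory.record('bill the blasio', 'Bill de Blasio')
# ...and the same mistake is fixed automatically the next time it appears.
print(memory.apply('Mayor bill the blasio spoke at the town hall tonight.'))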

There are other advantages to the AI approach for closed captioning. Generally, the text is 3 seconds or less behind the spoken word, much faster than traditional closed captioning systems. As well, once a station owner has trained the system, it continues to build on its knowledge and becomes more accurate over time.

Dealing with adverse conditions

All the training that Watson Captioning receives prepares it for expected accents, words, and sentence structures. However, news anchor teams frequently go off script, and sometimes way off. In these cases, Mr. Kulczar says the system will translate what is said accurately, even if it is sentence fragments and disconnected phrases:

“If they are bantering, it’s going to be stuff that’s well known to the system. So, it is just going to translate it accurately.”

There are still situations where the AI can falter. For example, if there is a lot of background noise, accuracy can fall. The AI may also stumble if a speaker switches between English and another language. When the system delivered the captions for a town hall meeting held by New York Mayor Bill de Blasio, Mr. Kulczar says the AI coped with all of his off-the-cuff remarks:

“The only thing we saw that was slightly out-of-bounds was de Blasio tends to break into Spanish. With Spanish, it <Watson Captioning> tried to translate Spanish as English because we didn’t tell it he was going to be speaking Spanish.”

However, IBM is enhancing the AI to detect when a speaker switches to another language. The closed captions will then indicate that the speaker is not speaking English. A version of the captioning product for Spanish-language TV stations is currently under development.
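
For a sense of how non-English speech might be flagged, IBM’s public Watson Language Translator service already offers a language-identification call. The article does not say this is the mechanism Watson Captioning will use; the function, threshold, and sample caption below are assumptions made for illustration.

# Illustration using Watson Language Translator's identify() call (ibm-watson SDK);
# the service URL, API key, threshold, and sample caption are hypothetical.
from ibm_watson import LanguageTranslatorV3
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

translator = LanguageTranslatorV3(
    version='2018-05-01',
    authenticator=IAMAuthenticator('YOUR_API_KEY')
)
translator.set_service_url('https://api.us-south.language-translator.watson.cloud.ibm.com')

def tag_non_english(caption_text, threshold=0.75):
    """Prefix a caption with a language tag when it does not appear to be English."""
    result = translator.identify(caption_text).get_result()
    top = result['languages'][0]  # candidates are sorted by confidence
    if top['language'] != 'en' and top['confidence'] >= threshold:
        return '[speaking {}] {}'.format(top['language'], caption_text)
    return caption_text

print(tag_non_english('Gracias a todos por venir esta noche.'))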

Why it matters

Closed captioning of local news, weather, and sports segments can be very challenging for captioners.

Caption errors and delays can be annoying to those reading the closed captioning. They can also be dangerous in emergency situations.

Sinclair is rolling out IBM Watson Captioning to try to improve the accuracy and speed of its closed captions.


(12) Comments

  1. This article is completely inaccurate and full of false information. The only accurate captions available at this time are produced by trained and certified broadcast captioners. Certified captioners write 300 wpm at 98 percent accuracy. They prepare for broadcasts and have the correct spelling of names, places, and terms in the local, national, and international news. IBM Watson is currently providing captions for KSNV and the quality is so poor it’s ridiculous. Check out the captions on the recent KSNV newscasts and judge for yourself. One AI mistake on yesterday’s news was “crpypt festivals” instead of “crisp vegetables.” There is no substitute for the trained captioner’s ear versus AI. https://news3lv.com/watch.

    • Tasha, Sandra, and others:
      Thanks for giving the captioner’s perspective. I’ve updated the piece to nudge people down to the comments so they will see your point of view. I plan to return to the subject at the end of the month, or a little later in mid-November. That will give some time to see whether IBM’s claims that the system improves over time are true.

  2. “Live news programming is particularly difficult for captioners to translate accurately. The many references to local names, locations, and events may be unfamiliar to a human captioner. Local accents and idioms can be problematic. These challenges frequently result in errors in translating the spoken word and can yield closed captions that lag far behind the speaker.”

    The quote above is totally false. I am a steno captioner and have been in the field for over 33 years. I am certified. I have a degree. I am constantly training and updating and have honed my skills. I am required to have continuing education every year. TRAINED CERTIFIED Professional Captioners write at speeds up to 300+ words per minute with 98%+ accuracy. If you are seeing captions below that standard, it’s due to companies such as Sinclair constantly undercutting the pay for captioners and trying to find the cheapest option rather than using or requiring professionally trained captioners. Caption quality is extremely important for viewers that depend on captions!! I hope the consumers speak loud and clear and that the FCC monitors this “new technology” that is actually moving back in time as far as quality captioning goes. Some of the major sports networks require ONLY trained and certified STENO captioners as they are very concerned with quality; it’s a shame that news stations don’t do the same.

  3. Really, have you watched IBM Watson on real-time newscasts??? I don’t think so. Been a captioner for many years and I can tell you that most stations do not always go by scripts. Sometimes I have two news anchors, the reporters, guests and by golly, do not forget the weather and sports all jabbering at the same time — How many times has one decided to break into a song while the others join in — oh, and don’t forget all the police chases, real live tragedies playing out on the screen, funerals, commencements, political speeches, debates and the list goes on. I work my rear off to keep the late-breaking news accurate even when sometimes there is no way the anchors can say the names properly, and yet I caption those names, places, events because I am right on top of it. There is a band of VERY dedicated captioners out there that take pride in our work. Guess how many times we have had raises, been complimented for a job well done? Not much. Our pay has been slashed in half and yet we still strive to give good quality. How many, even when sick, have done their shows and breaking news? And now Watson is going to miraculously do better with AI? No, sir! Watson cannot do our job and you need to be better informed before jumping on the Sinclair bandwagon! They are only interested in money. Stations charge top dollars for their commercials, etc. No, someone at the top has decided to kick the HoH and deaf down. If we have to give accuracy then so should Watson. The garbage Watson is putting on TV is doing a disservice to all the Americans who were assured the Americans with Disabilities Act would protect them. If I park in a handicapped spot with no tag that says legally I can park there, I get a ticket! The deaf and HoH community should be able to cite and charge each station a fee/fine for the lousy product they have chosen to throw up on the screen. My thoughts!

  4. This is so very alarming, and I’m afraid after you do your research and speak to the right people, you will regrettably find you are on the wrong side of the yard on this issue and have been grossly misinformed by a company’s sales pitch. This is the critical issue we, as human captioners, face while advocating for those who depend on quality accessibility. The public’s accommodations are incessantly being attacked by corporate greed and the desire to sell a product to ride on the coattails of the “accessibility” movement, trying to make a buck at the expense of an already underserved community. If you want a story, something to write about, please, please do your due diligence, speak to the right people, speak to the community captioners serve. Speak to us! Get the truth about what has happened in our industry and the actual facts of our service. I, along with fellow captioners, caption at a 99.5% accuracy rate every day. Do you realize that the difference between 99.5% and 96% (at best), per Watson’s own words, is HUNDREDS of errors. That’s at 96%. They say their captions are 92% to 96% AFTER the “learning” process of the ARS. Do you realize that 92% is USELESS to the consumer? Have you ever had to rely on captions for crucial information? This entire article is egregious and factually incorrect on almost every level. Your article is devastating to the community that will be gravely affected if these corporate-level decision-makers are allowed to infect and deteriorate the service that millions of people depend on daily. It’s been happening. We’ve been fighting it. Your article contributes to the injustice that we battle every day. I do not think this was your intention when you started out. Please consider what we’re telling you and do more research. I will forward your article to the D/HoH and to other communities that depend on captioning for accessibility. The reach is far and wide and not limited to one community.

    I will address your article in excerpts with asterisks to indicate my responses.
    Creating closed captioning for live local news is error-prone … *** Error-prone is reflective of ARS captioning.

    Live news programming is particularly difficult for captioners to translate accurately. The many references to local names, locations, and events may be unfamiliar to a human captioner. Local accents and idioms can be problematic. These challenges frequently can result in errors in translating the spoken word and can yield closed captions that lag far behind the speaker. ***Our highly skilled talent allows us to actually be able to translate names, locations, events accurately, and if they are unknown to us, we have the ability to fingerspell words phonetically until we can add the entry “on the fly” as we caption. ARS cannot do that. We have a far superior ability to accurately decipher accents and idioms than ANY software created.

    Often, we know what’s going to be said, and it’s delivered by a clear speaker with an accent that’s very easy to understand.” *** False. It’s rarely a single, clear speaker. It’s usually multi-voice, and they always go off script. You also never know what a field reporter is going to say while they’re reporting live on location.

    “If I had stories from the past ten years for my local station in Austin, Texas, I could feed all of that in beforehand. So, Watson Captioning would have local celebrities, local politicians, all that stuff ahead of time.” *** The spin on this statement is ludicrous. Why don’t you ask him about one of the most recent errors on his local station that made national news AND the system that generated that error. That would put this entire debate to an end.

    In trials, he says, stations are already seeing considerably more accurate results than they get with human captioners. *** This is the boldest-faced lie I’ve heard in my 25-year career, and I have beachfront property to sell you in Arizona. 🙂 Steno captioners have gone up against ARS services for years, and the results are not even close. We, as HUMAN captioners, have the ability to make corrections. That’s part of our talent. Even software developers in the tech industry, who aren’t trying to sell a product, openly praise our skill and admit that we are invaluable to the accuracy of captions.

    Generally, the text is 3 seconds or less behind the spoken word, much faster than traditional closed captioning systems. *** The ARS flashes on the screen so fast and disappears or spits out sentences at lightning speed, and people cannot even read it. There’s a reason captions stream the way they do. It’s so that the people it was designed for and who depend on it can actually read and retain the information.

    However, news anchor teams frequently go off script, and sometimes way off. In these cases, Mr. Kulczar says the system will translate what is said accurately, even if it is sentence fragments and disconnected phrases.
    *** Simply untrue based on ANY real-life example you want to look at.

    “If they are bantering, it’s going to be stuff that’s well known to the system. So, it is just going to translate it accurately.” *** False.

    For example, if there is a lot of background noise, accuracy can fall. *** Like every field reporter surrounded by traffic, people screaming, etc.? We caption those accurately, in addition to gunfire, chanting, screaming. We capture it all, retain it, and push out descriptive text IN ADDITION to the reporter’s words so the consumer has the same inclusive experience as everyone else.

    “The only thing we saw that was slightly out-of-bounds was de Blasio tends to break into Spanish. With Spanish, it tried to translate Spanish as English because we didn’t tell it he was going to be speaking Spanish.” *** I know a captioner personally that speaks multiple languages, captions fluently in Spanish, and has designed his software to flip to a Spanish dictionary in an instant. Again, the human factor.

    Closed captioning of local news, weather, and sports segments can be very inaccurate and far behind the speaker. *** Cite the source, please. Also cite the method of captioning in the example you give. This is important because many stations have tried to go to different forms of auto captions (because they’re allowed to do so by lack of FCC regulation), and THAT is what generates poor caption quality.

    These problems can be annoying to people using closed captioning. It can also be dangerous in emergency situations. *** Yes, and it’s the very reason we fight ARS in areas where accuracy is critical to someone’s information inclusion, education, and livelihood.

    ANY ONE of us would love to speak with you. The communities that depend on accessibility need as many advocates as they can get. Please don’t work against them. They deserve equal access.

  5. “However, Mr. Kulczar doesn’t expect somebody to monitor the captions all the time. ”

    This tidbit from the article fails to recognize that there are deaf and hard of hearing consumers who do monitor, i.e., READ, the captions all the time. These people are depending on quality captions mandated by the FCC, and your system is leaving these people in the lurch.

  6. What I have seen so far is not even close to being good enough, nor close to what a live human captioner can produce. If IBM is claiming 92-96% accuracy, that’s crazy. That wouldn’t even be worth watching. If I get as low as 98% accuracy, I feel like I could do better. If this is supposed to get better and learn, it shouldn’t have been allowed on air until it was much “better” than what I saw. If you’re going to replace live captioners, then it needs to be with the same quality as humans produce, and I’m sorry, but this ain’t it. (for effect, not the way I talk) I saw this the other day on tennis, I believe, or similar, and I was appalled by some of the mistakes, such as showing “three onto” instead of “3 on 2.” It’s going to be a long time before AI can get those things right. There are so many instances when I hear something and I think to myself, “That could have sounded like … to a machine.” That’s why it should be a long time before humans are replaced by AI, because it just doesn’t have that human brain to get those important nuances and differences.

  7. Please don’t pretend this is to benefit the Deaf and HoH communities. This is just a money grab that’s going to take away the ability for the Deaf and HoH to stay informed and take the jobs of highly-trained professionals, mostly women.

  8. Mark Kislingbury, CSR, RDR, CRR, FAPR

    Mr. Dixon,

    I am a broadcast captioner; I started doing so 21 years ago and have been a court reporter for 35 years.

    I just wanted to thank you for your article and for being so patient and professional in responding to feedback from my colleagues who are frustrated about the information presented in the story. Sincerely, Mark Kislingbury

    • Thanks, Mark. I have heard from many of your colleagues on this issue and I well understand the concern at the use of the technology. My audience is primarily video industry insiders and I think they will appreciate all the thoughtful comments that people have posted. It certainly adds an important dimension to the story.

  9. The FCC quality standards are below. Sinclair is blatantly violating the FCC’s standards.
    • Live and Near-Live Programming. The Commission will consider the greater hurdles involved with captioning live and near-live programming in determining compliance. In the event of a complaint, the Commission will consider:
    o Accuracy: The overall accuracy of the program and the ability of the captions to convey the aural content in a manner equivalent to the aural track;
    o Synchronicity: The measures taken, to the extent technically feasible, to keep any delay to a minimum, consistent with an accurate presentation of what is being said, so that any delay does not interfere with the ability of viewers to follow the program;
    o Completeness: The steps taken, to the extent technically feasible, to minimize the lag time so that captions are not cut off when the program transitions to a commercial or another program; and
    o Placement: The nature of the programming and its susceptibility to unintentional blocking by captions.
